I want to thank the ODSC India organizers for inviting me to speak today, and thank you all for attending this talk, as well as anyone watching the recording in the future. Today I'm happy to be speaking on what I've learned about leading data science teams. The three D's of data science leadership are my best working hypothesis of what it takes to be successful as a data science leader. I don't pretend to fully live up to these principles, and I've learned many of them through trial and error and making mistakes. But they're the compass that I use to guide how I improve as a data science leader, and I'll be curious to hear your thoughts and questions after the presentation. So first I want to give you an overview of my background so you understand a little bit about where I'm coming from. I also want to share with you how I understand the job of a manager, or leader more broadly defined. And then I'll get to the core of my presentation, the three D's of data science leadership. So to tell you a little bit about me, I am trained as a behavioral scientist, and in this capacity I had a chance to work on the White House Social and Behavioral Sciences Team during the Obama administration. But most of my professional life has actually been spent as a data scientist: as an individual contributor at the Walt Disney Company and at Capital One, and then also as a data science leader at both the Democratic National Committee, the Democratic Party of the U.S., and at Even, a fintech startup based out of California. So this diverse set of experiences has given me a chance to learn a little bit about data science and its leadership, from a small fintech company to one of the largest media companies in the world, and across different domains, from banking to politics to entertainment to financial technology. So drawing on these experiences, this is how I've come to understand the job that a manager or a leader does.
I think the guiding principle of my philosophy as a manager is that, first and foremost, a manager reports to their team, not the other way around. A manager is good at their job only to the extent that the team they're leading is good at theirs. More specifically, a manager has three responsibilities that are important to keep in mind. The first is that they give the team strategic direction and tactical advice, in the form of a roadmap for what the team is going to accomplish and in the form of one-on-ones, individual conversations with members of their team. The second is that they scavenge for and deliver timely and important information that helps the team do its job. And then finally, they secure resources to keep the team happy and productive, whether this is hiring additional team members or finding budget for educational opportunities or for tools. I think these are the primary responsibilities of a manager broadly defined, and I certainly think that they apply in the case of a data science leader. So moving on, with this definition of management as a framework, this again is my best working hypothesis of what it takes to be a successful data science leader, what I call the three D's of data science leadership. The idea here is that a data science leader wears three hats. First, they wear the hat of a diplomat: they have to get the lay of the land and figure out what is happening in the organization and what their team can do in that space. Second, they have to act as a diagnostician, figuring out how the team can deliver value to the organization, given where the team is and where the organization is, and also how the organization can provide value to the data science team. And then finally, a leader also has to be a developer.
So they have to figure out, given what they've learned and the problems they've identified, what they can actually deliver both for the team and for the organization as a whole. So jumping into the first hat, the diplomat, there are three things to call out here. The first is that as a diplomat, you as a data science leader need to speak the language of your cross-functional partners. Data science likely works in conjunction with a lot of other organizations in the company, and you need to understand a few things about them. You need to understand their values, what matters to them; their objectives, what they're trying to accomplish; and their needs, where it is that they're struggling and what they need help with. It is from these three things that you can then design a strategic roadmap for a data science team that is informed by where the organization and the company are. The second is that you need to be able to speak the language of your team. This means you want to understand their aspirations: what is it that they want to work on, and how is it that they individually want to grow? You want to understand their needs: where is it that they are currently struggling, and how can you help unlock that next level for each individual on the team, but also for the team collectively? And you also want to understand the strengths and challenges of your team: what are the things that they are best suited to do, and where are the areas of opportunity for the team as a whole? These team attributes, importantly, should then inform and shape the strategic roadmap that you produce for your team. You don't want to have a roadmap that doesn't take into account who your team is and where they're starting from. And then finally, you need to help your data science team and the organization speak and communicate with each other.
So you want to help others in the organization understand the value that your team can deliver and how they can help the team to do so, how to set them up for success. And then you need to help your team understand the role and the value that data science can provide to the organization now and in the future. So the second hat that you wear as a data science leader, I think, is that of a diagnostician. Here I start with the premise that data science is a really good way to solve some problems, but it's actually a terrible way to solve most problems. You need to figure out which of the problems that an organization is facing data science is well suited to work on in your particular company. Data science is also something that's very easy to get wrong, so you need to figure out where it could potentially be going wrong and how to fix it. So the questions that you're asking yourself are: can any of the organization's objectives be furthered by data science in any way? And if so, how? You want to understand what is currently preventing your team from delivering more value to the organization. And then you also want to understand what the bounds of data science are in the organization, and whether these are technical, so maybe you just haven't been able to work with the engineering team to set up some sort of pipeline, or whether they are cultural, where the organization doesn't fully understand or recognize the value that data science could be providing in one of its domains. And then the final hat that a data science leader has to wear is that of a developer. Here you have to develop a few different things. First, you have to develop yourself. Leading a data science team is very difficult. You probably haven't been trained as a data science manager; I certainly haven't. And you can probably do better.
So I think it's important for data science leaders to read, think, talk, and write more about what data science leadership is and how they themselves can grow in this particular domain. If you have a manager who was a data scientist in a past life, or who is currently some sort of data science leader themselves, this is a great person to learn from. But oftentimes you may find yourself in a smaller company where you don't have access to this kind of resource, and in that case it's really useful to find mentors and any external research you can on data science leadership. The second thing you need to develop is your team. Data science is a very hard craft to master well, and it's constantly changing over time. This is why it's important to have a data science leader who can guide a team and teach them some of what they know about how to think about, how to apply, and how to do good data science at an organization. At the same time, once you've recognized the extent of what you can teach, it's important to get out of the way and let the team do what they do, including learning and picking up new skills. As I mentioned, and as I'm sure you all know, data science is a fast-growing field, so often your team will end up learning and developing skills that you may not be familiar with tactically, but you can still give them some strategic guidance on how to apply and think about using these tools. And then leadership, as I mentioned, is also a very hard thing to master, but there are likely people on your team who one day want to become data science managers. I think it's important to figure out who these folks are, how to help them grow that talent as best you can given whatever resources are available to you, and to set them up to one day, in a sense, take your job, either on your same team or maybe on a different team down the road.
As a data science leader you also want to develop the org. By this I mean that, as I've said, data science is a complex and dynamic technical field, and it's very easy for organizations to get data science wrong: to not fully understand what value it can provide to them, and to not set the team up for success by failing to give them the right infrastructure and the right tools to get their job done. This is one of those cases in which you could argue that maybe it's the organization's responsibility to try to understand the data science team, data science as a field, and how it can help. But I operate from the principle that when you have two groups and one is more technical than the other, the onus is on the technical group to be able to explain and communicate with the non-technical group, as opposed to the other way around. So here I think a data science leader can play a very useful role in helping the organization understand what data science can do for them, what it currently isn't doing, and why. Here again you're trying to identify the technical and cultural boundaries of data science in the organization and trying to push them to the next level, whatever exactly that may be. And then finally, a data science leader is also developing software, so you should probably be spending some of your time writing code. This doesn't have to be 50% of your time; it may be 10, 20, or 30 percent. It really depends on where you are and on the size and the maturity of the team. But I think that to really understand some of the things I mentioned are important to understand about your team, it's very difficult to do that if you're not actually in the trenches with them some amount of the time.
You may not have the bandwidth to own a very large model, and you may not have the bandwidth to be responsible for any kind of large-scale analysis effort, but I think it's still important to be able to understand a lot of the stack and the tools that your team is using from their perspective. So although the primary advantage that you bring to a company may not be as an individual contributor, I think a really effective data science leader is able to do that some of their time, and I think that ultimately makes them more successful in the other areas in which they do bring more competitive advantage to the company. So just to wrap up, and I'm happy to leave the rest of the time for questions, conversation, and discussion, because as I said, a lot of this stuff I've learned from trial and error, and I'm curious to hear from any other data science leaders in the audience what you think about this and what your own thoughts are. Broadly, I would say that a data science leader, to be successful, has to wear three hats. First, they have to be a diplomat: they have to get the lay of the land of the organization. They have to be a diagnostician: they have to figure out how they can best help the organization and how they can best help their team to improve. And finally, they have to be a developer: they have to be able to do something about what they found, to grow themselves, their team, and the organization. And with that I want to thank you for your attention and just leave it open for questions and discussion. Thank you.
Thanks for this wonderful talk. I do have two questions. The first is more related to management skills: compared with the manager skills that used to be needed earlier, what is the main difference when it comes to data science?
The second question is this: yes, these things look very good on the slide, but when you try to implement them in a real team, you face challenges. Are those challenges ones you see more from the team's perspective, or more from the leader who is managing?
So the first question was about the difference between data science management and general management. And then the second question, can you repeat it one more time?
Yeah, so the second question is, as I said, these things look very good on the slide. Everyone knows it: you need to be a good coach, you need to let them know the problem statement, and all that. But when you try to implement it on the ground, you face lots of challenges; it is not that easy. So in your experience, was the challenge more from the team's perspective, or more from the leader's perspective, that he was not able to handle it?
Got it. So let me take the first question first. In terms of the difference between data science management and broad general management, I think the biggest difference probably comes in this. Let me just try to get the slides to cooperate. So let me go here. I think these are the ones that tend to be very different for data science as opposed to other organizations. The diplomacy part is particularly challenging because a lot of people don't fully understand data science or what sort of value you can deliver. So I think that's one area that's particularly different about data science. The diagnostician part is also important, because often for data science to be successful you need to have in place an infrastructure in terms of data engineering and data analysis, and a really good and tight partnership with product and with engineering, that may not necessarily exist when you arrive. So helping the organization understand why that's important is, I think, really useful.
And then on the developer part of it, maybe there's some difference there with the data science leader having to spend some of their time as an individual contributor. I don't know if that's the case for designers or for other organizations, but maybe that's another difference. And then to go to your second question, I think the challenges that I face are from a team perspective, in helping the team understand their role in the organization, and largely, I think, also from a leadership perspective, particularly in smaller companies, in helping leadership understand what it takes to be successful. I think a data science team, especially at a very early company, can be very aspirational. It can be very much: this is the kind of work that we'd love to be doing, without necessarily understanding what needs to be in place for that work to be really effective.
Okay, thank you.
So this might be more to do with your Democratic Party association. What went wrong with the 2016 election in terms of the data science? I'm not talking about the politics. You would have had your own models, including the vote shares or who would win in the US election in 2016, right?
You're asking whether people are building those models?
No, I'm asking what part of the data science or the data went wrong in the 2016 election for you, I mean, if you're associated with that election.
Oh yeah, I'm no longer associated with them. I don't know what kinds of models they're working on right now.
You were there during that time?
No, no, I'm no longer working on that now.
Okay.
So one question that I had was: you talked about how you look for the bounds of taking data science forward in an organization, right? Whether they're technical or cultural. So say I'm a technical manager and I've laid the platform so that data science is now possible. I've created RFPs, use cases, but the business is not there yet, right? The culture of the organization is bound to Excel-based analysis, right?
So, as a leader, you might have faced the same challenges. How do you as a leader propose that those boundaries of data science, from a cultural standpoint, can be pushed, so that they understand the value and then of course embrace it themselves, rather than us pushing it from a technical standpoint?
That's a good question, and something I think is very difficult to do. One suggestion is to build really strong relationships with individual leaders in the company who may have some of those views, and then to try to understand what their objective is from a business perspective. Ultimately, I would imagine, they don't really care about the methods as much as about delivering some kind of value. So maybe it's on you as a technical leader to help demonstrate to them what the advantages of data science are. I would imagine that people largely care about delivering impact, so to the extent that they can understand how you're going to do that, and ideally how you're going to do that better, that's probably a case where they should be more open to thinking about something different. You could also pitch it as: we have this idea, why don't we start with a small pilot and see how it works? And try to be as transparent as possible throughout, to help them understand why and how you can help. Then hopefully, if they have the best of intentions and really care about hearing you out, they can start breaking some of those boundaries.
Yes, absolutely. Especially if you have a data science team and there's no one above you who does data, you are absolutely an entrepreneur running a data science consulting company embedded inside a larger organization, for sure.
Hi, so I have one question. You often mentioned that it's very easy in data science to go wrong early, right? So is there a way to strategically define a data science problem? Is there a way to spot as early as possible that we are actually going wrong, and not go that way?
Yeah, I
think so. So, for one, it's important to be clear with your stakeholders and your partners about what you can reasonably deliver, within what period of time, and what kind of resources and support you're going to need to be able to do that. So imagine that you want to use a predictive model to score a certain number of people over some period of time, but you actually don't have buy-in from engineering to provide the software development that's required to get a data infrastructure in place for that kind of real-time scoring. That, for example, is a case where you may be setting yourself up for failure. So I think just being aware of, and communicating to others, what it takes to do good data science is one way to try to avoid that.
I'll go next. So my question is: the skills that we mentioned are three different sets of skills. You have a developer hat, you have a diagnostician hat, and you have a diplomat's hat. Generally one starts with the developer hat, or comes from a stats background. But then, of the other two skills, one needs people skills and the other is product thinking. There are a lot of unknowns thrown at you and everything seems important. So what is the order one should go in, and what is the rationale behind that?
Sure. So you're asking what order you should learn these skills in and which ones to prioritize. That's a good question. I actually don't know the answer to that. I would imagine that, rather than focusing on being 100% at one before delving into the others, it's probably better to try to be 20% at all of them, and then 30% at all of them, and then 40% at all of them, recognizing that you want to understand how you're doing in these three domains and try to grow in them in parallel. I would imagine that there are different kinds of errors you'll make if you're really good at two of these and are missing a third, depending on what the third one is. But I think maybe the one where
data scientists, at least from conversations with other data science leaders and from my own experience, struggle most is the diplomat part. Diagnosis is something that we do as technical people and have been doing for a long time. Developing is maybe something that, when you think about it, we can figure out how to get better at. But the diplomat part can be one of the hardest ones for people, since higher levels increasingly require more sophisticated relationship-building skills.
Yeah, hey, this is Gagan. You did talk about one very important point: it's more about the manager reporting to the team than the team reporting to the manager. But ideally you expect some quality in your team as well. That's the ideal scenario, but it's not ideal every time. Say you are part of a data science team. We do talk about the skills that are needed, like maths and coding, but there are some basic skills that are required as well, let's say logical thinking, which is the most important skill, I feel. So let's say you are managing a team and you face a challenging scenario where a person cannot think logically. I mean, he or she has the intent to deliver and give you the results, but he or she cannot think logically, and there are some boundaries to what he or she can think about. In your experience, how would you lead and guide that team member so that he or she can improve and move ahead in the data science area?
Sure, sure, that's a good question. I don't think the framework of you reporting to your team is necessarily incongruent with that. I think reporting to them, needing to work for them, involves identifying their areas of improvement, both to get them to see that and then to work with them on how to improve. For the particular example you described, I think that's a really tricky one. I haven't seen it myself before, but I think a part of it could be a sort of joint process of
discovery. So you're pointing out to them what it is that you're seeing, and then trying to understand from their perspective what they're seeing, and what they felt and thought at particular junctures, and kind of doing a little bit of an analysis of their decision making, of the kinds of assumptions that they made, and of the reasons why they chose one path over another. Then ideally, over time, assuming that they're curious and willing to learn about why they did the things that they did, you can start getting them on a path where they are a little bit more thoughtful about what decisions they're making and why. Yeah, I mean, there's an extent to which maybe some people are less prone to certain kinds of thought processes. But a lot of it could just be that maybe they're a more junior contributor, and maybe they can be really effective at that thought process, but only when they take a step back and think more broadly about certain assumptions. I think maybe that's one of the things that's useful for a manager to do: to help them take a step back and think through things that they might not actually be considering otherwise.
Isn't it a problem of hiring, or of proper assignment of the right problem to the right person, who has the acumen to think in those terms?
I completely agree with you on that part. But let's say you are actually on a project where you are working with a team, and you cannot say that you gave me the wrong people. I mean, I agree with you that it could have been corrected in the first place itself, but let's say you are in that scenario. How could it be tackled? Because it's all about him or her realizing that this is not a good place for him or her, right? But at the same time also figuring out whether he or she can continue in this domain, right?
Srinivas here. Thank you for the nice summary. So this is an extension of the question a few people asked about the
cultural thing. So in our organization, I see a lot of potential to add value. We have the data in a certain way; it's not stored in one particular place. For it to be stored, individual teams have to do certain additional work. The challenge here is the response: okay, I give you the data, you get the visibility, but what am I going to get? So can you please add some inputs there on how to get the data in place, and what are the things that we need to do?
Sure. So I imagine that will depend somewhat on your organization, what kind of data you have, and what it looks like. I guess I'm curious to understand: is it that the data are currently being collected, but they're just stored in different places, and there's no data lake that gives you access to all of them? Is that the scenario you're imagining?
Yeah, the production data.
Yeah, I got it. That sounds like a case in which you can maybe work with a data engineering team, who can help you understand how to bring the data sets together into the same place, like a data lake kind of situation.
Yeah, this is Krishna. I have a question. So, the three hats of a data science leader or manager. Suppose a leader or manager dons the hat of a diplomat, and then he goes on being entrepreneurial for his team, and then six months, nine months are gone. Then when it comes to the developer landscape, maybe the landscape itself has changed. It's a fast-changing field, so everything changes on a daily or a monthly basis. So how can a manager who has gone out of the technical landscape, I mean, it's not that he doesn't know it, he knows it, but he has lost touch with it and then everything has changed, how can he lead a team of developers who are actually far ahead of him?
So I'd say here maybe a little bit of what is relevant is this slide. So you're particularly talking about how data science is changing and you
need to be able to still lead people who are doing work that maybe you're not familiar with in terms of the finer details, the sort of tactical details. So I think actually contributing as an IC is a helpful way to do that. Actually coding and contributing to models and contributing to analyses is one way of continuing to stay close to the tools of the craft. Another one is probably attending conferences, maybe doing some training on the subject. But at the end of the day, I think it's also just important to recognize that you're not going to be able to know all the things that the people on your team are working on, and that's okay. It's scary, but it's okay. Ultimately, what you need to be for your team is primarily a thought partner on how, strategically, they should think about the kinds of tools and the kinds of methods that they're using, and on how to help them ask the right questions. There is one advantage to actually not being familiar with the tools, which is that you can ask certain questions from a more naive perspective: hey, you made this decision here to fit the model this way, why did you do that? Or, hey, you're using this tool as opposed to this other tool, what was the reason for that? And ideally those questions draw answers that help you understand whether, strategically, they're thinking about their analysis and their modeling from the right perspective. So even at that level you can be extremely helpful to someone, even if you can't catch syntax errors in the particular way in which they're choosing to implement a particular kind of model.
So the question I would like to ask is: what would be the number one piece of advice that you would have loved to know ten years ago, or maybe when you were starting out your career as a data scientist, that would have helped you now as a data science leader, maybe within a small team in a large organization, or maybe as the founder of
your own company?
That's a great question, and one I've thought a lot about. So, things I wish I had known ten years ago when I started. I think one of them is thinking of yourself as a consultant inside of an organization: you're ultimately there to deliver value to others. I was trained as an academic, and I think academics sometimes have this idea that we want to pursue very interesting and challenging problems, and we care about the complexity and elegance of a lot of our work. At the end of the day, in an organization, I think that tends to not matter as much, to the extent that it begins optimizing away from delivering value within some period of time. I wish, at a lot of the jobs that I've had, that I had spent more time in the beginning almost working as an anthropologist, to understand the organization, its leaders, their objectives, their values, their needs, just really digging into interviews with them to understand how they work and why they do the work the way they do. From that, assuming that you can think critically, you can start understanding where you and your team can add value. I think sometimes it took me much too long to do that. There's also building really strong relationships with other managers at your level across your organization. Those weekly, bi-weekly check-ins are really helpful to give you a sense of what's going on around the organization, and then, as the landscape changes, to figure out how you should adapt your team's strategic roadmap as a result. So maybe that would be the one: just to spend a lot more time talking to people, talking to as many people as possible, and always trying to think about the work that you're doing, the value you're going to deliver, and how much it is going to cost you to deliver that value.
Thank you, sir. I think there are a couple of questions.
Hi. Okay, so I'll just give you some background. I'm kind of a manager. I do a lot of back-end work. Now we've started to get into data science.
The biggest problem I have right now is that I'm not well versed, and if I look at all the tools, there are hundreds of tools coming up every day: ML, deep learning. I mean, this is the state of a manager. Given the number of data scientists out there, you can't even hire them, to be honest; they're very few. So I'm kind of going through a transition. At this stage, what sort of references should I use? I mean, what do I learn? How do I begin with this whole thing? Because I tried a course on Udacity for deep learning. I'm technical myself, so I've done a few things like that, but it's very hard to get into this, because after this I have to take care of hiring staff. We run a small company, a small business. So what would be the first steps? How do I go about this? What are the good entry points for addressing this sort of situation?
I'm actually looking for specific newsletters, if you have any recommendation, because this was a very generic answer, something I've been doing personally. But maybe some specific newsletters that you personally would recommend?
Sure. ODSC actually has a newsletter; that's the one I was thinking about. And then on Twitter, I'm happy to follow up offline afterwards and just give you some accounts.
Because a lot of these things are very technical. I'm looking for something a little broader and kind of use-case oriented.
Okay, I'll sync up with you.
That's very good.
So my question is specifically about one aspect of this diplomacy hat. One issue that we run into a lot as an organization is trying to translate technical timelines for non-technical folks. Often with non-technical problems, to a certain extent, you can define the roadmap, you can identify the roadblocks that you might hit, and estimate about how much time they might take. But a data science problem often kind of defines itself as you go, even just things like data cleaning. People often say, so why haven't you gotten started? And I'm like, well, I've been
spending the last month trying to fix the dataset that you sent me, and I'm finding new problems every day. I can't tell you exactly how long it's going to take. So I think sometimes team health, like not overworking data scientists, can be at odds with keeping up the momentum and keeping non-technical folks happy in a company. I didn't know if you've run into problems like that, or if you have recommendations about how to better address that or scope problems.

Yeah, I've definitely run into those problems before. I think it's a common issue, I've found, talking to other data science leaders as well. One recommendation on the time estimation side: there's some literature out there on how to estimate the time to completion of software development projects. If you just Google "software engineering" or "software development time estimates," that literature will give you some useful guidance and heuristics about how to better estimate the time for a lot of these things that can be really amorphous and can change in many ways.

Another suggestion would be to be very transparent with your stakeholders about what's required to get from where you are to the final product, and make them aware that there's a research and development period that will likely take some period of time but may end up taking more, for whatever reason. That way, when you come to them later and say, hey, this particular piece is taking longer than expected, they know what that piece is, and they knew that it was going to be a part of what was going to happen. You can at least go back to the expectation that you set before, that this was potentially going to take a little longer than expected, and be transparent about why and how, and ultimately why you think that this decision to invest some more time now is going to pay more dividends later.

Another thing that I hear, as a manager, is particularly helpful to do
is, for these broader research and development tasks, to work to be really thoughtful and mindful about how the team is prioritizing the questions that they're asking, even in the research and development time. You can just go ahead and do research and development for two weeks, but you can also be thoughtful about what set of questions you want to ask, and in what order, given what kind of return on value you think you're going to get. Some questions we may be able to answer within a day, and they may tell us the 80-20 of this problem. So focus on those first, as opposed to starting by building a really complex model, where we may only find out what is actually wrong a week down the road.

Thank you.

You can have this.

So what I've seen in my organization is that several non-technical leaders go from not being aware of machine learning and data science at all to now having really high expectations because of the hype. So how do you manage the hype around data science? As in, they just want to apply data science to everything and expect really good results. How, in your experience, have you managed that?

Yeah, so I think that's in some ways a good problem to have, in that people are really excited about your team and what it can deliver. For managing expectations when they want maybe too much, or something too quickly, one approach could be helping them understand how the bandwidth of your team is delivering against different kinds of projects: given the fact that we have three or four people now, reasonably, among these five different things that you want to do, we could do one or two this quarter. That, I think, is a useful way to start helping them understand the amount of time and effort that it takes to get certain projects out the door. And that might first help them think through, among these five things that I want you to do, which are really the ones that are most important? Or maybe we want to keep on growing your team, because we actually want these
five things sooner rather than later.

The second thing, and I think where I've seen companies go wrong, is not thinking through that, if you want a model that does X and Y and Z by this date, there's a whole series of things that need to be put in place in terms of data engineering and analytics, and maybe reporting, and building some of the underlying tables that are going to support that use case. Help the company understand that until you have those things, and until you have people who can help you build those things, or until you can maybe hire internally to get those things in place, you're going to really struggle to deliver value, and why. I think if you can sit down with folks and just be transparent about that, that's maybe the best way to start.

You're saying the question is whether I've seen companies expect that a data science solution is going to completely change or transform the way that something is done? Yeah, I haven't seen that in particular, but I imagine it can be common. I think, again, that's maybe a case in which you can help them understand, one, whether that's feasible at all, and two, if it is feasible, what the cost to get there is. As you're pointing out, maybe it takes a month to get to 80%, and maybe it takes another month to get to 90%. Then you can help them figure out whether, given their priorities and given other requirements that the team has to complete, it really makes sense to be spending a whole other month on just 10% less