Hello everyone, my name is Nahla Salem, and I'm here today to tell you how to build a machine learning product team.

First, a little bit about myself. My specialty is machine learning product management. In the past, I've helped set up machine learning and product teams, mostly in Toronto startups and mostly in enterprise product management. My current position is senior product manager at Yelp, where I'm helping build out a marketing analytics practice. My past customers include the Fortune 50 manufacturing conglomerate United Technologies Corporation, as well as AS Watson, which is the largest health and beauty retail group in the world. In the past, I've also done software development and consulting. I have a master's in business analytics as well as an EMBA. However, don't worry too much: you do not need a master's in analytics or data science to benefit from this presentation.

This presentation is geared towards product leaders and tech leaders who would like to answer the question: should we start a machine learning team, and how do we go about it? Because of my background, it will mostly be relevant to startups or medium-sized companies that have some sort of core product, which may or may not already include machine learning. So you're either looking to add machine learning from scratch or to scale a tiny team of maybe one or more data science people.

I'm going to tell you, first of all, what machine learning is. Then, where should you start? Because this can be a daunting question to many. Then a mini crash course on machine learning for product managers. And then I'll look at the question of building a machine learning team along three axes: who will build, how to build, and what to build. We'll talk about tips for starting your first machine learning features, as well as pitfalls to avoid.

Let's get started. What's machine learning? Well, I can tell you that it is not this.
It is not artificial intelligence that's going to take over the world. There is a lot of hype going around machine learning that I would like to dispel. Surprisingly to me, the question I get most often is: what's the difference between machine learning, data science, and AI? Well, let me tell you that you do not really need to worry about this. For the most part, these are very technical or academic differences. For the purpose of this presentation, which is starting a machine learning team, just choose a name, be consistent, and use it internally and externally to avoid confusion. I'm going to be using "machine learning" today.

So what is machine learning? To oversimplify, it is using data and processing it in order to understand the world, and then making a decision or predicting a future outcome that is specific to a certain business problem. That involves capturing large amounts of data, and the processing includes finding trends in this data. Hence the need for large quantities: so that you get the right trend and do not overfit, for example. You then use that decision for a specific business problem. Defining the problem, and what you want to do with the result, is the most important part.

Machine learning is ubiquitous in our lives right now. You have probably used it even if you hadn't realized. One example of a machine learning system you have probably used is spam alerts: a decision is made for an incoming email, based on previous historical emails including spam and non-spam, to issue an alert for that specific email. Netflix uses your watch history, as well as that of millions of others, to give you recommendations for things to watch, and it usually works very well. And you could have something like a sales call be transcribed into text, which involves natural language processing, a subset of machine learning.

Andrew Ng is really a thought leader in the space.
He is also the co-founder of Google Brain and Coursera, and an academic, and he puts it very well in describing how the future is machine learning. He says: "AI is the new electricity. Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I do not think AI will transform in the next several years." Those of us who've been in the industry long enough will remember the days when CTOs and CIOs were asking themselves whether they should move to the cloud. Now this is a no-brainer. And in the last couple of years, the same thing has been happening with machine learning.

So where should you start? I think you should start with the following question: should we get a machine learning team in our company? As good product managers do, I would like to ask people to approach that question with a bit of skepticism. That's because, more often than not, I see people rushing into the hype, and I think more thoughtfulness is needed to answer that question. I'm going to give you some helpful questions to ask yourself in order to decide.

You have to be very clear on what your value proposition is, and then ask yourself whether it can be enhanced by machine learning. Ask yourself about your use cases and whether they could make use of predictive power. And then ask yourself if you have the data. Even if the answer is no, this doesn't necessarily shoot the idea down, but it will help you answer the following question, which is: what are the costs, and what's the ROI?

And you really need to educate yourself. You need to know just enough about machine learning in order to start. One good resource is actually Andrew Ng, who I think is great at explaining very technical concepts to people who do not have a strong technical background. He has a Coursera course called AI for Everyone, which could be a good start.
The point is: get help and do some homework in order to educate yourself and your team.

So, machine learning for PMs. This could be a presentation in and of itself, so I'll focus on some things that, from my experience, are useful. The first question is: are machine learning products different? Are they different from other software products? Well, it depends on how you look at it. Something I see happening often is machine learning, or a machine learning team, getting special treatment, which is not necessarily conducive to the productivity that we want to see in our teams. My personal philosophy is that we should look at machine learning as just another technology and be thoughtful about the product value that we want to give to our customers.

However, there are some differences that I will highlight now. The development lifecycle is indeed different from your, quote unquote, regular software development lifecycle. First and foremost, as I alluded to, you have to start with formulating a problem. If you don't get that right, then the rest of the steps are not really going to work out well for you. Then you ask yourself, based on the problem: what's the data that can help me answer this problem, or rather solve it? With your team, you will need to understand the data and do preprocessing, etc., with the aim of doing feature engineering, which is essentially finding out which elements affect the decision that we're trying to reach. Then you build a model, train the model on the data that you prepared earlier, test it, and then tune the model.

If you've been paying attention, you will notice that this development lifecycle, unlike others you may have seen in your past, is cyclical. And that's exactly the point: with machine learning, you rarely get it right the first time.
And you need an iterative process, sometimes even just to answer the question: can we solve this problem using data and machine learning? But more often than not, the iterations are going to include things like "we need more data", or "we need to go and review the features that we have defined", or "we need to keep tuning the model until we reach a performance that we're happy with".

Model performance is really where I think most of the partnership between product and data science needs to happen. If you, as a PM, are going to spend time on education, this is where you need to spend it: not so much on the inner workings of a model, but on understanding the things I have here on the slide, which we won't have time to get into in detail. Essentially, these are the ways in which you can assess a model's performance. If you're a technical or technically oriented product manager, you will have conversations with your data science team about the iterations we just talked about: when to stop, and how to conduct the iterations in a way that optimizes for the business goals. So go and educate yourself on things like the confusion matrix, and on why quoting a model's accuracy is simply not enough to describe its performance entirely. You will perhaps guess that a model accuracy of, say, 85% is better than 70%, but an accuracy metric doesn't give you the full picture of a model's performance. Spend some time understanding why that's the case and how you can make better decisions around the choice of models.

Alright, so let's now approach the axis of who is going to build. If you're just starting, this is going to involve hiring people to work on machine learning. Typically that's going to involve data scientists as well as machine learning engineers.
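As a side note on the model performance point above, here is a minimal sketch in plain Python of why accuracy alone can mislead. The data is made up for illustration (100 emails, 10 of them spam), and the function names `confusion_counts` and `summarize` are hypothetical, not taken from the talk or from any library.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for a binary classifier."""
    c = confusion_counts(y_true, y_pred, positive)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        # Of the emails flagged as spam, how many really were spam?
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        # Of the real spam emails, how many did the model catch?
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical data: 10 spam emails (1) followed by 90 legitimate ones (0).
y_true = [1] * 10 + [0] * 90

# "Model" A never flags anything as spam.
never_flags = [0] * 100

# Model B catches 8 of the 10 spam emails but wrongly flags 10 legitimate ones.
catches_most = [1] * 8 + [0] * 2 + [1] * 10 + [0] * 80

baseline = summarize(y_true, never_flags)
useful = summarize(y_true, catches_most)

print("never flags: ", baseline)
print("catches most:", useful)
```

In this made-up case the model that never flags anything has the higher accuracy (90 correct out of 100 versus 88), yet it catches no spam at all, which only shows up in recall. Which trade-off between missed spam and false alarms is acceptable is a business decision, and that is the conversation for product and data science to have together.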
The data scientist is the person who is going to build out the model. The machine learning engineer, generally speaking, is the person who is going to operationalize the models. That involves making the data available to the model and then making the results available inside your product.

What to look for, in my opinion, is a very good focus on and understanding of the business aspects and the business goals that you can use machine learning to serve. This matters most for the data scientists. You will probably find a lot of people who are good with the theory, but the application is what you want to focus on in the hiring process. Look also at people's backgrounds: someone coming fresh out of academia is going to be different from someone who comes from a well-established company and spent years there. Someone from a startup, for example, will tend to be scrappier, as those who work in startups would know.

As for the interview process, I think a very good way to go about it is to present the candidates with a business problem, see how they approach it, and then work on it together in a collaborative manner, because that simulates how things play out in real life. Unlike the picture of the interview we have here, it should be an interactive, collaborative process. As for the machine learning engineer, their interview process is going to be engineering oriented, but you still need to measure whether or not they have a good understanding of machine learning concepts as well.

One pitfall that I've seen many companies fall into is this dilemma: if you're starting a machine learning team and you do not have in-house experience, how do you then assess candidates? For that I highly suggest getting outside help. You can fall back on your network and see if someone can help you, or get some consultancy done in order to guide the interview process.
But don't try to do it on your own, because this is a complex discipline, and without enough knowledge of it you will not be in the best position to judge candidates. And finally, since we're for the most part working remotely, don't forget the social aspect of building out a new team in a remote setting, and try to compensate for the lack of face-to-face interaction.

Alright, so team structure. This is something I'm very passionate about and could go on and on about, so I'll try to be brief and make one point that I see as very important. Technical people tend to gravitate together based on their skills, and when it comes to machine learning this is no different. So a setup that I've seen in many companies is that they start out with a machine learning team that is separate from the multiple product development teams they may have. This does not give you the best performance. It results in a siloed team that is not connected to the business, not connected to the priorities, and not connected to the goals. Very often I've seen companies do that and then reach a point where they realize that they do need product help. They start adding a product manager to the mix, but that still doesn't give them the optimal setting.

The optimal setting, I think, is what we've seen in the Spotify model, and really what the model is in many other companies, at least if you boil it down to the basics: an autonomous product development team that includes all the skills required to push a feature to production, including data science. A friend of mine who worked at Netflix mentioned that they have data scientists embedded in product teams, so those would be working on the product problems of the team they're in. However, they do also have a separate machine learning team, and that team services general requests that can come even from outside the product teams in the company, like finance.
And that's a good model that combines the benefits of both setups. However, if you rely only on a separate machine learning team, then inevitably that team is going to operate more like a service desk. They will have a hard time prioritizing, and it might end up being first come, first served, which is again suboptimal. You always want prioritization to be connected to business goals, and you want the people who are working on code to be connected to how that code is going to be used. The only way of doing that is for them to be embedded in the product team that's working on that specific problem.

Now, the obvious elephant in the room is that if you're just starting, you will not have the capacity to embed a data scientist in each of the couple or more product development teams that you have. And that's fine. It is okay to start out with a separate machine learning team, but then you have to compensate for the things that I mentioned, perhaps by creating dotted-line teams or tiger teams, whatever you call them. Have a data scientist be embedded with a product team for anywhere between a sprint and a quarter, depending on your needs, and do your best to compensate for not having an embedded team.

Okay, so how then should you go about building machine learning features? We've seen the process and how it is different. In order to work with that difference, what is very important is for you to educate, educate, and educate your stakeholders internally in your company. In companies that I've worked in, I've given presentations like this one. Set expectations, and have people be aligned, at least on a conceptual level, on the different development cycle: the fact that it needs more time and includes uncertainty, sometimes to the point of failing to build a model that actually improves the user experience, and that you only know that after spending iterations and spending time.
So find a good balance between setting expectations and not being too pessimistic. I've found this very important, because sometimes people have an expectation that machine learning is a magic wand that's going to solve all the problems we have been pushing from quarter to quarter into the future and have been unable to solve. Generate artifacts and spread the word: use your company's wiki and whatever tools you have at hand for internal education.

Alright, so on the question of what to build. I think an ideal setup is to have some sort of research roadmap that precedes your regular roadmap, or your feature roadmap. A research roadmap includes working on models and doing proofs of concept. A feature roadmap that makes use of machine learning then includes the actual building of features that use those models, the productizing, and sometimes also the scaling. That is because when you initially build a model you build it on a subset of the data, and then, after the model is found to be successful, you scale it to production-size data with the help of machine learning engineering.

In the past I've done both. We have had research roadmaps maintained separately, in addition to feature roadmaps. When you're starting out you'll probably not do that, but at least do it in an inline manner, in which you have one roadmap but you plan for research ahead of the actual development of a machine learning feature. This gives you the best productivity, because it gives you a practical tool to deal with the uncertainty we talked about in building machine learning features. At the same time it gives you efficiency in using resources, because the data science work is going to be very heavy in the modeling stage, but then in the productizing stage the machine learning engineering work will be heavy and the data science work will start to wind down.
So if you plan it well ahead of time, and at that point have another research problem ready for the data science team or data scientist, then they will work on it and make it ready for the next chunk of time, when the feature development takes place, and so on and so forth.

Okay, so especially if you're going to use machine learning to automate a decision that was previously made by a human, you have to keep in mind building trust with the users as you're doing this replacement. Use your good old change management practices to make that successful. Prioritize the trust aspect in a machine learning feature, and don't let it be an afterthought towards the end. Build fallback scenarios, because sometimes the model is not going to perform well, and you have to keep that in mind and have a plan B. For example, if the model results are not as expected, then you revert to the old process that was there before the machine learning feature, depending on your product and the feature that you're building.

This is a diagram from an article I've written. You'll find it on my website, and the article is dedicated to building trustable machine learning products and to things that we have done in the past. One is to have the model be tuned through business rules and allow a user to set those. Another is to have, somewhere in the product, sanity validation reports for model outputs, or a sample review of a model's outputs, and to give at least an admin access to them. And something else we've done in the past, especially when a very machine learning heavy operation is replacing a human decision, is to give the users the ability to override model outputs if they're not happy with them, and to give the users the ability to affect model parameters. That's a way of combining the human experience and the business experience that the stakeholders have with the power of machine learning.
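The fallback and override ideas above could be sketched roughly as follows. This is a minimal illustration in Python, not the design from the article: the `Decision` type, the `decide` function, the confidence threshold, and the toy reorder-quantity example are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple

Features = Mapping[str, float]


@dataclass
class Decision:
    value: float
    source: str  # "override", "model", or "fallback"


def decide(
    features: Features,
    model: Callable[[Features], Tuple[float, float]],
    fallback_rule: Callable[[Features], float],
    min_confidence: float = 0.7,  # hypothetical, admin-configurable threshold
    user_override: Optional[float] = None,
) -> Decision:
    """Pick a decision, preferring human override, then model, then old process."""
    # 1. A user who is unhappy with the model output can override it.
    if user_override is not None:
        return Decision(user_override, "override")
    # 2. Ask the model; if it fails outright, revert to the old process.
    try:
        value, confidence = model(features)
    except Exception:
        return Decision(fallback_rule(features), "fallback")
    # 3. If the model is not confident enough, also revert to the old process.
    if confidence < min_confidence:
        return Decision(fallback_rule(features), "fallback")
    return Decision(value, "model")


def toy_model(features: Features) -> Tuple[float, float]:
    """Stand-in for a trained model: returns (prediction, confidence)."""
    if features["history_days"] < 30:
        return 0.0, 0.2  # too little history, so low confidence
    return features["avg_daily_sales"] * 7 + 5, 0.9


def old_rule(features: Features) -> float:
    """The pre-ML process: order one week of average sales."""
    return features["avg_daily_sales"] * 7


established = {"avg_daily_sales": 10, "history_days": 90}
new_product = {"avg_daily_sales": 10, "history_days": 5}

print(decide(established, toy_model, old_rule))
print(decide(new_product, toy_model, old_rule))
print(decide(established, toy_model, old_rule, user_override=60))
```

Recording the `source` of each decision is one possible way to feed the kind of validation report mentioned above, and to track how often users override the model over time.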
You'll see more about this in the article, but there are two takeaways here. The first is to keep trust first and foremost in your mind. The second is that, with time, and also because models get better, users trust the models more, and features such as overriding model results actually tend to get used less as time goes on.

Okay, so now you feel you know what to do and you're jumping into building your first machine learning feature. That is definitely a moment to celebrate, but I also want you to be thoughtful as you do it, so I'll give you some tips for building your first machine learning feature. Start small. Think with a quick-win mentality. Even if you start without building a model of your own and use a model from an existing library, that's a success. It will allow you to test your process and structure, build out the infrastructure as you go, and understand what you need. And finally, make sure that you choose the best product problem and not the best machine learning problem, because more often than not these are two different things.

Now, some pitfalls to avoid that I've seen companies fall into in the past. Machine learning is a powerful tool, and as the saying goes, if you have a hammer, all problems turn into nails. So be very conscious not to become technology centric, and to be user centric. If you are in Toronto like me, then you've probably seen that there are startups that do "AI for [insert whatever business problem there is out there]". We have seen a lot of these startups fail, and the reason, in my opinion, is that they've become too fond of the technology, to the point of losing user focus.

Resist the temptation to talk about the technology in your product messaging. Sometimes this comes from business teams that are excited about machine learning.
However, as good product managers know, the technology should be as inconspicuous as possible for the user, so it shouldn't be spelled out in, say, the UI of the product.

Good engineers want to build things right and do things well. In our situation this might translate into wanting to build out a whole data infrastructure before we even start working on machine learning features. I would advise against that, because you are not really going to know what you need until you start building your machine learning features. So I would advise that you find some middle ground: build some infrastructure while you're building your first features, and only invest after you get a picture of what it is that you're going to do with machine learning, and whether it's even going to be an investment worth making in your particular case.

So, the last words I want to leave you with: machine learning is magical, but it's not a magic wand. I want you to be excited about the potential that it could unlock for you, but at the same time be aware of the challenges and go about it thoughtfully. Be business and user focused, and do not get too fond of the technology. Sometimes you realize that at the start, but as time goes by the technology sort of takes over. So as good product managers do, you should always see it as part of your job to keep your team focused on the strategy and the priorities. And pay attention to team structure and dynamics. These could make or break the success of your machine learning initiatives, or at least the efficiency of the team's operations.

Thank you so much for listening. If you want to connect in the future, you can go to my website. I hope you found this useful, and see you perhaps in the next talk. Cheers.