Hi everyone. My name is Mal. I'm glad that you're here. A little bit about me: I started my career at McKinsey and Company in the Chicago office after graduating from the University of Michigan, then moved out to SF and have worked at Circle, Love, and Waymo. Today we'll go over the basics of being a machine learning product manager: an overview of what the role actually looks like, plus a couple of things you can do after this session to prepare for the role and excel at it. I want to caveat the session by saying that a lot of these learnings, experiences, and observations come from my own experience, so take that as you will. So let me share my screen. Awesome.

So: being a machine learning product manager, 101. To start, in terms of the agenda: first, I'll walk through the types of machine learning product manager roles, because especially as you go through the application process and look at a specific job, it's important to understand the differences. Second, I'll go through the types of machine learning that exist, at least the basics you should understand if you're fresh to this topic. Third is an overview of what an end-to-end ML lifecycle actually looks like. Fourth is digging into my own observations of where machine learning has gone wrong, where there tends to be a lot more complexity, and where most organizations face challenges. I'll then talk about what the exact role of a product manager is within this space, walk through a few examples where I think machine learning has thrived and a couple where its applications have gone wrong, and close with a summary of takeaways and what you can do to prepare for a role in this technical space.

To start things off, it's really important to understand that there are two types of ML product manager roles out there, from what I've seen. There may be more nuanced roles that blend the two or sit somewhere in between, but for the most part there is typically something called a data product manager and something called a machine learning product manager.

A data product manager focuses on the foundations of the organization, getting it ready to apply machine learning. For me that means making sure the data and infrastructure are in a good place so that data scientists, engineers, and machine learning engineers can work on top of that data confidently, quickly, efficiently, name your adverb. They're typically focused on things like data infrastructure, pipeline health, and data readiness, and they think through questions like: how do I approach the problem of data acquisition? What sources are we putting into our data warehouse? Are those sources external or internal? How do we clean that data, make it ready, and stitch it together? What data quality issues might come up that our team needs to be aware of? And overall, how do we make sure that downstream applications and use cases on top of that data are successful?
They also think about things like access controls and ease of data retrieval. The key metrics this product manager focuses on range from something as fuzzy as data quality to things as measurable as query traffic, latency, reliability, error rates, response time, and availability, all of which signal how good that data is and how quickly it can be accessed.

The second type is the machine learning product manager, who tends to focus on a problem area. This person's background is usually that they were a data scientist, or they understand the trends and core concepts of machine learning well enough to apply them effectively to solve a problem. Typically a machine learning product manager is oriented toward a problem area such as "I'm trying to accelerate this end-state business outcome," like number of customers, conversion, or revenue growth for a specific product line, and the organization believes a machine learning lens is the right way to accelerate that business problem. You'll typically be hired to own the machine learning application that accelerates that outcome. That first flavor is oriented vertically, toward a product or problem. The second flavor you might come across is a horizontal, capability-type product manager: someone who focuses on a capability like pricing or experimentation that the organization wants to sharpen with a machine learning lens. This person typically sits centrally in the organization so they can farm out that capability to the multiple business lines, products, or problems the organization has.

The key metrics this product manager owns, from top to bottom: first, they tend to own the business outcome itself, because the organization believes machine learning will accelerate that outcome. Second is the experimentation results: if you're building a model or a suite of models, you own the experiment outputs that answer "should I actually deploy this model in production or not?" And third is the nitty-gritty performance metrics, making sure the quality of the model meets a threshold where you feel comfortable deploying it and using it to impact the business.

Now, taking a huge step back in terms of types of machine learning: I found a diagram that I think captures what machine learning applications are in the simplest sense. Largely, you can pretty much only do four things. First, you can predict stuff. Second, you can classify stuff ("hot dog / not hot dog" is, I think, the most classic example). Third, you can group stuff, clustering things together based on how similar their attributes are. Or fourth, you can teach stuff, which is the reinforcement learning piece. It's really important to keep in mind that machine learning can do all of these things only as long as the new data doesn't look too different from the data it's training on. The machine learns patterns in data and then applies those patterns to unseen data, to what it doesn't know.
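To make those buckets concrete, here is a minimal sketch of the first three in Python with scikit-learn. The tool choice and the toy data are my own assumptions for illustration, not anything prescribed in this talk; reinforcement learning ("teach stuff") needs an environment loop, so it is left out.

```python
# Three of the four ML task types on a tiny invented dataset.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])

# Predict stuff: regression maps features to a continuous value.
y_continuous = np.array([10.0, 11.0, 50.0, 52.0])
print(LinearRegression().fit(X, y_continuous).predict([[5.0, 5.0]]))

# Classify stuff: e.g. the classic "hot dog / not hot dog" label.
y_label = np.array([0, 0, 1, 1])
print(LogisticRegression().fit(X, y_label).predict([[5.0, 5.0]]))

# Group stuff: clustering finds similar rows without any labels at all.
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))
```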
Because it learns from a finite set of data, it will typically only do what it has seen before. That's the idea of training. So as a product manager, it's always important to understand the biases in exactly what you're training on, so you understand how those biases might carry through to whatever you end up predicting or classifying on the unseen data.

The audience here probably ranges from beginners to more advanced in terms of sophistication around machine learning, so let's take it down to the super simple: what exactly is machine learning? The simplest example I can give is trying to predict the value of a home. In this example you see ten different records with different features or variables (number of bedrooms, number of floors, whether the house has a garage, the school district rating, the square footage, the city), and you use those variables, or features, to predict what people call the outcome or response variable, which is the home value itself. So at the bottom there's a handful of features, inputs, or variables, and on the right-hand side is the outcome or response variable that data scientists would actually predict.

The next step in machine learning is this idea: if you want to predict some of those home values, you can cover up a few records (what folks would call unseen data), train on the rest, and ask: knowing the weights of those variables, meaning how much we should weight the number of bedrooms, the number of floors, the square footage, the garage, can we learn a weighted equation for home value and be so good at it that we can predict home values using those variables alone? The typical answer, if you were starting fresh, would be: yes, I could do something like that and give you an idea of what those weights look like, but only if the data I see now is representative of the data I can't see. That's the assumption baked in: the data you train on is representative of the unseen data.

And I think a lot of you would say: I'd probably do a better job if I could practice my weights by hiding some of the records myself and seeing how well I do on the records I do have access to. So imagine, in this set of eight records, you hide the final two, train your model on the first six, and then check whether you successfully predict the seventh and eighth records from the data you do have. You cycle through that process over different folds of the data you actually have access to, so you know your model is, quote-unquote, generalizable. From there you can finally uncover the data you've never seen before, the last two records shown in black, and say: I've practiced, I've covered up records on my own, I've sharpened and optimized those weights, and now I'm confident that if I uncover the final two records my weights will hold up and I'll have the best weighted equation, the best prediction of home value.
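Here is what that worked example looks like in code: a minimal sketch, assuming scikit-learn and made-up feature values, that learns the weights from six records and then "uncovers" the last two.

```python
# Learn the weighted equation for home value from six records,
# then predict the two held-out records. All values are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: bedrooms, floors, has_garage, school_rating, square_feet
X = np.array([
    [3, 1, 1, 7, 1400],
    [4, 2, 1, 8, 2100],
    [2, 1, 0, 5,  900],
    [5, 2, 1, 9, 2800],
    [3, 2, 0, 6, 1600],
    [4, 1, 1, 7, 1900],
    [2, 1, 1, 6, 1100],   # "hidden" record 7
    [5, 2, 1, 8, 2500],   # "hidden" record 8
])
y = np.array([310_000, 520_000, 180_000, 700_000, 350_000, 450_000,
              240_000, 610_000])

model = LinearRegression().fit(X[:6], y[:6])   # train on the first six records
print(dict(zip(["bedrooms", "floors", "garage", "school", "sqft"],
               model.coef_.round(1))))          # the learned weights
print(model.predict(X[6:]))                     # predictions for records 7 and 8
```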
So, super simple. Here's a little deeper look at what the overall ML lifecycle looks like, because from a product manager's standpoint you really have to understand how this works.

The first stage is making sure the problem statement and success criteria for your data science or engineering team are crystal clear. What exactly is the goal of the model, or suite of models, you're building? How do the performance metrics of the models tie to business outcomes; for example, how does the precision and recall of a classification model your team is working on tie to metrics that matter for the business? How will you know if your model is performing sufficiently? Understanding that, outlining it, and publishing it for your team not only defines success for that model or set of models, it also tells you when resources need to shift from the models your team is currently working on to a new set of models. And finally, make sure you have a good understanding of whether your organization is ready for machine learning at all. In a lot of startups and small companies, teams jump straight into machine learning without first understanding their readiness around data, infrastructure, metrics, and experimentation (which we'll get into), and it's important to understand that health before you jump to the most complex answer.

The second stage is data preparation. How high quality is the data you'll be working with? How far back in time does it go? Is it representative of the population? Remember, a model learns patterns only from what it's exposed to. And how much training data do you have? It's really important to ask these questions; they give you a sense, a score if you will, of the readiness of your organization. One rough way to start quantifying those questions is sketched below.
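As a minimal sketch of that data-preparation audit, here is what the readiness questions look like in pandas. The frame here is a stand-in with invented column names; in practice you'd load the real warehouse extract.

```python
# Quantifying data readiness: volume, recency, quality, representativeness.
import pandas as pd

df = pd.DataFrame({
    "sale_date": pd.to_datetime(["2018-03-01", "2019-07-15", "2021-11-30", None]),
    "city": ["SF", "SF", "Chicago", "SF"],
    "square_feet": [1400, None, 2100, 900],
})

print(len(df), "records")                                  # how much training data?
print(df["sale_date"].min(), "->", df["sale_date"].max())  # how far back does it go?
print(df.isna().mean())                                    # missing-value rate per column
print(df["city"].value_counts(normalize=True))             # representative of the population?
```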
The third stage is training. Training refers to how your model learns; it's the step where the model parameters, those weights, are determined. How representative your training data is ultimately dictates how generalizable the output is to the population you'll end up making claims about. So it's important to understand the heartbeat of your training data: how easy is it to collect, how representative is it, how much of it do you have, and are the biases understood, so that you know how applicable and generalizable your predictions actually are.

The fourth stage is testing. Testing your model refers to how it performs on unseen data, to avoid overfitting, which is conforming to just the sliver of data you trained on rather than the data you'll be predicting on. To assess the quality of a model and build something more generalizable, data scientists do something called cross-validation. Of the training data you have (those eight home records from the previous slide), you hide a couple of records yourself, train on the rest, and see how you perform on the held-out records, so you can tune the weights in the most generalizable way possible. Then you shift which records are hidden and repeat, over and over, across k folds of the data. That's exactly what k-fold cross-validation is; a minimal sketch follows.
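Here is that cross-validation loop in code, assuming scikit-learn and the same kind of toy home-value data as above:

```python
# K-fold cross-validation: repeatedly hide one fold, train on the rest,
# and score on the hidden fold. Data is a tiny invented example.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X = np.array([[3, 1400], [4, 2100], [2, 900], [5, 2800], [3, 1600], [4, 1900]])
y = np.array([310_000, 520_000, 180_000, 700_000, 350_000, 450_000])

scores = cross_val_score(
    LinearRegression(), X, y,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_error",
)
print(-scores)         # one error estimate per held-out fold
print(-scores.mean())  # the average is your generalization estimate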
Then, finally, you deploy. You have a model, it works, you've built it in your notebook, but it needs to be consistently predicting, classifying, and helping the business every day, week, or month. Doing that means putting it in production, which is a lot harder than you think: there tend to be a lot of inputs or features, your production model needs to account for how you do feature engineering, you need to output those predictions consistently on a regular cadence, and that entire pipeline has to work successfully on a regular basis.

Digging into this a little more, I've noticed as a product manager that two things typically go wrong the most: the first stage and the last stage. People might argue that data preparation and understanding the biases are just as important, but for today I'll go through these two pillars, from the problem statement and success criteria all the way to actually deploying the model and making sure it's successful.

I alluded to an organizational readiness framework a little earlier. For me, this tends to be a problem when you have a young data science team trying to get its footing on how to use advanced analytics in your organization. This pyramid is my interpretation of how to assess organizational readiness, starting at the bottom with the problem, followed by the data and infrastructure needs, then the metrics, then the experimentation, and finally the optimization, the actual machine learning, that goes on top of all those core components.

Starting at the very bottom, it's really important to understand the problem. What problem are you actually solving for the business, and how are you gauging that the use case is amenable to machine learning? I'll go into the use cases where machine learning actually thrives more clearly later, but to give you a little taste: it tends to be when training data is easy to collect, you have a lot of it, and it's representative of the population; when there are fast feedback loops; and when you're able to measure the implementation of your machine learning model against something like the status quo, understand how much better it actually is on the business outcomes, and conclude that it makes sense to move forward with a machine learning model in that use case. So in framing the problem, you really have to ask yourself: does this problem have fast feedback loops, or ways to assess whether we're making better decisions because of this model? Do we have a lot of data to train on for this problem? Is that training data representative? How easy is it to get more? Maybe even more thoughtful: when there are failure cases where the model gets a specific classification or prediction wrong, how easy is it to get more data on those errors or failure examples? Asking that question is important because if your model consistently underperforms in a specific case, you need to know how hard it will be to get the data to improve it. And what exactly would success look like when you ship this model: which performance metrics, and how do they tie to something the business actually cares about?

The second layer is assessing the health of your data and infrastructure. For me, beyond the examples we've already talked about, this means: how high quality is the data you're working with? Do you have the ability to stitch together data with commonalities and assemble the set of features you'd use for this model? How do you even define data quality in your organization? Make sure you document that definition and circulate it; that might mean going through a subjective assessment first and then finding ways to quantify it in your dataset. And how easy is the data to maintain? There's lots of research, lots of papers, and plenty of lived experience saying that most time is spent cleaning data and making sure it's accessible, ready, and usable. Assessing that tells you how much of your team's time will go toward just making the data usable, and whether the data and infrastructure are healthy enough to build on top of.

The third layer of organizational readiness is the metrics. One of my mentors shared this with me, and I think it's a really nice way to assess the value of a model: first apply a simple heuristic to advance the problem forward, an equation or rule you can write down from human intuition alone, and then take a step back and ask: applying this heuristic, am I even able to measure how it is better than the status quo? If you can't articulate the outcome of applying a heuristic, a measurement, and how that measurement beats the status quo, I'd suggest you not jump into machine learning, because you don't understand what metrics you're comparing against. If you can, great: at least you know what business metrics to improve, what the measurement looks like, how you compare to the status quo, and how you advance the problem forward. A sketch of that heuristic-versus-status-quo check follows.
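Here is that heuristic check as a minimal sketch. All the numbers and the "rules" are invented; the point is only that the comparison must be measurable before any ML enters the picture.

```python
# Before any ML: can a human-intuition rule measurably beat the status quo?
import numpy as np

actual = np.array([310.0, 520.0, 180.0, 700.0, 350.0, 450.0])    # true home values ($k)
status_quo = np.full(actual.shape, actual.mean())                # e.g. "quote the average"
heuristic = np.array([280.0, 540.0, 200.0, 650.0, 380.0, 430.0]) # e.g. a $-per-sqft rule

def mae(pred):
    """Mean absolute error: one candidate measurement for the comparison."""
    return float(np.abs(pred - actual).mean())

print("status quo MAE:", mae(status_quo))  # if you can't measure this gap,
print("heuristic  MAE:", mae(heuristic))   # you can't measure an ML model's gap either
```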
The other component of the metrics layer is understanding your performance metrics and how they relate to that business metric improvement. This is important for the team and for morale, this idea of a north star for how their outputs tie to overall business value. But more importantly, you sometimes have to make the call that a team is done, or doesn't need to spend as much time on a model or set of models, and should move on to a new or different problem. Only when you understand the performance metrics and how they tie to outcomes can you make those resource-shift decisions effectively.

The fourth layer of the pyramid is experimentation: how do you even know your model is ready to be put in production, and how does it compare to the status quo? What metrics are you assessing, under what time horizon, and what is your experimentation framework? This is relevant for companies like Netflix that have a model they want to put into production because it affects something like customer conversion, in terms of what folks actually watch on the platform. You test it: you run an experiment, assess it in a confined dev environment or field it out to a smaller set of users or a smaller geography, understand what those conversions look like and under what time horizon, understand the demographics present and all the biases, draw a conclusion from the experimentation results, and then eventually roll it out to the rest of the population. Understanding your experimentation framework and how you make decisions with it is really important to document, along with its fallbacks, so you can get alignment within your organization. A bare-bones example of reading out such an experiment follows.
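Here is one bare-bones way to read out that kind of rollout experiment: a two-proportion z-test on conversion, variant versus status quo. The counts are invented and statsmodels is my tool assumption; real frameworks pre-register the metric, threshold, and time horizon.

```python
# Did the model variant convert measurably better than the status quo?
from statsmodels.stats.proportion import proportions_ztest

conversions = [620, 540]      # [model variant, status quo] converted users
exposures = [10_000, 10_000]  # users exposed to each arm

stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# The decision rule (e.g. p < 0.05 over a fixed horizon) belongs in the
# documented experimentation framework, not decided after the fact.
```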
And now, finally, once you've understood all four of those layers, it's time for ML at the very top, because you've understood the problem you're solving, the health of the data and infrastructure, the simple metrics you'll compare against to see you're advancing the problem forward, and the experimentation; now comes the optimization. I'll throw in this caution: folks tend to over-invest without really understanding how much cost is associated with acquiring the data and infrastructure resources to build out and maintain the model and its production environment. I'm a huge fan of "simpler is better," so make sure you understand all of those things first and end with machine learning at the very top.

So I alluded to two parts that go wrong with ML. The first is understanding the problem and that pyramid structure. The last piece is the overall iteration of machine learning: putting something in production and cycling through data, model training, testing, evaluation of the models' actual outcomes, production, and then recycling through that entire process over and over again. These cycles exist at small companies that need to monitor the health of their models, but they're far more complex and meaty at companies like Waymo, Google, or Facebook, which work with huge models whose data inputs change on a regular basis, so you need to be able to monitor them pretty well.

Starting from the very top: the data itself requires a really clear, strong understanding of the annotation schema you use to train your model, and product managers should have a strong understanding of how labeling works in the organization. Improving your model generally involves two things: understanding the failure cases I talked about and making sure you can acquire more data that helps the model get better at those specific instances, and rebalancing your dataset based on any biases your model has learned. To take a huge step back, one thing I'd love to leave this group with: PMs need to know the biases and data assumptions really well, so they know how generalizable things are, and as new data comes in they can understand how the current biases might or might not apply to it.

The second area of complexity is the model itself, and the example I'd call out for the complexity of maintaining an ML model is experiment tracking. Large organizations run tons of experiments, and it's really important to understand the different versions of a model, the parameters and hyperparameters going into each version, and to be able to track which one actually performs best. With so much complexity, it's really important to maintain the versions of different models and keep the health of the system in a good state. There are tons of tools out there now that help you track these things, version control if you will, but the complexity definitely increases as organizations get bigger, so having this monitored, tracked, and transparently communicated to the organization helps you make decisions. Stripped to its core, the record you want per training run looks something like the sketch below.
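Real teams use purpose-built tools for this; the hand-rolled logger below (with invented names and values) is just a minimal sketch of what you need per run so that "which version performed best?" stays answerable.

```python
# Append one record per training run so versions, params, and metrics stay linked.
import json
import time

def log_run(model_version, params, metrics, path="runs.jsonl"):
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model_version": model_version,  # ties metrics back to exact code and data
        "params": params,                # hyperparameters used for this run
        "metrics": metrics,              # evaluation outputs for this run
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_run("home-value-v7", {"alpha": 0.1, "n_folds": 5}, {"mae": 31_200})
```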
The third area of ML complexity where things go wrong is the evaluation. It's really important to be able to visualize the outputs of your models and the metrics that come out. A lot of organizations let metrics proliferate, or simply have too many, so make sure there's a clear understanding of which metrics matter and how you make decisions: how are those metrics tracked, how many people have access to them, and do they need to change going forward? How do you dig into failure cases, observe them, explore them, get more of them, and repeat the cycle? Evaluation tends to be where a lot of the complexity of making the model better lives; it requires digging in, problem solving, and simplification, which is why the larger organizations doing heavy ML spend a lot of time on eval, simplifying and streamlining it so that tons of data scientists can move in the same direction.

And the last piece is production: monitoring the model's health once it's deployed, evaluating new data coming in, and watching out for edge cases and biases. We talked about this a little, so I'll move on.

To synthesize what the role of an ML PM is: I have done all of these things, so I feel it's important to understand your role within a data science team. The first is setting the problem statement: what business problem are you actually solving? Second is validating the org readiness: how do you measure success, is that data, infrastructure, metrics, and experimentation pyramid ready, and what is the cost of maintaining it, so you know your organization is ready for an ML use case? Third is ensuring the value is measurable: how do performance metrics tie to business outcomes, so you can work through that resource-alignment question we talked about? Fourth is defining requirements. This is one of the things we didn't get a chance to cover today, and I'll put out more content on how product managers write a PRD for a model. For me that's a lot about: does the model need to be interpretable? Who actually uses it, and what decision is it making? How complex does it need to be? What do the performance metrics need to be for it to be sufficient to put into production? You define requirements on the outputs, like the interpretability, complexity, and performance metrics we described, in addition to the inputs: is the training data representative, what kind of training data do we need, how does labeling work in the ideal state, and what biases in the labels do we need to avoid? Those types of requirements are what a PM outlines in this space. Fifth is experimentation success: validating the go-to-market strategy and biases, deciding when models should be deployed, writing out the experimentation framework, testing that the experiment works, and eventually rolling out to the rest of your population. I'm sure a lot of PMs in roles at places like Netflix spend a lot of time thinking about this. And the final piece is resource alignment: what problems should the data science team be solving today, in the next six months, one year from now? The cherry on top: are we ready to solve those problems one year from now given that pyramid we talked about, or do we need to start preparing our data, metrics, experimentation, and so on, so that we're able to solve that data science problem a year out?

I think I've said the words "performance metrics" about a hundred times. Performance metrics are, in the simplest sense, the actual model outputs. There are a handful of classification and regression metrics out there in the world; you have to have a really good understanding of exactly what these metrics are, how you know if they're good or not, and then how to take them and tie them to business outcomes. Having a good heartbeat on these, studying them if you will, is a good way to enter the space and speak the language, so you can be thoughtful about connecting them to the more qualitative problems that come up in an organization. A few of the usual suspects, computed in code, are sketched below.
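Here is a minimal sketch of a few of those metrics with scikit-learn; the labels and predictions are invented. The PM's job is knowing what each number means and which business outcome it maps to.

```python
# Common classification and regression performance metrics.
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             mean_absolute_error)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print("precision:", precision_score(y_true, y_pred))  # of items we flagged, how many were right?
print("recall:   ", recall_score(y_true, y_pred))     # of true items, how many did we catch?
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall

y_true_reg = [310, 520, 180]   # true home values ($k)
y_pred_reg = [290, 545, 210]   # model predictions ($k)
print("MAE:      ", mean_absolute_error(y_true_reg, y_pred_reg))  # average $k error per home
```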
I also talked a little about where ML typically tends to go wrong versus right, and it has gone wrong in a lot of the use cases I've seen, so here's my own classification of the two.

The good use cases are when ML itself is the product; when you have large-scale B2C products and you're able to run the A/B testing or experimentation to know the outcome of the model and how it compares to something like the status quo; when small changes have a big impact (if you're running Google Ads and small changes have an impact of millions of dollars in revenue, you can have a data science team that really focuses on sharpening small changes to the model so you reap those benefits in terms of large-scale revenue impact to the business); when there are fast feedback loops, so you can collect more training data and assess A versus B; and when it's clean for classification and easy to get training data. Those use cases are awesome for machine learning. And finally, big ML bets: much of the time, when ML itself is the product, it's a good application of it.

Bad use cases are just as important to outline. I've seen a lot of teams try to do B2B account acquisition or conversion with really limited training data: not much historical data, not many training points you can actually use to determine something like a lead score for account acquisition. I've tended to see those not turn out great, so that's what I'd characterize as a bad use case. Any pricing strategy where you only have so many months of data, or so little training data, is typically a bad use case. So is anywhere you don't have the experimentation to test value. And this last one is super nuanced: human decision making where you can't even get a set of people in a room to agree on what a good versus bad outcome is, or produce consistent training data. Super nuanced human decision making doesn't have clear labeling you can apply to the next set of unseen actions.

So, the takeaways, wrapping things up, for ML product managers, that second type of product manager we talked about at the very beginning. Number one is having a really good heartbeat on the problem, the readiness of your organization, the model build and how it actually works, the metrics that come out of a model, the evaluation (how people make decisions), and how you actually put it in production. Understanding that entire iteration process and mapping it is typically the first thing you should do when you get into an organization: not only to assess that org readiness and understand the problem, but to see where the bottlenecks are, where things are breaking, and where you can help influence things and make them move more smoothly. Number two is understanding outcomes and knowing your performance metrics: make sure you take the time to read up on the statistics you need, so you can speak the language of your team. Third is understanding the data assumptions, so you know the biases of both the inputs and the outputs, and when new data comes in you can speak to generalizability based on your knowledge of those biases. Fourth is practicing that ML toolkit of problem, data and infrastructure, metrics, experimentation, and model optimization in traditional ways where you can: even applying something as simple as a heuristic and seeing if you can put together an experimentation framework around it is a place to start. And finally, keep up to date on foundational skills and new trends, and bring them to the table as a way to shape the next six months to a year for your team.
There are a lot of technical concepts we didn't get into today: linear models and their basics, ridge versus lasso, what a neural net is, tree-based models, and all the technical terms that will probably come up in your time as a product manager. I plan to put out a series that strips these down into the simplest terms, walking you through a series of models, how they work, and how they're applied in real-world applications. So follow me on Twitter, where I'll be putting that content out. Hopefully this was a good first step in terms of an overview, and from there we'll dig into a few technical topics that are important to understand and get into their details. Thank you for your time.