Hi everyone. Thank you for joining me today. My name is Natalia Kuznetsova, and I will talk to you about product management, machine learning, and artificial intelligence products. Before we start, let me do a quick introduction so you understand where I come from and why I want to talk about machine learning. I have over 10 years of experience in product management; for the last two years I've been with Meta as a senior product manager. Before that, I worked at Booking.com and at a search engine called Yandex. Most of my career I worked on products like search, personalization, and monetization that rely on machine learning. Hence my interest in the topic, and hence the topic of this talk today. I also hold an MBA degree from Bocconi University in Milan, Italy. If you have any questions, or would like to get in touch for any other reason after this talk, please reach out via LinkedIn; there is a link to my profile.

Now let's get started. I would like to structure this talk in the following way. First, we'll talk about machine learning products: how they're built and what's so special about the development process. This will lay the foundation for the additional expectations that ML product managers have to meet. And finally, we'll look at the different paths available for you to become a machine learning product manager, if you choose so.

Okay, so machine learning products. I would like to start by talking about the terminology here. We can talk about machine learning and artificial intelligence pretty much interchangeably for all kinds of purposes, especially in the context of business, marketing, and sales. But strictly speaking, we need to understand that artificial intelligence is a broader field and a broader problem than machine learning. Artificial intelligence is about teaching computers to think broadly.
And machine learning is about teaching computers to perform specific tasks that they can learn from data. Machine learning is often leveraged to achieve the goals of artificial intelligence, but strictly speaking, it's not the same. However, as I said, you can use the terms pretty much interchangeably, although when you talk to your ML development team, you would typically want to say machine learning; it's more common, at least in my experience.

Okay, now a little bit more about the role of machine learning in different products. Here I've tried to visualize a scale which reflects the importance, or the weight, that machine learning models have in different products. On one side, the left side of the scale, you can see products where the machine learning pretty much is the product. It has a thin wrapper of user interface around it, but most of the functionality is coming from the model itself. Examples here would be generative AIs such as ChatGPT or DALL·E, or the Google search engine. The model is the product here.

On the other side of the scale you have, for example, product recommendations on a commerce website like Amazon, or recommendations on Netflix. Even though the core functionality of the product is not necessarily related to ML, ML enhances it and makes the experience more engaging for the end user. In that case, machine learning is a nice to have, but usually pretty important for business outcomes anyway.

And then you have many products in the middle; I just put two examples here. One is virtual assistants, which rely on speech recognition and synthesis. And then we have robotics and, for example, self-driving cars, which rely on computer vision to be able to function autonomously. In these cases, machine learning is about half of the functionality compared to everything else included in the product.
So despite the wide differences in how much ML functionality sits within a product, the good news is that the development principles for all these machine learning models remain the same regardless of the product. And this is what I'm going to talk to you about now.

So what do we need for developing machine learning models? Broadly, there are two types of resources. The first type is the building blocks, the things you absolutely need to be able to build a model at all: data, and the algorithm, that is, the code that allows you to train models. That's the abstract, philosophical plane, so to say. On the practical side, you need infrastructure and people with dedicated skills to actually implement it. Now let's dig into each of them a little deeper.

Data. Data is the foundation of machine learning: if there is no data, there is no machine learning. But what do we mean by data here, more specifically? For the purposes of machine learning, we talk about data as structured information about the entity that we want to model or predict, and the signals that we are going to leverage to achieve that. To give you an example, say we want to predict clicks on search results in a search engine. Clicks would be the entity that we model; the technical term for it is a label. And all the user behavior information, all the information about the different search results shown for the query, and the search query itself, those are going to be signals; the technical term is features. So we leverage the features to be able to predict the label, that is, the click.

When I talk about structured data, this is what I mean. This example is the Boston housing data set, one of the public data sets you can find on the internet. Here you can see a list of different features, you may say different characteristics of houses in the Boston area, and you have the label here as well, which is the price.
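To make this concrete, here is a minimal sketch of such a structured training set in plain Python. The column names and numbers are made up for illustration, loosely in the spirit of the Boston housing data:

```python
# A tiny, made-up training set: each row carries feature columns
# (signals) plus one label column (the price we want to predict).
rows = [
    # rooms, age, distance_km, price (the label)
    (6.5, 65.2, 4.1, 240000),
    (5.9, 78.9, 4.9, 216000),
    (7.1, 45.8, 6.0, 347000),
]

features = [r[:-1] for r in rows]  # the signals we learn from
labels = [r[-1] for r in rows]     # the values we want to predict

# The same table written out as a tab-separated file:
header = "rooms\tage\tdistance_km\tprice"
tsv = "\n".join([header] + ["\t".join(str(v) for v in r) for r in rows])
```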
This Boston housing data set is often used for benchmarking or testing different types of models or modeling approaches, because it's in the public domain; you leverage the features to predict the label. You can see it's presented as a tab-separated file. This is how you can think about any training data set.

Then, algorithms. In short, this is the code that actually allows you to learn from the signals, the features, to predict the label. It could be as simple, quote unquote simple, as linear regression, or as complex as deep learning, meaning neural networks. There are many different types of algorithms. It's not that any one is better than another; for different tasks and different circumstances you would want to use different algorithms. Here are just three examples: linear regression on the left, a neural network on the right, and in the middle an ensemble, which is the technical term for a combination of a few decision trees, independent models combined into a bigger model.

Moving on to the plane of the real world: how do you actually make this happen? First of all, you need infrastructure. If you are going after complex, advanced models, or if you are going to serve them at large scale, you will need dedicated data storage and compute capacity. This is one of the reasons why ML can be expensive.

And the foundation of it all is people. As with data, machine learning is impossible without people. You need to have dedicated, specialized talent that will actually be able to train the model, leveraging the data. This role is called different things at different companies: at some companies it's data scientist, at others machine learning engineer. But these are ultimately the people who will be training the models.
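Going back to the algorithms for a second, the simplest one I mentioned, linear regression with a single feature, can be sketched in a few lines of plain Python. The numbers are made up; this fits a line by ordinary least squares:

```python
# Fit y = intercept + slope * x by ordinary least squares (one feature).
xs = [1.0, 2.0, 3.0, 4.0]  # feature values (made up)
ys = [2.1, 4.0, 6.2, 7.9]  # labels (made up)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    """Predict the label for a new feature value."""
    return intercept + slope * x
```

The same idea scales up: more features, more data, and more sophisticated algorithms, but always code that learns parameters from features to predict a label.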
On top of the modeling talent, you typically also need data engineers, who are accountable for procuring high-quality data, and infrastructure engineers, who need to make sure that the entire stack works end to end, from data storage and processing, to model training, to serving the model in production in an efficient, reliable way. All these roles, all this talent, are another reason why ML is expensive.

So now let's talk a little bit about what the machine learning product development cycle looks like. Outlined here are four steps. The first step is about training a model candidate. There are two sides here: one is preparing the data, and the other is selecting and fine-tuning the algorithm that will then produce the model. Preparation of the data might sound trivial, but in practice it's not. Aside from cleaning the data, making sure there isn't much noise and so on, the non-trivial part is typically creating the relevant signals, the relevant features, for the model to leverage during training. This is quite a creative process, and that's why data preparation is an important step of any machine learning project. Then we have training the model. Here we need to select the right algorithm for the problem and for the data, and we need to tune it; without going into technical details, you can spend quite a bit of time on improving the algorithm even after you have selected it.

So let's say you prepared the data and trained the model; then you evaluate it offline. Offline evaluation typically means that we are looking at the prediction errors. No matter what we are predicting, whether we are classifying things or trying to pinpoint specific numbers in a regression task, there is always going to be some error in the predictions, and we just want it to be small.
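As a minimal sketch of what "looking at the prediction errors" can mean in practice, here is one common error metric, mean absolute error, computed for a new model candidate and the previous one on the same held-out data. All the numbers are made up:

```python
# Held-out labels and the predictions of two model versions (made up).
actual = [10.0, 12.0, 9.0, 11.0]
old_preds = [11.5, 10.0, 10.5, 12.5]
new_preds = [10.4, 11.6, 9.3, 11.2]

def mean_absolute_error(truth, preds):
    """Average absolute difference between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(truth, preds)) / len(truth)

old_err = mean_absolute_error(actual, old_preds)
new_err = mean_absolute_error(actual, new_preds)

# A smaller error than the previous iteration suggests the candidate
# is worth taking to an online test.
is_candidate_better = new_err < old_err
```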
As long as the error is small, or smaller than what the previous iteration of the model had, you might choose to consider it a success and test the model in a production environment. The most typical way to test in a production environment is an A/B test, a split test: you test your model either against no model or against the previous version, and you observe the relevant metrics, such as business metrics. If it's a success, if you moved the metrics in the way that you wanted them to move, you ship it to production. And then, typically, you start all over again.

That was the happy path of the development process. In practice it usually takes more than one iteration between training the model and evaluating it offline: you often go back to retraining the model because the performance could be better, or it's not what you expected. So it usually takes a few cycles there, and then you can also get more than one cycle between getting a decent result offline and getting the result that you wanted online. If that sounds counter-intuitive: overall you would expect a fairly strong positive correlation between offline and online results, but you cannot count on it in 100% of cases.

There are many reasons why you can observe this discrepancy. One reason could be technical problems in the experiment; for example, the model didn't work as expected because it wasn't getting the features, the signals it needed to actually compute the score, so there was a high fallback rate to some simplistic backup solution. Or, if you are testing your model embedded as a small feature in a broader interface, there could have been some UI changes that changed the visibility of your module, which makes it harder for you to capture the effect that you wanted to see in the experiment. And so on; there are many different reasons why you can see a discrepancy between steps two and three. In most cases, though, you should see them align.
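To illustrate the A/B-test step, here is a minimal sketch of one common way to read out a split test: comparing click rates between control and treatment with a two-proportion z-test. The counts are made up, and a real experiment analysis would normally go through a proper statistics library or experimentation platform:

```python
import math

# Made-up A/B test counts: users exposed and users who clicked.
control_n, control_clicks = 10_000, 200      # old model (or no model)
treatment_n, treatment_clicks = 10_000, 260  # new model candidate

p_control = control_clicks / control_n
p_treatment = treatment_clicks / treatment_n

# Two-proportion z-test: pooled rate and standard error of the difference.
p_pool = (control_clicks + treatment_clicks) / (control_n + treatment_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treatment_n))
z = (p_treatment - p_control) / se

# |z| > 1.96 corresponds to significance at the usual 5% level.
significant = abs(z) > 1.96
```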
If they don't align, it's usually good to prepare a checklist of things you would want to verify when something is not going as expected. One more thing to say on this slide: it typically takes a few weeks, if not months, to actually get to a success. ML models typically have long development cycles.

To summarize the differences, the special aspects, of machine learning products: on one side you have high performance, and it comes at a high cost. This is what makes them special. It's expensive, so you want it to perform well, and high performance is what creates a wow effect for your users. Typically, the performance you can get with sufficiently advanced machine learning models you cannot get in any other way. You cannot write rules, you cannot do some simple computations and put some, I don't know, magic constants in place; you cannot imitate advanced ML. So if it's sufficiently impactful for your product, for your use case, there's nothing better than ML at the end of the day.

Another plus of ML is that it can adapt to changing user behavior, or to any other changes in the environment, assuming you retrain the model regularly, supplying data that contains the new behaviors, the new patterns. It also scales well: for example, when you started with one user segment and then want to apply the model to another user segment, it should scale well, as long as you have sufficient capacity to do so. And if you actually aced machine learning and really tuned it well, it works really well and gives you a sustainable competitive edge as a business, because it's not easy to do. These are also skills that you can apply to other problems, leveraging the same technology, and get another great outcome there. So that's why ML is about high performance.

But this high performance comes at a high cost. As we discussed already, you need at least three classes of specialized talent, and this is talent that you cannot repurpose for other kinds of projects.
ML engineers are not going to work on the front end, and they're not going to work on the back end; they are only going to work on machine learning. So if you hire ML engineers, if you stand up a machine learning team, you need to make sure that you have a sufficient backlog for them to work on for a while, not just a one-off project.

Then there's infrastructure. If you are serving complex, heavy models at scale, you need to make a special investment into compute capacity and data storage. Then the long development cycles: this might be uncomfortable for people who, for example, are used to small-step optimizations, which is a very effective optimization strategy on its own. Long cycles, together with the opacity that I put here as the last bullet point, can make a lot of people uncomfortable who don't have experience working with ML.

And finally, maintenance. Machine learning is complex, and there are a lot of things that need to work well together for it to be as effective as intended. For example, you need to make sure that all the important signals are always delivered to your model in real time, so it can actually compute the scores and do whatever needs to be done based on those scores. Your model needs to be retrained regularly to capitalize on fresh signals, fresh patterns occurring in the data. All of that needs to work well, and that's just for production serving. So maintenance, monitoring, and debugging are going to be another expensive part of machine learning.

Okay, so we discussed machine learning products and what is so particular about them. Now let's talk a little bit more about the role of machine learning product managers. Let's start with the good news: machine learning product managers are still product managers, which means that all the core competencies and expectations are the same. It's about setting direction for a team, defining and measuring success, and supporting the team throughout execution.
If we were to summarize it in just one word, it's clarity: all of your stakeholders (the team, sponsors, dependencies, partners) should be super clear on what you are doing, why you're doing it, and how you're going to get there. However, since machine learning is special, there are also a few extra requirements that ML PMs do need to fulfill. I put them in four buckets. I cannot say they are in increasing or decreasing order of priority or importance; all of them are important.

Everything starts with an actual understanding of the tech. To be comfortable working with a machine learning team, selecting the right problems for this team, and setting the right expectations with your stakeholders, you need to understand how the tech actually works and what the development cycle actually looks like, and there's no way around it. If you want to be effective, you need to understand the tech. I'm not saying you need to go and get a computer science degree with a specialization in machine learning, but you need to understand the basics.

As a next step, you'll likely also have to champion the tech, the product that you build, and your team, because despite the widespread adoption of machine learning everywhere, there are still a lot of people who are skeptical about it, or maybe overly optimistic about it, and you need to be able to manage the expectations of both of these groups of stakeholders.

Then, regulations. There are already some regulations in place that will impact your work as a machine learning product manager, and more are likely coming given the boom of generative AI. You need to keep an eye on them, because they will limit your access to certain data, and they will impose stricter requirements on how and when you can deploy a model and what criteria it should meet. And finally, you need to plan for scalability.
This goes back to the point that ML can be expensive, so you need to make sure that you are making the most out of it if you choose to invest in it. Okay, now a little bit more on each of the areas here.

Understanding the technology. As I said already, in my view it's vital for success in this role. You need to understand the fundamentals. This starts with understanding what good problems to solve with ML are, because not everything can, or even should, be solved with ML. Good ML project candidates are about identifying complex patterns in data, especially if the environment is changing (for example, user behavior is constantly changing, or the inventory of products is changing all the time), and you have a lot of data, so you can actually learn those kinds of patterns and use them for your purposes. If something can be addressed by a one-off analysis and well-written rules, or heuristics and magic constants, that's okay: you don't have to do it with ML. So you need to be very clear about when it's worth investing in ML.

The next thing here is leading the ML team effectively. You need to build trust, and you build trust by showing an understanding of the work and its complexities and nuances. It also gives you the foundation to, how should I say, challenge your team in the right way, to make sure that they're still working on the right problems, or that they're focusing their efforts in the right areas. So, again, understanding the tech is important.

And finally: ML is a super powerful, amazing technology, but ML models are still models, meaning they are a simplified representation of reality, of some process or phenomenon. That means not all the nuance is going to be picked up from the data, no matter how advanced the model is. And if there are any biases in the data, ML will amplify them, exactly because it just reflects what is observed in the data.
For example, say you're building a tool to help recruiters identify promising candidates, and you're looking at historical data of hiring in tech. If you are not careful, your model might learn, for example, that women are not good candidates for technical jobs, because the majority of technical roles have historically been occupied by men. That's one example of how your model could amplify a historical bias. At the end of the day, it's your job as a product manager (because the model is going to be a product) to make sure that the product works as expected, is evaluated objectively, and is used responsibly. So this is one of the things you need to keep in mind when working on machine learning: this attention to detail.

Next one: evangelizing, or championing, the tech. I already touched upon this before, but ultimately, in many cases machine learning is perceived either as a magical black box, completely opaque, or as a magic wand, as if it has no limitations and can do anything. Both are quite extreme points of view. The people who view it as a black box typically also distrust it, at least to some degree. And the people who are overly optimistic, who think it's a magic wand, have unrealistic expectations about what they can achieve with it. Often, educating your stakeholders on the actual benefits and capabilities, as well as the drawbacks and costs, of machine learning is going to be your job as an ML PM. Again, this goes back to understanding the tech, but also a little bit to soft skills: how you bring it to the right people and get the buy-in.

Next one: regulations. As we discussed, the two fundamental building blocks of ML are data and algorithms, and both can be affected by existing or upcoming regulations. That's why it's important for you to monitor them: first of all, be up to speed on everything that's already in place, and also keep an eye on what's upcoming.
There are a lot of regulations regarding privacy, and in some cases copyright, that may limit access to the data that you can use for training models. And there are other regulations, mostly upcoming at this point, that may impose specific requirements on the transparency of algorithms. I guess I don't have to explain that you need to comply with these laws, but I'm actually going to recommend taking it one step further. These laws come from a place of real concern: some problems have been observed in the past. For example, there has been excessive and non-transparent use of user data for training algorithms. Or, for example, completely proprietary algorithms, opaque to anyone, being used for making college admission decisions. These are real problems that actually exist in the real world, and the regulations are continuously catching up with them. So I suggest that you not only comply with these regulations, but maybe even pre-empt them on your side by putting new controls and transparency features in place, because this is how you build trust with your end users, and also with third parties such as regulators, if you have to.

A little bit more about the types of regulations you need to be aware of. First of all, data privacy laws. A few examples: GDPR in Europe is one, the California Consumer Privacy Act is another, and pretty much in any part of the world right now there exist some regulations that cover how you can and cannot collect and use user data, and what kind of controls you should put in place for users to be able to agree or disagree with the use of that data. Then there are antitrust laws.
They're probably more relevant if you're going to work for large companies which occupy a dominant position, for example in online advertising or commerce, because antitrust rules would prevent those companies from using the data from the market where they're already dominant to reinforce their position in another market that they're trying to enter.

Then, artificial intelligence. You're probably all aware of, or have at least heard about, the progress in artificial intelligence, generative AI most specifically, and the major public debate around it. In many different regions there's a lot of work happening on preparing new regulations that will govern how AI can or cannot be used, under what conditions, and with what transparency mechanisms in place. So keep an eye on that; I expect it's going to develop very fast.

And one more, which is a really recent development and really caught my attention. I think it's very interesting what the generative AI discussions revealed, or maybe highlighted: the issue of copyright. For example, generative models, the ones that produce, say, images or video, are trained on data scraped from the internet, and a lot of that scraped data contains the work of many different artists, designers, and so on. That data was scraped and used without their knowledge or consent, and now there is a threat that these generative AI models will be used by people to actually circumvent those designers and artists: instead of going to them, people are going to use these models instead. So the work of those designers and artists was used to reduce their own potential impact in the future. A lot of discussion about copyright for training data for generative AI has been happening lately, and I'm really curious to see where it will end up. You need to follow it if you're going to work on ML in this area, because it may impact what data you can actually gather for training your models.
And the last one: scalability. One of the points I've been making is that ML is specialized and ML is expensive, so you really need to make sure that if you invest in ML, you get a return. The engineers will take care of the engineering, but you, as the product manager, represent the business side in development, so you need to plan for scalability: how you're going to scale the use of the models, the team's capabilities, and the impact for the company. On one side, you should be considering how you can scale an existing solution: developing even a single model is typically quite a long and heavy investment, so you might want to think about different ways you can use the output of the same model for different use cases. At the same time, you should always think ahead about what other problems your team could solve, and start preparing those cases in advance, so you get a better and better ROI on your machine learning investment.

Those were the four points that I wanted to stress about the additional expectations of machine learning product managers. Now I would like to talk a little bit more about the different paths to become a machine learning product manager, if you choose so. Here are the three options that I identified, in, I would say, order of increasing complexity, or maybe boldness. The most obvious path is, of course, to join companies that already do machine learning. Typically those companies also support on-the-job training and transfers between non-ML and ML teams, so you can join the company as a regular, non-ML PM, then transfer to a machine learning team and get ramped up there. Usually the companies that do ML are also large companies, and they might have associate or rotational PM programs for people who are just entering the profession; for them, this option should also be available. Another one, and this is an option for more senior PMs, is to pitch an investment into ML at the company where you currently work. If you have enough sway, and you believe that you have identified a good portfolio of machine learning
projects that could benefit your company, you can try to pitch the business case and see if you can persuade the leadership to invest. And finally, for the boldest: you can always consider starting your own ML company. If you have the right expertise and you see a nice business opportunity, I would recommend trying; I think it would be amazing.

And this is all from me today, so let's quickly recap. The key takeaways that I ultimately want you to remember: first of all, machine learning PMs are still PMs, so creating clarity for your stakeholders on the priorities, the success definitions, and the next steps is your bread and butter. However, if you choose to work on machine learning, there are a few more things that you will have to do. First, you need to bring yourself up to speed on the technology, its requirements, its capabilities and limitations, and be able to pitch that technology, the product, and your team to all the relevant stakeholders. Watch out for the existing and upcoming regulations, because they can impact the portfolio of data and the portfolio of algorithms that you can use. And always plan for scalability, to get a good ROI on the machine learning investment for your company. And finally, there are different paths available to become a machine learning PM; one of them would be to join a machine learning company and, over time if not straight away, join a machine learning team.

And this is it from me today. Thank you for listening. I hope you heard something new and interesting for yourself, and if you have any more questions or would just like to get in touch, please reach out via LinkedIn. That is it; thanks again, and have a great day.