Let's get started. So, as it says, my name is Ramana and I'm here to talk to you about my own experiences in practicing machine learning and how we eventually productionize it at Semantics3. I'll give you a short introduction to Semantics3, and I promise this is the only slide which mentions it. What we do is help e-commerce focused companies: we provide a lot of data solutions and intelligent solutions. Some of those are automated categorization from text into different categories; a lot of NLP problems, where we want to parse the descriptions present and get out as much structured data as possible; a lot of unsupervised extraction and product matching. We also have very talented distributed crawling engineers, one of whom is also here, who help us index as much as possible. So that's about Semantics3, but let's get started with the talk, right?

All of you might have heard about it: it's happening, right? The whole machine learning revolution. People are talking about democratizing AI; if you have the buzzword bingo sheets, that's one phrase for this conference. So let's go ahead. You might recognize this picture or you might not; the clue is at the edge. These are TPUs, or tensor processing units as Google likes to call them, and this is the AlphaGo system, which was used to beat Lee Sedol. So this is the future that's going to replace us. Next, this is from ImageNet, where the most recent results have for a few years been better than human-level accuracy; images are being classified by machines better than by many of us here. Google never wants to be left out: this was from their I/O, where they had machine learning throughout all their systems, so you literally point your camera and Google Lens figures out all the information in what you're seeing. Amazon, not wanting to be left behind, has its own personal assistant, Alexa, which of course is entering all our homes. The data-powered revolution is here.

All of this is happening, and then the next day you go to the office and you talk about all of these breakthroughs, and your CEO or your board comes to you and says: where are the applications? You are my data scientist, right? So why don't you go ahead and build all these systems for me? And then you try, and then you fail, because every time you see these fancy breakthroughs, there seems to be this gap you need to cross, a jump you need to make, to turn these applications into reality. That's something which most of us face, or at least I faced a lot at the start, so let's just go through it.

This is sort of a journey of how everything turned out the way it is. When you start off, you face all these hard problems. People talk about pre-processing your data; training, so which models do you choose, which framework do you choose, how do you start building them out; scaling, and you've heard a number of talks here about distributed systems and distributed machine learning, all these ideas of how you take these models and scale them out for the next billion users; and delivery, because you're always interested in your users, so how do you reach them carefully? Whenever you start off, you see all of these problems, and on that note I have some bad news and some good news, right?
So the bad news is that all companies claim to be unique, they claim to be a special snowflake, and the bad news is that you're not: I don't think you're so unique that your problems are uniquely hard. But flip it around and that's also the good news, right? Many companies end up facing the same problems, and most of them have worked out solutions independently of each other, pretty much duplicating the effort. So that's the single point I really want to bring across: all these problems which have been solved over the years really need to be shared and discussed.

And this brings me to a story. A few weeks back, some of my friends drilled this idea into my brain that there was a mountain out there that we needed to climb. So I get into this, and three days later I find myself halfway up, staring at this peak. It's full of sand, you slip every two feet, and I'm thinking: what have I gotten myself into? This seems like a really hard problem to solve. And then this other guy on my expedition slaps me on the back and says, look back for a minute. And yeah, this was the view. At that point I was like, okay, maybe it's worth it. So even when you start off and see all of these hard problems that you need to solve, looking back is always very, very beautiful. That's sort of the point here also: not just to look forward, but also to reminisce on the distance that you've covered so far. Maybe that helps motivate you to go further.

So this talk is structured in four parts. Not necessarily a playbook, but some structure to it: data, then the model, then integration, and then onwards and upwards. The whole idea is that you have to start from the fundamentals, and being a data-powered solution is what needs to be the first thing on your mind. Then your first model: how you go about processing your data, or rather building your first trained model. Then you think about integration. This is again something that is often overlooked by data scientists but which I think is a crucial part of the whole ecosystem: whatever model you build needs to be able to talk to the other systems in your company. So integration is something that I also like to focus on. And of course, onwards and upwards: where do we go from there, and what are the final points that really need to be discussed?

So let's start with data. First there was nothing, and then: let there be data. We're computer scientists, so let's start counting from zero. Day zero: what do you do? You go into the office and there's a problem that needs solving. Then you decide, that's when I start collecting my data sources; then I start identifying the metrics which need to be known to solve the problem; and then hopefully I just go ahead and solve it straightforwardly. But at least in our experience, this is not the optimal order. We need to start with the collection first. It doesn't really matter whether you already have a concrete problem that you want to solve. The idea is to start collecting as many metrics as possible within your own company, whether it's within the engineering systems or the customer-facing systems, or simply because the data is there. Just collecting the data and identifying your metrics simplifies the later stages: when you do have a problem to solve, you can short-circuit the process and go straight from problem to solution.
So identify your metrics and start collecting them, because I think that's by far the most effective way to get to a productive data-powered system rapidly. Think big. This ties in because storage is cheap and data is not. By that I mean: the next time you're running out of space, just go ahead and buy a few more terabytes of storage, they're very cheap. But realizing later that you needed historical data for the last three years, and you don't have it, is going to cost you a lot.

And there are numerous data sources out there. There's the UCI machine learning repository; there are so many companies open sourcing their data; even today there were data initiatives discussed here. There are just so many data sources that you really need to think outside the box and identify the relevant sources which map onto your problem. Another interesting aspect here, jumping ahead a little, is pre-trained weights. Some of these companies are very protective and don't want to release their customer data, but they end up giving you pre-trained weights which are optimized to solve certain problems. In deep learning, one popular approach is to take something like the Inception model, load the pre-trained weights in, and use that on top of your own images. Sometimes it works fine. So pre-trained weights are also a way of transplanting data which was used in some other context.

Speaking of context, most of you will be able to auto-complete this sentence. It starts with "garbage in" and you know what's going to come out. In that respect, one of the important things to consider is sources of bias, because many times when people build out these data sets, the performance we get during training or during our initial releases is quite different from what we see in production. Part of the reason is that, whether consciously or most probably unconsciously, we as developers start identifying patterns and then fitting our data collection methods to those patterns. Sources of bias are something to look out for, together with two other points: contextualization and localization. What do I mean by that? Context is how the data was collected, and from whom. Very popularly, in psychology studies the data set is first-year psychology students, so the findings are narrowly focused and you can't interpret them as applying to the general population. Localization: if you want to model consumer behaviour in India, you don't want to look at data from the western world, or maybe the US; it doesn't carry over. And there was a very recent comic from XKCD, which I think you're seeing for the second time today, where you have data at one end, you shovel it into your system, and you just keep stirring until you start seeing some relevant results. Hopefully this is not how it's done, but that pile of linear algebra is what's going to process your data.

Once you have these data sets, there's a slightly bigger context in which you need to place them, and one of the most important problems that we have seen is unbalanced data sets. For example, as an e-commerce company, say you're processing credit card transactions: 99% of the time, or even higher, the credit card being used is not a fraudulent case.
Most of the time the customers are legitimate customers, but you still don't want to let the fraudsters through. If you build a model on data where 99% is one class, a model which always guesses that class is going to be 99% accurate. That's not very useful, because it's just a hard-coded answer coming out of your system. Of course there are approaches to fix this. Resampling, and I just want to mention a few: there's undersampling, where you reduce the overrepresented classes; there are techniques like SMOTE which introduce synthetic samples built from the existing ones; and of course you can use a class-weighted loss function. The idea is just to give an overview of the ways it is possible to solve these problems (there's a short code sketch of these fixes a little further down).

And of course, as I mentioned, there are a lot of frameworks which have been open sourced by these companies; these are the shoulders of giants, so make sure to stand on them. Take text, for example. When you start with text: tokenization, stopwords, dictionaries, embedding words in a vector space, encoding them, adding attention. All of these are very standard practices, pretty much expected for the kind of text problem most companies are solving, and just using them often gives you a significant boost. Being able to incorporate these techniques gives you a playbook. Again, it's not about building your best model from scratch; you really can just reach for the standard best practices, and these things translate across domains. And for images, not wanting to be left out, there are a few techniques as well: you standardize them to, say, 300 by 300 pixels, subtract the mean, divide by the standard deviation, reduce dimensionality. Most of these are applicable across domains, and just being aware of the context in which they were developed often gives you a significant advantage in how you end up deploying them.

So that was a quick run-through of the data and at least the preprocessing techniques. Now, once you have this well-curated, perfect data set, how do you go about solving the problem? I like to call it pattern recognition, and you might think pattern recognition means looking at an image and processing it with your computer, right? But that's not what I'm talking about here. What I mean is that we humans ourselves need to do problem solving by practice, by being familiar with a number of approaches out there. When you see something that looks like a particular kind of problem, you know what to do with it. It's about taking a problem, identifying the mental frame in which it operates, and then being able to solve it. Machines do this by something called gradient descent, very popularly. For humans I like to call it something else, or rather someone else named it: graduate student descent. So it's not gradient descent; it's where a professor employs 50 graduate students to attack the same problem with 50 methods. Or you do it yourself, attacking the problem one way after another, and at the end of it you get a PhD. Graduate student descent is maybe how we end up learning.
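Picking up the imbalanced-data point from a moment ago (the credit card fraud case), here is a minimal sketch of two of the fixes mentioned, a class-weighted loss and undersampling, using scikit-learn. The data is synthetic, and the feature count and the roughly 99/1 split are just illustrative assumptions; SMOTE, the third option mentioned, lives in the separate imbalanced-learn package.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, heavily imbalanced data: ~99% legitimate (0), ~1% fraud (1).
X = rng.normal(size=(20_000, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=20_000) > 5.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Fix 1: class-weighted loss -- mistakes on the rare class cost more.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000)
weighted.fit(X_tr, y_tr)

# Fix 2: undersample the over-represented class before fitting.
majority = np.flatnonzero(y_tr == 0)
minority = np.flatnonzero(y_tr == 1)
keep = rng.choice(majority, size=len(minority), replace=False)
idx = np.concatenate([keep, minority])
undersampled = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])

# A model that always predicts "legitimate" would be ~99% accurate and
# useless; the per-class recall below is what actually matters.
for name, model in [("class-weighted", weighted), ("undersampled", undersampled)]:
    print(name)
    print(classification_report(y_te, model.predict(X_te), digits=3))
```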
There's a chart I found from scikit-learn, very popular, which they call "choosing the right estimator". I'm not saying it's the best mental model out there, but the idea is that we need to be able to identify these clusters of problems which need to be solved. It doesn't matter if you can't read it; the slides will be up at the end and you'll be able to check it out later. Again, it's on the scikit-learn site. There are classification problems, clustering problems, regression, dimensionality reduction. You start with this mental model within yourself: what am I looking at? Is it a model which is predicting a number? Is it trying to predict a category? Am I just looking for patterns in the data? And the whole idea is that, through experience, you start to learn which models end up working in each situation.

So, on to modeling. Let's take a concrete problem. Say we're focused on e-commerce, and an e-commerce website asks us: how do you rank the products for a visitor? When someone visits the website, what is the order in which you show the products? One easy way, when they come to us, is to build a complex model and say, feed all your data in here and we'll solve it. But that's not the way things work, right? What I would recommend is to start with a simple heuristic. And what do I mean by heuristic? That's just a fancy way of saying hard-coded rules. A simple heuristic could be: I'm going to rank my products by the number of views they receive on my website. Maybe that gets you half the way there; maybe that already starts putting the right products in front of people, the products that other people are looking at; and maybe that ends up working for a few months. Then you realize views are not enough. Maybe I should start sorting my products by the actual orders placed; maybe that gives me a better conversion. So you go ahead and build a more complex heuristic, and you start hand-tuning it. At this point you're on a slippery slope, and I would suggest: don't go down it. That's the point where you consider actual machine learning models. You start building a simple model, and eventually you work upwards from there. The idea is that not everything needs to be perfect from day one. Maybe you start with a heuristic, and eventually you start hitting its limitations, and from there you can take things forward.

So again, driving home the point: start simple. Simplify your first objective. What do I mean by that? When you ask a startup or a small company what its aim is, the answer is "to make the world a better place". Unfortunately we can't optimize for that; feeding that into a model gives us a bad answer. So we simplify our first objective: we decide we just want to optimize the conversion rate. Maybe that's a small enough goal that we can work with it. And once you have a valid objective in mind, there's a trade-off which people consider early on, and which I'm slightly biased about: do you need interpretable models, or do you start off with a black box where you feed all the numbers in and it spits out an answer? There was a talk earlier about explainability in machine learning systems, and this is related to it: you trade accuracy for explainability. My understanding is that, at least initially, explainability takes you a long way.
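Here is a tiny sketch of that "start with a heuristic" progression for the product-ranking example above: rank by views, then by orders placed, and stop once you find yourself hand-tuning weights. The product names and numbers are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    views: int    # page views on the site
    orders: int   # orders actually placed

catalog = [
    Product("red shoes", views=1500, orders=3),
    Product("blue shoes", views=900, orders=12),
    Product("green shoes", views=400, orders=40),
]

# Heuristic 1: rank by views -- gets you maybe half the way there.
by_views = sorted(catalog, key=lambda p: p.views, reverse=True)

# Heuristic 2: rank by orders placed -- a better proxy for conversion.
by_orders = sorted(catalog, key=lambda p: p.orders, reverse=True)

# Heuristic 3: hand-tuned weights. This is the slippery slope: once you are
# tweaking numbers like these by hand, consider an actual learned model.
by_hand = sorted(catalog, key=lambda p: 0.01 * p.views + 1.0 * p.orders,
                 reverse=True)

print([p.name for p in by_orders])
```

The point is only the progression: each step is trivially explainable, which ties into the interpretability trade-off just mentioned.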
Feature engineering is again very linked to being able to explain the results of your model: you work out which features are leading to the conclusions. And maybe that's really the way to start. Of course, models can stand on top of other models, turtles all the way down, which brings me to ensembles of small models. Ensembles of small models are what win the most popular competitions. You go to Kaggle today and look at the top results: most of them are people who put together five models, tune each of them, and end up winning. So ensembles seem to hold great promise, and most of the time your simple models, when put together, are able to deliver significant results.

The next topic is related to something I only very recently learned about; maybe it's useful for you as well. It's model calibration. Let's look at this graph. It looks a bit like an ROC curve, but it isn't. On the x-axis you have something which you're predicting, a probability between zero and one, say the probability of a customer churning; on the y-axis you plot the fraction of times the customer actually churns. You do this over many predictions and see how often they churn. Say you're predicting 0.2 that a customer is going to churn, but those customers end up churning 50 percent of the time; that means your model is miscalibrated, an underconfident model. Ideally you sit along this central diagonal, but you can end up with a model that's off it. Or the other way around: you keep predicting 0.8, but out of 50 experiments the event only happens around 50 percent of the time, an overconfident model. There's a measure for this called the Brier score; I came across it in a book called Superforecasting, and I'm quite sure it's popular in other domains as well. It's something to think about: how well your model is calibrated, not just for one prediction but across all the objects that you predict on.

Looking at some results, this is the same kind of graph, and both of these models are perfectly calibrated: both of them are on the straight line. But the first one only ever predicts probabilities between 0.4 and 0.6 and ends up being correct, while the second one predicts maybe 0.1 and 0.9 and still ends up being correct. Both of these are well-calibrated models, but you would say that the first one is a cautious system and the second one is a decisive system. Both have the same calibration, but this way of looking at the model in a bigger picture gives you a better understanding of what it's actually telling you.

Again, autocomplete here: past performance... if you read financial documents, the first thing they say is that past performance is not indicative of future results. So when you start building your model, the idea is to launch first and iterate next. And I mean this in a lot of contexts, especially in relation to the next section on integration: when you start building, the system might work on day one, but eventually it will start to drift, because people's behaviour tends to change and the parameters that you trained on might no longer be applicable later on. So the idea is to get a stable pipeline in place, and then later you iterate on the results; later you can say, okay, these things work, now let's build a more complex model. Launch first, and figure the rest out later.
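Going back to the calibration graphs for a second, here is a small synthetic sketch of how you could actually check this with scikit-learn's calibration_curve and the Brier score. The churn probabilities are invented, and the "overconfident" model is simply the honest one with its probabilities pushed toward 0 and 1.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(1)

# Invented "true" churn probabilities and the outcomes they generate.
p_true = rng.uniform(0.05, 0.95, size=20_000)
churned = (rng.random(20_000) < p_true).astype(int)

calibrated = p_true                                            # honest model
overconfident = np.clip(1.6 * (p_true - 0.5) + 0.5, 0.01, 0.99)

for name, probs in [("calibrated", calibrated), ("overconfident", overconfident)]:
    frac_churned, mean_predicted = calibration_curve(churned, probs, n_bins=10)
    # A well-calibrated model sits on the diagonal: when it says 0.2,
    # customers in that bucket churn about 20% of the time.
    print(f"{name:13s} Brier score: {brier_score_loss(churned, probs):.3f}")
```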
When you're starting out, it doesn't matter if the model is really simple: as long as it works, push it through. And that brings me to this section about integration, and the idea that the ML code you build is not isolated. For that I have this great picture. It's from a paper Google published which talks about how complex their systems are. In all my slides, if you see this sort of link, it goes through to the actual content once you open the slides, so that should help you later when I post the link for the slides. Here you have the data collection, feature extraction and data management systems on the left, and in the middle there's this very small black box you can barely read, which just says "ML code". That's effectively the small component you end up building all this time. And around it you have all these other components: you need your serving infrastructure, your monitoring, your debugging and your process management. The whole point is that you need to look at the big picture, with the ML model, not necessarily small, as one component connected to the other parts of your code. It's an ecosystem which needs to be built. And I think here at this conference the idea is also to power this conversation between not just data scientists but also the engineers, data engineers as the field likes to call them, and to treat these as engineering problems. Because blindly increasing the complexity of the models increases the costs involved: the manpower cost, the time cost, the money cost. Every single aspect of your project is going to be delayed if you just focus on optimizing the model.

I started out initially focused on the engineering side, so this quote really resonates. It says, roughly: do machine learning like the great engineer you are, not like the great machine learning expert you aren't. It's by Martin Zinkevich. The idea is that we need to identify the strengths needed to execute a successful product; the aim is not to publish a research paper which gets another small improvement on the state of the art. And that leads to the actual problem: plumbing pipelines, the first level of your ecosystem as I like to think about it. The plumbing pipeline is about your data sources; you need to get a solid pipeline in place. There were excellent talks over the day about how you stream data from your data store to your machine learning systems. The parallel I like to draw, similar to the ecological slogan, is reduce, reuse and recycle: reuse the same data sources as much as you can between the systems used for training and the systems you eventually use in production, for serving, which specifically means making predictions. These plumbing pipelines need to be built not just for your own use, as a small Python snippet for processing, but as a framework which gets used across multiple projects.

Versioning and testing. And this is speaking of how you develop software, how you test your software systems. Whenever a new CDO or a new head of engineering comes in, they curse the whole system, throw it out, and say: let's rewrite everything in whatever the cool new language is today.
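Going back to the reduce-reuse-recycle point about plumbing pipelines, one common way to keep training and serving from drifting apart is to bundle the pre-processing and the model into a single artifact. Here is a minimal sketch with a scikit-learn Pipeline and joblib; the file name and the toy features are assumptions for illustration, not anything from a real system.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# --- training side ----------------------------------------------------
X, y = make_classification(n_samples=1_000, n_features=8, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                # pre-processing lives inside
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
joblib.dump(pipe, "ranker-v1.joblib")           # one artifact, versioned

# --- serving side -----------------------------------------------------
# The serving process loads the very same artifact, so raw features get
# exactly the pre-processing they got during training -- no hand-copied
# scaling constants, no second implementation to keep in sync.
serving_pipe = joblib.load("ranker-v1.joblib")
print(serving_pipe.predict_proba(X[:3]))
```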
So the advice is: minimize whole-system rewrites. And this, I think, is crucial when you start building these systems in your own companies: build something which can be incrementally updated. Think of it as using git over your models. Maybe you want a version control system which records the accuracy, the features, the parameters, maybe the data set over which the model was trained and the time at which that data set was generated. Being able to run a diff over your models is quite crucial.

This often translates to a lot of other problems, like production skew. This is something you might not have heard of, but the idea of production skew is that your deployment environment, the whole system which is running, starts to deviate in certain ways from the context in which the model was trained. There starts to be this skew between how your system was intended to run and how it actually runs. One of the easiest ways to think about it is maintaining the selected features. What I mean is, suppose today you're collecting the number of page views on the website; maybe five days later the UI guy decides that sending back metric information about page views is slowing down the website, so they drop that column, right? Then eventually your system will no longer be able to work with the data. Identifying these sorts of features which are no longer available is something people rarely think about when they start building their models.

For that I like this diagram, which talks about how you do deployments. You have the development system where you work, and your code flows downstream to your actual deployment. The idea is that while code flows in that direction, we need to get data to flow upstream, in the other direction: whatever you're collecting in production, maybe after some required anonymization and aggregation, flows back into your development environments, or at least a snapshot of it does. Code flows downstream, and data flows back upstream. This replication of data sources is also tied in with the previous point, where we really want to share the pre-processing steps involved: when you start using certain techniques in development, try to see whether, just by scaling out the systems they run on, you can use the same techniques in your production deployments.

And of course monitoring. It's a very popular topic; most of us leave it to the DevOps engineers, who then go off somewhere and complain about our systems. But monitoring is crucial not just from a performance and uptime point of view, but also in terms of the accuracy of your models and whether they remain applicable to the problems they're solving. Log your predictions, and by this I mean not just during training, where you do cross-validation or verify against a held-out test set. The idea is to log predictions across the population that you're serving, so that you eventually see whether the trends the model reports are consistent with your own expectations, or so that the logging alerts you to failures of your model, at which point you go back and retrain or fix it.
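As a small illustration of the logging and skew points, here is a hedged sketch of a serving wrapper that checks that the features it was trained on are still arriving and logs every prediction together with a model version, so drift shows up in the logs rather than in a postmortem. The feature names, the version tag and the stand-in scoring rule are all hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("predictions")

MODEL_VERSION = "ranker-v1"                             # hypothetical version tag
EXPECTED_FEATURES = {"page_views", "orders", "price"}   # training-time schema

def predict(row: dict) -> float:
    # Guard against production skew: fail loudly if an upstream team quietly
    # drops a column the model was trained on (the page-views example).
    missing = EXPECTED_FEATURES - row.keys()
    if missing:
        raise ValueError(f"features missing in production: {sorted(missing)}")

    score = 0.01 * row["page_views"] + 0.5 * row["orders"]   # stand-in model

    # Log every served prediction so trends (and failures) are visible later.
    log.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": MODEL_VERSION,
        "inputs": row,
        "score": score,
    }))
    return score

predict({"page_views": 120, "orders": 2, "price": 19.9})
```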
There's another aspect of monitoring, which I like to relate to A/B testing, but with a finer point: sometimes it's better to hold your model back from touching some of the users. What do I mean? It's connected to something called feedback loops. Going back to the same example: you're ranking the products on your website according to a certain metric, and it turns out that the first-ranked product eventually becomes the most bought product. Now, was your ranking the factor that actually caused the product to become popular? It almost doesn't matter which product you put first; whatever you rank first eventually becomes the truth. So again: causation versus correlation. Sometimes just ordering the products in a random way, or showing a random order to a certain portion of the users without your model touching them, helps you collect what I like to call a cleaner data set. This sort of cleaner data set can really help you see when these feedback loops are overpowering the other features you're looking for. That's something your monitoring systems will eventually help you realize. So that was a quick run-through of integration.

Onwards and upwards. This is the last section, in case you're already sleeping; just a few minutes to go. What do I mean by onwards and upwards? It's easy to get bogged down in the minutiae, in the small points, and it's really difficult when you build these systems to keep looking at the application, at the purpose for which they will be used. Many times, having an idea of the bigger picture, whether you're a small company or a big one, really matters; you need a bigger idea of what the goals are when you start building these machine learning systems. This is quite important when you look at production or actual commercial applications, rather than just working on a training set.

For that I like to go by the W's, and by the W's I just mean the W questions: who will use the model? Why are you solving this problem? And what will happen as a result? These are questions which, when answered clearly and really understood by yourself, help shape how you build your model. Whether the model is going to be used internally by your company or externally by your consumers might change the complexity of the input that you can give to your model. Maybe you're solving this problem to optimize a certain aspect of your business; that's the reason you're solving it. And what will happen: are you planning on changing the output of your manufacturing by a certain amount, or on optimizing your delivery routes or delivery numbers? All of these help you realize that the features you select and the assumptions you make are guided by these W questions, which most of us as machine learning practitioners often ignore, but which are quite crucial to see.

And this ties into business bottom lines. By this I mean the evil business department, which is not interested in the purity of the engineering team's work; it's about business bottom lines and the decision to launch. How do you decide that you want to launch your product? I believe this is really a proxy for how you envision your own product or company in the future.
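Tying back to the feedback-loop point from the start of this section, here is a tiny sketch of holding a slice of traffic back from the model, so its own ranking doesn't contaminate the data you later train on. The 5% figure and the session-id hashing are just illustrative choices.

```python
import hashlib
import random

HOLDOUT_FRACTION = 0.05          # illustrative: 5% of sessions skip the model

def in_holdout(session_id: str) -> bool:
    """Deterministically assign a session to the model-free holdout group."""
    bucket = int(hashlib.sha1(session_id.encode()).hexdigest(), 16) % 100
    return bucket < HOLDOUT_FRACTION * 100

def rank_products(products, model_scores, session_id):
    if in_holdout(session_id):
        # Holdout traffic sees a random order, so the clicks and purchases it
        # generates form a "cleaner" data set, untouched by the model's ranking.
        shuffled = list(products)
        random.shuffle(shuffled)
        return shuffled
    return sorted(products, key=lambda p: model_scores[p], reverse=True)

catalog = ["red shoes", "blue shoes", "green shoes"]
scores = {"red shoes": 0.9, "blue shoes": 0.4, "green shoes": 0.7}
print(rank_products(catalog, scores, session_id="session-42"))
```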
The way you launch, or the way you update your product, is sort of a proxy for where you see your company a few years later, or maybe even a few months later. For example, metrics versus objective. This is one thing which is quite common: I mentioned that you need to start collecting your metrics, and you most probably need to select one of them as your objective. When you choose to optimize for a certain metric, it might turn out that the other metrics suffer. If you optimize for something like the number of sales, maybe the average sale price is going to go down. Again, this reflects how the company looks at its own future: which objectives you optimize for in a model update are actually a proxy for your business goals. And, similar to the earlier credit card example, false positives and false negatives: both of these are bad, but which is the worse evil? Do you want to let the one person who conducts a fraudulent transaction through, or do you want to put a hassle and a brake on thousands of legitimate people who merely look suspicious? These sorts of model parameters end up being a decision which needs to be made not just by yourself, but with an idea of how it affects the business bottom line.

So is this the end? Is all of it covered? I don't think so, right? It's about putting all of it together. You start with your data; you go about collecting it and pre-processing it; you construct your own mental model of how the problem is structured; you end up integrating it into your systems. The idea is that you set yourself up for success in any of these projects by getting a workflow in place. And if you missed all the points, here it is in three words: simplify as much as possible, measure your results, whether initially or during monitoring, and be prepared to iterate. I think iteration is where you end up really deploying successful systems.

So that was my talk for today: machine learning, from prototype to production. If you're interested in the slides, you can scan this QR code; it'll take you to the talk page. There is a whole bunch of references that I strongly suggest you check out; most of them are from big companies that have done this successfully, and you can find them all there. Thank you. We have a few minutes for questions and answers.

Okay, so the question was: if you have two different domains, do we share the data from one in the other? When it comes to our own systems, we of course have a whole bunch of transfer processes in place, where we see that data used in one context can often give a significant boost in another. As a data scientist you often spot these patterns, and it does help to take those learnings and apply them in a different context. But if you look at the raw data itself, no, we do not support taking data from one store and giving it to another store; that's something we don't do.

Okay, so looking back at Semantics3, what do we do? We do deduplication across stores. If the same product is sold at multiple stores, we treat it as one product, so we make sure the same product carries the best source of information possible. I'm not sure it's really implemented that way; we don't get that information from the store owners, but I would think that better structured data is definitely better.
We've seen that the more popular stores make sure their data is of much higher quality. Yes, that actually makes perfect sense. The question is: when you start with a model which is, say, 90% accurate, do you see the performance degrading over time? That's something we see quite regularly. I think each model comes with its own half-life, and the idea is to keep monitoring the results as much as possible. You need to have some other loops of verification; that ties into the versioning, the testing and the monitoring parts. You really need to hold out test data sets, and make sure you're able to collect these test sets from time to time, as snapshots, not just across segments but also in the temporal dimension. Yes, so again, those are fairly valid points, and I'm quite sure you have had these problems in the past. Being able to train a model to a certain threshold accuracy, maybe 98%, doesn't mean it's going to be 98% two weeks later. So maybe the trade-off is to deploy a good-enough model now, and then make sure it remains good enough.

Perfect, and one last question. Again, I think this really depends. The question is: when do you know that a model is good enough to replace another, or that a single model is good enough to roll out against an ensemble? My view is that an ensemble is not that different from a single model; when I look at models, it's a model on top of a model on top of a model anyway. The idea is to treat it as a business decision, or you can think of it as an A/B-testing style approach: if you're able to segment your user space into statistically significant portions and run a hopefully cheap test, not too expensive in terms of the losses it might entail, so that you can test your new model against your ensemble, that might be the best way to look at it. How long do you run it? There's no straightforward answer, I think; it comes down to understanding what data you're looking at and how quickly it changes on you. Thank you very much.