Thank you. We're going to spend the next 30 minutes talking about our use case, which is experiment tracking and hyperparameter tuning, and some infrastructure we had to build internally to solve some of these problems.

First, a quick overview of what we'll cover today. At a high level: what is experiment tracking, what is hyperparameter tuning, and what are some of the challenges of building machine learning models. We'll give examples of unsupervised and supervised machine learning, and we'll show you a demo. So let's get into it.

When you're building machine learning systems and models, there are some challenges. You might use different algorithms to solve your problem, and you might want to try different hyperparameters to get the best results. Experiments can also be hard to reproduce: if you deploy a model into production, you want metadata about which hyperparameters were used, which dataset, maybe the git hash if the code was stored in git, so you know exactly what code caused a problem. And when you're training models there are latency and performance concerns; for example, grid search can be very time-consuming, because it searches across all the different hyperparameter combinations.

Typical software engineering looks like this: you write your code, you test your code, you deploy your code, maybe you do some integration tests and some CI, and you monitor your code. Pretty simple. But being successful at software engineering and being successful at building machine learning systems involve different goals. In software engineering your goal is to meet a functional specification; in machine learning your goal is to optimize a metric, for example the accuracy of your predictions. In traditional software engineering, quality is measured in your code; in machine learning, quality is measured in the code, the data, and the tuning. You're also going to need regularly updated data to keep continuously training your model, and the process never really ends: you're constantly experimenting with different libraries and different models and trying to productionize them. We'll dig into more detail on what all of this means.

So, what are hyperparameters? When you're creating a machine learning model, you're presented with different ways to define your model architecture, and at first you don't know what the optimal architecture is. Finding the right hyperparameters can feel like black magic: you don't know exactly how many clusters you should have in k-means, for example.
You might have to try different numbers of clusters to find the optimal one, and we'll show you more details on how to do that. But just remember this one thing: the parameters that you define in your model architecture are hyperparameters, and the process of searching for the ideal model architecture is referred to as hyperparameter tuning. Hem is going to give us some examples of hyperparameters in more detail.

Yeah, so typically hyperparameters are numerical in nature, and I've listed a few that we commonly use when training a machine learning model. One example is the number of epochs, and the number of epochs goes hand in hand with another hyperparameter, the learning rate. When you have a machine learning model you usually have a huge amount of training data to feed to it, and because of memory constraints you cannot feed the entire training dataset in at once, so you need to split it into smaller chunks, which we call batches. You feed each batch to the model one at a time, and when every batch has been fed to the model at least once — that is, when the model has seen the whole training dataset — that's one complete epoch.

The learning rate is a small value, typically between zero and one, which we commonly use in deeper neural network models. It controls how much the weights assigned to your input data are adjusted on each update. There is really no fixed learning rate.
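To make the epoch, batch size, and learning rate interplay concrete, here is a minimal training-loop sketch in plain Python. The toy data and the single-weight model (fitting w in y = w·x) are our own illustration, not something from the talk:

```python
# Toy training data: 10 samples of y = 2x, so the ideal weight is 2.0.
data = [(x, 2.0 * x) for x in range(1, 11)]

def train(epochs, batch_size, learning_rate):
    w = 0.0  # single model weight, initialized at zero
    for _ in range(epochs):
        # one epoch = every sample fed to the model once, batch by batch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of the mean squared error with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad  # the learning rate scales each update
    return w

print(train(epochs=50, batch_size=2, learning_rate=0.001))
```

With these settings w converges toward 2.0. Train for only one epoch and it stops far short of the target; raise the learning rate enough and the updates start to overshoot — the "a high learning rate can be really bad" effect described here.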
Sometimes a high learning rate can be really bad for your model, because in the end you're trying to reduce the model's loss, so you're looking for the optimal learning rate at which the loss actually goes down. At the same time, just one epoch of training is usually not enough; you need to keep training over multiple epochs. Increasing the epochs while still reducing the loss is the balance you're looking for, which is why these are considered two important hyperparameters.

Then, in the case of neural networks: a neural network has a single input layer and a single output layer, and the intermediate layers are what we call hidden layers, which are made up of hidden units. What a neural network does is transform the input signal into an output signal, typically something like a zero or a one, and the hidden layers are where the signal gets fine-tuned toward the output you want. You also have a set of mathematical functions that you apply, called activation functions: you add weights to the data points, sum them up, feed them into a hidden layer, and inside that hidden layer you can play around with different activation functions to shape the input toward the required output. The number of hidden layers and units, and the choice of activation functions, are all user-defined parameters — which is why we call them hyperparameters — and you do some trial and error each time you train your model.

So now that we know what exactly hyperparameters are, why do we need them?
We need them because they directly control the behavior of your training algorithm, and they have a huge impact on the performance of the model: a good choice of hyperparameters really can make your algorithm stand out.

In the machine learning lifecycle you start with raw data — sometimes not clean data, so you might have to clean it first. Then you run your data through training; that training box over there is where you do things like feature engineering and hyperparameter tuning. Then you deploy your model, which might be a physical file or a function, and the process continues: as you get new data you keep training and deploying, and you also want to monitor the live system to see how accurate the results are when you're running in a production environment.

You also have to decide whether to do unsupervised or supervised machine learning. Unsupervised means the data is unlabeled; supervised means you have labels for your data. We'll look at an example of each in the next slides.

First, k-means. K-means is an unsupervised machine learning technique. The way k-means works is that you partition your data points into K clusters, where K is a number, and similar data points are grouped together under one cluster. Say K is three and you have ten data points: k-means takes the features of each of those ten data points and assigns each point to cluster one, two, or three, so that the points that are similar end up grouped together under one cluster. Hem is going to tell us more about that.

Yeah, so as Zach mentioned, here we're basically plugging in different values of K. That's an example of a hyperparameter for the k-means model, because you don't have a fixed number of clusters; it really depends on how many groups you want out of your randomly distributed data points. This is where MLflow comes into the picture, and we'll talk more about MLflow after this. MLflow is a tool we think is pretty useful for these hyperparameter tuning aspects, because you can keep track of the different values of K you're playing around with and how each one helped improve the performance of the respective model.

So now that we know what exactly k-means is, and a little bit about the differences between supervised and unsupervised machine learning, where exactly can we apply k-means? Typically k-means is useful when your data is numeric in nature, preferably with a small number of dimensions — of course that's not always the case, and then you'll have to do further dimensionality reduction — and when the data is continuous in nature. That's where k-means is a suitable choice: when you're trying to group similar things together.

Now that we know what k-means is and where it can be applied: how many of you think you've used k-means somewhere, or used something that relies on the k-means algorithm? What did you use that uses k-means? Okay — I'm going to give some more examples of systems and apps you've used that rely on it. I'm sure all of us have used Uber or one of the other ride-sharing applications, right?
So you have Uber and Lyft — all of these have some kind of k-means-style clustering behind them. When you book an Uber you're usually paired up with a driver, and if you book an Uber Pool you have other passengers in the car as well; that's how they use clustering to identify, based on your location, the nearest possible drivers that can be allocated to pick you up. They also use it in their back end for more statistical analysis, rather than just in real time: identifying which areas or locations had the most demand for Ubers at a particular time, how many customers got in, and things like that.

Another use case is e-commerce and the online shopping experience. These also integrate some kind of clustering, for things like delivery estimation and allocating the nearest truck drivers. Another, slightly more complicated use case: when you do a lot of research about a particular location or area and try to figure out whether it's safe or not, there are third-party websites that give you statistics about the crime rates in those localities. They do a lot of training on historical data from past years and give you insight into which areas are probably safe or not. And another use case is Netflix: when you're streaming on Netflix, they have a bunch of servers, and whenever you request a TV show or movie you want to watch, they try to determine, based on your geographic location, the nearest available servers they can assign to you, so you can easily get access to that particular video.

Another very popular algorithm is K-nearest neighbors, or KNN. KNN is a supervised machine learning technique: you have some data points whose classes you already know, and you use those to figure out which class a new data point belongs to, based on proximity. Hem is going to tell us more.

Yes, so there's a simple animation here to show what KNN is trying to do. You already have classification labels assigned to some of the data points: a class A and a class B, the blue and the orange dots respectively. That's how it differs from unsupervised learning, where there were no labels at all. So when a new data point arrives, it's trying to figure out: do I belong to class A or do I belong to class B? The K here is another hyperparameter that you tune; in this example we're trying to find the nearest three neighbors. The black dot is basically saying: these are my three nearest neighbors, of which one belongs to class A and two belong to class B. It also does further distance calculations to figure out which points it's closer to, and then it classifies itself — here, as probably belonging to class B. That's an example of KNN, and a useful analogy to remember it by is: birds of a feather flock together.

Some applications where KNN might be used: typically in object detection and pattern recognition systems — image processing techniques where you're deciding which group of nearby pixels a pixel should be grouped under. You have video streaming services like YouTube; these are statistically based, using something like KNN to identify the nearest customers, what kinds of playlists and videos they have, and what to recommend based on that. Another popular one is gene sequence matching: pharmaceutical companies doing research on this try to identify gene composition and classify genes as belonging to a given DNA sequence, for further research and analysis. And then there are credit card applications, where banks and financial institutions use something like KNN to sort their customers into categories and figure out which kinds of users to reach out to as potential future customers.

In all of these examples, you're doing a lot of trial and error, changing different parameters to find the best-performing model, and that's exactly where MLflow comes into the picture. Zach is going to talk a little more about that.

Thank you. So, as you saw, MLflow is one project that helps with hyperparameter tuning — there are many projects in this space; I believe Katib is another one. MLflow is great, and it's an open source project.
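Stepping back for a moment, the KNN vote described above — a new point takes the majority class of its k nearest labeled neighbors — can be sketched in a few lines of Python. The coordinates, labels, and query point here are invented for illustration:

```python
from collections import Counter
import math

# Labeled training points: (x, y) -> class "A" (blue) or "B" (orange).
points = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
          ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]

def knn_predict(query, k=3):
    # sort the labeled points by Euclidean distance to the query point
    by_distance = sorted(points, key=lambda p: math.dist(query, p[0]))
    # majority vote among the k nearest neighbors
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((5, 4)))  # sits next to the orange cluster -> "B"
```

Birds of a feather: a query near the (5, 5) group gets all three of its nearest neighbors from class B, so it joins class B.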
You can deploy your models and save them to different clouds, and it has a Python library that you can embed in your Jupyter notebook. It has many great features, but just to give you a bird's-eye view: there are three sub-projects, and the one of interest to us for experiment tracking is the tracking component. The tracking component is basically a server that runs in a container; you use the Python library to connect to that server and store metadata. You can also hook that server up to S3 or other cloud storage systems to save your models, track the git hash and all the other metadata, and even store your model along with the visualizations you generated, say with matplotlib, as part of your experiment. It also has some great features like the ability to search for the best hyperparameters through a Python API — I'll show an example of that in a later slide.

Let's look at a bird's-eye view of hyperparameter tuning. Say I want to do k-means and try three different values of the hyperparameter K — four, five, and six — and see which one returns the best results.
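As a sketch of that "try k = 4, 5, 6 and compare" idea, here is a tiny from-scratch k-means (Lloyd's algorithm) on made-up one-dimensional data, so it runs without scikit-learn or an MLflow server. The data, the deterministic initialization, and the inertia metric are illustrative assumptions; in the real setup each k runs in its own container and the score is logged to MLflow:

```python
# Made-up 1-D data with four obvious groups (around 0, 10, 20, 30).
data = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0,
        20.0, 20.5, 21.0, 30.0, 30.5, 31.0]

def kmeans_inertia(points, k, iters=20):
    # deterministic init for the sketch: evenly spaced points as centroids
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda j: abs(p - centroids[j]))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)  # assign to the closest centroid
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # inertia: within-cluster sum of squared distances (lower = tighter)
    return sum((p - centroids[nearest(p)]) ** 2 for p in points)

for k in (4, 5, 6):
    print(k, kmeans_inertia(data, k))
```

Note that inertia keeps shrinking as k grows, so picking "the best K" needs a metric that reflects actual model quality — which is exactly the kind of score that gets logged per run and compared.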
The way I like to picture it at a high level: each blue box is a container, and they're all connected to the same dataset. Then you have the MLflow tracking server — the sub-project I mentioned — running in its own container, and we store all the metadata there, plus the model itself if we want.

This is the code snippet you'd add to your Jupyter notebook in order to take advantage of MLflow. You import the Python library — it's a simple pip install — then you start your run, log your parameter, and log your metrics. You can have more than one metric and more than one parameter; in our case it's just one parameter and one metric.

Now, if you ran this across many, many experiments — say you have hundreds — it's cumbersome to click through and search by hand. MLflow has a really nice Python API for this. What this statement here does is: out of all the runs I've done, search experiment ID zero using this filter string — `metrics.r2 < 0.046` — and return only one result, so if multiple runs match the same criteria, I just get one back. Next slide: this is the MLflow UI.
So far you've been doing everything in Python, but the UI lets you do a lot as well — for example, click on three different experiments and compare them. The fonts here aren't very big, so I'll read it out: the K value is four here, seven here, and five here, and for the metric we defined we're getting back a 96% score here, a 90 here, and a 94. So technically four seems to be the sweet spot for our hyperparameter, and when we deploy this we'll make sure our K is four. Next slide.

Say I want to do all of this in Jupyter, because I like Jupyter a lot. I just import MLflow and use the sub-package for whatever library I'm working with — if you use TensorFlow, you'd use the TensorFlow package under `mlflow.`, and they have packages for multiple other libraries too. Then you track things with `log_param` and `log_metric`. Next slide.

When you're doing things like this you want to build reusable containers, and OpenShift is a great platform for doing builds and having a container built for you. If I don't care about writing Dockerfiles anymore, I can just point OpenShift at my source code and it will do a source-to-image build. And in an OpenShift template I can pass in which hyperparameter I want to try, and it will go build and run that job with the hyperparameter I want.

Now suppose I decide that's not good enough and I want to do more complex stuff — say I want a visual view of this, a workflow or a pipeline. This is a gif (we're near the end of it, but here's the beginning): I have Argo here, I've already done my OpenShift build so I already have a container, and in the Argo workflow I'm passing in different parameters. So while my job runs, it's tracking my experiments in MLflow; it can even track my models in MLflow, and MLflow can store those models in Ceph S3, an open source alternative to the S3 available from AWS.

If you're interested in trying this at home later, you can follow this gist, but let me explain it so you understand what we're doing here. In the YAML file, I'm declaring the parameters that are going to be passed into this container, and here's the image I just built, which takes in environment variables. One environment variable that's very important to know about is `MLFLOW_TRACKING_URI`: you have to tell it which MLflow instance you've got deployed that's going to be tracking your metrics and your hyperparameters. Then I made my script work like a command-line tool where I can pass in different flags; in this case I'm using the example provided by MLflow, where you pass in two parameters and get different results.

And this is the workflow. Joanna presented earlier on Kubeflow and Kubeflow Pipelines — Argo is what Kubeflow Pipelines is based on. Basically, it lets you say: try these steps, and when these steps are completed, try these next particular steps, and you can have it do something at the end. So, to tie this back: that is the Argo and MLflow integration.

Okay, so we're going to show you a demo, because slides are nice, but what's better than a demo, right? Let's see the real stuff — no recordings. We're going to be demoing the work of Karan, who sits over here.
He works on the Ceph storage team, predicting drive failures, and he's using hyperparameter tuning with MLflow to do that, so we're going to demonstrate his work. (All the tough questions are going to go to him now.) Okay, let me get the font a little bigger — can everybody see that? A bit more? All right, I'm pretty sure you can see this now.

So this is what Karan was working on as part of his intern project: Ceph hard-drive failure prediction. That's the model he was training, and he had a few hyperparameters that he needed to fine-tune before feeding them into his model. This is his generate-hyperparameters Python script, where you specify a set of hyperparameters. When we execute it, it tells you the total number of hyperparameter combinations that were set, and it also generates a YAML file that helps spawn one job per combination.

There's also an experiment tracking repository we have, where you can deploy the MLflow tracking server itself. And there's a small script here where you specify where your generated hyperparameter YAML file is, where the training dataset currently resides, and where the training Python file exists — those are the parameters you pass in, and it spawns a new job for every hyperparameter combination you've set. So when we run this particular script — please don't fail — it starts spawning those jobs, and as we saw earlier, it said the number of hyperparameter combinations set was 12.
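We don't see the real script's contents in the talk, but a generate-hyperparameters step like this usually boils down to a cross-product over a small grid. Here's a guess at the pattern — the parameter names and values are invented, chosen so the combination count comes out to the 12 jobs from the demo:

```python
from itertools import product

# Hypothetical grid; the real script's parameters are not shown in the talk.
grid = {
    "learning_rate": [0.01, 0.1],
    "epochs": [10, 50, 100],
    "batch_size": [32, 64],
}

# One dict per combination: 2 * 3 * 2 = 12 in total.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combos))
```

Each combination would then be rendered into a YAML job spec and submitted, so 12 combinations means 12 jobs.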
So it should spawn 12 different jobs, and we can quickly open them up. This is where we have our MLflow server set up — these jobs are running now. This is the MLflow server with the tracking UI we mentioned earlier in our slides; this is where you log your parameters. For each run it tells you the timestamp, the username, and the source code that was run for that particular model. Then you have the parameters being passed — some loss functions, L2 values, parameters tuned for his specific model — and you also log different metric values as well.

As I mentioned earlier, you could run this and then have a Python script go and query it, find the best parameters based on your criteria, and then deploy the model that has those particular parameters. And you can just search right here in the UI: type in a particular metric, let's say `metrics.f2` greater than some value — let's do it right here, 0.98, then 0.93. Searching found 53 results, then 18 results, then 12 results, and so on, and now there are two results that fit this criterion, and you can dig into them more. That's essentially the idea — but like I said, whatever you do with the UI, you can put in a Python script and automate.

And maybe to give you a little more of a tour: I can say, hey, these two look like similar runs, but I want to compare them side by side and take a look at them like that.
I can do that. So yeah, that's pretty much our presentation — we did show you the demo, and we'll just give you a brief overview of what you've learned. You've learned the difference between regular software engineering and machine learning engineering; you've learned about some hyperparameters; you've learned about unsupervised and supervised machine learning and some examples of each; you've learned how to use MLflow to do experiment tracking; and you've learned how to tie it all together and run it in Kubernetes on OpenShift. Thank you.

If you're interested in contributing, all of this stuff is open source, and one of the best ways to learn is to get involved in open source communities and contribute. The first link you see here is the experiment tracking repository, with the files for deploying this stuff with templates on OpenShift; the second link is the Golang operator for MLflow. If you're interested in contributing to either one, we'd be glad to have more contributions from the community. Thank you.

Also, a quick announcement: there's going to be a DevConf party tomorrow at 7 p.m., and you can collect your tickets for the party today at 4:30 at registration. In case you can't get them today, you can also pick them up tomorrow morning at the registration desk.

Does anybody have any questions? If you do, just come up front and we'll be happy to answer them.