We are going to talk about a scalable hyperparameter optimization framework. First, have you tried hyperparameter optimization before, or are you aware of what hyperparameters are in general? Okay, great. Then this will be useful for anyone who is trying to train models and improve their prediction performance.

So, a very brief intro on what hyperparameters are and what the tuning process is. Hyperparameters are parameters that are external to the model, set even before the training process starts. In the case of neural nets, for example, the learning rate, momentum, and number of epochs are values you set yourself. You may set them randomly, or you may pick values that domain experience says are right, and then start the training process. Hyperparameter tuning is the process of finding the best values of these hyperparameters for the model you have. It can be considered a meta-learning task, a learning loop that sits outside your real training process. Related to the earlier example, it would be finding the optimal batch size or learning rate so that the final objective, maximizing predictive accuracy, is met.

So why is this hard in general? One obvious reason is that it is very difficult to guess: manual tuning is inefficient and error prone. Take the learning rate, with a range of, say, 0.01 to 0.05. It is a double value, so infinitely many values are possible. Combine that with multiple other hyperparameters and the search space you have to try grows exponentially; it is not easy to guess where to land. The toy sketch below makes this concrete.
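As a rough illustration (all numbers here are made up for illustration, not from the talk), even a coarse grid over a handful of knobs multiplies into hundreds of combinations, each of which costs a full training run:

```python
# Toy illustration: a coarse grid over four hyperparameters already
# means hundreds of combinations, and each one is a full training run.
import itertools

grid = {
    "learning_rate": [0.01, 0.02, 0.03, 0.04, 0.05],  # 5 coarse guesses for one knob
    "momentum":      [0.80, 0.85, 0.90, 0.95],
    "batch_size":    [16, 32, 64, 128],
    "epochs":        [10, 20, 50],
}

combos = list(itertools.product(*grid.values()))
print(len(combos))  # 5 * 4 * 4 * 3 = 240 training runs, for a *coarse* grid
```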
Beyond the size of the search space, there is the multi-user problem. Say there is a single job per user: how do you track metrics across jobs? Each job can emit its own metrics, and each can use its own resources; for example, certain jobs need GPUs, others need a combination of GPUs and CPUs. So how do you manage resources across jobs in a multi-user system? And particularly in an organization, you have multiple frameworks to support: one user writes in TensorFlow, someone else in PyTorch or MXNet. So how do you build a single system that tries to solve all these issues for hyperparameter tuning?

That is where this presentation comes in: we need a scalable machine learning hyperparameter optimization tool. It is called Katib, and it lives in the Kubeflow ecosystem; I will briefly talk about what Kubeflow is in a later slide. It is fully open source, you can just visit the link, and it is completely Kubernetes native, so you get all the advantages Kubernetes provides to applications: it is scalable, fault tolerant, and portable. It can run in any environment where you have Kubernetes, and if you want more parallel jobs to run, you can scale the system up. It is framework agnostic: users can have programs written in their own languages, in any framework. They just need to say, this is my program, these are the hyperparameters, these are the ranges I am looking at, give me the best hyperparameter values.

Several algorithms are supported by default: random search and grid search, which are the most widely used, plus Bayesian optimization and hyperband. The backend is also customizable: if someone wants to try a new algorithm, you can add it as a service at runtime and it will work directly.

As mentioned, Katib is part of the Kubeflow ecosystem. In case you have not heard of it, Kubeflow is an open source framework for end-to-end machine learning needs. The basic idea is to democratize machine learning and make deployments scalable and easily portable across systems: on-prem or on a cloud provider, and you can even migrate later. There are many components with which you can train locally, move to the cloud, and move back, depending on your environment. Kubeflow contains lots of components, and Katib is just one of them, providing hyperparameter optimization. TFJob and PyTorchJob are components that provide distributed versions of those frameworks, and you can use them directly in Katib: if you want hyperparameter tuning on a distributed job, that is one of Katib's more advanced features.

Now, a very basic system architecture. From the user's point of view, the config you have to write is very simple. You describe how your experiment is modeled: for example, my objective is to maximize accuracy to 90%, these are the hyperparameters I am searching, these are their ranges, and this is my program. That is the config the user submits, and the rest is all taken care of by the backend.

The backend is slightly more complex; there are multiple components. The basic idea is that there are Kubernetes controllers running which take your experiment config and figure out the hyperparameters. There is a suggestion service running for each algorithm you specify; for example, if you specified random search, a random search algorithm service runs and provides suggestions in real time. A suggestion is nothing but a set of hyperparameters. Once these hyperparameters are obtained, the controller creates trials, which are executions of those hyperparameters. If an experiment asks for, say, 10 parallel trials, you will have 10 executions with different sets of hyperparameters running at the same time. Once they are completed, the metrics are collected; a metric is nothing but the final objective metric you specify. The system checks whether the objective goal has been reached, 90% in our example. If it has, the experiment is done; otherwise it just continues the loop. The sketch below shows this loop schematically.
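Here is a schematic sketch of that loop in plain Python, purely illustrative and not Katib's actual implementation. It assumes two hypothetical callables: `suggest(n)`, standing in for the suggestion service, and `run_trial(hp)`, which trains once with a hyperparameter set and returns the objective metric.

```python
# Schematic experiment loop (illustrative, not Katib's real code):
# ask the suggestion service for hyperparameter sets, run them as
# parallel trials, collect metrics, stop at the goal or trial budget.
from concurrent.futures import ThreadPoolExecutor

def run_experiment(suggest, run_trial, goal, max_trials, parallel_trials):
    best, completed = None, 0
    with ThreadPoolExecutor(max_workers=parallel_trials) as pool:
        while completed < max_trials:
            batch = suggest(min(parallel_trials, max_trials - completed))
            # each trial is one execution of a suggested hyperparameter set
            for metric in pool.map(run_trial, batch):
                completed += 1
                if best is None or metric < best:  # minimizing, e.g. location error
                    best = metric
            if best is not None and best <= goal:  # objective goal reached
                break
    return best
```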
With that base laid out, we want to apply this scalable hyperparameter framework to a regular use case. For this we chose indoor localization. I am not sure how many of you are familiar with indoor localization. Typically you have a floor map laid out in a grid, and a set of wireless access points or Bluetooth beacons at certain fixed coordinates. Let us say a user walks in and receives each of these beacons' signal strength. He records the signals, and then he needs to predict exactly where he is located on this floor map. That is a high-level view of the problem statement of an indoor localization system.

So how is it relevant to Cisco? Cisco has a lot of assets, like network equipment and workspaces, which need to be tracked. Cisco also has a lot of public events where it needs to track customer engagement and footfalls, and most of these are indoor events. So you need to localize the customer as precisely as possible. That enables targeted marketing, and indoor navigation is one more application that becomes easily accessible.

Given that problem statement and Cisco's business case, the internal data is confidential, so we had to rely on a public Bluetooth Low Energy dataset; we have provided the link here and you can access it. This is roughly how the dataset looks: you have a stretch of columns of Bluetooth received signal strength, which are the input columns, and then you have the location columns, which are the location coordinates of the user or the sensor.

[Audience: was this recorded by a sensor or a phone?] It can be both; here an iPhone 6S was used, and the Bluetooth sensor in the phone did the recording. That is what the dataset is about. [Audience: are these signal strengths decibel values?] Yes, they are decibel values, and a value of 200 essentially means no signal. [Audience: and each of these readings is for my phone?] Yes, that is right. Since this is a short talk, I will take the remaining questions at the end.

Given the dataset, there was already a public model for this specific dataset, and it is a reasonably good model. It converts the input columns into pixel values, which gives a nice representation as a grayscale image. Once you have a grayscale image, you can apply a convolutional neural network. It is a pretty small three-layer CNN; training and evaluation used a root mean square loss function and the Adam optimizer with default settings. This model had fairly good accuracy, with a mean location prediction error in the range of 2.6 meters, whereas each grid cell in the map was about 10 meters. That is pretty good; the default optimizer had done a reasonably good job. A sketch of this kind of model follows.
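For concreteness, here is a minimal sketch of that kind of model in TensorFlow/Keras. The 13x13 input shape, the filter counts, and the pixel scaling are assumptions for illustration; the talk only says the RSSI columns were mapped to grayscale pixels and fed to a small three-layer CNN trained with Adam and an RMSE-style loss.

```python
# Minimal sketch (assumed shapes and layer sizes) of the described model:
# RSSI columns -> grayscale "image" -> small 3-layer CNN -> (x, y) location.
import numpy as np
import tensorflow as tf

IMG_SIDE = 13  # assumed: beacon columns reshaped into a 13x13 square image

def rssi_to_image(rssi_rows):
    """Map RSSI readings (~-100..0 dBm, 200 = no signal) to [0, 1] pixels."""
    x = np.asarray(rssi_rows, dtype=np.float32)
    x = np.where(x >= 200, -100.0, x)  # treat the 200 "no signal" sentinel as -100 dBm
    x = (x + 100.0) / 100.0            # scale [-100, 0] dBm into [0, 1]
    return x.reshape(-1, IMG_SIDE, IMG_SIDE, 1)

def build_model(learning_rate=0.001, beta_1=0.9):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu",
                               input_shape=(IMG_SIDE, IMG_SIDE, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(2),  # predicted (x, y) coordinates
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate,
                                           beta_1=beta_1),
        loss="mse",  # stands in for the root-mean-square loss the talk mentions
    )
    return model
```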
So now we posed a challenge to ourselves: can we do any better on this using hyperparameter tuning, because there was no tuning at all in this model? And not only that, we thought we would stretch it a little bit: can we do it in reasonably quick time? That is when we decided we needed a scalable framework that can actually do it quickly. So which hyperparameters do we want to tune for this model? We chose the learning rate and beta1 of the Adam optimizer, and we wanted to minimize the metric of Euclidean distance, the L2 norm, between predicted and actual location. And we thought, let us build on knowledge of which hyperparameters are better, using Bayesian optimization. Given these, we submit this config, the experiment that Macaulay was describing, to the Katib system and see how well it does. I am going to show you a short video of how you can configure it.

You give an experiment name, which is just metadata. Then you have a range of common parameters which let you control the overall experiment: the maximum number of trials allowed in this experiment, and how many parallel trials you can have at any point in time. In the objective section, you list the metric name you are trying to optimize for, and the goal, whether you want to minimize or maximize the metric. In this case we set the goal to 1.5 meters, because the benchmark was already 2.6 meters, so we thought, let us see if we can get anywhere near that. If the goal is reached, the entire experiment reports that it has succeeded; otherwise the max trials value of 15 acts as the cap and limits the experiment.

You can choose the algorithm; in this case we thought, let us start with Bayesian optimization, which is reasonably good. You also have options to select grid or random search, but going in randomly it would be very hard to track progress. And you can set the parameter settings for each of these algorithms through the framework as well.

Then come the hyperparameters we will be tuning, with a range for each. The default learning rate is 0.001, and we did not want to deviate too much from that, because Adam is a reasonably strong optimizer as far as performance goes, so we fiddled around within the default range. The same with the first-moment decay, beta1: 0.9 is a decent number, so we searched within roughly plus or minus 0.2 of that. So that is the config, which is very simple; a hedged sketch of what such a config looks like appears after this walkthrough.

Now, how do you submit your model? We have certain predefined templates; a little bit of basic Kubernetes knowledge is involved, and you can go and read the template in case you want to play around with it. Essentially you are giving your model as a program; in this case we used a TensorFlow implementation, a model.py implemented in Python using TensorFlow. We just submit that image, the model takes the hyperparameters from the framework and emits the metrics, and this loop continues, as you saw in the earlier picture, until it reaches the optimization goal. Because it is framework agnostic, you can do the same exercise entirely for PyTorch: specify a PyTorch file as input and you are good to go. You click deploy, and that is it.
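To give a feel for that config, here is a sketch of a Katib v1beta1 Experiment spec, written as a Python dict mirroring the YAML you would submit. The experiment name, the metric name, the parallel trial count, and the exact ranges are assumptions; the 1.5-meter goal, the 15-trial cap, Bayesian optimization, and the two tuned parameters come from the talk.

```python
# Sketch of the Experiment spec (mirrors the Katib v1beta1 CRD YAML).
# Everything marked "assumed" is illustrative only.
experiment = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "Experiment",
    "metadata": {"name": "indoor-localization-tuning"},  # assumed name
    "spec": {
        "objective": {
            "type": "minimize",
            "goal": 1.5,  # target mean location error in meters
            "objectiveMetricName": "mean_euclidean_distance",  # assumed name
        },
        "algorithm": {"algorithmName": "bayesianoptimization"},
        "maxTrialCount": 15,       # the cap mentioned in the talk
        "parallelTrialCount": 3,   # assumed; the talk just says "parallel trials"
        "parameters": [
            {   # stay close to Adam's 0.001 default, as described
                "name": "learning_rate",
                "parameterType": "double",
                "feasibleSpace": {"min": "0.0005", "max": "0.005"},  # assumed range
            },
            {   # roughly 0.9 plus or minus 0.2, capped below 1.0
                "name": "beta1",
                "parameterType": "double",
                "feasibleSpace": {"min": "0.7", "max": "0.95"},  # assumed range
            },
        ],
        # "trialTemplate": the containerized training job, omitted here
    },
}
```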
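And on the trial side, a hedged sketch of what the submitted program's entry point might look like: Katib passes the suggested values as command-line flags, and the default metrics collector parses name=value lines printed to stdout. The flag names and the stubbed training call are illustrative, not the actual Cisco code; the metric name matches the assumption in the config sketch above.

```python
# trial.py -- hedged sketch of a Katib trial program (not the actual code).
# Katib injects the suggested values as CLI flags; the objective is reported
# by printing a "name=value" line that the metrics collector picks up.
import argparse

def train_and_evaluate(learning_rate, beta1):
    """Stub: train the CNN with these hyperparameters and return the
    mean Euclidean location error in meters on the evaluation set."""
    return 2.6  # placeholder so the sketch runs standalone

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--beta1", type=float, default=0.9)
    args = parser.parse_args()

    error_m = train_and_evaluate(args.learning_rate, args.beta1)
    # Matches the assumed objectiveMetricName in the config sketch above.
    print(f"mean_euclidean_distance={error_m}")
```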
We are done, and the entire process, all the parallel trials, runs in the backend. We can then go and track our jobs in the monitor, which gives nice graphical visualizations of how the trials in the hyperparameter experiment evolved. On the leftmost part of the screen is the objective metric, which ranged from about 8 down to around 2.1, and next to it are the ranges for the learning rate and beta1 that we configured through the UI. You can see that the range for the L2 norm is pretty wide. For the hyperparameter-tuned model, we got a mean location prediction error of about 2.1 meters; that is roughly a 20% improvement. And this entire experiment was done within 15 minutes. That is the good part: sometimes we think it might take hours or even days, but the framework allowed us to do it quickly. That is a key takeaway from this entire exercise, and if you have a wider range of hyperparameters in mind, you can still go and try it. Either way, you can set up a configuration where, while doing this, you save the model somewhere else; that is possible. Or you can run this as a pre-step before the actual training process. Both are possible. Okay, with that I hand over to John.

Yeah, I think we are almost done, so just the basic key takeaways. The manual tuning process is quite difficult: on the previous slide, with a random selection of parameters, you could see something like a 4x difference in performance. You would think the parameter values are very similar, but a small change can make the performance drastically different. You can just try it out; it is fully open source, and we are upstream contributors, so if you have any questions about how to use it in your particular use case, we are happy to answer them. There are four of us contributing upstream from Cisco. Yeah, please. Thank you.