Hi, my name is Carl Weinmeister and I'm joined today by Stefano Fioravanzo. We're going to talk about the data science odyssey, from a notebook all the way through to serving a model and everything in between. We'll cover a complete data science workflow and introduce several products that help along the way. Let's get started. The primary use case we're going to cover is hyperparameter optimization, which is the ability to search across a wide variety of parameters to find the ones that will create the best model. In general in this session, we're going to learn about simplifying your hyperparameter tuning as well as your serving workflows with intuitive UIs. The benefit is that you will accelerate your time to production, with reduced training time and reduced time to build a process. You're going to get to the answers you want faster, and to deployment faster. And finally, we're going to cover collaboration: reducing friction between the teams that build the model and the teams that deploy it. First, what's Kubeflow? Kubeflow is an open source project aimed at making ML deployments simple and portable across on-prem and multiple cloud environments. And because it takes advantage of Kubernetes, it's scalable, so you can run distributed training jobs and serve your models in a way that can handle whatever load you might have. The use cases we're going to cover today span quite a variety. First, we'll cover a complex ML system at scale, more than what you might call a toy dataset, where the data is small and fits in one virtual machine; we're going to talk about handling a more complex scenario. We'll talk about rapid experimentation and hyperparameter tuning, and see how all of this can work in both hybrid and multi-cloud workloads. And finally, you'll see how the traditional software-development concepts of continuous integration and deployment apply in the machine learning world. If you think about it, there are a lot of similarities: concepts like testing the input data, testing the output of your model, and making sure there's no regression, so that your model is better than it was before you deploy it. All of these things you can codify into a pipeline that's reproducible and definable within your notebook. The Kubeflow platform has multiple layers; let's walk through each of them. First, you can bring whatever framework you're comfortable with: whatever machine learning library you've learned, or want to learn, you can use on Kubeflow. There are a variety of what we call operators that allow you to run your training jobs and serve your models. In the next layer, we see the heart of Kubeflow, which is a set of components for capabilities like notebooks; Kale, which helps convert a notebook into a pipeline; the pipeline itself, which orchestrates your workflow; hyperparameter tuning, which is called Katib; and several other components. We won't get into all of these right now. You also see a variety of serving components, as well as capabilities for logging and monitoring, the whole stack of things that are necessary. And all of this is built on top of Kubernetes, so you're able to run your workload across multiple environments. The way Kubeflow is designed, it will come naturally if you're comfortable with Kubernetes concepts. All right, let's walk through the workflow we're going to discuss in more detail today.
So the first step is in the Jupyter notebook itself, which is available in the Kubeflow environment. In this notebook, you start by identifying the problem and doing some data exploration and analysis. Next, you choose an algorithm and code your model. The next step is experimentation and model training. We'll tune the model parameters, and then finally we'll be able to serve it, and that takes us all the way through the key steps in the process. Let's look at how Kubeflow works, starting with the user interface. Here's a picture of it. On the left side, each of the key capabilities, from notebooks to pipelines, experiments, and so forth, is available to you. You also have a dashboard that shows recent assets you've worked with and shortcuts to common things you might need to do. In addition to the user interface, the command line is another way to interact with Kubeflow. There are actually two command lines you might want to use. First, there's the standard Kubernetes command line, kubectl, which lets you do things like check the status of a training job, check the pods that are running, maybe get some logs, things of that nature. You also have the Kubeflow command-line utility, kfctl, which allows you to take templates that are YAML files and apply them to customize your configuration. Finally, there are APIs and SDKs. If you're going to build a model pipeline or create a hyperparameter tuning job, these capabilities are available through an SDK (see the sketch at the end of this section). And the final thing I'll mention before I turn it over to Stefano is that ML code is one small part of the process. What we're seeing in this diagram is from a famous paper about hidden technical debt in machine learning systems. What I want to point out is that these are things you need to do when your code goes into production. It's inevitable: you'll need to think about monitoring, logging, testing, managing your resources, and making sure you have a reproducible environment. You'll have to build and manage all of that. By working with the infrastructure we're talking about today, you're able to build on top of it and leverage it, so you can focus on the data science, as well as on putting together a process and a pipeline that matches the goals of your business or your project. And you can leverage all the great work that's been happening in this project. So with that, I'm going to turn it over to Stefano. Thank you, Carl, for the great introduction to Kubeflow. Now that you have a general idea of the architecture of Kubeflow and all of the components it provides, you might wonder how you can actually use it to develop, and then continually improve and validate, machine learning models using Jupyter Notebooks, Kale, Kubeflow Pipelines, and Rok. This is exactly what we're going to do in this tutorial. But first, let's have a high-level overview of the end-to-end workflow we're going to implement. First, the end user, you, will start from a Jupyter Notebook, a local environment where you can develop your models, your training algorithms, and so on. Once you're done, all you need to do is use Kale to annotate notebook cells to convert the notebook into a scalable pipeline. Afterwards, you can spin up a hyperparameter tuning job to run hundreds or thousands of parallel pipelines.
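As a quick illustration of the SDK route Carl mentions, here is a minimal, hedged sketch of defining and submitting a pipeline with the kfp Python SDK (v1-style DSL). The pipeline name, image, and step contents are illustrative placeholders, not code from this session:

```python
# Minimal sketch: define a one-step pipeline and submit it with the kfp SDK.
import kfp
from kfp import dsl

@dsl.pipeline(name="demo-pipeline", description="A minimal example pipeline")
def demo_pipeline():
    # Each step runs as a container on the Kubernetes cluster.
    dsl.ContainerOp(
        name="train",
        image="python:3.8",  # placeholder image
        command=["python", "-c", "print('training...')"],
    )

if __name__ == "__main__":
    # Assumes a reachable KFP endpoint, e.g. from inside a Kubeflow notebook.
    client = kfp.Client()
    client.create_run_from_pipeline_func(demo_pipeline, arguments={})
```

The Kale APIs shown later follow the same pattern: plain Python calls that create and manage Kubernetes resources for you.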
Once that's done, you can again use Kale to select the best model out of this massive workload and then, with a very convenient API, serve that model for everyone to use. Each step of the way is tracked by MLMD (ML Metadata), so you have end-to-end lineage of the workflow. Also, each step is backed by PVCs, and Rok takes snapshots of these PVCs, so you have a complete time machine of your workflow. This is our agenda. We'll start by deploying MiniKF, our local development environment. Then you'll learn how to convert notebooks to pipelines, how to scale up this workflow with Katib and hyperparameter tuning, and afterwards, how to serve the best model. Check out this URL, which will redirect you to a codelab where you can follow all of the steps of this tutorial at your own pace. Let's start by deploying MiniKF. MiniKF is our own portable and opinionated Kubeflow distribution. It runs seamlessly on GCP, on your laptop, or on any on-premises infrastructure. MiniKF is a single-node Kubernetes VM that runs Kubeflow alongside Rok's data management platform. It is super easy to deploy, and in just 15 minutes you have everything ready to go. Let's see how easy it is to deploy MiniKF on GCP. I'm in the console of my GCP project, and to create a new MiniKF all I need to do is head over to the Marketplace. So I'm clicking on Marketplace and searching for MiniKF. Here it is. Let's click and launch. All I need to do is provide a name, and that's it. After clicking Deploy, the deployment procedure takes roughly 15 minutes, and you can monitor all of the resources being deployed inside the virtual machine. For now, I'm just going to use a MiniKF I have already deployed. Once you start the deployment procedure you will see this page, and once it's complete this URL will be available to you. So I'm copying the password of my newly generated user, clicking on this link, and I'm redirected to my Kubeflow homepage. Now that our MiniKF is ready to use, let's start with the fun part: how to convert a notebook into a pipeline. But first, let's talk a little about Kubeflow Pipelines. Pipelines is one of the most important components of the Kubeflow platform. This is because data science is inherently a pipeline process, right? You always go through the same repeated steps, whether they be data processing, data transformation, model training, and then serving, and so on and so forth. So in this tutorial we'll try to simplify as much as possible the creation and deployment of these kinds of workflows, and make them completely reproducible. We'll do this with Kale and Rok. Now, converting a notebook into a pipeline has several benefits. First of all, the notebook allows us to clearly define the structure of the resulting pipeline. And since we have multiple cells, we can easily parallelize and isolate them. Also, once you have a pipeline, you can apply data versioning and even different hardware requirements for running the steps, like running the training step on a GPU and data processing on a CPU. Let's look at how the workflow changes when you use Kale and Rok. Before, what you would need to do to create a Kubeflow Pipeline is write your machine learning code, maybe test it locally, then write some specific DSL code to construct and define the pipeline, then build Docker images to package your code and all of your dependencies, and then build and upload the pipeline.
Once you run it, if you bump into any bug or want to amend your code in any way, you would need to go all the way back to building new Docker images and then redeploy the pipeline. But now, with Kale and Rok, this workflow gets dramatically simplified, because you just need to develop in your notebook, tag your notebook cells with a very simple UI that you'll see later, and then run the pipeline with the click of a button, without any Docker image building. You can imagine how this dramatically improves your iteration speed and time to production. Let's also talk about data management and how Rok integrates with Kubeflow. This is from a TFX paper from 2017 that gives a high-level component overview of a machine learning platform. We can see here where the TFX components, the machine learning libraries, fit in the overall platform. In our case, this is where Kubeflow comes in, to provide these kinds of components in a containerized way. Kubeflow also provides the integrated front end to manage, deploy, and monitor all of the various applications. All of this is orchestrated by Kubernetes. Now, when you write a pipeline or a machine learning application on top of Kubeflow, you need some storage. In general, you would write a pipeline that interacts with a specific vendor API, based on the kind of storage technology you're using. So Arrikto comes in and provides a general-purpose storage layer, so that you can write pipelines that are not vendor-specific, that don't have to interact with a specific storage API, but can just access super-fast local storage. How do we do that? Well, we've extended Kubeflow to be data-aware, and specifically to use Kubernetes primitives, PVCs (persistent volume claims), in all of its applications. We do this by integrating with the Container Storage Interface and sitting off the critical I/O path. This way, you can read and write data super fast from local volumes. Meanwhile, Rok can take immutable snapshots of your volumes and share them across locations via an object store, so you can reproduce your environment or share it with your colleagues. Okay, so now we're ready to put our hands on a notebook and convert it to a pipeline. This is Kubeflow's central dashboard, the place where all of Kubeflow's components come together and where I can navigate between them. As you can see, on my left I have notebooks, volumes, models, snapshots, pipelines, and then experiments and runs, where we group together objects coming from different applications that belong together, like pipeline experiments or hyperparameter tuning experiments. As a data scientist, the first thing I want to do is create my own development environment, so I'm going to create a new notebook server. It's just a matter of assigning a name, and in a matter of seconds I'll have a full JupyterLab environment ready to use. Great. The first thing I want to do is clone the Kale repository to pull the example we're going to run: git clone https://github.com/kubeflow-kale/kale. Then I go into the kale folder, the examples folder, and then the openvaccine-kaggle-competition folder. We created this notebook to work on the OpenVaccine Kaggle challenge, as we wanted to tackle a real-world problem. The challenge is about trying to locate the weak spots of a messenger RNA structure, to help create a stable vaccine.
This notebook contains a typical data science pipeline, starting from data processing to model training and then evaluation. I won't go too much into the details of what this notebook is actually doing, because that's not the point right now, but you'll be able to play with it after the demo. The first thing I want to do is verify that I have all my dependencies available, so let's run the imports. Apparently I need to install some libraries, so let's do just that. Notice how I'm installing libraries here on the fly; you'll see later what this means. Okay, now everything should be running smoothly. Since I know my notebook is working, I want to convert it to a pipeline with Kale. So I'll head over here on the left, to the Kale panel, and enable it. You can see a bunch of colors and badges pop up in the notebook. This is Kale showing you how the notebook has been annotated. You can use Kale's annotation tool to change these annotations or create new ones; each cell can be annotated with the name of a corresponding pipeline step and its dependencies (a rough sketch of what these annotations look like under the hood appears below). For example here, the process-data step depends on the load-data step. All the cells having the same color will eventually belong to the same pipeline step. That's all we need. If I click the Compile and Run button, Kale analyzes the notebook, validates it, and then starts by taking a Rok snapshot of the current environment, to completely reproduce the environment we're developing in. So even though I just installed some Python libraries on the fly, my pipeline will run seamlessly without having to build new Docker images, thanks to Rok snapshots. Then I can follow these deep links to see what is going on. If I click here, I'm redirected to the Kubeflow Pipelines run page, where I can see the new run that has started. Since this run will take a few minutes to complete, I'll head over to an experiment I ran previously and show you the resulting pipeline after a few minutes of computation. As you can see, we have a four-step pipeline. Each step corresponds to an annotation in the original notebook, so multiple cells may have been packaged inside one step by Kale, and each step runs in the exact same original environment with all of your dependencies. Also, Kale creates an ML Metadata (MLMD) execution for each pipeline step. This allows us to track and link other entities that either belong or relate to the step itself. Some examples are the parent KFP run and the artifacts that are consumed and produced by the step. We can see them by clicking here and navigating. As you can see, this is an MLMD execution that was created by Kale. I can click here to go to its specific page, and notice, for example, how the run ID, the KFP run ID that is the parent of this step execution, is clickable. This is because we are standardizing on global URIs to reference and link resources across Kubeflow applications, and we are extending the Kubeflow UIs to interpret these unique identifiers irrespective of where the UIs actually live. So clicking here redirects me to the original run page. Likewise, going back to the execution page, I can scroll down and see that we took a Rok snapshot at the end of this step. With Kale we link this snapshot to the step execution, and we can even navigate to the Rok UI. Now that we have converted the notebook into a single-run pipeline, we want to scale this up and optimize our model with hyperparameter tuning.
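Before moving on, here is a rough sketch of what those Kale annotations look like under the hood. In JupyterLab you set them through the Kale panel, but they are stored as notebook cell tags such as `block:<step>` and `prev:<step>`; the step names and toy data below are illustrative, not the OpenVaccine notebook's actual code:

```python
# --- Cell tagged `imports`: made available to every pipeline step ---
import pandas as pd

# --- Cell tagged `block:load_data` ---
df = pd.DataFrame({"sequence": ["GGACU", "CCAGU"], "target": [0.1, 0.7]})

# --- Cell tagged `block:process_data` + `prev:load_data` ---
# Kale turns this into a step that depends on load_data and marshals `df`
# across the step boundary via the Rok-backed volume.
features = df["sequence"].str.len()
```

Cells sharing the same `block:` tag end up in the same pipeline step, exactly as the colors in the Kale panel indicate.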
So how do we do that? Well, we could start tinkering manually with the parameters in the notebook, manually run multiple pipelines just like we did now, and then compare the metrics and choose the best model. Or we could use Katib to automate this process. Katib is the official Kubeflow hyperparameter tuner. It supports several machine learning frameworks, from TensorFlow to MXNet, PyTorch, and others. It is very flexible, and we've integrated it with Kale so you can run it from a notebook. So what we can do is go back to the notebook and configure inputs and outputs, so we create a pipeline that accepts input parameters and produces metrics that Katib can use to optimize over the resulting pipelines. Then, still from the notebook, we select the hyperparameter search space, the search algorithm, and the goal. Afterwards, with just the click of a button, directly from the notebook, we can create and submit the new Katib job. Let's see how we can do that. We're back in our original notebook. The previous pipeline completed successfully, and now we want to optimize it with hyperparameter tuning. As we said before, we need a pipeline with inputs and outputs. In Kubeflow Pipelines, it's possible to create pipelines that have input parameters, which can vary between pipeline runs, and output metrics. With Kale, creating such a pipeline is very easy: you just need to create a notebook cell with some variable assignments and annotate it with the pipeline-parameters annotation. With this, Kale makes sure that the resulting pipeline is parameterized with these values. Then, to create a pipeline metric, all I need to do is choose which variable I want to output from my pipeline. In this case, I'm choosing the validation loss produced by my training procedure; at the bottom of the notebook I just print the validation loss and annotate that cell with the pipeline-metrics annotation, and this makes sure the pipeline outputs a Kubeflow Pipelines metric (see the sketch below for what these two cells look like). Now, if I want to start the hyperparameter tuning job and spin up hundreds of pipelines, all I need to do is enable this toggle and open up this Katib dialog that Kale provides. Notice how Kale has already recognized all of the variables we tagged with the pipeline-parameters annotation. We prefilled this dialog beforehand, but you can choose between ranges and lists however you want for all of the input parameters. You can also choose between various search algorithms, and then the search objective, which in this case is just the validation loss we chose before, and we want to minimize it. Just like that, we've defined a hyperparameter tuning job, and by clicking the Compile and Run button, Kale will again validate the notebook, convert it, and build it into a Kubeflow Pipeline. Rok is now taking a snapshot of the current environment to reproduce the current state in the pipelines, and then Kale creates and submits a new Katib experiment, where each Katib trial corresponds to a pipeline run. We can also see here a live view of the current state of the Katib experiment: as new trials pop up and complete, you can monitor them directly from the notebook. I can also click on this link and navigate to the Katib UI. This is a Katib UI we have built from scratch, following the design patterns you may already find in other Kubeflow applications.
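To make those two special cells concrete, here is a minimal sketch; the variable names and values are illustrative, not the exact ones from the OpenVaccine notebook:

```python
# --- Cell tagged `pipeline-parameters`: these become tunable pipeline inputs ---
learning_rate = 0.001
batch_size = 64

# ... the training cells would run here; we stub the result so the sketch runs ...
validation_loss = 0.42

# --- Cell tagged `pipeline-metrics`: printed values become KFP output metrics,
# which Katib reads as its optimization objective ---
print(validation_loss)
```

And roughly speaking, what the Katib dialog assembles from your choices is a Katib Experiment spec. Expressed as a Python dict, it looks something like this (field names follow Katib's v1beta1 API; the values are illustrative):

```python
experiment_spec = {
    "objective": {"type": "minimize", "objectiveMetricName": "validation-loss"},
    "algorithm": {"algorithmName": "grid"},
    "parallelTrialCount": 2,
    "maxTrialCount": 24,
    "parameters": [
        {"name": "learning_rate", "parameterType": "double",
         "feasibleSpace": {"min": "0.0001", "max": "0.01"}},
        {"name": "batch_size", "parameterType": "categorical",
         "feasibleSpace": {"list": ["32", "64", "128"]}},
    ],
}
```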
We've improved over the existing Katib UI to show much more detailed information about the status and various details of the experiments. Since this experiment will take a long time to run, let me show you something I ran before. I'm going here on the left, selecting Experiments, and then HP Tuning, to go to the home page of our new Katib UI. As you can see, I have my new experiment running, and I can also see something I ran before with 50 trials; at a glance I can see what the best metric was and the corresponding input configuration. Let's go in and see. Our new UI also exposes the state of the entire experiment with this nice graph, where you can see, color-coded, all of the trials that have executed and their parameter configurations. This plot is interactive, so you can explore how the various parameter configurations behaved and how they influence the output of the experiment. You can see at a glance what the best trial configuration was, the current state of the experiment, and then a list of all the trials; when you hover over a row, you see that specific trial and its configuration in the graph. Scrolling down, we can also see at a glance which one was the best trial, this one highlighted. Here it is. If I click on this pipeline icon here on the right, I'm redirected to the corresponding Kubeflow Pipelines run. This is because each trial corresponds to a specific pipeline run, and we want you to be able to navigate between UIs seamlessly and link all of the entities together. If I click on Config here, I also have some new Katib-related entries and can navigate back to the original Katib experiment. So you always know where you are and how specific objects and entities across Kubeflow link together. Let's go back to the pipeline. This is the pipeline that performed best in my original experiment. You can also see these two icons, which mean that these two steps have been cached. In fact, when running a Katib experiment, not all steps need to be rerun across pipelines. The first step in this pipeline that consumes the input parameters is model training, so we don't actually need to reload and reprocess the same data over and over again. With Rok and PVCs, we just skip these executions and start from a processed-data snapshot. Let's actually go see this visually using MLMD. Since Kale logs input and output artifacts for each step, we can go look here at the specific Rok artifacts associated with this step. By clicking here, I can navigate to the Rok snapshot artifact saved in MLMD, and then to the lineage explorer. Here, this nice visualization lets me understand that this Rok snapshot artifact was produced by one step and consumed by many others. This means that all of these pipelines are reusing the same step, which is why it was cached. Now that we've run a massive hyperparameter tuning job, we want to take the best model and serve it with KFServing. KFServing is Kubeflow's component for serving models in production. It is built on Knative, and it allows serverless inferencing on Kubernetes. KFServing provides several abstractions on top of various machine learning frameworks, and quite a few features like canary deployments, scale-to-zero, and much more. So what we want to do is select the best trial of the previous Katib experiment and restore a notebook out of a snapshot of that pipeline.
In this way we'll use Rok to restore the notebook from the best trial, and have the trained model directly in notebook memory. Then we'll use a very convenient Kale API to serve this model. In general, creating inference services is quite tedious, as you need to submit new custom resources, and maybe even build Docker images if you're using preprocessing transformers. You'll see how this all becomes much, much easier with Kale. Let's see how it's done. I'm back at the previous hyperparameter tuning experiment. I want to select the best trial, get the best model out of it, and serve it. To do that, I'll first navigate to the corresponding pipeline and choose the last step in the pipeline. This is because I want to restore a notebook with the state after the model has been trained. So I'm heading over to Visualizations, where Kale has produced a bunch of artifacts. Here I can see an artifact corresponding to the snapshot taken before the step execution, and another corresponding to the snapshot taken after the step execution. Let's take the first one. I'll open this link, which redirects me to the Rok UI, and specifically to this snapshot's page. I'll copy this link, which I can take back to the central dashboard. I'll open up Notebooks, New Server, and paste here this special URL, which makes sure that my notebook is restored from this snapshot. Let's call this one 'serving'. Now that my notebook is up, when I click on Connect something interesting happens. Kale notices that we are restoring a notebook from a snapshot, so it automatically opens up the original notebook and starts restoring the marshalled data, so that the current in-memory state is exactly the one I would have found at that specific point in time in the pipeline execution. This means I will find my model in memory: I'll have in my Python memory the best model trained by the hyperparameter tuning experiment. That's it. Kale has finished resuming the notebook. I can create a new cell here and verify that the model actually is here. I haven't done anything; I just created a new notebook out of a snapshot, and here it is, my model, in memory. Okay, so now I have my best model here in the notebook, and I want to serve it. Kale provides a very simple and convenient API to serve the model. Let's see: from kale.common.serveutils import serve. Now I want to use this function to serve my model, and it's just as easy as calling serve(model). But I also want to pass a preprocessing function. Since KFServing supports both predictors and transformers, we have a very handy function here in the notebook that preprocesses features for us, so we can then pass just raw data to the corresponding model server. Let me go here and redefine the function and the tokenizer that this function uses. I'm back down at the bottom of the notebook, and I want to tell the API that the corresponding model server will also have to create a transformer, which will need to execute this process-features function. And I need to pass my tokenizer as well, so that Kale knows it has to package this object alongside the function. Now that I'm running this, we can see that Kale recognizes the type of the model and dumps it, saving it in a specific format. After this, it starts taking a Rok snapshot, and afterwards it creates a new inference service.
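To recap the serving call in one self-contained sketch: the keyword names below (`preprocessing_fn`, `preprocessing_assets`) follow the codelab this session is based on, and `model` and `tokenizer` are stubbed here, since in the demo they were already in memory thanks to the Rok restore. This only runs inside a Kale-enabled notebook image:

```python
from kale.common.serveutils import serve

model = ...       # in the demo: the trained model restored from the Rok snapshot
tokenizer = ...   # in the demo: the tokenizer restored alongside it

def process_features(instances):
    # Placeholder for the notebook's feature-encoding logic.
    return instances

# Creates a KFServing InferenceService with a predictor and a transformer;
# the tokenizer is packaged so the transformer can run process_features.
kfserver = serve(
    model,
    preprocessing_fn=process_features,
    preprocessing_assets={"tokenizer": tokenizer},
)
```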
And now that the inference service is ready, we can print this object to see where it is served, and by clicking here, we can navigate to our new Models UI. We've built a brand new UI to expose the entire state of KFServing, to monitor all of the inference services you may deploy. As you can see here, I have an overview of where this model is exposed, the fact that I deployed both a predictor and a transformer, and the entire state. I can get more details here about the various resources, such as the fact that we are using a PVC and where it is mounted. We can also look at the various metrics exposed by the specific pods that are running, one for the predictor and one for the transformer. These are live-updating metrics that come from the underlying Knative resources. We can also see logs; these are live logs coming from the predictor and the transformer, and we'll see how they update once we send a prediction. There's also the YAML definition. And to get an overview of all the models, we have a nice homepage where we can see a list with a summary of all our running model servers. Okay, let's go back to the notebook and actually try to hit this model and get a prediction. I want to create a data structure to send to my model server. For this, I'm creating a dictionary with an "instances" key, which is the standard way to send data to a TensorFlow model server, and I'm going to pass our unprocessed x_public_test data. This is data we defined before in the notebook, and it was actually restored automatically by Kale. So this is our data, something we'll want to send. Then, to send a request to our model server, it's just as easy as calling predict on the server object: kfserver.predict(data) (a recap sketch of this request appears at the end of this section). So I'm sending a request; let's head over to the Logs page, and you can see live that our transformer is getting some unprocessed input data, then Kale does some magic, and the data is processed. All of this happened without having to build any kind of Docker image. Kale was able to detect, parse, and package the input-processing function and its related assets, create a new transformer, initialize it here, and then use it to process the raw data to be passed to the predictor. And here I can print my predictions. It should be quite a big response, so let's give JupyterLab a bit of time to process it. Here it is. That was the last part of our tutorial, so now it's time to summarize what we've learned today. You've learned how to install and deploy MiniKF in a super easy way and in no time. Then, how to use a notebook to build a machine learning model, annotate it with Kale, and convert and deploy it as a scalable pipeline. Afterwards, how to use Kale to scale this workload even further with hyperparameter tuning, running tens or hundreds of pipelines using caching. Then you learned how to use Rok and the snapshots it takes to reproduce every step of the way: you can restore a notebook from a pipeline to its previous state. In this case, we took the best model trained by the hyperparameter tuning job and found the model ready to use in the notebook's memory. Afterwards, we used a convenient Kale API to serve this model directly from the notebook and make it production-ready. All of this workflow was backed by MLMD, so you have a complete lineage of your work. We've also seen some cool new UIs, like the Katib UI and the Models UI.
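As a recap of the prediction request above, a minimal sketch; the input rows are stand-ins for the unprocessed x_public_test data from the demo, and `kfserver` is the object returned by serve() in the previous sketch:

```python
import json

# "instances" is the standard request envelope for TensorFlow-Serving-style servers.
data = {"instances": [["GGACU"], ["CCAGU"]]}  # stand-in raw sequences

predictions = kfserver.predict(json.dumps(data))
print(predictions)
```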
These UIs are not open source yet, but we will work in the coming weeks and months to make sure they find their way into upstream Kubeflow. Also note that this is a pre-recorded session and we are still heavily working on this workflow, so by the time you see this video, some of the features might have changed or seen improvements. This is just a small sample of the community contributions we've made at Arrikto. We are investing a lot of time and effort into making Kubeflow a great platform for machine learning: from the Jupyter notebook manager to volumes support, MiniKF, authorization, and manifests and installations across the board. Kubeflow is now a big and vibrant community, backed by many companies both large and small. So if you feel like joining, as a developer or as an end user, here is a list of channels and public places where you can talk with us. All the new things you've seen today are the product of a joint effort by our internal team at Arrikto, so I want to really thank all of my colleagues who have contributed tons and tons of work to this: Ilias, Dimitris, Kimonas, Apostolos, Konstantinos, and Chris, really thank you. And thanks to you for sticking with us until the end of this tutorial. Now we're ready to answer all of your questions.