Hi, I'd like to thank everyone for joining us today. Welcome to today's CNCF webinar, "From Notebook to Kubeflow Pipelines with MiniKF and Kale," everybody's favorite green, right? My name is Ariel Jitib; I'm a business development manager for cloud-native technologies at NetApp and also a CNCF Ambassador. I'll be moderating today's webinar, and we'd like to welcome our presenters, joining us direct from Greece; they have the Acropolis back there, if you can see it. We'd like to welcome Vangelis Koukis, CTO and founder of Arrikto, and Stefano Fioravanzo, software engineer, also at Arrikto. A couple of housekeeping items before we get started: during the webinar, you're not going to be able to talk as an attendee. There's a Q&A box right at the bottom of your screen; feel free to drop your questions in there, and we'll get to them as we move along or at the end. As a reminder, this is an official CNCF webinar and as such is subject to the CNCF code of conduct. Please do not add anything to chat or questions that would be in violation of that code; basically, just be respectful of all of your fellow participants and presenters. Please also note that the recording and slides will be posted later today on the CNCF webinar page. And with that, I'll hand it over to Vangelis and Stefano to kick off today's presentation.

Thank you very much, Ariel. It's a great pleasure to be here today, Stefano, Vangelis, to talk to you about Kale and Kubeflow, and how you can go from your notebooks, where you do your ML work every day, to reproducible, immutable Kubeflow pipelines, which you then have an audit trail of, essentially. So let's get started; let me make sure I can switch slides. What's the problem that Kubeflow tries to solve? Standing up your own ML infrastructure is hard, and doing it in production is even harder. And there's lots of talk about, and a growing need for, having your ML infrastructure split across multiple places.
So having a hybrid or multi-cloud ML infrastructure. Even if you're running on your laptop and then moving to the cloud, say for training, or moving to multiple locations for serving, this is a multi-cloud infrastructure; your laptop counts as one more location in your multi-cloud deployment. So how do you do that? People tend to think that doing ML is about writing code: if I write my code, then everything is going to be OK. Well, the reality is that doing ML requires lots and lots of DevOps, lots of time to stand up the infrastructure, lots of people, lots of different technologies. You need to configure infrastructure, collect your data, verify your data, manage your cloud resources, extract features, manage your processes, develop your models (this is what most people consider ML), serve, monitor, and then close the feedback loop and do it all over again. So Kubeflow is there to help. How? Kubeflow containerizes ML components so you can run them end-to-end on Kubernetes. So you can essentially leverage what Kubernetes does for you: a uniform way to run your workloads everywhere. Kubeflow allows you to experiment with state-of-the-art AI technologies end-to-end on Kubernetes. It's easy to get on board, easy to get your notebook up and running and start working. And it also has outstanding community and industry support. Speaking of Kubeflow's community, we're very proud as Arrikto to be part of this vibrant community of more than 30 companies and individuals who contribute patches every day. A sample of these contributions is shown on this slide, which, by the way, is from a presentation that the PM group of Kubeflow gave to the community. In this presentation, we will mostly be focusing on Arrikto's contributions to Kubeflow, and we'll have this presentation and live demo on MiniKF. So what is MiniKF?
MiniKF is a packaging of Kubeflow so it runs as an all-in-one deployment, on a single node, on your laptop or in the cloud, on GCP. So with a single click you can have your own Kubeflow and get started very easily, within 15 minutes, as you'll see. Then you can start running your experiments, you can experiment with Kubeflow, and later you can move to a bigger, better, scalable, cluster-based Kubeflow deployment. So MiniKF is to Kubeflow what Minikube is to a scalable Kubernetes deployment: a very easy, all-in-one way to get started. With this, we'll be having a live demonstration interleaved with the presentation. What I'm going to do now is essentially start an instance of MiniKF on GCP; we can let it configure itself while we continue with the presentation, and then we'll come back, connect to it, and have our live demo. So let me switch to this desktop, go to the GCP Marketplace, explore the marketplace, and look for MiniKF. This is MiniKF; let's launch it in this project. I'm just going to go with the default options, maybe change the zone so it's a bit closer. And the nice thing you can see is that I can also choose to equip my VM instance with GPUs. So it's very easy to spin up your own Kubeflow with support for GPUs, and then you can train much faster on those GPUs. I won't be using GPUs for this demo, so I'll just deploy it. What happens is that GCP will allocate a new instance for MiniKF, it will deploy, it will start running our initialization scripts, and we'll be able to watch the progress of deploying Kubeflow on this instance by logging into the instance and seeing our MiniKF script progressing. So we see GCP provisioning the instance, the firewall settings, and the password we'll then use to log into our Kubeflow, so let me copy it. The instance will be up and running in a few seconds, so let's give it a little more time, and if it doesn't come up immediately, I'll switch back to the presentation.
So the instance is up; let me SSH into it and run the minikf command to actually see MiniKF being deployed. At this point, it all happens automatically: all I did was choose some instance parameters and click on deploy. This is it. So I'll switch back to the presentation and we'll allow MiniKF to continue its deployment. So what exactly is MiniKF? MiniKF is Kubeflow on GCP, or on your desktop or laptop using Vagrant, on-prem, in just a few minutes, as you just saw. It's an all-in-one, single-node distribution of Kubeflow. So it's there for you to very easily start experimenting with Kubeflow; it's super easy to spin up your own infrastructure. It combines Minikube as its Kubernetes substrate, Kubeflow, and our Rok data management platform for its storage layer. So what is new in the latest MiniKF that we're now deploying? It's based on Kubeflow 0.7.1. Very soon we'll have Kubeflow 1.0, and we'll produce a MiniKF based on Kubeflow 1.0, so we're really looking forward to you trying out Kubeflow 1.0 via MiniKF. It supports GPUs, as we just saw. It allows you to near-instantly restore snapshots of your notebooks and use them with pipelines, because it works with Rok; more on this later during our demo. It has quite improved snapshot times, and it also allows you to snapshot your pipeline steps. And why do MiniKF? Why are we interested in MiniKF? Because we've seen most data scientists start their experimentation on their own laptop, and there was no easy, single-click way to deploy Kubeflow on-prem. Kubeflow is a big project, with lots of moving parts, lots of components; it takes quite some expertise to deploy it properly. So we wanted to make Kubeflow deployment that simple, to democratize access to its features and its components. We wanted data scientists to use, for their exploration on their laptop, exactly the same interfaces that they would then be able to use when scaling their project.
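For the on-prem path, the Vagrant route mentioned above boils down to a couple of commands; the box name below is the one we recall from the MiniKF documentation, so treat it as an assumption and verify it against the current installation guide:

```shell
# Sketch of a local MiniKF bring-up with Vagrant (box name assumed).
vagrant init arrikto/minikf   # create a Vagrantfile pointing at the MiniKF box
vagrant up                    # boot the all-in-one VM; Kubeflow comes up inside
```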
So you use MiniKF on your laptop, you have the same interfaces, you write the same YAMLs, you provide the same objects to Kubernetes and to Kubeflow, you use the same CRDs, the same resources. Then you take all of these things, move them to another Kubeflow deployment, and you can use the same APIs. Let's have a look at what MiniKF is doing. It's still downloading Docker images, progressing nicely, nothing else to do; let's go back to the presentation. Why is it important to have a local instance of Kubeflow? A single, unified user experience no matter where you are: the same Kubernetes APIs and the same components. You can start your notebooks, you can spin up your own Kubeflow pipelines, you can have Katib, the Kubeflow component for hyperparameter tuning, and you can use Kale, the component we are contributing to Kubeflow, to move from notebooks to pipelines automatically. This is what we'll be demoing today. And it's interesting because MiniKF has seen quite a lot of adoption in the almost a year that it's been alive in the community. We're now at over 8,000 downloads, and we're really looking forward to having this number go way up after Kubeflow 1.0 comes out. Now, what exactly is the process of doing data science with Kubeflow? ML processes are pipeline processes: there are lots of steps, and each step gets the output of the previous step and provides something to the next step. So we go from ingestion to analysis to transformation. We train our model, we validate our model, we train at scale. Eventually, we roll out and serve our model, we monitor it, and we need to be able to have a trail of what we did. So we like pipelines because they represent exactly how ML happens. But it's also great if we can start from the end result, a model that works in production, and go back in time to see exactly how we came to have this model. Then we can fix biases, fix bugs, train it better, things like that.
So this webinar focuses on two aspects. One is: what's the easiest way to go from a Jupyter notebook to a Kubeflow pipeline, without having to write the pipeline from scratch and without having to deal with the command line at all? And then: how can I make the pipeline reproducible? That is, how can I know exactly how each step ran and what its input and output were, so I can go back in time, explore the step, and reproduce my results? This is super important for ML, because if you change the tiniest thing, the result may be way different. So we're going to be talking about two components: Kale, the component that we are contributing to Kubeflow, and Rok, our data management layer. How is MiniKF doing? It's provisioning the Kubernetes cluster, moving right along. So why go from a notebook to a pipeline? Because people like notebooks. They're nice, they're interactive, and they can have their steps as cells in the notebook. They can clearly define their processes and experiment with them inside the notebook, run them one by one, find bugs, iterate. Once they're done, would it be possible to just click on a button and have a pipeline, an immutable pipeline? Yes, that's what we're doing. And then, can we actually parallelize some of the parallel steps? Can we do hyperparameter tuning based on a few variables in the notebook? This is actually the focus of the workshop we'll be doing at the upcoming KubeCon conference in Amsterdam, so we'll be very happy if you can join us there. We can have versioning of the data of the notebook that essentially seeds the pipeline. This is also an important aspect of running a reproducible pipeline: I need to know the data I started with. And it's great if I can have this data accessible as just another mounted local file system under /data, with no need to go to an external object storage provider, for example.
And then it's great if I can experiment with my notebook on my laptop, but then run the pipeline in another Kubeflow deployment, maybe using GPUs. So what is the workflow we're going to be showing you today? Before Kale, you'd have to write your ML code somewhere, in a Python script or a notebook. You'd have to manually convert it to Docker images: write your own Dockerfiles, assemble your Python scripts into Docker images, try them out. Then you'd have to write code in the domain-specific language of Kubeflow Pipelines, or of a similar pipeline component that you may be using, and this is quite complicated work. Then you'd have to compile the Kubeflow Pipelines domain-specific language into something that is submittable to Kubeflow, in this case an Argo specification, upload it to Kubeflow Pipelines, and run the pipeline. Things work, things don't work; OK, I have to go back, amend my work, and start all over again. So this is quite complicated, with a lot of technical steps, lots of command line, lots of tinkering with Dockerfiles and Makefiles to get the Docker images to work. The workflow we are making possible with Kale is: write your ML code in a Jupyter notebook, and tag your cells using the notebook interface. Kale comes with a Jupyter extension that allows you to tag your cells with dependency information, essentially. So you can express which cells may run in parallel, which cells have to run serially, which step depends on which other step. Once you have it ready, and you've experimented with your code in the notebook, you click on a button, and Kale compiles and converts this into a pipeline, submits the pipeline, links to the run, and shows you the run happening; you click on a link, you're shown the run, and you can do it all over again. So to amend your code: edit the notebook, click on a button. That's it. The edit-compile-run cycle is: edit the notebook, try out my new cells, click on a button, see the pipeline. This is what our demo is going to be.
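For reference, dependency tags like these live in ordinary notebook cell metadata, so they travel with the .ipynb file itself. The exact keys are Kale's own; the snippet below is a simplified, illustrative sketch of two tagged cells (step names taken from the demo), not the authoritative schema:

```json
[
  {
    "cell_type": "code",
    "metadata": { "tags": ["block:load_data"] },
    "source": ["train_df = pd.read_csv('data/train.csv')"]
  },
  {
    "cell_type": "code",
    "metadata": { "tags": ["block:data_processing", "prev:load_data"] },
    "source": ["train_df = train_df.dropna()"]
  }
]
```

When Kale is enabled, the extension renders tags like these as the colored badges shown later in the demo, and the back end reads them to build the pipeline graph.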
And MiniKF is almost there; it's provisioning a few resources, so let's give it some more time. What we're describing essentially boils down to continuous integration and continuous delivery for machine learning, starting from notebooks. We allow data scientists to develop their models in Jupyter, to convert their notebooks to pipelines automatically using Kale, and to run their pipelines with Kubeflow Pipelines. So: start from Jupyter for development, experimentation, and iteration; convert to an immutable pipeline with Kale; run it in a way that's reproducible with Kubeflow Pipelines; and store data on Rok. Then, when something happens, or when you need to go back in time, or when you need to reproduce your results, you explore the input and output data of individual steps, again in notebooks, using Rok. This closes the feedback loop, and this is the notebook-to-pipeline critical user journey that we have contributed to Kubeflow as an ecosystem-supported CUJ. Kubeflow is starting; we'll give it a bit more time and spin up a notebook when it has started. Let's continue with Stefano explaining more about Kale. Stefano is the creator of the Kale project; we're very happy to have him working with us at Arrikto now. We'll switch back to MiniKF and continue with that demonstration afterwards.

Let's do that. So I guess I can switch here so that we have an overview of what Kale is. By the way, thank you, Vangelis, for this introduction, and since Vangelis mentioned Kale several times, let's try to understand what Kale actually is. As a component, it is actually composed of two different things: a UI and a back end. The UI is basically a JupyterLab extension. JupyterLab supports extensions out of the box, and Kale is an official JupyterLab extension that provides several easy-to-use UI artifacts that can be used to annotate the notebook very easily; we'll see later how.
And then there's a Python package that interacts with this UI extension to properly parse, process, and then convert the notebook into a pipeline. Again, all of this happens seamlessly and effortlessly, without the need to use any kind of external SDK or CLI command, and with no need for additional knowledge about Kubeflow SDKs. It's just about annotating a notebook with visual artifacts and components. So I guess that by now we should have MiniKF up and running. Yes, it's almost there. So, Kale is an open-source project on GitHub and... OK, we're there, done just in time. Yep, it's up and running; those are our credentials. We didn't really do anything except wait for a few minutes. So let me get the password and go to the URL provided; I'll be redirected to Kubeflow's login screen. Let's give it some more time.

While we wait, let me chime in with a question that we have from Babu, and the question is: does Kale require Rok to work? Good question. Kale does not require Rok. It is essentially a component that can leverage any kind of data management, so the features that it provides are features that work out of the box. But in this specific workshop, this specific example, we are integrating Kale with Arrikto's data management platform. So some of the features that you will see here are not part of the open-source Kale project, but Kale by itself can be integrated with anything else. Great, thank you. Babu, feel free to follow up in Q&A if you have a follow-up; let me know. We have another one: Camille Rodriguez asked, is Kale deployed as a Kubeflow component, or does it have to be deployed on the side? That's the interesting thing. It is very easy to deploy Kale, because it essentially is just part of a notebook server image. All you need is a notebook server with the Kale Python package and the JupyterLab extension installed, and that's it. So it's very portable, easy to use, easy to share, easy to build. So you're saying Kale doesn't really have a server-side component?
It's Python code that lives inside the same notebook server image. It works as a server-side component from Kale's perspective, but it is not an additional service running in your cluster. So, MiniKF is up and running; it took some time, presumably for the ingress components to come up. What's our demo going to be? We'll start a notebook server and install new libraries on the fly, to show that data scientists can essentially work independently, without having to create new notebook server images: they can just install whatever new Python library they need. Then: enable Kale, tag your cells, snapshot your notebook, run a pipeline, and then actually go back, reproduce pipeline state with notebooks, and explore what happened. So this is the Kubeflow login screen. I've already pasted the password and logged in as a user. This is the Kubeflow central dashboard. I can see pipelines; I can see notebook servers (this is the list of my notebooks, and I don't have any notebooks running right now); Katib, for hyperparameter tuning; and the snapshot store, which is Rok. Let me also log into Rok, same password; no snapshots yet. So, going to notebook servers, and this will answer the previous question about how we use Kale, let's create a new notebook server. Let's name it, I don't know, webinar two, one, whatever; let's do webinar one. I'll be using this specific image, and this image actually has Kale installed as part of it. This is all you need to be able to use Kale with your notebooks: you don't need a server-side component, and you don't need to deploy anything alongside Kubeflow. OK, so what I'm going to do as well is add a data volume. I'm going to call it data; it's going to be mounted under /home/jovyan/data, and this is where we'll be storing our data so we can seed our pipeline. So, launching this notebook.
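To make the "Kale lives in the image" point concrete, building such a notebook server image could look roughly like the sketch below. The base image tag, package name, and extension name are illustrative assumptions on our part; check the Kale repository for the exact, supported install commands:

```dockerfile
# Hypothetical sketch: names and versions here are illustrative only.
FROM jupyter/base-notebook:latest

# Kale is just a Python package plus a JupyterLab extension:
# nothing extra to deploy server-side in the Kubeflow cluster.
RUN pip install kubeflow-kale && \
    jupyter labextension install kubeflow-kale-launcher
```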
In the back end (and by the way, this notebook manager UI is something that we as Arrikto contributed to Kubeflow 0.5, I think it was), we're initializing a pod to host our notebook server. The pod is up and running, and we should be able to connect to it. So what we did is: we created a MiniKF instance on GCP, we spun up our own notebook server, we connected to it, and we're now ready to use it. And this is great: we didn't have to use any commands, it was all via the UI. Easy. Click. Yeah, go ahead.

So, first of all, let's download our example; we have a list of curated examples. Does it make sense to increase the font size a bit? Oh, yeah, sure, let's also do this. OK, I hope it's enough; otherwise, just tell us if it's not visible. So, I will move inside our data volume, as you can see, /home/jovyan/data, and start cloning this repository. Kubeflow-Kale is the GitHub organization where we keep all the Kale code. So now I have an examples folder, and these are curated examples to showcase how Kale works. In this webinar, we are going to use the Titanic example, an example that we curated. OK, I think I can just write this. OK, sometimes JupyterLab complains when files have been written; in any case, it's a JupyterLab issue. So, as I was saying, we built this example around a Kaggle challenge that offered a dataset composed of data about the Titanic shipwreck passengers. The dataset is composed of some features related to the specific passengers, and then the prediction label is whether they survived or not. We are not going into the specific details of what this notebook does; it's just some data processing, data validation, visualization, and feature engineering. And then in the end, let me scroll down, we have the machine learning part, where a bunch of machine learning models run over this data.
They are just simple, dummy models, just to showcase our workflow, essentially. And then in the end, we collect all the results and compare them. So, I can try to run my notebook, and I see that I am missing some dependencies. This is a very usual workflow: you would expect to write code and have to install a new library. So let's do that: let's pip install Seaborn. And I'm using the --user flag; it will become clear why very soon. OK, now that I have Seaborn, I can restart my kernel and everything should be up and running. Yeah, I can just run some cells; I see that I have the data and everything works. I won't go through the whole notebook, because that is not the purpose of the example, so we can just go on to the second part of the tutorial: enabling Kale and annotating the notebook. So, now I've done the data scientist part: I have my notebook with my experimentation, my machine learning algorithms that more or less might work or might not, I don't know, and I want to test them in a pipeline. I want to convert the notebook to an immutable pipeline. How do I do that? I switch over here on the left, where there is this nice Kubeflow icon. We click on it, we enable Kale, and a bunch of stuff is shown on the screen. All these visual color references, these badges, are shown by Kale, because it introspects the metadata of the notebook, which we had already annotated. Each cell is annotated with a specific pipeline step. For example, here, I want this code to become a step of the pipeline called load data. Clicking on the upper right corner of the cell, I can bring up a dialog that lets me change this metadata. It lets me create new pipeline steps, skip some of the cells (because I might not care to have them in the resulting pipeline), and, for example, have steps that depend on others. Like here: the data processing step, of course, depends on load data.
And the same is true, for example, here: you can see that multiple cells are colored with the same yellow data processing color. This is because multiple cells can become part of the same pipeline step; Kale does the work of merging multiple cells together into one single pipeline component. And on the left here, you can see I can choose experiments, which are the experiments you have available in Kubeflow Pipelines, and I can set a pipeline name, a description, and several other settings. So once I've done that, and it is a very easy thing to do, right? You don't need expert knowledge to annotate a notebook or to write a couple of settings. Once I do that, transforming this notebook into a reproducible and immutable pipeline becomes just the click of a button. So let's do that. Now that I've clicked compile and run, Kale is asking Rok to take a snapshot of the current notebook server's volumes: the workspace volume and the data volume. In this way, these volumes are fed into the pipeline steps, and your environment becomes completely reproducible, because every step runs on the exact same environment that you were running your code on. Previously, I had installed the Seaborn library with the --user flag. This is because I wanted that Python library to live on my workspace volume, which is mounted under /home/jovyan. So whatever is in there, libraries, new scripts that you wrote yourself and are importing in your notebook, everything lives in that workspace volume and will be present in the pipeline as well. So my pipeline steps will actually have access to the library that I installed in the notebook? Exactly, and to any custom script or additional file as well. So, after taking the snapshots, Kale takes the notebook, converts it to a pipeline, and uploads the pipeline to Kubeflow Pipelines. Here it is. Maybe. OK. Let me zoom out a bit. OK. Here, zoom. OK.
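The cell merging and dependency annotations just described amount to a small directed acyclic graph. As a rough, stdlib-only sketch (not Kale's actual code) of how tagged cells could be merged into steps and ordered for execution:

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each cell carries a step name and the steps it depends on,
# mirroring the "block:" / "prev:" style annotations from the demo.
cells = [
    {"step": "load_data",       "deps": [],                  "code": "df = load()"},
    {"step": "data_processing", "deps": ["load_data"],       "code": "df = clean(df)"},
    {"step": "data_processing", "deps": ["load_data"],       "code": "df = impute(df)"},
    {"step": "random_forest",   "deps": ["data_processing"], "code": "rf = train_rf(df)"},
    {"step": "svm",             "deps": ["data_processing"], "code": "m = train_svm(df)"},
]

# Merge cells tagged with the same step into one pipeline component,
# and collect each step's predecessors into a dependency graph.
steps = defaultdict(list)
graph = defaultdict(set)
for cell in cells:
    steps[cell["step"]].append(cell["code"])
    graph[cell["step"]].update(cell["deps"])

# A topological order gives a valid execution order; steps with the
# same predecessors (random_forest, svm) are free to run in parallel.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The two data processing cells collapse into a single component, exactly as the colored badges suggest in the UI.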
So you can see here the pipeline: the load data step that I showed you before, data processing, several other steps, and all the machine learning models, which are actually parallelized, because their only dependency was on the previous feature engineering step. So they can run in parallel, something you couldn't do in a single notebook, where you would have to run things sequentially. And here is the pipeline actually running. While it finishes, let's talk a bit more about Kale itself. Kale is composed of several independent modules that perform specific actions. When you click that blue button, Kale essentially takes the notebook as input and parses the metadata information inside it, creating an internal graph representation of what the pipeline will eventually be. Then a series of static analysis and code inspection steps happen, in order for Kale to understand what the data dependencies between the steps of the pipeline are. Say I had a variable A defined in the data loading notebook cell; then I would want this variable to be available again in the data processing cell, just as happens normally in a notebook. Well, Kale takes care of detecting this dependency for you, and then serializes (saves) and deserializes (loads) these variables from step to step in a seamless way. In the end, once Kale has this graph representation with all the data dependencies, it is able to generate a self-contained, executable Python script, written using the Kubeflow Pipelines SDK, that defines the pipeline. And that script is the source of truth for the pipeline; it is executed to actually upload and run it. OK, so let's go back here. Everything has run; I can go into specific steps. Oh, yeah, let's change the size. What was it? Yeah, let's do this. In the logs... OK, this step was not printing anything. Let's see if I can find... anything that was printed in my notebook, I will find in the logs here.
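The serialize-and-deserialize hand-off described above can be pictured as plain pickle files on a volume shared between steps. This is a deliberate simplification of Kale's marshalling mechanism, not its actual implementation:

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the shared pipeline volume mounted into every step's pod.
MARSHAL_DIR = Path(tempfile.mkdtemp())

def save_var(name, value):
    """Serialize a variable at the end of a pipeline step."""
    with open(MARSHAL_DIR / f"{name}.pkl", "wb") as f:
        pickle.dump(value, f)

def load_var(name):
    """Deserialize a variable at the start of the next step."""
    with open(MARSHAL_DIR / f"{name}.pkl", "rb") as f:
        return pickle.load(f)

# Step "load_data" produces a variable A...
A = [1, 2, 3]
save_var("A", A)

# ...and step "data_processing", running later in a different pod,
# transparently gets it back, just like in a live notebook session.
A = load_var("A")
print(A)  # [1, 2, 3]
```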
And I can go all the way down to Results and see the resulting scores of the models. A hundred percent score. Yes, this is definitely wrong. But trust me, this is part of the demonstration, of course. So, as a data scientist, I would see that this result is strange; it doesn't sound quite right, because it can't be that everything is performing that excellently. So this is where the data layer, the data management platform, comes in, because it will allow us to go back in time, using the reproducible snapshots that were taken during the pipeline execution, and restore a notebook at a precise point in time to debug what happened. But first I'll let Vangelis walk you through the theory of what we will show you later.

So, this all comes down to data management, and this is the main reason why we as Arrikto got involved in Kubeflow. Our very first goal was to extend Kubeflow so it uses what Kubernetes calls persistent volumes and persistent volume claims, in a vendor-agnostic way. We first introduced a spawner for notebooks, based on JupyterHub, that had support for persistent volumes. Then we had a Kubernetes-native notebook manager with support for persistent volumes. We included support for persistent volumes in the pipelines domain-specific language. And we have a new volumes manager coming with MiniKF and Kubeflow 1.0 next week. So why are we doing all this? This is a page from the TFX paper from 2017. All of these workflow steps are part of TFX libraries, and Kubeflow gives you containerized versions of these libraries, plus a nice way to run hyperparameter tuning, Katib. So all of this is Kubeflow. Then you need an integrated front end to manage your jobs and start submitting things; this is Kubeflow again, and this is the notebook manager that we contributed.
And then you need a shared framework for job orchestration, which in the case of Kubeflow is Kubernetes, because everything runs as a pod. And then you need some way to have pipeline storage. You can do it with open-source tools, or you can use any sort of object storage your provider may be offering, but you have to write your pipelines in a specific way to do that. Or you can use what MiniKF gives you: Rok. So this is what we do: we provide storage for the pods of your pipelines to use. This is the context of our involvement in Kubeflow. And how have we extended Kubeflow? We've made Kubeflow data-aware. This means Kubeflow uses PVCs no matter where you are: at the experimentation stage, you're on your laptop; at the training stage, you're on Google Cloud; at the production stage, you could be in a hundred places where you're actually serving your model and doing inference. All of these places run Kubernetes plus Kubeflow. Kubernetes integrates with external storage via an interface called the Container Storage Interface, CSI. We have implemented this interface. We sit on the side of your storage; we can take snapshots, take your work, and seamlessly move it to other locations. This allows you to eventually run a hybrid-cloud, multi-cloud pipeline. So why is this important for running your pipelines? This is your data: the data volume that I created initially, when first creating the notebook server. This is the data that the validation step of the pipeline, for example, works on. When the validation step is done, we snapshot this data; we take a snapshot, and this is the validated data. Then another step of the pipeline runs on the same data, and we take another snapshot. And another step of the pipeline runs, and it fails, and we need to explore it more. So what do we do? We clone the latest snapshot we have into a new volume, we spin up a notebook, and we can explore it. That's what we'll be doing later.
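The snapshot-per-step flow just sketched can be mimicked with deep copies standing in for Rok's versioned snapshots. This is purely illustrative, using a dict as the "volume"; Rok itself works at the storage layer:

```python
from copy import deepcopy

volume = {"train.csv": "raw rows"}   # the mounted /data volume
snapshots = []                       # Rok would store these versioned snapshots

def run_step(name, transform):
    """Run one pipeline step against the shared volume, then snapshot it."""
    transform(volume)                # the step mutates the shared volume...
    snapshots.append((name, deepcopy(volume)))  # ...then we snapshot the result

run_step("validation", lambda v: v.update({"train.csv": "validated rows"}))
run_step("transform",  lambda v: v.update({"features.csv": "engineered features"}))

# A later step fails: clone the latest snapshot into a fresh volume
# and attach it to a debugging notebook, without touching the original.
latest_name, latest_state = snapshots[-1]
debug_volume = deepcopy(latest_state)
print(latest_name, sorted(debug_volume))
```

Each snapshot is immutable, so the failed run can be inspected, or the step restarted, from a known-good state.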
Or we fix things, restart the failed step, and the pipeline continues. So let's do exactly that: choose the snapshot of a step that we believe is suspect for a bug or a failure, and explore it with a notebook. So let's go. Since I want to debug what happened to the machine learning models, let's take the step of one of them, say random forest. Kale did the work for us of producing this nice markdown artifact, which points to a URL that will show us the snapshot that Rok took at that specific point in time. I can see here all the snapshots that were taken during the pipeline execution; each one of these corresponds to a specific pipeline step. So, again, I'm going to copy the URL of this specific snapshot, and then I'm going to create a new notebook server from that snapshot. This can be done from here: I just paste the Rok URL, it autofills all the information that was preserved, and, say, I'm calling this new notebook debug. Let's roll. So now Kubeflow is provisioning a new notebook server starting from the actual snapshot of the random forest pipeline step. And Kale will know this: as soon as the notebook server starts, Kale will detect that it is restoring a notebook from a snapshot, and it will restore the entire Python context at that specific point in time. So all the variables and the Python imports will all be there, ready to use. So we are connecting now to an exploration notebook. We want to explore one of the actual... I'm not doing anything now; as you can see, the notebook loads up, Kale does its thing, and it scrolls down directly to the cell that corresponds to the pipeline step, loading the entire Python context. And I can actually run this cell directly. And OK, let me... let me print something so that we can actually see that it's running. See? OK, I have everything running.
Okay, so if I want to debug, I'll do it fairly quickly, because we may be running out of time. What I'm going to do is analyze our training dataset, and after some exploration, maybe some hours of crying and desperation, I would notice that, together with my training features, I forgot to remove my training label, that is, the "Survived" label. So all the models learned to map, one to one, the label that is present in the feature set to the prediction label. What I need to do is just remove that label. Yeah, drop: let's remove that label from the training dataset, in place. And this. All right. So now that we have discovered the bug and everything should run again, I can re-enable Kale. Everything is there as it was before, with the new addition; I just hit compile and run, and we'll see later on that the pipeline... Same thing. ...has executed correctly. So you implemented an edit, compile, and rerun cycle for the pipeline. Exactly. I just have to change the notebook cell, click a button, and the pipeline is there. Great. So let's wrap up while the pipeline is actually running. Without Kale and Rok, you'd have to do lots of manual things: be familiar with the pipelines DSL, compose your own YAML files, upload them, monitor your resources, create your PVCs if you're running on-prem or on your cloud, mount them, fill them up with data, start the pipeline manually. Lots of things. Kale just automates all of this. You're in your notebook: click on a button, snapshot things, compile the pipeline, run. This is it. This is what we call the notebook-to-pipeline critical user journey: develop, convert to a pipeline, run the pipeline, maybe explore the pipeline with Rok. What we want to do in the future is make these pipelines multi-cloud and hybrid-cloud aware, because what you want to do is experiment locally and run on the cloud.
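(The bug fixed in the demo above, label leakage into the feature set, and its one-line in-place fix can be reproduced in pandas roughly like this; the column names are assumed from the Titanic-style example in the demo.)

```python
import pandas as pd

# Training features that accidentally still contain the target column.
train_df = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Age": [22.0, 38.0, 26.0],
    "Survived": [0, 1, 1],   # leaked label: models learn to map it 1-to-1
})
train_labels = train_df["Survived"].copy()

# Remove the leaked label from the training dataset, in place.
train_df.drop("Survived", axis=1, inplace=True)

print(list(train_df.columns))  # ['Pclass', 'Age']
```

(With the fix in the notebook cell, rerunning the Kale-generated pipeline retrains every model on clean features.)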
And that's what we want to implement by leveraging vendor-agnostic snapshots. And then, why not do hyperparameter tuning with Kale and Katib from within your notebook? Why not specify that a few variables are special, that they are hyperparameters, specify their ranges, specify the hyperparameter tuning algorithm, and have Kale produce the hyperparameter tuning job? This is what we will be demoing at KubeCon in Amsterdam at the end of March. We want to track data and metadata with Rok and MLMD, the Kubeflow metadata component, so you can go back in time and have full lineage, data and metadata, for all of your artifacts. And finally, MiniKF with Kubeflow 1.0 is coming next week. So please contribute to Kale; it's open source, and this is our repository. There's a nice Medium article you can read to get started. Please try out all of this, what you just saw: you can deploy your own MiniKF, follow our codelab, watch the video. We're really looking forward to your feedback. Join the Kubeflow Slack workspace and join #minikf, our channel. We're there to talk to you, and you're more than welcome to provide feedback. And this is it. Thank you for your time. Thank you. I think we have some time for a Q&A session. Yeah, yeah. Thank you both for an amazing live demo and presentation. I love the abstraction and the value proposition around the UX of all this. There are some questions. Some folks were trying to spin this up, and I'm guessing the best place to go is the Slack team to ask the questions they're having. Two patterns that I see in the questions are around workflow and security. So let's handle the workflow first. Derek Wyatt, I think, asked a good one, which is: what is the expected workflow between the data scientists and the operations personnel? Ergo, the data scientist uses Kale to create the pipeline, but then the pipeline becomes blessed by Ops and is now under their responsibility.
How are Ops and data scientists expected to interact? Command line, Git, a notebook? This is a good question. Kale itself does not interfere with this process. The notebook does become a pipeline that's uploaded to KFP, and it becomes part of KFP's database. If you have a blessing process, or if you need to move this pipeline somewhere else, you can continue doing that exactly the way you do now. And the pipeline itself actually has input parameters that come from the notebook: one of the notebook cells is marked as a parameters cell, and that's where we get the pipeline parameters. So you don't need Kale to actually run the pipeline once it's been uploaded; you can rerun it over and over again with different parameters. Operations is detached from what the data scientist does. But if you need to actually modify the pipeline, you need to involve the data scientist: spin up the notebook, edit it, click on the Kale button again, and you get a new version of the uploaded pipeline. Does this answer your question? I'll leave it there. I think once it's a pipeline, CI/CD and Ops are responsible; that's the handoff. And with this platform, to me, it looks like the solution makes it easier for the data scientist to work a little bit more in an environment that feels native to their experience, by way of the notebooks. But I'll leave it there. Then, the security questions. Juan Medran asked broadly about security, and there was also a question about the images, which I'll roll into that; Derek Wyatt asked about that as well. Let's take the easy one first: where are the images getting pulled from? I guess with GCP, those are the embedded AMI-type things over there. I'll leave it to the person who asked the question about the images to clarify. But can you speak a little bit about security broadly? Yes.
So, the images: there is a MiniKF instance image that we have uploaded as a public image on GCP, so that MiniKF can become part of the GCP Marketplace. This image will pre-fetch essentially all of the Kubeflow-related images that kfctl, the Kubeflow deployment tool inside MiniKF, installs. All of these are public images, and that's why you can deploy them on your MiniKF instance. About security: it wasn't part of this demo, but we've had other demos in the past, and we have contributed heavily to Kubeflow's multi-user support. All of these things run in your own namespace: notebooks are managed as part of your own individual namespace, and Kale works inside this namespace as part of your notebook. We are working on making pipelines multi-user enabled. Right now in MiniKF, all pipelines run within a single Kubeflow namespace, but we have a design that we're implementing, which is part of the community and open for discussion, that allows you to run your own pipelines within your own namespace. So security-wise, whatever you do involves objects that live inside a single namespace, and they can only interact with objects within your namespace. Same thing with persistent volumes: all of these persistent volumes belong in your own namespace. And when you access Rok, you use Rok-specific tokens that, again, link your namespace with a Rok namespace, a Rok account. Great, thank you. We can actually run a couple of minutes over; I checked and that's fine. So if you can hang around, I'll ask a couple more of these. Let's go with Frank S, who's asking about the roadmap for multi-cloud: "I'm interested in the hybrid-cloud kind of roadmap." And he says he likes the abstraction of the Kale interface for data scientists instead of Kubeflow ops, which I think was something we just covered.
But can you speak a little bit to the roadmap, I guess? When can we expect some of this support for multi-cloud and hybrid cloud? So: Kubeflow 1.0 is coming out next week, and MiniKF with Kubeflow 1.0 is coming out next week. Then we are working hard to implement multi-user pipelines for Kubeflow; this will be part of open source Kubeflow, and we have a design that anyone can contribute to, which is now under discussion in the community. Then we're working to bring hyperparameter tuning into Kale; this is part of our roadmap. And then, multi-cloud and hybrid-cloud functionality. This depends on whatever storage underlies your pipelines, so it's more of a Rok question. We are testing with clients Rok in multiple locations, along with Rok Registry, a component we use to synchronize multiple locations. Implementing multi-cloud pipelines depends on either writing your pipelines in a way that accesses some sort of external storage explicitly, in which case it becomes your responsibility to keep that storage synchronized and available across locations, or, if you're using Rok, seamlessly specifying that certain data is to be shared across administrative domains. So whenever you place a snapshot from one pipeline step in a specific bucket, it becomes available in another location, which then allows you to spin up the second half of a pipeline with this snapshot as its input. We essentially use snapshots to move the output of one step in location A to become the input of another step in location B, and this happens under the hood, at the storage management, the data management layer. Okay, cool. We should talk about that a little bit more later. Next question: "Greetings from Amsterdam, looking forward to meeting you at KubeCon. How do Kale metadata changes look in version control? For example, which tools, ReviewNB?" Like review tools, I guess, like a diff for this, if I'm understanding correctly. So, the metadata is actually part of the notebook itself, right?
And the notebook itself is a JSON file. So whenever you change the metadata, tracking that change is actually very easy, because it becomes just a diff between two different JSON files. Whenever you change a key-value pair in the metadata of the notebook, which is how Kale stores this information, it's just like comparing dictionaries. Okay, cool. And with that, I think we've run over enough. There are a couple more questions; I'll point folks like Josh, who asked about how to improve the solution, whether through testing, development, suggesting tutorials, or anything else, to the Slack channel. I imagine that's the best place to go interact with you and the community and follow up on all this great stuff. There's also a Kale channel on the Kubeflow Slack, so you can place questions there too. We are monitoring those channels, both the MiniKF channel and the Kale channel, so you can find us there. Well, awesome. Thank you very much, Vangelis and Stefano, for a great presentation. And that is the end of the questions and of the time we have for today. Thank you all for joining us for this webinar. The recording and slides will be available later today, and we look forward to seeing you at a future CNCF webinar. Have a great day or evening, wherever you are. Thank you. Thank you. Goodbye.