Good morning. Let's wait a little bit in case some other folks join. I think we can get started. So Yannis, thank you for taking the time to present today. I'm excited to hear about Kubeflow. One of the project areas that this SIG has been looking at and reaching out to is MLOps; that's SIG Runtime, how you run workloads in the cloud. MLOps is one of the up-and-coming areas, and Kubeflow is one of the projects that covers the whole end-to-end scope of how you deploy machine learning workloads in a cloud-native environment. So with that, I'll just let you take it away.

Am I presenting my screen? No, we don't see your screen yet. I asked if I should start presenting, so I guess I should. Sure, sure. Everything is ready, so let's get started.

Hi everyone, I'm Yannis. I'm a software engineer at Arrikto, a contributor to Kubeflow, and a lead of the Manifests Working Group and the Notebooks Working Group. In this talk, I want to give you an idea of what Kubeflow is and the potential it holds as a Kubernetes-native ML solution.

First of all, let's talk a bit about what Kubeflow is. Kubeflow is a machine learning platform built on top of Kubernetes, dedicated to making deployments of machine learning workloads on Kubernetes simple, portable, and scalable. Many people, when they think about machine learning and what you do in a machine learning project, think that most of the time is spent defining a novel model architecture, training it, and iterating on that. But the truth is more complicated. In reality, most of the effort needed is infrastructure and DevOps effort to address the various difficulties that data science workflows present. Configuration management, pipelines, reproducibility, serving, scalability, and hyperparameter tuning are all hard problems that every machine learning project will face. Kubeflow extends Kubernetes to natively support ML workflows, in order to solve these hard DevOps problems and provide the cloud-native platform for machine learning.

Kubeflow is an open source project, as you know, and the product of many companies' collaboration. You can get involved by joining our Slack, our community meeting, or a specific working group you're interested in.

I feel the best way to show you what Kubeflow is about is to present a quick demo of a machine learning workflow using Kubeflow. In this demo, a live demo, we will put ourselves in the shoes of a data scientist starting a new machine learning project. We will work on the OpenVaccine Kaggle problem, which was a challenge to find stable molecular structures for the COVID-19 vaccine. We will first start with a Jupyter notebook and choose a machine learning algorithm to train our model. Then we will experiment with our data and run a reproducible MLOps pipeline. Then we will take that pipeline and run it with different parameters, to perform what's called hyperparameter tuning and get the model with the best accuracy. And finally, we will take the best model we trained and serve it using Kubeflow's serverless machine learning serving solution, which is called KFServing.

So let's start with the first part of our demo; let me switch to my browser. As a data scientist, I want to start a new experiment. I go to my Kubeflow deployment's website and I'm greeted with a login page, so I log in. Immediately I see the Kubeflow central dashboard. I want to start experimenting.
So I probably want to create a new Jupyter notebook, load my code and my data, and get going. I'm going to go to the Notebooks tab and create a new Jupyter notebook server using this form. In order to not wait for the server to start, I have prepared one beforehand for this demo, called "SIG Runtime demo", so all I'm going to do is connect to it. Once we're inside the JupyterLab environment: what I have done here is clone a specific Git repository and open the OpenVaccine example. In the Jupyter notebook you can see there is a number of steps, from installing packages, to loading and preprocessing the data, to defining a model, evaluating the model, and producing a final CSV file for submission.

So as data scientists, we have a Jupyter notebook where we experiment, we do things like that, and at some point we are satisfied with the code we have and want to run it at a bigger scale, because a Jupyter notebook environment doesn't really scale that well. What we would like to have, ideally, is a pipeline which we can scale and parallelize, something reproducible. So how do we go from having a Jupyter notebook to having an MLOps pipeline? There is a tool called Kale, which is a Python SDK and a JupyterLab extension. Kale is a project that aims to provide a seamless experience to the data scientist, where we can easily go from running our code locally to running a reproducible MLOps pipeline on Kubeflow.

To enable Kale, we simply click the Kale button on the sidebar and click enable. Now you see some colors over each cell, and these have a special meaning. In Kale, the data scientist goes to each cell and annotates it, so a group of cells becomes a pipeline step, and the data scientist can also declare dependencies on other pipeline steps. So without changing any of the actual code, we have gone from a Jupyter notebook to defining a pipeline structure. Now all we have to do is click compile and run, and Kale will parse the notebook, generate a Kubeflow pipeline, and submit it. I will do that now to demonstrate. In addition, Kale ensures reproducibility by logging metadata about its pipeline steps' inputs and outputs, and by taking a snapshot of the notebook's volumes using Rok's CSI-compatible data management capabilities. Rok is CSI-compatible data management software by Arrikto. Finally, Kale gives us a link to the pipeline, which we can open to track its progress in the Kubeflow Pipelines UI. So this was the first step of our demo.

Now let's continue with the presentation. As data scientists, we have gone from experimenting with a Jupyter notebook to having a reproducible MLOps pipeline. Next we want to perform a procedure called hyperparameter tuning, where we run the same pipeline multiple times with slightly different parameters in order to optimize the performance of our model. To do that, we will use a Kubeflow component called Katib, which enables us to do hyperparameter tuning natively on Kubernetes. Again, Kale will help us by providing a very nice UX for that.

Going back to our notebook, let's say now we want to do hyperparameter tuning. How do we do it? We need two things. The first thing we need is to declare what the hyperparameters are, the knobs that Katib will turn to see what happens. Those are defined in a special type of cell called pipeline parameters, which the data scientist selects from the UI, like so. Then we need another thing, which is the objective: what is Katib optimizing? When it turns a knob, what should it look at to evaluate whether the result was better or worse? In this case, we optimize for validation loss. To do that, we simply print the validation loss and tag the cell as pipeline metrics.
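For reference, those two special cells look roughly like this; a minimal sketch with illustrative names and values, not the actual OpenVaccine notebook's code:

```python
# Cell tagged "pipeline-parameters": these variables become the knobs
# that Katib turns. Names and values here are illustrative.
LEARNING_RATE = 0.001
BATCH_SIZE = 64

# ... the training and evaluation cells run in between; assume they
# produce a final validation loss ...
validation_loss = 0.69  # illustrative value

# Cell tagged "pipeline-metrics": whatever is printed here is logged as
# a metric, and Katib uses it as the objective to minimize.
print(validation_loss)
```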
Now that we have done these two things, we can enable hyperparameter tuning with Katib in the Kale deployment panel. We click on "set up Katib job" to tune the various parameters, and once we are satisfied with it, we click "compile and run Katib job". Kale will again take care of creating the Katib experiment and running all of those pipelines in order to perform the hyperparameter tuning. As before, Kale gives us a link to the Katib experiment, where we can track its progress in the UI. Because the experiment would take a long time for the purposes of this demo, I have again prepared one beforehand, so I'm going to use that one.

Okay, so let's continue. We have now created a Jupyter notebook with our code, converted it to a reproducible Kubeflow MLOps pipeline, and produced the best model by performing hyperparameter tuning. Finally, we should simply take the best model and serve it; this is the serving part.

As I said before, the hyperparameter tuning process takes a long time, so I have prepared an experiment beforehand. If I go to HP tuning, this is my experiment. As you can see, Katib provides an intuitive UI to see how all of your trials went. You can also see a list of the trials, and the best one is highlighted in yellow. So to serve this model, we will navigate to the pipeline that produced it by clicking on the little pipeline icon next to it.

I have a question. Why is that one the most optimal? That's a great question. In our notebook, we defined some knobs, some parameters for Katib to tune, and we also defined an objective, which was the validation loss. So what this means is that this trial has the best validation loss. I see, got it. That's what the UI highlights in this case. So that's the lowest value, 0.69, basically. Exactly. Got it, thanks.

So from the experiment we easily find the best trial, and we go to the pipeline that produced it by clicking on the little pipeline icon. Now, to actually serve the model, we will restore the state of the notebook at its last step, the model evaluation. To do that, we use Kubernetes' CSI capabilities; in this case we have Rok implementing them, which is a data management layer. We copy the snapshot URL and create a new notebook server, which will be restored to the state of that step. Let's name this one "serving-demo" and launch it. This will restore the notebook to the state of the last step of the pipeline. Now, this will take a bit, so I have again prepared a notebook server that is ready. To reiterate: the state of the notebook right now is at the last step of the best pipeline run, and we have our trained model in a variable called model. So we go to our notebook and serve it.

To serve our model, we will use KFServing, which is Kubeflow's serving solution, and Kale's SDK will make it easier for us to serve the model without having to know everything about KFServing. What we'll do is import the serve function from the Kale SDK, and then define a KFServing server by giving it the model and a preprocessing function. In machine learning serving workloads, it's common to have a preprocessing step before we... (Yannis, I think the audio went off. Okay, I think we can hear you now. Great.) So, moving on: we define a preprocessing function and give it, along with the model, to Kale. What KFServing will do is create one server for preprocessing and another one for predicting, but the data scientist doesn't need to know any of that; they just provide the preprocessing function and the model in Python. We run this cell, and in the background Kale takes care of creating all the necessary KFServing API resources to actually serve the model.
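The serving cell looks roughly like the following; a minimal sketch, assuming Kale exposes a serve() helper (the exact module path and keyword arguments are assumptions and may differ between Kale versions):

```python
from kale.common.serveutils import serve  # assumed import path

def preprocess(inputs):
    # Hypothetical preprocessing: the real notebook would encode the raw
    # RNA sequences into the tensor format the model expects.
    return inputs

# `model` is the trained model variable restored from the pipeline snapshot.
kfserver = serve(model, preprocessing_fn=preprocess)
```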
So now we will wait a tiny bit while Kale is doing all of this work, and then we'll see the server in action. While this is happening, are there any quick questions?

So you're using the built-in Kubeflow serving mechanism, right? Could you also use other serving mechanisms if you wanted to? Yes. Essentially, KFServing is more on the infrastructure side. It doesn't enforce a particular framework; it works with TensorFlow, with PyTorch, with everything, basically. It uses a serverless model, supports scale-to-zero, and uses Knative underneath to do that. So yes, it's pretty generic; it doesn't enforce any particular machine learning framework or anything like that. All right, thank you.

So now we're waiting: the InferenceService is an object from the KFServing API, and Kale is waiting for it to be ready, which should happen any second now. If there is another quick question, we can take it now while the InferenceService becomes ready.

How long does it usually take to run, or does it depend? It's usually faster, but if you remember, earlier we submitted the pipeline, and this is not a cloud environment, let's say; it's a restricted environment. So I suspect that's putting a bit of a strain on the cluster, and that's why it's taking longer than usual. Go ahead. No, no, please continue, I interrupted you. No, no. So I was just curious: let's say it runs slow. What are the ways to tune it at the infrastructure level, like adding more resources to the nodes, to speed it up? Sorry, I didn't get the question. What is the way to tune the performance of the run, to reduce the time? You mentioned running in the cloud is usually faster. Is there anything else you can do at the infrastructure level to speed it up? Yes, adding more nodes would help very much; pretty much that. This is a one-node deployment, which is why it's putting so much strain on the cluster. I can actually help it out a bit. If the run is still running... no, it's not still running. Okay, then it should be good to go. It's taking a bit longer than usual. So does it depend on the size of the data too? If the data is bigger, it's going to take longer? Yes, of course. It depends on the size of the data as well. In this particular instance, I guess it's the thing we say, that everything works right up until the live demo. But it's okay. Let's give this a little bit of time to see if it gets better, and let's continue with the presentation. So, moving on; I will come back to this later. Okay.
So what we saw was the whole end-to-end data science workflow: from a Jupyter notebook, to a compiled, scalable, and reproducible MLOps pipeline, to running multiples of those pipelines to tune for the best hyperparameters, to serving the actual model, which we will see at the end. And this whole process is reproducible, because we use MLMD, the ML Metadata database by Google, which logs each pipeline step's inputs and outputs, so you know what came from where. In addition, we snapshot each step using a CSI-compatible provider, Rok in this case. In this way, I was hoping to show you how the various components of Kubeflow combine to give the data scientist a unified workflow.

Now, moving on to another subject I wanted to touch on in this presentation: a problem called configuration management. In this part, I will introduce Kustomize and GitOps and explain how Kustomize enables GitOps workflows for Kubeflow. One of the hardest problems in Kubernetes, as I'm sure you know, is configuration management. There is no silver bullet, and there are many tools available right now targeting this problem. Kubeflow in particular started out with ksonnet, which had a really high barrier to entry, and we ended up moving away from it. Whatever tool we chose had to be configurable, readable, extensible, and work well with GitOps workflows. Right now, after a lot of discussions and effort, Kubeflow uses Kustomize for this.

I want to give a bit of an intro to Kustomize so that we are all on the same page. Kustomize is a configuration management tool for Kubernetes which avoids templating and domain-specific languages. Instead, it encourages writing configuration as actual Kubernetes resources: the input looks like the Kubernetes YAML that comes out. Modifications are done with patches and transformations. To give you a reason why we didn't choose a solution like Helm: charts ended up looking like this example, which is an actual chart of Grafana from the Helm charts repo.

Now I will briefly explain how Kustomize works, so that we're on the same page. The user writes configuration as Kubernetes YAML; for example, the user here defines a Deployment, a Service, and a ConfigMap for a Redis database. Then the user writes a kustomization.yaml file which imports these resources. We call this the base kustomization. To render the final configuration, the user calls the kustomize build command. Now, let's assume that another user also wants to deploy Redis, but in a different environment, and wants to change the settings a bit. To achieve that, this user creates another kustomization, in this case in an overlays/deploy folder, which imports the base kustomization and also uses a patch to change the Deployment's number of replicas. Again, the user runs kustomize build to render the final configuration, which, as you can see, is a mix of the base and the overlay. And while the final configuration is a mix of the two users' desired state, each user defines their configuration in totally separate files and folders. This is important for what comes later in the GitOps workflows.
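In file terms, the base and overlay described above look roughly like this; a sketch with illustrative names and contents (newer Kustomize versions use resources: where bases: appears here):

```yaml
# base/kustomization.yaml
resources:
- deployment.yaml   # the Redis Deployment
- service.yaml
- configmap.yaml
```

```yaml
# overlays/deploy/kustomization.yaml
bases:
- ../../base
patchesStrategicMerge:
- replicas-patch.yaml
```

```yaml
# overlays/deploy/replicas-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
```

Running kustomize build overlays/deploy renders the final configuration: the base's resources, but with the patched replica count.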
Now that we've seen what Kustomize is, let's talk a little bit about GitOps. What is GitOps? Here's the deployer on the left-hand side and the Kubernetes cluster on the right, where we're going to deploy Kubeflow. GitOps is all about a Git repository. It sits in the middle; the deployer commits their desired state of the cluster to it, and then applies only committed manifests to the Kubernetes cluster. So the state of their infrastructure lives in the Git repo. But this is the most basic configuration. What actually happens most of the time is that there are vendors, who have their own manifests for deploying Kubeflow or any application, and there are consumers or customers, who want to use these manifests, adapt them, and keep them up to date.

So in the usual case, where the manifests come from a vendor, here's what happens. For example, at Arrikto we implement GitOps, and we publish generic vendor manifests in a vendor repository, the blue one here. The deployer clones the vendor repository into a local repository, and then creates deployment-specific commits, which we call customizations, on top of the vendor commits. Eventually the deployer combines these manifests into the final desired state and applies it.

Now let's see why Kustomize helps the GitOps process and how the actual upgrade happens. One of the hardest questions in this scenario is: how do you upgrade? The deployer wants to keep their own commits on top, and the vendor keeps adding commits to upgrade their manifests. How do you reconcile the two? For our GitOps workflows, we use git rebase. Let's assume the vendor is at version v2 and the deployer has committed another deployment-specific commit on top. At some point in time, the vendor publishes v3, another commit. The deployer pulls and rebases their changes so they now sit on top of v3, and thus upgrades their infrastructure while also keeping their own deployment-specific changes. Kustomize is important here because, as we said before, it separates the deployer's changes from the vendor's changes into separate files, so the git rebase yields no conflicts. That's the key point of how Kustomize enables us to do this workflow. After this upgrade, the deployer just has to reapply to upgrade their cluster to the new version from the vendor.
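In command form, this upgrade flow looks roughly like the following; a sketch with illustrative repository names and branches:

```sh
# Clone the vendor manifests and keep deployment-specific commits on a branch.
git clone https://github.com/example/vendor-manifests.git deployment
cd deployment
git checkout -b deploy            # our GitOps branch with customizations

# ... commit Kustomize overlays on top of the vendor's commits ...

# When the vendor publishes a new version (say v3):
git fetch origin
git rebase origin/master          # replay our commits on top of v3; the
                                  # separate overlay files mean no conflicts

# Re-render and reapply the committed desired state.
kustomize build overlays/deploy | kubectl apply -f -
```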
Finally, I want to talk about the fact that it's not all good; there are some pain points in Kustomize, and I've highlighted two here. First, there is a feature called vars that kind of breaks the encapsulation of Kustomize. The Kustomize team actually wants to deprecate it, but there hasn't been a replacement since last year, when the planned deprecation was announced. Second, and this is more about the fundamental way Kustomize works versus something like Helm: in Helm, all the power is with the chart developer, who exposes knobs in the values.yaml file that the consumer can tune. In Kustomize, on the other hand, the consumer has all the power; they can use patches and literally change anything in what they get from the vendor. But the vendor, the developer of the kustomization, doesn't have a standardized way to expose things to the consumer like Helm has. So it's a bit difficult to give consumers an easy way to customize their deployment, even though Kustomize gives them a ton of power.

In order to find some middle ground, for our workflows at least, our solution was to build our own installation tool on top of the Kustomize manifests. I have a small video here so you can see what it looks like. On the right are our docs; on the left is the rok-deploy tool, as we call it. What it does is take you through an ncurses UX where you answer a bunch of questions, and the tool is responsible for converting those answers into actual Kustomize configuration and committing it to your GitOps repo. So we're still using all of the nice things we said we were using before; we've just added a nice UI on top so that it's much more usable. I wanted to show what this looks like in case someone else has a similar problem.

I have a question. So this is the GitOps part. Basically, are you deploying your serving part here, or building your whole pipeline and deploying it? Do you have a CI/CD workflow? This is separate from the CI/CD workflow; it's about initializing a deployment. When you take a product into your hands, a piece of software, say MySQL or Grafana, you're usually given some knobs to tune, some settings. What I'm saying is that while Kustomize gives you unlimited power to change whatever you want, it's a bit difficult to find the actual way to do it; you have to have knowledge of the whole thing. So we have built this small interactive tool that asks you a few questions and produces the exact Kustomize configuration necessary to achieve those settings. In addition to producing it, it also commits it to the GitOps repo, so it actually follows the GitOps process. After committing it to the GitOps repo, you can hand it off to whatever CI/CD tool you want to perform the actual reconciliation with the Kubernetes cluster.

I see. Okay. So what I was trying to understand was how you tie these together with your models: how you create and build your models and then deploy them to serve them. And basically, as I understand it, this makes it easier to set up your cluster, your whole environment, with all the different components you need, using GitOps. Yes, exactly. This is for a bit of a different audience. The previous workflow we showed, where you had Kale and its SDK in Python, which allowed you to easily serve models, was for the data scientist to use. The second section of the talk, about Kustomize and GitOps, is more for the DevOps team, let's say: the cluster deployers, the team responsible for the infrastructure. Got it, got it. Thank you.

So I wanted to show the two sides of the problem. And closing with this: thank you for giving me this opportunity. Here is some contact information if you want to get in touch, and you can also try the demo yourself using this link. It was performed using something called MiniKF, a one-click deployment on GCP. So if you want, you can do this yourself; we have instructions at this link.

Did the demo actually complete, for serving? Oh, yes, I forgot about it. Yes, it completed; here it is. So let me actually deliver on my promise and show you. What I will do first is a prediction against this model server. First I define some data here, and then, to actually hit the model, I call the predict method of the KFServing server object. Kale takes care of doing all the HTTP requests and so on underneath, so the data scientist doesn't have to think about anything. A minimal sketch of that call is below.
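This is a sketch of the prediction cell, assuming the server object returned earlier exposes the predict() method mentioned in the demo; the payload shape is illustrative:

```python
import json

# Illustrative request payload; the real notebook sends encoded RNA sequences.
data = json.dumps({"instances": [[0.1, 0.2, 0.3]]})

# predict() wraps the HTTP request to the InferenceService for us.
result = kfserver.predict(data)
print(result)
```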
And what I'm going to show you with this is scale-to-zero. Right now nothing is running; there is no server pod. Once I hit enter, what actually happens is that KFServing buffers the request and spins up a pod in the background, and once the pod is ready, it responds to my request. And in order to make this a bit more usable for the data scientist, we have added a UI for model serving. If you print the KFServing server object, you get a link for viewing the model. If I click the link, it takes me to the models UI, where I can see an overview of the model server: some details, some metrics of what is happening (this is a direct connection to Grafana), logs from inside the pod, the data the server has processed, and, if you're an advanced user, the whole YAML that was used to create this server. So that's the gist of what I wanted to show you around model serving.

Yeah. So one question: if you're setting this up in a repetitive way... I mean, this is for data scientists, where they try things and build the models, right? But sometimes you want this to be more repeatable, like how releases happen. Does Kubeflow provide a mechanism for that? Not just going through notebooks or the different steps you're describing here, but more like a push of a button that deploys the serving mechanism. Does that make sense? I think so, yes. At the moment, we don't have a UI for deploying; deploying is actually done through the Python SDK. So right now we are satisfying the people who use Python a lot, but we don't yet have a UI for deploying a model. We have the UI for viewing the models and the metrics around them, the one we showed here. But what you describe is, I think, something we're working on; it's something that should be present in later versions of Kubeflow. I think what you're describing is usually called a model registry.

Yeah, I think so. Maybe you have a registry, and then you just pull that model from there and say, okay, I'll deploy and serve this model. It sounds a little bit like what you have with a container registry, where you build the container and publish it in the container registry, and then when you deploy, you just deploy that container image. In this case it would be more like a model image that you're deploying to a production environment. Exactly. We have a weekly meeting in Kubeflow, the SIG Metadata meeting, which I think is looking hard into this problem. So if anyone is interested in this particular thing, I would encourage you to join that specific meeting, because it is working directly on the area you talked about. Got it, got it. Yeah, thank you.

All right. So yeah, this was great, a lot of information. For me, myself, this is kind of the first time I've seen this kind of end-to-end workflow. And one more question about those components: you have the Rok component; is that an open source project? Rok is not an open source project, but it connects through an open source interface, like how S3 is not an open source project but speaks the S3 protocol, which you can use an open source project with. Got it. Okay. And you would not actually need it if you wanted to create your pipeline, right? Yes, exactly. Rok is about providing reproducibility and making the workflows much easier. Got it. Anybody else have any questions? Anything?
Thank you for the great presentation; Kubeflow is a great project. Thanks for having me. Yeah, thank you. Yeah, thank you all. I'll give you back, what, 15 or 12 minutes. So, all right. Bye, guys.