So today, as you can see from your schedule, we're here to talk about machine learning pipelines. Hopefully by now everybody recognizes that little acronym, ML. And we're gonna put it in the context of a couple of open source projects; in the previous session you just heard about one of them, and we're gonna revisit that. We're also gonna talk a little bit about Kubeflow. So we'll do a brief introduction to machine learning pipelines as we idealize them or understand them today, and we'll talk about some of the attributes and features of those pipelines. We'll do a survey of some of the more popular open source pipeline projects, and that's gonna include Airflow, Tekton, Argo and Kubeflow Pipelines. And then Joana is gonna get into a bit more detail about two of these. These projects have different ages for us, and we have more experience with things like Argo and Kubeflow Pipelines, so that's what we're gonna focus on, and Joana is gonna lead us through a demo based on a very practical machine learning application, which is fraud detection. And then at the end we'll talk about some of the future directions and wrap up.

So as you can expect from a pipeline, there's this notion of an organized set of stages, and there are interdependencies between those stages, and that's very important in the context of machine learning these days. There's the classic notion of data science with a notebook, like an IPython notebook or a Jupyter notebook, but machine learning in production is much more complicated. There are various stages involved to get from that first point of ingesting some data to actually producing a model from it, training that model, and deploying it into production. And this is where the importance of machine learning pipelines comes into play, and having tools to do that.

So in a generalized example here, we have an ingest layer that is doing a read and transformation of data from various sources, maybe S3. And perhaps that data is coming in tagged as experimental or development data, or maybe it's actually for inference and it's coming in from some production input or from some other location. And then we want to distribute that throughout a pipeline where we can do various different types of model development techniques. So in this case, maybe it's PyTorch, maybe we're doing a convolutional neural network with TensorFlow, maybe a random forest model. And then from that, in our pipeline, we're coming down and doing an evaluation of how we did in terms of training. And then there are some techniques applied at this point to find the best model for the input. And then finally, once we derive that model, we want to push it out into production and do some monitoring and things like that. This can be executed in many cases in parallel. Some of the companies that are very advanced in machine learning, think of an Uber or Lyft or Google, actually do this in production; they have sophisticated machine learning pipelines for doing exactly this type of work. Slide please.

So, taking a step back, we can look at the attributes and features of machine learning pipelines, and in terms of computer science, a lot of us are very familiar with DAGs, or directed acyclic graphs.
So you'll see throughout this discussion of machine learning pipelines, we'll touch on this topic of DAGs. But in general, machine learning pipelines should be automated, and we'll see that OpenShift and Kubernetes are an excellent platform for helping us enable that automation. Also, it should be repeatable. There should be a capability of doing artifact passing, in terms of input and output, between different targets within our pipeline. The notion of triggers: so something happens, say, in a GitHub repo, and that should kick off a stage in a pipeline, or the very beginning of a pipeline. And then there's the notion of integration with multiple clusters, so perhaps we have defined a pipeline and we want its steps to be able to run on more than one cluster.

Then for the DAG itself, there are features or attributes that we're looking for. The notion of targets, which are really the nodes in the graph. The ability to do parallel processing. Conditionals are important: if we have a node A here, perhaps there's some condition that immediately triggers and gets us down to E; other conditions perhaps take us to B and C, or directly to D. That's where conditionals come into play. The notion of loops: we're not talking about loops within the DAG, the loop in this case is within the target itself, the ability to iterate over a list or a sequence of inputs. And then for each of these targets, the ability to pause and resume. Also timeouts, so that if we have a pipeline and there's a stage that's taking too long, we want to kill that off; something's gone wrong, and perhaps kick it back so it follows another path down through the DAG. And then the notion of retries. So all these are the ideal capabilities that we want to see in a machine learning pipeline. Next slide.

So it turns out there are actually quite a few open source tools today that help us in this respect. And we're gonna focus on a couple of these, not all of them in detail, but Argo, Kubeflow Pipelines, Apache Airflow, Pachyderm, Tekton, Jenkins X. These are all different open source toolkits that give us varying degrees of those capabilities that you saw previously. And we'll see that some of them are better suited to OpenShift and Kubernetes than others. Next slide, please.

So it's important to touch on Airflow because it was one of the early open source workflow orchestrators, or DAG orchestrators, but it is not Kubernetes native. It's a Python SDK, and it can make use of various different types of executors, like Dask or Kubernetes or Mesos, but it's not dependent on Kubernetes; it's not inherently built around Kubernetes. So that's an Apache project. It's been around for a while and it's very popular out there. It was originally started at Airbnb and then contributed to Apache. It has the concept of operators, which is not to be confused with Kubernetes operators, which you'll probably hear about throughout the weekend in various talks. So they have different types of operators for representing those single independent tasks in a workflow, and those are predefined in the SDK. So there's a Python operator for executing some segment of Python code, or bash, or integration with Google Cloud Platform services. Next slide.

I'm surprising you because I threw this in at the last minute. Joana and I were upstairs sitting there, and I was kind of like, well, we should put something in there. So this is a little snippet. It's incomplete, but it gives you an idea of what the SDK for Airflow looks like.
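Roughly, the kind of thing on that slide looks like this. This is just an illustrative sketch, not the exact code from the slide; the DAG ID, task names, schedule and callable here are made up for the example:

    # Illustrative Airflow sketch; the DAG id, task names, and callable are made up.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator


    def train_model():
        # Placeholder training step; a real pipeline would call your ML code here.
        print("training the model...")


    with DAG(dag_id="example_ml_pipeline",
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:

        ingest = BashOperator(
            task_id="ingest",
            bash_command="echo 'pulling data from S3'")

        train = PythonOperator(
            task_id="train",
            python_callable=train_model)

        ingest >> train  # ingest must finish before train starts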
You see the abstractions there that we just talked about: the notion of a DAG as a native type in the Python SDK, the Python operator, those types of concepts. So that's it for Airflow. Again, we're just doing a summary of some of these projects. Next slide, please. Sorry, not quite the end: what Airflow does have is a fairly sophisticated UI, which is nice. It's fairly well developed. So it gives you a sense; this is one of many different types of visualizations that you can get of your DAG. And we threw this one up because it gives you a classic, conceptual visualization of what the DAG is doing. And we'll see that the other tools that we're gonna demo later on also give you nice representations of these DAGs. Next slide, please.

Tekton is a very new project. It came out of Google, specifically the Tekton CD project. And Vincent Batts, who is here, I don't know if he's in the room, but he's at the conference somewhere, I think, is one of the people from the CTO office who is very much involved in this new project. And the idea is to develop a new Kubernetes-native CI/CD pipeline, designed specifically for Kubernetes. And right now, it's entirely based in GitHub. There are a couple of sub-projects that go along with it: there's a UI for it, a CLI, and it has a sub-project for eventing. And with Kubernetes and OpenShift, as we're all familiar, we're always dealing with APIs, and in those APIs there are representations of resource objects. They've done the same thing in Tekton. So in their API, there are Kubernetes definitions for things like a Task, a TaskRun, a PipelineResource, a Pipeline, and a PipelineRun; we'll show a rough sketch of a few of these in a moment. So tasks are really the almost atomic thing that executes your workflow. Those are run as pods, defined by a TaskRun resource object. Within a task, the steps are represented by containers in that pod; as you know, a pod can have multiple containers. Then there's a PipelineResource, which represents possible inputs and outputs for a given task. So it could be a Git ref, maybe a pull request that appears on a Git repo, a change to an image in an image registry, or interaction or output to another cluster. We talked about that earlier, the ability to interact with multiple clusters. And currently, of course, it's a Google project; they always lead with their preferred infrastructure, so there's a storage output, and currently that is Google Cloud Storage. But this is a project that Red Hat has a fair amount of interest in. And we're gonna talk about Kubeflow in a second, but they're also taking a certain amount of interest in this project as well. So again, a bunch of resource objects. The highest level resource object is the Pipeline itself, and that's your definition of this ordered sequence of tasks. And you have your inputs and outputs going from task to task; these are defined as PipelineResources, and that defines your flow through the pipeline. And then a PipelineRun is an object that represents the execution of a specific instance of that pipeline. Next slide, please.

So if you were here for the previous talk, we talked about Open Data Hub. The important thing about these machine learning pipelines is that they need to be useful as part of some larger ecosystem. And in this case, the ecosystem that is of interest here is Red Hat's own Open Data Hub. That's an open source project that provides various capabilities; it's a meta project for the hybrid cloud.
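Just to make those Tekton resource kinds a bit more concrete before we go further into Open Data Hub, here's a rough sketch. The names and image are made up for illustration, this isn't taken from our slides, but it shows a Task, a Pipeline that references it, and a PipelineRun that executes one instance of the pipeline:

    # Rough illustrative sketch only; names and image are invented.
    apiVersion: tekton.dev/v1alpha1
    kind: Task
    metadata:
      name: train-model
    spec:
      steps:                       # each step runs as a container in the Task's pod
        - name: train
          image: example.com/ml/train:latest
          command: ["python", "train.py"]
    ---
    apiVersion: tekton.dev/v1alpha1
    kind: Pipeline
    metadata:
      name: ml-pipeline
    spec:
      tasks:                       # the ordered set of tasks that make up the pipeline
        - name: train
          taskRef:
            name: train-model
    ---
    apiVersion: tekton.dev/v1alpha1
    kind: PipelineRun              # one execution of the pipeline
    metadata:
      name: ml-pipeline-run-1
    spec:
      pipelineRef:
        name: ml-pipeline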
Open Data Hub is designed specifically to run very well on Kubernetes, and most specifically on OpenShift itself. It has notebooks, it has integration with the Spark operator for spinning up Spark clusters, TensorFlow notebooks, scikit-learn. And this type of meta project is what's needed to connect together the different components of such a pipeline. So there are different responsibilities that are tied to different personas in the overall machine learning pipeline. With Spark, you can do ETL into, say, a TensorFlow notebook, for example. And then you can do serving of a model once it's been developed in the notebook, and then you can do some monitoring or hyperparameter tuning of that model. So this is the direction that Open Data Hub is going. And one of the key components we've identified for integration into Open Data Hub is a workflow manager, and currently the target is Argo. And Joana's going to talk about Argo in a bit. Next slide.

So, a graphical representation of Open Data Hub. I've called out Argo workflows in this layer here that we associate with monitoring and orchestration. It's important to understand that in the Open Data Hub that's available today, we have an Open Data Hub operator. You don't get all of these things currently; this is a reference architecture for a set of components that are designed to be integrated into a machine learning pipeline. So for components like model lifecycle, we integrate with Seldon. We have the AI library developed by Prasant there. We have JupyterHub for spawning notebooks, Superset, and of course Spark for doing big data processing. Internally, we use some other components: Elasticsearch, Kafka, a caching layer, Hive. And it's all unified basically by Red Hat OpenShift, which is Kubernetes underneath, and we integrate with Ceph for storage. So it's a fairly comprehensive reference architecture for machine learning. And again, an important component of it will be Argo workflows, which we are targeting to deliver. Yes. Has that changed? No, I can't say. We're still on track. Are you the one on the hook for it? Yes. Next slide, please.

So that's Open Data Hub; that's a meta project really driven by Red Hat. There's also a project out there called Kubeflow. You might have heard of it. It's an upstream project that was initiated by the folks at Google, who spent a lot of time developing machine learning pipelines and recommendation engines for YouTube, and they took some of those best practices and understandings from that environment and open-sourced them. So that's what Kubeflow is. That's a project I've been involved with for over a year now, and it's grown quite a bit. The idea is that it's totally dedicated to making machine learning workflows and pipelines simple, portable, and scalable on Kubernetes. It's not exclusive to pure Kubernetes; it runs fine on OpenShift, it just requires a few tweaks here and there. But it's a very powerful project. Again, a meta project. So the idea with Kubeflow is not to recreate all these different types of things like Python notebooks and Jupyter and all that stuff, but basically to provide an integrated platform, kind of like Open Data Hub, for these different components to fill out the machine learning pipeline. So the core components are a notebook controller, so you can spin up Jupyter notebooks;
a TensorFlow training controller, so you can do distributed training jobs; TensorFlow Serving and Seldon; a sub-project called Katib for hyperparameter tuning; and then finally, a fairly large, significant sub-project that came late into the Kubeflow project, called Kubeflow Pipelines. It actually came out of the TensorFlow Extended, TFX, group there at Google. And they have gone about building a machine learning pipeline DSL and SDK that basically sits on top of Argo. So they've provided some useful machine learning abstractions on top of Argo, which is not necessarily designed with machine learning in mind; it is designed for Kubernetes, but it's more generic than that. Over to you, Joana. I think that's it for me.

Yes, thank you, Pete. All right, so I'm just gonna ask a couple of questions to get a feel for the audience. How many people here are using Argo today? No, not you, Prasad, I already know what you're doing. No? So how many people are maybe looking into moving from notebooks to workflows for production? Yeah? I know Sophie too, in the back. All right, so I guess my audience is a clean slate, and that's good, because this presentation is very basic. We want you to leave this presentation with some idea of which tool to use and maybe a little hello world of how to use it.

So what is Argo? Argo is an open source container-native workflow engine. It's one of the original container-native workflow tools out there. It was originally made for CI/CD workflows, but then it got picked up by the data science community to do production workflows. Why is it Kubernetes native or container native? It's because the steps in the workflow are containers, you run containers, and it uses something called a custom resource to basically add an API to Kubernetes. Does anybody here know what custom resources are in Kubernetes? Okay, so basically what it is, is you tell Kubernetes, I wanna provide this API for the cluster, and then this module or this container is gonna take care of that API. And this is what Argo did. The custom resource is called Workflow. So if you want a workflow, you just create the custom resource and then Argo will handle it for you. The workflow that Argo provides can be step-by-step, parallel steps or dependent steps, or it can be a DAG of tasks, and you'll see that in a little bit. You can pass parameters between containers. You can define artifacts. You can do loops. You can do conditionals. You have to write your workflow in YAML. I don't know how many people are comfortable with YAML. Oh, okay. I say I'm comfortable, but every time I look at YAML, I'm like, oh, what is this? I can't really tell what's under what. And then there is a UI portal. The UI portal is very basic, it's read-only, and it's just for you to see the workflow. You'll see that in a little bit.

All right, so moving on to Kubeflow Pipelines, which Pete talked a little bit about. What Kubeflow did is they took Argo and they built stuff around it. The first thing they did is provide an SDK where you can write your workflow in Python. Data scientists are more comfortable with Python; I don't think they would run away from Python, but they probably hate writing YAML. And then what they also did is give data scientists more tools to do experiments with parameters, so it's repeatable. So you know when you ran your workflow, you know what parameters you used and when you did it, and you can repeat it, or change the parameters and run it in a different way.
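Just to give you a flavor before the demo, a Kubeflow Pipelines hello world in Python looks roughly like this. This is a simplified sketch, not the exact file we'll compile in a minute; the pipeline name and the output file name here are just examples:

    # Simplified Kubeflow Pipelines SDK sketch; names and output file are examples only.
    from kfp import compiler, dsl


    def whalesay_op(text):
        # Each pipeline step is a container; this one just echoes the text with a whale.
        return dsl.ContainerOp(
            name="whalesay",
            image="docker/whalesay:latest",
            command=["cowsay"],
            arguments=[text])


    @dsl.pipeline(name="hello-world", description="Minimal hello world pipeline")
    def hello_world_pipeline(text="hello world"):
        whalesay_op(text)


    if __name__ == "__main__":
        # Compiling produces an archive that you upload through the Kubeflow Pipelines UI.
        compiler.Compiler().compile(hello_world_pipeline, "hello_world.tar.gz")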
Like I said, they have a Python SDK, and you'll see right now how you compile your Python code, which in turn translates to a YAML workflow that's run by Argo; we'll see that in a little bit. All right, so now we're gonna look at some code. The first one is a Hello World, and I'm gonna show you a Hello World in Argo and a Hello World in Kubeflow Pipelines using Python. I find it really hard these days to find a Hello World. How many people here find it hard when looking at open source projects?

So here we go. This is the Hello World for Argo in YAML. It's pretty straightforward. You can see here, I don't know if you can see this. Yep, you can see here that this is your resource, the custom resource called Workflow, and you're basically just entering the parameters of the custom resource. This is just running, basically, not even a step; it's just running a container called whalesay. All it does is print out a picture of a whale that says something, and here it's saying Hello World. And then if we go to Kubeflow, it would look like this in Python. Sure, this looks more well organized: we have functions, we have variables. But this is the same thing. It's running only one container, which is called whalesay, and it says Hello World. And then when you run this through Kubeflow, it's gonna translate into this big YAML file. And you can see here that Kubeflow just added parameters, and that's the way you run different experiments and runs. And then it added, at the bottom here, some artifacts, which are the metadata with more information about the workflow that you ran. But it's basically the same thing: a container with some argument and some image.

All right. Next, how do we run those? In Argo, all you need to do is install Argo on your Kubernetes or OpenShift. And then to submit or run a workflow, you can either do oc or kubectl create -f with your YAML file, or use the little Argo executable that you can install on your computer, which gives you a little bit more visual output, and you'll see that in a little bit. And then you run your workflow, and after you run it, you can go to the UI and just take a look at what your workflow did. We'll see that in a little bit. In Kubeflow, things change a little bit. For running Python, you have to set up a virtual Python environment, install the Kubeflow Pipelines SDK, and compile your Python code into this tar.gz file. Then you go to the UI, you upload your tar.gz, which in turn translates to a YAML file that is run by Argo. So this is just giving you an idea of the difference between the two. I'd say if you're just starting out and you're comfortable, yeah, go with Argo; it's only two pods that run on Kubernetes or OpenShift. If you already have Kubeflow, or you really want the end-to-end and you're willing to invest more time in it, go ahead and do Python.

All right, so we're gonna run these hello worlds in the demo in a little bit, but I just wanted to give some context for a more complicated workflow, which is based on our fraud detection use case. So we built this use case on data that we downloaded from Kaggle. It was around 2,000 rows of credit card transactions, with some hidden, anonymized features plus the time and amount of each transaction. The full-blown demo is on YouTube if you're interested in seeing it. And we got this data, we created a notebook, and we analyzed the data in the notebook. We created a random forest classifier model.
We used Spark to grab the data from Ceph, and then we served the model using Seldon. After we served it, we also grabbed metrics from the model, and we used Prometheus to collect the metrics and show them on Grafana. All this is nice and everything, but it's not really repeatable. We have the Jupyter notebook, but what if the data changes, or we wanna change the model? So we're moving into creating this workflow, and this is the workflow that you'll see next. It's kind of a shell of the workflow that we want to have. We're not there yet; maybe next year at the next DevConf we'll have it completed, but this is what it looks like. So we're reading data from different sources, and that's just showing you how to do parallel steps or parallel tasks in a workflow. And then after this is done, we have the transform data task, which is another task that depends on those two being done; it won't run until those two have already run. And then transform data is gonna decide, after we've done the data transformation and based on a condition, whether it's gonna do hyperparameter tuning or not. You'll see that in the demo in a little bit. After both of those are done, we train the model. After we train the model, we wanna validate the model, and we're giving here an example of how you run the same container with different parameters: we're gonna validate the model with parameter A, we're gonna validate it with parameter B, and after we're done, we publish the model. But that's not the end of our workflow, right? Actually, that's where all the work begins. We don't have that yet either, maybe for next year. But after you publish the model, you don't just sit there and look at it. You really have to keep watching it, you have to keep watching the data coming in, you have to do monitoring, et cetera. But for this demo, we're just stopping there. Any questions? Good, all right.

All right. So for the demo, we're gonna run the Hello World in both Argo and Kubeflow. Then we're gonna look at the fraud detection workflow YAML file, and we're gonna run it in Argo, and then we're gonna upload it to Kubeflow and run it again. All right, so let's hop over. Of course, I got logged out; I'm just trying to get the password. All right, so this is our cluster; let's first hop onto the Argo project. So when you install Argo, it has two pods running: the Argo UI, which takes care of the UI that you'll see in a little bit, and the workflow controller, which handles the resource every time you create one. And let's take a really quick look here at Kubeflow. I'm just gonna see if I should make it bigger. Can the back row see this? Yes, no, no one's looking, no. Actually, that doesn't work. So basically what you see here is a lot of pods. You'll see the same ones, this is the Argo UI, and at the end we have the workflow controller, but there's a lot of pods running. So if you wanna install Kubeflow, it's a lot of dedication to make sure they're all running and everything. But it's end to end, so it has a lot of tools for you to use. So that's the other workflow controller.

All right, so let's hop over and look at the Hello World YAML. Really small, I can't make it bigger. All right, so basically it's the same thing I showed you: it's just one container that runs, and all that container says is Hello World. So we're gonna hop over and we're gonna run that container, run that workflow. This one I can make bigger and nicer, and we can look at it here. Is that better? A little bit?
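So roughly, a minimal Argo whalesay hello world looks something like this. This is just a sketch to give you the idea, a reconstruction rather than necessarily the exact file from the demo:

    # Rough sketch of a minimal Argo hello world Workflow; the demo file may differ slightly.
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: hello-world-    # Argo appends a random suffix for each submission
    spec:
      entrypoint: whalesay
      templates:
        - name: whalesay
          container:
            image: docker/whalesay:latest
            command: [cowsay]
            args: ["hello world"]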
Still looking at code. So this is how it looks, and we're gonna run it. All right, it kicked me out here too. Sorry, sorry again, I got kicked out of the cluster. So we are in the cluster project and we're gonna run the workflow. So we're gonna go back to the UI, to the Argo UI, and you're gonna see a Hello World. You can click on it, and it will tell you one pod was running and some parameters about the pod. And if you look at the logs, it will show you what it spit out, which is basically Hello World with a whale.

The next thing we're gonna do is run this in Kubeflow, and the way we do it in Kubeflow is a bit different. So you'll see here, this is my Python environment that's running, and I am going to compile my Hello World Python file. So this is the command we run, and it's gonna produce this hello world tar.gz file. I'm gonna hop back to Kubeflow this time, and this is how the Kubeflow UI looks. The way we upload pipelines is this: we click upload, we choose the file, choose this one, we click on it. And if we look at the source, it looks different, right? We have all these artifacts that are attached to it, the parameters. And to run it, you create a run. You can say Hello World run, and you can add more things; this is just simple, we didn't really add anything. So we hit start, and we can watch it. You can see that it's running one container. So if you look at the logs, it did run it, and there's an error. That error is not a surprise; we already know about it. There's an issue with running this. This is the issue, if anybody's interested in what it is, but basically the OpenShift 4.1 that we're using does not use Docker, and Kubeflow Pipelines is still relying on Docker for moving parameters back and forth. It's an open issue that we're still working on.

All right, so let's move on here to the last workflow we have, which is a little bit more complicated, and I am gonna open it from the terminal so it looks a little better. So this is the fraud detection workflow that we talked about earlier. Is it a little better, back row, yeah? Okay, good. All right, so again, all these containers that we have are just shell containers; right now we're just building the workflow. So we have a container called echo. This is an example for you to see how we can pass keys and access for an S3 storage to the container. So we have a secret here inside the namespace. That secret stores these two credentials, the access key and the key ID, and we pick them up, along with the S3 endpoint for the bucket in our object store, and we just pass them to the container. Another template that we're using is whalesay, which I showed in the Hello World, and all it does is draw a whale that says something. The next one is an interesting one. It's a container, and what it does is mount a volume and write into a file in that volume whether to run the next step or not. And here you'll see echo false, meaning don't run the next step. That's the conditional example that we're gonna go through. All right, and then let's look at the DAG. So the first two tasks that we showed run in parallel, and they're just grabbing data. They use echo, because echo has the credentials for these S3 buckets. And then we transform the data. And once we're transforming the data, we decide whether we wanna do hyperparameter tuning or not, because the hyperparameter tuning task is relying on this.
If this is false, it won't run; if it's true, it will run. And we'll show you this in a little bit, I'll toggle between them and you'll see. And then the last ones are just the train model and then validate model tasks. The validate model task is another example of a for loop: it's one container that's run twice, each time with a different parameter that's passed in, hello world, goodbye world. Very intuitive. All right, and then at the end, we publish the model.

So let's run this in Argo. And let's hop back to the UI, because it looks more impressive there. So transform data right now was false, meaning hyperparameter tuning is not gonna run. So read AWS and read Ceph ran in parallel. Once they're done, transform data, which was dependent on them, ran when they were both done. Hyperparameter tuning is grayed out because it says here it's false. And then next, we train the model, validate the model, which runs twice with different variables, hello world, goodbye world, and then we publish the model. So this is one iteration. We're gonna do it again with true, just to see it run, just for fun. Any questions while this is running? Nope. Right now it's just a shell, but the model that we have was linear regression. So can you just write your pipeline and run a pipeline? No, there's more. You can run experiments and attach parameters to experiments; an experiment is multiple runs with multiple parameters, and it keeps the history of what you ran and which parameters you used, so you can keep track of it. Kubeflow does that. There's, I think, more complexity in Kubeflow Pipelines, right? In terms of, because you put it together as a zip file, there are probably different types of components or artifacts that could be within the zip file, right? No, they all have to translate to Argo. So if Argo doesn't support it, you can't do it. So whatever artifacts Argo supports populate up to Kubeflow.

All right, so what we're gonna do right now, so it runs, you saw this because we told it to run. What we'll do right now, just again for fun, is go to Kubeflow and run the same pipeline. So we do pipelines, upload a pipeline, choose a file, and this time we're gonna choose the fraud detection one. And as you can see here, I can do YAML with Kubeflow too, but I won't get the parameters and the metrics associated with it that Kubeflow adds. I'm going here. So right now it's just showing you the names of the containers, it's not showing you the names of the steps. And I say create run. Let's just do fraud run. So here you see the run parameters. This is where you can actually specify parameters and change them, depending on what you wanna test and what you wanna do in your run. Let's start. Same thing here: you can see this is basically the same information that you see in the Argo logs and the output artifacts. This is just printing. Any questions while this is running? Are we good? This is the last thing we're gonna demo today; we just have one more slide. So did anyone notice the difference in Kubeflow between running this workflow and the Hello World? There are no errors. That's because we used YAML, and none of the extra parameters that are added by Kubeflow are there, so that's why it ran without errors.

All right, so let's move on to the last slide that we have. So what are the future challenges, or things to think about, with regard to workflows? I think automation, from the perspective of moving from a notebook to a workflow.
I think there are some ideas out there, I just don't think it's well thought out or well developed yet, but a lot of data scientists start with a notebook, and how do we translate or transform that into a production workflow? Triggers: there are some triggers, I just don't think there are enough of them. There are a couple of them there, but it needs to be more elaborate for running workflows; things like triggers based on incoming data or model performance. Monitoring: you have your workflow running, and I think we need better tools for monitoring how the workflow is running, how it's performing, and what to do based on some events within the workflow. And the last one, which I think is exciting and I think is coming, is multi-cluster. We do have multi-cluster storage today, but compute, running a workflow across multiple clusters and deciding where you wanna compute certain steps in the workflow, that's interesting. Yep.

We've talked about a bunch of projects, and they have varying degrees of maturity in terms of these capabilities. For example, in Kubeflow there's the Fairing sub-project, which is designed to basically, from a notebook, do the training and then push that out as an artifact into a pipeline as an image. So there is work being done there. We talked about Tekton; there's a fair amount of triggers there, but all the projects are not entirely equal in that sense. And again, Tekton does have support for targeting multiple clusters, so you can provide credentials for going from a task, logging into a cluster, and sending that output to the cluster. So it's a very interesting space. I think we've set our cap on Argo, basically, for Open Data Hub. Yeah, for Open Data Hub. Kubeflow Pipelines does have a lot of momentum, so it's a space that we're watching very closely. It's difficult to pick a winner, but where the winners will emerge is in the degree to which they're well integrated with other components, I would say.

Yeah, on performance: you saw when I was running Kubeflow it was kinda slow, and I had nothing in there, they were just shell containers. Yeah, maybe it's UI issues, but yeah. Are there other questions? Are there non-CoE questions? No, go ahead, Eric. Non-Red Hat questions. The question is about possibly providing an abstraction for Kubeflow Pipelines, I think, on another platform like Tekton. Yeah, and that is being discussed in the Kubeflow community. Unfortunately, right now there are some Argo abstractions that leak through, and so there's work to be done to basically encapsulate that stuff. Also, in Kubeflow they are looking at, well, currently in Kubeflow we have an end-to-end testing pipeline, and currently that's totally reliant on Argo. There's an open issue, if anybody wants to get involved in Kubeflow, for also putting in Tekton as a possible replacement for Argo there. Yeah.

MLflow, the question is how does experiment tracking in Kubeflow Pipelines compare to MLflow, which has been developed by Databricks? Did someone raise their hand with an answer? Yeah, yeah, Zach over there. Zach. You can answer, Zach. Zach, way back there. He's our MLflow guy. MLflow is, right out of the gate, very sophisticated in terms of UI and the various capabilities from a user experience standpoint. At this point I'll hand off to Zach there. Maybe he needs this. Can you stand up, Zach? Can you use your outside voice? Yeah. That's a good point. When is your talk? Is that tomorrow? Oh, right after this. Perfect.
But I feel like part of this is that it's a whole workflow where you are setting parameters, versus maybe one model within MLflow. This is an end-to-end workflow from the beginning of getting all the data all the way to publishing, which is not covered by MLflow. So I would think of it as end-to-end. Also, on the Kubeflow side: currently there's no inherent integration between Katib and Kubeflow Pipelines. There's a lot of discussion about what that might look like. Katib currently uses ModelDB for experiment parameter tracking as part of the hyperparameter tuning. So it's an area where they're trying to figure out, okay, how these two things fit together; currently they're fairly separate. So down the road, hopefully there will be better integration for that, yep. Okay, anything else? Good. All right, thank you, everybody. Thank you.