Hi everyone, thank you for attending the session — and for those of you attending virtually, thanks for watching. My name is Yi Hong, and I'm from IBM. I've been with IBM for about 17 years; I started at IBM Taiwan, then moved to IBM US, and currently my team mainly works on machine learning related open source projects. I've been participating in several open source communities since 2017, starting with Node.js, TensorFlow.js, and JanusGraph. For the last two years I've mainly focused on the Kubeflow community, and today I'm going to share the new intermediate representation in Kubeflow Pipelines v2, and the journey of making that new feature available in Kubeflow Pipelines with Tekton.

So without further ado, let me begin with a very quick introduction to ML pipelines and Kubeflow Pipelines. One quick question first: has anyone here heard about Kubeflow before? Can you raise your hand? Okay, not many people, so I can go a little more slowly on the introduction part.

The first thing I want to show is the IBM Global AI Adoption Survey for this year, which covered 7,500 companies worldwide. 35% of the companies report already using AI in their business, and an additional 42% say they are exploring AI. 30% of the global IT professionals surveyed say employees at their organizations are already saving time with new AI and automation software and tools. But you can see the adoption rate is still only about one-third, right? Based on this survey, the major barriers are these. First, companies don't have enough AI skills, expertise, and knowledge in-house. Second, the cost of AI adoption is comparatively high: imagine you want one model up and running — you actually need a whole ML system around it, including the data store, configuration, deployment, serving infrastructure, monitoring, and a feedback mechanism, and that's for just one model. Third, companies say they don't have adequate tools or platforms for AI solutions. Last but not least is the complexity of the projects and the data — the ML ecosystem is fairly complicated.

When we talk about the AI lifecycle, we categorize the procedures into three pillars. The first is data: after you define the business use case and exit criteria, you start to deal with the data, including data extraction, analysis, and preparation. After that, you move to the model: the data scientists implement different algorithms against the data prepared in the previous pillar, train various models, evaluate their quality, and validate their performance. Then comes the last pillar, model serving and model monitoring. After your model has been in production and running for a while, you get another set of new data, and you iterate through this cycle again and again.

This is where the ML pipeline concept comes in. It became very popular as people started to adopt AI into their business, because as you can see, delivering an ML model comprises many, many processes and operations.
Therefore, defining those tasks in a pipeline and automating their execution can speed up business operations and expedite model delivery. It also helps avoid errors from manual human operations. So it's good to automate the whole ML lifecycle as an ML pipeline.

An ML pipeline can cover a portion of your ML system, or it can be the overall end-to-end scenario. For example, MLOps engineers can define a superset pipeline that integrates the data preparation from the data engineers, the model training from the data scientists, and the model serving and monitoring from the operations engineers and software developers — and that becomes the superset, end-to-end ML pipeline.

Once you have an ML pipeline, you definitely need a platform to run it and fulfill each of the ML tasks. Kubeflow Pipelines is a Kubernetes-native orchestration platform for building and deploying portable, scalable ML workloads based on Docker containers. In Kubeflow Pipelines, all of the ML tasks are shareable components with containerized implementations. It provides a user interface, shown on the right-hand side here, that lets you manage and track experiments, jobs, and runs, and an engine for scheduling multi-step ML workflows. It also provides a Python SDK with its own DSL for defining and constructing pipelines and components. Later on we will dive into the pipeline itself, and I hope I'll have time to demo how you run an ML pipeline on Kubeflow Pipelines — fingers crossed, and I hope the network will be smooth.

Here is an example showing how you use the Python SDK provided by Kubeflow Pipelines to compose a pipeline. You simply add the DSL pipeline decorator, imported from the Kubeflow Pipelines SDK, to your Python function, and that decorator converts the function into an ML pipeline. Inside the function, you just lay out the ML operations you need — there are several operations here — and you use the inputs and outputs to connect the dots between the operations. At the bottom, you can see the SDK also provides a compile API, which compiles the pipeline into a pipeline spec. You can then submit the pipeline to the Kubeflow Pipelines server through the API, or run it through the UI I showed earlier. I'll show a rough code sketch of this flow in a moment.

Behind the scenes, Kubeflow Pipelines uses Argo Workflows as the execution runtime. Argo Workflows is a very popular open source CI/CD, container-native workflow engine. Argo implements its own Kubernetes CRD called Workflow and uses a directed acyclic graph, aka a DAG, to construct a multi-step workflow and its dependencies. Each step inside the workflow is an individual container definition. There is also a workflow controller, which monitors the overall Workflow CR, spawns a corresponding pod for each step, and updates the status of each step based on the DAG inside the workflow.
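To make that SDK flow concrete, here is a minimal sketch assuming the v1-style `kfp` SDK — the component functions, names, and endpoint are illustrative, not the ones from my slide. By default the compiler emits an Argo Workflow; with the kfp-tekton compiler, which I'll get to next, the same function compiles to a Tekton resource instead.

```python
# Minimal sketch of composing and compiling a pipeline with the KFP
# Python SDK (v1-style API). Component bodies are illustrative.
import kfp
import kfp.dsl as dsl
from kfp.components import create_component_from_func

def prepare_data() -> str:
    """Illustrative step: return the location of a prepared dataset."""
    return 's3://mlpipeline/data/prepared.csv'

def train_model(data_path: str):
    """Illustrative step: train a model on the prepared dataset."""
    print('training on', data_path)

prepare_data_op = create_component_from_func(prepare_data)
train_model_op = create_component_from_func(train_model)

@dsl.pipeline(name='demo-pipeline', description='Two-step example')
def demo_pipeline():
    prep_task = prepare_data_op()
    # Wiring prep_task's output into train_model_op creates the DAG edge.
    train_model_op(prep_task.output)

# Compile the decorated function into a pipeline spec, then submit it
# to the Kubeflow Pipelines API server (the endpoint is illustrative).
kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml')
kfp.Client(host='http://localhost:8888').create_run_from_pipeline_func(
    demo_pipeline, arguments={})
```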
Besides Argo, at IBM we also integrated Tekton, another popular container-native orchestration engine, into Kubeflow Pipelines. Very similar to Argo's Workflow CRD, Tekton implements its own set of CRDs to represent a pipeline. There are several CRDs involved here: Pipeline, Task, PipelineRun, and TaskRun. A Pipeline contains the Tasks to run for a whole pipeline, and a Task defines a set of steps that you want to perform within a pod. PipelineRun and TaskRun represent the running status of a Pipeline and a Task respectively, and contain the status of the execution, the inputs and outputs, and the final state.

Here is a high-level architecture diagram of Kubeflow Pipelines using Tekton. Users start with the KFP SDK, as in the example I showed earlier, and compile their ML pipeline — in this case, because it targets Tekton, the artifact generated by the SDK is a Tekton PipelineRun YAML. Then you use either the API or the UI to submit the pipeline to the Kubeflow Pipelines API server, and from there the API server delegates the pipeline execution to Tekton. Meanwhile, the pipelines and their metadata are stored in the relational database as well as the object store. You can run the whole thing on Kubernetes, or deploy it on OpenShift, and your pipeline can also leverage all sorts of other services as needed.

Earlier I mentioned metadata and artifacts. From the very beginning, Kubeflow Pipelines has provided metadata and artifact tracking, which are fundamental requirements for enabling MLOps. With metadata and artifact tracking, every input and output of each step in your pipeline is recorded: the metadata is stored in the relational database, and the artifacts go to the object store. Based on this information, you can, for example, know which version of a dataset was used to train your model, compare model training across different runs — it's a little small on the slide, but I can show it in the live demo later — and carry over state from previous models. On top of this, caching built on MLMD can speed up ML pipeline execution.

Here is a screenshot of artifact tracking. When you click on an artifact, you can see all the information related to it, including the pipeline run and where the artifact is stored. There is also a lineage explorer where you can check the versions and histories of the models you trained, and of other artifacts. Like the example here: you can easily find the pipeline name, and you can also find which dataset was used to train these models. So you can use the lineage explorer to find all the information related to an artifact inside the pipeline.

And here is a GIF showing how you open a pipeline run and click on the ML Metadata tab to see all the inputs and outputs for that task. From there you can find an artifact, click on it, and open the lineage explorer to view how that artifact is linked in your pipeline. So, that's metadata and artifact tracking.
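The lineage explorer is essentially a UI over this kind of query. Purely as an illustration of what the MLMD store can answer — this is not the KFP UI's actual code, and the store location and URIs are made up — here is a sketch using the `ml-metadata` client library, walking from a model artifact back to the dataset that produced it:

```python
# Illustrative lineage query against an MLMD store: which artifacts
# (e.g. the training dataset) fed the execution that produced a model?
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = '/tmp/mlmd.db'  # illustrative local store
config.sqlite.connection_mode = 3            # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(config)

# Walk from the model artifact to the execution that output it, then
# to that execution's input artifacts (assumes exactly one URI match).
[model] = store.get_artifacts_by_uri('s3://mlpipeline/artifacts/model')
execution_ids = [e.execution_id
                 for e in store.get_events_by_artifact_ids([model.id])
                 if e.type == metadata_store_pb2.Event.OUTPUT]
input_ids = [e.artifact_id
             for e in store.get_events_by_execution_ids(execution_ids)
             if e.type == metadata_store_pb2.Event.INPUT]
for artifact in store.get_artifacts_by_id(input_ids):
    print('upstream artifact:', artifact.uri)
```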
Now let me get into how we implement the collection of that metadata and those artifacts inside Kubeflow Pipelines. You can see we use an asynchronous agent called the metadata writer. It collects information from the task pods — the inputs, outputs, and other metadata — and stores it into the MLMD store here, which is also a relational database. For the artifacts, we inject an extra step, which we call copy-artifacts, as the last step of every task's container; it collects the artifacts and uploads them to the object store.

On the other end, TensorFlow Extended leverages the same MLMD store — we share the same library — but the information it stores and the way it is used are different. You can see that TFX integrates the MLMD client into every step of the pipeline's tasks and components, so it collects the MLMD information synchronously, which is different from the way Kubeflow Pipelines v1 does it. This is where v2 deviates from v1, and it implies that we needed a better design than the metadata writer. Because of some restrictions, which I will cover in just a bit, and some hard dependencies on the orchestration engine in v1, the Kubeflow community started to work on a better metadata-driven design for v2.

There are two main goals for Kubeflow Pipelines v2. The first is to design the new intermediate representation I mentioned at the very beginning of this session, so that we can compile pipelines into this intermediate representation and use a metadata-driven approach to run them. The second is to decouple the dependency on the backend runtime, so that Kubeflow Pipelines gets more control over pipeline execution rather than mostly relying on specific features of the backend engine.

Let me go through v1 first, and then v2, so we can have a good comparison of what we had in v1 and what we will have in v2. In v1, I mentioned the metadata writer; this is the v1 MLMD design. There are two components. The first is the MLMD service, which provides the CRUD operations to access the information stored in the MLMD data store. The second is the metadata writer I showed in the earlier diagram. It's a standalone process that keeps monitoring changes to pipeline runs at the Kubernetes level, then creates the pipeline context and updates the artifact information of the pipeline run into the MLMD service. Because of its asynchronous nature, when a pipeline task starts, there is no guarantee when that task's metadata will have been stored, so a task cannot rely on the metadata to get information about its upstream dependencies. That's why in v1 we only store preliminary data in MLMD — in this case, only the artifact metadata. As a result, when we try to display a pipeline run's status in the pipeline UI, we need to aggregate it from various data sources, including the pipeline database and the MLMD service. And besides that, the data formats in the data store differ by engine: if you are using Argo, it's the Workflow; if you are using Tekton, it's the PipelineRun.
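To give a feel for what "asynchronous" means here, below is a much-simplified sketch of a metadata-writer-style agent — this is not the actual KFP metadata writer, and the namespace, label selector, and helper function are illustrative. The point is that metadata lands in MLMD only after the watcher observes the finished pod, which is exactly why a downstream task cannot count on it being there yet:

```python
# Much-simplified sketch of an asynchronous metadata-writer-style agent
# watching pipeline pods and recording their results after the fact.
from kubernetes import client, config, watch

def record_metadata(pod):
    """Hypothetical helper: parse the pod's output annotations and
    write them into the MLMD store."""
    print('recording metadata for', pod.metadata.name)

config.load_incluster_config()  # assumes the agent runs inside the cluster
core_v1 = client.CoreV1Api()

stream = watch.Watch().stream(
    core_v1.list_namespaced_pod,
    namespace='kubeflow',
    label_selector='pipelines.kubeflow.org/pipeline-task',  # illustrative
)
for event in stream:
    pod = event['object']
    if pod.status.phase == 'Succeeded':
        # By the time we get here, the task finished some unknown time
        # ago -- downstream tasks cannot rely on this data being present.
        record_metadata(pod)
```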
And here is the new MLMD design in v2. You can see we still need the MLMD service, because this data store is the source of truth — we store all the MLMD information here. But you'll notice the metadata writer is gone. That asynchronous service is gone; metadata handling is now integrated directly into the pipeline execution. Another key change is that we store not only the artifact metadata, like I mentioned, but all the task-related information in the MLMD data store, organized in a tree structure. So you can easily pull that information out of MLMD and reconstruct the ML pipeline from it. Together, these changes mean that when you run a pipeline, a task can leverage MLMD to get the results of previously run tasks: if your task depends on information from an upstream task, that information is already in MLMD, because as soon as a task finishes, its information is immediately stored there.

Here is the other key change in v2: the pipeline spec. As I mentioned, in v1, different backends require different pipeline specs. The upper one on the slide is for Argo — you can see it's a Workflow — and the bottom one is for Tekton. So in v1, the backend actually dictates how you generate the pipeline spec. That's not reasonable: why should an ML developer need to care which custom resource the backend uses?

So here comes the new v2 design. The community came up with a new intermediate representation for the pipeline spec. The SDK compiler no longer generates a backend-specific pipeline spec; it compiles to this intermediate representation, which we call the IR. The data structure of this new pipeline spec is quite easy to understand. It's a little small on the slide, but you can still see it contains three major sections. The first is the components section, which defines all the components used in the ML pipeline. The second is the deployment spec section, which carries the container-level detail telling you how to run each component. And the last is the root section, which stores the DAG information describing the task structure. There's a trimmed sketch of this shape below.

The other new change in v2 is that, since we are using the new IR, the UX now reads its information from the IR, and there are some enhancements on the UI side to improve usability. For example, there is now a breadcrumb trail to help you navigate complex pipelines with nested DAGs: you can zoom in to a sub-DAG, and use the breadcrumbs to zoom back out to the parent DAG.

And again, one of the v2 goals I mentioned earlier is to reduce the dependency on the backend orchestration engine. Because of the new IR and the SDK compiler, the pipeline lifecycle works differently in v2. It still starts with the KFP DSL and SDK, but the output is now the IR, and you submit the IR to the Kubeflow Pipelines service. You could say that in v1 we used a "smart compiler" approach, because the compiler needed to know which backend you were using.
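To make those three sections concrete, here is a trimmed, illustrative sketch of the IR's shape written as a Python dict — a real compiled pipeline spec carries much more detail, and the field names here should be treated as approximate rather than authoritative:

```python
# Trimmed, illustrative sketch of the v2 IR (pipeline spec) shape.
pipeline_spec = {
    'pipelineInfo': {'name': 'demo-pipeline'},
    # Section 1 -- components: every component the pipeline uses.
    'components': {
        'comp-prepare-data': {'executorLabel': 'exec-prepare-data'},
        'comp-train-model': {'executorLabel': 'exec-train-model'},
    },
    # Section 2 -- deployment spec: container-level detail on how to
    # run each component.
    'deploymentSpec': {
        'executors': {
            'exec-prepare-data': {
                'container': {'image': 'python:3.9',
                              'command': ['python', '-c', '<step code>']},
            },
            'exec-train-model': {
                'container': {'image': 'python:3.9',
                              'command': ['python', '-c', '<step code>']},
            },
        },
    },
    # Section 3 -- root: the DAG describing the task structure.
    'root': {
        'dag': {
            'tasks': {
                'prepare-data': {'componentRef': {'name': 'comp-prepare-data'}},
                'train-model': {'componentRef': {'name': 'comp-train-model'},
                                'dependentTasks': ['prepare-data']},
            },
        },
    },
}
```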
In v2, we call it a "smart runtime" instead, because we add a backend compiler in the backend, not in the SDK compiler. The pipeline service can leverage this backend compiler to interpret the IR, compile it into the backend-specific custom resource, and then kick off the pipeline execution. As you can tell, because we have two different backends, we provide two backend compilers: one for Argo and one for Tekton.

The smart runtime is also about how we execute the pipeline. Previously, we directly passed the whole ML pipeline to the underlying orchestration engine, Argo or Tekton. But that has some drawbacks: you are restricted, first, by the pipeline spec, and second, by the capabilities of the execution runtime. So for the smart runtime, we introduce driver, executor, and publisher mechanisms to run the new pipelines in v2. This mechanism is designed to dynamically control pipeline execution and data passing, and to give better MLMD integration. You can see on the diagram that the driver talks to MLMD: it retrieves the parent DAG information, creates the pipeline contexts, replaces the placeholders in the input parameters, and performs the caching mechanism based on the information from MLMD. The job of the executor and publisher is to download artifacts if needed, run the user's task, upload the artifacts after the user's task finishes, and publish the task's metadata into MLMD, including the outputs, inputs, artifacts, and status.

This smart runtime offloads a lot of responsibility from the orchestration engine we are using, and makes the orchestration engine mainly responsible for spawning pods and tearing down pods — that kind of Kubernetes pod lifecycle management.
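As an illustration of the caching the driver performs — this is a sketch of the idea, not KFP's actual driver code, and the fingerprint property name is made up — the driver can fingerprint a task's spec and inputs, then look in MLMD for a previous execution with the same fingerprint before spawning a pod:

```python
# Sketch of a driver-style cache check against MLMD (illustrative, not
# the actual KFP v2 driver). Executions are assumed to be recorded with
# a 'cache_fingerprint' custom property when they are published.
import hashlib
import json

def cache_fingerprint(component_spec: dict, input_params: dict) -> str:
    """Hash everything that determines a task's outputs."""
    payload = json.dumps(
        {'spec': component_spec, 'inputs': input_params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def find_cached_execution(store, component_spec, input_params):
    """Return a prior MLMD execution whose fingerprint matches, if any.

    `store` is an ml-metadata MetadataStore; on a hit, the driver can
    reuse the recorded outputs and skip spawning a pod for the task.
    """
    key = cache_fingerprint(component_spec, input_params)
    for execution in store.get_executions():
        prop = execution.custom_properties.get('cache_fingerprint')
        if prop is not None and prop.string_value == key:
            return execution
    return None  # cache miss: run the task, then publish it to MLMD
```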
Now, when we set out to create the smart runtime, we found that within Kubeflow Pipelines a lot of components were still directly tied to the backend. One extra piece of context: most of the Kubeflow Pipelines code is written in Go, and in Go, if you want to access Tekton's custom resources, you need Tekton's Go packages; if you want to access Argo's custom resources, you need Argo's Go packages. So we also created an abstract interface, so that the pipeline service has no direct dependency on, for example, the Argo Go packages or the Tekton Go packages. Furthermore, because of this abstract interface, if you want to support another CI/CD engine, you can just fulfill the interface and hook into the Kubeflow Pipelines system — then you can run ML pipelines on your own orchestration engine.

Let me go through that abstraction layer in a bit more detail. The changes are already upstreamed to the Kubeflow Pipelines repository, and there are two major benefits. First, in v1, even the frontend UI and the backend services were, like I mentioned, very much tied to the backend orchestration engine: the components you wanted to run against Argo had to be compiled in an Argo-specific version, and to run with the Tekton backend you had to compile a Tekton-specific version of the binary. In v2, the frontend uses the IR, so the logic for rendering the frontend — the pipeline runs and their statuses — is pretty much the same across backends. And on the backend side, because of the abstraction layer, the components listed here are no longer directly tied to the Go packages of a specific backend engine; they use the abstract interface instead.

Here are the detailed abstract interfaces we implemented in the community. There are three parts. The first is the compiler: as I mentioned, the smart runtime needs a compiler to compile the IR into the backend-specific format, so we need a compiler API. For this API, the Argo implementation compiles to the Workflow custom resource, and the Tekton implementation compiles to the PipelineRun. The other two we call the execution client and the execution spec. The execution client is a client used to talk to the Kubernetes API server to retrieve the custom resources and perform CRUD operations against them. The execution spec is a data structure that represents the custom resource the backend engine uses.

These are the details of the execution client. If you are familiar with using the Kubernetes API to work with custom resources, these APIs should look familiar: in our interface, we essentially created a one-to-one mapping to the CRUD, list, and patch methods of the Kubernetes API server. Again, the purpose of the abstract interface is to avoid direct Golang package dependencies in the Kubeflow Pipelines components wherever the custom resources are needed. You no longer tie directly to a specific Golang package; instead, you use this unified interface.

The last piece is the execution spec. When we refactored the Kubeflow Pipelines components that use the custom resources, we tried to minimize the changes to avoid regressions, so we took the APIs those components were already using and merged them into the execution spec. The APIs you see here represent the information in the custom resource and the ways the components manipulate its values.

Because of the IR and the smart runtime in the backend, one potential benefit down the road is ML democratization for ML pipelines: you can share an ML task as a reusable component in the IR format. The community is also working on a draft that defines a component registry, which is a set of APIs to help you search, categorize, and discover ML components. It's in the draft phase, and our team is actively joining the discussion with the community to finalize the details. With the component registry, the Kubeflow Pipelines SDK could directly load components from the registry while you're composing a pipeline. The ultimate goal is to integrate low-code tools with the component registry — you can imagine that, with that kind of integration, you could use a drag-and-drop approach to compose ML pipelines.
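Components can already be shared and loaded as specs today; a registry would add the search and discovery on top. As a hint of what that looks like, here is a small sketch with the existing KFP SDK — the URL is purely illustrative:

```python
# Loading a shared, reusable component by reference with the KFP SDK.
# The URL below is illustrative; a component registry would let you
# search for and discover such components instead of hardcoding links.
import kfp.dsl as dsl
from kfp.components import load_component_from_url

train_op = load_component_from_url(
    'https://example.com/components/train/component.yaml')  # illustrative

@dsl.pipeline(name='pipeline-from-shared-components')
def shared_pipeline():
    train_op()  # use the loaded component like any locally defined one
```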
So I think that's almost everything I want to cover. You've seen how important metadata is, how we use the IR to compose ML pipelines and components, and where the latest code lives — here are the URLs to Kubeflow Pipelines and Kubeflow Pipelines on Tekton.

Now let me show you a quick demo of the E2E MNIST pipeline that you can run on Kubeflow Pipelines. Sorry, I need to find my cursor. Okay — I actually have two systems here: this one, if I remember correctly, runs on Tekton, and this one runs on Argo. If I go to one of the pipelines I uploaded earlier, you can see this is the new UI. These are the tasks you defined in the ML pipeline, and this icon indicates an artifact that will be generated by your pipeline. Normally this pipeline needs about six minutes to finish, so let me see if I can run it now. Here are the parameters the pipeline needs, and I've just started it.

You can see each run gets its own entry, and like I mentioned, you can compare the results of your different pipeline runs. Let's click into this one. You can see it has already finished — but I said it needs six minutes, so why did it finish so quickly? Let me jump back to the run from this morning: its tasks show a check mark. This run's tasks, if you notice, show a kind of download indicator instead. That means the input parameters for those tasks are the same, so the runtime engine can tell it can use a cached version of the output directly — there's no need to run the real task again. It fetches the cached outputs and displays them here, which is why the pipeline run finished so quickly. That's the power of the MLMD data store.

One more thing I want to show you: this system runs on Argo, but when I click into it, you can see there's no difference. I uploaded the same pipeline, but the backends are two different runtime engines. So, based on the work we have done, you really can't see any backend difference.

I think that's it for my session. Unfortunately I don't have time for questions, but you're very welcome to come find me and we can discuss offline. Thank you.