Hello, and welcome to this lightning talk about a component registry for Kubeflow Pipelines. My name is Christian Kadner, and I work for the IBM Center for Open-Source Data and AI Technologies. In this session, I will briefly talk about the stages of the AI lifecycle, introduce Kubeflow Pipelines and some of the reusability challenges with pipeline components, introduce the new component registry, and show the Machine Learning Exchange. And if there's some time, I'll also show Watson Studio Pipelines.

Now, for the stages of the AI lifecycle, we're really talking about datasets and models. In short, we're using data to build models, and those models are used to automate decisions. Each of these stages has multiple steps, and these individual steps are often performed by teams of data scientists, data engineers, and MLOps engineers. And that is part of the challenge: with all of these steps, the process often remains bifurcated among various teams. There's lots of duplication and redundancy, with similar datasets and similar models being created, and there are numerous challenges for traceability, governance, and risk management.

So what's needed, really, beyond the pipeline engine, is a central catalog where data scientists can share their AI and ML assets across organizational boundaries. Ideally, those ready-made datasets and models come with quality checks, proper licensing, and managed tracking, all to increase the speed and efficiency of the AI lifecycle.

Now, I mentioned Kubeflow Pipelines. What is Kubeflow Pipelines? Its purpose is to enable and simplify the end-to-end orchestration of machine learning workflows. Kubeflow Pipelines makes it easy to quickly try out ideas and techniques and to manage trial runs and experiments, and the individual pipelines and components are fairly easy to reuse. The platform itself has a workflow engine, either Argo or Tekton, and the workflows are executed on a Kubernetes cluster.
On the user end, there is a web UI where users can manage their experiments, their jobs, and their runs. They can also connect directly via the Python SDK, which enables integration with Jupyter notebooks.

The individual steps of a pipeline are called components. Components are self-contained pieces of code that perform one step in the pipeline, such as data processing and transformation or model training. They're kind of like a function, with a name, parameters, return values, and a function body. Components are containerized, and they run in Kubernetes pods. The component spec itself is a YAML file that can be authored directly or compiled from the Python DSL. A component has metadata, an interface (the input and output parameters), and an implementation, typically a Python or shell script.

Kubeflow Pipelines has been around for quite a while, and numerous components have been contributed over time by various platforms and vendors, covering all the individual steps of a machine learning workflow. There are components for training models, for model evaluation, components for uploading and downloading files to S3 storage, and components for MLOps-type tasks like sending Slack messages or SNS notifications.

With that rich ecosystem, there are a few problems. They start with authoring and publishing components in the first place. There are multiple ways to create components: they can be authored via Python, the YAML can be specified directly, and in V2 of Kubeflow Pipelines there's also the intermediate representation. Across those multiple ways, there's no feature parity and no consistent way to publish and document components. Hosting is currently GitHub: all the components are part of the Kubeflow Pipelines GitHub repository, and there's no good indexing or searching. Components are not well-versioned, and there aren't many ways to categorize them. Maintenance is another issue.
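To make the "function with a container implementation" idea concrete, here is a minimal sketch of a V1-style component spec YAML. The component name, image, and parameters are illustrative, not taken from the talk's slides:

```yaml
# Hypothetical component spec: metadata, interface, implementation
name: Train model
description: Trains a model on a prepared dataset.
inputs:
  - {name: training_data, type: String, description: Path to training data}
  - {name: epochs, type: Integer, default: '10'}
outputs:
  - {name: model_path, type: String}
implementation:
  container:
    image: example.com/train:latest   # assumed image name
    command: [python, train.py]
    args: [
      --data,      {inputValue: training_data},
      --epochs,    {inputValue: epochs},
      --model-out, {outputPath: model_path},
    ]
```

The interface section plays the role of the function signature, and the container section is the function body: input values are passed as command-line arguments, and outputs are written to paths provided by the pipeline engine.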
Oftentimes, once components are contributed, they're not well-maintained over time. Now, this new component registry, as proposed by the Kubeflow Pipelines team, aims to address some of those problems. There are really two parts to it: the API that is being implemented on the client side by the Kubeflow Pipelines SDK, and the server side that can be implemented by third-party registries. You can follow the design doc linked from Kubeflow Pipelines issue 7382 to find out more. In short, the benefits will be that the YAML format is unified between pipelines and components, components and pipelines can be versioned and tagged, and there will be direct integration in the Kubeflow Pipelines SDK, so users can seamlessly download components from a third-party registry just by connecting to a host, searching for components, finding the versions, and then downloading them.

This slide shows some terminology: templates, versions, packages, tags, hosts. Most of them are straightforward. Here are a few samples of what the REST API to interact with a registry could look like, and what the newly introduced Kubeflow Pipelines SDK methods could look like. The first snippet of the REST API shows how to upload a component; the second snippet downloads the hello-world component with version v1. And at the bottom left, you can see how it could look from the Kubeflow Pipelines SDK to load a component from a registry or to directly create a run from a component.

Now, what might a component registry look like? Take the Machine Learning Exchange, for example; we're working on implementing the new Kubeflow Pipelines registry protocol there. The Machine Learning Exchange is a project between IBM and the Linux Foundation for AI and Data. And here you can see the UI. We have various types of assets. We have pipelines; you can search pipelines, see details, and view the YAML specification. This is still a V1 YAML for Kubeflow Pipelines on Tekton.
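The slide snippets themselves aren't reproduced in the transcript, so here is a small hypothetical sketch of the upload/download interaction described above. The host name, route shapes, and helper names are my assumptions for illustration, not the actual API from the design doc in issue 7382:

```python
# Hypothetical registry client helpers; routes and names are illustrative only.

REGISTRY_HOST = "https://registry.example.com"  # assumed registry host


def upload_url(host: str) -> str:
    """POST target for publishing a new component package (assumed route)."""
    return f"{host}/apis/v1/templates"


def download_url(host: str, name: str, version: str) -> str:
    """GET target for fetching one version of a component (assumed route)."""
    return f"{host}/apis/v1/templates/{name}/versions/{version}"


# E.g. downloading the hello-world component at version v1:
print(upload_url(REGISTRY_HOST))
print(download_url(REGISTRY_HOST, "hello-world", "v1"))

# On the SDK side, the flow sketched in the talk -- connect to a host, find a
# component version, download it, and create a run -- could then be wrapped
# behind SDK methods so users never build these URLs by hand.
```

The key design point from the talk is that this API is split in two: the client side ships in the Kubeflow Pipelines SDK, while any third-party registry (such as the Machine Learning Exchange) can implement the server side.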
And you can even launch the pipeline directly from the MLX UI. After you specify the parameters here (many of them have defaults) and click Submit, you will see the Kubeflow Pipelines run graph with all the individual pipeline steps. For each step, you can look at the input and output parameters, metadata, logs, everything that's available in the Kubeflow Pipelines UI. And we have a very similar experience for components, models, datasets, and notebooks, and even notebooks can be run as part of pipelines.

This is a list of the assets we currently have in the MLX catalog. You can find MLX at ml-exchange.org, and we're constantly working on adding new models and datasets, but this is what we currently have.

You should also check out Watson Studio Pipelines, currently in open beta. As you can see here, it comes with a canvas, and you can drag pipeline components from the component catalog and then run your experiments. It integrates notebooks, can run your AutoAI experiments, and makes scheduling jobs very easy.

Before I leave off, this is a list of links and references. Kubeflow Pipelines issue 7382 has links to the design spec for the template registry API and SDK. For the Machine Learning Exchange, you can find us at ml-exchange.org, on our GitHub repo, and on Slack. Oh, and one more thing: stop by the Code Café or the IBM booth. We have a book signing by Emily Yan, and you can take the little G space rover for a spin. With that, thank you, and see you next time.