Hi, welcome to our MLOps presentation. My name is Joana Nakfor and I'm part of the AI services team at Red Hat.

What is MLOps, or machine learning operations? MLOps is automating the end-to-end AI/ML workflow all the way to production. Machine learning is everywhere: a large number of smart applications use it, and this has introduced multiple complexities into the software production pipeline. Previously, as we see on the left, continuous integration and deployment was simple: get the code, build it, deploy it, and repeat. With the introduction of machine learning workflows, as we see on the right, two new variables appear: data and models. Both add complexity to the automation process, especially in production. MLOps serves to address and minimize this complexity. In this video, I'll give some examples of how, covering MLOps concepts such as AI/ML pipelines, automating AI/ML pipelines, and model canary rollouts.

The platform we are using in this presentation is called Open Data Hub. It is an open source project that installs an end-to-end AI/ML platform on OpenShift. It incorporates many different open source tools such as Kubeflow, Prometheus, Grafana, Spark, and JupyterHub, and users have the flexibility to pick and choose which components to install. Where can we find Open Data Hub? Open Data Hub is an operator that can be installed from the OpenShift catalog, or OperatorHub. Offering Open Data Hub as an operator is a big advantage, since it provides a simple, easy installation method and comes with operator-level features such as automatic upgrades and metrics gathering.

Who creates or writes the AI/ML pipelines? The answer may vary between AI/ML teams. To make it easier, Open Data Hub includes tools such as Elyra to auto-generate pipelines from notebooks. In many cases, data scientists use notebooks to develop machine learning workloads, and Elyra is a JupyterLab extension that converts notebooks into pipelines. We will demo this feature next.

In this demo, we will show how to use Elyra to convert four notebooks into a Kubeflow pipeline and run it. The Kubeflow pipelines we are using run on Tekton. The four notebooks represent the four steps we see here in the pipeline: downloading data, hyperparameter tuning, training a model, and selecting the best model.

Here is JupyterLab with Elyra enabled. We drag and drop the notebooks we need as steps in the pipeline. We specify the base container image for each step by right-clicking on the step and selecting the image. Some steps require Pandas-based images, while others require PyTorch-based images, since our model is PyTorch-based. We connect the steps by dragging a connector line between them. Once we have all the notebooks set up, we click Run Pipeline, select the Kubeflow Pipelines endpoint, and click OK. This creates the pipeline and uploads it to the Kubeflow Pipelines endpoint to run.

Switching to the Kubeflow Pipelines dashboard, we can see a new pipeline run under the Elyra Kubeflow PyTorch experiment. Clicking on the run shows that the first step, downloading the data, is running. We can go deeper by looking at the pod logs and the pod definition. Now we can see all the pipeline steps running in sequence: hyperparameter tuning, model training, and model selection complete successfully.
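For reference, a pipeline roughly equivalent to the one Elyra generates here could also be written by hand with the Kubeflow Pipelines SDK (KFP v1). The sketch below is only an approximation: the notebook file names, container images, and endpoint URL are placeholders, and it assumes papermill and the notebooks are already baked into the images.

    import kfp
    from kfp import dsl

    def notebook_step(name, notebook, image):
        # Each step runs one notebook with papermill inside the chosen base image.
        # Papermill and the notebooks are assumed to be present in the image.
        return dsl.ContainerOp(
            name=name,
            image=image,
            command=["papermill", notebook, f"/tmp/{name}.ipynb"],
        )

    @dsl.pipeline(name="elyra-kubeflow-pytorch", description="Fraud detection training pipeline")
    def fraud_pipeline():
        download = notebook_step("download-data", "load_data.ipynb", "quay.io/example/pandas-notebook:latest")
        tune = notebook_step("hyperparameter-tuning", "tune.ipynb", "quay.io/example/pytorch-notebook:latest")
        train = notebook_step("train-model", "train.ipynb", "quay.io/example/pytorch-notebook:latest")
        select = notebook_step("select-best-model", "select_model.ipynb", "quay.io/example/pytorch-notebook:latest")
        # Run the four steps in sequence, as in the Elyra pipeline.
        tune.after(download)
        train.after(tune)
        select.after(train)

    if __name__ == "__main__":
        # Submit the run to the Kubeflow Pipelines endpoint (placeholder URL).
        client = kfp.Client(host="http://ml-pipeline-ui.kubeflow.svc:80")
        client.create_run_from_pipeline_func(
            fraud_pipeline, arguments={}, experiment_name="elyra-kubeflow-pytorch"
        )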
Since we used Kubeflow Pipelines on Tekton, we can also head over to the OpenShift Pipelines dashboard and check out the pipeline run. We can see our run, and clicking on it shows all the task runs our pipeline executed. We can also click on each task run and check out its logs.

A more advanced MLOps pipeline setup could look something like this. An MLOps process will almost certainly include multiple pipelines, each for a specific phase of the end-to-end AI/ML workflow. The first pipeline is for data preparation; at the end of this pipeline, we have a new dataset with a specific version. This triggers the next pipeline, which is all about model development. In this pipeline, you will see we added more complex steps such as distributed training, hyperparameter tuning, and model validation. Once this pipeline is done, we have a new model version, which triggers the model deployment pipeline. Monitoring runs constantly, looking for model and data drift and any other cluster resource issues.

In most MLOps processes, pipelines are not run manually but are triggered by events. There are many different types of triggers, from a simple cron job that runs a pipeline on a regular schedule to more sophisticated triggers based on data uploads, data drift, and model drift.

In our next demo, we will show how to trigger a pipeline run based on new data uploaded to bucket storage. We will use Red Hat Ceph Storage as our data store. Ceph storage allows subscribing to bucket notifications based on bucket events such as new data uploads. We also created a monitoring service that subscribes to these bucket notifications and triggers a Kubeflow pipeline run. The pipeline we are running in this demo is the same one from our previous Elyra demo: download the data, tune hyperparameters, train a model, and select the best model.

We first check the Kubeflow Pipelines dashboard and see that we have no runs. Let's take a quick look at the monitoring service code, a small Flask server; a rough sketch of such a service appears at the end of this demo. Here you can see that we import the KFP SDK and connect to the Kubeflow Pipelines endpoint. Once the service receives an HTTP POST message, which is the bucket notification message, it invokes a run of the pipeline named Elyra Kubeflow PyTorch, the same pipeline we ran in our first demo.

Next, we will use this notebook to create and subscribe to bucket notification messages for file uploads; it is also sketched at the end of this demo. We import the needed libraries and create S3 and SNS connections to our Red Hat Ceph Storage cluster. We create a bucket called elyra. We set the topic attributes to point to the Flask server endpoint, telling Ceph to send the notifications there, and create a topic called S4 events. We list the topics to make sure our S4 events topic was created. We then define and attach a bucket notification that uses that topic on every data upload and delete, and fetch the bucket notification configuration to make sure it was created.

Now we upload a file to the bucket; that should trigger a notification to the Flask server, which in turn launches a pipeline run. Let's hop over to the Kubeflow Pipelines dashboard and check whether the pipeline is running. We can see a new pipeline run for Elyra Kubeflow PyTorch, and we can see that the pipeline runs successfully.
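For reference, here is a minimal sketch of a monitoring service like the Flask server in this demo. The endpoint URL, pipeline name, and experiment name are placeholders, and it assumes the pipeline was already uploaded to Kubeflow Pipelines (for example, by the earlier Elyra run).

    import kfp
    from flask import Flask, request

    app = Flask(__name__)

    # Connect to the Kubeflow Pipelines API endpoint (placeholder URL).
    client = kfp.Client(host="http://ml-pipeline-ui.kubeflow.svc:80")

    @app.route("/", methods=["POST"])
    def handle_bucket_notification():
        # The request body is the bucket notification sent by Ceph on each upload.
        event = request.get_json(force=True, silent=True)
        print("received bucket event:", event)

        # Look up the previously uploaded pipeline and start a new run of it.
        pipeline_id = client.get_pipeline_id("elyra-kubeflow-pytorch")
        experiment = client.create_experiment("elyra-kubeflow-pytorch")
        client.run_pipeline(experiment.id, job_name="triggered-run", pipeline_id=pipeline_id)
        return "ok", 200

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)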
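And here is a rough sketch of the notebook that wires the Ceph bucket notifications to that service, using boto3 against the Ceph RADOS Gateway S3 and SNS APIs. The RGW endpoint, credentials, Flask service URL, and uploaded file name are placeholders, and the topic name is only approximated from the demo.

    import boto3

    # Ceph RADOS Gateway endpoint and credentials (placeholders).
    endpoint = "http://rgw.example.com:8080"
    creds = {"aws_access_key_id": "ACCESS_KEY", "aws_secret_access_key": "SECRET_KEY"}

    s3 = boto3.client("s3", endpoint_url=endpoint, **creds)
    sns = boto3.client("sns", endpoint_url=endpoint, **creds)

    # Create the bucket that will hold the training data.
    s3.create_bucket(Bucket="elyra")

    # Create a topic whose push endpoint is the Flask monitoring service,
    # so Ceph sends notifications there (topic name approximated from the demo).
    topic = sns.create_topic(
        Name="s4-events",
        Attributes={"push-endpoint": "http://flask-server.example.com:8080/"},
    )
    print(sns.list_topics())

    # Attach a bucket notification that fires the topic on every upload and delete.
    s3.put_bucket_notification_configuration(
        Bucket="elyra",
        NotificationConfiguration={
            "TopicConfigurations": [
                {
                    "Id": "pipeline-trigger",
                    "TopicArn": topic["TopicArn"],
                    "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
                }
            ]
        },
    )
    print(s3.get_bucket_notification_configuration(Bucket="elyra"))

    # Uploading an object should now trigger a pipeline run via the Flask service.
    s3.upload_file("creditcard.csv", "elyra", "creditcard.csv")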
Our last demo will show how to serve a machine learning model and perform a canary deployment of a new version of that model. In this demo, we will use a fraud detection PyTorch neural network model and serve it with KFServing (now KServe), which will also handle the canary deployment of the new version. Once the model is deployed, users can send an HTTP POST request to the endpoint to get predictions.

Let's take a look at the InferenceService custom resource that is used to serve models in KFServing. You can see that the name of the service is pytorch-fraud and that the model file, called FDDModel, is downloaded from an S3 bucket. We go ahead and create this resource, which creates a pod running the inference service. We can also check the inference service endpoint, which shows that this is the first model version and that 100% of the traffic is going to it.

Now we will introduce a new model version. Take a look at the custom resource file: the name of the inference service is still the same, pytorch-fraud, we are specifying only 20% of the traffic for this new version, and we are pulling a new version, v2, of the model from storage. Let's apply this new resource. This creates a new pod that serves an inference endpoint with the new version, and we can take a look at the pod logs. Checking the inference endpoints, we can see that we now have two versions with a 20/80 traffic split.

To test this canary rollout, we have a script that issues curl commands, sending HTTP POST messages to the inference service for predictions. The loop runs 100 times and sleeps two seconds between requests. Rough sketches of the InferenceService resource and of an equivalent test script are included after the closing. Let's run the script. We see all the curl commands executing, and if we hop over to the pod logs, we can see the received HTTP messages. Flipping between the two pods, we can see that the older model is receiving many more HTTP requests: for every 12 messages sent to the older model, around three are sent to the newer model.

Thanks for watching. We hope to see you in the Open Data Hub community at OpenDataHub.io and at opendatahub-io on GitHub.
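For reference, here is a rough sketch of the canary rollout from the last demo, expressed with the Kubernetes Python client instead of applying YAML by hand. It is only an approximation of what the demo does: the namespace, resource name, and storage URIs are placeholders, and the API group shown (serving.kserve.io) is the KServe one; older KFServing installs use serving.kubeflow.org instead.

    from kubernetes import client, config

    config.load_kube_config()
    api = client.CustomObjectsApi()

    # API group/version for KServe; adjust for the version actually installed.
    GROUP, VERSION, PLURAL = "serving.kserve.io", "v1beta1", "inferenceservices"
    NAMESPACE = "fraud-detection"  # placeholder namespace

    def inference_service(storage_uri, canary_percent=None):
        predictor = {"pytorch": {"storageUri": storage_uri}}
        if canary_percent is not None:
            # Route only this share of traffic to the newly rolled-out revision.
            predictor["canaryTrafficPercent"] = canary_percent
        return {
            "apiVersion": f"{GROUP}/{VERSION}",
            "kind": "InferenceService",
            "metadata": {"name": "pytorch-fraud", "namespace": NAMESPACE},
            "spec": {"predictor": predictor},
        }

    # v1: serve the first model version, which takes 100% of the traffic.
    api.create_namespaced_custom_object(
        GROUP, VERSION, NAMESPACE, PLURAL,
        inference_service("s3://models/fraud/v1"),
    )

    # v2: update the same InferenceService, sending 20% of traffic to the new model.
    api.patch_namespaced_custom_object(
        GROUP, VERSION, NAMESPACE, PLURAL, "pytorch-fraud",
        inference_service("s3://models/fraud/v2", canary_percent=20),
    )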
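And a simple Python stand-in for the curl loop used to exercise the canary split: 100 prediction requests, two seconds apart. The URL, Host header, and feature vector are placeholders for the route created by the InferenceService.

    import time
    import requests

    URL = "http://istio-ingressgateway.example.com/v1/models/pytorch-fraud:predict"
    HEADERS = {"Host": "pytorch-fraud.fraud-detection.example.com"}
    PAYLOAD = {"instances": [[0.31, 1.95, -0.47, 0.12]]}  # illustrative feature vector

    for i in range(100):
        # Each request is routed to either the old or the new model revision,
        # roughly according to the 80/20 traffic split.
        resp = requests.post(URL, json=PAYLOAD, headers=HEADERS, timeout=10)
        print(i, resp.status_code, resp.text[:80])
        time.sleep(2)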