Welcome, everyone, and thank you for being here. If you're looking to learn some basics of model serving and how to make both data scientists and software engineers happier when deploying models, you're in the right place. I'm Isabelle Zimmerman. I'm a data science intern with an engineering group at Red Hat, and I'm also an undergraduate student at Florida Polytechnic University. I'm a strong believer that the value of data science does not end with training machine learning models. I hope that by the end of our time together today, you can walk away knowing what model serving is and why it's useful, the features that model serving offers to enhance the business value of models, and how to deploy a model on Red Hat's enterprise version of Kubernetes, OpenShift, using the open source project Seldon Core.

But before we launch into what model serving is, some context is helpful. Because businesses don't have a one-size-fits-all method for how machine learning systems are used, going from raw data to an application for end users can become quite difficult. A model that's built to detect spam versus not-spam emails is going to be deployed differently than a model that's making suggestions for other products to put in your Amazon shopping cart. A data scientist can probably build a well-architected model to give you insights into your data, but in order to gain business value, you probably need the help of a software engineer to deploy it as part of a larger intelligent application.

One of the biggest issues with building this intelligent application is not a hidden secret: sometimes data scientists and engineers don't speak the same language. And it's not the fault of either party. They simply have different goals and they use different tools. Data scientists who build the model want to maximize model performance, while software engineers want an application that builds repeatably and behaves predictably. Data scientists might be using tools such as Python or R, while engineers might be using tools such as Java or YAML. In general, data scientists want to build the model, and we tentatively end our data science workflow at making sure the model is giving us reasonable output. Models are often built in isolated environments such as Jupyter notebooks, with no natural connection to other applications, and this isolation can create pain points. More importantly, model validation is just the beginning of the model performing in the real world.

For a long time, data scientists have been able to build models in isolated Jupyter notebooks, and we're moving beyond this, thankfully. We're moving beyond just throwing models to engineers and saying, okay, have fun deploying this, figure it out. And then weeks or months later, something breaks, and the issue can be difficult to diagnose because it sits at the overlap of the knowledge a data scientist needs and the knowledge a software engineer needs, and it's difficult to find people who know all of those different tools.

Very simply, model serving is what happens after a model has been built. It's saving a trained model, hosting it outside of a Jupyter notebook, and supplying a REST or gRPC endpoint to interact with it.
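To make that concrete, here is a minimal sketch of what hosting a model outside a notebook can look like: serialize the trained model, then wrap it in a small REST service. The file names, route, and payload shape here are illustrative assumptions, not the API of any particular model server.

```python
# serve.py - a minimal sketch of hosting a trained model behind a REST endpoint.
# Assumes a scikit-learn style model was saved earlier with joblib.dump(model, "model.joblib").
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.joblib")  # load the trained model once at startup


@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"instances": [[5.1, 3.5, 1.4, 0.2], ...]}
    instances = request.get_json()["instances"]
    predictions = model.predict(instances)
    return jsonify({"predictions": predictions.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A client, whether a Jupyter notebook or a front-end application, then talks to the model over HTTP instead of importing it directly. Dedicated model servers such as Seldon Core generate this serving layer for you and add the operational features we'll discuss next.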
When you take models out of Jupyter notebooks and deploy them as their own service, you make it easy for data scientists to deploy new versions of the model while keeping your engineers happy, because they can keep the same server architecture and use the same APIs. This new autonomy for your model allows you to more easily leverage all the benefits of a Kubernetes environment. You can automatically integrate load balancers to evenly spread incoming traffic, you can add autoscalers to scale pods as more resources are consumed, and you're leveraging the overall portability and flexibility of containerization.

So everyone starts with the same goal for model serving: I just want an endpoint to send requests to my model. Realistically, that solves the original problem, but it leaves a lot of loose ends. How can you tell if your model is staying healthy? What if your end result isn't from just one model? Can you tell when your customers' preferences have changed? From learning what model serving is, you realize that bringing a model out of a Jupyter notebook into its own service eases pain points in the data scientist to engineer handoff. But hopefully, with the questions I just asked, you realize your model is still missing out on a lot of business value that's just barely out of reach.

As more people become interested in model serving, we now have a lot of really cool open source projects that come with different features to help your model work harder. And I'll repeat myself here: businesses don't have a one-size-fits-all method for how machine learning systems are used. Maybe your deployment needs to be online and serve real-time requests. Maybe your model does everything offline. There are a lot of different model servers out there, but not all model servers are created equal. The first thing you need to make sure of is that you're choosing the right type of model server, one that handles your incoming data the way it needs to be served. Once you have that under control, you can implement all the other fun features.

The first way you can gain more from your models is A/B testing. This means deploying multiple models and routing most of the traffic to your best model while other models are still being explored. So let's say you're in e-commerce: 90% of your customers see your current ad and 10% see a different one. This way you're able to see what type of advertisement drives more interaction. If you see that your new model is getting more traction with customers, you can shift more traffic toward it and gain more insight.

Other times you might find that a structure with many computationally inexpensive models gives higher accuracy than one single model that is both expensive and time-consuming to run. This deployment of multiple models, an ensemble, still gives only one single output: model servers can automatically orchestrate the models and combine their outputs into one prediction during post-processing.

These extra capabilities come in many different flavors. You might be tracking data drift, and the data that you received three days ago, in blue, looks really different from the data you see today, in red. This probably means that your model is not performing as well as it used to. But machine learning works quietly and breaks quietly, so your model can be running with no errors even if it has 0% accuracy, and your data drift might be going unnoticed.
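As a rough illustration of what a drift check actually does, here is a minimal local sketch that compares a reference window of a single feature against recent traffic with a two-sample Kolmogorov-Smirnov test. The arrays and threshold are illustrative assumptions; hosted drift detectors automate this kind of comparison across features and over time.

```python
# A minimal sketch of a data drift check on one numeric feature.
# reference: values the model was trained/validated on; recent: values seen in production.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)  # e.g., the "blue" data from three days ago
recent = rng.normal(loc=0.8, scale=1.0, size=1000)     # e.g., the "red" data from today (shifted)

statistic, p_value = ks_2samp(reference, recent)
if p_value < 0.01:
    # The two distributions look different; the model may be seeing data it wasn't trained on.
    print(f"Possible data drift (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No significant drift detected")
```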
Model serving platforms offer tools that automate checks like this, tracking data drift and alerting you when it's detected. You might also need explainability, which is kind of like playing twenty questions with your model to understand why it's making certain predictions. Here we can see that stop words and the number of words are the most important features for classification in a spam email detector. Another good example of explainability is the story of feeding an image recognition model photos of shoes and asking it to show the user a pair of sneakers. Of course, all the photos of sneakers were of people on grass, so the model was actually looking for grass, not the type of shoe. A picture of a turtle sunbathing in a yard or a pair of heels on a lawn would also be classified as sneakers. Model servers can integrate these explainability algorithms to give feedback on model decision making, which is key for exposing unintended bias in models.

Many of these features, such as ensembling, drift detection, or explainability, are not technically unique to model servers. In fact, there's an abundance of Python libraries that can do this locally. However, putting these algorithms into your machine learning workflow helps you leverage all of those same capabilities of Kubernetes and of containerization: you get that native repeatability and flexibility.

So you've built a model and you know you need to deploy it after validation. You've shopped around and there are certain features that you want: you need a model server that can handle streamed data, and you'd like to know a little bit more about why your model is making a certain prediction. Okay, so now what?

Here I have a complete YAML file to deploy a model using Seldon Core. You can see there are two pods in particular that we're looking at: in blue, we have our classifier, and in red, our explainer. The classifier will classify incoming images, and the explainer, which is an anchor explainer, will give us a little bit more feedback as to why a classification was made. Both of these sit underneath one Seldon deployment, which is going to orchestrate the pods themselves; I'll show a simplified sketch of a manifest like this in a moment.

So this is roughly what my machine learning workflow looks like. We start with Seldon Core, which is what's going to be hosting and deploying our models; you saw the YAML just a few seconds ago. I'm a data scientist and I really love Jupyter notebooks, so I use JupyterHub to interact with this model, make sure it's making the predictions that I want, check up on it, and maybe make edits and redeploy different versions of the model. All of these different versions are tracked, and different metrics are pulled and sent to Prometheus, which is a time series database. All of this time series data is also streamed to Grafana, which can easily visualize the metrics and model outputs in dashboards that are really easy for software engineers to digest when checking up on model health. And, to poke a little fun at data scientists, I don't want to deploy all of these different operators myself, so I use something called the Open Data Hub, which is an AI-as-a-service platform that automatically hosts many different open source projects. Excitingly, everything inside my workflow sits under one operator, so I only have to deploy once.

Let's take a closer look at this in the demo. We'll start in OpenShift with our Open Data Hub operator already installed.
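Before walking through the pods, here is a rough sketch of what a SeldonDeployment combining a classifier and an anchor explainer could look like, written as a Python dictionary and applied with the Kubernetes client. The names, model URIs, prepackaged server, and namespace here are illustrative assumptions, not the actual manifest from the demo; the equivalent YAML file is what a data scientist would more typically keep in a repository.

```python
# A rough sketch of a SeldonDeployment with a classifier and an anchor explainer,
# applied with the Kubernetes Python client. Names, URIs, and namespace are placeholders.
from kubernetes import client, config

seldon_deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "image-model"},
    "spec": {
        "predictors": [
            {
                "name": "default",
                "replicas": 1,
                "graph": {
                    "name": "classifier",                 # the classifier pod (blue)
                    "implementation": "TENSORFLOW_SERVER",
                    "modelUri": "gs://example-bucket/image-classifier",
                },
                "explainer": {
                    "type": "AnchorImages",               # the anchor explainer pod (red)
                    "modelUri": "gs://example-bucket/image-explainer",
                },
            }
        ]
    },
}

config.load_kube_config()  # or config.load_incluster_config() when running in the cluster
client.CustomObjectsApi().create_namespaced_custom_object(
    group="machinelearning.seldon.io",
    version="v1",
    namespace="opendatahub",
    plural="seldondeployments",
    body=seldon_deployment,
)
```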
We can see here that we have our single Jupyter notebook, our image explainer and image classifier Seldon pods, and our Seldon pod manager, as well as our Prometheus database and our deployment for Grafana dashboards. In order to see the full suite of tools offered by the Open Data Hub, we can take a quick wander over to the Open Data Hub dashboard. You can see that it has everything I need for my own personal model serving pipeline, and if you're looking to build more robust workflows, there are other options you can integrate using the Open Data Hub.

But going back, I'm a data scientist and I really enjoy working in Jupyter notebooks, so let's take a look inside. Within my Jupyter notebook, we can see a lot of the pieces of a standard data science workflow. Of course, the model isn't built here, but I'm still able to interact with it. Starting at the very beginning, I'm importing my libraries and gathering my data of Siamese cats. And right here is where I'm actually building a connection between my model and the Jupyter notebook: this is specifically with the Seldon deployment, and here is with the Seldon classifier pod. This is an important distinction. This is where I'm actually classifying the image with my predict function call, and it's the same gateway endpoint that a front-end developer might use to interact with the application. I do have to mention that building a gateway endpoint isn't normally part of a data science workflow, so you can see there are still pieces of model serving that are a little bit clunky.

After I've built my connections and have solid forms of communication, I have my predicted image: I can send my image away and get a prediction back. And yay, I have an image that with 84% certainty is a Siamese cat. However, there's also a 2% chance that this could be a paper towel, and I'd really hate for a customer to look for a Siamese cat and get paper towels instead. So we're going to run a local explainer in order to get some more information. Now, if you wanted to do this using the Seldon explainer pod, you'd just build another gateway endpoint and interact with that pod the same way. Here, using the explainer, I can see eyes, ears, and whiskers, so I'm a little more confident that my model won't accidentally misclassify the Siamese cat.

So this is what a data scientist will look at, but a software engineer might be more interested in things like model health. For those who are more interested in that, we have the Grafana deployment. Grafana is where we're able to visualize what's happening under the hood. Engineers might be a little more interested in how users interact with the model, and here they're able to monitor the deployed model even while the data scientist is working on it in the Jupyter notebook. We can see we have different deployments, different model images, and model versioning techniques, to give you a more robust idea of how your model is performing. We can look at things like prediction API requests per second, latency, different model metrics, and anything else you can possibly imagine and build. Because machine learning doesn't break loudly, and code can keep running with gradual and even unseen model degradation, Grafana allows us to measure this model health, and we can see the requests being sent to the model and how it's responding.
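For a sense of what that notebook interaction can look like, here is a rough sketch of sending an image to a Seldon Core REST prediction endpoint using the default Seldon protocol. The ingress host, namespace, deployment name, and image preprocessing are illustrative assumptions, not the exact code from the demo.

```python
# A rough sketch of calling a Seldon Core v1 REST prediction endpoint from a notebook.
import numpy as np
import requests
from PIL import Image

INGRESS_HOST = "localhost:8003"  # placeholder: replace with your cluster's ingress host
NAMESPACE = "opendatahub"        # placeholder namespace
DEPLOYMENT = "image-model"       # placeholder SeldonDeployment name
url = f"http://{INGRESS_HOST}/seldon/{NAMESPACE}/{DEPLOYMENT}/api/v1.0/predictions"

# Load and preprocess the image into the shape the classifier expects (assumed 224x224 RGB).
image = Image.open("siamese_cat.jpg").resize((224, 224))
payload = {"data": {"ndarray": (np.asarray(image) / 255.0)[None, ...].tolist()}}

response = requests.post(url, json=payload)
response.raise_for_status()
print(response.json())  # class probabilities, hopefully with "Siamese cat" on top
```

If the SeldonDeployment also includes an explainer, a similar request against its explain endpoint returns the anchor behind a prediction; in the demo I ran the explainer locally instead.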
If there was a distinct change in model health, whether it be API latency, global success rates, or any other custom metrics you choose to build, you could clearly see that the model is not performing as expected.

So, do we need model serving? To help data scientists and software engineers speak a little bit of the same language, it helps. And to be able to pinpoint where things go wrong, it helps. I have a few other projects here that you can use for model serving; they're all open source. Most of these projects have a lot of the desirable features like real-time serving, A/B testing, model explainability, and ensembles, and you can explore even more to see what other features exist. We're just beginning to develop more sophisticated and more usable model servers, and I encourage you to check out or contribute to any of the open source projects listed here to help data scientists seamlessly pass information to engineers.

As a data scientist, I should be able to spend the bulk of my time doing data science without also giving engineers headaches. I use the Open Data Hub to simplify my end-to-end machine learning workflow, and I use Seldon Core to serve my model. But the workflow is not perfect: I spent months playing in the world of YAML and building REST gateways, which typically lies outside of a data scientist's realm of expertise. While they're not perfect, there are real incentives to model serving, and I hope I've been able to make some of those clearer today. Thank you for joining, and I look forward to chatting with you and answering your questions.