So hello everyone, my name is Audrey Resnick, and I'm part of the Red Hat OpenShift Data Science team. As a data scientist, I've had the pleasure, or maybe the torment, of delivering AI/ML models to production. So in a world of Jupyter notebooks, terminal servers, GitLab runners, S2I containers, and OpenShift, you don't know how glad I am to have discovered Open Data Hub. In this presentation, I'll give you some brief background on how Open Data Hub started and what Open Data Hub is, and hopefully I'll have enough time to conclude with a quick demo. It's not going to be a live one, it'll be with slides, but the demo will show you how to deliver an ML model that delves into the world of fraud detection.

Let's get a bit of history on how Open Data Hub started. It started internally within Red Hat as a platform on OpenShift where data scientists could store their data and run their data analysis workloads, hence the phrase "data hub." Fairly early on, it was realized that data scientists' and data engineers' requirements for tools, and really anything to do with AI/ML components, were quite different from DevOps requirements. Data scientists, and I can attest to this as a data scientist, are mostly UI driven. We really avoid using terminal commands, and we expect any of the tools we use to include the AI/ML libraries we're accustomed to using.

Collaboration and sharing are also very important requirements for our workflows to be delivered to production successfully. The main points are sharing machine learning workflows done in notebooks, moving a model to production, and managing the model while in production: monitoring it, making sure your predictions are accurate, and watching for data drift, resource usage, GPU memory, and so on. Those are all very important to us, and these are the things that were combined, as multiple tools and components, to obtain an end-to-end AI/ML platform. Hence, Open Data Hub is not a single application but really a platform with multiple tools running on OpenShift.

So Open Data Hub is really how Red Hat does artificial intelligence and machine learning internally on OpenShift, and we've learned quite a lot from running machine learning workflows there. We still face a lot of challenges and issues that we try to resolve and provide solutions for in Open Data Hub. There are about four of them. First, there are the people in AI/ML projects: there's always a team of data scientists, data engineers, DevOps engineers, product owners, and business developers who need to collaborate and work together. Secondly, sharing and collaborating around AI/ML development is difficult; most of the time it's manual and can really be error prone. Thirdly, another important challenge is the compute resources themselves. AI/ML workloads are compute heavy, and CPU, memory, and storage are not unlimited resources. I think we all know that, and they're definitely not unlimited in any development or production environment we work with. And fourthly, the final challenge, and a very critical one: delivering to production and the production development lifecycle. Sometimes that's not as easy as it sounds.

So today, Open Data Hub internally runs AI/ML workloads such as application log analysis.
In our internal Open Data Hub clusters, we run anomaly detection on multiple Red Hat application logs. We also gather and analyze the cluster logs from OpenShift clusters, and we have an AIOps team dedicated to finding or predicting any issues that may occur there. And finally, we have customer support data: on our customer service side, we store and analyze SOS reports, customer feedback, and many other types of customer data.

So we've gone through the history; let's take a look at what Open Data Hub specifically is. First and foremost, Open Data Hub is an open source project driven by an open source community. It's a collection of tools and components that make up an end-to-end AI/ML platform specifically on OpenShift. The AI/ML workflow starts with prepping the data, transferring it into a data lake or storage, and making it accessible to data scientists. When we look at what the data scientists do, we're really looking at the next phase, which is model development: analyzing the data, picking certain features, creating a model, training it, and then doing some model validation. The very last phase goes into the DevOps realm, and that's moving and serving the model into production. This phase is not a static, one-stop model serving delivery phase; it's a constant optimization phase. The cycle of monitoring, optimizing, and serving continues for the lifetime of your model. And again, at the end of the day, it's that collaboration between your data engineers, your data scientists, your DevOps engineers, and any business developers that you have.

Next, I want to show you a diagram of where you can actually find Open Data Hub. First and foremost, Open Data Hub is an operator that's installed from the OpenShift OperatorHub. You can see that I have an OpenShift screen here, and we can go ahead and choose Open Data Hub. When you look at Open Data Hub, you're able to see the various tools that you might use. If you're a data scientist, you'd be very interested in using JupyterHub. If you're a business analyst, you might be interested in using Grafana to look at some of the results from the model you've deployed.

Now, Open Data Hub integrates open source projects, as I mentioned, into an AI/ML platform on OpenShift. We take all of these different open source projects, such as Kubeflow, and adapt them to run on OpenShift. We package them within an operator and then offer it on OperatorHub. Kubeflow is pretty big and is the central component in Open Data Hub, and we add other components, which you can see on the screen: things such as Grafana, Spark, Prometheus, JupyterHub, Kafka, et cetera.

This slide shows all the different tools and components provided by the Open Data Hub platform; each one addresses a specific functionality in the end-to-end AI/ML workflow. It will look very similar to the slide we saw just two slides ago. First of all, we focus on data analysis. We have storage integration, which could be our Ceph storage, or working with PostgreSQL or MySQL.
We have to have some way of doing data exploration, so we might use Superset or Hue. If we're interested in our metadata, we might have something like the Hive Metastore. Then for big data processing, we may use something like Spark. Those are the things that the data engineer and the business analyst are very interested in.

Then we move on to the artificial intelligence and machine learning side, the data scientist's domain. A data scientist may jump into an interactive notebook, such as Jupyter, and do some of their work in there. If they want to train or fine-tune their model, or work with a distributed model, they may use something like PyTorch, or they may use something like Spark. For the machine learning applications themselves, there may be various libraries they're interested in; in that case, they can use the Open Data Hub AI Library. (I'll show a small sketch of what that kind of notebook code can look like just before the demo.) And then finally, they're going to look at how they can deliver their model, or services for their model, through Kubeflow Pipelines or maybe Airflow.

That brings us to the production side, where we deliver what we've created to the DevOps engineer. For model serving, they may use something like Seldon. To deliver some of the services, they might again use pipelines, such as Kubeflow Pipelines or maybe Argo. And then finally, if we want to take a look at what's going on with our model, we'll use monitoring tools such as Grafana or Prometheus.

Open Data Hub also comes with an ecosystem, provided by Red Hat and certified partners. We built this ecosystem around Open Data Hub to help enable our customers, and we feel it provides them with a faster go-to-market strategy. If we look at product integration, the ecosystem provides tools for tighter integration with Red Hat products, such as Red Hat OpenShift, Ceph storage, and OpenShift Service Mesh, all the way to Red Hat 3scale API management. To get help with some of those items, we offer Red Hat consulting engagements: as part of the ecosystem, we have a dedicated AI/ML consulting services team to help our customers succeed in their digital transformation plans and really accelerate their time to market. A very important part of this is our Red Hat certified partners. We work with third-party vendors to get them certified to use UBI images and certified operators; these partners then become certified partners who provide support for their tools integrated with Open Data Hub, things such as Seldon or Anaconda, or anything we might use for model serving, et cetera. And finally, we have industry use cases. To showcase these integrations, we've built multiple industry use cases showing how Open Data Hub integrates with Red Hat products, such as fraud detection with Open Data Hub and Red Hat Decision Manager.
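Before we jump into the demo, here's a minimal sketch of the kind of notebook code a data scientist might write for a fraud detection model like the one coming up. This is illustrative only, assuming scikit-learn's logistic regression and a hypothetical "transactions.csv" dataset with a binary is_fraud label; the actual demo notebooks aren't reproduced here.

```python
# Minimal fraud detection training sketch (illustrative only).
# Assumptions: pandas and scikit-learn are available, and
# "transactions.csv" holds numeric feature columns plus a binary
# "is_fraud" label. The file and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("transactions.csv")
X = df.drop(columns=["is_fraud"])
y = df["is_fraud"]

# Hold out a test set; stratify so the rare fraud class keeps its ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features so logistic regression converges cleanly.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# class_weight="balanced" compensates for fraud being the rare class.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

print(classification_report(
    y_test, model.predict(X_test),
    target_names=["legitimate", "fraudulent"],
))
```

In the demo, the equivalent work lives in the feature engineering and logistic regression notebooks, and the resulting model is what the pipeline later wraps into a REST service.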
So what I'd like to do now is walk you through a slide demo of how I would do fraud detection within a bank, to give you an idea, or a flavor, of how you can work with OpenShift and Open Data Hub to actually deliver your solution. The first thing we do is log into our OpenShift account, and from there we proceed to the Open Data Hub dashboard.

We're logged in as a developer, and we do any of the navigation with the left navigation panel. Right now we're looking at the topology view. I proceed to the Open Data Hub dashboard by clicking on the ODH dashboard operator and then clicking the Open URL button. We're then presented with the Open Data Hub screen and a large choice of options. As I mentioned, ODH contains a number of tools you can use to build, manage, and deploy your models. We're going to take on the role of a data scientist and work on a fraud detection model, so we click on the JupyterHub card to open JupyterHub and begin programming.

When we open JupyterHub, we first have the option to choose the type of notebook we're going to use. We'll use a basic machine learning workflow notebook that we can use to deploy a fraud detection model. And again, just a reminder, we're looking at legitimate and fraudulent transactions in a bank. We accept the other defaults and choose Spawn to continue. I've already pulled the notebooks in from a Git repository, so when you go into Jupyter you'll see them; in this case, we have the feature engineering, logistic regression, and services notebooks that we use to deploy our fraud detection model.

When we put the model into production, we go back to the OpenShift side and use pipelines. We're deploying the machine learning pipelines into production with OpenShift Pipelines, and we'll see how we can use the services to make predictions. When we go back to the main OpenShift console and select Pipelines, you'll see a pipeline that we've already created. We can click on the pipeline to see its details. Remember, this pipeline is going to help deliver our model. Once the pipeline is finished, we have a model REST service that's built with Source-to-Image, or S2I. At this point, we want to take note of that pipeline service, more specifically its URL, because we're going to be using it; at the bottom, I have a service URL along the lines of pipeline operator, data hub, user one, et cetera. We'll use the requests library in Python to interact with the REST service we've just deployed (you'll find a rough sketch of that code just after this walkthrough).

So if we jump into the Jupyter notebook to interact with our model services, we replace our default host with that generated URL from the pipeline services we have running. Then, if we run our services, make the request, and run our model, we'll see the model making its predictions. In this case, we have a lot of legitimate predictions; on the right-hand side of the screen you can see them under Predictions. That could just mean our bank is doing very well. If we ran this model a little longer, we'd probably see some fraudulent predictions coming up. All in all, that looks very good.
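As promised, here's a rough sketch of what interacting with that deployed REST service from the notebook might look like. The hostname, endpoint path, and payload fields are all assumptions for illustration; in practice you'd paste in the route URL generated by the pipeline.

```python
# Illustrative client for the model's REST service. The URL,
# endpoint path, and payload fields are hypothetical.
import requests

# Replace with the service URL generated by the pipeline, e.g. the
# route shown at the bottom of the pipeline details screen.
SERVICE_URL = "http://pipeline-operator-datahub-user1.example.com"

def predict(transaction: dict) -> dict:
    """POST one transaction to the prediction endpoint and return
    the parsed JSON response."""
    resp = requests.post(f"{SERVICE_URL}/predict", json=transaction, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Hypothetical feature payload; the real field names would depend on
# the feature engineering notebook.
sample = {"amount": 125.40, "merchant_id": 384, "hour_of_day": 14}
print(predict(sample))  # e.g. {"prediction": "legitimate"}
```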
So then what we want to do is take a look, graphically, at what our legitimate and fraudulent transactions look like over time. We go back to ODH and launch Grafana. We log into Grafana, connect to the pipeline service we have running, and then we're able to visually monitor our service for fraudulent and legitimate transactions. I apologize for the screen capture being fuzzy, but what it's showing you, over the course of a day, is the number of legitimate transactions, which should be much larger than the number of fraudulent transactions we're detecting.

So through that very short slide demo, you've seen how to visually monitor a service for fraudulent and legitimate transactions, all by using the Open Data Hub services: we deployed a Jupyter notebook, got our model running from within it, and then went back into Open Data Hub for another tool that lets us see the services our model has running on the back end. And that concludes my demo and my recap of Open Data Hub. I hope you've found this useful, and I look forward to answering any questions you may have.