Good day. My name is Audrey Resnick. I'm a Senior Principal Software Engineer at Red Hat working with the Red Hat OpenShift Data Science team, and today I'd like to talk to you about accelerating MLOps with Kubernetes, CI/CD, and GitOps. The items we're going to look over include a case study in which we deliver an intelligent retail coupon application. We'll look at AI/ML models and the work of data scientists, specifically in the arena of MLOps. We'll productionize an ML model with OpenShift, take a deep dive into GitOps and Pipelines, and then conclude with our demo of the Globex AR Coupons intelligent application.

So let's take a look at an overview of our retail coupon application. What we want to do is give the customer the ability to find merchandise discounts for shirts as they browse clothing in a department store. This was actually a proof-of-concept project that our leadership team gave myself and two of my colleagues. They asked: can you take our existing resources and build an intelligent application using fairly basic MLOps, in under two weeks? With some of the open-source technologies we have, and by reusing an existing model, the three of us were able to develop a proof of concept in just a few days. I'll add the caveat that it is a proof of concept: typically, if you're going to create an intelligent application, vet it thoroughly, and make sure it is well situated in production with retraining and model monitoring, that takes a few months. So I just want to put that out there.

With this application, as mentioned, the user will be able to walk into a department store, pick up a t-shirt (or have someone try one on), and take a picture with their edge device, which will be their phone. That image is sent to our model, and the model determines whether that particular shirt has a discount.
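To make that round trip concrete, here is a minimal sketch in Python of what the edge-to-model call could look like. The endpoint URL, route, and response shape are illustrative assumptions, not the actual service from our demo.

```python
# Hypothetical sketch: send a photo from the edge device (the phone)
# to an inference endpoint and read back the discount decision.
import base64

import requests

def check_for_discount(image_path: str) -> dict:
    with open(image_path, "rb") as f:
        payload = {"image": base64.b64encode(f.read()).decode("utf-8")}
    # Placeholder route; the real one comes from the OpenShift deployment.
    resp = requests.post("http://coupon-model.example.com/predictions",
                         json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()  # e.g. {"label": "shirt", "discount": 0.25}

print(check_for_discount("tshirt.jpg"))
```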
Let's talk about people and collaboration. When we create an intelligent application, we have a lot of people to work with. It goes from the business leadership, who give you a set of goals and metrics; to the data engineer, who gathers your data; to the data scientist, who takes that data and develops a model. That model is handed over to an application developer, who wraps it in a Flask app for deployment, and then to the ML engineer and IT operations, who help you with model monitoring and model management.

Besides the people and the collaboration, you also have to pick some tools. In our case, we decided on a number of data science and MLOps applications and managed services: Tekton, a framework for creating CI/CD pipelines; Kafka, a streaming service we can use to ingest our data; Red Hat OpenShift Data Science, a platform that data scientists, developers, and MLOps folks use to develop, train, and deploy models; Quay, which analyzes and distributes some of our container images; Argo CD, a GitOps continuous delivery tool for Kubernetes; and S2I (source-to-image), a workflow you can use to build deployable container images within OpenShift.

So let's talk about an AI/ML model and the work of a data scientist. Now that we know what the personas are, let's see how they interact within the steps of the model lifecycle: setting goals, gathering and preparing data, developing a model, deploying the model in an application, and model monitoring and management. Business leadership sets the goals and defines the metrics for your project. After that, the data engineer steps in to gather and prepare the data. Whether the data sits in data storage or a data lake, they'll do some data exploration and data preparation, and they may even use something like Apache Kafka for stream processing of the data.
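As a rough illustration of that ingestion step, here is a minimal sketch using the kafka-python client. The topic name, bootstrap address, and consumer group are placeholders, not the configuration from our project.

```python
# Hypothetical sketch: a data engineer consuming a raw stream so the
# records can be cleaned and landed in the data store.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "incoming-images",                        # placeholder topic name
    bootstrap_servers="my-kafka-bootstrap:9092",
    group_id="data-prep",
    auto_offset_reset="earliest",
)

for message in consumer:
    raw_bytes = message.value
    # ...validate / transform each record, then persist it downstream
    print(f"offset={message.offset}, {len(raw_bytes)} bytes")
```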
Once they have a data set they feel is ready, they hand it off to the data scientist. The data scientist sits down and develops, trains, and tests a model. More than likely they'll do this in an IDE such as PyCharm or in machine learning notebooks, using machine learning libraries such as TensorFlow or PyTorch. Once they feel they have a model good enough to answer the questions leadership set forth in the goals, they hand that model off to an app developer, and the app developer deploys the model within an application. This brings the CI/CD and MLOps parts into the model lifecycle; we'll go into those a little further on.

Once the model is deployed, there is model monitoring and management: various alerts if the model starts to drift, model visualization, and so on. This is where the data scientist can step in and decide, "Hmm, the model is not really working the way I expected," at which point the model can be retrained. They may very well decide the model doesn't fit the situation at all and go create a different one. You'll notice at the very bottom that IT operations is involved in all the steps. They support the entire platform all these people are working on, so they have a vested interest in making sure the open-source technology being used is safe for the environment, and that it won't let malware or anything of that sort compromise the integrity of the network they preside over.

With that, let's go to the next slide and look at how we develop a machine learning model. I'm going to give you a demo of this, but first I want to put into context the kinds of steps you may see in an ML operations diagram. It starts with the data store: some location where your data lives. That data is taken in for model training, and you iterate until you're happy with your machine learning model.
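To give a feel for that training step, here is a minimal TensorFlow sketch. The dataset path, layer stack, and labels are placeholders; the actual demo fine-tuned an existing object detection model rather than training this toy classifier from scratch.

```python
# Hypothetical sketch: train a small image classifier on a prepared
# dataset and save it for hand-off to a model store.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32
)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),   # placeholder: clothing vs. not clothing
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_ds, epochs=5)
model.save("arc_model")         # ready to share via a model store
```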
Once you're happy with that machine learning model, you may choose to put it in a model store. The model store saves your model so it's reusable, and it also gives your co-workers the ability to reuse that model if they want it for something else they're working on. Typically you then create an image of that model, meaning you wrap it up so it can be successfully tested and deployed; we usually use containerization for that. We put that image in what we call a registry, which makes it even easier for people to connect to the model and use it for their own data exploration or for solving their own problems.

Let's go ahead with the demo. What I'd like to do is create a machine learning model that I can use for my intelligent retail coupon application. I can get to this platform, which is called Red Hat OpenShift Data Science, by clicking on the launcher icon within the Red Hat OpenShift Dedicated cluster. So I click on Red Hat OpenShift Data Science and log in, and that gives me a platform where I can pick a number of tools to work with. I just clicked on the Explore tab so I can go through the managed services and applications from our various independent vendors and determine which ones I want to use to create my intelligent coupon application. I selected JupyterHub, and that shows up in the Enabled tab. Just to note: the Resources tab gives us access to all the available resources, such as documentation, tutorials, and quick starts, for a number of the different application services and managed services we have available.

I'm going to go back to the Enabled tab, launch JupyterHub, and log in. JupyterHub allows me to choose a specific image that I want to use to build a TensorFlow model. If I didn't want to build a TensorFlow model, I could choose one of the other notebook images, say Standard Data Science, Minimal, or PyTorch. I'm planning to use the TensorFlow image, which has these packages included; if a package is missing, I can always pip install it in the JupyterLab environment that this notebook server spins up.

The deployment size I'm going to choose is Small; we only go up to Small in this public sandbox. Typically, though, the container sizes go up to Extra Large, and they give the data scientist the ability to pick how much CPU and memory they need for their project. We also have the ability to add environment variables. Environment variables are for those things you may not want to store directly in your notebook, such as AWS access keys or passwords. With that, I start my server, and it takes my choices of environment variables, deployment size, and notebook image and spins up a JupyterLab environment, which in this case has all the TensorFlow packages and libraries I need to create a TensorFlow model. Now remember, this JupyterLab environment is an ephemeral one: when I log out at the end of the day, it goes away, and all the resources are freed up for anybody else to use.
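As a small sketch of why those environment variables matter: a notebook can pick up injected AWS keys without ever hard-coding them. This is an illustration under my own assumptions; the bucket and object names are hypothetical.

```python
# Hypothetical sketch: read credentials injected by the notebook server
# and fetch one training image from S3 into the ephemeral environment.
import os

import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

s3.download_file("coupon-training-data",        # placeholder bucket
                 "images/tshirt-001.jpg",       # placeholder key
                 "tshirt-001.jpg")
```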
You'll notice that when I come into this environment, I already have a number of projects I'm working on. That matters when you're working with others on a project: you want not only your own environment, but also a way to share the code that's been developed, and that's done through Git. In this instance we could initialize a new repository for a project, but what I did just before this demo was clone one: I put in the name of my ARC model repository for my retail coupon application and cloned the code into this ephemeral environment.

Within this environment I have a number of notebooks I've been working on. The first thing I do when I'm trying to create a model is explore the data. I can connect to S3, and I can take a look at an image I'm using, because remember, with this coupon application we'll be using our edge device, our phone, to take an image that we feed to the model. In this case I create a tensor and load in a model that I've created. Essentially, I'm trying to make sure that my model, at the end of the day, recognizes certain objects within the photograph I've given it. And this is good: it has recognized the objects. But what I want is to fine-tune the model so that it recognizes just the piece of clothing, and there it is: it has recognized the piece of clothing.

I don't want to deploy a Jupyter notebook into production. So what I end up doing is taking my prediction code and some of the fine-tuning and putting it in a prediction.py file. This prediction.py file, along with some of this other code, is then uploaded to my Git repository when I finish, so that when I return to the OpenShift environment I can go further into MLOps and use some GitOps to create an image I can deploy along with an intelligent application that lets us take pictures and have the model predict on them.
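The talk doesn't show the file itself, so here is a minimal sketch of what a prediction.py serving wrapper along these lines might look like, using Flask as mentioned earlier. The route, model path, and input size are assumptions, not the exact code from the demo.

```python
# prediction.py - hypothetical sketch of a serving wrapper that replaces
# the notebook in production.
import base64
import io

import tensorflow as tf
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("arc_model")   # placeholder path

@app.route("/predictions", methods=["POST"])
def predict():
    # Decode the base64 image sent by the front end.
    img_bytes = base64.b64decode(request.json["image"])
    img = Image.open(io.BytesIO(img_bytes)).convert("RGB").resize((224, 224))
    tensor = tf.expand_dims(tf.keras.utils.img_to_array(img), 0)
    scores = model.predict(tensor)[0].tolist()
    return jsonify({"scores": scores})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```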
Okay, now that we've seen how we can create a model using the Red Hat OpenShift Data Science platform, we need to step back and start considering MLOps, and the question is: how do we automate this process of delivering a machine learning model to production? We've seen how we scope out our goals and metrics, do feature engineering, collect our data, monitor and validate our model, and handle model training and tuning. So the next step is: how do we automate all of that? There are a number of moving parts. Well, there are ways, and we're going to go further into MLOps to automate this process.

Let's take a look at how we can productionize ML models using OpenShift. This starts with a discussion of OpenShift Pipelines. OpenShift Pipelines is a cloud-native CI/CD solution based on Kubernetes resources; it uses Tekton building blocks to automate deployments across multiple platforms by abstracting away the underlying implementation details. Tekton itself introduces a number of standard custom resource definitions, called CRDs, for defining CI/CD pipelines that are portable across Kubernetes distributions. Of course, this means the pipelines are built for Kubernetes, they scale on demand, you get secure pipeline execution, and the whole thing is very flexible and powerful.

With that in mind, let's move on to data and ML pipelines. In our data pipeline, data comes in from a number of sources, which could be different clusters or environments. We want to gather that data in a central location, where we can do feature extraction for our models, and then put the data in a data store so it's readily accessible to our data engineers and data scientists.

So here we are for our other demo. What we're going to do in this demo is create a Kafka instance. Remember, we need some way to handle the data that's streaming in, mainly our images, and to push it to the application and model we created in OpenShift beforehand. So let me share my screen with you. I'll admit that I had to do a little bit of cheating before: we delivered our containerized model, and I broke out the front end and deployed an object detection application made with Node.js. That let me connect the front end to the model we created. So now that we have our model and this front end working together and pulling in some of the pictures, let's get this working with streaming data.

The way we do that is, again, by creating a Kafka instance. A Kafka instance in OpenShift Streams for Apache Kafka includes the Kafka cluster bootstrap server and the configurations needed to connect the producer and consumer services. We'll create this Kafka instance and its related resources in a similar way to how we created our OpenShift containerized image. So we click on Create Kafka instance. I've already created one previously, so I'll just reuse that name: aresnick-object-detection. The cloud provider is Amazon Web Services, I have one region to choose from, and the availability zones are multi by default for our test cluster here. It's going to take a while to actually create that instance. The next step is to create what's called a service account, which is how we connect these applications to our Kafka instance in order to feed data to it. Now, unfortunately, even once the instance was created, I found out (and I'm sure none of you have ever run into this before) that all the licenses for this particular test cluster had already been taken.
So what we're going to do is step through the remaining items, but from a workbook that I have; let me bring that workbook into view.

Remember, I created that Kafka instance. Next I click on Connection, and from there we create the service account. I would click Create service account and give it a specific name; in this case I might use my username, something like aresnick-kafka-sa, and create it. What happens then is that I'm granted certain credentials, presented in a secure location, and once I've copied those over into Notepad, I save them so I'll have access to them later on.

After creating the service account, I also have to set an appropriate level of access for it, and that's done in the access control list, or ACL, of the Kafka instance. So I go into my instance, open the Access tab to view the ACL, click Manage access, and use the Account drop-down to select the service account I previously created, then click Next. This is where, if you saved those credentials from before, you'd look them up to confirm the account actually matches. I then manage the access for it: review any existing permissions (we wouldn't have any yet) and assign permissions for our topic, consumer group, and transactional ID, then save them.

Then we create topics; it's the Kafka topics that let us start producing and consuming messages in our service. From the Kafka Instances page of the web console, we click the name of the Kafka instance we created earlier and click Create topic. We'll use these topics to connect to the application services we created before within Red Hat OpenShift (and of course that instance has timed out, but that's okay). Once we create those topics and save everything, we'll actually be able to use our app and take a picture. Because we have this Kafka service now, that picture goes through the Kafka service, which is connected to the application front end I created with Node.js. From there, a REST API call is made to our model: the photographed image goes to the model, the model does the detection, and the results are sent back through the Kafka streaming service so that we can look at them later on.
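To connect the dots, here is a minimal sketch of the producing side of that flow with kafka-python, authenticating with the kind of service-account credentials we saved earlier. Every name here, and the SASL settings, are placeholders for illustration.

```python
# Hypothetical sketch: publish a captured image to the Kafka topic that
# feeds the object detection service.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="my-kafka-bootstrap:443",   # from the instance page
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="aresnick-kafka-sa",      # service account client ID
    sasl_plain_password="<client-secret>",        # saved from the console
)

with open("tshirt.jpg", "rb") as f:
    producer.send("incoming-images", f.read())    # placeholder topic
producer.flush()
```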
So let's close out this part of the demo. Sorry that we weren't able to give you a fully live demo, but that's what happens when everybody's scrambling for resources. Let's go back to the presentation and talk a little bit more about MLOps and OpenShift.

Okay, now we're going to move on from our data pipeline to the machine learning pipeline. So what is a machine learning pipeline? The definitions that float around say that a machine learning pipeline is a means of automating the machine learning workflow by enabling your data to be transformed and correlated into a model that can then be analyzed to achieve your outputs, or your goals. This type of pipeline makes the process of feeding data into the machine learning model fully automated. You can create the resources required to run a machine learning pipeline by setting up a data store, as we discussed, to access the data needed in your pipeline steps; configuring a dataset object to point to persistent data that lives in, or is accessible from, that data store; and setting up the compute targets on which your pipeline steps will run.

Okay, let's talk about the OpenShift GitOps pipeline. The OpenShift GitOps pipeline ensures consistency in your applications when you deploy them to different clusters in different environments, such as development, staging, and production. It organizes the deployment process around configuration repositories and makes them the central element, and it always involves at least two repositories: the application repository with the source code, and the environment configuration repository that defines the desired state of the application. These repositories contain a declarative description of the infrastructure you need in your specified environment, along with an automated process to make your environment match that described state. Specifically, OpenShift GitOps helps you automate the following tasks: ensuring your clusters have similar states for configuration, monitoring, and storage; recovering or recreating clusters from a known state; applying or reverting configuration changes to multiple OpenShift Container Platform clusters; associating templated configuration with different environments; and promoting applications across clusters, from staging to production.

Now, with that, let's take a look at the release pipeline. Once your intelligent application has been containerized, the next thing that has to happen is putting the release pipeline in place. The focus is that any time new changes land in our Git repository, that triggers something in our ML service: our intelligent application may be changed by new code we add, or just by some finer tuning, and it will redeploy. At that point we can again look for model drift, and if we need to, we can always take our model offline and retrain it. Or, if the results coming from it are not at all what we want, we can pull the model completely offline and rebuild it.
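The talk doesn't show how the drift check itself is implemented, so here is one deliberately simple sketch of the idea: compare the model's recent prediction confidences against a baseline and flag when they diverge. The baseline and threshold values are made up for illustration.

```python
# Hypothetical sketch: a crude drift check that could gate retraining.
import statistics

BASELINE_MEAN_CONFIDENCE = 0.90   # placeholder, measured at release time
DRIFT_THRESHOLD = 0.15            # placeholder tolerance

def has_drifted(recent_confidences: list) -> bool:
    """True if mean confidence has moved too far from the baseline."""
    current = statistics.mean(recent_confidences)
    return abs(current - BASELINE_MEAN_CONFIDENCE) > DRIFT_THRESHOLD

# If this fires, the pipeline can take the model offline and retrain it.
if has_drifted([0.62, 0.55, 0.71, 0.48]):
    print("Drift detected: trigger the retraining pipeline")
```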
This final diagram puts everything together that we've talked about. We started with some data that we ingested, and a data scientist created a model for us, so the model has been trained with that data. The model may be stored within a model store. We can create an image out of that model using the OpenShift platform, and we can test that image to make sure there's nothing in it that will crash our system. At this point it looks valid for the goal we've set, so we can register that model image within a model image registry. Once the model is there, our application, created with GitOps and tied to the model, can have a trigger for any new model changes. We can put our machine learning service into play and have our intelligent application ready for use, deployed, and synced. At that point, with everything online, we just monitor our model, make sure there's no further drift and that it's performing as expected, and we can continue this loop as many times as we want, until our machine learning model is operating the way we expect.

This takes us to the intelligent application that my colleagues and I created. Remember, again, this is a proof of concept, and it's an example case for MLOps with OpenShift, GitOps, and Pipelines. The demo is, once more, for a retail application. What I want you to do is take your phone and point it at the QR code. That will open up a camera automatically, and you can either take a picture of the t-shirt here in this diagram or get somebody to stand in front of you and take a picture of the shirt they're wearing. At the end of the day, out of all the objects being displayed here, we want the model to recognize only the clothing and apply a discount to it.

You should have ended up with something very similar to this: after scanning the QR code, a camera feature comes up, you take a picture of the clothing, and you'll see that the clothing is associated with a discount. This shows that our model has successfully recognized a piece of clothing; the code we've used for this application then applies the discount. More importantly, the pictures you're all taking are streaming through our Apache Kafka services, our data pipeline. That information is fed into our model through our model pipeline, and we're given a result via the intelligent front end, or app UI, which we've stored in a Git repository that, guess what, has been brought in through OpenShift and associated with our model.

Thank you for your time today. I'd like to give you the opportunity to try out the Red Hat OpenShift Data Science platform for free. If you're interested in seeing how you can develop, train, test, and deploy a model, go to https://developers.redhat.com/products/red-hat-openshift-data-science. I hope you enjoy the rest of your conference.