All right, everybody, we're going to get started in just a moment. We'll give another minute for folks to join from the previous sessions.

All right. Welcome back, everybody. We're going to start our next session, which is titled MLOps for AI-Powered Applications: The Journey from Dev to Prod. With us we've got Miriam, and I'm going to turn it over to her.

Thank you so very much. Hi, everyone. My name is Miriam, and I am here to talk about MLOps for AI at the edge. What we have seen with most of our customers is that even though they have really good ideas and a lot of projects for using AI to enhance the experience for their customers, to be successful they need to be able to operationalize AI models, and not only at the edge. It's already difficult to do with regular models and more traditional architectures, but when we are talking about the edge, there are added layers of complexity.

There was a study that asked different companies how long it took them to go from idea to putting a model into production. Half of the respondents said it took between seven and twelve months, which is a lot. A little over a quarter said a year or more, and only 15% said between three and six months. In terms of getting value out of AI initiatives, seven to twelve months is a long time, especially now with agile practices, when application teams are delivering new versions of their applications every three weeks or every month. It's a lot of lost opportunity.

When you look closely at the main things hindering the ability to successfully put models into production, most people think about how to train a model and all of the complications around selecting an algorithm and having a data scientist train it on the data. But really, the complexity of successfully operationalizing a model is in everything other than actually training it. If you look at this picture, the small white square is the complexity of training the model, and everything else around it is architectural challenges that are not really related to the model: how to handle and manage configuration, how to do data collection and lineage, how to do resource management, the serving infrastructure, which is a really big one, making sure your models are doing inference, and monitoring and management. And it's not only that the complexity is in non-AI topics: when Google analyzed the root causes of most of the problems in their machine learning systems, they found that, I think it was around 90% of them, were not AI related; they were related to the infrastructure. So being able to solve all of the things needed to successfully deploy models is very important, and this is exactly what Open Data Hub helps you do.
It's not really about producing models; it's about having the ability to iterate really fast, produce models that you can test in production, monitor them to make sure the predictions are correct, and, if they're not, start the whole process again in an automated, reproducible, auditable way.

If you look at the stages of the process of successfully using AI, you can divide it into four. The first relates to gathering and preparing data: exploration, sanitizing, and, if you're working with real-life streaming data, stream processing. Then you have all of the activities around developing a model, where the main tools are notebooks and machine learning libraries, along with some tools for experimentation. These two stages are sometimes known as the inner loop. Once the model is trained, you go to the outer loop, which includes everything for deploying the model in an application: packaging the model, making sure it has an API or some way to send data in and get an inference back, and the ability to deploy the model where the infrastructure is, whether that's centralized cloud, on-premise, distributed, or even the edge. Then, once the model is deployed and actually serving, doing inference, how do you monitor the model and manage its lifecycle? How do you trigger retraining when the predictions are no longer accurate (there's a small sketch of that step below)? How do you handle versioning of that model? How do you do things like A/B testing and CI/CD? This is known as the outer loop.

These stages are not really that different from what we've been doing for years in regular application development. One of the differences I see is that with projects or applications powered by AI, it's even more necessary to develop the muscle memory and the skills to constantly iterate. It's more important because the models are only as good as the data, the data is a reflection of reality, and reality is always changing. So your application and your model have a short shelf life, because the data is constantly changing. Being able to iterate is a must, not a nice-to-have; it's something you definitely have to do at some point to reflect the current state of the world.

When we talk about the edge and putting things at the edge, what we've learned is that the edge is not really a single location or a single place. It's not something you can pinpoint and say, this is the edge and these are the conditions. It's more like a broad region, a continuum. Devices at the edge of the network can communicate laterally with each other, north to core data services, or south to more low-level services, and all of these devices have different characteristics. There are different layers at the edge: you can have a very constrained, TinyML type of edge, or edge devices capable of running RHEL or supporting some form of containerization, so it's really a spectrum. When we were thinking about designing solutions for the edge, we focused on the characteristics that make the edge unique. Edge computing comes in many forms, but what is common to all of those forms? How is edge computing different from traditional, let's say on-premise or distributed computing, and why do I need to take special measures for the edge?
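Going back to the outer loop for a moment, here is a minimal sketch, not something shown in the talk, of the "trigger retraining when predictions are no longer accurate" step. The accuracy threshold, window size, and `trigger_retraining_pipeline` hook are hypothetical placeholders; a real platform would gather ground-truth feedback and kick off a pipeline rather than print a message.

```python
from collections import deque

# Hypothetical illustration of the outer-loop monitoring step.
# The window size and accuracy threshold are placeholder values.
ACCURACY_THRESHOLD = 0.85
WINDOW_SIZE = 500

recent_outcomes = deque(maxlen=WINDOW_SIZE)  # 1 = correct, 0 = incorrect

def record_prediction(was_correct: bool) -> None:
    """Record whether a served prediction matched the later-observed label."""
    recent_outcomes.append(1 if was_correct else 0)

def should_retrain() -> bool:
    """Flag the model once rolling accuracy drops below the threshold."""
    if len(recent_outcomes) < WINDOW_SIZE:
        return False  # not enough feedback collected yet
    accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return accuracy < ACCURACY_THRESHOLD

def trigger_retraining_pipeline() -> None:
    """Placeholder: in practice this would start a CI/CD or Tekton pipeline run."""
    print("Accuracy degraded -- starting retraining pipeline")

# Example: feedback arrives as ground-truth labels become available.
record_prediction(True)
if should_retrain():
    trigger_retraining_pipeline()
```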
Looking at those common characteristics, the first one we saw is disadvantaged networks. When people talk about the edge, they imagine really remote, hardly accessible locations, but it doesn't have to be; something like a factory floor could be intentionally air-gapped or poorly connected to the outside for security reasons. So disadvantaged networks are a common thing: denied or heavily firewalled, disconnected or with intermittent connections, with low bandwidth, like an oil well where you can only transmit very limited amounts of data, and with really high latency because of the remoteness. At the same time you may need really low latency, because maybe you are operating devices where any failure or blip could actually hurt people. All of these connectivity aspects are one characteristic of this setting.

The second is that these are data-centric architectures. They are focused around the sources of data and on storage nearest to the point of generation. That doesn't mean all of this data is valuable; something very common at the edge, especially when we are processing signals, is massive amounts of low-value data. So these are architectures that depend heavily on massive amounts of data and on being close to the sources, so the processing can happen there.

The third, in some part because of the disadvantaged network, is a different management model. Instead of a centralized control plane at the core managing the workloads, where you push any updates or changes out to the worker nodes, here, because of the limited resources, and because most of the time the priority is keeping the resources for processing or for other more critical tasks, management usually follows a pull model: when the edge node is ready, it goes out, phones home, and asks for the latest updates (there's a small sketch of that loop below). So these are the different characteristics of the edge.

When we started thinking in the community about what we need to solve so our customers, or users, are more successful when deploying at the edge, we decided the main characteristic needs to be flexibility. Any component or any workload that you want to deploy at the edge needs to be able to tolerate these conditions and to fit in whatever layer of this continuum you want it to be in; it doesn't necessarily have to be at the edge, it can be in the core. It has to be portable, and it has to be consistent: the same setup and the same workload that you are testing in the core, on the public cloud, has to deploy exactly the same way at the edge. That way you can decide the optimal locations of services and capabilities based on the trade-offs between the constraints. If it's very expensive to haul all of the data back from the far edge, where a device is constantly emitting signals, and send it over the internet to a core location, then maybe the best location for the processing service, or for the models that are processing the signal, is the far edge. For monitoring capabilities, maybe you want to be able to monitor from an intermediate location.
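As an aside on that pull-based management model, here is a minimal sketch, under assumptions not made in the talk, of an edge node phoning home for its desired state. The endpoint, polling interval, and `apply_update` step are hypothetical; real implementations (GitOps agents, image-based OS updates, device-management services) are far more involved.

```python
import json
import time
import urllib.request

# Hypothetical core endpoint the edge node "phones home" to; in practice this
# would be a GitOps repo, an image registry, or a device-management service.
CORE_URL = "https://core.example.com/api/desired-state"
POLL_INTERVAL_SECONDS = 300  # tolerate long gaps: the link may be intermittent

current_version = None

def fetch_desired_state():
    """Ask the core what this node should be running; fail quietly if offline."""
    try:
        with urllib.request.urlopen(CORE_URL, timeout=10) as resp:
            return json.load(resp)
    except OSError:
        return None  # disconnected or high-latency link: keep running as-is

def apply_update(state):
    """Placeholder for pulling the new container image and restarting the workload."""
    print(f"Applying workload version {state['version']}")

while True:
    state = fetch_desired_state()
    if state and state.get("version") != current_version:
        apply_update(state)
        current_version = state["version"]
    time.sleep(POLL_INTERVAL_SECONDS)
```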
An intermediate location might be, let's say, a server room in a store, where you want to monitor the algorithms that do face recognition on cameras or do visual inspection, something like that. So you have to be able to install the monitoring pieces, the components of your AI platform, on the near edge as well as on the far edge. The most important thing is that users have the flexibility to put the workload where it makes sense.

Thinking about that, we set out to identify some of the deployment patterns we most commonly see for the edge. In some cases there is a data center in the cloud and also a data center that is distributed; it doesn't need to be super constrained, it just happens to be disconnected, and the analytics needs to happen in that distributed data center. In others, you have a data center in the cloud, or some part of your service running at a core location in the cloud, and then a layer of constrained near edge: think of an industrial PC, a server room in a store, or maybe a drone with more capacity. In other implementations you have the core data center and some of the computation done at the far edge, on the device itself. Think of intelligent agriculture, where a very small drone is out there inspecting the fields to determine the optimal levels of water or fertilizer, and it is either doing the computing on the drone and sending the insights to the cloud, or sending the data to the cloud. And the other pattern we see very commonly is turnkey solutions, where you have all of the different capabilities in one box, with the ability to train models, serve them, and monitor them, and you just have this kind of appliance that gives you all of the capabilities needed.

So what are the edge computing footprints we have? If I have an AI-powered workload that I want to run, where can I install it? First we have a compact OpenShift cluster: three nodes, with the worker and control plane roles on the same three nodes. We see a lot of this footprint in telcos or smart manufacturing, where they are not so constrained in resources. You can also have the control plane at the core and one worker node at the edge location; we see this again in telco, or in IT and data-collection gateways. You can have single-node OpenShift, or SNO, for, let's say, in-vehicle or field operations, single-server operation, or disconnected environments, where again the resources are more and more constrained. And finally you have Red Hat Device Edge, which includes RHEL for Edge and MicroShift, which is the smallest of the footprints.
So if you are really worried about resource usage, that is the smallest footprint we have.

What we have seen is that one of the accelerators for the edge is containers. As we said, in order to give maximum flexibility to the solutions we build, there are multiple things you can do. One of them is to reduce the risk when deploying these intelligent applications to edge locations: to make sure that whatever you tested in a connected environment at the core is exactly the same thing you deploy in disconnected environments. Containers are really good for that, because they are very portable and immutable. You can also embrace microservice architectures, so you can bake in some capabilities when the device is not so constrained in resources, things like tracing, monitoring, maybe processing some of these metrics on site, and when it happens that you are more constrained on resources, you can opt out of those sorts of capabilities. Containers are also very easy to track, easy to sign for security reasons, and easy to version, and there is a whole set of tooling around them. So we see containers as one of the best accelerators for the edge.

The other thing we see right now is that there are two approaches to developing applications powered by AI. One approach has a full-stack data scientist who knows not only how to train models but also about infrastructure and about how to develop applications. The other has an MLOps platform team that sets up all of these tools for the data scientists, and the data scientist is more focused on the really hard aspects of training models and optimizing them.

In the story with the full-stack data scientist, the data scientist has the tools to do training and experimentation of models, most of them based on Python or R, with Jupyter notebooks as the development environment, or some other local tooling. Once the model is trained, it just needs to be made available to one of the most important pieces of an MLOps platform, which is the model server. The model server, without any deep knowledge of the underlying Kubernetes platform, is capable of taking that model from that location and figuring out everything needed to run it in production: versioning, handling the requests, publishing an API with multiple communication protocols; some model servers even have capabilities to do A/B testing or canary deployments. It also has a way to determine where the model needs to be deployed depending on the demand, and it can do things like scaling the model out and scaling it down, even decommissioning it, so it handles a lot of the lifecycle, not all of it, but a lot of it, and it also includes some monitoring capabilities. That way, the only interface the data scientist has with the infrastructure is the model server: the input to the model server is the trained model itself, and the output is a deployed model that is already accepting requests.
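To make the model server idea concrete, here is a minimal sketch, assuming Flask, scikit-learn, and joblib, of the kind of inference endpoint a model server publishes on the data scientist's behalf. The route, payload shape, and model path are made up for illustration; real model servers add versioning, multiple protocols, scaling, and monitoring on top of something like this.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical path: a real model server fetches the trained artifact from
# wherever the data scientist published it (e.g. object storage).
model = joblib.load("model.joblib")

@app.route("/v1/predict", methods=["POST"])
def predict():
    """Accept a JSON payload of feature rows and return predictions."""
    payload = request.get_json(force=True)
    rows = payload["instances"]          # e.g. [[5.1, 3.5, 1.4, 0.2], ...]
    predictions = model.predict(rows).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```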
This approach has a lot of advantages, because it really simplifies the task of deploying models at scale. If you have a huge Kubernetes footprint, or a huge Kubernetes cluster, and you want to do things like distributing the workload and scaling the model, you want to reuse that infrastructure for different types of models, not just one, or you want to use it as a centralized MLOps platform across your teams, you can do it very easily with the model server, and it takes charge of all of the multi-tenancy aspects and all of the governance aspects of the model. It can handle some of the model lifecycle without the data scientist having to actively do anything, and it can handle scaling and performance.

Some of the disadvantages with this approach, and this is the reason it's not useful for everything, are that the container you get at the end is not really an immutable container. The model server handles the way it builds that container and how it downloads the dependencies, and that is not something you control. Because it's not immutable, it's sometimes difficult to troubleshoot or debug when the model is being executed in remote locations, and it is not as portable: you cannot be 100% sure that what you tested is the same thing that is going to be executed, because maybe the dependencies are different in different locations. It doesn't follow software engineering principles for building and delivering software, so it's more AI-centric than DevOps-centric, but it's definitely better for the data scientist.

The other approach is when you have an MLOps platform team, which includes not only MLOps engineers. Here, the data scientist again has all of their tools to train the model and experiment. The end product of that training and experimentation is a model, and once training is done and the data scientist is satisfied with the results, they store the model in a model repository, where the model will be versioned and the lineage of all of the artifacts of that model will be recorded, and you will be able to audit them at any point. From there, the MLOps team can build a pipeline where different profiles and different personas can contribute: for example, the MLOps engineer automates the whole process of deploying an immutable container, the security specialist does things like signing the model or handling the certificates and the authentication to deploy in certain locations, and you can even have a compliance specialist. So here it's more of a multidisciplinary team. At the end, what you end up with is, again, a model deployed somewhere in a Kubernetes cluster and running, and you also have some monitoring capabilities for the data scientist and the MLOps engineer, to support the data scientist.
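As a concrete illustration of that model repository step, here is a minimal sketch assuming MLflow as the registry; the talk doesn't name a specific tool, and the tracking URI, experiment, and model names are hypothetical. Logging the model under a registered name records lineage for the run and creates a new version that a downstream pipeline can pick up.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Hypothetical tracking server and names -- purely illustrative.
mlflow.set_tracking_uri("http://mlflow.example.com")
mlflow.set_experiment("visual-inspection")

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registering under a name versions the model in the registry,
    # which is what the downstream deployment pipeline consumes.
    mlflow.sklearn.log_model(model, "model", registered_model_name="inspection-model")
```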
You can have the best of the two worlds. You can give each of the different profiles involved in the process of putting a model into production specialized tools they already know: for the MLOps engineer, for example, things like Tekton pipelines, customized to handle the configuration of the different stages; for integration, tools like Kafka, and so on. What we are trying to do in Open Data Hub is build some abstraction layers so that the data scientist, if they prefer, can still control the whole lifecycle of the model. With these abstraction layers over all of the different skills and capabilities that are needed, security, monitoring, and so on, the data scientist doesn't need to know in depth how they work, just that they are included in the platform. But if there is a team of MLOps engineers operating the platform, they still have access to the, let's say, lower-level tools running on OpenShift. So even if the data scientist defines, let's say, a pipeline to automate the deployment using Python and Elyra, an MLOps engineer will be able to log in to the platform, look at the actual workflow implementation running the pipeline, and see YAML and containers and the things that profile knows. So we think this is the best of the two approaches.

For that, we are trying to identify all of the different tools that are needed depending on the persona, and this is a picture that depicts what you need to have a full platform for deploying models at the edge. There are tools specialized for data capture and algorithm development, for training and AutoML, for conversion and compilation specific to the device, and for on-device performance estimation. Model compression and optimization are really important, because again, we could be deploying to very constrained devices. So this is what we think a full MLOps platform should have.

And this is, finally, how the model lifecycle looks. You have version control capabilities that could be based on Git or on AI-specific tools, or even open source repositories of models like Hugging Face, which is quite popular. Then you have a CI/CD pipeline to package the model into a container image, which then gets stored in a container registry for distribution. And at the edge you have some sort of component that is capable of managing the lifecycle of the model once it is deployed, and as we saw, it follows the approach that is needed for the edge, where it has to constantly pull the changes and the configuration from the core location. So that's what I have today. Thank you very much. I don't know if anybody has any questions.

Thank you so much, Miriam. It looks like we have one question in the chat that we can get to. We've gone a little over time, so I'll make this very quick. We've got a question about enabling GPU support in ODS: does it support the NVIDIA GPU add-on? As I have seen in the docs, it's no longer supported.

Yeah, we are working on having better support for the CUDA framework for NVIDIA GPUs, and we are also trying to work with other providers, like Intel, on acceleration as well. But yeah, at the moment we are working on that capability. It will come.

Well, thank you so much, and we will see you all in the next session coming up shortly.

Thank you very much. Thank you.