So welcome to the last talk, I think, of the edge days. We're going to talk about how to deploy AI models to the edge. My name is Miriam Fontanis, and I work as a product manager at Red Hat in the OpenShift AI BU. And I'm Jay Kohler, and I work as an engineering manager on the same team as Miriam. Perfect, so let's start.

So first of all, we set out to bring capabilities for AI lifecycle management to the edge, and the first challenge we had was really defining what the edge was. Basically, we saw that the closer you get to user devices, embedded systems, and microprocessors, the more specific the use case is and the more complex the hardware; there is no way to have homogeneous APIs the way we have in the cloud. So we understood that, in the same way cloud native is a paradigm, edge computing is also a paradigm, one where you have this myriad of compute sitting somewhere outside of the core IT organization. So we tried to design all of the components for the AI model lifecycle thinking about the challenges of putting AI compute on this continuum between the core and the edge. And we found that there were basically three main challenges. The first one is that the closer we move to the user device, the more constrained the resources. And it's not only constraints on compute or on battery; it's also about how you move all of that data generated at the edge back to the core, or to other edge nodes where it can be processed. It's also about IT people at the edge: even if you have a server room in a store, as large retailers doing edge often do, you normally don't have IT personnel there. So you have to account for the fact that whatever architecture or design you have, it has to be very simple, because anything extra usually becomes a burden. The second challenge is that everything really revolves around data: the pull of data gravity and all the concerns related to data sovereignty, data privacy, and data security are core aspects of edge. And the third one is really about management of all of these edge nodes, where most of the time you are working in air-gapped, disconnected environments, or in environments with intermittent connectivity. So you have to account for operations that need to be unmanned, automated, and very resilient, and ideally they have to self-heal.

So with all of that in mind, we wanted to be able to control the whole lifecycle of an AI model at the edge. And borrowing from software engineering, we divided the lifecycle into two loops. The inner loop is everything that happens while the model is being developed: how you gather the data, prepare it, train the model, and experiment, up to the moment where the data scientist feels the model is ready for a production environment; that's when we push the model out to some sort of registry and it's ready to be put into production. And that's when the outer loop starts. There we have a review process and the process to build the actual model image. We are leveraging containers because, even though the edge is really different from cloud native, everybody expects to have pretty much the same experience, with services, a self-service console, and so on, through deployment, serving the model, and model monitoring.
And as we said, all of the designs needed to account for the fact that any of these operations, any of these stages in the lifecycle of the model, could move along the continuum from the cloud to private cloud, to virtual, to physical, to edge. We started with some customers, for example a supermarket chain, that initially wanted to put AI inference in each one of the cameras; then they realized they had hundreds of cameras in just one store, so it was very expensive, and they decided to move the analytics to the server room. So the same service has to be capable of traveling across these different layers. And for that, we found that the best way to provide consistent operations, management, and observability capabilities was to leverage Kubernetes, through OKD and MicroShift.

So this is more or less the lifecycle that we envision for deploying models at the edge. We start with a model that's been trained somewhere; it could be a cloud service from any provider, it could be locally, it could be in VS Code. Once the training part has finished, the next step is to register that model in a registry. For registries, standards are starting to emerge. We have Hugging Face, which is a public registry, mostly for GenAI, LLMs, and things like that. But we also have other model registries for more predictive AI, things like MLflow, and we are actually starting a new project inside the Kubeflow community to have a new open source model registry. Or it could be something as simple as S3, or even Git, to be able to version your model. So once the model is trained, it's pushed out to the model registry and tagged as ready for production. And that's where we think it's very important to be able to automate all of the operations that happen later on. So we are using OKD Pipelines, based on Tekton, to build a container image that has the inference service. We provide model runtimes that are certified and come from secure sources, things like OpenVINO, which can leverage more than just GPUs, because GPUs are very expensive; for some predictive models, mainly used at the edge to do predictive maintenance and visual inspection, you don't really need GPUs, so we also leverage CPU acceleration. What that pipeline finally produces is a set of containers, because it's a microservice architecture where you have not only the model and the model runtime, but also services to do lifecycle management, observability, and monitoring, and all of these services are opt-in. So depending on what layer of the edge you are going to land on, you can have fewer or more of these services. If you have totally constrained resources and you are deploying to an antenna in the middle of nowhere, you can deploy basically just the model and start doing inference there.

So we're going to see a little video of how we build the pipeline. Exactly. So here we have OKD and we are using Tekton pipelines, where an MLOps persona is in charge of triggering the pipeline when the data science team hands off the finished model, or you can have triggers pre-configured so that it's completely automatic: once the model is tagged as ready for production, the CI pipeline is triggered. So we download the model; we have built some integration components for S3 and Git, so the only thing the pipelines need to know is where the model is located.
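To make that registration step concrete, here is a minimal sketch, assuming the registry is simply an S3-compatible bucket accessed with boto3; the endpoint, bucket, and key names below are hypothetical, and the stage tag is just one way a pipeline trigger could recognize a production-ready model.

```python
# Minimal sketch: "register" a trained model by uploading it to an
# S3-compatible bucket and tagging it as ready for production.
# Endpoint, bucket, and key names are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.internal",  # hypothetical endpoint
)

MODEL_FILE = "model.onnx"
BUCKET = "model-registry"                        # hypothetical bucket
KEY = "visual-inspection/1.2.0/model.onnx"       # hypothetical key / version

# Upload the trained model; the build pipeline only needs to know this location.
s3.upload_file(MODEL_FILE, BUCKET, KEY)

# Tag it so a pre-configured trigger (or a human) knows it is production ready.
s3.put_object_tagging(
    Bucket=BUCKET,
    Key=KEY,
    Tagging={"TagSet": [{"Key": "stage", "Value": "production-ready"}]},
)
```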
Once it fetches the model, it clones the container repo and everything it needs to actually produce a container. And here it's very important that these pipelines are customizable, because for edge one of the important aspects is security. So you can do things like sign the container image that's produced, you can encrypt it, you can add extra compliance or regulatory steps to the build. And yeah, that's a little bit of what we're seeing. At the end, what it produces is an inference service that's completely containerized, with all of these additional services for management and observability. And it's OCI compliant, so you can register it or use something like Quay to distribute it to the different edge locations.

Okay, so if you're not lost, or even if you are: what we were just showing is steps three and four here, and what we're getting ready to show now is kicking off step five, which pushes our inference service container image to something such as Quay, or whatever you choose to bring as your image registry, and then kicks off a PR for the MLOps persona to check it, decide whether it's good for production, and merge it. So if you want to show that one. As I was saying, here too in the picture you don't have to use Quay; you can bring your own image registry. You can also automate the PR so that it automatically merges, but in our case we wanted to make sure that the MLOps engineer verifies that it's production ready before it goes out to the edge. And it's really basically that simple. Do you want to say anything on this right here?

Yeah, the key thing here is the declarative nature of both Kubernetes and GitOps. We make sure that nobody at the edge is, you know, configuring something directly through the UI or the CLI. A lot of the use cases we're seeing reflect the fact that there is not, as of today, an edge provider the same way there is a cloud provider. We're seeing cloud providers trying to fill that space as edge providers, like AWS or some telcos, and we're also seeing industry-specific solutions, where people ship out some appliance to a factory and then have no visibility into what is happening there. So to be able to ensure that whatever was intended to run there is exactly what is running, this declarative pattern is really important for us. And the second reason is that GitOps doesn't need a central hub to be constantly connected to the edge node. At the edge you have a controller that acts like the brain, constantly polling and asking the hub: is there a new version? Is there a new version? The communication is inverted, which really fits the edge use case, where you don't know when you're going to be connected or when the edge node is going to phone back. So we found that pattern really helpful. Again, all of this is taking best practices from software engineering and applying them to AI. What we are adding with ODH is an AI platform that lives on top of Kubernetes, so all of this is fully automated. Users don't have to stitch together whichever controller they want to use, which in our case is Argo CD, and they don't have to stitch together the observability and everything else; we have done and automated all of that for them. So once we've built that container image and put it in our registries so it can be distributed, the next piece is the GitOps controller that sits at the edge.
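To illustrate that inverted communication, here is a small sketch of the pull pattern itself, not of how Argo CD actually implements it: an agent at the edge periodically fetches the desired image tag and reconciles the local container against it. The Git URL, image name, and use of Podman are hypothetical stand-ins.

```python
# Illustrative only: the shape of the pull-based reconcile loop described above.
# The desired-state URL, image name, and Podman commands are hypothetical.
import subprocess
import time
import urllib.request

DESIRED_STATE_URL = "https://git.example.com/edge/inference/raw/main/image-tag.txt"
IMAGE = "quay.io/example/inference"

def desired_tag() -> str:
    # The edge asks the hub "is there a new version?" instead of the hub pushing.
    with urllib.request.urlopen(DESIRED_STATE_URL, timeout=10) as resp:
        return resp.read().decode().strip()

def running_tag() -> str:
    # Ask the local container runtime which image tag is currently deployed.
    out = subprocess.run(
        ["podman", "ps", "--filter", "name=inference", "--format", "{{.Image}}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip().rsplit(":", 1)[-1]

while True:
    try:
        tag = desired_tag()
        if tag and tag != running_tag():
            subprocess.run(["podman", "pull", f"{IMAGE}:{tag}"], check=True)
            # ...recreate the inference service container with the new image here...
    except (OSError, subprocess.CalledProcessError):
        pass  # intermittent connectivity: simply try again on the next cycle
    time.sleep(60)
```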
Argo CD is the implementation we use there, because we're working a lot with that community to have agents that can really fit very constrained environments, so they're going to have an agent with a smaller footprint, and it's constantly watching for changes. And then for lifecycle management, we are leveraging OCM capabilities, and that's what observes, at the hub, how many edge nodes you have. So once the GitOps part is completed and the PR has been submitted and accepted, we can see on the OCM console the list of all of the edge nodes that are out there and visualize them, and OCM is already set up to constantly check which of the edge nodes are reporting back, updating their state, and what they have deployed. So we can see here we have just two: the local one, which is our hub or core, and the edge node that's also deployed. So again, none of this is new for deploying things on Kubernetes; we are just taking the same experience to the edge, automating the whole thing, and we are also treating AI workloads like any other workload, so users don't have to use different processes to deploy different things at the edge.

Okay, so as Miriam was saying, OCM handles the management between the core, or hub, and the edge, and of course we have to allow for intermittent connectivity; we're not assuming it's always up. So the OCM spoke will intermittently try to reach back and say: am I still supposed to run? Is there anything I need to do differently? Typically we wouldn't expect that to change, but the key thing is that now that we have our edge deployed, Argo CD sits there and starts polling the GitOps repo to say: hey, is there anything I need to update? And if there is, it can bring in the new image, or whatever it's told to download, push that to the inference service as containers, and those sit there and run. One of the things we're also showing here at the edge is that we've used OpenTelemetry to gather whatever metrics we decide we want to gather. Do we want to monitor the edge health, the CPU, the usage? Do we want to monitor any of the model metrics, whether everything is going okay with the models at the edge? And what's really cool about OpenTelemetry is that it can do some pre-processing of the data, because at the edge you may not be able to send back very much, especially if you have intermittent connectivity. In some instances there may even be sensitive data that you want to filter out before you send anything back to the hub, or you can send it to a separate device, wherever you want to send it. So it's very flexible. In this case we're just showing one instance where we send it back to the core, to Prometheus. So everything running over here is just what we packaged together to be deployed at the edge: in our instance, Argo CD, OCM, OpenTelemetry, and of course our model. Yeah, and this packaging in ODH is the way we found easiest to implement the whole flow for operating a model, but it's not prescriptive. So if we're in an environment where maybe there is no Kubernetes and there's only support for containers, say with RHEL and Podman, we can change the implementation, and instead of using Argo CD we could use something like Ansible Playbooks to automate all of that intelligence, that controller piece.
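Coming back to the telemetry piece for a moment, here is a rough sketch of that kind of instrumentation, assuming the OpenTelemetry Python SDK exporting over OTLP to a local collector that then forwards to Prometheus at the core; the collector endpoint, metric names, and attributes are hypothetical.

```python
# A hedged sketch of emitting a couple of inference metrics with the
# OpenTelemetry SDK; the collector endpoint and metric names are hypothetical.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Export to the local OpenTelemetry collector, which can pre-process the data
# and forward it to Prometheus at the core (or any other receiver).
exporter = OTLPMetricExporter(endpoint="http://otel-collector.edge.local:4317", insecure=True)
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=60_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("edge.inference")
requests_total = meter.create_counter(
    "inference_requests_total", description="Inference requests served at the edge"
)
latency_ms = meter.create_histogram(
    "inference_latency_ms", unit="ms", description="Per-request inference latency"
)

# Inside the request handler, record what happened; attributes carry model metadata.
requests_total.add(1, {"model": "visual-inspection", "version": "1.2.0"})
latency_ms.record(12.3, {"model": "visual-inspection", "version": "1.2.0"})
```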
And if the hardware is very specific, we can add more customization, using other tools like Kustomize, to account for the specificities of the edge nodes and the user's use case. Something else that's really cool about OpenTelemetry is that we are working on adding intelligence, with AI as well, to the telemetry, so we can do things like summarization and anomaly detection; that way you only send information when you know something is happening, or you send the insight instead of all of that raw data. So the dashboards look something like this, let me... And again, this is a representation we built with Grafana, and we're also using Prometheus, but since we are following the OpenTelemetry standards, this could very easily be integrated with other monitoring solutions from cloud providers, or New Relic, Datadog, some of the ones sponsoring this event, and you wouldn't have to do anything to your model or anything else to integrate with those kinds of solutions. So we can see some of the metrics around the health of the process itself. All of these metrics are exposed by the model runtime itself; we actually didn't do anything to that packaging, but we are now working on instrumenting that container so we can get more specific metrics on the model's performance, or pass metadata about the model and its use case. So if it's a predictive maintenance model, we can have metrics very specific to that use case, and we can configure them through instrumentation using, again, the OpenTelemetry standard. I think what we're seeing right now is the OpenTelemetry pod and the CRD, just to showcase that we can send the data to as many receivers as we want; in this case we have Prometheus, and inside Red Hat we use Observatorium, so we're also sending it there. We're also sending it locally to our Prometheus, because maybe at the edge location there is someone managing the infrastructure who needs to know if something's wrong. So you can configure and customize it as much as you want. So I think that's all we have. I don't know if there are any questions. It's hard to see, I can't see, these lights are really intense.

Hi, you had the last mile provider also marked as a potential boundary, because customers tend to use that to air gap their systems. Is all of your stack on one side or the other when you have this boundary, or do you currently assume you don't have that boundary, or have you solved it somehow, by carrying USB drives or something else? Yes, so for this demo we chose a use case based on what we're seeing most commonly in the market, but the components don't all have to sit on one side or the other of that last mile boundary, so we are working on designs that can be portable, and we are leveraging containers to be able to do that. Okay, thank you.

Yeah, very good talk. I was curious whether you've gotten to the feedback loop where you need to retrain based on the data in the field, and what the constraints are around doing that. You mentioned it might be air-gapped or whatever, but have you gotten to exploring that with users yet? Yes, so we have different users. The majority are training the model in cloud environments with cloud providers, because it's the path of least resistance and they are using all these services that basically do the training for them, and there we are working on making all of these pipelines that trigger the steps event-driven.
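Purely as a sketch of how such an event could be emitted: the edge checks a rolling accuracy figure against a threshold and, when it decays, posts a small event to a webhook at the core, for example a Tekton EventListener; the URL, payload shape, and threshold below are all hypothetical.

```python
# Hypothetical sketch: detect model performance decay at the edge and fire an
# event back to the core so a retraining pipeline (or a data scientist) can react.
import json
import urllib.request
from statistics import mean

DRIFT_WEBHOOK = "https://listener.core.example/retrain"   # hypothetical endpoint
ACCURACY_FLOOR = 0.85                                      # hypothetical threshold

# e.g. rolling accuracy on a small set of labeled samples collected at the edge
recent_scores = [0.91, 0.88, 0.81, 0.79, 0.78]

if mean(recent_scores) < ACCURACY_FLOOR:
    event = {
        "type": "model.performance.decayed",
        "model": "visual-inspection",
        "version": "1.2.0",
        "mean_accuracy": mean(recent_scores),
    }
    req = urllib.request.Request(
        DRIFT_WEBHOOK,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Fire the event; with intermittent connectivity this would be queued and retried.
    urllib.request.urlopen(req, timeout=10)
```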
So whenever you have an event, say your model's performance is decaying, the edge can send that event back, and that can automatically trigger either a message to the data scientist telling them they need to train this again, or, in other use cases, it's very common now to find solutions that are like edge in a box, where all of this lifecycle is happening at the edge. So whenever... oh, sorry, sorry. No, so you're saying you trigger an event saying you should go look at it, as opposed to feeding the data out? Okay, that makes sense. Yeah, well, there are other use cases where on the same appliance you're collecting the data. So whenever you see the model's performance decaying, you trigger an event to see what extra data you have gathered during that period of time. And with that, again, you have an overall orchestration engine saying: okay, we have more data, we need to start again with the training, with the model validation, and then we're going to do the CI and the CD and so on. There are other customers that are using the inference data to do labeling. So once you detect that you have outliers, or data that's out of distribution, that the model has never seen during training, you take that data, you have labeling tools close to the near edge, and you start labeling there so you can reuse it to train the model again. So there are multiple use cases, but basically what we want is to give the ability to detect when you need to do a retraining, whether it's because of model performance or just because you have new data, and the ability to automate the whole thing. Any other questions? Well, that's all. Hope you liked it. Thank you so much.