Thank you so much for coming to the session. I see a lot of familiar faces, and I hope you like it. We're going to talk about how to choose the best Kubernetes to deploy AI workloads at the edge, and a little bit about how to do that deployment itself. My name is Miriam Fontanes, and I'm a product manager in the OpenShift AI BU at Red Hat. And I'm Jacqueline Kohler, a senior manager for the engineering team at Red Hat.

What we're going to see in this session is, first of all, our definition of edge. We know that everyone has their favorite, but we want to tell you where our minds are on that. Then some use cases we have seen with different telco customers that can be used as a pattern for other industries, and the challenges we have seen when they try to productize AI workloads: the challenges with serving, managing, monitoring, and observability of these workloads at the edge. We'll look in a little more detail at the lifecycle of the model and how we solve some of the management and observability challenges, and at the end we want to give you the general picture of how we have implemented our solution.

So we're going to start with what edge is. That is a very overloaded term, and we decided to take the most simplistic approach to it: edge is everything that is outside of the data center. There are a lot of companies that have been doing edge for a very long time and don't call it edge. Retail, for example, has had servers in stores for a very long time and never thought about it as edge. But now we're seeing that it's more and more important to put powerful computing at the edge, and that's what edge computing is. The reason we're seeing a lot more interest in edge computing lately is all of the challenges you have when your data sources are very far from where you want to do the computing, and all of the costs associated with moving that data between the source and wherever you want to do the analytics. Those costs can be monetary, because moving all that data is expensive, but also in terms of latency: when you have applications that require really low latency, like VR, video games, or things that interact with end users, latency is super important. And at the edge we tend to see a lot of high-volume data with very low value. So we basically find challenges related to all of that.

As we talked with customers more and more, we were able to identify three main characteristics of edge. The first one is the constraint on resources. It's not only about bandwidth and latency, it's also about CPU and compute. We think edge computing is very complementary to cloud computing, but where in cloud computing you have this infinite pool of resources, at the edge you are very constrained by the actual physical space and the location of your servers. Closely related is one of the biggest challenges: connectivity. For security reasons, with a lot of our customers we see that these environments are either air gapped or intermittently connected, so always being able to reach them to manage them or to get data out of them is really challenging. The second characteristic is that these environments tend to be data-centric. Data gravity is really what is pulling all of this computing to some spot along the core-to-edge continuum.
And when it comes to data, a lot of our customers are really worried about data sovereignty and data privacy, and about being able to do some sort of computation on that data before moving it. So that's another thing we're seeing over and over. And the third one is around management. What really sets apart plain distributed computing from edge computing is that another resource that's really scarce in these edge locations is IT personnel. Even in a retail store you don't see a lot of IT people, and on a factory floor, on an oil rig, in all of these places you don't really have IT. So you need operations that are fully automated, resilient, and unmanned; they have to have some degree of autonomy, with some of the management done at the edge node and some at the core. Those are the three main things we saw repeated over and over as we talked to more customers. So we're going to talk now a little bit more about the challenges for these specific use cases.

OK, so yes, we're going to talk about some of the challenges for the use cases specifically. In our scenario we focused mostly on telco for our solution. As you know, 5G is enabling a lot of new business and service delivery models, and there are a lot of different products and services from multiple infrastructure and software vendors.

The first thing we'll cover is manageability. With all the edge nodes that are being added, there are challenges with managing them across different geographical areas and compute zones. And with 5G networks, you want to ensure that you're delivering on the promise of the bandwidth: that throughput is high, that latency stays low, and that you're using your capacity efficiently. Some of the things to consider are making sure you have automation in place to manage the edge nodes. You want to be sure you can remotely provision them over your WAN. You need a single pane of glass somewhere, also known as a dashboard, so that your IT folks, if you have them, can watch to see if any alarms are going off and what metrics are being sent back. You want to make sure you can remotely upgrade those sites, and you have to ensure that all of that will work and stay compatible across the different versions of software throughout the infrastructure. You also want resiliency, so that even with intermittent connectivity you can still run your workloads at the edge.

The next thing is latency, and I think this one is easier to explain with an example: think of driverless cars. If you have a cloud that's dozens of milliseconds away, that can be a lifetime, especially when you have people in big metal cars that can't react in real time. One of the things you can do to solve that is to put many data centers sprinkled throughout the city.

Security is another big one with edge. You want to make sure you're encrypting the communication between the core and the edge applications. You can also make sure that each edge device sits on its own isolated network. You want to ensure that the edge, whenever it has connectivity, is running the latest software, and that you don't have a single source of authority.
So if someone is able to compromise one of the nodes in your network, you obviously don't want them to be able to compromise the rest.

Next is the value of data. As things get distributed, where you used to be able to look at that data in real time, at the edge the longer that data sits out there without being processed or anything being done with it, the more value it loses. So you want to be able to do some processing and some analytics at the edge, and hopefully when we get to our solutions you'll see how we're trying to solve some of these problems.

The next is AI and ML models at the edge. Each edge site may have a different use case, so the number of models you may need, each with different data, starts to grow exponentially, and so does the work of building those models. Telemetry we've touched on just a bit already: you need to be able to gather, store, preprocess, and forward those metrics before you lose the value of that data. And then network automation. This is key too: you need to be able to apply configurations even during intermittent connectivity, so that you're meeting your customers' SLAs as well as making sure you're using your resources correctly. And I think that takes us to our next part.

So we've talked a lot about the problems and the challenges. Now is the part where we give our take on what could be a possible solution to these challenges for telco customers, one that applies to other industries as well. First of all, we're talking specifically about AI workloads. If you look at the lifecycle of a model, we have borrowed from software engineering and divided that lifecycle into two loops. You have the inner loop, which is everything that happens before a model is ready to be deployed into a production environment: everything from gathering the data, experimenting, evaluating the model, the data scientists training with whatever tools they want in whatever environment they want. Once they decide they have the optimal performance for that model, the idea is that they store the model in some sort of repository from where they can push it out and hand it off to operations, who are in charge of actually putting that model into production. That includes tasks like reviewing whatever new model is coming, building it, deploying it, serving it, and monitoring it for things like data drift and model performance decay, to be able to re-trigger the whole inner loop again. In application development this is very important because you want to constantly enhance your application, but for AI models it is crucial to really have models that reflect reality and are performing well.

When we started thinking about all of these challenges at the edge and the model lifecycle, what we wanted to give our customers was the flexibility to perform any of these tasks anywhere in that continuum from the public cloud to the edge. So if, let's say, your inference needs to happen in real time, you want to place the serving part of the lifecycle into maybe a camera, or maybe a pole on a street where all of the cameras feed their stream of data. That's something we've had in mind, and we decided the best way to do that was to use container technology: Kubernetes, which keeps being shrunk down further and further. We just attended a talk about Kubernetes on mobile devices, and we have MicroShift for very small footprints.
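To make that concrete, here is a minimal sketch of what a containerized inference service could look like on a small-footprint cluster such as MicroShift. The names, image, and resource numbers are illustrative placeholders, not the actual workload from this talk.

```yaml
# Hypothetical example: a model baked into a container image and served at the edge.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: camera-inference
  namespace: edge-inference
spec:
  replicas: 1                        # edge nodes are resource constrained, keep it small
  selector:
    matchLabels:
      app: camera-inference
  template:
    metadata:
      labels:
        app: camera-inference
    spec:
      containers:
        - name: model-server
          image: quay.io/example/camera-inference:v1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          resources:
            requests:                # modest requests for constrained hardware
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```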
So that's the idea. Then, when we started looking at the individual challenges of managing the lifecycle, we came across data points telling us that 70% of businesses said that management at the edge was the biggest challenge they had when deploying workloads there. And it's even more difficult because the farther you go from the core, the more specific the environments are to a particular use case or a particular user. The hardware is not homogeneous, and you don't have the really nice interfaces the cloud gives you for operations. So we decided to try to find a solution for that.

When we looked at what the most complex challenges of managing the edge actually are, number one is the sheer number of edge nodes you have to manage. People used to cloud operations maybe manage hundreds or thousands of nodes; when you go to the edge, that can very easily multiply to tens of thousands or hundreds of thousands. So there's the scale your management capabilities have to handle. The second one is the complexity of the applications and of the environments themselves. Again, it's not the same to have GPU-accelerated hardware that can live in a drone as to have a factory PC with different compute, maybe no UI, maybe no acceleration at all. All of these heterogeneous environments need a consistent way to be managed. The next one, again, is disconnected operations: it's really hard to manage something you cannot see all the time, so any management solution has to account for that. And then compliance: how do you enforce that what you intend to deploy is really what's deployed at that far edge location?

So when we were thinking about it, we came up with the approach of applying GitOps to these management challenges. It really fit the nature of the problem we have. First of all, since we're using Kubernetes, it was a very natural fit, and GitOps gives us a lot of benefits for managing the lifecycle of the model and deploying it. The first one is that it's a declarative approach. You declare the desired state of your inference service container in a manifest, and by automating the whole upgrade process you're assured that nobody at the edge site is manually configuring something through a command-line tool or a UI. You're ensuring that you have a controller at the edge node that is constantly reconciling this desired state with what is actually installed, and if somebody makes a change, this controller makes sure it goes back to the compliant state.

The next thing is that it has an approval process built in. It all happens the same way as in application development: you go through a PR process, you can have a human in the loop approving the PR, or you can have automated processes do the merge automatically. What's important is that you always have these mechanisms, these gates, to make sure you don't just send anything into a production environment. Version control is also really important, because especially for edge nodes we see that right now there is no equivalent of a cloud provider. When you run cloud computing, you have the three big ones; you just ask for compute resources and you have them. There is no such thing in edge computing.
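To illustrate the declarative, version-controlled desired state just described, the GitOps repository could hold something as simple as a Kustomize overlay per edge site, where rolling out a new model version is nothing more than a pull request that bumps an image tag. The repository layout and names here are assumptions for illustration, not the actual repo from the talk.

```yaml
# gitops-repo/sites/edge-site-a/kustomization.yaml  (illustrative layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                     # shared Deployment/Service for the inference service
images:
  - name: quay.io/example/camera-inference
    newTag: v1.1.0                 # the CI pipeline opens a PR that bumps this tag;
                                   # merging that PR is the approval gate
```

With this kind of layout, the Git history is also the audit trail: every version and configuration that ever went to a site can be read straight out of the repository.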
So if you want edge compute, either you own all of the edge nodes at the site, or you have third parties that deliver what is now known as edge in a box: they deliver the box into your environment, and you manage some of it, but it's really more like a black box. And you have cloud providers trying to jump into this business model as well. The important thing is that, no matter who is managing it, there has to be somewhere you can audit the different versions and configurations you have implemented. And the last one is that GitOps moves the management responsibility down to the edge node; it's at the edge node where the controller sits, checking constantly. You get eventual consistency, and that allows you to scale better. There are a lot of community projects right now working on shrinking the agents that do GitOps further and further, with Argo CD or with OCM. So we found that it fit really well.

So basically what we had was a flow, something like this, for deploying new versions of AI workloads. You have someone, presumably a data scientist, delivering new versions of the model and pushing them to a code repository. From there, you have somebody reviewing those PRs, and once they tag a version as ready for production, you have a CI pipeline that you can customize. The most basic pipeline will just download the model and all of the dependencies and parameters it needs, making sure you're using a secure supply chain to grab all of those dependencies. It does some on-device testing, which is also very important when you want to make sure that the performance you measure while testing is the same you will get while the model is flying out there on a drone or sitting on an oil rig. Then it containerizes the inference service. Once all of these steps are concluded, it puts the new inference service container image into a distribution repository, and it also publishes a new version of the desired state of the model into a Git repository. And then you have the key piece of all of this, which is the controller that has the intelligence to be constantly polling the repository, searching for new changes. That controller is also very specific to the environment where it lives: it's the one that knows, "I'm living on this device, so I have to do things a certain way," whether that's applying the manifest through a command-line tool, a playbook, or a script. So that controller is really key to our solution.

And these basic building blocks are not tied to any specific technology, so depending on your use case you can really use all of the diversity of technologies we have. For example, for the pipelines we use Tekton, but you can also use things like Jenkins or any orchestration engine you already have. For the distribution repository, we use an OCI-compliant registry, because it has a lot of utilities and tooling to do the distribution to remote locations. For the management itself, the implementation of GitOps, we use OCM, and the pull feature it has again fits really well, because the communication is always from the edge node to the core, asking for new changes.
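A rough sketch of that kind of pipeline in Tekton could look like the following. The Task names and parameters are placeholders for whatever fetch, test, build, and Git-update steps you already have; they are not the exact Tasks used here.

```yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: build-inference-service
spec:
  params:
    - name: model-uri              # where the data scientist published the model
    - name: image-repo             # OCI registry for the inference container image
  tasks:
    - name: fetch-model
      taskRef:
        name: fetch-model          # downloads the model and its dependencies
      params:
        - name: uri
          value: $(params.model-uri)
    - name: test-model
      runAfter: [fetch-model]
      taskRef:
        name: run-model-tests      # on-device (or device-like) smoke tests
    - name: build-and-push
      runAfter: [test-model]
      taskRef:
        name: build-image          # containerize the inference service and push it
      params:
        - name: IMAGE
          value: $(params.image-repo)
    - name: update-gitops-repo
      runAfter: [build-and-push]
      taskRef:
        name: open-pull-request    # publish the new desired state for review
```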
And for the controller, we use a combination of OCM and Argo CD. But again, if you're in an environment where maybe there isn't even Kubernetes, you can use Ansible and deploy directly to, let's say, RHEL and Podman, or to a virtualized environment. And if you require even more customization, you can use something like Helm or Kustomize. So again, all of these are just implementations of the pattern. In Open Data Hub, we decided to build the abstraction layer for AI, for MLOps, so we provide all of this configured out of the box and you don't have to craft it yourself. But again, it's pluggable, it's extensible, and the idea is to be flexible.

Okay, so let's cover what we did for observability. When you have limited storage capacity at the edge and intermittent or congested bandwidth, you need something that can handle that as well as store the data and maybe do some pre-processing. So we decided to use OpenTelemetry for our solution, and this is how it works. We chose it because the backend doesn't matter. It can do some pre-processing of data, it can compress data, and it can do intelligent data filtering, which means that if you have sensitive data at the edge that you don't want to send back somewhere, it can actually help with that and filter it out. It also supports all kinds of third parties like Splunk, Prometheus, Dynatrace, and PagerDuty. So it just seemed like a really flexible, good fit for our use case. You'll see here we have our core, and we have our edge networks running OpenTelemetry. It can send data back to the core, or to a telemetry gateway, or to the cloud, so that the people at the core can work with it there. It just seemed very flexible, and that's why we chose it for our edge data and metric collection.

I know we've talked a whole lot, so we wanted to put in something that shows you our deployment and how we're doing things, and this is the overall picture. We start with the data science side, which we didn't cover; we're just assuming the model is trained and built. Your data scientist is going to put it in some kind of storage, whether that's S3 or a Git repository. At that point, the MLOps engineer decides whether or not it's ready to go into production. The MLOps engineer kicks off the pipeline, which in this case uses Tekton to retrieve the model, build the inference service container image, and test that the model is doing what it's supposed to do before it gets pushed out. If everything passes, the inference service container image gets pushed to some image registry. In this instance we're using Quay, but you could bring your own, whatever you want to put there. And a PR is posted with the latest metadata to what we use as a GitOps repo. As Miriam was saying, we can automate this, or the MLOps person can sit there and monitor to see if something got posted before they merge it, especially if it's going into a production environment. Then, once that happens, you have your edge node with Argo CD running, and it keeps checking back with the GitOps repo to see if something has been pushed and it needs to pull down a new container image. It pulls it down, gets it deployed on the pods there, and everything just keeps working that way.
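On the edge node side, the Argo CD piece could be expressed as an Application that points at the GitOps repo and keeps the local namespace in sync; self-healing is what reverts any manual change made on the node. The repo URL, path, and namespace here are illustrative assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: edge-inference
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/org/edge-gitops.git   # placeholder GitOps repo
    targetRevision: main
    path: sites/edge-site-a
  destination:
    server: https://kubernetes.default.svc                 # the local edge cluster
    namespace: edge-inference
  syncPolicy:
    automated:
      prune: true          # remove resources deleted from the repo
      selfHeal: true       # revert manual changes made at the edge site
```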
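And for the observability piece described a moment ago, an edge-side OpenTelemetry Collector configuration along these lines would receive metrics locally, keep memory usage bounded, filter out data that must not leave the site, batch what remains, and forward it to a collector or gateway at the core. Endpoints, limits, and the filter rule are assumptions for illustration.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
processors:
  memory_limiter:                  # keep the collector inside the edge node's budget
    check_interval: 5s
    limit_mib: 200
  filter/drop-sensitive:           # drop metrics that should not leave the edge site
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - "customer_.*"
  batch:                           # batch before sending over a congested link
    timeout: 10s
exporters:
  otlp:
    endpoint: otel-gateway.core.example.com:4317   # core collector or gateway
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, filter/drop-sensitive, batch]
      exporters: [otlp]
```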
In between, you'll see the OCM spoke there. It checks back with the OCM hub constantly, or whenever it can, just to see if there are any updates: should the edge node come down, is there anything else it needs to do? And you'll also see where we've fit in the data from OpenTelemetry. So that is our setup and our deployment, and it's out there working in ODH, Open Data Hub, if you want to take it for a spin. And that's it. Questions?