Microphone working, can you hear me? All right, so welcome to our talk, Streamlining the KubeVirt VM creation. Hope you had a great morning so far and hope you can learn something from our presentation. So my name is Felix Matuszek. I'm a software engineer at Red Hat, working on KubeVirt, specifically its instance types, preferences, and its command line tooling. And with me I brought Andre Procorny. He's an intern at Red Hat and he's also working on KubeVirt and helping me with the command line stuff.

So let's have a quick look at today's agenda. First, I will give a short introduction to KubeVirt. Then we'll have a look at the joys of a declarative API. Next, we'll have a look at how to make the VM creation in KubeVirt simpler. And then we'll show a demo of the things we were able to achieve. And lastly, we will point out some next steps for the instance types in KubeVirt.

So what is KubeVirt? It is an add-on for Kubernetes which allows you to run and manage your VMs on Kubernetes. So KubeVirt might be an opportunity for companies who already have virtual machines and want to give Kubernetes a try. Everything you can do with traditional VMs you can do with KubeVirt VMs too. But KubeVirt brings virtual machines into the container world, so you can do container stuff with them too now. So for example, you could use the services and the load balancing of Kubernetes. So Kubernetes has something to offer on top of traditional virtual machines. And also you can combine your virtualized and your containerized workloads. So for example, let's say you have a virtualized application and you want to build a new microservice on top of that, then you can run your virtualized application in your cluster and build your new application in a container. And they both can work together, for example, using the same network.

And to do all of this, KubeVirt extends Kubernetes with certain virtual machine related custom resource definitions, and therefore allows you to use the Kubernetes API to manage virtual machines. And because CRDs are not enough to actually run virtual machines, we provide additional controllers and agents. And under the hood, KubeVirt is using QEMU and libvirt to run virtual machines, with one QEMU process per container. And as you might know, QEMU and libvirt both provide a vast set of capabilities, and using those effectively will be the main aspect of our talk today. And if you want to learn more about KubeVirt, please visit its homepage or the user guide. You can find the links on the bottom of the slide. And also we have a booth outside, so please visit us.

So yeah, what are the joys of a declarative API? Well, as I just told you, KubeVirt is providing the vast set of capabilities of QEMU and libvirt, and it does that with a declarative API, as all Kubernetes objects do. And for that, we have the virtual machine custom resource definition. And this is very rich, but it can also be very overwhelming, especially for users intending to create virtual machines in the simplest possible way. And also maintainability of virtual machines can become quite hard, because let's say you have a fleet of virtual machines and you want them to all use the same settings, then you would have to keep track of all the settings separately for each VM, and this is not very maintenance friendly. So let me show you an example. This is the manifest of a very simple Windows virtual machine. So there's already quite a lot going on. And when I first saw this, I couldn't believe my eyes. I was like, there must be an easier way.
And I also had some questions like, how do you even know which settings are appropriate for your guests? So how do you know all the settings, and how do you apply the same settings to different virtual machines in a reproducible way? And how do you even share a common set of options? So again, my question was, how can we do better and how can we simplify the virtual machine creation in KubeVirt? So let's say, for example, we have all the settings required to run a Windows-based virtual machine, then we should group them into one building block. So commonly used settings should be abstracted into blocks. And these blocks should also be reusable between virtual machines to avoid duplication. And ideally, KubeVirt would already provide ready-made building blocks so you could start right out of the box and you wouldn't need to search for the appropriate settings for your guests. And please note, this is our approach to this. So the traditional manifests still have a use case, because let's say you want a very precise customization of your virtual machine settings, then they can still be very useful. But our use case here is creating virtual machines in the simplest possible way. So this is a different approach.

Before we look at our solution, let's have a quick look at previous solution attempts. There were two. First, the virtual machine instance presets. An issue with them is that they are deprecated starting with the version 0.57 release and they will be removed in the future. They are based on the pod presets API of Kubernetes, and this API never graduated from the alpha stage and it was already removed in Kubernetes 1.20. So it will be removed from KubeVirt as well in the future. And there was also a lesson learned from the virtual machine instance presets: there was no differentiation between resource sizing and runtime preferences. So let's say, for example, you have a Linux-based guest and a Windows-based guest and both should be using four gigabytes of RAM and four cores. There was no differentiation between the resource sizing and the other, hardware-related settings, for example what your guest prefers as a disk bus or whatever. And so there was some duplication again, because we had all of this in one object, and ideally this would be split into separate objects. So we would avoid the duplication of the resource sizing and the runtime preferences.

And the second solution attempt we had were the templates, and the issue with them is that they are a downstream concept by Red Hat. So you can only use them on OKD or OpenShift. They're not usable on Kubernetes, and they have another issue. So let's say when you create a virtual machine from a template, you create a copy of the whole definition inside the template. And if you create another VM, you create another copy, and so on and so on. And let's say you wanted to improve your template, change some setting, then the only way to apply the setting to all your existing virtual machines would be to drop them and recreate them completely, and this is also not very maintenance friendly.

And let's also have a look at other hyperscalers. What do others do? Well, whether it's GCP, AWS, Azure or OpenStack, if you look at their command lines, they're all pretty similar. So all you need to create a running virtual machine is just an image, and everything else is derived from this image. So we thought this was a quite nice user experience and that KubeVirt should have something similar. So let's have a look at our goals again.
We wanted to take away the complexity when creating virtual machines, we wanted to group settings into resource sizing and runtime preferences, and we wanted to improve the maintainability of virtual machines by making those grouped settings reusable. And how did we achieve all of this? Introducing the instance types and preferences. Those are new custom resource definitions covering the resource sizing and the runtime preferences. They are available starting with the version 0.57 release, and there are namespaced and cluster-wide variants available. So I told you about ready-made building blocks: they could be shipped as cluster-wide variants, while you could still have your own custom instance types and preferences as namespaced objects. And one of each of them can be referenced in a virtual machine. So if you start your virtual machine, a virtual machine instance will be created and the settings of your instance type and your preference will get applied to the virtual machine instance.

And to understand this a bit better, we have a quick visual overview. So we have two different APIs here. First, we have our instance type API, and this defines the instance types and preferences, and then we have our core API in KubeVirt. And if you have a look at the virtual machine, you can see that you can specify an instance type and a preference in the spec of the virtual machine. And if we go one level below to the virtual machine instance, you can see that there is no more concept of instance types or preferences. So when a virtual machine instance is created, the settings of the instance type and the preference get expanded and applied to the spec of the virtual machine instance. So if you compared a virtual machine instance created with an instance type and a preference to one created without, you wouldn't notice any difference.

And speaking about ready-made building blocks, we have the kubevirt/common-instancetypes repository. And this repository holds a set of predefined instance types and preferences. And the goal is to ship them with KubeVirt by default in the future. So right now you still have to deploy them manually, but we want to change this. And if you want to have a look at those predefined building blocks, just go to the repository, you can find the link at the bottom of the slide.

So how does it look using instance types and preferences? Well, on the left, this is the same definition I showed you before of the simple Windows virtual machine. And on the right, this is basically the same Windows virtual machine, but this time using an instance type and a preference. And you can see that the manifest has become quite a bit shorter. And you can see that the instance type and the preference are referenced about in the middle, but I won't bore you with the details of the manifest here. So please have a look at it yourself later.

And the last feature was creating a virtual machine with just an image. So we also implemented this in KubeVirt. And for that, we are leveraging labels. So images are Kubernetes objects too, and you can label them. And so images can recommend a suitable instance type and preference. And this is all that is required to create a running virtual machine. If you specify such a volume or image for your virtual machine, then the instance type and preference will be inferred. So in the end, this works similar to other hyperscalers, and we also provide appropriate tooling to use this feature on the command line.
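For reference, a hedged sketch of what referencing an instance type and a preference in a VM looks like; this is not the slide itself, and the names u1.medium and fedora are just examples of the kind of objects the common-instancetypes project provides, with a placeholder container disk image.

```
# A rough sketch of a VM that references a cluster-wide instance type and
# preference instead of spelling out CPU, memory and device settings itself.
kubectl apply -f - <<'EOF'
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: my-vm
spec:
  instancetype:
    kind: VirtualMachineClusterInstancetype
    name: u1.medium          # example name from common-instancetypes
  preference:
    kind: VirtualMachineClusterPreference
    name: fedora             # example name from common-instancetypes
  running: false
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk: {}         # bus is filled in from the preference
      volumes:
      - name: rootdisk
        containerDisk:
          image: quay.io/containerdisks/fedora:latest   # placeholder image
EOF
```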
Yeah, so in the end, if you want to, you have no more need to work with YAML manifests at all. So an image is enough to create a running virtual machine. And talking about the command line, we also improved the command line of KubeVirt. The command line utility is called virtctl, and it was enhanced to match the user experience of other hyperscalers. And for that, several new subcommands were added. So now you can create an instance type, a preference, and a virtual machine with virtctl, which was not possible before. And now I'm handing it over to Andre to introduce you to some of the subcommands.

So as Felix said already, we were working on adding the virtctl create instancetype and create preference commands. And these commands should help you to create the manifests; they will generate them for you. For virtctl create instancetype, we have support for most of the parameters that you could otherwise specify manually in the manifest YAML, so you can also do it on the command line with our command. For the preference command, we don't cover all of the options yet. But this command should serve more like a starting point, so you can avoid writing the whole YAML manifest from scratch. Basically, you generate the YAML, you save it somewhere in a file, and then you modify it based on your needs later. And the outputs of these commands, as I have said, can be saved in a file, or you can even pipe them into the kubectl or oc client and then create the objects in your Kubernetes or OpenShift cluster directly.

So the last new subcommand would be virtctl create vm, and as its name says, it allows you to create virtual machines. It's available starting with the version 0.59 release, and it provides you with a fixed set of CLI flags to adjust virtual machine parameters. So for example, you can specify a name, a boot volume, and you can also specify the instance type and the preference to be used. And again, this command outputs manifests, and I think that's a really great way because we didn't want to reinvent the wheel here and create another Kubernetes client. So all it does is output manifests, and if you're familiar with Kubernetes, then you already know how to work with manifests. And you can also use it, for example, in a script and just pipe the output into oc or kubectl. So I think that's quite an elegant way.

So now we will present a short demo of what we were able to achieve. And we wanted to make this demo focused on people who already have a running Kubernetes cluster and who want to give KubeVirt a try. So you should already be a bit familiar with Kubernetes. So the first step will be how to deploy KubeVirt. The second step will be creating an instance type and preference. And as I told you before, you could use the predefined instance types and preferences, but in this demo, we will be creating our own. And the third step will finally be to create a virtual machine with an inferred instance type and preference.

So first, how do you deploy KubeVirt? Well, there's one prerequisite and that is that you need a Kubernetes cluster which has virtualization enabled. Then the first step would be to deploy the KubeVirt operator, and you can do this with kubectl apply. And here we are just applying the manifest of the KubeVirt operator from KubeVirt's releases page. The second step would be to create the KubeVirt custom resource. This will trigger the actual installation of KubeVirt.
And again, we're using kubectl apply here and applying the KubeVirt custom resource manifest. And then the last step would be to wait until all components are up. We can do this with the wait command of kubectl, and we can just wait for the Available condition of the custom resource. And one more note: this is still using the release candidate zero manifests, but version 1.0.0 will be released in July, so please give it a try when it's released.

And next we will be creating our instance type and preference. So now that we have a running cluster with KubeVirt deployed, we can start creating the manifests that we will use with our virtual machine. First we will create the manifest of the virtual machine instance type, and to do so we will use the command that you can see here. In this example we are specifying the CPU and memory flags, which are the required ones, but there are even more flags that you can specify. For example, the IO threads policy, or if you would like to use a GPU with your virtual machine you can use the GPU flag and it will pass the GPU through to your VM. And below the command there is an example of the output, which you will not see here because we are piping it into kubectl to create the object in our cluster.

The next step will be the creation of the virtual machine preference manifest. To do so we are again using virtctl create preference. Here we are specifying the CPU topology and we are setting the value to preferCores. Other options for this flag would be, for example, preferSockets or preferThreads. And of course this command has even more flags you can specify, but as I have said already, it should serve more like a starting point to avoid writing the whole YAML from scratch. It does not cover all of the options yet.

So now that we created our instance type and preference, we can finally create our virtual machine with the inferred instance type and preference. And for that, two steps are required. First we need to upload a bootable image and label it accordingly. For that we can use the image-upload command of virtctl. In this case we are uploading our image into a PVC, a persistent volume claim, called my-image. And then we're using the default instance type and default preference flags to specify the instance type and preference we just created. Then we're using a size of one gigabyte. We're using the force bind option to avoid waiting on the upload when we have storage with wait-for-first-consumer binding. And then finally we're uploading a CirrOS image, which is just a simple testing distribution which is quite convenient to use.

And finally we can create a virtual machine with the inference enabled. And for that we are going to use the create vm command of virtctl. We're giving our VM the name my-vm. And then we have the infer instance type and infer preference flags. Those are still required, but we plan to drop them in the future. So imagine they weren't there, then the command line would be even shorter and simpler. And lastly we need to specify our boot volume, and we do that with the volume import flag. This will create a clone of the image we just uploaded, so our VM will also have persistent storage. We're giving it the type of our image, that's a PVC, we're giving it the name my-image, then we're using the namespace default because we didn't specify any other, and again we're using a size of one gigabyte. And all of this is piped into kubectl create, and this is enough to create a running virtual machine. So let's have a look at the results.
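Since the commands themselves are only visible on the slides, here is a hedged sketch of what the demo steps look like on the command line. The release tag, flag spellings and the volume-import syntax are approximate (check virtctl --help and KubeVirt's releases page for the exact forms), and my-instancetype, my-preference, my-image and my-vm are just the names used in the demo.

```
# Deploy KubeVirt: operator, custom resource, then wait for it to come up.
export VERSION=v1.0.0   # assumed release tag
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-operator.yaml"
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-cr.yaml"
kubectl -n kubevirt wait kv/kubevirt --for=condition=Available --timeout=10m

# Generate the instance type and preference manifests and create the objects.
virtctl create instancetype --name my-instancetype --cpu 2 --memory 256Mi | kubectl create -f -
virtctl create preference --name my-preference --cpu-topology preferCores | kubectl create -f -

# Upload a CirrOS image into a PVC and label it with the defaults to infer from.
virtctl image-upload pvc my-image --size 1Gi --force-bind \
  --default-instancetype my-instancetype --default-preference my-preference \
  --image-path ./cirros.img

# Create the VM, inferring instance type and preference from the boot volume.
virtctl create vm --name my-vm --infer-instancetype --infer-preference \
  --volume-import type:pvc,src:default/my-image,size:1Gi | kubectl create -f -
```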
We have the instance type and preference fields of our virtual machine here. And you can see that the virtual machine is using the instance type and the preference we just created. So the inference was successful. We didn't specify this in the virtual machine; this was inferred from the boot volume of the virtual machine. And then there's a third field called revision name. This is used to create controller revisions for our instance types and preferences. So those are frozen-in-time versions of our instance type and preference. So let's say, for example, you want to stop or start your virtual machine. With the controller revision, it will always use the same settings which it got when it was created first. So there won't be any changes without clearing the revision name. When you clear the revision name and start your VM, a new controller revision is created and the new changes are picked up. And if you have a look at our virtual machine instance, specifically at CPU and memory, you can see it's using two cores. So our instance type said it should use two CPUs and our preference said it should use cores, so we're doing that: we're using two cores, one socket, one thread. And our instance type also specified that we should be using 256 megabytes of memory, and we're doing just that. So this is all for the demo.

And lastly we have some next steps. So currently we're shipping the version v1beta1 of our API and we want to ship version v1 with KubeVirt version 1.1 or greater. And for that, we still need to improve our virtctl flags. So as I told you, we want to enable the inference by default. And we also want to deploy the common instance types with the virt-operator, so you can use them right out of the box. And then we still need to make various improvements to the controller revisions, because right now for every virtual machine, a new controller revision is created, and so we get a lot of duplication. There's no de-duplication between them. So let's say you have an instance type with the same settings and you create two virtual machines, you get two controller revisions with the same contents, and we want to improve on that. And of course, lastly, we want to fix all the bugs, because who wants to ship any bugs with their version one. So that's it. Do you have any questions?

Okay, so the question was if we plan to release a tool for migrating over to KubeVirt, and to be honest, I don't know. I know that there are tools available, but I can't tell you right now, sorry. The answer was that there's a tool called MTV. You should have a look at this. Any other questions?

The question was, if I understood right, whether we have the possibility to create custom instance types and preferences. Yeah, of course, you can create them yourselves too. So you can either create them as a cluster-wide object or you could create them as a namespaced object. Other questions?

The question was how well KubeVirt is integrated into Kubernetes and if you can use the same tools to manage your virtual machines and containers, and I would say yes, yes you can. It's quite the same. So virtual machines are just another object in the Kubernetes API, and so naturally all the tools you can use with containers you can use with virtual machines too. So for example, you could use Argo CD to roll out your virtual machines. There's no issue with that. Nothing different to a deployment or to a pod. So I think there are not too many differences there. Does that answer the question?
Question was if there's also a UI for KubeVirt, not just the command line tooling. So the answer is sort of: if you go to OKD and look at its console, or OpenShift, it's the same console, there's a UI available you can use with KubeVirt. So partially yes, but not on Kubernetes itself. The question was if the UI is also making use of the feature presented today. Yes it is, or at least it's planned to; it will be in the future. We will make use of it, yes. Question was if there are security relevant settings in the virtual machine, right? Sort of, let's say you wanted to break out to your host, that was the question, right? Yes. To be honest, I can't answer right now. I think there are, but sorry, I can't take this question. But please come to the booth, maybe someone else can answer. Any more questions? Okay, that's it, thank you.

Hi everybody, welcome to our talk about how the CKI team, which I'm part of, is using spot instances. So the title was made at the time when I thought I was going to give an introduction into the many ways of launching spot instances, but then, when I actually started to prepare the slides, it turned out that there are not so many ways to actually launch spot instances correctly, so the title might be slightly misleading, but maybe it's interesting, right? So I'm going to talk a bit about why the kernel CI environment actually uses spot instances, what it looks like when they fail in various interesting ways. Then, because we just didn't know anything really about them, some of the things we learned while investigating the underlying issues, and then what we did to improve things so that they don't fail, right?

So let's start with an introduction. So I'm part of the continuous kernel integration team, which is a team that provides CI pipelines as a service to the internal and external kernel development community. So the high level goal is that we try to prevent bugs from actually entering the upstream kernel by testing as early as possible, but a lot of our time is actually spent testing during the real development process. So nowadays, kernel developers sit on gitlab.com, like the internal ones, right? Not the upstream ones, but the internal kernel developers, they sit at gitlab.com, they file merge requests and they get pipelines, and CKI is responsible for keeping track of the system and making sure it runs. So I linked the project documentation, all our code is on gitlab.com, it's open. So you can take a look, you can send merge requests if you want to contribute anything.

So our main job, that we figured out after a while, is actually that we run this service, and the two things we mainly do is, first, we make sure that the kernel for these merge requests is built, and we use spot instances for that, so that's what the talk is going to be about. And then we also test the kernel for these merge requests, or also the RPMs that are finally getting built, depending on what changes are there, trying to gate stuff so that we don't break composes, those kinds of things, right? And I know there's some room for improvement there, especially around breaking composes and stuff, right? More than that, yeah.
So for the building part, what this breaks down to is that we build all four supported internal architectures, like x86, ARM, PowerPC and s390x, and that comes down to about 300 hours a day of building pipelines that are spent building draft versions in merge requests that then also get handed to QE for testing, those kinds of things. And because this is the kernel, these are pretty beefy instances. So they have 16 CPUs, 32 gigs of RAM, a local SSD for all the temporary storage during the RPM build. And these things are not cheap, right? Like these things cost in general about 70 cents an hour, like US dollar cents. So we try to optimize that and that's why we use spot instances in this case, right? And at the bottom you see a plot of the build hours. So you see this nice pattern over the last couple of months where we kind of see that kernel developers are actually doing stuff like proposing merge requests. And the workflow is in a place where they actually really use that, right? Like before, they would send a patch to a mailing list, they would be required to build it locally, they would send a scratch build to Brew to get it built. But nowadays it's basically git push and boom, there comes a pipeline along.

And this is what the pipeline looks like, right? So this is the general CKI pipeline, which, on the left in the prepare stage, has the usual things you would expect, right? Like there's some merging going on where the merge request is merged into the target branch, so you get a merge result pipeline. Then it's getting built, which is the interesting part I'm going to talk about. We are publishing repo files, so QE can actually test these things. And then we also test it ourselves. So there's some automated testing going on in Beaker, which is a couple of hours on all of these architectures, doing short boots, LTP, if it's a storage change there's a storage test running, those kinds of things, right? So there is automated testing, and it could all be better, but it's also not bad.

And the way this actually works for the building part is that we use native GitLab CI pipelines. So I'm not sure how many of you have used GitLab CI? A couple? Yeah. How many used GitHub Actions? About the same. Okay, we're getting there. So the way this works on the GitLab side of things is that there's a job, which is basically an expanded shell script. And it gets handed to a piece that's called GitLab Runner, which is a service running on some machine. And what this thing normally does is spin up Docker containers somewhere, let's say on the host, and then it executes the script in this container. That's basically it, right? Like there's magic involved, but this is basically it. But now, if you spin up local Docker containers, you are constrained by the host. So if you want to have dynamic scaling by using something like OpenStack or a hyperscaler, there's a piece involved that's called Docker Machine, which is an interesting piece of software that's abandoned upstream, it was developed by the Docker folks, which creates a VM on a hyperscaler, basically just calls the API. I don't know whether any of you were in the KubeVirt talk before, but they gave a nice list of what this involves. It's not a lot.
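To make the "just calls the API" part concrete, this is a hedged sketch of the kind of single-shot spot request Docker Machine ends up making: exactly one instance type in exactly one availability zone. The AMI, key name, instance type and zone below are placeholders, not the team's actual values.

```
# One instance type, one availability zone -- if that exact combination has
# no capacity, the request simply fails (all identifiers are placeholders).
aws ec2 request-spot-instances \
  --instance-count 1 \
  --type one-time \
  --launch-specification '{
    "ImageId": "ami-0123456789abcdef0",
    "InstanceType": "c5d.4xlarge",
    "KeyName": "ci-key",
    "Placement": { "AvailabilityZone": "us-east-1a" }
  }'
```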
And then it installs Docker on this newly created machine and then it forwards the Docker port of the Docker daemon on this VM to the local machine where GitLab Runner runs, and then it basically talks to the remote Docker daemon over the socket. So that's all there is. And there's basically no difference between a Docker container running locally or a Docker container running on this newly created VM. So this is, I find, a pretty simple, nicely working system. Now, it was not the direction that the Docker folks wanted to take the whole thing, so that's basically why it's abandoned. GitLab forked it to be able to give it updates. And if you drill down into the Golang source code that it is in the end, there's basically one API call, which is called request spot instances. And the way this is normally configured is that you give it an instance type and you give it a data center where you want to have your machine created, right? Like you say, give me this machine in this data center, and boom, there it comes, right? And that's all there is, right?

And to say, before we used AWS for spot instances, we used an OpenShift cluster. So Red Hat has some interesting, huge OpenShift clusters internally that have huge nodes, PSI OpenShift, people might know it. And they have nodes with 128 cores. So you can actually spin these containers up on an OpenShift cluster as well. But a cluster is in some way static. So, I mean, you have a certain amount of nodes. They might scale up, they normally don't scale down. And if it's a bare metal cluster, I mean, these nodes, they are there or they are not, right? So switching to something like AWS provides for a far more efficient environment where you only get these machines when you need them, right? Like if American kernel developers wake up, they will do it. And at night, nobody's running any of those pipelines normally. Yeah.

So the question was, are we using cross-compilation or native compilation? It depends. So we are doing native compiles on ARM64 and x86, nowadays on AWS. So we're using the native machines there. For s390x and PowerPC, we are mostly cross-compiling, and then we are natively compiling some pieces that don't want to be cross-compiled. So in the kernel, there are some interesting aspects to BPF or whatever, like the kernel tools. Yeah, some pieces don't like to be cross-compiled. The official RPMs built on Brew, they are always built natively. But for our CI pipelines, the point is really to provide results fast that are good enough. And there have been discussions where this breaks down. But for nearly everything, the cross-compilers work. They are supported by the compiler team. So this is actually a supported workflow for this case. But the official RPMs are natively compiled.

So this worked nicely. And we were really proud of ourselves. We moved off the OpenShift clusters that were failing once in a while, that had scalability issues. And we moved to AWS and it's stable and it's a hyperscaler and it's all dandy, right? Like that was two years ago and it worked nicely until the end of last year. Right before the holiday season, at the beginning of December, our jobs started to fail. And so this is what it looks like when a GitLab job fails. So normally there's stuff up here; it starts at line 104, so there are further things up there that actually worked. And at a certain moment, you get a message and it tells you, I cannot connect to the Docker daemon running on this VM.
And the reason it can't connect to it is that this VM shut down, right? Like if it shuts down, there's no Docker daemon. So yeah, basically it fails. And now these are jobs that take half an hour. So if you have this thing running for 20 minutes and then it shuts down and then you repeat the job, and you repeat it automatically, you just waste 20 minutes of 70 cents an hour. And if you do this in parallel, that's actually not great. And especially for the kernel developers, that's affecting their workflow, right? Like these failures are annoying at best. And they're not just annoying, they're really actually breaking the process.

So this is just one of those moments. Sometime around noon, Americans tend to wake up, do a git push or something on their merge request, and then it just explodes, right? Like these are, within a span of 15 minutes, huge loads of jobs just failing, and we continuously restart them on an exponential schedule. So they get restarted directly, and then after 10 minutes, and then after 30 minutes. And here it's basically already at the 10 minute mark. So it tried a couple of times already and it just can't get anything done. So this is what started to happen at the beginning of December, right? Like when everybody goes on PTO already. And basically it says 30% of the jobs failing on a daily basis. But actually what it meant was that during the American work day, nothing would actually compile.

And so we started to dig down into it. And if you look at the logs on the runners, you find two error messages. The first one is unfulfillable capacity, where AWS helpfully tells you that there's not enough capacity available to match your request. This is what you get if you do a spot request. Doesn't matter what you put in as a price, basically. I mean, first they take the instances away from you and then they prevent you from spinning up more, right? So this is the first thing you get, and I don't know whether this sounds familiar to the Image Builder team, but this is what you normally see when spot instances are not available anymore. But then, obviously, spot instances, let's just throw money at it. It's Christmas, right? Like holiday season, we have some cloud budget left, and we switched them to on-demand instances, which are the normal ones. And then helpfully, AWS came back with a different error message. In this case, it was insufficient instance capacity, and it was also the first time we'd seen it. And they are actually more specific. They tell you, like, the instance type you asked for is not available in this data center, availability zone. And then it lies to you, because it tells you that they are working on increasing capacity. They are not, right? Like, obviously not, right? So that was not helpful.

And then we kind of panicked and we started to switch instance types around and we kind of recovered a bit, right? Like in panic mode. And that was the moment when we kind of figured that we just didn't know anything about spot instances and how this all worked, right? Because this was not supposed to happen, right? Like it's a hyperscaler, right? Like you ask, you get, right? Like it's capitalism, right? That's how it works. Money goes in some direction and then supply follows, right? Like that's just how the world works, on this side of the world at least, right? So what is this actually? Like what is the cloud? What's a hyperscaler?
Now, if you ask Wikipedia, it tells you that a hyperscaler, or hyperscaling, is the ability to scale with demand. Now, obviously that's a lie, right? Because it did not, right? Like the hyperscaler did not hyperscale; we asked, we had demand, but they were not able to deliver. And then going into the specifics of the AWS cloud, looking into various sources on the internet, you find that they have about 30 regions, let's say cities, something like that. And then they have data centers, availability zones; in most cases an availability zone is a data center. And each of these data centers has a couple of servers, right, like above 50,000 per data center. And these are, I think, the big hyperscale data centers. I'm not sure, yeah. David Duncan is not here, but I've seen him. I feared he would be here and would just tell me I'm full of shit. So that's my interpretation, right? Like I haven't talked to anybody on the AWS side.

So what is AWS EC2 then? That's AWS renting you compute. So you can go there and say, like, I want to have a server, and they give it to you. And depending on how much money you want to spend, you can get different guarantees from them. So you rent the machine on the go, pay per minute, that's on-demand. You can get a spot instance, that's what we are using. These are cheaper, for various reasons, and the main reason is that AWS can take them away from you again when it decides, for whatever reasons, that it needs to. You can also go down to savings plans and reserved instances where you commit to use a certain amount of compute or a certain amount of compute per instance type. So that gives AWS some way of actually figuring out how much demand there normally is going to be. And we felt good about ourselves until this happened, because we followed the recommended strategy of saying, like, well, steady workloads, we do reserved instances, savings plans, those kinds of things. Interruptible workloads, spot instances for all the rest. Bursty stuff, on-demand. So that's what we did, right? Like that didn't help now. So we read a lot of stuff and we still didn't know what was actually going on.

Now, on the word interruptible, that was something new to us: AWS actually tells you a bit more about what it means. So you get a spot instance and it can be interrupted. You get two minutes, which is good enough for a shutdown, but not much else. But they actually also tell you what the chance is that your instance gets terminated. So there's a helpful table and there's an API call, and you can actually ask them, how likely is it, if I ask for this instance type, that you will take it away from me within a couple of hours or whatever. And normally this is below 5%, so basically that's fine. But it goes up to 30% and further, right? Like so if you ask for the wrong stuff, you will actually have a very high chance that stuff terminates within five minutes, 10 minutes, a minute, right? Like it might actually just be really impossible to use as a spot instance.

And then there's a bit the problem of the whole thing, right, like so there are real data centers somewhere, right, like this is not the cloud, this is a data center and it has servers and it has cooling, it has power, and these instance families are real machines, real servers sitting somewhere. And they have a certain number of them, right, like they don't scale, there's no scaling of a rack, right, it's just there or it's not.
And to give you an instance when you request one, it needs to be there. If you request an on-demand instance, it needs to be there in this data center. And they can only give it to you if it's not used by somebody else. And the spot instances kind of buffer this range of instances that are available but nobody's using yet, right, so basically you can use them until somebody else wants them. But obviously there comes this moment when there's just nothing there anymore, and then even on-demand requests fail. And now actually spot instances change in price, and this is the plot at the bottom. This is for the last three months. I was too late with making my slides so I don't have any plots of the time when this all happened, right? But you can see at the bottom you have the black line, which is the cost of some instance, pretty cheap. And you can see how the spot instances, which are the colored lines at the bottom, actually go up over time. Different availability zones, different data centers, and they all kind of follow the same trajectory. So somebody was asking for spot instances, and they got more expensive. But then what we did actually was not in any way taking that into account, right? Like we just asked for this one specific instance type in this one specific data center. So there was actually nothing AWS could have done to help us, right? We just asked for the server and there was none. And this is basically what these messages mean. It means we don't have these machines available, and yeah, okay, then a hyperscaler is also just bound to reality.

So we were thinking about what to do, right? Like we are engineers, what do we do if we have a problem? We add another script, another tool, whatever. Our idea was to find out, how can we specify multiple instance types? How can we specify multiple data centers? Can we maybe poll these APIs and figure out what to do about it? Maybe we can have something automatic that switches it around and stuff. And then somebody not known to the world of engineering suggested we should actually read the documentation, right? Crazy people, right? Like that's just not what happens normally, right? You don't just read documentation. But yeah, okay, we did, right?

Yeah, so we read the documentation and there are about five different ways to create instances in AWS, right? Like historical reasons, most likely, right? Like five different ones, right? And they are in the table. On the left, the most well-known is run instances, and we've seen it in the KubeVirt talk before. That's the one you find all over the place, right? It's the one that, well, is named run instances. Perfect. It's not recommended, right? Obviously not. And then there are two right below that have spot in their name. They give you spot instances. Run instances can also give you spot instances. But the other two that have spot in their name, they give you spot instances. You're also not supposed to use them, right? Like AWS tells you, don't use them. They are just named like that, but you should not use them. What they tell you to use are the two at the bottom, and one is called create fleet. So this sounds like, I don't know, ocean warfare or something, right? Like, I mean, wait, okay. And the other one is auto scaling, create auto scaling group. I mean, I don't want a group, and it shouldn't auto-scale, I just want a machine. But that's the other one, right? Like they say, take these two, okay. Which we obviously did not, right?
So that, we figured, was our problem; if only we had known that before. And so, in the middle of the table, reading a bit of documentation about them, looking into what we actually wanted to do, which was to specify multiple instance types, specify multiple data centers: the three at the bottom actually give you that. So they allow you to specify this in the API call. So you don't need any custom tooling. It's a sad world, right? Like you can't write code. You just need to change this API call, but okay. So the two rows still look good. And if you look at the purpose we have, we want to spin up one machine temporarily and then shut it down again, it depends a bit how hard they are to use. And so create fleet is actually the one you want to use. You might not know, right? Like we did not. The internet also doesn't know. I don't know what ChatGPT would tell you, somebody can ask. It would be something to try if you're bored, right? Like ask ChatGPT what it would recommend to create an instance on AWS. Would be interesting, right? Like maybe it has caught on, but maybe not.

So we used request spot instances. We should have used create fleet. That's what we would want to use. There are a couple of differences, I'll just mention them. One is that request spot instances is simple. You give it something you want to launch, four CPUs, 16 gigs of RAM, and it will give it to you. And it can also use something that's called a launch template, which is like a template where you can specify something that you can reuse, and kind of not have to specify it over and over again. That's basically all there is to request spot instances, and create fleet has a couple of more cool features. One of them is that you can vary this launch template. You can say, like, maybe do it in data center A, B or C, it's your call, AWS. You can be more generic with instance requirements. You don't have to say, give me this specific instance type, like this magic number that nobody can decode, but you can actually tell it, give me something with four CPUs or more, and it will do that. And the last part is the allocation strategy, where you can tell AWS, when looking for an instance, please pick one that is cheap and available, highly available, something that will not get killed. So create fleet is better, right? Like AWS is right, most of the time. And it's not hard, right? Like it's actually not hard.

The sad thing about the whole story is that it's actually simple to fix. So first you create a launch template, which we did not have before: you specify this YAML file, convert it to JSON and feed it into the API call at the bottom. So there's an image ID, you give it a name, the image you want to boot and a secure shell key. And then you say, like, we want 16 CPUs, 32 gigs of RAM, please don't give us really, really old instances that don't boot RHEL anymore, and put a local SSD on those things. And they will give it to you. It's perfect, right? Like it's great, it creates a template. And now if you look into it, there are actually 11 different instance types available, right? Like we asked for one, but AWS says, we have 11 of those, right? Like, just interesting, right? Like we didn't know, yeah. Anyway, and so the second part is that you now create this instance using create fleet. So you say, like, I want to have an instance, an instant instance, like now, please. It should be a spot instance, and spawn it into one of these subnets that are available. And that's it: when you do this, you get an instance ID, it all works, right? It is that simple.
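To make the two calls a bit more concrete, here is a hedged sketch of roughly what they look like with the AWS CLI. The template name, AMI, key and subnet IDs are placeholders, and the exact JSON schema should be checked against the EC2 documentation.

```
# Step 1: a reusable launch template with instance *requirements* instead of
# one hard-coded instance type (all identifiers below are placeholders).
aws ec2 create-launch-template \
  --launch-template-name ci-builder \
  --launch-template-data '{
    "ImageId": "ami-0123456789abcdef0",
    "KeyName": "ci-key",
    "InstanceRequirements": {
      "VCpuCount":  { "Min": 16 },
      "MemoryMiB":  { "Min": 32768 },
      "InstanceGenerations": ["current"],
      "LocalStorage": "required"
    }
  }'

# Step 2: an "instant" fleet of exactly one spot instance, using whatever
# matching instance type, in whichever of the given subnets is currently
# cheap and unlikely to be interrupted.
aws ec2 create-fleet \
  --type instant \
  --target-capacity-specification '{
    "TotalTargetCapacity": 1,
    "DefaultTargetCapacityType": "spot"
  }' \
  --spot-options '{ "AllocationStrategy": "price-capacity-optimized" }' \
  --launch-template-configs '[{
    "LaunchTemplateSpecification": { "LaunchTemplateName": "ci-builder", "Version": "$Latest" },
    "Overrides": [
      { "SubnetId": "subnet-aaaa1111" },
      { "SubnetId": "subnet-bbbb2222" }
    ]
  }]'
```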
So this was kind of amazing in some way. Now the problem was, yeah, we didn't read the documentation, obviously. We still don't read all the documentation, but we know a bit more about the AWS API now, and they keep doing stuff to it. So stuff like these allocation strategies, this is pretty new. Use create fleet, I can just implore you to do this, or use auto scaling, auto scaling groups. This price capacity optimized strategy is basically what you want to use: something that will not get killed and is cheap. The problem is nobody uses create fleet now, right? Like it's not something that is very common yet, because everybody gets told use run instances. And so the thing we use, GitLab Runner, uses Docker Machine, which does not support this because it's abandoned upstream. So what we did in the weeks towards the holidays last year was basically implement support for that. We forked it, I learned Golang, which was interesting. We put this in there, and since then we've never had any trouble with spot instances again. So there's a happy ending to the story of CKI against the failing spot instances, right? So it all worked out. It was an interesting exploration and I thought that was a good reason to share it. So that's it. Do we have any questions?

Yeah, so the question is, did it result in cost savings, actually using this new API call? Not really that much, because these things are mostly similarly expensive. But yeah, it was a bit cheaper than retrying the same jobs over and over again and failing over and over again. So we were just wasting a lot of money trying to get stuff done but eventually not actually getting it done. So the instances we might be getting now, they might actually be more expensive, but because we only run jobs once in total, it's, I think, similarly expensive as before but far more reliable.

Yeah, so the question is, are you able to restrict regions? Yeah, so these API calls are normally very specific to regions. So you actually say, in this region, and then in one of the data centers, but this is within a city, basically. So in the Amazon world, a region is, let's say, a city, and then an availability zone is something like a data center. Yeah, the question is, is it possible to say, like, I don't care? No, it's not possible to say that, and you also don't want to, because one of the most expensive things in AWS is data transfer between various things. They are mostly free if you stay within a region, especially S3. If it's all in one region, it's all good, but as soon as you move one of those out, it gets really expensive. Yeah.

Yeah, so the question is, how does this interact then eventually with Beaker, where we are going to test these kernels? So we are building them in AWS, we are pushing them mostly to S3 buckets in AWS, and then, because transfer is expensive, transferring it into the internal network of Red Hat is expensive, we only do this once, we copy them into an internal S3 bucket, and then Beaker boots off a compose, something internal, adds the repository with these custom kernels, installs it, boots it, checks that it's actually the one that's booted, and then the testing progresses. Yeah, so kernels are just normal RPMs. This is just a default repo and we tell it, test this kernel, and we make sure that it's actually also booted, because that's kind of interesting.
Yeah, so the question is, did we try to actually limit the amount of jobs that it would spawn? Yes, we tried, but very quickly it was actually the case that it would just not spawn any instances at all. So it was a single data center that was used by others, because our load did not really increase over these time periods. Some other pesky customers came in and took our instances. So actually we were in the spot where we could not spawn any of these machines, and our theory is that, because we require these local SSDs, we think that there's a rack and a couple of SSDs that can be attached, and if somebody else wants them, there are far fewer SSDs than VMs that you can put on there. So if there's no SSD, there might be compute, but there's no SSD, so you don't get any instance, right?

So the question is, did we actually observe any intermittent patterns of somebody else needing spot instances? Not really, so we did not really figure out what was actually going on at the time. But something unrelated is that in the last couple of months, since about March, spot instances have gotten far more expensive than before. They used to settle around 30, 40% of the on-demand price, and now in a lot of regions and a lot of data centers the prices have actually approached the on-demand prices. People have looked into this and it corresponds with a higher rate of interruption. So it's not just Amazon being greedy, which might be something you would think, but it seems like, maybe because of cost savings, people are moving more to spot instances. So there's more demand for spot instances, and so, kind of like, I think the algorithm in there tries to provide a signal for people to know, oh, don't use those types because we don't have enough of those. And I think there's something happening in the background, but if you don't use create fleet, you don't react to the price indicator. I think AWS wants you to actually use these API calls so they can distribute load across data centers, across instance types, but nobody uses the API calls, so it doesn't work for them, or not well.

So the question is, did we actually look into multiple regions? Yes, we thought about it. So we thought about multiple availability zones. Now, the networks we spawn into, some of them have internet connections, and they are limited to certain data centers at the moment. So that's one of the things we would want to enable, spawning across multiple data centers. Spawning across multiple regions is interesting because, as I said, data transfer costs actually factor in quite a bit, and so we would need to have S3 buckets in these other regions. There is a transit gateway into the internal Red Hat network that would need to be provided there, and also we have an OpenShift cluster that does a lot of the magic behind what goes into the data, and this one also runs in the same region. So around data transfer costs, this design kind of makes you not care, which is bad in some way because it's cheap, right? Like these jobs pull gigabytes of cache files from S3, it's free, right? Like it's a very efficient way, it's cloud native, whatever, but if you want to split this across regions or move to another region, it would get prohibitively expensive. So we thought about it, but we didn't see any way yet to do this. So yeah, so there's the final answer. So the remark is that they've seen the same problem.
I don't know what team you are on, but other people have seen it at the same moment in time in us-east-1, which is the most popular region, right? We are not planning to move to other regions, so you're free to move there. I know that Red Hat has network connections in other regions as well, but I mean, if you have better experiences there, we will follow you maybe. But yeah, it's all of them, right? Like it's okay. Okay, so yeah, it's one of those. Okay, so the remark is that actually it might not be related to the SSDs, because they haven't used SSDs on their instances, but the problem was very much the same. Okay, thank you very much.

Okay, so hello everyone. I'm Itamar Holder. I'm a senior software engineer working for Red Hat, and I'm a KubeVirt developer. And today I'm going to talk about the journey through supporting VMs with dedicated CPUs on Kubernetes. And the reason that I put the word journey into the title is because this was a true journey for me. And while trying to solve this problem, I've stumbled upon many cool technologies and cool facts about them and about how Kubernetes is implemented under the hood. And basically this talk is going to be divided into two parts. The first part is an introduction to many technologies, cool facts about them. The second part is the actual problem that I'm trying to solve. So we're going to talk about the CPU manager, Kubernetes resource allocation, cgroups, dedicated CPUs, pod isolation and many other interesting things. So let's begin.

So first of all, an introduction to KubeVirt. So let's say that you need to run both containerized and VM workloads. And I'll talk about use cases in a second. So on the right here, you have your Kubernetes stack, which is designed to run containers. But as you probably know, containers and VMs are designed completely differently, and you can't just run VMs on plain Kubernetes. So traditionally you would have a different platform to run your VMs on. And that's a huge disadvantage, because now you'd have to gain knowledge in two platforms, you'd need to maintain them both, implement logging and monitoring for both of them, make them communicate with one another, which is a huge burden. And we would much prefer to have only one stack to run all of our workloads on. So that's basically what KubeVirt is. It's an extension to Kubernetes that lets you run VM workloads alongside container workloads on Kubernetes in a cloud native way. And when I say a cloud native way, I basically mean that these VM objects behave just like any other Kubernetes objects.

But what are the use cases to run VMs today? So there are three main use cases. The first is legacy workloads. So let's say that you have a big business with many VM-based solutions and you want to make a transition into making all of your workloads containerized. This transition might take a lot of time, and for big companies that are making this transition, we want to support these legacy workloads and run everything on the same platform. Another use case is VM-bound workloads. And what I mean by that is that some workloads need to run their own kernel or need to emulate their own hardware, so they simply don't fit the containerized model. Another interesting use case, which is pretty new, is a project that's called Cupcake, which stands for Cluster API Provider KubeVirt, which basically lets you bring up KubeVirt VMs, install Kubernetes on top of them, and then you have Kubernetes inside Kubernetes. So that's another cool use case.
And I'm not going to dive into the architecture of KubeVirt too much, but the basic idea that you need to understand is that the trick is that we're running a VM inside a container. So inside a container, we would have a hypervisor running the guest, and this is all wrapped inside a container. So why do we care about VMs with dedicated CPUs? So it's crucial for certain use cases like real-time VMs or VMs that depend on low latency. And the key point that we need to understand here is that we need to avoid context switching the guest. So if you depend on very low latency, you don't want your guest to be context switched out, because when something happens and you need to respond really, really fast, there's an overhead of context switching the guest back in, and we want to avoid that. And another thing is that it's widely supported by most regular hypervisors, and we want to bring it into Kubernetes as well.

So a question, does anybody recognize this or know what this means? Okay, great. And how many of you actually know exactly how it's implemented behind the scenes? Right, okay, so this is relevant. So first of all, let's talk about what containers are. So containers are basically an idea. It's an abstract concept, which can be implemented in many ways. And if you go to the kernel and ask it, like, do you know what containers are? It would say, I don't know, because from the kernel's perspective, there is no such thing as containers. There are building blocks, and if you use those building blocks, you can implement a container with them. So let's talk about these building blocks. The three main building blocks for containers are cgroups, SELinux, and namespaces. And very briefly speaking, cgroups is responsible for resource allocation, SELinux for security, and namespaces for isolation.

So let's dive into cgroups for a second. So basically, cgroups lets you split the resources of the node between groups of processes. And the architecture is a tree of resources. So in this example, let's say that on our system we have 100 CPUs. We can split them between children, and then eventually, every process that runs is attached to one cgroup. So for example, this group of processes would be limited to 10 CPUs. Another thing to know is that in cgroups, there are subsystems. So for example, this is the CPU subsystem. We also have subsystems for memory, IO, huge pages, and a bunch of others. And in the Kubernetes model, each container gets one cgroup.

So just a word about cgroups v1 and v2. So cgroup v2 is the new version. It was introduced on March 14th, which is marked just because it happens to be my birthday, but never mind. There is no backward compatibility whatsoever to v1. And the basic idea is that v1 was too generic. It didn't restrict you at all, and therefore it was really error prone. It was very easy to misconfigure it, and therefore very hard to debug if you did so. So the idea in v2 is that you have many more restrictions. It's less generic, but it is less error prone. Another thing is that we have a unified hierarchy. So remember when I talked about the subsystems? In v1, basically, there is a different hierarchy for every subsystem. And that also might cause a lot of trouble, because the different subsystems are not aware of each other. In v2, we have a unified hierarchy, and in every cgroup, we define all of the different subsystems. Therefore, they're more aware of each other, and again, less error prone.
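To make the cgroup building block itself a bit more tangible, here is a minimal sketch of poking at the v2 unified hierarchy directly on a host, outside of Kubernetes. It assumes the unified hierarchy is mounted at /sys/fs/cgroup and that you are root on a machine where nothing else (such as systemd) objects to a hand-made group.

```
# Which controllers (subsystems) are available in the unified hierarchy?
cat /sys/fs/cgroup/cgroup.controllers          # e.g. cpu io memory hugetlb pids ...

# Create a child cgroup and delegate the cpu controller to children.
mkdir /sys/fs/cgroup/demo
echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control

# Limit the group to 0.2 CPUs: 200ms of CPU time per 1s period (microseconds).
echo "200000 1000000" > /sys/fs/cgroup/demo/cpu.max

# Attach the current shell (and everything it starts) to the group.
echo $$ > /sys/fs/cgroup/demo/cgroup.procs
```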
Currently, both versions are supported, and v2 went GA in Kubernetes 1.25, so it's still relatively new. Let's talk specifically about the threads model in v2. In v1, we have no restrictions whatsoever on threads. You can do whatever you want with them. And it was pretty nasty, and nasty is not my word, it's from the actual official kernel documentation. That's what they said. So basically, I want to talk about two limitations when it comes to threads. First of all, threads must live under the process subtree. We can't just take two threads of a process and split them between different cgroups; they have to be under the process subtree. Another thing is that if your cgroup is threaded, then you can use only threaded subsystems, which basically means that you cannot use a lot of the subsystems, which is a huge limitation.

Okay, so in Kubernetes, all of the values are always absolute. For example, when you're defining a container, you would define it with 100 mCPU, which equals 0.1 CPUs, or 1.3 CPUs, whatever, but these are all absolute values. In cgroups, we have relative shares called CPU shares. And as opposed to Kubernetes, they're entirely relative. So let's say that we have only two containers, or two processes, running on a node, one with one CPU share and one with two CPU shares. The one with two CPU shares is going to get twice the CPU time. It doesn't matter how many CPUs we have in our system; it's completely relative. So how does Kubernetes convert between the absolute values and the relative CPU shares? We say that one CPU is 1,024 shares, just because that's the default. So if someone needs 200 mCPU, which is basically one-fifth of a CPU, all we need to do is divide 1,024 by five, and then we get approximately 205 shares. But remember that shares are still relative, so there's a nice side effect here: for example, if all of our pods request only 50% of the CPU on the system, the other 50% is going to be split relative to the containers' requests. So basically, the request is the minimum amount that's allocated.

Let's talk about Kubernetes QoS for a second. We have three quality of service levels in Kubernetes. The first one is best effort, which basically means that you don't provide any requests or limits, not for CPU and not for memory. The guaranteed QoS is kind of the complete opposite: you specify both requests and limits, both for memory and CPU, and they have to be equal to one another. And anything that's not best effort or guaranteed is burstable. So burstable is when you specify only requests, only limits, or both but they're not equal, every other case. And basically the idea here is that there's a trade-off between predictability and stability. Kubernetes tells you: if you're going to be predictable with your resource usage, you're going to get more stability. And of course, you have to keep your promises. For example, if you limit yourself to a certain amount of memory and you cross that limit, the container is going to die. So let's talk about dedicated CPUs in Kubernetes. We have the CPU manager, which basically is responsible for allocating dedicated CPUs in Kubernetes, and there are two requirements for that: the pod has to be of guaranteed QoS, and the CPU request, which equals the limit, has to be an integer. Another fact is that not all the containers in a pod need to be allocated with dedicated CPUs, but the pod as a whole still needs to be guaranteed.
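A back-of-the-envelope sketch of the two conversions just described: millicores to cgroup v1 CPU shares, and the QoS class that falls out of a pod's requests and limits. The 1 CPU = 1,024 shares constant follows the convention from the talk; the helper names and the simplified pod-level QoS check are my own illustration, not kubelet code.

```python
"""Illustrative conversions: Kubernetes millicores -> cgroup v1 cpu.shares,
and the QoS class implied by CPU/memory requests and limits."""

SHARES_PER_CPU = 1024

def millicores_to_shares(millicores: int) -> int:
    # 200m (one fifth of a CPU) -> 1024 / 5, roughly 205 shares, as in the talk.
    return max(2, round(millicores * SHARES_PER_CPU / 1000))

def qos_class(requests: dict, limits: dict) -> str:
    resources = ("cpu", "memory")
    if not requests and not limits:
        return "BestEffort"
    if all(r in requests and r in limits and requests[r] == limits[r]
           for r in resources):
        return "Guaranteed"
    return "Burstable"

print(millicores_to_shares(200))                                   # ~205
print(qos_class({"cpu": 2000, "memory": 4 << 30},
                {"cpu": 2000, "memory": 4 << 30}))                  # Guaranteed
print(qos_class({"cpu": 500}, {}))                                  # Burstable
```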
Okay, so let's now talk about namespaces for a second. Namespaces are the part that's responsible for isolating different groups of processes from one another. For example, we have the PID namespace, the mount namespace, the cgroup namespace, and many more. So, from within a container, if you execute a command like ps to see all the processes, you're going to see only the processes within your namespace. And a cool fact that I didn't know before I dove into this is that we can actually share the PID namespace between different containers. That's supported by Kubernetes, and that's pretty cool, because we can have processes from different containers communicating with one another. And as a side effect, the file systems also become reachable if you use that. The trick is going into /proc/<pid>/root, and then you get to the root file system of a process that lives in another container, which is pretty cool and sometimes useful.

Now a word about KVM. Basically there are two kinds of hypervisors: type one, which is also called a bare metal hypervisor, and type two. With a bare metal hypervisor, we install the hypervisor straight on the bare metal, and with a type two hypervisor, we install an OS on top of the bare metal and then the hypervisor on top of the OS. Now, type one hypervisors are much, much faster, and KVM is great because it's a kernel module that basically turns Linux into a type one hypervisor. So with KubeVirt and KVM, we can reach near-native performance. Another thing is that KVM is basically responsible for CPU virtualization, which is the performance part. For other stuff, like IO and similar, we're using QEMU. And in the KVM model, each CPU, from the guest's perspective, is implemented as a kernel thread. So for example, if you're creating a VM with four CPUs, then from the kernel's perspective these are four vCPU threads that basically run your guest's workload. Going back to KubeVirt, to talk a little bit more about the architecture: when I said that we run a VM inside a container, we basically have the virt-launcher pod, which actually runs the guest, and inside we have different containers, but the main container is called the compute container.

Now, let's talk about the attempts to support dedicated CPUs. The first attempt was pretty simple. We can simply allocate dedicated CPUs to that container. It's possible with the CPU manager; as we discussed before, we need the pod to be guaranteed, and we need the requests and limits to be equal and to be an integer. And that's it, we're done, right? Not really, so let's dive into that a bit. Inside the compute container, we basically have three levels. The first level is the KubeVirt management layer. These are KubeVirt processes that basically start all of the other processes, monitor them, and also act as a bridge to Kubernetes, for example in terms of Kubernetes logs and stuff like that, and handle communication with the other components that we have. So this is the first layer. Another layer is the virtualization management layer, which basically consists of libvirt, meaning libvirtd, virtlogd, et cetera. And the third layer is the emulation layer, which means basically QEMU and the vCPUs, which is basically the guest itself. So these are some of our processes and all of the threads. Now, you don't have to understand everything that's going on here.
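As an aside before looking at what matters in that process list, here is a small host-side sketch of how you could spot those vCPU threads yourself. It assumes a Linux host and that QEMU names its vCPU threads with a comm value like "CPU 0/KVM", which is true for recent QEMU builds but worth verifying on your version; the script just walks /proc/<pid>/task.

```python
"""Quick host-side sketch: list a QEMU process's threads and flag the vCPU
threads. For a KubeVirt VM you would point it at the QEMU PID that runs
inside the virt-launcher pod."""
import sys
from pathlib import Path

def threads_of(pid: int):
    for task in sorted(Path(f"/proc/{pid}/task").iterdir()):
        comm = (task / "comm").read_text().strip()
        yield int(task.name), comm

def vcpu_threads(pid: int):
    # vCPU threads are ordinary, schedulable kernel threads; only their
    # name distinguishes them from IO threads, the VNC worker, and so on.
    return [(tid, comm) for tid, comm in threads_of(pid) if "KVM" in comm]

if __name__ == "__main__":
    qemu_pid = int(sys.argv[1])          # PID of the qemu-kvm process
    for tid, comm in threads_of(qemu_pid):
        marker = "vCPU" if "KVM" in comm else "    "
        print(f"{marker}  {tid:>8}  {comm}")
```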
What's important is that here are the vCPU threads that I was mentioning before. We have two vCPUs here; they're just regular threads, as I said. But the problem here, as you remember from one of the first slides, is that the key point is avoiding preemption, avoiding context switches. And here we have tons of threads. So we basically took a container, allocated it with dedicated CPUs, and now all of these different threads are running on these CPUs. What happens is that we would have to preempt, or context switch, the vCPU threads in order to run the other threads. And this becomes even more complicated because all of these threads have very different priorities, and some sibling threads of the same process have different priorities. If we look at the QEMU process, for example, there are the vCPU threads that I mentioned, which of course have the highest priority, but the VNC worker, for example, doesn't need to run on dedicated CPUs, while the IO threads are a bit more important. So we have many different priorities between all of these threads. So what we did is basically lie to the guest. These aren't really dedicated CPUs, right? Because we're going to context switch out the guest all the time.

So, second attempt. A field called isolateEmulatorThread was introduced in the VMI, the VirtualMachineInstance object. The basic idea here is that if the user asks for x CPUs, we're going to allocate x plus one CPUs, so one extra dedicated core to run all of the non-vCPU threads. And basically what we're doing here is using libvirt configuration to say: pin all of the non-vCPU threads of the QEMU process onto this dedicated core. So again, if you look over here, we have the first vCPU running on the first dedicated CPU, the second one the same, and all of the other threads are running on a third dedicated CPU. There are two problems here. One is that we waste one dedicated core in order to achieve this. The other one is: what about all of the other threads here? There are a lot of threads that we didn't even do anything with. So this doesn't really solve the problem. Again, these threads are going to be context switched onto the dedicated CPUs.

So the third attempt is the housekeeping approach. The idea is that we create a child cgroup for lower priority threads. This is called the housekeeping cgroup. Again, just like before, we would allocate one extra core. So if the user asks for x CPUs, we'll allocate x plus one. Then what we do is move all of the non-vCPU threads into this housekeeping cgroup, and the vCPUs run on dedicated CPUs. So this is basically how it looks: we have the virt-launcher pod, and then the compute container with x plus one dedicated CPUs; we create a child cgroup, the housekeeping cgroup, with one dedicated core, which basically runs all of the threads except the vCPU threads, and the vCPU threads themselves run on x dedicated CPUs. And while this approach is a huge step forward, because this is the first time we actually support dedicated CPUs, there are still a lot of problems with it. One problem is that we still waste one dedicated core. Ideally we would have said to Kubernetes: we need x dedicated cores plus, say, 0.2 shared CPUs, because we don't really want all of the low priority threads to run on dedicated CPUs.
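A rough sketch of what the housekeeping split would look like at the cgroup v2 level, reusing the thread listing from the earlier sketch to tell vCPU threads apart. The compute-container cgroup path and the housekeeping CPU number are hypothetical, and cgroup v2's ordering rules (threaded domains, the no-internal-process rule) are glossed over; treat this as an illustration of the mechanism, not KubeVirt's implementation. The talk then returns to why asking Kubernetes for a fractional extra like 3.2 CPUs is not an option.

```python
"""Sketch of the 'housekeeping' idea: carve a threaded child cgroup out of the
compute container's cgroup, pin it to one CPU, and move every non-vCPU thread
of QEMU into it. Paths and CPU numbers are hypothetical."""
from pathlib import Path

COMPUTE_CG = Path("/sys/fs/cgroup/kubepods.slice/pod-example/compute")  # hypothetical
HOUSEKEEPING_CPU = "3"                                                   # the extra core

def make_housekeeping(qemu_pid: int, vcpu_tids: set[int]) -> None:
    hk = COMPUTE_CG / "housekeeping"
    hk.mkdir(exist_ok=True)
    # A threaded cgroup may only hold threads and only use threaded
    # controllers (cpu, cpuset, pids, perf_event): the limitation the
    # talk complains about.
    (hk / "cgroup.type").write_text("threaded")
    (COMPUTE_CG / "cgroup.subtree_control").write_text("+cpuset")
    (hk / "cpuset.cpus").write_text(HOUSEKEEPING_CPU)
    # Everything that is not a vCPU thread goes into the housekeeping cgroup.
    for task in Path(f"/proc/{qemu_pid}/task").iterdir():
        tid = int(task.name)
        if tid not in vcpu_tids:
            (hk / "cgroup.threads").write_text(str(tid))
```

The vcpu_tids set could come from the thread-listing sketch above, for example {tid for tid, _ in vcpu_threads(qemu_pid)}.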
But as we said, this is impossible in Kubernetes, because if you write something like 3.2 CPUs, you don't fulfill the requirements for having dedicated CPUs. Kubernetes goes for an all or nothing approach: either all of your CPUs are dedicated or none of them are. The next two problems are more related to design than to performance, because we're doing something twisted here. We say we need the vCPUs to run on dedicated cores, but then we configure every other thread, and it should be the reverse. We should configure only what we care about: configure only the vCPUs and leave everything else as is. And another problem, which I've talked about, is that this cgroup runs threads, so it is a threaded cgroup. And as I said before, threaded cgroups are subject to many limitations. Basically, we now can't use many subsystems on almost all of our threads, all of the threads except the vCPU threads. So that's another problem.

So the fourth attempt is the emulator container approach. The idea is that the compute container stays as usual, and when I say as usual, I mean it would not be allocated with dedicated CPUs at all. It still needs to be of guaranteed QoS, but no dedicated cores. Instead, we create another blank container with x dedicated CPUs. This creates a new cgroup path, because in Kubernetes every container gets a new cgroup. And when I say a blank container, by the way, I mean one process that's sleeping forever, or something like that. Then we can move only the vCPU threads to this cgroup. So let's see how it looks. Now we have the virt-launcher pod. We have the compute container with y CPUs, and they're shared CPUs, not dedicated. And we have the emulator container with x dedicated CPUs. And what we can do is simply move the vCPU threads into the emulator container. Now only the vCPU threads are running on x dedicated cores; everything else is running with y shared CPUs. So there are great advantages to this approach. Basically, we solved all of the problems from before. Only the relevant threads are being configured. The compute container and all of the threads inside stay exactly the same. Housekeeping tasks are running on shared CPUs now. We avoid allocating one extra dedicated core, which is an expensive resource. And we keep things open for extension in the future, in the sense that we don't have the limitations of threaded cgroups for all of our threads, only for the vCPU threads.

But, a first question: how can we even move threads into another container? That sounds completely weird, right? Well, we didn't really move them into another container. We only moved them to another cgroup, which is different. And remember what I said: from the kernel's perspective, there isn't such a thing as a container. We really only changed cgroups. And do we need to share the PID namespace? That's what I thought originally, but the answer is no, because we didn't change the namespaces at all. So the vCPU threads that are now running in a different container still share the PID namespace with all of the threads and processes from the compute container. But this doesn't work with cgroups v2. With cgroups v1, it's entirely possible. With v2, it's a problem because of the threaded model that I was mentioning: basically, all of the threads need to live under their process subtree.
But the process, because the vCPUs are threads of the QEMU process, is in the compute container. We're moving some of its threads into a sibling container, and that's illegal with cgroup v2. So that forces us to move the whole QEMU process, with all of its threads, into another cgroup. But this is not such a huge problem, and maybe it's even an opportunity, because what we can do is the following. Just as before, we would have another container, but it turns out that the cgroup for the pod itself is owned by Kubernetes and we cannot change it at all. We can, however, mess around with the cgroups of our containers; that's allowed in Kubernetes. So we can do the following trick. We can edit the cpuset of this container so it has both x dedicated CPUs and y shared CPUs. Again, this is not legal in Kubernetes, but it is possible in cgroups. And then what we do is move the QEMU process with all of its threads. Only the vCPU threads run on the x dedicated CPUs; all of the other threads run on the y shared CPUs. Now, in this cgroup itself nothing really runs, because all of the threads are split between one of the two children. So that's basically just a cgroup hierarchy, and this is how it looks.

But let's look at it again, because as I said earlier, there are two layers of management and a third layer of emulation. In essence, the management layers are really different from the emulation layer. Since they're different, we could treat them differently. We can have different permissions, different resources, different definitions for the management layers and the emulation layer, because they're different in essence. So if we look at it again, what we did basically reflects our model a lot better, because now we have the compute container for the management layers and the emulator container for the emulation layer, and we can now configure both containers differently according to our needs. This also opens the door for further extensions in the future, because now we have no limitations on any of our threads. We can also extend this hierarchy even more. Let's say, for example, that we want to limit IO for the vCPUs in certain scenarios, or limit memory for the management layer; we can extend this cgroup hierarchy in the future, so that leaves the door open for many extensions.

So, summary and takeaways. There were a lot of introductions here and we've seen a lot of cool technologies: CPU allocation, cgroups, dedicated CPUs, namespaces, KVM, KubeVirt. And again, my hope is that beyond the problem and solution I was presenting, you will take some of these cool facts and cool technologies and use them in your own journeys, in whatever you're interested in. Yeah, and that's it. Thank you very much. So, are there any questions? Yes. Yeah, so the question was how much of it is a hack, or basically, if I understand you correctly, what you want to ask is how we know that Kubernetes won't be surprised by these changes, right? To be honest, this is still a work in progress and we still have to test it with a bunch of scenarios to be sure. But from whatever I tested, it was completely fine. And from what I understand, Kubernetes owns the pod cgroup, but everything beneath that it doesn't care about. It only configures it while the container is being created; after it's been created, it doesn't touch it at all.
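That "configure at creation, then hands off" behavior is what the following sketch leans on. It is a rough, simplified cgroup v2 illustration of the final approach: widen the emulator container's cpuset to both the dedicated and the shared CPUs, move the whole QEMU process into it, and split its threads between two child cgroups. The cgroup path, the CPU lists, and the exact ordering of cgroup.type, subtree_control, and the moves are assumptions; the real sequencing under cgroup v2 is fussier than shown here.

```python
"""Sketch of the cgroup v2 variant: move the whole QEMU process into the
emulator container's cgroup, then split vCPU and housekeeping threads
between two child cgroups with different cpusets. Everything here is
hypothetical and omits error handling."""
from pathlib import Path

EMULATOR_CG = Path("/sys/fs/cgroup/kubepods.slice/pod-example/emulator")  # hypothetical
DEDICATED, SHARED = "4-5", "0-1"                                          # x and y CPUs

def rearrange(qemu_pid: int, vcpu_tids: set[int]) -> None:
    # Not expressible in the Kubernetes API, but possible at the cgroup level.
    (EMULATOR_CG / "cpuset.cpus").write_text(f"{DEDICATED},{SHARED}")
    # Move the QEMU process, with all of its threads, in one go, which keeps
    # cgroup v2's "threads stay under their process" rule satisfied.
    (EMULATOR_CG / "cgroup.procs").write_text(str(qemu_pid))
    (EMULATOR_CG / "cgroup.subtree_control").write_text("+cpuset")
    for name, cpus in (("vcpus", DEDICATED), ("housekeeping", SHARED)):
        child = EMULATOR_CG / name
        child.mkdir(exist_ok=True)
        (child / "cgroup.type").write_text("threaded")
        (child / "cpuset.cpus").write_text(cpus)
    # Distribute threads: vCPUs onto the dedicated cores, the rest onto shared.
    for task in Path(f"/proc/{qemu_pid}/task").iterdir():
        tid = int(task.name)
        target = "vcpus" if tid in vcpu_tids else "housekeeping"
        (EMULATOR_CG / target / "cgroup.threads").write_text(str(tid))
```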
So I don't see a reason right now why Kubernetes would be surprised, but again, this is a work in progress and I might be wrong here. Yeah, sorry, I didn't hear you. CPU pinning with different sizes of cores, you're referring to NUMA, right? Okay, so the question is how NUMA fits into all of that. I'll be honest that this is still under investigation. Again, we don't see a reason why it wouldn't be supported, and KubeVirt already supports NUMA, so I don't think there's a problem in this area, but again, work in progress, I don't want to guarantee anything. Any other questions? Okay, thank you very much.

Yeah, so hello everyone. I'm Jakub, this is Nikos, and we are representing the Test Platform team. We'll be speaking today about optimizing costs across several OpenShift clusters. In fact, some of the things you can apply to just one cluster, but that is the topic. We would like to start with the question: why should we optimize costs when using OpenShift? The answer to this question is different for everyone, so there is no single answer. It could be your manager wanting you to optimize, or you may want to optimize costs yourself. So it depends. For our team, there are two core reasons why we wanted to optimize costs and why we did it. The first one is that our service, OpenShift CI, is getting bigger every month and every year; we execute more and more and more. And the second one is that we have a specific use case: while OpenShift is nicely tailored for the default use case, our use case is a little bit special. This is why we wanted to optimize costs. So why is our use case different? You can see here our growth from June 2021 to June 2023. We had two build clusters, 23,000 job definitions and 200 repositories. And now we have grown: we have six build clusters, 60,000 job definitions and more repositories. Our users execute half a million tests every month, and 0.1 million ephemeral, ad hoc clusters are created.

The important question is how to start optimizing your infrastructure costs. The answer is that every cost-cutting decision should be made only after you carefully analyze your own data and understand your own use case. Every use case is different. We are giving some tips and hints here, but this is what you should do first: really understand your own data. How to do that? There are several methods. You can analyze your metrics and review alerts; you can use Prometheus and Grafana for that, alerts coming from the infrastructure, your own alerts. You can inspect the cloud cost explorers, for GCP and AWS for example, and you can craft your own analytics tools; for example, we use Google Data Studio for that. Are there any easy wins that you can apply to achieve something relatively quickly? The answer is yes. The first and kind of obvious easy win is: do not execute unneeded tasks. It may be obvious, yes, just don't run things, but if you are really getting big and you have multiple users on the platform, it's easy to lose track of the things that are happening on it. For example, in our own case, we used to run tests for older, end-of-life releases even though they were abandoned; we detected that and disabled them or lowered their frequency. The second easy win is to minimize worker nodes. This comes from our own use case really, but it may also be true in your case that you do not need the three worker nodes that OpenShift creates by default. You can consider scaling them down, even to zero if needed, so as not to waste resources. It can be done manually.
Or you can do it automatically: you can use the autoscaler, or you can handle it at the installation stage. The next easy win is associated with AWS: you can change the CPU type. For our platform, we changed from Intel CPUs to AMD CPUs, and that brought a benefit because AMD-based instances are cheaper on AWS. The performance is more or less the same, so nobody was hurt by doing this. This also refers to the previous slide: if you not only change Intel to AMD, for example, but also inspect what the machine types are, and, say, you have been running your nodes for a long time, you might discover that you are running on an old machine type. In that case, you can try to upgrade to something newer, which brings better price performance. This has, of course, an indirect influence on cost-cutting efforts, so you will see it. You can do much more if you look for easy wins, but let me give two more hints. The first one is to benefit from savings plans, committed use discounts in the case of GCP, and reserved instances in the case of AWS. For AWS, you can choose gp3 as your default storage type; that is a general-purpose volume that should be ideal for most applications.

All right, so how can you try to save more costs by using specific machine types, in any cloud provider? As we all know, all cloud providers offer different instance types, but most of our users, our customers, don't care about what type they are going to use; they just use the default in OpenShift, which is a general-purpose type, right? So how can you optimize that area based on what operations you run? First of all, you will need to stop using the default that OpenShift uses, and you will need to investigate your infrastructure: how your operations are using the resources. Are you using CPU? Are you using memory? What are the operations that you are doing? For example, if you have something that is using a lot of CPU, then you most probably want to run that operation on a CPU-optimized instance type in the cloud provider. And the way to start is: with the oc tool, you can easily start monitoring your infrastructure during your peak hours and see exactly how you are using it. And, as Jakub said before, you will also have Prometheus data and other metrics and monitoring logs in order to understand exactly how to pick those types, right? Consider switching to ARM. Nowadays, ARM instance types are way cheaper than other types, but this comes with a catch: your workloads have to be able to work on this architecture, or you have to be willing to make them work on a different architecture. But if you choose ARM nodes, you will be able to save a lot of money on instance usage. And nowadays we ended up with heterogeneous clusters in OpenShift, which do exactly that. A heterogeneous cluster gives you control plane nodes on x86_64 and gives you the ability to run other machine sets on different architectures, ARM and others that you can use. You can use a heterogeneous cluster to have just ARM worker nodes, if that's feasible, right? But how do you shape your OpenShift installation to decrease cost? The most common mistake users make is that they don't pick the infrastructure they want to spawn based on their needs.
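As a concrete version of that investigation step, here is a minimal sketch of pulling per-node CPU and memory utilization from Prometheus to see whether a general-purpose, CPU-optimized, or memory-optimized instance type actually fits. The Prometheus route, the lack of authentication, and the node-exporter metric names are assumptions to adapt to how monitoring is exposed in your cluster; picking the right cluster shape, which comes next, starts from the same kind of data.

```python
"""Query Prometheus for per-node CPU and memory utilization over the last hour.
The URL and metric names are assumptions; adjust for your environment."""
import requests

PROM_URL = "https://prometheus.example.com"      # hypothetical route to Prometheus
QUERIES = {
    "cpu_busy": '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1h]))',
    "mem_used": '1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes',
}

def utilization(query: str) -> dict[str, float]:
    resp = requests.get(f"{PROM_URL}/api/v1/query",
                        params={"query": query}, timeout=30)
    resp.raise_for_status()
    return {r["metric"].get("instance", "?"): float(r["value"][1])
            for r in resp.json()["data"]["result"]}

if __name__ == "__main__":
    cpu = utilization(QUERIES["cpu_busy"])
    mem = utilization(QUERIES["mem_used"])
    for node in sorted(cpu):
        print(f"{node:40s} cpu {cpu[node]:5.0%}  mem {mem.get(node, 0):5.0%}")
```

A node fleet that sits at high memory but low CPU utilization, for instance, is a hint to move toward memory-optimized types rather than the general-purpose default.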
And by default, OpenShift uses a specific amount of resources with the general, default configuration, but what if your infrastructure just needs workers? That's HyperShift; you can use HyperShift to do that. What if your infrastructure needs to maintain, or spawn and destroy, ephemeral clusters? You can use Hive to do that; Hive is really good at it. There's another option too: you can run OpenShift on a single node, but I don't recommend that. Yeah, so another thing you can consider is making sure that node scale-down actually happens on your nodes. This happens by default through the autoscaler, but there are a few cases that the autoscaler ignores, and we need to identify them. If your pod is using local storage, that means the storage exists on a specific node, so the pod can't be evicted or moved to another node. The node will stay alive with the pod running on it, and you won't get scale-down in that case. Also, if the pod is not owned by a recognized controller, if you have just a single pod running and it's not in a deployment or a stateful set or daemon set, and of course if you have a PDB configured, a pod disruption budget, which is the most annoying one: it prevents specific pods from being evicted, based on your needs. So you need to get your configuration right in order to make sure that scale-down happens correctly. You can bypass some of that by applying specific labels to your pods, and you can also enable the HighNodeUtilization profile in your scheduler configuration, which is a CR that we have in OpenShift. And one of the important ones is identifying the long-running processes on your nodes. This is a big trap, because if you have different kinds of long-running processes and they land on specific nodes, those nodes will be kept alive by the autoscaler because the long-running process is still running. So once you identify them, using taints, tolerations, or node selectors you can control where your long-running processes will run. It is better to have them all on one node, so you can keep the autoscaler doing its job.

One other thing that we are actually using in our infrastructure is spot instances. And surprisingly, using spot instances can reduce the cost dramatically. We also have an option in the MachineSet in OpenShift to specify that. There are a few benefits and a few drawbacks. There are different kinds of services that can provide your spot instances, but I can give you an example with Spot.io here, because Spot.io is a paid service with a guaranteed return on investment: they charge you based on how much money you saved. So this is really good, and you will save money. The benefits are basically that you can completely replace your machine sets and machine autoscalers for specific, tolerable workloads, and Spot.io can scale down unnecessary nodes just to decrease the cost. But there are a few drawbacks, of course. At peak hours, the spot instances can become unavailable; then the Machine API won't be able to fall back to on-demand instances and you will have a deadlock there. So before moving to spot instances, you have to do some investigation, of course, in order to make sure that your operations keep working correctly. Okay, so the last question we want to ask today is how to optimize data transfer to benefit from lower rates. And here is the example.
The example is pretty easy: the cost of downloading from a storage bucket. As you can see, here are AWS and GCP as an example. If we are in region one in AWS and we download things from an S3 bucket, the cost is free or almost free. If we go to a different region on AWS, the cost is of course higher. But if we download things from a GCS bucket, which is a completely different platform, the cost is of course the highest. Why is that? Because of colocation. The cloud providers encourage us to host as much as possible with them by imposing charges if we consider alternatives. So if we have multiple cloud providers, as is our case, optimizing costs should involve prioritizing communication within the same cloud. How to do this? Collocate systems and data in the same region, preferably in the same availability zone. Here are direct examples from our own infrastructure. We have task and job dispatchers that dispatch things based on cloud origin. We have an nginx proxy, and the one shown here is a registry pull-through cache. So let's say that we have an external registry and a node, in this example on AWS, and we want to download an image. We go through the pull-through cache pod, and the first time, we download the content from the external registry, but along the way we save it in a region-one S3 bucket. The next time, if we are retesting or re-running the thing, there is a redirect in place from the pull-through cache pod to the S3 bucket, and the S3 bucket serves the content directly to the node on AWS. By doing this, we avoid the cost of going to the external registry, because we only hit it once, when we want newer content. It is extremely efficient in our setup, because images are changing and people are rewriting things. So yeah, that's working. Another thing that I want to mention really briefly is eliminating the NAT gateway cost. In cloud systems, ingress is not subject to fees if it doesn't pass through a NAT gateway, but by default OpenShift has NAT gateways. So if you want to consider this as a cost-saving option, you have to know that such modifications are not officially supported. But you can do this in AWS as well as in GCP, and I think also in Azure. There are more topics that we could talk about; some of them we briefly mentioned, and some we decided not to include here because the advice is not so general. We could definitely do more in the areas of cloud storage, instance types and cloud functions, consider using bare metal machines, or explore more savings plans and reserved instances, but in the end we decided only to mention those areas in our presentation. And that's it, thank you very much, and now it's time for questions. Okay, so the question is how much effort we put into it and how big the savings were. The answer to the first question is that it was an effort of the entire team plus a staff engineer, and other teams were involved as well. So the effort was really big: 10, 15, maybe 20 people working on it. It was not that we were working on it on a daily basis; we had some topics, we identified them, and we introduced them. So this initiative has been running for about one and a half years, something like that.
And the savings: of course plans change and our infrastructure keeps growing, but at the time we introduced most of the savings, the big batch of them, in the day or week or month we introduced them, it was around 60%, yes. So, in your case, when you want to start analyzing... sorry. So we're talking about a big infrastructure, right? But in smaller infrastructures, keeping cost reduction in mind is something to do while you're designing the architecture of your infrastructure. You want your infrastructure to expose metrics about the resources that you are using. You need some kind of monitoring and logging system in order to maintain that. And once you understand the needs of your infrastructure, you will be able to reduce the cost from time to time, or keep following the more efficient approach, right? So the cost reduction comes every single day, and incrementally you improve the situation. Yeah, actually the question is which monitoring tools we are using. We are using Grafana for dashboards, and we are using Prometheus to expose all our metrics, plus in-house tools exposing custom metrics in order to gather what we need. And we are using AWS CloudWatch to hold all the logs, which are shipped using Vector, and we analyze those logs in JSON format, so we have some querying on top. So we are able to understand exactly what happened in our infrastructure. Yeah, from time to time we had ideas to start using something else, but it's working. Don't tell anyone, you know. Everything is also presented in a UI, which is Looker Studio, the Google studio. So we have a UI with queries set up, and yeah, it's working. But you can start with something simple. You can start with oc to monitor CPU and memory, and that is also a good starting point. Yeah, but the customers don't want to use oc, you know, they want a GUI. Yeah, so, okay, sorry. The question was: when is the right time to start doing savings, and when is the right time to stop? I will answer the second one first: maybe you will never stop, yeah. And the answer to the first one: it depends really on your infrastructure and on your costs, whether they are too high for you. If they are too high, you should analyze. If you analyze and you think you can optimize something, then optimize, yeah. Can you repeat the last part? Yeah, so what were the driving factors, and what was the reasoning behind taking this on? Generally, and I don't know if I understood correctly, the question was what motivated us to start, right? To simplify: what motivated us to start. So what motivated us to start was... your manager will come and say, reduce the cost, you know. Yeah, if we want to simplify, you can say something like that. It can also be that your service is growing and you reach a point where you hear that for the next, I don't know, quarter or year, you have to pay 10 or 20% more, or something like that, yes? And you start to think about what to change, yes? So as not to give this money to the cloud providers. And there are options. To talk more about that, I encourage you to find us in front of the doors, and we can talk more, because it is really a long topic, yeah?
And one thing that is worth mentioning is that once the costs start to increase, because your infrastructure is expanding, they increase at a really high rate. So one day you will see that maybe 20% of your cost is up because of that, and then, in panic, you will start analyzing your data and so on. That is why we advise you to do this from the start: when you want to deploy something or stand up a cluster, consider for yourself what kind of default configurations you have to change in order to improve the efficiency of the cluster and the cost, right? So we are encouraging you to install and operate your cluster with more in-depth knowledge, no? It's also important to note that we are not doing this on a daily basis, right? We are trying to identify areas. And our users know about that; they know that we are trying to identify areas. And it is nice that the users already know we have this initiative. It was nice to see, several days ago, that they started an email thread about what we can do to optimize further, because they even have some ideas, right? At the beginning, if your users do not care, you are on your own. But now that they are joining and giving ideas, that's nice. And if they don't care, they can increase your cost. For example, we had a case where a team was spawning a new ephemeral cluster just to run a unit test, and that was happening every day. Yeah, so the question was about the prioritization of workloads. Well, as I said during the presentation, in our case at least some of the workloads were abandoned. I mean end-of-life releases: they were just running because nobody cared. So the first good piece of advice is: disable them. You cannot always count on your users to do it; sometimes you have to monitor it yourself. So the... I didn't hear, can you repeat the question? The question was how we made users care, whether we made users care. The thing is, we have this initiative. We remove their jobs, like, on a Saturday night. But the email thread that I saw came out of nowhere; I didn't motivate anybody to think about that. So that's nice, and that is why I'm saying it's nice. Yes? Sometimes they are affected, though, because they know that something will happen, and there are doubts. I think we had some doubts when we changed machines from Intel to AMD, but they were cleared up after some time. So yes. Yeah, when it comes to cost, all teams are really cooperative. Also, we did some tuning of their jobs, and our team is in constant communication with all our stakeholders, which makes it easier to communicate with them and at least let them know why we changed their cron job interval or why we changed the architectures in specific clusters. But beyond that, our users are completely free to do whatever they want, including, in some cases, things that could become attack vectors, but we trust them, right? Yeah, we try to trust them, not to limit them, yeah. Any other questions? All right guys, thank you very much. Thank you.

All right, hello everyone, good afternoon. Welcome to this talk on the question of how much information is in an empty list. Now, this question can sound weird. And to be honest, answering it in all generality would be even weirder. So we will not be looking at it in the full generality of the question; we will be looking at it in a specific context. This context is building proactive recommendations for OpenShift.
Now, if there are any happy users of OpenShift 3 among you, I'm sorry, this is about OpenShift 4. My name is Sien Holacek. I'm a software engineer at Red Hat, and I happen to be part of the team that is actually developing these proactive recommendations for OpenShift 4. So let's look into it, and let's first talk about the context a little bit more: what these proactive recommendations are. Software support, or customer service for software services and products, is pretty much a standardized business these days. Many companies employ something called knowledge-centered support, where the idea, our assumption, is that as much as individual customers are unique, their issues are not. So when a unique customer has an issue, that issue is very likely to happen to another customer as well, and it's a good idea to document these issues and their solutions. So when a support engineer gets a call from a customer complaining about an issue, the support engineer is motivated to think about the gist of the issue, put it down, and add it to a knowledge base. This knowledge base is great, because then when another customer has an issue, they can go to the knowledge base and search for an answer before they file a support case. Great for the support folks. If they don't do that and start filing a support case, you can still have tooling in place that suggests these knowledge base articles based on what the customer types in. And even if this fails, the support engineer can still save some time by simply referring the customer to a readily available article instead of doing everything on their own. So this is all pretty neat. The issue is that it all happens only after the customer has been impacted by the issue. That's when they start to care; that's when they start thinking about filing a support case. That's the reactive part. What would be great is if we could automate the knowledge base articles, detect those issues before the customers are actually impacted, and give them advice, a recommendation: dear customer, you might want to fix this, otherwise this is going to happen. And this is the area of Red Hat Insights, and here we are talking about Red Hat Insights for OpenShift. So this is an example of a recommendation that we are giving. This is actually an excerpt from a real recommendation; please don't read all of it. The gist is that the customer made a configuration mistake. The configuration mistake doesn't cause any issues immediately, and the customer might have a hard time finding out, or realizing, that there actually is a mistake. But later on, when they try to upgrade the cluster or do something else, they would get bitten by it. So they get a proactive recommendation telling them: dear customer, you have this issue in your cluster, please fix it before it bites you. So much for the context; now, how it works, how we do that. Obviously, we need some data about the cluster to be able to make these recommendations. And in the case of OpenShift, which supports a number of different deployment options, the ways customers can install OpenShift, including on-prem ones, what we're actually doing is getting some data from the cluster and providing Red Hat Insights as a cloud service, basically. So the cluster sends data to the cloud, to Red Hat, where we analyze the data, and if we find any issues, we provide the recommendation. We're talking about health data, or remote health monitoring.
You might immediately be thinking Prometheus; that's the de facto standard these days. And yes, OpenShift sends data, what we call telemetry, using Prometheus, or Thanos. But we're actually not using this data for Insights. We're using data collected by the Insights Operator. Why? With Prometheus and its time series, if you want to retain the time series for a long period of time, for a large number of clusters, and you still want to be able to query the data in real time, you have to be picky about what data you collect, what metrics you will be collecting. Also, you want to do it frequently. If you're interested, for example, in etcd object counts, having one sample a day probably wouldn't tell you much about the cluster; you need the data more frequently. So the telemetry data is basically a few kilobytes of data every five minutes, approximately. From this data, we can't really make many recommendations. The details are not there. And if we're talking, for example, about alerts: alerts are already written for issues that the developers anticipated, and the alerts should be actionable on their own. We don't need an extra recommendation for an alert, or, well, we shouldn't at least. So what we need is a broader set of data that we can look at, and perhaps detect issues that the developers didn't anticipate, that they didn't write an alert for. And that's why we have the Insights Operator: it gathers more data. Also, the gathering process is more expensive; we can't do it every five minutes. By default, we do it every two hours. So we get this nice package of data from the cluster and analyze it. This is, at a very high level, what the architecture looks like. The Insights Operator is a cluster operator; it's part of OpenShift. When you install OpenShift, the Insights Operator will be there. It periodically queries the API server in the cluster, collects the data, sanitizes it so as not to include any personally identifiable information or otherwise sensitive data, wraps it up into a nice archive, and sends it over to Red Hat. There it's received by a recommendation service, which analyzes the data and produces recommendations, and these can be viewed in the UI, in Insights Advisor. You might be wondering why exactly we're doing it this way. The architecture is pretty much dictated by these requirements. We wanted to support on-prem deployments, not just hosted OpenShift installations. We wanted it to work out of the box; we didn't want the customer to have to do anything to get this feature. These two requirements combined basically mean that we need a component enabling these recommendations that is part of the cluster, part of OpenShift, when you install it. That's why the Insights Operator is actually a cluster operator. Then, combined with the other two requirements: we want to be able to write ad hoc recommendations, recommendations for issues that we didn't anticipate when we were developing the product, issues we learned about as customers were using it. That's the ad hoc recommendations. And, well, no remote code execution; I'll return to that later. So you might be thinking: instead of gathering the data, we could run the recommendations inside the cluster, where they have access to all the data. We wouldn't have to worry about sanitization of data, or about sending data out of the cluster. We would send just the results, or even show the results inside the cluster.
If something is part of OpenShift, it's versioned with OpenShift, so you can't change it. If you release the Insights Operator with OpenShift 4.10.12, let's say, you will not be able to change it within 4.10.12. If you want to change it, you need a new release, and for on-prem customers, you need the customers to upgrade first to get the update. And customers are notorious for not upgrading their clusters frequently, let's say. So all this meant we took our best guess with the Insights Operator and the data that it gathers, and we do the analysis on the Red Hat side. If we didn't limit ourselves, if we dropped the on-prem requirement and considered only hosted solutions, this is what the architecture could look like, and, spoiler alert, this is what it looks like, kind of, for hosted OpenShift products like Red Hat OpenShift on AWS and OpenShift Dedicated, where the service provider actually can install an additional component onto the cluster to evaluate the recommendations within the cluster, and only the results are sent out and then perhaps displayed in a user interface. Not a topic for this talk; perhaps next time. So here we are limiting ourselves to the solution where we send a bunch of data from the cluster and analyze it on the Red Hat side.

So let's consider an example. Let's say there's a feature of OpenShift that requires the customer to create a custom config map, and we learn by experience that customers, for whatever reason, tend to forget to create that config map. So we want to write a recommendation that tries to detect that the customer wants to use that feature but forgot to create the config map. We have the Insights Operator archive, we look for the config map, and we want to make a recommendation when the config map is not there. So this is where the empty list comes in, and we can finally answer the question: what information is in that empty list, the empty list that doesn't have the config map? Well, what we want it to mean, and what it hopefully means, is that the config map hasn't been created, and we can fire the recommendation. We can produce a recommendation to the customer: hey, dear customer, please create this config map, otherwise this feature won't work for you. Is that all? Does it really mean that? Well, not quite. It can also mean that the config map was not collected in the OpenShift version the customer is running. As I was mentioning a little while ago, when you release the Insights Operator in 4.10.12, it's fixed, and it gathers the data it was meant to gather in that version. Now, the Insights Operator obviously evolves. It doesn't gather the same information in all versions. It evolves as we learn more about what data we need to write these recommendations, and it also evolves as the whole product evolves: there are new components coming in, so the Insights Operator is updated to match that. So for the version the customer is running, we might not be gathering the data that we need, that specific config map. Now you might say: come on, this is easy, so we check the version, right? And, well, you're right. But that's not all. Another reason might be that the size limit was simply exceeded. The Insights Operator has a built-in limit on how much data it can send, and if other data exhausts the limit, then no more data will be added to the archive, including the config map that we want. So again, we get an empty list, but it doesn't necessarily mean that the config map wasn't there.
It can mean that other data, like log files perhaps, ate up all the available space, and, well, there was no space left for our config map. Is this all? No. Another reason might be that the Insights Operator or the cluster simply had a bad day and failed during collection, for whatever reason. For all these things, fortunately, we have a solution. The Insights Operator also includes some metadata about the status of the individual collections. So we can tell if a part of the data was collected successfully, or if there were any errors and we should take the data with a grain of salt, or basically disregard it for the purpose of recommendations. This is an example: it's basically a long JSON file with all the collections, which tells us about errors, warnings, and internal errors of the Insights Operator. There are more reasons why we could end up with the empty list. The list could go on and on, but the reasons would also get more and more obscure. For example, there's little we can do about someone playing around with the archive and changing it before sending it over to us. That actually happens; we do it to ourselves when we're running integration tests, right? In that case, we create the archives ourselves, and sometimes they are not complete, so then our recommendations fire at random or don't fire when expected. But this apart, as these reasons get more obscure, we kind of take the risk and don't try to mitigate them, simply hoping that they won't occur frequently. The ones I listed here can and do occur, and this is something that rule developers, or recommendation developers, really need to think about when developing a recommendation, and take measures against. So if I were to answer the question from the beginning of the talk, how much information is in an empty list, I would say: less than one might think. Thank you for your attention. I guess we have enough time for questions. If you have any, please. Yes, yes, yes. This is kind of tricky to measure, but we have a system in place that tries to estimate how many support cases we prevent using these recommendations, and if I'm not mistaken, it's hundreds a month. So hundreds of support cases a month that we prevent. Please, go ahead. Right. So the question is how we decide which recommendations to develop. That's a very good question, and telling upfront which recommendation will be impactful and which one will not is a very difficult task. What we're doing is, as I talked at the beginning about knowledge-centered support and the knowledge base articles, we actually have a system in place that monitors the number of references to knowledge base articles in support cases. So we know, for a given knowledge base article, whether it is being referenced in active support cases. This is, for example, one measure that we take into account. If we have a knowledge base article, for instance, that was created two years ago and there are two support cases linked to it, we will probably not worry too much about it, right? If it's a knowledge base article that was created two months ago and we already have 10 support cases linked to it, now this seems like a good candidate. Well, it is reactive for the customers who already had that issue, but it will be a proactive recommendation for anyone who would be getting it later. Yeah, okay. Oh, please.
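To tie the empty-list discussion together in code: a minimal sketch of the guard a recommendation rule needs, checking the cluster version and the gathering-status metadata before treating a missing config map as "the customer forgot to create it". The archive layout, file names, and status fields here are hypothetical stand-ins; the real Insights archive and its metadata format differ.

```python
"""Sketch: decide whether a 'missing config map' recommendation may fire.
All paths, file names, and field names are hypothetical illustrations."""
import json
from pathlib import Path

MIN_VERSION = (4, 10)   # first version whose operator gathers this config map

def should_fire(archive: Path) -> bool:
    major, minor = json.loads((archive / "version.json").read_text())["ocp"]
    if (major, minor) < MIN_VERSION:
        return False     # the data simply isn't collected on this version
    status = json.loads((archive / "gather_status.json").read_text())
    cm_gather = status.get("config_maps", {})
    if cm_gather.get("errors") or cm_gather.get("truncated"):
        return False     # collection failed or hit the size limit: no signal
    collected = (archive / "config_maps").glob("*.json")
    # Only now does an empty list mean "the config map was not created".
    return not any(p.stem == "my-feature-config" for p in collected)
```

The point mirrors the talk: only after the metadata says the collection ran cleanly does the absence of the config map carry any information.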
So the question is what the difference is between a proactive recommendation and documentation. Well, for the example we were talking about here, that customers tend to miss a configuration step when configuring something, the reason usually sits between the chair and the keyboard: not paying attention to the documentation. Actually, the joke in that case is on us, because we didn't make the installation easy enough, right? Or the product doesn't provide enough feedback to the user about the mistakes they make. It happens to every software product. I mean, you can't anticipate everything. You probably make assumptions that turn out to be invalid later on, and you need a way to fix them. So that's where these ad hoc recommendations come into play. We're also writing recommendations for other things, not just configuration mistakes. So we also, when we can, develop recommendations that warn customers about bugs that we discover at some point. I remember there was a recommendation when some staging data leaked into production, and when it was pulled back, it actually caused confusion for some operators. So again, this is a recommendation where we proactively tell customers: you will be having this issue, and these are the steps you can take to solve it. So it's not just configuration issues that we write recommendations for. Please. Yes. I'm sorry, could you repeat it? All right, so the question is whether these tools can be customized along with OpenShift, because OpenShift is a platform that customers can customize to their needs. Not really, I would say, but on the other hand, OpenShift is only customizable from a certain level up. At the bottom, the platform is all the same for most of the installations. I mean, there are different cases, like for telco, where the platform is heavily optimized. There might be other special cases where the platform actually is very different from the standard one. But if we take a standard enterprise that wants to run their business workloads in the cluster, the platform will be the same on all the clusters, regardless of how they deployed that cluster. And this is what our recommendations have mostly been focusing on: this platform. What customers are running on top of it, that's kind of their problem and their responsibility. We haven't really been writing recommendations for that. But if I return to this slide that I didn't want to talk about much: we are actually using this for making recommendations about customer workloads. We don't want to leak data about customer workloads, to send data about customer workloads from the cluster somewhere else. These need to be evaluated in the cluster, and we're actually using this architecture for the managed OpenShift deployments to make recommendations about workloads. So I hope that answers the question. Please, louder please. All right, so if I understand the question correctly, it's about maintaining the rules: what processes we employ to keep the rules up to date with the product. That's a tough question. Obviously, we maintain the rules. We watch data from customers. Customers in the user interface, for example, have the option to disable a recommendation. They would do that in cases where, look, we try for these recommendations to be really reliable; when we make a recommendation, we want it to be applicable to the customer, but there are cases when we misjudge or simply can't tell, and the recommendation would basically be a false positive for the customer.
So in those cases, the customers can disable the recommendation, and we watch that data. We monitor which rules, which recommendations, get disabled by customers, and we reconsider and re-evaluate our choices for those recommendations, for example. As for OpenShift versions that are past end-of-life: end-of-life doesn't mean that Red Hat stops caring about those clusters, and it doesn't mean no support at all. It still means that, even today, and I think the oldest supported version is 4.9 or 4.10, something along these lines, even if a customer were still running 4.6, Red Hat would still support them in transitioning to a supported version. So the recommendations for old versions can still get used. What we watch, periodically but not very frequently really, is how the numbers of these old versions evolve. And if we see, for example, that some recommendation is applicable only to old versions where there are only 30 clusters in total on the applicable versions, we will simply retire that rule. At the moment, we don't have so many recommendations that we would have to optimize the analysis time, that we would have to remove old recommendations to reduce the execution time. We will get there one day; so far it's not a problem. All right, so if I understand the question correctly, it's basically whether we take the lessons learned from these ad hoc issues, the ad hoc recommendations that we need to write, and try to address them in future versions of OpenShift so that the same or similar issues won't occur. I'm not sure I understand the word predict in the question. Yeah, so we were talking about preventing the same issues in future versions of OpenShift. That's not really something that we're involved in. That's really on the product development teams, right? They have access to the data, and actually part of what we do with the data is also helping our development teams find out about issues early. Another part of our bigger team is using this data to monitor the status of the fleet on the respective versions and flag, for example, spikes in alerts or in different metrics that occur specifically in newly released versions, and we feed this information back to the product teams. They can then decide perhaps to pull the new version from the upgrade graph, or look into that issue and fix it in the next version before it affects customers. Now, if we're talking about new versions, this is actually proactive, because, as I mentioned, customers are slow at upgrading their clusters. So if we detect this spike, if we detect this issue with, say, 100 customers running that version, then before all the tens of thousands of customers upgrade to that version, the issue will not be there anymore. So that's what we're using the data for as well. It's not really in the scope of Advisor recommendations; that's something slightly different. And I've been informed that we're out of time, so thank you very much for your attention and for all the great questions. If you have more, please catch up with me outside the hall. Is that okay? Okay.

So, okay, good afternoon everyone. My name is Jose Lato. I work at Red Hat, and thank you everyone for coming to this presentation, where I'm going to talk about, the title is, Tuning and Automating Telco 5G Containerized Workloads.
I work as part of a team that is called Partner Workloads and Enablement, together with other colleagues like Javier Peña, who also works on these same topics and who also helped me to prepare the presentation. We work helping our different telco partners to run their different telco workloads on our platform, which is OpenShift. Very quickly, the agenda for this session: I will do a very quick introduction about the telco architecture and the different layers that we can find there, just to explain the different points where at Red Hat we are trying to help these partners. So the idea is not only to enable OpenShift to run the workloads, but also to provide them with, or to help them with, a solution to manage all the clusters that they are going to have. To manage means not only deploying, but also doing automation, configuration, upgrades, et cetera, and how we put all these things together. We have a set of different OpenShift telco operators that help them to do the different tuning that they have to do for telco. And finally, I will talk about a special configuration of OpenShift, which is single node OpenShift. Pretty special because it's a cluster, but with only one node. And these clusters can be configured with one special profile for these partners. So what does this architecture look like? Well, it starts with the devices, our mobiles, that are connected to the internet through the radio access network, which I will mention a lot during this presentation. The radio access network is divided in turn into different layers. In the first one, we will find the radio units, which are the antennas our mobiles are going to connect to. And then, close to these antennas or behind them, we will have the first set of clusters or sites. These are called distribution units. The distribution units take the signal from the antenna, and everything is collected into the central unit, where we again have a new cluster. And then the last layer, which is the packet core. The packet core is closer to the internet, and again here we have one new cluster. This time it could be in a data center or something like that, while for the central unit you can have one or several in different regions, managing different distribution units. So well, we can use different OpenShift cluster configurations, from a standard to a compact one, but the most special part is — oh, cool, now it's out of full screen, sorry. So okay, during the presentation, I will focus on the radio access network and the distribution unit. We implement this with single node OpenShift together with a management cluster, with a special profile that comes pre-configured inside this cluster, also using some special operators for telco, and with a very big requirement about scaling and making things reproducible. Why? Because if we focus here on the distribution unit, which we have said is close to or behind the antenna, you can imagine how many radio units or how many antennas these partners are going to have: hundreds or thousands of new clusters that they need to scale and manage. Well, we usually say that the telco world is different, but maybe everyone says the same about their partners or customers. But it's true, and we will see later that we have to do very low-level configurations. Also, we want to reach the far edge, the edge and the cloud using the same technologies and the same platform.
And, as I said before, we have very strict requirements about scaling and making things replicable. So how do we do that? Because we not only need to enable OpenShift to run these workloads, we also need to manage and administer this huge amount of clusters. This is why we need to create a management cluster that is composed of a set of different technologies. Well, a management cluster is an OpenShift cluster, or a Kubernetes cluster, that manages other clusters: it deploys, monitors and configures them, in an architecture like this one, because we are combining this with GitOps and zero touch provisioning technologies and tools. So we have our Git repo, we have our GitOps where we define our infrastructure and configuration, this is managed by our hub cluster, and we can deploy the different sites for the distribution unit, the central unit or the packet core part. How do we build this management cluster? We said that it is an OpenShift cluster, so the base layer is going to be CoreOS and OpenShift. Then we are going to use the Red Hat Advanced Cluster Management product, or tool, together with the assisted service, which is the newer OpenShift installer, to install the different clusters. We will see more about that later. Then we have to do GitOps; we want to use GitOps combined with this, so we have Argo CD and also a new operator, which is the TALM operator. This operator helps with the lifecycle management of the configuration of your clusters, and we will see later the special needs that we have. And then, of course, you can have monitoring or other workloads that you want to have in your management cluster. I'm also talking about zero touch provisioning, and what is the scope of the different areas that we want to cover? We want to cover the whole lifecycle management. So on day zero, we deploy our clusters together with this ZTP GitOps tooling and plugins that I will explain in another talk in more detail. You can define your infrastructure, in this case your different single node OpenShifts. When these are deployed, you go to day one. On day one, the cluster is only deployed; it still is not ready for telco, it's just single node OpenShift, and on day one we configure what we call the vDU, or virtual distribution unit, profile, which is a set of operators with some configuration and some tuning that we will see in a moment. Day two is the work of the partner, so they have a cluster where they can deploy their CNFs or workloads, and of course after that, on day two, we can still do more configuration, more upgrades, whatever, using the same platform, which is ACM, our GitOps tooling, with our ZTP tools, et cetera, to do the configuration. Very quickly, because we will see this later: with the ZTP tools we provide two new templates, or custom resource definitions, that allow you to define the cluster and to define the different configurations. This is just a screen capture of ACM. ACM is not only a UI; it's also a set of services that allow you to do all the work around installing clusters, monitoring, deleting nodes, whatever you need to do. I won't go into details with this, because right after this talk I have another one where I'm going to go further into how we work with this management cluster. But okay, let's say that we have this management cluster with thousands of distribution units — now, how do we configure them?
Because we said at the beginning that the telco world is different, we have the OpenShift telco operators, which is what we configure on day one. So as an overview: we have our single node OpenShift and we want to make it a virtual distribution unit in our radio access network. So we have to create this vDU profile. Again, we have OpenShift — it's true that it's OpenShift with only one node. Then we are going to use the telco operators: logging, local storage (well, local storage is not a special operator for telco, but it is there), then the Node Tuning Operator and, prior to 4.12, the Performance Addon Operator, which allow you to do some customization. Then the SR-IOV operator for networking, and finally the PTP, Precision Time Protocol, operator. All these operators, together with the configuration, allow us to convert single node OpenShift into something that can be a distribution unit. Finally, you can have other operators and other workloads, but maybe the most relevant part is the application workload that the telco customer or partner is going to run. We keep talking about the distribution unit, the virtual distribution unit — what they are going to run are pods with their virtual DU implementation. And what is this vDU doing? Okay, we go back to the telco architecture: we have our mobile connected to the radio unit, sending a radio signal. At the radio unit we don't do anything; the radio unit converts the radio signal into a digital signal, and this reaches our vDU pods. Each of these vDU pods is processing the digital signal that is going to be sent to the central unit. How is this done? It's a loop, a process that is continuously working on this signal processing. Interesting things here: these pods are working in this infinite loop with very, very demanding needs. Each one will take one CPU, and it will take this CPU for itself and will not share it. So they will take 100% of each CPU. If you have four CPUs, you can run four vDU pods that are going to be constantly using the CPUs. We need a real-time kernel for that. These CPUs cannot receive kernel interruptions, because they only want to do signal processing and they don't want to be bothered. If you interrupt the CPU for even a few microseconds, the distribution unit will drop thousands of packets, which is not acceptable. Also, they need to access the network card with a very high throughput of data, so they will access the network card directly, not passing through the kernel. For that they use DPDK, which, with some special network cards, allows them to do this kind of thing. So this kind of tuning is what turns single node OpenShift into a vDU. This is why I am focusing the presentation on this, and here the previous list of operators comes into action. The first one is the Node Tuning Operator, which allows you to create a performance profile. With a performance profile you can enable the real-time kernel, you can disable kernel interruptions for some CPUs, you can create huge pages of memory for these processes, et cetera. And maybe the most interesting part is that you do the CPU pinning, where you say we will have some reserved CPUs. The reserved CPUs are the ones that are going to be used by the operating system, by OpenShift, or by other workloads, and then we have the isolated ones. The isolated ones don't receive kernel interruptions and are going to run the vDUs.
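(As a rough illustration of what such a performance profile looks like — this is a minimal sketch, not the exact profile from the slides; the CPU ranges, hugepage count, name and node selector are placeholder values:)

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: du-performance            # illustrative name
spec:
  cpu:
    reserved: "0-3"               # CPUs kept for the OS and OpenShift
    isolated: "4-31"              # CPUs without kernel interruptions, for the vDU pods
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
      - size: "1G"
        count: 16                 # huge pages for the signal-processing workloads
  realTimeKernel:
    enabled: true                 # switch the node to the real-time kernel
  numa:
    topologyPolicy: restricted
  nodeSelector:
    node-role.kubernetes.io/master: ""   # on single node OpenShift the one node is also the master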
More tuning: then we have the PTP operator, Precision Time Protocol, because they need synchronization at the level of nanoseconds, okay? This protocol is in charge of that. So this is a server where you have a network card, and you will receive in some way a GPS signal for doing this precision time protocol. One of the daemons managed by PTP will take the GPS signal from the satellite and will synchronize the clock of the network card with it. Another daemon will take this signal and send it through the network to synchronize other cards that could be on the network, or other cards that could be in the same server. And a third daemon is going to synchronize the clock of the network card into the server hardware clock. And finally, the third operator, the SR-IOV operator. This SR-IOV technology is enabled by some network cards, and it allows you to take the physical ports of the network card and virtualize them into different virtual functions. From the point of view of the pods, they see these as real hardware. And each of these virtual functions can be used exclusively by one of the pods. So a pod has one CPU only for itself, and it also has one of these virtual functions only for itself. And remember that here we don't go through the kernel; we can skip it and go directly to the card using the DPDK technologies. Okay, more or less I'm reaching the end. So we have this single node OpenShift, and the vDU profile that I explained before; it is an OpenShift cluster with just one node that is designed to be used at the edge. Because in this distribution unit, we have to deploy hardware that is very confined, in very small spaces, with very reduced cost; power consumption is very limited, also the network connectivity, et cetera. And again, single node OpenShift is optimized in the number of CPUs and RAM that it consumes. Why? Because the fewer CPUs used by OpenShift and the operating system, the more CPUs are free to run the vDU pods. If each vDU pod can manage one radio unit, then by freeing more CPUs you can manage more radio units at the same time, and that is a lot of money for telcos. So we are also pushing to reduce even further the minimum number of CPUs that we need to run single node OpenShift. Okay, so when we deploy single node OpenShift using some of these tools, zero touch provisioning and the GitOps tools, this automatically includes what we want, which is a RAN reference profile with some pre-configuration: of course the operators, some configuration that can be generic, and some other that can be customized depending on each partner. Also, with ZTP and GitOps and the management cluster, you can manage your fleet of clusters in groups; you can do upgrades for a group of clusters, you can do a lot of different management things. What is in the vDU profile? Well, I won't go into detail, but I have a link here to the GitHub repo. It is a list of manifests that by default are always included with single node OpenShift. For example, it enables the SCTP protocol, which is pretty useful in these cases.
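(Just to give an idea of what one of these default manifests looks like, here is a minimal sketch of a MachineConfig that loads the SCTP kernel module; the exact manifest shipped with the profile may differ, and the name here is illustrative:)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: load-sctp-module          # illustrative name
  labels:
    machineconfiguration.openshift.io/role: master   # on SNO the single node carries the master role
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/modules-load.d/sctp-load.conf   # make the kernel load sctp at boot
          mode: 420
          overwrite: true
          contents:
            source: data:,sctp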
It also makes some customizations so that the server starts up faster, because in these cases, if you have to reboot when something goes wrong, you have to start up as quickly as possible; otherwise you can lose some connectivity. We also enable kernel dump, or disable, for example, CRI-O wiping the partitions with the containers every time you reboot. If we did that on single node OpenShift, each reboot would take a lot of time, which is not acceptable. That kind of customization. These are the generic ones, and then we have tools that allow you to customize and configure your operators, or whatever you have in OpenShift, with what we call policies and templates, which again I will explain in the next talk. But okay, this is one example of a performance profile policy and template that helps you to define this performance profile, where you can see that we configure — I don't know how to point — you can configure the isolated and reserved CPUs, you configure your huge pages and also enable the real-time kernel, et cetera. All of these configurations and policies and templates are managed by the hub cluster, and you can decide to which clusters you want to apply the configuration. Okay, some numbers about scaling and some performance tests, because we have talked about scaling a lot. These are some results coming from a Red Hat internal lab, where they are doing a lot of this performance testing with one hub cluster, which is basically a compact cluster with three bare metal servers. In this world they always use bare metal servers, which are pretty powerful servers with hundreds of CPUs and 500 gigabytes of memory, and this management cluster is going to try to deploy single node OpenShifts and configure the vDU profile as quickly as possible and in huge quantity. We are going to push to our Git repo 500 SNOs per hour, to be deployed over about eight hours, with the intention of reaching 3,672 single node OpenShifts deployed in those eight hours. Okay, 99.7% of the installations succeeded, which is pretty impressive, and only slightly fewer got the DU profile fully configured. Anyway, these are very good results. Here in this graph, we can see how each hour we add 500 new clusters to be installed. These lines tell us that, okay, each hour we have 500 new single node OpenShift clusters installed, and this pink one shows how many also got the vDU profile configured. So it's pretty stable in deploying the clusters and applying the profile; it's also more or less stable in how many new single node OpenShifts also contain the profile. So, just ending, some conclusions. In this presentation, I have focused on the radio unit and the distribution unit in the radio access network layer, because I think it's in some way different, due to the customization that we have to include. The distribution unit is implemented with single node OpenShift, with this vDU RAN profile, with the telco operators and extra configuration, and we can have a management cluster that can do the lifecycle management, as we will see later — not only the deployment, but also the configuration on day one and day two. The same technology on the far edge, the edge and the cloud, and, well, we need to provide a platform that is scalable and that provides replicability. Okay, so the last thing is the different levels of certification that we have. Vendor validated means, we would say, okay, we have made the vDU run in the cluster. So the vendor can say that it's validated, meaning it will work on this OpenShift.
What is new now is that we want to provide the CNF, cloud-native network function, certification. The CNF certification doesn't go into detail about whether it's working or not. What it tests is: okay, it works because the vendor says that it works, and we also certify that it follows best practices, some security aspects, lifecycle, et cetera. So we can have the platform, the management cluster, the lifecycle management, and also the certification that the CNF is going to work, and that it is going to do it following best practices. That's all for the presentation. If you have questions — I think I'm more or less on time. Yes? Yes. So I'm just curious, which PCI interface is this, because PCI interface 3, 4, I'm not sure now. Okay, yeah, the question was about which PCI interface version the SR-IOV card is using. I'm afraid I'm not sure, because there are many different cards for that. But it's true that it's something we don't usually have problems with. I think the question is about how we are doing the work of reducing the requirements for single node OpenShift, to free more hardware for the telcos. Right. Well, I'm not part of the engineering team, so I'm not sure how they are doing the magic of going, for example, from eight CPUs to four CPUs. But, for example, some of the things related to the work that we are doing: well, OpenShift is a general-purpose platform. So it contains some pieces or operators that maybe are not needed in these telco scenarios. But by default, single node OpenShift is an OpenShift cluster. So it contains operators, for example OpenShift monitoring, okay? Some of the partners don't need or don't want to have OpenShift monitoring in this scenario, maybe because they have something different that they are going to use for monitoring, or because they simply don't want to use monitoring: if something fails, you replace the server, you reinstall, and that's all. So these are the kinds of things where we are also trying to see how customizable we can make this platform in order to consume fewer resources. How do we manage... ah, redundancy, okay. Because it's single node OpenShift, there is no redundancy and there is no way of doing it. So it's something that is acceptable in a way, okay? You have one distribution unit on one antenna, okay? Well, the theory is that you don't have only one antenna in the same area. So if you lose this distribution unit, in theory, other distribution units that are close are going to compensate, so you don't lose all the connectivity for the devices in that area. So the distribution units compensate for this failure. And this gives you some time to try to fix it as quickly as possible. Also, the management cluster is important in this, because, as we said, replicability: mostly these kinds of servers are always the same, so it should be very easy and fast too. And since we are doing GitOps, we delete and we recreate. Maybe, okay, well, I have some more links, and, well, thank you everyone. Sounds okay? Okay, okay, so welcome everyone. My name is Jose Gato. I work at Red Hat as a software engineer, and in this presentation I'm going to talk about going to Zero Touch Lifecycle Management for Telco Edge Deployments. For those who attended the previous talk, you can see that this is complementary, because here we are going to see some more details about how we do the lifecycle management in the telco deployments.
This presentation is also part of the work that we do inside the Telco Integration Team, where we again help our telco partners to run their different workloads and deployments on the OpenShift platform. And in this case, more concretely, what we are going to see is how we create a whole management platform, not only for deploying, but also for doing the whole lifecycle management, and also following the Zero Touch Provisioning ideas. About the agenda, the different topics that we are going to see: I will start with some background and context about why telco needs this kind of Zero Touch Provisioning and Lifecycle Management and what we are proposing. And well, the proposal is to build a management cluster with ACM and ZTP GitOps, and we will see how we build this management cluster and the different tools that compose it. Then we are going to see the two ZTP GitOps plugins that allow us to do the deployments and configuration using Zero Touch Provisioning and GitOps. Finally, I will do a demo. So, the background: why? Because, well, our telco partners and customers have to scale, especially when we go to the far edge. The far edge is when we are closer to these radio units and antennas, and you can imagine that if you need one cluster per antenna, we are talking in numbers of hundreds or thousands. It's not acceptable to manage all these clusters individually, especially when most of them are going to be the same hardware, the same workloads, or very similar. We are using a common platform to manage all these sites, from the cloud up to the far edge, and not only to manage the deployment, but also to do the configuration, the monitoring and the different upgrades — so the whole lifecycle management. Because we are going to use a GitOps methodology to do everything, we will have a Git repository where we will store our different manifests for our infrastructure and for the configuration, and we can say that the Git repo is going to be the single source of truth for all your infrastructure. Also, with this Git repository it's easy to scale and easy to repeat things, because you can just copy some files and do a commit and push. With this idea, you do a commit, you push it, and automatically — zero touch — your infrastructure is deployed and configured. After this push, there are a lot of components, services, operators and interactions that are going to happen behind the scenes in order to orchestrate everything without your intervention. Okay, so let's see in more detail how to build this management cluster. This is more detail than we have seen in the previous presentation. Again, we have the architecture of one hub, or one management cluster, that is going to manage other clusters. The management cluster is actually an OpenShift cluster with a base of CoreOS and OpenShift, together with Red Hat Advanced Cluster Management, ACM, which is going to provide the different services, functionalities, et cetera, to do your cluster management, together with the assisted service installer, which is the newer installer for OpenShift clusters. We will also use Argo CD to do the synchronization between your Git repository, and the objects that you have there, into ACM. This Argo CD is specially configured because it contains two new ZTP GitOps plugins. And finally, the Topology Aware Lifecycle Manager, which is a new operator that allows you to do the lifecycle management, but more concretely around the configuration that you need to apply, and we will see later why we need that.
Okay, this is very quickly a screenshot of Argo CD, where we can see the deployment of one of the clusters. The information is stored in Git, Argo CD synchronizes it, and ACM, the assisted installer and other components do the installation. This is a screenshot of ACM, which is okay: it's the central platform tool that allows you to do all the management of your different clusters and their different versions. More details about how we implemented that, okay. Of course, we start with a Git repository. You can have a Git repository with whatever system you want; it can be GitHub, GitLab, or just a simple Git repository. There, in the Git repository, what we are going to do is to create our manifests. These manifests use two new templates that are provided by the ZTP GitOps plugins that we have previously installed inside Argo CD, and these two templates are called SiteConfig and PGT, PolicyGenTemplate. SiteConfig to define your infrastructure and PolicyGenTemplate to define your configurations. So the files go to Argo CD, and Argo CD takes them into ACM, where the assisted installer is living. It will take all the information and will start the installation. In parallel, the Topology Aware Lifecycle Manager will take the configuration, and it will allow you to decide how you want to apply all the different configurations. So what is the scope of zero touch provisioning? We have day zero. On day zero, we are going to define our infrastructure with one of these ZTP GitOps plugins. So you have this new template, or custom resource definition, called SiteConfig, where you define the different clusters that you want to have. Then, when the clusters are deployed, we move to day one, and some operators are going to be installed and configured, and some tuning is done to make this cluster a special cluster to later run telco workloads, or CNFs. Some of these configurations come by default in all these clusters when we are using ZTP. Some others are customizations that you will add as PolicyGenTemplates, which are also going to be applied on day one. Why? Because the point between day one and day two is important: it says that this cluster is ready for running CNF workloads. So when everything is done, this cluster is labeled as done, and on day two the different telcos can deploy their workloads. And of course, after that, you can continue doing more day two operations, with new upgrades or new configurations. And these are the two new templates that are provided by the ZTP GitOps tools — well, the plugins are installed in our Argo CD and provide you with these new templates, or custom resource definitions. One to define your infrastructure, the other to define configuration. So let's go into detail about these new plugins and the new templates. The first one is the SiteConfig, which allows you to define your infrastructure — and this is important — with a single custom resource. Because before having this plugin, you could do something similar in ACM, but in ACM, to make a deployment, you have to create a lot of different manifests, which makes things a little bit more complicated. Here, all these manifests are wrapped into a single custom resource that only contains the fields that are needed for this telco deployment. So how does it work?
Okay, you are in day zero, you have your Git repo, you define your SiteConfigs, you push the changes, these go to the Git repo and through Argo CD with these plugins, which have been implemented as custom plugins. And this SiteConfig is going to be transformed into the different resources that are really needed by the assisted installer and ACM. This is a SiteConfig example: you define the name, you define the OpenShift cluster version, the networking subnets for the cluster, the networking for the node, and maybe the disks that are going to be used for this installation. Now we go to the PolicyGenTemplate. The PolicyGenTemplate helps you to do the configuration. It's like a set of pre-configurations for concrete tasks that are very related to telco. And okay, very similar: you are in day one or day two, you have your PolicyGenTemplates that are applied through Argo CD with these plugins, and the plugin converts them into something that ACM and the TALM operator understand, which are actually policies. This is an example of a PolicyGenTemplate. The idea of these, let's say, helpers is that they help you, for example, to deploy an operator. One operator requires different manifests, a lot of lines, and here it's easier, because you don't need to know how to write those manifests. You only say that you want to install the storage operator. Maybe you also customize a little the name that you want to give to the storage class, or, for example, this other helper allows you to set which OpenShift version you want to have, and you only have to provide the desired version. And just to finish with the new custom resources, we have a third one, which is ClusterGroupUpgrade, and this is provided by the Topology Aware Lifecycle Operator. So it's not 100% true that, for a configuration, once you push it to Git, it's automatically applied to the cluster. Why? Because if we think again that we have thousands of antennas, maybe you don't want to make one change and apply it immediately to all of them. Maybe you want to do it in different groups: you will upgrade, say, three antennas at the same time. There was one question before about what happens if something goes wrong. Maybe you want to upgrade only one antenna per region, or things like that. Also, what happens if the policy fails on antenna 100? Do you want to continue upgrading the other ones, or do you prefer to stop there and see what happened? What happens if you have low bandwidth? Maybe you want to use a feature here to download everything before starting the configuration. These are different features provided by this new operator. Also, it's very easy to use, because, okay, you just decide to which clusters you want to apply a policy and how many clusters you want to upgrade at the same time. So now we will go for the demo. In the first one — I don't know how I'm doing on time; it's okay — what we are going to start with is one existing cluster in our infrastructure with three masters. So the SiteConfig already exists in our Git repository, and what we are going to do is to scale it up to add two more workers. And in order to do that, we are going to take this worker definition in the SiteConfig — the worker-zero host, the disk, the networking, et cetera — and we will add it to the existing SiteConfig to have the final cluster with five nodes.
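(For illustration, a trimmed SiteConfig sketch along those lines, including one worker entry of the kind added in the demo; the cluster name, version, addresses, BMC details and disk path are placeholders, not the values from the demo:)

apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: example-site
  namespace: example-site
spec:
  baseDomain: example.com
  clusterImageSetNameRef: openshift-4.12          # OpenShift version to install
  pullSecretRef:
    name: assisted-deployment-pull-secret
  sshPublicKey: "ssh-rsa AAAA..."
  clusters:
    - clusterName: example-cluster
      networkType: OVNKubernetes
      clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
      machineNetwork:
        - cidr: 192.168.1.0/24
      serviceNetwork:
        - 172.30.0.0/16
      nodes:
        - hostName: worker-0.example.com          # an extra worker would be added as another entry like this
          role: worker
          bmcAddress: redfish-virtualmedia://192.168.1.10/redfish/v1/Systems/1
          bmcCredentialsName:
            name: worker-0-bmc-secret
          bootMACAddress: "AA:BB:CC:DD:EE:FF"
          rootDeviceHints:
            deviceName: /dev/sda                  # disk to install on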
Let me go to my local version of the video, which I think is going to look better. Okay, okay, so here on the left side, I have my Git repository. The kustomization YAML file contains the different SiteConfigs that you have in your infrastructure. In this case, I only have one. And this one contains the information for these three masters, okay. We have three masters with their networking configuration, et cetera. What we are going to do now is to add two new workers. The definition is going to be pretty similar, okay, for both of them. And following the GitOps methodology, this SiteConfig has been modified, we make a commit, we push it, and Argo CD and the other components will do the next steps for you without intervention. Here now it will start the installation of the two new nodes. So in the bottom right here, we have only the three masters. Right up here are some objects that are provided by ACM and other operators and that are pretty important. These three here are the agents that were in charge of installing the previous masters. For the moment, we are focused here, on this resource, the BMH or BareMetalHost, which is going to do the provisioning of the two workers, which, as usual, are bare metal servers. So what it's going to do is to switch off the servers and mount virtual media with an installation ISO. The installation happens when you see here that it is provisioned. While it is provisioning, the server is booting with this live ISO — in a few seconds in the video, but some minutes in reality. The new agents will appear here as part of this installation ISO. Here they are: you have these two worker agents running on the new servers, which will start communicating with ACM and the assisted installer to get the instructions to do the installation. So when the agents appear, we can move on to watching other resources in order to see how this installation is progressing. So you only did the git commit and git push, and everything was automatically orchestrated. We continue. Well, this is just the last screenshot, because the last frame goes very fast, but okay, we can see the last frame. We can see how the cluster now has the two new workers ready while the installation reaches 100%. So, the second demo — I think I'm more or less okay on time. Okay, so now what we are doing is an upgrade of the different cluster versions. We have clusters that are running 4.12.3, another one 4.12.1, and we are going to upgrade them. The interesting thing that we are going to see here is that we are going to do it in groups, okay. We are only going to upgrade, or apply the configuration to, groups of two clusters at the same time. So in the cluster, there is already one policy that keeps the clusters on a given 4.12 version, and we change this policy in order to move the clusters to the newer version. We push this change, the plugin will create the ACM policies that are needed for doing the configuration, and then we will have to create this new ClusterGroupUpgrade resource that is going to say: okay, this policy called upgrade-4.12, I want to apply it to this list of clusters, and I want to upgrade only two clusters at the same time. So again, for the video, I will go to my local copy. Can you read the text in the video correctly? More or less, okay? It cannot be bigger. Well, I can zoom in on the desktop if needed. I think it's more or less okay, it's okay.
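(A minimal ClusterGroupUpgrade sketch along the lines of what is described here; the cluster names, policy name and namespace are placeholders, not the ones from the demo:)

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: upgrade-to-next-4-12        # illustrative name
  namespace: ztp-install            # illustrative namespace
spec:
  clusters:                         # the clusters selected for this rollout
    - sno5
    - sno6
    - sno7
    - sno8
  managedPolicies:                  # the ACM policy generated from the PolicyGenTemplate
    - upgrade-4.12
  enable: true
  remediationStrategy:
    maxConcurrency: 2               # only two clusters are upgraded at the same time
    timeout: 240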
As we have seen, we have this policy, which in this case, for the demo, says that now we have all the clusters on 4.12.10. We are going to modify this policy to say that we want to go to 4.12.11, for example. And we are going to commit and push this — I think, wait, sorry, yes. So, okay, we go to our GitOps repo, we push the changes, and everything will start working. Second, we have to tell the TALM operator to apply this configuration. As we have seen in the previous example, we take the four clusters, and we say to apply this policy in groups of two at the same time. So once we have the CGU — well, the demo effect also affects the videos, which is nice. Okay, so we change the version, we commit and we push the changes. So here we go. So here, in this part, we have the four clusters that we have selected: SNO five, six, seven, and the LHK one. We see that now all of them are on 4.12.10 and we want to upgrade them, okay? And we have said to only upgrade two clusters at the same time. Now, when the installation... oh, why... because when I paused the video, yeah, the video is running, but it does not refresh here. Okay, I don't know what happens with the video, but okay, I can pause it here, and we can see that the upgrade has started. It is moving from 4.12.10 to .11, but it's only happening on the first two clusters. If the video wants to work... I don't know why it does not start. Okay, now, in this part of the video, we can see how the first two clusters are already on 4.12.11, which is the desired status, and it has started to upgrade the other two clusters. Sorry, because I don't know why, when I pause the video, it does not continue. Okay, so, very quickly, this is the same thing from the graphical user interface, where you see that only SNO 6 and 5 are upgraded at the same time. Okay, so, well, conclusions, just finishing. We have zero touch provisioning and GitOps, which seem to work very well together for the objectives that we have with these partners. Red Hat ACM can manage all the clusters from the cloud to the far edge, and, well, it seems that it's a pretty good idea to have your infrastructure defined in a declarative way in your Git repository; it gives you scalability and replicability. So, now I have finished my presentation; we can move to questions in case you have any. Okay, the question is about whether this management cluster is operated by us or by the partner. Okay, usually, in the cases we are working on, they are running their own management cluster to manage their spoke clusters. This... okay, the question is how many clusters you can manage with one management cluster, okay. Sorry, because the answer is vague, but it depends on your hub cluster, on how big your hub cluster is. But in the previous presentation, we have seen that one hub cluster that was a compact cluster — a compact cluster on bare metal with hundreds of CPUs, okay, it's not a small one — can manage and deploy, for example, more than 3,000 single node OpenShifts. More questions? Yeah, sure, it can be whatever. Yeah, the question is whether, when we are doing the git push to our Git repository, that repository is inside the management cluster or not. In principle, the management cluster does not include a Git repository; it could, this is maybe up to you. In principle, this can be wherever you want. But by default, ACM or Argo CD don't come with a Git repository.
When you configure Argo CD to work with the ZTP GitOps tools, you have to configure the applications to say where your Git repository is, the Git repository that you want to keep in sync. But that is per application, for example, because this is Argo CD, okay, it's all applications. Yeah, but I didn't understand the question. Okay, the question is whether our customers are using Git or another tool to prepare these files. In principle, I'm not sure if Argo CD can synchronize with other applications. What we are doing is always using Git repositories. How the manifests reach the Git repository, it's true that this depends on the different partners or customers. Do you mean physically? Okay, so part of the configuration, okay, great. Part of the configuration is in the site — yeah, sorry, the question is about how you define where the managed cluster is going to be physically, and the rest of the clusters. So for the spoke clusters, it's because the SiteConfig is where you say... in this case, most of the time we are working with bare metal. So this bare metal has to already exist, of course, and you have the bare metal management interface with the IP, user and password that you define in the SiteConfig. So of course the management cluster has to have connectivity to these bare metal servers, and the SiteConfig is where you define where these are, in the sense of what the IP of this interface is on the bare metal server that does the installation. Yeah, I think these are different methodologies. But the question is how we handle the fact that this edge cluster could lose connection to the hub cluster from time to time. Okay, so what will happen there is that if there is no connectivity between the hub and the spoke, in ACM you will detect that, because there is an agent running there that is connected to it. If this happens, the spoke cluster is autonomous. So it will continue working as it is, and the workloads will keep running no matter what happens. If you don't recover this connection, you will lose the possibility of doing day two operations. That's okay. Okay, no more questions, thank you. Okay, hi everyone, good afternoon. It's a pleasure for me to be here today and present at DevConf.CZ. My name is Juarez Barbosa Jr. I work for Oracle as a senior principal developer evangelist, and today I'm here to talk about Kubernetes operators for databases and, of course, as an example of an implementation, the Oracle Database Operator. So without further ado, let's get started. Yeah, that's me, just a quick intro. If you scan this QR code, you can see my full profile and perhaps connect with me on social media if you are interested in discussing the topics that I'm going to present shortly. I have over 20 years of experience in several different IT companies, with a focus primarily on Java, which is the programming language I focus on, of course, but also several different cloud providers, DevOps, something that I'm really passionate about, cloud native and a little bit of blockchain as well. But this is not a presentation about my profile, so let's get started. Okay, so with cloud-native computing, we understand the benefits of container orchestration and container-based development, what containers bring to the table in terms of a better approach to leveraging hardware resources, and the simplification that came with the advent of microservices as well.
And there are several different things described here in this slide. This is just a kind of level-set slide, just for the sake of fairness, to make sure that people who are starting their journey into cloud native understand it. But at the end of the day: management of applications and microservices, if you compare the declarative approach to provisioning and managing resources with a non-immutable approach, or with manual implementations and ways of doing things without automation, for example. The loop that we have in terms of observability, analysis and then possible action, which happens as part of the platform and the environment; service discoverability — there are plenty of different frameworks and libraries that you can use on different platforms, Istio, KEDA and so on. State management and self-healing, all the benefits. And Kubernetes, in terms of popularity, keeps gaining traction, but it is still ramping up, right? So every day we are seeing more and more projects using Kubernetes as the de facto container orchestration platform. So it is important for us to consider that in the scope of databases as well. Challenges in terms of deployments: usually, if you are working with a cloud provider, for example, it's important for you to maybe use some extensions that are available as part of that platform, but without, I would say, an approach that will result in vendor lock-in, and to make sure that perhaps your provider is also sticking to the upstream open source components and to the collaboration with the Cloud Native Computing Foundation, under the Linux Foundation umbrella, open source and so on. There are several different alternatives in terms of platforms: Red Hat OpenShift, Rancher, Docker Swarm, Azure Kubernetes Service, and the one provided by Oracle as well, okay. Container images, just a snapshot, not such an updated one, and of course it is not an exhaustive example, but you can see, in terms of components — maybe we can focus on storage, persistence and so on — that there are several different players here apart from Oracle. So definitely, if you are running a project with microservices that involves container orchestration, Kubernetes, cloud native and so on, possibly at some point you will have to interact with a database and maybe also have it as part of your Kubernetes cluster, so that's a common concern indeed. Let's talk about Oracle's strategy for cloud-native application development — just a quick overview to give you a glimpse of what we have. The integration with our cloud platform, which is called OCI, Oracle Cloud Infrastructure, Generation 2, by the way. Many people don't know that Oracle used to have a cloud, Oracle Cloud Infrastructure Classic, but years ago Oracle actually decided to redesign everything from scratch, with the best practices and learnings from different experiences and from exchanging information in the scope of some initiatives like the CNCF, for example, and this is the cloud that we have. We have pretty much all the services you need to develop your applications, including cloud-native applications. And just to let you know: in terms of the platform, the example that I have here today includes some slides concerning the OpenShift platform as well, but we can do all of that with Oracle Cloud alone.
The OCI container-based platform then, moving now from the cloud services in terms of IaaS or perhaps PaaS to more of a focus on the services that are available there for you to implement container-based apps. We have pretty much all the bits as well, and support, in terms of Java applications for example, for the mainstream frameworks and libraries, things like Spring Boot and GraalVM — GraalVM, which is created and maintained by Oracle Labs, which is our research institute, right? We've just released a new version of it. Quite interesting. We now have something called Oracle GraalVM, which is free, with several different things that you can do in terms of using native image in the scope of cloud-native apps as well, not only to better leverage the hardware resources, but also to reduce the footprint of your application when you are using a container-based image with native compilation, for example. At the end of the day, you can reduce the attack surface of your application, because the unused classes are removed. There are several different ways to keep the things that are important for your application, and tools to support this migration in case you are using, for example, reflection and things like that, but this is something that you should have a look at. Same thing with Helidon, for example. We now have GraalVM cloud-native facilitators and accelerators for your application in case you decide to use Micronaut or Helidon, which is also a cloud-native-focused framework created and maintained by Oracle. And you can see again here, in terms of security, observability, application runtimes and so on, that Oracle has everything to support your development. These are the choices in case you want to go with Docker and Docker Compose for deployment, or Kubernetes or Rancher, you decide, or OpenShift and so on. And at the end of the day, you can use several different tools for things like infrastructure as code: Ansible, but also Terraform, or, if you want a more programmatic approach to IaC instead of a declarative one, Pulumi. There are several different ways to provision cloud services nowadays, right? You can also usually access the central hub, or proxy, that is available and provided by these cloud providers, like resource managers, and you can use SDKs for Java, GoLang, Python. But concerning these different options here, and how you can collaborate with and use these container runtimes and platforms, I want to start to focus now on the DB operator. We have an operator, which is open source, by the way, and also available on Red Hat's... yes, no, I can talk about that later. But it is open source, and it is an implementation that, perhaps if you have another database, or if you have an idea for containerizing different data stores and databases, you can also use as a blueprint. Because, as I said, the code is there for you, you can just check the implementation, perhaps improve it. You can also collaborate with us with PRs and so on. Okay? Our database, the operator, and the simple mission here, with this focus on containers, as I explained: it is really important for us to consider the use of such databases in the scope of cloud native and Kubernetes. And as for this database operator — at the end of the day, you can do things with Helm charts, for example, right?
But it is not so natural or so straightforward when you are talking about stateful workloads, for example. Okay, so usually, with the operator and the way we provision things, you can, for example, attach to an existing cloud-provisioned Oracle Autonomous Database, and then use Kubernetes to perform pretty much all the most important operations with that database, and that's primarily what our operator does. In terms of the cloud and the database versions: with Oracle we have single instance, as I said, the Autonomous Database, sharded DB, Real Application Clusters. If you are aware of that, or if you've participated in a project that involved the Oracle database, I'm sure you can easily understand what I'm talking about; otherwise, my deck has all the proper resources, and possibly the event organizers will make it available for you. Full support for Kubernetes in terms of the database, as I said, right? We have, for example, the Oracle Container Registry as well, as part of this cloud platform, where you can host your images, custom images, or you can leverage the existing ones, right? And combine that with other different components that we have — just a couple of examples. Here we have MicroTx, which is a kind of framework for microservices where you can easily compose and use distributed transactions, the saga pattern, and all those things, okay? And there is now a new offering as part of our marketplace as well, called Spring Boot Backend. So it is not only about having the database clusterized and as part of a Kubernetes deployment; you can also use other different accelerators, okay? Let's now deep dive into the OraOperator, or the Oracle Database Operator for Kubernetes, okay? Again, this is just a level set — just a refresher of the main Kubernetes-related components — and we have the so-called operators, right, as part of this architecture. And this is the basic blueprint for how you can customize and create your custom operator, and the approach that we've taken when implementing this operator. No tricks here, just the usual blueprint, right? But this is just for illustration purposes indeed. The thing is that, again, as I said, the challenge, if you don't use an operator for a database, for example, is managing the state and the complexity involved in this implementation. So it is good when you have, for example, Oracle — as the creator and maintainer of the Oracle database and all the different open source libraries and SDKs and everything around it — also providing the Kubernetes operator for you. Okay, so you can avoid headaches, you can just focus on what you have to do, and that's it. The Kubernetes operator for databases deals, at the end of the day, with stateful applications — these things, as I said, replica sets and unique state — and also with the lifecycle of your database. With the examples and the scripts that I have here, for example, if you want to apply a patch to your Oracle database, you have to make sure that the database is in a given state. It has to be available, for example.
So it is somehow a frequent and common mistake to think: okay, I can do that quickly. But when you start to implement an operator for a situation like this one, where you have to work with databases, for example, more and more requirements will appear. So it is not a matter of overlooking it; it's just that when you start to explore and really, in practical terms, implement this stuff, there are many real challenges, and that's what this slide is about, somehow. Also, here is just a high-level overview of this architecture. We can use the common tool called kubectl, of course, and we have the DB operator as part of the Kubernetes cluster, and then the different databases that you can orchestrate and use: containerized Oracle DBs, or what we call the base DB, a simple database instance. The Autonomous Database — when we say autonomous, perhaps you don't know the Oracle Autonomous DB; there are two flavors of it, the Autonomous Transaction Processing, or OLTP, database and the DW version of it for data warehousing — but it is autonomous because, for some tasks behind the scenes, there are AI-based agents running for you to do things that usually a DBA has to do, like patching, security included, okay? Because if you go open source, that's easy, but there's a cost, right? If you overlook maybe applying a patch, that's what attackers will exploit in your architecture, because open source is free, but it is not easy to manage. The governance of open source is something that you have to take into consideration, and that's why we have companies selling what we call professional open source services, right? Things like training, consulting, support, and so on. They give you the open source component, but usually the component is not optimized, so you have to go and work with them to maybe have a better implementation. And that's what you can get out of the box with the Autonomous Database: patching, performance tuning as well, indexes are created for you. So the AI-based agents are running continuously behind the scenes to take care of your database instance and the deployments. And the last one here: we have on-prem DBs as well, so depending on the scenario, you can also use the Kubernetes operator for that. Oh, I forgot to mention the lifecycle operations — at the end of the day, that's the main and ultimate goal of this operator and its benefits. You can just provision the database, or you can bind to an existing one, as I said, that you provisioned regardless of which tools you used — IaC or the SDK, as I said, or the Oracle Cloud Infrastructure portal, you decide. And you can stop, start, terminate the database, perform backups, restore, patch, upgrade, also scale, and do a little bit of the security-related tasks as well, because Oracle has what we call the wallet, which implements MTLS, mutual TLS, so it's another layer of security that can be managed with the Kubernetes operator as well. The examples that we have here, you can see at the bottom: kubectl apply -f oracle-database-operator, so, you know, no tricks here. As I said, we are not introducing new tools or extensions; you just use the plain and usual ones, and that's it. Okay, why a Kubernetes operator for the database, right?
As I said, we want to be as comprehensive as possible and work and collaborate with all the different tools and, possibly, all the cloud providers that we can, not only the Oracle Cloud, right? That's why the operator is open source; if you decide to, you can also extend it. We want to make it so that you can go and provision the Oracle database — I have an example here where I provisioned the Autonomous Database with Terraform and IaC — but then I can use the operator just to bind to it and then use it from kubectl and so on. Of course, again, address the pain points of using stateful Kubernetes applications and all the problems that might arise from that, that perhaps you are overlooking, and extend that to other environments — not only the cloud-focused ones, but also as an approach to hybrid cloud, on-prem systems and data centers and the public cloud as well, or multi-cloud. Because some people are frequently confused and think that hybrid cloud and multi-cloud are the same thing, but they are not, even in the literature, if you check. But we have all the bits to support all the scenarios here. Why should you care? Of course, I don't have to overemphasize that here, because at the end of the day, that's what this event is about: discussing all the nice things concerning platform engineering, DevOps, cloud native, programming and so on. Okay, so DevOps, GitOps, CI/CD pipelines — that's the world we are living in at the moment, automation, of course. We do need that, right? There's no way to reach the proper scale without looking at these things. Those are not things that you can just include in your list of requirements as nice-to-haves; you have to work with them. Okay, the developer preview: just a glimpse of the features here that you can explore later for the different databases as well, okay? Depending on the flavor that you want to work with, there are some operations and features that you can leverage. The production version is about to be released soon, okay? We have a team working on that and a couple of PMs dedicated to this effort. And the roadmap — I just want to highlight here DB 23c, which is our upcoming release for the enterprise database, but at the moment we have what we call the 23c Free Developer Release, a free database that you can download and install, and it is the same Oracle database, our RDBMS, same engine, same features and everything. You can run it for your own side projects or maybe for learning and so on. It is there for you. I have a slide about it here in case you want to have a look later. Yeah, talking about the OpenShift platform then: Oracle collaborated with the partner, and the operator is also exposed as part of the Red Hat Ecosystem Catalog, okay? So all the features are there for you as well, in case you want to go to another level and really abstract the complexity of all that — that's a possible path. You just look for the operator, you'll find it, okay? Just some screenshots here of how you can install it, and all the details. But of course you can do it with the Oracle Cloud as well. Why? Because we have this database operator add-on as a tile included in our cloud portal and console. All right?
Yeah, the demo steps here, just as an example concerning the lifecycle and, as I said, the main benefits of using it: the typical scenarios, or use cases if we can call them that. Binding to an existing database is the one I have here, but in my deck I have some backup slides covering all the remaining scenarios; it's difficult to pack everything into one talk. There is also provisioning an ADB, scaling a database up, stopping, terminating a database and so on.

So just to give you a glimpse: you log into the console and you click Autonomous Transaction Processing, that's the Autonomous Database, ATP. You select the database, then you copy what we call the OCID, which is a kind of unique identifier for the cloud services that belong to the Oracle Cloud Infrastructure environment. With that, you just prepare your YAML file, you provide the OCID of your target database, and then you run kubectl apply -f with that specific YAML file, and that's it, it will be bound. So you can now move forward and use the other remaining sample files to work with this database. It is as easy as that, no catches here. Same thing for scaling up, same steps: you just increase the OCPU count, in the example here we are going from one to two, apply again, and it scales. There's a way to query it, to check and make sure that it is available. In some examples here I'm using environment variables, but of course, when we are talking about credentials and so on, just use the vault of your choice and you are good to go. Stopping a database, same thing: depending on the current state, you can just modify it, and you can check whether it is available, terminated and so on. And in order to really terminate it, there is a specific configuration property you can use, hardLink, and that's it. Terminate means I don't need this database instance anymore, so I want to terminate it and remove the underlying resources, and it will be gone.

In terms of observability, there are the exporters, a metrics exporter and a log exporter, so you can leverage Prometheus and Grafana as well to get metrics and observe your database as usual; that's a common requirement too. And depending on the specific database version you are using, there's something called Enterprise Manager Database Express, and you can use that console. It is basic observability, but the charts are there for you and you can watch them live.

Yeah, before I conclude, let me just shift gears and show you: this is the database that I have on Oracle Cloud, an autonomous database. You can see the status I talked about here. When you access the DB instance, you just have to copy the OCID; for example, I can just grab it here. This is a typical OCID, as I said, a kind of unique identifier, and that's the only thing you need to work with the operator. So for the bind operation, or action as you can call it, you just include the OCID here and you are good to go: kubectl apply -f and this file here, autonomousdatabase_bind, and that's it. Same thing for the other operations; you can see them here: backup, delete resource, rename a database, restore, scale.
One of those samples is the scale with the OCPU count that I talked about, then stop, start and terminate the database, update the admin password, mTLS, which is related to mutual TLS as I mentioned when I started my talk and to the wallet that you can use, and update network access, in terms of whitelisting and creating ACLs, access control lists (there is a minimal sketch of the bind and scale manifests at the end of this section). And the specific wallet that I'm talking about: I happen to have a wallet here, so what is this wallet? When you create a database instance, you go to the database connection here and you have two choices: you can download this wallet, or you can use the connection strings in case you decide not to use mTLS, mutual TLS, with the wallet, which is better, but just the usual credentials, username, password and the connection URL to the database. When you download this wallet, for example I have one here, it is a zip file; you unzip it and you can see the usual Oracle configuration files, also a Java keystore, some certificates and so on.

Yeah, and about provisioning the database as well, since we are talking about automation with Kubernetes first, let's just finish this one here. These are, at the end of the day, actions, things that you have to do with your database instance. You can use a DevOps platform for that; you decide which one: OCI DevOps, Azure DevOps, or maybe GitHub Actions, which I love, because an action is exactly what you want to do: I want to bind, or stop, or start the database and so on. Actions are commands; if you like design patterns, we can go for the command pattern. So you can just create simple actions like these and run them as you need. Same thing to provision the database. You can use the portal, of course, but in my example I do it with Terraform, so I just created a Terraform file with some variables: basically the compartment OCID (a compartment is a specific container, let's put it that way, that you can use to aggregate your cloud services; if you are used to other cloud providers like Azure, this is the same idea as an Azure resource group), then the database name and your desired password. Usually you receive a notification when the database instance is completed. The outputs are the connection string configuration, and in the specification for the database I chose OLTP, Autonomous Transaction Processing; that's what I have. And as I said, I can run that as a GitHub action if I want, so I can just go and provision the database, and I can combine it with any DevOps solution I want.

Last thing, if you want to explore this: under Oracle's handle on GitHub you'll find the Oracle Database Operator. The scenario that I explained is there for you, with plenty of samples for all the different databases, plus the link to the Oracle Database Operator on the Red Hat Ecosystem Catalog. And just to finalize, my deck has all the references, including the links. Talking about the Oracle ACE program, this is a program Oracle has for distinguished community members, so if you are doing things related to Java, open source or the Oracle Database, feel free to approach me on social media and we can collaborate; I can nominate you, and there are several benefits. LiveLabs: we have some free workshops available for you, and everything is there, Python, Golang, Java, Database, Cloud, you know.
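Coming back to the bind and scale steps from the demo, here is a minimal sketch of what such a manifest could look like. The field names below (the apiVersion, the details block, autonomousDatabaseOCID, cpuCoreCount, hardLink) follow my reading of the operator's samples and are assumptions for illustration, not a copy of the exact files shown on the slides; check the samples in the Oracle Database Operator repository for the authoritative spelling.

```yaml
# adb_bind.yaml: bind the operator to an existing Autonomous Database by OCID
# (field names are illustrative assumptions; see the operator samples)
apiVersion: database.oracle.com/v1alpha1
kind: AutonomousDatabase
metadata:
  name: adb-demo
spec:
  hardLink: false   # true would terminate the cloud DB when this resource is deleted
  details:
    # the OCID copied from the OCI console, as in the demo
    autonomousDatabaseOCID: ocid1.autonomousdatabase.oc1..aaaa...
    # for the scale-up scenario, bump the OCPU count (for example 1 -> 2) and re-apply
    cpuCoreCount: 2
```

Applying it is the usual kubectl apply -f adb_bind.yaml; re-applying the same resource with a higher cpuCoreCount is, as far as I understand the demo, how the scale-up scenario is driven.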
Oracle is running a campaign at the moment with free training and free vouchers for the Oracle Cloud Infrastructure certification. And the 23c database that I talked about, the Free Developer Release: please make sure to have a look at it, because everything you get from the Autonomous Database running as a cloud instance you can test with this local one as well. If you want, Oracle Cloud also has what we call Always Free services, so just go there and create an account. Thank you very much, and I can take some questions now if we have any.

Yeah, you're asking about the support for on-prem databases, right? There is a way to work with that, depending on what you are using for the on-prem environment, or if you are going for what we can call the do-it-yourself approach to Kubernetes, with the open source components and so on. If that's part of your cluster, there are ways the operator can target it as well, but this is a feature that is not released yet as part of the developer preview; I mentioned the production release, so I don't have all the details about it, but if you want, I can connect you with the product managers who are responsible for this feature, and we can talk about that offline. Any other questions? No. Okay, once again, thank you very much for the opportunity to introduce you to the Oracle operator for Kubernetes, just to create awareness about it, and hopefully you will have a look at it and perhaps provide us meaningful feedback as well. Thanks again. Thank you.

Hello. Awesome. Hi, everyone. Good to see you all, and you're still here till four o'clock, which is I think a great achievement, right? Listening since the morning. So yeah, hopefully I won't bore you with my talk, and I will try to make it more lively with the demo so that you have some interaction, and feel free to ask questions. A little bit about me: I am Sinny, I work on the Machine Config Operator project, which is part of OpenShift, and I'm at Red Hat. Today I will be talking about OpenShift OS customization as a bootable container. You might be wondering what exactly this is, but you have probably guessed a little bit what the OpenShift OS part means. So here we'll be talking about what exactly OpenShift 4 runs on, right? You need an OS, so how many of you know what OS we run on OpenShift 4? Awesome, can you tell? Yes, CoreOS, and it is formally called RHEL CoreOS, because it is derived from RHEL, and CoreOS because it is based on the CoreOS technology, which internally uses rpm-ostree and OSTree. We have upstream versions of this operating system, such as Fedora Silverblue and Fedora CoreOS, and probably many more in the future, Kinoite as well, which is the KDE edition. For this talk our interest is mostly RHEL CoreOS, which runs as part of OpenShift 4, and it is image based, because it is built on the rpm-ostree and OSTree technology. When we made OpenShift 4 it was opinionated: we didn't want people to go and SSH into machines and configure changes there. We wanted to make it secure, so that everything can be updated automatically, including the operating system itself, and that's why SSH-ing in and making manual changes was discouraged, and that's where the MCO comes into the picture.
The MCO is a core operator that runs as part of OpenShift, and it is where you put all the configuration changes that you want to make on the OS; it also takes care of your RHEL CoreOS updates automatically. A little bit more about the MCO: it is a core operator, as we already said, and it helps in performing the OS update. When you click upgrade in the console, or when you say oc adm upgrade, everything upgrades, and along with that the MCO takes care of upgrading the operating system itself from whatever version it was before to the next version, plus any other changes that should go to the OS. And where exactly does it get the OS content from? OpenShift ships everything as container images, and RHEL CoreOS as well is shipped in the form of a container image. There are also some extensions which are not part of the CoreOS base itself but which you can install, and they all come together in the release payload that is part of the OpenShift release for any particular version.

Okay, so why are we having this talk? This talk is about a limitation. With that design everything works well: you can configure your machine with the changes you want on a node, and you get the RHEL CoreOS update and everything working well. It works well for most people, but for some it doesn't. What if I want a custom agent or some third-party software which is not part of the base CoreOS; how exactly can I get it? That's not something we support today, it is very difficult, and that's why some enablement simply cannot be done, because it is not available yet. The next case is additional RHEL packages. RHEL has a lot of packages, but not everything is part of your core operating system, for example USBGuard and others; they are not part of the base operating system. If you want that on a node, the idea is definitely to containerize things, build a container and use it, but not everything can be containerized, and that's why sometimes people want it installed directly in the base OS. And it is a long process if you really want an additional RHEL package in there. For example, we have USBGuard, and that's where extensions come into the picture; to get there we had a lot of conversations about whether it should be part of the base OS, because every additional package increases the size of the OS. So we have to be mindful and careful, and we have to test it, support it, everything, so it is a long process, and it is not guaranteed to happen; the use case has to be very reasonable. The third one is performing hotfixes. For example, there is a new kernel with security fixes and you want to apply it on your cluster; you cannot really get it unless Red Hat ships a new update, there is a new release, and you go through that, and only then will the update happen. So everything takes time in this situation; these things do happen, but they take a bit of time. So these are the things that basically led to what we wanted to do, which is called CoreOS layering. But first, since everything the MCO does is based on MachineConfigs, you need to know what a MachineConfig is, so here is a minimal example before we move on.
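This is a sketch assuming the usual pattern of writing a file onto the worker nodes through an Ignition snippet; the file path and contents are made up for illustration.

```yaml
# A basic MachineConfig: the MCO rolls this out to every node in the worker pool.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-example-file
  labels:
    machineconfiguration.openshift.io/role: worker   # which pool it applies to
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/example.conf          # illustrative path
          mode: 420                        # 0644
          contents:
            source: data:,managed%20by%20the%20MCO
```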
So, CoreOS layering: you have probably heard about it in other talks at DevConf before as well. Here we are focusing more on OCP CoreOS layering, that is, how we leverage CoreOS layering in OpenShift. It is based on the layering technology where the whole RHCOS root filesystem lives in a standard OCI container image. You might be wondering what exactly a standard OCI container image means here; earlier I was saying that we already ship CoreOS as a container image, but there is a difference between standardizing it as a proper OCI container image and what we used to ship before, and we'll get to that a little later. Basically, by using this OCI standard we can use any standard tooling for managing the image, and that's where the bootable container comes into the picture. With this OCI container image that we now ship, you can use any container technology tools available to build on it and layer anything on top of it. And it works as the delivery and transport mechanism for updates: we point rpm-ostree at the new image that we ship, it automatically takes care of everything, and your system is updated to the latest version of whatever is in the payload. This also takes care of things like ImageContentSourcePolicy; in OpenShift some people use mirroring, so they have all of that defined in the registries configuration, and all of those things are honored, so everything is factored into this new format too.

So here is the comparison between how the OS image content was before and now. Before, and by before I mean before 4.12, since we started OpenShift 4 and up until 4.12 this is how it was, and in 4.12 we implemented CoreOS layering in OpenShift, the image that we shipped in the release payload for the OS used to be called machine-os-content. You cannot really run it: I can show you machine-os-content, this is the image that we have for 4.13, and if I say podman run, I'm not sure if it is the same one, so I will just cancel it; yeah, it is not really executable, you cannot run it or do any additional activity with it, it is not much fun. The way we used to do the OS update before was to extract that image; you can see all the contents of that image, and there is srv/repo, which is where the OSTree contents are, and we used to run rpm-ostree rebase, which took the information from there and updated the node. So it was not really a nice, fully integrated update system. With the new format, this is rhel-coreos, available from 4.12 onward, and here you will see it is very native to an OCI container image: we actually have a kernel here in /usr/lib/modules, we can check it later, and you can do lots of things with it. So how exactly does it work? Since this is a standard OCI container image, you can create a Containerfile, or a Dockerfile, however you want to call it, and you can layer additional packages or additional files or anything you want on top; there is no real limitation here. For example, here is a use case for us, the hotfix I was talking about: suppose a
new security update has come out, and a new build is not available yet, or the fix is still in CentOS Stream, which comes before it lands in RHEL; you can try out and test those. For example, here I have the FROM line, and this is the SHA of the base image, and this command is what we need to use for a kernel override. Usually we don't need that, we just need to do rpm-ostree install, but here we override because we want to apply the hotfix from CentOS Stream. So it is a simple Containerfile, nothing fancy, which is good; we don't need fancy everywhere.

For the demo I will just show what we were looking at here. I have a cluster running on GCP, and I will show how exactly we apply those changes in the layering model. Let me get the cluster version; so I have a 4.13 cluster. And I am fetching this command, oc adm release info, can you see it okay? This, with the image name, gives me the image URL that I need to use as the base, basically where your base image starts from. I am not typing it here because that would definitely take time, so I have everything ready. So this is the FROM, I just copied it from there; you can see I am using this as the base image, the same one we saw earlier in the presentation. Then I run rpm-ostree override replace, that is for the kernel override, and I am also layering another package here which comes from CentOS Stream, and that is iotop. So we are doing two operations here: an override hotfix of an existing base OS component, and layering a new package. You can do more things, but for the demo I am just doing this. And then, using regular container tooling, you basically build an image. I will do a podman build, and it won't take long because I am using the cached version I have locally, and once you have built it, you need to push it to a registry; it can be anywhere you want, and for me it is my personal quay.io account. So remember, this is where we push this container.

So what do we do now? Right now I will just go and apply this to my cluster, but in the real use case there is a lot you can do in between, like CI/CD, running some tests and all of that, and only after that you go and apply it to the cluster; there is a lot of potential here to do testing based on your use case. And now, how exactly do you apply it to the cluster? You basically create a MachineConfig, and this is how it looks: this is the CRD for the machine configuration, and here we are overriding the OS image URL (there is a consolidated sketch of this a bit further below). This is the URL, the URL with the SHA. We don't take the tag, basically because a tag can change and we don't want to deviate, and that's why we use the SHA digest instead of the tag. You can get the digest by just saying skopeo inspect; and what was the image name? Sorry, this was the wrong one, just a second; so this was the base image, and I will say skopeo inspect... is there something missing? Oh yes, absolutely, thank you, we need to define this. It will take some time because of the internet, and then we will have the SHA information here.
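Just to make the Containerfile from the demo concrete, here is a minimal sketch of the pattern described above; the base image digest, the CentOS Stream RPM URLs and the exact kernel version are placeholders of mine, not the ones from the demo.

```dockerfile
# Containerfile (sketch): base image digest, RPM URLs and version are placeholders.
# The base image reference comes from: oc adm release info --image-for=rhel-coreos
FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<rhel-coreos-digest>

# Hotfix: replace the shipped kernel with a newer CentOS Stream build,
# then layer an extra package (iotop) on top of the base OS.
RUN rpm-ostree override replace \
      https://example.com/kernel-core-5.14.0-325.el9.x86_64.rpm \
      https://example.com/kernel-modules-5.14.0-325.el9.x86_64.rpm && \
    rpm-ostree install iotop && \
    ostree container commit
```

Building and pushing then uses the ordinary tooling, for example podman build -t quay.io/<you>/rhel-coreos-custom . followed by podman push.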
Anyway, I was actually looking in the wrong place; I should look at the quay.io image, not the other way around. skopeo inspect... yes. So we don't need to inspect the original image, we need to inspect the image that we created; we need the SHA for that, and this was the format. So this is how we generate the image URL. And now I will apply it to the cluster, and to apply it you just say oc create. Probably you know this already, I am not sure, but this is how it gets applied: it goes through the MCO, the Machine Config Operator, and that's where you monitor the change, and a new config gets rendered. So this is the osImageURL override, the generation went up again, and this is the new rendered config. I applied it just for worker, as you can see, so it applies to the nodes that are in the worker pool; it won't apply to other pools, it is pool specific. And one interesting thing, sorry, yes: here the node shows SchedulingDisabled, because when we do this we cordon the node, the drain operation starts, and then we apply the changes. The other interesting thing I wanted to show: oc get node -o wide is a bit awkward because I have to increase the font size, but the interesting part is the kernel version. So we have kernel version 5.14.0-284, and we just created a new image with the CentOS Stream kernel; let's check the Containerfile, and that was 5.14.0-325. So once the node is updated, we should be seeing the -325 version. It will take some time. Meanwhile, let me check: oc get pods -o wide shows me all the pods running in the MCO namespace, and I am interested in this one, because that is where the update is happening. In the MCO, by default, the change is applied to one node at a time, for safety. So let's give it some time, and meanwhile we will go and inspect the images we have. cat Containerfile, this is the image, and I will run podman run -it --rm on the base image and check rpm -qa for kernel and iotop. So yeah, it is 5.14.0-284, and there is no iotop installed, because this is the base version; you can see there is no iotop in my base image, which is what comes from the base RHCOS. And now I will check the custom one: podman run -it --rm, rpm -qa, kernel, iotop. You can see here that we have the latest kernel, which we got from CentOS Stream and built into the image, and the iotop package. So we can basically go inside the container that we created from this standard OCI container image that we have now, and that's how it works. And let me see... yes, you can see this node now actually has 5.14.0-325. So the custom image that we created got applied to this node, and similarly it will happen on the rest of the nodes. So this was our demo.

So where are we exactly with this whole new layering model? As we said earlier, in OpenShift 4.12 we got this new image format for RHCOS, and the MCO knows how to understand it and apply it. In 4.12 we support the hotfix model, where you can apply a hotfix obtained through the customer portal and all of that. And off-cluster build went GA in 4.13. What is off-cluster build? I will tell you here.
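Before that, here is the consolidated sketch of the apply step just demoed; the registry path and the MachineConfig name are placeholders of mine, and the digest is used instead of a tag for the reason given above.

```bash
# Digest of the custom image we pushed (tags can move, digests cannot)
DIGEST=$(skopeo inspect docker://quay.io/<you>/rhel-coreos-custom:latest | jq -r '.Digest')

# MachineConfig that points the worker pool at the layered image
cat <<EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-custom-os
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  osImageURL: quay.io/<you>/rhel-coreos-custom@${DIGEST}
EOF

# Watch the worker pool roll the change out, one node at a time
oc get machineconfigpool worker
oc get nodes -o wide   # the kernel version changes once a node reboots into the new image
```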
Off-cluster build is basically what we just saw: you create a Containerfile with the changes you want, and you take control of the OS update for the whole cluster. Whenever a new OpenShift update happens, the OS will not be updated automatically anymore; it gets updated when the cluster admin does it, by creating the new OS image with the new changes. So the MCO will not make any OS update while there is an override in place, and you can go back by deleting the MachineConfig that we created; things then reset to the default mode, and from there on the MCO takes care of applying the OS updates again (there is a small sketch of that revert at the end of this section). The things to remember: with this new model of off-cluster build, using the bootable container for OS updates, MachineConfig is not deprecated; the way it works, it will continue working. With off-cluster build, the admin takes control and is responsible for doing the OS update themselves, by creating their own image whenever a new update comes out or whenever they want to do it. And for accessing RHEL packages: when you pull in a package that is part of RHEL, you need to do the build on an entitled host so that you can fetch the content that comes from RHEL. The RT kernel and the extensions are not yet supported, because they conflict with the way applying the RT kernel or extensions is currently designed in OpenShift, so we need to work on that. So what is next? In-cluster build. Off-cluster build is there, but not everyone wants to take control of building the image themselves and maintaining the OS update, so we are working on in-cluster build, where the user can basically define what they want, and the MCO will create a new build, push it to the registry and apply it, everything in a batteries-included kind of model, as we say. Also better integration with the console, so that there is a good user experience; right now you have to do it all by yourself, and hopefully with the console integration it will be a bit better.
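And as a small sketch of the revert mentioned above, reusing the hypothetical MachineConfig name from the earlier sketch:

```bash
# Deleting the override makes the MCO re-render the worker pool configuration
# and roll the nodes back to the OS image shipped in the release payload.
oc delete machineconfig 99-worker-custom-os
oc get machineconfigpool worker   # watch the rollback progress
```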
And feedback: right now this is very new, it came in 4.12 and we are now at 4.13, so we are looking for more feedback once users have tried it out, to see how it feels and where exactly we need to improve. Also, yes, the RT kernel that is missing: we want to make that work in a supported way, because there are a lot of people who actually want to use the RT kernel, so this is something we would like to support and give a better path forward for. So this was my talk, and these are some of the resources that we used; some of the upstream links are also there, which you can try out even if you don't have an OCP cluster. And thanks and a shout-out to Colin Walters, because we were going to give this talk together but he couldn't come, so thanks to Colin as well for helping with this presentation. Thank you all, and if you have any questions, feel free to ask.

Yes, the question is whether you can use any other commands, ADD or anything else that is supported by the Containerfile, right? Yes, you can do that. The next question is: if I have multiple MachineConfigs with an osImageURL override, will they conflict, or what will happen? The way the MCO works, there is a particular ordering, so if you have multiple MachineConfigs, one of them will supersede the others; the latest will be taken, with alphanumeric ordering similar to the way systemd handles name ordering. During the rendering process it will pick the latest one, latest in the sense of the name of the MachineConfig, so the name matters; whichever name sorts last is the one that gets picked. No problem. Any more questions? Sorry, I couldn't hear the question, can someone repeat it? How does OSTree handle the container image update, that's the question. That's a very good question, and I think I am not the right person to answer it, because the CoreOS team works on that, so it would be a good question for some of the people who worked on it. Internally it is a new way of using it, and there are definitely technical details there that I am not aware of. The extraction I showed was how we did it before 4.12; after 4.12, with the native container image, rpm-ostree mostly handles it itself: during an update we just do an rpm-ostree rebase to the URL of the image in the registry, and OSTree behind the scenes takes care of doing the update. How do you provide a new custom base OS image, that's the question. The way we handle it, you basically use the oc adm release info command. Here I am using the existing cluster, but there is also a way to provide the actual URL of the release payload that the whole OpenShift update comes from, and I can just look at the rhel-coreos entry, so you have all the image information there for a particular release. So suppose you want to go from 4.12 to 4.13: you use the release payload that is available for 4.13, give it to oc adm, and you get the URL for the OS, and you use that. Does this answer the question? We have three more minutes. Yeah, that's a very good question; we did hit some issues, but yes, we do handle it. It happens because during scale-up, the clusters that were born on 4.1 or 4.2 are very old, and we still do not update the boot
image, so they boot up from the 4.1 boot image and then we update them; that's how it works. And there is a workaround involved, because those old images don't have the new rpm-ostree capabilities, so there is an extra systemd unit invocation that we do, and with that we apply the update to the new OS image. So it works, but there is a workaround in there; all the code is in the MCO, so I can point you to it later if you are more interested in how it works. Any more questions? Since there are none, thank you all for coming.

Yes, testing... yeah, so hello everyone, thanks for joining the talk. My name is Andre, I am a PhD student at Université Grenoble Alpes, also in a partnership with Ryax Technologies, and today I am going to talk about scheduling policies for a serverless-based edge-cloud continuum. More specifically, I will start with our container-layer-aware scheduling policy and then some new progress that we have made with a few other objectives. This project is funded by the MII project and by the PHYSICS project, a European project. Talking about PHYSICS, its goal is to provide a visual programming environment to create serverless workflows, and we had two other talks here at DevConf: one was a workshop yesterday, and tomorrow Yanis is going to present the other one, so if you are interested, please join it.

To start, I will define a few concepts. The edge-cloud continuum, for us, is an infrastructure composed of several different layers: the cloud clusters, the edge clusters, and the edge resources. We have the global continuum layer, where we have several different clusters, and at the local level we have the edge clusters, or fog, and the other edge resources. The idea is that we go from this big view of the clusters down to the local ones: on the cloud clusters we have the big machines with more powerful resources, and as we move toward the edge we get fewer resources and more mobile machines. To define serverless, I would like to walk through the transition from the cloud to serverless. With cloud platforms in general, we have this generic scenario where developers need to develop big applications, deal with all the settings of the platform, and then put their applications there to run, and the application stays there, let's say forever; it just keeps running, and the user pays by machine hosting time, so as long as it is there, they pay for it. When we go to serverless, there are two main points. The first one is that we split this view of the scenario in half, and now the final user, who can be a data scientist, a developer and so on, sees just one part of it: the developer just needs to deal with functions, which are much smaller pieces of code instead of whole applications, and with just a few settings, let's say the amount of CPU and the amount of memory. The second main point is that the other part is fully managed by the platform provider. The platform provider needs to offer and deliver everything about the platform, so scalability, provisioning of machines and so on; the user just deploys their functions and the platform provides everything that is needed to run those functions, and in the end the final user is just charged by function execution
time, and these functions do not stay there as before; now we are talking about stateless functions, functions that are triggered by events, so they are executed just when needed: a function is triggered, it is deployed on the platform, and then it is executed. And with the scheduling policies here we want to work on a few objectives, which can be cost, energy or time. Today I am going to talk a bit about these three objectives. In the first part of this presentation I am going to show a few results from the paper that we just published at CCGrid in May of this year; if you are interested, here is the QR code for our repository with all the reproducible artifacts, so the paper is completely reproducible, please check it out. And then, at the end, I will present a new step, our recent progress.

As I was saying about serverless computing, here are the main points that we focused on when working on this first step. First, when we talk about serverless, for us we talk a lot about containers: we are talking about functions that are deployed and executed inside containers. These container deployments are not negligible; through our studies and a few other papers we understood that the time to deploy a container sometimes takes longer than the function execution itself, because in serverless we are talking about fast functions, on the order of minutes, and sometimes the container can take longer than that if we are downloading it from afar, let's say from the cloud. But we can share the layers of the containers, so in this first project the idea is to share the container layers even when we are not talking about the same containers. Here we have the infrastructure of the edge-cloud continuum that I presented, and our motivation: we want to reduce the amount of data downloaded to deploy the containers, and also the amount of data transferred to upload and download the functions' inputs and outputs. In this scenario, we can say that functions are triggered from the edge and are deployed here at the local level, but the containers come from the cloud, so we want to reduce the amount of data moved in both directions. And for the containers, we want to reduce the total amount downloaded. How do we do that? We proposed a function orchestration algorithm, we called it FOA, based on a linear program that optimizes the amount of data downloaded while respecting a makespan constraint. The idea of this linear program is that we optimize the entire placement of the functions on the platform subject to a makespan constraint, and if we are not happy with the output of the linear program in terms of makespan, then we run this loop here, going from the output of the second step back to the linear program and tightening our makespan restriction. So let's say we optimized the whole container placement, but the solution it outputs would take hours to execute; then we take this output and say, no, I want this makespan cut in half, and we try to compute a new solution, and we go on and on until we are satisfied with the solution. Then we arrive at step 3, where we optimize the download of the layers, and then we have the final schedule as output. A few references: the linear program and the minimum-cost assignment are based on a dual
approximation algorithm by Shmoys and Tardos; that is the paper here. As for our experimental protocol, we are doing it through simulations. To do that, we first adapted functions from a benchmark called FunctionBench, we deployed a serverless platform called OpenWhisk on top of an academic cluster called Grid'5000 in France, and with that we measured and calibrated the results: we ran the benchmark for a while, calibrated how long each function takes, and then we could build the workloads for our simulations. Why? Because we are running a linear program, so it is an offline approach and we need to know all the information in advance. To evaluate our scheduling policies we have a simulated environment on top of Batsim and SimGrid, our combination with a batch scheduling simulator, and as I mentioned, everything is open source and available. In our design of experiments we varied the workload sizes, the platform sizes and the heterogeneity level; we also investigate the heterogeneity of the platforms by changing the CPU speed. Right now we are talking about light levels of heterogeneity, obtained by changing the CPU speed of the machines, so when I say that we have different levels of heterogeneity, it means different numbers of clusters that have machines with the same CPU. We evaluated our two scheduling policies, FOA and Kubernetes image locality. We tried to reproduce the Kubernetes image locality policy, which is basically a first-come-first-serve approach that takes the container image into account: Kubernetes will check whether the container is already deployed on some machine of the platform, and if it is there, it reuses the same node, but it only looks at entire containers. Our novelty is that we also check the layers of the containers, not only the entire containers. Here is the table of all the functions that we adapted from FunctionBench and the different input values that we used: mathematical functions like a million floating-point operations, Linpack, matrix multiplication, and also other functions like image processing, video processing and so on. Here is the simulated infrastructure with Batsim and SimGrid: the idea is that Batsim runs the simulation in the first big square, and then SimGrid performs the simulation of the real infrastructure, and they communicate with each other. We needed to add a few layers on top of these tools to behave like a serverless platform, so we added an extra layer on the workload profiles of Batsim to use container layers. And here are a few open source contributions that we made in the project: this is the FunctionBench repository with our functions already merged there.

Then we get to our experimental results. The figures I am going to show follow this structure: in the small facets, the small squares, we combine workload size and platform size; on the x-axis we show the makespan, and on the y-axis we show the amount of data downloaded. What we have in this figure are the Pareto curves, where we investigated the trade-off between reducing the amount of data downloaded and the makespan. What does that mean? It means that we cannot reduce both at the same time in an optimal way: if we want to reduce the makespan more, we need to relax a bit on the amount of data downloaded, and if we want to optimize the amount of data downloaded, we need to relax the makespan a bit. And here, with the colors, we are
showing the repetitions of our algorithm that I illustrated in the first step. The idea is that all the first solutions are here on the extreme right, with the best cost, the minimum cost, but with a big makespan. As I mentioned, if we are not satisfied with the makespan, we constrain it in half, go to step 2 and re-run the algorithm, and we can see that we reduce the makespan but increase the amount of data downloaded a bit; and we go on and on, and the last iterations show the best makespan but the worst amount of data downloaded. What we conclude here is that in all scenarios about three or four repetitions are enough for us; we don't go all the way to the last one because we would lose too much in terms of the amount of data downloaded.

Now I am going to show the specific objectives. Here I am comparing again, with the same structure combining workload and platform size, but now on the x-axis I am showing the heterogeneity level of the platforms and on the y-axis the amount of data downloaded. So here we are seeing the difference between our approach, FOA, and our baseline in terms of the amount of data downloaded, and what we can see is that FOA outperforms image locality by almost two orders of magnitude on this objective. Yes, FOA can do much better, because the Kubernetes image locality policy is really a greedy approach, while we focus on reducing the amount of data downloaded. What we can also see, with the different combinations of sizes, is that when we have very loaded platforms we don't have much choice: for example, here we have 10 functions per machine, so we have a lot of functions and small platforms, and there are not many options to say, okay, I want this function to go here or there; but when we have a lot of options, FOA can behave much better. The next result is the number of machines used. This was not one of the objectives; again, the objectives were to reduce the makespan and the amount of data downloaded, and in the end, analyzing the results, we noticed that we achieved this optimization while at the same time using fewer machines. In serverless computing, where you pay for what you use, this is a very good result: we use fewer machines and optimize both parameters. So the conclusions for this part: greedy algorithms may not profit from heterogeneity; sorry, I forgot to mention that as we change the heterogeneity of the platforms, the greedy algorithm does not change its behavior, while FOA can adapt a bit better, so the greedy approach may not profit from heterogeneity. FOA outperforms the baseline in terms of data transfers and makespan, in addition to system utilization, which means the number of machines used, by up to two orders of magnitude. FOA also minimizes cold start delays: if we minimize the amount of data downloaded, it means we are reusing more and more container layers, so we minimize cold start delays, and that speeds up function execution, because if we don't spend too much time deploying containers, we start the functions as soon as possible. However, at this step FOA is still very time-consuming, because it is based on a linear program with many variables, and it runs in the order of minutes while our baseline runs in the order of seconds. So this is the biggest trade-off of our approach, and this is one of the main points that we worked on in the
next step that I am going to present now. So the point now is this: that was the motivation and, let's say, the objective of the first part. We want to keep the good results, we want to improve FOA in terms of performance, and we added one more objective to this multi-objective approach: we also want to reduce the energy consumption of the platform. We do that by splitting our scheduling policy into two phases, so now we are talking about two levels of scheduling policies: one that runs at the global continuum level, where we decide in which cluster each function is going to run, and then, at the local level, we decide on which machine inside that cluster the function will run. At the global level we reduce the energy consumption and the functions' execution time, and at the local level we use our layer-aware scheduling policy to optimize. Since we are going to work with energy consumption, we have started trying to use Kepler; we are in touch with the community and the work is in progress to be able to use it, and I am going to show a bit of these investigations on energy consumption. Since we do not yet have something to run on our Kubernetes platforms, we are using power wattmeters directly on the machines to analyze the energy consumption as a first step, again to model our workloads and prepare the inputs for our new linear program. So the first step is to analyze the energy consumption. Here we have an instance with three nodes with OpenWhisk running on it: the first node is the master and we have two workers, but only the third node was actually used. We can see that the two nodes that are, let's say, idle are still consuming quite a lot of energy, but here we can see the functions running, and if we cut this out and zoom in, we get the figure on the right. Here I am analyzing, not the energy consumption yet, sorry, but the instantaneous power over time for different kinds of functions, and from that we computed the energy consumption by computing the integral of this area. Before that, if we zoom in a little more on a few functions, we can see their behavior: our first investigations point out, for example, that there are a few spikes at the beginning of each execution, and these spikes may come from the container deployment, so the preparation of the environment to execute the function may consume a bit more energy, and then we execute the functions. Here I am showing different input sizes for the same function: Linpack on the left and Chameleon on the right. Okay, so once we have that, we computed the energy consumption, then we remodeled our workloads and came back to the same steps, the same environment setup that I showed previously: we executed everything on top of OpenWhisk and Grid'5000, modeled the workloads, and then performed the simulations with the new algorithm. It is still a linear program, and we still do the repetitions to optimize the makespan as long as we are not satisfied with the final makespan of the solution. And now I am going to show a few results, with the same structure for the plots, the combinations of workload size and platform size. We can see that, first, we still do much better in terms of the amount of data downloaded. Our baseline is no longer Kubernetes image locality on its own, because now we are talking about two levels of scheduling policy, so our baseline
also has two levels of scheduling policy: we implemented a first-come-first-serve policy at the global level and then the image locality policy at the local level. And it is the same story: we optimize a lot in terms of the amount of data downloaded; in scenarios that are not that loaded we don't see a big difference, but whenever we can save resources and improve these objectives, we do. The number of machines used is the same story as well: we also reduce the number of machines used, while our baseline, being a greedy algorithm, tends to use all the machines available, and we do not. And the most important part right now is the energy consumption: with our preliminary experiments in this direction we already have good results showing that we can reduce the energy consumption, roughly by half in the median, in all the scenarios, and we are going to continue working on that. So the conclusions for the second part: the greedy algorithm still does not follow the differences in the heterogeneity of the platform, while FOA does. The new FOA, FOA-E, outperforms the baseline for data transfers and energy consumption, in addition to still reducing the system utilization, but now we lose in terms of makespan: we are not doing better than the baseline in terms of makespan yet, so this is one of the directions we are going to work on. Again, we minimize cold start delays by minimizing the amount of data downloaded, and the very good news is that FOA now runs in the order of seconds, not minutes anymore. This was one of the goals between our two steps, because now we can say that FOA is reasonable to run in production, on Kubernetes. As future work, we want to continue improving FOA's model and also try other linear solvers; to study applications that can be modeled as workflows, because right now we are talking about batches of stateless functions; to include Kepler for continuous measurement of the energy consumption, because if we can continuously measure the energy consumption of the platform, maybe we can turn this offline approach into an online one; and to continue this investigation toward reducing the energy consumption, because, as I showed, the platform by itself is also energy-consuming and we did not take that into account in our model yet, we are just optimizing the functions, so optimizing the platform as well may be one of the directions. And we are working on the real implementation on Kubernetes for both of our scheduling policies: image locality is already in progress, we have already scheduled talks with the Kubernetes community to get it into the main branch of Kubernetes, and FOA is in development. So thank you very much. Again, this is the reference for the paper that we published for the first part; please scan the QR code if you are interested, and on my GitHub repository you can also find the results for the new steps that I presented today. Thank you very much, and I will be happy to answer any questions. No questions? Everything's clear? Okay, so thank you very much again.