Let me get my screen up real quick. So thank you everyone for joining. As Kristen said, my name is Ofer, I'm the CTO at Carbon Relay. I'll be talking to you today about resource management in Kubernetes, what it means, and a bit about how what we do ties into all of that. Hopefully you can all see my screen.

Let's start with a quick agenda. Resource management is a big subject that covers a lot of different topics. I'm going to start with the basics: requests, limits, quality of service. If those terms don't mean anything to you yet, hopefully they will soon. We'll talk about common pitfalls, from my experience and the company's experience, and I'm sure at least some of you will have run into them, before moving on to best practices and what you should be doing when you're looking into resource allocation in Kubernetes. Then we'll shift into what I consider slightly more advanced topics: performance optimization, resource utilization, tuning an evolving application, and what we call the continuous optimization process on Kubernetes. At the end I'll leave time for your questions, and hopefully I can answer all of them.

So to start off: why are we even talking about resource management? For those of you who are here, you're probably not asking that question. But as background, for those of you who've been in the development world for a while, you know that in the old days we had applications that ran standalone. I would write my application, a big monolith of code, and my expectation would be to deploy it to a single machine. That box was all my application had to play with, and it would grab as many resources as it could. Because it was standalone, there was no need for any kind of fencing or arbitration; there were no neighbors, which I'll talk about later. Life was nice and simple. Obviously that's not the future, not where we're going, and it's not even the situation today for most of us. Most of us, if not all of us, are moving towards distributed systems for a variety of reasons. When we have applications that are distributed, both in terms of the workload and in terms of the resources, we start talking about orchestration. We need to fence off different resource allocations. We need to figure out how different components work together. Old applications were usually not meant to be containerized; like I said, they were meant to run standalone. Whereas today you have plenty of containers, pods, and different services all running together, all sharing the pool of resources that is your cluster or your data center or whatever it is. What this means for us is that doing the things we used to do in the past when it comes to resource allocation is not going to work when you move to distributed systems. That's true if you're migrating an application. That's true if you're developing a new application to be cloud native. And it's even true, as I'll show you, if you're just grabbing some example online and trying to run it on a cluster in a distributed fashion.

So, getting up and running with Kubernetes. I don't know how many of you are actually running applications on Kubernetes now. For those of you who have already gone down the Kubernetes path or journey, you know how it's pitched, right? Super simple. Just get a cluster, get your application, deploy it, everything's nice, and life is great. Obviously, I say that facetiously.
Kubernetes is amazing in many different ways. Simplicity is not one of them. When we first want to get our distributed application up and running on Kubernetes, the first step is to grab a cluster. Honestly, that's not the painful process it used to be: you can go on GKE or EKS today, click a few buttons, give them a credit card, and get a cluster up and running. The second step is taking an application I want to run, designing it, and deploying it to the cluster. That is not as simple a process as getting the cluster up and running. I need to figure out what the different components are that need to be allocated different resources. How do I fit them? How do I make sure they share the right amount of resources and all behave nicely together? How do I make sure they scale right, so that when my application is under load it actually behaves the way I want it to? And this really is the crux of what we're going to be talking about today: how do you take an application and make sure you set the right parameters so it behaves the way you want it to behave?

So, starting with the very basics. If we're talking about resource allocation, we have to talk about the resources themselves and what we have available to us. In Kubernetes there are a few of what I would call native resource types, things that are managed for you: memory and CPU. I'm not going to touch on hugepages as much, but it's good to know they're out there. Those are the things we all naturally think about when we start thinking about resource allocation. Whenever someone asks how many resources your application needs, the immediate thought is: how much memory and CPU does each pod or each container need? How much is this going to consume? How much memory does this database need in order to not crash or fall over? How much CPU do I need to give my workers so they deliver the performance I expect, or the performance my users expect? Underneath those, as a second order, there is a slew of parameters and resources we still need to think about. If you're running a Java application, for the Java crowd: if you've ever run a Java application in a distributed fashion and didn't have to tinker with the JVM, you're probably some sort of magician. The JVM settings are super critical to think about when you're splitting up your resources, and the same goes for plenty of other components. There's a lot that we as developers need to touch these days in order to make sure our applications run in a stable fashion.

The first thing to think about when you go to Kubernetes is requests and limits. I have to make a slight note on the word request. This is within the Kubernetes world, so for those of you who aren't as familiar with Kubernetes, the requests I'm talking about here are not to be conflated with HTTP requests. These are not the same requests; this is a particular reserved word in Kubernetes. When you define an application, when you write the manifests that describe what the application is, you can specify two things at the container level: requests and limits. A request for any given resource is just as it sounds.
Me as the container, I'm requesting a gig of memory in order to operate: the minimal amount of the resource that I need in order to be able to carry out my operations. The actual resources assigned to me can be bigger than the request, but they can't be smaller. If my cluster doesn't have enough resources to give the container what it's requesting, the container just won't start. The complement to that is limits. If the request is the absolute minimum I need in order to operate, the limit is the absolute maximum. Here's where I would say: I can have a container, I define the memory request for that container as one gigabyte, and I define the limit as two gigabytes. Give me a bare minimum of one gig, I need that to operate, but don't ever give me more than two, because I know I don't need it. Maybe I've run some tests on it before, maybe I just don't have more to give. Limits are where you set the guard rails to make sure your containers and your pods don't get ahead of themselves and overrun your cluster.

What does it actually look like when we go to deploy the application? On the right-hand side you can see a snippet of a manifest. If you've never seen a manifest before, this is a great starting point for Kubernetes: it's all based on YAML files, and we write things as verbosely as possible. What you see here is a pod that has two containers, and the two containers have explicit limits and requests for both CPU and memory. On the left is a simple picture: I have a node that has a certain amount of CPU and memory. Kubernetes asks, can I deploy this pod to this node? What it checks is the total of the requests across all the containers in the pod, for both memory and CPU. If I look at this pod, I have a total request of 1.5 gigs of memory and a total request of 0.6 cores of CPU. When I deploy this pod, Kubernetes basically looks for a node that has enough of both. If no node does, the pod cannot be deployed; if one does, the pod gets slapped onto that node. I'm going to be talking about scheduling in bits and pieces and won't go into it too deeply, but the process of putting a pod on a node is called scheduling in Kubernetes. That's the requests: if there's enough, I'll be placed on a node. The limit, as I said before, governs what happens once the pod is on the node, where it can dynamically be allocated more or fewer resources. This manifest on the right guarantees that this pod will never get more than 3 gigs of memory overall, and the same idea for 1.5 cores. Again, those are the very, very basics of requests and limits at the pod and container level.
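To make that concrete, here is a minimal sketch of the kind of manifest just described. The container names and images are made up for illustration, but the totals match the picture on the slide: 1.5 gigabytes and 0.6 cores requested, 3 gigabytes and 1.5 cores as the limit.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # hypothetical pod, not from the demo app
spec:
  containers:
  - name: app                  # hypothetical container name
    image: example/app:1.0
    resources:
      requests:
        memory: "1Gi"          # minimum needed for this container to be scheduled
        cpu: "400m"            # 0.4 of a core
      limits:
        memory: "2Gi"          # this container is never given more than this
        cpu: "1000m"
  - name: sidecar              # hypothetical second container
    image: example/sidecar:1.0
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "1Gi"
        cpu: "500m"
# Pod totals: requests add up to 1.5Gi memory and 600m CPU (used for scheduling);
# limits add up to 3Gi memory and 1500m CPU (the most the pod can ever get).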
When we go to write these manifests, the whole goal at the end of the day is to assign values for memory and CPU at the pod and container level. And like I said, it's not just memory and CPU; we'll get to places where it's way broader than that. So how do we think about these numbers, and what should I put there? There's some sort of Goldilocks zone, and it may not even be static. Too low, and what I'm going to face is performance degradation: don't give enough memory to your database and it might crash; don't give enough CPU to a worker and your users are going to experience high latency. Too high, which I will admit is the default I see today and the default we all converge to when we don't know what we're doing, is saying: you know what, I have no idea what my database needs, just give it 5 gigs and it'll be fine. I run into two issues there. The first is that I'm simply wasting resources: if I don't truly need it, my database is going to sit there idle, eating away at both my credit card bill and my ability to deploy other applications to my cluster. More than that, if I did constrain my cluster to a certain size and I'm now requesting large chunks of memory and CPU, what I'm going to end up with is unscheduled pods. Kubernetes is going to come back and say, sorry, I don't have room for this, and then I have to go back to the drawing board and figure out why I'm requesting 5 gigs and what I actually have left. I'll have to go through deployment and monitoring and do the manual process I'll describe later to figure out what "just right" is. At the end, we'll get to a really nice automated process for finding just right.

So that's requests and limits as a baseline. As with everything in Kubernetes, there's always another layer of complexity, and in this case it's quality of service. Quality of service is something that I will admit I was not aware of when I first started with Kubernetes. I heard requests and limits, the numbers made perfect sense to me, let's just go. But there's actually a second order of complexity here that is tied to the relationship between the requests and the limits at the pod level. Quality of service is defined at the pod level, and there are three classes: guaranteed, burstable, and best effort. They are just as they sound. Guaranteed: in order for a pod to be classified as guaranteed quality of service, every container inside the pod has to have both limits and requests defined, and those have to be equal to each other. What does that mean? It means that for every container I say: I'm requesting one core of CPU and two gigs of memory, and my limit is one core and two gigs, or whatever the numbers are. It's the safest bet I can take: always, always, always give this container exactly those resources. There's no fluctuation, there's no question of whether I'm going to eat more resources and create noisy neighbors or anything like that. What it does cost you is rigidity in scheduling. Like we said before, in order to schedule a pod, in order to place a pod on a node, that node has to have enough resources to accommodate the pod. Because I'm being rigid with my requests and limits, I'm likely asking for more than my true baseline, which means I'll have a harder time scheduling. I'm going to have very rigid scheduling of the pods, and I'm probably going to have to bump up my cluster a little bit, or at least understand the exact resource utilization of my cluster.

The next class of quality of service is burstable. Again, just as the name suggests, these are pods that are meant to be able to respond to a workload by upping their resources, but hopefully stay at a lower baseline. Where does this come in? Think of an e-commerce application that during the day sees a very low level of activity, and then at 5 p.m., when everyone logs off Slack, they go in and start buying stuff, and I get a spike. Well, how do I want to set my requests and limits?
I don't want to set them at the baseline level, because if I do, when that spike comes, things are going to start crashing. I also don't want to set them at the spike level, because that means that 80% of my day I'm just wasting resources. This is where burstable pods are an option. Burstable pods are those where the limits are strictly larger than the requests; they're not equal, so they're not guaranteed. What this does is allow me to schedule the pods more easily than I would with a guaranteed quality of service, but it also allows the pods to extend their resources if a bigger workload comes in. By setting the limits larger, the pod basically tells Kubernetes: hey, I can stretch up to three gigs, or up to three cores, or whatever it is. The downside is a term I mentioned before and will mention again several times: noisy neighbors. You can imagine having multiple burstable pods on a single node that request more resources at the same time, because one thing we don't control is when those resources get allocated. There are ways of controlling it, but those are much more advanced. So in real time, you can see a situation where one service requests more resources at a given moment and basically throttles the node; now I can't schedule anything else, and I can't even get more juice out of the other pods on that node.

My third class is best effort. Again, it is just as it sounds: no requests or limits set at all. This is the most dangerous option, and I'll show you an example later. It's one of those things where, if you weren't aware of the best effort quality of service, you may have fallen into it simply because you didn't set anything explicitly. Two things are going to happen with best effort. One, and I keep talking about scheduling, there's something in scheduling that determines the priority of each pod, and best effort pods are going to be evicted first. Meaning, if you didn't set anything and some other pod needs more resources now, Kube might just take your pod out altogether, and it gets evicted from the node. The other side is that by not setting a limit, what's likely to happen is you'll just grab as much as you can. We're back to the old days, where parts of the application grab as many resources as they can, thereby throttling everyone else on the node.
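To tie the three classes back to the YAML, here is a schematic of how the resources block alone determines the class. The numbers are arbitrary; only the relationships between requests and limits matter.

# Guaranteed: every container sets requests and limits, and they are equal.
resources:
  requests: {memory: "1Gi", cpu: "1"}
  limits:   {memory: "1Gi", cpu: "1"}

# Burstable: requests are set, and at least one limit is higher than its request
# (or resources are only partially specified).
resources:
  requests: {memory: "1Gi", cpu: "500m"}
  limits:   {memory: "3Gi", cpu: "2"}

# BestEffort: no requests or limits anywhere in the pod; first in line for eviction.
resources: {}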
So what usually happens? What do we usually do? These little sins are something every single one of us has done; again, if someone contradicts me on this, I'm going to call you a magician, but everyone has done this to some extent. Going with defaults, or just ignoring it altogether: I have my application, we've run this application for 20 years, I don't need to change anything, it runs just fine. Well, we've talked about this: a cloud native architecture is not the same as what you're used to running on monoliths, and it will require resourcing updates. At a bare minimum it requires fencing; even if the total amount of resources stays the same, understanding the different components of the application and the resources required for each of those components is critical. Something similar is copying from a blog post, and I'll show you an example of just that if I have time. Configurations that worked for someone else are the equivalent of "it runs on my machine." The configurations and the actual parameters that you set in your application are specific to your application, your cluster, and most importantly, your workload. Even if you have the exact same cluster, if you're seeing different traffic patterns, you may see completely different behavior for the exact same configuration. You have to tune your configuration to the workload. A third thing is not testing this, not going through a rigorous process. This isn't something that has historically been part of our toolkit as developers; we never had to think about it, we just deployed. But deploying blind is super dangerous. The first failure mode is that things don't deploy, and honestly, that's the safer option. The riskier option is that it deploys fine when you deploy it, and then at two in the morning, when someone on the other side of the world wakes up and starts doing transactions, it crashes. So we have to make sure we tune the application to our specific needs: the application, the cluster, and the workload itself.

Conversely, what should we do? First of all, I've mentioned quality of service: you have to understand the quality of service of your pods. Like I said, when I started I didn't even know quality of service existed. The class itself is just as important as the numbers you assign to the requests and limits. You have to think about the different pods and how critical they are to the operation of the application. Some have higher priority than others, and you have to set the quality of service accordingly. Your critical pods, I'm going to say, should be guaranteed; they don't strictly have to be, but they have to take high priority when it comes to the actual resources you allocate to them and the confidence you have in those pods getting those resources. Noisy neighbors, I already talked about this: depending on the number of your nodes and the number of your services, you have to start planning for what happens if two particular pods land on the same node. And the last point here is dynamic loads. If you have a very static load that is constant throughout the day, there's not a lot of reason to go with burstable pods; all they're going to get you is more noise in the system. If you have a way of tailoring your resource requirements to a good baseline, you'll save money and you won't have stability issues.

Specify resources explicitly. This is somewhat tied to quality of service: don't ever, ever, ever use best effort. Some people may yell at me for saying that, but I'm going to state it flatly: don't use best effort, it's not stable enough. More than that, think of your fellow developers: if I come into a manifest and I don't see anything specified, I'm probably just going to gloss over it if I don't know what I'm looking for. Increased visibility into what the resources are is always good for you. The last thing it provides, and I'll show you in a second: if you do have resources specified explicitly in your manifests, then later on, when you iterate on the configurations, you at least have history if you're doing GitOps, or at least a way to see and tune things individually.

The last thing, which I haven't talked about a lot, and again it's a whole separate topic, is quotas. If you really want to manage your cluster well, if you really want to run a tight ship, set quotas at the namespace level. Separate your applications out into namespaces and understand what each application needs to get. If you know that this application as a whole needs two gigs of memory and two cores, set that quota, and then you can backtrack from it to the different containers and pods and figure out how to split those up.
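As a minimal sketch of that, assuming a hypothetical namespace and the two-gig, two-core figures from a moment ago:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: app-quota
  namespace: my-app          # hypothetical namespace holding one application
spec:
  hard:
    requests.cpu: "2"        # all pods in the namespace may request at most 2 cores combined
    requests.memory: 2Gi     # and 2 gigabytes of memory combined
    limits.cpu: "2"          # combined limits capped to the same figures in this sketch
    limits.memory: 2Gi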
So now let's talk about actual tuning and the process of getting to the right numbers. I mentioned before that we have all these parameters, all these resources that we want to tune. The issue is that the number of these parameters grows very quickly with the complexity of your application. Take a simple application, five services, which I'll show you in a second, and say I want to tune resource requests and limits for those five services. Imagine I make it easy on myself: I have five pods, each pod has one container, and for each container I want to tune requests and limits for memory and CPU. I'm looking at 20 numbers that I need to set, and I haven't touched replicas, I haven't touched the JVM if I have Java in there, I haven't touched anything internal. Me personally, as a human, reasoning about 20 numbers at the same time, all of which are correlated to some extent, is going to be hard, because when the application runs there is a lot happening in the background. I don't have a good mental tally of 20 different numbers where I can say, well, if I just tweak this one a little higher and that one a little lower, I'll get to the right answer. Again, we're talking about distributed systems: when you run these things, a lot happens in the background, and it's completely dynamic. There's no way to just immediately say, I understand completely how the application works. There are things that are out of your control; unless you really, really lock down how your scheduling and your resource allocation work, everything is going to be dynamic, because that's part of the power of Kubernetes. And because the allocation is dynamic and comes out of one shared pool, this tuning is a moving target, which is what we talked about earlier with scaling, and I'll come back to that in a second. Not every workload has the same criteria for what "tuned" means, either. All of which is to say: actually tuning this by hand is an incredibly inefficient and tedious process. I dare anyone to come and tell me they actually like it.

So where does this process come in, and where is it important to actually invest in it? I'll admit that a lot of us hadn't even thought about doing this a couple of years ago. We strongly believe that as part of your CI/CD pipeline, there has to be a step in which you tune the parameter values inside your application. And that step comes after your unit testing. When I build a new component, obviously I do some unit testing, I push it to a staging area, and I start doing integration tests. The integration tests historically tell me that all these things work together: data is flowing from one component to another, my requests are coming in, nothing's malformed, everything's okay. But as we've talked about, that has nothing to do with your workload. It doesn't say anything about your application being stable or being able to handle realistic loads.
This is where tuning the actual resource requirements and the different parameters inside your application becomes critical. As you do integration tests and build the application back up, you have to put it under load and think about which resources you're allocating where, and whether that will meet the demands of the business at the end of the day. All of that has to come before deployment, and all of it has to be done continuously. This isn't fire-and-forget, "I did this once, now I'm done." The application evolves, the workload evolves, the cluster evolves, and the other applications around your application evolve. If you have 10 or 50 other developers putting stuff on the cluster, you're going to run into those neighbors all the time, and you're going to have to keep tuning your application as you go. So this whole process has to be as automated as possible.

So, some options we have as developers. The first one I already talked about: trial and error. I'll just put something in; hopefully I won't ignore it altogether, and at the very least I'll put some number on my parameters, deploy, cross my fingers, and hope nothing breaks in production. Easiest way to get started, absolute worst way to make your time efficient and valuable. And again, I have personally done this manually. I hated it, because it's confusing, it's not intuitive, and there's nothing, at least to me, remotely rewarding about manually tuning and figuring out the memory and CPU I need for this particular piece. An option that I particularly like as a scientist is to do what we call a design of experiments: treat the optimization as a scientific experiment. Say, I have some numbers I need to figure out, I need to do this methodically, and I need to think it through. Maybe I have some things I already know. Maybe I deployed this application before and my DevOps team came back and told me, hey, we monitored this and this pod is really getting throttled on CPU. Or, hey, we're monitoring the resources, and you asked for five gigs but you're utilizing one. So I have a couple of ideas. I can take two or three parameters at most, come up with a grid of potential values, and just run through all of them: run tests and see what works best. This works significantly better than trial and error. Like I said, it's scientific, it's methodical, and at least you're thinking through the problem and getting some information out. But as I mentioned before, the higher the complexity, the harder it is to manage multiple parameters at the same time. Two or three is a good number; by the time you get to five, you're already thoroughly confused. The third option, and the one I'll show you soon, is using machine learning, and I'll explain why machine learning is a perfect fit for this. The goal we're pushing towards is running this automatically. No developer should sit there trying to decide whether they need exactly one core or two. They need to understand the overall architecture of the system and define the high-level shape of the test, what we call an experiment, and then go ahead and run it automatically. Intelligently, automatically, completely hands-off: all you have to do is say, hey, I want to tune these 10 parameters, and I think they should vary between X and Y, whatever it is. Tell me what the best configuration is.
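To make the design-of-experiments option above a bit more concrete, the "grid" doesn't need any tooling; it can be as simple as a scratch file like this. The parameters and values here are hypothetical, and the workflow is just: patch the manifests with each combination, run your load test, and write down what happened.

# Hand-rolled design of experiments: two parameters, three candidate values each,
# so 3 x 3 = 9 trials to run and record.
parameters:
  worker-cpu: ["250m", "500m", "1000m"]
  db-memory: ["512Mi", "1Gi", "2Gi"]
results: {}   # fill in throughput, latency, and crashes for each combination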
And as I mentioned before, this continuous optimization process should be closely tied to your CI/CD pipeline, so that whenever you have a new release or a new tweak or whatever it is you do, you run through it if you think it's going to have a meaningful impact on your cluster or on the other applications that are running.

So, quickly, on the platform that we have at Red Sky Ops and how we do this. I mentioned the words experiments and parameters. What we provide is the ability to do this design of experiments, but at a much larger scale, much faster, and much more scientifically than what I can do by guessing at a grid. You as a developer define a few things. Parameters: those are the things you want to tune. Memory and CPU for a set of pods, could be two, could be five, could be whatever you want. If you want to tune the JVM, you can tune the JVM at the same time; replicas, disk size, whatever it is. Anything you can expose, the platform can tune, and the nice thing about Kubernetes is that the whole point is to be as flexible as possible, so we can expose almost anything in there. Once we have that definition, the experiment itself is completely automated. The machine learning model explores the parameter space and comes back with configurations to try. We have our own Kubernetes controller, which I'll show you in a second, which actually goes ahead and tries the different configurations; each one of those is what we call a trial. As it tries them, the machine learning model learns from the performance of the application and learns what the parameter space looks like.

Now, one thing that's really important to mention here: we're optimizing towards something. I keep saying we want an optimized application, but what does it mean to be optimized? The reality is that what "optimal" means is completely up to you. This is why I said earlier that if you take someone else's defaults, you're also taking their assumptions about what it means for their application to perform. It could be low latency, it could be high throughput, maybe you just want to minimize overall resource utilization while the application doesn't crash. All of those use cases are valid, and they're driven by either developer needs or business needs. So we let you define the metrics you want to optimize for. You can say: hey, I have these 10 parameters I want to tune, and I want to focus on two metrics. One is throughput, I want to get as many users in as possible, and the other is minimizing my overall resource utilization. Maximize throughput, minimize utilization. The machine learning model will learn the optimal result, or in this case results, if you have competing metrics, and give you the optimal configurations.

So let me show you a quick example, an example of the pitfalls and how we actually end up solving them. Some of you may know the Docker dogs-versus-cats voting app. A fairly simple app, like I said: five services. I have a backend to calculate the results, I have a Redis queue, I have a worker, a DB, and the actual frontend I serve to the user. And it's up there, it's available, I can always go and pull this example online. This is where those blog posts come in: hey, I saw this cool thing, I'm starting with Kubernetes, I'd love to get this deployed. I go to GitHub, I clone the repo, and you know what? Even better, they have a whole folder of Kubernetes specifications. Fantastic.
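For reference, the deployments in that folder are roughly this shape (illustrative, not the exact file); note what is missing from the container spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: result
spec:
  replicas: 1
  selector:
    matchLabels:
      app: result
  template:
    metadata:
      labels:
        app: result
    spec:
      containers:
      - name: result
        image: dockersamples/examplevotingapp_result   # image name approximate
        ports:
        - containerPort: 80
        # no resources: block at all, so this pod ends up in the BestEffort class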
Here are the five services: db, redis, result, vote, and the worker. I go into the result service. Okay, looks good. But hopefully you'll immediately notice that the words requests, limits, memory, CPU, replicas are nowhere to be found. All of these manifests in Docker's repo are going to end up classified as best effort, because none of the services have any kind of requests or limits in them. So what's going to happen if I actually try to deploy it? I have a little cluster here, and I have the application itself. I'll admit I made tiny tweaks so that I can expose the actual voting service and show it to you, but really, all I'm doing is kubectl apply on everything that's in there, and praying a little bit that it's going to deploy. Okay, everything's created. I'll go to the cluster and look: everything looks good so far, just the last few stragglers coming up. You know what? I'm done. I deployed. And I'm not wrong, nothing is wrong here, everything works fantastically well. Well, like I said, part of the old way of doing things is assuming that if the application deployed, it's actually running well. So I'm going to do a little port forwarding so you can see the application; all I'm doing now is forwarding both the result and the voting services, so I have these two up here. Really, what the application looks like is this: it's waiting for incoming votes, there are no votes yet, and all it's going to tell me is the percentage of people who voted for dogs versus cats. Like I said, the application is running, I can see it, and more than that, I can just go ahead and vote. Obviously, I'm going to vote for dogs. Boom, 100%. The application is working fantastically well.

Obviously, at this stage you have to do some sort of performance test on the application. For that, I have Locust up and running. Let's imagine I want to expose this: I want to put it up on Twitter, where obviously I have a huge following, and have everyone vote cats versus dogs. I'll do a thousand users, not even that much, spawning a couple of users a second, and it starts swarming. Okay, so far so good, this thing is up and running, and you'll see the votes coming in at the bottom. 433 votes, and it looks like I'm choking: too many open files. My incoming connections are already starting to drop, my performance is already degrading. By deploying this out of the box, I've managed to get 433 votes in before my application crashed. Definitely not something I want. So what can I do now? I can go into the logs, I can start looking at things, see if I can maybe get some more CPU. Is it the CPU? Things are coming back up again. Okay, great. Nothing is clear, though: should I maybe go to the database? Maybe it's the Python backend? I have no idea what's actually causing the issues. There are plenty of ways of debugging it; my point is, I don't want to. This part, the part of trying to figure out where I should put requests and limits for my application, is not something I find particularly intriguing.

So I'll show you very quickly how to do this in a scientific and automatic fashion. With Red Sky Ops, like I mentioned, we have our own controller, which you can grab from GitHub, and you install the controller in the cluster. I think I have mine installed already, but you can see we have our own tool called redskyctl.
You run redskyctl, and in 10 or 20 seconds you have a controller. You'll see it comes back unchanged because I already had it installed, but it takes about the same amount of time. Once I have redskyctl and I've logged in, and we have a completely free tier you can use to connect to the machine learning model, I can go ahead and define an experiment. So I have my voting app, and like I said, I haven't defined requests and limits for any of my services. At a bare minimum, I want to start defining and tuning parameters for the different services, and I'll show you here, we have 10: we're tuning replicas, CPU, and memory for most of the services. You'll see Redis, the database, the backend, and the worker; the only thing I don't tune in this case is the frontend. And as I mentioned, the question is: what are you optimizing for? In this experiment file, which is our own YAML format for defining these experiments, I want to do two things. I want to maximize my throughput, meaning I want to get as many votes in as possible, and I want to minimize cost. You can think of cost here as basically cluster resources: I don't want to pay a lot for this, it's just a hobby app, so get me as low a cost as possible.

Once I've defined this, everything we do is Kube native, so I can just go ahead and apply it, and as these things come up you'll see pods come alive. What I'm going to do now is basically wait: wait for the machine learning model to come back with a few suggestions. This is the Red Sky UI. You can see I already have an experiment that I started earlier, and this is the one I just kicked off. It's going to take a while to run. What's happening in the background is that the controller sent the experiment to the machine learning model and said, hey, here are my definitions, go run this experiment and tell me what's going on. The machine learning model starts suggesting what we call trials, which is to say configurations, and the controller tries them. So as you saw those pods come up, the controller is spinning up the application and putting load on it; we actually have Locust running inside the cluster to run a performance test against the application. What you see here is a complete experiment. It can take anything from 30 minutes to maybe an hour or two, depending on the complexity of your application, and each one of these is what we call a trial. You can see at the bottom the different configurations for each of the trials; like I said, we're tuning 10 parameters automatically. At the end of the day, what you're left with are these points we call the best, otherwise known as the Pareto front: the ones that can't be beaten on both throughput and cost at the same time. You can't get something that is super cheap and super performant at the same time, but you do get the trade-offs. So instead of me going in, tinkering with the 10 parameters, and trying to figure out, am I here, am I there, am I somewhere in the middle, with no visibility into any of it, I can get all of this automated, find the configuration I want, export the manifest, deploy, and be done with it.
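For anyone wondering what that experiment file boils down to, the real schema is in the Red Sky Ops docs and the recipes repo on GitHub; schematically, though, it is an experiment resource of roughly this shape. Field names and values here are illustrative only, not the exact CRD.

kind: Experiment             # Red Sky Ops custom resource; apiVersion per the docs
metadata:
  name: voting-app
spec:
  parameters:                # what the machine learning model is allowed to vary
  - name: worker-cpu
    min: 100                 # millicores
    max: 2000
  - name: db-memory
    min: 256                 # megabytes
    max: 4096
  metrics:                   # what "optimal" means for this application
  - name: throughput         # votes processed during the load test
    minimize: false
  - name: cost               # rough proxy for cluster resources consumed
    minimize: true
  # plus patches that map each suggested parameter value back into the
  # application manifests before a trial is run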
A couple of last notes before I open it up for questions. I just want to talk about the machine learning model really briefly: why machine learning? Obviously I'm heavily biased towards machine learning, having a background in machine learning myself, but really, this is not machine learning for the sake of machine learning. The core of the issue is the complexity, the number of parameters, and really what we would call the dimensionality of the parameter space. I don't know a human who can hold 10 parameters in their head, let alone 20. So we use a machine learning model that can handle that level of complexity; it can explore the parameter space far more efficiently than we can. If you look at the number of trials I had up there, it was about 200, and I challenge every one of you to go through 200 trials by hand and give me the same kind of results for throughput and cost. And the nice thing about our optimization process is that it's completely problem agnostic. You don't need a whiz data scientist, you don't need a lot of data up front. All you need is your application, your actual manifests, and the parameters you want to tune. You define the parameters, you define the metrics, you define the load test, and you just run it.

So with that, I'll open it up to questions. Thank you all for coming. Again, if you only take a few things out of this: think about your resource allocation. Really think about how you tune it ahead of time, not as an afterthought, but as something you do pre-deployment. Make sure you understand what your quality of service is. And really think about making this part of your process; this tuning and optimization process is a critical piece of your CI/CD pipeline in general. Cool. And with that, I'm going to go to the Q&A panel.

First question: what is the role of limits at the container level? I'm not sure exactly what you mean by container level, but I would say limits are just as they sound. If you set the limit on a container to one CPU, all that container will ever be able to get is one CPU; it will never get more than one core, essentially. The same goes for memory.

Someone asked: if we have two containers in one pod, each with a limit of one CPU and a request of 0.5 CPU, how much will be reserved? Great, I feel like I'm in math class. The word reserved is interesting. What I really have is a total request of one CPU: two containers, each requesting half, and if I can still do arithmetic, that comes to one core. I'd put the word reserved the other way around: Kube will look for a node that has one free core of CPU in order to schedule this pod. Once it's been scheduled, it can burst up to two cores, but at a bare minimum it has to be scheduled on a node with one free core.

Someone asked: is Red Sky Ops available for any Kubernetes distribution, whether in the cloud or on premises? Absolutely. We have backwards support; I forget to which version, but we go back quite a lot. When it comes to on-premises, if you're air-gapped, come talk to us. Right now the free tier offering is hosted by us, so you need some sort of internet connectivity; I suggest trying it on one of the cloud providers, but if you need something more than that, we have the capability to deploy everything on-prem.

Do you do auto-scaling in real time, or only using experiments? I'm not sure exactly what you're asking, so I'll touch on a few things at the same time. First of all, auto-scaling.
I'm going to talk about the actual Kube-native auto-scalers in a second, but before that, on scaling in general: if we think about what we talked about with varying workloads, there are ways for you to run multiple experiments tailored to multiple loads, or to run a single experiment that has a variable load in it. Then you either run the experiment with the variable load and export the manifest for that, or, something I particularly like, you do what I call scenarios: run a scenario for low, medium, and high, or whatever it is, export all three, and save them. We currently don't have a way for you to switch between them in production, because we don't operate in production, but we're working towards extending our pipeline that far. Right now you basically switch between scenarios yourself: kubectl apply, or whatever process you use to roll out the different configurations. On auto-scaling itself, I will say we're working towards optimizing your auto-scalers themselves. As part of your experiment, you can tune the HPA: if you know you're deploying with an HPA, you can expose the HPA's parameters in the experiment, and what you get out is an intelligently tuned auto-scaler. Basically, you're making your whole system, the entire Kubernetes ecosystem, more intelligent.

Will this work with cloud-native Kubernetes-as-a-service, EKS, Rancher? Yes to all of it. Our own stuff is deployed on both GKE and EKS, we're perfectly fine with those, and Rancher is a good partner of ours, so I'm going to say yes across the board.

In the case of replicas, what kind of load balancing do we consider, and how can load balancing affect vertical scaling? Good question. Right now, if you go to our recipes repo, you can see it here, the Red Sky Ops recipes, we have multiple recipes, including the one I showed you that does tune the replicas. I think we basically use one kind of load balancer, and I honestly haven't looked too much into how that would affect vertical scaling, but I suspect it's something you could expose. I can tell you that we're working on adding, and may have already added, categorical parameters, so you don't have to put in numbers: you could switch between different types of load balancing if you have something fancy and even incorporate that into your experiment.

How much does it cost? There's a free tier. Just go sign up and you can start today; you don't have to pay us anything.

Show me the YAML for setting parameters and how you set the limit and request values on the container. All of this is online, and I can come back to it if we still have time; I think we're bumping up on the hour soon. Actually, I'll just do it now. The way you set the limits on the container: here's the experiment file, the experiment I started running, where I said here are the parameters. What really happens is that as part of the experiment we patch your manifests. In the back here, in the application folder, I have a bunch of manifests, and we just go ahead and patch them based on the values we get from the machine learning model. I hope that answers the question.

How do you keep the system pods stable, like the auto-scaler pod, which is in kube-system? If the auto-scaler pod goes down, then the scaling advantage is lost. This is a more general question that we get a lot, by the way.
We find more and more people are interested in tuning more than just their application, by actually tuning Kube resources as well. You totally can. I don't have a great example of that specifically; like I said, we have good examples of tuning the HPA itself. If that's part of your experiment and the HPA pod goes down or doesn't deploy, our experiment will show that as a failed trial, and the machine learning model will learn from it. One thing I haven't mentioned is that the model actually learns from failures: it just keeps trying things, and if it notices areas of the parameter space where the application fails to deploy, it doesn't touch them anymore. But yes, you can add more and more Kube resources to your experiment and keep going with that.

Does it work with OpenShift? It does. I think we actually had the first experiment run on OpenShift a week or two ago, so it works on OpenShift; if you have specific questions on versions and things like that, come talk to us.

The previous question, plus CI/CD? I'm not sure what you mean. Right now, what I showed you is a standalone piece of what I consider to be a CI/CD pipeline. We're working on building integrations into common CI/CD tools like Jenkins or CircleCI or whatever it is, but I don't have that quite yet.

Can we track throttling through graphs using Kibana? I'm not sure if you mean in the experiment or just in general. If you have Prometheus, well, Kibana, not Grafana, honestly, that totally depends on your deployment. Yes, you can find throttling. If you're looking to monitor your system for specific things like throttling, I would recommend Prometheus and Grafana and that sort of online monitoring, but that's a separate topic altogether.

How do I manage two pods that burst at the same time? Should we use namespaces to ensure these pods never run on the same node? That's a good question. You don't have to use namespaces; you can use taints and tolerations to make sure pods don't land on the same node, which is again a whole other topic. My first step would actually be to run an experiment with those two burstable pods and see how it works out, because if you do have issues, they will be surfaced throughout the experiment: if we don't control scheduling and use the native Kubernetes scheduler, then as you go through trials some of them will have more scheduling issues than others, and you sort of converge to the right answer. But if you know you have that problem, you can just put taints on them ahead of time.

How does it affect the HPA and VPA? I just talked about this: basically, we believe you should be tuning those as well as you run through these experiments.

How many resources does the Red Sky controller itself need in Kubernetes? Very little. You saw the little Red Sky system namespace and the Red Sky controller pod. You can go to our docs, redskyops.dev; it lists everything, and it's incredibly minimal.

Okay: when a pod does get placed on a node, but its limit is higher than what's actually available on the node, what happens to the pod when it exceeds what's there, and how do you handle this? I believe what you're talking about is a pod with a burstable quality of service. It has to be burstable, because if there's no memory available for the request, it won't be scheduled in the first place.
Doing experiments with burstable quality of service is interesting. For the experiment itself, we recommend running about 20 times the number of parameters in trials; that gives you really good coverage, and the experiment will surface the problem. If the pod gets evicted or the app crashes, like I said before, the machine learning model will learn from it and keep trying, and if there's nowhere else to schedule it, you'll see that your experiment just isn't succeeding. Then you have to go back and check: maybe you don't have enough nodes, or you have to constrain the numbers even more.

How can you tie the patches into Kustomize? We already use Kustomize. If you watched me, I ran the experiment using Kustomize, and basically everything we do you can then export as a Kustomize patch. I hope that answers the question. These questions are coming in at different times, I think; a couple more.

Is it required to have an account to use the application, and can we train the ML on premises? Yes to the second question: I talked about that briefly, we can deploy the machine learning model on premises, come talk to us for that. Do you have to have an account? To use the machine learning model, you just sign up for an account online; it's super easy, just an email. If you don't want to give us an email, the controller itself is completely open source: you can go get the controller and still run experiments. You'll see the suggest command in the docs. What it means is that you won't have the intelligence behind the experiment, because you won't be connected to the machine learning model, but you can still run the second option I mentioned, the design of experiments.

I think that's it, unless someone has another question. Cool. Well, thank you all for coming, thank you very much, and I hope to see you again soon. Thanks everyone, have a great day.