So, getting started: I'm Carlos Santana, and you can follow me on Twitter. This is Paul, also from IBM; you can't follow him on Twitter because he's not on it. Let's get started by talking about cold starts. First, a quick introduction, since we saw that some people are new to Knative. What is Knative Serving? It simplifies deployment on top of the Kubernetes abstractions, as mentioned before. It has traffic splitting — we saw a great demo earlier of how traffic splitting and rollouts are used — and one of my favorite features is auto-TLS for HTTPS. You define your Service, and Knative Serving takes care of defining the Route, the traffic aspects, and the Configuration. As always in Kubernetes, these are CRDs and you can compose them, keeping Route and Configuration separate, but we suggest using the Service. We have seen many talks about scaling to zero, but I think the harder problem is not scaling to zero, it's scaling back to one, as I always like to joke. Next slide. Just as a reminder, I get this question a lot: people ask how Knative scales to zero.
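To make the Service/Route/Configuration relationship concrete, here is a minimal Knative Service manifest — the name and image are made up for illustration. Applying just this one resource causes Knative to create the Route and Configuration (and a Revision) for you:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello            # Knative derives the Route and Configuration names from this
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/example/hello:v1   # hypothetical image
          ports:
            - containerPort: 8080
```

You could create the Route and Configuration objects yourself, but the Service keeps the two in sync, which is why it's the suggested entry point.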
And I usually say, well, the interesting part is how it scales back to one, not how it scales to zero. Some people think — we joke about this — that there's some magic sprinkled into the cluster that makes this happen, but actually there's engineering involved. If people want to contribute and learn how it works, this is a good opportunity to get involved and help with coding, testing, and evolving Serving; the working group is always looking for contributions. One simple example is the life of a request — we want to make this a little simpler for people to understand. In a Kubernetes cluster you usually have a request coming in through an ingress; in our case that will be something like Contour, Kourier, or Istio — you don't have to use Istio. We have a Knative Service; it's a different type of Service from a Kubernetes one, but it represents the same abstraction. The key aspect is that there are no pods: there is a Deployment out there for every Revision, but it has zero pods. So how do we get that request to a pod that doesn't exist? That's the key question. The answer is the activator, and I love the names we give these two components — the activator and the autoscaler sound like Transformers, and I'm a fan of Transformers. The activator is in charge here: if there's no pod to receive the request, it puts itself in the middle. The endpoints get configured so the activator captures the request — but somebody needs to get a pod up. This is Kubernetes, we're managing Kubernetes; that's the "Kubernetes-native" aspect of Knative. So the activator needs to say, "I need the first pod." How does that happen?
That happens through the activator communicating with the autoscaler, and the autoscaler, like in normal Kubernetes, sets the replica count to one. This is where things start slowing down: telling Kubernetes "give me one pod," and eventually a pod exists. When the pod comes up, we have to bring up a minimum of two containers — three if you're using a mesh — but in our simple example it's the queue-proxy, which is another Knative component, plus our application. The application is the one that has to boot up; the queue-proxy usually spins up very fast, it's a very simple, small container, but the application determines how long your pod takes to come up and become ready. The queue-proxy starts pinging the app — "are you ready?" — to see if it can receive traffic. Once it is, the activator starts sending requests to the queue-proxy, and the queue-proxy is there to enforce concurrency and expose metrics for the autoscaler to scrape. The autoscaler now gets metrics from that queue-proxy and decides when enough traffic is coming in that you need two replicas; then the next pod comes up, and you hit your cold start again, depending on how warm the node is — the "levels of heat" we'll get to. Eventually the system reaches a stable state where the activator is no longer in the critical path. This is one of our latest diagrams, and somebody was saying we need more diagrams like this to explain to newcomers how things work inside, both for using it and for operating it. So that's the Service once it reaches a stable state, when there's enough
capacity to sustain the load; then the activator doesn't have to be in the middle. But the activator is the primary actor when we need that first pod. So, talking in terms of timing: what has to happen between when the request is received and when the request is processed? Let me recap — and let me speak into the microphone. The request comes in; it gets routed to the activator; the activator triggers the autoscaler; the autoscaler updates the replica count; the pod is created and becomes ready; the activator learns the pod is ready; the activator forwards the request to the queue-proxy; the queue-proxy forwards it to the app; the app responds; and the request is processed. If you charted how long each of these steps takes, you would see that the pod being created and becoming ready takes by far the longest. Why? Because a whole lot happens there: we have to create the container, perhaps pull the image if it's not already cached on the node, wait for the containers to start, wait for all the probes to run. And there are different scenarios. Carlos mentioned the "levels of heat," which we're borrowing from a presentation Matt made previously; we'll call them different latency types. They refer to the different situations on a particular node, with two main variables: is the container running on that node, yes or no, and has the image been cached on that node already, yes or no. We call it a cold start when the node does not have the image cached and the container is not running.
That's the cold start, and it takes the longest. Next is what we call warm disk: the image has already been cached on the node, but the container is not actively running. A warm memory scenario is when the container is up but has been paused by the container runtime, so it's not actively serving requests. And finally the warm CPU scenario, where the container is actively running and can serve requests immediately. Thanks to Matt for those categories. That's the problem space we find ourselves in; now the question is, how can we make things faster? Back to Carlos. Yep, thank you. So, what can you do in user land? We get a lot of questions in Slack about cold starts, and one of them is about image pulls. One theme of this talk is how to speed up cold starts, but also how to avoid them: if they happen infrequently, you're in better shape. On image pulls, there's a lot of innovation in the OCI and cloud-native space around pulling only the bits you need, like eStargz. On "always pull": some users cannot avoid it, because in a multi-tenant environment you don't want another pod from another user — or a bad actor — to reuse the image you cached. But if you can avoid it, don't use the latest tag, and cache the image on the nodes; that's the best option. Another one is getting the image closer, and I'll talk about a few of those. The next is init optimization: what can you do in your
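Assuming you are able to relax the always-pull requirement, the advice above boils down to pinning a specific tag — or better, a digest — and letting the kubelet reuse the node's cache. A sketch of the relevant container fields (registry, image name, and digest are all hypothetical):

```yaml
containers:
  - name: user-container
    # Pin a digest instead of :latest so the cached layers stay valid and verifiable
    image: registry.example.com/app@sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
    # IfNotPresent reuses the image already cached on the node;
    # Always forces a registry round-trip on every pod start
    imagePullPolicy: IfNotPresent
```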
code, your framework, or the type of initialization you're doing? What are you loading at boot time that you could load later — or maybe never, because for some types of requests that code path is never hit? The last one is hidden costs: some people use components, frameworks, or cloud-provider features without knowing they affect their cold start. I'll give two examples of that, around Istio and CNI. And like I said, the other strategy is to avoid the cold start entirely: cold starts are costly, so see if you can tune concurrency and related settings to just avoid them. A lot of users need guidance on what's available. One option is forcing the image pull at deployment time. This is something you can do on your own: for example, you deploy the Knative Service with the CLI and then, behind the scenes, deploy something that targets the nodes your services run on — a DaemonSet that just pulls the image but doesn't do anything else. You can also deploy the Service but never run the first pod; there's a flag and an annotation for that. And in Knative, a CR gets created per Revision that has all the information from its parent — the owner that created that Revision — including the image reference after the controller does tag resolution: we always try to resolve the tag to the digest.
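The deployment-time pre-pull trick Carlos describes is commonly implemented as a DaemonSet whose only job is to pull the service image onto every matching node. A minimal sketch, with made-up names — the init container pulls the image and exits, then a tiny pause container idles so the DaemonSet stays running:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepuller
spec:
  selector:
    matchLabels: {app: image-prepuller}
  template:
    metadata:
      labels: {app: image-prepuller}
    spec:
      initContainers:
        - name: prepull
          image: registry.example.com/app:v1   # the image you want cached on each node
          command: ["sh", "-c", "true"]        # pull, do nothing, exit
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9     # near-zero footprint placeholder
```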
So you have the full image digest, and you could write your own controller that knows which images need to be downloaded. There are community repos with an implementation of this, but it could be as simple as watching the resource that Knative creates — it's there for someone to build against. That's one approach. And again, even if your image registry is in the same availability zone or VPC, or from the same provider, faster downloads won't solve all your issues: sometimes the image is very small and that's not where most of the time is spent. Paul is going to talk about where that time goes. On languages: somebody asked earlier about choosing a language. I understand some companies or projects don't get to choose, but certain functions — as Marisa was explaining — are maybe invoked asynchronously, where you can choose the language, and remember that different functions can be written in different languages; you can have a polyglot of functions. Be careful, though, because some languages, like plain Java, take a long time to boot; if you use something like GraalVM — compiling with Quarkus or plain GraalVM native images — you get a much faster startup. This slide compares boot times across languages; that's part of the init optimization. Another one — and some of us have been involved in trying to solve this for everyone in the best way possible — is that some users are fine setting the knobs we already have, but they're not aware the knobs exist. For example, the last talk mentioned setting min-scale, a minimum number of replicas. Some users are okay paying the penalty of having a pod running 24/7 — but it will be running 24/7.
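The min-scale knob is set as an annotation on the Revision template; a sketch, with an illustrative service name:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: always-warm
spec:
  template:
    metadata:
      annotations:
        # Keep at least one replica around 24/7 -- no scale-to-zero, so no cold start,
        # at the cost of a permanently running pod
        autoscaling.knative.dev/min-scale: "1"
    spec:
      containers:
        - image: registry.example.com/app:v1   # hypothetical image
```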
You can't even shut it down. Another knob is scale-down-delay: once the queue-proxy reports there are no more requests coming in and it's okay to scale to zero, this tells the system "wait this amount of time" — maybe ten minutes, instead of the very aggressive default — before terminating the pod. That's a setting that can reduce the frequency of cold starts. And the last one: maybe you don't want every pod kept around longer, but you leave the last pod of the Revision alive for a longer time, because another request is likely to come in, and that avoids a cold start. It's about measuring these knobs: always run a load test that looks like your real load — especially if it's bursty — and measure what you change. These knobs are available, and sometimes we don't mention them because we want to reach that nirvana where cold starts are, you know, physically zero — and that's not possible. The next one is for people using Istio; not everyone does, but be careful when you add it. When you use Istio as a mesh on top, you're adding another container that has to boot up and rewrite your iptables: that's the init container.
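The scale-down-delay knob can likewise be set per Revision (or cluster-wide in the autoscaler ConfigMap); a sketch of the template annotation with the ten-minute delay mentioned above:

```yaml
spec:
  template:
    metadata:
      annotations:
        # Keep pods alive 10 minutes after traffic stops before scaling down,
        # so a follow-up burst doesn't pay a cold start
        autoscaling.knative.dev/scale-down-delay: "10m"
```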
That's the unique container Plus you have your east to side card and plus the two Containers so everything needs to come up and be ready for that request to go and get processed So that will hit your costar and you just have to measure how long That will be for you and actually we have a A tool called care Cape Earth that can help you nurture that the other one that I Recently found out about doing the end user interviews, which by the way, I'm still looking for end users grab me to schedule an interview is type of CNI's cloud providers or some cloud providers would assign you like a Elastic IP or real IP And that would take more time to talk to the cloud provider to get to the pool to get the an IP assigned to a Pod and this is some sometimes a you this came from a user that was moving from burn metal on premises to on cloud And they haven't modernized all aspects of their security model So they need that pot to have a certain IP address So that's disgusting them the costar actually it's a it's an end user that is in production Which I was super happy to hear about that But it's something that it came up in the conversation that is it's affecting them in costar So watch out for those type of things of what are the things that container D or Cubelet are doing with a cloud provider that might hit your costar So sometimes it's in your hands and sometimes it's maybe modernizing the way you deploy these worker nodes And then I'll let it to Paul to see what we're doing with upstream So Carlos went over a number of ways that you know you can do things right now to kind of improve your container start time I want to talk about some things that you will be able to do very soon That we're working on both upstream and in Knative to make things faster So an upstream we've been starting to work with the Kubernetes folks to see you know What are ways that we can get containers to run faster? 
Marcus went to SIG Node and gave a presentation on what the Knative use cases are, and we've been working with them to try to get some of these things upstreamed. We work with Mike Brown, a containerd maintainer, who has a number of great ideas we're making use of. I'll mention a couple of them just so you're aware these things are coming. We talked about the issue of always pulling your images: you never want to use the latest tag, because it takes time to pull an image, but always pulling might be a security requirement if you don't want tenants reusing a cached image on your node. There's a KEP out there, "Ensure Secret Pulled Images," that will basically let you keep using your cached image while still doing the authentication check against the registry, so you don't have to force a full pull. That's going to help speed up image pulls. There are also performance improvements happening in the CNI layer: PRs that load CNI plugins in parallel instead of in sequence, which speeds up startup, and disabling duplicate address detection (DAD), which has dropped about a second off the startup time — very good. And then kubelet performance: one of the things we do in Knative is run probes to make sure things are up, and probes run at a one-second interval. So if the container isn't ready when the probe first fires, you wait another second for it to fire again. There's a KEP out there now to run probes at a sub-second interval, so we can respond faster. And when we presented at SIG Node, I think two weeks ago, some of the Red Hat folks — Derek and others — mentioned they were working on a similar KEP for the evented PLEG, which has the kubelet react to container lifecycle events
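For context on the one-second floor Paul mentions: today a Kubernetes probe's `periodSeconds` is an integer with a minimum of 1, so the fastest polling you can configure looks like this (the endpoint path is hypothetical); sub-second probing is what the KEP would unlock:

```yaml
readinessProbe:
  httpGet:
    path: /healthz    # hypothetical health endpoint
    port: 8080
  periodSeconds: 1    # current minimum -- an integer field, so no sub-second polling
  failureThreshold: 3
```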
as opposed to the polling model, which should speed things up as well. So those are things upstream. In Knative itself we're working on performance enhancements too: more probe support, adding startup probes, which should make things a little faster. And then there are two things I want to demo for you now: the container freezer, which Julz had POCed a while back and we now have as an alpha in the sandbox, and kperf, a performance benchmarking tool. The container freezer: this is a similar picture to the one we saw before — this is how Knative works, with the activator, the autoscaler, and a running pod. What the container freezer adds is a separate freeze daemon, and that daemon freezes the user pod. When a request comes in, it still follows the same process: it reaches the queue-proxy, the queue-proxy calls out to the daemon with an unfreeze request, and in the container runtime we send a resume request — in containerd it's "resume"; I think there's a PR out there for CRI-O as well — that unfreezes the container and lets it run. Then the request goes to the application as normal. The nice thing is you can leave your pod running without it actively running: you pause it between requests. When the request is finished, the queue-proxy sends a request back out to freeze the pod, and the pod gets frozen. That's out there in alpha right now, and people can use it. One of the nice things is the difference between what we called the warm-memory scenario — responding from a paused pod — versus a full cold start: I can't add in my head, but it's about two seconds shaved off the time, so it definitely speeds up the response to a request. And then, finally, kperf.
It's our performance benchmarking — oh, wait, I did want to quickly demo what the container freezer looks like; bear with me one second while I set this up. Basically this is a simple service called the sleep-talker; it just ticks along in its logs. I'll show how it works really quickly: this is the log from that pod, and nothing is happening right now. When I call the service you'll see it start ticking — it's tiny, so you can't really read it — but basically, when the container gets a request, the pod runs and you see the ticks in the logs. We'll show that one more time: tick, tick, tick, and then the ticking stops when the container is frozen again. So that's a live demo of it actually pausing a container as it runs. Now kperf, our performance benchmarking tool. I'm not going to run it now, but basically you can create a number of pods, gather benchmarks on them, and compare them. Here we're showing pod creation times: you can see the yellow line at the bottom.
That's when the pod is scheduled; the orange line is when the containers are ready, so you can see how long it takes, and the different times for the queue-proxy and the user container to start. And we can run that in CI — one of our contributors did some good work adding it to the CI — so we can test whether each PR we add improves performance, or doesn't, as the case may be. I think that covers it.

[Carlos] kperf generates that HTML graph itself — it's not something you have to do anything about — and it also does the load testing. It's a repo that's looking for contributors, so PRs are welcome. That's one of the repos where, if you're new to the community and want to get started, it's a good one: small in scope, so you can help and get going. With that, I'm glad we cut the slides in half; I thought I was going to go over. Questions? I think that's it.

[Audience] Hello. I just wanted to understand exactly what happens behind the scenes when you say the container — the pod — is paused. When it autoscales down to zero, we're saving on compute resources, right? Is that something similar we can achieve here? The cold start definitely gets better with the pausing and unpausing, but what exactly happens with respect to compute resource savings? How does that work? I was looking into the repository and couldn't find much documentation on it.

[Paul] Yep, so the question is: when you freeze a container, what happens to resource usage? Right now, what it does is pause CPU usage.
The memory is still allocated to the pod, so you're not saving on memory, but you are saving on CPU. This prevents things like Bitcoin miners: if you've got an idling pod, there are no CPU cycles spinning. So at the moment it's really just saving CPU as opposed to memory. Down the line, with something like CRIU, we might be able to save on memory as well, but for now it's CPU cycles.

[Carlos] It's similar to how a container has different states: a container can be stopped, can be running, but there's also another state — paused. If you're familiar with Docker, it's always been there: you can do docker pause, and that's essentially what this is doing, but through containerd. And there's a PR — is the PR open? Yeah, somebody from the community did a PR to add that to CRI-O, so there's an API the daemon calls saying "please pause this container in this pod," and then pause or unpause.

[Paul] Yeah, and that was just added to CRI-O in their 1.24 release. It was recently added — I had checked whether CRI-O supported it and it had just landed, and the contributor then jumped over and did the PR. Just to mention, since we're talking about contributors: this was a contributor who wanted this function for CRI-O.
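The Docker-level behavior Carlos refers to is easy to try locally; these are standard Docker CLI commands (the container name is arbitrary), and pausing uses the runtime's freezer under the hood, so CPU stops while memory stays allocated:

```shell
# Start a container that logs a timestamp every second
docker run -d --name ticker alpine sh -c 'while true; do date; sleep 1; done'

# Freeze it: log output stops, memory stays allocated, no CPU cycles burned
docker pause ticker

# The STATUS column now shows "(Paused)"
docker ps --filter name=ticker

# Thaw it: logging resumes where it left off
docker unpause ticker
```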
It wasn't implemented in CRI-O, so that contributor did the PR to add the implementation in CRI-O and then wrote the PR for the container freezer — a great example of community involvement. Anyone else? Okay.

[Audience] What barriers do you see to turning this on by default in Knative?

[Paul] I think one of the big barriers right now is the probing interface. We rely on a readiness probe that checks that the container is ready, and if your container is paused, the readiness probe doesn't like that — which is part of the reason we want to add startup probes, so that perhaps the readiness probe isn't always there by default. I think that's the biggest thing, and then, honestly, we need to test it a lot more. It's alpha for a reason: the ink is still wet on that one, as the case may be.

[Carlos] Another one would be taking advantage of it through the kubelet: we set pods to have certain CPU limits, and if you have a node with four pods that are paused, you could fit other pods in there. So how do we teach Kubernetes to be aware that there are paused pods — not consuming CPU but still consuming memory — so it can squeeze something else in? And then maybe looking into moving a frozen container, or pod, from one node to another; I call it the "vMotion pod," and vMotion is a VMware term, so I'm dating myself. But it's the management piece and the probes — and the first thing to fix is that the queue-proxy isn't even aware that the container is frozen. I don't know if Matt has more to add.
[Matt] I thought the slide said that the queue-proxy requested that it be frozen — doesn't the queue-proxy reach out to the freeze daemon to tell it to freeze?

[Paul] It sends it, like an event, off the concurrency; it's part of the handler chain, so we add that in there. The call goes out and we freeze, and when a request comes in it triggers again to unfreeze — so it should know when it's frozen. I don't think there's anything explicitly in there at the moment, but yes, it could know.

[Matt] It also proxies all of the probes, so it could just stop probing.

[Paul] Yeah, we talked about that; it's under discussion, but technically it could, yes.

[Audience] So I'm curious how the CRI interface for freezing is called at the node level, and how that's expressed in terms of the privilege the freeze daemon needs. I assume it's mounting something like a Unix socket to talk to the CRI?

[Paul] Yeah, it mounts a Unix socket, and it runs per node — it's a DaemonSet, so it runs on every node — and I think it makes the call over the socket; we'd have to go back and look, it's been a little while. We mount the socket in a volume that the service account has access to.

[Audience] That may be another thing to consider in terms of getting it on by default: some folks may not want a new DaemonSet with a significant level of capability running on every node. But I think it sounds super cool, and I'm glad that landed in CRI. Thanks.

[Host] Anyone else? One, two, three — okay. Thank you, Paul and Carlos, for your talk. Thanks, everyone.