Burr: Okay, well, we're going to turn things over to Paul Morie. He's a key contributor to Kubernetes upstream, and he's specifically responsible for the SIG related to multi-cluster, but today he's going to talk to us about Knative. He's one of our resident experts, and I'm very honored to introduce him. Paul, over to you.

Paul: Thanks, Burr. Have you seen the short hair, long hair meme that's going around?

Burr: No.

Paul: The short hair, long hair meme?

Burr: No, I haven't seen it.

Paul: You haven't seen it? All right, so I've got a neat trick I want to share with everybody real quick. The meme is that you show a picture of yourself with short hair and a picture of yourself with long hair. The trick is that I'm going to do this within the same photo. Ready? Because it's quarantine, baby. So we've got the short hair, we've got the long hair, and since we're not going anywhere, we're growing it out on the sides. All kinds of wildness. And now we're going to put that away.

Burr: What's funny, though, is I was expecting different colors, but it looks like one color right now.

Paul: Colors are for normal times.

Burr: Fantastic, awesome stuff. I did get one haircut during the pandemic; it's time for another one.

Paul: I'll tell you what, next time you want one, we'll do it at a distance. I'll get some clippers on a broomstick and I'll cut your hair.

Burr: All right. Paul, you can screen share, and now you're off and running.

Paul: All right, folks. You can see my screen, right?

Burr: Yep, coming through.

Paul: Okay, let's get this thing into presentation mode. Burr, are we running a little bit behind? Do I need to fit this into a shorter amount of time?

Burr: No, no, you can go right up to about 11:05.

Paul: Okay. Got something in my eye here real quick. All right. So, good morning everybody, or good afternoon or good evening, as the case may be. My name is Paul. I lead our serverless engineering teams here at Red Hat, and what we're going to look at today is some of the details of how the Knative autoscaler works.

If you don't already know, Knative is a project focused on building blocks that provide some fundamental technologies for serverless. We make a distinction here: when a lot of people hear "serverless," they think of functions as a service and products like Amazon Lambda. When we talk about serverless here, we're talking about something that is less than that full experience you might associate with a function as a service, and is really about enabling some core technologies and core concepts. There are two major functional areas in Knative.
There's Serving, which is really about scaling applications on demand, and there's Eventing, which is about working with events emitted by different sources and the things you might want to do with them. Today we're going to focus on the Serving part, and specifically on some deep details of how the autoscaling technology in Knative works.

A couple of things I want to share with everybody as we get started, to set the context. The first, which I'll just mention quickly, is that there are a number of different ingresses supported in Knative. For example, we've got a few here on the slide: Kourier, which was actually created by our 3scale team here at Red Hat; Istio, which you may have heard of; Contour; and so on. There are others, too. The fundamental thing I want everybody to understand is that the ingress implementation you bring to Knative is pluggable. Keep that in mind as we talk through some of this.

In terms of API resources, there are four I want to talk about. One is Service, and you can think of Service as a high-level container that manages some of the other resources we're going to discuss; you can think of it like a resource such as a Deployment. Service brings together the spec, the intention part, of two other resources. One of those is Configuration. Configuration is another resource that is similar to a Deployment, and each time you update its spec (and we'll take a look at the spec of a Configuration and of a Service in a moment), it creates a new Revision. Each time you update this resource, the update is captured as a unique Revision. So if I update a Configuration three times, whether directly or via a Service, I get three different Revisions that are immutable snapshots of what I declared in the spec.

As we update this object, we get different Revisions that the Route resource allows us to direct traffic to. Route lets us do things like traffic splitting. For example, say that in the steady state we have a Revision handling all of our traffic and we want to roll out a new Revision. The Route allows us to send some of that traffic to the newest Revision but keep most of it on the old one, so that we can test and see how the new Revision behaves before we direct all the traffic onto it.

To pull it back up to the Service: the Service allows us to declare a spec for a Configuration and a spec for a Route, giving us one easy point from which to manage both of those things.
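(For reference, a traffic split like the one described here might look roughly like the following in a Knative Service manifest. This is a minimal sketch assuming the serving.knative.dev/v1 API; the revision names and percentages are hypothetical.)

```bash
# Sketch: pin most traffic to a known-good revision while testing a new one.
# Revision names here are made up; yours will follow your cluster's naming.
cat <<EOF | kubectl apply -f -
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
  traffic:
    - revisionName: hello-00001   # old revision keeps most traffic
      percent: 90
    - revisionName: hello-00002   # new revision gets a test slice
      percent: 10
EOF
```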
Paul: I'm just going to escape out of this real quick and we're going to see a demo. Burr, would you let me know whether my font size is viewable?

Burr: Yeah, that's good.

Paul: Okay, great. So what I'm going to do now is run the kn client, the CLI client for Knative, and show a really simple example of creating a Service with it. We're going to do "kn service create"; "hello" is the name; we're going to use an example hello-world Go image, which is one of our project samples; and we're going to give it an environment variable that lets us say what the response is going to say. Since we're at DevNation, we'll use "DevNation" there.

So what's happened is: we've just created a Service, that Service has created a Route and a Configuration, the Configuration has created a Revision that captures this version of our intent, and we've now got a URL we can visit to hit that service. Let's see if I can get that font size a little bit bigger. Hello, everybody. It read our "DevNation" out of that environment variable, so it's saying hi to you.

Let's take a look at the resources that were created. This "ksvc" is our shortcut to access Knative Services, and we're just going to see which ones we've got. All right, we're going to get the YAML for this one; everyone likes YAML, that's what I hear. So we're looking at the Knative Service we just created with this command, and there are a couple of things I want to point out. We're looking at the object's spec right now, and you can see that, if you're familiar with the Deployment API, a lot of this looks pretty similar. There's one extra field, though, containerConcurrency, which controls how many simultaneous requests each container can handle. Otherwise this looks pretty close to what we would expect if we were writing a Deployment, so the API is very close to APIs you may already be familiar with. And then we've got this traffic section, which configures our Route, and we can see that we've got 100 percent going to the latest Revision.

Burr: Could you bring your terminal window up a little bit? The video thumbnails are covering it.

Paul: Ah, I see. All right. So, really quickly, we'll take a look at the Configuration that was created. We can see up here that this is a Knative Serving Configuration, and we'll take a quick look at its spec. This is the spec that was in the Service we just looked at; we don't see anything about Routes. And we'll take a quick look at the Knative Routes. Again, this is sort of a child resource created by the top-level Service.

Now here's something interesting: in the time between when I deployed this demo and now, you'll notice, when I try to get pods in this namespace, that all the pods have spun down. Well, let's see what happens when we hit that URL again. And look at that: just like that, when we hit that URL again, what do you know, our service spun back up. That's the autoscaler working.
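(A rough reconstruction of the commands from this part of the demo, assuming the helloworld-go sample and its TARGET environment variable; the URL shown is a placeholder for whatever your cluster assigns.)

```bash
# Create the service; the sample echoes the TARGET env var in its response.
kn service create hello \
  --image gcr.io/knative-samples/helloworld-go \
  --env TARGET="DevNation"

# "ksvc" is the short name for Knative Services.
kubectl get ksvc
kubectl get ksvc hello -o yaml   # note containerConcurrency and the traffic section

# Hit the URL reported by kn/kubectl (placeholder shown here).
curl http://hello.default.example.com

# After sitting idle, the pods scale down to zero...
kubectl get pods
# ...and the next request scales the revision back up from zero.
curl http://hello.default.example.com
```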
Okay. So let's talk about exactly what happened there. I'll get back into my presentation view here. We're going to take a dive into the autoscaler, the component that did that magic for us. There are four actors in the implementation of the autoscaler that I'll quickly walk through, and then we'll see how they work together.

The first one is the autoscaler itself. The autoscaler's job is to collect and receive metrics from the relevant components that assist it and provide functionality for different parts of the autoscaling tech. It's collecting those, and in some cases, which we'll talk about, receiving them via WebSocket. Its job, in addition to collecting and receiving these metrics, is to make scaling decisions. Finally, it carries out those decisions by programming the Kubernetes API server to change replica counts.

The next one is called SKS for short, for serverless services. This is an abstraction on top of Kubernetes Services that allows us to control the flow of data into the Revisions we have deployed, or to the activator, which we're going to learn about in a second, via its serve and proxy modes.

Then, penultimately as the case may be, we've got a component called the activator. This is a data-path component involved in scaling to and from zero. It also performs some capacity-aware load balancing to handle bursts, and it's involved in the handoff when we scale back down to zero as well as when we scale up.

And finally we've got the queue-proxy. This component is a sidecar to all of the user pods that run the code for our service. Its job is to collect metrics, which then get scraped by the autoscaler, that first component we talked about, and it also queues requests if too many reach a pod at once.

A quick note here: you may have noticed that I haven't mentioned the HPA, the horizontal pod autoscaling functionality in Kubernetes. We don't currently use it. The primary reason is that the HPA doesn't currently support scaling to and from zero at a GA level. There is some initial treatment of scaling to and from zero with custom metrics, but it's really only one piece of the puzzle, and we'll learn a little more about that. Just some quick history: the HPA is designed to scale based on CPU and memory metrics, and it requires a custom metrics server to scale based on requests. The Knative community felt that the KPA, which is what we call the autoscaling pieces in Knative, was easier to follow and maintain than a flow using the HPA in the steady state. Also, there's a critical use case, which we're going to see, where being able to poke the autoscaler is important, and the HPA isn't set up to do that currently.
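(As an aside, which autoscaler handles a revision, and its targets, can be set per revision through annotations. This is a sketch assuming the standard autoscaling.knative.dev annotations; the service name and values are made up for illustration.)

```bash
# Sketch: select the KPA explicitly and tune its targets. Swapping the class
# annotation to hpa.autoscaling.knative.dev opts this revision into the HPA.
cat <<EOF | kubectl apply -f -
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-tuned
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/target: "10"   # aim for 10 in-flight requests per pod
        autoscaling.knative.dev/minScale: "0"  # allow scale to zero
        autoscaling.knative.dev/maxScale: "20" # cap the scale-up
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
EOF
```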
So, with that footnote in mind, let's handle some of these scenarios. We already saw scaling up from zero; let's talk about what happened. Just to orient us, I'm not sure if you can all see my cursor, but I'll do my best to indicate where I am on this diagram.

The first thing we're going to see is the ingress. That's how our request, when I made it from my web browser, got to us, and it went into the serverless service. When that request was made, the serverless service was in proxy mode, which means that when the request came in from the ingress, it went to the activator component. As we said a moment ago, the activator plays a very important role in scaling up from zero. In this flow, when that request hits the activator, the activator tells the autoscaler, it pushes a message to the autoscaler, that says "scale up from zero, scale to one." It's connected to the autoscaler via a WebSocket, so it's basically handing the autoscaler an interrupt that it has to handle. The autoscaler then programs the Kubernetes API server to scale the Revision this request is supposed to go to up to one. The activator buffers the request while this happens, and then, once we have a deployment that can serve our request, it sends the request on to the user's pod; it goes through the queue-proxy and hits our application.

Burr, since it's kind of hard to assess whether the audience is following this, can I ask: are you following so far?

Burr: Well, I'm following it, and I think I'm actually echoing through your laptop. But yeah, I know this stuff already, so it's hard to know how in-depth people are with Knative. We just don't know. There have actually been a ton of master courses that we've provided, and we've done a lot of free ebooks, so hopefully most of the DevNation crowd is well versed in Knative at this point, because there have been probably a dozen presentations on it. But I do appreciate the extra detail you're providing here.

Paul: Okay, great. So that's scaling up from zero, and we'll go through it again, really concisely, so that everybody can follow it. Before we start going over this picture, imagine we have zero pods running, just like we saw a moment ago when I came back from describing the API, got the pods, and they were all scaled down. When that first request comes in, we go through the ingress to the serverless service, and the serverless service is in a mode that sends the request to the activator. The activator sends a message over to the autoscaler, handled as an interrupt, that says "scale up," and meanwhile the activator keeps the request buffered. When the activator sees that a deployment is available that can service the request, it sends the request it has been buffering on to the application. That's what happened when we scaled up.
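(You can watch this handoff yourself on any cluster with Knative Serving installed; the URL below is a placeholder.)

```bash
# Terminal 1: watch pods appear as the autoscaler scales the revision up.
kubectl get pods --watch

# Terminal 2: with zero pods running, send a request. The SKS is in proxy
# mode, so the request lands on the activator, which buffers it, pokes the
# autoscaler over its WebSocket connection, and forwards the request once
# a pod is ready to serve it.
curl http://hello.default.example.com
```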
So let's talk about how these different actors in the system work in the steady state. We talked about the serve and proxy modes of the SKS, and one thing I should have told you a moment ago, and didn't really complete, good thing I've got time now, is that final step after we scale up from zero. This is a little hand-wavy; I had hoped to have more time to talk about bursting, but reading the clock, I'm not going to get to it. So let's just say that after we scale up from zero, it's probably good enough for now to think of that last step as the SKS getting put into serve mode. If you remember, in proxy mode the SKS sent our request to the activator. In serve mode, which is what we'd expect if we're getting enough requests to remain scaled up from zero, and maybe we're getting the new request load we can imagine in this picture, the SKS takes a request that comes through the ingress, up here in the top right-hand corner, and sends it directly to a user's pod.

If we start getting too many of these requests, the queue-proxy metrics are going to reflect that. Remember, the queue-proxy is a sidecar to all of our application code, and it's collecting metrics about incoming requests: how long they take, how many requests are queued. The autoscaler is scraping and aggregating metrics from all of these queue-proxies, and if they start having to queue and buffer requests, the decider, which is an inner piece of the autoscaler, will recommend scaling up and creating more replicas of the necessary Revision.

So we'll slide back over to the terminal here and take a look at another Knative service I've got, the autoscale-go service. This uses another of our examples, and the reason we're using examples today is that I wanted to show something accessible that you can pretty much follow at home. You can do this very same demo if you've got an OpenShift cluster, a minikube cluster, a kind cluster; it should work on just about anything. This example service allows us to pause as part of the request, and what we're going to do is run the vegeta tool and see the autoscaler in action scaling up.

So, like a magician, nothing up my sleeve: we've got no pods running now. Let's see what happens when we start running vegeta to give us some request load. And that's a lot of beeps. We're going to stop that, but we can see that we scaled up pretty fast once we started slamming requests into our service; that gave us 10 pods. I cut it off, and let's see how many are still running. Okay, we've still got 10; we'll give it a few seconds. What happened in the moment I turned on vegeta, which is a tool we use to drive load tests in Knative, is that the queue-proxies for the Revision that was active started queuing requests, the autoscaler was scraping them and said, "hey, it looks like we need to scale this application up," and gave us some additional replicas.

I'm going to try to make it through the rest of this really quickly. That was steady-state autoscaling.
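(A sketch of the load-generation step, assuming the autoscale-go sample from the Knative docs and its sleep query parameter; the image tag, URL, rate, and duration are all illustrative.)

```bash
# Deploy the autoscale-go sample, which can pause inside each request.
kn service create autoscale-go \
  --image gcr.io/knative-samples/autoscale-go:0.1

# Drive load with vegeta: 50 requests/second for 30 seconds, each request
# sleeping 100 ms server-side so in-flight requests pile up.
echo "GET http://autoscale-go.default.example.com?sleep=100" \
  | vegeta attack -rate=50 -duration=30s \
  | vegeta report

# Watch the autoscaler add replicas in response.
kubectl get pods
```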
Let's talk about scaling back down. In the flow where we're scaling back down to zero, the autoscaler is scraping those queue-proxy metrics and saying, "hey, we don't have any load coming into this application; it looks like we can scale down." And it's going to scale all the way down: it programs the Kubernetes API server to set replicas equal to zero. Once the Revision has scaled all the way down, it updates our SKS, our serverless service abstraction, to put it back into proxy mode, which puts the activator back onto the data path, so that when the next request comes in we're basically back at the first flow. We won't drop any requests, because we scale all the way down before we put the activator back onto the data path. So we make sure there are no dropped requests, and when that next request comes in, we do the scale-from-zero flow again: a request comes in from the ingress, it goes to the activator, the activator pokes the autoscaler and says, "hey, it's time to scale up," and once that scale-up is complete, the activator takes the requests it has buffered and sends them over to the user pod.
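(The timing of that scale-down is tunable cluster-wide. This is a sketch assuming the standard config-autoscaler ConfigMap that Knative Serving installs in the knative-serving namespace; the keys and defaults shown are from the Knative documentation and may differ by version.)

```bash
# Inspect the cluster-wide autoscaler settings.
kubectl -n knative-serving get configmap config-autoscaler -o yaml

# Keys of interest (illustrative defaults):
#   enable-scale-to-zero: "true"       # allow scaling all the way to zero
#   stable-window: "60s"               # how long traffic must be absent first
#   scale-to-zero-grace-period: "30s"  # time allowed for the activator handoff
```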
So, Burr, I hope I hit the timing mark. I think we've got a couple of minutes for questions.

Burr: We do, but of course we've severed the chat, so hold on while I turn it back on. Because we had to turn the chat off and move people around, we sent people over to Slack. I'll add the link back to the Slack chat, and I'll check it. Slack chat, good God, there are too many things. "Is Knative independent of Istio?" is one question. Paul, how would you answer that one?

Paul: That's a very good question. Istio was the initial ingress implementation, added when the project was first developed, but it is not required. As I said, a number of different ingresses are supported. For example, when you use Knative on OpenShift, which we ship in something called OpenShift Serverless, you get the Kourier ingress and not Istio. You do not need Istio to use Knative.

Burr: Exactly, that's a good point, and I'll actually paste a link.

Paul: Burr, I've realized something and I need to go back to my presentation, because I forgot the most important part, which is the thank-yous. Just allow me a moment to do them. I want to say thank you for watching this talk; I hope it was informative and entertaining. And I want to thank my teammates and all my open source colleagues for developing such awesome technology.

Burr: Are you going to screen share while you do that? We're missing something really amazing here.

Paul: Oh, oops. Yeah. Well, just some amazing thank-yous to amazing people. Let's see if I can pull that back up. Okay, can you see it? There we go. So, thank you for watching this talk. Thanks a lot to my teammates and my open source colleagues who developed this stuff. I want to say a special thank you to Markus, who wrote the doc I learned a lot of this topic from; the link is right there, and I'll share the URL to these slides. If you found this content interesting, I'd advise you to take a look at that linked document; there's a lot more detail in it. And finally, thanks to Markus and Evan Anderson, who created the diagrams I used in this presentation. Thanks a lot, everybody, and thanks for having me, Burr.