Hello everybody, thanks so much for joining this session on Nodeless Knative: just-in-time all the way. We're all here for KnativeCon, so we don't need to be sold on moving away from always-on applications. It's about time to move from always-on apps to just-in-time apps, and to treat apps not as pets but as cattle. Looking at the infra backing serverless applications, Kubernetes gives you a lot of operational simplicity, so it's awesome to run your serverless applications on Kubernetes. Again, we're at KnativeCon, so I don't have to convince you of that. But why stop at Kubernetes? Why not take the concept of just-in-time all the way down to the Kubernetes infra itself? That is where Nodeless comes in. Nodeless treats compute as a commodity, not as pets. The idea is to take the compute backing a Kubernetes cluster, or a fleet of Kubernetes clusters, and convert it from hand-managed pets into a commodity that comes up just in time and goes away when it is no longer needed. Nodeless works in two phases. In the first phase, if you think about a single Kubernetes cluster backing your Knative functions, you don't need to have any compute nodes attached to your control plane at all. The moment a function starts up and its corresponding pod is created, compute that is bespoke and cost-efficient for that pod is provisioned just in time. It's a regular kubelet worker node, and it joins the control plane. Your pod is shipped to the just-in-time provisioned compute node, and once the pod is terminated, the underlying compute automatically goes away. The nice thing about Nodeless for a single control plane is that it will pull from the latest and greatest fleet of compute available in your cloud environment.
Whether it's Arm instance shapes, on-demand, spot, Fargate, or any other kind of compute shape available to your Kubernetes cluster's account, that will be used to source your just-in-time compute. So let's look at the options for just-in-time compute for a single Kubernetes cluster. There is the default cluster autoscaler, which does bin packing: packing your pods into larger compute nodes. But there are also options that do bin selection, which is provisioning a single compute node per pod. That suits Knative functions really well, because the form factor of these pods is pretty small, and there are economical compute shapes, like Arm shapes and smaller on-demand and spot instances, that are a really good fit from a resource-footprint point of view for your function pods. There is the Karpenter project from AWS, there's Autopilot from GCP, and there's also Luna, which is a cloud-vendor-agnostic and control-plane-agnostic project. So let's look at how just-in-time compute would look for a single Kubernetes cluster. I hope the font size is large enough; if it is too small in the back rows, can you please let me know? It's okay. Cool. Out here I have a single Kubernetes cluster running on AWS, on EKS, and it has two Knative services running, hello and hello-knativecon, and I'm looking at the pods in the environment and also the nodes. This Kubernetes cluster is running Luna, which is one of the nodeless options for any public cloud control plane. There are two compute nodes currently running, and these are running system resources, system pods, et cetera. Now if I go ahead and curl the knativecon endpoint, what we want to see is a just-in-time pod come up, and it is in Pending state.
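To make the bin-selection idea concrete, here is a minimal sketch in Python. The instance catalog, shape names, and prices are made up for illustration; this is not the actual Luna or Karpenter logic, just the core selection rule: pick the cheapest shape that satisfies the pod's resource requests.

```python
# Illustrative bin selection: one right-sized node per pod.
from typing import NamedTuple, Optional

class Shape(NamedTuple):
    name: str
    vcpu: float
    ram_gb: float
    hourly_usd: float  # hypothetical prices, for illustration only

CATALOG = [
    Shape("t4g.micro",  2, 1,  0.0084),   # small Arm shape
    Shape("t4g.small",  2, 2,  0.0168),
    Shape("m6g.medium", 1, 4,  0.0385),
    Shape("m5.large",   2, 8,  0.0960),
]

def select_shape(req_vcpu: float, req_ram_gb: float) -> Optional[Shape]:
    """Return the cheapest shape that fits the pod's requests, or None."""
    fitting = [s for s in CATALOG if s.vcpu >= req_vcpu and s.ram_gb >= req_ram_gb]
    return min(fitting, key=lambda s: s.hourly_usd, default=None)
```

With a tiny function pod, this picks the small Arm shape; a memory-heavy pod lands on a larger shape, and a request no shape can satisfy returns nothing.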
That's because the nodeless component looks at the pod's resource footprint and provisions just-in-time compute that is right-sized and also the most cost-effective for the pod. For the first connection this takes a while, because in my current configuration I don't have pre-warmed nodes, but there is a configuration knob you can turn to keep pre-warmed nodes around so that you don't pay this cold-start overhead. Once the compute node is provisioned, it should show up as Ready in the bottom window, and once the node is available, the pod is scheduled to the just-in-time provisioned node and your app should get a response. So for the first call we might see a timeout, but from the second call onwards, and for autoscaling, it should respond as soon as possible. The kind of worker node that is provisioned just in time is a regular kubelet worker. It is nothing special: it has the regular kubelet worker stack running on it, and it pops up as the most cost-effective compute node for your application's resource footprint. For example, if your application pod needs one vCPU and one gig of RAM, the compute node that's provisioned will have one vCPU and one gig of RAM. It can be an on-demand instance, it could be a spot instance, it could be a Fargate launch type; whatever launch types are available to your cloud account for that cluster will be used to source the right-sized compute. And once the compute is available and ready, the pod will transition into ContainerCreating state. There you go. The pod has transitioned into ContainerCreating state, and once the container is up and running, we should see the response. So let me curl my second endpoint, hello, and we should see the same workflow repeated. The second pod is in Pending state, the first one is in Running state, and we should see a fourth compute node pop up just in time.
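As a concrete sketch, a Knative Service whose pod requests one vCPU and one GiB of RAM would look roughly like this (the service name and image are placeholders, not from the demo); the `requests` block is what the nodeless provisioner sizes the just-in-time node against:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-knativecon          # hypothetical name
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go  # sample image
          resources:
            requests:
              cpu: "1"            # drives the provisioned node's vCPU count
              memory: 1Gi         # drives the provisioned node's RAM size
```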
There are also knobs you can configure so that after a pod terminates, the compute node stays alive for a little while, giving you a pre-warmed node if traffic is spiking. So there's a lot of tuning that can be done on the compute-node side so that you're not paying cold-start times. I'm going to pause here to make sure Nodeless makes sense for a single cluster before moving to the multi-cluster scenario. Are there any questions? Nope. So with this, I guess with your deployment, is this assuming that you always have requests and limits defined on each deployment? That's a really good question. By default it works off of the requests and limits, but if requests and limits are not set, a vertical pod autoscaler will right-size them. So if you did not specify pod requests and limits, Nodeless will by default pick the smallest instance type, and the vertical pod autoscaler will notice that the pod is using more CPU or more RAM than what is available, and it will adjust the requests and limits for the pod accordingly, so that from the next iteration onwards it gets larger compute shapes and not the default super-small compute shapes. Does that make sense? Yep, and where does it store that, like inventory? When it takes that snapshot and then recalibrates it, essentially, where does it put that? How does it know, for the next instantiation, that it must have more compute? Yeah. So that is the vertical pod autoscaler, which is typically used in conjunction with Nodeless. Vertical pod autoscaler is a Kubernetes project, and it comes in three modes. The first mode is: do not change the resource spec for the pod. The second is: forcibly terminate the pod and restart it with the new updated resource recommendations. And the third is: only apply the updated resource recommendations when the operator restarts the pod.
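The three modes described here correspond to the VerticalPodAutoscaler's `updateMode` field. A minimal sketch of a VPA object (the VPA name and target Deployment name are hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hello-vpa           # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello             # hypothetical target workload
  updatePolicy:
    # "Off"      = recommend only, never change the pod spec
    # "Recreate" = evict pods and restart them with updated requests
    # "Initial"  = apply recommendations only when pods are (re)started
    updateMode: "Initial"
```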
So based on how aggressive you want to be with adjusting your resource recommendations, you can pick one of the three. The most popularly used option is the last one: don't force a restart, but apply the recommendations the next time the pod is started by the operator. Does that make sense? Yeah. Are there any other questions on single, sorry, hold on one second. Can we get that on the recording? So just to clarify, you're running a full Kubernetes node for this function? What's that? Are you running a full Kubernetes node for this function? Yes, yeah. But aren't you worried about overhead, like the kubelet taking resources, Docker taking resources? Yeah, that's a really good question. Nodeless actually does a cost-benefit analysis of whether it makes sense to have one compute node per function or to stuff multiple functions into a single compute node. It makes that call based on the resource footprint as well as the behavior patterns: how the pods were started, and at what rate they are coming in. So it doesn't always do one compute node per pod; it does that when it makes sense from an economics point of view, and from the point of view of the rate at which pods are being provisioned and terminated. Does that answer your question? Yeah. It obviously doesn't make sense to have one compute node per pod if the resource footprint is super small. Any other questions on the single-cluster case? So we would eventually see the two nodes that were provisioned for the functions, this one, the 40.74 and the 71.251, go away when no more traffic is coming in. We'll revisit this window after a while. So, taking it one step further: do we really even need control planes that are sitting always on?
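One way to picture the dedicate-versus-pack call is the sketch below. All numbers (the per-node system reserve, the spot discount, the per-vCPU price) are assumptions for illustration, not Luna's actual model: dedicated per-pod nodes pay the kubelet/runtime overhead once per pod but can use discounted spot shapes, while a shared node pays the overhead only once.

```python
# Illustrative dedicate-vs-pack cost comparison (made-up model).
NODE_OVERHEAD_VCPU = 0.1   # assumed kubelet/runtime reserve per node

def dedicated_cost(n_pods, pod_vcpu, price, spot_discount=0.7):
    # N right-sized nodes, each eligible for discounted spot capacity
    return n_pods * (pod_vcpu + NODE_OVERHEAD_VCPU) * price * spot_discount

def packed_cost(n_pods, pod_vcpu, price):
    # one shared on-demand node; overhead is paid only once
    return (n_pods * pod_vcpu + NODE_OVERHEAD_VCPU) * price

def prefer_dedicated(n_pods, pod_vcpu, price=0.04):
    """True when one-node-per-pod is cheaper than packing."""
    return dedicated_cost(n_pods, pod_vcpu, price) < packed_cost(n_pods, pod_vcpu, price)
```

Under these assumed numbers, many tiny pods favor packing (the per-node overhead dominates), while a few larger pods favor dedicated spot-backed nodes.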
If you want to serve your functions backed by hundreds of Kubernetes clusters, should those control planes be always on? Why can't we take it one step further and have just-in-time control planes that pop up only if there is a workload scheduled to that control plane? If no workloads are running on a control plane, why should we even maintain it, and why should it be on? So Nodeless, when applied to a multi-cluster environment, provisions just-in-time clusters. The control planes are provisioned in the right cloud provider and the right region if a workload is scheduled to them. Let's say your workload needs Arm shape X or GPU shape Y, and those shapes are only available in region A in AWS and region B in GCP. The multi-cluster scheduler for just-in-time clusters is smart enough to figure out: hey, this function needs these resource shapes, and these are only available in these cloud providers in these regions; so I'm going to spin up just-in-time control planes in those regions and schedule my function pod to that particular control plane. And once no workloads are running on a control plane, the control plane itself enters standby mode. So control planes won't be always on, and you're not incurring the overhead of maintaining them and figuring out, hey, which version and which security patches are applied to all the compute nodes in each control plane, et cetera. So let's go ahead and see how this would work. The environment is slightly different from the AWS environment. Here I have, let me bring it up to the top, a federation of two clusters, and these are both kind clusters running on my laptop. You see that they both have Ready set to false. The control planes are not ready; they're simply in standby mode.
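The placement decision just described can be sketched as a lookup from a required instance shape to a provider and region that offer it. The shape names and availability table below are made up for illustration; a real scheduler would also weigh price, quota, and latency.

```python
# Illustrative just-in-time cluster placement: map a workload's required
# instance shape to a (provider, region) where that shape is offered.
AVAILABILITY = {
    "arm-shape-x": [("aws", "us-east-1")],      # hypothetical entries
    "gpu-shape-y": [("gcp", "us-central1")],
}

def place(required_shape):
    """Return a (provider, region) hosting the shape, or None."""
    options = AVAILABILITY.get(required_shape, [])
    return options[0] if options else None
```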
Idle is set to true, which means that there are zero workloads running on the control planes, and they are on standby, which means that if a workload happens to be scheduled to a control plane, it'll come back up alive. We also want to make sure that there are no workloads running on this federation of clusters, and we see that there are no pods running. Let's go ahead and create an application, an nginx deployment, scheduled to this federated Kubernetes cluster. What we want to see is that one of the clusters should come out of standby mode. Let me see where the pods are running. Okay, so the pod for the nginx deployment got scheduled to the first cluster. If we look at get clusters again, we see that the first cluster is now Ready: it has come out of standby mode, and it is running the nginx deployment's pods. If we delete the nginx deployment, we should see that after a while the first cluster enters standby mode again. So we delete the nginx deployment, make sure the pods are terminated, and then let's watch. After a while, we'll see that cluster one, because it's not running any pods, transitions to Standby set to true. So we'll have just-in-time control planes that come up and disappear based on the function lifecycle, with the function scheduled to whichever control plane has the resources needed to run the workload. I'm going to pause here to see if this makes sense and if there are any questions. So let's go to the takeaways; we'll revisit the slide in a bit. Instead of always-on apps, where possible it's easier, simpler, and more cost-effective to move toward event-driven functions, and that's why we are here. Backing your serverless functions with Kubernetes simplifies your operations quite a bit. And why stop the just-in-time stack there?
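The idle-to-standby transition shown in the demo boils down to a simple rule: a control plane with zero workload pods for longer than a grace period goes on standby. The two-minute grace period matches the demo configuration mentioned later; the function name is mine, not the product's.

```python
# Sketch of the idle -> standby rule from the multi-cluster demo.
IDLE_GRACE_SECONDS = 120  # the demo uses two minutes per state transition

def should_standby(workload_pod_count, idle_seconds):
    """A cluster goes on standby once it has run no workload pods
    for at least the grace period."""
    return workload_pod_count == 0 and idle_seconds >= IDLE_GRACE_SECONDS
```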
Why not convert each Kubernetes cluster into nodeless mode, so you get just-in-time compute for your just-in-time functions? And taking it one step further, having just-in-time clusters themselves means you have a just-in-time stack end to end, from your function all the way down to your infra. So if you have zero pods running, zero apps running, zero functions running, your resource footprint and your infra footprint are zero. You're not maintaining fleets of clusters or compute nodes that are always on, waiting to run your resources. Does that make sense? Yeah. So let's go back and revisit the two demos. We see that the first cluster entered standby. It takes two minutes for each state transition: two minutes after it goes idle, it enters standby, and after that, Ready should become false pretty soon. And let's go back and look at the nodes on the single cluster. We see that the two nodes that were provisioned for the just-in-time function pods have been terminated, so we are back to our stable state of two worker nodes in the Kubernetes cluster. Let's watch until this transitions to false while I take any more questions. I'll update the slide deck on Sched, and I have recorded versions of both demo videos as well, so I'll make sure to upload those to Sched too, so you have a copy. Oh, I guess I have a question, so I don't have to run the mic to myself. In this example, you're showing two different clusters, and you used a kubectl create to create a deployment that ended up in one cluster. How does this work with Knative, where the deployment will exist but at zero replicas, and there is some sort of ingress? Where does that ingress run? Yeah, so for the multi-cluster demo I did not use the Knative stack; I just wanted to illustrate how the scheduling works.
So for the multi-cluster case, we would have a federated Knative stack running the deployment. The way multi-cluster scheduling works is that the main scheduler, which is called Nova, is simply an API server. Your Knative stack will be running there, and it takes care of scheduling and federating your deployment. The moment your deployment object is created and you're scaling the number of replicas, those objects get scheduled to the right workload cluster, and it takes care of the networking component and setting up ingress components. So the objects are pushed down to the workload cluster. So there would be Envoy and activators running somewhere on one or more of those clusters? Yes, yeah. And we are also looking into Tanzu Service Mesh as well, to see if global namespaces and all of those can be used, because what the scheduler does is simply schedule the compute objects. It has the smarts to figure out, hey, I want to schedule 80% of my workload to cluster A and 20% to cluster B, but it is not taking care of inter-cluster networking. So we would need to integrate with something like Envoy or a service mesh. Does that answer your question? I think so. I think you just said at the end: if your Envoy, like your Istio, ended up in one cluster and your Knative activator ended up in another, you might not get activations working properly without extra work. Yes, that's correct. So right now, in the first phase, we are scheduling the whole stack to a single cluster, but what we are looking to see is whether we can actually have a federated Knative stack running on multiple control planes as well. That would make it a lot more highly available than scheduling the entire Knative serving stack onto a single workload cluster. But as a first step, we are scheduling everything onto a single cluster. Any other questions? Well, thanks so much. It was super valuable to hear your questions.