All right, let's get started. Thank you, everyone, for coming to our talk, Inside Knative Serving. I am Andrew Chen. I'm a program manager and a technical writer. And with me is Dominik Tornow, a principal engineer. For the past year, we've been working together to try to figure out why Kubernetes is so hard to understand. When I started as a technical writer over two years ago, I was bombarded with terms such as control plane, data plane, Kubernetes object, controller, ingress, and the list goes on and on. I had at my disposal all of the Kubernetes documentation, yet I still lacked a big-picture understanding of how everything worked together. Was it because I didn't read all the source code? Who even has the time to do that? And do you even believe that it's possible to understand a software system without examining its source code?

Well, Dominik showed me that there is another way. It's called systems modeling. For the past year, we've been writing blog posts using formal and conceptual models to help people better understand how Kubernetes works, so that they may reason about its behavior with confidence. We are now applying this methodology to Knative Serving so that you may understand Knative better. What makes this presentation a little different from others you may see is that we're not going to show you source code. We're not going to show you a console. We're not going to do a demo. We're just going to show you systems models. And yet you should still be able to walk away today with a crystal-clear understanding of how Knative Serving works. Without further delay, here is Dominik Tornow to talk about Knative Serving.

Thanks, Andrew. If you have heard of Knative before, you probably heard that Knative is a Kubernetes extension designed to manage serverless applications. This statement about Knative is correct. However, this statement does not describe Knative well. Knative is a Kubernetes extension. A Kubernetes extension is a collection of custom controllers and custom resource definitions that enable new use cases on top of Kubernetes. If you hear members of the Kubernetes community say "Kubernetes is a platform to build platforms," they are talking about Kubernetes extensions. Knative is, in fact, a collection of three Kubernetes extensions: Knative Build, Knative Serving, and Knative Eventing. In combination, Knative is not just a serverless extension; it is a zero-operations extension for reactive microservice applications hosted on Kubernetes. Zero ops, also called no ops or, more formally, operations automation, refers to the fact that most or all tasks required to operate an application are performed by the system, not by the developer.

Today, we will discuss Knative Serving. In particular, we will discuss two aspects: Knative Serving as a zero-operations extension for the lifecycle management of reactive microservices, and Knative Serving as a serverless extension. To understand the benefits of Knative Serving, we need to shift our point of view from an architectural perspective to an operational perspective. From an architectural perspective, a reactive microservice is an individual, stateless component that processes individual requests. Microservices are located behind a gateway. The gateway is responsible for traffic management: it acts as a reverse proxy, routing a request from a service consumer to the correct service provider.
From an operational perspective, for each microservice there exists at least one version, also called a revision. In order to release the initial revision of a service, you have to perform two steps. First, you have to deploy the revision. To deploy a revision, you have to create a workload specification. A workload specification is a set of resources that specify how to process requests. And by the way, if you are thinking "workload specification, that sounds like a pod specification," in essence, you are correct. Second, you have to roll out the revision. To roll out a revision, you have to create a traffic split specification. A traffic split specification is a set of resources that specify how to route requests. And if you are thinking "traffic split specification, that sounds like an ingress specification," again, in essence, you are correct. Here, all traffic splits are probabilistic traffic splits.

In order to release the next revision of a service, you have to repeat these steps: first, deploy the new revision; second, roll out the revision. With a probabilistic traffic split, you have two options for a rollout. You may choose an immediate rollout, or you may choose a gradual rollout. To perform an immediate rollout, you update the traffic split specification only once, shifting all requests from the current to the next revision in an instant. To perform a gradual rollout, you update the traffic split specification multiple times, shifting all requests from the current to the next revision over a period of time. As an operator of a microservice, you are locked in an endless, tedious cycle of deployments and rollouts. And if this dire situation does not ask for automation, nothing does.

But let's not get ahead of ourselves. So far, we talked about microservices, but we did not talk about Kubernetes, and we did not talk about Knative yet. Let's talk about Kubernetes first. Kubernetes is a prominent platform for hosting microservices. Kubernetes provides an extensive set of well-rounded abstractions to compose a microservice. However, Kubernetes does not provide a dedicated abstraction for a microservice. Simply put, Kubernetes gives us everything we need, but we have to piece everything together ourselves. And there are many options to choose from.

It turns out the most prominent pattern is also the most basic one. Here, one microservice is represented by composing four kinds of objects: a Kubernetes deployment, a Kubernetes horizontal pod autoscaler, a Kubernetes service, and a Kubernetes ingress. Deployment and HPA represent the workload specification. Service and ingress represent the traffic split specification. From an actions point of view: in order to release the initial revision of a microservice, the developer has to create a deployment object, a horizontal pod autoscaler object, a service object, and an ingress object. In order to release a subsequent revision, the developer has to update the deployment object. This pattern is simple to implement and simple to operate. However, the developer has limited control over the rollout, that is, limited control over the traffic split specification. A sketch of this pattern follows below.
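To make the basic pattern concrete, here is a minimal sketch of the four objects. Everything here is illustrative: the names, ports, image, and thresholds are hypothetical placeholders rather than anything shown in the talk, and current stable Kubernetes API versions are assumed.

```yaml
# Basic pattern: one microservice pieced together from four objects.
# All names and images below are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment                     # workload specification
metadata:
  name: my-service
spec:
  selector:
    matchLabels: { app: my-service }
  template:
    metadata:
      labels: { app: my-service }
    spec:
      containers:
        - name: user-container
          image: example.com/my-service:v1   # update this field to release a new revision
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler        # workload specification
metadata:
  name: my-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 80 }
---
apiVersion: v1
kind: Service                        # traffic split specification
metadata:
  name: my-service
spec:
  selector: { app: my-service }      # matches the pods of *all* replica sets
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress                        # traffic split specification
metadata:
  name: my-service
spec:
  rules:
    - http:
        paths:
          - path: /my-service
            pathType: Prefix
            backend:
              service: { name: my-service, port: { number: 80 } }
```

Note that nothing in these objects distinguishes revisions: a new revision is just an update to the deployment's image field, and the service selector spans the pods of the old and new replica sets alike, which is exactly why the resulting traffic split is implicit.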
This diagram illustrates the mechanics of the traffic split specification of the basic pattern. Here we have one microservice that is currently in rollout. That is, traffic is shifted from deployment number one, replica set number one, to deployment number one, replica set number two. The Kubernetes ingress directs all requests that are bound for one microservice to one Kubernetes service. In turn, the Kubernetes service directs requests to matching pods with equal probability. In summary, the traffic split is implicit: in effect, it is determined solely by the Kubernetes service and depends on the number of pods per replica set. For example, if there are three pods in replica set number one and three pods in replica set number two, the resulting traffic split is 50-50.

To grant the developer full control over the traffic split specification, a more advanced pattern emerged. Here, again, one microservice is represented by composing four kinds of objects: a Kubernetes deployment, a Kubernetes horizontal pod autoscaler, a Kubernetes service, and an Istio virtual service. Deployment and HPA represent the workload specification. Service and virtual service represent the traffic split specification. From an actions point of view: in order to release the initial revision of a microservice, the developer has to create a deployment object, a horizontal pod autoscaler object, a service object, and an Istio virtual service object. In order to release a subsequent revision, the developer has to create a new deployment object, a new horizontal pod autoscaler object, and a new service object, and has to update the existing Istio virtual service object. In case of an immediate rollout, the developer updates the Istio virtual service only once. In case of a gradual rollout, the developer updates the Istio virtual service multiple times. This pattern is still simple to implement, but it is far more involved to operate. However, the developer has full control over the rollout, that is, full control over the traffic split specification.

This diagram illustrates the mechanics of the traffic split specification of the advanced pattern. Here, we have one microservice that is currently in rollout. That is, traffic is shifted from deployment number one, replica set number one, to deployment number two, replica set number one. The Istio virtual service directs all requests that are bound for one microservice to a configurable set of Kubernetes services. In turn, each Kubernetes service directs requests to matching pods with equal probability. In summary, the traffic split is explicit: it is determined solely by the Istio virtual service, as in the sketch below.

So far, we had good news and we had bad news. The good news: Kubernetes is a convenient choice to implement microservice applications. The bad news: Kubernetes is not a convenient choice to operate microservice applications. You have to perform an involved, repetitive sequence of steps for the initial release and every subsequent release.
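For the advanced pattern, here is a hedged sketch of the Istio virtual service that carries the explicit traffic split. The host and service names are hypothetical, assuming one Kubernetes service per revision:

```yaml
# Advanced pattern: one Kubernetes Service per revision, plus one Istio
# VirtualService that owns the traffic split. Names are hypothetical.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service.example.com
  http:
    - route:
        - destination:
            host: my-service-rev1   # Kubernetes Service for revision 1
          weight: 90
        - destination:
            host: my-service-rev2   # Kubernetes Service for revision 2
          weight: 10
```

An immediate rollout is a single edit setting the weights to 0 and 100; a gradual rollout is a series of edits that walks the weights over a period of time.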
Well, meet Knative Serving, finally. Knative Serving automates this sequence of steps for the initial release and every subsequent release of your microservice. Knative provides a custom resource definition, the Knative service object, that implements the advanced pattern and automates its operation. When you create a Knative service object, Knative automatically creates the initial deployment, the initial autoscaler, the initial service, and the Istio virtual service for you. Please note that Knative replaces the horizontal pod autoscaler with a Knative pod autoscaler (KPA). The Kubernetes horizontal pod autoscaler scales instances based on metrics like CPU utilization or memory utilization. The Knative pod autoscaler scales instances based on the in-flight request count. In addition, the KPA can scale to and from zero. We will go into more detail in the next section of the presentation. When you update a Knative service object, Knative automatically creates the next deployment, the next autoscaler, and the next service, and updates the Istio virtual service for you.

In conclusion, from an actions point of view, Knative reduces the operational burden on the developer. In order to release the initial revision of a microservice, the developer simply has to create a Knative service object. In turn, Knative automatically creates the required set of objects. In order to release a subsequent revision, the developer simply has to update the Knative service object. In turn, Knative automatically creates and updates the required set of objects. Still, in case traffic is shifted immediately, the developer updates the Knative service once; in case traffic is shifted gradually, the developer updates the Knative service multiple times.

Let's take a closer look at the mechanics of Knative. A Knative service object combines the workload specification and the traffic split specification of your microservice. Ultimately, the workload specification is your pod specification, specifying the image of your service. The traffic split specification is the Istio virtual service specification, specifying one revision, or two revisions with a traffic split, as your traffic target. For each Knative service, there exists exactly one Knative configuration. When the service object is created, a configuration object is created with the service's initial workload specification. When the workload specification of the service object is updated, the configuration is updated with the service's new workload specification. For each Knative configuration, there exists at least one Knative revision. When the configuration object is created, a revision object is created with the configuration's initial workload specification. When the configuration object is updated, a new revision object is created with the configuration's new workload specification. Each revision results in a deployment, a Knative pod autoscaler, and a service. In addition, for each Knative service, there exists one Knative route. When the service object is created, a route object is created with the service's traffic split specification. When the traffic split specification of the service object is updated, the route object is updated with the service's new traffic split specification. And ultimately, a route results in an Istio virtual service.

Let's take a closer look at the workload specification of your service. As stated, the workload specification is a pod specification.
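For concreteness, a minimal Knative service object might look roughly like the following sketch. The name, image, and revision names are hypothetical, and the serving.knative.dev/v1 schema is assumed:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:                          # workload specification (ultimately a pod spec)
    metadata:
      name: my-service-rev2          # optional explicit revision name
    spec:
      containers:
        - image: example.com/my-service:v2   # the user container
  traffic:                           # traffic split specification
    - revisionName: my-service-rev1
      percent: 90
    - revisionName: my-service-rev2
      percent: 10
```

Updating spec.template produces a new revision via the configuration; updating spec.traffic updates the route and, with it, the Istio virtual service.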
You have to specify one image, called the user container image, that contains your application, or more specifically, that contains a revision of your application. Your application must be an HTTP application processing HTTP requests. Knative injects another image into the pod specification on creation of the deployment, called the queue container image. The queue container is a reverse proxy to the user container. The queue container intercepts all requests to the user container and is responsible for collecting and reporting statistics, namely the in-flight request count, to the autoscaler. Again, we will go into more detail in the next section of the presentation.

So, to release the initial revision of a microservice, the developer creates a Knative service object with an initial workload specification and an initial traffic split specification. In turn, Knative creates a Knative configuration object with the workload specification. In turn, Knative creates a Knative revision object with the workload specification. In turn, Knative creates a deployment object with the workload specification, a service, and a Knative autoscaler. Additionally, Knative creates a Knative route object with the initial traffic split specification. In turn, Knative creates an Istio virtual service with the traffic split specification. At this point, the initial revision is released and receiving 100% of requests. To release the next revision of a microservice, the developer updates the Knative service object, here with an updated workload specification and an updated traffic split specification. Knative updates the configuration object and, in turn, creates a new revision object. Additionally, Knative updates the route object and, in turn, updates the virtual service. At this point, the next revision is deployed and receiving 100% of requests.

So we could stop here and enjoy the sweet benefits of operations automation. However, Knative has one more trick up its sleeve. Knative Serving is able to scale a revision to and from zero, an approach that is frequently called serverless computing. In a traditional environment, resources must be acquired before a request can be received. Simply put, in Kubernetes terms, if your pod is not up and running yet, your application cannot receive requests. In a serverless environment, resources may be acquired after a request has been received. In Kubernetes terms, even if your pod is not up and running yet, your application may already receive requests. Typical implementations of a serverless environment do not release resources immediately after processing a request. Instead, once acquired, resources are held in anticipation of additional requests for some period of time. Cold path refers to the situation where receiving a request and processing a request are separated by acquiring resources. Hot path refers to the situation where receiving a request and processing a request are not separated by acquiring resources. But how does Knative scale from zero when there is no pod listening for your requests? Meet the Knative Serving activator.
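As a point of reference for the walkthrough that follows, here is a rough sketch of what a revision's pod looks like once Knative has injected the queue container. The generated name, images, and port are illustrative and vary by Knative release:

```yaml
# Sketch of the pod spec Knative derives from a revision. The user container
# comes from your workload specification; the queue container is injected.
apiVersion: v1
kind: Pod
metadata:
  name: my-service-rev1-deployment-abc12    # hypothetical generated name
spec:
  containers:
    - name: user-container                   # your HTTP application
      image: example.com/my-service:v1
    - name: queue-proxy                      # injected reverse proxy; receives all
      image: example.com/knative/queue       # requests first and reports the in-flight
      ports:                                 # request count to the autoscaler
        - containerPort: 8012
```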
For this walkthrough, I assume one Knative service, service number one, with two revisions, revision number one and revision number two, and that the service is currently in rollout. That is, traffic is split between the revisions. Currently, both revision number one and revision number two are scaled to zero. Now, this is the fun part. When a request enters the system, the gateway inspects the request, determines the service, and selects a revision to process the request. Here, I assume the gateway selects revision number one. Since no instance of revision number one is running, requests to revision number one are on a cold path. The gateway is configured to forward the request to the activator. The activator buffers the original request and sends a request to the autoscaler to scale revision number one. The autoscaler sends a request to Kubernetes to increase the replica count of the deployment object corresponding to revision number one. Kubernetes creates a pod object and executes the queue and user containers, therefore scaling from zero to one. The gateway is then configured to forward future requests for revision number one directly to a pod of that revision, bypassing the activator. The activator forwards the buffered original request to the queue container. The queue container forwards the original request to the user container for processing. In addition, the queue container sends a request to the autoscaler to increase the in-flight request count. Omitted in this animation: when the response of the user container is returned to the caller, the queue container sends another request to the autoscaler to decrease the in-flight request count.

Now, to do this all over again: when a new request enters the system, the gateway inspects the request, determines the service, and selects a revision to process the request. Here, I assume the gateway selects revision number one again. Since an instance of revision number one is running, requests to revision number one are on a hot path. The gateway is configured to forward requests for revision number one directly to a pod of that revision, bypassing the activator. Again, the queue container forwards the original request to the user container for processing. Again, the queue container sends a request to the autoscaler to increase the in-flight request count. If the number of in-flight requests passes a configurable threshold, the autoscaler sends a request to Kubernetes to increase the replica count of the deployment object corresponding to revision number one. Kubernetes creates additional pods and executes the queue and user containers, here scaling from one to three. Ultimately, the same process takes place for revision number two.

In summary, Knative is a zero-operations extension for reactive microservice applications on Kubernetes. Within Knative, Knative Serving is a zero-operations extension for the lifecycle management of reactive microservice applications on Kubernetes. Knative Serving provides a dedicated abstraction for a microservice and automates its operation, that is, it automates deployment and rollout. Additionally, Knative Serving is a serverless extension. Knative Serving is able to scale a microservice from and to zero instances in response to service requests.
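In practice, the configurable threshold and the scale bounds the autoscaler works with can be set per revision. A minimal sketch, using annotation names as documented for the Knative pod autoscaler; exact names and defaults depend on your Knative release:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/target: "10"     # target in-flight requests per pod
        autoscaling.knative.dev/minScale: "0"    # "0" permits scale-to-zero (cold path)
        autoscaling.knative.dev/maxScale: "3"    # upper bound, e.g. the one-to-three step above
    spec:
      containers:
        - image: example.com/my-service:v1       # hypothetical placeholder image
```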
Thank you very much. With that, I hand back to Andrew. Thanks, Dominik. That was great. So if you liked how Dominik explained Knative Serving, not just the content itself, please come and talk to us after the presentation. We'd love to hear your feedback. And I think we have some time for some questions. Just one second for the microphone to arrive.

I want to ask: if I run an application in a Knative service, which component ensures that my application is highly available? Because my application perhaps has some trouble and can break down inside the Knative service container. Should I develop another application to fix this problem, or does Knative have an internal mechanism to help me do that?

Knative actually does not address this problem. Knative falls back onto Kubernetes, because a Knative service, or more specifically a revision of a Knative service, ultimately translates into a deployment. From there on, it inherits all the properties that the deployment exposes. And since the Kubernetes deployment controller makes sure, or at least tries as much as it can, to bring up as many pods as specified in the replica count, Knative falls back onto the availability guarantees of Kubernetes. But the point stands and is correct: a Kubernetes deployment does not give you a guarantee that it can scale to that point. It just gives you the guarantee that it will keep trying. Knative does the same thing.

Yeah, because our applications have always been deployed to our production environment through the Deployment workload and the StatefulSet workload. But our application is a database system, so many stability problems can happen in the container. A Knative service would be very convenient for us if we want to provide the database service to our customers.

Actually, in that case: Knative Serving is specifically designed for reactive microservices, that is, stateless components that process HTTP requests. If a database is running in your container, then Knative Serving is not the choice for you. It's not a solution there. OK, thank you. You're welcome.

Well, thank you. I'm a little bit confused about the concepts of the cold path and the hot path. It seems that Knative can first sense whether there is a rollout going on, trigger that with the queue container, wait for the rollout to finish, and then do the scaling. So do the rollout and the scaling happen simultaneously? I'm a little bit confused about that. Thank you.

I see. Actually, the queue container is not involved in the cold path or the hot path. The only components that are involved in the cold path or the hot path are the gateway and the activator. If there is no pod running, the ingress gateway is configured to forward a request to the activator, and then the activator will talk to the autoscaler to scale up. Later, after a pod has been acquired, the gateway is reconfigured via the Istio virtual service to direct requests directly to the pod. So in this situation, you can think of the blue line as the hot path. As for the queue container within the pod, its sole responsibility is to collect statistics. Every time it receives a request, it increases the in-flight request count, and every time it returns a response, it decreases the in-flight request count.

Thanks, that's super helpful. Please, we have one question over there. Thank you. There's one in the back.

Hi. The gateway is configured to route certain requests to service one, revision one, and certain requests to revision two while the rollout is happening, right? So in the cold path, we had the first request come to revision one, and your user container came up, and now requests go directly to the queue container of service one, revision one. Now, say a request comes in that the gateway has to route to revision two, but the cold path is still in effect there. To reduce the response time, does it route the request to revision one in the meantime?

No. Not to my knowledge. I mean, please do not quote me on that, but I am 99% sure: no, it does not. Right.
So if we want an SLA of, like, triple nines or something, with lower response times, can we configure it to do that? I do not think that is a possibility, no. It is a strictly probabilistic traffic split that does not take any other environmental conditions into account. Understood. All right. Thank you very much.

I got the final warning, so I'm not sure I can take more questions up here on stage. But please do not hesitate: come find us. We're going to hang out right in front of this room and are happy to answer any more questions. Thank you very much. Thanks for coming.