Hello, everyone. Thanks for coming. My name is Benjamin Elder, and this is Antonio. We're both senior software engineers at Google, and our talk is "Keep Calm and Load Balance with kind."

So what is kind? kind is Kubernetes in Docker. It's a tool that we built initially for testing the Kubernetes project itself. It uses Docker containers to simulate nodes, so you can run them locally, and it can build and run Kubernetes from source as well as from pre-built images. It boots a cluster in about 30 seconds, because everything is packed into the container and running locally. That was really important for the Kubernetes project: we need very cheap, fast local testing because we're testing changes to Kubernetes constantly. kind is minimal but fully conformant, while staying simple, streamlined, and flexible. It has multi-node support, which is required to actually run all of the conformance tests properly, because some tests exercise multi-node behavior, and it has persistent volume support. Everything else comes from Kubernetes core itself; the rest is just getting those pieces to run. We have a very, very minimal, lightweight networking agent called kindnetd. It just ensures that pods are routable between nodes, with the simplest possible CNI. There's no load balancer and no ingress out of the box; it's bring-your-own, and we have some docs for that.

So that's where the project is coming from. It looks a little bit like this: you've got Docker running on your host. Within each Docker container acting as a node we have systemd, because we need an init process for Kubernetes. We're running containerd, and we have a bunch of images for all the Kubernetes components and binaries. We run Kubernetes and our kindnetd pod, and everything else is standard Kubernetes: you have the core pods (kube-proxy, the controller manager, etcd, the kube-apiserver) and your user workloads. These are nested containers; if you're familiar with Docker in Docker, it's the same idea. In Kubernetes CI we actually run kind on Kubernetes, so we end up with containerd in Docker in containerd. It works. I don't recommend it, but it solved our problems.

So, as I said, it supports multi-node, which looks something like this. If you have a single node, we'll just untaint the node and you can run your workloads there, and for most application testing that's appropriate, but for testing Kubernetes we really need to test behavior across nodes. So it looks like a pretty standard Kubernetes cluster, because it's powered by Kubernetes: you've got a control-plane node and some worker nodes.

For networking we have some really simple stuff, as I said: it's a Docker bridge network, and we're using veth pairs with very simple, standard CNI plugins, plus a small agent that handles things like IP masquerade that are necessary to get traffic working in and out of these nodes.

And then you can see the iptables fun here. If you've ever worked with iptables in containers, you'll know what we're talking about. When you run iptables, the API is a set of user-space binaries; there's not really a kernel API you're supposed to use directly, and the binaries kind of need to match the kernel. Over time iptables has actually moved away from being backed by the legacy implementation to being backed by nftables, and the mode you use really needs to match what's happening on your host.
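If you want to check which mode a host is in, the standard iptables CLI will tell you; nothing here is kind-specific, and the output lines are just examples:

```bash
# Print which backend the userspace iptables binary drives; the mode
# selected inside the kind node has to match the host, or rules written
# by one side won't be visible to the other.
iptables --version
# e.g. "iptables v1.8.7 (nf_tables)"  or  "iptables v1.8.4 (legacy)"
```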
We have a little bit of magic in the entrypoint code, developed with the core Kubernetes project, to detect which one is in use and switch to the right one, trying to match three layers down: from your host, to the kind node, to the kube-proxy running inside the kind node. A little bit of a pain, but it works, and it's pretty simple for the most part.

So, since 2019, Antonio, my co-maintainer, and I have been talking about how we should really have an actual load balancer implementation so we can test these other things. We started with the conformance tests, and that got us a lot of great coverage, but there's a lot of other Kubernetes functionality that people want to be able to develop and test, and LoadBalancer was a pretty obvious gap. I'm going to hand off to Antonio to talk some more about what we did there.

Thanks, Ben. Okay. Well, as Ben said, kind has become more and more popular, and Kubernetes has become more and more complex. I'm in SIG Network too, so I use kind a lot. As you may know, when you create a Service it can have type LoadBalancer. What was the problem with this? The problem is that kind users had no good solution for testing it, or even just using it. So we needed a controller, something that creates this load balancer, configures the external IP, and makes it accessible from outside the kind cluster. We had a good temporary solution, MetalLB, but MetalLB has several gaps. One, if we go to the technical part, is that it uses L2 (ARP) or BGP to expose the virtual IP for the load balancer, and that has a hard dependency on the Linux bridge, so users need to be on Linux to use it. The other problem is that you don't get a lot of the features that the cloud providers' load balancers have.

So when we were developing KEP-1669, terminating endpoints, in SIG Network, I started to struggle with how we were going to test it, because the main point of this feature is to let users do rolling updates with zero disruption. That is a very tricky feature to test, with a lot of timing, a lot of race conditions, and multiple moving parts. This feature is based on understanding how the pod lifecycle works. You know that a pod has several states: the pod starts running, but the pod is not ready. The readiness of the pod is what indicates when the pod may be used as an endpoint of the Service; it signals to all the network infrastructure, okay, this pod is ready to receive traffic.

Until this KEP, endpoints were binary: a pod could be ready or not ready. The problem with this is that a terminating pod is still able to serve traffic until the new pod comes up. So we needed to add new conditions to the endpoint, serving and terminating, and this is what allows terminating endpoints to be implemented.
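Concretely, these conditions surface on EndpointSlice objects. A rough sketch of what a terminating endpoint looks like; the names and addresses are made up, but the conditions fields are the real discovery.k8s.io/v1 ones:

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-service-abc12          # hypothetical, generated from the Service name
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv4
endpoints:
- addresses: ["10.244.1.7"]       # pod IP, illustrative
  conditions:
    ready: false                  # no longer a normal load-balancing target
    serving: true                 # still able to answer, so it can drain traffic
    terminating: true             # the pod has a deletion timestamp
ports:
- name: http
  port: 8080
  protocol: TCP
```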
The way this works is a bit complex; this diagram explains everything. You have a pod; the pod starts running; the status of the pod is published by the kubelet to the API server. Then there is another controller, the EndpointSlice controller, which runs in the kube-controller-manager. This controller is continually watching the pods: when you configure a Service, you have a selector that selects all the pods, and this controller starts to watch them. Okay, the pod is ready; you can send traffic to this pod. Before this KEP, when the pod went into terminating, it was removed from the endpoints. What does that mean, given the pod is still able to serve traffic? There is going to be a gap, between the old pod terminating and the new pod running, where the traffic can be blackholed.

To solve this problem, you need to coordinate different components, and the most important component here is the external load balancer. The way it works is through the load balancer's health checks. In this diagram you have a typical high-availability setup: you have your deployment, and you deploy one instance on each node. The load balancer is health-checking the pods (checking the service health check on the node port to see the health of the pod) and it sends traffic to the ones that are available. So far so good.

When you are doing a rolling update, let's take the simple model where you say: I'm going to make only one replica unavailable at a time. The old pod starts terminating, but while it is terminating, the load balancer needs to detect that this pod is no longer ready, so the health check has to start failing, while traffic still has to keep going to this pod so it keeps serving until the replacement is up. That is the race to fix: at t0 the pod starts terminating, but the load balancer is not going to notice it's unavailable until the health check fails. This is the key of the feature, and the complexity, and that's why we needed to develop this solution: doing this against a load balancer in a real cloud provider means an e2e test that can run for minutes and is very racy. Once the new pod starts running, the health check starts passing again, it goes green, and the load balancer keeps sending traffic to it.

So this was the main motivation. As you can see, there were a lot of issues open over the years, but we never had a strong enough use case to invest time in this. What we decided was: okay, we want to test cloud providers, and our users also like the cloud provider experience because it's more real; they don't need workarounds with port mappings and redirects and all that stuff. So let's try to build cloud-provider-kind.

What cloud-provider-kind does is mimic the operation of a normal cloud provider. It watches Services, and when it detects one of type LoadBalancer, it spawns a load balancer. The implementation right now is an HAProxy container that runs on the same Docker network, so when you create this container it gets an IP from the Docker network, and users can use that IP to connect to the load balancer.

The other important thing we wanted, thinking about the future, is to see if we can cover more load balancer use cases, and maybe multi-cluster. So we decided to keep iterating on this idea of doing our own cloud provider, and what we implemented is out of tree: a component that runs outside the cluster and is able to handle multiple clusters at the same time.
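Using it looks roughly like this. The `go install` path is the project's real module path; the rest is just what a typical session looks like:

```bash
# Install and run the cloud controller on the host, outside the cluster.
go install sigs.k8s.io/cloud-provider-kind@latest
cloud-provider-kind

# In another terminal: once a Service of type LoadBalancer exists,
# an extra HAProxy-based container shows up on the same Docker network
# as the kind node containers.
docker ps
```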
In the future this could be used to implement a global load balancer, for example. I mean, there are a lot of ideas, and we want to hear from all of you: people that have ideas, please come talk to us about your use cases, and we'll think about how to implement them.

So I prepared a demo to show all this complexity. Okay, let me find it. I already created the kind cluster, so: kind get clusters. We have a kind cluster with three nodes, okay. Let me check that I don't have any leftovers... this is empty, this is empty, okay. What I usually do is, I have this deployment. Let me... is the screen good enough, or should I make it bigger? Okay.

You can go to the repo, and you have these examples of how to use it. You can see I have a deployment; this is the typical deployment you should use if you are worried about availability and want to roll out your application with zero downtime. Important things that we have here: we want to define the rolling update strategy, where you define the maximum number of replicas that can be unavailable, so that if you have a problem or something, you don't roll all of them at the same time. Another important piece here is terminationGracePeriodSeconds. What this does is, when the pod is terminating, the kubelet is still going to give it, say, 30 seconds to finish what it is doing. What we want is: okay, you are terminating, so start gracefully terminating your TCP connections; just don't break all of them. Another important thing is that we want some anti-affinity. There are different options to implement that, but ideally you want all your replicas to land on different nodes, because you can't have high availability if you put the pods on the same node; that would be a single point of failure. And your application needs to be able to handle this termination period: you are going to receive a SIGTERM, and at that point you need to start doing the cleanup. So it's important: this is not magic. There are multiple components working here, and your application needs to be aware of this and play well with them.

The other important piece is that you need to set the external traffic policy to Local. This is important. As you know, externalTrafficPolicy can be Local or Cluster. When you have Cluster, the load balancer will send the traffic to any node in the cluster, and when it reaches one node, kube-proxy (or whatever Service implementation you have) is going to bounce it to any node in the cluster. So you have two problems because of this. One is performance, because you are doing a double hop between nodes. The other is that you are going to lose your original source IP: you are going to see the node IP, and that's something most people don't like, especially if you want some sort of telemetry on your application about who is using it.

Okay, so far so good. Let's apply this LoadBalancer deployment.
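The manifests being applied look roughly like this. A sketch: the names, image, and ports are illustrative, but the fields are exactly the ones just described:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lb-demo
spec:
  replicas: 2
  selector:
    matchLabels: {app: lb-demo}
  strategy:
    rollingUpdate:
      maxUnavailable: 1               # roll at most one replica at a time
  template:
    metadata:
      labels: {app: lb-demo}
    spec:
      terminationGracePeriodSeconds: 30   # time to drain connections after SIGTERM
      affinity:
        podAntiAffinity:                  # spread replicas across nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels: {app: lb-demo}
            topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: registry.k8s.io/e2e-test-images/agnhost:2.39  # assumed test image
        args: ["netexec", "--http-port=8080"]
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: lb-demo
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve client IP, health-check per node
  selector: {app: lb-demo}
  ports:
  - port: 80
    targetPort: 8080
```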
Okay... I've been nervous today, sorry. I think it's good. So now we are going to see... let's see where the pods landed first. So we have the pods running, and we have the Service. So you see the problem, right? How do I reach this Service? I mean, I have my application, but I'm not able to reach the cluster IP; I'm outside the cluster. This is where cloud-provider-kind comes to help us. As I said, this is a binary that runs outside the cluster. Let's add some verbosity. Everything happened too fast, so I'm going to fast-forward and explain what happened. It detects all the nodes; it detects that there is a Service, that the Service has externalTrafficPolicy: Local and is a LoadBalancer; and what it does is: okay, I'm going to create a load balancer. Let's see here: that's this container, HAProxy. We could use another thing, but this was handy because we already had this image. Then it configures this HAProxy to be able to connect to the backends.

The important thing here is the health check. That's what I said before about MetalLB: why don't we use MetalLB? MetalLB is an L2 / L3 load balancer: it gets you the IP and nothing else. With this we have more granularity. We can configure health checks, and we can define a different port for the health check, because, as you may know, when you have a LoadBalancer Service with externalTrafficPolicy: Local, the system already provides you with a... what's the name... healthCheckNodePort. That's a special node port that is able to answer how many endpoints are alive on that node, and this is what the load balancer is going to use to decide whether it should send traffic to that node or not.

Okay, so now we should have the IP. You see we have the IP, a 192.168.x.x address from the Docker network. Now let's inspect the HAProxy container's config... this prints too much... and you see: basically, and I'm oversimplifying, this is more or less the technique the cloud providers use to provide your load balancers when you are in the cloud.

Okay, now let me clear this. We have the pods; we see that we have a different pod on each node. Those are the names. Let me get the IP, and we are going to query it: "who are you?" I need to query it like this because of the way the app answers. Okay, this answer is this pod, and if I keep querying I get the other one, right? So the load balancer is working. That's great, right? And you're saying: I can do that with MetalLB. You can do that, of course. The problem is, as I said, I was really worried about these terminating endpoint features, and I needed them well tested, because it's critical: we are talking about people who worry about losing one or two packets. So this has to work like a clock.

So what we are going to do is emulate exactly the behavior people want at rollout time: zero downtime. Let's emulate a basic client polling the load balancer. Okay... now you need to trust me a bit, but you see that things change: every request is hitting one of them. So this is the scenario: we are polling the load balancer, and ideally it's spreading the load equally, so roughly 50% of the requests should go to each backend.
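The client loop, and the rollout we trigger in a moment, can be emulated with something like this. The LB address, deployment name, and image are illustrative; under the assumptions of the earlier sketch, agnhost's netexec answers on /hostname with the pod name:

```bash
# Terminal 1: a basic client polling the load balancer once per second.
LB_IP=192.168.9.3            # use the external IP assigned to the Service
while true; do
  curl -s "http://${LB_IP}/hostname"; echo
  sleep 1
done

# Terminal 2: trigger the rolling update, then watch it progress.
kubectl set image deployment/lb-demo app=registry.k8s.io/e2e-test-images/agnhost:2.40
kubectl rollout status deployment/lb-demo
```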
But now, here's the case: okay, I have a new version of my application. Let me get the deployment and find the container image... somewhere in here. Okay. So my application is 2.39, and now I have a new version that is 2.40, and I want to roll out my new version without any disruption. So how do I do that? Well, I don't remember the command exactly, so I'm going to go to the shell history. Okay. You see, basically what I'm saying is: update this deployment to upgrade my image to 2.40. And let's see what starts to happen here. When we look at get pods, you see, okay, there is a pod pending and another pod terminating. I have a grace period of 30 seconds, so at this point you see the load balancer detected that one instance of the application is terminating: so let's go only to the one that is available. The old pod keeps terminating, it keeps doing the stuff it needs to do, but the traffic keeps going to the only one that is alive. When we reach the 30 seconds, we should start to see... okay, you see, this new 2.40 pod starts to be created, and both start to receive traffic. So everything is good; the load balancer detected everything. And now, again, this one starts to be terminated, and all the traffic is switching to the, let's say, good version. If we keep looking, after some seconds, when the whole rollout is complete, everything old has terminated and the new application is running. Now don't fail me... okay, you have everything working.

So this is one of the main motivations. The other one is that we want to keep developing; if you have more ideas, we know that people on Mac and Windows want to use the load balancer. Just know that we have you in our minds, but it's much more complex there, and we will try to solve that problem. I think that's it. Do you have any questions?

If you have questions, there's a mic here.

Audience: The program you started, the one that created the load balancer... I guess it's talking to the local DNS to grab an IP? It's another new IP from the local network... where is the IP grabbed from?

One second, let me show you. By the way, cloud-provider-kind is an official Kubernetes project, and it's now running in CI, so when you use this feature, this program is tested so that we don't regress. The way this works is that cloud-provider-kind has to query two APIs: the API of the control plane of the cluster, and the API of the implementation of kind, in this case Docker. So it goes to Docker and is able to create containers, and when you create the load balancer, you create a new Docker container that is the load balancer. So this IP is from the container that you just created. It's Docker, in this case, that is providing the IP, and the controller is able to glue these two things together: the infrastructure provided by Docker and the Kubernetes Services of the cluster.

We've also been saying Docker this whole time, but Podman has nominal support for this, and nerdctl support is landing in kind, so we'll probably look at nerdctl support in cloud-provider-kind too. And on where the IP comes from: it's the bridge network that the Docker nodes are running on. The containers for the kind cluster nodes are running on a bridge network, and the load balancer is on the same bridge.

Yeah, as Ben is saying, we want to make this portable, right? We have Podman, we have Docker, and now we have nerdctl. So that's what we are trying to do: we try to abstract the controller from the infrastructure.
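You can see that glue directly with the standard Docker CLI; kind names its bridge network "kind", and the load balancer container gets its address there:

```bash
# List every container attached to the kind bridge network with its IP;
# the node containers and the HAProxy load balancer all appear here.
docker network inspect kind \
  --format '{{range .Containers}}{{println .Name .IPv4Address}}{{end}}'
```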
We just reuse the API that Docker or Podman provides to us: we fetch this information and use it for the different implementations.

Audience: You're always using HAProxy. Say I want to use something else?

It's doable. Look, when I started this I was having great fun, so the first thing I tried was: I'm going to write a super load balancer myself in Go, whatever. When I had spent two days, I said: oh, this is getting complicated; let me try to get HAProxy working first. So ideally I would love to have something more fancy, but this keeps things working.

And actually, we didn't talk about it, but kind has some HA control-plane support for testing Kubernetes. It's not very mature, but it's there, enough to do some testing, so we already provision an HAProxy container in the core kind project for similar purposes. So it was easy to start with, but if there's really strong demand, we can look at something else. It should just be an implementation detail.

The config is templated now, but we had HAProxy historically, so we just reused the things we have. It's just an implementation detail.

Audience: Not to derail, but I want to ask: basically you created an application that makes it possible to test applications or services that have type LoadBalancer. But kind currently does not have network policy support, and the official documentation suggests using another plugin...

Yeah, like I said, when we started, we were looking at the set of things that you expect every Kubernetes cluster to have. Network policy, unfortunately, is not one of those things that is actually super widespread; there is no implementation in the Kubernetes project itself, and I would say it's a little bit of a loose API. So it hasn't been a focus yet.

When kind started, we had this conversation: which CNI are we going to use? I want to use Calico, I want to use Cilium, I want to use Flannel, I want to use Weave... So we said: okay, let's do our own minimal CNI, and then we have a knob to disable the default CNI so you can install Calico, Cilium, Flannel, whatever. If you go to any of these CNIs, they are using kind, so you can install Cilium or Calico in kind and use that. The problem with network policies, as you say, is: are we going to implement a whole network policy daemon in our CNI? I can tell you, I've implemented network policies in different CNIs; it's a lot of work and very complex. So what we decided is: let's give people the minimum, and give the people that build CNIs the opportunity to hook into our project, so the people that want network policies can install the other CNIs and use them.

Audience: One last question, and thank you for kind. Why is this a separate project rather than part of kind?

So it's a little bit of an open question. We wanted to start as an out-of-tree project because another thing about kind is that it has an extremely minimal dependency set: people embed it in their test runners, their Cluster API tooling, that sort of thing. If you look at the main kind module, it imports 12 packages. So we wanted a little bit more flexibility to iterate on this. The other thing is, as Antonio mentioned, we're interested in being able to test things like multi-cluster, where the provider needs to be provisioned once across clusters. So it has a separate lifecycle from the cluster. If you think about it, it's like your cloud provider:
you don't bring up a cloud every time you bring up a cluster; you have the cloud running, and then you bring up the clusters. Yeah, we're just trying to enable the Kubernetes project to test things, and there's some multi-cluster stuff in the Kubernetes org. Okay, well, thank you very much, and if you have more questions, we'll be around.