Hello, and thank you for joining me for Load Balancing 101 in Kubernetes. My name is Christopher, and I work for IBM. Today we're going to go over how networking works inside of Kubernetes. We'll pay particular attention to how the service types build on each other, as well as how traffic travels from node to node by way of iptables. A colleague, Srini, will join us afterwards to talk about some of the key features missing at layer 4 compared to layer 7 load balancing, and to demonstrate a shared load balancer project we've been working on to solve some of these problems.

So in our typical Kubernetes cluster, we're working with a few nodes, and each of these nodes contains collections of containers known as pods. One of the missing pieces we wonder about is: how does traffic get from node 1 to node 2? What is the missing piece for pod 1 on node 1 to reach pod 2 on node 2?

This is where the Container Networking Interface comes in. CNI provides a spec and libraries for bringing up and tearing down connectivity between containers on the node. It takes care of IP address management, known as IPAM, handing out all of those IPs from a range specified at startup time to your API server; these can be either IPv4 or IPv6 addresses.

You'll see a variety of different plugins inside of CNI. Thick plugins are normally associated with a brand name like Calico or Cilium, and a lot of these incorporate thin plugins, which are found in the CNI plugins repository; these do things like setting up a Linux bridge, DHCP, or port mapping. Also within the types are underlay plugins versus overlay plugins. Underlay plugins run in conjunction with your existing network, much the same way switches and routers do; these are normally considered a bit simpler than their overlay counterparts, and performance increases can be seen by using an underlay plugin as well. Popular protocols include BGP and OSPF. On the overlay side, we see a separate network created atop your underlay (thus "overlay"): it creates its own virtual network, segmenting your traffic from the underlying network. Popular protocols for this include VXLAN and GRE.

So in a typical VM setup, we have one app per VM, and this app communicates with other apps by going out through the eth0 interface on the machine. Now, in a Kubernetes setting, we have multiple pods running on each node.
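Before moving on to how those pods talk to each other, a quick aside: I mentioned that IPAM hands addresses out from ranges passed in at startup. Here is a minimal, illustrative sketch of where those ranges might be declared, assuming a kubeadm-bootstrapped cluster; other installers pass the same ranges through their own flags or config files, and the CIDR values below are just example placeholders.

```yaml
# Illustrative kubeadm ClusterConfiguration: the address ranges IPAM draws from.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16      # pod CIDR that CNI IPAM allocates pod IPs from (example value)
  serviceSubnet: 10.96.0.0/12   # range that Service cluster IPs are allocated from (example value)
```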
So, with multiple pods per node, how can these pods reach pods on other nodes through the node's single eth0 interface? Well, this is where CNI comes in. With CNI, each pod has its own IP, and each container within the pod gets assigned a unique port. We then have a collection known as an Endpoints object, which holds all of these individual IPs representing the different pods.

We're able to do this by using a Linux technology known as namespaces to simulate the one-app-per-VM setup we saw before. Each pod gets its own network namespace with its own eth0, and it communicates with the root network namespace by way of a virtual Ethernet (veth) pair set up for each individual network namespace; these are bridged together to exit through the node's eth0. So in this setup, pod network namespace one communicates with its virtual eth0, that hits the bridge, and traffic flows out through the node's eth0 interface and on to node two.

We collect all of these individual pods together into an Endpoints object. As we can see here for the kube-dns pods, we have two different nodes, and each of those nodes has a unique IP address representing the pod on it. There are two lists inside the Endpoints object: one for ready endpoints, and one for not-ready endpoints that have not passed their health checks yet.

We can refer to all of these using one VIP by way of a Service. Similarly to the Endpoints object, we're selecting over labels on pods: for the frontend Service, we look for all of the pods matching the frontend label, and we target port 9376, which gets tacked onto our cluster IP and points to port 80 inside the pod. Because we can have multiple containers inside the pod, we want to make sure we hit the actual web application and not something like a logging agent. All of this works by way of the cluster IP, which gets assigned from a range passed at runtime to your API server. These addresses are reachable only from inside your cluster, so they cannot be hit from outside unless you've bridged your way in.

Another example is the NodePort service. This opens up a port, specified by whatever you put in for nodePort, on each of your individual nodes, such that I can hit the public IP of any given node at that node port and be directed to the cluster IP representing the pod I want to hit, at its given port; these things wrap together.

The final way to do this is with the LoadBalancer service. These are fairly cloud specific, but in the end you normally get a TCP load balancer from your cloud, and it is assigned a publicly addressable IP, as we can see at the bottom of this example. If we look into the example, we still see a node port and a target port of 80, because all of these different service types get wrapped up together in the end: when I hit the publicly addressable IP of the load balancer, I get bounced to a node port backing the given service, and then to the cluster IP I want to hit. There are certain cloud providers that allow you to go directly from the load balancer to the given pod, and they also have features like Pod Ready++ to verify that the load balancer can successfully reach the given pod; if not, it isn't added to the load balancing pool.

The one thing to note here, which we'll discuss a little later, is that when I create each of these individual services, I get a separate load balancer every single time.
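To make the layering concrete, here is a hedged sketch of a single manifest that exercises all three layers at once. The frontend name and the 9376-to-80 port mapping follow the example just described; the nodePort value is an assumption, since it is usually auto-assigned.

```yaml
# Illustrative Service of type LoadBalancer: the cloud LB forwards to the nodePort,
# the nodePort forwards to the cluster IP and service port, and that DNATs to the
# targetPort of a pod matching the selector.
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: LoadBalancer
  selector:
    app: frontend        # pods carrying this label become the endpoints
  ports:
    - protocol: TCP
      port: 9376         # service port exposed on the cluster IP
      targetPort: 80     # container port inside the pod
      nodePort: 30080    # opened on every node (assumed value)
```

Dropping `type: LoadBalancer` back to `NodePort`, or to the default `ClusterIP`, peels away the outer layers one at a time, which is exactly how these service types nest.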
Now, how does all of this traffic get where it needs to go once it hits that given node port, and what if the pod isn't even running on that node? This is where kube-proxy comes in. It's a component that runs on each of your nodes, and all it does is sit there and watch for Service creation requests coming into the API server; then, in the default mode, it creates iptables rules to direct this traffic in and out of nodes based on the cluster IP, the node port, and so on.

There are a couple of different modes, though. As mentioned, iptables is the default, and all it really does is create a series of rules in the PREROUTING hook of iptables. This is fairly simple, and it's "pretty simple" to debug; I put a trademark symbol on that, because it really depends on how often you've stared at iptables output. In iptables mode the output can get kind of messy; we'll go through an example of this a little later. That's because iptables creates a rule for each individual backend representing the pods, and you can have multiple rules depending on how many nodes a given pod is running on. The lookup is really an O(n) kind of problem.

To offset this a bit, the IPVS kernel module is used in IPVS mode. It is specifically suited for load balancing, it has a more constant lookup time as the cluster grows, and it also offers several specific load balancing algorithms that can help you shape traffic, such as round robin, least connection, or shortest expected delay. Be sure to check whether your CNI plugin supports IPVS: while it has existed in Kubernetes releases for a while, some CNI plugins still can't leverage it and rely on the iptables mode.

General advice on which one to choose: IPVS has been shown to scale a little better in terms of CPU time, as well as round-trip time, once you get above roughly a thousand services. However, you only see a modest improvement over iptables if, in the iptables mode, you already make sufficient use of application keep-alive connections.
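If you do want to try IPVS mode, it is selected through kube-proxy's configuration. The following is a minimal sketch, assuming the kubeproxy.config.k8s.io/v1alpha1 configuration API; check the exact fields against your cluster's version before using it.

```yaml
# Illustrative KubeProxyConfiguration: run kube-proxy in IPVS mode with round-robin scheduling.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # "rr" = round robin; "lc" (least connection) and "sed" are other schedulers
```

In a kubeadm-built cluster this configuration typically lives in the kube-proxy ConfigMap in kube-system; other installers have their own mechanisms for passing it in.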
Let's delve into the iptables mode, though, because it is the default. Since this mode essentially just writes out iptables rules, we can work out how we're going to reach our services by inspecting those rules. iptables-save spits out a ton of information, most of it pretty scary, but in the end it just gives us a linear list of rules that gets followed until matches for our individual services are found.

So let's start by looking at a kubectl get on a given service, the Kubernetes dashboard. In this example, when we look through our iptables-save output, we get this rule in the KUBE-SERVICES chain that matches the service's cluster IP, the 172 address there. We're matching on the TCP option, and we even have a helpful comment describing what this rule is used for: it says this is the kubernetes-dashboard cluster IP. We want to match all of the TCP traffic destined for this given service and jump to the KUBE-SVC-XG... chain, and so on. So let's look for that KUBE-SVC hash in the output. That next jump leads us to yet another chain, a KUBE-SEP entry, and this line in the end directs us over to the final destination, the 172.30 address, which we can confirm matches if we do a kubectl describe on the pod. The DNAT you're seeing in here is the destination network address translation that happens as we jump through these rules toward the pod.

Now you might be wondering: why do we need to jump through all these hops instead of just pointing from the first line of the iptables rules straight to the given endpoints? The answer is a lot clearer when you have multiple pods. So let's take a look at a service, node-local-dns, that we know has multiple pods spread across multiple nodes. If you first look up the cluster IP for node-local-dns, you find multiple rules, one for each protocol; we have TCP and UDP, which you can see in the -m flag. Let's trace the UDP jump. Once again our grep shows two different rule sets, and we see more options beyond the simple jump we saw for the Kubernetes dashboard. Keep tracing that UDP one, and this option is the random probability mode of iptables: it uses a random number generator so that the first rule catches about 33% of the traffic for one endpoint, and the second rule sends 50% of what remains to a different endpoint. On subsequent packets, conntrack remembers and forwards requests over the same connection, so a single connection is not going to hit multiple different backing pods. And if we continue to follow that 33% rule, we land on our final rule, which just DNATs the traffic over to our destination endpoint at 172.x.

So, in summary: each service will have a KUBE-SVC rule for each port, and we'll also see a number of KUBE-SVC hash entries with various endpoint weights for each port. Each port and endpoint will have a small number of KUBE-SEP hashes with the denoted pod endpoint, depending on how many different nodes it is running on; the exact number can also be influenced by the total number of endpoints and whether you have a bunch of node ports or load balancers in the mix. So you can see that a huge chunk of iptables is dedicated to this.

We can also use DNS to refer to these things, so we don't have to remember cluster IPs, and Kubernetes provides this with CoreDNS. A service's cluster IP is stable, so you don't need to constantly be flushing your DNS cache in order to hit the IP that you want.

The final method we can use for reaching things inside our cluster is Ingress, and Ingress operates at layer 7. Popular controllers that provide this are HAProxy and ingress-nginx. The current API is a little limited in scope in order to have maximum portability at the HTTP level, but there have been enhancements to tighten this up and make it a little clearer with the v1 version offered in Kubernetes 1.19. Here's a basic example, which will be familiar if you've seen NGINX or Apache configuration: we're focusing on the my-app Service, which is backed by something listening on port 80, and once traffic hits the ingress controller, addressed by an IP, at the special path shown there, we want to forward it on to my-app.
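As a rough sketch of the kind of manifest being described, using the networking.k8s.io/v1 API that became stable in 1.19 (the my-app name and port 80 come from the example above; the path and the rest are illustrative assumptions):

```yaml
# Illustrative Ingress: route HTTP traffic for a path to the my-app Service on port 80.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - http:
        paths:
          - path: /my-app          # assumed path; use whatever prefix your app expects
            pathType: Prefix
            backend:
              service:
                name: my-app       # the backing Service from the example
                port:
                  number: 80
```

An ingress controller such as ingress-nginx or HAProxy then watches for objects like this and programs its own layer 7 proxy accordingly.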
Now let's move into the second part of our presentation with Srini, to talk about ways we can get around some of these limitations, where we end up with one load balancer per service.

Hello everyone, thanks, Chris. Chris has talked about networking in Kubernetes, and he also talked about the different types of services, like NodePort or LoadBalancer, as you can see here. These services are used to connect to a backend and forward L4 traffic to a workload, and they are dedicated. With L7, though, we have ingress controllers, which allow you to have incoming traffic connected to multiple workloads in the backend; that means we can share a single connection. There are many popular ingress controllers, like NGINX, Traefik, and others. But if you look at L4, something is missing here. I would like to demo a solution to the problem we have: how can I expose my internal L4 workloads in a shared way, using something like an L4 ingress?

The problem, as you can see, is that a user has to know how to connect to the load balancer for service A or for service B in order to reach the backend workloads behind each of them. This is not a hypothetical requirement: we are trying to solve it because an internal team of ours needed such a shared connection for their L4 services, to minimize the cost of connections, and they wanted the application to remain portable. The primary motivation is cost, of course. We also wanted our solution to be user-friendly, so that I do not have to remember the IP addresses of all the load balancers I am creating, and to give us a uniform way to manage the infrastructure.

The main problem can be broken into three simpler problems. How do I open additional ports on a load balancer to make it shareable? How do I associate those ports with the backing pods? And how do I give the access information back to the end user? For example, the user should be able to create a simple Kubernetes object and get back the IP address and the port of the well-known service they want to connect to.

So if you look here, I have a custom resource object called SharedLB. The expected goal is that you get this information through kubectl and are able to connect to your application, with the load balancer staying transparent. You see the custom object here, called SharedLB, that provides the connectivity information you need, and it also refers to the cloud infrastructure's load balancer, just in case. There are four SharedLB instances here, and all four are using the same external IP, which is the IP of the load balancer, with different ports: 4001 connects to one backend application, 4002 connects to a different backend application. To simplify the view: instead of two load balancers, we now have one shared load balancer with two ports, port A and port B, connected to the different backend applications. Now my applications are sharing the load balancer.
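To give a feel for the user-facing side, here is a rough sketch of what such a custom resource might look like. The API group, field names, and values below are illustrative assumptions for this writeup, not the project's actual schema.

```yaml
# Hypothetical SharedLB custom resource (field names are assumptions, not the real CRD schema).
apiVersion: example.com/v1alpha1    # assumed API group/version
kind: SharedLB
metadata:
  name: app-a
  namespace: team-a
spec:
  port: 4001            # port to expose on the shared load balancer
  protocol: TCP
  selector:
    app: app-a          # pods backing this entry
status:
  externalIP: 203.0.113.10   # filled in by the controller: the shared LB's address
  port: 4001                 # the port the user actually connects to
```

The point is that a kubectl get on this object is enough to tell the user which IP and port to connect to.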
This shared approach is more cost-effective and user-friendly, with minimal operational effort; it reuses existing Kubernetes assets without reinventing the wheel, and it stays consistent with the Kubernetes programming model.

Let me explain this in detail. A load balancer's incoming port is connected to a node port of an internal Service, which we create, and that leads to the target port of the pod that is running the workload. Three things are happening here: we derive the information from the custom resource object that the user created; we create a Kubernetes Service in the backend; and we create, or reuse, an existing load balancer by just opening a port on it and then associating the Service with the load balancer, so the entire connectivity path is set up for us.

As you can see, there are five steps in the diagram. In step one, the user creates a SharedLB with the information we need. In step two, using that, we either find an existing load balancer or create a new one, and we open a port on it. In step three, we create a normal Kubernetes Service with source ports, target ports, and the proper label selectors derived from the spec of the custom resource object, and this is connected to the backend workload using standard Kubernetes mechanisms. In step four, once we do that, we get back the information about the IP and the port we have created. In step five, we put that information into the custom resource object, and the user runs a kubectl command to read it from the SharedLB.

We use the cloud provider's SDK to do things like open a port and configure the security group's incoming rule, to make the firewall happy to pass traffic through to this port. We create the internal Service that uses Kubernetes networking to talk to the workload, and we make sure traffic hits that internal Service through our load balancer port routes. We are using CRDs as a facade for the end user, and namespaced CRDs are used, so there are no security concerns. We create the real load balancer on an on-demand basis and manage its lifecycle. We make N configurable, which is the capacity of the load balancer: by default it is five, so you can have five connections on one load balancer, or you can tune it to whatever number you want, depending on the throughput of your workloads, the latency of the workloads, and so on. We adopt all the best practices for the controller, like owner references and finalizers, to clean up the objects.
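To make step three a little more concrete, here is a hedged sketch of the kind of internal Service the controller might derive from the custom resource's spec; the names, labels, and ports are illustrative, not the controller's actual output.

```yaml
# Hypothetical Service derived from a SharedLB spec: the shared LB forwards its port
# to this Service's node port, which reaches the workload's target port.
apiVersion: v1
kind: Service
metadata:
  name: app-a-sharedlb      # assumed naming convention
  namespace: team-a
spec:
  type: NodePort            # the load balancer forwards to this node port on every node
  selector:
    app: app-a              # label selector derived from the SharedLB spec
  ports:
    - protocol: TCP
      port: 4001            # source port taken from the SharedLB spec
      targetPort: 8080      # container port of the workload (assumed)
```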
Let us do a demo. Our solution works on all the cloud providers, but right now I want to show it on Google Cloud. We have a three-node cluster on GKE, and we create four deployments, four workloads, running in GKE. We have a definition of the shared load balancer for GKE, and if you look at the SharedLB custom resource objects right now, there are none. So we need to create four, one for each of the four workloads. We create all four, and at this point in time our controller reacts and creates a load balancer. How many load balancers do you think it created? It creates only one, even though we pushed four SharedLBs to be created concurrently; the controller is smart enough to create a single load balancer for you. If you go to the GKE console and look at the load balancers, there is one load balancer created. Of course, we create this load balancer with one default dummy port, which we need for creation, so 23333 is used as that port; we can ignore it.

Now we have started processing these four SharedLBs, so that means we are going to open four ports. As you can see, two of those SharedLBs are already processed, and they have the IP address and port numbers assigned. The associated Services show the node ports we created behind the load balancer: 31725, 30205, 31878, and 31174. All of these ports are created on the load balancer; if you go back to the GKE console and refresh, you see all four ports open on the load balancer. Not only that, we also created a firewall rule on this network: we open all the ports from 30000 to 32767, for both TCP and UDP. We could create one firewall rule for each of the ports we open, but this way it's simple: I have one rule. And as you can see, there is only one external IP that we are paying for, with four forwarding rules for the four workloads running inside our Kubernetes cluster. Each of these ports corresponds to one workload. The SharedLBs are now all populated with the external IP and the port, so you can curl any of these IPs and ports and you should be able to reach the backend application.

Now, going back to our presentation: similar demos are available for Azure, Amazon, and IBM clouds, but in the interest of time I would like to conclude. Our contact information is provided here; if you are interested in this solution, you can reach out to us. With that, I'll open up for questions. Thanks for listening to us; we're grateful to be here, and grateful for your questions.

So we're around for the next few minutes to answer any questions you might have about the presentation. We have one question here in the chat about what the potential cost savings of this approach would be, and it's really dependent on what cloud you're using. A lot of this is based on prices that get charged per load balancer created, or if you have a significant number of services, so it's really going to depend on your cloud provider. It's a little tough to figure out exactly how much the savings, or the exact dollar amount, would be.

On the second question: can the shared load balancer CRD you made work with a bare metal infrastructure? Yes, it could, assuming that you are using some sort of bare metal infrastructure that provides APIs for gathering that sort of information. For example, in the OpenStack world there are different types of APIs available that some people have ported over to work on bare metal as well. So if there's an API, one could do it.

Another question: what's the performance of the integrated load balancer, assuming it's running in kernel space, and how do you control and guarantee some level of performance? This is also going to be dependent on the load balancer of your choice from your cloud provider. All we're really doing with the CRD is talking to the load balancer that you already have, so it's going to be the same level of performance that you're already getting from that cloud provider.
We're just slotting in more information. Fourth question, on running a private infrastructure: they see around 4 million requests per second on their private cloud, and they're worried about what to do since they don't have infinite capacity on the load balancers; what technology can chain multiple load balancers, like NGINX, to distribute the load across all of them? This is kind of a general Kubernetes question. You can often have an overall load balancer atop some of these things to distribute the requests; there's nothing in Kubernetes natively that does this. It's almost more of a layering approach, such that you can distribute traffic with a master, global load balancer, or you can use a project like Istio, or roll your own on Envoy, so that you have a more global view and can spread traffic out based on your requests or whatever circuit breaker policies you have set up.

So those were the four questions. If you have follow-up replies or questions, please visit us in the networking spotlight channel on Slack, and we can answer more questions in there.