All right, welcome everybody to the March 1st OpenShift Commons briefing. My name is Paul Mori, and I'll be your host today. I have with me Yuri Tsarev from Absa Group, and he is going to talk to us today about k8gb, the Kubernetes global balancer. So why don't you take it away, Yuri?

Hi, thanks a lot, Paul. I'm Yuri, a principal engineer at Absa, and I'm part of the platform engineering team. We are mostly focused on building advanced automation on top of Kubernetes, and one of the open source projects we produced in 2020 is k8gb, the Kubernetes global balancer. So, roughly around the concept — guys, can you hear me? Yeah, we can hear you, go ahead. Sorry, there was some audio trouble. Thank you.

So this project originated from the need for a global load balancer that is Kubernetes native and cloud native. We tried multiple vendors, and none of them worked well for us, so we decided to develop a global balancer from scratch. This global balancer takes a non-standard approach to load balancing: it doesn't pass the traffic through itself. It actually modifies DNS responses on the fly, and it monitors the Kubernetes primitives from inside the cluster. It is developed following the operator pattern, as coined by CoreOS back in the day. It doesn't have any single point of failure: the controller is installed on the target clusters where the workloads are running, and there is no control cluster, so there is no single point of failure in that regard. Everything is built on top of standard Kubernetes primitives — Ingresses, Services, the associated Endpoints, and down to liveness and readiness probes. The core of the operation is the DNS protocol, and since it runs the internet, it is pretty reliable. It obviously has its limitations, but for the global load balancing scenario it works pretty well. And we tried to build k8gb this way — as independent of the environment as possible.
Meaning that we rely on the environment's DNS, which we call the edge DNS — Route 53, for example, or Infoblox in an on-prem scenario, or NS1, for which we have another integration. We configure only the DNS zone delegation on, say, Infoblox, and the rest of the DNS responses are served dynamically by k8gb itself. From an implementation standpoint, we used the Operator SDK, and it worked very nicely for us — it allowed us to bootstrap the project pretty quickly. Another integral part of k8gb is CoreDNS; that's exactly the tiny part that serves the DNS responses dynamically and basically steers traffic to the desired clusters according to the load balancing strategy. ExternalDNS takes care of the communication with the external DNS providers I mentioned — Route 53, NS1, Infoblox, and maybe others in the future; those three are already very well tested. That part takes care of the automated zone delegation.

We also used to run a dedicated etcd cluster, managed by the etcd operator, to populate a local etcd database dynamically: ExternalDNS would read the information from our dynamically populated DNSEndpoint CRD and create the corresponding etcd entries locally, and CoreDNS would eventually read from that etcd — the so-called SkyDNS backend. It worked more or less out of the box, but it had quite a few reliability and maintainability problems. For example, the etcd operator has been dropped by the community completely, and etcd itself wasn't working reliably enough: on long-running clusters we would find etcd in a degraded state from time to time, which was a problem for k8gb's own reliability. That's why, in a recent version, we dropped etcd and the etcd operator completely and replaced them with a CoreDNS build that includes a custom Kubernetes CRD plugin we developed recently.
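For context, the zone delegation that ExternalDNS automates here boils down to standard NS delegation records in the edge DNS. A hypothetical fragment of the parent zone (the domain, host names, and IPs below are invented for illustration) might look like this:

```
; Hypothetical fragment of the edge DNS (parent) zone example.com.
; The subdomain used for GSLB hostnames is delegated to the CoreDNS
; instances exposed by each cluster, which answer authoritatively.
cloud.example.com.        IN NS  gslb-ns-eu.example.com.
cloud.example.com.        IN NS  gslb-ns-za.example.com.
gslb-ns-eu.example.com.   IN A   203.0.113.10   ; EU cluster CoreDNS LB
gslb-ns-za.example.com.   IN A   203.0.113.20   ; ZA cluster CoreDNS LB
```

With that delegation in place, any resolver asking for a name under `cloud.example.com` is sent to one of the clusters' CoreDNS instances, which is what lets k8gb serve the answers dynamically.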
So CoreDNS in our case can read the information directly from the DNSEndpoint resource — I will show how everything works in the demo. We now create these dynamically formed DNS responses out of the DNSEndpoint CRD, which CoreDNS reads straight from the Kubernetes API, bypassing the unnecessary layer of etcd and the etcd operator, and that made the whole setup much more reliable. And everything is driven by a single CRD of the Gslb type.

We have a fair number of integrations on the DNS provider side. I mentioned Infoblox — we are not just testing that one, we are using it in the bank; it was the first provider we implemented, as it fulfilled our business needs. There is Route 53, used in the AWS reference setup that I will show you today on top of two geographically dispersed EKS clusters, and NS1 as our newest integration — we are in very close collaboration with those guys, they are amazing. And we have a very nice open source cooperation with the Admiralty project, which does multi-cluster scheduling. It works very nicely together: Admiralty schedules workloads on top of multiple clusters, and k8gb enables global load balancing for them. There is an associated tutorial on the websites of both projects.

So let's go to the demo. Just quickly, on the project website, k8gb.io, to provide some context for the demo: we will run two EKS clusters.

Yuri, I wonder if you could just bump the font size up or zoom in on that picture a little. — I'll try. Is it somehow better? — There we go. Cool, thank you for that.

So we will run a demo setup similar to the picture here: two geographically dispersed clusters, one in Europe, in Dublin, and another one in Africa, in Cape Town. k8gb is already deployed on both clusters, there are some simple workloads, and we will work with the Gslb custom resource definition.
It looks like this — it's on the index page. It is of kind Gslb, and it obviously has its own API group. It embeds an Ingress resource type, so the spec is basically the same as a standard Ingress spec; under the hood it is actually exactly the same Go type, embedded into the Gslb. We specify the usual host and backend service, and the controller will be monitoring their healthiness. On top of that, we attach the specific load balancing strategy.

So let's go straight to the demo, I guess. Is my console visible and readable? — It's visible to me, yeah. — Cool.

All right, so what are we running here? Two clusters. As I mentioned, we can check the geographical locality with kubectl get nodes. Okay, and I'm surprisingly logged out — sorry for that, the token just expired, wasn't very lucky. So yes, we are in Europe, and the second cluster is in Cape Town — we can see it from the node names right now. On the right pane, we are continuously running a demo script that polls the associated FQDN — it's just a while-true loop — and it grabs the message field, because the sample application's response contains a geo tag that demonstrates where it is located. It's super simple. So we are continuously polling it, and currently everything is healthy and served from Europe, as we can see by the geo tag here.

Let's investigate the testing setup. First of all, what the k8gb installation looks like: the controller itself, CoreDNS, and ExternalDNS, which takes care of the edge DNS configuration — effectively the zone delegation. As you can see, it is a pretty minimal footprint as of now, especially after we got rid of the etcd operator and etcd, in particular the clustered part. As testing workloads, we are running in the test-gslb namespace the standard podinfo application, which returns exactly this response with a geo tag. And we have the Gslb resources.
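To make the shape of the resource concrete, a failover-flavored Gslb as described above looks roughly like the following sketch. This is based on the k8gb documentation around the time of this talk — treat API version, field names, and values as illustrative and check the current docs:

```yaml
# Illustrative Gslb resource: an embedded ingress spec plus a strategy.
# Host, service, and geo tag values are taken from the demo narrative.
apiVersion: k8gb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: test-gslb-failover
  namespace: test-gslb
spec:
  ingress:
    rules:
      - host: failover.test.k8gb.io     # FQDN the demo polls
        http:
          paths:
            - path: /
              backend:
                serviceName: frontend-podinfo   # workload k8gb health-checks
                servicePort: http
  strategy:
    type: failover        # or roundRobin
    primaryGeoTag: eu     # pin the European cluster as primary
```

The controller watches the referenced service's health (down to the pods' readiness and liveness probes) and publishes the corresponding DNS records accordingly.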
So this is the special failover Gslb, deployed from a spec very similar to the one I showed on the index page: failover.test.k8gb.io as the host, the backend service that we are running in the test-gslb namespace, the failover strategy, and the European cluster pinned as the primary one. In case of some malfunction, it should fail over to the African one, to Cape Town.

At runtime it looks like this — it's better to show it with the status. The controller detects that the service is healthy — the one running in the test-gslb namespace, the frontend-podinfo one, the one referenced in the embedded ingress spec — and it populates the DNSEndpoint with the IP addresses of the load balancer associated with the workload's ingress. If we get the ingresses in test-gslb for the failover resource, we see the hostname of the AWS load balancer we are running, and if we dig that hostname, you'll see these addresses — exactly the ones that get populated into DNS by k8gb. If we dig right now, they are identical. So basically, the workload is currently running and we are pointing to the European cluster.

If we go to the Cape Town cluster — just to verify: yes, we are in Cape Town, in Africa — you'll see that we are running exactly the same setup, the same spec without any modification, and both clusters return consistent responses. As you can see, the African cluster also returns a DNS response consisting of the IP addresses of the European cluster, where the workload is healthy and which is labeled as the primary one.

Yuri, a question that I thought of while looking at this: how does the geoTag field of the status get computed?
Yeah, so the geo tag is specified at the very beginning, at cluster deployment time. For example, if you look at the European k8gb configuration — it's basically helm values — we specify the cluster's own geo tag and the geo tags of the clusters to talk to. In the African case we do exactly the same the other way around: the geo tag is the one we labeled that cluster with, and the external geo tag to talk to is the European one. The rest is created by convention: the zone delegation and the name server names are all derived consistently from the geo tag and the zone, so out of this geo tagging the clusters know how to contact each other through this convention, and that's how they share the information — also over the DNS protocol. — Got it, thank you.

So we can now go to the European cluster and try to emulate the failure. We can do it simply by scaling, as we usually do: scale the frontend-podinfo deployment to zero. What should happen is that k8gb detects the unhealthiness at the next reconciliation. As a limitation of the DNS protocol, we run with a DNS TTL of 30 seconds, and there is also a reconciliation loop running on top, so there will be a gap of some seconds during the failure. Let's see how fast it happens. The reconciliation loop is already done, so it is already ready to return the African IP addresses — the failover has already happened. So it was pretty quick.

What is important to stress in the failover scenario is that, again, we have exactly the mirror resource in Africa, and it also returns the same uniform response. Now it took over: both the European and the African clusters are returning the African, Cape Town IP addresses of the load balancer, and both of them are steering traffic to Africa, because podinfo in Europe is down. So everything looks great and as expected — we are getting the African geo tag in the response. So we can scale it back up in Europe and see how it returns back to the primary cluster.
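The failover behavior just demonstrated can be sketched in a few lines. This is an illustrative model, not the actual k8gb code (which is a Go operator); the geo tags and IP addresses below are made up:

```python
# Toy model of the failover strategy: serve the primary cluster's
# load balancer IPs while it is healthy, otherwise fall back to a
# healthy secondary. Geo tags and IPs are invented for illustration.

def resolve_failover(endpoints, primary_geo_tag):
    """Return the IPs to put in the DNS answer for a failover Gslb."""
    primary = endpoints.get(primary_geo_tag)
    if primary and primary["healthy"] and primary["ips"]:
        return primary["ips"]
    # Primary is down: pick any healthy secondary (deterministic order).
    for geo_tag in sorted(endpoints):
        ep = endpoints[geo_tag]
        if geo_tag != primary_geo_tag and ep["healthy"] and ep["ips"]:
            return ep["ips"]
    return []  # nothing healthy anywhere

endpoints = {
    "eu": {"healthy": True, "ips": ["10.0.1.10", "10.0.1.11"]},
    "za": {"healthy": True, "ips": ["10.0.2.10", "10.0.2.11"]},
}

# Healthy primary: both clusters serve the European IPs.
print(resolve_failover(endpoints, "eu"))  # -> ['10.0.1.10', '10.0.1.11']

# Emulate the demo's failure: podinfo scaled to zero in Europe.
endpoints["eu"]["healthy"] = False
print(resolve_failover(endpoints, "eu"))  # -> ['10.0.2.10', '10.0.2.11']
```

Because both clusters run the mirror resource and share health information over DNS, they both converge on the same answer, which is why the African cluster also returned European IPs while Europe was healthy.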
So at the next reconciliation it should pick it up. — While we're waiting for that to happen, here's another question: what is the highest number of clusters that you've used the global balancer with so far? — Well, we always use it in pairs. We even have a ticket to test it with more than two, but so far all the testing has been done with pairs. We have multiple pairs running — around 122 clusters are k8gb-enabled — but always in these pairs. Three or four clusters, for example, not yet, but it's definitely in the backlog. — Understood. I think this is a pretty new project, right? How long have you been working on it? — Well, it started in December 2019, so it's slightly more than one year. And it is already used in production for several projects, and more and more teams are adopting it, so we are obviously finding new issues and challenges along the way. For example, we are currently thinking about how to handle multi-tenancy. Our clusters are usually controlled centrally with Rancher, so there is no problem spinning up a new pair of clusters for a team that will fully own them, but sometimes we need multi-tenancy, and k8gb is not ready for that yet.

And by the way — the workload has switched back to the primary cluster, so now we are in Europe again. Everything as expected. And if you go to GitHub, there's pretty good activity. We use GitHub issues not just as a bug-reporting box but also for milestones, so we have quite a lot of plans outstanding — for example, implementing more complex strategies. Currently it's failover and round robin. I can quickly show you round robin.
Basically, it's pretty much the same. In this example we just operate several applications, several services in the background, to demonstrate the different statuses — not found, unhealthy, and healthy for the frontend-podinfo one that is scaled up and ready — with the roundRobin strategy. This one also shows additional controls over specific parameters, like the DNS TTL, if you want to make it shorter or longer; it's all available.

I can actually show the round robin at runtime. Round robin basically merges the IP arrays of both clusters and makes a standard DNS round robin out of the two clusters, as you can see. And if we emulate yet another failure by scaling down, it should return only half of the array, and then return back once it converges.

So the next step for us is to figure out how to implement more advanced load balancing strategies. We don't have urgent business needs for that — these two reliable strategies are quite enough for us as of now — but from the community standpoint we definitely want to implement something more interesting, like geographical strategies: return the DNS response closest to the requestor, and all that stuff. So we are thinking about the idea of writing a CoreDNS plugin that is more aware of the situation and modifies DNS responses on the fly. Currently the setup is nice and it does its job, but — if you get these DNSEndpoints, that's how it works on the backend: the controller dynamically populates these DNSEndpoints, CoreDNS reads them, and it responds with the specific IP addresses according to the strategy. It obviously has its limitations. It's enough for the basic load balancing strategies I demonstrated, but it's not enough for something more advanced like weighted load balancing or, again, geographical locality, because it's not dynamic enough.
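The round-robin merging described above can be sketched the same way as the failover example — again an illustrative model, not the real k8gb code, with invented geo tags and IPs:

```python
# Toy model of the roundRobin strategy: merge the load balancer IPs of
# every healthy cluster into one record set, and let standard DNS
# round robin rotate through them. IPs are invented for illustration.

def resolve_round_robin(endpoints):
    """Return the merged IP array of all healthy clusters."""
    ips = []
    for geo_tag in sorted(endpoints):  # deterministic merge order
        ep = endpoints[geo_tag]
        if ep["healthy"]:
            ips.extend(ep["ips"])
    return ips

endpoints = {
    "eu": {"healthy": True, "ips": ["10.0.1.10", "10.0.1.11"]},
    "za": {"healthy": True, "ips": ["10.0.2.10", "10.0.2.11"]},
}

# Both clusters healthy: the full merged array is served.
print(resolve_round_robin(endpoints))
# -> ['10.0.1.10', '10.0.1.11', '10.0.2.10', '10.0.2.11']

# Emulate a failure in one cluster: only half the array remains.
endpoints["za"]["healthy"] = False
print(resolve_round_robin(endpoints))  # -> ['10.0.1.10', '10.0.1.11']
```

This also makes the limitation mentioned above visible: a static merged record set is fine for failover and round robin, but weighted or geography-aware answers need per-query logic, hence the idea of a smarter CoreDNS plugin.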
Is there a particular strategy that you've heard requested many times from community folks? — Well, not yet. We are still a little bit under the radar, so there is no direct request yet. But it's very nice that Red Hat actually initiated some conversations and wants to contribute. Someone you probably know — a very nice person — approached us with very technical questions regarding k8gb, and we are going to have a very nice collaboration around integration with OpenShift. So that looks very promising. And we definitely have some plans regarding strategies: if you look at the issues, we want to add topology-aware and manually weighted strategies, and a consistent round robin, because currently it's pretty much random. So we have these plans in mind.

So far we have worked heavily on API stabilization, on the overall reliability of the solution itself, and on many enhancements for easy adoption by teams. For example, we implemented the ability to create the Gslb with a simple ingress annotation — I think it's better to show that in the documentation, so let's go there. One of the main goals of k8gb is to give the developing team power over global load balancing: instead of some standard centralized setup, we listen to the readiness and liveness probes that are defined by the application teams, and the strategy is describable in a simple CRD. But sometimes it's a little bit of an overhead to add yet another CRD into helm charts, and we have multiple teams. That's why — and it was also a community request from the Admiralty project — we added the ability to put specific k8gb annotations on top of a standard Ingress, carrying the same information we specify in the spec, like the strategy type and, in the failover case, the geo tag. The controller picks it up, creates the Gslb automatically for that specific Ingress, references it, and closes the loop that way.
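The annotation-based flow described above might look roughly like this. The annotation keys below are assumptions based on the k8gb documentation — verify the exact names supported by your version before relying on them:

```yaml
# Hypothetical sketch: a plain Ingress with k8gb annotations instead
# of a dedicated Gslb resource. Annotation keys are assumed, not
# guaranteed; host and service names reuse the demo's values.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-podinfo
  namespace: test-gslb
  annotations:
    k8gb.io/strategy: failover       # assumed annotation key
    k8gb.io/primary-geotag: eu       # assumed annotation key
spec:
  rules:
    - host: failover.test.k8gb.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-podinfo
                port:
                  name: http
```

The controller watches annotated Ingresses and generates the corresponding Gslb behind the scenes, so teams never touch the CRD directly.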
So basically, application teams are not even required to manage yet another CRD — they can provide a standard Ingress with a couple of additional annotations, and global load balancing will work for them; the controller takes care of the rest.

There's a question in the chat about whether k8gb has been submitted to the CNCF, or whether that's on the radar or the roadmap. — It's totally on the radar. I really want to submit it to the Sandbox as soon as possible. It's a really good question — we are just getting ready, so there will be some stabilization first. I think it's pretty well tested currently, both internally and by the community, and the project is in pretty good shape, so I think it's ready for the Sandbox. — If folks want to get involved, I'm sensing that the GitHub repos are the best place. Is that accurate? — Yeah, totally. We do everything on GitHub for k8gb. Again, we use issues not just as bug reports but also for feature requests and roadmap planning — anything, really, and pull requests are welcome, as is just shooting us an idea. And for chat, we are hanging out in SIG Multicluster, so you can find us there as well.

Diane, I see you have a question in the chat — why don't we take that one offline. — Absolutely. I was trying to figure out who's working on this besides Absa. Is this just coming out of Absa, or who is mostly contributing to this project? — Yeah, it's coming out of Absa, but we are trying to gather a community, and as I mentioned, we're having a very nice conversation with Red Hat, which we're happy about.

And there's another question: Vipin is asking how this is different from F5 CIS, because he's using F5 CIS. — I'm not sure I'm familiar with CIS specifically, but generally it's different from any kind of standard load balancer in two ways. First, it doesn't pass the traffic through itself.
It is traffic-less, so to speak — it works purely over DNS. And second, it is aware of internal cluster resources, so it doesn't employ any standard HTTP checks; it uses the cluster's own probes to make the balancing and steering decisions. Hopefully that somehow answers the question.

I know we have a built-in load balancer for OpenShift now — so would this replace that, Paul? — I think you're referring to the router. I don't think I have enough information about what's been discussed to comment on that. I look forward to seeing it in the Sandbox, getting more folks and more Red Hatters working on it, and seeing it again in action, integrated into OpenShift, with a demo of that sometime soon. That would be awesome. — Yeah, maybe we'll have a sequel to this one sometime soon. — That would be great.

All right, let's see — any other questions in the chat? If not, what I'd have you do, Yuri, is go back to the home page there for your project, for k8gb. — Yep. — And it's so close to "KGB" that I'm going to have to keep myself from saying that. — Yeah, that's not bad. — This is where people can go for more information, or, as you said, Yuri, the CNCF SIG channel would be a great place to find you all. I look forward to seeing it used at scale and in production and getting some more feedback on this. I think it's going to be a great addiction — a great addition, I almost said addiction — to the CNCF and to the open source communities. This is a really very interesting project, so we'll definitely be following along closely. Thank you so much. Thanks for joining us, Yuri. — And just one thing: I think that was Kubernetes SIG Multicluster. — Yeah, that's right. So basically, I'm utilizing your Slack channel, guys. — That's okay, that's good — that's what it's there for. Perfect. All right.
Well, it's great to hear from the Absa group, and we're looking forward to hearing more great things. This is the beauty of open source. — So thanks, Yuri, and Paul, thanks for having us today. — All right, thank you so much. Thank you, everybody — see you soon. Bye bye. Thanks a lot.