Okay, thanks for being here. Today I'm going to be talking about load balancing, specifically its applications to networks: what load balancers are and how they are implemented. I'm mostly going to focus on algorithm analysis, which algorithms are used for load balancing, and then I'll touch a bit on the services that are out there to implement load balancing.

The idea for this came to me through one of the courses I was doing last semester, so last year. I was taking a course on system administration, where I was learning about servers and networks: what types of servers there are and how they are configured. One of the more obvious use cases of servers, to me as someone new to this, was how they manage load. As someone who has spent a lot of time on the internet, I always wanted to access whatever I could in the quickest time possible. That's how I actually came upon load balancing, and hence the title of this talk.

A bit about myself: I am Shiven Mian, a sophomore at IIIT Delhi doing computer science and engineering. I did Google Summer of Code last year under the Loklak team, and I was also one of the people behind the SUSI AI that you have all been seeing a lot of. I do a lot of stuff, I've been meddling around with AI, and this is obviously a DevOps talk; I end up working with whatever I find interesting. You can contact me at that email. I know this is a bit hard to memorize, but that's my GitHub handle, and you can always look me up on Facebook.

So what's the big deal, what is load balancing? It's very simple: load balancing is a technique to distribute load optimally across servers. That's as simple as it gets. By optimal I mean you want to maintain the minimum response time for accessing the service, get the maximum throughput, and reduce the overhead, the load, on the servers. As you can see, traffic comes in and the load balancer acts as a middle person between the client and the servers: depending on some algorithm, and obviously on some parameters of the servers and the clients, it routes each request to one of the servers.

So why do we need load balancing? The first reason is simplification: the load balancer is a single point of entry for the client, because, as I said, it acts as a middle person between the client and the server. When a client wants to access a website, the IP they actually hit is the load balancer's, and the load balancer routes the request to one of the servers, invisibly to them. That's why it's a single point of entry, and that gives you abstraction. It's also scalable, because you can involve as many servers as you like and have as many clients as you want; bigger systems and bigger companies even use distributed load balancers. And then there's reusability: load balancers reuse the connections to the servers, a technique called TCP multiplexing. The load balancer maintains a server pool of sorts, with the addresses of all the servers, and whenever a client request comes in it just picks a connection from the pool and allocates it to the client, and all of this can be reused. That's reusability, and the idea looks roughly like the sketch below.
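As a toy illustration of that pooling idea (a sketch under assumed names, not any real balancer's API):

```python
from collections import defaultdict

class ConnectionPool:
    """Toy model of TCP multiplexing: reuse open backend connections."""

    def __init__(self, servers):
        self.servers = servers
        self.idle = defaultdict(list)   # server -> idle, reusable connections

    def acquire(self, server):
        # Reuse an idle connection if one exists, otherwise "open" a new one.
        return self.idle[server].pop() if self.idle[server] else f"conn->{server}"

    def release(self, server, conn):
        # Hand the connection back so the next client request can reuse it.
        self.idle[server].append(conn)

pool = ConnectionPool(["10.0.0.1", "10.0.0.2"])   # hypothetical backends
c = pool.acquire("10.0.0.1")
pool.release("10.0.0.1", c)   # a later request gets this same connection back
```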
Then failover, one of the most important features load balancers are supposed to implement: when a server goes down, the rest should keep working perfectly, and you should be able to balance load between the remaining servers. Load balancers mostly involve things like error reporting and logging too, and it's obviously dynamic: if one of the servers goes down, the load is balanced between all of the others, and if you want to add a new server, you can add it and the load will be distributed across it as well.

Responsiveness, as I said: one of the core features of load balancers is to minimize response time and distribute load efficiently. The response time obviously depends on the kind of algorithm you're using. This is pretty obvious: if you don't have a load balancer and you're just sending requests to one or two servers, there's a high chance of many requests accumulating on one server, which uses up that server's resources and introduces some lag. So the algorithm plays a big part in how the load is distributed, and this is really important.

Now, this slide is probably a bit misleading: an ADC isn't actually a consequence of load balancing, it's the other way around. ADC means application delivery controller, and load balancers are implemented within application delivery controllers these days. An ADC is software or a device that performs some common tasks on behalf of the web servers so that the load on them is reduced. A very trivial example: suppose you want to go to the admin page of some website, say Stack Overflow, at stackoverflow.com/admin.php (it exists, by the way). Obviously an anonymous user should not be able to access that; only an authenticated user should. What you can do is place the authentication in the load balancer instead of in all of the servers, so the load on them is reduced: the request hits the load balancer, the ADC, directly and gets answered or redirected there. So an ADC implements a client-to-server-to-client flow, and it has an additional capability: it can modify the response that the client gets. That reduces server load. ADCs have load balancers built in, and this is how load balancers are implemented today. The offloading idea looks something like the sketch below.
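A minimal sketch of auth offloading at the balancer, assuming a hypothetical Request type and backend list rather than any vendor's API:

```python
from dataclasses import dataclass, field
import random

@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)

BACKENDS = ["10.0.0.1", "10.0.0.2"]   # hypothetical web servers

def handle(req: Request) -> str:
    # Offloaded check: the ADC answers protected paths itself, so the
    # web servers never spend cycles on unauthenticated admin traffic.
    if req.path.startswith("/admin") and "Authorization" not in req.headers:
        return "401 Unauthorized (answered at the balancer)"
    backend = random.choice(BACKENDS)  # any balancing algorithm plugs in here
    return f"forwarded to {backend}"

print(handle(Request("/admin.php")))      # stopped at the ADC
print(handle(Request("/questions/1")))    # balanced to a backend as usual
```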
Now, some services. There are hardware-based and software-based load balancers in the market. On the hardware side, F5 is one of the better-known names; they actually provide load balancers to Microsoft, and F5 is the company that came up with the term ADC. On the software side there's NGINX, which everybody here has used, and it's pretty easy to use: you just need a config file, you add your parameters, and it starts up. There's another one called Balance, which is actually open source, which is why it's here, and there's NetScaler from Citrix, and so on.

Now let's get to the fun part. This is going to be tasty, because I'm going to be talking about algorithms now. While I was doing my project, I was reading about which algorithms are actually used, and I observed that many of these companies use really naive algorithms like round robin, least connections, or random. Random is pretty easy, right? All of these are really trivial to implement. What actually struck me was: is there a better way to do this? You can have many, many parameters, a much larger load, and maybe features of the servers that you haven't considered and may have to factor in. So that's what I'm going to discuss: how efficient are these algorithms?

Round robin is basically: if you have k servers, the i-th request goes to server (i mod k). The first request goes to the first server, the second to the second, and the (k+1)-th request wraps around to the first; that's i mod k. Least connections routes the request to the server with the least number of active connections. And random is random: every server has an equal probability of being hit.

The pros: all of these are obviously very simple to implement and scalable; you can add any number of servers you like. Random has easy failover: even if something goes down, it's random anyway, so it can easily balance between the rest, and random has very few edge cases; there are very few instances where random completely fails. Least connections, by contrast, has some really big edge cases, which is why it's not preferred. For distributed load balancers it's the same approach: you have many load balancers, and if one implements random and the other implements random, the combination is still random.

What's bad here? Obviously, you're not factoring in request latency at this point: it's not guaranteed that every request takes the same amount of time, or that every server responds in the same amount of time. And least connections and round robin have a lot of edge cases. I'll give you one. Suppose you have n servers, all carrying something like 100 or maybe 1,000 connections each, and you add one more server. With least connections, that new server has no connections, so it's the least-connected one, and as soon as you add it, the load balancer goes bam: all the new connections are routed there, and under that much load the server fails. That's the edge case, and it effectively reduces your usable capacity. A rough sketch of the three naive strategies follows.
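Here is roughly what the three strategies look like in Python; the server list and the active-connection counts are hypothetical stand-ins for whatever state a real balancer would track:

```python
import random
from itertools import count

SERVERS = ["s1", "s2", "s3", "s4"]     # hypothetical server pool
active = {s: 0 for s in SERVERS}       # open connections per server; a real
                                       # balancer updates these as they change
_rr = count()

def round_robin():
    # The i-th request goes to server (i mod k).
    return SERVERS[next(_rr) % len(SERVERS)]

def least_connections():
    # Route to the server with the fewest active connections.
    return min(SERVERS, key=lambda s: active[s])

def pick_random():
    # Every server is equally likely to be hit.
    return random.choice(SERVERS)

print(round_robin(), round_robin(), least_connections(), pick_random())
```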
I'm going to analyze why this happens through one of the simplest and most well-known problems in probability theory: the balls-into-bins problem. The problem is: given m balls thrown into n bins, what is the maximum number of balls that end up in a single bin in the most likely outcome? There's obviously a chance that all m balls land in one bin, but that's really not likely; so in the maximum-likelihood circumstance, how full does the fullest bin get? Here, obviously, the balls are your requests and the bins are your servers, and that's how we can use it for analysis.

The solution to the balls-into-bins problem was worked out in a paper by Raab and Steger, and it's actually quite nice; you can feast your eyes on it. I can't explain much of it here: they break the answer into four cases depending on how m relates to n and give the maximum load with high probability in each case. The headline special case is m = n, where the fullest bin gets about ln n / ln ln n balls with high probability. Being the guy I am, I didn't go much into the proof; I wouldn't understand it that much anyway.

Well, let's go to some code. Say we have n bins and m balls; here n is the number of servers and m is the number of requests, set to some values. I randomly pick server indices and increment the chosen server by one each time, which is basically a request hitting one of the servers. Then I print the server list, along with the maximum load any server has, the minimum, and the standard deviation. On a random run, just one run, the maximum load a server gets is 139, the minimum is 100, and the standard deviation is about 12. That's a 39-request difference, which is actually quite a lot: two extremes, one server with a bit less and one with a lot more. And what do you actually want in load balancing? As I said, you need to reduce overhead: all of the servers should carry roughly equal load, because in many cases they have the same resources, the same memory, and so on, so they should all be balanced equally.

This setup also has one limitation: because I'm just incrementing by one, I'm treating all requests as equal, which is not the case. A request can take any amount of time; they won't all take the same amount, so they can't be treated the same. We need to factor in latency, basically the response time, to model this better. A reconstruction of the uniform simulation is sketched below.
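This is a reconstruction of roughly the code described, not the exact script from the slides; N and M are assumptions, chosen so that the average of about 120 requests per server is consistent with the run reported above (max 139, min 100, SD around 12):

```python
import numpy as np

N = 100      # number of servers (bins); assumed value
M = 12000    # number of requests (balls); assumed value, ~120 per server

rng = np.random.default_rng()
servers = np.zeros(N, dtype=int)

# Each request hits a uniformly random server: the balls-into-bins experiment.
for _ in range(M):
    servers[rng.integers(N)] += 1

print(servers)
print(f"max={servers.max()} min={servers.min()} std={servers.std():.1f}")
```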
So how do we model latency? A couple of things have been done here. There's this guy, I think his name is Andrew, I don't know his surname, but what he did was plot a graph of response times against the number of requests, and the response times he was getting approximated a Pareto or a log-normal distribution. By the way, Pareto is also a log-type distribution: you get it by taking e to the power of an exponentially distributed variable. But mostly we use the log-normal distribution to model latencies, and there are a couple of reasons for this.

The first is the principle of maximum entropy: it states that the ideal distribution to assume is the one with the maximum entropy consistent with what you know, and the log-normal, the yellow line on that graph, actually fits quite well. Also, we tend to use logarithms for parameters that are independent. His graph used five or more parameters, and because these parameters are independent, taking the log lets you just add them together, which is pretty nice to work with; that's also why, in information theory, information is treated as a negative log. There can possibly be better ways to express this, but there was also a paper by Ulrich and Miller that used the log-normal distribution and found it fitted reaction times really well.

So what we're going to do is use a log-normal distribution to model latencies. Since we don't actually have real latencies right now, we'll generate them, normalized so that the mean is one; they're all around one. Doing this again, n and m remain the same. The capital M and capital S of a log-normal distribution are the mean and the standard deviation of the natural log of your random variable, which here is latency. In the paper they set the sigma to 0.15, I think; I'm using 1.15, it's just an arbitrary choice. The mean parameter follows from the usual formula for the mean of a log-normal distribution. And there's a base latency, because you can't have a latency of zero; there's no response time of zero, so I just add a small floor of 0.05. Normalizing is basically scaling: since I'm setting the mean to one, I want all of them to be around that mean, so I scale them down. Then I run the same experiment, except each request adds its normalized weight instead of one.

What we get is really skewed: the maximum is 160, the minimum is 85, and the standard deviation is 21, a larger variation than what we saw before. It's really skewed, oscillating kind of stuff. Basically, there is no perfect way to balance load. But there's another way to think about this: you can use a greedy approach, a greedy algorithm that actually works really well, and the algorithm I'm going to describe now is, I think, the one being used today. It's a really well-known algorithm called join-the-shortest-queue: route the request to the server with the least number of unfinished requests. It's similar to least connections, so in practice we use something called randomized join-shortest-queue. Randomized because, as I told you, least connections has an edge case: if there's a lot of load on the other servers and a lot of requests coming in, and you add just one more server, that server is going to get hammered, which is not cool. Randomized JSQ implements the failover capability and prevents that from happening, and it's really used in practice. And if I need to implement randomized JSQ, I just change about two lines, as in the sketch below.
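A minimal sketch of both steps under the same assumed N and M; the baseline random assignment is left as a comment so the two-line change is visible:

```python
import numpy as np

N, M = 100, 12000         # same assumed values as the earlier sketch
S = 1.15                  # sigma of the underlying normal, as in the talk
MU = -S**2 / 2            # makes the raw log-normal mean equal to 1
BASE_LATENCY = 0.05       # small floor: no request can cost zero time

rng = np.random.default_rng()
weights = rng.lognormal(mean=MU, sigma=S, size=M) + BASE_LATENCY
weights /= weights.mean()  # normalize so the weights average to exactly 1

servers = np.zeros(N)

for w in weights:
    # Baseline (pure random) was: servers[rng.integers(N)] += w
    # Randomized join-shortest-queue: sample two servers, load the lighter one.
    a, b = rng.integers(N, size=2)
    servers[a if servers[a] <= servers[b] else b] += w

print(f"max={servers.max():.0f} min={servers.min():.0f} std={servers.std():.1f}")
```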
In the earlier piece of code, the line in the loop just chose a random server and added the normalized weight to it. In JSQ, we instead pick two indices, two servers, compare them, and add to the one with the smaller load. We pick two at a time; you can pick any number, but it's been found that the power of two choices works really well, so picking two at a time is already good. So I pick two, compare them, find the one with the least load, and add the weight to it.

And this gives a really nice result: the standard deviation drops to 3, the maximum is 127 and the minimum is 114, which are really close. That's a really balanced setup. Not that this is the most efficient algorithm; there's a lot being done here, more keeps coming up, it's a really open field of research. But this is one of the more efficient algorithms, and as I said, there is no perfect algorithm to balance load that can factor in everything, because there's always going to be something that prevents you from doing that. But yeah, JSQ is really efficient.

So, to wrap it up: we've gone through all these algorithms, and I hope this helps you build some better ones. I think making software load balancers is actually pretty simple. For my project I used Node.js, and there are really helpful libraries out there; there's one called C port, which implements the failover all by itself and maintains the cluster for you. Once your load-balancing cluster is maintained, all you need to write is the algorithm, and round robin and least connections are pretty simple: you just need the list of servers, and then you can balance load across it. Well, I think that's it. Thanks a lot. Any questions?

[Audience] Yeah, I have a question about load balancing; it's not necessarily related to what you were talking about. You were talking about the algorithms with respect to load balancing, and load balancing is obviously related to DevOps. My question is more along these lines: the load is being balanced with the load balancer, but what about high availability of the load balancer itself?

So you're considering the case where the load balancer itself goes down and what happens then. That's when you use a distributed system: you can't actually have one load balancer for all of the servers you have. The big companies have so many servers that you can't have a single load balancer for all of that. And besides, they take extra measures to keep the load balancer available, like caching all of its data. So it's mostly implemented as a distributed kind of system; you can't have just one.

Any other questions? OK, thanks. Thank you.