Okay. Hello, everyone. Good afternoon. Today's talk is about the patterns we see in running microservices in a cluster management environment, the load balancing techniques we are seeing, and the experience and insights we gained along the way. That's what we are going to share. The agenda is basically: what is the use case we are trying to solve, what are the challenges we faced and how we are trying to solve them, what architecture we are using, and some of the conclusions we came to. If you have any questions, we can cover those at the end. This is a joint talk. As an introduction, I'm Kamal. I'm part of the platform-as-a-service team at PayPal, and we are working towards building a next-generation deployment platform based on Mesos. Along with me is Ranga, who is the CTO of Avi Networks, and he will be talking about Avi's load balancer and how it is architected.

When you look at any application, the way we classify it is this: one kind of application needs an endpoint, meaning it is user-facing or is accessed by other applications, so you need to come in through an endpoint. An endpoint could be an IP colon port, or it could be a DNS name accessed through some protocol. That means you enter through a single endpoint, but then the traffic gets load balanced across multiple instances. The other type of application doesn't have an entry point; these are mostly daemons or batch jobs which run in your VMs but don't need an entry point. They are mostly asynchronous; they take data from behind-the-scenes processes and do something with it. Today's talk is mostly about the first category, applications which need an access point or endpoint through a load balancer, and the approaches we are looking at.

Before going into it, I'll give some context from the PayPal use case. PayPal runs its own private data centers, and we have our own proprietary PaaS for deploying PayPal applications. We are a VM-based deployment right now, which means that for an application you will have n machines, those n machines will be fronted by a load balancer, and the traffic gets load balanced across them. So when we started looking into a cluster management environment, we started writing down the challenges with that environment. In a cluster management environment, every container comes and goes more dynamically than in a VM environment, because in a VM environment you will say, I need 10 instances of my application running; what we do is create 10 VMs, map them to a load balancer, and most of the time those 10 VMs are static. You are not deleting and recreating them every time, unless you have an immutable way of doing it. Mostly, if your code deployment works like an overwrite, then you keep the same set of VMs. So there is not much dynamism involved in a VM-based world, but in a cluster management world there is a lot of dynamism. We started exploring the different patterns that exist, what works and what doesn't work for our use cases, and that's what we are sharing here. The first thing we looked at, when we wanted to move from a traditional VM-based deployment into a cluster manager like Mesos or Kubernetes, take any of them, was this:
The first decision you face is whether you want a load balancer in front of your applications, or whether you want to do it the way the new concept of a service mesh or sidecar proxy does, with east-west load balancing. Those are the two models: one is north-south and the other is east-west. Just to give some detail, north-south means you have a load balancer in front of your application and every request comes to the load balancer; it is the single point of entry, and the traffic then gets load balanced across the different instances. In the east-west model, your services know about each other, there is no middle hop, each service automatically knows which endpoint to connect to, and client-side load balancing happens there.

What we tried to do is figure out the pros and cons. Each of these approaches has its pros and cons, but we wanted to list them, map them to our use case, and see which one would really work. When we did that, these are the differences we found. Traditionally we were on north-south, so we wanted to see whether east-west would work for us, but there were certain things that made us decide to stay with north-south for now, and I'll talk about them. These are general observations. One is that north-south is central load balancing, which means a single entry point and full control. That is one of its biggest strengths: if you want to take traffic out, you can do it at a single point. You don't need that data distributed to every VM, whereas in the case of a service mesh every proxy needs to be aware of it, and if, let's say, a network partition happens and a proxy doesn't get the update, it will still be sending traffic. From an enterprise standpoint, you don't want traffic going to an instance you have deliberately taken out of rotation, and that level of control was possible only with north-south; we were not able to do it with the east-west model. The other thing is that a central load balancer is cluster-aware, which means that if you have 10 VMs, it understands the load on each of those VMs: when it sends traffic, whether a black hole is happening, and so on. A lot of information can be found there. When you are doing client-side load balancing, all that information is a little tricky to get, because each client just picks one host and sends data to it. There are new things coming up where clients are becoming traffic-aware, but it is still evolving; it's not yet where it should be.

So this is what I was talking about. If you are working on any typical cloud, you will have a load balancer and a set of nodes. You create an LB VIP and an LB pool, and you have a monitor. A monitor is nothing but a check to say whether this instance of the application is healthy or not, and only if it is healthy does the load balancer start sending it traffic. You have an access point, which is the LB VIP, through which you access the service, and then the load balancer, based on the algorithm you have selected, sends traffic to all these nodes. If it's round robin, it keeps rotating; if it's least connections, it picks the member with the fewest connections. It depends on the algorithm. Now, this setup is mostly static, right? In this world, you have VMs.
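To make the VIP / pool / monitor relationship concrete, here is a minimal sketch against a generic REST-style load balancer control plane. The base URL, endpoint paths, field names, and credentials are illustrative assumptions, not any specific vendor's schema:

```python
import requests

LB_API = "https://lb-controller.example.com/api"  # hypothetical controller endpoint
AUTH = ("admin", "secret")                        # illustrative credentials only

# A health monitor decides whether an instance may receive traffic at all.
monitor = requests.post(f"{LB_API}/healthmonitors", auth=AUTH, json={
    "name": "web-http-check",
    "type": "HTTP",
    "path": "/healthz",
    "interval_sec": 5,
    "failures_before_down": 3,
}).json()

# The pool groups the backend instances (IP:port members) behind one service.
pool = requests.post(f"{LB_API}/pools", auth=AUTH, json={
    "name": "web-pool",
    "algorithm": "LEAST_CONNECTIONS",   # or ROUND_ROBIN, depending on need
    "health_monitor": monitor["id"],
    "members": [
        {"ip": "10.0.1.11", "port": 8080},
        {"ip": "10.0.1.12", "port": 8080},
    ],
}).json()

# The VIP is the single entry point; the LB spreads traffic across the pool.
requests.post(f"{LB_API}/vips", auth=AUTH, json={
    "name": "web-vip",
    "address": "192.0.2.10",
    "port": 443,
    "pool": pool["id"],
})
```

In the static VM world this configuration is written once and barely touched again, which is exactly the assumption that breaks down next.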
Once the VMs are created, the LB side of it mostly does not change; it remains essentially static for the lifetime of those VMs. Only if there is a change, a new VM comes in or you replace a VM, do you need to go and touch the load balancer configuration. Otherwise it just sits there. But now take a cluster management environment, Mesos or any other cluster manager. The first thing that happens is that containers are dynamic: a container can die and come up somewhere else, or when you deploy the next version, the whole set of containers that was running is gone and a new set of containers comes up. That means there is a lot more dynamism and there is no in-place code replacement. You are moving to an immutable container that gets deployed, and when a container dies, a replacement comes up somewhere else. When there is that much dynamism in the cluster, your load balancer configuration also has to be dynamic: every time a container dies and comes up somewhere, you should have taken the previous container out and put the new container in. The configuration has to be generated in a much more dynamic fashion.

What we figured out is that, in a cluster manager, and I'm using the term pod to represent a set of containers, if you deploy your application as pods, each pod should be represented by an IP. If you don't want to do port mapping, which is another layer of complexity, it is better to have a model where a pod represents a set of containers and each pod has a unique identifier, which is its IP. Then that IP colon port has to be mapped to your load balancer. Now, if you delete this container and the same container comes up somewhere else, it won't get the same IP, because transferring an IP is an even harder problem, and in a stateless world it doesn't make sense to have IP portability. So in that world, if container A running on box A dies and comes back on box B, it gets a new IP. The load balancer should now know: I have a new member, I need to send traffic there, and it should also have removed the old one so that no traffic goes to it. There is a lot more dynamism, and the load balancer configuration is tied to the lifecycle of the container: every time a container dies and comes back, the load balancer configuration has to be updated.

So just from these categories of patterns, these are the challenges we saw. One is that the lifespan of the LB configuration is very short. Any time a pod comes or goes, the configuration in the load balancer has to be updated, which means the load balancer's control plane, which is nothing but the API layer, has to be very responsive. If I bring up 10,000 containers by deleting 10,000 old ones, that is 10,000 operations to remove and another 10,000 operations to add, so it has to be scalable and very performant. If it is not performant, your operations get stuck and queued up, which means new containers cannot take traffic. That is one of the main things.
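As a rough illustration of that lifecycle coupling, here is a sketch of the member swap that has to happen every time a pod is rescheduled, using the same hypothetical REST API as before (the pool ID, paths, and port are made up for illustration). Multiply this by thousands of pods per deployment and you get the load the control plane has to absorb:

```python
import requests

LB_API = "https://lb-controller.example.com/api"   # hypothetical control-plane endpoint
POOL_ID = "web-pool-id"                             # illustrative pool identifier

def replace_member(old_ip: str, new_ip: str, port: int = 8080) -> None:
    """A pod rescheduled from host A to host B comes back with a new IP, so the
    pool member is swapped, not updated in place. A rolling replacement of
    10,000 pods means roughly 10,000 of each of these calls."""
    # Stop sending traffic to the pod that no longer exists.
    requests.delete(f"{LB_API}/pools/{POOL_ID}/members/{old_ip}:{port}")
    # Start sending traffic to the replacement pod.
    requests.post(f"{LB_API}/pools/{POOL_ID}/members",
                  json={"ip": new_ip, "port": port})
```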
The second thing is a drawback: whenever a container dies and comes back you have a mechanism to detect it and update, but let's say the whole VM went down for some reason. There may be residual entries left in the load balancer, because you never get the normal events, so you need an external system to detect that and add or remove those entries so that no residue is left in the load balancer (a sketch of such a sweeper follows below). I already talked about IP per pod: when you have an IP per pod, every time a pod dies and comes back it gets a new IP, which means the load balancer configuration has to be updated. And containers keep moving; with a cluster manager they can move at any point in time, for various reasons, so whatever system is doing these updates has to be aware of that as well.

With all of this, we had looked at the patterns on the control plane side, but from the application traffic perspective, what can we do? In a traditional deployment, we talked about in-place code deployment. Say you have 10 VMs and a load balancer sending them traffic. When you deploy code, typically you take one VM out of traffic, replace the code, and put it back. At that point one VM has new code and nine VMs have old code. That's a rolling deployment: you roll code out in a staggered fashion so that you use the same capacity. You don't add new capacity; you replace code in place and move from one version to another within the capacity you have. On the other side, on a cluster manager, since you are running containers on a cluster, you can use different strategies. It is possible to do the same thing with VMs, blue-green for example; Netflix has been operating that way with VMs. But if your IaaS is not supportive, or you don't want to pay the additional cost, or you want a faster deployment, blue-green is very difficult to do on the VM side without a fast IaaS provider. If you can't bring up VMs quickly, you cannot do it. If you have your own private data center with constraints and no spare capacity, doing blue-green on a VM-based deployment is very hard. On a cluster manager it is much simpler, because what you are dealing with is only containers. If you have 10 containers and you want to deploy a new version, it is very easy to bring up the n-plus-one version of the container without touching the existing 10 and slowly ramp it up, or even bring up 10 new containers and then take down the old ones. So it is much more flexible to do these kinds of deployment models with a cluster manager, and that's where it is more effective. The third point is what I mentioned about depletion of capacity: if you have 10 nodes and you are doing a staggered rollout, you have to take two down, which means you are depleting your capacity from 10 to 8, whereas on the other side you are adding first and only then taking down.
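Going back to the residual-cleanup point from a moment ago, one way such an external sweeper could look, as a sketch only: it asks the scheduler (Marathon here, since we run on Mesos/DC/OS) which tasks are actually alive and deletes any load balancer member the scheduler no longer knows about. The LB endpoint is the same hypothetical API as before, the app and pool names are illustrative, and the exact Marathon task fields can vary by version:

```python
import requests

MARATHON = "http://marathon.mesos:8080"            # typical Marathon address in DC/OS
LB_API = "https://lb-controller.example.com/api"   # hypothetical LB endpoint, as before
APP_ID, POOL_ID = "/web", "web-pool-id"            # illustrative names

def sweep_residuals() -> None:
    """External sweeper: when a whole VM disappears, no local 'container died'
    event is ever delivered, so we diff the scheduler's view against the LB pool
    and delete anything the scheduler no longer knows about."""
    tasks = requests.get(f"{MARATHON}/v2/apps{APP_ID}/tasks").json()["tasks"]
    # With IP-per-pod, each task reports its own IP; field names may differ by version.
    live_ips = {a["ipAddress"] for t in tasks for a in t.get("ipAddresses", [])}
    members = requests.get(f"{LB_API}/pools/{POOL_ID}/members").json()
    for m in members:
        if m["ip"] not in live_ips:
            requests.delete(f"{LB_API}/pools/{POOL_ID}/members/{m['ip']}:{m['port']}")
```

Run periodically, a sweep like this keeps the load balancer converged on the scheduler's view even when hosts vanish without a trace.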
And again, rollbacks are faster, because in a blue-green deployment, say the 10 instances of the new version got deployed alongside the old 10; when you switch the traffic and find there is an issue, you can just switch the load balancer back to the old version, so there is no code deployment involved. That means you can roll back to an older version much faster, which is very important for a business, because you don't want your customers facing issues and you want to go back to the last known good version as soon as possible.

This is a very simple representation of what we are doing. It is a very common pattern: any time a pod comes up, you have something called a Registrator on the box, which listens to Docker events, and whenever we see that a particular pod is up, we get all the metadata about that pod and go and register it in the load balancer, saying there is a new member, start sending traffic to it. Similarly, when a container dies, we track that event, figure out that this container has gone down, and go to the load balancer and say, remove that member. It is all localized to the VM for a reason: we tie it to a local place rather than a central place, because if you keep it central, when the container moves, because the VM died or the scheduler moved it, you cannot track it; but if you keep it at the common denominator, which is the host, it is easy to track, because the host is your source of truth for which containers are running. You can always do a reconciliation against that and update the load balancer with the right details.

With all these criteria, we started looking at which load balancers could support us. We talked about blue-green deployments, right? If we want blue-green deployments, we need the load balancer to support them: the load balancer should have a mechanism to back the same service with two pools, pool A and pool B, and the ability to transfer traffic between them. That means we should be able to do dynamic configuration and move traffic. That is one. The second thing is that we needed a load balancer with a very strong control plane, meaning an API layer we can hit very hard. If a thousand containers have to be brought up and a thousand have to be brought down, we should be able to make those calls and get responses as quickly as possible. The third thing is that, for all of this to work, it has to be horizontally scalable and fault tolerant: when something fails, it shouldn't collapse the whole system. And the other main point was metrics and monitoring. When you have this kind of dynamic environment, where containers come and go, take traffic, and run anywhere in the cluster, you need a high level of metrics to figure out where an issue is. Let's say there is slowness, or you see a lot of failures happening on one host that are not happening on another; you need the visibility to figure out where it is happening and why. Since we have gone with the north-south model, there is a single point of entry through which all these interactions happen, so it is very easy to figure out from the metrics we get back what is going on.
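Here is a minimal sketch of the Registrator pattern just described, assuming docker-py for the event stream and the same hypothetical load balancer REST API as earlier; the pool ID and the fixed application port are illustrative:

```python
import docker
import requests

LB_API = "https://lb-controller.example.com/api"   # hypothetical LB endpoint
POOL_ID = "web-pool-id"                             # illustrative pool identifier
registered = {}                                     # container id -> (ip, port) we added

def pod_endpoint(container):
    """Illustrative: with IP-per-pod networking the pod's own IP is the member
    address; the exact attribute depends on the network driver in use."""
    nets = container.attrs["NetworkSettings"]["Networks"]
    return next(iter(nets.values()))["IPAddress"], 8080

def run_registrator() -> None:
    client = docker.from_env()
    # Stream container lifecycle events from the local Docker daemon.
    for event in client.events(decode=True, filters={"type": "container"}):
        cid = event["id"]
        if event["status"] == "start":
            ip, port = pod_endpoint(client.containers.get(cid))
            # New pod on this host: register it so it starts taking traffic.
            requests.post(f"{LB_API}/pools/{POOL_ID}/members",
                          json={"ip": ip, "port": port})
            registered[cid] = (ip, port)
        elif event["status"] == "die" and cid in registered:
            # Pod gone: remove the member so no traffic is black-holed.
            ip, port = registered.pop(cid)
            requests.delete(f"{LB_API}/pools/{POOL_ID}/members/{ip}:{port}")
```

Because the agent runs on the host itself, it can also periodically reconcile the pool against `docker ps`, which is the source-of-truth point made above.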
So that means there has to be a very rich level of metrics available from this load balancer system to do any kind of analysis. And since we are a fintech company, we use TLS everywhere, node to node, which means we should be able to get high throughput even with TLS traffic. With that, I'll hand it over to Ranga to continue.

Thanks, Kamal. My name is Ranga. I'm the CTO at Avi Networks. We have been working closely with PayPal for over a year. Let's look at what we provide for high availability, for security, and for monitoring that makes this deployment possible. In this picture, I want to walk through what a traditional load balancer looks like. This is how load balancers have been deployed for 20 years now, for most of us who have been through traditional deployments. They are usually specialized hardware boxes, and they have both the control plane and the data plane in them. It's fundamentally software, but each box is individually managed; you have to talk to each one to configure it, monitor it, and so on. What we have done at Avi is disaggregate the control plane from the data plane, much like a cluster manager does. The data plane becomes software proxies that you deploy as a distributed fabric across your clusters. All the control, maintenance, and management happens through a central controller, much like a Mesos master or a Kubernetes master. Once you deploy this solution, you can run these proxies on bare metal, which gives you very high throughput; you can deploy them as virtual machines on vSphere or KVM; you can deploy them as Docker containers in a cluster like DC/OS or Kubernetes; and all of this is possible in the public cloud as well. This gives you a single central point of management for the proxies and for all your services.

The controller not only provides the control and management plane but also a lot of analytics. A key part of what a load balancer does is proxy the connections between the clients and the actual pods, so it is aware of all the traffic that goes through it. It's aware of the performance of the application; it's aware of the response times from the backend pods, for example. It has all this information to let you monitor. When something goes wrong, not only does it take action, it does health monitoring and all the other good stuff to remove sick instances, but it also lets you debug brownouts. If, out of four pods or four containers, one of them is misbehaving or slow, or 5% of your users complain of slow responses between 2 a.m. and 3 a.m., you need to know what is happening to those specific users, and that's really what it lets you do. We'll see some of how it does that. The controller also works with other orchestration systems, with DC/OS, for example. When you create an application in DC/OS, it automatically creates a corresponding service for it. It also provides API endpoints for you to control the services natively. So in the case Kamal just outlined, when a new node comes up and there are containers running on that node, an agent can call an API at a central point and immediately register all the containers running on that node.
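A complementary sketch of that "register everything on this node" call, under the same assumptions as the earlier snippets (hypothetical LB endpoint, pool, bulk endpoint, and fixed port): the agent does one bulk sync when the node joins, before the event-driven watcher takes over:

```python
import docker
import requests

LB_API = "https://lb-controller.example.com/api"   # hypothetical central API, as before
POOL_ID = "web-pool-id"
APP_PORT = 8080                                     # illustrative fixed application port

def sync_node() -> None:
    """One-shot bulk sync when this node joins: report every container already
    running here, so the central pool reflects reality before event-watching starts."""
    client = docker.from_env()
    members = []
    for c in client.containers.list():
        nets = c.attrs["NetworkSettings"]["Networks"]
        members.append({"ip": next(iter(nets.values()))["IPAddress"], "port": APP_PORT})
    # Hypothetical bulk endpoint; a real API might take these one at a time instead.
    requests.put(f"{LB_API}/pools/{POOL_ID}/members", json={"members": members})
```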
So this central API endpoint allows you to centrally control and manage the lifecycle of these proxies and also of the services themselves. Let's take a look at a live demo system. This is the controller you're looking at right now, and what it shows for every service is this dashboard. The first thing you see here is an end-to-end timing view. It shows, on aggregate, how far away your clients are, what the latency is between the proxy and the instances, what the application response time is, that is, the time from when the request was sent to the backend to when it started responding, and finally the data transfer time. Each metric here tells a tale. If the client latency is high, which it can be if you have clients coming in from, say, across the continent, then essentially you need to reduce that; maybe you need to distribute the application across multiple data centers and move it closer to your clients. If the server latency is high, that means the server is being saturated; either it's far away, which is somewhat unlikely, or it is just slow. If the application response time is high, the server is simply slow in responding to the request. And if the data transfer time is high, either the response is big or your network has throughput issues, so it is just taking a long time for the response to stream back. This lets you, in one glance, identify where the potential bottleneck could be.

But that is an aggregate picture. If you want to go a bit more granular, it lets you do that. Each entry here is a log entry that shows the information for a specific HTTP request. So if you want to look at all users whose response time was, for example, more than nine seconds, you can simply do that, and you'll see who these users are. Is there a specific pattern for these users? Are they coming from a specific location? Are they using a specific type of browser? And other things that will let you get to the problem as quickly as possible.

The other thing this fabric does, in addition to monitoring, is use this information for auto-scaling instances. It is constantly monitoring the application latency; it's looking, for example, at the number of connections open to the application; it can also look at things like the CPU and memory utilization of instances. If it starts seeing that your application performance is slowing down, it is able to auto-scale by, for example, increasing the number of replicas in DC/OS. So these analytics let you build dashboards and consume the information, and they are also consumed internally for improving application performance. The other aspect is resiliency. If any specific instance is going through, for example, a brownout, maybe it's out of memory and beginning to drop requests, there are health monitors constantly watching it, and the health monitors will make sure such instances are not actively used for load balancing. And that's not all: if the health monitors indicate capacity has fallen below a certain threshold, and you have a policy that says, if I have only 75% of the capacity I need across all my instances, then you can, for example, auto-scale and increase the number of instances.
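A toy version of that latency-driven decision might look like this; the metrics endpoint and its field names are invented for illustration, while the scale-up itself uses Marathon's standard app API:

```python
import requests

LB_API = "https://lb-controller.example.com/api"   # hypothetical metrics endpoint
MARATHON = "http://marathon.mesos:8080"            # typical Marathon address in DC/OS
APP_ID, POOL_ID = "/web", "web-pool-id"            # illustrative names

def autoscale_once(latency_limit_ms: float = 250.0) -> None:
    """If the measured application response time crosses a threshold, add one
    replica via Marathon. A real system would also scale down, debounce, and cap."""
    metrics = requests.get(f"{LB_API}/pools/{POOL_ID}/metrics").json()
    app = requests.get(f"{MARATHON}/v2/apps{APP_ID}").json()["app"]
    if metrics["avg_app_response_ms"] > latency_limit_ms:
        requests.put(f"{MARATHON}/v2/apps{APP_ID}",
                     json={"instances": app["instances"] + 1})
```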
And it works the opposite way, too: if you have excess capacity and all your instances are very lightly utilized, you can scale back in. Blue-green deployments: blue-green deployment is a case where you really want everything to be graceful. You want to send, let's say, a percentage of traffic to a new version of the application while not affecting your existing clients. You want existing client connections to be preserved, send a percentage of new clients to the new version, monitor that, and if everything looks good, completely cut over to the new version of the application, again without affecting existing clients. It lets you do that.

So it's a range of functions: central management; elasticity of not just your applications but also of the proxy layer itself; software-based, so you can run it anywhere; high performance in terms of TLS/SSL offload, load balancing, and throughput; a lot of monitoring and analytics information that you can use and that the system itself can use; and finally, enterprise-class security features. You want security policies, say blacklists, whitelists, rate limiting, whatever is necessary to protect your application, say a WAF. That's what it provides. Over the past year or so we have been partnering closely with PayPal, working with them on this journey of making container cluster-based applications successful at large scale, and we have learned a lot in that process; a lot of these capabilities have grown out of it. We have built this to work for production-quality, large-scale, clustered applications in a container cluster. Thank you very much.

Yes. Was the question about DC/OS integration? Yeah. So this runs as a Docker container inside DC/OS. At the beginning, you can install it using the DC/OS Universe or do a standalone install. What it does is create a fabric of proxies, one proxy per node, and it performs both ingress and internal load balancing and proxying, plus things like security and policy enforcement. You can use it for load balancing within one cluster as well as across multiple clusters for global load balancing. Fundamentally it runs as another Docker container on every node, and that's really how it's deployed. Yes, please. That's right, both layer 7 and layer 4: layer 7 for HTTPS traffic and a few others like DNS, layer 4 for standard TCP traffic. Yes, please. One more question. Mm-hmm. Yeah. I think the question was about how we do a full blue-green. The way we have implemented it is this: if you have a set of containers running, whenever a new version comes in, Avi supports a pool A and pool B concept within a virtual service. We add the new, n-plus-one version's members into pool B first, and then Avi has an API to transfer traffic, so we can say, send 10% of the traffic to the new ones and keep the remaining 90% here. That way we can watch how the new 10% handles traffic. And we support two models; this is something we built on top of Avi. One is the complete one:
you have 10 instances, and we stand up the whole 10 instances again and then just play with the traffic percentages. Or, rather than that, because it's a capacity play, we can keep the 10 and bring up only 20% of it, which would be two nodes, see how those two nodes are doing, then take down two of the old nodes and keep ramping up. So it's a slow ramp-up; we can go that way. Those are the two implementations through which you can do blue-green deployments. Thank you.
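To make the pool A / pool B traffic shifting from the Q&A concrete, here is a minimal sketch assuming a generic virtual-service API with a traffic-ratio field; the endpoint and field names are illustrative, not Avi's actual schema:

```python
import requests

LB_API = "https://lb-controller.example.com/api"   # hypothetical control-plane endpoint
VS_ID = "web-virtual-service"                       # illustrative virtual-service id

def shift_traffic(percent_to_green: int) -> None:
    """Blue-green cutover as a single control-plane call: existing connections to
    the blue pool are left alone, only new clients are split by this ratio."""
    requests.patch(f"{LB_API}/virtualservices/{VS_ID}", json={
        "pool_a_ratio": 100 - percent_to_green,   # blue: current version
        "pool_b_ratio": percent_to_green,         # green: new version
    })

shift_traffic(10)    # canary: 10% of new clients go to the new version
# ...watch the metrics, then either roll back with shift_traffic(0)...
shift_traffic(100)   # ...or cut over completely
```

The partial ramp-up variant described above works the same way, except pool B starts with only a fraction of the capacity and members are added as old ones are retired.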