Hello, everyone. We'll start now. Thanks for coming to our session. We're going to talk about one of the important network services for moving your enterprise applications into OpenStack: the load balancer. I'm Praveen Yalagandula, and today I have my colleague Anand Patil. We're going to talk about LBaaS in OpenStack. It's designed as a beginner session, so we'll do a primer on LBaaS, go through the evolution of the LBaaS APIs in OpenStack, go through the implementations that have happened, and then talk about the newer service VM architecture that gives you highly scalable, highly available load balancing as a service.

So let me start with our primer. Most of you know what load balancers are for; the main goals are high availability and performance. If you have one single web server and all your traffic is going to it, then when it goes down you have no availability. So typically you run multiple web servers, multiple instances of your application, and put a load balancer in front of them to spread user requests across those servers. That way, even if one server goes down, you still have others to serve your traffic. One thing to notice here is that load balancers have to monitor your back-end servers. They're not just sitting there splitting requests across the back-end applications; they have to monitor and make sure the back-end servers are always up, take whichever is not up out of service, and so on.

Over time, load balancers have evolved to do more and more, moving features out of the applications and into the load balancer. We'll go through some of them. The first one is session persistence. If a user comes in and you just spread the load across all your servers, that's not very good: if you have put something into your cart, then when you visit the same site again you want the objects in your shopping cart to still be there. So you want session persistence. That means the load balancer can't just spread requests; it needs to track state about who the user is, what IP address they're coming from, or put a cookie in there, to figure out where to send each request. And that also means it needs to terminate your TCP connections and look deep into the packet to figure out who the user is, for example by looking at a cookie. A different user then gets persisted to a different server.

The second thing is policy-based switching. You may have different services that you expose through your load balancer. You may have media servers, which host all the images and objects in your catalog, and another set of servers that are PCI compliant, where you do all your credit card processing. Policy-based switching means being able to redirect different kinds of user requests to different sets of back-end pool members. So you need the capability to specify, based on regular expressions or even deeper semantics, that these kinds of requests go to this pool and those requests go to the other pool.
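To make the primer concrete, here is a minimal, self-contained sketch (not any particular product's implementation) of the three behaviors just described: round-robin spreading, health monitoring that takes a dead server out of rotation, and cookie-based session persistence. The pool addresses, health endpoint, and cookie name are all made up for illustration.

```python
import http.client

# Hypothetical back-end pool; addresses are illustrative only.
POOL = ["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"]
healthy = set(POOL)
rr_index = 0
SESSION_COOKIE = "lb_server"  # made-up persistence cookie name

def health_check():
    """Mark servers up/down via an HTTP GET, as a health monitor would.
    In a real load balancer this runs periodically in the background."""
    for server in POOL:
        host, port = server.split(":")
        try:
            conn = http.client.HTTPConnection(host, int(port), timeout=2)
            conn.request("GET", "/healthz")   # assumed health-check URL
            ok = conn.getresponse().status == 200
        except OSError:
            ok = False
        (healthy.add if ok else healthy.discard)(server)

def pick_server(cookies):
    """Cookie persistence first, then round-robin over healthy servers."""
    global rr_index
    pinned = cookies.get(SESSION_COOKIE)
    if pinned in healthy:
        return pinned                      # same user -> same server
    candidates = sorted(healthy)
    if not candidates:
        raise RuntimeError("no healthy back ends")
    rr_index = (rr_index + 1) % len(candidates)
    return candidates[rr_index]            # new user -> next server in turn
```

The caller would set `SESSION_COOKIE` on the response the first time, which is exactly the state-tracking the talk describes: the balancer must understand enough of the protocol to read and write that cookie.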
Now, as I was mentioning, to do all this the load balancer needs to terminate TCP, and as you know, most new traffic is HTTPS; you want it to be secure. So load balancers also need to do SSL termination on the connections. There are multiple reasons people want to put SSL termination on the load balancer. One, you're moving the load of those expensive SSL handshakes, during which the SSL session is established, away from your application. Two, if you look at this kind of picture, you have lots of application instances in the back end. If your user talks directly to those endpoints, the user's browser has to do a separate handshake per server: with six media servers and eight connections spread across them, that's six separate handshakes, unless you start to share SSL state across the servers. By pushing termination into the load balancer, you get a single place to manage all your users and their sessions. It also gives you a single place to do all the SSL-related enforcement you want. For example, as a company you may decide you don't want SSLv3 anymore because it's been shown to be insecure; you can apply the policy at that single point and say, no SSLv3, only TLS. There are lots of features like that now expected from load balancers as functionality moves from the end applications into the load balancer: rate limiting requests coming from a user, or doing user auth in one place instead of in each end application and then just passing the user's identity along for the rest of the services.

Taking it one step further, global server load balancing (GSLB) is the case where you set up your applications across multiple data centers, across global regions. The main reason you do it is fault tolerance: you're not restricted to one data center that leaves you with no backup if it goes down; you have the application in multiple places. The other reason is that if both sites are up, you can do geolocation-based load balancing: users in one region are routed to the application closest to them, for lower latency and hence better perceived performance. And you further want session persistence: if a user moves from one location to another, you still want to pin that user to the same site. This is similar to the session persistence we talked about for local load balancing, but here the persistence is at the site level.

So let's move on to the OpenStack LBaaS APIs and how they have evolved over time. LBaaS was part of Neutron when it started. Neutron LBaaS version one was introduced in Grizzly: a very simple model that lacked a lot of features but gave you basic load balancing. Over time, once LBaaS v2 started to add all the features, v1 was deprecated; LBaaS v2 has been around since Kilo.
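The "single enforcement point" argument is easy to see in code. A terminating proxy owns one TLS context, so a policy decision like "no SSLv3, TLS 1.2 minimum" is one line applied for every back-end application at once. This is a generic Python sketch using the standard ssl module, with a made-up certificate path; it is not how any particular load balancer is implemented.

```python
import socket
import ssl

# One TLS policy for every application behind the load balancer.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)  # negotiates TLS, never SSLv3
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # company-wide floor, one line
ctx.load_cert_chain("/etc/lb/certs/site.pem")  # illustrative path

# Terminate TLS once at the VIP; back ends never see a handshake.
with socket.create_server(("0.0.0.0", 443)) as listener:
    with ctx.wrap_socket(listener, server_side=True) as tls_listener:
        conn, addr = tls_listener.accept()     # handshake happens here
        request = conn.recv(4096)              # decrypted bytes from here on
```

Changing the cipher or protocol policy later touches only this context, not the six (or six hundred) application servers behind it.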
And Octavia is the newest project. It has moved out of Neutron; LBaaS is not served under the Neutron APIs anymore and has its own separate API endpoint that you talk to directly. Octavia is backwards compatible with the LBaaS v2 APIs, is a superset of LBaaS v2, and keeps adding more features. I'll start with the LBaaS v1 model to set the stage on what the model looks like, because this is what got expanded and enhanced over time. Any load balancer, as I mentioned, has basically three or four components. There is a pool, which represents a set of servers, with a health monitor associated with it that says how you want to monitor your back-end pool members: HTTP-based monitoring, ping-based monitoring, HTTPS-based monitoring, and you can add more; if it's a MySQL server, maybe you want MySQL-based monitoring, and so on. And then there is the construct of the VIP, the front-end side: the IP address and port at which the service is provided, which is what users connect to. That's the basic model for any load balancer, and LBaaS v1 took the approach of having the pool as the centralized object with the VIP at the top. Unfortunately, several things were missing in LBaaS v1 right out of the gate. People had issues using that model because it supported a single port, whereas most applications have both port 80 and 443, and there was no support for SSL termination, and so on.

LBaaS v2 is the next round of API enhancements. The first construct that was added was the concept of a listener. You have the load balancer as the top-level object, which is where you specify your front-end IP address. But instead of a single port baked in along with the IP address, there is now a resource called a listener, which is where you split your service: if you have port 80 and 443, you will have a couple of listeners. On the other side, the model is very similar to LBaaS v1. LBaaS v2 added support for SSL/TLS termination: for every listener, you associate an SSL cert and say, use this cert to terminate the connection, so the load balancer can terminate the connections itself. LBaaS v2 also supported SNI, for when you want a single IP address used for multiple websites, say foo.com and bar.com: same IP, but different certificates served. Otherwise the browser will not accept the certificate if you serve just one certificate for both. SNI is the mechanism where you say, for this server name or host name, this is the certificate to use, so the load balancer can serve the specific certificate for the site the user is asking for. In Neutron LBaaS, all these certificates are stored in Barbican, the service for managing and storing secrets. All that goes into the LBaaS configuration are handles to those Barbican containers, so your certificates are kept safe and your LBaaS simply has pointers to them. The other big enhancement in LBaaS v2 is the policy-based load balancing we talked about: you have the option to specify multiple policies and associate them with HTTP or HTTPS listeners, and each policy has a bunch of rules and an action.
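As a sketch of that v2 object model in practice, here is roughly what creating the whole tree looks like with openstacksdk against an Octavia/LBaaS-v2-style endpoint. The cloud name, UUIDs, addresses, and the Barbican container reference are placeholders, and a real script must wait for each object to go ACTIVE before creating the next; consult the SDK docs for exact parameter names.

```python
import openstack

conn = openstack.connect(cloud="mycloud")           # placeholder cloud name

# Top-level object: the front-end IP address lives here.
lb = conn.load_balancer.create_load_balancer(
    name="web-lb", vip_subnet_id="SUBNET_UUID")     # placeholder subnet

# One listener per front-end port; the TLS cert is a Barbican reference.
https = conn.load_balancer.create_listener(
    name="https", protocol="TERMINATED_HTTPS", protocol_port=443,
    load_balancer_id=lb.id,
    default_tls_container_ref="https://barbican.example/containers/UUID")

# Back-end side: pool, member, health monitor -- the same shape as v1.
pool = conn.load_balancer.create_pool(
    name="web-pool", protocol="HTTP", lb_algorithm="ROUND_ROBIN",
    listener_id=https.id)
conn.load_balancer.create_member(
    pool, address="10.0.0.11", protocol_port=80, subnet_id="SUBNET_UUID")
conn.load_balancer.create_health_monitor(
    pool_id=pool.id, type="HTTP", delay=5, timeout=3, max_retries=3)

# Policy-based switching: an L7 policy is match rules plus an action.
block = conn.load_balancer.create_l7_policy(
    listener_id=https.id, action="REJECT", name="block-admin")
conn.load_balancer.create_l7_rule(
    block, type="PATH", compare_type="STARTS_WITH", value="/admin")
```

The last two calls show the rules-plus-action shape just described: any request whose path starts with /admin matches the rule, and the policy's action (here REJECT; it could instead redirect to a specific pool) is applied.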
So you can say something like: for such-and-such HTTP URLs, the action is to drop, or the action is to forward to a specific pool, and so on. LBaaS v2 was a big jump over LBaaS v1. But as you go up the stack in virtualization, at L2, L3, L4 the number of things you have to create APIs for in order to virtualize is fairly small; by the time you get to the load balancer, there are so many requirements, so many things that application developers now expect from load balancers, that a lot was still missing from LBaaS v2. There were missing protocols, and SSL support was limited: if I want to say, use only these ciphers and not those deprecated ciphers, there is no way to explicitly specify it; whatever the load balancer provider supplies in the system, those are the only ones available. There was also support for only one default certificate, and the health-monitoring options were limited. Octavia is the current project, and as mentioned before, it's a superset of the LBaaS APIs. They have been working hard on lots of enhancements based on community requirements. UDP support was added: even if the reference implementation doesn't give it to you, if you have a provider that supports UDP-based load balancing, at least you can drive it through the APIs. Then there are statistics on listeners, quotas on objects, an alternate monitoring port for pool members, and so on. So that covers the LBaaS APIs. In OpenStack there is currently no GSLB support; there was a project, Atlas, floated I think a couple of years ago, but it got abandoned, or at least there isn't much there now.

Having covered the interfaces, the APIs, let's talk about the implementations. We can categorize the initial implementations of LBaaS into roughly two kinds. One is hypervisor-process-based, which was the reference implementation using HAProxy for LBaaS v1 and the initial LBaaS v2; the other is appliance-based. In the HAProxy-based implementation, you run an HAProxy process on the OpenStack network nodes for every load balancer instance you create, and an LBaaS agent takes care of starting these HAProxy processes and killing them off once you remove the load balancer, while your compute nodes host the VMs. This kind of thing has some issues, right? Your north-south traffic goes through the HAProxy and on to the VMs, and your east-west traffic has to come to the network nodes and go back. This implementation is limited in scalability because you are tied to how many instances you can run on your network nodes. You can add more network nodes and so on, but each single HAProxy instance is dedicated to one load balancer, so there is no scaling out across multiple nodes or anything like that. For HA you have to run something like Pacemaker or Keepalived to keep your HAProxy instances up, and tenant isolation is mostly best effort because it's all co-located, running on the network nodes. So it was not really for enterprise-grade clouds.
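To picture the one-process-per-load-balancer model just described, here is a drastically simplified sketch of what the agent conceptually does: render a config and spawn a dedicated haproxy process per load balancer. The template and file path are illustrative, and the real reference agent also created a network namespace per load balancer, which is omitted here.

```python
import subprocess

def render_haproxy_cfg(vip, port, members):
    """Drastically simplified stand-in for the reference driver's template."""
    cfg = [
        "frontend vip",
        f"    bind {vip}:{port}",
        "    default_backend pool",
        "backend pool",
        "    balance roundrobin",
    ]
    for i, (addr, mport) in enumerate(members):
        cfg.append(f"    server member{i} {addr}:{mport} check")
    return "\n".join(cfg) + "\n"

def start_lb(lb_id, vip, port, members):
    """One dedicated haproxy process per load balancer, as the agent did."""
    path = f"/tmp/{lb_id}.cfg"                       # illustrative path
    with open(path, "w") as f:
        f.write(render_haproxy_cfg(vip, port, members))
    return subprocess.Popen(["haproxy", "-f", path])  # agent tracks the pid

proc = start_lb("lb-1", "192.0.2.10", 80,
                [("10.0.0.11", 80), ("10.0.0.12", 80)])
```

The scalability limit is visible right in the model: every load balancer is one pinned process on a network node, so capacity is whatever that one node's process can push.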
The second kind of implementation is based on appliances: the hardware load balancers you may already have in your data centers, placed alongside your OpenStack installation. You need to plug them into the underlay so they can understand the overlay traffic: if you have a VXLAN overlay, these load balancers need to understand the VXLAN protocol to be able to load balance at all. In this case, north-south traffic hits your network nodes, goes out to the appliances sitting outside, and comes back in; and east-west traffic goes out of the OpenStack racks, to these boxes, and back. So there are several issues with it. Overall it's pretty complex and expensive, because to really do tenant isolation you need multiple instances of these appliances, each plugged into the underlay, and setting that up and operationalizing everything gets pretty complex.

So let's move on to the service VM architecture. That's the new architecture where, instead of the appliance-based legacy on the left-hand side, you have a next generation with a control plane and a data plane. The data plane is virtual machines, service VMs, running like any other instances in your OpenStack cloud. That way you get elasticity, better management, and all the scale you want. Octavia takes this service VM approach: you have the control plane and the amphorae, the service VMs (or service engines running as containers, in some cases), and the control plane talks to the different OpenStack services to implement your load balancer. Octavia has been around for two years, has been implementing this, and is making quite a bit of progress: it now has active-standby, and active-active is in progress.

Now I'll switch to the service-engine-VM-based implementation that we at Avi have been doing, and talk a little more about that. What we have is a software-based load balancer with a web application firewall, service mesh, and analytics all built into one single product. We've been doing this service VM architecture for five years, and it's been deployed at several large companies and is well tested. So what does this architecture look like? It's built on SDN principles: a separate control plane and data plane. The data plane, as I was saying for the service VM architecture, is at the bottom: all VM instances, container instances, or bare metal, running either in your local on-prem clouds or in public clouds. It could be OpenStack or AWS or vCenter or anywhere; these are small instances that run anywhere and do your load balancing, so it's a multi-cloud solution. And like any SDN, the central controller is the central brain. It collects analytics from all the service engine VMs, which is what tells you how well the system is doing and also gives you analytics about your applications. And it's a fully, 100% RESTful API, so you can drive it with all kinds of automation: Ansible, Terraform, or anything else you can think of. Now, how does it integrate into OpenStack?
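Since the controller is described as 100% RESTful, automation reduces to ordinary HTTP calls. The following sketch uses python-requests against hypothetical endpoint paths and payload fields; the login flow, resource names, and JSON shape are assumptions for illustration, not the vendor's documented API, so check the real API reference before using anything like this.

```python
import requests

CONTROLLER = "https://avi-controller.example.com"   # placeholder address

session = requests.Session()
# Hypothetical login endpoint; the real authentication flow may differ.
session.post(f"{CONTROLLER}/login",
             json={"username": "admin", "password": "secret"})

# Hypothetical resource name and fields: create a VIP for two servers.
resp = session.post(f"{CONTROLLER}/api/virtualservice", json={
    "name": "web-vs",
    "vip": [{"ip_address": {"addr": "192.0.2.10", "type": "V4"}}],
    "services": [{"port": 443, "enable_ssl": True}],
})
resp.raise_for_status()
print(resp.json().get("uuid"))  # controller-assigned object id (assumed)
```

The point is only the shape of the integration: anything a UI or a Terraform provider does is the same POST underneath, which is what makes the "100% API-driven" claim operationally useful.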
So you have the controller talking to the different OpenStack services. As I mentioned, for a service VM architecture you need to be able to spin up these VMs and connect them to the right networks to receive traffic and do the load balancing. So we integrate with Nova and Neutron for VM creation and networking. We also integrate with Horizon, so we can provide all the analytics and everything in a single framework in Horizon, and with Keystone for multi-tenancy. You can configure the load balancers through the Neutron LBaaS APIs, and all of that through Heat objects as well. We also have full Heat resources for our complete resource model, so you can create full-featured ADC instances using Heat.

Now I'll walk you through the steps of how it really looks. I'll ignore a lot of details, go through an installation, walk through how it works, and show some animation. In a typical deployment, the administrator creates the Avi Controller and connects it to the OpenStack controllers by providing credentials, so the Avi Controller can talk to the different services. And there is a management network, used for talking to the service engine VMs. Now suppose a user has two servers, server one and server two, and wants to load balance across them, so they create a load balancer. What the control plane does is spin up service engine instances as regular Nova VMs, as many as needed: if it's an active-active configuration we'll spin up at least two, if it's active-standby there will be two, and if it's best effort there will be one. Whatever criteria the user sets, we spin up as many instances as needed to meet those SLAs. Then we connect them to the right networks using the Neutron APIs; actually, we use the Nova attach-interface call for connecting them, after figuring out the right Neutron networks. Once you have this, the service engines attract the virtual IP traffic, the front-end traffic, and simply start load balancing across the different servers.

Now, if you look at this architecture with the service engine VMs, one of its key benefits is that you can start to do all kinds of nice things around tenant isolation, or sharing if you want it. Tenant isolation is easy: you just create different service engines for different tenants. If you have different servers that you're load balancing to across different tenants, you get strong isolation: one tenant doing something wrong, making a bad configuration, or taking out a virtual machine is not going to affect the traffic going to the other tenant. Now, I ignored a lot of details in the previous slides; I didn't say anything about where those virtual machines get created, or how. So here we'll go through how this service VM architecture gives you flexibility in where you place your service engine VMs. One option is to place the service engine VMs in the tenant's own context, so the tenant gets to see how many VMs they're using, and the VMs count against the tenant's quotas.
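The spin-up flow described above is plain Nova/Neutron plumbing. Here is a rough openstacksdk sketch of what a controller might do per service engine; the cloud name, image, flavor, and network IDs are placeholders, and this is a conceptual outline of the boot-then-attach sequence, not Avi's actual code.

```python
import openstack

conn = openstack.connect(cloud="mycloud")           # placeholder cloud

# 1. Boot the service engine as an ordinary Nova VM on the mgmt network.
se = conn.compute.create_server(
    name="service-engine-1", image_id="SE_IMAGE_UUID",
    flavor_id="FLAVOR_UUID",
    networks=[{"uuid": "MGMT_NET_UUID"}])
se = conn.compute.wait_for_server(se)               # wait until ACTIVE

# 2. Attach a data-plane vNIC on the tenant network that carries the VIP:
#    the "Nova attach interface" step mentioned in the talk.
conn.compute.create_server_interface(se, net_id="TENANT_NET_UUID")
```

Because the service engine is just a regular Nova instance with regular Neutron ports, everything else in the talk (quotas, tenant placement, elasticity) follows from standard OpenStack behavior.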
Or you can have them all co-located in a separate LBaaS tenant, so that tenants don't see them and can't do anything to them, but you still provide load balancing purely as a service, and that is what tenants consume. This mode also gives you flexibility when you don't care about tenant isolation. Say you have a thousand test projects: creating individual service engine VMs for each tenant means spinning up 2,000 virtual machines for your thousand test tenants, which means you need a lot of resources. By having the flexibility of spinning them up in a separate tenant, you have the option of sharing the same VMs across your test tenants, while at the same time keeping isolation by creating separate service engines for the other tenants where your production traffic is. All of that is possible with this service engine VM architecture.

Now, a little about availability, the different kinds of high availability you can have. Active-standby is the common one: you spin up two VMs, and as soon as the load balancer detects that one of them has gone down, the other VM starts sending out gratuitous ARPs to attract the traffic for that VIP, so the traffic gets diverted to it while the controller spins up a replacement for the failed service engine. Then there is the active-active case, where you split the traffic across multiple service engines and both are active and working on the traffic. In this case again, once one service engine VM goes down, the other temporarily takes all the load, but the control plane will spin up another service engine and spread the traffic across again. And you can have N+M HA, where you always have N service engine VMs running plus M extra capacity kept as standby, instead of just the single standby of active-standby. All those configurations are possible in this model.

The other thing I mentioned previously is elasticity. If your traffic starts to increase, the control plane monitors how well each of the service engines is doing, how much traffic each is taking on, and you can either autoscale or do policy-based elastic scale, where you say: as soon as the CPU load goes above this percentage, or the app response time goes above a certain latency, increase the number of instances and spread the traffic across them. Elasticity is another important thing the service engine VM architecture brings that you can't get with the single-instance deployments, the implementations we saw before. This is one of the big, powerful things, but it also brings out one of the complications in OpenStack itself. The Neutron network abstractions allow only a single IP on a single interface, and the Neutron virtual routers have no concept of ECMP. So the only way you can really spread the traffic is to attract it to one single VM and then punt it out to the follower service engine VMs. But that means you have a bottleneck: a packets-per-second limit on a single vNIC, and as you may know if you've done any performance measurements with traditional KVM/OVS ML2, the number of packets per second you can get is pretty small.
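Before the workarounds for that bottleneck, here is a quick sketch of the policy-based elastic scale just described: the control plane periodically compares observed metrics against policy thresholds and adjusts the service engine count. The threshold values and metric names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ScalePolicy:
    max_cpu_pct: float = 80.0      # scale out above this CPU load
    max_latency_ms: float = 250.0  # ...or above this app response time
    min_se: int = 2                # never fewer than the HA minimum
    max_se: int = 8                # cap on service engines

def desired_count(current: int, cpu_pct: float, latency_ms: float,
                  p: ScalePolicy) -> int:
    """What the control plane decides each monitoring interval."""
    if cpu_pct > p.max_cpu_pct or latency_ms > p.max_latency_ms:
        return min(current + 1, p.max_se)      # add a service engine
    if cpu_pct < p.max_cpu_pct / 2 and latency_ms < p.max_latency_ms / 2:
        return max(current - 1, p.min_se)      # quiet again: scale back in
    return current

# e.g. 3 SEs at 91% CPU -> grow to 4; traffic then re-spreads across 4.
assert desired_count(3, 91.0, 120.0, ScalePolicy()) == 4
```

The hard part, as the talk notes, is not deciding to scale out but actually spreading the VIP traffic across the new instances, which is where the ECMP limitation comes in.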
So the typical ways you get around this single-vNIC bottleneck: either you talk directly to the underlay — the Neutron abstractions don't provide it, but if there is an underlay that supports ECMP, you talk to it directly and set up ECMP so you can get the traffic — or the other option is to do BGP peering with an upstream router, which is what I'll show in this picture. You can have an upstream router, and the service engine VMs do BGP peering directly with it. This is not the Neutron virtual router; it's your physical router that supports BGP, or a virtual router, but not the Neutron router. The service engine VMs peer with it and start advertising the virtual IP, so you get real ECMP across those virtual IPs. We have some customers that depend heavily on this, because they have several hundreds of thousands of connections per second coming through, and several hundreds of thousands of SSL handshakes to do per second, and you can't really do anything like that without this kind of ECMP support.

To quickly talk about GSLB: with our GSLB support, you can have our control plane and data plane installed across multiple data centers or in your public cloud. The control planes communicate with each other, track where your service is up, and do DNS-based GSLB. We also do active-standby or active-active across these locations, and geolocation-based load balancing and session persistence are all built in. Now, we want to go through the demos. Anand, why don't you come up and quickly show it.

Thanks, Praveen. Hello, everyone. I am Anand, and I'll take you through the demo. This is the OpenStack Horizon UI, and we have an OpenStack cloud configured here with the Avi LBaaS plugin installed. I'm going to create a load balancer using Horizon; I'll name it demo. I'm choosing the subnet. For the algorithm we can choose any of these; I'll use Round Robin. And I have a server here that I'm going to add as a pool member. The monitor type is going to be HTTP; my server is listening on port 80. So now the load balancer driver is calling the Avi Controller, invoking REST APIs on it and creating the virtual service, and it's going to place the virtual service on one of the service engines, the VMs that Praveen was talking about. Here, if you see — I think I should refresh. Now you can see this is the load balancer that got created, and it has one listener. I'm going to associate a floating IP with this load balancer. You can see the floating IP that got associated. Let's run some traffic. And this is the application running on the server; this is the root page. Going back to the load balancer: we also have a dashboard for OpenStack Horizon where we show the entire Avi Controller UI, and one can go into this UI; there are a lot of features provided here that one can use. If you look, all the virtual services, all the load balancers that were created in OpenStack, are visible here. Of course, there are many more; the recent one we created was the demo one. There are other things you can do with the load balancer you have created: you can scale out the virtual service, which means you can choose a service engine, or it will spin up another service engine if needed, and you can migrate it to another service engine.
Here you can see details like the service engine that is hosting this virtual service. And in this panel you see the analytics provided by Avi: there is some traffic running for this virtual service, and you can see the kinds of metrics we are collecting and reporting. This is the client RTT, the time taken for a request to reach the Avi load balancer from the client; this is the server RTT, the time taken from the Avi load balancer to the back-end server; and this is the actual app response time. This color coding gives you greater insight into what is consuming the time. Here the green portions are the client RTT, which is taking most of the time — 105, more than 100 milliseconds — while the app response time is pretty good. We also collect logs: these are all the requests hitting the Avi load balancer for this virtual service, and clicking on one of the requests shows a lot more detail and gives you a lot of insight. On the left-hand side it gives details of the client the request is originating from, which device is being used, the browser, and so on, and here you see the actual URI. For each request we show the client RTT, server RTT, app response time, and the data transfer time; data transfer time is the time taken for the entire response to get to the client. We also have some more analytics here. You can click on devices to see the top devices making these requests, or choose location to see where the requests are originating from; these are the top locations. End-to-end is the end-to-end time taken for a request to be served. Here you can see some requests that are taking a lot more time than expected, and you can click on them for further details. Here are the actual requests that took that much time, and again you can drill down further to see what is actually consuming the time. Here the client RTT is 958 milliseconds, so it's the leg from the client to the load balancer that is taking most of the time. That's all I have for the demo.

Thank you, Anand. Okay, so just to summarize: our Avi solution, the service engine VM architecture, does elastic LBaaS and has integrated analytics. It's 100% API-driven, and we built it as an SDN-style architecture with a control plane and a data plane, with the data plane elastically scaling. It's multi-tenant with Keystone integration, and we have very deep visibility and insights built into the product. So it's an enterprise-grade ADC and analytics engine for your cloud, whether it's on-prem, AWS, Azure, or whichever cloud you're working on. Thanks for spending time with us in this session; we're open to questions now. I guess we ran out of time.

Yes — the question is how we implement SNI in our implementation. As I mentioned, SNI is handled as the TLS connection comes in: the host name is in there, and since we're doing the SSL termination and looking into the request, we know the host name associated with it. At that point, the SSL cert we offer to the client is chosen based on the name we see.
So that's implemented there. Okay, so for that particular aspect: those are the kinds of reasons we couldn't expose everything through the LBaaS dashboard, because not everything is in the LBaaS model. That's why we integrated into Horizon, so our full UI is available and you can do SNI directly through this element in Horizon. For example, I have this demo listener, and what I can do is create a virtual service. Let's do the advanced setup. I'll say I want to do virtual hosting, and I can say this is going to be a child of the demo VS — that's the virtual service we created earlier. Actually, I guess that one is not valid as a parent, so I'll have to create a parent virtual service: we have to designate which VS is going to be the parent, since the VS we created through LBaaS is not one we can use as a parent. So you create your parent VS and your child VSes through this interface, and then specify for each child VS the domain name you want to serve, say foo.com, and which certificate you want to use for it. So there is the SSL certificate setting where you specify which one to use. That's all available through the UI.

Yeah, good question. The question is how we do policies — you're asking whether, like F5's iRules language, we have a way to specify policies? So what we did early on was look at the typical rules people write for policies, and we figured out that most of them, like 99%, you can do with primitive policies. They're primitive in the sense that you configure everything through the object model itself instead of writing any code; it's built into the APIs, similar to LBaaS v2. In LBaaS v2 the options available are pretty limited, but we have much deeper policies. Let me show what you can do as an example — say I pick this one. We can do all kinds of network-security-layer policies, HTTP-level security policies, HTTP request-level policies, and you can match on a bunch of things, all these different kinds of fields. But we also have Lua-based DataScripts to fall back on: in the worst case, where you have very complicated logic you want to build into the ADC and have to do specific rewriting of lots of headers and fields, you can still do that with the Lua-based DataScripts we support. That's the fallback. Yeah, lots of them; it's pretty functional. Yes, you can do full regex, regex-and-replace, on the data going through the request.

Yes, this is ours. So the product itself is not free; what's open-sourced is the code for integrating it into OpenStack. Thank you. Thank you, everyone.

Yes. Yeah, you can. Yes, you can: you don't need to have any pool members at all. You can just create a VS and write some rule — actually, you don't even have to write anything. Suppose you want to attract traffic for a specific port and drop it: you can do that by setting a security policy that just drops the traffic. Yes, you can. You can basically say, for an HTTP request or response, the action can be a redirect, a URL modification, or a content switch.
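Mechanically, SNI termination works the way Praveen describes: the client's ClientHello carries the host name, and the terminator picks a certificate before finishing the handshake. Here is a generic sketch of that hook using Python's standard ssl module (host names and certificate paths are illustrative, and this is not Avi's implementation):

```python
import ssl

# One TLS context per hosted site; paths are illustrative.
contexts = {
    "foo.com": ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER),
    "bar.com": ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER),
}
contexts["foo.com"].load_cert_chain("/etc/lb/certs/foo.pem")
contexts["bar.com"].load_cert_chain("/etc/lb/certs/bar.pem")

default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
default_ctx.load_cert_chain("/etc/lb/certs/default.pem")

def pick_certificate(sock, server_name, _ctx):
    """Called mid-handshake with the SNI host name from the ClientHello."""
    if server_name in contexts:
        sock.context = contexts[server_name]   # swap in that site's cert
    # returning None continues the handshake with the chosen context

default_ctx.sni_callback = pick_certificate
# default_ctx then wraps the listening socket, as in the earlier sketch;
# the default cert is served when no SNI name matches, mirroring the
# "one default certificate" behavior discussed for LBaaS v2.
```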
I think it's on the HTTP response side that one of them has that. How easy is it to integrate this into OpenStack? Basically there are two points of integration. If you want to use the LBaaS v2 APIs, we have a small driver that you install into LBaaS as another provider. And the Horizon code is, again, another small package that you install in Horizon to expose the UI. But for our controller, our solution itself, there is the control plane you need to install: that's one or three controllers, depending on whether you want HA redundancy or not. And then the controller itself spins up the data plane engines as needed. Thank you.