All righty, good afternoon folks. This is the "Load Balancing as a Service v2.0: Liberty and Beyond" talk, so if that's not what you're here for, you're in the wrong place. To introduce people: we have Franklin Naval and Brandon Logan from Rackspace, Michael Johnson from HP, and I'm Stephen Balukoff from Blue Box — and if you haven't heard, we are now an IBM company.

So, as a quick explanation for those who have no idea what we're talking about up here: load balancing is a vital component of cloud applications, because it allows you to have multiple machines in an application environment servicing the same IP address on the front end. It's the best way to get horizontal scalability out of a cloud application. I don't really need to tell you why load balancers are important; if you don't know that, you're also in the wrong place.

As for the people involved: there are lots of people working on Load Balancing as a Service at this time, and most of them are from Rackspace, HP, and, well, IBM.

Moving on. I'm going to tell you a little bit about the history of this, and then these other guys will tag-team in telling you where load balancing is headed. This picture is actually pretty special to me: this is the Tacoma Narrows Bridge as it was in the middle of the 20th century, about ten miles from where I live. The interesting thing about this bridge, if you didn't know what happened to it: shortly after it was built, they found that in high winds the bridge started oscillating, to the point where it got a reputation for swinging in the wind, and people started calling it Galloping Gertie. On one particularly nasty day, the oscillation grew and grew, the stress eventually became too much, and the bridge broke right in the middle; the result is what you see in this picture. The important thing to note is that this wasn't a construction problem or a usage problem. It was a well-built suspension bridge, not much different from any of the others of its day, and nobody was overloading it. There was simply a serious design flaw, and as a result it has become an icon across all of engineering for bad design. So, now that I've fixed that in your mind...
Let's talk about LBaaS version one. LBaaS v1 did actually provide a lot of good things. Specifically, it accomplished the task of providing load balancing, which is sort of important, and there were a few features it did (and still does, since it's still there) pretty well: session persistence, cookie insertion, and a driver interface for third-party drivers, so vendors who made load balancing appliances could plug into Neutron LBaaS and sell their stuff.

However, it had some serious problems, and when you look at them, they all boil down to the model. The model was the problem — in other words, a design flaw. It didn't follow industry standards for terminology or concepts. For example, in LBaaS v1 there's this thing called a VIP. When you use the word VIP in any other context in information technology, anywhere else in the industry, it has a specific meaning, and people pretty much know what you mean. In LBaaS v1 it doesn't mean that. Because of this, we were barely able to deliver what you could call an industry feature set: you could do load balancing, but really not much more than splitting traffic between various back-end servers. It wasn't delivering what other vendors deliver, in terms of what load balancers are supposed to do.

Improvements were made, but all of them were difficult hacks, and so from the inception of LBaaS v1, for two years, essentially no new features were added. People wrote them, but they were never incorporated, because they were really dirty hacks working around the problem. That wasn't the fault of the developers; it was the fault of the model they started with. And as a result it was not scalable. In fact, nothing about it was scalable unless you ripped out significant portions of it and wrote your own scheduler or whatever — and then you weren't really using LBaaS v1 anymore; you were using a hacked workaround.

The problem is that people actually started using it. As I like to put it, the tenant API was a dead end, and it was polluting user mind space, because people came to expect load balancers to work like that, and they really shouldn't work like that. On top of that, there were no cloud operator controls. So that's the old and bad, and one thing I want to make very clear to everyone here: you should not be using LBaaS v1 anymore. It is now deprecated. You should be using LBaaS version two. Okay — right here, by the way, is the new bridge.
This is also the Tacoma Narrows Bridge. They rebuilt it, having figured out what the design flaw was and corrected it, and you'll notice some of the struts here are the same struts that were on the original bridge. They reused the parts that made sense and replaced the parts that didn't — and as an added bonus, they made it horizontally scalable.

So: LBaaS version one versus LBaaS version two. This might look very subtle; there's not a whole lot of visible difference between these two models. What we've done is take the VIP and split it into a load balancer object, which contains the IP address, and listener objects. That might seem like a tiny thing from the outside — if you've not worked with load balancers before, it seems pretty small — but it turns out it makes all the difference in the world. Because of this, we can now much more easily offer features like transport layer security: we have a way to plug TLS right into the listeners and be SNI-compliant right out of the gate. These guys will talk more about the other advanced features we're going to be adding with LBaaS v2.
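To make that split concrete, here's a rough sketch using the neutron LBaaS v2 CLI of this era: one load balancer object owns the VIP, and listeners hang off it, one per front-end port/protocol. Names, subnet, and the Barbican container reference are illustrative, and the exact flag names are worth verifying against your client version:

```
# One load balancer object holds the VIP...
neutron lbaas-loadbalancer-create --name lb1 private-subnet

# ...and listeners attach to it, one per front-end port/protocol.
neutron lbaas-listener-create --loadbalancer lb1 --name web-http \
    --protocol HTTP --protocol-port 80
neutron lbaas-listener-create --loadbalancer lb1 --name web-https \
    --protocol TERMINATED_HTTPS --protocol-port 443 \
    --default-tls-container-ref $CERT_CONTAINER_REF   # Barbican ref
```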
Anyway, right now these are the drivers that are available for the two versions, and they're not compatible. So if you are a vendor and your name is in the LBaaS v1 column but not the v2 column, you need to write an LBaaS v2 driver, because they're not compatible and we're not carrying it forward for you. So there you go. And from here — I have no special story about this picture. It's just the Liberty Bell.

So, what have we done in Liberty? What we released: LBaaS v2 is no longer experimental; it's part of the full release. We did leverage the experimental tag before this so that we could make one backwards-incompatible change, which was on the listener's default TLS container attribute: we changed that from an ID to a ref. With that, LBaaS v1 is deprecated, as Stephen just said. So, like he said, if you're using v1 or planning on writing a driver for it: stop. Don't use it anymore; don't touch it.

One of the big things we did in Liberty was make Octavia the reference implementation. Before this, the reference was the same driver v1 had — the namespace agent driver — which we had just rewritten for v2. We've now replaced that with Octavia. The old namespace agent driver is still available if you want to use it, but Octavia is the reference implementation going forward. It took a lot of work to get Octavia to feature parity with the old driver, so that's what we focused on a lot in Liberty, but we finally got it done. During that time we also did a lot of work on the v2 Horizon dashboard, because one thing that's missing with v2 is Horizon integration. That's coming in Mitaka; it didn't make the cut for Liberty. Same thing with L7 content switching: a lot of work was done, but it didn't make the cut. So I'm going to let Franklin talk about testing real quick, because that was a big piece of the Liberty work.

Yes — so we started with a handful of tests in Kilo, mostly unit tests and some Tempest tests. For Liberty we expanded on that with a test plan. We actually had a hackathon at Rackspace where we created the initial functional tests and a test strategy on top of our initial work, and we've iterated on that over several months with several companies.

I'll go over some of our functional tests. We have 100% positive API test coverage for LBaaS 2.0, and a substantial number of negative tests. An example of a negative test would be passing in an invalid attribute — such as "foo" for a protocol — and expecting the request to be rejected. A positive test would be just creating a load balancer with a valid name, for example, and we have several tests like that.
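As a concrete illustration of that kind of negative test — not the actual Tempest code; the endpoint, token, and load balancer UUID here are placeholders — you can hit the v2 API directly and expect a 400 back:

```
# Attempt to create a listener with a bogus protocol; the API should
# reject it with 400 Bad Request rather than creating anything.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST "http://neutron-endpoint:9696/v2.0/lbaas/listeners" \
  -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
  -d '{"listener": {"loadbalancer_id": "LB_UUID",
                    "protocol": "FOO",
                    "protocol_port": 80}}'
# Expected output: 400
```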
We expanded on that by creating data-driven tests using testscenarios and Tempest, and this allowed us to have hundreds of permutations, especially around things like admin_state_up — the boolean values we pass in for each of the entities. It took us from something like 10,000 lines of code down to a couple hundred, and we were able to uncover a lot of critical bugs from it; there are still some more in there. Here's one example I posted: a truth table from one of our tests around listener create.

We've also been working on scenario tests. A scenario test involves the whole stack — everything from Nova and Glance to the networking pieces. An example would be spinning up two servers, creating a load balancer, passing traffic in through that load balancer, verifying the traffic gets balanced between the two nodes, and verifying that the chosen algorithm behaves correctly. Right now we have TLS tests in review, as well as session persistence, and we plan to expand more on that. So I'll hand it back to Brandon.

Thank you. So, what have we got planned for Mitaka? As I alluded to before: L7 content switching — a lot of the work has already been done. If you don't know what L7 content switching is, put simply, it's a way to tell the load balancer to send traffic to different pools based on information that's in the traffic itself. For example, if you have a website, requests for the resource /lbaas can go to one pool, and the rest of the default traffic — anything that's not /lbaas — goes to another pool. Or a URL like /images can go to a dedicated pool for a static content server or something like that, and the rest goes elsewhere.
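The patches were still in review at the time of this talk, but the policy/rule syntax that eventually landed looks roughly like this — pool and listener names are illustrative:

```
# Policy: requests matching the rule below get redirected to the
# static-content pool instead of the listener's default pool.
neutron lbaas-l7policy-create --name static-policy \
    --action REDIRECT_TO_POOL --redirect-pool static-pool \
    --listener listener1

# Rule: match any request whose URL path starts with /images.
neutron lbaas-l7rule-create --type PATH \
    --compare-type STARTS_WITH --value /images static-policy
```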
Another thing is pool sharing, which actually helps L7 out. Currently we don't have pool sharing: if you want two listeners to go to the same pool — say traffic on port 80 and traffic on port 443 hitting the same back ends — you have to create two different pools with the same information. That's become a usability issue, because whenever somebody wants to change the pool, change the member information, they have to do it in two places. With pool sharing, they would create the pool once, then just point the other listener at a reference to that pool without re-creating it, and whenever they change the pool, it changes for both. This also helps L7 content switching, because content switching sends traffic to a separate pool, and with this change the pools become independent. In the current model a pool is tied hard to a listener; this improvement detaches them slightly, to the point where it makes L7 way easier to do.

Another thing we want to do is a single API request to create a load balancer. Right now you have to go through these steps here — one, two, three, four — and each one is a separate API request, just to create a fully functioning load balancer that actually serves traffic. That seems a little cumbersome, and it doesn't help that Horizon has to go through the same process. With a single-create API request, you pass an entire load balancer tree in one request, and it all gets sent to the driver. The driver gets all the network configuration up front, so drivers can make more intelligent decisions about how to allocate resources, and as I said before, it's easier for Horizon.
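A hypothetical request body for that single-call create might look like the following. This was still a proposal at the time, so treat the endpoint and field names as illustrative, not the final API:

```
# One POST carrying the whole tree: load balancer -> listener ->
# pool -> members, instead of four separate API round trips.
curl -X POST "http://neutron-endpoint:9696/v2.0/lbaas/loadbalancers" \
  -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" -d '{
  "loadbalancer": {
    "name": "lb1",
    "vip_subnet_id": "PRIVATE_SUBNET_ID",
    "listeners": [{
      "protocol": "HTTP", "protocol_port": 80,
      "default_pool": {
        "protocol": "HTTP", "lb_algorithm": "ROUND_ROBIN",
        "members": [
          {"address": "10.2.0.3", "protocol_port": 80},
          {"address": "10.2.0.4", "protocol_port": 80}
        ]
      }
    }]
  }
}'
```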
Then there's the flavor framework for Neutron advanced services. It's not just an LBaaS thing, but we're going to be utilizing it. The basic concept is that operators can define different tiers of load balancer types — for example gold, silver, and bronze — and point each one at a specific driver. Say gold is a hardware implementation, and bronze is the namespace driver that we have. You can also split it up by different functionality on the back end: say your gold is an HA load balancer and your bronze is a non-HA one.

All right. If you remember the slide where we talked about the two different sets of drivers for v1 and v2, and you looked closely at the v2 side, you saw Octavia in that list of drivers. Octavia is a driver, much like the vendor drivers: it plugs into Neutron LBaaS as a back-end driver. So I'm going to talk through that architecture a little bit — there, the slide advances.

Up in the corner you'll see we have Neutron, then Neutron LBaaS version 2, and an Octavia driver that plugs into it. That driver accepts your Neutron CLI or API requests and passes them on to the Octavia API component. Each of the orange boxes with a gear in it is a process that makes up the Octavia controller. As you can see, the Octavia API uses oslo.messaging to send create and update commands into what we call the Octavia controller, and the main pieces there across the top are the Octavia worker, the health manager, and housekeeping.

The Octavia worker is the component that actually does all of the automation and provisioning of what we call amphorae — in the current implementation, our amphorae are service VMs. They're booted via Nova, and that's where HAProxy runs and the actual load balancing occurs. The amphorae are in-cloud workloads, as opposed to the old reference driver implementation — the namespace implementation — which ran HAProxy on your network nodes (or I guess you could put it on your compute nodes; most people didn't). In this case we're booting service VMs that run HAProxy, so you can scale that as you scale your compute environment.

Moving across: the health manager. This component has two major functions. First, it receives the heartbeat and status messages from the amphorae, and based on that information it determines whether each amphora is healthy. If an amphora has failed in any way — HAProxy crashed, or the amphora completely disappeared because somebody did a Nova delete on it — we go into a failover. In this initial reference implementation release we have what we call hot-spare failover: you can have a spares pool of amphorae that are booted but not yet configured, sitting in your compute environment, and should one of the primary amphorae fail, we take one out of the spares pool, configure it, and bring your load balancer back into service. That process is a little lengthy — it can take a minute to get it configured and get all the networks plugged into that amphora. So I'll show you a demo shortly where we're doing active-standby instead, which gives close to one-second failover between amphorae. That hasn't landed in Liberty, but it should in the next few weeks; we're aiming for Mitaka, definitely Mitaka-1.

Moving across again: the housekeeping manager. This is another process that does background jobs on behalf of Octavia. It does database cleanup — when we delete load balancers, we keep those records for a short, configurable period of time, and housekeeping goes through and cleans them up. If you're using the spares pool — if you've configured it to be greater than zero — this is the process that actually goes to Nova and makes sure those spares are booted and running. And coming soon, it will also do certificate rotation. The amphorae have two command-and-control methods; one is a REST API that runs on the amphora itself, and that has certificate protection — we do two-way SSL validation with those amphorae. Coming in Mitaka, housekeeping will do automatic certificate rotation on the amphorae as part of that process.
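For a sense of how an operator tunes the behavior just described, here's an illustrative octavia.conf fragment. The section and option names follow Octavia's configuration, but treat the exact names and values as assumptions to verify against your release:

```
[health_manager]
heartbeat_interval = 10        # how often amphorae send heartbeats (s)
heartbeat_timeout = 60         # no heartbeat for this long => failover

[house_keeping]
spare_amphora_pool_size = 2    # hot spares kept booted but unconfigured
load_balancer_expiry_age = 604800  # keep deleted LB records for a week
```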
You'll notice in blue we've got a number of drivers. This is one facet of Octavia: we've tried to make everything modular and driver-based, so people have the flexibility to adapt it to their environment. Right now the controller worker driver runs OpenStack TaskFlow to do the provisioning, so it has all of the flows and sequences for booting via Nova, doing the network plugging, and so on. We actually have another controller worker driver in progress that will help us work with containers. There's a problem with containers right now: we do a lot of hot-plugging of networks in Octavia, so when you add a new member to your back-end pool, if that member is on a different subnet, we hot-plug that subnet into your load balancer as needed. With containers we can't do that hot-plug; we actually have to create a new container and move over to it to have those additional networks available. Hence the second controller driver in progress.

The controller worker has a number of drivers itself. One is the amphora driver I mentioned earlier. There's an SSH driver, which controls the amphora over SSH; we're actually looking to deprecate that. And there's the REST API driver I mentioned, with the certificates — in this case TLS certificates. We do TLS offloading in the amphorae, and we connect to Barbican as our secure store for those certificates and keys, so there's a driver that interfaces with Barbican. We have a compute driver that interfaces with Nova to spin up those service VMs — as we go to containers, that may change to other implementations. And there's the network driver, which of course interfaces with Neutron for plugging our networks.

If you were in Vancouver, you saw this roadmap almost identical; I think we've added one thing in the Mitaka time frame. We did release Octavia 0.5 — it's on PyPI; you can pip install Octavia now — and we did reach reference implementation feature parity, with the service virtual machines and the spares pool failover capability. For 1.0 we hope to hit Mitaka with all of these features, and many are already in progress. Active-standby is completely coded and up for review; we're working on bugs, and I'm going to demo it next. High-availability control plane: this is having multiple controllers. The current implementation is set up for one controller running the stack of software I just showed you, but we want the controller itself to be highly available too, with multiple controllers running in your different AZs, or however you do your HA. Then the layer 7 rules we talked about earlier, container support, which I mentioned is in progress, and of course the flavor framework.

Looking forward to 2.0, we want to go one step farther. We have hot-spare failover today; we're doing active-standby, which is near-second failover; and we're looking at active-active. That's having many amphorae behind your load balancer, all able to handle your traffic — horizontal scaling of the load balancing service delivery itself, as well as high availability. We have people working on that too, so it might actually land earlier than 2.0; we'll see.

Okay, so I have two demos to show you of active-standby. The first one is just a pure round-robin load balancer. The first thing I do here is a nova list: I have two web servers booted up via Nova. Those are the back ends, so we can show that it's actually doing something. The next thing we do is curl them; you can see each responds with its IP address and a connection count, so the first hits to those web servers are at connection zero, and as we go through the demo, hitting them every ten seconds or so, you'll see those counters increment each time a connection is created.

All right, so we're creating our load balancer, putting the VIP on the private subnet in this case. You can see it's in PENDING_CREATE state; right now Octavia is booting up that VM via Nova in the background, and we're cycling on the API waiting for it to go active. At the bottom you can see the VIP IP address; we'll use that in a minute to actually hit this load balancer and query those web servers. As you can see, Nova does take a little bit of time — about 40 seconds — to boot that service VM. That's one reason you might want a spares pool: it makes provisioning much faster.

All right, we're active. I did a load balancer list here, so we can see we've got a VIP and it's active. In the upper box window, I've started a script that just tails the syslog on one of the two amphorae we booted in the active-standby pair — the standby amphora's syslog — so when we do the failover, you'll see a message pop up saying "I am now master." We've created a pool now, pure round robin; you'll notice session persistence is blank here, so we're not doing session persistence for incoming requests in this particular demo. What we should see is the web servers alternating for each request. We're adding our first member — the first back-end web server — and now the second one. In the upper box you can see I've started the syslog tail on the master now; it's in master state, and it's waiting for me to hit a key to actually shut down that amphora and trigger the failure.

Okay, I've started a curl in a loop here. You can see the connections alternating between the two web servers and the connection count incrementing. Now I hit enter and stop that amphora... and if you blinked, you missed it: we just failed over to the backup amphora. You'll notice the connection sequences are still intact and we're still alternating between those two back-end web servers. That's how fast failover happens in active-standby.
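If you want to recreate roughly what that first demo did, the build-out with the era's neutron LBaaS v2 CLI looks like this; server addresses, names, and the subnet are illustrative:

```
nova list                        # two back-end web servers, booted earlier

# VIP on the private subnet; wait for it to go ACTIVE.
neutron lbaas-loadbalancer-create --name lb1 private-subnet

neutron lbaas-listener-create --loadbalancer lb1 --name listener1 \
    --protocol HTTP --protocol-port 80
neutron lbaas-pool-create --listener listener1 --name pool1 \
    --protocol HTTP --lb-algorithm ROUND_ROBIN
neutron lbaas-member-create --subnet private-subnet \
    --address 10.2.0.3 --protocol-port 80 pool1
neutron lbaas-member-create --subnet private-subnet \
    --address 10.2.0.4 --protocol-port 80 pool1

# Hit the VIP in a loop; responses should alternate between members.
while true; do curl -s "http://$VIP_IP/"; sleep 1; done
```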
The next demo — go ahead and move on. In this one we're adding session persistence. Failing over amphorae is great, but that's not too hard; everybody does that, right? The trick is that now we're persisting connections such that a given client always goes back to the same back-end server. In this case I'm going to use source IP persistence. The bottom window is basically running the same script; the only difference is we turn on session persistence, so when we go through the load balancer, it's always the same web server responding — because it's the same client hitting it every time, we maintain session persistence to the same back-end server. Then the same thing will happen: we'll shut down the primary amphora, it'll fail over, and you'll see that session persistence is maintained for that client. They won't even know the failover occurred; they'll still be going to the same back-end server.

So once again, we're booting up the two amphorae here as part of the load balancer create. It still takes a minute. I will note: if you're doing this on devstack, or in virtual machines, you do want to make sure you have nested virtualization enabled, so you're exposing virtualization acceleration to the environment that's running the controller. Nova in devstack uses QEMU, and without acceleration it falls back to TCG software emulation to boot the VM; in that environment it takes five to eight minutes to boot a VM. Enable nested virtualization and it's under a minute. So if you see it taking a really long time for your amphorae to come up, it's probably because you don't have nested virtualization enabled.

Okay, we've got our listener, so we're going to create our pool, and now you'll see we have session persistence enabled. We're not using a cookie here; we're using source IP as our session persistence. You want to watch how quickly, when it stops up in the upper right, it takes over here on the upper left — you'll see just a slight pause in that client request. We've added our two members, and the next thing we do is start our query loop as our client. Again, with session persistence, it always goes to the same back-end web server. I just killed the amphora... and we've failed over. As you can see, session persistence makes that client continue to go to the same back-end web server. So that's active-standby; again, we're going to try to make that land in the Mitaka-1 milestone.
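The only step that differs from the first demo is the pool creation. A sketch, assuming the --session-persistence flag syntax of the era's CLI (worth double-checking against your client version):

```
# Same pool as before, but pin each client to one member by source IP.
neutron lbaas-pool-create --listener listener1 --name pool1 \
    --protocol HTTP --lb-algorithm ROUND_ROBIN \
    --session-persistence type=SOURCE_IP

# Repeated requests from the same client now hit the same member,
# even across an active-standby failover.
while true; do curl -s "http://$VIP_IP/"; sleep 1; done
```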
As I mentioned, you can run Octavia yourself in devstack, and you can also install Octavia from PyPI with pip. You use the normal neutron client to configure it and create those load balancers, and there's also a Vagrant script in the samples directory for devstack. Lots of pictures in there.
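A quick sketch of both of those points — installing from PyPI, plus the nested-virtualization check from the demo discussion (the sysfs path shown is for Intel hosts):

```
# Octavia 0.5 is on PyPI.
pip install octavia

# On an Intel host, 'Y' means nested KVM is available to your VMs;
# without it, devstack's Nova falls back to slow TCG emulation.
cat /sys/module/kvm_intel/parameters/nested
```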
Oh — wait a minute, before we go to the next slide: we are looking for contributors, and we're looking for feedback. Please, if you're interested in load balancing, or looking for a project to contribute to, or if you want to see some of these features land more quickly — yes, go throw engineers at us, please. We have an IRC channel on freenode; there are people there pretty much all the time, because some people like to stay up late and work all night long. We also have meetings on Wednesdays at 8 p.m. UTC, so you can join those, ask questions, check on status, and so on.

Now I'd like to introduce a new project we're starting up, called Kosmos. This is a collaboration between the Designate team and the Neutron LBaaS team, and it's global server load balancing — kind of like load balancing your load balancers. It sets up a DNS name that directs incoming requests to multiple back-end endpoints, and those endpoints could be in different data centers or different regions. Eventually we'll add the capability to do geo-routing for incoming requests: a user from North America making a request would get directed to a data center in North America, and someone from Asia-Pacific would go to an Asia-Pacific data center — unless that data center is down. That's the other piece of Kosmos: monitoring those endpoints and being able to direct traffic appropriately. We're just spinning this up; we have specs out, but there really isn't much code yet. We're just getting started.

This is an architecture diagram I borrowed from Graham's architecture document. We have an API — of course, everything in OpenStack has an API — and a conductor, which manages our database access and incoming API requests. The Kosmos status check is one of the key pieces: this is what goes out and monitors those back-end endpoints to determine which data centers are healthy and what their load levels are. Down here at the bottom you'll see endpoints; in the initial release we're targeting LBaaS version 2 as the first endpoint type we support, and we can use the status API in LBaaS v2 to see all the way down into a load balancer — even how many back-end nodes are currently healthy. So we can make intelligent decisions, like weighting one data center more heavily than others, based on that information. That's the key piece of the health check: making sure we send traffic to healthy endpoints on the back end. The Kosmos engine is the business logic: it takes in that status information and user requests, and takes the appropriate action, inserting or removing servers from DNS records. And then the GSLB backend is the piece that's actually responding to user requests for those resources; the initial reference implementation for that is going to be Designate.

We're also looking for contributors here. We are just getting started, and we're a pretty small team at the moment, so please, if you're interested in global server load balancing, join us. Again, we have an IRC channel and weekly meetings you can come to and ask questions. So with that, I'd like to open it up for any questions from the audience. Start over here.

Can you share those scripts? Yes — thank you. Next: my question is related to performance numbers. Have you gathered performance statistics comparing direct access to a VM versus access through HAProxy load balancing?

I can talk to that a little bit. Performance numbers are really interesting, particularly when you have service VMs, because it's really dependent on the infrastructure you're running — what you're hosting those VMs in and the servers that underlie them. We did do a comparison between the namespace driver that we had previously as the reference and Octavia, and it was within a couple percent difference in the number of connections per second they can handle. It should also be noted that once we can use containers as the infrastructure, we expect essentially no difference between them. And the added bonus Octavia gets you: once active-active lands, you'll be able to scale your service delivery horizontally, so at that point it will far exceed anything the namespace driver could have done. It will absolutely scale better. Good question, thank you.

Next question: in Heat there is an autoscaling feature related to load balancing. In those ancient times — I'm one of the developers of that "old bridge" — Heat was able to configure its own load balancer to use for autoscaling, with a pool of instances. How is it now? Is Heat able to use this new load balancer, version 2, to do the same?

Yes. We actually have somebody from IBM working on updating Heat support to use LBaaS version 2, so that's work in progress — literally, I was looking at patches this morning — and it's very close. Any other questions?

Is there a way to modify the traffic before sending it to the pool? What you're describing would be part of layer 7. The layer 7 support we have right now is for content switching, so it doesn't actually modify the traffic. However, it's the same line of thinking — for example, if you wanted to insert a cookie or something like that at the load balancer before you ship the request to the back end. Nobody has really pushed hard for that, so if you want it done, come to our meetings and say that's a feature you want added, because it's actually not much more work once we have layer 7 switching done. Right now the short answer is no, we don't do that, just because nobody's come and asked us for it — but that's obviously where it would fit.

What about X-Forwarded-For headers? Oh yes — there are some basic headers that do get added every time. X-Forwarded-For is one, so you can tell, for example, what the client's real IP was when it talked to the load balancer. And when TLS termination happens, I believe there's an X-Forwarded-Proto header or something like that, which says the client talked to us using HTTPS. When you do TLS termination, you have the choice of terminating the TLS session with the client at the load balancer, and then on the back end you have the choice of whether to talk HTTPS to the back-end servers. Most people tend not to; they talk straight HTTP to the back end, because they trust the internal network, for some reason. But your application still needs to know whether the client request came over a secure connection or not, and that's what you'd look for in the headers: the X-Forwarded-Proto header.

One more: in those ancient ages there was a requirement for vendors to have their own CIs to run tests against their drivers. Is it the same for version 2 — are you insisting on these CIs for vendors, just to make sure their drivers work? CI for the drivers: we're not insisting on it, but it's a good idea. (I'm asking because before Neutron split out all the advanced services, it was a strict requirement, but now it looks like maybe not.) We may get more strict about that as we go along. There are voices that want to split drivers out from the LBaaS version 2 code tree entirely, similar to what Neutron did with its plugins, and it's very likely we're going to push in that direction. Just be aware that doesn't mean you can't have your third-party CI, and even make it voting — so push hard for it. All right, thank you. We are out of time.