Hello, good afternoon everyone. This is a really good crowd; we were seeing that the numbers were going to be a couple hundred, and we're glad people are actually here. So first of all, thank you everyone for showing up. We're very excited to be here.

In preparing this presentation we were talking a couple of days ago about what we really like about conferences, about general tech conferences. One of the things we really like is that when you make a submission to do a presentation like this and get accepted, it's a justification, especially in a big company like IBM, to come to a conference, to travel, to meet with people. But the interesting thing about the OpenStack Summit is that it's a little bit different. It's a lot like normal tech conferences in that there's an opportunity to talk with partners and customers and see what the competition is doing. But OpenStack is really unique in that here the technical sessions give you a good variety of information: lots of technical detail if you're interested, or high-level architecture overviews and discussions. And beyond that, you actually get to talk to the developers. We're really excited about that; we like the opportunity to see the design sessions, to see all of that. So we're very happy to be here, and we were very pleased that our presentation was accepted.

When we were planning the proposal, we thought about what we liked best in the sessions we've been coming to for a couple of years now. What we really liked were the sessions that combine the technical, that give you details on how you do things, but also give you an overview of what you did and what's going on: the topologies, the thinking behind things. That's what we wanted to do, that's what we proposed, and we're really happy it got accepted.

So today's presentation is titled "A Practical Approach to Deploying a Highly Available and Optimally Performing OpenStack." The reason we're doing this is that at the heart of everything we've been doing in high availability, we've been taking all this information from the community on how to configure your highly available systems and how to set up your policies, and basically we wanted to give back, because there are some areas where there are gaps between the officially available documentation and the documentation that individuals have. At IBM we find it really important to give back, and this is one way for us to do that, through a practical approach to high availability.

This machine is obviously not running on OpenStack, so that's the situation we have right now. Let's go this way, guys; you're going to have to excuse us, we're going to have to do it this way. Yes. Okay.
Sorry about that. So the first thing we'll start with: Jeffrey will talk about the active-passive HA environment that they set up in China, and after that he'll show us a quick demo. Then Sean and I will talk about the active-active environment that we set up in our software environment. Finally we'll have Tony talk to us about HA orchestration with Heat and Chef. After that, hopefully I haven't spent our question time already, you'll be able to ask us a couple of questions. Like I said, we're really here because we want to give back, so if you have any more questions after the mics have been shut off, we'll be outside and we'll gladly answer them. Okay, so let me hand off at this time to Jeffrey to talk about active-passive.

Thanks, Samero. I'm going to take about ten minutes to give you a brief demo that we captured from one of our key customers in China, in production, where we provide high availability for the database. Our goal is very simple: we use a MySQL database and we want to keep that database highly available, to make sure the data in the database persists if we lose one of the databases in the pair. We also want the IP address to be maintained, and the database daemon to fail over when a failure is detected. This is a production environment, and we captured this just one week ago, before we came here.

First, let me give a very high-level view of the architecture. Underneath, this is one physical box and this is another one. We have DRBD, the Distributed Replicated Block Device, in mirror mode; DRBD maintains the data synchronization automatically, and its daemon works on the raw disk. On top of that we have an active-passive (standby) MySQL database installed on each of the nodes. On top of that we have Corosync to maintain the heartbeat and send the signals that make sure the pair works and the nodes know about each other. Pacemaker is the process that gets triggered whenever Corosync determines there is a failure; it takes the recovery action, and the recovery consists of floating the IP and recovering the MySQL daemon.

We have the two physical boxes, as I said earlier. This is host 10, and you can see the host name here; this is host 20, in the other box. I open the Pacemaker monitor live to watch Pacemaker and Corosync, so we can see the master is on host 20 and the slave is host 10. We have the file system for MySQL, which is provided by DRBD, the IP address, and the MySQL daemon. From here we can see the MySQL daemon is running on host 20. We check the DRBD status, and we can see the primary and the secondary: this side, host 20, is the primary, and this side, host 10, is the secondary, so they know about and talk to each other. We also see the IP address assigned to one of the MySQL daemons, 10.10.203.19. We check the IP address on the other host and it does not show up there; after we emulate a failure, the IP address will float from host 20 to host 10.
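To make the architecture concrete, here is a minimal sketch of the kind of Pacemaker resource configuration this stack implies, in the spirit of the OpenStack HA guide; the resource names, DRBD device, and file-system paths are illustrative, and only the floating address comes from the demo:

    # DRBD master/slave pair for the MySQL data
    crm configure primitive p_drbd ocf:linbit:drbd \
        params drbd_resource="mysql" op monitor interval="15s"
    crm configure ms ms_drbd p_drbd \
        meta master-max="1" clone-max="2" notify="true"
    # file system, floating IP, and MySQL daemon, grouped so they move together
    crm configure primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext4"
    crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip="10.10.203.19" cidr_netmask="24"
    crm configure primitive p_mysql ocf:heartbeat:mysql
    crm configure group g_mysql p_fs p_vip p_mysql
    # the group must run where DRBD is primary, and only after promotion
    crm configure colocation mysql_on_master inf: g_mysql ms_drbd:Master
    crm configure order drbd_first inf: ms_drbd:promote g_mysql:start

The grouping is what makes the recovery described above atomic: when Corosync detects a failure, Pacemaker promotes DRBD on the survivor and then starts the file system, the IP, and MySQL there in order.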
To trigger the failover, I ran the crm command to put host 20 into standby, and you can see the status changed: it switches the MySQL file system over DRBD from host 20 to host 10. Everything changed; host 20 is in standby mode, and when we check the MySQL daemon status you can see it's stopped on host 20 and running on host 10. We also look at the DRBD status: right now host 20 is in crashed, standby mode and shows as unknown, so host 10 takes over as the primary.

Then we bring the failed node back online. Oh, before we bring everything back up, we take a look at the IP address. Before, the IP address for the MySQL daemon was on host 20; after we emulate the failure, we can see the IP address switched from host 20 to host 10 automatically. It's gone from host 20, and 10.10.203.19 now answers on host 10. Now we put host 10 into standby mode to force all the daemons to come back from host 10 to host 20. So that is pretty much all I wanted to cover. All right, I'll give it back to you.
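For reference, the failover and failback steps in the demo map onto standby/online toggles of roughly this form; the host names follow the demo's naming:

    crm node standby host-20   # push the VIP, DRBD primary role, and MySQL to host 10
    crm node online host-20    # bring host 20 back as the secondary
    crm node standby host-10   # force everything back onto host 20
    crm node online host-10    # restore the original pair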
Thank you. Okay, so let's talk now about active-active high availability. Our goal, as we were working with environments and developing new ones, was to improve the stability, reliability, and scalability over the previous environments we had been building, and to build a platform that was reliable and strong enough to deploy Cloud Foundry workloads. If you saw our colleagues discuss the Cloud Foundry deployment on top of OpenStack yesterday, this is part two of that presentation.

Cloud Foundry workloads have very unique characteristics. The big one, the one we found stresses OpenStack the most, is that when it deploys initially, it deploys about 30 to 40 VMs at once, and with the previous environments we had, that was actually stressing OpenStack and causing some issues that we'll detail a little later. So that's one thing you have to consider. You also have to consider that Cloud Foundry uses volumes a lot and stores a lot of information on Cinder, so we have to be careful of that. There's a lot of network I/O, it's very chatty internally, so you have to worry about the reliability of your network. And it's a very heavy OpenStack API user, so you have to consider that as well.

We had a couple of decisions to make with regard to our architecture. The first was: do we want to scale up, giving it more hardware, or do we want to scale out? We ended up deciding on scale-out. The combination of Cloud Foundry and OpenStack makes that easy; they're very complementary in that you can add hardware and resources and they both utilize them well, and this gives you the ability to match the requirements the workloads have. The second big decision was between active-passive and active-active, and with our experiences last year, especially with the previous releases, we decided that active-active was a requirement; it was something we needed to do. The big advantages are obviously availability, but also the fact that utilization is distributed across all your servers; since per-server utilization goes down, response times get better, and your failover time is improved.

Here's an architectural overview. If you're familiar with high availability, this is the general architecture that the OpenStack community has pretty much defined and agreed upon: the two cloud controllers, the data nodes, the storage nodes, the whole high-availability model. The only thing we've really modified is that we added some more load balancers, and we'll get into the details, but the point is that we basically took the community recommendations, found some tweaks and some gaps, and modified them a little. So now Sean's going to walk us through the rest of the architecture.

Okay. When we were trying to build out a highly available and scalable OpenStack deployment, we needed to ensure a couple of things. One is a stable and fast-responding message queue, because as you all know, all the services talk over the bus. We looked at and compared different queue systems, such as Qpid, but we ended up going with RabbitMQ in this case. RabbitMQ is very simple to set up and cluster: you go through a few steps, you basically pass a cookie around to enable trust between the nodes, and then you tell the nodes to join the cluster. As you see in the picture, we have three data nodes, all running RabbitMQ, and we'll show you that they also run our database. Once those queues are set up and clustered, we define the HA policy to make sure the replication across the data nodes syncs up all of the messages. RabbitMQ also comes with a pretty good monitoring application that you can enable, which really helps you debug when you're having issues or seeing performance problems inside your OpenStack environment; it helps you narrow down whether the queue is really backing up all of these requests, or whether it's a database problem or something else in the environment.
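A sketch of those clustering steps, assuming the Erlang cookie in /var/lib/rabbitmq/.erlang.cookie has already been copied to every node; the host names are illustrative:

    # run on datanode2 and datanode3 to join them to datanode1's cluster
    rabbitmqctl stop_app
    rabbitmqctl join_cluster rabbit@datanode1
    rabbitmqctl start_app

    # mirror all non-internal queues across the cluster (the HA policy)
    rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'

    # the monitoring application is the management plugin
    rabbitmq-plugins enable rabbitmq_management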
The second service running on these data nodes is MySQL, and for an active-active replication model we're using Galera to synchronize all the persistent data across the nodes. We found Galera fairly straightforward to implement, and out of the box it works. However, there was a note we found online about write contention where you can run into deadlocks, and we actually hit these a couple of times. What happens is that when you're writing to one database, you can have multiple services or endpoints updating the same row, which causes the deadlock. The good thing about OpenStack is that if it catches this error, it will retry and reattempt the update; however, you do see a lot of warnings in your service logs, and that's something we wanted to get rid of, so we decided to implement more of an active-standby configuration, which is shown in the next chart. There were also some additional things we wanted to do with MySQL, because we were trying to scale up and ensure the performance metrics support our Cloud Foundry deployment, so we did some tuning in the database as well: we looked at tweaking max connection sizes, thread pools, buffer sizes, and so on.

One of the fundamental things for getting to an HA configuration in OpenStack is that you need some way of load balancing all of the requests coming in. We do this using HAProxy; any load balancer would typically do, but this is one of the recommended configurations in the OpenStack HA guide, so we decided to try it out. We found it fairly useful. It provides stats functionality that reports the number of active connections and things like that, which is also a great aid for tuning your system. We also found that in the HAProxy configuration, your session timeouts do matter, depending on what type of service is connecting. For the data nodes, the connections back to your queue and your database need much longer session timeout values, because those services typically want persistent connections to the database; that's something we had to tweak. For most of the other services, like the Nova API and Keystone, the default values work fine.

In conjunction with HAProxy, we're using Keepalived to manage the virtual IP, or VIP. This redirects traffic in case one of the load balancers goes down, migrating the IP over to the secondary load balancer. Just a quick note: we use three pairs of HAProxy configurations. This could all have been put into one load balancer, but we wanted an easier way to debug and see whether an issue was within our data nodes, our storage nodes, or compute.

This is a high-level picture of how we have our data nodes set up. As I said, we have three of them, all running MySQL and RabbitMQ with active-active replication across all the nodes, fronted by a pair of HAProxy load balancers.

Going back to that previous point about write-lock contention, there are two configurations we saw fit for this. The first is targeting a single primary, which is what we mean by an active-passive config: we're doing active-passive at the load balancer level, not at the database level. In the back there's still active-active replication, but we target one primary node for writes, and the replication pushes that data across. As you can see in this little snippet of the HAProxy config, we all target the first MySQL node; if that dies, failover pushes traffic to the secondary node, and so on. But with that approach we don't get full utilization of all our servers.
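A minimal sketch of what that HAProxy stanza could look like; the addresses are illustrative. The backup keyword is what implements the single-primary targeting, and the long timeouts reflect the persistent database connections mentioned earlier:

    listen galera_cluster
        bind 10.10.203.30:3306        # database VIP (illustrative)
        mode tcp
        option tcpka
        timeout client 8h             # long timeouts for persistent DB connections
        timeout server 8h
        server datanode1 192.168.1.11:3306 check
        server datanode2 192.168.1.12:3306 check backup
        server datanode3 192.168.1.13:3306 check backup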
So one method we found was to configure the load balancers with different ports, mapping ports to particular services. For example, we could have all of our Nova requests go to the first node, Keystone requests go to a second, and all the other services that are not as chatty go to a third node. This gives us a way to load balance all of those requests while still having the replication across the back end.

Now for the OpenStack services. Once you have a solid back-end foundation for all your messaging and persistent data, you can really get started with the OpenStack services. The one thing you need to do here, once you have those virtual IPs set up on your load balancers, is register them as your target endpoints. What we see here is a picture of our services registered in Keystone with the virtual IPs that sit on the load balancers. This is critical, because if you register the services up front with a particular node, you have to go in and reconfigure them later: all the services that make requests and do token exchanges rely on what's stored in the service registry in Keystone.

For controller nodes, we have Horizon, the Nova API, and Keystone running, with the same configuration on both nodes. Some of the things you need to do when configuring these nodes: first, set up your MySQL connections to point to the particular node and port you want to write to. Second, ensure that you enable HA for your queue services. In our config we have rabbit_hosts listing the three back-end nodes; notice we're not targeting the VIP here, we're targeting each node. There's code in Oslo that provides the failover mechanism: in case one of those nodes dies, it migrates the requests to the next node. And finally, enable the HA queues. The last part is setting up the HAProxy configs; here we're using round-robin load balancing with standard timeout values, and we configure the load balancers to point to the two physical IPs of the nodes.
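As a sketch, those controller-side settings might look like the following in a Grizzly-era nova.conf; the addresses and credentials are illustrative:

    # write to the database through the load balancer VIP (or a designated node/port)
    sql_connection = mysql://nova:secret@10.10.203.30:3306/nova

    # target each RabbitMQ node directly, not the VIP; Oslo fails over between them
    rabbit_hosts = 192.168.1.11:5672,192.168.1.12:5672,192.168.1.13:5672
    rabbit_ha_queues = True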
For storage nodes, we have Cinder and Glance, all backed by local storage in a RAID 10 configuration, but the HA configuration is pretty much the same as for the other services. We set up the SQL connection IP and port, we set up the HA replication, and finally the HAProxy config. Here we decided to use the load-balancing algorithm "source", which sends all requests from the same source to the same node.

One thing we did find when trying to run everything in an active-active configuration was a problem with the schedulers, both Nova's and Cinder's. When we were testing this out, with Cloud Foundry as the workload on top, as Manuel mentioned, a lot of requests get pushed off at once, and when the schedulers pick up these requests, both will pick up a request to deploy a VM, and since they both have the same perception of the lay of the land across our compute nodes, they end up targeting the same host. What we saw was a backlog of requests targeted at one particular compute node, which slowed down provisioning times considerably, because that host got backed up actually getting all those VMs provisioned. The workaround is to configure the schedulers in an active-passive configuration, similar to what Jeffrey mentioned before, leveraging Pacemaker. With this config we disable STONITH, to ensure it doesn't kill off those services; we ignore quorum, since there's only a pair of these nodes; and we enable stickiness to prevent resource failback, so that if one node dies and traffic gets redirected, it keeps going to the surviving node, because there's no problem continuing on that path.

And then finally, networking. We've all heard about some of the challenges with Neutron, and for this particular deployment we were running Grizzly, so what made sense to us was to use Nova networking in a multi-host configuration, which gives you a no-single-point-of-failure setup: if one node dies, it doesn't affect any of the other nodes that could be hosting critical applications. With Neutron in Grizzly there is no real active-active configuration for the L3 agents, which host your virtual gateway routers. To enable HA for Nova networking, it's basically a matter of setting the multi-host property in nova.conf, and then we set a few more parameters to enable it; each of these nodes ends up running the nova-network service along with the API metadata service and compute.
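A minimal sketch of those Pacemaker cluster options for the scheduler pair; the stickiness value is an illustrative choice:

    crm configure property stonith-enabled=false
    crm configure property no-quorum-policy=ignore
    crm configure rsc_defaults resource-stickiness=100

And the multi-host side is essentially one flag in nova.conf on each compute node, which makes every node handle network duties for its own VMs:

    multi_host = True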
To summarize our active-active HA experience, some of the lessons learned: as we mentioned, the Nova and Cinder schedulers can cause issues, so we moved them to an active-passive config. With MySQL Galera there are chances you will run into write locks, so we segregate the writes, splitting them up to load balance across multiple nodes while still having active-active replication in the back. Another thing we found is that the out-of-the-box configurations you start with typically don't have all the settings you need for high availability, so you end up going back to the OpenStack wiki to see which parameters are missing and trying to fill them in; that's something we hope can be improved in the future. And in the Grizzly release there was one slight issue with the rabbit_hosts config for load balancing: all of the requests would end up going to one primary RabbitMQ node. This has recently been fixed so that requests are actually spread across all the nodes you define in that pool. So now I'll hand it over to Tony to talk about HA orchestration with Heat and Chef.

Thank you, Sean. I'm going to talk a little bit about how we actually deploy our cloud; hopefully it can be some sort of inspiration to you. We have been talking about configurations, configurations, and configurations, and I think basically all of us agree that we need an installer. We had one, of course, but the issue we found is that the installer is usually designed from a development or test perspective, and when you're dealing with a real production-level environment, it's usually not enough: it doesn't configure the network topologies out of the box, and it cannot handle high-availability issues. So we've been investigating things: we've been investigating Heat, and we've been trying Chef. Chef basically manipulates a single node; you have the cookbooks, you have roles, you have attributes in the environment. Heat is used to manage the whole deployment, and it's really cool that you can abstract whole environments with templates.

With these pieces we came up with what we call the deployment service. With this deployment service, if you're going to deploy a cloud, you need two phases, the undercloud and the overcloud; we sort of borrowed the concepts from TripleO. The undercloud is really an all-in-one OpenStack deployment, which we use to spawn the overclouds, and the overcloud is the actual service to our customers. We describe the overcloud with Chef and with the Heat templates. For example, if you just want a very simple all-in-one OpenStack cloud, you can write a very simple template; however, if you're a big enterprise and you need high-availability capabilities, you can write a relatively complex one. So it has a lot of possibilities.

Let me explain a little how the deployment service works. I'm going to walk you through the process with this example, which is what we actually ship with our product. Here, in the resources part, we have the two nodes, the control node and the standby node. You may want to pay attention to the metadata part, where you can find the Chef run list, the roles, and so on. You have all these recipes consolidated in a role; for example, you probably have the Nova API here, and since it's highly available, you'll also have another Nova API over here. You can configure your roles, and what the deployment service does is pick the values from here and apply these run lists to the resources.

With Chef, it's all about cookbooks, roles, and the environment. You can download all these cookbooks from StackForge, and since you also have the roles, what we need is some attributes in the environment. Since you can't just hard-code all these values for all the customers, we actually ship our environments with some placeholders. Like here: this one says how you're going to configure the logging level, is it debug or not, and here the network manager type.
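As an illustration, a CFN-style Heat template fragment of the kind being described might look like the following; the resource names, role names, and parameter are hypothetical, not the shipped product template:

    HeatTemplateFormatVersion: '2012-12-12'
    Parameters:
      Debug:
        Type: String
        Default: "False"
        Description: logging-level placeholder, overridable from the command line
    Resources:
      ControlNode:
        Type: AWS::EC2::Instance
        Metadata:
          chef:
            run_list: ["role[ha-controller]"]   # roles consolidate the recipes
        Properties:
          ImageId: "openstack-node-image"
          InstanceType: "m1.large"
      StandbyNode:
        Type: AWS::EC2::Instance
        Metadata:
          chef:
            run_list: ["role[ha-standby-controller]"]
        Properties:
          ImageId: "openstack-node-image"
          InstanceType: "m1.large"

The deployment service reads the run lists out of each resource's metadata and applies them to the corresponding node via Chef.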
The end users provide the actual values through the parameters part. We have some default values here, but of course you can override them when you trigger the commands from the command line. In the outputs part we can see those values; those are the exact placeholders we saw previously in the environment. The deployment service picks the values from the parameters we've configured, replaces all the placeholders with the actual values, and then does the actual deployment. So with this Chef-and-Heat approach we can simplify cloud deployment while still keeping enough flexibility.

Okay, thank you. All right guys, we have four or five minutes; if you have any questions, the microphones are there. And if not, you can catch us afterwards, or you can follow us on Twitter on our accounts. There are some more technical sessions tomorrow, as you can see up here. That's it. Thank you very much; we'll be around.