Okay, so it's 5:30; it looks like it's about time to start. Quick introduction: my name is Paul Matthews, I'm a system architect at Bluehost. First off, let's do a quick survey. How many people here use RabbitMQ for their messaging system? Okay. Qpid? Okay, anybody using ZeroMQ? Great, you guys don't really count, because you're with us. So today we're going to talk about going brokerless: basically, our transition from a Qpid-based system to a brokerless ZeroMQ-based system.

Now, first off, RPC messaging, as most people know, is a critical component within any OpenStack installation. The problem you're going to run into very quickly is that if your messaging system isn't stable, if it doesn't work, if it doesn't scale with you as you continue to grow, you're going to run into major issues with the rest of your OpenStack installation. The reason is that many of the core systems OpenStack uses depend on messaging being reliable and working at all times. If we look at some of those systems: OpenStack Compute uses messaging, and if you can't get messages through reliably, you can't start instances, stop instances, reboot, anything like that. Ceilometer uses it, cells use it; in fact, messaging is the only mechanism cells can use to communicate with each other. VNC, Neutron, Conductor: if our messaging system goes away, then all of these systems cease to work properly for us. And I know it's not just us that have RPC issues. For example, I was at a Rackspace presentation yesterday where they were talking about their issues with monitoring systems, and they said one of the major systems they had falling over was RabbitMQ, and how they dealt with it. So for us, we went with a Qpid solution in the first place. Now, a lot of people are going to say: well, heck, why did you use Qpid? It seems like most people use RabbitMQ.
Well, for us, we're CentOS and Red Hat based, which means that Red Hat packages up a lot of stuff that we just benefit from. One of those things is, of course, Qpid, and because they package it up for us, we have a high degree of certainty that it's going to be stable, that it's going to be performance tested, that the packages are going to work properly, and that security patches are going to be backported for us. That's a benefit we can use. A second reason is that Qpid offered the possibility of horizontal scaling. At the time, earlier versions of Qpid had a clustering module where you could have multiple Qpid servers acting in a highly available fashion: two or more servers, and you could query any one of them and get a response. Unfortunately, that feature was removed in a later release of Qpid and is no longer available.

So, our Qpid experience: first off, we started with a single instance, and a single instance became problematic for us very quickly. We found that at around 3,000 to 5,000 nodes, it started presenting problems. Now, a lot of people are going to say: wait a second, we have thousands of nodes, Rackspace has tens of thousands of nodes, and they're doing just fine. Well, they're using cells. We have a single-cell implementation, so we're not using cells to break anything up. And what we found was that a single instance of Qpid was not able to scale. We had lots of problems where messages would fail to go through or would be slow, and it caused major problems for us. Another issue we found was that Qpid connections to Nova Compute were lost, and the only thing we could do to fix it was to restart the entire Nova Compute service. It seemed to be something with Eventlet where the message was not being picked up and processed properly.
We never really investigated it very much, because at that time we were already moving on to ZeroMQ. So a lot of people are going to say: hang on a second, you guys used Qpid, but a lot of people use RabbitMQ and have great success with it. Why didn't you go with RabbitMQ in the first place? Well, when we looked at RabbitMQ, we saw that it had the same broker design as Qpid does, which scared us away a little bit, because the broker model has a lot of drawbacks. You have a single point of failure, a single point all your messages have to go through, and as a result that one piece has to be able to scale. We'd also talked to other members of the community who reported many of the same problems with their own RabbitMQ installations, and as a result we looked at other things.

Now, there are a couple of possible solutions the community has for scaling a broker. The first is cells, and the second is clustering or high availability. First, cells. The biggest problem with cells, the one biggest problem, is that they do not directly address any of the performance issues with anything: scheduler, MySQL, messaging, nothing. All they do is break your installation up into smaller pieces. Cells also have a major limitation: if I have an instance in one child cell and I want to move it over to another, I can't do that with cells. We did not want to be hamstrung by the limitations cells had. And because of the way cells pass messages from one level down to the next through a chain of services, they can magnify problems as they occur. For instance, if we send a message from our API cell down through a child cell to a grandchild cell, and any issue happens on the child cell, the grandchild cell is gone as well; we can't get any messages through to it.
Now, on clustering. As I mentioned, we started out with a single Qpid instance and moved on to a clustered solution. What we found is that clustering is extremely slow and extremely unreliable: as you add nodes, you actually lose performance, you don't gain it. The second thing is that with a clustered Qpid solution, we actually saw more problems than we did with a single instance. Lots of times we would have messaging problems during the day, we'd go look at the cluster, and what had happened was that one of the cluster nodes had fallen out and was re-syncing because it had problems, for whatever reason. During that time, messages were not reliably passed from one machine to another, and were not reliably responded to, or not responded to at all. Another thing was that if we ever tried to add a node into the cluster while it was in operation, that node would never, ever sync up. So basically, in that situation, we had to take an outage late at night, take the entire cluster down, and bring the entire cluster back up so it would actually sync, with Qpid's messaging services turned off in the meantime.

Now, as far as RabbitMQ goes, it does have a highly available clustering solution, but the problem is that it's extremely complicated. You have to have shared storage and something like DRBD, plus Pacemaker, Heartbeat, or Corosync, to make sure everything works properly. There are a lot of moving pieces that can cause problems. So not only do you have to make sure your shared storage works; even when everything is working properly, you also have to make sure that when you fail over from one node to the other, Pacemaker, or whatever solution you implement, actually fails over gracefully and doesn't break. The bottom line is that we found scaling a broker is not actually a practical solution. So what do you do?
Well, we decided to throw out the broker. Brokers don't actually scale; they're a single point of failure that everything has to pass through. So we needed a new solution that would work for us, and we had a couple of requirements. First, no single point of failure: we didn't want to be dependent on a single service or a single cluster of nodes for our entire OpenStack installation to work. Second, the solution had to be horizontally scalable, so that as we continued to grow and scale our installation, it would continue to grow and scale with us. And it also had to be reliable for us.

So, as I mentioned earlier, we moved to ZeroMQ. ZeroMQ has no centralized broker; it's more of a peer-to-peer, distributed system. Essentially, you're moving your installation from a single broker on a single box to a broker on every single machine. Every machine gets a ZeroMQ receiver, which receives the messages passed to it from other nodes. Now, a lot of people are going to say: hang on a second, this doesn't sound like a good solution. I don't want all of my nodes to have to know where everything is; I like the idea of having one instance everything goes to, so I only have one host to keep track of. Well, ZeroMQ handles this very gracefully. It has a matchmaker file, which I have a little example of here. If you look, we have two queues, the scheduler and consoleauth, and then a couple of hosts defined for each. If you get a request for the scheduler, the matchmaker passes it through to one of those hosts. So as far as messaging topologies go, with Qpid or RabbitMQ, a broker-based solution, you have a star topology: a broker in the center, and all of your nodes have to pass messages through it.
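For reference, the matchmaker file mentioned a moment ago is just JSON mapping each topic to the hosts that serve it. A minimal sketch (the hostnames here are invented, not from the talk) might look like:

```json
{
    "scheduler": ["ctl1", "ctl2"],
    "consoleauth": ["ctl1", "ctl2"]
}
```

A request addressed to the scheduler topic then resolves to one of the hosts in that list.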
So not only do you have a single point of failure, but any message that has to go anywhere has to pass through your broker. ZeroMQ, on the other hand, is a partially connected mesh. I say partially connected because nodes only communicate with the endpoints they're actually going to talk to. So, for instance, in the current implementation a compute node can't pass a message to another compute node and then off to the API or anywhere else. This could easily be implemented, because of the way ZeroMQ is written, but as it stands it's a partially connected mesh.

As I mentioned, we could program that in: ZeroMQ has a lot of flexibility inherent in its design. Brokers, on the other hand, have a very inflexible design: you have exchanges, queues, fan-outs, subscriptions, and there's not a whole lot you can do with that. With ZeroMQ you have four simple methods, connect, bind, send, and receive, and with those four methods you can implement basically any messaging topology you want. And because ZeroMQ is so simple and uses lightweight sockets, resource utilization is extremely low, and as a result we're also able to pass a lot more messages through at any time.

So I did a little benchmark test of ZeroMQ versus Qpid and RabbitMQ on a single-core VM. This is the number of casts per second; more is better. I guess it's not shown very well here, but ZeroMQ is almost three times as fast as RabbitMQ. Another thing to keep in mind is that because RabbitMQ and Qpid are brokered solutions, that figure is the maximum number of messages the entire system will pass at any given time. With ZeroMQ, on the other hand, you're passing messages from host to host, so that's the maximum number of casts you can pass from one host to another, but it is not the capacity of your entire system, because it's distributed.
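Those four calls are enough to sketch the idea with the pyzmq bindings. This is a minimal illustration, assuming pyzmq is installed; the endpoint name and message payload are made up for the demo:

```python
import zmq  # pyzmq bindings for ZeroMQ (assumed installed)

ctx = zmq.Context.instance()

# bind: own an endpoint, like the ZMQ receiver running on each node
receiver = ctx.socket(zmq.PULL)
receiver.bind("inproc://casts")

# connect: reach a peer directly, no broker in between
sender = ctx.socket(zmq.PUSH)
sender.connect("inproc://casts")

# send / receive: the rest of the API surface
sender.send_json({"method": "run_instance", "args": {}})
msg = receiver.recv_json()
print(msg["method"])
```

In a real deployment the sockets would be TCP endpoints on different machines; `inproc://` just keeps the sketch self-contained in one process.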
Now, people are going to say: great, ZeroMQ sounds wonderful, how do I use it? Well, all you really have to do, first off, is obviously change your RPC backend to ZeroMQ. The next thing: ZeroMQ has several different matchmaker capabilities. In most cases you're going to want to use the matchmaker ring, because the matchmaker ring, using a file like the one we saw before, sets up a hash relationship between the queue and the machines defined for it. So if we have a message addressed to the scheduler, the matchmaker returns one of those hosts. Once that's configured, all we have to do is start the ZMQ receiver, restart Nova Compute or whatever services are on the controller, and we're good to go.

Now, as far as a migration process goes, we were constrained by Nova's configuration. Because we can only have a single RPC backend at any given time, unless we have some way of breaking up our installation, some kind of logical division, there's not a good way to change from one RPC system to another. As a result, we needed a new solution. As I mentioned before, when we moved from a single Qpid instance to a Qpid cluster, the way we did it was to set up another group of compute nodes and move nodes from one group to the next. Obviously that's not an optimal solution: moving nodes is very complicated, it leaves a lot of room for corruption, and we really have to make sure everything goes through properly. It's not something we wanted to go through again. By this time we had grown from about 5,000 nodes to tens of thousands of nodes, and in our particular case, as a budget hosting provider, these are all customer instances. Any impact on messaging is noticed by customers, and that's not good for our credibility.
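The hash relationship described above can be sketched in a few lines of plain Python. This is illustrative only, not the actual oslo matchmaker code; the ring contents and hostnames are invented:

```python
import random

# Illustrative matchmaker ring: topic -> hosts that serve it (made-up names)
ring = {
    "scheduler": ["ctl1", "ctl2"],
    "consoleauth": ["ctl1"],
}

def lookup(topic):
    """Return one host that serves this topic."""
    hosts = ring[topic]
    return random.choice(hosts)

# A message for the scheduler resolves to one of the hosts in its list.
host = lookup("scheduler")
print(host)
```

The real driver keys the choice off the message topic in the same spirit: the ring file is the single source of truth for who serves what, and every node can consult it locally without asking a broker.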
So we needed to find a way to migrate these instances from one messaging system to the next with little or no downtime, and we needed to be able to do it in a slow-roll fashion, so we could roll it out to a small number of machines at a time, make sure they were working, and then expand as we gained confidence. So we started working on code in Nova, and what we came up with is dual messaging backends. What actually happens is that nodes can connect to both the broker backend, Qpid or RabbitMQ, and the ZeroMQ backend, and can use either one depending on which messaging system they receive a message on. It means we can roll out the code separately from the actual configuration: we roll out the code, and enable the configuration at a later time. And once it's enabled, either messaging backend can be used.

Now, I could probably stand up here and talk about how this works for the entire 40 minutes and a lot of people would probably never catch on, so I figured the best way would be a more graphical representation. So let's say this is a graphical representation of our OpenStack installation. First, we roll out all the code to all these machines. Next, we deploy our configuration to our controller nodes, start the ZMQ receiver, and restart the services on the controllers. Now a message comes in destined for compute one. As you can see, compute one is not running ZMQ; just the controller is. The way we determine whether to send this message via ZMQ or Qpid is that we first check whether a ZMQ receiver is listening on port 9501 on the compute node. We find out that it's not, and so we send that message via Qpid, no problem. So next, we decide we need to roll this out to some of our compute nodes, so we roll out the configuration and start the ZMQ receiver.
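The liveness check described above can be sketched with a plain TCP probe. This is an illustration of the idea, not the actual Nova patch; port 9501 comes from the talk, but the helper name is invented, and the demo uses an ephemeral port so it is self-contained:

```python
import socket

def receiver_alive(host, port, timeout=1.0):
    """Probe whether a ZMQ receiver is listening on the remote node."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: stand up a fake "receiver" socket (port 9501 in production;
# an ephemeral port here so the sketch runs anywhere).
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

up = receiver_alive("127.0.0.1", port)    # receiver up: send via ZeroMQ
srv.close()
down = receiver_alive("127.0.0.1", port)  # receiver down: fall back to Qpid
print(up, down)
```

The controller makes this decision per message, which is what lets the rollout proceed one compute node at a time.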
Now a message comes in for the second compute node. The first thing the controller does is ask: hey, is that receiver listening? In this case it is, so we send the message out via ZMQ. Now, a lot of people are going to say: hang on a second, I have a poor CMS system, or let's say for whatever reason I want some of my controllers to run Qpid or RabbitMQ and the others to run ZMQ. Well, that's no problem. Let's say we have another controller node here with a broken configuration system, or that for whatever reason we want to keep running Qpid. If it sends a message to compute two, which is running a ZMQ receiver, compute two will respond via Qpid. So whatever messaging system a message comes in on is the messaging system it will be responded to on.

The actual configuration of the dual backends is very simple. Basically, the only configuration switch we use in the code is: if the RPC backend is configured as ZMQ, then the dual backends are enabled. Obviously this code could be prettied up and given its own configuration option; that would be relatively easy to implement. It was just something simple for us, so it's not totally clean, but it can be cleaned up very easily. The only other thing you have to do, besides enabling ZMQ, is make sure your Qpid hostname is still there. It's not going to affect anything; it just means the hosts still have a broker backend to connect to. And again, once those nodes are switched over, they'll respond on whichever backend a message came in on.

Now, as far as our migration to ZMQ goes, the dual backend code obviously meant we had minimal downtime. All we had to do was change the configuration, restart the services, and we were up and going. The actual migration was very smooth for us: there weren't any major issues rolling it out, and we didn't have any major message outages.
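The reply-on-the-same-backend rule above is easy to sketch. The backend objects here are stand-ins for the real Qpid and ZeroMQ drivers, purely for illustration:

```python
# Stand-in backends: real code would hold the Qpid and ZeroMQ drivers.
class Backend:
    def __init__(self, name):
        self.name = name
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)

qpid, zmq_be = Backend("qpid"), Backend("zmq")

def handle_request(request, arrived_on):
    """Process a request and reply on the backend it arrived on."""
    reply = {"result": request["method"] + "_ok"}
    arrived_on.send(reply)  # never switch backends mid-conversation
    return reply

# A request arriving over Qpid is answered over Qpid, even if this
# node also runs a ZMQ receiver.
handle_request({"method": "reboot"}, arrived_on=qpid)
print(qpid.sent, zmq_be.sent)
```

Keying the reply off the arrival path is what lets mixed deployments work: neither side needs to know the other's configuration in advance.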
Now, a lot of people are going to wonder whether the connection checks between ZMQ and the remote nodes cause any noticeable load. We didn't notice any. And honestly, the first thing we did was roll out the code to the compute nodes, so those compute nodes were doing ZMQ connection checks to every single node all the time, and we saw no noticeable issues. As far as ZMQ in production: now that we're using it, it's a lot better than Qpid was for us. It works wonderfully, it's more reliable for us, we're not losing messages all the time anymore, it's a lot faster, and it's allowed us to scale our OpenStack installation further than we could before.

Now, I wish I could say that there were no issues at all, that we had dancing unicorns and rainbows in our data center, but unfortunately I can't say that. There are still some lingering issues with moving to ZeroMQ. The one issue we are seeing, and fortunately it's very rare, it doesn't happen very often, is that the ZMQ receiver receives a message and processes it, but Nova doesn't actually consume it. In talking to some of the developers here at the conference, we think we may have found the cause: it may be related to a known bug that has a fix which has never been pushed up to mainline. So hopefully that fixes it for us, and that would be wonderful, but we don't really know that yet.

So, for those who want to switch over to ZMQ and possibly look at this code: there it is, there's a link on GitHub. You can go ahead and use it. And finally, any questions?

Did we notice any issues with the ZMQ driver? None that we have seen. As I say, it works for us, a lot better than Qpid ever did; it's let us scale our installation, and we haven't noticed any larger issues than there were with Qpid. Yes? Yeah, so the question is whether, after the dual backend code is no longer needed, you can switch back, and the answer is yes.
Basically, it's only something you need during the transition period. Once you've switched over, you can just switch to pure ZMQ and you're done; you don't need to keep it any longer, or obviously you can retain it. During our transition phase it didn't cause any issues, so I wouldn't see any issues with maintaining it in production.

[Audience question about Redis and the matchmaker, partially inaudible.] So the question was about using Redis with ZMQ. I know there is a Redis matchmaker backend that you can use; we just used the matchmaker file, which is just JSON, so no issues there. I can't really address Redis, because we didn't use it.

Okay, go ahead. Yes, go ahead. Which tool did we use for what? For the benchmarks, I basically just monitored how many messages were being passed through, nothing complex. Those were run against Folsom.

All right, it looks like we're out of time, so thank you very much.