All right, cool. Thanks for coming out, guys. So this is a beginner-level talk on how to tune your RabbitMQ in a large-scale cloud. It's an overview of some of the lessons we've learned deploying OpenStack at Huawei. In this talk, I'm going to focus mainly on highlighting some of the challenges we had deploying RabbitMQ, show the performance changes that can happen when you deploy RabbitMQ in different deployment models, and lastly, highlight some techniques we use at Huawei to monitor and debug our RabbitMQ environment.

So this is actually not me. My name is Gordon Chung. I'm a developer at Huawei in Canada. I primarily work on OpenStack upstream development, focusing on the Oslo and Telemetry projects. I'm actually standing in for Wong, who is also a software engineer at Huawei, in Hangzhou. He operates the OpenStack cloud for us and has about three years of experience. Myself, I've been working on OpenStack upstream since 2012.

Cool. So just a basic overview of RabbitMQ. If you're not familiar with it, it's pretty much the foundation for everything in OpenStack: it's how all the services communicate with each other, whether it's launching an instance or handling metering. All the messages and all the communication mainly go through RabbitMQ, for the most part. There are other options, but I think in most deployments people are using RabbitMQ. It's a broker-based messaging service. If you're not sure what that is: when you send a message, it actually goes to a broker first, and that broker service dictates where the message ends up going, or who actually receives it.

To interact with RabbitMQ, we use oslo.messaging in OpenStack. Most of the projects in OpenStack use oslo.messaging; there are a few exceptions, like Swift. It basically provides a direct API to handle message communication between projects, and it lets you connect not just to RabbitMQ but to other backends if you should choose. There are different drivers, such as ZeroMQ and Kafka, but RabbitMQ is definitely the best-supported driver out there right now, though there is a lot of work being done to make ZeroMQ and Kafka viable alternatives. Again, for us at Huawei and our FusionSphere product, we use RabbitMQ, and this talk will cover just RabbitMQ.
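Just to make that concrete, here is a minimal sketch of what an RPC call through oslo.messaging looks like, roughly the Juno-era API that the rest of this talk assumes; import paths and helper names have shifted across releases, and the transport URL, topic, and method name here are invented for illustration, not values from a real deployment.

```python
# Minimal oslo.messaging RPC sketch (roughly the Juno-era API; newer
# releases moved some helpers around). Host, topic, and method names
# are placeholders, not values from a real deployment.
from oslo_config import cfg
import oslo_messaging

# The transport URL selects the driver: rabbit:// here, but the same
# calling code works if you switch to another supported backend.
transport = oslo_messaging.get_transport(
    cfg.CONF, url='rabbit://guest:guest@rabbit-host:5672/')

# A target names the topic the broker uses to route the request to
# whichever RPC server is listening on it.
target = oslo_messaging.Target(topic='demo_topic')
client = oslo_messaging.RPCClient(transport, target)

# call() is synchronous RPC: publish a request, then block on a reply
# queue until the consumer sends its response back.
result = client.call({}, 'do_something', arg=42)
```

The first positional argument to call() is the request context, which OpenStack services use to pass things like auth details along with the message.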
Cool. So the main topics I'm going to highlight today revolve around performance and reliability. As I mentioned earlier, there are different deployment models you can leverage when deploying RabbitMQ, and it kind of depends on what your system is built like. But in general, when you deploy OpenStack, everything grows from the number of services you run: more services means more messages, and more messages means a lot of pressure being put on RabbitMQ. Often when deploying OpenStack, RabbitMQ becomes one of the first failure points that a lot of people run into.

So regarding performance issues, I think a lot of people will run into issues with memory. With RabbitMQ, you can choose to store your messages in memory or on disk. Generally, people use memory, and as you scale up your cloud environment, your memory requirements will also need to scale up with that.

Say you're booting 1,000 instances at once. You're going to have a lot of messages, not just between Nova services, but also between Nova and Neutron, Nova and Glance, plus secondary messages that you're probably not aware of. As the messages in your system increase, your message consumption has to scale up with that, and if it doesn't, things degrade rather quickly: the larger your queue gets, the slower RabbitMQ actually processes your data, so it just explodes really quickly if things start failing. That's one of the things I think a lot of people need to be aware of. And because RabbitMQ is such a critical part of OpenStack infrastructure, if it goes down, pretty much the entire cloud goes down, because nothing can communicate with anything else.

I think generally when people start out with RabbitMQ, and with deploying OpenStack in general, they'll start off with a single RabbitMQ cluster: all your OpenStack services communicating through one cluster. If you think about it, when you deploy your OpenStack, you'll have hundreds of compute services, hundreds of Ceilometer agents metering those compute services, and also hundreds of Neutron agents working to set up the requirements for your instances. It can get pretty overwhelming, especially as you scale up your cloud. When you have hundreds of services running through your message queue, you'll actually have hundreds of thousands of messages being passed through it. And they use different mechanisms: some are topic-based, some are fan-out-based. I'll go into that a little later on. But as you can see, there's obviously a single point of failure there that will impact your system if it goes down.

In addition to that, everything going through one service creates a lot of noisy neighbors. Your Nova messages are probably more critical for deploying your instance, whereas your Ceilometer data might not be as critical, but all the messages Ceilometer sends will impact the performance of your Nova messages as well.

So when we first deployed RabbitMQ, I think a lot of people started off with that, and eventually they moved to a federated model, where you localize all your messages to specific clusters. This eliminates the noisy-neighbor scenario where Ceilometer generating millions of messages at a time impacts your performance in Nova. If you separate your clusters and actually have two separate logical brokers, the messages in Ceilometer won't impact your messages in Nova as much, especially if they're on different hosts. So in this scenario, we co-located everything by its respective functionality: the Ceilometer services are co-located, not necessarily on the same hosts, but in the same cluster, and similarly we do the same thing for Nova and Neutron. And depending on how chatty your systems are, you can co-locate your Cinder and Glance services, either sharing with one of the other services, or, if they're really high-demand, you can separate them as well.
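In practice the federated model is mostly a configuration exercise: each service's config just points at a different cluster. Here is a hypothetical sketch using the Juno-era rabbit options (newer releases express this with transport_url instead), with placeholder hostnames:

```ini
# nova.conf -- Nova's RPC traffic gets its own cluster
[DEFAULT]
rabbit_hosts = nova-rabbit-1:5672,nova-rabbit-2:5672

# neutron.conf -- Neutron agents talk to a separate cluster
[DEFAULT]
rabbit_hosts = neutron-rabbit-1:5672,neutron-rabbit-2:5672

# ceilometer.conf -- metering traffic is isolated so its volume
# can't starve the more latency-critical Nova and Neutron messages
[DEFAULT]
rabbit_hosts = ceilometer-rabbit-1:5672,ceilometer-rabbit-2:5672
```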
So, to test the differences in performance between the two designs that I showed, we did some very, very basic testing. We captured a timestamp on a message right before it was sent onto the queue by the publisher, captured another timestamp on the consumer end, and just took the diff between the two. We synchronized the time on all our servers when we tested this.

When we were testing Neutron, it has a pretty unique scenario: you have a small set of publishers, or senders, and the way it works is it fans out the same message to multiple consumers, so you end up having very few senders and a lot of receivers. In our test scenario, we set up three senders and 500 receivers and sent 1,000 messages across them, with each message averaging about one kilobyte.

Testing for Nova, we did something similar, but most of the calls in Nova function a little differently: they use topic-based publishing, so you basically have a lot of producers and very few consumers. We did the same thing but in reverse for Nova: we set up 500 publishers that each sent out 20 messages, and we created three consumers to listen to all those messages. The message sizes were also one kilobyte.

And lastly, testing for Ceilometer. This one's also a bit different. Ceilometer actually scales horizontally, but we just tested with one publisher sending all the messages, and we created three consumers. The messages end up being split between all the consumers in a greedy fashion, so each consumer just grabs messages as they come along. You'll notice that the message size here is one megabyte, which is pretty big compared to the other messages; I'll explain why that is a little later on.

I should also mention that the way we tested this is we used RPC, remote procedure calls, if you're not familiar with the term. You send a request, and you expect a response back. Because of that workflow design, you essentially need two queues, where one queue handles your request and one queue handles your response, and your senders and consumers need to register to both queues. That's actually not required for Ceilometer's model: Ceilometer's workflow is a one-way stream, and we actually don't recommend using RPC in Ceilometer; there's a work-queue publisher that's suggested instead. But just for the sake of testing, we tested RPC here as well, because that's what Nova and Neutron use for their workflow.

For our testing environment, we used OpenStack Juno, which our most recent product is based on. There are a few customizations there, but in general, it's OpenStack Juno. We used RabbitMQ 3.5.6, which is not the latest, but it's close enough, and we used oslo.messaging 1.5.1. That's also pretty old, but it matches Juno. For our test environment, we used five servers; each server had 24 CPUs and 128 gigs of RAM, and they were all connected via a 1-gig network. So just keep in mind that a lot of these results are based on Juno's code, and there have probably been quite a few optimizations since then.
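For reference, the measurement itself only needs a small harness. This is an illustrative reconstruction of the methodology just described, not the actual test code: the publisher stamps each message with its send time, the consumer diffs against its own clock, and the hosts are assumed to be time-synchronized, as in the talk. The topic, method, and payload layout are made up.

```python
# Illustrative reconstruction of the latency test, not the actual
# harness used: stamp each message with its send time, then diff on
# the consumer side. Assumes the hosts' clocks are synchronized.
import time

from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_transport(
    cfg.CONF, url='rabbit://guest:guest@rabbit-host:5672/')
target = oslo_messaging.Target(topic='latency_test')


class LatencyEndpoint(object):
    """Consumer side: log how long each message spent in flight."""

    def measure(self, ctxt, msg_id, sent_at, payload):
        latency = time.time() - sent_at  # seconds from publish to receive
        print('message %d: %.3f s' % (msg_id, latency))


def serve():
    """Consumer side: run an RPC server with the endpoint above."""
    server = oslo_messaging.get_rpc_server(
        transport, target, [LatencyEndpoint()], executor='blocking')
    server.start()
    server.wait()


def publish(n_messages=1000, payload_kb=1):
    """Publisher side: send n messages stamped with the send time."""
    client = oslo_messaging.RPCClient(transport, target)
    payload = 'x' * (payload_kb * 1024)  # ~1 KB body, as in our tests
    for i in range(n_messages):
        # cast() is one-way; the RPC tests used call(), which also
        # waits for the reply over a second queue
        client.cast({}, 'measure', msg_id=i,
                    sent_at=time.time(), payload=payload)
```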
So, on to the results. This is what happened when we had one single RabbitMQ cluster. You can see that, in general, Neutron and Ceilometer had about the same relative performance: on average it took 50 milliseconds, with certain anomalies in Ceilometer that took seven seconds. But Nova had a bit more drastic results based on our testing, where in some cases it took 30-plus seconds. We're actually not sure why that happened, so I don't really have an answer for that for you guys, but it's possible that Nova misread a certain message, and because of that it got requeued into the same queue, and Nova ended up having to process the entire queue again with that same message at the back. In this graph, the y-axis is the time between us sending the message and receiving it, and the x-axis is the message ID, so each individual message.

We then tested the same thing using multiple clusters. As you can see, the performance was a lot more consistent. We didn't have crazy spikes randomly across the board, and when it did spike, it wasn't a 30-second spike, it was a few milliseconds. If you notice, Ceilometer and Neutron performed relatively the same, maybe a little bit faster, but Nova experienced a significant boost there.

This is just a comparison between the results from the single-cluster RabbitMQ and the three-cluster RabbitMQ setup we had in our solution. You'll notice the performance increase for the max spiked heavily because of the random spikes we saw in Nova, but generally across the board we saw a good, consistent increase in performance across all the services, and we got consistent behavior in Nova.

As I mentioned before, the message size we were using for Ceilometer was one meg, which was considerably larger than the message size we used for Nova and Neutron. The main reason for that is that when you use Ceilometer in a large-scale environment, you'll have maybe thousands of nodes, and on those nodes you'll have tens of thousands of virtual machines, and then hundreds of thousands of ports for those virtual machines, and thousands of volumes on top of that. So Ceilometer handles a lot of data, and it has a mechanism to aggregate samples together: instead of having millions of individual messages, you can have hundreds of messages with hundreds of samples in each of them. So you can decide whether you want more messages with smaller payloads, or fewer messages with bigger payloads. But if you do end up grouping all your samples into one message, the message size can grow substantially; in a lot of large-scale environments, it can grow to over a gig of data on your message queue, which is pretty large.

So what we did was play around with how we aggregated our data to figure out the optimal number of samples per message. We monitored the memory RabbitMQ used when processing messages with 500, 5,000, and 50,000 samples in them. Obviously, each of those messages grows in size: a 500-sample message is about a meg, and it grows roughly ten times each step. Based on that, the result we found was that a message with 50,000 samples in it, which was roughly 500 megs, negatively affects your system pretty hard. On the y-axis, you can see the memory usage required by RabbitMQ, and it spikes when you have a payload that's quite large, whereas messages with 500 or 5,000 samples performed at relatively the same level. So, based on our environment, it seemed like 5,000 was a viable number to aggregate your samples at. It's definitely something you'll probably want to play with.
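As an illustration of that knob (this is not Ceilometer's actual publisher code, just a sketch of the tradeoff), batching comes down to chunking samples into fixed-size groups before publishing:

```python
# Sketch of the batching tradeoff, not Ceilometer's actual publisher
# code: pack samples into fixed-size batches so you can trade many
# small messages for fewer, larger ones. The batch size is the knob
# tuned above; very large batches blow up RabbitMQ's memory use.

def batch_samples(samples, batch_size=5000):
    """Yield lists of at most batch_size samples, one list per message."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]

# Each batch would then be serialized and published as one message,
# for example (hypothetically) via an oslo.messaging notifier:
# for batch in batch_samples(all_samples):
#     notifier.sample({}, event_type='metering', payload=batch)
```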
So, in addition to performance, there are a lot of techniques to monitor the status of RabbitMQ and make sure it's performing at its optimal state. I think there was a talk earlier today about how to manage your RabbitMQ, so this might be highlighting some of the same tools and techniques. But generally, people just use RabbitMQ's command-line interface, where you can query it by typing rabbitmqctl status, and it'll give you a general overview of how your system is performing. You can also type in other commands to get an idea of the state of the host that your RabbitMQ is running on.

Some of the stuff we run into when monitoring our RabbitMQ: sometimes we'll try to monitor it by checking the status, and that won't work; it'll just time out. Sometimes a queue will just die. Other times, a consumer will die, the queue will end up growing, and then the queue will die. So a lot of things can end up going pretty bad. There's cluster partitioning as well: when the RabbitMQ nodes of a single cluster are partitioned across different network domains, you might have issues with communication within your cluster.

In general, one technique to improve the reliability of RabbitMQ is to enable clustering. Clustering allows you to deploy multiple RabbitMQ nodes, whether on the same host or separated across different hosts, but basically it creates a single logical broker where all your messages will be load-balanced across the nodes. As you scale your environment up, you can add more RabbitMQ nodes to your cluster to spread that load out and handle the increase in messages you'll receive. It also gives you fault tolerance: if one RabbitMQ node goes down, the other ones can pick up the slack for it.

Some additional techniques we use to monitor RabbitMQ are to check the status of the nodes themselves. You can run rabbitmqctl eval and call rabbit:is_running(), and it'll give you some information on the specific node. We also check the health of each of the nodes in the cluster, and if there's a node that's performing poorly, we'll shut it down and restart it. Also, if a consumer goes down, you'll notice that your queues will start growing at quite a rate, specifically with Ceilometer, which deals with a lot of information constantly. Similarly, using the same command-line interface, you can check the size of your queues to make sure they're all relatively low. As I mentioned, if you let your queues grow really large, oftentimes RabbitMQ will not be able to revive itself from that, because it just can't deal with large queues as well as small ones.
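That queue-size check is easy to script around the real rabbitmqctl list_queues command. A minimal sketch, where the depth threshold and the alert action (just a print here) are invented for illustration:

```python
# Minimal sketch of a queue-depth check built on the real
# `rabbitmqctl list_queues name messages` command. The threshold and
# the alert action are placeholders to tune and replace.
import subprocess

QUEUE_DEPTH_LIMIT = 10000  # hypothetical threshold; tune per deployment


def check_queue_depths():
    """Flag queues whose backlog suggests a dead or slow consumer."""
    out = subprocess.check_output(
        ['rabbitmqctl', 'list_queues', 'name', 'messages'])
    for line in out.decode().splitlines():
        parts = line.split()
        # skip rabbitmqctl's banner lines; data rows are "<name> <count>"
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, depth = parts[0], int(parts[1])
        if depth > QUEUE_DEPTH_LIMIT:
            # a steadily growing queue usually means the consumer died,
            # and RabbitMQ recovers poorly once queues get very large
            print('ALERT: queue %s has %d messages backed up' % (name, depth))


if __name__ == '__main__':
    check_queue_depths()
```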
And we added a lot of additional logging for our consumers, so if something does go wrong on the receiver's end and it doesn't process its messages properly, we'll get logs, and we can check those logs to decide whether we need to make changes to the environment to handle it.

In addition to RabbitMQ's command-line interface, we also use Zabbix to check the status of the server itself. We collect metrics like memory usage, CPU usage, disk usage, and I/O stats, and based on those numbers, we can figure out how to handle the current load we're experiencing. If the CPU usage is spiking, we can maybe consider adding another node to distribute the load a bit better, or if the memory usage is high, we can increase the memory there. It's definitely something you'll have to track as you deploy your cloud, and it will change day to day, so it's something to keep an eye on. In addition to that, there's also a management plugin for RabbitMQ, which provides a lot of the same information the CLI has, but in a nice visual form. It's a lot easier to consume if you don't like CLIs, and it'll give you graphs for a historical overview of your past performance, and you can also scale your RabbitMQ based on those numbers.

Cool. So right now, a lot of our clouds are around 600-plus nodes. Obviously, everyone wants to scale to a larger OpenStack cloud; some dream of 10,000 nodes. I don't know if that's possible, but everyone wants to go bigger and bigger, and as that increases, the requirements on RabbitMQ are definitely going to have to increase and improve along with it. I asked the original author of this talk if he has any recommendations for what needs to be done in RabbitMQ, and I don't think we really have any suggestions on what needs to be improved. I think RabbitMQ will improve over time just as OpenStack will improve over time. It's a lot of monitoring and tweaking of your environment: keeping constant checks on how your RabbitMQ systems are handling your current load, and taking that information and adjusting for your future plans and where your cloud wants to be.

Yeah, that's the presentation. If you have questions, I was told that you should go to the mics. I might not have the answer, I should add that disclaimer, but maybe someone else here has an answer, or I can follow up on it.

I guess I have two.

Sure.

The first one is, since you guys are running Juno, how did you deal with multiple RabbitMQ instances and the heartbeat problems?

I don't know the specifics of what we've backported. There is heartbeat support now, and I know a lot of distros and a lot of products tend to backport certain features. So I don't want to assume that we backported that feature, but it might be backported.

Okay, and then the second one is about the federated model that you have. When you separated things that way, Nova on the compute node has to wait for Neutron to respond back that a port has been created, and you only have one entry in the Nova or Neutron configs to tell it where its RabbitMQ is. How is it going to know how to get to the Neutron one, so it can get the message and create the port?
So I'm not sure exactly how Nova would handle that, but in oslo.messaging, you can define multiple targets to listen to, and you can either send to multiple clusters or connect to different clusters via oslo.messaging.

So my question is kind of two-fold. Number one, when did you start seeing latency issues? And number two, how many instances did you have?

You said licensing issues?

When did you start seeing latency issues?

Latency, oh yeah. I don't know when we experienced it in our real production environment, but based on our numbers in this test environment, it was when we were sending about a thousand messages in a given time. Sorry, I can't fully answer that question.

All right, thanks.

Yeah, no problem.

So with the federated model, did you see any spikes in the memory?

Did I see any...

Yeah, spikes in the RabbitMQ memory.

Spikes? Not in our test environment. It's what we use in our product right now, and I don't know from a day-to-day perspective, but I think the performance has been relatively consistent in general.

Did you do any benchmarking of the oslo.messaging library against native RabbitMQ?

Yeah, there is definitely some overhead there. I don't think we actually tested that in this specific presentation, but speaking as a developer on oslo.messaging: depending on which driver you use, whether it's the pika driver or the kombu driver, there are different overheads. From what I remember, I believe the pika driver for RabbitMQ performs a little bit better, but yeah, there is some overhead to oslo.messaging. There's been quite a bit of refactoring done in the recent cycles, in the past few months, but I don't have any hard numbers on whether that's improved anything or made things worse.

Okay, thank you.

Yeah, no problem.

Do you think you could comment on the benefits of separating out your RabbitMQ clusters by service, instead of using something like Nova cells, where everyone gets their own queue?

I'm not quite sure what you mean.

Sorry, if I understand it right, with Nova cells, everyone gets their own queue in a different cell, and that queue is dedicated to the cell. But in this model, it looks like you're splitting things out by service, so, say, Ceilometer gets its own queue and then all the other services get their own different queues. Is that right?

Each service would have its own RabbitMQ nodes; the clusters themselves would still manage their own queues.

Okay, yeah, cool, thank you.

Yeah, no problem.

Do you have any specific parameter tuning for RabbitMQ besides the separation of Nova and Neutron?

Not myself personally, since I'm just representing the original author. I'm sure there are other techniques for doing it.

Okay, I was just interested if you had anything specific.

I can pass that along and see if there are any other techniques we use. Cool, awesome, thanks for coming out, guys.