Hello, thank you for coming. That's a lot of you. My name is Michael and I'm going to talk about RabbitMQ operations. But first, who the hell am I, right? I work for a company called Pivotal, which is the main steward of RabbitMQ. I'm a staff engineer, and you can find me under the same username just about everywhere: Twitter, GitHub, and so on.

A little bit about this talk. It's not particularly structured; operations is a very broad topic and we could go on for hours. So I think it's fair to say that this is a brain dump from years of answering questions on the mailing list and in a lot of other venues, and I will try to leave as much time as possible for questions. It also focuses on the most recent release, so some of the things I'm going to demo may or may not be available in the release that you run. Which is a good reason to upgrade, I guess.

So let's get started, with provisioning. Provisioning is not rocket science; it's mostly a solved problem. But there is something you should be aware of, which is that these days there are mirrors of rabbitmq.com: of the packages, of our apt repository, of the signing key, and so on. You can find releases on GitHub (we use GitHub releases), on Bintray, and on Package Cloud, although I'm not sure we'll continue using the latter, as much as I love it. And we're looking into community-hosted mirrors. Maybe I should expand on that a bit. Certain OpenStack vendors approached us about hosting a mirror of our apt repo that would work for them: for their CI systems, for their deployments, just as a mirror that is convenient to them. So if you work for an OpenStack vendor, or for a company that uses OpenStack heavily, and you would like to host a mirror of RabbitMQ, specifically of the Debian repositories (because I think binary packages are well covered already), talk to me and we will figure it out. I think the more mirrors the community has, the better off everybody is. Other than that, for provisioning you should just use packages with Chef, Puppet, and so on; I see few reasons not to use those tools.

The next thing we see over and over is that when people deploy RabbitMQ, they forget to take care of operating system resources. They end up running with the defaults, and that often doesn't work very well. Now, why is that? I'm not sure about you, but in my opinion modern Linux defaults are absolutely inadequate for running servers. They may run GNOME very well for certain people in California, but not servers. Let's take one example: the default maximum number of open file handles per process, which is 1024. That means if a process such as RabbitMQ ever needs more than about a thousand sockets or files open, tough luck, the OS won't let it. And most likely, if you need to write something to disk but can't because you cannot obtain a file handle, bad things are going to happen. So yeah, 1024 should be enough for everybody, right? The first thing to do, maybe when you build your own OS image for RabbitMQ or MySQL or anything that is a data service, is to set ulimit -n, and its corresponding sysctl, fs.file-max, to something like half a million and forget about it. The amount of resources you spend by bumping that limit 500 times is negligible, and it's much better than running out of file descriptors and basically losing data. So just do that.
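Here is a minimal sketch of what that looks like on a typical Linux box. The half-million figure is the talk's suggestion; the file paths and the rabbitmq service user are assumptions about a Debian-style layout.

```bash
# Kernel-wide ceiling (the fs.file-max sysctl mentioned above):
echo 'fs.file-max = 500000' >> /etc/sysctl.d/99-rabbitmq.conf
sysctl -p /etc/sysctl.d/99-rabbitmq.conf

# Per-process limit (ulimit -n) for the rabbitmq user, via limits.conf:
cat >> /etc/security/limits.conf <<'EOF'
rabbitmq  soft  nofile  500000
rabbitmq  hard  nofile  500000
EOF

# On systemd distributions, a unit override is what actually takes effect:
mkdir -p /etc/systemd/system/rabbitmq-server.service.d
cat > /etc/systemd/system/rabbitmq-server.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=500000
EOF
systemctl daemon-reload
```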
Next thing, which is a beautiful relic from the 90s: TCP keepalives. Basically, it's a TCP implementation feature that monitors peer availability, if you will. When it times out depends on a bunch of settings; by default, the values add up to anywhere from 11 minutes to over two hours. That was a great default for 1995. It's not exactly great today. If, for example, a client disappears for whatever reason, you don't want to notice that two hours later, right? So here are some sysctl options. I'm not going to explain what exactly each of them means; you can easily Google that. But these would ensure that you have roughly 15 seconds after a peer becomes unreachable, after which the socket is considered dead, and then RabbitMQ would notice it, or your clients would notice; it works both ways. You should also enable client heartbeats, which are pretty much the application-level version of the same thing, now that Oslo messaging supports them. I usually recommend 6 to 12 seconds; going under that can sometimes produce false positives.

Next thing about operating system resource tuning: some people need to tune for throughput, others need to tune for a high number of concurrent connections. Now, I'm not going to claim I know exactly which one OpenStack needs, so I'm going to cover both. Tuning for throughput is relatively straightforward. You just bump your TCP buffers, like so, to, say, 16 megabytes, and that's it. Your processes may never need buffers that large, and there is a real cost to this; we'll cover that in a bit. This is just an example, your mileage may vary and you can pick a different value, but these are the kernel parameters you want to look into.

The next thing is a RabbitMQ option. I'm using the dot-separated format here, but if you have ever had to edit a RabbitMQ config file, you probably know what it translates into. You can enable HiPE compilation. HiPE is an Erlang virtual machine feature, a native-code compiler that applications can opt into. Prior to Erlang 17 there were some stability issues; starting with 17.0, I personally haven't had a single report of HiPE crashing the VM for unknown reasons. So maybe it's worth considering if throughput is the most important thing for you. Keep in mind that it has a very real cost: it delays node startup by a few minutes, depending on your CPU, because it has to compile a bunch of Erlang code to native code and then load it. So other than application-level concerns, such as data locality, that's probably it for maximum throughput. Oh, and you also want to disable Nagle's algorithm on your sockets, but you want to do that anyway, regardless of your scenario.

So let's see what we can do for concurrent connections. Here you want to do the opposite, because once you go to hundreds of thousands of connections, your limiting factor is RAM: if every connection takes a couple of megabytes, with that many connections you sadly need gigabytes of RAM just for connections. Now, I'm going to spare you some profiling: what actually takes that memory is TCP buffers. They are pre-allocated, so even if you don't use them, they still consume memory. So you want smaller TCP buffers.
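As a concrete example of those keepalive sysctls, here is a sketch. The exact numbers are my illustration of the "about 15 seconds to declare a peer dead" target mentioned above, not values from the slides: probe after 6 idle seconds, then 3 probes 3 seconds apart (6 + 3×3 = 15).

```bash
cat >> /etc/sysctl.d/99-tcp-keepalive.conf <<'EOF'
net.ipv4.tcp_keepalive_time = 6
net.ipv4.tcp_keepalive_intvl = 3
net.ipv4.tcp_keepalive_probes = 3
EOF
sysctl -p /etc/sysctl.d/99-tcp-keepalive.conf
```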
You also want to make sure that all those connections time out quicker, because when a connection is closed, it's not immediately reclaimed by the TCP stack. It doesn't go to heaven; it sadly goes into a limbo state called TIME_WAIT. If you have ever had to list TCP connections on a machine that has not only a lot of connections but also high connection churn, meaning clients connect and disconnect all the time, you will have seen thousands of those. You want that TIME_WAIT period to be short, because sockets are kernel resources, and they are not infinite. Another thing you can do: there is a couple of kernel options that allow for socket reuse. Basically, if an application asks the kernel, hey, can you please open a socket for me, and the kernel notices that it has no sockets available but there is a bunch of them in the TIME_WAIT state, it will just reuse one of those. This is not safe in 100% of cases, but it can work very well.

So how do we tune those TCP buffers? Those are application-level settings. In RabbitMQ, you can configure them using these values. Send buffer and receive buffer are, I guess, self-explanatory, and backlog is the number of inbound connections that can be initializing at the same time. Imagine you have 1,000 clients connecting all at once. With the default value, which I believe is 128, the OS at some point sees that it has more than 128 connections pending and just starts dropping them, so you get ECONNREFUSED and similar errors. So bump that to a few thousand or something like that.

With this, what kind of effect can you expect? What if I told you that you can reduce the amount of RAM used by your connections by a factor of 10? That sounds too good to be true, right? If I remember correctly, 16K is the value you need to go with. So this can't be that easy, right? Well, the downside is that throughput will drop considerably as well, at least under very heavy load, with larger messages in particular. Like many things in operations, I guess, it's a matter of finding a balance: what size of TCP buffer your particular system needs.

For TIME_WAIT timeouts, here is the kernel option. I believe it is in seconds, so here it is five seconds. I think five seconds is reasonable, maybe ten, but certainly not two hours, or ten minutes, or whatever. And this is how you enable socket reuse. That option is not safe when you run behind NAT, which may or may not be relevant for servers, especially servers on private networks, which I would expect a lot of OpenStack deployments to be on. There is a blog post titled, I think, "coping with a large number of concurrent clients" or something like that, which explains this in a lot more detail than we have time for.

There is another related kernel value, which is basically the number of pending connection attempts you want to allow, bumped so you can survive disconnection storms. For example, if your server loses connectivity to the outside world for whatever reason, even just for a moment, you can have tens of thousands of clients disconnected at once. They will probably all reconnect at around the same time, and that's not something you can control, right? So you want to be ready for those surges as well.
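A sketch tying these knobs together. The RabbitMQ keys are in the dot-separated format from the slides (a classic rabbitmq.config expresses the same thing as tcp_listen_options Erlang terms), and the net.core.somaxconn line is my assumption about which "surge" ceiling is meant:

```bash
cat >> /etc/rabbitmq/rabbitmq.conf <<'EOF'
tcp_listen_options.backlog = 4096
tcp_listen_options.nodelay = true
# ~16K buffers: roughly 10x less RAM per idle connection, at a throughput cost
tcp_listen_options.sndbuf = 16384
tcp_listen_options.recbuf = 16384
EOF

cat >> /etc/sysctl.d/99-tcp-churn.conf <<'EOF'
# the knob usually cited for shortening that post-close limbo, in seconds
net.ipv4.tcp_fin_timeout = 5
# reuse sockets stuck in TIME_WAIT; not safe behind NAT, as noted above
net.ipv4.tcp_tw_reuse = 1
# pending-connection ceiling, to survive reconnection storms (my assumption)
net.core.somaxconn = 4096
EOF
sysctl -p /etc/sysctl.d/99-tcp-churn.conf
```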
Some of these, I think the most useful ones from my experience, are listed in the docs, so feel free to take a look.

So what next? Disk space. It sounds very obvious and very boring, but apparently there are Linux, or rather operating system, distributions that manage to break it in several ways. By default, RabbitMQ packages store their data under /var/lib, so pay attention to which partition that directory ends up on. There are certain distributions that think, oh, it's fine, we'll place /var/lib on a separate one-gigabyte partition, nobody ever needs more, and then at some point you realize your nodes are out of disk space and cannot write to disk. And once you cannot write to disk, the problem is that you cannot trust your data store: you don't know what exactly wasn't written and how it may break. So pay attention to that. Another thing people don't necessarily account for is that messages that were not published as persistent can still be moved to disk temporarily, when the node finds itself under stress in terms of how much RAM it has. So always plan for your entire data set, not just the part you believe will be on disk.

Finally, RabbitMQ has a disk space monitor, which will raise an alarm and block publishing connections. It's not supported on all platforms. What do I mean by that? Well, for example, some relatively rarely used BSD flavors, and also Linux builds with custom kernels, have at various points broken disk monitoring for us. Now, this is in part our own fault, because we have to shell out to get that information: there is no API in the runtime (well, maybe there is in the operating system, but not in the runtime) that can reliably tell you how much disk space is left on the partition where your database directory lives. So we have to shell out to OS tools, and their output varies from platform to platform. In a way this is not critical: if the disk space monitor exits, you won't have that data, but the rest of the node should continue running. Just keep this in mind.

RAM use. Most of you are probably familiar with this parameter, which is the fraction of total available RAM that RabbitMQ is allowed to use. But there is another option which, unless you've looked at the example config file, is pretty easy to overlook. What it controls is at what percentage of the first value paging starts. Say your machine has four gigs of RAM and vm_memory_high_watermark is 0.5, meaning you allow the RabbitMQ node to use two gigs: this option sets the point, between zero and those two gigs, at which RabbitMQ starts paging data to disk, including transient messages, to free up RAM. We have a production checklist guide on the site that recommends some values; of course, it depends on how much RAM you have in total. The default is 0.5. So basically, if we take both defaults into account, at about 20% of your total available RAM queues will start paging messages to disk to free up RAM.

To see a memory breakdown, you can use these, again, probably very well-known commands. The first one produces a not particularly detailed output; the second produces an output that is pretty excessive in many cases. You have to pick one of the two. Speaking of paging, there have been significant paging efficiency improvements in the last couple of releases, and now the thing is properly disk-bound, as it should be, as most would expect.
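To make the arithmetic concrete, here is a sketch with both knobs at their defaults, again in the dot-separated format, plus the two breakdown commands mentioned above:

```bash
cat >> /etc/rabbitmq/rabbitmq.conf <<'EOF'
# fraction of total RAM the node may use before publishers are blocked
vm_memory_high_watermark.relative = 0.4
# fraction of the watermark at which paging to disk begins
vm_memory_high_watermark_paging_ratio = 0.5
EOF
# With these defaults: 0.4 * 0.5 = 0.2, i.e. paging kicks in at ~20% of
# total RAM, and the memory alarm fires at 40%.

rabbitmqctl status   # terse memory breakdown
rabbitmqctl report   # exhaustive, often excessive, output
```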
Previously, queues would find themselves in a scenario where they tried paging things to disk really inefficiently and created a feedback loop: the node starts paging things to disk, calculates in various ways whether it should, very quickly decides it isn't paging fast enough, starts the paging process again, and basically makes things worse. Those issues were fixed by 3.5.6, or at least we have seen significant improvements reported.

Next thing. Usually it's not that difficult to see what's going on from the memory use breakdown, but there is one category that is not specific enough, which is binaries. If you see that binaries use a lot of RAM, it can mean a bunch of things. Binaries, for example message payloads, are stored in a separate heap in the runtime, and that heap is collected very differently from regular process heaps. Multiple processes can hold references to the same binaries, which is partly why figuring out what exactly those binaries are can be pretty tricky, and why we don't have that breakdown yet in rabbitmqctl report, for example. But besides actually having a lot of messages, we have seen two specific things that usually contribute to binary memory use, and I will cover both.

One of them is an option that we have; it was introduced in one of the 3.5 releases. What exactly does it disable? A long time ago, five years if I remember correctly, someone decided it would be neat if RabbitMQ could cache the data it reads from disk in memory. Sounds like a great idea, right? Well, the thing is, operating systems, file systems specifically, already do the same thing, and they do it significantly better, or at least a bit better, because the operating system has the context of all the processes that are running. It knows when it's time to evict that cache; your particular process doesn't have that information, so it either has to evict things very aggressively, or the opposite: it basically never evicts things, or is very lazy about doing that. RabbitMQ's file read buffering falls into the latter category. It just buffers things, and they sit there unless you consume them. Don't do that in your own apps. This is why we have an option to disable it. In 3.6, which comes out in late November, it will be disabled by default, because we have seen enough of this stuff. And yeah, certain people who made this brilliant decision are no longer with the company. After that, as soon as the 3.7 cycle starts, we will remove the thing entirely. It was a terrible idea.

If you don't run a version that has this setting, you may have this function, which we added. You can only invoke it using rabbitmqctl eval, which is a pretty sharp knife, but you can do a lot with it: it clears all those read caches. Not great, but better than having all your memory consumed by stuff that isn't actually being used. And if all that fails (this is way outside the scope of this talk), there is an Erlang tool slash library called recon, which can produce a very detailed memory profile, basically giving you a heap dump telling you what takes RAM. We have discovered a few really non-obvious memory leaks with it in the past, in various scenarios, so it is worth exploring.
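A sketch of that escape hatch. The function name below matches what shipped in the 3.5.x line, but treat it as version-dependent and verify it against your release:

```bash
# Clear the file read cache on a running node via the eval "sharp knife":
rabbitmqctl eval 'file_handle_cache:clear_read_cache().'

# On versions that expose the setting, read buffering can be disabled
# outright via the fhc_read_buffering application environment key in the
# classic config (disabled by default from 3.6 on).
```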
Oh, and throughout this talk I'm going to mention things that are coming in the next releases, because there are only so many OpenStack summits per year, but we try to improve things every month, hopefully. One of the things we have been asked for is the ability to set the RAM watermark as an absolute value, as you already can for disk. This is coming in 3.6 and should make nodes a bit easier to configure.

Another thing that usually directly translates into higher, or unexpected, RAM use is the management UI. You have probably seen it; it has an orange rabbit in the corner. To show you all that data, it needs to collect it, right? Nodes, queues, connections, channels, external data such as free disk space: all of those are collected by nodes emitting events on a schedule, every N seconds, basically. Then there is a thing called the stats collector that aggregates them and serves them over the HTTP API. Sometimes that thing gets overwhelmed: if you have hundreds of thousands of connections, that's a lot of data coming in. And the key symptom, because you can see which node hosts the stats database, is that that node consumes a disproportionately high amount of RAM compared to the rest. Once you see that, you can be pretty sure it's the stats collector not keeping up with the load. You can check whether that's the case using this command, which will produce something like this. Now, this is from a node with no load, but you can see three values in it: message queue length (this has nothing to do with RabbitMQ queue length; it's the Erlang process mailbox length), total heap size, and garbage collection information. If you see the message queue length keep growing over time, that's exactly what I'm talking about.

Currently, there are two things you can tweak. One, you can increase the statistics collection interval. By default it's five seconds; this sets it to 30 seconds. On one hand this kind of sucks, because you may want more fine-grained stats, right? But from my experience, and we talk to our users every single day on the mailing list, a lot of people find it very reasonable to run with 20 to 30 second intervals, because most monitoring systems don't need a five-second resolution; 30 seconds, or even a minute, is good enough. And of course, this proportionally lowers the load on the collector: the higher the interval, the lower the load. If for some reason that is not enough, you can use another option and set it to none. This disables rates in the management UI. Again, not necessarily great, but sometimes it's good enough, and it's better than a node consuming RAM over and over. Finally, because the entire stats database lives in RAM and should be considered transient, you can simply terminate it, and it will be restarted, potentially on a different node. This is how you do it. We may introduce a rabbitmqctl command just for that, I don't know.

You may wonder why exactly this thing falls behind: because it's not parallel at the moment. A parallel stats collector will probably only make it into 3.7, but there is a lot of room for making it parallel, and we are working on it already.
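A sketch of those relief valves. The eval incantations are my reconstruction of what the slides show for the 3.5-era management plugin, where the stats database is a globally registered process; verify them against your version before relying on them:

```bash
# Inspect the stats DB process: mailbox length, heap size, GC info.
rabbitmqctl eval 'erlang:process_info(global:whereis_name(rabbit_mgmt_db),
    [message_queue_len, total_heap_size, garbage_collection]).'

# Lower the event rate and/or disable rate calculations:
cat >> /etc/rabbitmq/rabbitmq.conf <<'EOF'
# milliseconds: 30s instead of the default 5s
collect_statistics_interval = 30000
# set to none to disable rates in the management UI entirely
management.rates_mode = none
EOF

# Last resort: kill the transient stats DB so its supervisor restarts it,
# potentially on another node.
rabbitmqctl eval 'exit(global:whereis_name(rabbit_mgmt_db), please_restart).'
```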
So, next thing: cluster formation. Forming your cluster initially, using auto-clustering or something like that, is relatively easy. But the problem is that once you need to restart it, to upgrade, you run into a node restart order dependency: at the moment, the last node to shut down has to be the first one to come up. So we have a plugin. It was developed for the RabbitMQ Cloud Foundry tile a couple of years ago, or rather it has been used at Pivotal for two years, so it's slightly older than that. Here's where you can find it. It's open source, finally (thanks, legal), and the README explains how to enable it. It looks a lot like auto-clustering, with one key difference: nodes can start in any order. Which means you can restart your cluster in whatever order your provisioning tool chooses, and as soon as all nodes from the list are up, RabbitMQ will continue operating as usual. It only affects cluster formation and nothing else. In fact, the tool called BOSH in Cloud Foundry, when you do an upgrade of the tile, temporarily stops all nodes in whatever order it pleases (we have no guarantees about that), then upgrades them, things start, and as soon as all of them are online and reachable, everything just continues as usual.

So that solves the ordering thing, but not everybody has that specific problem. There is another plugin, by a fellow named Gavin Roy, who by the way is the maintainer of Pika and a really, really nice guy. That plugin, the autocluster one, allows you to use Consul, etcd, or DNS A records (round-robin DNS) to form your clusters, so you don't have to list nodes in the config file. It does not do what the clusterer does in terms of node restart ordering; there is no node coordination. But it's extensible, and it gives you options in how you want to provision things. Now, obviously, for future releases we will want to incorporate both of those into the core one way or another. So if you have requirements in that area, please tell me what they are, because currently those two plugins are not mutually exclusive; we would like to integrate both of them, but maybe you have needs that are not addressed by either.

Next question: backups. How do I back up a node, or a cluster? On one hand, this is pretty easy: you just copy your database directory (there is a missing -r in this command, by the way), archive it, and ship it to S3 or whatever you prefer. But in a running messaging system this can actually be very tricky, because things change all the time, and we probably don't want to block all publishers just to take a backup. We will be looking into the best way to do this. It's not a trivial problem in general in data stores, and the usage patterns of messaging systems make it a bit trickier, I guess. So what we recommend instead is to replicate everything off-site. How do you do that? Enable exchange federation that federates everything: every single exchange is federated somewhere else, to your other data center or wherever. And you can also export and import definitions using the HTTP API or the management UI; that's queues, users, vhosts, bindings, permissions, everything that's not messages, basically. You import those on the standby cluster, messages begin flowing there, and then the issue you're going to have is that they are never consumed, right? They flow there, they've been replicated, but they sit there forever.
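As a sketch of that standby setup: the host names, credentials, and the upstream name "dr" are all made up for illustration, and the federation plugin is assumed to be enabled on the standby.

```bash
# Export definitions (users, vhosts, queues, exchanges, bindings, policies)
# from the primary and import them on the standby, over the HTTP API:
curl -u admin:secret http://primary.example.com:15672/api/definitions > defs.json
curl -u admin:secret -X POST -H 'content-type: application/json' \
     -d @defs.json http://standby.example.com:15672/api/definitions

# On the standby, point a federation upstream at the primary and
# federate every exchange:
rabbitmqctl set_parameter federation-upstream dr \
    '{"uri":"amqp://replicator:secret@primary.example.com"}'
rabbitmqctl set_policy --apply-to exchanges federate-everything '.*' \
    '{"federation-upstream-set":"all"}'
```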
So to combat that, you set a reasonable message TTL, again using a policy: say two hours, five hours, 24 hours, whatever window you need, and that's it. This has been working fairly well for quite a few users, and yeah, it may not be pretty, but it gets the job done.

Hostname changes. If you have ever tried changing the hostname on a machine that was running RabbitMQ, you may have noticed that the node doesn't like it when it comes back up. So, in a 3.5 release, I don't remember which, we introduced a command that lets you rename nodes. By the way, it doesn't accept just one pair; you can rename multiple nodes at once. You have to execute it for every node, because the internal database stores peer names, which include the hostname; that's why it has to be done this way. So if hostnames change for whatever reason, you now have a way to rename the nodes, and the cluster will start. Doing this automatically is not a good idea, to be honest, because, for example, when you enable a VPN, your hostname may change; should we immediately rename everything then? That's one reason. Another reason I'm not necessarily sold on the idea is that a single node can perform this on itself but cannot perform it on other peers, and at least at the moment we are not entirely sure what the best way to do that is. So in my opinion it should be done by a provisioning tool. Again, in Cloud Foundry this is done by BOSH. I don't think we actually had to go through a lot of migrations like that, but we certainly had one that would have been very painful without this, and with it, well, nobody really remembers it anymore.

Network partition handling. It's difficult to pick a rule of thumb for which strategy you should use, but when in doubt, just use autoheal, in my opinion. What that means is that the minority of nodes will reset itself and sync its data from the majority. Some may say, well, that's not great, and the two divergent data sets should be merged instead. This is something we certainly want to support someday. The problem is that merging also has downsides: namely, you may get duplicates, and not every application deals with that very well; in fact, most applications don't. You don't want two VMs provisioned in OpenStack instead of just one because of that. So these are real trade-offs. But once merging comes and RabbitMQ can be used as an AP system, in CAP terms, we would be able to fix Oslo messaging, or whatever layer needs fixing, to deal with duplicates. And then merging would make sense; it would probably be the best strategy.

Now, a bunch of other topics; we're not doing very well on time, sadly. Just don't use the default vhost credentials, because that's like not having authentication at all, right? If your credentials are well-known, just don't do that. Don't use 32-bit Erlang, I mean, seriously. Do you really want to be bounded by an address space of roughly four gigs? We have seen all kinds of obscure exceptions from people who just figured, OK, I'm going to run on 32 bits, why not, what can possibly go wrong? Don't do that. There is absolutely no excuse for it. Some people tell me, well, what if I want to run my CI in a VM with 256 megs of RAM? That's not really a concern in OpenStack environments. So, and please stay up to date with releases.
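Going back to the TTL and partition-handling advice above, a sketch; the policy name is made up, and 7200000 milliseconds is the two-hour example window:

```bash
# Cap the standby backlog with a per-queue message TTL policy:
rabbitmqctl set_policy --apply-to queues dr-ttl '.*' '{"message-ttl":7200000}'

# And the when-in-doubt partition handling strategy, in config form:
cat >> /etc/rabbitmq/rabbitmq.conf <<'EOF'
cluster_partition_handling = autoheal
EOF
```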
We fix bugs and introduce small features in point releases, even though that's not great from a semantic versioning perspective, because we want to ship those features earlier; they can solve very real problems for people. But if you don't upgrade, all of that is useless.

Please join the mailing list. Ever since my first OpenStack summit, which was in Vancouver, I have been surprised by how many people have questions about RabbitMQ and never ask them on our mailing list. Come on, guys, we have it for a reason: it's public, and it's our primary channel for talking to users. There is a Slack channel at Pivotal that we use, but you probably don't care at all about what goes on there; pretty much 80% of things happen on the mailing list. And then there is GitHub, but GitHub issues are not used for questions; we move all of those to the mailing list. Which also means that if you have a question, it's a very good idea to search the archive first. Thanks to Google, it has a decent search feature.

If you want to use Pacemaker and need an OCF resource template, Mirantis have been very nice: they contributed, backported, a template from Fuel, and they will continue fixing things as they discover them. It's currently available on GitHub, and it will ship with the next point release, 3.5.7.

And please use TLS. It can be a bit painful at first, when you figure out all those keys and what the hell they mean, but again, I personally think there is no reason not to use TLS. Why? Well, you probably don't want people easily tampering with your traffic, because in OpenStack it can reveal a lot, I can imagine, including credentials and stuff like that. Maybe everything is encrypted as it's passed around, I don't know that much, but just use TLS. It's what I would recommend.

Now, a bit of what's coming in 3.6; this is going to be very quick. In-process file read buffering disabled by default: I've already mentioned that. Queue master node distribution: currently, whatever node a client that declares a queue is connected to becomes that queue's master, and all operations flow through the master because we need to guarantee ordering. You will get control over how masters are distributed: round-robin, random, least-loaded, that kind of strategy. That is already done. SHA-256, or actually any hashing algorithm supported by the runtime, for password hashing. This should be a transparent upgrade, meaning you won't need to do anything. But don't tell anybody (well, this is recorded): currently, RabbitMQ uses MD5 for password hashing. Again, this is a very old decision, from probably 2009 or so, and that just won't fly in many environments. A more responsive management UI, thanks to pagination: once you have, I don't know, tens of thousands of queues, using the management UI to list queues currently doesn't work very well; it takes a while to load. We will have something that looks like actual pagination, you know, on the server. Streaming results in rabbitmqctl: we will see if this actually makes it. Basically, some people complain that even if you add a timeout to rabbitmqctl operations, when something actually times out you get a timeout message and nothing else; even if it was the very last item you were listing that timed out, you get no response at all. With streaming, rabbitmqctl would list results as it goes: it would list, for example, 99 queues, and if queue number 100 times out, well, there will just be a failed entry in the list, but you get your 99 things first.
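Going back to the TLS recommendation above, here is a sketch of the server-side listener portion, in the dot-separated format (classic configs express the same thing as ssl_options Erlang terms); the certificate paths are placeholders, and generating those keys and certificates is the painful part the talk alludes to:

```bash
cat >> /etc/rabbitmq/rabbitmq.conf <<'EOF'
listeners.ssl.default = 5671
ssl_options.cacertfile = /etc/rabbitmq/tls/ca_certificate.pem
ssl_options.certfile   = /etc/rabbitmq/tls/server_certificate.pem
ssl_options.keyfile    = /etc/rabbitmq/tls/server_key.pem
ssl_options.verify     = verify_peer
ssl_options.fail_if_no_peer_cert = false
EOF
```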
Past that: pluggable cluster formation. I've already mentioned it. If you're familiar with Elasticsearch, it has a nice plugin that, say, on AWS uses the EC2 API to list nodes, filter them by tag, and cluster with those. We would like to have some of that; it's a very convenient thing. A node data recovery tool: say you have the data store of a dead node and you want to restore those messages to a live node. Currently, there is no way to do that. We already have a tool, I think it's not open source yet, that basically reads those messages and publishes them. Better CLI tools in general: rabbitmqctl and rabbitmq-plugins don't necessarily have a great help section, so that needs to be fixed. Easier off-site replication, probably using what I've mentioned before; we will see how exactly that's going to work, but this is something we hear about a lot. A merge partition handling strategy, but that will take rearchitecting the entire distribution layer, so it will take a few releases, I'm afraid.

So, thank you, and here's where you can find me on the internet. If you would like to join our team, there is a real opportunity to do so; we have two positions open, I think, or one, I don't know, basically talk to me. And if you have any other questions... how are we doing on time? OK, so let's open up for questions, and if we don't have the time to answer yours, you can find me in the hallway.

[Audience question about message persistence.] So, well, if you don't do that and you restart your node, it will discard everything it finds that is not durable, and messages that are not persistent, because, well, what's there to lose? Just to repeat the question for the rest of the audience: how important is it to enable persistent queues for OpenStack, given the belief that we're not losing all that much by losing the content? Well, in some cases your data is naturally transient, and in those cases you don't need it; this is why it is an option in the protocol. I think for most cases it's a good idea, I would say. How much would we win on performance by not doing it? You need to benchmark; I'm not an oracle. There will certainly be a certain amount of overhead, but, I mean, if you have to hit the disk, you have to hit the disk. Thanks. Thank you.

[Next question.] I can repeat the question: would these plugins create a dependency for upgrades, if I want to stay current with releases? So the question is: with the clusterer and autocluster plugins, would upgrades get any harder? Well, we obviously try to keep them as compatible as possible. I don't think there have been any breaking changes in the clusterer so far. For autocluster, I'm not really the right person to ask, but you upgrade the plugin by dropping in the new version, and that's it. With every release we produce, those community plugins, as we call them, including the clusterer, are rebuilt automatically, so it should be relatively straightforward. If it's a patch upgrade, say from 3.5.0 to 3.5.6, you can run mixed clusters, so you can do a rolling restart, say, with the clusterer. Upgrading between feature releases, say from 3.5 to 3.6, and we have that in the docs, requires a full cluster restart, and mixed versions are not supported, because the inter-node communication protocol can change in breaking ways between feature releases, and it typically does.
[Question:] Will RabbitMQ have a feature to close a channel, instead of closing the whole connection, in the near future? Closing a channel remotely, generally speaking. [Answer:] I thought that was already possible. If not, I see no reason not to do it; just file an issue if it's not there, but I think it may be. [Follow-up:] What's the way to file the issue? I didn't see a bug tracker for RabbitMQ. [Answer:] OK, maybe I misremember something; I certainly would welcome such a feature, why not. All the development happens on GitHub, so please watch the repos and report issues there. If you're not sure, just ask on the mailing list; we are happy to file an issue for you. OK, thanks.

[Question:] About TLS. It's quite easy to enable TLS for client traffic, but it's more tricky to enable TLS for the traffic between the nodes, the replication. [Answer:] You can do that. It's a bit more finicky, but it's possible; I think we even support it on Windows. So yes, it's doable, and I think we have docs on it. Most people who enable TLS primarily enable client-to-server TLS. It depends on how locked down your environment is: if you're deploying OpenStack in, say, healthcare, you would probably need to encrypt inter-node communication for regulatory reasons, stuff like that. So I would say it's a good idea. A bit finicky, but doable, and in some environments it's probably a requirement. [Follow-up:] Is it enough to specify the right parameters in the config files? [Answer:] You also have to specify a few more parameters, virtual machine parameters, on the command line. Thank you, and find me in the hallway if you have more questions.
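For reference, a sketch of the kind of extra VM arguments that last answer refers to, passed via the rabbitmq-env mechanism. The certificate path is a placeholder, and the exact flag set varies by Erlang and RabbitMQ version, so check it against the clustering-over-TLS docs before use:

```bash
cat >> /etc/rabbitmq/rabbitmq-env.conf <<'EOF'
SERVER_ADDITIONAL_ERL_ARGS="-proto_dist inet_tls
  -ssl_dist_opt server_certfile /etc/rabbitmq/tls/inter_node.pem
  -ssl_dist_opt server_secure_renegotiate true
  -ssl_dist_opt client_secure_renegotiate true"
EOF
```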