activity this year. So we did one in January in London, which was actually quite successful, but since then we've really been out of it for obvious reasons. So we're trying to pick things up here. I think I heard someone speak. Go ahead. Ah, it looks like Jimmy's fixed the streaming. Okay. My heartfelt thanks to Jimmy. So who would like to introduce themselves next? Again, it's optional. You don't have to. You can just watch. Sean's probably a name you know, but it might be good to do an introduction. You know, to keep things going, I can introduce myself. Yeah, I'm actually not an operator, but, you know, I've been involved in the OpenStack community for a long time and care about hearing about experiences with actually running OpenStack. So that's been my involvement here: helping however I can with getting the ops meetups going and just listening in and doing what I can. Anyone else care to introduce themselves or, you know, give us some idea what they're looking to get from this session? Nobody wants to. Okay. Too shy or too tired. Maybe a bit of both. That's fine. So I shared the etherpad in the chat here. The way these sessions work is this is not a presentation. It's round tables. We're supposed to, you know, all pitch in and talk. And we're also attempting to capture notes about the conversation in the etherpad. For those of you who don't know, it's a lightweight shared documentation platform. So if you click on the link in the chat, you'll see multicolored text, and it's actually growing. People are plus-one-ing their OpenStack version. So that's always a fun way to get going. One of the things we've done is highlight to the OpenStack developers that there are a number of installations with fairly elderly OpenStack. And so, you know, it's a problem when they occasionally act as if they assume that everyone's running the latest. In particular, there was a time when they would delete the documentation very aggressively. Like, the stuff would be supported for a year and then the docs would basically, you know, fall off the edge of the earth quite quickly. And that was a problem, because the docs for the version that, for instance, we were running got deleted. And the Ops Meetups team actually conveyed to the developers that that was a problem, and got that fixed. So what I'd like to ask you to do is to actually start participating in the easiest way. And presumably no one's too shy to do this: go onto the etherpad, add your name in participants at the bottom, and then plus-one every version of OpenStack that you're running in production. Now, you might have test instances and something on your laptop and, you know, those kinds of things, but something that's actually running end-user VMs would be counted as production. It doesn't have to, you know, be a commercial thing. You know, if you're supporting scientists, then they're your users, right? So for me, I'm going to plus-one it, but I'll also talk it through. So we are building and growing our Rocky clusters, but we're also still on Mitaka on older ones, so I'll plus-one both of those, because Mitaka is still a supported platform at Bloomberg. And the nice thing about that is that Mitaka was actually the first version of OpenStack for which they stopped deleting the documents. So to this day, you can go to docs.openstack.org, set the version selector to Mitaka, and find the docs.
So it is important to collect this information from real operators, because in some sense the developers are a more coherent community. You know, they get together, they do the code. It's often their full-time job to develop the OpenStack code. I work at a company where I run a product based on OpenStack, but, you know, I don't really cut OpenStack release branches, I don't take decisions to delete things. And if I then find that the stuff we're using is, you know, considered obsolete, that's a problem. So we try and balance that, you know, OpenStack developer focus, I guess, with information from the operators about, you know, what our needs are and how things can get better. So it's actually, you know, all about making OpenStack better for all of us. So this is interesting. We've got somebody who said that they run Kilo. We did run Kilo. We actually upgraded to Liberty and Mitaka on those very same clusters, on the same hardware. So we do have experience going that far back. I don't think we have, well, I'm certain that we don't have any Havana and Icehouse. But, you know, if someone is running something older than Havana, they're welcome to extend the list. Again, this is not a presentation. I'm trying to, you know, basically kick off a discussion here where, you know, we all share information. Now, actually, I see Shintaro has joined as well. Hey, Shintaro, how are you? Long time no see. Good to see you again. We just got going. Sean and I just introduced ourselves. And I invited other people to introduce themselves if they wanted to. But I think people were either still waking up or shy or something. But Shintaro is another member of the Ops Meetups team, who actually hosted an event in Tokyo via his employer. So I asked everyone to get started with the version summary. Hey, there he is. And, you know, also put their participant name at the bottom. So we're getting a good uptake. And actually, it does look like the community, well, this sample of the community, is making progress, because we're heavily at Queens. But that's, you know, still quite a long way back these days. They just announced Victoria, I think it is. I may even have the name wrong. We are hoping to start developing our product based on Ussuri very soon. And look at that. Now Queens and Train are vying for the lead. So this is a fun thing, because, you know, you can plus-one something even if you're not sure how this works and you're not feeling like speaking or being on video. But it's actually really interesting, because without this information coming to the developers on a regular basis, they can get the idea that we're all, you know, very up to date. In our case, it's not possible to do upgrades at the rate that they come out from OpenStack. So we have to pick and choose, you know, and then we're faced with some of the issues that I try to sketch out later on, like upgrades and things. So keep on putting your plus-ones there. Did anyone want to do an introduction? I kind of stopped asking, but if anyone wants to share who they are and what their use of OpenStack is, or, you know, anything about scale or those kinds of things, speak up. We did have a pretty decent time slot. Okay. I can talk about it. Go ahead. Yeah, I'm from LINE, and we're a nominee for the Superuser Award this year. So there's a short introduction in the keynotes. And our scale is, I suppose, more than 2,000 hypervisors. And we're currently running the Mitaka version with a lot of customization to fit in those components.
So that's kind of the reason that we're stuck on Mitaka. It's hard to upgrade, since we have too many customizations. Yes, it's a very good point. Thank you. Thanks, Gene. Thanks for sharing. Yeah, congratulations on the Superuser nomination. That's great. I saw something about the scale. It's pretty phenomenal. There are these, you know, these sites that are not even that well known that just turn out to have all of this infrastructure running on OpenStack. It's always exciting. I think I'd heard of LINE before, but I didn't understand the scale. So you guys are clearly doing something right. So you said Mitaka. That's great. We're still running Mitaka on, I think, nine OpenStack clusters in three geographic regions. So I think that gets probably very nicely to the next point that I put in. And by the way, even the points on the document are just my idea of what we could talk about. You can feel free to throw in whole new topics. But the upgrade pain points. So I threw a few things in that are very commonly discussed. One is the cadence. OpenStack does two releases per year, and they're all full releases. What I mean is they're each as good as the other. There aren't any that are prominent as being long-term support. That's not a concept for OpenStack. So you're expected in some sense to actually visit every release of the software. In particular, you have to at least do the database migrations through every version. But I've had some discussions where I think that two a year is not necessarily the right pattern these days. So I wonder whether anyone else feels that the number of releases is maybe too high. And the interesting thing is, when we talk to the developers and say, two a year, why are you doing that? There are much fewer Linux kernel releases, for instance, than other things. They say, well, two a year? We want many more. We want to release all the time. So there is a bit of a disconnect there. But anyway, I threw that in. It hasn't budged so far. But interestingly, it used to be two releases a year and two summits. And I think it's moving towards, well, who knows in the time of the pandemic, but I think they are moving towards one full summit and one developer-focused event. And to me, that kind of indicated a slight change in the emphasis away from this fast-paced two-a-year. So whether or not you actually run the intermediate OpenStack releases, you do have to upgrade through all of them. And that's something that people have asked for, fast forward upgrade. They used to call it skip level, but basically it's been clarified that you can't just upgrade from A to C, for instance. You always have to go via B, but it can be quick. Okay, John says the big guys do want features faster than two a year. That's a very good point. There are people who are gasping for features faster. So John, you're welcome to speak. I don't know. Is it possible I'm missing something and some people can't speak in this? Are you joining via streaming rather than actually in the Zoom meeting? Like, only Sean and I and Gene have talked, but maybe John just prefers to type. So it's a very good point that whilst we struggle to do that many upgrades, because we're a financial data company and we have to do a lot of verification testing, and we have to have stability, other people are really keen to get the latest features as soon as they emerge from KVM, Linux, storage. John is not joining Zoom. Okay, that makes sense. Normally John's not afraid to speak up, John Garbutt.
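Since the point just made is that you have to at least step the database migrations through every intermediate release, here is a minimal sketch of what that walk can look like. It is illustrative only: the release path and the deploy_release helper are hypothetical placeholders for site-specific tooling, while the per-service db sync commands are the real migration entry points (exact subcommand spellings can vary by release).

```python
import subprocess

def deploy_release(release):
    # Site-specific placeholder: however you install this release's
    # packages, venvs, or containers.
    print(f"deploying {release} (placeholder)")

# Real per-service database migration commands (spellings may vary by release).
DB_SYNCS = [
    ["keystone-manage", "db_sync"],
    ["glance-manage", "db_sync"],
    ["nova-manage", "api_db", "sync"],
    ["nova-manage", "db", "sync"],
    ["neutron-db-manage", "upgrade", "heads"],
    ["cinder-manage", "db", "sync"],
]

# You can't jump A -> C directly: walk every intermediate release, running
# the schema migrations at each step, and don't start the services until
# you reach the target release.
for release in ["queens", "rocky", "stein", "train"]:  # example path only
    deploy_release(release)
    for cmd in DB_SYNCS:
        subprocess.run(cmd, check=True)
```

The important property, matching the fast-forward upgrade discussion above, is that services stay down at the intermediate releases; only the migrations run.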
We have a question from Sylvain: so, are upgrades still a pain point? Yes. He'd love to hear examples. So again, this is not a presentation. I'm not even a good presenter. So why doesn't someone speak up if they have a problem with upgrades? Why don't you share a bit? You can speak, and we'll try and tap something into the etherpad. I mean, my question is rough, and sorry about that. But to be honest, we know about the cadence, and we know it can be difficult for operators to upgrade, but that's also why we try to make the upgrades smoother. So basically, I think upgrades are no longer a pain point, but if you still have problems, actually, please tell me. Because as far as I know, database migrations, RPC upgrades, that kind of stuff, at least for the main services, for me, it's no longer a pain. I think that's very valid. So I think that part of the upgrade thing is fear and uncertainty. It used to be extremely scary. And the tested paths were just the most mainstream ones. And then if you'd made any different implementation choices, you would upgrade and you'd find stuff didn't work and you had to pull apart the database by hand and fix stuff up. And those days are long past. So I'm going to type in: please provide specific recent examples. I guess it's fair for the developers to say, bugs, tell us. In our case, I think it's very fair to say that in the Havana/Icehouse days it was a terrifying experience; by Mitaka, well, from Kilo to Liberty to Mitaka, which we successfully did, it had got better. And in those days, we were on Nova Network, which was actually being deprecated at the time. We have not upgraded in the more modern era. We're on Rocky. So I haven't got recent personal experience to share. But I think this is a place where, for instance, the upgrade checks have helped. And I think they've also cleared out the expectation that all the services will be brought online at each release version. So if you're going from A to C, you don't start the services at B. In other words, the database is migrated but the services aren't started. I think they've got rid of the idea that those services get to do housekeeping for release B. So at least they're trying to. So the idea is that if you need to upgrade a long way, like A, B, C, D, you just do A to B, B to C, C to D, and it should work. So I do accept that it's got a lot better. But there are other reasons why it's hard that aren't entirely in the hands of the core OpenStack developers. For instance, and there's a good point here, third-party plugins. So we actually see this ourselves, because the way that we do Neutron is the Calico plugin. It's a core plugin, and it's a cool piece of software, but their main market is mostly Kubernetes now. So their support matrix for OpenStack is, you know, we tell them where we're going to go, like we're going to go to Ussuri, and they're like, oh, I guess we'd better do that then. Like, they're not really on board with the full OpenStack program. And I imagine that other people are also running plugins that aren't necessarily being upgraded twice a year, or strictly being in OpenStack CI/CD. Perhaps someone could share some examples. I'm going to put in my example now. I don't wish to badmouth the vendor, Tigera, but you know, we're a small part of their customer base. OpenStack with Calico has become a niche. So what other third-party plugins are people using that maybe hamper their ability to upgrade, even if the core OpenStack components all upgrade very nicely? Okay, so we're getting some content there. Thank you.
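On the upgrade checks mentioned above: Nova ships a nova-status upgrade check command (and other projects later grew matching $SERVICE-status commands) that is meant to be run before restarting services on a new release. A small sketch of gating an upgrade step on it, using the exit-code convention as documented for nova-status:

```python
import subprocess

# Exit codes for nova-status upgrade check: 0 = all checks passed,
# 1 = warnings only, 2 = at least one check failed, 255 = unexpected error.
result = subprocess.run(["nova-status", "upgrade", "check"])

if result.returncode == 0:
    print("ready to proceed with the upgrade")
elif result.returncode == 1:
    print("warnings; review the output before proceeding")
else:
    raise SystemExit("upgrade checks failed; do not proceed")
```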
Contrail plus Mitaka, CentOS. Contrail is not something I have personally touched, but we did look at that. I'm familiar with that. Has it been renamed now? Is it called something like Platinum or something? Hey, I'm Jan from Workday. I was typing that in. There's an open source project that's been spun off from OpenContrail called Tungsten Fabric. Yeah, that's ostensibly the upstream source, but to get the vendor-supported version, you still need to buy the product, which is called Contrail at this point. And we're using the kernel version. Plus it's also a core Neutron plugin. So there's definitely a very interesting upgrade problem that you get when the hypervisors are still on CentOS 7. There's a kernel dependency, and there is also a version dependency for the OpenStack plugin. So it's very difficult. And again, so we get the same thing, where we have to wait for the OpenStack release to be available before we ask the vendor to provide the plugin for us. Yeah, it's all very familiar. By the time OpenStack releases become available, and then we tell our vendor we'd like to try that one, and then they get it, and then they start work on it, there's probably another OpenStack release already. It's not that the CI infrastructure in OpenStack for third-party CI isn't mature enough. It's just that not everybody is willing to devote the resources to keep up with CI. That's a very, very good point. So our vendor actually pulled their software out of OpenStack CI because they described it as annoying: like, they didn't do anything, and then their code broke, quote unquote broke, all the time. So they went off and put it on GitHub or something like that. But we were very disappointed with this, because the reason it broke is that they're consuming software that's changing. So if they don't maintain their software, then they basically have a reintegration task every time we ask them to do a release. And specifically, it's difficult to rationalize keeping it in the OpenStack infrastructure when you're only targeting a particular OS, for example. For example, a vendor has to duplicate their efforts to certify it for Ubuntu and for CentOS and for RHEL and whatever. Yeah, very true. Well, if I may say, certification, I'm not really sure that certification should be done upstream, right? I mean, there are two different things that I don't want to mix. First is, okay, do we support this specific driver, this specific plugin? For that, we have third-party CI. But if you want to make sure that we can certify a specific release with a vertical, I would say with vertical hardware, like for example, using some specific OpenStack release or specific OpenStack distribution with some specific OS, with specific hardware, is it really something that upstream should be certifying? Sylvain, I'm sorry to butt in, but your microphone is very, very noisy. It's actually a little hard to hear. I don't know if you have some paper or something on it, but it's just a little unclear. But I think very good points there. I think there's a huge tension between the breadth of OpenStack and the needs of some people who have code in CI/CD and are expected to suddenly support it on more platforms that maybe they don't actually have any traction on. I don't have the answer. So what I would suggest is, please try and capture some summary of these issues on the etherpad, because I will tell you that these etherpads are gold when we go to future sessions.
I mean, the way I write these now is I just go and read the old ones, because there's so much material there. And we just try and bring the issues that really get the most engagement forward to future events. Anyway, sorry to butt in; carry on. And by the way, someone put their hand up; I think you're going to have to just try and butt in and speak up. I apologize for this format, but it's all we can do in these current times. So it's okay. I see a lot of activity on the etherpad, but I'm sure I cut someone off. Was it Jan? Did you want to carry on? I'm sorry. Yeah, sorry. I was just, I think Sylvain made a good point with this certification and the CI. So it's probably a good idea to separate them. I'm not sure if I get his point, but if I'm paraphrasing him correctly: to be inside CI is a very different thing from certifying that your solution has been fully integrated on a particular platform. To be inside CI means that you're participating in integration with the leading edge of everything that OpenStack supports. To be certified for a specific solution means that you have decided to focus on a particular release and a particular thing that you are willing to support. However, it's very difficult to keep up the certification pace if you don't also keep up the CI pace. And you're totally right, Jan. I'm not saying we should stop third-party CI. I just want to make sure that we can continue to say upstream services have, like, support for specific drivers, or, I mean, I don't know quite how to say it, but yeah, basically third-party CI, indeed. I'm only working in Nova, so I only know it from Nova, but Cinder and Neutron are different, I know about that. But yeah, and to be honest, it's really difficult, right? It's really difficult to say, okay, we won't continue to support a specific driver because there's no longer third-party CI for it, but we try to do it. And I definitely agree with you. It's something that, I mean, if you are a hardware vendor and you really want to support OpenStack, then please, please, please provide a third-party CI for us. I mean, not for all the changes. I mean, we could discuss the CI, and which number of changes, or the way that we look at the CI. But in general, it's not possible, to be honest, to support a specific vendor if we don't have third-party CI. Just because, as you said, if we modify something and then the driver no longer supports the hardware, then we will only see it by the next release. And that's definitely too late. I agree with the points you make. I think there's an unspoken thing, that the resources supporting OpenStack are no longer so plentiful, because there's less of the venture capital bubble associated with OpenStack. I think one of the things that possibly these conversations, where people are really running it and paying vendors, can do is to maybe realign the efforts of everybody in some way. And I don't have the answer to make it more, how can I put it, sensible. I think in the days when I joined OpenStack, they thought they could do everything on every platform for everybody. But I think a lot of the froth has gone. So maybe we can work out ways to be more focused. Anyway, one thing I'm seeing now is, in addition to the problem with the third-party plugins, hypervisor reboots. This is something I'd love to get to, because I want to actually share something. So we recently did an entire fleet reboot on our OpenStack Rocky clusters, which is, I think, about 1,500 hypervisors.
And we were able to move every VM on those things, because these are production clusters, in a few efforts, like maybe 10 maintenance events over three weeks or something. The key thing that's changed for us: well, number one, we're all on shared storage. I don't know how you can live with OpenStack if you don't have shared storage. Number two, we turned on a feature that was default off that everyone needs to know about, which is auto-converge, which is a libvirt and KVM feature, I think. But Nova has a setting for it. And what it does is, if your VM won't live migrate, it slows the vCPUs down until it does migrate. So all of a sudden, I'm at the stage where I can evacuate a hypervisor by typing a command and coming back later. It just works. All the VMs just move off elsewhere. Shut the machine down, get the RAM fixed, whatever it is, and bring it back up. So I'm going to type it in. Don't worry, but this really blew the minds of my line management. I hope I don't seem like I'm boasting. I want to share, because we moved more VMs in a shorter time than the ESX platform had ever done in its entire 10-year life at Bloomberg. So if you don't have live migration with auto-converge, you've got to try it. Anyway. Can I ask some questions about that? I'm curious what networking speed you've got. Dual 25 or dual 40 to each hypervisor. I'm surprised it helps you. Okay. That's really interesting. So we had some VMs that were extremely hard to move. Like, we're running some Humio instances that are basically doing log ingest and metrics and actually handling the queries. And they are just on-fire hot. They're huge. I think they're 32-core or something. And they just wouldn't move. And then auto-converge just hits them with a soft hammer and they go, and then they wake up. I was going to say that the alternative that actually comes up quite a few times is the live migration downtime setting. I put a link to it there. It defaults to basically half a second, 500 milliseconds. There seems to be folklore that if you set that to 2000 instead of 500, you get much better effects in a similar kind of way. We had VMs that resisted moving for hours. Yeah. But it isn't a timeout, that specific config. It's a weird one. So what it means is, and I'm going into way too much detail about how live migrate works: basically, there's a point where you've copied all the memory, and you go back again to see how much memory there is to copy next time on your next iteration. And libvirt has to make the call: can it do it inside that config setting value? So basically, can I pause you while I do the final copy, and then move you and start you at your new destination? So, yeah, if you've got busy VMs that are constantly churning memory around, if you wait just that bit longer, very often it can help. Now, there was a problem in older versions of Nova: it was very, very slow at increasing that. So basically, what happens is it does a little step every time. So it tries 50 milliseconds for a while, then it goes to 100 milliseconds, and then 200 milliseconds, if you see what I mean. So it only slowly steps up to that config value anyway. So for very large instances, it would take a long time to step up to that config value. So it's kind of a triple whammy kind of thing. So it's worth trying that if your instances have a problem with auto-converge, because some people don't like their instances going a bit slow for a while, and prefer to be off for longer, if you see what I mean, because they just sort of sleep. It depends on the workload.
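For reference, "evacuate a hypervisor by typing a command" can be existing tooling or a small script like the hedged openstacksdk sketch below. The cloud name and host are placeholders, and I'm assuming the host/all_projects query parameters and the compute_host attribute behave as shown, so treat this as a sketch rather than a drop-in tool. The auto-converge feature being discussed is, I believe, the Nova option live_migration_permit_auto_converge in the [libvirt] section, and the half-second default mentioned is live_migration_downtime.

```python
import time
import openstack

conn = openstack.connect(cloud="prod")  # placeholder cloud name

HOST = "hv0042"  # placeholder: the hypervisor being drained

# Admin view of every instance on the target hypervisor.
for server in conn.compute.servers(all_projects=True, host=HOST):
    # Let the scheduler pick a destination; with shared storage, 'auto'
    # lets Nova decide whether block migration is needed.
    conn.compute.live_migrate_server(server, host=None, block_migration="auto")

    # Poll until the instance lands somewhere else. Auto-converge does the
    # heavy lifting for hot VMs by throttling their vCPUs until they move.
    while conn.compute.get_server(server.id).compute_host == HOST:
        time.sleep(10)

print(f"{HOST} drained")
```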
But wherever that conversation goes, there are people who say, no, no, don't slow me down. Just, you know, run this thing, take me offline, and then bring me back whenever you like. But I think they're being kind of provoked into speaking up, because now we've moved the entire fleet. We moved 20,000 VMs or something. And some people say, well, I went slower, but it's like, yeah, we had to put a new kernel on every box. Anyway, it's not about me. So one thing I will say: we've had a spirited discussion on upgrades, and we've got 15 minutes left. It's up to me to move the thing along. I almost feel like we needed more of these. I will say that your ops meetups team is a little bit under the weather right now, not me, but Sean and Erica, with the things going on. But maybe we can get together again. One of the things I wanted to float as an idea is that we're thinking about having off-schedule but brief, monthly one-or-two-hour things, and we're thinking of calling it ops radio or OpenStack-something, you know, a radio show, I don't know. But if people like that idea, please respond on the Twitter threads. The ops meetup Twitter account is by far the place where we get the most engagement. So we're just focusing on that. I'm not trying to say Twitter is great, Twitter may be the worst thing ever, but we don't get any response on the mailing list. So please just respond to the threads on Twitter. So, trying to be a dutiful moderator, I'm going to try and move forward a bit. So we've got a lot of good material about hypervisor reboots. I said, please look into auto-converge. John, did you put in the thing that you just explained to us, about the downtime value giving libvirt the opportunity to finish? Yeah, I'll put the link in there. Thank you so much. People who are sharing information: you know, human memory is terrible. Two hours after this meeting, we won't remember half of it, but this etherpad will, so please do. So what do we want to do? We can just carry on talking about upgrades and say that this session became an upgrade session, or we can try and get on to some of the other topics I put in. There's quite a bit about scaling. So I feel like we should probably touch on that. We have 15 minutes left. I was going to say there's lots of people with plus-ones on rabbit. Not that that's new, but... So yeah, we can start with rabbit. We used to get killed by rabbit all the time, get woken up in the middle of the night. Recently, two things I think saved us. Number one, a very recent version of rabbit had some very important fixes. I don't have the details to hand, but I could dig them out if people want. I'll try and put them on the etherpad before the end of the week. Second of all, we were mirroring the queues to every node, which means that as you add rabbit nodes, you increase the total amount of work being done by the rabbit cluster. And we changed it to "exactly two", I think it's called, which means every queue has to exist exactly twice on the cluster. So if you have five rabbit cluster members, each queue is only on a minority of the nodes. And rabbit hasn't blown up in weeks or months now. So that's a couple of thoughts there. If other people have ways that they manage rabbit better, please share. But I think every time I ever went to an OpenStack conference, there was a talk entitled "how we configured rabbit wrong and blew up at scale" or something like that.
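The "exactly two" mirroring just described corresponds to a classic-queue mirroring policy with ha-mode "exactly" and ha-params 2. It's normally set with rabbitmqctl set_policy, but the same policy can be pushed through RabbitMQ's management HTTP API; in this hedged sketch the host and credentials are placeholders, and the default vhost "/" is URL-encoded as %2F:

```python
import requests

# Placeholders: management host/port and credentials for your cluster.
RABBIT_API = "http://rabbit1:15672/api"
AUTH = ("admin", "secret")

# Mirror every queue to exactly two nodes instead of all of them, so that
# adding cluster members no longer multiplies the replication work.
policy = {
    "pattern": ".*",            # regex matching all queues
    "apply-to": "queues",
    "definition": {
        "ha-mode": "exactly",
        "ha-params": 2,
        "ha-sync-mode": "automatic",
    },
}

resp = requests.put(f"{RABBIT_API}/policies/%2F/ha-exactly-two",
                    json=policy, auth=AUTH)
resp.raise_for_status()
```

The rabbitmqctl equivalent is a single set_policy call with the same JSON definition; either way the effect is the same policy object in the cluster.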
I'm hoping that next time we get together, which I'm looking forward to, that will actually be over. I don't know if it's just new options in rabbit, bug fixes, or what, but that is no longer killing us as much as it used to. But I see lots of plus-ones, so people are still encountering it. So I will try and share what we've done with it: the specific version upgrades were super valuable to us, and then adjusting how much replication of queues you're doing across the rabbit cluster members was very important. Someone's actually shared the link, the one I was thinking of: "How we used RabbitMQ in wrong way at a scale". Yes. You found it. But I mean, I saw one of those, I think in Tokyo at the summit, where someone said that rabbit blew up for them so many times, they switched to ZeroMQ and then some homegrown service discovery stuff. And that wasn't working too well either. Some questions here about RPC worker numbers and timeouts. So I actually run a development team. I'm not currently a developer, but I will say that our stuff, our core OpenStack distro, is actually on GitHub, Chef-BCPC under the Bloomberg organization. You can see what we're doing with rabbit by just going and reading our code. And if anyone ever had questions about it, they'd be welcome to reach out to me on my Gmail, which is at the bottom. Another question close to my heart: is anyone having problems scaling because of networking? Our old clusters are Nova Network-based and Layer 2-based. They have a single broadcast domain for each network. And we reached the limits. There are core switches or spine switches that have no more network ports and cannot be expanded, and we just can't add any more network. And our new thing is all Layer 3 and routed and BGP. And we've scaled it so quickly. So I was interested whether anyone else is still on Layer 2 and hitting those kinds of limits. There is one single plus-one there, if anyone wants to share what they're seeing. Another point here: qrouter migration. Yeah, this is something I'm not familiar with. Is qrouter an OVS thing? Someone help me. I think that's the L3 agent. A big part. I couldn't hear that. I think they're referring to the L3 agent. Okay. Someone speaking? Go ahead. Yeah, actually it's related to the L3 agent. We have an implementation that is based on Open vSwitch. And the inconsistency I'm talking about is while migrating qrouters. At the beginning, for example, we have agents, Agent A and Agent B. Normally a qrouter is assigned to Agent A. And during the migration, we assign the qrouter to Agent B. But at the same time, we found the namespaces and all the resources of the qrouter still exist on network node A. So if we have a lot of qrouters to migrate, we really have some inconsistency problems. We have some qrouters that still live on both network nodes. And yeah, so maybe it's related to the time it takes to actually clean up all the resources from Agent A, then create them on Agent B. But we have this inconsistency. We don't know if it's related to the time. Well, I'm sure that I speak for the developers present, who are looking for high-quality bug reports on these kinds of things. It's not something I'm familiar with, so I really can't comment. So we have about seven minutes left according to my clock. I don't have a hard time cut-off, but I think that the session will probably end, in terms of being broadcast and recorded and all of those things, on time. There's another ops session coming up. I think it's in 45 minutes or something.
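As an aside on the qrouter inconsistency described above, one way to at least detect routers that are still present on more than one L3 agent is to walk Neutron's l3-agent-scheduler API. A hedged sketch: the REST paths are the documented /v2.0/agents and /v2.0/agents/{id}/l3-routers endpoints, and I'm assuming openstacksdk's network proxy resolves them against the versioned endpoint the way its own resources do; note that HA routers legitimately appear on several agents.

```python
from collections import defaultdict
import openstack

conn = openstack.connect(cloud="prod")  # placeholder cloud name

hosted_by = defaultdict(list)

# GET /v2.0/agents?agent_type=L3 agent
agents = conn.network.get(
    "/agents", params={"agent_type": "L3 agent"}).json()["agents"]

for agent in agents:
    # GET /v2.0/agents/{id}/l3-routers lists routers scheduled to this agent.
    routers = conn.network.get(
        f"/agents/{agent['id']}/l3-routers").json()["routers"]
    for router in routers:
        hosted_by[router["id"]].append(agent["host"])

for router_id, hosts in hosted_by.items():
    if len(hosts) > 1:
        # Suspicious unless it's an HA router, which is meant to be
        # scheduled to several agents at once.
        print(router_id, "is hosted on:", hosts)
```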
It's hard to reason about times in other time zones. We put that down as war stories, which is always a fun one: people share about the day they broke their OpenStack cluster or powered off their data center, those kinds of things. But we could probably do a little bit of follow-up on these topics at that session. In particular, if anyone has feedback about whether we should try and get together for ops-type events online, virtually, away from the summits in the meanwhile, while we're probably mostly still all stuck at home, then I'd love to hear it. We're willing to do it. I'm willing to do it. But so far, we've floated the idea and we get one plus-one or one retweet; it's not clear that would really indicate enough support for it. Anyway, let's try and get through some more points while we still have this public seminar going. Some great detail about the qrouter thing, which I don't follow. I think the rabbit thing got subsumed in there. Let me just pop that out. Nobody's having problems scaling MySQL? We managed to. We got to a scale of over a thousand machines before we stopped running MySQL with the out-of-the-box default buffer size and disk flush settings. We were thrashing our SSDs to death, degrading the performance of the entire cluster. We make rookie errors all the time. Then again, maybe not quite at the right level of indentation, but it doesn't really matter. There's actually some good material and questions here about telemetry. Prometheus openstack-exporter. Yes. I put the question down. We are migrating to Prometheus. I was looking to see how people are running it. As most people already know, Prometheus doesn't do transactional storage. How are you storing the data, or are you expecting that you have a tolerance for missing metrics? I can only really say what we were doing. We used Kolla Ansible, and it has the ability to just start three Prometheuses, all scraping your sources, and to put them behind a load balancer. All right, so you're scraping multiple times for each metric? Effectively, each server is just individually scraping all of the sources. All right. By default, it actually load balances equally around all of the, what do you call it, all of the servers, which isn't ideal if one of them goes down, because you used to get wiggly graphs. It should really do the same thing you do with MariaDB and just focus on one. Anyway, it's a solution. I'm curious what other people do. It was simpler than Thanos, anyway. There's lots of good material in this etherpad. I want to thank everyone. I'm not saying we're done yet, but it was really unclear to me whether it would be me and Sean talking to two people or, as it turns out, 45 other people. So thanks, everyone, for turning up. I wonder if we can get just a very specific question in. The old format of ops meetups was two days, solid eight hours of work. I think everyone feels that virtual events are more tiring in some sense. I'm not sure why, but a whole day of video conferencing is very, very tedious in some sense. So we were thinking that more regular but shorter events might work better, but I really don't know. So maybe I can just put it on the etherpad right here, where everyone's looking. So I think the contrast is... Chris, can you copy the etherpad into the chat too? Yeah, actually it was in the chat, but it's just scrolled up, so I shall do that again right now. Okay, so what I'm doing is two options. And it's just above testing, because you got a little bit into testing.
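Going back to the Prometheus question above: with the Kolla-style approach of several identical servers all scraping the same targets, redundancy comes from duplication rather than replication. Here is a hedged sketch of preferring one replica and failing over, which also avoids the "wiggly graphs" of round-robin load balancing; the replica addresses are placeholders, and /api/v1/query is the standard Prometheus HTTP API:

```python
import requests

# Placeholder replica addresses: identical Prometheus servers, all
# scraping the same targets.
REPLICAS = ["http://prom1:9090", "http://prom2:9090", "http://prom3:9090"]

def query(promql):
    """Ask each replica in turn; any one of them can answer."""
    for base in REPLICAS:
        try:
            resp = requests.get(f"{base}/api/v1/query",
                                params={"query": promql}, timeout=5)
            resp.raise_for_status()
            body = resp.json()
            if body.get("status") == "success":
                return body["data"]["result"]
        except requests.RequestException:
            continue  # replica down; its twin holds (mostly) the same data
    raise RuntimeError("no Prometheus replica answered")

# Example: list scrape targets that are currently down.
print(query("up == 0"))
```

Sticking to one replica until it fails keeps timestamps consistent between queries; each replica scrapes on its own schedule, which is why spreading queries across them makes graphs wiggle.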
So if you're looking further up at upgrades and stuff, could you please just scroll down. So we have two options. I've called it future event poll. Two-day event, like the old in-person meetups, or monthly short "ops radio" Zoom. And I'm seeing a flood of plus-ones. Thank you, guys. So someone else is saying both. So the event in January was very successful, but since then, I will say, the ops meetup team itself has had a lot more challenges. So someone said: return to the two-day event when travel returns. Yeah, I think when we can all get together in a great location and not only work on these things, but then maybe have dinner together, I'm totally into that. I think we all expect to do that. The thing is what to do while we're coping with the current situation, both pandemic, politics, other things. And there's a tremendous, you know, majority on this particular thing for trying to do something more regularly, but shorter, on a Zoom or something. So this is very encouraging. I can throw this together. And it seems like, you know, this session went extraordinarily well. So it seems like there's a demand for the kind of thing the Kolla Klub was created to get at. Yes, Chris. If we do virtual ops meetups, then we need to consider the time differences. So the short version of the event is better to cope with the time differences. Yes. And we could even adopt alternating time slots. In other words, something good for North America, then something good for APAC. Right. Yeah. I don't mind early or late with a little bit of notice. It's obviously very easy at home. So someone's sharing something. So because this is being recorded, I think it's going to get chopped off now. So I want to thank everyone. Feedback via email is very welcome. My email is on the sheet, but the public place where you can show interest and support is the Twitter account. I think most of you will know it, but let me see if I can share it quickly. By science, we discovered that we get more participation there than anywhere else. So I'm going to say that the meeting is done from the point of view of broadcast. I'll hang around. We can continue updating the etherpad. I do have to prepare for the next one, which is in 45 minutes. But thanks, everyone, for coming. I hope this was useful. And I guess we have a pretty good mandate to try and do something regular but brief, rather than the two-day event. So fantastic. I'll give it a go. Thanks, everyone. Thanks, everyone. Thanks, Chris, for moderating all of this. You're welcome. Nice to see some of you, or at least hear from you. John, thanks for joining. Sylvain, I don't know some of you by name, but thank you all. The etherpad is not going away, so you can just carry on typing in there. I don't think I get a signal. It still says live on custom live streaming service, so I don't know. Maybe I'm supposed to cut it off. Does anyone know? I think you can probably just end the meeting. They made you the host, I believe. So, okay, that's good. Actually, I am the host. So I want to ask for final comments, questions, requests, suggestions. I know some people are just too polite to leave before the meeting ends, so I will end it. But does anyone have any final thoughts? No? Okay. Well, thank you, everyone, and I may see you in the next session. Thanks a lot. Bye. Thank you.