So, welcome everybody. Sorry for this delay. Everything was working 20 minutes ago and all of a sudden stopped working. Anyway, my name is Radoz Wawczymgielski and I'm working for Nokia. I'm part of something called the CloudBand team and I'm working on CBIS; Nokia CBIS stands for CloudBand Infrastructure Software, right? So, why this talk? OpenStack is complex. Everybody knows this; probably nobody here wants to argue about it, right? Anybody? No. Guys, I want to make this session interactive. If anyone has a question, just give me a shout, raise your hand and jump to the microphone and we can discuss it, or maybe after the session. Now, the performance of OpenStack is even more complex than OpenStack itself. And what is the session about? It's about experience I gained, okay? I was given a task and a big setup, and on that setup I was going to work on scalability. And while I was working on scalability, I hit a few... I'm sorry, I probably should type the password. While I was working on the scalability problems, I also hit a few problems with performance, and along the way I was trying to resolve them, and this is what I want to present and share with you, right? So, the main thing I'm going to talk about is: what the hell is my controller doing, okay? There are quite a lot of processes running on the controller, and when you look closer at those processes, you can actually learn a lot, and this is what the talk is about. Also, the default configuration you get when you install OpenStack: usually you run with some kind of defaults, and the defaults have one purpose, basically. They just let you install an OpenStack which runs out of the box. Nothing more, right? But if you need something more, you have to actually start digging and do some work to optimize those controllers. So, don't trust the vendor's defaults, right?
A few tips and tricks, and why and how, right? With OpenStack, it is important what kind of workload you're running on it. I'm coming from Nokia, and CBIS is a platform for telco-type applications, but this will probably be similar for, or maybe you can apply the same rules to, whatever type of application you're running on top of OpenStack. Still, my talk and my experiences, my measurements, were done on a telco-grade OpenStack. And I probably should apologize, because when I submitted the subject for this session I actually put in a few more points, and when I realized this is just 40 or 45 minutes, I realized there's no way I can squeeze everything into that short time window, right? And I also assumed that this is going to be a somewhat advanced talk, so I assumed you know, for example, what CPU context switching is, or what CPU registers are, and things like that, okay? Maybe I'll spend a few seconds to explain this, but I assumed you know it, right? So, what was my platform? Like I was saying, I was given real hardware, which was 72 servers in two racks. The servers were Nokia AirFrames: 48 CPUs, 256 GB of RAM. I installed OpenStack using TripleO. In TripleO you have this concept of the undercloud; I don't know if you're familiar with TripleO, but there is this management server, which is called the undercloud. I used, of course, three controllers, the standard HA configuration, and 68 computes, right? So I was left with 68 computes. I installed everything; well, there were a few obstacles along the way when I was installing it, but that's probably a subject for a completely different talk, right? I'm not going to cover it here.
And my workload, what I was running on this setup: after I installed it, I wanted to work it hard. I wanted to smash it, put it on the ground, make the controllers sweat as much as possible, right? That was my goal. I was using fairly small VMs, between 1 and 4 CPUs, with RAM between half a gigabyte and 2 GB, so not that much, using CirrOS, CentOS 7 or Fedora 24 images. And with telco, you can usually expect at least, well, four network interfaces, probably as a minimum. All the telco applications usually have multiple network interfaces, so this is why I was putting between four and eight network interfaces on every single VM, okay? IPv4 and IPv6 dual stack, which probably doesn't matter too much. And I was using Heat. In Heat, I created a few complex stacks. The idea was to make Heat work hard, so with Heat, every single resource was put into a separate nested stack. You had this nice tree of resources where every single, I don't know, disk or network was a separate resource, and that actually makes Heat work hard. The second type of workload I was running was creating VMs in bursts. I would just submit, I don't know, 100, 200, 300, 400 VMs at one time, okay? Nova boot has this nice parameter, dash dash something; I don't remember it by heart right now, but it basically creates all the VMs in parallel, okay? So I was really trying to put the controllers on the ground. That was my main goal, right? And at the end of the day, on these 68 servers, I created about 3,500 VMs. There were about 3,000 VMs running constantly, but I hit 3,500 VMs. I wasn't trying to set a record for the number of VMs on a single OpenStack installation; I was probably far away from that record, right? That wasn't my goal.
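The nested-stack pattern described above looks roughly like this in Heat template form. This is a minimal sketch under my own assumptions: the resource names, the count, and the child template filename are made up for illustration; only the mechanism (one nested stack per VM via a resource group) mirrors what the talk describes.

```yaml
# parent.yaml -- each VM (and, in the real test, each port and volume)
# lives in its own nested stack, which multiplies the work Heat must do
heat_template_version: 2016-04-08
resources:
  vm_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: 100
      resource_def:
        # Each group member is a nested stack built from this child
        # template (hypothetical file defining a server plus its ports)
        type: vm_with_ports.yaml
```

Creating the stack with `openstack stack create -t parent.yaml burst-stack` then produces a tree of nested stacks, one per VM, rather than one flat stack.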
But the 3,000 is because in telco applications you don't really have overcommitment. They don't overcommit CPU, and they don't overcommit RAM, okay? So that was the goal, right? I probably could have put more VMs on that environment, but yeah, that was a telco type of installation, right? So that's the 3,500 VMs at the end of the day. Right. So, first look at the controllers. Before I even started putting the workload on it, the VMs, the Heat stacks, I just installed it and started looking around, okay? So I logged in on one of the controllers and started looking: okay, how many processes are actually running on this controller? And I realized there were around 1,100 processes running on a single controller, okay? That's quite a lot, I would say, right? At least for me. The next thing which surprised me was the load. The load on a completely idle controller, okay? The controller was just doing some housekeeping, nothing else, nothing more, and the load was between 4 and 5 across the 1-, 5- and 15-minute averages. I don't know if anyone here has actually done similar experiments, like checking the load on idle controllers, all right? But it was also quite high. And, for example, the next thing which surprised me was the high number of context switches, okay? 40 to 50,000 context switches per second; that's kind of a lot. Even with these 48 CPUs, 40 to 50,000 context switches per second is a lot, okay? So I started with something simple, right? And what the graphs... there are two graphs, okay? There is controller 0 and controller 1. I took two controllers only because I wanted to squeeze this onto the screen, and I was afraid that if I put up three graphs, it would be hard to actually read. So I just used two controllers as an example. What the graph shows: on the horizontal axis you have time, a six-hour window.
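These first-look numbers are easy to reproduce on any Linux controller; here is a quick sketch of the kind of commands involved, using nothing OpenStack-specific, just procfs:

```shell
# How many processes exist on this controller?
ps -e --no-headers | wc -l

# Load averages over 1, 5 and 15 minutes (first three fields)
cat /proc/loadavg

# System-wide context-switch rate: sample the kernel's "ctxt" counter
# from /proc/stat twice, one second apart
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/s: $((c2 - c1))"
```

On the idle controllers in the talk, the last number would land in the 40,000-50,000 range before tuning.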
And over those six hours, I created all the VMs and all the stacks and so on. You can see the spikes where I was probably just putting some VMs on the OpenStack. And the thing is, when I first dropped this bomb, like 400 VMs, of course it failed the first time, okay? I had to do a little bit of tweaking, and at the end of the day I was able to create almost 500 VMs at one time, okay? But that's a different story. So anyway, you have the spikes, but in general you can see that the average load is about, I don't know, probably a little more than 10 all the time, okay? Which doesn't look right to me. The next graph shows, again for controller 0 and controller 1, the number of processes on both controllers, okay? You have about 1,060 to 1,070 processes at the same time. Actually, maybe "running" is not the right word; they exist on the controller, okay? "Running" is not the right word, because then I started looking at how many processes are actively running, because there is a difference. Usually a process is, I don't know, sleeping or doing nothing or waiting for something, and that's most of the processes. And this next graph shows how many of the processes on the OpenStack controller are doing some real work at any one time, okay? And what is interesting on this graph, again, is that the average number of actively running processes on the controller is between 10 and 15, yes? This graph is scaled from 0 to 48. To 48 because, like I said, my server has 48 CPUs, okay? That's the main reason why.
And if you draw a virtual line through the middle of this graph, at the level of 24 virtual CPUs, there's barely any dot above that line, which means the processors are actually running with about half of the power they have. And my feeling was that it had to be somehow related to hyper-threading, right? Because, yeah, I had hyper-threading enabled. So what I did, I said, let me do something crazy. In every single OpenStack guide... I don't know if any performance guide exists for OpenStack, probably not, not yet. But if you have any problems with performance or anything like that, you very often hear: just increase the number of workers, okay? So you have these parameters, Keystone workers, Nova workers and so on, right? But in my case I tried to do something else, because all this context switching, this load on the idle controllers, made me think that this is not the right way to go. And I know it sounds controversial, but this is what I did. I reduced the number of workers for all the core OpenStack services, Nova, Neutron, Cinder, Glance and so on, by half, all right? And after this, I got about 200 fewer processes on the controllers. So I reduced the number of workers from the initial 48. When you install OpenStack, with whatever you're using for installing it, if that installer uses Puppet, then Puppet by default usually sets the number of workers equal to the number of CPUs on the server, on the controllers, okay?
So I reduced this by half, and I got 200 fewer processes on the controller, plus the context switching dropped to values between 10 and 25,000, so almost by half. And also, when there was no load on the controller, the load again dropped by half, to a value of five or so, right? Which is probably what I'm looking for, okay? And there was no difference when it comes to the responsiveness of OpenStack. I tried to repeat the same test, I tried to drop the VM bomb, right? With 200 or 300 VMs, and everything was still running perfectly fine, okay? So it looks nice. And when you look at how exactly the OpenStack services work, let's take Nova as an example, right? Nova has this option, workers, in the configuration file. What this workers option means is that Nova will start and create X API processes, okay? Those are the processes. But inside every single process, Python is running greenlets, from eventlet, which is a kind of lightweight threading for Python, okay? And basically, you can have up to X concurrent API calls handled by every single Nova API process, okay? So in total, you can roughly estimate how many API calls your API service can take by multiplying the number of workers, by squaring the number of workers, actually, okay? This is the number of processes multiplied by the number of greenlets which will be created while Nova is working. I'll probably explain this later. There's one exception: Keystone. I'll explain later why Keystone has a slightly different architecture.
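As a sketch, the worker knobs in question look like this. The file paths and option names below are the standard ones for Nova and Neutron; the value 24 reflects cutting the 48-CPU Puppet default in half, as described above:

```ini
# /etc/nova/nova.conf
[DEFAULT]
osapi_compute_workers = 24
metadata_workers = 24

[conductor]
workers = 24

# /etc/neutron/neutron.conf
[DEFAULT]
api_workers = 24
rpc_workers = 24
```

Cinder, Glance and the other core services expose equivalent `*_workers` options in their own configuration files.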
So I was saying that hyper-threading might actually be somehow guilty of this weird behavior, the fact that I saw fewer than 24 actively running processes, okay? And there is also this kind of to-be-or-not-to-be question: enable hyper-threading or not? And if you want to enable it, do you enable it on the controller, or maybe on the compute, and what are the implications, and so on, okay? So I'll try to explain, and, well, I can probably prove to you, why you should enable hyper-threading on the controllers. I'm not speaking about the computes; computes are a slightly different story, okay? Especially when it comes to telco applications, again. So how does hyper-threading exactly work, okay? This is a picture which I basically ripped off from the Intel architecture software developer's manual, and it's pretty much how a CPU core looks. You have two logical processors, logical processor 0 and logical processor 1. Those are the hyper-threaded CPUs, and they have a single execution engine, right? So what's happening is that at every single point in time, only one of these logical processors is actively working, right? And this exactly explains the graph with the actively running processes. When I saw that the active process number was half the number of CPUs: with hyper-threading enabled, at any single point in time, at most half of the logical processors can be doing work. You cannot have all of them running at the same time, because they share the execution engine. So just to summarize: we have two logical processors and a single execution engine, and each logical CPU has its own set of registers, okay? And that helps a lot, all right?
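You can see which logical CPUs share one execution engine straight from sysfs; a quick sketch (the sibling pairing printed depends entirely on the machine, so the "0,24" in the comment is only an example):

```shell
# Which logical CPU shares a physical core with cpu0?
# On a 48-logical-CPU box this might print e.g. "0,24": cpu0 and cpu24
# are hyper-thread siblings sharing one execution engine.
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

# Total logical CPUs visible to the kernel
nproc
```

If every `thread_siblings_list` contains two entries, hyper-threading is enabled; one entry per file means it is off (or unsupported).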
So this is great. They share the same level 1 cache, okay? So if you think about the CPU: I was saying that initially we have 1,100 processes on the controller. If you do the simple math and divide roughly a thousand by 50, you end up with around 20 processes per CPU, okay? Per hyper-threaded CPU, all right? If you run these 20 processes per hyper-threaded CPU, there is a lot of L1 cache flushing, all right? Those processes are actually fighting over the space in the L1 cache, okay? And at some point you will have to repopulate the cache from the L2 cache or so, okay? And that takes a lot of time, I mean CPU time. But anyway, going back for a second to hyper-threading: the logical processors also have a common power-saving mechanism, okay? I will talk about power saving later on. Intel a while ago actually introduced hardware support for context switching, and that's great, that helps a lot. But still, because of the Intel architecture, context switching, process switching, putting one active process back into memory and taking another one, which was sleeping or waiting, and making it run on the CPU again, takes quite a lot of time, okay? It is a very expensive operation. This is exactly why we want to minimize the number of context switches, and hyper-threading helps a lot with this. So I did yet another test, okay? This is a graph I collected over 1,000 seconds or so, and there is no workload. This is again the controller. And you can see there are two colors. One is probably, well, it's supposed to be blue, but my wife said that it's not blue; it looks green to me, maybe.
So the green dots are the number of context switches when you have hyper-threading enabled, right? And you can see it's about, I don't know, 25,000 context switches. But if you disable hyper-threading, all right, it's, well, not quite 40,000, but it's a gigantic difference in the number of context switches. And you don't really want your controller spending its time just shifting processes between the CPUs, right? You want to actually save that time; you want the processes to do some real work, not just housekeeping, shuffling processes between the CPUs, right? So this is exactly why you want hyper-threading enabled on the controllers, all right? Plus, most of the OpenStack services, well, probably all of them, are written in Python, and like all web applications they have this interesting characteristic: when an API request comes in, the worker takes the request, gets some data, does some work, then sends it back to the user, and then it waits. It waits milliseconds, 10 milliseconds, 20 milliseconds. That's a short time for us, but for the CPU this is a gigantically long window before the API worker gets something back from the user, okay? And for all this time, the process is sleeping, doing nothing, okay? And this is why, for example, we have this small number of actively running processes. But anyway, for the controllers, again, hyper-threading should be on; the computes are a different story. Keystone: so I said that Keystone is the exception, okay?
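If you want to attribute those context switches to particular processes, the kernel already keeps per-process counters in procfs; a minimal sketch:

```shell
# Rank processes by total (voluntary + involuntary) context switches,
# read from the two *_ctxt_switches lines in /proc/<pid>/status.
for s in /proc/[0-9]*/status; do
  awk '/^Name:/        {name = $2}
       /ctxt_switches/ {sum += $2}
       END             {if (sum) printf "%12d %s\n", sum, name}' "$s" 2>/dev/null
done | sort -rn | head
```

Sleepy API workers show up here with huge voluntary counts: they switch out every time they go back to waiting on the network.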
So, probably most of you are familiar with this kind of worker configuration. You have the keystone.conf file with the default section, and in the default section you have public_workers and admin_workers. Keystone is special because it has this distinction between public and admin workers; usually you just have workers, and that's all. So again, what it does: it will create 48 public workers and 48 admin workers, and every single worker, okay, will be able to handle the same number, in my case 48, of concurrent TCP connections or API sessions from end users, okay? So multiply this: 48 times 48 is about two and a half thousand, okay? A little less than that, but around two and a half thousand. So unless you are Google or Amazon, two and a half thousand concurrent connections to Keystone... I don't think we need it, maybe. But there's a key point: this is the old architecture, the old Keystone architecture, and Keystone is switching; Keystone is probably the first service which is fully switching from the old architecture to the new architecture. So why is Keystone actually doing this, okay? Writing a multi-process, multi-threaded application is tough. It's difficult. The Python global interpreter lock; you know what the Python GIL is? Good. So the Python GIL doesn't help, right? The reason we have all these Python concurrency frameworks, like eventlet, greenlet, and so on, you name it, is that Python has the GIL, this global interpreter lock, which basically lets you run only one active thread at a time. Real threading doesn't work in Python, okay, in a Python application. You have to use this kind of artificial way of running, or basically faking, multiple threads at the same time, okay?
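The old-style configuration being described is a plain ini file, roughly like this (the values mirror the 48-CPU defaults from the talk; exact option placement varied a little between Keystone releases, so treat this as a sketch):

```ini
# /etc/keystone/keystone.conf -- old eventlet-based deployment
[DEFAULT]
# One process per logical CPU was the common Puppet default
public_workers = 48
admin_workers = 48
```

With each worker also multiplexing up to 48 greenlet-handled connections, that gives 48 x 48 = 2,304 theoretical concurrent API sessions, the "a little less than two and a half thousand" figure above.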
The problem with these Python concurrency frameworks, like I said, greenlet, eventlet and so on, is that they are designed for web-style applications, okay? They are good for the web application pattern: you get a request from the end user, you handle the request quickly, you send something back, and then you wait a few milliseconds, 100 milliseconds, okay? And then the thread is doing nothing, okay? But this is not always the case, because besides UUID tokens we have PKI tokens and Fernet tokens, and those tokens are expensive in terms of computation, okay? Those are tokens which have to be encrypted and signed, so even the verification of a token is quite expensive; with the cryptography, you have to spend quite a lot of cycles to actually calculate all that stuff, all right? So the Keystone workload is not really a light workload like the one served by some web server on some portal somewhere, okay? It's slightly different. And the truth is that long ago web servers, like Apache or Nginx or whatever else, already solved this problem of multi-processing, multi-threading and concurrency, okay? So why not use them? And right now, with the new architecture, if you take any OpenStack installer and install with the defaults, your Keystone will use this new architecture, okay? It's slightly different: you have the user, which sends the API request to the Apache server, okay? And then the Apache server manages all the dirty work with the pooling and so on, handling the basic stuff, and then talks to Keystone using something called WSGI, which stands for the Web Server Gateway Interface.
So this is the standard Python interface for web applications, and Apache talks over WSGI to the Keystone backend processes, all right? And the backend does the work and sends the response back, and Apache is basically just a proxy, something like that, okay? So we have this great new architecture. Great. So what could possibly go wrong? There are a few things, and I actually spent quite a lot of time discovering what goes wrong, because when you have a big installation, like mine with 68 servers deployed with TripleO, you will end up with probably a thousand nested stacks, okay? And there are a lot of services working in the background, Cinder, Nova, and so on. And all the services have to talk to each other, and all of them use Keystone as the common service, the common authenticator, right? So all of them have to talk to Keystone. So Keystone is actually the most important, well, most important, probably the most fragile service. You have to make sure you have enough workers to handle this, and in the past it was pretty simple, okay, with the old configuration. But, for example, when I started installing the setup with TripleO, the default configuration of TripleO, I don't know if it's still there, was something like four processes and four threads. So in total I had 16 concurrent API calls which could be handled by Keystone. That's not much when you have over a thousand different nested stacks in TripleO, okay? So I started seeing the problem that the Keystone backend processes start spiking the CPU to 100%. Things were timing out in random places, and, well, it took me a lot of time to actually understand what was going on.
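In mod_wsgi terms, those knobs live in the Apache vhost rather than in keystone.conf. A minimal sketch; the file path and script location are illustrative (they vary by distribution and installer), and the 4 x 4 numbers are the TripleO-era default discussed here:

```apache
# e.g. /etc/httpd/conf.d/10-keystone_wsgi_main.conf (illustrative path)
<VirtualHost *:5000>
  # processes x threads bounds the concurrent Keystone API calls
  WSGIDaemonProcess keystone_main processes=4 threads=4 \
      user=keystone group=keystone display-name=%{GROUP}
  WSGIProcessGroup keystone_main
  WSGIScriptAlias / /var/www/cgi-bin/keystone/main
  WSGIApplicationGroup %{GLOBAL}
</VirtualHost>
```

Raising `processes` (and, as argued on the next slide, keeping `threads` low) is how you scale this new architecture.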
So, like I said on my first slide about Keystone, that was the old configuration, and this is your new configuration, so get used to it, okay? This is the new place where you configure Keystone: the number of processes and the number of threads. You can see there's a processes line, like here, I'm trying to highlight it, yeah, this is the processes line and this is the threads. And like I said, because the token, sorry, Keystone, is a computationally expensive process, I would prefer to put more processes here and just leave threads at one. Because, like I said, Keystone is computationally expensive and you want to give it power; you want to actually get that token signed as soon as possible, okay? Right. And more OpenStack services are going in this direction, to this architecture. Ceilometer is going in this direction; probably most of them will sooner or later be running under Apache WSGI. Right. So this is a never-ending story: the C- and P-states of the CPU. What are the C- and P-states? The C-states basically allow you to save some power, okay? They allow you to put the CPU into some kind of low-power mode, and it's done in a few different places. Part of the CPU can be idle; for some physical parts of the CPU, the clock might be completely switched off, or the power might be completely switched off. You can switch off, for example, the power of the level 1 cache or level 2 cache and so on, and no instructions are executed, okay? While the P-states are completely different, because the P-states allow you to increase or decrease the power of the CPU on demand: you can change the voltage and dynamically change the frequency of the CPU. Even on your laptop, if you check, there's a command called cpupower.
And this command lets you check the status and the number of C-states, okay? So in my case, on my server, I had five C-states. You can see this here. This is the number of C-states, okay? And this is what's going on: every single C-state except C0 is some kind of... so we have C0, C1, C2 and so on, okay? C0 is the normal operational state of the CPU, okay? You want to have your CPU in C-state 0. And every next C-state puts the CPU into some kind of low-power or reduced-power mode, okay? That's great, but the problem with this is that you get some latency. The latency comes into the game when you have to exit the state, all right? So on my server, which had been running for five days or so, the deepest sleep state is C6, okay? And you can see the latency here, which is quite long. This value is actually in microseconds, okay? So, going back for a second: the latency of C0, which is the normal full-power mode of the CPU, there is no latency. But the latency of C1 is two microseconds. Then you have the latency of the next one, whatever it is, at 10 microseconds, okay? So it's growing. Then you have the next state, and the next, all right? And again, I started looking around and I found this... I don't know, sorry. One step back, sorry. So all the C-states, you can find all their parameters in the sys filesystem, all right? If you execute, for example, a command like this... I don't see the command, okay, sorry. And, oops, I broke something, all right. Okay, I'll probably do the demo when I finish, okay? I don't know how much time I have left. Anyway, all the information about C-states you can find in the sys filesystem, okay? And the most interesting part for us is this latency.
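The sysfs layout being demoed can be walked with a couple of lines of shell. A sketch; on machines or VMs that don't expose cpuidle, the loop simply prints nothing:

```shell
# One directory per C-state under each CPU; print the name, worst-case
# exit latency, entry count and total residence time for cpu0's states.
for d in /sys/devices/system/cpu/cpu0/cpuidle/state*; do
  [ -d "$d" ] || continue   # skip quietly if cpuidle is not exposed
  printf '%-8s latency=%sus usage=%s time=%sus\n' \
    "$(cat "$d/name")" "$(cat "$d/latency")" \
    "$(cat "$d/usage")" "$(cat "$d/time")"
done
```

The `latency` file is the exit latency discussed above; `usage` and `time` are the entry count and cumulative time that come up again on the next slide.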
We want this latency to be as small as possible, or no latency at all, okay? That's probably the optimal scenario. So this is a table which I again copied from the Intel documentation, and they estimate that exiting from C-state C1 takes about one microsecond, then we have 156 microseconds for a deeper one, and so on. Those values are not exactly the same as mine; they are probably for a different type of CPU, and maybe the document was older, because I was seeing slightly shorter values on my server. But anyway, quite a lot of time goes there. If you look, there is a usage line: this is how many times a particular C-state was entered, and this is how much time was spent in that C-state. That's a lot of time, and you don't want this time to be wasted, especially on the controller, okay? So what can you do about this? There are a few things. The first, the most intrusive, is probably to go into the BIOS and change a few settings. There's usually some power-management section; you want to create some custom profile or something like that and set everything to high performance, whatever you have, all right? Max speed. The next option is to use those two kernel parameters, okay? Those kernel parameters don't disable the C-states in the hardware, but they hide them from the kernel, okay? And the last option, probably the least intrusive: if you look at this path in the sys filesystem, there is a disable file. If you write a one to this file, you will disable that particular C-state, but you have to do this for all CPUs, which is why you have a star here, for all CPUs, and you want to do it for all states. And in this way, you can put your CPU into the highest possible power mode, the fastest possible scenario. This is what we want.
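As a sketch of the last two options: the two boot parameters usually meant here are, I assume, `intel_idle.max_cstate` and `processor.max_cstate` (check your distribution's kernel-parameters documentation before relying on them), and the sysfs writes need root and do not survive a reboot:

```shell
# Option 2: hide deep C-states from the kernel at boot time --
# append to the kernel command line in your bootloader config:
#   intel_idle.max_cstate=0 processor.max_cstate=0

# Option 3: disable every C-state except C0 at runtime, for all CPUs
# (the star covers all CPUs, state* covers all states)
for f in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do
  echo 1 > "$f"
done
```

After the loop, the `usage` and `time` counters for the disabled states should stop growing.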
And believe me, you can spend a lot of time, waste a lot of time actually, waste is the better word, troubleshooting the kind of performance problems related to this, because it's difficult to discover that this is the problem. blktrace, all right, this is the cool stuff. So, what is blktrace? Has anyone heard about blktrace? One, good. So, blktrace is something like iostat on steroids, okay? It's a kind of strace for the I/O subsystem. It talks directly to the kernel, and it allows you to trace pretty much every single I/O request to every single block device you have, which is pretty cool. blktrace, how to use it: you can copy and paste this and it will work for you. So, what's going on here? I just created a directory in the first line. In the second line, well, think about this: blktrace writes quite a lot of data, but you don't want to write that data to the drive which you're actually trying to trace, okay? It doesn't make sense, right? Because then you would obfuscate the results, okay? And you don't want to do that. So this is why you can use tmpfs or, for example, some kind of ramfs to store the data, okay? So in my case, I just created a four-gig, well, mount point, okay? Then the last command executes blktrace and it will run. It will collect the data and write it to a file, and you just press Ctrl-C and it will stop, okay? And then you have to analyze it. There's a little bit of documentation. There are special tools called blkparse and btt; those are the tools which allow you to analyze the results, because blktrace writes them in a binary format, so you have to somehow make it readable. So, this is how the output from blktrace looks, okay? There's a lot of stuff.
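The sequence being described is roughly the following. A sketch only: the device name, mount point and tmpfs size are illustrative, and blktrace needs root:

```shell
# Trace I/O on /dev/sda without polluting the traced disk:
# keep the trace data on a 4 GB tmpfs instead.
mkdir -p /mnt/blktrace
mount -t tmpfs -o size=4g tmpfs /mnt/blktrace
cd /mnt/blktrace

# Collect until Ctrl-C; writes one binary file per CPU (sda.blktrace.N)
blktrace -d /dev/sda

# Turn the binary dump into readable text (sda.txt) and into a merged
# binary stream (sda.bin) that btt can summarize
blkparse -i sda -o sda.txt -d sda.bin
btt -i sda.bin
```

The text file is what the per-process analysis later in the talk is built from; btt gives you latency and queue-depth summaries for free.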
I don't remember exactly what every column means, but you have the major and minor numbers of the block device. The interesting parts are the columns with the capital letters, like those two. And the next number is the number of the block on the disk — say that was the sda drive, so this is the block number there. The WS means write and sync; you also have WSM, which means write, sync and metadata. I'll have more description of the acronyms on the next slide. And you have the size of the request — this is in blocks, if I'm not mistaken. And then you have the process: mysqld, some systemd process, and so on. So there are two important columns: one is the action, and the more interesting one is the RWBS category. In the category you will find read, which is a capital R, write, which is a capital W, flush, and so on. Based on these letters you can decode what's really going on — what kind of requests your process was generating. And again, a few results. This graph shows a bit of analysis of the data — four gigs of data, still big data, for me at least. This is the sum of the IOPS generated per process: on the horizontal axis there is a whole bunch of different processes, and on the vertical axis you have the number of IOPS generated by each process. And this is a logarithmic scale — that's important.
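To make the letter decoding concrete, here is a toy example. The sample lines below mimic blkparse's default output format (the values are made up), and the awk one-liner splits queued requests into reads and writes by looking at the RWBS column:

```shell
# Three fake blkparse lines:
# dev  cpu  seq  timestamp  pid  action  RWBS  sector + blocks  [process]
cat <<'EOF' > /tmp/trace.txt
  8,0    1        1     0.000000000  1234  Q  WS 860928 + 8 [mysqld]
  8,0    1        2     0.000120000  1234  Q  WS 860936 + 8 [mysqld]
  8,0    0        3     0.000250000  5678  Q   R 120400 + 8 [mongod]
EOF
# Count queued (action Q) requests, split by the R/W flag in the RWBS field.
awk '$6 == "Q" { if ($7 ~ /W/) w++; else if ($7 ~ /R/) r++ }
     END { printf "reads=%d writes=%d\n", r, w }' /tmp/trace.txt
```

The same pattern extends to the other RWBS letters (F for flush, D for discard, and so on) by adding more branches.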
So, the thing is, there is this one spike which is probably not interesting for us, because it's kworker. kworker is a kernel-space process which does the actual I/O to the disk — it writes data to, or reads data from, the drive on behalf of the user-space processes. This is why it is the main process talking to your physical drive. But then, next, you have MySQL and you have Mongo — MySQL and Mongo are the winners. There is also swapper, which I don't know how to explain, because I haven't seen any swapping on my system, but swapper is still there, so that's some kind of mystery to me. Then you have the Neutron server, you have the Cinder API, and so on. But bear in mind that this is a logarithmic scale. So the biggest IOPS consumers are MongoDB and MySQL, and the difference between MariaDB and MongoDB on one side and the next IOPS generators on the other is really gigantic — at least a factor of ten. A little more breakdown: this is the number of writes. It doesn't look that interesting, but again the scale is interesting, because here the scale tops out at 10^5, while on the previous slide it was 10^6 — again a factor of ten. This is the sum of the write requests, and again the winner is MySQL/MariaDB. And those stats were collected over six hours — these are the stats I collected while I was running my tests with my workload. I'm not sure this is actually visible for you, maybe I'll try to make it bigger. Oops, no, I can't — okay, anyway.
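The per-process totals behind a graph like this can be produced from the same decoded trace. A sketch, again on a couple of fake blkparse-style lines, grouping by the `[process]` name in the last column:

```shell
# Fake blkparse lines; real input would be the output of `blkparse -i sda`.
cat <<'EOF' > /tmp/trace2.txt
  8,0    1        1     0.000000000  1234  Q  WS 860928 + 8 [mysqld]
  8,0    1        2     0.000120000  1234  Q  WS 860936 + 8 [mysqld]
  8,0    0        3     0.000250000  5678  Q   R 120400 + 8 [mongod]
  8,0    2        4     0.000300000    42  Q   W 990000 + 8 [kworker/2:1]
EOF
# Sum queued requests per issuing process, biggest I/O generators first.
awk '$6 == "Q" { n[$NF]++ } END { for (p in n) print n[p], p }' /tmp/trace2.txt | sort -rn
```

Dividing each count by the capture duration turns these totals into per-process IOPS.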
So, on this graph, the last one is the writes. If you look closer at this graph, you can see that the majority — around 90% — of the I/O requests are writes. There are actually very few reads generated by all these processes. So this is quite an interesting workload, which is mostly writes. I expected that there would be a lot of writes, but I didn't expect there would be so many of them. And the winners are MariaDB and MongoDB. So, what can we do about this? For MariaDB there are probably quite a lot of different things you can do. What I did, and what actually helped me a lot: with three controllers we run, of course, a Galera cluster, and with Galera you don't need binary logs. You can disable binary logs if you run Galera — they are basically useless there. Well, maybe not completely useless, but you don't really need them. The next thing is the innodb_flush_log_at_trx_commit value, which is probably a little controversial — controversial because if you run a single database instance with this setting, it will make your database faster, but you can lose data. But if you're running a Galera cluster, even if one of the cluster members goes down and loses some writes, when it comes back it will talk to the rest of the Galera cluster members, realize "okay, I'm probably missing some commits", and just pull the missing data from the remaining, actively running cluster members — so there is no real risk of losing data. So for a Galera cluster it actually makes sense to use this setting. I'm not a MongoDB expert, so I'm not going to give you advice about MongoDB, and that's probably it. So, key takeaways.
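For reference, the two MariaDB changes above would look roughly like this in my.cnf — a sketch: the option names are standard MariaDB ones, but the chosen value of `innodb_flush_log_at_trx_commit` (0 and 2 both relax per-commit flushing, with slightly different semantics) should be checked against your durability requirements and MariaDB version:

```ini
[mysqld]
# Galera (wsrep) replication makes the binary log redundant for HA,
# so skip it entirely and save the write I/O.
skip-log-bin

# Flush/sync the InnoDB redo log roughly once per second instead of on
# every commit. A crashing node can lose its last second of local
# transactions, but it recovers them from the other Galera members on rejoin.
innodb_flush_log_at_trx_commit = 2
```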
You probably know now how important context switching and load are, how hyper-threading works and what exactly its impact on the controllers is. The API workers — you really don't need so many API workers. The C- and P-states — don't forget to disable the C-states on the controllers. blktrace, which is pretty cool stuff, but it takes quite a lot of time to actually get useful conclusions out of the data. And the interesting controller workload characteristic — I didn't expect to see that many writes on the controller; that was surprising for me. Questions? No questions. Right. Oh, there's one question: "I guess my only question was, was your only workload spinning up and stopping the machines — did you actually have the machines do anything?" No, I didn't put any workload inside the VMs. But that wasn't really the point, because even if I put some workload into the VMs, it wouldn't affect the controllers in any way. I was creating VMs, I was deleting VMs, I was firing stacks concurrently and deleting them at the same time — I really tried to bring the controllers down. I was thinking about running some kind of workload inside the VMs, but for the controllers it probably doesn't really matter. All right, so thank you everybody.