So, today I'm going to present some experiments I did with huge pages in MySQL and Postgres, mostly. The presentation starts with what took me to test this, then a brief review of how memory works, a very simplified one just for context, then how an application works with huge pages and how we set up large pages in practice, all of this in the context of Linux, okay, and x86 mostly. I haven't really looked into anything else but that. And then I did some testing, and I'm going to discuss the results I got with you.

My main motivation was, well, working with TokuDB and MongoDB and having transparent huge pages always in the way. Has anyone worked with TokuDB before? Yeah. Well, okay. Nice. So you might remember that if you try to start MySQL with the TokuDB storage engine and you have transparent huge pages enabled on the server, it won't start. It refuses to start, and it tells you to disable them. Then I started working a little bit with MongoDB, and the manual says you can actually start MongoDB with transparent huge pages, but it tells you not to. It says that database workloads often perform poorly with transparent huge pages, because they tend to have sparse rather than contiguous memory access patterns.

So let's think about the two databases I'm going to cover today, MySQL and Postgres. The main memory consumer in a database is the database cache, right? We aim to have all the data and indexes, or at least most of them, in memory so we get faster access. With MySQL, considering the main storage engine, which is InnoDB, we have the manual kind of suggesting we dedicate as much as 80% of the memory to this database cache, to the InnoDB buffer pool. This is a rule of thumb, of course, but the idea is that the more memory in the server we can dedicate to the database cache, the better, because it will need to go to disk less often in search of pages. And if you happen to fit your whole database in the buffer pool, then in practice you are working with an in-memory database. 80% is the rule of thumb, but it really depends on how much memory you have in the server; if you have a lot of memory, you might use even more than that.

With Postgres, it works a little differently. What we can call the database cache is the shared buffers, and the advice there is that we shouldn't use most of the memory for the shared buffers. That's because Postgres is a little different: every time you look for something that is not in its cache, not in the shared buffers, it is going to read it from disk, of course, but it loads it into the OS cache first and then copies it into the shared buffers. So there is, at least for a while, double caching taking place: the same page is in the OS cache and also in the shared buffers. It's not better or worse than what MySQL does, it's just different, right? And for this reason, you don't really want to dedicate most of the memory to the shared buffers and just forget the OS cache. So the advice is the opposite: you dedicate a smaller share of the memory, 20%, 30%, 40%, to the shared buffers. In practice, with Postgres as well, it really depends on whether you can fit your whole database in memory.
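(As a rough illustration of those two rules of thumb, not a recommendation, on a 256 GB server the settings might look something like this; the exact values here are hypothetical:)

    # my.cnf -- MySQL/InnoDB rule of thumb: most of the RAM goes to the buffer pool
    [mysqld]
    innodb_buffer_pool_size = 200G     # ~80% of a 256 GB server

    # postgresql.conf -- Postgres rule of thumb: a smaller share, leave the rest to the OS cache
    shared_buffers = 64G               # ~25% of RAM
    effective_cache_size = 192G        # planner hint for what the OS cache can hold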
Because if you can fit the whole database in memory, then sizing the shared buffers small wouldn't make sense. In fact, you would do the opposite, like you do with MySQL, because then you won't need to look for data on disk and traverse the OS cache to get it into the shared buffers. Otherwise, you look for a balance between the shared buffers and the OS cache. I just wanted to cover the database cache because it's part of what we are going to discuss.

Now, how memory works, but keep in mind this is a very simplified version of the process. All applications work with virtual memory; nowadays it's always like that. So a process that asks for memory has the impression it is working with a certain amount of contiguous memory, even if in reality the blocks of memory allocated to this application come from different parts of physical memory. So there must be some kind of mapping between the virtual memory the application has access to and the physical memory in the server, and this mapping requires some kind of translation: when the application accesses a block of its virtual memory, we need to find out where it lives in physical memory. The structure used for this mapping is called the page table; there is one per process, and it is stored in memory as well. Every time I access a page in the application, it is translated to a page in physical memory, and this is done by the processor, at the memory management unit level.

Since this is kind of expensive, and keep in mind the context of expensive here, every memory access needs a virtual-to-physical translation, there is a way to optimize the process, which is a cache at the core level of the CPU. Again, a simplified version: there are several levels of cache in the CPU, we are just thinking globally here. This cache is known as the translation lookaside buffer, the TLB. What happens is, every time there is a memory access request, that cache is looked at first. If the entry we are looking for, the mapping of the page, is there, it just returns the result and we call it a TLB hit. If it is not, then it needs to walk the page table of that process to find the mapping, and we call that a TLB miss. The real difference between the two is that when we find the entry in the cache, it costs one memory access; when we don't, it costs two memory accesses.

Now, the TLB is quite small; we can't have all the entries cached in there, like most caches. So how can we improve the efficiency of this process? How can we decrease the number of misses we get? Just increasing the TLB size at the hardware level is expensive and limited; you can't do it for all the memory we work with nowadays. The alternative is to increase the page size. Think about this: the regular page size in Linux is four kilobytes, but most modern processors also work with alternative sizes, which are called large pages.
In Linux we will mostly find two megabytes, sometimes four megabytes, and one gigabyte pages, and in some cases we can even find more exotic ones, up to half a terabyte per page. If you consider a test server with 256 gigabytes of RAM, which is the one I used for my tests, with four-kilobyte pages we have 67 million pages to account for: we would need to map 67 million pages between virtual memory and physical addresses. With two-megabyte pages we get a better situation; there are only about 131,000 pages. And with one-gigabyte pages there are only 256. Of course, it's not like having only 256 one-gigabyte pages means I can fit them all in the TLB cache. This really depends on the architecture, the processor, and even the processor generation, which has capacity for different numbers of entries in the TLB depending on the page size. But those two sizes are what we call large pages, or huge pages; the name is used both ways. It started with some people calling a certain size large pages while other people called them huge pages. In MySQL they are called large pages; in Postgres they are called huge pages.

Now, working with larger pages with MySQL and Postgres. Again, why would we do that? The idea is that with larger pages we should improve performance, because the number of TLB misses will be smaller: there are fewer pages, so we can fit the mappings for more memory in the TLB cache. That is the premise of using huge pages, with MySQL and Postgres but in general as well.

How do we do that? There are two ways. One of them is that the application has explicit support for working with static huge pages. MySQL has it, Postgres has it, the Java virtual machine has it as well. With MySQL, this is mostly limited to the buffer pool. Remember, MySQL uses memory for other things too, for temporary tables, for handling connections, and so on, but it's really the buffer pool that takes most of the memory, so it has support for huge pages at the buffer pool level. You need MySQL compiled with huge pages support, which most distributions nowadays have. What happens is, if it sees there is support for that and you have enabled it in your MySQL configuration, as we'll see later, then MySQL tries to allocate as much memory as you have asked for during the buffer pool initialization. With Postgres, the use of huge pages is done pretty much the same way, except that while the main beneficiary is also the shared buffers, the Postgres cache mechanism, it could use them for other things as well. But let's keep the shared buffers in mind; I will explain why in a few moments.

The alternative way to use huge pages in Linux is what I'm kindly calling here "blindly". The application has no knowledge of what a huge page is, but the operating system does, so it will transparently allocate huge pages to applications when it sees fit. That's what we call transparent huge pages. There is a kernel process that works in the background and will try to find enough memory to create a huge page. Remember, it needs contiguous blocks of physical memory for a huge page to be used. It will try to convert such a block of physical memory into a huge page, and then, if it sees fit, it will allocate this huge page to an application that is asking for money... for memory, almost the same thing.
Of course, there must be some algorithm behind that, because you don't want to allocate a 2-megabyte large page to an application that is requesting just a few kilobytes; you would be wasting the rest of the memory. Now, I tried to create a diagram here to see if it helps understand the process. All those red blocks are the regular 4K pages, and the green ones are what we call large pages. It doesn't matter here whether it's a 2-megabyte or a 1-gigabyte huge page; it's not drawn to scale, it's just for context. So you see here, we don't have any contiguous block of free memory that we could use to create another huge page. What the kernel does in the background is try to move blocks around: it looks for slots where it can relocate them, and it does that to free space for a new huge page when it sees fit. There's an algorithm behind that. So it created a new huge page there. The problem is that this process is a little bit expensive when it comes to creating a huge page from a block of used regular pages, because it needs to block access to those 4K pages during the process. So on production systems we would sometimes see stalls, and sometimes, not always, they could be related to this background kernel process that is moving pages around. That is one of the biggest motivations for people to say, well, just disable it, with MySQL and MongoDB. Okay so far? Yeah, good.

So how do huge pages work in practice? Of course, the processor, the architecture, needs to support them. Looking at a Xeon, for example, the one I worked with, it is only compatible with 2-megabyte large pages and 1-gigabyte large pages. The way to know that is through the flags implemented by the architecture: the pse flag denotes 2-megabyte pages and the pdpe1gb flag denotes 1-gigabyte pages. So this is how you know if your processor has support for huge pages. One very interesting thing, and it took me some time to realize, is that you cannot have both of those sizes as the default at the same time; you either have one or you have the other. One way to find out is looking at /proc/meminfo: it will tell you. Here it shows that the current huge page size used by the server is two megabytes. The DirectMap entries here are a little bit confusing, for me at least; I couldn't really figure out exactly where that metric comes from. It is related to TLB use, but it's not stating that I could fit all this memory into the TLB cache. So this is something for later.

Now, how do I change the huge page size? If my server says it is working with 2-megabyte large pages, how do I switch to 1 gigabyte, if the server has support for that? You can only do that at boot time. There is an option to state the default huge page size, and you need to pass it on the Linux boot command line to tell it to start with a different huge page size. So you really need to restart the server. That is only to switch between two megabytes and one gigabyte, in this example. Now that the server has support for huge pages and I have my default huge page size set correctly, the next step is to set up a pool of huge pages, because that needs to be done beforehand.
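(For reference, the checks and the boot option described here look roughly like this, assuming a GRUB-based Ubuntu system; the grep patterns are just one way to do it:)

    # does the CPU support 2 MB and 1 GB pages?
    grep -wo -e pse -e pdpe1gb /proc/cpuinfo | sort -u

    # what is the current default huge page size, and the DirectMap breakdown?
    grep -e Hugepagesize -e DirectMap /proc/meminfo

    # switch the default to 1 GB pages: add these kernel parameters to
    # GRUB_CMDLINE_LINUX in /etc/default/grub, run update-grub, and reboot
    #   default_hugepagesz=1G hugepagesz=1G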
To set up the pool, there is a sysctl setting you can use for that, where you state the number of huge pages you want to allocate to the huge pages pool. When you do it, it pre-allocates that memory on the spot. Here you can see I have allocated 10 pages of one-gigabyte size, which is my default huge page size, and on the spot it took my 10 gigabytes and reserved that memory; it's unavailable for anything else. That is a static huge pages pool. If you have a NUMA system, you could also create this pool of huge pages on just one of the nodes, because if you do it like in this example, it will just balance the huge pages across all the NUMA nodes the system has, like it did right here. The way to say, hey, I want my pool of huge pages allocated on node one or node zero, is a little bit different, but the mechanism is the same: you tell it, look, I'm assigning this number of pages of the one-gigabyte size to node zero, and then it will do just that on node zero.

Now, this online huge page allocation, the way we are doing it here, works, but it might not, and it won't tell you; you will need to check it yourself. For instance, here I tried to allocate 256 pages, again of one-gigabyte size, and the command just returned, but in practice it created a pool of only 246 pages, because it couldn't find enough contiguous blocks to create the remaining pages; they were already being used and it won't just move them out. So the safest method, particularly with 1-gigabyte huge pages, for which it is more difficult to find the contiguous blocks, is again at boot time. You can specify at boot time, hey, I'm working with 1-gigabyte large pages and I want you to create a huge page pool of this size during boot.

Now, disabling transparent huge pages. You have seen in the previous screenshot that there were a couple of megabytes showing up as AnonHugePages, and those anonymous huge pages are the telltale of THP, so we know that THP is working on this server. You could also look and find the kernel process that is working in the background. To disable it, you could do it at runtime, like the TokuDB warning in the error log shows you how: you just set transparent_hugepage to never and that disables it. But you could also do it at boot time, adding another option, transparent_hugepage=never, and it won't boot with transparent huge pages. The difference between the two is that once it boots with transparent huge pages enabled, it takes those couple of megabytes and just won't free them for you; at least I wasn't able to make it. So if you are certain you don't want to work with transparent huge pages, you can just have that in GRUB.

Now, configuring the database. Both MySQL and Postgres normally run as a regular user in Linux, and access to huge pages is kind of reserved to users that are members of the group set in hugetlb_shm_group. So the practical way to do it is to have the MySQL user or the Postgres user as a member of a given group and set that group here, right?
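(A rough sketch of the commands described in this part, assuming a 1 GB default huge page size; the group id is a hypothetical example:)

    # pre-allocate a pool of 10 huge pages of the default size, then verify
    sysctl -w vm.nr_hugepages=10
    grep HugePages_ /proc/meminfo

    # or pin the pool to a specific NUMA node and page size
    echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

    # the most reliable way for 1 GB pages is at boot time, e.g. hugepagesz=1G hugepages=100

    # disable transparent huge pages at runtime...
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    # ...or permanently, with transparent_hugepage=never on the kernel command line

    # only members of this group may use huge pages for shared memory
    sysctl -w vm.hugetlb_shm_group=1001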
Complementary to that, the user needs to be allowed to lock the amount of memory we are trying to allocate for our buffer pool or shared buffers. One way of doing that is stating, in the service initialization, that the user has unlimited access to locking memory. In fact, with MySQL at least, if you do that you don't actually need to add the MySQL user to that group, but that is how it is supposed to be done. And then you just reload the service files. To actually enable this in MySQL and Postgres, you need to use the option large_pages in MySQL and huge_pages in Postgres. If you don't, it won't go and look for memory in the huge pages pool to allocate to the buffer pool or shared buffers; this option is the sign that it should start using memory from the huge pages pool. I'm going to get back to that during my testing.

For this testing, I want to remind you, what I really had in mind when I set out on this quest was just to see how having huge pages enabled would affect the performance of MySQL and Postgres. The way I planned to do that was to run a few benchmarks and see if I would notice any difference in performance. I wasn't really into looking at the TLB improvements per se, not at first at least. So the plan was: run MySQL and Postgres with some benchmarks, mostly sysbench-tpcc and sysbench OLTP. These are all artificial workloads; we are not looking at something that really resembles production, it's just a way of testing things. The tpcc one, which is not an official TPC-C benchmark, is a mix of selects, inserts, and updates in the same transactions, and you get a bunch of transactions like that, so it's really read-write. Then I tried the sysbench OLTP workload, but just the read-only version. And I ended up testing pgbench with Postgres as well, in read-only mode.

I was considering two situations. The first was when I could fit my entire dataset into the database cache; the second was when I couldn't fit the whole dataset into the cache, so in that mode there would eventually be disk access to fetch pages. I would run each test three times with each page size, and I would also run each test with different numbers of concurrent clients, each test with a duration of one hour. Why these numbers of concurrent clients? Because they are multiples of the available threads on my testing server, which was a Xeon with two sockets, each socket with 14 cores and each core with 2 threads, so in total 56 threads available on the server. Again, 256 gigabytes of RAM, SSD disks, and I tested on Ubuntu. For MySQL, I used Percona Server, which is a drop-in replacement for MySQL, and Postgres 10.6. I compiled the latest sysbench, and I used the pgbench that comes with Postgres 10.6.

The database configuration, now. What I did here was try to optimize the database in both cases for the sysbench-tpcc benchmark; remember, that's a read-write benchmark. I haven't really tuned it for the read-only benchmarks I ran afterwards. What I would vary in each test were those two things: the size of the cache, and whether I was using large pages or not.
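(Putting the configuration pieces together, a minimal sketch, assuming systemd and the stock unit names; the MySQL and Postgres drop-ins are analogous:)

    # 1) let the service lock memory: add a systemd drop-in (systemctl edit mysql
    #    or systemctl edit postgresql), then run systemctl daemon-reload
    [Service]
    LimitMEMLOCK=infinity

    # 2) my.cnf: back the InnoDB buffer pool with huge pages
    [mysqld]
    large_pages = ON

    # 3) postgresql.conf: "on" refuses to start without huge pages, "try" falls back to 4K pages
    huge_pages = on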
With Postgres, you set huge_pages to on, but you could also set it to try. If you set it to on and it doesn't have access to all the memory you have configured shared_buffers for, it just can't allocate it, and it will fail to start. It will tell you, look, I wasn't able to allocate that much memory, what I was actually looking for was this much, and I just couldn't; so it stops there. You could also use huge_pages = try, which says, well, I tried to allocate, I couldn't, so I resorted to the conventional memory pool of 4K pages. The thing is, you either want to use huge pages or you don't; in most cases you don't want to just try. If you do use try, then make sure you look at the Postgres log afterwards to see whether you are effectively using them or not. With MySQL it's similar, except there is no try. However, it won't fail to start if you have set it to start with large pages and it couldn't get that much memory; it will just continue with the conventional memory pool, which I don't think is better, but at least it won't fail to start just because it can't get enough huge pages.

And here comes a big barrier I hit during my tests with MySQL, which was using 1-gigabyte huge pages. What happened was this: I had a pool of 100 one-gigabyte huge pages, so 100 gigabytes of memory allocated to the huge pages pool, yet the biggest buffer pool I was able to start with this setup was a 12-gigabyte buffer pool. And it ate all my memory: it took 97 pages, so 97 gigabytes, to initialize a 12-gigabyte buffer pool. Here is my initialization and some of the variables: a 12-gigabyte buffer pool, 8 buffer pool instances (you can segment the buffer pool into several instances), and the default chunk size of the buffer pool, which is 128 megabytes. It took me some time, but I figured out that if you divide the 12 gigabytes of buffer pool size by the chunk size, you get the 96 pages. So in fact MySQL won't allocate more than one chunk of the buffer pool per huge page; in practice it was just wasting the rest of the memory of each page. That's why it was allocating 96 plus 1; there is always one extra page allocated on top of that.

Well, okay, I thought, let me just test increasing the chunk size from 128 megabytes to 1 gigabyte, which makes sense, right? You are working with a larger page. Before, with 4-kilobyte and 2-megabyte pages, you could fit many of them into a chunk of 128 megabytes, so let me at least try 1-gigabyte chunks with that same logic. Well, I set the chunk size to 1 gigabyte and started a buffer pool of 96 gigabytes, so it should use about 96 pages now. Yet it didn't start. I was only able to start the same setup with double the amount of memory in the huge pages pool. So, to start my buffer pool with 96 gigabytes, I needed a huge pages pool allocated with twice that size plus one page, and then it would just keep the second half of the huge pages pool reserved but unused. Well, that won't do, right? That just won't do. I tried with a yet larger chunk size, and this time it worked, and look, I was only using 25 one-gigabyte pages for a 96-gigabyte buffer pool. I started doing more math and thought, well, if I divide 96 by 24 I get the 4-gigabyte chunk size here, but beyond that nothing else made sense. So it's not clear to me how the buffer pool allocates memory when it comes to 1-gigabyte huge pages. I got in touch with a friend at Oracle to try to sort that out afterwards.
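(Just to make the arithmetic explicit, here is a rough sketch of how the numbers I saw line up, not MySQL's actual allocation logic: roughly one 1 GB huge page per buffer pool chunk, plus one extra page.)

    # 12 GB buffer pool with the default 128 MB chunk size:
    echo $(( 12 * 1024 / 128 + 1 ))    # 97 one-gigabyte pages for a 12 GB pool
    # 96 GB buffer pool with a 4 GB chunk size:
    echo $(( 96 / 4 + 1 ))             # 25 one-gigabyte pages for a 96 GB pool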
But that is something to have in mind if you ever try to work with MySQL and 1-gigabyte huge pages. Now, back to the benchmarks. Starting with sysbench-tpcc, which is a mix of reads and writes, I prepared the database with those two options here, which gave me a dataset of 92 gigabytes, more or less, and I ran it under the conditions I explained before, varying only the number of threads each time, for each page size and each buffer pool size. After running a test, the dataset would grow to about 99 gigabytes; that was the biggest dataset I had. I tested sysbench-tpcc with two buffer pool sizes: one was 96 gigabytes and the other 24 gigabytes. Here I'm testing 4-kilobyte pages with each size of buffer pool, and I'm getting these results. Basically we have two groups of lines: one group for the 96-gigabyte buffer pool and one group for the 24-gigabyte buffer pool. Since I couldn't fit the whole dataset into the buffer pool in the second option, the results are not as good, because it needs to load data from disk a lot of the time. And just to make it clear: after each iteration I recycled the datadir, which means I copied back the initial prepared dataset, and I dropped the OS cache every single time. For MySQL that doesn't make much difference, but for Postgres it does.

Looking at the same graph in a different way, we see only two lines, because all the 96-gigabyte buffer pool lines are stacked here and the 24-gigabyte ones here. So in practice I saw no benefit, no performance gain, from using huge pages with MySQL for this workload, tpcc with writes. Now, Postgres, same thing; I'm just presenting the second graph here. There is one set of lines for the bigger shared buffers and another set for the 24-gigabyte shared buffers, and they actually converge at some point. And that is, again, not trying to compare MySQL with Postgres here; there are a lot of things to consider. But we do know that, for this workload, which is a bunch of small transactions that create a connection each time (create a connection, run a transaction, close the connection, open another connection, run another transaction), Postgres handles this opening and closing of connections in a different way, because it is process-based: it actually forks an OS process each time a new connection is made, whereas MySQL is thread-based, so it won't fork the whole process, it just creates another thread. Again, not trying to compare them; these are just different ways of doing things. So with Postgres in this situation, what we would actually use in practice is a connection pooler: when you go above a certain number of connections, depending on the workload and on how fast you open and close connections, a connection pooler is kind of a must for Postgres.

So now, continuing with Postgres and the OLTP point-select workload from sysbench, which is another read-only benchmark: same thing. I prepared the dataset and vacuumed it so the tables are clean, and that resulted in a bigger dataset. That was on purpose; the table size here was chosen on purpose to use as much memory as I could on that server.
Not as much as I could use, but way more than I had used before. Again running each test for one hour, the result with this dataset was that I wouldn't really see any difference even between a smaller shared buffers and a bigger shared buffers. Again, this varies a lot depending on the workload; for this workload, these were the results. And again, no difference whatsoever between page sizes, which was what I was looking for in all these tests: some performance difference I could actually see, and I didn't. MySQL, same thing, except that I couldn't run the test for the 168-gigabyte buffer pool because I just didn't have 360-something gigabytes of memory on that server, right?

Then I said, well, let me just try a completely different benchmark that I hadn't planned before, which is pgbench. I took the exact same server with the same setup, the same configuration, and ran pgbench, which is a Postgres benchmark, so it won't work on MySQL. I prepared it with these settings, resulting in a 187-gigabyte dataset, and I ran it with no vacuum because it is a read-only workload. And this time I did get some differences. The first interesting thing here is that Postgres scales well for this workload: as we increase the number of concurrent clients, for all cases but one it scales well. And that one case is for 4-kilobyte pages. So, aha, okay. For the first time, huge pages make a very big difference compared to regular pages. With huge pages and the bigger shared buffers, the shared buffers sized at almost the same size as the dataset, I get a completely different performance compared to regular 4K pages once it goes above 112 concurrent clients. I did double check those numbers. But I decided, hey, let me give it another try with transparent huge pages enabled this time, just to see if it makes any difference. I tested that with the 188-gigabyte shared buffers, and it didn't make any difference; I again saw the 4K pages performing like that.

So at that point I started looking around. Well, I didn't want to look at the TLB, but maybe I should, so I started looking at other people's research, and I stumbled onto Mark Callaghan, who is a reference, and he was pointing out exactly what I was doing and how wrong it could be: I was just focusing on the throughput itself, I wasn't looking at the efficiency at all. What if, in fact, I am getting the same throughput, the same number of transactions per second, but using my processors much less than before? Then maybe I could scale further with a workload that is more processor-demanding than the ones I was using. So I started looking at measuring the efficiency of the TLB with larger pages, trying to see it at the CPU level: if I use huge pages, and that indeed means I am exercising the TLB in a better way, my CPU should perform more optimally. Here I leaned on another researcher, Alexandr Nikitin, and based most of this part of the project on his blog post, looking at the actual counters you can get with perf at the processor level.
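(The kind of perf invocation I mean looks roughly like this, attached to the running database process; the exact event names are illustrative and vary by CPU generation:)

    # TLB misses that force a page table walk, and cycles spent in those walks,
    # measured against the running mysqld for one minute
    perf stat -p $(pgrep -x -o mysqld) \
        -e dTLB-load-misses,dTLB-store-misses \
        -e dtlb_load_misses.miss_causes_a_walk,dtlb_store_misses.miss_causes_a_walk \
        -e dtlb_load_misses.walk_duration,dtlb_store_misses.walk_duration \
        -- sleep 60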
And mostly what I'm looking for here are the counters related to a miss that causes a walk, which means that if the entry I'm trying to find is not in my TLB cache, then I need to walk across the page table, and that means a TLB miss. Another thing he looked at was the actual number of CPU cycles spent in this process of missing a page in the cache and walking through the page table, and there are other counters that count that. Some of them have aliases, like the first ones and this one, and some don't, so you actually need to use the raw event code to get them. And finally, the number of main memory reads, because that would be affected by more TLB misses. Plus some other counters.

So how I did that: I ran a few of the tests again using perf, and I bound perf to the actual database process in the operating system. So I'm not looking at sysbench here, I'm just looking at how MySQL behaves for the workload that is presented to it. For this point here, which was the sysbench OLTP read-only point selects, comparing 1-gigabyte pages and a 48-gigabyte buffer pool to the 4K case, I in fact got quite similar statistics. Which made me think, well, okay, let me try something else. So I looked at the pgbench run where we actually see the discrepancy, at this point here, which is 224 concurrent clients, and comparing the 4K with the 1-gigabyte pages we do see more of a difference, but still not that much. And I thought, well, there must be something missing here. Then I realized we are comparing statistics for different throughputs: here I have a much higher throughput than there. If we look at the number of transactions carried by each test and normalize the results to the 4K ones, we see that both the 1-gigabyte pages and the 2-megabyte pages perform better in terms of TLB utilization than the 4K pages do.

Right, how much time do I have? Oops. So, going back to looking at the 4K pages only. What happened was, I got a message from Vadim, who is the CTO of Percona; I had borrowed one of his test servers. And he said, look, are you running any tests right now? And I said, yeah, I'm running some final pgbench stuff that I hadn't planned for before. And he said, well, have you realized that the server is swapping? And I said, what? It's swapping. Okay, it was just a little bit at first, but that makes a hell of a difference, right? What happened here was my mistake: I had just taken the same settings I used for sysbench-tpcc, which is a mixed workload of reads and writes, and used them with pgbench. And the difference between the two is that tpcc doesn't produce any temporary tables, while pgbench produces a large amount of temporary tables. Looking at the second line here, the swapping is even more evident. Whenever you have swapping in a benchmark like that, the results are gone, and this is what actually happened in my test: with 4-kilobyte pages, I would get as much as 23 gigabytes of swapping across the test. That's why the results looked so different.
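(A quick way to catch what Vadim spotted, and what I look at next, is to watch swap activity and the page table size during the run; a minimal sketch:)

    # swap-in / swap-out activity (si/so columns), sampled every 5 seconds
    vmstat 5

    # memory spent on page tables, and how much swap is in use
    grep -e PageTables -e SwapTotal -e SwapFree /proc/meminfo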
And if we look at the page table, the amount of memory dedicated to the page table with 4-kilobyte pages, we can actually see it requires much more memory just to maintain the page table during the tests when compared to using larger pages, and that adds to the pressure. Bottom line: the benchmark worked great with huge pages and badly with 4K pages. We saw the memory pressure, and there was also some memory pressure in the other cases as well. So why is this so much different? Any shots? Okay, that is one possibility. Okay. Well, mostly it depends on where the page you are looking for is actually located.

But in fact, I was discussing these results with a friend who used to work with Oracle, the Oracle Database, and what he told me was: you know, Fernando, we never really saw performance gains from using huge pages, and at the time they were using 2-megabyte huge pages, but we did use them for some setups. And do you know why? Static huge pages cannot be swapped to disk. If you have your entire shared buffers in huge pages, there is no way the server is going to swap that to disk when there is memory pressure; it is going to swap anything else but that. That could be good, and it could be bad as well, but there is the takeaway from this lesson: there was a lot of memory pressure here because of the page table size, but the huge pages were not subject to the swapping; it would affect maybe some isolated connections, but not the shared buffers. So yes, if you look at Oracle recommendations from various experts, you will see they actually use huge pages mostly just to avoid swapping.

So, sorry, I have just passed my time. What I really wanted to say is that I wasn't actually prepared for this adventure; I was looking for something different. I had this notion that databases will always benefit from working with huge pages, which is not necessarily the case. Of course, I haven't looked at all kinds of workloads, and I would need to look at more workloads to say this with more certainty. And what is certain is that the MySQL support for 1-gigabyte huge pages right now is kind of broken. So thank you.