Good morning. Good afternoon. Good evening. Wherever you're hailing from, welcome to another edition of Red Hat Enterprise Linux Presents. I am here with the one and only Scott McIverian. We were just talking about what we're going to do on the show today, Scott, and it seems like a lot of fun to me, but I am an old-school Linux admin. So. Well, and I mean, honestly, we talk a lot about things like performance and tuning, and there have been a lot of technologies that have come into our lexicon for doing those things. Yeah. And like, what, three months ago, I think we had Carl Abaddon, who's the product manager for operating system performance for RHEL, and we talked about Performance Co-Pilot and Grafana and visualizations. And that's all great. And in fact, the next version of RHEL will have even more cool stuff like that. Nice. But you know, when I start looking at what's out there, that's not everything people need, right? So visualizations and those types of things are really good for stuff like troubleshooting. Right. But let's say that you're running a database server. There's stuff that you should be doing to adjust the system parameters, to optimize it for that database workload. And visualizations and data collection and those types of things don't help you with that. And so I thought we'd start today by just kind of doing a quick tour of the contents of /proc. Yeah. And then, when we were talking before the show, I found a couple of tuning guides specifically for database things. One that I thought was really good, from Microsoft, for SQL Server running on Linux. And another one that is less good, and we can talk about why I think it's less good, for just kind of a general database server. I think it may have even been a MySQL one, but it would apply to Postgres or something. Right. Like, yeah, database tuning on Linux. There are a lot of knobs that you can actually turn to make MySQL or Postgres or whatever database, to be honest with you, run better than it does out of the box, right? Yeah. But when you turn those knobs, and this is the thing that people don't understand, so they read the article and they're like, okay, I'll put these in my tuneD profile. There we go. Cool. Optimized. You're actually making choices that make the system less good for other things. That's right. Right. You always tell people to put your database on an independent class of servers, right? Well, and also a lot of the tunables will do things like shove as much as they can into memory. So like extending the file system synchronization values, right? So that your buffered writes stay buffered longer, so that if you need to refer to that data, it's already sitting in RAM. However, if you were running this database, I don't know, on a ship, or... Some place disconnected from the world. Right. Or some place where there may be more likely to be power outages. If you lose power on that box, or it's being run by less-trained staff, right, so their solution is pull the plug, plug it back in, all of a sudden, all those disk writes that you needed to get out to disk? Didn't happen. Yeah. And then what happens? Like, was that data really critical? In which case it's gone and now you're losing critical data. So there are times where it's like, well, okay, so these are the database tuning things. Cool. But for my use case, I need to be aware of these other environmental factors, deployment factors, whatever.
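A quick aside for anyone following along at home: those synchronization knobs live under /proc/sys/vm, and you can peek at them on any RHEL box. A rough sketch, with the caveat that the defaults vary by release:

    # Percentage of RAM that can fill with dirty (unwritten) page cache
    # before writers are forced to flush to disk
    cat /proc/sys/vm/dirty_ratio
    # Age, in hundredths of a second, that a dirty page can sit in cache
    # before it must be written out
    cat /proc/sys/vm/dirty_expire_centisecs

Raising those keeps writes in RAM longer, which is exactly the trade-off being described: faster, until the power goes out.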
So maybe I use some of these things, but not all of these things. And that's something that I think there's still a lot of need for in our industry. Anyway, so let me just pull up my SSH session here. Dive right on in. Yeah. I like it. All right. So Chris, you assure me that you are an RHCSA. I mean, I'm not up to date, but yes, I was. But yeah, I let that expire. I'm currently Ansible certified, and I'll let that expire, and then maybe I'll go get the OpenShift one. So, I feel you. At one point I had Certified Architect, but that has also kind of whittled down over the years. But you know, the knowledge doesn't really go away. Right. Right. All right, so. I'm just SSHed into a box, and we're currently in /proc. This is what it looks like. And you'll recall that /proc is really a virtual file system. So it's not stored on disk. We're actually looking at space within the kernel's memory. And the first thing I noticed about this is all those numeric directories. What are those? Yeah, that was my question to you, Chris. Oh, you want me to answer that? Yes. Those are PIDs, or process IDs, yeah. Exactly. And so let's just look at one of them here. So this is all about process ID 610, and it's complaining: cannot read symbolic link 610/exe. Ooh, what's that? That is the, oh, this guy right here is broken. And that's what it's complaining about. That is normally a symbolic link to the executable that was used to create process ID 610. Right. But process ID 610 is apparently a kernel thread. And that's why it doesn't have an executable, because it's not executed from the file system; it's part of the kernel. So maybe, let's look at a different one. How about 6131? All right, that one's better. This is one of the desktop tools running on this box, right? OK, cool. OK, so we've seen a lot of the information that's in this directory elsewhere, like the executable name, like the command line. If we look at cmdline, I want to less that or cat that. So this is the actual command line that was used to exec this process. So that's actually somewhat binary. Yeah, it's not really that much better, is it? And do a file. You can run file on that. Maybe it'll tell you a little bit about it. Oh, but it's stored in memory. And so, yeah, not as much. But some other interesting things, because a lot of this data we end up seeing other places, like ps, right? So process status comes in here and collects stuff and arranges it in a more human-friendly way, because right now it's stored in a very kernel-friendly way. But one of the ones I like a lot is this guy, oom_score. Whoops, cat. So do you know what the oom score is used by? That's the oom killer, I would assume, right? The out-of-memory management tool or system, whatever you want to call it. Yeah. So when your system is out of memory, the out-of-memory killer, oom killer, fires up and starts whacking things. And it used to be... If you see the oom killer running on your database boxes, you've done too much. You're having problems. You've gone too far. Yeah. Past the point of no return. Well, way back in the day, like RHEL 3, RHEL 4, right, that 2.4, 2.6 kernel era, the oom killer would start and it would literally just pick a process at random and kill it. Oh, yeah, totally random. And just start whacking stuff. Right. And lo and behold, that was not the best way to make decisions. Shocking. Right. So they came up with the scoring method. And so now the oom killer kills the thing with the highest score. There you go.
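If you want to poke at the scoring yourself, here's a sketch; 6131 is just the example PID from the demo, so substitute whatever process you're curious about:

    # The kernel's current badness score for the process; higher means it
    # gets killed sooner in an out-of-memory situation
    cat /proc/6131/oom_score
    # Nudge the score as root: -1000 effectively exempts the process,
    # while positive values make it a more attractive victim
    echo -1000 > /proc/6131/oom_score_adj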
And it'll keep working its way down until it has freed enough memory to continue. But so on this box, on this process here, 6131, the oom score is zero, right? And if we looked across all the processes, there's probably a whole bunch that have an oom score of zero. They're all equally likely to be killed. But you can use oom_score_adj to change a process's oom score, and thereby make it less likely to get killed by the oom killer. And so in a production environment, we would do this on things like the sshd processes. So you'd write a script that would run at boot, that would check to see what the process ID for sshd was, and then you'd push that oom score way down. So that if the box had problems, it would kill your database, or it would kill your web application server or whatever, but sshd would still be running, which means that as a remote admin, you didn't have to drive to the data center to do the needful on this box. You could still sit at home at four in the morning and connect to it and fix it. So. So I just want to ask, are we going to talk about fd and ns in this session? So fd is? File descriptors, namespaces, that kind of thing. Yes. And no, I wasn't planning on talking about them. But fd is basically the files that this process has open. So you would get that data from things like lsof, to tell you what files are out there that you're using. And then ns I've never really delved into, so I will not make things up about what's in there. Fair enough. So. I will find an article for you, Farhan, if you missed the oom score stuff. He's asked us to repeat that. But basically the oom score is a way for the oom killer to figure out what to do. Right, the higher the oom score, the more likely it will be killed in an out-of-memory situation on the box. The lower the oom score, the less likely it is to be killed. And I know that the inclination would be like, oh, this is a database box, I should make the database super resilient to oom killing. No, no, no, that's not what you want. You want to make the things that are critical to your administration of this box the things that are protected from the oom killer, so that you can connect to it and administer it and not have to reboot it or something else. Because you can always restart the database processes. Cool. All right, so that's a little bit on the process ID directories. But then there's all this other stuff too, that's more system stuff. And so for example, modules. Right, so these are all of the kernel modules that are loaded into the running kernel. And we would normally access this data with something like lsmod, to show it to you. And so you're looking at kind of the same data, just formatted more nicely. So for example, this guy right here, the intel_gtt module is loaded. This is how much memory it's using. These are the dependent modules that require this one. And then if you really wanted to, here's the hex memory address of where it lives. We probably don't care as much about that. And you'll notice that lsmod actually doesn't show us that. I mean, this is Linux guts right here, right? Right, right. And honestly, this is one of the cool things about Linux that other operating systems don't give you, yeah. And we can also look at stuff like, actually I use this one all the time, partitions. So these are all of the, file systems, sorry, not file systems, all of the device IDs of the disk devices. So we can see that on this machine I've got an NVMe drive, and that has two physical partitions on it.
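On a box shaped like this one, cat /proc/partitions comes back with something roughly like the following; the sizes and device numbers here are illustrative, not copied from the demo box:

    major minor  #blocks  name
     259        0  500107608 nvme0n1
     259        1     614400 nvme0n1p1
     259        2  499492864 nvme0n1p2
     253        0   65536000 dm-0
     253        1    8065024 dm-1
     253        2  425888768 dm-2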
And then I've got three device mapper devices for my logical volume configuration. And so when I plug a USB thumb drive into it, I want to know what device that is. Well, I'll just take another look at partitions and it'll say sda or sdb or whatever it is. And then I know what the device is that I just plugged in. I can reformat it or whatever and not have to worry about destroying my entire box. And we would get this maybe from something like lsblk. Maybe, right? And so here it's showing us, that's our NVMe drive, and then here are the two physical partitions, and then these are the three logical-volume-managed file systems. And we get more data, and it's organized a little bit more humanly, right? But maybe you just want something quick, right? Less data, but it's live, updated whenever something changes in the kernel, right? All right, so again, we use a lot of this data through other command interfaces. But then there's like super deep stuff, for example, interrupts. Yeah, so these are the interrupts that are on the machine. And so back in the day, we would call these IRQs, right? The interrupt request numbers. That's the actual signal number that is sent when something happens on this device. Over here we're looking, oops, over here we're looking at the actual thing that goes with this device. And you'll see that there are four columns for CPUs, because this is a quad-core box. And the number underneath each of those columns is how many times that CPU has handled an interrupt for this device. And a lot of times, a CPU gets assigned management of an interrupt, so anything that gets sent to this device, or any interrupt that occurs on this device, a specific CPU handles. And you can see that that's the case here with interrupt nine, right? The other CPUs aren't handling interrupt nine. But in other cases, like this one, it gets kind of spread across the CPUs, so the handling for this interrupt is more equally distributed. And this one happens to be for the ethernet card on this box. Which makes sense. And so again, this is like troubleshooting type of data, where you're interested to see if something is going crazy, or is there one of these that has a really high number associated with it? So you can see, like, my system performance is degraded, but look, I'm getting a lot of interrupts on my ethernet device or my graphics card or my Wi-Fi interface. And so you can kind of get a little bit of data on what's happening there. And then, so, question in chat. Yes. Difference between /proc and /sys. So organizationally, what they are is not different. They're both in-memory file systems presented to you by the kernel. The difference between them is what information goes there. And so if we look at /sys, /sys is more organized around the devices attached to the kernel, or to the machine, rather. And so this is where you'd go to, like, set the scheduling algorithm on a specific block device. So I would say that /sys is more for interacting with system-connected devices, whereas /proc is for system information, and in just a bit we'll start talking about the tunables, but those tunables apply system-wide. They typically don't apply to a specific device, whereas the stuff in /sys would apply to specific devices. All right. So I wanted to get like super deep for just a second. Super deep for just a second? Yeah. And let me roll back. I'll show you one more kind of useful one. Actually, let's do two: swaps.
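Cat that file and you get something along these lines; again, the device and size here are just illustrative:

    Filename                                Type            Size     Used    Priority
    /dev/dm-1                               partition       8065020  0       -2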
So you've seen this before, but you probably did something like swapon -s, right? So again, we're actually looking at the same data, just through a different interface. But I also like meminfo, though. Oh yeah, meminfo is awesome. Right. Meminfo is all this memory information, and we usually interact with it through things like vmstat or free or maybe top, and those will actually show you some of this information, but there's a lot more in here than what those tools are showing you. So for example, here, the huge page allotments. These are typically not shown by a lot of the other memory-reporting applications. If you're not familiar, a huge page is a hunk of memory that is larger in size than the normal page size on the system. The huge page size can be changed, and on this system it's two megabytes. So each huge page is a two-megabyte hunk of memory, as opposed to the 4K hunk of memory you get with the normal page size. Also in here you get things like this guy. Oh yeah, that's cool. Yeah, so... So what's CommitLimit, just for everybody out there? Okay, so we often operate in memory overcommitment mode on Linux. And the reason that we do that is because when we start up a process, it often will share a lot of its overhead with other processes. So for example, if you're running a web server and you've got 30 Apache threads or Nginx instances running, each one of those loads shared memory, like shared libraries and that kind of thing. Well, how many of those do you really need to load? Do you need one instance of that library for every single thread that you have going? No, you have one that everybody kind of refers to. But the process itself doesn't know that it's sharing that library with other processes. And so, because we're sharing memory, we don't report that at the process level. We have to have another way of tracking how much memory we've committed to delivering to processes and how much we're actually delivering to the processes. So CommitLimit is how much I will allow processes to ask for across the entirety of the system. This box I think has eight gig of RAM, and you can see that our commit limit is almost 12 gig of RAM. So normally you see us commit to about 50% more memory than we've got. Committed_AS is how much we're currently committed to delivering. So all the processes on the machine, they are currently using up eight gig of memory. And that's the reported size, right? So that includes things like shared libraries that they might be sharing with other people, but they think that they've got eight gig of memory consumed. They think they do. Yes. So let's go back to one of our processes here. All right, so there's a thing in here about data, and I believe it is one of these. So status gives us more specific memory utilization, right down here in VmData, VmStk, VmExe. That's telling me what this process is consuming in different types of memory. What I was looking for was, maybe it's one of the map files. Oh, the maps file? Oh, that's it. There you go. Okay, so this is actually telling me the hex memory addresses and what is stored there. And every process, it turns out, is given the same memory table. So it thinks that its memory starts at, like, all zeros and goes until some other hex number. But if you looked at every single process, every single process starts at that all-zeros memory address. But they can't all actually start there, as that would mean that they're all stored in the same actual physical RAM.
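You can see that for yourself; a quick sketch using the shell's own PID:

    # Each process gets its own private virtual address layout
    head -5 /proc/$$/maps
    # Compare two different shells: the same shared library shows up in
    # both, mapped into each process's own address space
    grep libc /proc/$$/maps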
And so what you may have noticed, if you run strace and look at the system calls that are being made by applications, is that there are a lot of mmap system calls. mmap is memory mapping, and it's kind of the bridge between, okay, the process said it wants to access this hex value of memory, so we're going to go out to the memory mapping in the kernel and see what actual system memory address that maps to. And that's how we actually implement things like shared memory. And so we could store this libuuid.so file once and have everybody refer to it and not know that they're all sharing it, because behind the scenes the MMU is what's actually tracking where this shared library lives, and then making sure that each individual process has the right memory mapping. So when the process tries to access that memory, it actually redirects it to the correct system memory. Cool. So there is a question from Mr. Rapscallion Reeves: according to the Gentoo handbook, so take that with a grain of salt, special care needs to be made, or special care needs to be taken, when mounting the proc and sys directories, obviously. They suggest using the --rbind and --make-rslave options. Could you speak on that? Maybe why they're needed? I don't know, to be honest with you. Yeah, I have not seen those options. I know that sys and proc are automatically mounted into the RHEL file system. So I don't, let's see here. Oh yeah. So unless they are implied options, meaning they're used all the time, I don't see them being called out in how we're mounting them here. So I don't know; without having the article, I couldn't tell you more about why that is suggested by Gentoo. Yeah, I would ping the Gentoo folks and say, hey, why is this that way? Well, we could maybe do some Google searching and see what those options do. Yeah. But not right now. I will look it up after the show, how about that? Yeah, fair enough. So let me take a quick note to do that, and maybe next show I will have a follow-up to provide more detail. Sorry, that was for when mounting the base proc and sys directories into a chroot. Maybe it's just because those directories are needed for the host and the chroot, probably. I mean, this is all done automatically, right? Like, this is the value add of RHEL; you're not building it, you know, basically from scratch. You're kind of just going with the flow here. And you're right, it may be that because it's a chroot, it is needed by both the running operating system but also your chrooted environment. I wouldn't be surprised if that was a containery thing as well, but we'll see. I'll see if I can find some information on those and then have a more informed opinion on them. All right, where were we. So I wanted to go like super deep for just a second. I know that the MMU stuff was also deep. All right, so there's so much data here, and a lot of it we get through other tools, but then there's a bunch of just, like, random how-the-operating-system-works stuff. And buddyinfo is an example of one of those. So buddyinfo is broken up into the different zones of memory. It turns out the kernel, when allocating memory, uses an algorithm called the buddy algorithm. And the buddy algorithm asks, do you have buddies nearby that are also available memory pages? So we can see that in the DMA zone there are very few contiguous individual pages or contiguous groups of pages together.
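For reference, cat /proc/buddyinfo comes back looking something like this, one row per zone and one column per power-of-two block size; the counts here are made up for illustration:

    Node 0, zone      DMA      0      0      1      0      2      1      1      0      1      1      3
    Node 0, zone    DMA32    817    923    693    361    163     46     10      2      1      0      0
    Node 0, zone   Normal   4981   2406   1068    284     57     19      2      1      0      0      0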
So, if I remember correctly, over here, these are the pages that are individual, where there's no adjacent free page of memory. I may have that backwards, because there's no labeling on this table. And the next column over is groups of two pages next to each other. And then the next column over would be groups of four pages next to each other. And so each one of these increases by a power of two, and it shows you the contiguous blocks of free memory that are out there available. And I actually do think I had it backwards. So, no single individual pages, but lots of hunks of memory. Whereas down here in DMA32, there's a whole bunch of individual pages, but very few giant hunks of buddies together. And same thing in the Normal zone. And so, do you need to know this information? Probably not. But if you're running on a system and it's been up for a really long time, you could use something like this to ask yourself the question, am I dealing with fragmented memory, right? So if you have groups of contiguous free buddies, then you have groups of contiguous memory. You're not fragmented. But if you have onesies in each one of these categories, that tells you that you have a very limited number of contiguous blocks of memory in each of these groups. Or if you had all single-page buddies reported, right, where there's just one page standing there by itself and nobody else around it, that's a more fragmented memory situation. And the kernel does a whole bunch of stuff to try and keep that from happening, where it'll actually even copy data around behind the scenes to try and make more contiguous memory, to avoid that situation. Nice. The kernel is a smart little thing. It is. And it's made smarter or dumber by developers. So it depends on which way you want to cut that one. Yeah. Well, you know, people have opinions. Yes. Especially in the kernel community. True enough. Sorry, I had to say it. True enough. All right. So that's a lot on /proc, probably more than we wanted to go into on /proc. But where I wanted to go with this is, we're used to seeing this information, but there's more stuff there. If you're really interested in down-and-dirty information, you can get even more. Okay. But the other thing that we often do when interacting with /proc is make changes to it. And at this level, there are very few files that can be changed, right? You can't, like, go into meminfo and update your memory. Right. Like, there are commands to do that, right? This is just exposing what's there, right? Right. But there are a couple of places where you can make changes. So we saw one of those earlier when we were messing around with oom_score_adj, and that was changing the oom score, right? So that was an example. Those changes are not persistent, because everything in this directory is in memory. So if you reboot the machine, it goes back to whatever the value was originally. That's why, if you want to keep that state, you need to use the proper commands to make that state persistent. Right. And so one of the places where we do often see changes is /proc/sys. Yes. So we have a couple of different ways of making changes to /proc/sys. One that I'm going to use is just shoving data in there, but that's not persistent. Then we have sysctl, or sysctl.conf; that is the older method of making these changes persistent. The newer method is tuneD. So tuneD profiles will actually adjust data here in /proc/sys. TuneD can also adjust things in /sys.
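The mechanics look roughly like this, using vm.swappiness as the guinea pig; my-profile is a stand-in name for whatever custom tuneD profile you'd build:

    # One-off change, straight into /proc/sys -- gone after a reboot
    echo 10 > /proc/sys/vm/swappiness
    # Same change via sysctl; drop "vm.swappiness = 10" into a file under
    # /etc/sysctl.d/ to make it persistent the older way
    sysctl -w vm.swappiness=10
    # The newer way: put it in the [sysctl] section of a tuneD profile
    # and activate that profile
    tuned-adm profile my-profile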
So Sony was asking earlier about the difference between the two file systems. TuneD can actually handle both. Sysctl only does /proc/sys. Right. All right. So at the onset of the show we were talking about databases being one of those things that we often see changed. And I wanted to take a look at the tuneD profile that Microsoft has written a really excellent article about, for running SQL Server on Linux. I think you're going to paste it into the... I'm going to, but there's a question that I need to ask first. Oh, yes. Drop caches. Can we talk about that for a second? Mainly the pros and cons of drop_caches? Sure. Why don't we make that one of our first things that we mess around with in /proc/sys? So let's hold the database thing for a second and talk about the mechanics of making changes to /proc/sys. All right. So if I remember correctly, drop_caches is in vm. Because it's a virtual memory thing. Right there. Yeah. Okay. That's why I couldn't find it at the top level. Yeah. And if we just take a cat on it, it's currently set to zero. But how do you know what should go in here? Because, for example, I know that swappiness here is a value between zero and 100, and it sets the kernel's affinity for using swap space. Whereas overcommit_memory is zero, one, or two: I will estimate before overcommitting memory, I will always overcommit memory no matter what, or I will never overcommit memory beyond a certain ratio. But how do you know that those are the correct values to go in here? Because they're not all the same. Drop_caches is an excellent example of that. So the kernel documentation helps us a lot with these parameters. So this is all the different types of kernel documentation, but you'll notice that there is a sysctl subdirectory in the kernel documentation, and then it's broken down by that top-level directory hierarchy underneath /proc/sys. So for looking at drop_caches, which is in /proc/sys/vm, I'm going to take a look at vm.txt. And I'm going to look for "drop". So that's in the list of tunables covered by this document. And then when I look down into it, it says this is going to cause the kernel to drop caches. But the number you shove into it tells it which caches you're interested in dropping. So if I put a one into drop_caches, it's going to free page cache. If I put a two into drop_caches, it's going to free reclaimable slab objects, including the directory entry cache and inode cache. And if I put a three in there, it's going to do both slab objects and page cache. So let me pop back out to where I was. So if I do this, this is going to drop all of the cached memory that is currently in use by the kernel. Now, let me take a look here with free. So you see this buffer cache. So if I echo a three into drop_caches... it is way smaller now. All right, so the question was originally, what's the benefit? What's the detriment? Yeah, what's hurting us here? Right. So the reason that we cache data, in fact, you'll notice that when you look at free memory using the free command, the Linux kernel almost always gobbles up whatever it can in cached memory. So you see the used, and you'll see a very small free, and you'll see a whole bunch of stuff over in cache. And you saw that originally, right? I have about eight gigs of total memory. I was using about two. I only had one free, and I was using four gigs for caching, right? But it's also available. So cache memory counts also as available memory. But if you're just looking at free memory, cache memory doesn't count as free memory.
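For anyone who wants to replay what we just did on screen, it was roughly this sequence; your buff/cache numbers will be whatever your box happens to have cached:

    free -h                             # note the buff/cache column
    sync                                # flush dirty pages so they're droppable
    echo 3 > /proc/sys/vm/drop_caches   # as root: 1=page cache, 2=slab, 3=both
    free -h                             # buff/cache is now way smaller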
Right, so if you're using memory and you need to allocate more than you have free, the kernel automatically de-allocates some of the cached memory and then allocates it to the process that requested it. So what we store in cache memory, and why the kernel is cache hungry: we're storing lookup tables for things like the directory entry cache or inodes, and we're storing file data that has been cached. Because if somebody has asked for a file to be opened and read, what's the likelihood that something else is going to ask me for the same file to be opened and read? Right. Or, just a second ago, I was in the kernel documentation looking around, right? And as I was navigating, tab completion in bash was actually doing directory entry listings to figure it out, to find the stuff for you. And all that stuff was cached. So the next time I do it, I may not have noticed that it sped up, but it actually was sped up, because it was pulling that information out of cache instead of actually having to go and spin the disk, or in my case query the NVMe, to figure out what entries are in that directory. And then, when we were in vm.txt and I was like, oh wait, the tunable wasn't in there, well, let me check here, no, that's not right, let me check, maybe I just missed it: when I opened vm.txt originally, it was cached. And then I moved away from it, and when I came back to use it again, it was hitting the cache instead of pulling it up from disk. And so that's the kind of stuff that we're storing in cache. And the same thing for application stuff. When applications open files or write files, what's the likelihood that they're going to access that same file again? So we'll keep it in cache to try and speed those accesses up. So when you drop caches, you're ditching all of those saved bits of data. And then the next time we need to open that file, we actually have to spin the disk up, navigate the directory structure, find it, make sure permissions are correct, allocate it, and show it to you. So that's what you're doing by dropping the caches. You're just making us do all of those activities against the native storage, instead of being able to reuse the data that we already loaded into memory once for something that used it. So here's a question. For a large malloc, or malloc, however you want to say it, does the OS look at free, or does it include cache too? Wouldn't that prevent apps from initializing if malloc doesn't have enough free memory? So the malloc request goes to the kernel, and the kernel decides how to service it. If there's enough memory in free, then it just services it from free. If there's not enough free memory to service it, then it looks to see whether, if we combined some stuff from free and cache, that would be enough memory. And if so, then the kernel embarks on the journey of removing data from cache, flushing data out of cache, returning that memory to free, and then performing the malloc request that was made. Right. So also remember that we... And it does that until it swaps, and we'll continue to do that until your swap is full. Right. And realize also that processes often over-ask for memory. Oh, all the time, right? Like, they're trying to be safe, right?
Well, not just that. They ask for things like shared memory or shared libraries, which we've already got loaded, but they account for that in their malloc requests. And so there's a whole bunch of things around how we overcommit on RAM all the time. In fact, that's one of the tunables, the virtual memory overcommit ones. So overcommit_memory is, like, zero, one, two. Overcommit_ratio is, if you're in overcommit mode, how much will you overcommit to? All right, so let's just jump down to those. So that's if you want to commit to a very specific ratio. All right, so, overcommit_memory. All right, so when it's zero, the kernel attempts to estimate the amount of free memory left when user space allocates memory. When it's one, the kernel pretends there's always enough memory. It'll never run out. So whatever is requested, that's what it's going to give as an address space. Right, and realize the address space doesn't actually equate to real RAM used by the process. And then two is never overcommit. So currently on this system, and by default, we use setting zero. We guess, and we'll overcommit up to a certain amount. Right. When would you ever enable, like, when would you set it to two, never overcommit? So there are certain... That's got to be a certain use case, right? There are certain very conservative entities that don't want to get in a situation where they need the memory, and because they're out of memory, the oom killer starts and starts killing off their processes. Because if you're overcommitting memory, that's what could potentially be the case. And so there are certain uses where the architects of those applications have said, we're never going to overcommit memory, because I don't want to risk having the oom killer start up and kill off processes on my box. Right. And so another question from Farhan: isn't overcommit somewhat contradicting swappiness? I don't know if it's contradicting, but it definitely affects how much swap would be used, if you even have swap, because a lot of the time it's not there. Swappiness, because I think a lot of people don't know what swappiness actually sets. It's totally possible. Okay. So, swappiness. Swappiness is definitely something that I've tuned for DB servers. Yes. Yes. And in fact, it's, I think, one of the most commonly tuned things. All right. So it's a value between zero and 100, which sets the kernel's affinity for utilizing swap space. So it will always utilize swap space. Right. It's just, how aggressive should it be at filling up that swap space? So at 100, it should try to swap whenever it can, whatever data it can. Right. And at zero, it should try to never swap, as much as possible, but it will still swap. And there's actually been, I think in RHEL, it's either RHEL 6 or RHEL 7, there was a change to swap where zero does not disable it. And that's a misunderstanding a lot. People think, oh, you set it to zero to disable it. That's not true. No, that's not true. It's still enabled. It's just trying not to use it as hard as it can. Yeah. The kernel actually got a commit, I can't remember whether it was in the RHEL 6 or RHEL 7 timeframe, that specifically addresses the value of zero for swappiness, where it says: up until this amount of memory is left, and it's actually a number in bytes, the kernel won't swap. However, that number is very small. And so it was put in there to effectively disable swap space. But that's not what this parameter does.
This parameter sets your affinity for using it. So really it's like setting a rule that says, until you've gotten down to this really tiny amount of free RAM left, don't swap. Don't. But once you hit this really tiny amount, then swap away. Right. So once upon a time, people were like, oh, set it to zero, turn off swapping. No, that's what swapoff is for. The other thing is like, oh, set it to zero, and the kernel will almost never swap. Okay, that's pretty true. However, now, because of that change, it doesn't swap up until this very tiny sliver of RAM is left. And there are now cases where, if memory usage is going up a lot, what'll happen is the kernel will realize that it needs to start swapping because it's crossed that threshold. However, all the memory is now used, which means there's no memory left to actually do the swapping. Right. So it works great in places where memory usage is consistent, or smaller allocations of memory are what's happening on the machine. And the place where it's, like, disaster world is Java applications, or sometimes enterprise database applications, because when they allocate stuff, they allocate huge amounts of memory at a time. And so all of a sudden you cross over that threshold where you need to be like, oh, I need to swap. But you've crossed over it because you've just allocated the last bit of RAM that you had on the system. So you're done. And the system essentially locks up. Now, I think those use cases are extraordinarily rare. But what you'll see is, when someone recommends that you tune swappiness and they want you to be really conservative with it, they will now have you set it to one instead of having you set it to zero. Because at swappiness one, you're, well, pretty darn close to equally unlikely to be using swap space. But there's not this artificial boundary that you have to cross to utilize swap space. That boundary limitation is removed. So you're unlikely to use it unless you're very memory constrained, but there's no actual hard-coded amount of memory that you have to get down to in order to start using swap. All right, there's a ton of questions here. Okay. There's two. All right, from the same person. I shouldn't say who, sorry. How does overcommit affect swappiness? If we set swappiness to a high value and have set the overcommit, would it fail to swap because theoretically all the memory could be committed? Okay, so overcommitment of memory means that processes ask for more than they're going to use. Right. So no, we're not actually using that memory. It's just, we told the process that if it wanted it, because it asked for it, it had it. And so, I'll explain overcommitment like this. When you're dealing with a child, it's like, daddy, I want to go to Disney World. Okay, kiddo, good luck. One day, one day we'll go to Disney World, right? Did you actually commit to a date on which you were attending Disney World? No, but you've told them that one day that'll happen. And so at some point in the future, they're going to be like, are we going to Disney World now? We're doing this now. Right. And then you can decide whether you're going to do it or not do it. And overcommitting memory is the same, right? The process starts up and goes, I need 37 million gigs of RAM. Okay. And you're like, sure. That's a hell of an EC2 bill. But yeah. 37 million gigs of RAM. There you go, process. Have at it. Right. And really, the process then uses, like, the first 400K. Right.
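You can actually watch that promise-versus-use gap on a live box; a rough sketch:

    # VSZ is what the process was promised (virtual size); RSS is what
    # it's actually touching in RAM (resident size)
    ps -o pid,vsz,rss,comm -p $$
    # System-wide: Committed_AS is what's been promised in total, and
    # CommitLimit is how much the kernel is willing to promise
    grep Commit /proc/meminfo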
And that's what you're actually committed at. And at some point in the future, when that process goes, ah, I need my other 400 gig of RAM, blam, and it actually tries to store stuff there, that's when you actually have to deal with that overcommitment. So. Okay. So I don't think there's a relationship between overcommitment and swappiness, because swapping is actual in-use memory being paged out to the swap space. And there's actually a very specific list of things that are eligible for swap. So things like anonymous process data are eligible. Wait. Yes, that is eligible for swap. But things like shared libraries are not eligible for swap, because other things might be using them too. So if you don't have a lot of the eligible things for swap, there's nothing for us to do. Right. All right. Cool. Question answered. Good job. So let's take a look at this Microsoft guide here. Let me grab that. And, Scott, do you want me to share it on my screen, or do you want me to just talk to it? Your call, man. If you want to share it on screen, go ahead. If you don't feel like doing the screen-sharing dance, you don't have to. Yeah, let's go ahead and do it. Okay. All right. One second. Let me... Screen-sharing swapping music. Yay. I do need a theme song for this. I can just start: dun, dun, dun. It's like half Jeopardy, half circus, right? Like, that's what I'm thinking. What are you saying? I'm in the circus of swapping, screen swapping? Sometimes it can be. All right. So I'm just going to go ahead and share the whole screen now. Oh boy. Oh, it happens. Okay. All right. Doesn't look terrifying. Cool. So I think that Microsoft did a pretty good job on this article about how to configure Linux for SQL Server. It starts with things like storage arrays, whether or not to use their partitioning recommendations, file system tuning, but the stuff that we're interested in talking about today, the /proc stuff, is down a little bit further, where it starts talking about, okay, in your tuneD profile, this is what you want to have in there, which I'll sketch out below. And specifically, all this stuff is in /proc/sys. Yeah, this is good stuff right here, right? Right. And so what it's giving you is the directory location and the tunable file, and then what the value should be in that file. And so you'll notice what we were just talking about, right: swappiness is set to one. Yep. Because they want to swap as little as possible, because it's a database, yeah. I remember arguing the zero-and-one thing in the past at previous companies. Yeah, granted. So what they want to do is avoid swapping as much as possible, so they're saying to have the affinity very low for utilizing it, because a lot of the data that is stored by the SQL processes is cached data, which is stored as anonymous process data, which normally would be eligible for swapping. And so if you swapped that data out, now the database process is trying to hit its cache, thinking it's going to be fast, and in reality you've now kicked off this swap-in operation off of a slow storage device, which is not fast, and that causes application problems. Right. Like, any disk is going to be slower than memory by orders of magnitude. So that's why swappiness is important for databases, because you kind of want your whole database to fit in memory. Well, there are some databases, like SAP HANA, where that's absolutely true. All of the database is in memory for SAP HANA. For others, like SQL Server, that's not always the case, but they prefer to have that be the case.
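The profile the article walks you through building comes out roughly this shape. This is a sketch from memory, not a copy-paste of their doc, so treat the file path and the include line as assumptions and go read the article for the real, complete set of values; the sysctl values shown are the ones we just discussed:

    # /etc/tuned/mssql/tuned.conf -- custom tuneD profile, sketched
    [main]
    summary=Tuning for SQL Server on Linux
    include=throughput-performance

    [sysctl]
    vm.swappiness = 1
    vm.dirty_ratio = 80
    vm.dirty_expire_centisecs = 500

You'd then activate it with tuned-adm profile mssql.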
And then there are sometimes where that's just not possible, right? You have this enormous database on disk, but you want the parts that you use a lot to be in memory. You want that to be snappy, yeah. Right, and so swapping is... Like if you're sharding, for example, right? You want to make sure that where you're sharding is appropriate performance-wise. I said shard, don't giggle. Sorry. I know. All right, so the other parameters they're changing here, the vm.dirty ones, they're making changes to how we flush disk writes. And yeah, that's mostly how we flush disk writes, because disk writes are stored as dirty pages in the page cache, and they have a timer on how long they can exist in the page cache before they have to be synced out to the disk. And so here they're saying the ratio of dirty pages needs to be 80%, which is a lot of cached disk writes. And then we will expire those dirty pages after 500 centiseconds, so they can actually persist in memory for five seconds before they're eligible to be written, which in computer time is really long. Long time, yeah. Yeah, and then dirty writeback, I can't remember what writeback is, but if we went into the vm.txt file in the kernel docs, I bet it's in there and we could read about it. So again, they're trying to keep data in memory longer. So yeah, we've got a minute and a half left, and there's a link in here that I think you want to share. Oh, well, there's two. Well, there's two, but... Yeah, so the other one that I'll go to real quick. So I think this is a good example of a tuning doc: they actually go through and tell you why they're asking you to make the changes. The not-so-good tuning doc, and that one's pretty good too. There we go. The not-so-good tuning doc is this one. I already put it in chat. Oh, thank you. Where they go, okay, make these changes. Well, if you actually look at what these changes are, they're talking about the shared memory maximums on the system, right? How much memory can be used for shared memory, as well as how many semaphores, and the size of semaphores, and the number of semaphores you have for inter-process communication. But when you're dealing with memory, and how much memory should be allocated to a certain thing, don't you need to know how much memory is on the system to make those changes? Yeah, you've got to make some math happen there. Right, and so there are a lot, a lot, a lot of tuning guides that'll do things like this, and you want to be really wary of those, because clearly they're not saying, if you have this much memory, you need to set these values to this; if you have this much more memory, you need to set them to something different. They're not accounting for the actual memory that you have on your systems. So how do you know that these are even right for the box that you're deploying? All right, and then, so maybe we need to do a follow-up on this one in a month or two months or something. But you're right, Chris, there was one link that I wanted to talk about before we close out the show. I'll drop it now. So, do you want to do my job? Maybe? Maybe, can you do it better than me? That's the other important thing. So my team is hiring at Red Hat, and we're hiring more technical marketing managers, which is what I do and what Chris does. I do, yeah. And so if you're interested in working at Red Hat, and you think you can do our job, maybe even better than us, please, have at it. Yeah. So. There's a great group of TMMs here.
We're all pretty tightly knit, I think. You know, I mean, we work cross-BU pretty well, I feel like, and in most cases it's just a really fun time, right? Like, you get to be part of, you know, the technical side, bringing solutions to people, and then also part of, like, the larger marketing machine sometimes, which is, you know, things like this, OpenShift TV and OpenShift Commons and all those things. It's good stuff. And webinars and convergence events. Yeah. One day we'll do Summit again. Mm-hmm. Yeah. Oh, speaking of Summit, yeah, please. You know, Summit is going to be an awesome, like, three-part event this year. And we're hoping that third part is somewhat physical. So check out the Summit page. I just dropped a link in chat. If you're interested, sign up to attend. You know, we'd greatly appreciate it if you did. But yeah, Summit is our big annual event, and we're kind of splitting it up for 2021 to make it a little bit more consumable. We don't want to do the full, like, week-long experience virtually, because we know that people are fatigued from everything being virtual, right? Like, oh, I'm meeting with my family on Zoom later. Great, that kind of stuff. Yeah, I actually have a D&D group that I play with, and I had to be like, guys, this is too much like work now. Like, I just can't. Right. Like, I'm in when we go back to in-person, but I just can't right now. Yeah, I've gotten to the point where it's like, yeah, it's cool to do a group call every once in a while, but I'm really not looking forward to sitting in this chair 24/7, right? That's not the goal of me working, right? Like, I want to use Zoom for work and very little else. You don't want your work to be Zoom? Right, right, right, yeah. It's nothing against Zoom, right? Like, it's a great product, but I prefer to have time not on the computer. I hear you. It keeps me sane. All right, great show, man. Oh, thanks. I think that we got a lot of great questions too. So yeah, thank you, audience, for tuning in and sharing the time with us. Indeed. Thanks, everybody. And now, see you in a couple of weeks. Yeah, two weeks from now. Same bat time, same bat channel, unless you go through DST and then things change. It has already happened here, so we're in that wonky period while the rest of the world is turning over again. All right, so that's all for the channel today. Tune in tomorrow morning, first thing. We're going to be talking about storage, talking about live migrations from OpenShift Container Storage 3 to 4, and doing that on the fly. We'll see how well that goes in an hour-long show. Hopefully it'll be okay. You never know with storage, right? Never know. Never know. All right, so thank you, everyone. We appreciate it. Stay safe out there. Thank you, Scott, as always.