Hello, everyone. Yeah, it's already time. Oh, this is the wrong slide — sorry about that. Yes, this is the right one. Hello, everyone. So we are Tejun and Johannes. We're from Facebook, and we want to talk about resource control at Facebook. I want to start with this graph. Imagine a Facebook web server which is running filled to the brim — saturated in terms of CPU, in terms of memory, in terms of disk. This one has a hard disk, too. So it's really filled to the brim. If you imagine a web server like that in a fleet like Facebook's, there's a lot of management activity going on. There's Chef running, which also runs yum from time to time, and cron jobs, and all the monitoring — a lot of infrastructure pieces running on the system. Now imagine one of those support services slowly leaking memory, say 10 megabytes per second. You don't really notice it at first, but it adds up pretty soon, and you're going to see problems. Say you run Chef and somebody wrote a bad script that's leaking memory. It's not an essential part of the system — you want it running, but you don't want your website to go down because of it. So it would be really nice if you could protect your main workload, the web server, from that kind of thing. Now look at the purple line. The y-axis is RPS, requests per second — how many requests the server is processing. That's before we did what we did. The red marker, at 10 megabytes per second, means that we intentionally started a memory leak in one of those support services. Memory use grows, memory runs out, and the whole machine falls over. RPS drops; the machine is crashing now and cannot do anything else. Eventually, after about half an hour, the machine gets rebooted and recovers — it's roughly a 40-minute recovery process.
If this happens at a wider scale — say somebody deployed a Chef recipe that did the same thing everywhere at the same time — that takes down Facebook. That's the whole site going down. The green line is what we did. The green line is resource control: we used it for resource protection, and you can see it kicking in. While RPS drops a bit, the machine holds on. We did that three times, and the system generally survives pretty well. There's minor annoyance — people experience some latency using Facebook, some oncalls get paged — and that's about it. It's not a major event. So this presentation is about going from purple to green, and how we did that. We are the resource control team at Facebook, and this is our motto, our mission statement: work-conserving, full-OS resource isolation. It's a bit of a word salad, so let's unpack it. The first part, work-conserving, means that we want to use the machine and its resources to do work. We don't want to put in resource restrictions in a way that limits the overall utilization of the machine, because that's wasteful. We want control, but we don't want it to be expensive. The second part is full-OS. If you imagine using a VM for resource isolation, it's a lot easier, right? You hard-assign memory, you hard-assign disk capacity, you can pin CPUs — resource-control-wise it's almost trivial. But then you lose all the integration with the host system: you lose access to files on the host, to the other workloads running there. Or you can restrict IO to direct IO; that also makes the problem easier, because you bypass memory management and the page cache. But these things all add up to operational complexity. And so what we wanted to do was
to make resource control transparent to the rest of the system, meaning that applications and users can keep doing whatever they have been doing and whatever they want to do, and we layer resource control on top so that it just works transparently. It's the same system; you put resource control on top of it, and everything works the same except for the added resource control. That's what we wanted to do. Now, imagine setting out to do that from nothing. We needed a project — something simple with which we could demonstrate that this is useful and workable. So we chose a project called fbtax. Inside Facebook, the tax that every machine has to pay to be part of the fleet — the management and monitoring overhead — is called the fbtax. And sometimes, as I said on the first slide, the tax part misbehaves and brings down the system. So our initial project was: it'd be great if we could protect the main workload from the tax part. That became the name of the project. And as you can tell from the 2 in fbtax2, the first attempt just failed miserably. We did something we thought would be effective, and it just made the system more fragile, more brittle, and nobody was happy. So we had to go back to the drawing board and rethink everything. So what was challenging about it? It doesn't sound that difficult, right — how hard can it be? The challenges were in almost all areas. The first was memory. memory.high and memory.max — in cgroup1 terms, that would be memory.limit_in_bytes — are the natural things people think about when they think about controlling memory distribution. What they do is say: you can only have this amount of money. Not money — memory.
Yeah, but memory is money. So the way we'd use that for fbtax would be to put a limit on the support services: you guys can only use, say, 4 or 6 gigabytes, no more. That didn't work that well. Because if you think about it, when you put in restrictions like that, you're lowering overall utilization. When the support services need more memory to run reliably, with this artificial limit in place, they can't get it. And the thing is, on systems running filled to the brim, a lot of resources are nominally oversubscribed over time, so services have to be able to go over such an artificial limit sometimes. So this didn't work. The moment we put in these artificial limits, the systems started failing more, and we had to inch the limit up to the point where it was no longer meaningful. We couldn't lower utilization to protect the workload, because the system was already too busy. The other part was that the kernel OOM killer didn't really work well. The main reason is that the kernel OOM killer kicks in only when the kernel thinks the kernel is in trouble: if the kernel cannot make forward progress, the OOM killer kicks in and kills something so that it can. It doesn't know or care whether your workload, your application, is running well. So the system can easily enter a state where the kernel thinks everything is fine while the application has been thrashing for 20 minutes. That's what you saw on the first slide, too — the recovery takes 30 minutes because for 20 minutes of that time the kernel isn't doing anything; it thinks everything is fine. So that was another challenge on the memory control side. And IO control — IO was hard, too, maybe even more difficult. The first problem was that we didn't have a good IO controller to use.
CFQ didn't really work well with SSDs, and even on hard drives we had a lot of issues in production, so we couldn't really use it. There's also IO throttling — io.max — but that's the same story as memory.high and memory.max: it limits the total amount of IO that can be done, and again, you cannot limit and survive. Another thing is that BPS and IOPS, which is what io.max uses, are just not a good measure of IO capacity. It's really difficult to find a good configuration with those two parameters, because they don't reflect how expensive a given IO stream actually is for the device. Another issue we hit was accounting: the IO controller wasn't accounting for filesystem metadata operations or swap operations. And you kind of need swap — especially with SSDs it makes more and more sense. We weren't accounting for those properly, so they were being charged to the root cgroup, to the system, and that caused a lot of priority inversion issues. Which brings us to priority inversions. For the past couple of years we had been working really hard at this problem, and earlier this year we thought we had now nailed the memory controller and nailed the IO controller — each seemed to be working. Then we put them together and found out that nothing works. Still nothing works. And the main reason was that there were major priority inversions all over the system. Part of it is that the kernel basically assumes that everything in the system can make forward progress — otherwise it locks up. If one process runs out of memory, the kernel's basic assumption is: I have to get that unstuck, otherwise the whole system seizes up. So it throws everything — your priority configuration, whatever — out the window and tries to reclaim memory as hard as it can. Every other configuration becomes meaningless.
But if you take that approach to resource control inside a single system, it doesn't work, because we want to be able to slow down one part of the system really badly while maintaining the health of a more important part. So the kernel can no longer operate on the assumption that everything has to make progress — some things are going to make much faster progress than others. This creeps up in a lot of areas, usually around filesystem and IO operations. For example, ext4 — it can be fixed, but ext4 in its default journaling mode can create hard data dependencies through its journal, so a high-priority cgroup can end up waiting for lower-priority IOs to finish before making progress. You cannot use that and get anywhere. Filesystem metadata IO and swap IO are the same story: you need to make the low/high priority distinction, otherwise low-priority activity can completely wreck high-priority performance and latency. Then there's mmap_sem. mmap_sem is another interesting one: it's the lock that protects a process's structure — the memory layout of a process. One really interesting part is that when you run ps, the command line it shows comes from inside the memory of the target process. So ps has to go into the memory space of that process and read the command line arguments from there, and that requires grabbing mmap_sem, which is a read-write semaphore. It's also a major source of priority inversion, because sometimes you end up issuing IO while holding it. So if a low-priority task holds it while waiting on low-priority IO, and a high-priority task comes in and runs ps, the high-priority task gets stuck — and the whole system can get stuck with it. It just doesn't work. So those were a lot of the challenges.
There are a lot of others too, but those are the big ones. It works now, right? Okay. So, what did we do about all these things? There's a whole laundry list of problems. The first thing we tackled was the memory controller, because that was the first thing we'd configured in the first place. Like TJ mentioned, we had to completely move away from hard partitioning — you get this, you get that — it just doesn't work. Instead we switched to prioritizing the thing that has higher importance. The advantage of prioritizing instead of partitioning is that if the higher-priority thing isn't running, the lower-priority thing can just take all the resources it needs. And by definition that's a lot more forgiving to configure, because the worst that can happen is too much competition. There are no artificial OOM kills, no artificial crashes or anything like that. If the configuration doesn't fully work, we can detect it and adjust it while everything is running. For memory, the way this looks is that we implemented memory.low and memory.min, with memory.low being the primary thing we use. Instead of limiting something, you say: the main workload gets this much memory. If it wants more, it has to compete with everything else that's running — but it's a competition between jobs in which the higher-priority one gets a leg up. So memory.low is best-effort. What we say is: unless there's severe memory pressure, the main workload gets more memory than everything else. And if we're about to run out of memory entirely and are threatened with bringing down the whole machine, that's when we violate the guarantee and just make things work rather than crash. We also implemented memory.min, which is for certain jobs where we would actually prefer an OOM kill over letting the application stop working.
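As an aside for readers: the memory.low / memory.min setup described here goes through the standard cgroup2 file interface. A minimal sketch — the mount point, slice names, and byte values below are illustrative assumptions, not Facebook's actual configuration:

```python
import os

# Typical cgroup2 mount point; an assumption for illustration.
CGROUP_ROOT = "/sys/fs/cgroup"

def set_memory_protection(cgroup, knob, value, root=CGROUP_ROOT):
    """Write a cgroup2 memory protection knob, e.g. memory.low or memory.min."""
    with open(os.path.join(root, cgroup, knob), "w") as f:
        f.write(str(value))

# On a real host this would look like (not run here):
#   soft guarantee: the main workload wins the competition for memory
#   set_memory_protection("workload.slice", "memory.low", 60 * 1024**3)
#   hard guarantee: prefer an OOM kill elsewhere over stalling sshd etc.
#   set_memory_protection("hostcritical.slice", "memory.min", 1 * 1024**3)
```

The values are plain byte counts; because memory.low is only a soft bias on reclaim, an overly generous value degrades into competition rather than OOM kills, which is what makes it forgiving to tune.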
SSH is one example, because if the machine doesn't respond to SSH, it gets forcibly rebooted. We'd rather kill something else than let SSH run out of memory. So that's what we did for memory. We also had no insight into how well our configuration was working. If you have multiple jobs sharing a machine, we couldn't really tell how much the sharing of the host was impacting the throughput and latency of the individual jobs. The kernel has a bunch of statistics and counters, like the page fault rate and reclaim activity, from which you can tell there's maybe a little bit of struggle in a workload, but they don't tell you how much longer your operations take when running shared versus having the whole host to yourself. Literally the only thing you could do was run the application on a machine of its own first to establish a baseline — okay, this is what the workload can do with the entirety of the resources available — then put it into a shared environment and compare against that baseline. That's extremely cumbersome. It's also almost impossible for a lot of our workloads, because they change all the time. We can't say a workload always needs this much resource, because user activity changes — right now the US is waking up, so the load on our web servers is going to increase, and so on. So that doesn't really work. And even if it did, the only thing it tells you is that the workload takes longer in a shared environment; you can't tell where the bottleneck is. You know there's some latency, this took longer to complete, but you only know the time — you don't know why. This is where a thing called PSI comes in, which is a feature we developed in the Linux scheduler.
What it does is annotate all the points in the kernel where we have events associated with a lack of a resource. If the kernel enters page reclaim, for example, we know we ran out of memory and have to do work that isn't advancing the workload, just making up for the lack of memory. There are other events, like waiting for a busy CPU to become available — you're trying to run, everything is busy, so you have to wait. PSI annotates all these events and then aggregates what it measures into a share of total wall time. It gives you a percentage, between 0 and 100%, that tells you how much of your overall runtime the workload is unproductive because of a lack of resources. If it reports 20% memory pressure, for example, that means that during 20% of elapsed time you're not actually doing work — you're waiting for page faults on recently evicted pages, you're waiting for page reclaim. That gives you a measure of how much productivity you're actually losing, and it tells you which resource is the culprit. The whole thing works against a live system, so we don't need controlled baseline runs or anything: if we have a setup with multiple shared workloads, we can tell, for every single workload at any given time, whether it's losing time right now on contended resources. That obviously makes it a lot easier to get our resource control configuration right, because we can instantly tell: this thing has too little memory, this one is waiting on CPU more than it should. And this helped a lot in getting the basic configuration right. We can also use something that I would call being functionally out of a resource. This is kind of tricky.
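For the curious: PSI is exposed in files like /proc/pressure/memory (and per-cgroup as memory.pressure), one line per "some"/"full" state. A minimal parser for that line format might look like this; the sample line is made up:

```python
def parse_psi_line(line):
    """Parse a PSI line such as:
        'some avg10=20.00 avg60=5.12 avg300=1.01 total=123456'
    avg10/avg60/avg300 are the stalled share of wall time (%) over the
    trailing 10s/60s/300s windows; total is cumulative stall time in us.
    """
    state, *fields = line.split()
    values = dict(field.split("=") for field in fields)
    return state, {key: float(value) for key, value in values.items()}

state, mem = parse_psi_line("some avg10=20.00 avg60=5.12 avg300=1.01 total=123456")
# avg10 == 20.0 means: over the last 10 seconds, 20% of wall time was lost
# waiting on memory (refaults, reclaim) instead of doing productive work.
```

A monitoring agent would read the pressure file periodically and compare avg10 against a per-workload tolerance, which is exactly the "functionally out of a resource" idea described next.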
As Tejun mentioned before, the kernel OOM killer kicking in is a very specific event: you're trying to allocate a page, a piece of memory, reclaim is attempted and fails, and the kernel cannot allocate that single piece of memory — that's when the kernel says you're out of memory. But the thing is, even at lower levels of memory pressure, where that isn't happening yet, you might already be functionally out of memory, because you're spending 60% of your time waiting for memory. You're losing most of your production capacity, but the kernel OOM killer says, well, you're still making progress. So what PSI helps with is detecting when you're functionally out of a resource — and that goes for CPU, for memory, and for IO. We can use this for several things. One thing we did was load shedding, where a service goes: the latencies are now too high for every single request, so I'm going to stop accepting requests and let some other machine handle them, in order to avoid thrashing and completely rebooting or hanging for a while. The other thing we did was develop an enforcer of this. That's a project called oomd. It started out as a small Python script that would just go: if you're waiting more than X% on memory, I'm going to kill something. In the meantime it's developed into a much bigger thing — Daniel's here too, and he's going to give a talk about it later — and it really does out-of-resource management for everything: it can monitor IO, it can monitor memory health, all of these things. The point, as Tejun mentioned before, is that the kernel OOM killer doesn't kick in for a long time, so it's not really helpful in managing workload health. And when it does kick in, the only thing it tries to do is keep the kernel running, so it picks the biggest task in the system and just shoots it in the head.
Now, our jobs are complicated. They're multi-process things, their pieces have different priorities, and the kernel OOM killer has no understanding of any of that — it just shoots something and moves on. And when that happens, we actually have no idea what state we're in. We had a couple of services that simply could not continue after the kernel OOM killer had kicked in, so they basically enabled panic_on_oom and just rebooted: if something died and I don't know what died, I'd rather restart the entire machine. This is where oomd comes in. It does two things. First, it makes the distinction between when the kernel thinks you're out of memory and when we think we're functionally out of memory — functionally out of a resource, where our production capacity is no longer adequate. That's completely workload-dependent: some workloads might be fine waiting 20% on memory; others say, I have a little more latency and that's already unacceptable, my SLAs aren't being met. So that's what you configure in oomd: what is my trigger point, what is my tolerance level for health? And if that isn't met, it does the second thing, which is kill something. Again, what to kill — what's important and what isn't — is completely dependent on the system and on the workload, and so is whether that something is a single process or an entire cgroup you want taken out. People have tried to put policy like this inside the kernel over the years, repeatedly, and it just doesn't work: you can't convey all this knowledge — what quality of service means, what the workloads are — to the kernel. That's why oomd sits in user space, where all of that is much easier to configure. Now for the IO controller: as Tejun mentioned, we cannot really know the cost of IO.
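The oomd idea just described — a user-space policy that compares measured pressure against per-workload tolerances and picks its own victim — can be caricatured in a few lines. The rule table, thresholds, and slice names below are illustrative assumptions modeled on the talk, not oomd's real configuration language or kill mechanism:

```python
# Each rule: (monitored cgroup, avg10 memory pressure threshold %, victim cgroup).
# Mirrors the policy described later in the talk: moderate pressure in the
# workload, or high pressure in system.slice, both lead to kills in system.slice.
RULES = [
    ("workload.slice", 20.0, "system.slice"),
    ("system.slice",   60.0, "system.slice"),
]

def pick_victim(pressure, rules=RULES):
    """pressure: dict mapping cgroup name -> current avg10 memory pressure (%).

    Return the cgroup to kill inside, or None if every tolerance is met.
    A real enforcer would then pick a process (or kill the whole cgroup).
    """
    for monitored, threshold, victim in rules:
        if pressure.get(monitored, 0.0) > threshold:
            return victim
    return None

victim = pick_victim({"workload.slice": 5.0, "system.slice": 75.0})
# victim == "system.slice": the tax is hurting itself, so kill inside the tax.
```

The key design point survives even in this caricature: the kill decision is driven by workload health (pressure), not by the kernel's ability to allocate a page.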
A couple of IOPS can already completely fill up a device, or a couple of megabytes per second can fully utilize it if the pattern is really seeky with lots of IOPS. It's hard to say in advance how much an IO is going to cost. On SSDs you might do a simple write and all of a sudden a garbage collection run goes off. It's hard to predict. So instead of trying to use metrics like IOPS and bytes per second and all that, what we do is track completion latency: every time an IO request is submitted, we monitor how long it takes to complete. Then you can configure, per cgroup, what your tolerance level is. You can say: if my IOs take longer than 50 milliseconds, for example, then throttle everything that has a looser guarantee. So if somebody else has a 70-millisecond guarantee, that thing gets throttled — every time it tries to submit IO, it has to wait. Again, this is work-conserving: if the high-priority thing isn't running, the low-priority thing can use the device however it wants. And it works for both hard disks and SSDs. The other thing the IO controller we wrote supports is, as Tejun mentioned, the metadata IO that's shared between all cgroups, and also things like swap. Because if you're doing memory reclaim — you want to allocate some memory, you have to reclaim something, and you decide you need to swap some memory out — that memory might not be yours. So you can't be throttled according to who owns the memory you're swapping out; otherwise you get the good old priority inversions, where you wait on a lower-priority thing. What the controller does instead is a concept called "do first, pay later": during memory reclaim you go full throttle, using all the disk IO you need, but then you charge whoever owned the memory you swapped out.
When that cgroup later tries to allocate, or submits more IO, it gets throttled — it's put on its budget — but the reclaimer keeps moving at full speed, and the priority inversion is avoided. Which brings us to this: a whole laundry list of things. There's really no magic bullet; we just had to go through them one by one and fix them up. We switched to btrfs to avoid the ext4 journaling issue, and then inside btrfs there were a couple of things we had to untangle. For mmap_sem, the biggest offender was readahead: every time you access a file on disk, the kernel doesn't just read the one page you're looking at — since a single IO op can bring in a lot more, it reads ahead a couple hundred K. And if you're really heavily under memory pressure, it's not a good idea to try to allocate a couple hundred K and start IO when you're already maxed out on capacity. So we put a patch in to detect that situation, abort readahead, and do page-by-page IO. That helped a lot, but it didn't fully cover all the situations in which we still got these hangs. So Josef — who's not here, also at Facebook — and I have been working on completely avoiding any kind of IO under mmap_sem, which includes swap IO, page cache IO, and things like writing to a file, which involves filesystem-specific code. It's a matter of slogging through all these paths and patching them up. And then shared IOs, as I mentioned, are "do first, pay later": reclaim goes at full speed, but whoever later tries to allocate — and thereby potentially creates more memory that needs to be swapped — gets throttled instead. Right, so how does this look in practice for us?
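Before the practical results: the completion-latency scheme described a moment ago is what shipped as cgroup2's io.latency controller, and configuring it amounts to writing a per-device latency target into each cgroup. A sketch, with made-up device numbers and targets (the unit — microseconds — and exact file format should be checked against the current cgroup-v2 kernel documentation):

```python
import os

def set_io_latency(cgroup, device, target_us, root="/sys/fs/cgroup"):
    """Write an io.latency completion-latency target for one block device.

    device is a "major:minor" string. When a cgroup misses its target,
    sibling cgroups with looser targets get throttled first.
    """
    with open(os.path.join(root, cgroup, "io.latency"), "w") as f:
        f.write(f"{device} target={target_us}")

# On a real host (not run here): tight targets for the important slices,
# a looser one for system.slice so it is throttled first under congestion.
#   set_io_latency("workload.slice", "8:0", 50_000)
#   set_io_latency("hostcritical.slice", "8:0", 50_000)
#   set_io_latency("system.slice", "8:0", 75_000)
```

Note that nothing here caps bandwidth or IOPS directly — the controller stays work-conserving and only intervenes when a tighter target is actually being missed.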
So the first thing was that we had to switch the ext4 root file systems on our machines to btrfs, and that was actually kind of funny: we brought up in a meeting that we have a priority inversion here and should consider switching to btrfs, and the team that handles these things just took it and ran with it, and after a couple of weeks we had several hundred thousand machines converted fully to btrfs. That made our btrfs developers sweat a little, but it's been pretty solid — we have a seven-digit number of containers all on btrfs, and it's running pretty well. And all the priority inversions are being addressed; all the metadata is annotated and properly charged to handle these inversions. The other thing is that we have swap enabled pretty much everywhere. Some of our workloads are 80-90% anonymous memory, and if you get into trouble with that, there's not a lot of breathing room for the kernel if you don't have swap, because all that memory is effectively mlocked — that's everything you get from malloc, that's tmpfs. If you can't manage these pages — compress them or get them out of memory — you might be running okay one second and completely hit the wall the next, because without the ability to manage memory, the distance between being okay and being out of memory is very small.
There's also always a competition between your filesystem caches and anonymous memory, and if you can't swap, you might be thrashing your filesystem caches, constantly reloading the same data, while there's anonymous memory sitting there completely unused that you could swap out once and be fine — the system would run a lot better. Having swap enables all of this and makes memory utilization much better. The other useful property is that as you get into pressure, instead of hitting a wall, the system just gracefully loses memory bandwidth, and pressure builds up slowly until we hit what I mentioned earlier: our tolerance for lost production capacity. At some point oomd notices — you're spending a lot of time swapping, that's something I recognize as a problem — and kills the misbehaving workload. This is really important, because I think everybody remembers how horrible swap can be: when it's fine, it's fine, and then sometimes the machine just goes out to lunch for minutes or hours. oomd is really the key here, because it can detect exactly when this happens; it gives us all the upsides of swap and kicks in when things go haywire. So we have swap enabled pretty much everywhere except for the main workload — but that's only for right now. It again depends on latency tolerance, and right now the main workload doesn't really tolerate swap IO. The cgroup setup looks like this. We have three major hierarchies. One is the workload itself, obviously, where the web server software sits. Then we have system.slice, which holds all the package management, the remote control of the system, the monitoring, the logging. And then we have something kind of in between called hostcritical.slice, which is neither the workload nor the lower-priority stuff — because if those binaries die, the whole machine goes down. Something like sshd: we detect and
reboot the machine if it goes away. We also want reliable logging, and we want oomd to work at all times, because that saves our asses constantly. So these are the setups. hostcritical is the only one with a hard guarantee, because we'd rather have the OOM killer go off at the host level than have any of these unable to make forward progress. The workload gets the majority of the memory on these machines as a soft guarantee, set with memory.low. And then you can see the io.latency guarantees: workload and hostcritical have the highest priority, and system.slice is below that, because it's less important. oomd we configure so that we primarily protect the workload from whatever is going on in the lower-priority system maintenance stuff, and we also protect the system from itself: if there's mid-level pressure in the workload, we kill something in system.slice; if there's high pressure in system.slice, we kill something in system.slice. We have triggers for IO as well, to make sure the workload gets the IO bandwidth it needs to function, and we monitor swap right now, to make sure that if we run out of swap quickly, we also kill something. So these are the results we're seeing. Let me see — yes, this is the first one. These are the hard drive machines: web servers running fully saturated. A 10-megabyte-per-second memory leak is started in the management part of the system. On the green line — the new, fbtax2-protected one — it gets started three times. RPS drops a little, but oomd kicks in; the memory protection and all the patches do their work, so they protect the workload while penalizing the memory leak, and then oomd eventually recognizes that the system is not in a healthy state, goes in, and kills the memory bomb. And as a
comparison, the baseline RPS just drops — the system just checks out. That straight line isn't numbers being reported; the system completely checked out and stopped reporting anything, so the grapher just drew a straight line, and the system got rebooted and recovered after about half an hour. This next one is a somewhat faster memory leak, 50 megabytes per second: about the same story, but the dips are deeper and the timeline more compressed. And this is 100 megabytes per second: even deeper dips, even faster timing, but the same thing. One really interesting thing in these graphs — I'll come back to it later — is that the top part, the green and purple lines, is the RPS, the resulting performance of the system, and the small graphs at the bottom are memory pressure numbers. It's a lot clearer in the SSD cases, so I'll get to that later. And this is IO protection kicking in. We started untarring a kernel package three times, touching the files and removing them, in the management portion of the system — simulating yum going haywire or something like that. The interesting thing is that without the protection, the dips are deeper, but it also ends faster: the untarring — the bomb workload — finishes faster because it consumes more resources. That's what those green and purple bars express. The green bar shows that on the protected host, the bomb takes longer, because its resources are being controlled, but the main workload also suffers less. And one interesting thing here is that even with everything protected, you still see these dips, especially on the
Especially on the 100-megabyte-per-second run, there's a fairly clear dip. And the reason for that is that these machines are nominally oversubscribed, so we cannot protect the workload too hard: system management actually needs to steal some capacity temporarily from the workload for the system to stay healthy, because being oversubscribed is normally more efficient. That's why you see those dips.

Going to the SSDs, it gets a lot better. It's the same test, but on SSDs, and if you look at the green lines, they barely dip. They still dip a little, because the kernel needs to figure out what's working set and what's not, but those are small dips. Nobody cares, nobody would even get paged over that; it's completely fine, everybody can sleep peacefully. So basically the same story: unprotected, the system just dies, only a couple of times faster, because SSDs are awesome. But we still don't want that kind of dip.

This one is less clear, but look at the bottom: there are really small peaks here and higher peaks there. One really interesting thing is the blue line, with the medium peak. That's the memory pressure in system.slice, the pressure that the memory bomb and the rest of the management things are experiencing. The tiny bump here is the memory pressure that workload.slice is experiencing. So what's happening is that the memory controller kicked in and made sure that the workload is getting enough memory while punishing system.slice. It's biasing the memory pressure that way to protect the workload. If you look at the green and orange lines, they are still a little bit different, because their behavior differs.
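The biasing described here comes from cgroup2's memory.low: usage protected below that threshold is reclaimed only after unprotected siblings like system.slice have been squeezed, and each group's memory.pressure file exposes the per-cgroup PSI numbers plotted in these graphs. A minimal sketch of that setup, again using a temp directory in place of the real /sys/fs/cgroup mount; the 38G figure is an example value, and the pressure line is a captured sample, since the mock hierarchy has no kernel behind it.

```shell
# Temp directory stands in for /sys/fs/cgroup; on a real host the kernel
# creates these files when the memory controller is enabled.
CGROOT=$(mktemp -d)
mkdir -p "$CGROOT/workload.slice" "$CGROOT/system.slice"

# Protect the main workload: reclaim hits it only after unprotected
# siblings. 38G is an example value, not the talk's configuration.
echo 38G > "$CGROOT/workload.slice/memory.low"
echo 0   > "$CGROOT/system.slice/memory.low"

# memory.pressure follows the PSI format. Parse the 10-second "some"
# average out of a sample line: the share of time at least one task
# in the group was stalled on memory.
sample="some avg10=12.31 avg60=8.24 avg300=5.02 total=215372014"
avg10=${sample#some avg10=}; avg10=${avg10%% *}
echo "system.slice memory pressure (some, avg10): ${avg10}%"
```

With this in place, a leak in system.slice drives that group's pressure up, the blue line in the graphs, while workload.slice barely registers a bump.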
Those are the system and workload pressure for the unprotected system, and there the pressure just rises together and the system crashes, because nothing is being protected. So yep, those are the results.

Now, this might not be as exciting as "hey, we succeeded at stacking multiple workloads and they all work perfectly". We are not there yet. But this does demonstrate that we can now protect, or isolate, resources comprehensively while maintaining full OS features.

If you think about it, if you can protect your main workload from the system management stuff, nothing prevents you from loading a side workload into system.slice, and the main workload would be just happy, because it's fully protected. So that's what we are going to try next, pretty soon. We're going to run some batch workload next to our main workload, and we'll try to make sure the main workload is not disturbed at all. That would be our next step towards workload stacking. And eventually, we still need some features, especially on the I/O side, but we will start experimenting with putting heterogeneous workloads on a single system while fully controlling how resources are used across them, without losing utilization. So that's what's on our to-do list.

All right. Most of these features are already upstream; some of them are not yet, but in the next kernel release, or the one after that, everything should be upstream, so you can use it the same way we use it. And all of this is documented on this website. You actually don't need the hashtag now, we dropped that. I mean, it still works with the hashtag, but it looks
kind of worse. You can go to opensource.fb.com/linux, and it has these icons: cgroup2, PSI, BPF, and all these things. You can go to each project's page, and there's fairly good quality documentation, so you can learn more details about them. All right, so that was it. Any questions? We have time. Okay.