Good afternoon. My name is Fenghua Yu. I'm going to talk about resource allocation through Intel Resource Director Technology, RDT. I work for Intel OTC as a kernel developer, so this talk is mainly focused on the kernel side. Intel recently introduced RDT, and it has two parts: a monitoring part and an allocation part. In the monitoring part, we can monitor cache occupancy and memory bandwidth; that's CMT, Cache Monitoring Technology, and MBM, Memory Bandwidth Monitoring. In the allocation part, we can allocate cache; that's CAT, Cache Allocation Technology. Later on, we may allocate more resources. And we also have CDP, Code and Data Prioritization. Basically, the monitoring part passively monitors resource usage to identify QoS and performance issues, and the allocation part actively allocates resources to achieve better QoS and performance. In this presentation we mainly talk about the allocation part; the monitoring part is another topic.

So, resource allocation. What's the problem? RDT covers several resources, as I mentioned, but the first implementation is cache allocation only, so we mainly talk about cache allocation. What's the problem with cache sharing? This is a long-standing issue; really, we have had this issue since the first shared cache was introduced. Say we have a high priority process and a low priority process. If they run together, they share the L3, the last level cache. The low priority process does not know that a high priority process is also running on the shared L3, so it can evict some L3 cache lines. When the high priority process later accesses such a line, it takes a cache miss and has to go to memory, which takes something like 200 or 300 cycles. That causes QoS issues. This slide lists some cases, but there are more. For example, in real time, this can increase IRQ latency, because the IRQ handler may hit such a cache miss and take the long path all the way to memory to get the line back. That results in a very long IRQ latency, which is not what we want for real time. And more recently with containers, we may see lower container throughput. Say one container has high priority and another has low priority: it's the same thing. The low priority container evicts the cache lines, the high priority container has to re-fetch them with long latency, and the end result is very low throughput.

So those are some problems with the current shared cache. What's the solution? The solution is very simple: don't share. But for at least the past 23 years, L3 has been shared, so if we don't want to share it, we need to introduce some new technology. That's what we're going to talk about: RDT cache allocation, CAT. If we don't share the cache, then the low priority and high priority processes don't touch the same cache lines: the low priority process accesses one allocated portion of the cache, and the high priority process accesses another allocated portion. They're not sharing; that's isolation. By doing this, we can speed up the high priority process, get lower IRQ latency in real time, and get higher throughput in containers.
All of these benefits come from this Intel RDT technology. The first implementation of this resource allocation technology is cache allocation, and the first cache allocation, CAT, is for L3. L3 CAT was first introduced on the Haswell server, then on the Broadwell server, and currently on the Skylake server, and it will carry on in future servers. L2 CAT is going to be released in future x86 processors; actually, some platforms out there may already have it. And L3 you may definitely have: the Haswell, Broadwell, and Skylake servers support L3 CAT already. Then CDP is an extension of CAT. Instead of treating code and data cache lines together, it can separate the instruction cache from the data cache: with the same class ID, one portion of L3 is used for the instruction side and another part of L3 is used for the data side. That's the CDP extension to CAT.

This is the L3 CAT hardware architecture. On the left side is the traditional, normal cache hierarchy: first level L1, second level L2, last level L3. Very typical. L1 is usually not shared; it belongs to one physical core. L2 is shared by the two threads on the same core when hyper-threading is enabled; if hyper-threading is not enabled, L2 is used by only one core, one physical processor. L3 is shared by all cores on the same socket. On top of this hierarchy, RDT adds some new hardware. On the right side is what's called the L3 QoS mask array. This is an array of MSR registers, indexed by class ID (CLOSID). Each element in the array is a CBM, a cache bit mask, with indices from 0 up to 15, so up to 16 class IDs. The CBM specifies which portion of L3 can be allocated to whichever CPU or PID carries that class ID. The QoS mask MSR array is per socket; each socket has one array. Then on each CPU there is the PQR MSR; it's per logical CPU, per thread, not per core. It has two fields. One is the RMID, the resource monitoring ID; that's for the monitoring part, which we don't cover here. The other field is the CLOSID, the class of service ID; that's what we focus on. If software specifies this class ID, then while that software is running on, say, CPU 1, the CPU always uses this class ID unless you change it. The class ID is the index into the QoS mask array: the hardware uses it to look up the CBM, and the CBM tells the hardware which portion of the L3 can be used. That's the workflow of this hardware architecture. Question from the audience: is the PQR MSR per thread? Yes, it's per thread.
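To make that MSR layout concrete, here is a minimal user-space sketch using the /dev/cpu/*/msr interface. This is only an illustration under my own assumptions: in practice the kernel owns these registers, the MSR addresses are the ones I know from the SDM, and error handling is trimmed.

```c
/* Sketch: program one L3 CBM and point a CPU at it via IA32_PQR_ASSOC.
 * MSR addresses per the Intel SDM; assumes the msr kernel module is
 * loaded (/dev/cpu/N/msr) and we run as root. Illustration only --
 * in a real system the kernel manages these registers. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_PQR_ASSOC  0xC8F  /* bits 9:0 RMID, bits 63:32 CLOSID */
#define IA32_L3_MASK_0  0xC90  /* L3 QoS mask array: 0xC90 + CLOSID */

static int wrmsr_on_cpu(int cpu, uint32_t msr, uint64_t val)
{
    char path[64];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    int ok = pwrite(fd, &val, sizeof(val), msr) == sizeof(val);
    close(fd);
    return ok ? 0 : -1;
}

int main(void)
{
    /* CLOSID 1 gets the low 4 ways of a 20-bit CBM (bits must be
     * contiguous). The mask array is per socket, so writing it via
     * CPU 0 programs the array of CPU 0's socket. */
    wrmsr_on_cpu(0, IA32_L3_MASK_0 + 1, 0x0000F);

    /* Tell CPU 0 to run with CLOSID 1 (RMID left as 0). */
    wrmsr_on_cpu(0, IA32_PQR_ASSOC, (uint64_t)1 << 32);
    return 0;
}
```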
A slightly more complex situation is when we have both L2 and L3, so let's continue with multi-resource allocation; so far it was only the single L3 resource. Right now there is no such hardware, but we can expect that in future processors more resources can be allocated, not just L3. So suppose both L2 and L3 are allocatable. For L3, on the bottom right corner, it's the same as before: the L3 QoS mask MSR array, which controls L3 and is per socket. Then there is a new array, the L2 QoS mask MSR array, which is per core: each core's L2 has its own array, and that controls the L2. Each CBM in it controls which part of that L2 will be allocated for the current class ID. The class ID itself still lives in the PQR MSR, which is per thread with hyper-threading; CPU 0 has its own PQR MSR, and so on. This is the class ID the software runs with, the same class ID for both resources. For example, say the class ID is 1. That 1 indexes the L2 QoS mask array and it also indexes the L3 QoS mask array. When the hardware reads the L2 array, it gets the L2's CBM1, which could be one mask for L2; with the same class ID 1 pointing into the L3 mask array, it reads CBM1 there, which could be another CBM that controls the L3. So eventually, the reality is: once we set this up in hardware, in the kernel, and in user space, the running CPU context carries one specific class ID unless you change it. Whenever software accesses the cache, the hardware gets CBM1 from the L2 QoS MSR array to know which portion of L2 is allocated for CPU 1, and the same for L3. To be accurate, I believe the hardware caches the CBM rather than reading the MSR every time, but logically that's what happens. The point is it's the same class ID, not separate class IDs: one class ID is used for the L2 allocation and also for the L3 allocation.
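Before the kernel can use any of this it has to discover it, and discovery is through CPUID leaf 0x10. A minimal sketch of that enumeration, with the field layout as described in the SDM:

```c
/* Sketch: enumerate L3 CAT via CPUID, per the Intel SDM.
 * Leaf 0x10 subleaf 0: EBX bit 1 set => L3 CAT available.
 * Leaf 0x10 subleaf 1 (L3): EAX[4:0] = CBM length - 1,
 * EDX[15:0] = highest CLOSID. Build with GCC/clang on x86. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(0x10, 0, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 0x10 not supported");
        return 1;
    }
    if (!(ebx & (1 << 1))) {
        puts("L3 CAT not supported");
        return 1;
    }
    __get_cpuid_count(0x10, 1, &eax, &ebx, &ecx, &edx);
    printf("L3 CAT: CBM length = %u bits, class IDs = %u\n",
           (eax & 0x1f) + 1, (edx & 0xffff) + 1);
    return 0;
}
```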
So just now we talked about the hardware architecture. Then we need to enable this feature in the Linux kernel. On one side of this slide there is just the enumeration and enabling from CPUID, as sketched above; that's the normal case. The yellow box is the user interface; there has been a lot of debate on this user interface and how to do it, and we're going to cover that in the next slides. On this slide I mainly cover the blue boxes, the kernel workflow: how the class ID, the allocated schemata, and the task struct are used. The class ID and the allocated cache are applied during the context switch. So each task, whenever it does a context switch, gets a chance to change the class ID. Of course, you can change it at any time; you can use the wrmsr tool to change it whenever you like. But if you do that, it's not well controlled; it's out of control. So what we do in the Linux kernel is do this during the context switch: when we switch to a process, it has a chance to change the class ID, and thereby the allocated L2 and L3, for the running context.

When we switch to a process, first we want to know which class ID to use for this process. There are two sources. One is the task struct itself, which may carry the class ID. That comes from the schemata: the user interface can specify that a PID should use one allocated portion of cache, and that allocated portion is eventually translated to a class ID, which is saved in the task struct. If the task struct has this information, we read it from the task. If it does not, the CPU has its own default class ID, stored in a per-CPU data structure. So we have two places to store the class ID: the per-task structure if it has one, otherwise the per-CPU structure.

After we get the class ID, one more thing: we don't need to update the class ID in hardware every time, because writing it is time consuming. Some people measured around 1,000 cycles just to write this MSR, the PQR MSR that holds the class ID; what I measured was more like 300 cycles. Either way, we don't want to write it on every context switch. To optimize this, we cache the currently running class ID in the kernel. Whenever we read the new class ID for the incoming process from the per-task or per-CPU structure, we compare it with the cached current class ID. If they're the same, we don't need to write the MSR; we just keep using the current class ID. Otherwise, we take the long path and write the MSR, which costs those 200 or 300 cycles. After we set up the class ID in the PQR MSR, the class ID points to the CBM in the QoS mask MSR, and we start running the process. The process runs with that class ID and CBM, so it uses the allocated portion of L3 during its runtime. So that's the workflow in the Linux kernel.
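A simplified sketch of that sched-in path follows. The helper and field names here (rdt_cpu_state, next->closid, and so on) are my illustration, not the exact upstream code; only the idea of caching the last-written CLOSID is what the talk describes.

```c
/* Sketch of the context-switch hot path just described: pick the
 * task's CLOSID if it has one, else the CPU default, and only write
 * IA32_PQR_ASSOC when the value actually changes. */
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <asm/msr.h>

#define IA32_PQR_ASSOC 0xC8F

struct rdt_cpu_state {
    u32 default_closid;  /* per-CPU default */
    u32 current_closid;  /* cached value last written to the MSR */
};

static DEFINE_PER_CPU(struct rdt_cpu_state, rdt_state);

static void rdt_sched_in(struct task_struct *next)
{
    struct rdt_cpu_state *s = this_cpu_ptr(&rdt_state);
    /* 'closid' as a task_struct field is illustrative here. */
    u32 closid = next->closid ? next->closid : s->default_closid;

    if (closid == s->current_closid)
        return;                 /* fast path: no MSR write needed */

    s->current_closid = closid;
    /* CLOSID lives in bits 63:32 of IA32_PQR_ASSOC; the RMID
     * (bits 9:0) belongs to monitoring and is left 0 in this sketch. */
    wrmsr(IA32_PQR_ASSOC, 0, closid);
}
```

The fast path is the common case: most context switches stay within the same class of service, so the few-hundred-cycle MSR write is skipped.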
So now let's talk about the user interface. In the past, if you remember or followed this part, last year Intel released some patches trying to enable this feature in the Linux kernel, and it failed; it never went upstream. The reason is that we used a cgroup interface, because it seemed obvious: a lot of containers use cgroups, and a lot of other features use cgroups for this kind of control, so naturally we wanted to use a cgroup to do the same thing, to allocate the L3 cache. But eventually the maintainers didn't like it, because there are some limitations. For example, you cannot allocate cache for a kernel thread: in the cgroup kernel code you can see that if a task is flagged as a kernel thread, the code just returns; you cannot move a kernel thread into any cgroup. That's one limitation. Another limitation is the hierarchy: you cannot restrict it, since cgroups are recursive and you can create many levels of sub-directories. That's not what we want; we only want one level of control group. And also, last year's cgroup interface controlled a class ID across the whole platform; it wasn't per socket.

So with those kinds of issues, we eventually gave up on cgroup and created another file system, called the resource control file system, resctrl. It's kind of parallel with cgroup, mounted under /sys/fs/resctrl. In this file system we have a directory, and under the directory there are four entries. One is the info directory, which is read-only and contains all the information: the max class ID, the max CBM length, what the domains are, how to allocate, what the cache hierarchy looks like, those kinds of things. Another file is tasks, which holds PIDs; the root directory's tasks file has all the PIDs, which means all PIDs use the default schemata. The third file is cpus; at the beginning it's all ones. It's a CPU mask: the user writes this mask to say which CPUs use the schemata. The last one is schemata, and it's plural on purpose: it's not just one resource, there could be L3 and L2, in the future multiple resources, and even one L3 may span multiple sockets. So there are multiple schemata in one file. The user specifies which portion of L3 is allocated in the schemata, which CPUs use it, and which tasks use it.

When the user mounts the resctrl file system, they see those four entries: info (actually a directory), tasks, cpus, and schemata. tasks has all PIDs, cpus has all ones, and schemata has all ones. So right after mount, the meaning of the file system is: all CPUs and all tasks use all resources. That's the initial state. Later on, the user can change all of these fields. For example, they can make a sub-directory: if they want to create more partitions of one L3, rather than just use all of the L3 by default, they mkdir a new directory; in my example here the partition is called p1. Immediately, this new directory contains the three files: tasks, cpus, and schemata. In the initial state, tasks is empty, cpus is all zeros, and schemata is all ones, which means this partition is not yet used by any CPUs or tasks. Then the user can control this partition. For example, they can move a PID, say PID 1234, into p1: "move" means writing the PID into the sub-directory's tasks file, after which that task uses the new schemata. And you also change the schemata itself. If you look at LKML, I sent the patch and the documentation describing how to write the schemata; there's a very detailed format. But at the top level it's very simple: the schemata just lets the user tell the kernel which portion of L3 on which socket. So: this PID uses this schemata. Once the user tells the kernel that, then when the kernel switches to this PID, it uses this schemata, which means the kernel finds the class ID for this portion of L3 on this socket. In the implementation, the class ID points to this schemata. So that's the user interface.
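Putting that together, here is a small C sketch of the sequence. The paths and the schemata line follow the format documented in the patch series; the exact mask values and the partition name p1 are just examples I made up for illustration.

```c
/* Sketch: create a resctrl partition, give it part of L3, and move a
 * PID into it. Run as root with resctrl mounted at /sys/fs/resctrl.
 * The masks (low 8 ways on cache id 0, the full 20-bit mask on
 * cache id 1) are illustrative. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

static int write_str(const char *path, const char *s)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(s, f) >= 0;
    fclose(f);
    return ok ? 0 : -1;
}

int main(void)
{
    /* New partition: shows up with empty tasks, zero cpus,
     * all-ones schemata. */
    mkdir("/sys/fs/resctrl/p1", 0755);

    /* Which portion of L3 on which socket, per cache id. */
    write_str("/sys/fs/resctrl/p1/schemata", "L3:0=000ff;1=fffff\n");

    /* Move PID 1234 into the partition: from its next context switch
     * it runs with p1's class ID and therefore p1's CBM. */
    write_str("/sys/fs/resctrl/p1/tasks", "1234\n");
    return 0;
}
```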
As for the status of this interface: I sent out version one last month, and I'm going to send out version two this month. Actually, I tried to send out version two before this conference, but there were still some things I hadn't finished, so I'll probably send it out next week. It's just some coding changes; there are no big user interface changes, except that Thomas mentioned cpus should have lower priority than tasks. Apart from that small change, there are no other user interface changes. At least from the community, I think the reception is better than last year. Last year we had the cgroup interface and the cgroup maintainer didn't like it at all, so we gave up. This time, at least so far, there are no big objections to the user interface from the community. And I've talked to other people, like Google, OpenStack people, and Docker container people; they don't object to this kind of user interface either. So far, no big objections.

I have some usage cases. For example, OpenStack: we have the user space, the kernel space, and the hardware. From a high level, for OpenStack we need to change libvirt. libvirt is not part of the kernel; what I talked about so far is just the kernel part, where the kernel provides the user interface. For OpenStack, the OpenStack side should modify libvirt to use this interface. libvirt just uses the resctrl file system to say which PID, basically which guest, uses which schemata, and the schemata says which portion of L3. If they do that, then at the higher level OpenStack users get another interface: basically, the OpenStack system admin specifies which portion of L3 is used by which guest. That higher level information is translated down to libvirt, libvirt passes it to the kernel resctrl interface, and the kernel sets up the class ID, the mapping between class ID and CBM, and which part of L3 is used by which guest. So when guest one starts, for example, it uses its dedicated allocated portion of L3, and the same for guest two. That's the workflow for OpenStack.

For containers it's a similar thing, with Docker and libcontainer. I didn't modify libcontainer myself; the Docker container people would change libcontainer to use the same kernel resctrl user interface: change the tasks, change the cpus and the schemata, and set up different partitions. Then a Docker system admin can allocate some L3 cache for a container; in this example, container one uses one portion of L3 and container two uses another portion. Now, I've talked to some Docker and container people, and because this is new, they're not working on it yet; usually they don't care which part of L3 is used by which container. But in some scenarios I think they would agree it matters. For example, if you have a high priority container one and a low priority container two, you don't want the noisy neighbor problem.
Like the problems we talked about earlier: you don't want container one to be running with some cache lines in L3 while container two shares and evicts those same lines, so that container one suddenly takes cache misses; that's very bad QoS. For example, think about a stock trading system. One container is doing very high priority trading work: it monitors current stock price changes, finds an opportunity, does the calculation, and immediately tries to place a trade. That's very high priority. Container two is low priority: it just does some input or monitoring, some data updates, some routine maintenance; low priority batch tasks. They don't need to share the same L3. Container two can get a very small portion of L3 cache, while container one gets a very large portion of L3, and it's isolated; you don't want the noisy neighbor situation to happen. In this case the high priority container directly uses the larger portion of L3, and gets lower latency and better performance.

The same thing goes for real time. A real-time setup can use the same kernel resctrl user interface: the system launches real-time applications P1 and P2, but before that, it specifies which portion of L3 is used by P1 and which portion by P2. By doing this, P1 and P2 end up with different portions of L3, so there's no noisy neighbor, and latency is better for real time.

Right now, we are also developing a resource control tool, written in Python, which gives a nicer user interface. It runs in user space. The user, usually root or a sysadmin, can drag things around. This screen is very small, but the screenshot is from a Broadwell machine with two L3s; at the bottom you see L3 0 and L3 1. The sysadmin can drag which portion of L3 is used by which PID. From the high level user's point of view, there is no class ID or CBM; all the sysadmin wants to say is that some percentage of L3 is used by some PID, and that goes through the resctrl user interface underneath. It's very high level information; there's no CBM. When we talked about this at LinuxCon Japan last month, some feedback from the community was that they want a very generic user interface in the kernel: for example, the CBM is probably x86 specific, and in the future ARM may use something different. What we think is that the kernel user interface should stay very simple and x86 specific, and the higher level hides the difference: if a user doesn't want to know about CBMs, or wants the same kind of interface on ARM or AMD, which may have a different CBM, a different term, or a different implementation (they don't have this kind of technology yet, but they may), then the layer above the kernel interface absorbs that difference. The high level user doesn't see the difference; all they care about is how much L3, what percentage of L3, is used by which PID.
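That translation from a percentage down to a hardware bit mask is straightforward; here is a hedged sketch of what such a tool might do before writing the schemata file. This is my own illustration, not the actual tool's code (the tool itself is in Python), and percent_to_cbm is a hypothetical helper.

```c
/* Sketch: turn "give this partition N percent of L3" into a
 * contiguous CBM. cbm_len comes from the resctrl info directory
 * (20 on Broadwell, so the granularity is 1/20 = 5%). */
#include <stdint.h>
#include <stdio.h>

static uint32_t percent_to_cbm(unsigned int percent, unsigned int cbm_len)
{
    /* Round up to whole portions; the hardware requires at least one
     * bit set, and the set bits must be contiguous. */
    unsigned int bits = (percent * cbm_len + 99) / 100;
    if (bits == 0)
        bits = 1;
    if (bits > cbm_len)
        bits = cbm_len;
    return (uint32_t)((1ULL << bits) - 1);  /* contiguous low bits */
}

int main(void)
{
    /* 30% of a 20-bit CBM rounds up to 6 portions -> 0x0003f. */
    printf("L3:0=%05x\n", percent_to_cbm(30, 20));
    return 0;
}
```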
So this resource allocation tool is just a demonstration of how to use CAT from the user's point of view, though eventually it could be used by sysadmins in the real world. Right now it's just a demo; I only have this screenshot. It's not open source yet, but we are going to open source the code.

Question from the audience: is each L3 a socket? Yes, L3 is per socket; on this Broadwell machine, L3 0 and L3 1 correspond to the sockets. But this is worth explaining, because it's a subtle thing. In the past, L3 has always been per socket, but architecturally speaking it doesn't have to be. For example, you can imagine one socket with two L3s: one L3 shared by half the cores on the socket, and another L3 shared by the other half. That's doable; there's no such implementation so far, but if you read the SDM, it's allowed. That's why we created the cache ID. The cache ID just identifies which L3, which L2: the cache level plus the cache ID uniquely identifies a cache. That's why I wrote some patches for it; they're not upstream yet, but they are part of this cache allocation series: the first four patches are actually the cache ID patches. We need this, because otherwise there's no way to identify a cache. There are really two separate topologies: the processor topology and the cache topology. In past implementations, L3 happened to be per socket, so the two coincided, but architecturally they are not the same; one socket could have two or three L3s. So cache level plus cache ID uniquely identifies one cache, and if you look into the details of the user interface, you need to identify which cache you want to allocate on, which L3. That's why the screenshot shows L3 0 and L3 1: on this Broadwell, the cache IDs happen to be 0 and 1. Part of this is upstream already, from a long time ago: the cache topology under /sys/devices/system/cpu/cpu*/cache/index*/. But there was no cache ID there, so you couldn't say which cache you were talking about. We created a cache ID for each cache leaf, so that the level plus the cache ID uniquely identifies one leaf.
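For example, once those patches are in, reading the identity back out of sysfs is trivial; a sketch (the id attribute is what the patches add, and assuming index3 is the L3 leaf, which a robust tool would verify via the level file rather than assume):

```c
/* Sketch: identify the L3 that CPU 0 belongs to by combining the
 * cache level with the cache id attribute just described. */
#include <stdio.h>

static int read_int(const char *path)
{
    int v = -1;
    FILE *f = fopen(path, "r");
    if (f) {
        fscanf(f, "%d", &v);
        fclose(f);
    }
    return v;
}

int main(void)
{
    const char *base = "/sys/devices/system/cpu/cpu0/cache/index3";
    char path[128];
    int level, id;

    snprintf(path, sizeof(path), "%s/level", base);
    level = read_int(path);
    snprintf(path, sizeof(path), "%s/id", base);
    id = read_int(path);

    /* level + id is the unique name, e.g. "L3:0" in the schemata. */
    printf("cpu0 is in L%d cache id %d\n", level, id);
    return 0;
}
```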
Question from the audience about granularity: that's a good question. Actually, I asked another guy, Sai, about this, because I designed the tool and Sai is doing the implementation. What I asked him relates to the max CBM length: that's 20 bits on Broadwell, which means the granularity at which you can allocate cache is one out of twenty, so 5% of the L3. I asked him to draw the 20 columns, actually 18 gridlines, 20 portions in total marked with vertical lines, to tell the user that all you get is 5% granularity: whenever you allocate cache, it's a minimum of 5%, then 10%, 15%, and so on. He hasn't drawn that yet. So the granularity is 5% on Broadwell; Skylake is the same, but it could be different on other platforms.

Okay, now let's talk about performance improvement. If you follow CAT, you can see it actually gives a lot of performance improvement. This experiment is from UC Berkeley; they ran it on Haswell. They ran some network workloads, each workload running in one guest virtual machine. This screen shows the case where you run many workloads together on Haswell. There's a slowdown, and that slowdown comes from many places: because it's a virtual machine, and because of the shared cache. Many things slow it down compared to running just one workload natively. But when they used CAT, the slowdown changed dramatically. For example, take the LPM workload: its slowdown is 37.6% compared to native, when running together with multiple workloads, where each one shares the L3 and is a noisy neighbor to the others, with lots of conflicts in L3. But with CAT, each virtual machine runs on a dedicated portion of L3: no noisy neighbors, no L3 conflicts, each one runs on its own dedicated portion. Then the slowdown changes dramatically: for LPM it's only 1.9%. That's a very impressive performance improvement, because there are no noisy neighbors and no L3 conflicts. The remaining 1.9% comes from other things, like the virtual machine itself, and they still share the same network, so there's a little slowdown left. But looking at the L3 change alone, using CAT by itself, the slowdown went from 37.6% to 1.9%.

This next one is a performance measurement done by Intel. They ran SPEC CPU2006, which has multiple workloads. Take one workload and compare running it alone against running four copies of workloads together, because we want noisy neighbors: one workload runs while the other three are noisy neighbors, so they have L3 conflicts. Then use CAT to allocate a dedicated portion of L3 cache to each copy of the workload. The Y axis is again the slowdown number. On the left side, the slowdown is very high for each workload; on the right side, with CAT and dedicated L3, you can see the slowdown is dramatically smaller. There are a lot of other performance results too. Yesterday, Google showed results using CAT with our patches for real time: many times lower latency, just from using CAT. I didn't copy that slide here, but you can see CAT makes a dramatic change for real time.
The current status: version one was released last month, and I'm going to release version two with some coding fixes. The user interface changes are very small, almost none, except the tasks one: tasks should have higher priority than cpus. That's it. And these are the references. Some people helped me with this: Tony, Peter, Ravi, Vikas, and Sai. They helped me finish this presentation and gave user interface comments, and thanks also to the community for commenting on the first version of the patches.

Now, Q&A. Question from the audience: can the allocated portions overlap? That's a good question. There are a lot of implementation details here, and overlap is one of them: yes, they can overlap. Actually, these two slides show overlap. For example, on L3 0, the green, yellow, and blue portions all overlap; they share some of the same part. It's doable; the hardware does not prevent or block it, and in software we allow it. But in reality, if you do this, you need some knowledge of why you're doing it: the system admin should understand how the overlap impacts performance, because after all, those three PIDs share that portion of L3, even though the other parts are not shared. It's still useful. For example, I can imagine a case where you care about noisy neighbors, but not that much, and you can still tolerate some L3 conflict; then you can use overlap. And if you forbid overlap entirely, you have much less flexibility, because the granularity is 5%: with no overlap at all, you can allocate at most 20 different portions of L3. If you allocate 21, you are definitely going to have overlap somewhere. So overlap hurts performance, but in some cases you want it; otherwise the allocation is not flexible.

One more minute, so one last thing. If you want to try this right now, the patches are on LKML, so you can download them, or you can use my GitHub; I'll upload the slides, which have the link. The GitHub tree has the latest patches; you can clone it and build the kernel. There's a document in the kernel source tree with some examples, and these slides also have some examples, so you can experiment with it. It would actually be very good to have more people using it.

Question from the audience about memory bandwidth, whether there's an allocation part for it: I shouldn't say anything about that. It's not in the SDM, and it's not my position to talk about it. That's all I can say. I think we're done; if you have more questions, we can talk after this, no problem. Thank you very much.