Okay, awesome. So it's me again, Hongjian Fan, and for this session I'm going to talk about shared memory and pooled memory. Okay. So here are some really informal definitions of the two models. The first thing I need to clarify is that this is about a scenario where we use memory appliances: basically, several servers connect to the same memory appliance, and each server can request and use memory on the appliance. The overall goal is to share the memory, improve utilization, and reduce overall cost, so that each server doesn't have to install a huge amount of memory. It can be sized for its average utilization, and for the short periods when it needs a larger amount of memory, it can ask for temporary memory from the memory appliance.

So the major difference between the pooled memory model and the shared memory model is whether the memory is exclusive. With pooled memory, each server gets its own memory; it's not going to be shared with anything else, so once a server has claimed or requested that memory, it's all theirs. But with shared memory, you see the same memory space as the other servers connected to the same appliance. You can allocate from that memory, but it could also be used by other servers, so when your application tries to allocate, you might not be able to allocate everything in that memory space.

So let me show some illustrations. On the left is the pooled memory. Basically I'm showing the DRAM as one NUMA node, or one address range, and the CXL memory as another NUMA node or address range, and it looks exactly like the server's own memory. If the server feels it needs more memory at some point, it sends a request to the memory appliance, and the appliance gives it more memory. So it feels like physically hot plugging an extra piece of DRAM or CXL memory. And once it's finished with that piece of memory, it can say, okay, I don't need it, I can give it back; that's a hot unplug.

But for the shared memory, the server will be aware of it. At least as I understand it, the memory management side will understand that there is a huge piece of memory, I can use it, and some other servers can use it as well. So I see the whole piece of memory, but some of it is unusable. So that's the basic concept of the pooled memory and the shared memory. It's still in the concept phase, and I hope to get some scripts and a model set up, maybe in the next couple of months, hopefully.

So here are some thoughts of my own, and I would like to get your opinions as well. For the pooled model, the key requirement or feature needed is a mechanism for hot expansion and shrinking: getting that memory added and removed freely, or at least feasibly, within an acceptable time limit. The other thing I think is needed for this to work is a device driver or some kind of host service that monitors the system, saying, okay, I want to always keep at least one gigabyte of free memory on my server; once the applications use a lot and the free memory drops below that amount, I will request more memory from the appliance. And for the shared memory system, one thing I'm thinking about is how the system will know which memory is available, which is not, and at what level of granularity. Do we need to track it at single-page level, or can we keep it on the larger side? Currently there is a structure under sysfs where I can online and offline existing memory blocks.
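(To make the host-service idea above a bit more concrete, here is a minimal sketch, not part of the talk itself: it watches MemAvailable in /proc/meminfo and, when it drops below a watermark, asks for more capacity and onlines the resulting block through the standard /sys/devices/system/memory interface. The 1 GiB watermark, the request_from_appliance() helper, and the block number it would return are hypothetical placeholders; only /proc/meminfo and the memory hotplug sysfs layout are real kernel interfaces.)

#!/usr/bin/env python3
# Sketch of a host service that keeps a free-memory watermark by asking a
# memory appliance for more capacity and onlining the new block.
# request_from_appliance() is a placeholder for whatever fabric/appliance
# protocol ends up being used.

import time

WATERMARK_KIB = 1 * 1024 * 1024   # keep roughly 1 GiB available

def mem_available_kib():
    # /proc/meminfo lines look like "MemAvailable:  123456 kB"
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return 0

def online_memory_block(block):
    # Standard memory hotplug sysfs interface: write "online" to .../state
    with open(f"/sys/devices/system/memory/memory{block}/state", "w") as f:
        f.write("online")

def request_from_appliance():
    # Placeholder: ask the appliance for capacity and return the memory
    # block number once the new range shows up in the host's map.
    raise NotImplementedError

while True:
    if mem_available_kib() < WATERMARK_KIB:
        block = request_from_appliance()
        online_memory_block(block)
    time.sleep(5)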
Could that be a good starting point for this kind of mechanism? So that's one thing. The other challenge is that several different servers will be able to access the same memory, but we need to let each of them know which pieces of the memory are not usable. There are a couple of ideas here. One is to put in a mechanism where each server keeps its own copy of the memory usage, so that each server can say, okay, I know which ranges have been used, and when I want to allocate, I know the rest are available. The other idea is to let the memory appliance keep track of the usage: every time a host server wants new memory, it asks the memory appliance, show me which regions are usable. The first way will have a lot of communication and data traffic between the servers, and the second one will also generate a lot of traffic and maybe some delays.

Can I ask a basic question about this, for the shared memory systems? It sounds like you're saying you'd like the memory to be managed just like normal RAM is today. Are you sure about that? I've always assumed that anything where the memory can be written by some other entity has to be treated as really special, because it's essentially not cache coherent, right? We have to make sure that, if somebody else is using that memory, we don't have any dirty cache lines to it. And it's of course possible to write software that has this coordination in place, but nothing in the kernel understands that that's how this memory could work. So we couldn't, for instance, ever have a kernel data structure in shared memory. That to me says it can't really be onlined as normal old memory, right? Were you thinking something different?

The idea is that it will be pretty much similar to system memory, so it will keep cache coherency; all of this is based on the CXL protocol.

Well, but I'm saying it's fundamentally not cache coherent, right? Because if I've cached some data in my CPU caches, there's nothing to say that the backing memory won't change underneath what I have in my CPU caches. And that's fundamental to having shared memory, right? Because something else across the CXL bus, some other system that's sharing it, can go write to it, and all of a sudden the data I have in my cache doesn't match the data I have in memory. And again, you can write software that understands this and can figure this out, but I'm saying the kernel isn't that software, and most things that call malloc aren't that software. So how do we reconcile that?

Yes, oh, okay. So let me make sure I understand correctly. It is shared just in the sense that all the servers share the same memory space; it does not mean they can access the same piece of memory. If server A allocates memory from the shared memory pool, then server B can no longer access that memory.

Oh, so it's not truly shared between two systems. It really is private to the system, but it's physically in the same place.

It's the same physical memory space, the same physical range. But once a piece of memory has been allocated, the other servers, at the server level, are not allowed to access it. And we still need to decide where to stop the other servers from accessing it, for example whether that's enforced on the memory appliance.

I'm gonna jump in here, because what I've been hearing here is nothing new. In the virtualized world, we have both.
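(A minimal sketch of the bookkeeping discussed earlier in this exchange: tracking which extents of the shared range are claimed. The 256 MiB extent size and the ExtentMap name are arbitrary illustrations; whether each host holds such a map or only the appliance does is exactly the tradeoff raised above.)

# Sketch: track which fixed-size extents of a shared range are in use.

EXTENT = 256 * 1024 * 1024   # arbitrary tracking granularity: 256 MiB

class ExtentMap:
    def __init__(self, total_bytes):
        self.used = [False] * (total_bytes // EXTENT)

    def allocate(self, nbytes):
        """Claim enough free extents to cover nbytes; return their indexes."""
        need = -(-nbytes // EXTENT)          # ceiling division
        free = [i for i, u in enumerate(self.used) if not u][:need]
        if len(free) < need:
            return None                      # not enough shared memory left
        for i in free:
            self.used[i] = True
        return free

    def release(self, extents):
        for i in extents:
            self.used[i] = False

# Example: a 64 GiB shared region, one host claiming 1 GiB of it.
m = ExtentMap(64 * 1024 ** 3)
claim = m.allocate(1024 ** 3)

(At single-page granularity the same map would need roughly 270 million entries per terabyte with 4 KiB pages, which is one reason the coarser side of the granularity question looks attractive.)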
The first of those is essentially just ballooning, or virtio-mem, which resizes memory. The second thing, interestingly, you'll find on IBM z systems with standby memory, where you can try onlining memory if it's available in the hypervisor; if it's currently used by somebody else, you cannot online it. So it's essentially the same thing, just now in hardware, which scares me a bit, because I'm not sure we should be doing that, but you might have your reasons, you might find use cases. What I'm trying to say is that we're fighting with all of that in the virtualized world, so you're going to have to fight a lot in the physical world as well. And you can contact me, David Hildenbrand, and I can tell you all the details about what, for example, Power DLPAR does, what IBM z systems with standby memory do, how the different ballooning devices work, stuff like that. But in summary, this is really not something new, and some of the problems were already solved. For example, on IBM z systems with standby memory, they use the lsmem and chmem commands to add or remove memory, if the hypervisor allows giving them more. So yeah, it might be good to explore what others have been doing already and what the limitations are. Because whenever we talk about removing memory, it's mostly pure luck whether it works or not, and that's not going to change just because we call it CXL. Unfortunately, it imposes even more problems, I would say.

Yeah, it sounds like we are taking a problem that exists between the hypervisor and the virtual machines and bringing it to the server-to-appliance level. So yeah, that's why I'm showing two models, and I'm not saying one model is better than the other. It's basically just some ideas I would like to experiment with. But as you mentioned, it's an existing problem, so there is a lot to learn from. This is really good. Any other questions or comments so far?

So I'm curious what you think the benefit of the shared memory system is. You seem to want everything to be physically visible from everywhere, but you're still going to try to use it exclusively in one place or another. Versus the pooled memory system, are you thinking that it is going to be more dynamic, with less work than the pooled view of things?

It really depends. For the pooled model, I think the key point is how we do the hot plug and removal quickly and reliably. For shared memory, we need to communicate which memory is usable and which is not. That is the key difference.

Okay. And for the shared memory, you are not expecting hardware coherency between all those machines. If I stop using it, my responsibility is to make sure that everything was fully flushed and nothing is left in the hardware before someone else can start using it. Is that correct?

I think that's the point.

Okay, thanks. Where in the system are you going to provide those guarantees, though? I'm curious, in your model here, when you have that sharing, where is access to that memory cut off after a system is done using it, for instance?

Yeah, I do not have an answer to that, sorry. I really need to think about more details, but I don't have that answer. I could think about maybe some communication between the host and the appliance, like saying, if a piece of memory is going to be removed, we need that host node to flush all the cache contents back to the memory.
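(For reference, the standby-memory workflow mentioned above maps onto the util-linux lsmem and chmem tools. A rough sketch, assuming those tools are installed and that chmem accepts a size argument such as 1g; the attempt simply fails if the hypervisor has nothing left to hand out.)

import subprocess

# List the memory block layout as the kernel currently sees it.
print(subprocess.run(["lsmem"], capture_output=True, text=True).stdout)

# Try to bring one more gigabyte online from standby memory.
result = subprocess.run(["chmem", "--enable", "1g"],
                        capture_output=True, text=True)
if result.returncode != 0:
    print("could not online more memory:", result.stderr.strip())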
No, I mean, all the data on that piece of memory has to be relocated first, so basically it means there's no valuable information left on it.

Yeah, Dave, as far as I understand, that level, where the cutoff is, where it's unmapped from the host, is outside of the kernel's visibility. The device knows what it's decoding, but if somebody behind the device is physically taking away extents from your mapping, there's an interface for the kernel to acknowledge that, but in terms of whether the kernel could control it or make sure that nobody else is accessing it, it's kind of a trust-the-switch, trust-the-thing-behind-the-device situation. But I was going to say, to Dave's point, I think there's a conflict between people who want their normal memory allocation, no new APIs, who want onlined memory because that's what their applications are written to, and people who simultaneously want hard guarantees about removing memory; those two things are in conflict. So like you were saying yesterday, Dave, I think we either need to reset people's expectations, that no, you can't remove everything all the time, you can probably remove some of it most of the time, but not all of it all the time, or we need to change applications to make sure they don't let memory get into places where it can't be gotten back out again. I think it's going to be a learning experience for the industry.

Yeah, it will expand the places where people get to see memory hotplug. Like David mentioned, there are places where it's used a little more widely today, like in VMs, but this will widen the number of people who are exposed to it. So the thing I really want people to look at here is: how is this different from memory hotplug? And I don't think that it is. And the limitations that we have in the kernel today on memory hotplug, there are quite a few of them.

Yeah, it's adding a level behind memory hotplug that wasn't there before. Memory hotplug is kind of the last mile of instantiating new physical address space, but now there's management of the physical address space behind that, before it gets to memory hotplug, that Linux never had to do before. It's the thing that all our platform firmware did for us: it mapped DDR for us and told us an address range. Now Linux is involved in managing the address range before the fact, and then saying, okay, this address range actually has populated memory now, now go do the memory hotplug thing. So it's that new layer of physical address management. But yeah, this is an active area of the CXL specification working group, to figure out all these details.

Yeah, thanks for that. Yes, so it sounds like the CXL committee needs to do some further work on this topic. Okay, let me just cover the rest of this page. For the shared memory system, I think it probably also needs some kind of warning mechanism to show the system how much memory is really available. But if the system is keeping its own tracking of used memory, this might not be that critical.
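(One piece of that last mile that does exist today is the offline side of the memory hotplug sysfs interface: before a range could be handed back to an appliance, its memory blocks have to be offlined, at which point the kernel migrates movable data off them. A minimal sketch; the block number 32 is an arbitrary placeholder.)

# Sketch: offline one memory block before returning its range.  Offlining
# makes the kernel migrate movable pages away; if the block holds unmovable
# allocations, the write fails with EBUSY and the range cannot be given
# back right now.
import errno

def offline_memory_block(block):
    path = f"/sys/devices/system/memory/memory{block}/state"
    try:
        with open(path, "w") as f:
            f.write("offline")
        return True
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False        # still in use by unmovable allocations
        raise

if offline_memory_block(32):
    print("block 32 offlined, safe to hand the range back")
else:
    print("block 32 could not be offlined; pick another block or retry later")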
The other thought is, with the memory being shared between different servers, is it possible, or would it be of any practical use, to add some kind of wait period when memory is not enough, to wait for other servers or other threads to release some memory before terminating the thread? And the last thing I was thinking about is that once the appliance has been running for long enough, fragmentation might cause some waste of the memory. Is that another thing to consider?

Two comments from me. Regarding the out-of-memory handler: whenever I read that, I get scared, because people think they can intercept the out-of-memory handler and just get the system back alive by waiting long enough, but that's not true, because you can have other allocations already failing and messing up your system. That's actually one thing I've been discussing lately in some virtio-balloon related work, with auto-ballooning again, which I detest, where they wanted to intercept the out-of-memory handler and then just tell the system, oh yeah, let's wait for more memory to come back to us. But that usually doesn't work, because you could have some other allocations already failing and messing up the system. So what I'm trying to say is that this is a really important topic to work on: how could you actually tell the system that you're currently short on memory and have it wait in a safe way? Intercepting the out-of-memory killer, for example, is usually not what you want to do. And on the other point you raise, memory defragmentation, we have to do a lot better work on that. We have all of these transparent huge page cases; everybody nowadays wants to allocate large chunks of memory, and that doesn't really reflect the reality in the system. That's also relevant when you want to remove memory from the system at a bigger granularity than base pages. So I think the last two points are the most important pieces of work you have to look into to make everything above work somewhat reliably.

Yeah, thank you. Any other suggestions?

Just to pile on the issue: don't forget that if you are waiting for this appliance to give you some more memory, that involves memory hot add, right? And memory hot add requires allocating memory, which is hard when you're already out. Again, there are pools and ways to do this up front, but it is a very hard problem to just say, oh, we can wait for them to give us more physical address space, because there's a lot of work involved, and that's on a much different time scale than the OOM handler works on. So I can't imagine this being applied to actually stop something that's already doomed; you're going to have to do it before you get there. I talk about the OOM handler as being something like an airbag in a car: yeah, it's a safety mechanism, but it's a last resort. It is not something you ever want to have to resort to.

Yeah, I never thought this could be reactionary. I always thought it had to be: you're scheduling jobs, and you know this job at this time of day needs a whole bunch of memory, but every other time of day it doesn't, so you proactively add memory temporarily and then try to get it back after the fact. But yeah, I don't think we can have the reactive path; we can't have the kernel's memory pressure talking to the data center controllers to add more memory.
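(On the fragmentation point raised above, /proc/buddyinfo already exposes how much free memory remains in large contiguous chunks. A rough sketch, assuming 4 KiB base pages, that sums free pages per order and reports how much sits at order 9 and above, i.e. 2 MiB and larger.)

# Sketch: read /proc/buddyinfo and report how much free memory is still
# available as large contiguous chunks.  Heavily fragmented systems show
# plenty of total free memory but very little of it in the high orders.
PAGE = 4096        # assumes 4 KiB base pages
HIGH_ORDER = 9     # order 9 == 2 MiB with 4 KiB pages

total = high = 0
with open("/proc/buddyinfo") as f:
    for line in f:
        # Format: "Node 0, zone   Normal   count0 count1 ... count10"
        counts = [int(x) for x in line.split()[4:]]
        for order, n in enumerate(counts):
            size = n * (PAGE << order)
            total += size
            if order >= HIGH_ORDER:
                high += size

print(f"free: {total >> 20} MiB, of which {high >> 20} MiB in >=2 MiB chunks")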
I think that reactive path is just too late. It has to be proactive, at least in the beginning.

Yeah, it sounds like adding memory is always easier than removing it. Yeah, this has been really good; I got a lot of information from you all, and that's everything I wanted to cover. Thank you. I think we are finishing this section one minute early. Thank you.