All right, on my own. Okay, so because leases were hard and that project was stalling, Intel decided I should go work on something else. Actually, that wasn't even Intel, it was Christoph. So I worked on a little project called Protection Keys Supervisor (PKS). Right before the talk, at breakfast, Dan asked me, "So why isn't it called supervisor protection keys?" I have no idea. I know that user space protection keys were PKU, so...
So, this little project. The overview is that version 10 is posted to LKML right now, and the quick summary is that this overlays additional protections on top of the page table protections. This feature already exists on PowerPC and x86 for user space pages; what this feature does is add that same protection for supervisor pages. It doesn't do execute permissions, but you can block writes, or reads, basically any access; those are the protections that you can overlay on top of the current kernel mappings. The protections can be disabled and enabled on a thread-local and CPU-local basis, so when you actually change the permissions in the hardware, it's happening on just that local CPU core. The nice thing about this is you can change protections quickly for a large range of pages, with no TLB flushes, in a very local and targeted manner.
And the reason I got roped into this feature is that I work on persistent memory. Persistent memory is this large memory space, and we wanted to be able to prevent random users from writing to it. So that's my first use case. The problem is that originally we thought, hey, we'll just add these calls in, and when anybody needs to access a page we can flip these permissions really quick. We'll get access; all other threads, all other CPUs won't have access to this surface. So we ended up abusing the kmap() interface. The problem is that kmap() has a number of places where the mapping is not done as intended: the mapping is done in a way that is basically a long-term mapping. So something kmaps a page, takes that pointer, stores it away, other threads come in and use that pointer to access the mapping, and, you know, yada yada.
Just for reference, there are a couple of other use cases: page table protection and kernel keys protection were a couple of other use cases that were posted along the way as I got to v10. The page table protection some of you are familiar with; Rick Edgecombe did that patch set, and the idea there was to protect the kernel page tables using this mechanism. Again, because we can turn this on and off and it's local, we can make sure that only the thread that's updating those page tables updates those page tables, at the right time. There have been a couple of other use cases we thought of, but these are the three that we've actually written code for and tested a little bit. And the pmem story is the one that I'm really focused on.
So what is the issue? I'm at version 10. There have been lots of reviews, there's been feedback, there have been a couple of LWN articles on this. So why am I here? Because this conference is all about the future. Where are we going?
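As a rough illustration of the overlay described in this talk, the following is a small userspace model, not kernel code. It assumes the x86 protection-key layout of 16 keys with two bits each (access-disable and write-disable) held in a per-thread rights register; the type and function names (thread_ctx, page_desc, pks_set_rights, can_read, can_write) are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; these names are illustrative, not kernel API. */
#define PKEY_COUNT 16u   /* a 4-bit key is stored in each page-table entry */
#define PKEY_AD    0x1u  /* access-disable bit for a key */
#define PKEY_WD    0x2u  /* write-disable bit for a key */

/* Each thread (and the CPU it runs on) carries its own rights register. */
struct thread_ctx {
    uint32_t pkrs;       /* 2 bits per key */
};

struct page_desc {
    bool     pte_writable;  /* ordinary page-table permission */
    unsigned pkey;          /* protection key tagged on the page */
};

/* Changing rights touches only this thread's register: no page-table
 * rewrite and no TLB flush, however many pages carry the key. */
static void pks_set_rights(struct thread_ctx *t, unsigned pkey, uint32_t bits)
{
    assert(pkey < PKEY_COUNT);
    t->pkrs &= ~(0x3u << (2 * pkey));
    t->pkrs |= (bits & 0x3u) << (2 * pkey);
}

static uint32_t pks_rights(const struct thread_ctx *t, unsigned pkey)
{
    assert(pkey < PKEY_COUNT);
    return (t->pkrs >> (2 * pkey)) & 0x3u;
}

/* The key can only take permissions away from what the page table allows. */
static bool can_read(const struct thread_ctx *t, const struct page_desc *p)
{
    return !(pks_rights(t, p->pkey) & PKEY_AD);
}

static bool can_write(const struct thread_ctx *t, const struct page_desc *p)
{
    return p->pte_writable &&
           !(pks_rights(t, p->pkey) & (PKEY_AD | PKEY_WD));
}
```

In this model, one thread clearing the bits for the pmem key gains access while every other thread's register still denies it, which is the "local and targeted" property the talk relies on.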
Well, the real issue is... and the issue isn't that highmem is going away, because highmem should go away. We're in a 64-bit world now; the idea of needing to map high pages in 32-bit kernels with large amounts of memory is somewhat antiquated. I say somewhat because I've read the articles, and I know that there are Arm CPUs out there that still require highmem, so even Linus is not getting away from it; he's not going to delete it anytime soon. But we're moving in that direction. Effectively, kmap() and kmap_atomic() are deprecated. kmap_local_page() was created by Thomas, and he did that basically to help support this feature. My original way to fix this was to introduce kmap_local or kmap_thread, I think I called it; I forget. That was a way to audit all the kmap() sites and say, well, these are all the sites that do the right thing: they kmap the page, they do their little access, and then they unmap the page, all in the same thread, and they work great with PKS. And most of the file systems that pmem cares about are covered by that anyway. I've actually updated Btrfs as well, and there are a couple of other places. We actually have some work going on in that area right now to convert all these kmap() sites.
But we're kind of growing this alternative access, which is page_address(). Obviously page_address() isn't new; it's there and it's been used. But the problem is it can't work, because there's no corresponding unmap; there's no corresponding unblock, so to speak. And so PKS is the first, or probably not the first, situation where we have extra protections on the direct map, where we're going to want to remove those protections temporarily and then restore them after. We also have other places where we're looking to split the direct map. So the idea of just adding page_address() whenever somebody needs access, when highmem goes away, I don't think is a good solution going forward. I think we need to make sure that we bake in the idea that just accessing the direct map isn't going to be viable with all the extra protections and other things that we're doing with the direct map.
So, a couple of ideas that I've had. One is really easy: we just redefine what kmap means. It's no longer a high-memory access thing; it's literally just "give me the kernel mapping for this page," give me the kernel virtual address for this page. Obviously on 64-bit it doesn't do anything; there's no mapping, it just gives me the address, and that's what it does now. Another idea is maybe a lightweight vmap(), where maybe a mapping actually is created. I kind of don't like that idea, but... or we just make vmap() better.
Possibly. Yeah, I know, but I mean, there's a lot of caching of things that vmalloc as a whole could be doing and doesn't, because we've never really cared too much about its performance. I think if there were some targeted efforts to make vmalloc and vmap() more efficient, it could just be better for us all around.
Okay, and that helps solve the problem of a mapping that you use long term. Because right now our line in the sand is: if you really need a long-term mapping, then you need to use vmap(). In fact, in the patch set that I've submitted, v10, and actually it's been in there since v6 or v7, the DM cache, if it sees this protection enabled, forces a vmap() on some pages. I forget the details there, but it was already doing a vmap() in certain situations, so I just said, well, you can't use kmap() in this situation. But we probably should just make it use a vmap() all the time.
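The argument above, that an address lookup with no matching "unmap" leaves nowhere to lift and restore extra protection, can be sketched with a small userspace model. Everything here is hypothetical: model_page, page_map_local, page_unmap_local and page_addr_raw are invented names for illustration and are not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these names are existing kernel functions. */
struct model_page {
    unsigned char data[4096];
    bool protected_map;  /* page carries extra direct-map protection */
    int  open_count;     /* outstanding map calls on the calling thread */
};

/* "Give me the kernel virtual address": no new mapping is created,
 * but a protected page is opened for the calling thread. */
static void *page_map_local(struct model_page *p)
{
    if (p->protected_map)
        p->open_count++;
    return p->data;
}

/* The matching call is the hook where protection gets restored. */
static void page_unmap_local(struct model_page *p)
{
    if (p->protected_map) {
        assert(p->open_count > 0);
        p->open_count--;
    }
}

/* Whether an access by this thread would be allowed in the model. */
static bool page_accessible(const struct model_page *p)
{
    return !p->protected_map || p->open_count > 0;
}

/* A bare address lookup: it returns the same address, but there is no
 * paired call, so it can neither open nor re-close the protection. */
static void *page_addr_raw(struct model_page *p)
{
    return p->data;
}
```

On an unprotected page both paths behave identically, which is why the paired interface costs nothing on 64-bit in this model; the difference only shows up once a page has extra protection.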
I'd have to go back and revisit that DM code, the device-mapper work. And maybe the alternative is to do something to the direct map, right? You know, we don't actually have a direct map, which I think some people are trying to look at. I even saw some people suggest that we just map memory on the fly. I don't think that's ever going to work; I don't think it's going to be performant. So those are kind of my ideas. So, thoughts? Can we redefine what kmap means? And certainly something like kmap_local_page(), I think, probably needs to live on even once highmem goes away. At least for one use case, the nice thread-local mapping use case, I think we need to preserve that interface. Maybe we change the name to something else; we could do a global replace, I guess. But I'm okay just leaving the name.
Is anyone in this room thinking about CHERI at all? I'm sorry, I didn't... cherry? CHERI. Okay, so, all right. CHERI is a research project at the University of Cambridge. It's being led by Robert Watson. I know Jessica Clarke, who works on it, which is how I know so much about it. Basically, it is a capability-based system. I think there's been an article in LWN about this. Am I right there, Jon? No? Okay, sorry, it's been mentioned. Okay.
That's probably what I'm thinking of. Yes, it has like a 128-bit, 129-bit address space, something like that. No, no, it's not properly 128-bit; no object in it is really 128 bits in size, no individual object is more than 64-bit. But it's using the extra bits of address space as basically a tag, as a protection key; basically you're making the address space so big that you can't search it. They're based in FreeBSD, and so FreeBSD is getting support for CHERI. The people involved with it come from the BSD side of the world. They took a look at supporting CHERI with Linux and they ran away screaming, because our type system is all wrong for them. But, I mean, that's kind of the opinion of outsiders on our pile of crap, and, you know, we swim in our crap every day... and I should drop this metaphor. We know ourselves, right? We know how to work with the system we have. If anyone were trying to think about it, this would be a really great time to speak up.
But since this is clearly the first time many of you are hearing about CHERI: it's spelled C-H-E-R-I, and everything about the project is full of cherry puns. So there's a board called Morello, because of morello cherries. Anyway, it's kind of cool. But it's a really... and so they have FPGAs. I think they may actually have taped out a CPU that is now being manufactured. I mean, it's not like it's being manufactured on seven nanometer or anything crazy.
All right, these are fundamentally university people, but they do have quite a lot of interest from Arm Limited, or whatever the company is called these days. So, yeah, it's real, it does exist, the hardware exists; you can get a board and run FreeBSD on it in this more secure mode. And this is not a million miles away from being basically the same thing.
That's how PKS works: it takes a few bits in the virtual address, and... Yeah, but their virtual address is 128 bits, so... They have a lot more bits. Yeah, yeah, exactly. Yeah, it's different. But it's interesting, and it's probably something we should keep an eye on, if not be working on it properly. So if anyone's inspired, take a look at getting Linux working on a CHERI CPU.
Mic, because I can't even hear you. Sorry. The question was whether Arm does something like CHERI, and as far as I know, it's a joint project between Arm and Cambridge University.
Sorry. So, as a pretty heavy user of kmap, personally, I don't care. If you want to change the interface and change how we do mappings or whatever, go for it, man. Just tell me what new function I'm supposed to use and I'll go and convert it, because really the only time we care about it is writes to our metadata pages. We have our helpers that kmap the page and then do our writes and updates and kunmap it. If I've got to call something else, or allocate differently, or whatever, that is the least interesting part of Btrfs. I will just go change some functions and make it work.
Okay, cool. I was kind of hoping Christoph would be here, because I was expecting him to yell at me.
The other comment I'll make from the Btrfs point of view, and this will be important for the page cache sharing stuff as well: every couple of years we end up writing and debugging a patch where we intentionally set pages read-only so we can figure out who is changing them on us without our knowledge. This would make that a lot easier. It would be something we could actually leave in the kernel and just have as a mount option or a config option or something. So, sweet; it would be better for us.
Cool. Yeah, so, like I said, the support for the core idea seems to be pretty good. So, yeah, I'll ping you guys when... so.
We'll find Christoph, but I'll just state Christoph's observation directly, which was (Christoph speaking): why can't I just use page_address() directly? Why do we have to go through kmap_local_page()? And so I think if Linux is going to do CHERI, if Linux is going to care about local access permissions, I think what you're arguing for is: yeah, we need to keep kmap_local_page(), and if you have an inclination to use page_address(), you're getting in the way of some of these fancier permission capabilities.
Yeah. I mean, page_address() is going to stick around, but it needs to be localized. This gets back to one of the comments I made in the MM session yesterday about understanding where those interfaces are, because I can see a driver writer going, "Oh, page_address() gives me the virtual address. Oh look, my driver works." So anyway, yeah, we'll get back to trying to document this really well, too.
And if we are thinking about making changes to this interface, I would like to beg for something that supports multiple pages. One of the things I had to do with the folio work is say, well, what do I do for stuff that calls kmap_local_page()? I need to select one page out of the folio. So we now have kmap_local_folio(), which takes a folio and a byte offset within that folio as its two arguments.
Did you land that? Yeah, that's landed. I missed the patch. Okay, sorry. No, well, no, my fault. Really, it's your fault you didn't find it in the, like, 800 folio patches I've sent? Well, I recently unsubscribed from LKML, so I couldn't... I'm trying the lei thing, but we'll see.
So it's not really a great interface, because you tell it, "I want to access at this offset of that folio." Okay, that's great. But you only get to access up to the end of the page that that offset happens to land in, and that seems like I've really given everyone a land mine. But the whole kmap_local stuff, kmap_local and kmap_atomic for that matter, only gave you a page anyway, and it's kind of nasty. The alternative is vmap(), but, I don't know, I would really like it if we could have a kmap_local_range.
Wait, wait, wait: you give us a folio, a start address, and a length? Right. But again, this is a temporary mapping? Yeah, absolutely. Okay. Yeah, all right. Yeah, okay, good.
Oh, and there's a caveat on that, which is of course if you're on 64-bit you do get to access as much as you want past the end of that, unless you turned on the debugging. Yeah, okay. Okay, sure. Okay.
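The "land mine" mentioned here is just arithmetic on the offset, and a short sketch may make it concrete. This is plain userspace C under an assumed 4096-byte page size; bytes_to_page_end and mappings_needed are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this example */

/* Bytes usable through a mapping that covers only the single page
 * containing `offset` within a larger folio. */
static size_t bytes_to_page_end(size_t offset)
{
    return MODEL_PAGE_SIZE - (offset % MODEL_PAGE_SIZE);
}

/* How many single-page mappings a caller needs in order to touch
 * the byte range [offset, offset + len) of a folio. */
static size_t mappings_needed(size_t folio_size, size_t offset, size_t len)
{
    assert(offset <= folio_size && len <= folio_size - offset);

    size_t n = 0;
    while (len > 0) {
        size_t chunk = bytes_to_page_end(offset);
        if (chunk > len)
            chunk = len;
        offset += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

A 200-byte access starting 4000 bytes into a 16K folio crosses a page boundary, so a per-page interface needs two mappings for it, whereas a range-based mapping of the kind asked for in the discussion would need one.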
Yeah, I like it. So we have a similar thing inside Btrfs, because we support multi-page metadata blocks; by default we have 16K metadata blocks. All of our helpers, you give one a byte offset and a length or whatever, and it just loops through and goes: okay, this is the next page, kunmap the previous page, kmap the next page, do the thing. That's literally where we kmap and kunmap. So you tell me to use the new shit and I just go change these helpers, and then we're using the new stuff.
Okay, and I'll see what I can do, because it could be nice to optimize a 64-bit path and then in the 32-bit path just fall back to some loop. Not "who cares," because I know there are people out there, but it's not the optimal...
Yeah, I mean, if I can kmap all 16K, all four pages or whatever, and get the whole thing and just do my thing, then hooray. But I already have a loop, so I'm good either way. Awesome.
I was going to ask a question about the debug facility. Do you care about the TLB shootdown overhead? I mean, you're paying that today, right? So it's kind of just a debug facility.
Yeah, so we straight up just... we do this fun thing, and actually Jan gave me this idea like 10 years ago. I will set the slab sizes to page size, and then whenever you access it to write it... I forget exactly how we end up doing it, but basically we just write-protect it: when we use it we un-write-protect it to update it, and we write-protect it otherwise, and then we just wait for the system to crash. At this point, this is usually our "we have no idea, something's wrong," and so we do this, and we don't care about performance at that point. It's just to figure out who's corrupting our pages.
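The helper loop described for multi-page metadata blocks can be sketched roughly as follows. This is a userspace model and not Btrfs code: meta_block, blk_map_page, blk_unmap_page and blk_write are invented for illustration, and the map and unmap calls are stand-ins that only count how often they are called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK_PAGE_SIZE 4096u
#define BLK_PAGES     4u  /* a 16K block modeled as four 4K pages */

struct meta_block {
    unsigned char *pages[BLK_PAGES];  /* pages need not be contiguous */
    unsigned maps, unmaps;            /* counters to check the pairing */
};

/* Stand-in for mapping one page of the block. */
static unsigned char *blk_map_page(struct meta_block *b, size_t idx)
{
    b->maps++;
    return b->pages[idx];
}

/* Stand-in for the matching unmap. */
static void blk_unmap_page(struct meta_block *b, unsigned char *addr)
{
    (void)addr;
    b->unmaps++;
}

/* Copy `len` bytes into the block at byte offset `off`, mapping one
 * page at a time and unmapping it before moving to the next. */
static void blk_write(struct meta_block *b, size_t off,
                      const void *src, size_t len)
{
    const unsigned char *s = src;

    assert(off <= BLK_PAGES * BLK_PAGE_SIZE &&
           len <= BLK_PAGES * BLK_PAGE_SIZE - off);

    while (len > 0) {
        size_t idx = off / BLK_PAGE_SIZE;
        size_t in_page = off % BLK_PAGE_SIZE;
        size_t chunk = BLK_PAGE_SIZE - in_page;
        if (chunk > len)
            chunk = len;

        unsigned char *va = blk_map_page(b, idx);
        memcpy(va + in_page, s, chunk);
        blk_unmap_page(b, va);

        off += chunk;
        s += chunk;
        len -= chunk;
    }
}
```

Because callers only see the offset-and-length helper, swapping the two stand-in functions for a different mapping interface would not change any call site, which is the point made in the discussion.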
Okay, but it's totally space inefficient, and... Oh, yeah. You wouldn't turn it on during a normal job? No, I'd install it on a thousand machines and wait for something to blow up. Did you read that right? Right. Yeah, this feature is like, if you wanted to leave it on all the time and you want to make it fast. Okay.
More time, or is it break? I'm sorry. Okay, yeah, we've got one more session. I got bumped one up, so we've got more time to talk about this. Thank you.