In the verifier we need to make a lot of guarantees to ensure that memory is accessed safely, and that's the idea behind dynamic pointers (dynptrs) in BPF. This is similar to fat pointers or smart pointers in other languages: you have the data being pointed to, and alongside it some metadata that, in BPF's case, is used to enforce that you're not accessing beyond your memory range and not trying to write into something that is read-only. Underneath, the way a dynptr is defined in BPF is: some data that you're pointing to, a size that tracks how much of the data you want to interact with, and your offset into the data. We currently use, I believe, the upper eight bits of size to keep track of some other things, like whether it's read-only and what kind of dynptr it is.

As for use cases, I think the first thing this will probably be used for most is dynamic memory allocation. We can do the equivalent of malloc within BPF, and the verifier keeps track that the memory will always be freed by the end of the program, so that no memory gets leaked. Alongside that, you can persist these dynamic memory allocations in BPF maps so that you can use them across programs. Another use case, which I think some people asked about last year for ring buffers, is dynamically sized ring buffer reservations. We're able to do that in BPF right now, but it requires an extra memory copy. Through this interface you can do it without incurring that extra cost. Another application is parsing skb or XDP data more dynamically and ergonomically.
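The layout described above (a data pointer, a size whose upper eight bits carry flags, and an offset) can be pictured with a small userspace model. This is only a sketch under stated assumptions: the names `struct dynptr`, `dynptr_init`, `dynptr_read`, `dynptr_write` and the exact bit positions are hypothetical and chosen for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of the layout described in the talk.
 * Names and exact bit positions are assumptions, not the kernel's. */
#define DYNPTR_SIZE_BITS  24u
#define DYNPTR_SIZE_MASK  ((1u << DYNPTR_SIZE_BITS) - 1u)
#define DYNPTR_RDONLY_BIT (1u << 31)                    /* top bit: read-only */
#define DYNPTR_TYPE_MASK  (0x7fu << DYNPTR_SIZE_BITS)   /* bits 24..30: kind */

enum dynptr_kind { DYNPTR_KIND_MALLOC = 1, DYNPTR_KIND_RINGBUF = 2 };

struct dynptr {
    void *data;       /* memory being pointed to */
    uint32_t size;    /* low 24 bits: size of the view; high 8 bits: flags */
    uint32_t offset;  /* start of the view inside data */
};

static uint32_t dynptr_size(const struct dynptr *p)
{
    return p->size & DYNPTR_SIZE_MASK;
}

static bool dynptr_is_rdonly(const struct dynptr *p)
{
    return (p->size & DYNPTR_RDONLY_BIT) != 0;
}

static enum dynptr_kind dynptr_get_kind(const struct dynptr *p)
{
    return (enum dynptr_kind)((p->size & DYNPTR_TYPE_MASK) >> DYNPTR_SIZE_BITS);
}

/* Returns 0 on success, -1 if the size does not fit in the low 24 bits. */
static int dynptr_init(struct dynptr *p, void *data, uint32_t size,
                       enum dynptr_kind kind, bool rdonly)
{
    if (size > DYNPTR_SIZE_MASK)
        return -1;
    p->data = data;
    p->offset = 0;
    p->size = size | ((uint32_t)kind << DYNPTR_SIZE_BITS) |
              (rdonly ? DYNPTR_RDONLY_BIT : 0u);
    return 0;
}

/* Copy len bytes starting at off out of the view; reject out-of-range reads. */
static int dynptr_read(void *dst, uint32_t len, const struct dynptr *p, uint32_t off)
{
    uint32_t size = dynptr_size(p);

    if (off > size || len > size - off)
        return -1;
    memcpy(dst, (const unsigned char *)p->data + p->offset + off, len);
    return 0;
}

/* Copy len bytes into the view; reject read-only views and out-of-range writes. */
static int dynptr_write(const struct dynptr *p, uint32_t off, const void *src, uint32_t len)
{
    uint32_t size = dynptr_size(p);

    if (dynptr_is_rdonly(p))
        return -1;
    if (off > size || len > size - off)
        return -1;
    memcpy((unsigned char *)p->data + p->offset + off, src, len);
    return 0;
}
```

Packing the flags into the size field keeps the structure at three words, at the cost of capping the representable size at 24 bits in this model.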
Right now, I think, you have to add a bunch of these if checks to make sure you're not writing past data_end. With dynptrs you can use an iterator-like interface where you just iterate through your data and access it through that. And then also dynamically sized strings: instead of the size being statically known at compile time, you can just specify it at runtime, for example with the user-space application dictating the sizes.

As for example APIs: the first batch of patches is currently upstream. It adds the most basic kind of dynptr, a malloc dynptr, as well as the things you need to do in the verifier to keep track of all the state and ensure that nothing goes wrong. In the next series of patch sets we'll be building on that idea, adding more functionality and helpers to work with this. For mallocs, you can use bpf_dynptr_alloc to dynamically allocate memory of whichever size you wish. The verifier enforces that before your program ends there is always a dynptr put, which is essentially the equivalent of a free.

Can I ask an API question?

Yeah.

Okay, cool. Could we, maybe not in your first version, put the memory in a map and then skip the put?

Yes, you're able to put it in a map. When you put it in a map, that acquires another reference count, so in your BPF program you would still need to call a put. Essentially, the way it works is that with a malloc we reserve a few extra bytes at the beginning of the allocation to keep track of the ref counting.

Awesome. I think I remember talking about malloc years ago.
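The reference counting just described (a few extra bytes reserved at the start of the allocation, one reference held by the program and another taken when the memory is stored in a map) can be sketched in plain userspace C. Everything here is a hypothetical model: `dyn_alloc`, `dyn_put`, `struct slot_map` and `map_store` are illustration names, not BPF helpers, and the one-slot "map" only stands in for a real BPF map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of the bytes handed to the caller. */
struct alloc_hdr {
    uint32_t refcnt;
    uint32_t size;
};

/* Number of allocations not yet freed; lets a caller check for leaks. */
static int live_allocs;

static struct alloc_hdr *hdr_of(void *data)
{
    return (struct alloc_hdr *)data - 1;
}

/* Allocate size bytes plus the hidden header; the caller owns one reference. */
static void *dyn_alloc(uint32_t size)
{
    struct alloc_hdr *h = calloc(1, sizeof(*h) + size);

    if (!h)
        return NULL;
    h->refcnt = 1;
    h->size = size;
    live_allocs++;
    return h + 1;
}

static void dyn_get(void *data)
{
    hdr_of(data)->refcnt++;
}

/* Drop one reference; the memory is freed when the last one goes away. */
static void dyn_put(void *data)
{
    struct alloc_hdr *h = hdr_of(data);

    assert(h->refcnt > 0);
    if (--h->refcnt == 0) {
        live_allocs--;
        free(h);
    }
}

/* Hypothetical one-slot stand-in for a map value holding an allocation. */
struct slot_map {
    void *value;
};

/* Storing takes a new reference for the map and drops the old value's. */
static void map_store(struct slot_map *m, void *data)
{
    dyn_get(data);
    if (m->value)
        dyn_put(m->value);
    m->value = data;
}

static void map_clear(struct slot_map *m)
{
    if (m->value) {
        dyn_put(m->value);
        m->value = NULL;
    }
}
```

In this model the program still calls `dyn_put` after `map_store`, which matches the answer given above: the map holds its own reference, so the program's put does not free the memory.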
Yeah. And then for ring buffers, it's very similar to the current ring buffer APIs we have: ringbuf reserve dynptr, ringbuf submit dynptr. The same underlying idea behind ring buffers holds as well: whenever you reserve a record, you always need to call submit or discard on it, like what you currently do with ring buffers. With packet parsing, you're able to get a dynptr into that specific XDP or skb data and then iterate through it like an iterator. That's what I wanted to bring up for discussion.

One very important part of this dynptr API that's not on the slide is how you actually convert a dynptr to direct memory access. Can you touch on that?

Are you talking about the case where user space wants to read into some specific...

Right, so with the normal reserve, bpf_ringbuf_reserve, you're saying, "I want 100 bytes."

Oh yeah, okay.
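The conversion asked about here, from a dynptr to a directly accessible chunk of memory, can be modelled in plain C as a helper that takes an offset and a fixed length and returns NULL when that range does not fit, so the caller has to do a NULL check before touching the bytes. This is a minimal sketch with hypothetical names (`struct view`, `view_slice`, `fill_event`, `struct event_hdr`); it is not the BPF helper itself, and the fixed-header-plus-string record is just an illustrative layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view over a buffer: size bytes starting at data + offset. */
struct view {
    unsigned char *data;
    uint32_t size;
    uint32_t offset;
};

/* Return a direct pointer to len bytes at off inside the view,
 * or NULL if that range is not fully inside it. */
static void *view_slice(const struct view *v, uint32_t off, uint32_t len)
{
    if (off > v->size || len > v->size - off)
        return NULL;
    return v->data + v->offset + off;
}

/* Hypothetical record layout: fixed header followed by string bytes. */
struct event_hdr {
    uint32_t pid;
    uint32_t str_len;
};

/* Write a header and the string bytes through two direct slices.
 * Returns 0 on success, -1 if the record does not fit in the view. */
static int fill_event(const struct view *v, uint32_t pid, const char *s)
{
    size_t n = strlen(s);
    struct event_hdr hdr;
    unsigned char *head;
    unsigned char *tail;

    if (n > UINT32_MAX)
        return -1;

    head = view_slice(v, 0, sizeof(hdr));
    if (!head)
        return -1;
    tail = view_slice(v, sizeof(hdr), (uint32_t)n);
    if (!tail)
        return -1;

    /* Both slices were checked, so plain memcpy is safe from here on. */
    hdr.pid = pid;
    hdr.str_len = (uint32_t)n;
    memcpy(head, &hdr, sizeof(hdr));
    memcpy(tail, s, n);
    return 0;
}
```

The header is copied with `memcpy` rather than written through a cast pointer so that the sketch does not depend on the buffer's alignment.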
Yeah, so we're also able to get a direct data slice into that. In order to do that, you have to know the size of the data that you want to access. If it's, "I want to access 50 bytes of this," you can call something like, I think it's called, BPF data from mem, where you pass in your dynptr and it gives you direct access to the specific part of memory that you want. Then you can directly read, write, memcpy, whatever you want to do into it, just like a normal buffer. The reason the size has to be statically known is so that the verifier can enforce that you're not trying to access something outside of the memory range that you asked for.

So with the map use case, you'll allocate and push it into a map. But when you read it back out, how do you check the size without a runtime check? Do you require the size to be part of the map specification? Or, if it's just a generic map of dynamic pointers, you load them in, read them out, and then try to read at some offset into one. How do you do that without a runtime check?

There is a runtime check.

Okay, that makes sense.

I also have a question regarding the packet parsing. Do we have a helper yet where we can say, "I want a dynptr for a given offset and a given size," for when you have a dynamic offset that the verifier might not understand at verification time?

Sorry?

For example, IPv6 extension header parsing. I think what would be nice, and we discussed this on the list, is that we could have a BPF dynptr helper for user memory as well. That would avoid the extra copy you sometimes get with bpf_probe_read_user. But yeah, this is super cool, thank you.

You mentioned that if we put a dynptr into a map, the ref
count will be incremented and decremented automatically, and things will just work.

Yeah.

What happens if I put a struct that contains a dynptr into a map?

Yeah, that's a good question. It's like what we currently do for spin locks and timers: you're able to put a dynptr inside a struct and access it within the struct that's the map value.

Is this the last slide? Do you have an example of how the API will look when a dynptr is part of the map?

No, this is the last slide.

For packet parsing, I would assume you can use it interchangeably, where later on you also use the old way of getting a range for parsing the packet, right? Okay. And as far as from-skb and from-XDP go, do we need the context there? Could there be one helper that just gets data and data_end? Is there a difference for us? So it's because of the multi-buffer stuff. But multi-buffer we cannot do anyway with data and data_end as just two pointers.

I guess a related question on the SKB stuff: how do you handle frags? Do you have to walk the frags somehow?

Yeah, so I was thinking that internally you're able to track which type of dynptr it is, so if it's one that needs to be paged in, I'll do that for you automatically when it sees what you're trying to access.

Yeah, so an SKB could have chained SKBs, essentially. What I'm trying to understand is this: the program would say, "Here's an SKB, here's my pointer," and then if it wanted to get the next SKB, you would have to walk that chain somehow. I'm not sure how this would work for TC and such, where I think you probably have to linearize it because it can be shared.

Yeah, but that's not good, right? We don't want to do that. That's going to be a huge performance hit.

But then every load would have to be checked, right? Because you wouldn't know what data was actually pointing at. Am I making sense? Like,
it's basically a scatter-gather list, right? Simplify it to that: you have a scatter-gather list. How do you do a dynamic pointer over a scatter-gather list? Either this API only works with one entry in the scatter-gather list, and it's up to the BPF program to walk the list and do a dynamic pointer per entry, which to me makes the most sense, I think. Or the other way, which is what we do with sockmap scatter-gather lists from user space, the sk_msg stuff: the user can read anywhere, and then under the covers the API does that walk for you and finds it, which I think is actually probably not the best way.

But that's your copy, right?

Then we copy it and return it back. So it says, okay, we'll do the optimal case unless we can't because it crosses a boundary, and then it shoves it into the scatter-gather list as an entry in the list. So it basically reallocates a chunk of the scatter-gather list and puts it in.

But I think, basically, internally we actually know what kind of dynptr it is, so we can have special operations for special kinds of dynptr. An SKB can be a special kind of dynptr, and then we can add another API that linearizes part of the packet, or the entire packet. You can specify, "From offset 100, the next 200 bytes," and you just copy it into another dynptr or into a buffer, whatever.

That's the exact API that is used for sockmap, right? You say, "Here's my start pointer, here's my end pointer." And if you're smart about your application, if you really want to optimize things, you always make sure that the end pointer never goes over the end of the scatter-gather list, because you know the layout. From the BPF program you ask, "What's my layout? I don't want to incur the linearization cost, so I'm always going to chop it off at the end of that list." But if you don't
care, or for some reason you need to get data across the boundary, the API is perfectly fine for you to do that, and then it will basically fix up the scatter-gather list for you.

Yeah. And as I mentioned, one of the important APIs that's missing on this slide is this: when you have a dynptr, technically the verifier doesn't know how many bytes there are, but you as an application might know that the first hundred bytes are definitely there. For ring buffers, the typical dynamic use case would be that you have some prefix which is fixed, some struct that you maintain, and the rest is pre-allocated for some string data. That was actually the case from you guys at Google, right, reading the environment stuff. So what you would do is pre-calculate how many bytes in total you want, but you only care to write directly into the first hundred bytes. So there will be an API that says, "Give me a direct pointer to the first hundred bytes, or null if there are not enough bytes." A similar approach can be done here: if you know the layout of each buffer, you can advance by some number of bytes and say, "Assume there are at least a hundred linear bytes," and then you just work with them. And you can iterate like that. It's kind of complicated still.

But that's how the packet stuff works too, right? If your application is dumb and writing a character at a time, I don't care if it's not performant; your application is broken.

So don't you need an end pointer here too, then, to say where to stop? Or a number of bytes?

You mean for from-skb? You pass the entire context, the SKB, right now, so we figure out everything from that. That's what I was saying: we can pass data and data_end, but then it will not work for
multi-buff, while if you pass the SKB or xdp_md you actually know about multi-buff if you want.

Right. So wouldn't that be an API change here, to add the XDP pointer plus either a length or the end of the data? Am I missing something? Because, okay, what would from-skb do? There are a lot of frags there. What is it going to give you a pointer to?

If you can go back to this slide... oh, one more. So you see this data field? What it points to depends on what kind of dynptr it is. For an SKB we can just point to the SKB, or xdp_md, or whatever. Size and offset are kind of a local view. This is inspired by Go slices, where you have the original memory and then you actually have a limited view of it. So for this multi-buff use case you would still point to the XDP context, basically, and then based on size and offset you can extract the data that you need and work with it, if that makes sense.

But then if you walk past the first buffer, if you say, "I want to go to ten-something," and that's past the first buffer, how do we know that? At runtime there must be a runtime check, because data_end could be different, right?

Yeah, that's what I'm saying. I'm not sure if I'm explaining it well enough. Safety and all that stuff is handled in the helper, and the verifier basically trusts those decisions. So if you get direct access to data, there must be some helper that does all the runtime checks and returns you null or not null. Then in the BPF code you need to do a null check after that, and if it's not null, the verifier trusts it and knows that this is a valid direct data pointer with some number of bytes.

Right. So if you had an end pointer in your API, then you wouldn't need runtime checks, because the verifier could statically verify that your offsets are within that range.

Right, but in practice data and data_end are so hard to work with that
this is sort of an alternative to that. Maybe it has an extra runtime check, but that might be worth it in a lot of cases anyway.

Okay. But I mean, multi-buff is an extension of all this. Even without multi-buff, this is a useful interface to something whose size we don't know up front, with all the safety stuff just put into the runtime.

Yeah, I agree.

I also think, for the SKB case, it would probably be useful not to have to linearize the SKB if you don't have enough data in the linear part. We would probably need some read-only pointer with which you can still parse the packet if you don't intend to modify it, and you could use that as an optimization as well.

Yeah, so in the size we also keep track of whether it's read-only or not, so I think we'll be able to support that as well.

The verifier went through this refactoring where we have a sort of generalized specifiers of what pointers are and what argument types are, and then we have this read-only or not-read-only extra modifier. So it's possible to use it here and have some helpers that only accept dynptrs that are read-only. You have a guarantee that you're not going to modify whatever you're not supposed to, and the verifier will enforce all that.

So yeah. Well, for helpers it doesn't... Yes, but it's not an annotation. We do keep track of where the dynptr is in the map, so we don't need tags yet for that, but we do it by name, right?

Awesome stuff, super excited. I think the one tricky bit, though that's not a problem of upstream, is how to use it for old kernels versus new kernels without having to rewrite too much.

But with CO-RE, have a few implementations in a program and pick one dynamically. I don't have any solution for you here.

Yeah, it's fine, all good.

All right, any more
comments, questions, feedback? Thanks. Thank you.