Hi, I'm Jiri, I work for Isovalent. This presentation, or hopefully the later discussion, is about how we currently resolve functions in the BTF data. Timo Beckers reported a problem: if you want to attach a program to a function through the trampoline, and you have multiple functions with the same name, which will happen if you have static functions with the same name defined across several objects in the kernel, they will in the end all be linked together into the vmlinux object. So if you have a program that you want to attach to one of these specific static functions, you can never be sure that's the function you are attaching to, because we are looking up functions in BTF just by name. So if you have multiple static functions with the same name, you basically have no idea where your program will be attached.

Normally the kernel is full of global functions, for which the name is enough. When it comes to static functions with the same name, we have this problem. libbpf uses the function btf__find_by_name_kind, where we basically say that we want to look up the function by name, and the first match on the name gives us the BTF ID. Then for the trampoline, we need to load the program with the BTF ID, and the verifier has this sort of logic inside: it takes the BTF ID of the function, gets the name of the function, and then uses kallsyms_lookup_name to get the address of the function. For a static function there can of course be multiple addresses for the name, if there are multiple instances of that function. And kallsyms does the same thing as we do in the BTF data: it takes the first match. So these are two levels where we can get it wrong, and we cannot be sure where the probe will actually end up. This is actually not the worst problem.
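As a rough illustration of the two name-only lookups described above, here is a toy Python model (not kernel or libbpf code). The function name `do_init`, the `drivers/...` paths, the BTF IDs, and the addresses are all made up, and the assumption that the two tables are ordered differently is only there to show how the two first-match steps can disagree.

```python
from typing import NamedTuple, Optional


class BtfFunc(NamedTuple):
    btf_id: int
    name: str
    path: str  # shown for the reader; the lookup below never consults it


class Ksym(NamedTuple):
    addr: int
    name: str
    path: str  # likewise unused by the lookup


# Hypothetical BTF function entries, in BTF ID order.
BTF_FUNCS = [
    BtfFunc(101, "do_init", "drivers/b/core.c"),
    BtfFunc(102, "vfs_read", "fs/read_write.c"),
    BtfFunc(103, "do_init", "drivers/a/core.c"),
]

# Hypothetical symbol table, in address order.
KSYMS = [
    Ksym(0x1000, "do_init", "drivers/a/core.c"),
    Ksym(0x2000, "vfs_read", "fs/read_write.c"),
    Ksym(0x3000, "do_init", "drivers/b/core.c"),
]


def btf_id_by_name(name: str) -> Optional[int]:
    """Return the BTF ID of the first function with this name."""
    for func in BTF_FUNCS:
        if func.name == name:
            return func.btf_id
    return None


def addr_by_name(name: str) -> Optional[int]:
    """Return the address of the first symbol with this name."""
    for sym in KSYMS:
        if sym.name == name:
            return sym.addr
    return None


def resolve_attach_target(name: str) -> Optional[int]:
    """Name -> BTF ID -> name -> address, taking the first match at each step."""
    btf_id = btf_id_by_name(name)
    if btf_id is None:
        return None
    func_name = next(f.name for f in BTF_FUNCS if f.btf_id == btf_id)
    return addr_by_name(func_name)
```

In this model `btf_id_by_name("do_init")` picks the entry from `drivers/b/core.c`, while `resolve_attach_target("do_init")` ends up at the address of the `drivers/a/core.c` copy, and the caller has no way to ask for either one specifically.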
Related to this, it can also happen that you verify a program for one function, with one set of arguments in the prototype, and the program is then attached to another function with different arguments, with a different prototype. The program gets verified and loaded, it can be executed on the different function, and that can also lead to crashes. Fortunately this was already fixed. We have a pahole fix made by Alan, and the fix basically ensures that whenever we have multiple functions with the same name, they need to share the same prototype. So we won't have the problem I just described.

The problem of not being sure where your probe will end up is still there, and that's basically what I was hoping to discuss. When we were discussing this several weeks ago, there were several suggestions on the mailing list, so I put them into the proposal as some sort of starting point. Maybe we can get somewhere from them. The basic problem is that we use just the function name. So to fix it and be able to identify the function, we should use the path together with the function name. The suggestion was also to use a declaration tag to store that information for the function. Then we need to add some libbpf support for that, so basically being able to find the function based on the path and the function name. That's one part of the problem, and another is being able later in the kernel to get from the BTF ID to the address that is specified by the path and the function.

So I will go step by step through what I was able to find out so far. Adding the declaration tag to the BTF for the function is a really easy pahole change. I didn't do any optimization. I just added a declaration tag for every function, and it was about a one megabyte increase in the BTF data, from six to seven megabytes in my case.
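A minimal sketch of what a lookup by path plus function name could look like once every function carries a declaration tag with its path. This is a toy Python model, not libbpf code: the `path:` tag format, the `find_func` helper, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

PATH_TAG_PREFIX = "path:"  # hypothetical tag format, not an existing convention


@dataclass(frozen=True)
class Func:
    btf_id: int
    name: str


@dataclass(frozen=True)
class DeclTag:
    btf_id: int
    target_id: int  # BTF ID of the function this tag annotates
    value: str      # free-form string carried by the tag


BtfType = Union[Func, DeclTag]


def find_func(types: List[BtfType], path: str, name: str) -> int:
    """Return the BTF ID of the one function matching both path and name."""
    # Map each tagged function's BTF ID to the path stored in its tag.
    path_of = {
        t.target_id: t.value[len(PATH_TAG_PREFIX):]
        for t in types
        if isinstance(t, DeclTag) and t.value.startswith(PATH_TAG_PREFIX)
    }
    matches = [
        t.btf_id
        for t in types
        if isinstance(t, Func) and t.name == name and path_of.get(t.btf_id) == path
    ]
    if len(matches) != 1:
        raise LookupError(f"{path}:{name} matched {len(matches)} functions")
    return matches[0]


# Made-up BTF contents: two static functions that share a name.
TYPES: List[BtfType] = [
    Func(1, "do_init"),
    DeclTag(2, 1, "path:drivers/a/core.c"),
    Func(3, "do_init"),
    DeclTag(4, 3, "path:drivers/b/core.c"),
]
```

Here `find_func(TYPES, "drivers/b/core.c", "do_init")` selects BTF ID 3 rather than whichever `do_init` happens to come first. The sketch raises an error instead of guessing when the pair matches zero or several functions.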
Then we need the libbpf function that would return the BTF ID for the path and function, which I guess could be easily done. And the last part of the puzzle is when we are in the verifier and we have the BTF ID. We need to get the function and path from the BTF ID; that data is in the BTF data, so we should be able to do that. But now we need to get the address, and at the moment kallsyms allows you to search only by the function name. So we would need to add that functionality: we would need to be able to somehow store the path together with the function name in kallsyms, and add an interface function that will give us the address for the name together with the path. I was also exploring that. At the end of the build, kallsyms just generates an assembly file with all the symbols, and we can actually add indexes to the paths. It was just a proof of concept to see if something like this would even be doable, and that might actually be the case. So that's one proposal, to store the path with the function name. Another might be to store the addresses of the functions directly in the BTF, which probably brings other questions. But yeah, so that's my starting point if you have... no, this is just the slide. For the second proposal I wasn't sure. I was using the declaration tags for the paths, and that's basically a string, so I wasn't sure how to store the address in there for the function.

So I guess maybe a third option is, if you go back one slide, where you say you store the paths here, you could just add a thing where you go from BTF ID directly to address. Like you just order it differently. BTF IDs are sequential; you would have to have it for every BTF ID in the kernel, so it's not a sparse thing, but it might be less than that.

So you mean to store BTF IDs even here?
No, so the BTF ID is the index into the array that you build here, and then the array contains the address for each ID. Of course there are going to be some entries that are zero, because they're not actually a function and don't have an address, but that way you could go from BTF ID to address, at least for this specific case.

Just a quick comment about this before I forget. With this approach, the obvious downside is that you're storing the paths twice, once in BTF and a second time in kallsyms. So it's double.

Well, it's like two separate subsystems, right?

Like you said, then it's the same strings twice, so that's still bloat. Another thing I wanted to say is about putting the path into the decl tag. I think that's a creative, good use. A little bit of a comment: essentially BTF started out by stripping out the compilation units, and now we're adding the compilation units back in. I think that's fine. I think that's going to be necessary, but ultimately we should ask ourselves what happens if we need these compilation units not just on functions but on other things, like on structs for example. You know, there's the ring buffer example: how many ring buffers do we have in the kernel? If we go the tag route on other things, on structs for example, does it still scale or do we have to come up with a better idea? That's something I would also be curious about.

My question is about the path. Do we ensure the path always starts from the same source root? Because a relative path sometimes will change between different compile units, so...

I'm not sure I understand.

I mean the path. The same file may have a different path, a different relative path, after compiling into the object file.

The path? Sorry, I don't get it.
Because for the same file, for example, if you define a static function in a header file and it's included by different compilation units or different source files, then when we compile and generate an object file, it may have a different relative path recorded in the object file. So we probably need some way to normalize this relative to the root of the tree.

I did not see that actually happening, but yeah, I mean if we can end up with some relative path there, it should be an absolute path. This was just a proof of concept that we can do it in kallsyms. I didn't think about any optimization there. Basically my question is if this is the way we want to do it. I guess we need some other way for static functions than looking for them just by name. We need the path as well.

Good question. It feels like we're kind of proposing a way to do namespacing generally in the kernel for symbols, right? If we have to go into kallsyms and provide paths, which makes sense. Has this been an issue for ftrace or anything else in the past? I mean obviously it has been, but do you know if any solutions have been proposed?

Not that I know of. I remember discussing this address in the BTF a really long time ago. And just recently we had this issue that we needed to fix pahole for. And yeah, basically the thing is that if you want to attach a program to a static function which shares its name with others, you have no idea where you will attach.

Yeah, for ftrace, if you say enable by function name, you'll actually connect to every single function with that name. So if there's three of them there, it will connect to all three of them. I have yet to hear anyone complain about this. Most of the time I guess people don't connect to names that have multiple instances, but I have noticed this.
One way around it, at least, kind of a trick I have, is with set_ftrace_filter and the available_filter_functions file: you can actually enable by index. So if you know which function you want to attach to, and you only want to attach to one, you can pass the index into it. What I meant by the index was, say, the fifth function down. Because it's basically equal to what kallsyms has there, it's in the same order as kallsyms. So actually it's not... Probably you actually have to find a way to get the address, or use trial and error. I don't know. But yeah, that's something that I've thought about before but haven't yet implemented.

I think the address style is probably the best, because typically if you have the same function name appearing in many places, people will check the DWARF and try to find the DWARF information, which contains the path and also the IP, the PC address, low_pc and high_pc. In that particular case, a declaration tag with the address would be the best match, I would say. Because in the current way, even if the BTF encodes the path, users are not sure how this path is encoded, whether it's relative or absolute. I'm not 100% sure.

So you would suggest storing the address directly?

The address, yeah.

And using the declaration tags?

Yes.

But it stores a string, right?

The declaration tag, yes, it is a string, but that's okay, you can convert. Just copy-paste the address.

So to elaborate on Yonghong's point, kallsyms is one way. You have name to address, maybe the same name with a different address, but you can have a name at multiple addresses. And with static functions inlined in multiple places, DWARF has all of this information, and we can do the same with BTF: here's the BTF ID, here's the function prototype, here are all the arguments, and here are the addresses. And how you encode it, with a decl tag or whatever other trick we use, is an implementation detail.
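The "it is a string, but you can convert" idea can be sketched as follows. This is a toy Python model under stated assumptions: the `addr:` tag format and all helper names are hypothetical, and a function is allowed to carry several address tags to reflect the point that one name can live at multiple addresses.

```python
from typing import Dict, List, Tuple

ADDR_TAG_PREFIX = "addr:"  # hypothetical tag format


def encode_addr_tag(addr: int) -> str:
    """Render an address as the string a declaration tag could carry."""
    return f"{ADDR_TAG_PREFIX}{addr:#x}"


def decode_addr_tag(value: str) -> int:
    """Convert the tag string back into an integer address."""
    if not value.startswith(ADDR_TAG_PREFIX):
        raise ValueError(f"not an address tag: {value!r}")
    return int(value[len(ADDR_TAG_PREFIX):], 16)


def addrs_by_btf_id(tags: List[Tuple[int, str]]) -> Dict[int, List[int]]:
    """Group decoded addresses by the BTF ID each tag is attached to.

    `tags` is a list of (target BTF ID, tag string) pairs; tags that are
    not address tags are ignored.
    """
    result: Dict[int, List[int]] = {}
    for target_id, value in tags:
        if value.startswith(ADDR_TAG_PREFIX):
            result.setdefault(target_id, []).append(decode_addr_tag(value))
    return result
```

With this layout the lookup goes from BTF ID straight to one or more addresses, without a second name-based search in the symbol table.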
Actually, one use case, well, kind of a side use case: it would be nice, at least tracing-wise, to be able to pick a file and say trace all the functions within this file. So if there was an easy way to match functions to where they exist in a file, it would be really nice to say I want to trace everything in this file, to filter on the file. So that's one thing. Someone's calling me, so...

Yeah, I mean that would be sufficient for static functions obviously, right?

Yeah, filter by compilation unit.

So are the addresses of the functions available at compile time, or can they change later?

So yeah, they are in DWARF. There might be a problem in that the kernel can be relocated to a different address, but...

What was your question?

Wait, so are the addresses the same as at compile time?

No, it's relocated. Most kernels are relocatable, so the address could be anything; each boot could be a different address. The function could be at a different IP address at every single boot.

So then how would generating it at compile time work?

Yeah, we would need to...

So I think the proposal is to record the address as relative file offsets from link time. But I think there are multiple stages, right? Once we compile a more or less finalized vmlinux.o, we go through a few steps where we re-link it, we add kallsyms, then we re-link again. Would that potentially change some of the addresses? Or no?

The address will be correct in the final binary. Because the relocation is not like, whatever you place, change the... oh yeah. Okay. I need to double-check that part, and whether it's final or not.

It seems like Yonghong's proposal is actually to emit a relocation into the BTF data, so that the linker, if it adjusts the offset, will update the BTF data.
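The idea of recording offsets at link time rather than absolute addresses can be sketched as simple arithmetic. This toy Python example assumes the kernel image is moved as one block at boot, so a function keeps the same offset from the image base; the helper names and the base addresses used below are made up for illustration.

```python
def link_time_offset(link_addr: int, link_base: int) -> int:
    """Offset of a function from the image base, as known at link time."""
    if link_addr < link_base:
        raise ValueError("address lies below the image base")
    return link_addr - link_base


def runtime_addr(offset: int, runtime_base: int) -> int:
    """Address of the same function once the image base for this boot is known."""
    return runtime_base + offset


# Made-up numbers: the image is linked at one base and loaded at another.
LINK_BASE = 0xFFFFFFFF81000000
BOOT_BASE = 0xFFFFFFFF9A000000

offset = link_time_offset(0xFFFFFFFF81001000, LINK_BASE)
relocated = runtime_addr(offset, BOOT_BASE)
```

Under that assumption the stored value never has to be rewritten after a relocation; only the base changes from boot to boot.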
Except if you do it through a decl tag, then it's a string, and the linker probably cannot update just a string with a hex representation, yeah.

Yeah, I mean this isn't really a scientific argument, but it feels like this is a namespacing problem, right? It feels like something that should be solved with strings and with paths to me. Is there some way that you could just figure out an offset of file name to function name and just call kallsyms? Like structure the kprobes so that you include the file name somehow and call into kallsyms, and add the path lookup function that you mentioned. It just feels like a namespace problem to me, but yeah.

There are two problems, right, that you described. One is in user space, being able to say this is the function that I'm talking about and get a BTF ID for it, and then in kernel space, take the BTF ID and get metadata for it. And it wasn't clear to me which problem you're specifically addressing. Is it the first one, where in user space I can figure out which function this is, or the one in kernel space?

I'll have to think about it.

So I think if you can use the path name everywhere, you don't need the ID, right? Because user space can pass the path name and then the kernel can index via the path name, so you don't need the BTF ID.

But there are many APIs that already take a BTF ID, right? So it would be nice if we could just keep using those for every attach, et cetera. All of these go through BTF ID already. They would need to change the interface of the BPF programs. I think in terms of making this easy to upgrade to, it's better to say that in the kernel you can go from BTF ID to additional metadata, and then in user space you get additional tools to figure out what the ID is.

Someone online has a comment about modules. Do you want to speak up?
So there's an online comment about the module, low and unknown. I think there's a comment about what may happen there. And there's a patch set out there. Are you aware of the patch set about the module, low and unknown thing?

No. Are you talking about the patch...?

There's a patch set to add a path, where unique kallsyms are already out there. Have you seen that?

I didn't see anything.

There's been talk about a way of making kallsyms different for functions, adding paths and such. There is a patch set out there.

Was it the build ID one?

It was something different. It was actually a few months ago.

I think it ended up as a kallsyms iterator, I think that's what you're referring to.

Let me go check my folder.

I just want to make sure I understand the thing. One of the problems is that we want the kernel to have this information, so that user space and the kernel can be in sync about which function they're talking about. So some sort of tagging or something. And the problem is that if we put in something like a path name, that could also bloat the kernel, because all of that is going to be in memory. I think that's the fear about recording that this function belongs to these files. Unless we could compress it somehow.

Yeah, for me, storing the path has the potential of bloating the kernel size. But another thing is, if you store the address for the function, we will also have it in the kernel BTF, so the verifier could take it from there. That would solve the thing. But the relocation might be the problem.

So if we include out-of-tree modules in the discussion, even the path plus function name is not enough to avoid all collisions, right? You can have two modules that both have, like, core.c and type_show.

That might be right. So we need some other sort of ID.

Yeah, I mean, this really feels like a general problem for the kernel, right? Wouldn't livepatch also potentially run into this?
Like you can have, even within a module, multiple instances of a function that you want to livepatch.

Yeah, I think there are two ways to connect to a function. There's one through IP address and one through function name. I think there's something that probably verifies this; it probably converts it to an IP address somehow, I'm assuming it might do that. Because if it looks at kallsyms and finds the address in kallsyms, it could then map to...

Yeah, sorry if I'm misunderstanding you, but for livepatch there's a field, when you insert the livepatch module, that is just old_name, and it's just a string with the name of the function. So I think the kernel has just always operated under a global namespace assumption. And I think people just plugged their ears and pretended it wasn't a problem.

I wanted to come back to Song's point about kernel modules. The way a lot of the APIs work now is that giving the BTF type ID is not enough. You also give a handle to a BTF object, which is kind of the blob. So if you want to attach to a function that lives in a kernel module, the way you do it is you find the kernel module, you get a handle for that kernel module, basically through the BPF syscall, and then in that module you find the function or the type that you're interested in, and then you pass both to the syscall. So you have two 32-bit IDs: the first identifies the module, where by convention zero is basically the kernel, and then there's a type ID that says, in this specific thing, whether it's the kernel or the module, this is the type that I'm interested in. If we got all of the APIs onto that, and we also had a way of saying this is actually in vmlinux and this is the path where the function was, then we could make it unambiguous: is this in the kernel, is this in the module, is this the function I'm interested in, et cetera.

Yeah, the patch set is kallmodsyms, right?
Do you remember that one?

I think it eventually ended up as the kallsyms BPF iterator. I think they wanted to add a kallmodsyms file to /proc because they wanted all the other information, but they can do it with the BPF iterator program.

Did that get in at all?

That's merged.

That's merged?

The kallsyms iterator we have, but that was actually the answer to...

It sounds like it was meant to solve this problem, you know, reliable symbol address lookup. That's actually the name of this patch set: kallsyms, reliable symbol address lookup with /proc/kallmodsyms.

Yeah.

No, it doesn't look like it did. No, it didn't go to the BPF mailing list. I'll just forward it. I'll send the link, since it won't let me just reply to this. It was originally sent December 5th of last year. I'll send that to the list right now.

Thanks.

All right. Did you have a last comment, or do you still have slides?

No.

Thank you very much. Now it's a coffee break until half past.