This works? Yeah. Is it too small? Oh, well, yeah, but that's just the headers. Okay, so we wanted to introduce this topic about compiling to BPF, which, just by chance, a big part of Alexei's talk was actually about too, right? All those problems they are having with Clang, in this case, compiling all that stuff. And, well, the first thing I wanted to say is that compiling to BPF from high-level languages, from C in this case, is difficult. It's not easy, and I hope the discussion points we have for today are going to exemplify that. It's a difficult thing to do. But I think that innovation requires strong foundations, because you may be having all the fun innovating all around, but if you don't have strong foundations, everything is going to collapse sooner or later. And that applies not just to the instruction set but to all those other things. Like, you mentioned it, BPF C, right? I think it's the first time I see that written down somewhere, but probably it will not be the last. Anyway, so we wanted to go through some very particular topics. I'm sorry, it's going to be a little bit boring, but these are things we really need to clarify, because we are sort of stuck on them in the toolchain support work in GCC and binutils. So we need to discuss them. So let's start. The first one is a problem, a situation that we have with the CO-RE builtins, with a lot of them, not all of them, but many of them. Well, it happens that all the CO-RE builtins but one take arbitrary expressions as the first argument, like the ones here, you know, preserve_enum_value, preserve_type_info, preserve_field_info, and so on.
And this works, in particular, with Clang, because when Clang handles those builtins, and this is internal to the compiler, it needs to be able to extract the type of the expression you are passing from the IR it is handling internally. In the case of GCC, this would be the tree representation; in LLVM, I guess, it is some sort of IR which is tree-like too. In any case, you cannot use those builtins directly, because if you use the builtin directly, and for example you call __builtin_preserve_enum_value and pass it some enum value, it's not going to work. You have to be sure to pass a very particular C expression that results in an internal tree representation that the Clang implementation happens to know how to deal with. And that's why, in bpf_core_read.h, you have macros like this one, which make sure to pass expressions that are carefully crafted so they work with this particular version of Clang. And it's not guaranteed that it's going to work forever. As Alexei mentioned, take for example a new optimization in the parser: if Clang starts doing constant folding in the parser, like GCC does, which is one of the problems we have, this stops working. So, we identified three different magic expressions that are used by this kernel header to make sure that the builtin is invoked with something the compiler can actually understand, which are those three. So, we have two problems in GCC. First, GCC does indeed do constant folding at parse time, and it happens that none of those magic expressions work for us, because by the time the backend is able to handle the builtin, the type information associated with the enumerated value is lost. That's one problem, which is particular to us. And the second is that this is fragile; it can break at any time. Clang says, oh, we are going to do it like GCC does.
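To make the "magic expression" idea concrete, here is a sketch of the enum-value macro shape used by bpf_core_read.h; the exact spelling may differ between versions of the header, so treat this as illustrative:

```c
/* Illustrative sketch of a bpf_core_read.h-style wrapper.
 * __builtin_preserve_enum_value() cannot take an arbitrary expression:
 * the macro wraps the enumerator in a cast-and-dereference shape that
 * Clang's implementation knows how to pattern-match internally.
 */
#define bpf_core_enum_value_exists(enum_type, enum_value)		\
	__builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value,	\
				      BPF_ENUM_VALUE_EXISTS)
```

If a compiler folds the enumerator down to a plain integer constant while parsing, the cast-and-dereference shape, and with it the type information, is gone before the backend ever sees the builtin, which is exactly the failure mode being described.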
Everything stops working, which is not very good for user friendliness and so on, right? So those are the two problems we wanted to discuss a little bit with you. The first one: well, we are actually working now in GCC so we can inhibit the constant folding in the parser. And that's another interesting thing that links with what you were saying before: I think we are the only backend in GCC that is now going to have to reach all the way up to the parser, right? Which is very uncommon, right? But we will have to, because again, yeah, BPF C, right? In this case, maybe it requires not doing some optimizations even at the parse level, which is the topmost one, right? Anyway, so the first question I have for you. We are now trying to inhibit this in GCC, but in any case we are going to have to come up with another three or maybe four magic expressions that we will need to use, and they may be different from the Clang ones. So my first question for you is: would you be okay with having conditional ifdefs in bpf_core_read.h for GCC? That's probably a question for Yonghong, but we can change Clang, right? So we can change this. If you think that passing the type into this preserve builtin will work, then why not? Right. That would be the proper way of fixing this. I mean, this is for the meanwhile, because we really want GCC to run the BPF selftests. It's about time, you know, and it's very frustrating for us, because we've been working on this for years. But okay, yeah, what would be the proper way? It seems to us, and we may be wrong, that the whole difficulty here is that what you want to pass to the builtin is the type, well, also the expression, but if only you could pass the type as an extra argument. But you cannot do that in C, right? Because in C, you cannot pass types as arguments. So what to do?
Cupertino, who works with us on this at Oracle as well, was thinking that maybe we could use BTF type identifiers to identify the type, because you can indeed pass a number to the builtin. What do you think about that? It is a half-cooked idea. I think in LLVM, again, please correct me, the BTF ID is assigned only at the very last stage, so... We have the same problem. In the front end, the ID is unknown, right? Yeah. Does anyone have any idea of how we could somehow pass a type to a builtin? Doesn't C have this generic macro where you can actually pass the type? How does that work? So you have generic and then you match on type, and then you use different expressions, different values, depending on what the type of the variable is, and stuff like this. I think it's even used in the kernel: underscore Generic, with a capital G. I don't know about that, but those macros are polymorphic anyway already. As the first argument, it doesn't expect any particular type, yeah. But okay, my point, slash question, is that it seems like C has some way to pass a type, so we should look into it. How is this called? Underscore Generic, with a capital G. I think that's how it's called, and it's used in some of the Linux headers. I'm just sort of a BPF user, not an internals designer, but listening to this, it occurs to me that it sounds like the front-end parser is okay; you're having major troubles with the IR and the optimization there, then you're having troubles with the back end because it's unusual, and then you're having troubles with the overall design because you want to reach back from the back end all the way up to the parser. So tell me again why you want to use LLVM and not just do your own thing instead? It sounds like a bit of a force fit. That's an outsider's point of view. You mean, why compile C to BPF instead of writing your own BPF? I'm not questioning what BPF needs.
I'm questioning how you're doing it. You're using LLVM to get there, right? Steal the parts you need from LLVM and move on, fork it, whatever. It doesn't sound like it fits at all, and it doesn't sound like the community wants you to make the changes that you need. That's, again, my first snapshot. I cannot talk for LLVM. I don't work with LLVM nor Clang; actually, I avoid it as much as I can. So I am, you know, on the GCC side, but I would not like to rewrite a C compiler for this. I hope it doesn't have to go that way. Well, I guess when I say write, I'm assuming you'd stand on top of prior art, so, you know, fork. Yeah, but also don't forget that now it's not just Clang, it's also rustc, and in GCC we also have a Rust front end; that's one of the topics we have for today. Okay, well, thanks for this _Generic thing, because if we can dispatch on types, maybe we can fix this. Yeah, but just in general, whatever the underlying implementation of those macros that are exposed from the bpf_core_read.h header, we should preserve the behavior, right? So right now the macro is allowed to accept either the type, so you do struct blah, blah, blah, or you pass the field and then we extract the type from that. It is fragile, but, you know, I understand the struggle, because we have the same problem, right? I think that now in Clang you recognize some shapes of trees, right? Some shapes of expressions, and if it's something it doesn't know how to handle, it doesn't work, which is the user frustration, right? Like, if I write this expression this way it works; if I write it that way, it doesn't, right? Yeah, so if we can't pass a type, because there's no construct, why don't we pass in a string and come up with our own little format that both compilers would have to understand, that says this is the type we want to look at?
If you could somehow come up with a mapping from strings to the particular C types being defined in the compilation unit, then yes, that would be another way of doing it. C++ has this typeid; would that help? C++, yeah, yeah, but we are now trying to get C BPF working. C++ BPF is something that is going to blow up this whole place here. Maybe some similar mechanism? Poof, yeah, well, that would be, you know, cherry-picking C++ features. I don't know, would you like that? Okay, well, I hope we will have time these days to discuss all of this, right? I don't know, it's my first time at this conference, but I hope so. Okay, so let's move forward. I'm glad that we sort of agree. So can we have those conditional compilation things in the header for the time being? Okay, thanks. Good. Yeah, that we will look at. Thank you. If we can pass types somehow, that will be great. So let's go to the controversial part. So, yeah, I don't know if you have noticed, but, I mean, the assembler language that you people are using is very weird, right? And what I wanted to do here today... I mean, I don't want trouble. If I find trouble, fine, we go to trouble, but I don't want trouble, right? But these sorts of things have consequences. And in this case, it's pain, right? A lot of pain and money. But okay, let's see. So right now there are two main BPF assembler dialects, so to say. I think there are more, but the ones that are actually relevant somehow are these. There is the main one, which we call pseudo-C, because it sort of looks like C statements, right? "Originally BPF verifier format": I found that sentence in your first patch to Clang, because this comes, I think, from the format that the verifier used to dump, right, to print the bytecode.
And then we have the normal, boring, yeah, it's boring, totally lacking all sorts of innovation, assembler-like dialect, which is the one we support in GCC. And that was not gratuitous, because we got most of it from the ubpf implementation: when we started the BPF port in GCC, ubpf was already around, and we use more or less the same syntax, right? So in that table you see Clang, rustc, LLVM: for both the assembler and the disassembler, LLVM uses the pseudo-C dialect, so it can parse it, and from the disassembler you get dumps like that. The compiler column here is what the compiler generates, but in the case of Clang and LLVM this would be IR, not a textual representation of the assembly program, because, as you know, LLVM, they got all the abstractions wrong, right? So this is IR. Now, in GCC and binutils, we have support in the assembler for both syntaxes, and actually you can mix them in the same source file, so you can write one instruction in one format and the next one in the other. In the disassembler, at the moment we only support the normal assembler-like dialect. And in the compiler, I'm working on that right now: we are working to make GCC generate the pseudo-C syntax as well if you pass an option. This is necessary for inline assembly, which I did not realize until very recently, because I'm not very smart. But then, yeah, I realized that the encoding of the registers in both dialects is different, so if you're using inline assembly and you have an input argument that is a register... Okay, so here it is. Here, in our best faith, in the best common interest, and in our humble opinion, we recommend you to ditch the pseudo-C syntax. I know you like it, right? But this is not about personal likings, trust me. So we recommend to progressively, progressively, go from one to the other. Why?
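For readers without the slide, the contrast between the two dialects looks roughly like this for the same 64-bit load; this is a hand-written illustration, not taken from the slides, and exact mnemonics and spelling may vary:

```
; pseudo-C dialect (Clang / verifier-style output)
r1 = *(u64 *)(r2 + 8)

; conventional assembler-like dialect (GCC/binutils, ubpf-derived)
ldxdw %r1, [%r2+8]
```

Note the two points raised in the talk: the pseudo-C form reads like a C statement with unprefixed register names, while the conventional form has a mnemonic-plus-operands shape that generic assembler tooling expects.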
First, this is expensive. Literally, in my company, my team, we had someone implementing the pseudo-C syntax when we did the binutils port to BPF. And, you know, this is important, because you have to convince your management to do these kinds of projects, and they look at the money. What can I say? They look at the time spent on things. So one of the things we did was to use CGEN, which is something some other ports in binutils do, to generate our opcodes and everything. CGEN also provides a parser for your assembly language, surprise. It does not work with this sort of syntax, right? So we had to write a parser on the side. That was expensive. We managed to convince our management to actually pay for this, but maybe next time we won't be able to, right? The second point: having a syntax like this for an assembly language is problematic. It's problematic because there is a lot of existing infrastructure: assemblers supporting several targets, disassemblers supporting several targets, things like CGEN, where you provide the CPU description and it generates a parser for you. LaTeX, right? In LaTeX you have templates for assembly language when you write your papers or your presentations. They don't work with this either. Editors, and this is very important; I hate it when people diminish the importance of this. It is very important, if someone is going to write BPF assembler, for that person to be able to use whatever mode her editor has for editing assembler programs, right? Those modes don't work with this syntax either. IDEs, same thing, like Eclipse, you know, all that stuff, and so on. So it's problematic. It may be nice, cool, very innovative; it may have some advantages. But it is objectively problematic. It has a cost. It comes at a cost. It's also ambiguous, but I have to confess something.
I was like, okay, I'm going to look at the syntax of this when we implement it, and I'm sure I'm going to find a lot of ambiguities. No, actually there is only one syntactic ambiguity, which in Clang you actually handle: since the registers don't use a prefix and you have this equals-sign statement, there is an ambiguity in the syntax with the symbol assignments that are supported by both GAS and the LLVM assembler. So you cannot actually use foo = 10 as a symbol assignment if you are assembling a BPF assembly program. And the last point is that it is pervasive, which is that, because of inline assembly, whatever syntax you use, everyone has to support it, right? It's not something where, oh, in your toolchain you use that one and in my toolchain I use this other one. No. So that's the question we wanted to bring here. Is it really worth it? That's it. I mean, again, this is not about personal liking of the dialect. Actually, I'm not going to tell you whether I like it or not, because it's irrelevant, right? We really, really recommend you to go and use a normal, conventional, boring, garden-variety assembler syntax, because your future will be way less problematic, and mine especially, right? So this is the message we wanted to bring here. Now, I know we will have time to discuss during these days, I guess. But in the meanwhile, there are a couple of little things. One is that, for some reason, I don't know why, in the BPF assembly parser you have in LLVM, the prefix bitwise-not operator is not supported, because you have a switch statement there that filters before calling into the expression parser. I don't know. Well, little thing. And the second is that, for some reason, even though in the code I see that you allow open parentheses in expressions, sub-expressions are not working.
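The one syntactic ambiguity mentioned can be shown in a two-line sketch (the symbol name foo is hypothetical, for illustration only):

```
foo = 10    ; in GAS and the LLVM assembler, this shape is normally a
            ; symbol assignment (like .equ foo, 10)
r1 = 10     ; but in the pseudo-C dialect the very same shape is a
            ; move instruction; with unprefixed register names the
            ; assembler cannot tell a symbol from a register, so
            ; symbol assignments become unusable in BPF assembly
```

A register prefix such as `%r1` would remove the clash, which is why conventional assembler syntaxes use one.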
But those are the only two divergences I have found between our parser and your parser. So if we could get these cleared up, then we are good. Okay. So as a user, I guess, maybe I'm back-interpreting this a little bit, but the original reason for the pseudo-C assembly was to make the verifier output more legible, I guess. And this was at a time when there wasn't a way to add debug info to BPF, I imagine. Right now you can add, like, this is the line of C that this originated from, and now it's easier to understand what the output is. So maybe nowadays we actually need the pseudo-C dialect less, because we have better debug information that gives us line info. That's number one. Number two is, I've been bitten by this. There's a website called godbolt.org, the Compiler Explorer, where you can punch in your code and compile it to BPF. And it doesn't work properly: it doesn't get the coloring, et cetera, right, because BPF has weird syntax, basically. And I opened a ticket, like, oh, could you fix the coloring for this? And he's like, oh, yeah, that's kind of unfortunate, it seems difficult, I don't know if that's going to happen. So, kind of an experience report that, yeah, you're right, I think this actually does really cause pain. My question is: if the assembly format became something like what you described, would the output of the verifier in the kernel have to change, or would we keep it as it is? What would be the way forward? Is it a requirement for you to be able to feed the output of the verifier to an assembler? Then you could get it in watercolors, right, with colors or whatever you care about. Is there some reason why those formats should be the same? Probably ease of use, I would say; people are used to it. Yeah, because it's been out in the world for so long.
I think consistency matters here, right? So, like, I was a user as well at some point, right? And I learned BPF assembly just by reading BPF verifier output, right? And then I started translating that into writing my own inline assembly from time to time. So I think that's a super useful property. And I don't know if all of those downsides actually beat a little more understandable assembly. Because I work with customers, right? And even with this simpler syntax, they're already afraid of reading BPF assembly. If you go to jgt, and then, like, three parameters, and you have to remember which one is which, I think it will be a huge step back in terms of ease of use. So this actually goes straight against the goal of simplifying the user experience, especially for new people. Okay, well, I don't want to be nasty, but I don't think your customers should be writing any kind of assembler, honestly. Sure, that would be ideal. I mean, if reading something like "jump greater-than register blah" is going to be a problem for them, I don't think they should be writing BPF assembly at all. I mean, yeah, I understand. But what can I say? Yeah, which is easier to read, the one at the top or the one at the bottom? Maybe it's the one at the top. All I'm saying here is that this has consequences, and I would hate for you to not realize the consequences before it's too late. That's all. I mean, before I hand off: what's "too late", I guess, here? Well, actually, yeah, that's a good question. Maybe it's already too late in this sense. Yeah, I mean, if you're writing a BPF program, you've probably read x86 assembly; you should be able to read jump-greater. Like, I hear your point, Andrii, but I mean, I don't know.
If it requires you to have double the amount of support across every toolchain, I disagree that the readability outweighs it, personally. But I also think it's subjective. So, I just looked at what ubpf does right now, and it uses the assembler-like dialect, but it's almost a different dialect, because it doesn't use the parentheses there. So ubpf isn't using the parentheses in the bottom one, right? So GCC and ubpf actually slightly diverge; there's, you know, A and A-prime or something like that. That gets to my main point, and I'm looking at you, David, over here: that compiler-expectations topic that we said should be in the BPF standardization stuff. This whole discussion probably goes into that work item, to say what this actually is, and whether there is one variation or two, and which one is better, right? But the point is, whatever it is, we should agree on it, because we see at least three different dialects, is my point, right? And so that's a case for standardization, to help the ecosystem converge rather than keep diverging. Would a separate program that translates the assembler-like syntax to the pseudo-C one help? So the back ends don't have to do it, but you have a separate program that can do the translation for humans. And then you strike kind of a, you know, middle ground. But let me give you a real example, a particular example, of the kind of problems this sort of syntax has. Right now I am modifying GCC to generate this instead of the normal assembler. That basically implies obfuscating, in a very awful way, the machine description that we have. And also GCC, for example, assumes that register names have prefixes, which, by the way, is something that assembly languages do for a reason; it is not capricious.
You know, there are reasons for that: for example, so you don't have ambiguity of symbol names with register names, stuff like that. And the thing is, this makes it very difficult to support you. That's what I'm saying. So something simple becomes a problem; something that could be done in a couple of days using a normal syntax translates into projects of weeks, if not months. And the question is, is it worth it? That's the whole question. I mean, is it really worth it? That's it. So that is why we are suggesting to transition progressively, because obviously this has to be supported, and actually we are working hard on supporting it, because it exists in existing inline assembly. And even if it were changed, we would need to support this basically forever. But we recommend transitioning to a more conventional one. Yeah. We may be wrong, maybe this is the best thing, but we don't think so. Okay, I think we'd better move forward. We can go. Yeah, one short question about the inline assembly. Would it be possible that all the remaining tooling only supports the assembler-like dialect, but for the inline assembly you support both? Or does it require that all the other tooling also supports it? Well, it could be. One thing is, we would need to make the compiler aware of the syntax that is written there when you use inline assembly, right? Which maybe in LLVM is easier, because you have the integrated assembler. But GCC takes those templates as opaque things. But maybe, yeah. I mean, maybe that's kind of the middle ground, I don't know. We could maybe go step by step. For example, what if you supported prefixes in the register names, optionally for now? Only that, for example, would make our lives much easier, and other people's lives easier.
But yeah, so, I'm a bit puzzled: you already did all of the work, right? Supporting it in the assembler and in the compiler. And you're saying, well, I did all of the work and now you don't want to maintain it, or... why? Like, since you already pretty much completed the work of supporting this C-like syntax that you don't like, what are you proposing? There are realistically only the GCC and LLVM compilers, right? Do you care about, whatever, the Intel compiler? Who will do another BPF backend? I'm just not seeing this other stuff. Like, editors? Just don't say it's assembler; say this is C, and syntax highlighting works magically by default. CGEN, yeah, I totally understand it's a pain because of the way GCC is written, but it's zero pain for LLVM, because LLVM is written differently. It can be a pain for a third compiler, but I don't see a third compiler. Right. So you don't think that it is things like this that are making your interaction with the Clang people so difficult at the moment? Definitely not the assembly syntax. I don't mean this particular point; I mean generally speaking. I mean, you say you have done the work already, right? Well, it has not been easy. So why not make it easy? What do you gain by being difficult for no reason? That's what I don't get, honestly. I mean, this is just one aspect of the whole thing. But why? Assembly languages have a structure. I have another question for you. Why does GCC still not have an integrated assembler, right? Why is GCC making it difficult?
Well, on the question of the integrated assembler: actually, as the years pass after LLVM, more and more people, including a lot of Clang and LLVM hackers, are coming to realize that an integrated assembler was not such a good idea after all. I mean, there is a discussion about where to put the abstractions here. Just an example: when I was hacking on BPF early on, the first BPF backend I wrote was for GCC; LLVM was second. So when I wrote the one for GCC, I didn't invent an assembler. I just made GCC emit binary directly, because, well, not having an integrated assembler in GCC, in my opinion, is kind of silly. It makes life difficult. So I just made GCC emit binary by hand. It was easy and simple. Okay, sure. I mean, I can start using exokernels too; they are nice. But how is that related to this? The thing is, you have an assembly syntax. It is unconventional. It is problematic, objectively problematic. It is ambiguous; you have syntactic ambiguities. There are things that you cannot do because of this. And now my question is, is it really worth it? The ambiguity we have to fix. But if your point is that with r1 you cannot distinguish between r1 and foo, well, yeah, just like in C you cannot distinguish between int and foo, and, well, that's why there are keywords. Look, why is this so precious to you? Why do you have to have a totally, completely different syntax for your assembly language than the rest of the world? Why? Because the users matter more than the developers. Okay, fine. Cool. So I guess the users matter most. Okay. Well, if the users matter, fine: the users could have been using GCC for months now if it were not for this. So those users don't matter? I mean, really? Anyway, we can keep discussing, but there are other points.
So, given that there's now a coffee break until 11, I think we probably need to take this to the hallway. But you also have other topics, right? So once we come back after the break, we can continue with your other topics. Okay, sure. All right. Thank you.