Good morning everybody. I'm going to start right away to leave room for questions. This talk is about how to make DSP programming a little more open-source friendly than it used to be. The plan: I'll briefly introduce myself and my company, then give an overview of the Linux audio stack for those who don't know it, then talk about the Xtensa HiFi extensions, the DSP extensions used by the audio firmware on your laptop, most probably. Then I'll talk about how to fit all that into the Clang and LLVM infrastructure, specifically vector data types and some implementation challenges, and then a Q&A session. So, a short introduction. My company is based in Kraków, Poland, and we do low-level stuff: FPGAs, drivers, OS porting, and networking. Myself, I'm mostly a former networking expert. I say former because right now I'm doing compilers, but I used to do quite a lot of network programming and network interface drivers. We usually cooperate with big tech companies when they need support bringing up their SoCs or other hardware, and this work is actually sponsored by one of them, the one on the lower left here. Google is sponsoring this effort, although I don't represent Google here; I represent my own company. That's a little bit of a disclaimer. Now let's get to the details. The Linux audio stack: this is a simplified picture, so it doesn't have as much detail as I would like, but it at least fits on one slide. The Linux audio stack is present on most personal computing platforms, PCs, laptops, and Chromebooks, on both x86 and ARM. So if somebody has an ARMv8 Chromebook, it's got the same infrastructure inside. Basically, if you want to play a sound, you've got a user-space daemon that is responsible for it.
It's either PulseAudio, if this is a regular, let's say, movie or a song, or, if there is something specialized, you've got JACK for low-latency processing. Then there's the ALSA library in user space, which is the interface for ALSA's calls, and in the kernel there is the ALSA subsystem and the other subsystems responsible for driving the whole thing, like the DSP platform driver or the codec driver. Now, on the SoC, or on the PCB, there is another chip, which could be part of the big SoC or a separate one, and it has a DSP and the audio codec, which transforms digital audio to analog, or the other way around, the analog microphone signal to digital. And the DSP here is usually the Xtensa architecture. It's responsible mostly for equalization, echo cancellation, microphone boosting, that kind of stuff. Any PC has a specific frequency response, for instance: your speakers have a specific frequency response, which is strictly tied to your platform, and the vendor needs to know it, and then the DSP needs to adapt the sound so that it sounds nice. Otherwise it would be kind of dull, or would lack bass, or whatever, right? So that's the responsibility of the DSP. Usually those DSPs run at a few hundred megahertz with a few megabytes of RAM, and a couple of cores; they're usually multicore, two, three, or four. And on this SoC you can usually find Sound Open Firmware today, which is an open-source project you can find on GitHub. It used to be custom proprietary firmware, but now vendors try to converge on Sound Open Firmware because it's easier for them. So this is a fully open-source project, but the compiler that compiles this thing is not open source yet. The Xtensa architecture is kind of unique; if you haven't programmed any DSP, you've most probably never met such a thing before.
It's a multicore 32-bit CPU that has a separate data path for scalar instructions and a separate data path for vector, floating-point, or any other extension instructions. The so-called core ISA is the scalar instructions with scalar 32-bit registers, and it's got basic arithmetic, jumps, function calls, what have you. Then there are ISA extensions: Boolean registers, a floating-point extension with specialized registers for 32-bit floats, HiFi, which is the DSP extension I'm going to talk about, and there are also baseband extensions in some chips. Usually the vendor who wants to deploy this DSP tech buys a license from Cadence, because it used to be Tensilica and right now it's Cadence. The license allows them to produce their own version of the chip, and they can more or less point-and-click configure it, adding or removing features as they like. They can even create custom instructions that are not present in any other variant. So they can tweak it a lot to their liking, and that presents a unique challenge for the implementer later on, for compiler people for instance: each chip has a unique configuration, but they are similar to each other. The HiFi extension is a VLIW extension. The core ISA is a scalar ISA with either 16-bit or 24-bit instructions, but an extension can have 64 bits for an instruction, and those big instructions are divided into so-called slots. A slot is a single operation, and you can have from one up to five slots in an encoding. There are actually up to 10 encoding formats for the slots; I displayed only two for reference. So this is variable encoding: you've got variable instruction sizes and a variable number of operations inside an instruction. Most of these extensions are SIMD-oriented, so they use SIMD registers.
They use saturation arithmetic, which is popular in the DSP world, multiply-accumulates, fixed-point operations, so fixed-point numbers, which I'll talk a little about, circular buffer support, and efficient looping: extra instructions that make it easier for the programmer to write efficient loops. You know, those processors, if you go back one slide, usually don't have any out-of-order execution engine or any kind of instruction scheduler inside. Your big application processor usually has one, so the programmer targeting the Intel or ARM CPU running in your laptop or phone doesn't have to worry that much about instruction scheduling, which instruction goes first, which second, and so forth, because this is automatically scheduled by the CPU. These smaller CPUs are power efficient, and that means all of that burden is shifted to the programmer, actually, and the compiler. The closest thing to this would be Itanium, for instance; although Itanium is, or used to be, a high-end server architecture, it's a similar paradigm. So the SIMD registers are 64 bits and they hold several data type flavors: one 64-bit integer, two 32-bit integers, floating point like four 16-bit floats or two 32-bit floats. There are legacy data types, for instance 24-bit types that are zero-extended to 32 bits when loaded into the register, and fixed-point types as well. A fixed-point number is a number divided into a fraction and an integer part, similar to floating point, except a floating-point number contains a fraction and an exponent, and here you've got the fraction and the integer part. And if you multiply them, the multiplication must drop the lowest bits, as opposed to regular integer multiplication, which drops the highest bits of the product.
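As a plain C sketch (assuming a Q31 format, where a 32-bit value represents a fraction in [-1, 1); a real HiFi multiply would also round and saturate, which this omits), the fractional multiply looks roughly like this:

```c
#include <stdint.h>

/* Fractional (Q31) multiply: the 64-bit product of two Q31 numbers is a
 * Q62 number, so we shift right by 31 and DROP THE LOWEST BITS to get
 * back to Q31 -- the opposite of integer multiply, which keeps the low
 * half of the product and drops the high bits. */
static inline int32_t q31_mul(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;  /* Q62 intermediate */
    return (int32_t)(p >> 31);            /* back to Q31, low bits gone */
}
```

For example, 0.5 in Q31 is 0x40000000, and q31_mul(0x40000000, 0x40000000) gives 0x20000000, i.e. 0.25.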
So that's the difference between multiplying fractional numbers and integers, and that's why fractional numbers need a special instruction and a special data type. There are also Boolean registers; these work like predicates or selectors, and I'm going to show you a code snippet with that. There is a floating-point register file, although it's optional, because some configurations use the SIMD registers as the floating-point carrier and some use a regular floating-point register file. There are also other, let's say, unusual registers, like the register that helps with unaligned loads and stores, because these DSPs cannot do that automatically like your application processor does, and an extended-precision register for when your arithmetic doesn't fit into 64 bits. I'm not going to talk about that one, but just for reference: there are many, many weird things, unusual to a regular programmer, I would say. Now, how do you program this thing? Usually, general-purpose programming means that you rely on the compiler to do the job for you: you write your high-level, so-called, C code, or Rust, or Java, whatever, and then the compiler or JIT is supposed to select the optimal instructions for you. This is a different world. Here it's the programmer's responsibility to choose an instruction and make it efficient. The compiler's job is to make it optimally encoded, to select the optimal encoding, and by optimal I mean the smallest possible. For instance, if the instruction fits into a scalar encoding and it makes sense, it goes into the scalar encoding because it's smaller; if it cannot, it goes into a VLIW bundle, and then the compiler needs to select which slot it goes on, and it tries to optimize that as much as it can. One of the issues with VLIW architectures is that if you don't have enough work to do, some of those slots stay empty and become no-operations, right?
So you have to stuff as much as possible into those slots, but it depends on the algorithm. If it's a numerical algorithm, that's usually easy; if it's a sequential algorithm with a lot of branches, that usually means most of those slots stay empty. The way you program it is that each instruction has a macro in a C header. The macro is translated to a built-in function. Each C compiler has a lot of built-in functions; usually you use them for special stuff, for instance atomic operations in a C compiler are represented as built-in functions, or some kinds of weird bit manipulations, and there are quite a lot of built-in functions that are architecture specific. So the macro becomes a built-in function, which then gets translated into an instruction in the backend. Now, the power of a built-in function is that it works like a poor man's template: basically a template implemented inside the compiler frontend. It means the implementer can do, let's say, a constant check: the built-in function can require a constant, and only a constant, as a parameter, and the compiler can verify that, which is not possible in regular C code with a regular C function. That's the kind of magic allowed for a compiler implementer but not for a C programmer, as opposed to C++, right? And there are around 800 macros for HiFi 3, for instance, so there are quite a lot of these instructions. So this is a snippet of the code. It converts from signed 16-bit to floating point: it converts an array of shorts, basically, into an array of floats. As you can see, it's just regular C code with some macros, like this one, which is loading a SIMD vector of four 16-bit integers into the sample variable. The in variable is the address, this is a constant, and the sizeof is always a compile-time constant.
So that's not a problem. Then there is a float conversion instruction, which converts from integer to float, and then a store operation that stores the float value to this address, and again this is a constant offset. Now, the problem with this compiler ecosystem is that today the only option for the HiFi DSP extensions is the proprietary compiler supplied by Cadence. It works very well, but it's proprietary, which means quite restrictive licensing: basically only the vendors can compile this code today. And it cannot really keep up with the open-source tools; for instance, the most recent version I've seen was based on Clang 3.8. So there's an older compiler that is not based on Clang, and the new one is based on Clang, but on Clang 3.8, and they're not willing to open-source it right now. There is also GCC and Binutils, but they support the core ISA only, no DSP extensions. The good thing about them is the very good instruction scheduler built into the GNU assembler. And there's Clang: the Clang status is that the core ISA patches are done by Espressif, the company producing ESP32 chips, mostly for Bluetooth and Wi-Fi. It depends on GNU Binutils to do the linking, but anyway, it works for the ESP32, which is a lower-end processor that is not present in sound cards. So my goal, sponsored by the Google Chrome team, was to take those scalar instruction patches and extend them with the HiFi extensions, in order to make it possible for everybody to compile Sound Open Firmware into binary form and load it onto the sound card. One of the use cases would be a massive continuous-integration infrastructure that could test those builds, either on hardware or on an emulator, but to test it you need to compile it, which is not possible without a license and a license server, for instance.
Usually the license is even locked to a node, or it needs a license server, right? So that's one of the use cases. And anyway, Sound Open Firmware is an open-source project, so it's kind of weird that it cannot be compiled with open-source tools, right? The technical goals are to reuse the core ISA support and extend it, and to rely on GNU Binutils for the moment, because they do a pretty good job; the minimal viable product just reuses them. Now, the toolchain architecture. If we zoom in, each compiler, in the LLVM world and even in GCC, is usually split into a frontend and a backend. The frontend is supposed to be architecture agnostic, at least in theory; this is what is written in the wise books, you know. The problem is that it is not, and I'll explain that in a moment. The supposedly architecture-agnostic frontend has a parser, then a semantic checker, then several optimizers, and then an intermediate representation generator. In the case of LLVM this is LLVM IR, their own language that specifies their data types and operations. It's an abstract language, so it's architecture agnostic, but it's got its own type system, which is, surprise, surprise, similar to the C type system. Then the backend, in the case of LLVM, parses this representation into a directed acyclic graph, and then it goes through optimization and instruction legalization, which means selecting those instructions that can be implemented on the target machine, or converting them into other ones if they cannot, then instruction selection, target-specific optimizations like jumps or call trampolines or whatever, and then machine code generation. After machine code generation usually come the assembler and linker, or the compiler can generate the machine code in binary form directly, so no assembler is needed.
In our case, for Clang support for HiFi, we're going to need the GNU assembler. The LLVM backend produces an assembly source file, the assembly source is fed to the GNU assembler, which does the encoding and the VLIW scheduling, and then it goes to the GNU linker. And as you can see, there are two pieces of the Clang frontend which happen to be target specific: vector semantics and intrinsics, intrinsics meaning built-in functions. Vector data types in C compilers are not part of the ISO standard, so basically, if you're writing a vector variable, it's not the C language by the strictest definition. But people want to write vectors, right? They want to write SIMD code not only in assembly but also in C. So what the compiler creators came up with is the vector extension, and the first one was GCC's. GCC has a vector_size attribute, which works as a type attribute and extends a type to become a vector. For instance, this definition gives you a short variable, but the vector size is eight bytes, so there are four shorts: a vector consisting of four elements, each of them 16 bits. This one, for instance, is vector_size eight but with int, so there are two integers, each four bytes on this particular architecture, and so on. You create typedefs like this and then you've got your vectors. Now, this ae_ prefix is the naming convention used by Xtensa, and you've got a lot of those: ae_int16x4 means a 16-bit integer replicated four times in a SIMD vector, most likely backed by a SIMD register. And there are different vector types, because you could have, for instance, an ae_f32x2-style vector, which is not a floating-point but a fixed-point vector.
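In standard GCC/Clang syntax, typedefs of that kind look roughly like this (the names v4i16 and v2i32 are mine for illustration; the Xtensa headers use the ae_ convention):

```c
#include <stdint.h>

/* GNU vector extension: vector_size is in BYTES, so 8 bytes of int16_t
 * gives four 16-bit lanes, and 8 bytes of int32_t gives two 32-bit
 * lanes -- both fit the 64-bit HiFi SIMD registers described above. */
typedef int16_t v4i16 __attribute__((vector_size(8)));  /* like ae_int16x4 */
typedef int32_t v2i32 __attribute__((vector_size(8)));  /* like ae_int32x2 */

/* Element-wise arithmetic comes for free with the extension. */
static inline v4i16 add4(v4i16 a, v4i16 b) { return a + b; }
```

Note there is nothing in this type system to say whether a lane is a regular or a fractional integer, which is exactly the problem described next.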
So we can have two kinds of integer, either regular or fractional, and that cannot be expressed in the LLVM or GNU or C type system. Basically, C programmers and the LLVM designers didn't envision fractional numbers as part of the toolchain, so these are completely opaque to the compiler; the compiler doesn't have the slightest idea about fractional numbers. And there are some non-standard operations that the proprietary Xtensa compiler can do which a standard-compliant compiler cannot. For instance, casting a vector to an integer: not allowed in GCC, not allowed in Clang today, at least it hadn't been until I got into it. On the upper right you've got a casting operation: you've got a 16x4 vector and you want to cast it to an integer. The way it's done is you take just the lowest, the first element of the vector, copy it to a regular integer, and discard the rest. The splat is the opposite operation: you have one integer and you want to replicate it across the whole vector. Again, not supported by the standard C vector extension. And this is the first introduction to LLVM intermediate representation: this snippet actually produces four LLVM instructions. The first of them is extractelement, which takes the vector and says, give me vector element number zero and copy it to the destination. Then there's a regular addition, so the plus compiles to this one. And then there are two instructions that implement the splat. This is because the LLVM instruction set doesn't envision such a thing as a splat: they've got insertelement, which can build a vector or insert an element into a vector, and they have shufflevector, which takes two vectors and exchanges items between the two.
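In portable terms, by the way, the cast and the splat can be approximated with the GNU vector extension (a sketch only; the real Xtensa operations go through the intrinsics, and the splat here is spelled out with an initializer rather than a broadcast):

```c
#include <stdint.h>

typedef int16_t v4i16 __attribute__((vector_size(8)));

/* "Cast vector to integer": take lane 0 and discard the rest. */
static inline int16_t vec_to_scalar(v4i16 v) { return v[0]; }

/* Splat: replicate one scalar into every lane of the vector. */
static inline v4i16 splat(int16_t s) {
    v4i16 r = { s, s, s, s };  /* same value in all four lanes */
    return r;
}
```

Clang lowers a splat like this to exactly the insertelement-plus-shufflevector pair discussed next.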
And so, without going into too much detail, those two instructions basically implement the splat. If you compile it further, you end up with three machine code instructions, so as you can imagine, the backend optimizer is responsible for recombining those two IR instructions into a single machine instruction. The first instruction takes the source register, a SIMD register; the SIMD registers are called aed plus a number, and usually there are 16 of them, plus 16 regular registers. So this is copying item number zero from the vector register to a scalar register; the scalar registers are a0, a1, a2, and so on. Then there is an integer addition, and then there is the opposite operation, the splat, which takes the scalar register and replicates its contents into a vector register, in this case four times, because it's 16 bits. And as I mentioned before, unfortunately, all this magic must be done in the frontend, because this is the intermediate representation, so by this point it's already done, right? Whoever is doing this, it must be in the frontend. And by the way, in LLVM there are like four or five flavors of vector extensions: the regular GNU vector extension, the NEON vector extension, which is ARM specific, the AltiVec vector extension, which is Power specific, and others. I took the regular one as the basis for the implementation, but there are others, so it's not a very generic mechanism, because it's not a standard; everybody can devise their own vector extension if they like. And now, Boolean data types are even weirder than what came before. There are two Booleans. One is bool, just a regular Boolean, usually implemented as an 8-bit integer, and it sits in the regular register file, which means it's sign-extended to 32 bits.
Because, as opposed to Intel, most other 32-bit architectures don't have 8-bit registers: each time you use a variable smaller than 32 bits, it's extended to 32 bits. This is what ARM is doing, what Power is doing, and RISC-V. The only exception is Intel, which has those smaller 8-bit registers, when it comes to application processors, obviously. And it's no different in the DSP world: they have 32-bit registers and they don't bother implementing smaller ones. But there's another Boolean, called xtbool, and it is supposed to be backed by the Boolean register file, which is one bit, or it could be a two-bit or four-bit vector, I mean a vector consisting of four one-bit elements. LLVM can create a four-bit vector consisting of one-bit elements, or a two-bit one, or i1, which is a Boolean in LLVM language. But the C language doesn't know anything about that; you cannot have a one-bit variable in C. There is an extension proposal in Clang, but it's going to be just a Clang-specific extension. Now, the calling convention for Booleans says that if you pass a Boolean to a function, or return one from a function, you must keep it in a general-purpose register; you cannot use a Boolean register to pass the argument. And it means that at every function boundary you need conversions: at the entry, you need to truncate from 32 bits to one bit, and at the exit, if you want to return it, you have to zero-extend it from one bit back to 32 bits. And guess what: there is no single instruction in the ISA for one of those operations. The truncation can be done in one instruction, but the zero extension cannot; there is no instruction like that, and it takes three to four instructions to zero-extend. So it's highly inefficient.
And so the way the Boolean instructions are used is like this. For instance, this is an operation that compares two vectors of 32-bit integers, two lanes each: the a variable is one vector, the b variable the second vector, and you end up with two bits of result. If the first bit is one, it means the elements at vector index zero are equal, and the second bit says the same for vector index one. So you've got a two-bit vector. Now you want a conditional move, which is basically a select operation, like the conditional operator in C: take this as a predicate, and if it's one, move from b to a; if it's zero, don't move, leave a intact. And it's parallel, for each vector element. Now, this is translated to magic, LLVM magic. Each time you've got a machine-specific built-in function, well, almost each time, it is translated to an LLVM built-in function, and later, in the instruction selection phases, you change it into your machine instruction. This way you can create custom instructions in a C compiler that stay opaque to the compiler for some phases but end up being translated into real machine instructions. And this is the case here: this function call is translated into a single instruction, and this function call is also translated into a single instruction, and they look quite similar to those macros, actually. What's more, they must use a Boolean register: this instruction does not accept any other operand, it must be a Boolean register here, and a Boolean register as a source here in the second one. So you must force your compiler to use Boolean registers, and you end up having two Boolean data types, one regular and one special. And again, LLVM cannot distinguish between two i1 Boolean types.
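The compare-and-select pattern just described can be emulated with the standard vector extension, where a comparison yields a lane mask of all-ones or all-zeros (a portable sketch of what the intrinsics do, not the real thing; the real intrinsics keep the mask in the dedicated Boolean register file rather than in a SIMD register):

```c
#include <stdint.h>

typedef int32_t v2i32 __attribute__((vector_size(8)));

/* Vector compare: each lane of the result is all-ones (-1) where the
 * two input lanes are equal and all-zeros where they are not. */
static inline v2i32 vec_eq(v2i32 a, v2i32 b) { return a == b; }

/* Conditional move (select): where the mask is set, take the lane
 * from b; where it is clear, leave the lane from a intact. */
static inline v2i32 vec_movt(v2i32 a, v2i32 b, v2i32 mask) {
    return (b & mask) | (a & ~mask);
}
```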
There are no type attributes to say: this looks like a real Boolean, but it's not. So we had to pick one, either completely change LLVM so it fits Xtensa, or pick the one that is more common or more usable. And it ended up being the real Boolean register: in my implementation, the Boolean is backed by the Boolean register file, because it's more generic, because you can use those special instructions. Now, there is other weird stuff here: hidden side effects. You've got an operation like this, it's in the Xtensa manual, and you've got an output variable here, passed by pointer, but you also have an in-out variable here, which means you send the input value in and it gets modified inside, right? And then there's an immediate argument, which means it's a constant. But the macro doesn't use any pointer casting here: this is not a pointer to an integer, it's just an integer, and this is a regular pointer, not a pointer to a pointer. And it has to be magically translated into a built-in function that takes those pointers, and then that's translated again into an LLVM built-in function. But the LLVM built-in function doesn't use pointers here; it just produces the new result, and the new result gets written back to the pointed-to variables. This is because the real instruction cannot be represented in LLVM with such a side effect: instruction definitions in LLVM cannot have an operand that is an input and an output at the same time. If you have such a situation, you have to translate it into a kind of virtual operand pair, one for input, one for output.
And you have to tell LLVM that these two operands are, in fact, one operand in the machine code, just used in different ways. That's why we have to do this weird translation. This is similar to a C++ reference type or a Pascal var argument, but in the C language there's no equivalent, and in LLVM also. And it breaks the standard: if I didn't use this pointer, I would break the C standard, and this is what the Xtensa compiler is doing. If we go back to this slide, the Xtensa compiler doesn't use the address-of operator here or here; the modification is just assumed to happen inside. But this is kind of cheating, because it breaks the standard, and it has consequences. Let's say somebody, a C programmer, applies a cast operator to your address, which is allowed in C. If you have a cast operator, you've got an r-value, and an r-value is by definition a kind of temporary value that cannot have a memory address. So if you have something like this, you cannot take a pointer to it; it gets translated by the C preprocessor into this, and this produces a compiler error. That's why having a hidden side effect is a bad thing, and there's no escape route. The only thing to do for the open-source implementation is to just not allow it, and ask the Sound Open Firmware engineers not to use such constructs. That's it. And the way to overcome this is, of course, to introduce a new variable after the cast, because with the new variable you can take its address. So, to summarize, there are a couple of challenges for such an unusual architecture. We have to upstream a new vector type at some point, because the HiFi vector is not the same as GCC's, or we have to just abandon all the extensions that are not standard, like the casts and splats, and ask the developers to stop using them.
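Going back to the hidden-side-effect problem for a moment, here is a minimal sketch (the intrinsic name is hypothetical; the real ones are the HiFi load/store macros): an open implementation models the post-incremented address as an explicit pointer argument, and then a cast expression, being an r-value, cannot feed it directly.

```c
#include <stdint.h>

/* Hypothetical intrinsic with its side effect made explicit: it loads
 * a sample and post-increments the caller's pointer through *pp. */
static inline int16_t load_inc(const int16_t **pp, int step) {
    int16_t v = **pp;   /* load the sample */
    *pp += step;        /* the "hidden" side effect, now visible */
    return v;
}

/* load_inc(&(const int16_t *)buf, 1);   // ERROR: the cast yields an
 *   r-value, so you cannot take its address.
 * The workaround: materialize a named temporary first. */
static inline int16_t first_sample(const void *buf) {
    const int16_t *p = (const int16_t *)buf;  /* named l-value */
    return load_inc(&p, 1);
}
```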
So, either this or that. Currently, it's a hack: basically, if I see the architecture is Xtensa, I just use different semantics, but when I upstream it, I cannot use that hack. The detailed explanation of the vector extension is at this address; I highly recommend it to everybody who wants to use vectors in C. Fractional types are another challenge, because they cannot be represented in the compiler infrastructure, so they have to be completely opaque types, and then no implementer can rely on the fact that if you add two fractional values you get a fractional result. Boolean types, again, a weird beast. And there are departures from the C standard, like those side effects. By the way, this project is very close to upstreaming; it's not upstreamed yet, because of those dilemmas, among other things. But the general solution would be to devise a new open-source Xtensa coding standard that is compliant with C, without the most extraordinary extensions that cause problems for a regular compiler, and apply that standard to the code, hopefully so that it stays compatible with the Xtensa proprietary compiler as well; perhaps in the future the two compilers would converge. The key takeaway from this presentation is that standards really do make our lives easier. You heard it yesterday in the keynote, and that's true even for such a specific task as this one. All of the pain I had was just because vectors were not part of the standard, and because they were not part of the standard, everybody did vectors their own way: the ARM people did it their way, the Xtensa people their way, and GCC its own way. And side effects are terrible; that's the second thing.
And especially surprising side effects that were not supposed to be there, when the language says there are no side effects; you should not introduce any side effects, right? Don't break the standard. Another takeaway is that it's not an ideal world: the C frontend has plenty of architecture-specific spots which have to be addressed anyway, again because this is not part of the standard; if it were, it would be architecture agnostic. Except for built-in functions, which will always be architecture specific, because there's no way around it: if you want to use a specific instruction that exists only on your architecture, you have to use a built-in function in C, or inline assembly, and both ways are non-portable. And actually a built-in function is better than inline assembly, because the compiler implementer can reason about it, knows what the operation is, and can optimize it better, while inline assembly is almost completely opaque to the optimizer, so it cannot do anything with it. Also, the LLVM type system is flexible, but not flexible enough for Xtensa. I'm not proposing to extend it, because that doesn't make sense for a single architecture; I'm just saying that even though it's kind of generic, in fact it's really tied to the C type system. It blends very well with C, with C++, with Rust, but if you have something unusual, like a fractional number, it doesn't fit. And the bad thing is that opaque values that cannot be reasoned about inside the compiler cannot be optimized by the compiler: you could leave it an opaque type, a type with some unknown bit representation, but then it cannot be optimized. The last takeaway is that ISA design should in fact take the target language into consideration, and if you think that's controversial, think about what the RISC designers did in the '80s.
So basically, for the RISC architecture, one of the key design goals, if you look at the documents and the memoirs of those people, was to make it easy for a C compiler to generate code for it. So it actually makes a lot of sense to think about C as a kind of common language for systems programming and to create an architecture that is C-friendly. And the ABI as well. So if you design your own ISA, or somebody else in your company is designing one, please think about those things. For instance zero extension, which has no instruction in Xtensa, right? And it is an obvious thing for any C compiler implementer that you sometimes need to zero-extend your variables to 32 bits.

All right, that would be the end of my part, and now I can take, I guess, a few questions before we move to the next presentation. Yes?

Okay, so that part wasn't explained in detail, but each Xtensa... they are chip-specific, but most of the real-world chips that you find in your laptops and Chromebooks and so on are pretty similar. So basically there can be anything, but in fact they're usually quite common, and if they use an extension, they use the full extension. So if they have those vector registers, they have all the vector instructions, the same ones, with the same encodings. Now, if somebody comes new to the market, produces a new variant of an Xtensa DSP, and does some weird thing, then it's going to be incompatible. That's the problem.

So right now it's done this way: there's an ISA definition, which is produced as a C file, and that ISA definition is consumed by binutils, so binutils can reason about the ISA; it contains all the instruction encodings and all the operand definitions. It's called an overlay; if you Google "xtensa overlay", it will show up.
So this way, each variant of the GCC compiler right now, or actually binutils, can be compiled with a specific variant of the Xtensa overlay, which is strictly a processor definition, an ISA definition actually, and then it will compile for that specific processor. But Clang has a different assumption: you have a multi-architecture front end which is supposed to produce code for any processor whatsoever. So what I'm going to do is list specific CPU variants in Clang, each tied to the specific DSP variant that is on your laptop. So for instance, if you've got a Cannon Lake CPU, you're going to have a Cannon Lake DSP; if you have an AMD Renoir or a MediaTek ARMv8 CPU, you're going to have a MediaTek DSP. This way I can escape the problem, but it means that any new CPU must be added explicitly to Clang.

No, it wasn't. In GCC it's upstreamed, so it's higher quality, I suppose, because it's already in the mainline and maintained. And in the Xtensa case it was not yet accepted into the mainline; those were patches. So I suppose there might be some bugs in there, right? The code quality might be a little lower because it's not yet accepted, so people were not actively testing it, at least not as many as with GCC.

Oh yeah, I was expecting that question; that's a good one. So that was the choice of my sponsor, basically. They prefer Clang because they rely heavily on the Clang infrastructure for other projects. And if you happen to be at a C++ conference, there are quite a lot of talks about C++ and Clang given by, let's say, a guy named Chandler Carruth. He is part of the ISO C++ committee and he's also a Google employee. So they invest heavily in the Clang infrastructure, and that's why they chose this one.

Okay, so, yeah, let me... can I answer this question offline? I've got a signal from the staff that I should stop right now. So we can talk about it in a moment, all right?
Thank you guys.