So, to introduce things: I'm from lowRISC CIC, where we work on free and open source hardware and software, and this talk is about the effort to port LLVM to RISC-V. I'll start with some preliminaries, a little bit of insight into what it actually means to port a compiler like LLVM to RISC-V. Then I'll talk a bit about how I've gone about the task, some design decisions, and then pick off a few topics which I felt were maybe representative of the sort of design decisions and tasks that we've been taking on in this effort. I'll of course give a bit of a summary of current status and then an idea of where we're heading in the future. Yeah, I can drive.

So first of all, what is LLVM? LLVM is a popular compiler infrastructure. It has a permissive licence, which just in the last couple of weeks changed to Apache from a BSD-style licence. It has a library-based design, which many people find useful when integrating it into their own projects. And it's often paired with the Clang language frontend, which is the equivalent of GCC in terms of giving C and C++ support. And as we'll touch on later, there are a number of other languages which have used it as their backend, most notably Rust, which I know is one reason why a lot of the distro people are particularly interested in the status of LLVM RISC-V code generation, and Swift, Julia and other such languages. And so LLVM provides a sort of common infrastructure, a common set of passes, analyses, optimisations and transformations, in order to implement basically a whole suite of compilation-related tasks. So there's LLVM and its code generator, but it's also a family of projects, including a linker, a debugger and a C++ runtime.
So compiler-rt, the equivalent of the low-level libgcc library, and a growing suite of projects as people upstream contribute new things: OpenMP implementations, SYCL coming from Intel soon.

And so why are people interested in LLVM? Why am I interested in LLVM for RISC-V? I think there are a number of reasons. First of all, we should make completely clear that GCC for RISC-V is in fantastic shape. People have been using it to compile a whole bunch of things. So if you just want a working compiler, GCC is there, it's ready to go right now, well supported. There are a range of pre-compiled versions available, and increasingly cross-compilation toolchains upstreamed into distro repositories. But there are a number of people who prefer LLVM, either for licensing reasons or because they rely on it as the backend for their programming language. Rust is perhaps the most notable example, and as we'll be hearing later it's becoming a dependency for any modern distro right now, with librsvg and other libraries being ported. LLVM has also had very good uptake in academia, research and R&D. One of the reasons is that it's actually very easy to modify and add support for custom extensions. There's a very active developer community, and we could argue back and forth about GCC versus LLVM, but certainly there's a locus of interest around it for new and interesting compiler analyses and transformations.

So I'm just going to give a brief overview of what a Clang/LLVM compilation flow looks like, which we'll use to talk about the tasks you take on when porting something like LLVM to support a new architecture. You of course have your C input, and in the case of LLVM you put this through your Clang C frontend, and when porting to a new architecture there's honestly very little you need to do to the Clang frontend. You need to add support for the set of command line options which get passed through to the backend.
You teach it a little bit about the target architecture so it can provide appropriate diagnostics, but beyond that it's on the order of a few hundred lines. So we have our C input, this gets parsed, we get the Clang AST, and that will eventually spit out LLVM IR, which probably most of you have seen in some form at least once. Unlike GCC's internal representation, this is exposed to the end user, or to people who want to feed it in as input to LLVM, so you can actually parse it using other tools if you wanted to do your own stuff; in practice most people implement their own extensions to LLVM using the library-based extension mechanism. As you see, we've lowered the C types in a fairly trivial way: int32_t becomes LLVM IR's i32 type. LLVM IR has its own set of instructions and semantics, and one common misconception is that it's some sort of universal IR, in the sense that WebAssembly or other projects are trying to be. In fact, once you've got to LLVM IR you've already made a number of target-specific assumptions, which you need to do in order to meet the C ABI and maintain the semantics of your input.

So once you have the LLVM IR, the next step is putting that through the compiler middle end, and eventually this goes through a process which spits out what's called a selection DAG. We're not going to go into the details of that, because it's not particularly relevant to what I'm talking about today, but the point is that everything going on at the LLVM IR level is mostly target-independent and shared, so it's all the same code for AArch64, x86, POWER and so on, whereas once you get to this step you've done all of your target-independent transformations and analyses, and this is where the porting effort comes in. You're able to make use of a whole set of libraries and helper functions, but the task is essentially to take this selection DAG and lower it to our final form, which is the outputted assembly or ELF object.
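To make the lowering just described concrete, here is a minimal sketch (attributes, overflow flags and the target datalayout omitted for brevity) of a trivial C function and the sort of LLVM IR Clang would emit for it:

```llvm
; C source:  int sum(int a, int b) { return a + b; }
; int32_t / int lower trivially to LLVM IR's i32 type.
define i32 @sum(i32 %a, i32 %b) {
entry:
  %c = add i32 %a, %b
  ret i32 %c
}
```

Even in this tiny example the i32 choice already reflects a target assumption about the width of C's int.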
In this case it's all very trivial, because ultimately we have a single IR instruction, the add, which gets lowered to a little bit of extra junk to handle the function arguments and return, as well as the main meat of it, this selection DAG add node, which is trivially lowered to the add instruction.

So that's a high-level overview of the sort of things you take on while porting something like LLVM. LLVM has, for a number of years now, included an integrated assembler, so there's the fairly straightforward task of supporting the RISC-V assembler syntax. That's actually where I started with the RISC-V LLVM port, to provide a solid base on which to build everything on top of. Once you've implemented your assembler, the next task is defining the instructions: which instructions does the architecture support, and how are they encoded? I'll show a slightly verbose example over the next few slides, but you don't need to study the whole thing. So this is the exploded definition of a single instruction in LLVM, and of course you don't need to write all of this when defining your new target. This is using LLVM's domain-specific language, called TableGen. It specifies that we have a RISC-V instruction, add; the assembler syntax looks like this at the bottom; and we have a whole bunch of encoding information. I've highlighted on the next slide the lines that are relevant to encoding the instruction for the target architecture, the lines which are relevant to the assembly parser (which is actually mostly generated for you), and those specifying the inputs and outputs of the instruction. But of course this is verbose and painful: if you had to write that for every instruction it would become rather tiresome and not particularly easy to maintain.
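As a flavour of what the TableGen being described looks like, here is an approximate sketch; the class and field names are modelled on the upstream RISCV backend but should be read as illustrative rather than an exact copy:

```tablegen
// Hypothetical helper class for R-type register-register ALU ops:
// funct7/funct3 select the operation; rd, rs1, rs2 are GPR operands.
class ALU_rr<bits<7> funct7, bits<3> funct3, string opcodestr>
    : RVInstR<funct7, funct3, OPC_OP, (outs GPR:$rd),
              (ins GPR:$rs1, GPR:$rs2), opcodestr, "$rd, $rs1, $rs2">;

// With the class in place, each instruction is a one-line definition.
def ADD : ALU_rr<0b0000000, 0b000, "add">;
def SUB : ALU_rr<0b0100000, 0b000, "sub">;
```

The point is that the repetitive encoding detail lives in the class once, and each instruction definition only states what is unique to it.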
So you start adding in classes, which are a feature of the TableGen DSL, which means that ultimately you can specify instructions using a fairly straightforward form such as this, which defines all the register-register instructions in the base RISC-V ISA. That's enough, if you have enough support code around it, to generate the assembly parser and to actually produce object code. But we also want to handle the more common case, which is when you have LLVM IR input from Clang, Rust, Julia or whatever else, and you want to work out how to convert that to RISC-V instructions. That involves writing C++ logic for the more complicated cases, but in a simple case such as an add you just write these patterns. These are patterns on nodes of a selection DAG, which is just a directed acyclic graph, specifying that if you have an add node which looks like this, convert it to a RISC-V instruction which looks like this. And from a high level that's basically what the port involves: going through each of the instructions, implementing patterns in this way. Most of the fiddly work is around the edges: ensuring that the stack frame is set up properly, that the ABI is maintained, and that the various corner cases which come up are handled properly.
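A hedged sketch of what such a selection pattern looks like, assuming an ADD instruction definition along the lines shown earlier:

```tablegen
// If the selection DAG contains an 'add' node whose operands live in
// general-purpose registers, select the RISC-V ADD instruction.
def : Pat<(add GPR:$rs1, GPR:$rs2), (ADD GPR:$rs1, GPR:$rs2)>;
```

TableGen turns a table of patterns like this into the instruction selector, which is why the simple cases need no hand-written C++ at all.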
Of course with RISC-V it isn't just a single target, and although over time other architectures have also gained a whole range of different features and options, they don't tend to be exercised simultaneously in the way RISC-V's are. Essentially we have three base ISA definitions, RV32I, RV32E (which is when you have only 16 registers) and RV64I, but then you have the question of whether you support the multiply, atomics, single-precision floating point, double-precision floating point and compressed extensions. That alone is fairly straightforward; after all, every architecture does this to a degree, and x86 adds new extensions with every Intel chip released. But there's also a whole range of ABIs attached to that. The ABI is specified in terms of XLEN, so whether it's RV32 or RV64, and then an indication of the floating-point convention: plain ILP32 is essentially the soft-float ABI, ILP32F is hard-float for single precision, which means that your single-precision floating-point arguments are passed in registers, and similarly for ILP32D, LP64D and so on. The way the RISC-V ABIs are defined, it's actually pretty easy to handle these with a single function which is parameterised by the supported floating-point register length and the general-purpose register length. But in terms of testing, you can see that once you start multiplying all these options together there's quite a lot to cover, and there's more to come as people introduce new custom and standard extensions and new custom or standard ABIs.

As for the history of the RISC-V LLVM port: back in the early days of RISC-V, just when it was making its way out of Berkeley, there was a GCC port which some of the Berkeley students had developed, and there was an initial LLVM port. I actually waited around a bit to see what was happening; I'd previously been working on an LLVM port for a research architecture at Cambridge, a large number of very simple in-order cores but with its own ISA, so I'd got quite up to speed on LLVM backend development through that. I think we'd found that the initial RISC-V LLVM port, although people had managed to do some useful things with it, was quite some way from being ready to be committed upstream. So I started a fresh effort, taking a very step-by-step, methodical approach, ensuring that everything was well tested, unit tested and, to the extent that I was able, documented. Over the past 18 months or so I've been able to get a bit of funding, which has helped increase the time I can spend on this. I'll say more about status a little later, but this started as a downstream project with the intent of getting it upstream very quickly. LLVM has a pre-commit review policy, and in the early days it took quite a while to actually get things up there, but that's sped up substantially now that there are more people interested, and I'm now the upstream code owner, so things got much easier in that respect. At this point, the high-level view of the current status is that it's not there yet for hard-float on Linux targets (I'm targeting the next LLVM release for that), but I know of multiple companies who are using it internally for firmware builds on their embedded 32-bit targets quite successfully, occasionally working around various limitations but finding it does what they need.

So, in RISC-V, as you might know, there's a compressed extension. In the standard RISC-V instruction sets, RV32I and RV64I, every instruction is 32 bits; if you support the compressed extension then for a subset of those instructions you're able to encode them in just 16 bits. This is not dissimilar to Thumb-2 or various other compressed instruction set designs, except that you're able to freely intermix the two widths. We actually had a fair bit of discussion about how to handle this in the compiler, and there were a number of design choices considered. One of the questions we had to answer quite early on was whether the instruction selector would be aware of the compressed instruction set, because of course a 16-bit add doesn't have the same freedom: it has various restrictions, such as a restricted subset of registers you're able to access, or a restricted immediate field. One approach would be to try to teach the instruction selector about all of these instructions and have it produce them directly. We actually ended up doing something very similar to what GCC and the GNU assembler do, which is to handle it basically exclusively as a very late-stage conversion: you go all the way through the instruction selector, through register allocation, actually generate the in-memory representation of the machine instructions, and then put it through a converter, basically a table-based system, which looks at each instruction and says: for this add instruction, can I convert it to a compressed add? For this load instruction, can I do the same? The advantage of this is that it quite substantially simplifies the logic throughout the rest of the backend. I mentioned those long lists of instruction patterns, and that part is pretty straightforward, but there's a whole bunch of other target-specific analysis, such as analysing control flow and branching, and it becomes somewhat tedious if all of that has to recognise both the uncompressed form and the compressed form; of course it's also an area where you can start to introduce errors. But I think that's the starting point rather than the end point, and it's not completely naive: the register allocator is aware of the fact that some registers are accessible from compressed instructions, so they would be chosen in preference to registers which can never be accessed from compressed instructions.

I've gone back and forth on how much detail to go into on the RISC-V memory model support, so I'll try to keep it somewhat accessible; it turned out to be a larger body of work than I was anticipating. As you may know if you've been following RISC-V over the past few years, one of the early and probably most successful standardisation efforts in the RISC-V Foundation was coming to a conclusion on what the RISC-V memory model should be. The initial specification had various things to say about the memory model, and some researchers at Princeton and others pointed out that, as with most memory models for most architectures, there were various things which were somewhat questionable. They quite successfully managed to get a whole group of academics, practitioners and industry together and come to an agreement on the memory model, which ended up being a very relaxed memory model, not dissimilar to AArch64's. So what do you need to do to support that in the compiler? The basics are relatively straightforward, because with the C11 and C++11 memory models there are a number of primitives, such as atomic add, atomic subtract and atomic load, each with a memory ordering specified by the programmer, and all I really need to do is understand each of those. So this is a compare-and-exchange that would have been lowered from the C++11 atomic compare-exchange, and I simply have to convert that to whatever the memory model people tell me is correct, and then in theory you're done. But it's actually a little more fiddly than that. RISC-V has two sets of atomic instructions: there are the AMOs, the atomic memory operations (atomic add, atomic subtract, atomic OR, atomic XOR), which are relatively straightforward to support, and then there are the more general load-reserved and store-conditional instructions.
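To make the load-reserved/store-conditional mechanism concrete, here is a hand-written sketch (not compiler output) of an atomic fetch-and-add built from those primitives; the retry loop is exactly the part that the forward-progress rules constrain:

```asm
    # Sketch: atomically add the value in a1 to the word at address a0.
retry:
    lr.w  a2, (a0)       # load-reserved: read value, acquire a reservation
    add   a2, a2, a1     # compute the new value
    sc.w  a3, a2, (a0)   # store-conditional: writes a3 != 0 if it failed
    bnez  a3, retry      # reservation was lost, so try again
```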
So if you want to do something a little more complicated than just load a value, add to it and atomically write it back, then you rely on these primitives. Load-reserved reads a value and sets a reservation, and the granularity of that reservation is dependent on the microarchitecture, but let's say it reserves that cache line. Then you do some computation based on that value, and later you do a store-conditional, and if somebody else wrote to that cache line in the meantime, your store-conditional will fail, and your program, if it wants to be correct, should loop back and try again, and keep trying. It's this looping and retrying which can be problematic, because you want to ensure that the loop will actually terminate at some point. The RISC-V ISA manual specifies various restrictions on what code can be placed between the load-reserved and the store-conditional in order to ensure what it calls the architectural forward progress guarantee. The restrictions are pretty simple: the main things you can't have are loads or stores to memory (which rules out spills), jumps, floating-point instructions and that sort of thing. The difficulty is that the compiler doesn't have an easy way of reasoning about that: for a given IR construct, once I start to decompose it into RISC-V instructions, but before various other analyses and transformations have run, it's not actually possible to specify that this region should be left alone and there shouldn't be any spills and so on. It's a problem which other architectures face too, and I ended up implementing a whole new set of hooks for atomic lowering, of which RISC-V is currently the only user, though I'm hoping AArch64 and Arm will move over. Essentially the approach is to treat something like compare-exchange as a black box when you lower it: we see this compare-exchange and lower it to something still quite abstract, so we know it's a RISC-V compare-exchange, but it's not converted to RISC-V instructions until a very late stage, after register allocation and everything else has taken place, which ensures there's absolutely no way that unexpected spills to the stack and that sort of thing are introduced. In practice the other backends have managed to avoid the problem by having a slightly different code path, or when targeting -O0, so it's a somewhat theoretical concern, but I think it was worth addressing.

In terms of testing and fuzzing, there are a few approaches we take. Most of it has been done through unit testing: targeted tests written against LLVM's infrastructure for this, based on giving input IR and then writing effectively regular expressions on the output to ensure that the assembly you expected came out the other end. There's been an increasing effort to test with real-world programs, of course; multiple people are compiling SPEC with Clang on RISC-V. And there was some great work last year from some interns at Qualcomm on fuzzing. This was targeted at the assembly parser: specifying a grammar for RISC-V instructions, then generating a whole bunch of different inputs and seeing how behaviour differs between GCC's assembler and the LLVM assembler. There's some related work on doing something similar with C. That was inspired by, I think, some work Google has done: you specify your grammar, and then using LLVM's fuzzing library you're able to instantiate it with legal and illegal values.

Another challenge on the compiler side is supporting linker relaxation. RISC-V has fairly small immediate sizes, and of course it has a fixed instruction length, unlike x86. With x86, if you want to access a global you can stick the whole address in the instruction if you wanted to, whereas with RISC-V you lower it to, say, a couple of instructions: loading the upper 20 bits, and then a load with a 12-bit offset. Linker relaxation is when you allow the linker to look at all of these instruction pairs, find the cases where you didn't actually need the upper 20 bits because the target is within range of the 12-bit offset, and delete the first instruction. If you use that approach, you have to ensure that the linker has access to every fix-up or relocation in the program, which basically means every access to a symbol, because any of the instructions could be moved or changed.

On supporting RISC-V's ABIs: as I mentioned, one of the challenges is that there are a lot of them. The helpful thing is that they're basically all very similar, in that they're a parameterisation over the supported floating-point register length and the general-purpose register length. LLVM is structured so that a lot of the responsibility for supporting the ABI sits in the frontend. This is something I've been talking about with some of the people who have been porting Rust: they also have the responsibility of handling cases where you need to work out, say, if I have a function which takes an unsigned integer and a signed integer, how do I convert that to LLVM IR in a way that maintains those semantics? That's mostly done for us in Clang, but it's work which has to be repeated for Julia, for Swift, for Rust and for other languages. One thing I did here was implement a Python library which aims to be a RISC-V calling convention golden model, as I initially called it; it's now called ABI cop. It's a tool which helps to test that an ABI implementation is conformant, at least to my interpretation of the RISC-V ABI specifications. There's more work to be done there in terms of randomised testing: it has a fairly naive approach of doing randomised instantiations of function pairs, so you can take your caller and compile it with GCC, take your callee and compile it with Clang, and hopefully you get the result you expect, but there's definitely more that can be done.

Moving on to status and where we're at, the first thing I'd like to highlight is that this started out as a project where I was the primary author, and it's expanded over time to the point where we have several external contributors. I'm very grateful for contributions from companies like Qualcomm, Embecosm, Andes Tech and others, who've added support for features like the compressed instruction set, more recently TLS from Embecosm, and a whole series of assembler fixes and other things from Andes Tech. As the LLVM backend nears the point where you're able to do things like compile the Linux kernel and Linux userspace, there's a growing number of people who are interested in starting to contribute upstream. It's also something I've tried to encourage through giving tutorials and that sort of thing; I'll give a link at the end of the talk, but I gave a tutorial at the LLVM dev meeting at the end of last year which gives an introduction to hacking on LLVM, and on the RISC-V backend in particular.

So, where we're at with LLVM: as I mentioned, there are multiple companies using it internally for their 32-bit firmware builds, but there are still a few limitations. The TLS (thread-local storage) support is in the review queue and will hopefully soon be upstreamed. Things like position-independent code remain to be implemented and upstreamed. And the hard-float ABIs, which are the ABIs that you want to
use on Linux, aren't implemented yet. The main blocker for that had been the 64-bit floating-point codegen, and that has actually merged in the last few days, so imminently we'll be pushing forward on the hard-float ABI support, which people want in order to compile modern Linux applications. There is a Rust port, and an active community around it, targeting the baseline RV32I, or RV32IM, whatever extensions you want to add to it, and now that all the 64-bit support they need is upstream, they're looking at targeting 64-bit as well. I should also have mentioned LLD; that's another thing for which I have other contributors to thank: Andes Tech did some work on adding RISC-V support to LLD, the LLVM linker. So it's there, it works, but it's somewhat early days: it doesn't support some of the features I mentioned, like linker relaxation, yet (that requires a bit more design work), but all the basic relocations are supported and you can use it if you want to, or you can just use the GNU linker.

So we'll be moving forwards with work on the hard-float ABIs; that's probably priority number one right now. Other work which has been going on is on the RISC-V vector extension. I won't go into much detail about this; it's basically been led by Robin Kruppe, who is a student at TU Darmstadt. If you've been following the RISC-V vector extension discussions, it's an interesting compiler target: like Arm's SVE vector instruction set, one of the basic design decisions is that your vectors are of a length unknown at compile time, dependent on the microarchitecture, and so there are quite a few interesting challenges in how to support that. There's been work from Robin and some of the other people working on vectorisation in LLVM on how to support that for RISC-V, SVE and other vector architectures which take a similar approach.

I think there's a talk later which will be looking at comparing code size on RISC-V versus other architectures. Just comparing RISC-V LLVM/Clang versus GCC, there's more that can be done to improve code compression, first to match GCC and then hopefully to move beyond it. I mentioned that targeting the compressed instruction set involves very little interaction with the instruction selection stage. It's not that it's entirely unaware of the fact that you're generating compressed instructions, but there's still not much of a feedback loop there, and there's definitely more you can do. Specifically, teaching the register allocator about the compressibility of instructions: let's say I have a series of instructions, all of which use the same virtual register, so I know they want to use the same value, but it hasn't yet gone through the register allocator. If the register allocator is able to dynamically weight the choice of registers, it can make a more sensible choice, because if you know that one of the instructions could never be compressed anyway, there's not much point in giving it one of the registers accessible from compressed instructions; you'd be better off saving that for a case where you can actually improve code size. One thing we've noticed comparing GCC and Clang is that Clang's basic block reordering can be quite aggressive. What it's trying to do is ensure, based on either profile data or static estimations, that the common case, the hot path, is linear in memory, which is great for cache effects and performance and so on. But if you're trying to minimise code size, that's not always what you want, because it means your branch offsets tend to be larger, and so you tend to end up not being able to compress all your branches. Tweaking this, or adding ways of opting out of that behaviour, is starting to be explored further.

I'm particularly interested in talking to anybody who's been working on different language frontends for LLVM. There is a Haskell LLVM backend as well, and I know
they did a little bit of initial work, but I don't think there's been much activity there since. I think Rust has been the most active; I haven't actually heard anything from the Julia or Swift communities, but if you know people there who are interested, please put them in touch. It would also be good to work together more on the ABI lowering and the validation around that.

As you know, RISC-V has a set of standard instruction set extensions which are defined right now, but there's also an ambition to extend that set. The RISC-V vector extension is one example of this, and there's also active work on defining new bit manipulation instructions. I'm quite keen to work with that working group to provide compiler support before the extension is standardised, in order to provide feedback on how difficult it is to support compilation for it, and to get statistics on how often you can actually select these instructions. That will hopefully be something we move forward on in the next couple of months. I've suggested that it would be worthwhile for a proposed extension to just pick an encoding, even if it's not final, so that at least those of us doing these sorts of experiments can cross-check our tooling and ensure the assemblers are working in the same way. And there's more coming in terms of RISC-V standards: there's been talk about supporting floating-point instructions on values held in general-purpose registers, of new ABIs for embedded targets (ABIs where long double isn't a 128-bit float) and other changes, so this will of course lead to continuing work in the future.

There's also more to do in terms of support for target-independent LLVM features that we'd benefit from. I mentioned code size, and the machine outliner is a very promising feature that's been pursued more recently for AArch64. Effectively this is the opposite of inlining: with inlining, you see that a function is called, and since you don't actually need the external function call, you can copy the body into your function. With outlining, you look for common code sequences, and because you're trying to optimise code size, you decide that you could pull a code sequence out into a separate function, which is less expensive in code size, and just have a call to it. It's obviously not positive for performance, but it's useful for people who are really trying to cram their code into a small amount of space.

We're also working on further testing on more real-world applications. In the last couple of months it's mostly been people using it on their own code bases and feeding back upstream when things work; we're moving towards taking a more structured approach of proactively testing on real-world programs and benchmarks: on the 32-bit side Zephyr and FreeRTOS, and moving towards Linux and FreeBSD on the 64-bit side. Also more automated comparison to other toolchains, and to other targets in terms of code size, like comparing RISC-V versus Arm.

So I'm getting towards the end of the talk; that is roughly the end of it. I should say that lowRISC is hiring: we have about five positions open, as we're ramping up a lot more work in order to deliver free and open source hardware and software, so if you're interested, have a look and talk to me or Luis or Philip afterwards. If you want to get started hacking on LLVM, I put together this tutorial at the LLVM dev meeting last year which a lot of people found helpful; if you have problems with it, let me know. And at that point I'll see if there are any questions. Thank you.

If I remember correctly, from a couple of years ago when I was hacking on LLVM, the IR is not completely... Correct, yeah. Shall I repeat the question? Yes, thank you for that. So the question, and I'll repeat the whole thing: the LLVM IR is not completely
machine-independent, and so if I'm going to write a frontend for LLVM for a different language, how much work do I have to put into supporting all the different variants of RISC-V, especially the 32 and 64-bit variants? So the question was: given that LLVM IR isn't actually fully target-independent, how much work is there for a new frontend to support the ABI? It's something we've talked about in the LLVM community back and forth for a while, because it seems unpleasant that people need to do this for each new frontend language and can potentially get it wrong. It depends on the ABI you're targeting. In the case of RISC-V, you do need to understand that, given your function signature, you actually have to go through and count the registers and determine whether a value is going to be passed on the stack or not, because it affects whether you need to add the sign-extension or zero-extension annotations and things like that. It's hard to quantify how much work there is in that: I guess writing something is not that hard; testing it and ensuring that it works is more work. I hope that's an area where more of us who are working on this can collaborate, because the testing problem is basically the same. One approach is the one I mentioned of doing separate compilation, compiling the caller and callee under different compilers, or perhaps extending the ABI cop tool so that it does a very simple form of code generation, in order to provide even more of a golden model; right now it just spits out a text description of what you're expecting. But I'd be interested in talking more with anybody looking at doing that. Anyone else? So typically they've standardised on LLVM and Clang, for whatever reason, throughout the company; they have toolchain teams who'll support that, and it mostly stems from that. Thank you very much.