Hello and welcome, ladies and gentlemen, to this year's ELC 2021 talk, "Initializing RISC-V: a guided tour for ARM developers", held by Ahmad Fatoum, a colleague of mine, and me, Rouven Czerwinski. I'll start by introducing the both of us, and then we'll dive right into the specifics of the RISC-V architecture and what's helpful to know for ARM developers. Let's start with a short take about me. I'm Rouven Czerwinski and I work for Pengutronix, as does Ahmad. You can find me under the nickname Emantor on GitHub, and you can reach me under the email address on the slide. I do bootloader porting, security consulting and integration — for example verified boot for i.MX platforms — system integration, integrating the libraries required for your Linux system, and consulting all around Linux on embedded systems. And so does my colleague Ahmad. He also does a lot of kernel porting and kernel driver development for new devices which need integration into the kernel, and pretty much the same system integration and consulting all around the embedded Linux stack. Why do we develop for RISC-V? Why should we start investing time into an architecture where we can't even buy Raspberry-Pi-like devices yet? RISC-V could very well be the future of embedded devices — we just don't know it yet, because we haven't seen devices out in the field which fit our use cases with larger Linux systems and integrations. We also wanted to spare some time for our barebox bootloader development, so we took the time to enable the available RISC-V patches from the community to work on the QEMU virtual platforms. This was the initial step, and the next step was using barebox on BeagleV devices, the BeagleV beta boards. We also want to be ready for the first real embedded devices, so we want to have barebox running on the first commercially available RISC-V devices.
And it's also very attractive to have RISC-V as a TinyEMU target, because you can do fun in-browser demos with TinyEMU running as a web application within your browser; you can check that out via the link on the slides. Next up is a short RISC-V overview. The base instruction set for RISC-V is very, very simple and consists mostly of integer instructions and system instructions, so if you know ARM assembly, nothing will be very surprising — you will be right at home. It's also helpful to know that there are 31 general purpose registers. There's also the zero register, which is constantly zero to provide a zero value for your architecture, and then there's the program counter register, which is also special and always points to the currently executing instruction. There are also, so to say, two base instruction sets for RISC-V: one is 64-bit, that's RV64I, and then there's RV32I, which is 32-bit. The main difference between the two is that RV64I, the 64-bit instruction set, extends the 32-bit instruction set by providing additional instructions that operate on 32-bit values. So if you want to target 32-bit register sizes, a 32-bit integer within your 64-bit register, there are instructions directly available for that. There are also different extensions. For example, there's an extension for atomic instructions, to have a single instruction that atomically exchanges a value; there are extensions for integer multiplication and division; there are extensions for single- and double-precision floating point and for the control and status registers. What's important to know is that for our Linux-targeted use case we want to have RV64G, where the G, for general, expands to integer, multiplication and division, atomics, single-precision floating point, double-precision floating point, the control and status register instructions and the instruction fence instructions. So on most systems which are capable of running Linux, either 32-bit or 64-bit, we will have all these extensions available, and the kernel does use them.
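As a side note, the semantics of those 32-bit instructions on a 64-bit register — for example addw, which adds the low 32 bits of two registers and sign-extends the result — can be modelled in portable C. This is just an illustrative sketch, not code from the talk:

```c
#include <stdint.h>

/* Model of RV64I's addw: add the low 32 bits of two 64-bit registers
 * and sign-extend the 32-bit result back to 64 bits, which is exactly
 * what C's 32-bit arithmetic does when widened to a 64-bit type. */
static int64_t rv64_addw(int64_t rs1, int64_t rs2)
{
    uint32_t sum = (uint32_t)rs1 + (uint32_t)rs2;   /* 32-bit wrap-around */
    return (int64_t)(int32_t)sum;                   /* sign-extend to 64 bits */
}
```

For instance, rv64_addw(0x7fffffff, 1) yields -2147483648 (0xffffffff80000000 in a 64-bit register), the same value an RV64 register holds after addw on those operands.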
Next up are timers. The privileged architecture provides a timer register — the memory-mapped mtime register if you are in machine mode, or simply a shadow register, the time CSR, in supervisor mode — to access the current time. This access also depends on your mode. If you are in S-mode, where Linux is running in supervisor mode and you also have a user mode, you will need to call into machine mode, where usually you will find a supervisor binary interface implementation running, called SBI; you use this supervisor binary interface to set the timer, while you can get the current time by directly accessing the shadow register. If you are running in M-mode only, you have direct access to the registers, and there is nothing barring you from directly getting or setting your timer values. This is in contrast to ARM32, where before Cortex-A7 and Cortex-A15, which have an architected timer, the timers were specific to your current Cortex processor and needed to be set up separately. On 64-bit ARM, the architected timer is always present, so there are instructions which are used to get the current timer value, and ARM64 also naturally supports virtualized timers. Next up are the ABIs. The RISC-V calling convention, the ABI specification, defines multiple ABIs; however, only the two defaults are really recommended. These are, for the 32-bit instruction set, ILP32D — meaning integer, long and pointer are 32 bits, and floating point is by default always hardware double precision — and for the 64-bit instruction set LP64D, which means that long and pointer are 64-bit, float is also hardware double precision, but the integer stays at 32-bit. This is pretty much the same as on ARM, where 32-bit has no such ABI selection and 64-bit also only supports LP64 under Linux. There were some ILP32 patches for ARM64 on the Linux kernel mailing list.
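As an aside, the difference between these data models is easy to check with a few lines of C — a tiny sketch you could compile with a riscv64 or riscv32 cross compiler to see the respective data model:

```c
#include <stdio.h>

/* Under LP64 (riscv64, aarch64, x86-64 Linux) this prints
 * "int=4 long=8 ptr=8"; under ILP32 (riscv32, arm32) it prints
 * "int=4 long=4 ptr=4" — int stays 32-bit in both models. */
static void print_data_model(void)
{
    printf("int=%zu long=%zu ptr=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
}
```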
However, as far as I found out, it hasn't been merged into mainline yet, and I don't know if the effort from ARM is still ongoing at this point. Next up are the code models. The two code models available for RISC-V change how global memory is referenced within the code. There's the medlow code model, which expects that global symbols are within 2 GiB of the link address, which is mostly zero; this is implemented by loading the absolute global address into a register and then directly loading through that register. The second code model available for RISC-V is medany, which expects that your global symbol is within 2 GiB of your instruction. This is a much looser requirement than for the medlow model; it can be used to access more memory and makes it far easier to make your binaries run position-independently, because your entry address is no longer fixed to a specific memory address. It is implemented using an auipc instruction, which loads an offset relative to the current program counter, followed by a load through the register you just set up. For ARM64 there is a tiny code model, a small code model and a large code model; however, according to the documentation, the tiny code model is not really fully implemented yet and mostly falls back to the small code model. The small code model carries the restriction that global symbols must be within 4 GiB of your binary, while the large code model does not impose any restrictions on how your binary has to look. And for ARM32 there's simply no selection available. Let's take a short look at kernel assembly. Since RV64I, the RISC-V 64-bit instruction set, and the 32-bit instruction set carry the same instructions, we can use macros to write assembly for both at once, so we can write assembly macros
which expand to the correct instruction according to the size which has to be loaded into a register. This is done using the REG_S and REG_L macros within the Linux code: the REG_S macro is used to store a register-sized value at an offset, and the REG_L macro is used to load one. The SZREG macro expands to the register size for your architecture, or, in the case of a 32-bit access within the 64-bit instruction set, you can also set the size to four to force a 32-bit load into a 64-bit register. This is plainly not possible between ARM32 and the ARM64 instruction set, because of differing instruction sets and registers: ARM 32-bit has only 16 general purpose registers available, and ARM64 has 31 general purpose registers available, so you can't simply share the kernel assembly code there unless you impose very, very tight restrictions on the ARM64 instruction set. Next up, I will talk shortly about our development platforms. We have done development on QEMU- or TinyEMU-based virtual machines; for I/O to the host system those use VirtIO, so consoles, memory images and HDD images are provided via VirtIO. We are currently experimenting with a LiteX/VexRiscv FPGA platform on an ECPIX-5, which uses a Lattice FPGA for which open source toolchains are available, which is really nice. This is a LiteX SoC with a lot of components provided by the LiteX SoC infrastructure, so LiteEth or LiteDRAM are used within that, and we are using VexRiscv cores, an open source processor you can download and assemble into your project. And for some real-world available hardware platforms we are using a StarFive BeagleV — or rather we still are, but unfortunately the BeagleV has been discontinued. Now a short look into what barebox on RISC-V looks like. For the QEMU platform, we can see on the slides that barebox provides a so-called barebox generic second stage, which
is really, really useful in this case: barebox pretends to be a Linux kernel, so it can be started from any other bootloader which implements parsing of the Linux RISC-V kernel header, and we are using VirtIO for disk I/O. On the right side you can see a current barebox from the master branch; the board is the RISC-V VirtIO QEMU platform. It has some flash, which is always present in QEMU's virt platform, and we can see that we are running in S-mode, because QEMU also provides a supervisor binary interface implementation — and yes, it pretty much works out of the box and can be used to boot a Linux kernel. I was talking about the RISC-V Linux kernel header; let's take a short look into the header we use for the barebox generic second-stage image. On the first line you see that we set up the stack pointer at the location our program counter is currently pointing at, because the Linux RISC-V header specifies that there has to be a load offset, so the text area will be loaded at a different place and not directly after this very initial header. Then we have a jump instruction which jumps to the 1 label defined further down in the file; this is used to jump over all of the header and directly run the barebox pre-bootloader code, which sets up everything else to run barebox. Then we have a .balign 8 line, because the following fields need to be aligned to 8 bytes, and the first two instructions always need to be two 32-bit instructions. Then we have the image load offset — as mentioned, the header specifies that the text area of the binary has to be loaded at an offset from where it is jumped into, and in this case this is about 4 megabytes. Then we have the effective image size, which in this case is 2 megabytes; we have a kernel flags value, which is just empty at the moment — and I think it's also empty for the Linux kernel — and we have the version of this RISC-V Linux header, which is defined
to be major 0 and minor 2, so 0x2, at the moment. Then we have two reserved fields, which are not used at all at the moment, and then there are two magic fields which have to be present within the RISC-V Linux header to identify it as such: the first magic has to be the string "RISCV", and the second the string "RSC" followed by the escaped value 0x5. And at the very end we have the offset to a PE/COFF header, which is used for EFI: if you want to run EFI payloads — that is, if you have a bootloader which implements the EFI specification and you want your Linux kernel to be an EFI binary — you have to use that field. Next up is barebox on the BeagleV. We have support for that upstream, and it can start Linux just fine, so if you have access to a BeagleV beta board, please don't hesitate to test barebox on the platform. In the boot log — in this case it's not shown — the BeagleV first starts an OpenSBI payload as the supervisor binary interface implementation and then jumps into barebox. So in this case we can also see that the BeagleV implements S-mode and barebox is running within S-mode, but the SBI specification version in this case is much newer than in the QEMU case, because QEMU only implements an older version of the SBI specification. A short word on RISC-V hart CPU boot-up. If you have a symmetric multi-processor system, you will not have only one CPU but multiple CPUs; in RISC-V those are called hardware threads, or harts. Think of the x86-64 processors where you have multi-threading: in the RISC-V architecture we just say that a normal core and a hyper-threading sibling are the same, just two different hardware threads, and we don't care about who is able to access the resources of another thread or share caches — we just say those are two hardware threads and are done with it, so we don't have to differentiate between different kinds of CPUs. Under ARM64, boot-up of secondary CPU cores usually
is done by calling into the PSCI interface, the Power State Coordination Interface, and for most platforms I've worked with this is implemented using the ARM Trusted Firmware. On ARM 32-bit there's a lot of variance here: if you have a platform which is very, very new, like the STM32MP1, you have an ARM Trusted Firmware available as well; or, if you are running OP-TEE, then OP-TEE has to be the one starting cores, because starting cores is only allowed from the secure world, so it also implements the PSCI interface; or you have direct register access with which you can start or stop a core. On RISC-V there are two options. The very first option, as per the specification, was that all hardware threads enter Linux, and each hart has its hart ID in the a0 register; Linux can then decide: I want the core with this a0 to be the main core, which does all the boot-up until it can start the secondary cores, and the secondary harts will spin until Linux is done with that. But for the newer case — where we don't want all the bootloaders to implement the startup of harts, because that's very tedious and needs to be done for every bootloader, since they would all have to start the harts before jumping into Linux — there's the supervisor binary interface hart state management extension. This is somewhat similar to the Power State Coordination Interface: you get function calls into the supervisor binary interface to start a hardware thread, to stop a hardware thread, to get the current status of a hardware thread, or to send it into a low-power suspend state where it will wait for an interrupt to wake up again. After this look into the CPU side of things, Ahmad will now start with the peripheral side of things — have a lot of fun! So, now we've got our RISC-V CPU executing our own code, and we would like to put it to good use. This means we want to interface with the physical world by talking to the peripherals. The way this works on RISC-V systems is by using memory-mapped I/O: just like we
access normal system memory, we can issue memory transactions to read and write from peripherals. This works, but having caches in between can complicate it. So that we are all on the same page, let's have a quick primer on cache coherency. Say we have two CPUs, and both CPUs have read some region of RAM and have it in their L1 caches. CPU 1 has modified that region and then flushed it out to RAM, and then CPU 2 wants to access the same region — but it has stale data, from before CPU 1 wrote its data, in its own L1 cache, and so it will access this stale data. If you have to make CPU 1 and CPU 2 coordinate this with each other in software, you have a synchronization nightmare at hand. The way this is usually fixed is by having hardware coherence protocols which keep these caches in sync. This is commonplace for D-caches on SMP systems, but for the I-cache it usually isn't, because code is written much less often than data. In the context of barebox, a bootloader, however, it's normal that you load code, and this code loading happens through the D-cache while the execution happens through the I-cache, so you will need to do manual cache maintenance. What that looks like: you load your code normally, for example by copying it out of a network packet when you are doing network boot; then you flush this copied data so it reaches a unified cache; and then you invalidate your I-cache. That means the next time there is some prefetch or some instruction execution going on, this data will not be found in the I-cache, you will get a cache miss, and it will be fetched from the unified cache where you had written your data before. What does that look like for devices?
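Before we get to devices: the flush-then-invalidate sequence just described is what compilers expose as __builtin___clear_cache (GCC and Clang) and what the Linux kernel does in flush_icache_range(). A hedged sketch of loading code this way, with the buffer and names being illustrative:

```c
#include <string.h>

/* Sketch: copy freshly received code into an executable buffer.
 * The memcpy goes through the D-cache; __builtin___clear_cache then
 * cleans the D-cache out to the unified cache and invalidates the
 * I-cache for that range, so the next instruction fetch misses and
 * picks up the new code. On coherent architectures such as x86 the
 * builtin compiles to a no-op. */
static void load_code(void *exec_buf, const void *src, unsigned long len)
{
    memcpy(exec_buf, src, len);
    __builtin___clear_cache((char *)exec_buf, (char *)exec_buf + len);
}
```

Jumping to exec_buf would additionally require the memory to be mapped executable; that part is omitted here.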
It looks slightly different, because you will not want caches between you and a device that you are directly accessing. Take a timer, for example: you have a bit to configure that the timer should start running, and you have a memory-mapped register where you can read out how many ticks have gone by — and you don't want to read these ticks and always get the same value because it has been cached. So you want to disable caching for the I/O memory region that is used by the timer and other peripherals. This is done by the hardware implementer: they know "I will map the system memory at this address, and I have this region for I/O devices", and they design their physical memory layout so that one part is cached and one part is uncached — so no problem there. The problem starts once you have devices that themselves want to access system memory. The CPU accesses system memory through the caches, but now you have devices that don't go through the caches — so what are these devices supposed to do to be aware of modifications done in the CPU-local data caches? On x86, what happens is that the devices are aware of the CPU caches: you have a cache-coherent interconnect, so a PCI network controller can access memory normally and will be aware of changes done to the CPU data caches. On ARM, this depends on the SoC interconnect: some server-grade SoCs have cache-coherent interconnects, which means you don't need to do any manual cache maintenance, but most embedded SoCs don't, and on these embedded SoCs you must do this cache maintenance by hand if you want to have devices directly accessing memory. What that looks like, we can see here with this example: we have a processor, we have a network interface that receives packets that we want to process on the processor, and we have system memory which is shared between the processor and the network controller. The processor will allocate a buffer, and for this buffer it will create a
descriptor that describes this buffer: it has its base address, it has its length, and it has some control bits that describe to the hardware what to do with this buffer. Then it takes this descriptor and places it in a descriptor ring, which contains many other such descriptors, and the base address of the descriptor ring is written to the network controller. Writing to the network controller happens like it did with the timer: we do a memory-mapped I/O access and write that address. Now the network controller is informed about the existence of the descriptor ring; it will take it, go through it, and say: oh, there is a descriptor that says I can use it, and there is a buffer. So when it receives a packet, it will directly access system memory and copy the received packet into that buffer. The processor will then get, for example, an interrupt; it will look through the descriptor ring and see: OK, a packet has arrived for this descriptor, I will take the buffer — and that buffer will propagate through the network stack, where different parts of it are read and processed appropriately for each protocol. What we see here are three different types of accesses from the CPU. We have the access to the memory-mapped I/O — that's the PIO vertical line here — and that one we want to be uncached. Then we have the access to the buffer: after we have received it, many different parts of the network stack will work with this buffer, so we appreciate it being cached. And then we have the descriptor ring, which we use for synchronization with the network controller: when we set a bit that says "OK, network controller, you can now use this descriptor", we want these changes to be directly propagated to the network controller, and the inverse as well — so we want that ring to be coherent. How does Linux deal with it?
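Before looking at Linux's answer, the descriptor ring handshake described above can be sketched in plain, host-runnable C. The names, layout and ownership bit are illustrative, not any real controller's register map, and the "device" here is simulated in software:

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4
#define DESC_OWN  (1u << 0)     /* set: the device owns this descriptor */

/* Illustrative descriptor: buffer address, length, control bits. */
struct rx_desc {
    uint64_t buf_addr;
    uint32_t buf_len;
    uint32_t flags;
};

static struct rx_desc ring[RING_SIZE];
static uint8_t buffers[RING_SIZE][2048];

/* CPU side: hand all buffers to the (simulated) device. On real
 * hardware, the ring base address would then be written to an MMIO
 * register, and the ring itself kept coherent/uncached. */
static void rx_ring_init(void)
{
    for (int i = 0; i < RING_SIZE; i++) {
        ring[i].buf_addr = (uintptr_t)buffers[i];
        ring[i].buf_len  = sizeof(buffers[i]);
        ring[i].flags    = DESC_OWN;        /* device may fill it now */
    }
}

/* Device side (simulated): copy a received packet into the first
 * device-owned slot, then hand the descriptor back to the CPU. */
static int device_receive(const void *pkt, uint32_t len)
{
    for (int i = 0; i < RING_SIZE; i++) {
        if (ring[i].flags & DESC_OWN) {
            memcpy((void *)(uintptr_t)ring[i].buf_addr, pkt, len);
            ring[i].buf_len = len;
            ring[i].flags &= ~DESC_OWN;     /* back to the CPU */
            return i;
        }
    }
    return -1;                              /* ring full */
}
```

The ownership bit is exactly the synchronization point the talk describes: flipping it is what must propagate between CPU and device without going stale in a cache.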
You have two different types of DMA mappings. We have coherent DMA mappings, which are accessed by the CPU and the device in parallel and don't need any explicit synchronization, which makes them very well suited to implementing these descriptor rings. Then we have streaming DMA mappings. These are not accessed in parallel, but you will want them to be cached, because they change ownership: the network controller will use them for a span of time, then the CPU will use them for a span of time, and just at these synchronization points, where the ownership changes, you will need to account for the caches. The way that's done is that you have functions — for example dma_sync_single_for_device — which say: OK, I am the CPU, I have put some stuff into my buffer, now take this buffer and sync it for the device. After that function returns, the address of this buffer can be passed to the hardware, and it will see the data I have just written. Keep in mind that this is different from memory barriers: this ensures visibility, not order, which is what you need memory barriers for. What does that look like on ARM? Coherent DMA mappings are implemented via the MMU: if you don't have a cache-coherent interconnect, you take the page tables — there you have bits for the caching attributes, this is cacheable, this can be write-buffered — and you can unset these bits to make pages uncached. And if you have direct uncached access to a region of memory on the CPU side, and the same uncached access on the device side, you are basically coherent, because there are no caches to keep coherent between them — that's the way you would implement coherent DMA mappings. Then you have streaming DMA mappings, and these you can implement using cache maintenance operations. So once the CPU is done with a memory buffer and it wants to hand off ownership to the hardware, what does it do?
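The hand-off discipline around the streaming API described above can be mocked in plain, host-runnable C. The sync functions here are stand-ins that only model the ownership rule — the real dma_sync_single_for_device/_for_cpu in the Linux kernel perform actual cache maintenance:

```c
#include <assert.h>
#include <string.h>

/* Mock of the ownership discipline behind streaming DMA mappings:
 * CPU writes -> sync for device -> device uses the buffer ->
 * sync for CPU -> CPU reads. Only the bookkeeping is modelled. */
enum owner { OWNER_CPU, OWNER_DEVICE };

struct dma_buf {
    char data[256];
    enum owner owner;
};

static void sync_for_device(struct dma_buf *b)
{
    /* Real implementation: clean (flush) the D-cache lines covering
     * the buffer so the device sees what the CPU wrote. */
    b->owner = OWNER_DEVICE;
}

static void sync_for_cpu(struct dma_buf *b)
{
    /* Real implementation: invalidate the D-cache lines, since
     * speculation may have refilled them while the device owned
     * the buffer. */
    b->owner = OWNER_CPU;
}

static void cpu_fill(struct dma_buf *b, const char *msg)
{
    assert(b->owner == OWNER_CPU);  /* CPU must own it to touch it */
    strcpy(b->data, msg);
}
```

The point of the mock is the assertion in cpu_fill: touching a buffer while the device owns it is exactly the bug that manual cache maintenance exists to prevent.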
It cleans its caches, so all data is flushed out to main memory, and when the device accesses main memory it will find the data the CPU meant to send to it. And when the device is done with the buffer, the CPU will need to invalidate its own caches before reclaiming ownership, because in the meantime there might have been some speculation going on, leaving stale data in the caches. So once the device has said "OK, I have written to main memory", you just invalidate — drop — your caches and then read the new data. There is also a whole zoo of barriers on ARM, so you can write optimized code depending on how strong your ordering needs to be. What does that look like on RISC-V? There are specifications; you can search for "coherent" or "coherency" in them and you won't find much, but you will find what I have quoted here: in RISC-V platforms, the use of hardware-incoherent regions is discouraged due to the software complexity, performance and energy impacts. How does that translate to practice? Here are five different single-board computers; they range over an order of magnitude, from a hundred dollars to a thousand dollars. If you were to venture a guess which ones of those decided in favor of hardware complexity to save on software complexity, you would probably guess right: everything above five hundred dollars decided for cache coherency, and the hundred-dollar and hundred-fifty-dollar boards decided on cache incoherency. So you have actual hardware out there that didn't quite follow this discouragement in the specification, and if you want to write software supporting it, you will need to take manual cache maintenance into your own hands, because the hardware doesn't do it for you. Let's look at these in series. We have the Allwinner D1; it's based around a XuanTie C906 CPU from Alibaba's T-Head, and it's designed with cache incoherency in mind. That means the DMA masters are not cache coherent — no DMA masters at all are cache coherent — but the page tables have cache attribute bits and you
have instructions for cache maintenance. So it looks like what you would have on ARM, but there is a very big difference in that these instructions and these bits are not reflected in any specification. Actually, these page table bits are even reserved for official use, not for vendor use — a vendor adhering to the specification should trap if you set these bits to anything but zero — but here they are repurposed as cache attribute bits. The same goes for the vendor-specific instructions: they are vendor-specific, while on ARM, if you have ARMv8-A, you are guaranteed to have these instructions available and these bits in your page tables, so you can write common code, like a multi-platform Linux kernel that uses the same page table structure on many different SoCs. Unfortunately, here you can't do that. The Linux support for RISC-V supports multiple boards, also with device trees like you would expect on ARM, but it can't yet support something like the Allwinner D1 or the JH7100 on the next slide, because these do not adhere to some common scheme. But if you are willing to add an abstraction layer in the middle, you can have your coherent DMA by using the MMU to map pages uncached and your streaming DMA by using these dedicated vendor-specific cache maintenance instructions. On to the StarFive JH7100, which was in the BeagleV beta board I ported barebox to. This one has SiFive U74 CPUs — multiple of them — and these CPUs are designed for cache-coherent systems, but the MMC and Ethernet controllers on this SoC were not cache coherent. How they approached this was different from the Allwinner D1, though: they followed a MIPS-like scheme in that they have two different SDRAM aliases, one alias for cached SDRAM access and one for uncached SDRAM access. So the CPU, depending on whether it needs cached or uncached access, would either add or subtract the offset between these physical addresses. What complicates this a bit is that all cache-incoherent DMA masters are
32-bit, which is a bit unfortunate, because the uncached mapping is above 32 bits and you have no IOMMU to map this around. So what can you do here? What I decided on — because I just wanted my network controller to work, and there was an SRAM that was quite small, a few dozen kilobytes, but enough for me to do network communication, so I could network-boot another barebox and have faster development cycles — was to have a coherent DMA pool in that small memory. For later, I intended to read up on the page table support on RISC-V so I could teach it proper coherent allocation support. Well, I made a version 1, I sent it out, I wrote a bit about it on the BeagleV forum, and I was promptly corrected: there is no support for this in the page tables as specified. But there is a way around that: you just accept that the view of the hardware is fundamentally different from the view of the CPU. The hardware can only access 32 bits, so it can only reach the cached alias — but because these caches sit only between the CPU and main memory, even if you access the cached alias from the device side, you still sidestep the caches and your access will be uncached. So you just need to keep around two addresses: one address via the uncached alias, which you use on the CPU side; and when you need to reference that buffer while talking with, for example, the Ethernet controller, you use the other alias — which, if you referenced it from the CPU side, would be cached, but when the device does it, it will be uncached. That way you can declare your arch-specific coherent DMA pool on top, and the Linux APIs for DMA already support that; barebox imports them. So as long as your drivers don't do things like taking the void pointer, casting it to an unsigned long and passing it to the hardware, you are fine: just use the CPU pointer for CPU operations and the device pointer for device operations, and
for streaming DMA, the L2 cache controller had a register just for flushing, so you could simply use that. So now we have seen these two pieces of hardware, and they have very different ways of dealing with their cache incoherency, so you will want to abstract this away somehow — and taking a page out of the x86 playbook, that means moving it into firmware, which brings us to privilege modes. The normal mode we execute in after power-on reset is machine mode: we have uninterruptible access to all hardware, and we run directly on physical memory. Then there is a hypervisor mode; unfortunately that one is not yet ratified, it's not yet frozen — there was recently an LWN article on that: the KVM patches and the specification have been in the finalizing stage for quite some time now, but it's not yet frozen, and as such the RISC-V maintainers are a bit hesitant to accept that code yet. Then there is the supervisor mode, where you have virtual memory available and where your operating system would run, and then you have the user mode for user space applications. Any CPU can mix these modes: if you have a simple embedded system with no memory protection, you would have just M-mode; if you have an embedded system whose user tasks should be somehow isolated with a memory protection unit, you would mix M plus U; and if you have a full-fledged Unix-like operating system with virtual memory, you would have M and S and U. The way this is designed allows RISC-V to be classically virtualizable: all instructions that are sensitive to the mode you are running in trap to an elevated privilege mode. That means a hardware implementer could get away with implementing just the exception-handling CSRs in hardware and then emulating everything in the trap handling — and by the same means you could even run virtual machines: S-mode would just keep executing code, and once a sensitive instruction is hit, it would get a trap, it would be emulated, for example by translating through shadow page tables, and then
continue execution normally. That works, but it's quite slow, because shadow page tables add a lot of overhead — thus the need for hardware virtualization support. Here is an example of how that could look. In barebox you have this pre-bootloader, which contains some initialization code and a device tree that describes the hardware, and then you have barebox proper, which is usually a compressed binary that you can run, just like a multi-platform Linux kernel, on a huge number of SoCs. For barebox a couple of RISC-V SoCs are supported; some of them are in S-mode, some in M-mode, and some of them don't support fence.i. But you need fence.i once you have instruction caches, because you need to invalidate the instruction cache: for example, you have just uncompressed barebox and now you want to execute it, but you can't be sure what was in the instruction cache before you uncompressed barebox, so you will need to invalidate the instruction cache. Instead of having two barebox builds — one for CPUs without fence.i and one for CPUs with fence.i — you can just have a trap handler that supports both: if fence.i works, everything is well, and if fence.i doesn't work, a trap is raised, and in the trap handler you can make use of the very regular encoding scheme of RISC-V instructions to find out which instruction was the illegal one — and if it's a fence.i, you can just skip over the instruction, because if you have no instruction cache, fence.i can simply be emulated as a no-operation. Let's revisit a bit why we talked about privilege modes: it was because we can use them to offload functions into the firmware. There is an interface for that, and that's the supervisor binary interface — a standard for explicitly trapping into firmware so that it can perform functions for you. This allows handling platform quirks and offers functions like inter-processor interrupts, hart state management and system reset. This sounds like PSCI, and indeed it has some overlap; in general you could say
You have S mode trapping to M mode, which corresponds to exception level 1 trapping to exception level 3 on ARM, and the reference implementation for RISC-V is OpenSBI, while for ARM it's ARM Trusted Firmware-A. And just like you have these inter-processor interrupt functions, you could also have functions for flushing DMA buffers, so you don't have to encode this vendor-specific stuff into the Linux kernel: you could just ask the firmware for it, and once there is a frozen specification, you could use that in Linux. Some people are quite wary of that, of having this layer of firmware that the operating system is just supposed to accept and that you as a user might not be able to change. And they are right in that just having an open instruction set architecture doesn't mean you have an open CPU, and it certainly doesn't mean that an SoC that results from this will have open gateware or open documentation. Experience shows that maintaining a product with severely under-documented hardware is very challenging, and in retrospect it's often not worth the effort. So when you are starting a new project, or you are a hobbyist wanting to invest your time into RISC-V, you should look at it like this: these RISC-V hardware vendors benefit very much from the free and open source software ecosystem that has sprawled around RISC-V. Some of them take an active part in this, and those should be rewarded; but as for the ones that don't provide documentation, for example enough to recreate the firmware, you should probably steer clear of those, because they might not be worth the time or the investment if you try to build products on them. What drove this home to me was the JH7100 on the BeagleV beta: for the beta developers there was a reference manual of 150 pages available, which is not a lot; for something of that functionality level you would expect more like 4,000 or 5,000 pages of reference manual. Among the stuff missing was the clock tree.
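As a small aside on what SBI looks like from the caller's side: the supervisor places an extension ID in register a7 and a function ID in a6, then executes ecall, and the extension IDs in the specification are simply short ASCII mnemonics packed into an integer. A quick sketch of that packing (the helper function is my own illustration; the three constants are the spec values):

```c
#include <assert.h>
#include <stdint.h>

/* SBI extension IDs from the SBI specification, chosen as the
 * ASCII bytes of a short mnemonic packed into an integer. The
 * call itself is: EID in a7, FID in a6, then ecall, with the
 * error/value pair returned in a0/a1 (needs inline asm on a
 * real RISC-V hart, so it is not shown here). */
#define SBI_EXT_IPI  0x735049u   /* "sPI":  inter-processor interrupts */
#define SBI_EXT_HSM  0x48534Du   /* "HSM":  hart state management */
#define SBI_EXT_SRST 0x53525354u /* "SRST": system reset */

/* Pack up to four ASCII characters the way the spec does. */
static uint32_t sbi_eid(const char *s)
{
    uint32_t eid = 0;

    while (*s)
        eid = (eid << 8) | (uint8_t)*s++;
    return eid;
}
```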
And the clock tree is quite fundamental, if you want to save power, or just when you are getting started and want to enable the clocks, for example so you can use the network interface. The vendor provided downstream forks of U-Boot, of SBI and of Linux. The Linux common clock framework does a very good job at reflecting such intricate clock trees, so you could run Linux, dump your clock tree, and learn about the hardware that way; but the clock tree in Linux didn't say much, it was just some fixed clocks. Looking into U-Boot, you find ten thousand lines of macro soup that was directly generated from a hardware description, and then a thousand lines of initialization code that just goes: toggle this reset, enable that clock, reparent this, and so on, and you don't really know which ones you need and which ones you don't. Here is an example, this is a mux: you see these defines that set and reparent the clocks, and looking long enough at this code and at the structure of which bits are set and which are cleared, you will understand that, okay, these bits are for gating, these bits are for the divider, these bits are for choosing the mux. And looking further into the code, you might glean what the possible divider values are, or which clocks are derived from which clocks, and how you could construct such a clock tree. So in the end I wrote a script that parsed this out and generated common clock framework code that looks quite a lot like normal clock framework code in barebox or in Linux; someone even ported it to Linux, which I liked very much. Of course there were missing parts, and they carried a comment: well, we will fill this out once we have documentation. And that was my expectation: okay, we are beta developers, we don't have the stuff available yet, but come the end of the beta we will get this data. But then the BeagleBoard.org foundation cancelled the JH7100-based BeagleV, and, well, I am not sure if the vendor will follow up on the promise to provide documentation now that the pressure went away.
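To give a feel for the kind of reverse engineering involved, here is a minimal sketch of decoding one such clock control register. The layout below (a gate bit, a mux field, a divider field) is entirely hypothetical and merely stands in for what had to be puzzled out per clock from the generated U-Boot code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one clock control register, similar in
 * spirit to what such SoC init code pokes; the real bit positions
 * differ per clock and are exactly what had to be reverse
 * engineered from the macro soup. */
#define CLK_ENABLE     (1u << 31)
#define CLK_MUX_SHIFT  24
#define CLK_MUX_MASK   (0x3u << CLK_MUX_SHIFT)
#define CLK_DIV_MASK   0x00ffffffu

struct clk_state {
    int enabled;           /* gate open? */
    unsigned int parent;   /* index into the mux inputs */
    unsigned int divider;  /* divider value for the output rate */
};

/* Decode a raw register value into something a common clock
 * framework driver could describe as gate + mux + divider. */
static struct clk_state clk_decode(uint32_t reg)
{
    struct clk_state s = {
        .enabled = !!(reg & CLK_ENABLE),
        .parent  = (reg & CLK_MUX_MASK) >> CLK_MUX_SHIFT,
        .divider = reg & CLK_DIV_MASK,
    };
    return s;
}
```

Once every clock is describable this way, emitting common clock framework descriptions from a script, as mentioned above, becomes mechanical.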
They did provide an Excel table with some information about the clock tree, not nearly enough, and with a very big disclaimer that you may not use it for distribution, reproduction, copying or making derived works, so using that for open source software goes completely out of the window, and in retrospect I wouldn't do that again. So: expect your vendor to document the hardware if you really want to use it in a product or want to invest your own time in it. Most of what me and Rouven learned during this RISC-V trip went into the barebox port. barebox already supported RISC-V before, but we made it more similar to the ARM platform, which is the one that's used most. If you want to check it out, go to barebox.org; you will find the documentation reference and the Git repository. It has a very Linux-like structure, so you have arch/riscv for the architecture support and a drivers directory for the drivers, some of which are shared between ARM and RISC-V, along with the framework that enables this to happen. Development is done via a mailing list, and there is also an IRC channel where you can chat with other people in real time; there is a Matrix bridge, so you could also do it out of your web browser. What personally got me involved in RISC-V for barebox is that I am using barebox not only as a bootloader but also as some sort of bare-metal hardware bring-up toolkit, and that works quite nicely, because you have this Unix-like shell with device files, and you can copy from one device file to another; you have a POSIX-like API with file descriptors and so on, which makes it quite nice to directly interact with hardware, to test things out, or even just to prepare something before booting Linux. And I wanted to give people a way to try this out without having to compile barebox or flash it onto a board, so I used TinyEMU, which is a RISC-V virtual machine that can be compiled to WebAssembly and run in the browser, and yes, that was how I got into RISC-V development.
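That POSIX-like API means the classic copy loop you would write against a host C library carries over almost unchanged. The sketch below uses the host's read/write as an illustration of the style, not barebox source; inside barebox the same loop would copy between device files:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Copy everything readable from one file descriptor to another,
 * the way cp /dev/src /dev/dst works in the barebox shell.
 * Returns 0 on success, -1 on a read or short-write error. */
static int copy_fd(int src, int dst)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(src, buf, sizeof(buf))) > 0)
        if (write(dst, buf, (size_t)n) != n)
            return -1;
    return n < 0 ? -1 : 0;
}
```

The point is that a driver author can poke at hardware with familiar file-descriptor idioms before Linux is even running.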
So there is JS barebox: you can just click the link and it will open an emulated barebox for you. And in case you are wondering: yes, it does run Doom. There is a button for that; if you click it, it will open an HTML canvas element where, by the magic of a simple framebuffer over virtio in the RISC-V virtual machine, it will start playing a Doom that was linked against barebox. So that concludes my talk; thank you very much for listening, and I will be available here for questions.