Our next speaker is Ulrich Langenbach, and he will talk about programming reconfigurable devices. Let's applaud him.

Yeah, thank you. So I'm going to give a short talk about using reprogrammable or reconfigurable devices, namely FPGAs, within Linux-based systems. We use the recently mainlined FPGA framework that has been written by Alan Tull, who is working for Intel. I think last year there was already a talk about that, a longer one, but sadly it is lost, so the video is not available anymore.

How we came across this topic: we implemented a system that accelerates cryptography engines. That's the traditional one, the software driver running on the CPU, and we moved that into an FPGA here. That's basically what we did. We needed some more subsystems: there's the hardware driver that talks to the AES hardware engine via DMA, and that integrates it into the crypto API within the Linux kernel, which leverages the accelerator, the hardware functionality, and exposes it to all system parts that want to use it. That could be user-space tools as well as kernel-space tools, because all of them can talk to the crypto API, and this way we have a very flexible interface to our new hardware.

Then, as cryptography algorithms advance and systems change, you also want to be able to change the baseline cryptography algorithm. That's why we use a reconfigurable system, so we can swap out the old algorithm and swap in a new one, or more of them, whatever. For doing that we use the Linux kernel FPGA framework, and that's shown here. We have the internals of this framework shown here in the slightly darker blue, and that handles all the FPGA-specific parts. The user actually influences what happens via device tree overlays.
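As a rough illustration of what such an overlay carries, the following sketch builds the source text of a device tree overlay that targets one FPGA region. The `firmware-name` property follows the kernel's FPGA region device tree binding; the region label `fpga_region0`, the bitstream file name `aes.bit`, and the helper `build_region_overlay` are made-up placeholders and do not come from the talk.

```python
def build_region_overlay(region_label: str, bitstream: str) -> str:
    """Return device tree overlay source that targets one FPGA region.

    region_label and bitstream are hypothetical example values. A real
    overlay would usually also describe the devices inside the region.
    """
    lines = [
        "/dts-v1/;",
        "/plugin/;",
        "",
        "/ {",
        "\tfragment@0 {",
        # The overlay is applied onto the node with this label.
        f"\t\ttarget = <&{region_label}>;",
        "\t\t__overlay__ {",
        # Names the bitstream that the firmware subsystem should fetch.
        f'\t\t\tfirmware-name = "{bitstream}";',
        "\t\t};",
        "\t};",
        "};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_region_overlay("fpga_region0", "aes.bit"))
```

The generated text would still have to be compiled with the device tree compiler and applied by whatever overlay mechanism the target kernel provides, which is outside this sketch.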
I'll come back to that on the next slide. You obviously need some drivers that interact with the physical device, and those are the ones here that are specific to whatever FPGA device you use. In our case this was a multiprocessor system-on-chip type of platform, so we have the CPUs, actually it was two, and the FPGA part on one single chip, but other configurations work just the same way.

So you have the FPGA region, which represents a physical region within the reconfigurable device. It is configured by the user, as mentioned, via the device tree overlays. The FPGA manager over here manages the association of which firmware, or in this case bitstream, is actually loaded, and it leverages the standard Linux infrastructure for firmware loading. That's the same subsystem that's also used by USB devices, for example. Then the FPGA bridge part covers the device-specific things: the decouplers and the configuration access port. That port actually loads the firmware into the FPGA configuration memory. The decoupler is associated with those little devices, and during a reconfiguration process it decouples what is in the region from what is outside of the region, because the behavior of logic within the region isn't specified during the reconfiguration process.

Yeah, okay. So how do we actually trigger a reconfiguration, and how do we specify what goes in there, like parameters, platform drivers, or the bitstream itself? That's done via the device tree overlay. Every region within the FPGA has a stub representation in the overall system device tree, and by loading the overlay onto it, the actual process of reconfiguration is triggered. So the device tree overlay is loaded into the currently present device tree, that triggers the reconfiguration process, which is shown on the next slide, the AES bitstream is configured into the FPGA, and
afterwards the driver is loaded and the system is fully operational and can use the new hardware.

So that's the process of reconfiguration. We have the configuration part and the deactivation, I would say, and it's multiple steps. The coloring I'll come back to later on. We start with loading the device tree overlay, as already presented, and this triggers the bitstream loading part, interacting with the firmware subsystem. When the firmware is available, the devices are decoupled, to make the FPGA ready for reconfiguration without disturbing any other parts in the FPGA that keep on working. Then the actual reconfiguration is executed. That uses the processor configuration access port and loads the bitstream into the FPGA, which is actually one of the steps that contributes a larger amount of time to the overall process.

After this configuration is completed, the region is coupled again, so from this point on the hardware that has just been loaded into the FPGA can be accessed. After that, the change that has just been made to the hardware needs to be reflected in the device tree, so that all other systems are aware of this new hardware. This is the application of the device tree overlay. After that, all the other subsystems that might be involved and might use this new hardware get triggered: the driver that is specific to this new hardware is loaded, and all the subsystems are initialized, in our case the support for the crypto API. Then the system is ready and can be used.

That's the cycle people are usually interested in. But as you have reconfigurable systems, you also want to know: how fast can I reconfigure a system? How long is my, let's say, dead time in which I cannot actually use the resources
I have within the system, because they are bound within this process here? That's why we started to have a closer look at which steps are executed and how long each takes. That's coming later on.

So let's say the execution phase ends at some time: you have encrypted your file or whatever you have been doing. Then you go on, you want to reuse this region and put another algorithm into the device. The platform driver is unloaded, the DTO, the device tree overlay, is removed from the currently present device tree, and the region is decoupled again from the system. After this step we are ready again and the complete process can start all over.

Let's come to some of the results that we obtained while measuring the system. As there are a couple of steps involved, with drivers from the chip vendor and also some more subsystems within the kernel, we used ftrace to gather information about when functions were entered and exited during the overall process. Originally the intention was to see what the performance of our accelerator was, but then we discovered this interesting process, and now we present these results because they are of more general interest than our specific ones.

The size of our bitstream, the size of our accelerator, was almost six megabytes that were loaded into the FPGA, and this overall process, all of that, took about 135 milliseconds. So you see, if you want to change the encryption algorithm of your hardware accelerator to, I don't know, encrypt one email and then use another one for another email, that probably doesn't lead to a very efficient system, because you spend more time reconfiguring than actually computing.

An interesting part was to see that the second largest contributor to the overall time is actually loading the bitstream, and that doesn't involve the FPGA at all, so it's just
fetching the bitstream. That can be accelerated, but let's come back to that later. What we originally expected to be the main contributor, and it is, is the actual configuration process itself. That's the time needed to transport the bitstream from the RAM to the configuration memory on the FPGA through various interconnect interfaces on the chip. The third largest contributor is the driver itself. That's also interesting, as this is not a part of the framework, so anybody who implements drivers for reconfigurable systems should think about their setup times.

The good news is that only 1% of the overall time is spent within the FPGA framework, on everything else that we haven't covered on this slide. So the framework itself seems to be quite well implemented, with little overhead, and shows good performance.

That also leads to the next slide. We see performance bottlenecks, especially with respect to the FPGA reconfiguration interface. That's nothing we can change, as the FPGA vendors have to support that. There are also some issues within the fabric that are root causes for that, so that communication with the reconfiguration access port is actually a bit slower than it should be. The firmware caching itself has not yet been used in the system.
So that could be implemented within the FPGA framework in the kernel, which currently does not leverage the caching mechanism that is already built into the firmware subsystem.

Yeah, additional components that have traditionally been built into reconfigurable system support, schedulers and governors or whatever other things are needed to make efficient use of the hardware that has just been enabled, are, hopefully intentionally, left out of the scope of the FPGA manager, as that's very use-case specific and we can implement it in user space anyway.

Using device tree overlays uses interfaces that are more or less stable currently. They have been developed for supporting shields and modular embedded systems like the Raspberry Pis out there, and we just reuse them for reconfigurable systems, and that works quite well for now. What would be an interesting part is getting automatically generated device tree overlays from the vendor tools, as they usually do generate generic drivers for the hardware parts that have been created. Some of the mappings, like addresses or offsets and such things, are known to those tools, so that could actually help with using this approach.

And to get to our last slide: it is a good reconfigurable system and we can use it. It efficiently supports using reconfigurable systems. The reconfiguration times are quite slow, but the overhead is low, and we can efficiently develop heterogeneous systems with it, because we don't need reboots if we change our hardware. This is a traditional problem. So you get fast test cycles. You have a compatibility layer for static and reconfigurable systems, so it's not only needed to reconfigure, it can also be used for static systems.

Let's conclude. That's it. Thank you.
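The reconfiguration cycle and the dead-time argument from the talk can be summed up in a small model. This is only a sketch: the step names paraphrase the talk, the 135 ms figure is the total reported in the talk for the roughly six megabyte bitstream, and the function `dead_time_fraction` is a hypothetical helper introduced here for illustration.

```python
# Steps of bringing new hardware into a region, in the order given in the talk.
LOAD_STEPS = (
    "load device tree overlay",
    "fetch bitstream via firmware subsystem",
    "decouple region",
    "write bitstream through configuration access port",
    "couple region",
    "apply overlay to live device tree",
    "load platform driver",
    "initialize subsystems (e.g. crypto API)",
)

# Steps of freeing the region again so that the cycle can restart.
UNLOAD_STEPS = (
    "unload platform driver",
    "remove device tree overlay",
    "decouple region",
)

RECONFIG_MS = 135.0  # overall time reported in the talk


def dead_time_fraction(compute_ms: float, reconfig_ms: float = RECONFIG_MS) -> float:
    """Share of wall time spent reconfiguring, assuming the region is
    reconfigured once for every job that computes for compute_ms."""
    if compute_ms < 0 or reconfig_ms < 0:
        raise ValueError("times must be non-negative")
    total = compute_ms + reconfig_ms
    return reconfig_ms / total if total else 0.0
```

Under this simple model, a job that computes for 135 ms loses half of its wall time to reconfiguration, since `dead_time_fraction(135.0)` is 0.5. That is the situation behind the one-algorithm-per-email example: reconfiguration only pays off when the loaded hardware is used for much longer than it takes to load it.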