Let's get started. Today I'm going to give an introduction to SoC+FPGA solutions. My name is Marek Vasut, and first of all, I would like to introduce myself. I work as a contractor for a couple of companies, but mostly for DENX Software Engineering. For most of my day job I do Linux kernel engineering, U-Boot bootloader work and OpenEmbedded work, and I'm a maintainer in some way or other in these projects. I also do FPGA work, but while I do it at my job, I don't do it as a professional FPGA designer. So that's pretty much it about me.

Now, about this talk. I'll structure it in six parts. First, I would like to introduce you to what an SoC is, what an FPGA is, and what the combination looks like, so you get the basic knowledge. Then I would like to go through all the available SoC+FPGA solutions, the small ones and the big ones, and finally show you how to get the big ones running with Linux the right way.

So let's get right to the first part: what is an SoC? Since we are at the Embedded Linux Conference, I guess most of you know what an SoC, a system on chip, is. It's some sort of CPU core, today mostly an ARM core, but it can be MIPS, RISC-V, whatever, together with some peripherals, all put on a single piece of silicon with pads coming out of it. You put it on your board, run code on the CPU core, and communicate with the outside world. A standard thing, right?

An FPGA is a little bit more complicated. It's a programmable logic solution. To simplify, it's again a chip with a lot of I/Os, but you can put a user-defined logic function into its programmable logic, so the chip does whatever you define. Think, for example, of a UART: you can program a UART into an FPGA.
It will behave like a UART, that sort of thing. You can obviously put more complex stuff in there; I'll talk about that in more detail shortly. Now, if you combine an SoC and an FPGA, you get the benefits of both: a single piece of silicon with a hard CPU core and a programmable piece of logic into which you can put whatever you need.

So let's talk a little more about FPGAs. I said it's programmable logic and you can put whatever you want into it, but what should you really envision under the term programmable logic? Think of a device which is reasonably high speed, hundreds of megahertz on the logic side. It has plenty of I/Os, hundreds, maybe thousands. And it's extremely parallel: you have hundreds of thousands of blocks which you program and chain together, so everything in the FPGA happens in parallel. That makes it extremely useful for parallelized workloads. Think video processing: you get video data from, say, a video encoder and push it into the FPGA, where you have some sort of filter; the data is piped through the FPGA, the filter is applied, and you capture the result on the other side. Or crypto: you push a lot of data into the FPGA, the crypto happens in parallel, and you capture the result on the other side. But you can do pretty much anything with an FPGA. ASIC prototyping is another use: you can synthesize custom hardware blocks into the FPGA. And if you need, like, a bazillion UARTs?
Well, knock yourself out: just put them in the FPGA and you get a bazillion UARTs. There are multiple FPGA vendors, so there's a lot to choose from. Xilinx and Altera are the big ones, and then there are a lot of smaller, specialized players like Lattice, Microsemi, Cypress and so on.

Now let's look into the FPGA in a little more detail. The FPGA vendors will tell you it's super difficult technology and so on. It actually is not. The diagram on the left is a simplified schematic of an FPGA, but that's pretty much all there is in it. These are the I/O pads, which are what physically comes out of the chip, and next to them are the I/O adaptation units, the I/O blocks. They let you get differential pairs out of the FPGA, and they also let you run the I/O pads at multiple different voltages. Ultimately all of that is connected into this blue, all-encompassing goo called the global interconnect, which connects everything in the FPGA together. It can be reprogrammed, so you can wire it pretty much any way you want; it's like a massive patch board. The other thing connected into the global interconnect is these red blobs, which are where your logic is actually synthesized. In Altera parlance they are called logic array blocks; in Xilinx I believe they are CLBs. This is where you define the logic functions, and by combining these blocks through the programmed global interconnect, you can effectively assemble any logic function.
So that's how the FPGA works internally. If we look in a little more detail at a logic array block, it's a bit more complicated, but not by much. It's assembled from multiple logic elements, the smallest building blocks, which you can see here, and these logic elements are connected by a local interconnect. That's just an optimization to reduce signal propagation delays. If you go all the way down, you reach the logic element, which internally looks like this: just a look-up table with multiple inputs and, optionally, a register, which lets you build both combinatorial and sequential logic. If you chain these together the way you need, you can assemble any sort of digital logic block from them, be it something as simple as a UART or a USB 3 controller or whatever. So that's how FPGAs work.

Now, why would you want an SoC with an FPGA on your board? You can look at it in two ways. One way: you need something special, like a CPU with a crazy number of UARTs. No one will make you such a CPU, and if you decide to go for an ASIC, it will be crazy expensive. For a small run it's more cost efficient to use an SoC+FPGA: the CPU is there, the FPGA is next to it, and you synthesize the UARTs into the FPGA. The other way to look at it: why don't you put the CPU into the FPGA?
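The logic element described above, a look-up table with an optional register, is simple enough to model in a few lines. The Python sketch below is illustrative only: `LogicElement` is a hypothetical name, and real vendor logic elements add carry chains, more inputs and other features that are not modelled here. The key idea is that a k-input LUT is just 2^k configuration bits, and the input bits select which of those bits appears at the output.

```python
from typing import List, Sequence


class LogicElement:
    """Hypothetical model of an FPGA logic element: a LUT plus an optional register."""

    def __init__(self, truth_table: Sequence[int], registered: bool = False):
        # A k-input LUT is 2**k configuration bits, one per input combination.
        self.n_inputs = (len(truth_table) - 1).bit_length()
        if len(truth_table) != 1 << self.n_inputs:
            raise ValueError("truth table length must be a power of two")
        self.truth_table: List[int] = [bit & 1 for bit in truth_table]
        self.registered = registered
        self.q = 0  # register contents; only used when registered=True

    def lut(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.n_inputs:
            raise ValueError(f"expected {self.n_inputs} inputs")
        # The input bits form an index into the configuration bits
        # (inputs[0] is the least significant bit).
        index = sum((bit & 1) << pos for pos, bit in enumerate(inputs))
        return self.truth_table[index]

    def output(self, inputs: Sequence[int]) -> int:
        # Combinatorial mode: the output follows the inputs directly.
        # Registered mode: the output is whatever the register last captured.
        return self.q if self.registered else self.lut(inputs)

    def clock(self, inputs: Sequence[int]) -> int:
        # On a clock edge the register captures the current LUT output.
        self.q = self.lut(inputs)
        return self.q


if __name__ == "__main__":
    xor = LogicElement([0, 1, 1, 0])  # combinatorial 2-input XOR
    print([xor.output([a, b]) for a in (0, 1) for b in (0, 1)])  # prints [0, 1, 1, 0]

    toggle = LogicElement([1, 0], registered=True)  # registered inverter
    # Feeding the register output back into the LUT gives a toggle flip-flop.
    print([toggle.clock([toggle.q]) for _ in range(4)])  # prints [1, 0, 1, 0]
```

The same class covers both kinds of logic mentioned in the talk: the XOR configuration is purely combinatorial, while the registered inverter with feedback is sequential. Chaining many such elements through an interconnect is what builds larger blocks.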
That's looking at it the other way around. The reason not to is that a CPU costs a tremendous amount of FPGA resources. I mentioned that the FPGA fabric runs in the hundreds of megahertz range, but if you put something complex in there, the speed at which you can clock the fabric decreases, because you need to accommodate the signal propagation delays through all the elements involved. So ultimately a soft CPU wastes a lot of your FPGA resources and is slow, which is not great either. If you just need some specialized hardware, an SoC+FPGA is a nice compromise. Are there any questions about this intro part on SoC+FPGA? Anything? No? Good.

So let's get to the second part: what's available. The entire landscape is actually covered, from super small devices all the way to the biggest ones with ARM Cortex-A53, the arm64 ones. First, I would like to go through the Cypress devices, which are super small. They cannot run Linux, but they are pretty interesting in my opinion. Originally this came from an 8051 with some analog blocks. The background story is that these devices were used in smoke detectors, every smoke detector is specific in its own way, and designing the analog circuitry over and over again was boring, right?
So what they came up with is: let's put a small CPU core in there, plus a programmable analog mesh which can be configured in different ways, with blocks that can be chained together and tweaked. That way the smoke detector vendors could buy one chip and just load the programmable part of it. That was the beginning; now they have newer, bigger parts with ARM Cortex-M, and they gained optional digital blocks, Bluetooth LE and so on. They can run an RTOS, they are pretty interesting in my opinion, and you can get a kit for about 10 bucks.

The downside is that the tool to program these is Windows only. But the tool really only generates the register block which you need to program into the PSoC; once that is written, the programmable logic is loaded and you can do whatever you want. So technically, if you export this block into your RTOS, there is basically no problem. Plus, there is a project called PSoC tools, which is working on mapping this programmable block, so an open-source tool is available as a work in progress. If you're interested in these small things, definitely check out PSoC tools; it might already be able to do what you want.

Back to the proprietary tool: it's basically schematic entry for the programmable part, to make it easy for people. This is how it looks; this example is actually a thermometer, and here are some schematic entry blocks. Once you are done, you click compile and it spits out a main.c and a bazillion other C files, so you can call convenience functions, that sort of thing. But again, you can also pull the register programming out of this tool and put it into your own RTOS.
There are BSPs available for FreeRTOS, uC/OS and, I believe, Keil as well. Just put it in there, get into the RTOS main function, and do whatever you want.

Now let's move on to another one, the Microsemi SmartFusion2. This one is a little better in that it's still Cortex-M, but it has DDR memory, so it can run Linux. The footnote is that the Linux port is an ancient vendor kernel, and still uClinux, so not super amazing. Too bad, but it's mostly targeted at running RTOS solutions anyway. You can get a kit for about 125 bucks.

I actually tried getting this going. I installed their development tool, which is called Libero, and here is how to get it installed. It's complicated, so you might want to read it after the presentation. But that's not all: once you get through all this annoying stuff to get Libero installed, you then need to search the internet for all the magic incantations and magic variables you need to export in your shell to even allow Libero to launch. If you don't have all of them exported, Libero will kind of start, but it will fail at random places. So there is a lot of hassle. Ultimately I wanted to show you how U-Boot and uClinux work, but even flashing the demo image didn't get me a serial console. Maybe I'm getting something wrong. And obviously there is no upstream support for this at all. It would be really nice if someone bought that kit and got it upstream, and I would be super happy to help you get it into U-Boot, maybe also Linux, if I can help there. So if you have any interest, that would be amazing. That's pretty much it for the small Cortex-M ones; next come the Linux ones. Are there any questions about the Cortex-M ones?
Anything? No? Okay.

Getting to the Cortex-A ones, the Altera SoCFPGA is first on my list just because it's a Cortex-A9, a slightly older core, but still ARM Cortex-A. It comes in both single-core and SMP configurations, with the usual peripherals on the SoC side: CAN, SPI, DDR DRAM and so on. Altera has the upcoming Stratix 10, which will be arm64. This stuff runs the usual stack of U-Boot plus Linux. There are RTOS offerings for it, but that's not the main target. Another interesting quirk of the Altera parts is that they can run in an AMP configuration, meaning Linux on one core and an RTOS on the other. So in case you need hard real-time control, you can run it on one core, while Linux does, I don't know, some sort of UI on the other core.

If you delve into Altera, you will definitely run into the Quartus design tool; I believe it's called Intel FPGA tools at this point. It's proprietary, but unlike the previous design tools it runs fine on Linux, for certain definitions of fine: it doesn't crash out of the box, and once installed it actually starts and does its thing. So it's not that terrible. If you're interested in something open source for Altera, there is project Typhoon, so you might want to look it up or talk to me about it.

Once you get through Quartus, you will obviously want to boot your device, so let me talk about that on Altera a little. You have the obvious options: vendor U-Boot or mainline U-Boot. On Altera I just go for mainline U-Boot, unless you have Arria 10, where support is still work in progress, or Stratix 10, where it has been submitted. All the Gen 5 stuff is supported in mainline U-Boot; FPGA loading works, everything works. If there is a bug in mainline U-Boot, it's actually a bug.
It's not a missing feature. Altera has its own U-Boot, but it's ancient and just not worth even looking at. There is another option if you're super hostile to the GPL: a bootloader called MPL. It's BSD licensed and basically loads a binary into RAM and executes it. It really sucks, because all the bugs which are fixed in the Altera vendor U-Boot, and all the other bugs fixed in mainline, are still there, so sometimes it fails to calibrate the RAM, that sort of thing. Whoa, what happened? Okay, it seems to work again. I was just getting ready to deliver the finishing blow to this bootloader and completely lost my track. So, this one sucks; just don't use it.

Okay, back to Altera. The Linux kernel support situation is similar, but Altera is doing the good thing of tracking mainline: their vendor kernel releases are close to mainline, with a couple of patches on top. Then again, you can use mainline, and not much functionality is missing there. On the SoC side it's all there; I don't think anything is missing. On the FPGA side, the configfs interface for DT overlays is definitely missing. The FPGA manager is already in mainline; I didn't update my slides properly. So I believe what's missing is just the DTO support for loading the FPGA and then binding drivers to what's in the FPGA. Actually, there will be a device tree overlay BoF today at six (is it at six?), so if you're interested in loading FPGAs with DTOs, and in DTOs in general, come to the BoF. It's definitely going to be interesting. I'll get to that in a bit, once I'm done with this part and with getting U-Boot onto these boards.

Now Xilinx. Xilinx has two offerings: the Cortex-A9 one, which is the Zynq-7000, and the new one, Zynq MP.
That's ARMv8 with Cortex-A53. On the SoC side it's again the same thing: SD/MMC, SPI NOR, DDR, CAN, the usual stuff. Actually, with Zynq MP Xilinx decided to add more interesting things like multimedia, so they put in a VPU and a GPU. The problem is that they put in an ARM Mali-400 GPU, which is ancient, and that sucks because there is no open-source driver for it, unlike, for example, the i.MX6, which has etnaviv, which is amazing. So the Mali is basically just blobs, and ancient blobs at that. GBM support is missing, so if you want to run a modern Linux 3D graphics stack, just forget it; you're basically stuck with either X11 or battling the blobs. The upside is that recently there has been new activity in the lima driver, and it is now possible to use it on Zynq MP, to the extent that it can do off-screen rendering. So if you are interested in that, definitely check out the lima driver. There is a guy from China writing a new shader compiler for the Mali-400, and it's super exciting; it really makes me happy to see that. And with a small patch it works on the Zynq, so that's great.

Apart from the GPU, the stack is the usual one: you boot Linux. RTOS ports again exist, but I don't think they're that interesting. The Zynq MP has the perk of Cortex-R5 cores, I believe, so if you need an RTOS-capable core, use those.

Otherwise, let's move on to the software support for Xilinx. Xilinx Vivado is roughly on par with the Altera tooling. It's again proprietary.
It's big, but it kind of works. There is also an open-source effort for the Zynq-7000 in the works: they are analyzing the bitstream format, still work in progress. It's done by the same person who did the IceStorm project, so if you're interested, look around the IceStorm project for the new Zynq work.

With Zynq you again have two options for the bootloader. One is U-Boot: if you have a Zynq-7000, use mainline U-Boot. It's a no-brainer; the support is there and it just works. On the Zynq MP things are a little more complicated, because it's a new chip and the upstream support is still work in progress. It's going into mainline, but your mileage may vary. If mainline has all you need for the Zynq MP, just use mainline. If not, there is the combination of FSBL plus U-Boot, which is what Xilinx recommends: their own patched U-Boot plus the FSBL, which is a preloader. It initializes the chip, loads the FPGA, loads the power management unit firmware, and then starts U-Boot. So if mainline U-Boot is missing something critical and you really cannot use it, that's what you will have to use, but this only applies to Zynq MP.

As for Linux support, it's again pretty much comparable to Altera. On the Zynq-7000, most of the IP blocks are already supported in mainline, in recent 4.x, 4.1x kernels. For the Zynq MP, support is being fed into mainline as we speak. What is again missing is the configfs support for loading DTOs; the FPGA manager is in mainline for both Zynq and Zynq MP. As for the vendor kernel, the Xilinx tree carries a stack of about 600 patches on top of 4.9; you can probably cut that down to about 200 if you throw away everything you don't need. That's the state of the Xilinx vendor kernel. Yeah, and to answer your question: there we go.
So, how to get these boards booting: a comparison of getting U-Boot working on both of these SoC+FPGAs. On Altera, you start Quartus and compile your project. It then lets you run the BSP Editor tool, which generates some header files. Then you use the qts-filter script from mainline U-Boot. qts-filter takes the files generated by the Quartus BSP Editor and makes them a little more civilized, so you can put them into the upstream U-Boot source tree, into board/<vendor>/<board>/qts. Then you add socfpga.c and a Makefile, which you can copy from another board because there is nothing board-specific in them; everything else is controlled by device tree. You add your device tree and your board config, then type make foo_defconfig and make. That generates an SFP file, which you either write into your SPI flash or put at a specific offset on an SD card. I don't remember the offset; it's in the documentation, so just check the README. Then you flick the board on and it starts, and everything works, obviously, because it's mainline. You can then use the fpga command to load the FPGA if that's your thing, but I would advise against loading the FPGA in U-Boot if you don't have to. Just use the FPGA manager in Linux; that's a better approach.

On Zynq it's quite similar. Fire up the design tool, compile your design, and click Export Hardware; you get an HDF file out of it. The HDF file is secretly a ZIP file, so you just type unzip foo.hdf and it expands into a couple of files. Depending on which Zynq you have, you get either psu_init or ps7_init files. Again, copy them into your board/whatever directory, copy the Makefile and a couple of other missing files, add a config, type make foo_defconfig and make, and you get a boot.bin file. You install that either onto the FAT partition of an SD card, which is a bit of a limitation.
I believe it comes from the boot ROM, but that's something you would have to check with Xilinx. Or you just put it at the beginning of SPI flash. Flick the board on, it boots, and again you use the standard fpga command to load the FPGA. So, does that answer your question? Right.

Yeah, right, so you mean something like a thousand SD cards? So the question is: if we have to manufacture a thousand boards, how do we put the initial U-Boot on them? Well, if you have a script which programs your SPI flash during manufacturing, you just take this SFP file and put it into the SPI flash. That's it. You said you use Arria, right? With the Altera tools there is something which lets you program the QSPI flash directly, though you need the JTAG port and the USB-Blaster for that. But if you're manufacturing, you probably want some sort of toaster-like device which you attach to the SPI flash. It's the same thing. Thanks.

Right, so I can move on from this one to the vendor FPGA loading horrors. The thing is, you want to reload the FPGA from the kernel, right? So the vendors came up with interfaces for that: let's create a /dev interface, you cat the bitstream into it and it programs the FPGA, and then let the user control the bridges between the FPGA and the SoC. The problem is: you bind drivers to the stuff that's in the FPGA, and then you accidentally reload the FPGA. What happens then? Well, it's game over. It's done.
So this doesn't work unless you have super strictly controlled users, which you don't. There is a better way to do it, and that's to use device tree overlays. And now Frank will probably bash me about that. Device tree overlays are a way to patch, at runtime, the device tree which you load into your kernel to describe the hardware. The idea is that you just describe the additional part of the hardware you are adding, compile it like a usual device tree, and load it into the kernel; the kernel recognizes the new hardware, binds drivers, and so on. How does that look? The entire process is actually super simple, and there's an example here. This uses the out-of-tree configfs loader, which is not going to be in mainline for a while, but it's one of the only reasonable options right now. In the demo, you create a my-dto directory under the configfs device-tree/overlays directory, compile your overlay, and just cat it into the dtbo file. The kernel has hooks for that: it loads the DTBO, patches its own device tree, and the new devices pop up. That's the gist of it. If you want to unload the DTO, you can do that too: just remove the overlay directory. I don't know if that can actually be seen, but you just remove that directory and that's it. It's that simple. Cool.

Device tree overlay source looks a lot like a regular device tree; it just has a /plugin/ annotation at the beginning. Then you describe fragments, which say, for example: patch this Ethernet node here; what I want to add to it is that my PHY mode is RGMII, and I want to enable the Ethernet. The other case is that I want to add an EEPROM under an I²C switch.
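The configfs steps from the demo are easy to script. The Python sketch below assumes the out-of-tree configfs overlay patch is applied and configfs is mounted at /sys/kernel/config, so that each directory created under `device-tree/overlays` gets a `dtbo` attribute file. That path, the attribute name and the helper names `load_overlay` and `unload_overlay` are assumptions to check against the patch you actually run; the sketch just mirrors the mkdir, cat and remove steps shown in the talk.

```python
import os
from pathlib import Path

# Assumed mount point and layout of the out-of-tree configfs overlay interface.
OVERLAYS_DIR = Path("/sys/kernel/config/device-tree/overlays")


def load_overlay(name, dtbo_file, overlays_dir=OVERLAYS_DIR):
    """Shell equivalent: mkdir <overlays>/<name>; cat <dtbo_file> > <overlays>/<name>/dtbo"""
    overlay = Path(overlays_dir) / name
    # On configfs, mkdir creates the overlay item together with its attribute files.
    overlay.mkdir()
    # Write the compiled overlay blob into the dtbo attribute (what cat does in the demo).
    (overlay / "dtbo").write_bytes(Path(dtbo_file).read_bytes())
    return overlay


def unload_overlay(name, overlays_dir=OVERLAYS_DIR):
    """Shell equivalent: rmdir <overlays>/<name>"""
    # configfs items are removed with rmdir; the kernel then reverts the overlay.
    os.rmdir(Path(overlays_dir) / name)
```

On a board with the patch applied, this would be run as root, for example `load_overlay("my-dto", "foo.dtbo")` and later `unload_overlay("my-dto")`. The `overlays_dir` parameter exists only so the helpers can be exercised against an ordinary directory.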
That's what a DTO looks like. Now let's mix it with the FPGA manager. The FPGA manager is a new framework in the Linux kernel which allows you to load bitstreams into the FPGA and to toggle the bridges correctly. If you combine it with DTOs, you can actually say: I have this bitstream, it creates devices under these bridges, the bridges need to be enabled, and the devices are mapped like this and that. That's what I'm going to show you now. We have FPGA manager support for all the mainstream FPGA devices: all of Altera, all of Xilinx, and Lattice iCE40 as well, over the SPI interface. And yes, it supports partial reconfiguration, though I haven't seen that used yet.

So how does it work? Again, you describe what's in the FPGA in a DTO, compile the DTO, and load it into the kernel. The first thing that happens is that the FPGA manager is triggered and loads the FPGA with the matching bitstream. The bitstream is fetched through the kernel firmware interface, so it has to be somewhere in /lib/firmware: something.rbf for Altera, or something.bit for Xilinx. Next, the bridges between the SoC and the FPGA are enabled, and only after that can the drivers be bound. There's a quirk when you remove the DTO: the drivers are unbound first, then the bridges are shut off, but the FPGA itself is not turned off. The reason is that there can be something in the FPGA which you didn't describe in the device tree overlay and which is super critical to the system, and you can't just downclock or unprogram the FPGA, because that could, I don't know, kill the system or kill somebody. That's why the FPGA remains programmed and running even after you unload the DTO. So what does an FPGA manager DTO look like?
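The apply and remove ordering described above can be captured in a small, self-contained Python model. Everything in it is hypothetical: `FpgaRegionModel` and its methods are illustrative names, not kernel APIs, and the model only logs the steps. The one real-world detail it borrows is the default search order of the Linux firmware loader (`/lib/firmware/updates/<kernel release>`, `/lib/firmware/updates`, `/lib/firmware/<kernel release>`, then `/lib/firmware`), which is where a bitstream such as something.rbf has to live; the optional `firmware_class.path` override, which the kernel tries first, is ignored here.

```python
import os
from pathlib import Path


def firmware_search_paths(release=None):
    # Default firmware loader directories, in order (firmware_class.path ignored).
    release = release or os.uname().release
    return [
        f"/lib/firmware/updates/{release}",
        "/lib/firmware/updates",
        f"/lib/firmware/{release}",
        "/lib/firmware",
    ]


def find_firmware(name, root="/", release=None):
    # root lets the lookup run against a fake filesystem tree.
    for directory in firmware_search_paths(release):
        candidate = Path(root) / directory.lstrip("/") / name
        if candidate.is_file():
            return candidate
    return None


class FpgaRegionModel:
    """Toy model of the overlay apply/remove ordering; not kernel code."""

    def __init__(self, root="/", release=None):
        self.root, self.release = root, release
        self.log = []
        self.fpga_programmed = False
        self.bridges_enabled = False
        self.drivers_bound = False

    def apply_overlay(self, firmware_name):
        # 1. The FPGA manager fetches the bitstream via the firmware interface
        #    and programs the FPGA; if the file is missing, nothing else happens.
        path = find_firmware(firmware_name, self.root, self.release)
        if path is None:
            raise FileNotFoundError(firmware_name)
        self.log.append(f"program {path.name}")
        self.fpga_programmed = True
        # 2. Only then are the SoC<->FPGA bridges enabled.
        self.log.append("enable bridges")
        self.bridges_enabled = True
        # 3. Finally, drivers bind to the devices the overlay describes.
        self.log.append("bind drivers")
        self.drivers_bound = True

    def remove_overlay(self):
        # Drivers are unbound first, then the bridges are disabled...
        self.log.append("unbind drivers")
        self.drivers_bound = False
        self.log.append("disable bridges")
        self.bridges_enabled = False
        # ...but the FPGA is deliberately left programmed and running,
        # because it may contain logic the overlay never described.
```

Note that `remove_overlay` leaves `fpga_programmed` set on purpose, matching the quirk explained above: unloading the overlay does not unprogram the FPGA.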
Very similar to a regular DTO. In this case I am patching the bridge and creating one FPGA area. That's the partial reconfiguration part; here I have only one area, so the entire FPGA is populated by a single bitstream, in this case output_file.rbf. In this example I'm adding one single UART which sits under this bridge, there we go, this bridge. Once I load this device tree overlay, the new UART just pops up as /dev/tty-something. That's the example.

Now I should have some sort of conclusion, but I just couldn't come up with what to put on this slide. Well: the mainline support for all these SoC+FPGAs is amazing, so obviously you should use it, and DTO support is coming. So thank you for your attention. Do you have any questions?

Thank you. Okay, yeah, go ahead. So the question was how the loading sequence works: U-Boot starts running, and then what happens, right? I'll just use Altera as an example. Basically, yes, U-Boot comes up. Actually, before U-Boot, the CPU comes up, right? It has to start reading from something like address zero, and in modern CPUs that's a boot ROM: a piece of code baked into the CPU which cannot be replaced, and the CPU starts executing from it. No, that's actually the hard CPU; yes, it's in the ASIC part. So the CPU starts executing from its internal boot ROM. The boot ROM checks the strapping of the CPU, which selects the boot media, let's say SPI flash in this case. It loads a piece of the SPI flash into its own internal memory, like an SRAM, and executes it from there. That's usually the U-Boot SPL. The SPL initializes the DRAM, basic pin muxing, clocking, that sort of stuff, and then loads the actual U-Boot from the boot media. That can be the same boot media.
It can also be a different one; that depends on how you configure the SPL. The SPL is something you can already replace; it's usually around 64k, again depending on the chip. Once the full U-Boot is running, it can load whatever you want from whatever boot media, say the same one: the Linux kernel, the device tree, the FPGA bitstream. Then it starts the Linux kernel, and from there things work as usual, nothing surprising. The exception is that sometimes you need to load the FPGA. You can do that in U-Boot or in Linux; doing it in Linux is preferred, because then you have more control over the entire system, so it's much more flexible.

That's actually a good question. The question was whether Linux and the FPGA communicate through a regular CPU bus. I should have mentioned that, thank you. There are AXI bridges on the Altera side, and on the Xilinx side I believe there are also AXI bridges, right? Yeah, so it's a standard bus. Right, and the comment was that you can also have other bridges in the FPGA, like between AXI and SPI, that sort of thing. Yes, you can synthesize anything into the FPGA. Thanks. Any other questions?

Yeah, there in the back. The question is about high availability: is it possible to reboot the SoC without reloading the FPGA? With device tree overlays, to my knowledge, it actually is not possible. But what you can do is disable the DTOs and say: I load my FPGA in U-Boot and keep it loaded, and that's it. Just make Linux not touch the FPGA at all and assume the hardware is there. And when you reboot the system, just check in U-Boot whether the FPGA is loaded.
If it isn't, load it; if it is, just use it. You will have to check what state the bridges are in: if you want to access the content of the FPGA, you have to make sure the bridges are enabled. You can do that once you have loaded the FPGA and know its content is valid; then you enable the bridges and just use the stuff in the FPGA. Thanks for the question.

Does it always? So there was a comment from someone in the audience that the Zynq actually nukes the content of the FPGA on reboot. We can discuss that after the talk if you want; that's an interesting question, thanks. Any more questions? I don't know, it seems to have died down. Yeah, so there was a comment from a friend of mine: it's now called FPGA region, not FPGA area, because the name changed. Thanks. Any more questions? Yeah, go ahead. There was a question about x86 and device trees, and using DTOs with FPGAs. On x86, the FPGA usually sits on PCI Express, and you can definitely use device tree for that, right? You can describe a PCI Express device in a device tree, no problem; it's that flexible. I didn't try it myself, though. Did you hear that back there? Otherwise, just come over and we can discuss it, because there is a guy who's digging into that stuff, and it'd be great to discuss it together. So if there are no more questions, thank you for your attention.