to that way too much because this would take another talk which would span multiple hours. There is a lot of research going on in the metastability field, but in case you are entering the FPGA technology, you don't have to worry about that. The FPGA tools will take care of not doing anything very funny, at least at the beginning. Alright, so another property of FPGAs is implicit parallelism. So I was saying that FPGA means field-programmable gate array, so an array of gates. The FPGA is a highly parallel chip. All of the logic elements are running in parallel, right? And in case you use the HDLs, I'll get to that soon, they look considerably similar to programming languages, but they don't behave that way. And this is where a lot of people who are entering the FPGA field kind of start pulling their hair out, it's getting really crazy. So in case you implemented this first snippet, this first example, in C for example, you would have two variables, right? And if your compiler was not optimizing at all, if it was just spitting out assembly, then what would it do? It would check the first variable, and if it's 1, then it would do the assignment, right? And then it would probably check the variable again and do the other assignment. If you observed what's going on there and tried it for both values of the variable, you would see that this bar variable is assigned at a different time, in case it really wasn't optimized at all. Now, in case of the FPGA, this entire statement, both of the lines, are executed in parallel, so the FPGA is testing both 1 and 0 in parallel and then based on that doing the assignment, because this is synthesized as an actual NOT gate in the logic element. The other example is even better and even more confusing. Now, both of the assignments happen in parallel because they are synthesized into two different logic elements, and in case you swap these two lines, it would have exactly no impact on the design.
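This order-independence can be modeled in plain Python. The sketch below is my own illustration, not the slide code: every right-hand side is evaluated against a snapshot of the previous state, and only then are the results committed together, which is how clocked (nonblocking) assignments behave.

```python
# Model of clocked (nonblocking) assignment semantics: all right-hand
# sides are computed from the state at the clock edge, then committed
# together, so the textual order of the statements does not matter.

def clock_tick(state, updates):
    """Apply all updates against a snapshot of the old state."""
    old = dict(state)  # snapshot: every RHS sees pre-edge values
    return {name: rhs(old) for name, rhs in updates.items()}

# Two "parallel" assignments; swap them and the result is identical,
# because both read the old values of a and b.
updates = {
    "a": lambda s: s["b"],  # a <= b;
    "b": lambda s: s["a"],  # b <= a;
}

state = {"a": 0, "b": 1}
state = clock_tick(state, updates)
print(state)  # {'a': 1, 'b': 0} -- the registers swap in one tick
```

A sequential C-style interpreter would instead read the freshly written value in the second statement, which is exactly the trap the slide is about.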
It would be synthesized into exactly the same thing because this just runs in parallel. Do you have any questions about this crazy stuff? Yes? (Audience question, partly inaudible: should it be 0 or 1, and what does the apostrophe mean?) Oh, you mean the apostrophe, this one. So the question is what the apostrophe means. So this is kind of like Verilog, but not really Verilog, which is one of the HDLs. I didn't want to get into the details that much here. So the 1 before the apostrophe says it's a vector which is exactly one bit, or one signal, wide. The apostrophe is just a delimiter and the b means the value is binary. And in case we were actually speaking about Verilog, then this is a blocking assignment here and this would be a register. So this would be a signal most likely and this would be a register which is then assigned a value. Any other questions? Because this is really important, the implicit parallelism of FPGAs is really important, you should keep that in mind. No? Okay, so moving on, let's talk about FPGA tools. So FPGA tools are kind of in a sad state. The FPGA vendors are doing a lot of work in that area and they are trying to keep all the algorithms they use closed source, so the tools are unfortunately closed source. Each vendor has their own specific tools. Also the bitstream format, which is the programming file for the FPGA, is unfortunately closed. Now this is not entirely true, there are projects which are trying to open source this kind of stuff by means of reverse engineering, but the FPGA vendors are not very happy about open sourcing their own algorithms, for obvious reasons, because they are spending billions on actually developing them and they don't want other people to use them. Now the standard vendor tools are actually available free of charge and they are not that limited, so you can download them.
It is really a big bulk of software, and in case you are entering the FPGA field, again with some sort of smaller FPGA, then the standard free tools are not that big of a problem. The free tools are actually missing stuff like incremental compilation, but that matters only for FPGAs which are really big. In case you use a small design and a small FPGA, it takes minutes to compile, so that's not a problem. You have all the verification and simulation tools there for the small FPGAs, that's also not a problem. Let's see what else is missing... yeah, that should be about it. You can implement your data cruncher and try it out, see how it works for you with the free tools, pretty much not a problem, and in case you are getting a small FPGA, we are talking like 100 bucks ballpark, so this is kind of fine. Once you install the tools, let's talk about the flow a little. So you generate your design, you pretty much write it, some sort of hardware design, and you put it into the vendor tools. So what happens then, when you want to generate the programming file? First of all you run analysis and synthesis. At that point the vendor tools parse the textual representation of your design, generate some sort of netlist from that, and then they check connectivity, they check whether your netlist is actually a valid circuit or not, and spit out pretty much some sort of high-level netlist. It contains gates and all, but it's not possible to put this netlist onto an actual physical FPGA; it's kind of something you would put on a virtual circuit. Now after that you do place and route, and this is the first stage which knows which FPGA you are synthesizing for and knows the actual physical properties of your FPGA. So it takes this high-level netlist and generates a netlist which is compatible with your FPGA. It can do register merges, it can throw away some pieces of the circuit because they don't make sense on that FPGA, and so on. This is also the most gruesome part for the CPU.
It really stresses the CPU because it's, I mean, an NP-complete problem, so it has to do all sorts of heuristics at that point and it takes a long time. Now the result of this is actually a netlist which can be mapped onto your FPGA, but it's not a programming file yet. For that you need an assembler, which reads the netlist again and finally generates the binary file which you can program into your FPGA, so to say. There is a bit more magic involved, but let's not get into that. The netlist which comes out of place and route can also be used for timing analysis, so you go through that. Timing analysis tells you which signal paths are the longest in the FPGA, and from that you can infer how fast your design can go. So timing analysis is something you want to do with your design, and in case timing analysis tells you, well, your design cannot run at 100 MHz, it can run at 50, then you should run at 50. If you don't, if you try to stress your design over what the timing analysis tells you is the maximum speed of your design, then your design will likely not work, or it will misbehave, or it will just give you wrong results. Now speaking of the flow here, the analysis and synthesis can be done with open source tools like Icarus Verilog, GHDL and so on. The place and route can also be done using open source tools, surprisingly. There's a lot of academic research going on in that. One of the tools is Versatile Place and Route, made at the University of Toronto I believe, so VPR, this is what you want to check out. And the only really hard closed-source part is the assembler, but like I mentioned before, there are some projects which are trying to change that. Like for example Project IceStorm; this is for the Lattice FPGAs, but unfortunately only for the small ones. Okay, so how do you implement your actual hardware model on the FPGA? This is it.
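One more note on the timing analysis mentioned above: the maximum clock frequency is simply the reciprocal of the longest register-to-register path delay it reports. The numbers below are made up purely for illustration.

```python
# Timing analysis reports the longest register-to-register path delay;
# the maximum clock frequency is its reciprocal.
# The 20 ns figure is a made-up example, not from any real report.
longest_path_ns = 20.0
f_max_mhz = 1000.0 / longest_path_ns  # 1 / 20 ns = 50 MHz
print(f_max_mhz)  # 50.0 -> clock the design at 50 MHz or below
```

Clocking above that number means a signal may not have propagated through the longest path before the next edge latches it, which is why the design then misbehaves.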
To implement a hardware model on the FPGA you use an HDL, a hardware description language; it's a textual representation of your hardware model. There are two mostly used HDLs, Verilog and VHDL. I'll show you examples of both. The vendor tools allow mixing these two, so it's not a problem to have one file in Verilog and the other in VHDL. Of course it might be a little problematic for the hardware guys, but it's not a problem mixing these two. Now at this point it makes sense to point out that there's a project which is called OpenCores. If you are searching for readily available hardware models under a sensible license, then check it out. They have everything ranging from simple communication controllers like serial port stuff, all the way to PCI Express, USB 3, this sort of stuff. They also have cryptographic implementations of pretty much everything you can think of; it's likely at OpenCores. They have CPU implementations, pretty much everything is there. Another thing is the CERN Open Hardware Repository. This is mostly tools which make your life with FPGAs a little easier. When you are working with the OpenCores cores, you will find some nice tools at the CERN Open Hardware Repository to deal with that. I want to stress one thing: when you are modeling hardware for an FPGA, you cannot think about it like writing computer code. This is a very nice slide, I really like it. In case you are working with a CPU, you have this universal machine, universal hardware, which can execute instructions and you are working on top of that. You are writing those instructions, expressing your algorithm using this universal machine. In case you are modeling FPGA content to implement some sort of algorithm, you are not working at this high level. You are working much lower. You are implementing kind of a CPU, so to say, which can only run one algorithm. You can throw away all this unnecessary stuff which you have in a general purpose CPU.
You are just implementing a really simple kind of design which runs one algorithm really well. That way you are throwing away complexity. You don't have to implement that many gates in your design and your hardware becomes simple. It can become faster, it can become much more efficient in terms of energy consumption, but it won't run any other algorithms, which I believe is fine if you just want to run one thing. I would like to show you CRC5, which is awesome when you implement it in hardware. It's designed to actually be implemented in hardware. In case you are doing Ethernet, you are doing CRC32 I believe, and it's just very similar. This is a schematic of a CRC5 unit with the polynomial x^5 + x^2 + 1. It's actually really simple. You have a clock input here which synchronizes the entire circuitry. Then you have a one-bit data input here, and you are getting the five CRC bits on each tick of the clock at the output signals here. I believe most of you actually saw this in the textbooks at the university or wherever; this is the way you implement CRC. In case you are wondering about the polynomial, the x^0 and x^2 terms actually denote these XOR gates over here. If you want to implement it in an FPGA, the snippet of Verilog code which I will show you now is what implements that circuit. I picked it from ASIC World, which is a great website in case you want to learn Verilog. There are lots of examples, lots of beginner kind of things which you can read about Verilog. It's a great website. Just like I had in the schematic, I am implementing a Verilog module, which is like the building block in the HDL. I have a clock input, I have a one-bit data input and I have a 5-bit CRC output. I also have a reset; it's a really good idea to have a reset in your module. Now the main meat of it looks like that in Verilog. On a positive edge of the clock, I am checking the reset. So my reset is actually synchronized to the clock.
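As a cross-check of what that schematic computes, here is a plain-Python software model of the same serial CRC5 register, one data bit per clock tick. This is my own sketch, not the ASIC World code; the XOR taps correspond to the x^2 and x^0 terms of the polynomial.

```python
# Software model of the serial CRC5 LFSR, polynomial x^5 + x^2 + 1:
# one data bit is shifted in per "clock tick", like the circuit.

def crc5_tick(crc, data_bit):
    """One clock tick: shift in one bit, apply XOR taps at x^2 and x^0."""
    feedback = data_bit ^ ((crc >> 4) & 1)   # input XOR current MSB
    crc = (crc << 1) & 0x1F                  # shift the 5-bit register
    if feedback:
        crc ^= 0b00101                       # taps for x^2 and x^0
    return crc

def crc5(bits, init=0x00):
    """Run the register over a bit sequence; init models the reset value."""
    crc = init
    for b in bits:
        crc = crc5_tick(crc, b)
    return crc
```

With a reset value of all ones instead of zeros, the same update rule matches the variant that resets to ones, which comes up again in the VHDL example.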
In case I have a reset, then I just set the CRC to 0. In case I don't have a reset, then I am doing the actual CRC computation. The check for a reset and the execution of the subsequent computation again happen in parallel. So let's see, about the CRC computation, the trick here is that this stuff on the right side is actually done using combinatorial logic. So pretty much everything is pre-computed, and when the clock edge comes and the reset is not asserted, then it's just this latching which happens at that point. So no real computation happens here in the traditional sense of doing the data crunching. Also, if you look at this, all of this stuff happens in parallel; it would not make any sense if this was implemented in C, right? In case you have some questions about the Verilog, some more detailed questions, I will be around at the conference, so please just catch me and we can talk about that. VHDL syntax is based on the Ada language, and other than that, essentially the implementation is the same thing. VHDL has a little better typing system: Verilog just gives you arrays of bits or arrays of signals, VHDL gives you a little better control over that. But other than that I have an entity which has, again, a data input, reset and clock input and then the CRC output, and now I'm doing the CRC computation here, in case I have a reset synchronous to the clock, oh yeah, right. Then I am resetting it to ones, unlike in the previous design, sorry about that, and in case I don't have a reset, I'm just clocking out the CRC. Now, HDLs are not really friendly, right? It looks weird, and it's kind of like doing assembly, like writing a kernel in assembly, no? That's why there are a lot of attempts to actually fix that and a lot of research going into that area, like some sort of higher-level synthesis languages. Now, Chisel is a very interesting option there.
It's based on Scala, which is a functional extension to Java, but it's not that they are using it to actually compile anything; they are just using it to model the hardware, and they are using the functional and object-oriented properties of it. So essentially you get inheritance there immediately. You don't have that in the HDLs. So you can just model some sort of hardware, inherit something from that, and just do some minor changes or tweaks to the hardware, just like you are used to when you are using object-oriented programming languages. Now, you also get the hierarchical structure, you get the functional paradigm, where you can just create buses with some sort of crazy properties and just stack it together like you are used to. It's still closer to hardware than to software development, so it's still kind of an HDL, but with a little more powerful modeling. What's interesting is that the RISC-V processor is implemented in that. RISC-V is backed by Intel, Google and a lot of other companies. So this, I believe, has potential. In case you are interested in checking this out, if you are bored by these HDLs, then the website is here, and there is a nice tutorial for it, which is really cool. So I would suggest you skim through the tutorial. It's like 15 pages and it's really well written. Now, very similar to that is MyHDL. This is Python to hardware. Again, the Python is just used as a modeling language, it's not that Python runs on the FPGA, but it supports simulation and verification, so you can essentially simulate your design using Python at the native speed of your processor. The results can be exported to HDLs and then compiled into your FPGA design. So I have to speed it up a little. I was still talking about HDLs up to this point. Now let's talk about some standard programming languages to hardware. It starts with C2H, C to hardware. The idea was that you have small software running on a soft core in the FPGA and you are offloading some functions into actual hardware.
So the downside is that you were running bare-metal code on the soft cores. That's unfortunate. And you actually placed some marshalling code into your software, which then called the stuff which was in the FPGA. This was great for iterative functions and it did speed things up, but unfortunately the marshalling was not great and you had to have the core in the FPGA. Similar to that is ROCCC, which is a research project. It's C to VHDL, and it's not locked to any particular FPGA vendor; C2H was locked to Altera and their Nios II soft core. Okay, so one last thing about this is Altera OpenCL. This is pretty interesting in fact, because it is possible to take OpenCL kernels and compile them into an FPGA bitstream. Now the compilation takes a lot of time, so it's not really possible to compile the kernels in real time. That's why Altera uses the OpenCL Embedded Profile, which allows you to do the compilation of the bitstream offline and then just deliver it to your machine. It's closer to software, because you are just implementing OpenCL kernels rather than HDL. Altera also provides a kernel module which is used to communicate between the host PC and the FPGA. This is also used to tunnel the data between one and the other. And the unfortunate part is that you actually need a license for the compiler; even though it's based on LLVM, you still need a commercial license for the compiler, so that's unfortunate. There are a lot of examples of that. Altera also has some extensions to the OpenCL, in fact. There is one extension which bypasses the sort of GPU thing where, in case you're communicating between two threads of execution on the GPU, you have to use a cache in there. In case of the FPGA, you can just build a direct pipeline between these two threads of execution and just pipe data back and forth. You can also use it as a synchronization primitive.
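The pipeline idea can be modeled in software with two workers connected by a bounded blocking queue. This is a conceptual sketch in Python, not the Altera OpenCL channel API: data flows directly from one stage to the next, with no shared-memory round trip in between.

```python
# Conceptual model of an FPGA pipeline between two "kernels": a producer
# stage feeds a consumer stage through a direct channel (here a Queue).
import queue
import threading

channel = queue.Queue(maxsize=4)   # bounded, like a hardware FIFO

def producer(data):
    for x in data:
        channel.put(x * 2)         # stage 1: some transformation
    channel.put(None)              # end-of-stream marker

def consumer(results):
    while True:
        x = channel.get()
        if x is None:
            break
        results.append(x + 1)      # stage 2: further transformation

results = []
t1 = threading.Thread(target=producer, args=([1, 2, 3],))
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [3, 5, 7] -- each item flowed straight through both stages
```

The bounded queue also gives you the synchronization for free: a full channel stalls the producer, an empty one stalls the consumer, just like back-pressure in a hardware FIFO.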
The OpenCL stuff, I believe, is really the closest to the software side compared to the rest of hardware design when you're doing the FPGA stuff. Now, talking about the Linux interface, this is the last chapter. This is really, really nasty. This is like the wild west in case of the FPGAs, because when you implement something in the FPGA, it's completely up to you what you do there. There is really no standard interface with the Linux kernel, unfortunately. There are attempts. In the embedded world, there is the device tree overlay stuff, so you can have a device tree overlay for your FPGA design which describes what is in the FPGA. There's also an attempt at CERN to define something which is SDB. It's kind of like ACPI for FPGAs: again a small piece of memory which describes what is in the FPGA. But the problem is that you usually have some sort of control registers there and they can be completely ad hoc. Whatever you just implement there, it will be there. And you can also have completely custom DMA devices there in the FPGA, which can poke your CPU's address space, no problem. And the kernel doesn't have drivers for that. You would have to implement them and so on. This is where it becomes nasty. So there are two ways to deal with that. One is to implement your own custom kernel module per FPGA design. Now, this has mostly downsides. You need to make sure that you are running the right kernel module against the right FPGA content. Sure, you can figure out which FPGA content version you have based on some register, you can work around that, but it's still kind of a custom solution. You need custom user-space IO, unless you really put a lot of effort into doing it right. In case the driver is badly written, it can crash your machine, which you don't want. In case you upgrade the kernel version, then you still have to lug this driver around and update it for the newer kernel version and so on. There's a lot of maintenance overhead. That's not really cool.
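Either way, the register access itself usually boils down to mapping the FPGA's register window and doing word-sized loads and stores. A runnable model of that access pattern is below; the register map is entirely hypothetical, and on real hardware the bytearray would instead be an mmap of a UIO device or a PCI BAR.

```python
# Model of poking ad hoc FPGA control registers. On real hardware,
# `regs` would come from mmap-ing e.g. /dev/uio0; here a bytearray
# stands in so the access pattern is runnable anywhere.
import struct

regs = bytearray(4096)             # stand-in for the mapped register window

# Hypothetical register map for some FPGA design -- entirely made up.
REG_CTRL   = 0x00
REG_STATUS = 0x04

def write32(offset, value):
    struct.pack_into("<I", regs, offset, value)   # 32-bit little-endian store

def read32(offset):
    return struct.unpack_from("<I", regs, offset)[0]

write32(REG_CTRL, 0x1)             # e.g. set an "enable" bit
print(hex(read32(REG_CTRL)))       # 0x1
```

The point is that nothing outside your own design documents these offsets, which is exactly why a generic kernel interface is so hard.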
Then the other option is the userland approach, where you basically export the FPGA register space into userland and let the user do all the register poking and handle all of that in userland. But then you have problems with DMA, because exporting DMA-able memory into userland is a little difficult. You can use CMA for that, it's possible, but then you are basically looking at user-controlled DMA somewhere, which can be used for malicious purposes. It's also not great. Okay, so let me just do a quick summary here. The FPGA is great for highly parallelized pipelined tasks: basically you put data into the FPGA, just pipe them through, do some transformations, and put them somewhere else. What the FPGA is really bad for is general computing tasks with a lot of branching; this is not where FPGAs shine. Also a good point of FPGAs is that they're manufactured with the real state-of-the-art technology. This is great, so you save a lot on power consumption. Right. The downside is there is no unified kernel interface. Doing the FPGA development can be difficult, but you can kind of work around it by using the OpenCL, or maybe stuff like MyHDL or Chisel, this might help. And the FPGA ecosystem is still rather closed, so there is no really open source implementation of the FPGA assembler, which is unfortunate. And I'll cut it short here. Thank you for your attention. And questions, please. No questions? I'd like to just remind you that for questions there's a chance to be rewarded, so you can try a little harder. Thank you. Okay, so that one, and after that one, and I believe there was one in the back, right? So that one will be third. So go ahead, please. (Question: So what's the cheapest way to start playing with it? Is there some simulator or something?) Okay, so the cheapest way to enter into the FPGA land, right? Okay, so there are simulators. They are actually part of the design software.
You actually get a software simulator for the FPGA design. You can do that, you can analyze what's going on in there, actually trigger on all sorts of signals, that's possible. And then there are really cheap FPGA kits on the market recently. You can get a pure FPGA without any RAM for like 20 bucks. It's really simple. Then you need some sort of debug probe, but that's like, I don't know, again 10 bucks or something. You can also get nice kits from a certain Taiwanese corporation, and they're like 100 bucks. They also have a small ARM core, they have about a gigabyte of memory, they have gigabit Ethernet, so you can already use that for a data cruncher. I used very similar kits for doing, let's see, some clustering application, and even that sort of stuff was usable there. So 100 bucks is a nice kind of ballpark for entering the FPGA technology. Thank you. Can we give them a scarf? No? It's up to you. Yeah, that was a really good question, and it was definitely not covered in the talk, so you know what, come for the scarf maybe after the talk. I have one ready for you there. Marcin. (Question: I just wanted to ask which boards you can buy for cheap to start.) All right, so which boards you can buy for cheap for starters. For the Altera FPGAs, there is the Terasic SoCKit, that's like 200 bucks or something. There is also the Terasic DE0-Nano-SoC board, that's like 100-something bucks, and it also comes with the debug probe, so you don't need anything else but a USB cable to actually get into this field. There is something with the Xilinx Zynq chip, but I don't know the name of the board, sorry about that. Also, there is the Lattice iCEstick, which is the reference development kit for the IceStorm stuff, and there is also an entire open source toolchain for that, the Project IceStorm. So this is notable, and it costs like 40 bucks. Thank you. The third one, please. You mean when you program the FPGA?
You mean what happens electrically when you program an FPGA? Okay, so yeah, I was actually planning to do an entire talk about that at another conference. So electrically what happens: the FPGA has an SRAM in it, a big SRAM, and the memory cells in the SRAM connect to the transistors in the FPGA. Based on what you put into the SRAM, you actually configure the transistors, which then implement the lookup tables and the registers. This is what electrically happens. Well, that's not necessarily always true. The common FPGAs are volatile, they have SRAM in them, but there are FPGA families, like the Altera MAX 10, which is something between a CPLD and an FPGA, and it has permanent flash storage instead of the SRAM. So those are nonvolatile. But most of the FPGAs are, yeah, volatile. It's not a problem; you can actually have an external memory from which the FPGA programs itself at startup. Good, yeah. Thank you.