The next talk is being held by Matthias Lasse. Matthias is talking about how to reverse engineer FPGAs. He did it by himself, and he will tell you how he did it and, in particular, how he reverse engineered the Xilinx 7 series and the Lattice iCE40 series. He knows much more about this than me, so please give him a warm round of applause, and here's Matthias.
Hi, hello, can you hear me? Okay, what is this talk about? In this talk I'm going to explain to you what an FPGA is. How does it work? What does it do? What does FPGA stand for? And of course I will tell you stories about how I reverse engineered them, show you some pictures and so on. What is this talk not about? This talk is not about how to use FPGAs. I actually cannot use them. I never learned Verilog or VHDL. And this talk is also not about high-level synthesis.
Maybe a quick story about why I decided to reverse engineer the iCE40 series four years ago. I wanted to build a small CPU, and I had the problem that designing and building chips is far too expensive. So the next simpler solution would be an FPGA. But I did not want to learn Verilog or VHDL. So I decided to sit down and document the bitstream format and the internal layout.
Anyhow, FPGA stands for Field Programmable Gate Array. What does field programmable mean? It means that the device is programmable in place. So to say, in a live circuit, we can just reconfigure the device. The gate part is, yeah, the FPGA simulates or implements logic gates, and the array, yeah, it's a two-dimensional array of programmable logic gates.
Yes, but what is logic? Before I tell you how an FPGA works and so forth, we have to get down to logic 101. We have four operators: the NOT gate, the AND gate, the OR gate and the exclusive OR gate. At the bottom, you can see the truth tables. When we have, for example, 00 at the input of the OR gate, we get zero on the output. When we have 01, we get one, and so forth.
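Since the slide with the truth tables is not part of this transcript, here is a minimal Python sketch that regenerates them. The function and dictionary names are made up for illustration; only the four operators themselves come from the talk.

```python
from itertools import product

# The two-input operators from the slide, as functions on single bits (0 or 1).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def not_gate(a):
    """Invert a single bit."""
    return a ^ 1


def truth_table(gate):
    """Return the outputs of a two-input gate for the inputs 00, 01, 10, 11."""
    return [gate(a, b) for a, b in product((0, 1), repeat=2)]


if __name__ == "__main__":
    print("NOT", [not_gate(a) for a in (0, 1)])
    for name, gate in GATES.items():
        print(name, truth_table(gate))
```

For the OR gate this gives 0 for the input 00 and 1 for 01, matching the example in the talk.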
Next thing, we can chain logic gates together into a circuit. Here is a small example of a one-bit full adder, which is used to implement addition. And as you can see, it uses two exclusive OR gates, two AND gates. Ah, this one uses three exclusive OR gates. Okay, there are several implementations of them. On the previous slide, we could see that we can generate a truth table from the input states and what we get on the output. With that, we can also combine several logic gates into one table. I did the work for the full adder here, and now we have our three inputs and two outputs. If we, for example, get a one on A, a zero on B, and we have the carry set to one, we know the result will be zero and the carry out will be one. The nice part about this is that we don't have to trace through the logic; we have effectively implemented several logic gates in one lookup table. And the lookup table is the smallest part of an FPGA. This thing implements the logic gates.
However, we need more than one lookup table, of course. So let's zoom out a bit. What you can see here is a slice in the Xilinx 7 series FPGA. These are four lookup tables with six inputs each. They are followed by a special carry unit, because implementing addition with lookup tables would take up too many resources, and because we need it quite often, it is far cheaper for the manufacturers to include a carry chain. And then on the outputs, we have flip-flops, because sometimes we need to synchronize the state or we need to store one bit of information. We then pack together one lookup table, one part of the carry chain and, in the case of the 7 series, two flip-flops into what's called a logic cell. Four logic cells make up one slice, and two slices are then grouped together, combined with a switchbox and interconnect, into a tile. As you can see, we have surrounding tiles, and that's how we implement logic and that's how we wire the logic. Now we zoom out a bit more.
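The step from a gate circuit to a lookup table, as described for the full adder, can be sketched in a few lines of Python. This is an illustration only: the gate-level adder below is one of the several possible implementations, and all names are hypothetical.

```python
from itertools import product


def full_adder_gates(a, b, carry_in):
    """One-bit full adder built from XOR, AND and OR gates."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


# Evaluate the circuit once for all 8 input combinations and store the results.
# After this, the gates are no longer needed: the table *is* the logic.
FULL_ADDER_LUT = {
    inputs: full_adder_gates(*inputs) for inputs in product((0, 1), repeat=3)
}


def full_adder_lut(a, b, carry_in):
    """Look up (sum, carry_out) without tracing through any gates."""
    return FULL_ADDER_LUT[(a, b, carry_in)]


if __name__ == "__main__":
    for inputs, outputs in FULL_ADDER_LUT.items():
        print(inputs, "->", outputs)
```

The same idea scales with the number of inputs: a lookup table with six inputs needs 2^6 = 64 stored bits per output.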
Here we can see that several of those tiles are grouped together into columns, and we zoom out a bit more. One column in the 7 series contains 50 tiles and one clock tile in the middle. This is a rather small device. It only has 186 columns, which equates to 9,300 tiles. The columns are then grouped together into regions. This particular device has six of them, and that's the basic FPGA fabric. But we're still missing something. We still can't communicate with the world outside the chip. For that, we need something like a bridge, or input-output modules. At the borders, we have the IO tiles. But are there more than those two tile types? Of course; sometimes logic is not enough. Maybe we need memory. Of course, we could implement the memory in logic, but that's expensive too. So the vendors gave us small units called block RAM. Here are the columns of block RAM in that particular device. Each small rectangle contains 36 kilobits of memory, and there are 140 blocks of RAM in this device. Sometimes memory is also not enough. Sometimes we need processing power. Implementing arithmetic functions like multiplication would also use up lots and lots of resources, lots and lots of logic. So the vendors gave us DSP tiles. DSP stands for digital signal processor, or in this case it's just a small addition unit combined with a multiplication unit.
Now we know about the basic makeup of FPGAs, but how do we configure one? How does each lookup table know its values? How do the flip-flops know their initial state, and how is everything routed? For that, we have the bitstream. The problem with the bitstream is that it is undocumented. It configures the switchboxes and lookup tables and provides the initial state. This thing decides which switch to turn off and on. The goal of reverse engineering FPGAs is getting the bitstream documented.
Four years ago, I reverse engineered the iCE40 FPGA. A quick summary. The iCE40 FPGA is a very, very small one.
It is optimized for low power consumption. It only has between 384 and 7,680 lookup tables, with only four inputs each. Even the block RAM blocks are small, but it is very beginner-friendly and the cheapest one on Mouser. This is the picture the manufacturer gives us. It shows that the programmable logic blocks contain eight lookup tables and that the whole fabric is surrounded by the IO tiles. But we don't know anything about the interconnect. And we don't even know how many tiles, how many rows and how many columns there are. A closer look at the configurable logic block, or in this case the logic cell: we can see there is only one flip-flop, and it can be bypassed. You can use both the pure output of the lookup table and the flip-flop. And we have only four inputs. What's special about the routing in the iCE40 is that the wires are routed bidirectionally. So you have more than one source at each wire. If you put in the right, or wrong, configuration, you can cause short circuits in the device itself. Another thing they provide us with are eight globally routed signals. They get routed to every single tile, and at every tile we can choose four out of those eight. The interconnect between the tiles mainly consists of wires that span four tiles and twelve tiles, horizontally and vertically. And of course every single tile is connected to its eight surrounding neighbors. That's it.
What were the challenges with reverse engineering the iCE40 FPGA? Well, we had no knowledge about the internal layout. I had no schematic. I had no idea at all how many wires there were, where they go, where the switchboxes are, or how the switchboxes connect to the configurable logic blocks. And even the bitstream commands, the commands telling the FPGA to load the bitstream correctly, were only partially documented. Another challenge, of course, is mapping the tile locations to the bitmap coordinates, but I will show you more details of that later. So how did I reverse engineer the iCE40 FPGA?
Well, I took a closer look at the tools the vendor gave me, especially the bitstream generator. The bitstream generator seemed to contain several strings which related to the names of the wiring, and you could compare them to bit names. But they were behind a debug flag that I could not reach, because they had commented it out, but the compiler didn't optimize the code out. That's why it was easy for me to document and reverse engineer the iCE40 FPGA: I only had to replace one single jump instruction in the vendor tool, and I was able to get every single name of every single bit there was, and a short description of its function. Alright.
Another fun story about how this tool was written: when I tried to decompile it and looked through some functions in it, you could see where they copy-pasted everything together. If you have one function and it's a combination of printf and cout, they copied and pasted that shit together. Another thing I noticed: the bitstream contained a CRC, or cyclic redundancy check, but there was not a single opcode that related to this function, namely exclusive OR. The whole binary had none. If you implement a cyclic redundancy check, you normally need XOR. That puzzled me. So I just randomly stopped the bitstream generator and took a look at the memory dump, and I found out they generated the bitstream in ASCII. So at some point I found a giant string of ones and zeros in ASCII, and they generated that part only for the CRC. I don't know what happened in that program. I don't want to know what led to these decisions.
But with the Xilinx 7 series, which I partially reverse engineered two years ago, I had another challenge, because the 7 series is a really high-performance device. One single LUT6 lookup table uses up half the memory of one CLB tile in the iCE40, and even the smallest 7 series FPGA has more lookup tables, which are four times bigger than the iCE40 ones, and the biggest one has more than 1.2 million lookup tables.
This maps to around 150,000 tiles. Other resources in the 7 series include the block RAM. The block RAM has 36 kilobits of data, as I mentioned before. We have the central clock line, the DSP cores, the IO tiles. By the way, this is just the bottommost part of the Zynq 7020. Now I will tell you more about the Zynq 7020. This is the particular device I decided to reverse engineer, because it contains two ARM Cortex-A9 processor cores that can reprogram the FPGA and interface with it. I really, really liked the thought of combining an FPGA with the interconnect and the memory system of those processor cores and using that. But then again, I didn't want to learn Verilog or VHDL. So I decided to reverse engineer it. With the 7 series I had to scale up my whole operation because, for example, as I mentioned before, there are 9,300 switchboxes. Each switchbox contains 216 multiplexers and 3,738 possible connection states. That's a lot. They also connect through 135 wires to neighboring tiles and route 117 wires from neighboring tiles to them. The whole operation suddenly got very big. And of course the whole device contains more than 3 million wires, and there are more than 32 million configuration bits. For each of them I had to find out what it does.
What were the challenges with the 7 series? The complex design. And with this one I was not able to get any debug information whatsoever. Also, the toolchain was much more complex than the Lattice one. First, it was written in Java. So no decompiling there for me. I'm only a C, assembly, C++ programmer. And it was written much more nicely. Another thing that bothered me, and I will show you shortly, is that there is a small part where the pattern of the bitmap you can extract from the bitstream doesn't match the rest. And this part, as I later found out, is the error correction part. Another small challenge was mapping the tile locations to the bitmap coordinates.
Now I will show you a very small section of the bitmap I can generate from the bitstream. Okay, when I first looked at this I was like, fuck, this thing seemed like an insurmountable wall to me. But I already, and you already, can see some patterns in there. We can, for example, see that there are some bigger chunks. Most probably these are the configuration data for the lookup tables. Nice. Now we only have to find out what the other columns, the ones that look like noise, do. Of course they are for the switchboxes. But mapping them to the single wires, that was hard. About mapping the tiles to the bitmap, I have another picture. We can see that 64 pixels map to one tile. And this part, or yeah, this part maps to the other part. With the middle part I was puzzled. Of course the small regular block of pixels we can see here and here had to be used for the clock interconnect. Here in the middle we can see there is the clock. We know that there are 25 tiles on one side and 25 tiles on the other side. But that's all I had to work with at first.
About the error correcting code that we can see there: that thing was a challenge. But I had an idea. I wrote a small parser that counted the number of bits that were set in each row. If this number was one, I stored the information about the middle part. From that I was able to find out that this thing was using a Hamming code. Or single error correction, double error detection: an extended Hamming code.
I would love to show you more. But I had a problem with my hard disk and my notebook just now. And that's kind of where my talk ends early, I think. What more can I tell you about the reverse engineering of the 7 series? With the Vivado toolchain we get the netlist, we get the internal schematic. And you can extract it automatically. You get the information on the tile coordinates, you get the names of the wires. But we don't get the information on the bitstream. But with the knowledge of where a tile sits in the bitstream, we can correlate the data.
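The talk does not show how the correlation was implemented, so the following Python sketch is only one plausible way to do it and not the speaker's tool. It assumes that for each sample design we already know which connections the netlist reports as used and which bits are set in the part of the bitmap belonging to one tile. The connection names and bit coordinates are invented for illustration.

```python
def candidate_bits(samples, feature):
    """Find bits that are set in every sample using `feature` and in no other sample.

    samples: list of (features, set_bits) pairs, where `features` is a set of
    connection names taken from the netlist and `set_bits` is a set of
    (row, column) coordinates of bits set in the tile's part of the bitmap.
    """
    with_feature = [set(bits) for feats, bits in samples if feature in feats]
    without_feature = [set(bits) for feats, bits in samples if feature not in feats]
    if not with_feature:
        return set()  # the feature never occurs, so nothing can be learned
    candidates = set.intersection(*with_feature)
    for bits in without_feature:
        candidates -= bits
    return candidates


if __name__ == "__main__":
    # Hypothetical data: two connections observed across four sample designs.
    samples = [
        ({"A->B"}, {(0, 1), (3, 4)}),
        ({"A->B", "C->D"}, {(0, 1), (3, 4), (7, 2)}),
        ({"C->D"}, {(7, 2)}),
        (set(), set()),
    ]
    print(candidate_bits(samples, "A->B"))
    print(candidate_bits(samples, "C->D"))
```

With too few samples several connections can end up with the same candidate bits, which is why many varied designs are needed before a bit can be attributed to one switch.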
I created several automated tools for that. I would have loved to show them to you. But something went terribly wrong. I'm sorry. So what are the implications of my work? I did it there. Because I can create a netlist out of the bitstream, I'm able to cross-compile bitstreams to different architectures. With that we can copy, extract and reverse engineer IP cores that are otherwise impenetrable. Another possibility is starting another Project IceStorm together with Clifford, to create a second target for his open source toolchain. I'm very sorry that my talk got this short. Any questions?
This was very short indeed. So if you have questions for Matthias, please come to the microphones. Here's one, two, three and four. And we can take questions from the IRC chat or via Twitter as well.
Yeah, we have a question here at microphone one. Yes, I would like to know what happened to you? Were you pressured not to... What happened to your laptop? What happened? It was something about exFAT and macOS and unplugging it. Okay. And my Windows just freezes when I try to repair the disk. Okay. Yeah, that's hard. It happened like one hour before the whole thing started. My second question: have you worked on the Xilinx Spartan 6 series? No, I never cared about the Spartan ones or about the 6 series. I only wanted to reverse engineer the 7 series because of the Cortex processor in it. Okay. Thank you. Thank you.
Microphone two, please. Can you comment on this, like, middle point of the FPGA? Because you have, like, this black part there and this white part there. Is it, like, the error correction code for half of the FPGA and the other half of the FPGA, or how does that work? Exactly. With a Hamming code you normally mix the parity bits into the data. But of course Xilinx doesn't want that, so they put it in the middle. And these 13 bits, yeah, they are the Hamming code for one row. I can show you the details later when I get them off my hard disk.
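For readers unfamiliar with the extended Hamming code mentioned in this answer, here is a minimal textbook sketch in Python. It uses the usual layout, with the parity bits mixed into the data at the power-of-two positions, which is exactly what the speaker says Xilinx does not do; the actual placement and bit order in the bitstream are not given in the talk, so none of this describes the real format. All function names are hypothetical.

```python
def hamming_encode(data_bits):
    """Encode data bits with a textbook extended Hamming (SECDED) code.

    Returns a list `code` where code[0] is the overall parity bit and
    code[1:] are the Hamming positions 1..n (parity at powers of two).
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2**r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            code[pos] = next(data)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:  # parity bit p covers every position with bit p set
                parity ^= code[pos]
        code[p] = parity
    code[0] = sum(code) % 2  # overall parity makes the total number of ones even
    return code


def hamming_check(code):
    """Classify a received word as ('ok', None), ('single', position) or ('double', None)."""
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        return "ok", None
    if overall == 1:
        # Odd number of flipped bits: treat as a single error at `syndrome`
        # (position 0 means the overall parity bit itself was flipped).
        return "single", syndrome
    return "double", None


if __name__ == "__main__":
    word = hamming_encode([1, 0, 1, 1])
    print(word, hamming_check(word))
    word[5] ^= 1
    print(word, hamming_check(word))
```

As a rough plausibility check, and assuming the 13 bits split into 12 Hamming parity bits plus one overall parity bit, such a code could protect up to 2^12 - 1 - 12 = 4,083 data bits per row.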
There I have everything in more detail. Thanks. Thank you.
Microphone three, please. I was somewhat puzzled by your remarks regarding your inability to decompile the Java toolchain you mentioned earlier. Because usually C# MSIL bytecode and JVM stuff is the easiest prey in that regard. How come? That might be, but I always come at it a bit from a different direction, because I taught everything to myself. I had no idea how to tackle the Java. With the Lattice toolchain I created two tools for dead code elimination. One, for example, patched every single jump instruction in the binary so I could get the program flow. Another one replaced every single opcode with a breakpoint. I then hooked the structured exception handling from Windows and restored every opcode as it was executed. In that way I could reduce the code by two thirds, which was easy to decompile. But as I said, with Java I had no idea how to tackle that. There's lots of automated software for that. Give it another go. I never used... I always wrote my own software, maybe. That's one of the reasons.
Then we have a question from the IRC chat. What architectures do FPGAs use? You mentioned ARM once, but it wasn't the FPGA itself. No, FPGAs don't use architectures like CPUs. They use building blocks like the configurable logic block, block RAM, DSP tiles and IO tiles, and that's the architecture.
Then microphone one again. I'm wondering whether you have tried to extract some of the device database from Vivado, or some information, or did you skip that for legal reasons? I thought about doing that, but the binaries, I think, are more than 10 gigabytes in size. I was like, no, fuck it. Okay, I think Vivado encrypts IP cores, so basically VHDL code, and they do it the same way for XML files which contain some of the device information. I think that's quite similar to what Intel is doing, for those who listened to the ME talk yesterday.
I had no need for that information, because I could get the device information out of like 10 or 20 example projects just flew into the core. If you want, I can show you some more details about that later.
Microphone two, please. Hi, impressive work. Thank you. Sorry for the presentation. Okay, so let's exclude your presentation from your work. Still impressive. My question was almost the same. So you didn't look at GTX or high-speed IP cores because it was too complicated, I guess. No, I didn't have a device for that. Okay, okay, I don't have all the hardware in my mind. Okay, thank you. I will ask you for details later.
Microphone one again, please. Did you do any work on clocking? What? Did you do any work on clocking, like PLLs and clock distribution and all that? Yeah, sure. I have some information about the column drivers, the row drivers, how everything comes together. Okay, but that was also part of the slides which have disappeared. I had detailed pictures of the schematics where I could zoom in and zoom out, but yeah, a fuck-up happened.
Microphone two, please. Thanks for your talk. My question is regarding reprogramming the FPGA fabric from the ARM Cortex cores. Did you have a look at that? Is it also possible with your work? It is, but another thing interests me more: the small thing, the 7020 I'm talking about, has two tiles that can reprogram the whole thing or partially reprogram it. And of course, read it out. Yeah, so it would not be a problem flashing that with your self-made bit file. One thing about the Zynq. The Zynq is a combination of a CPU and an FPGA, but they're isolated. You don't even need to power up the FPGA part, but you need to power up the ARM part, because it's the ARM part that obviously is prioritized and that configures the first bitstream. Okay, so I can just use your bitstream then, that you generated using your tools?
I have a very, very, very, very proof-of-concept place and route tool for small gates and bits with the IOs that's working. It's routing and it's placing. Okay, thank you.
Anyone, please. What about the timing information? Have you also reverse engineered that? I started extracting the timing information, but I wanted to finish up more of the tile information before I started with that. But I have the tools already waiting, and that would be one of the next parts I would tackle. So sometime in the future, we can expect to actually use it with timing analysis and... I would hope so, because I don't have a real motivation behind that device other than fun. If I could create something great out of that which interests the community, I would love to do that, but until now, I wasn't in contact with anyone.
Microphone 2, please. So did you look at some other FPGAs, like the UltraScale+ or maybe... Yes, I did. I started looking at the UltraScale, but before that I wanted to finish up the 7 series, and I don't have a working UltraScale, and I just want to hold it in my hands when I reverse engineer it. I think you can understand that. And another part of the question: did you look at other vendors, maybe Microsemi, Altera? Microsemi I first considered before I decided to reverse engineer the iCE40, but I think they're crap. And yes, I really want to reverse engineer the Altera Intel FPGAs next, because then I will have kind of reversed all big three vendors. Super, thanks. Thank you.
And number one? Yeah, I have a question concerning the place and route you do. What is the basic approach you take to reduce all the combinations of block placing? Right now it's just the proof of concept. There is no reducing of anything. Later, of course, I will use simulated annealing, and I really want to get into what they call reduced ordered binary decision diagrams. Okay, thank you.
Number two, please. Thank you for the questions. You're saving me.
Hi, have you tried to get the bridges between the FPGA and the hard-core part working, for example for memory, or access from the FPGA to the hardware peripherals in the... I started working on that. That's just reverse engineering another tile. I can show you more on that later if you have time. Thank you.
Number three. How does the FPGA apply the bitstream? In what sense? You put it in via UART or something. How does it work? Okay, there are several ways with the Zynq one, because it has an ARM Cortex processor, so there's a small bootloader. You can put the whole thing on an SD card and it loads it automatically through the Cortex-A9. Of course, you could use JTAG, or you can connect an external SPI device. There are many possibilities. Great, thanks.
And we have a question on microphone four. Hi, thank you for your talk. A little request. Could you present the whole presentation in a self-organized session later, maybe? I would love to. Cool, thanks.
And number one again. Yes, I had a question. During your studies, did you discover new stuff about FPGA backdoors made by the NSA and friends? No, but I found ways to detect them. But? I found ways to detect them. Okay, I'd love to talk to you about that. Sure. And I wanted to say something else about the Xilinx family you've been working with, which integrates an FPGA and a CPU. I think that in terms of cyber security, it is absolutely not a good idea to mix an FPGA and a CPU in the same chip, because an attacker like the NSA can easily upload a few bytes of code. No, you cannot. And I will tell you about that later, because I'm working on a proof of concept for a provably untampered device. Okay, I'd be delighted to talk about that with you later. Thank you.
We have a question in the IRC chat. What is lacking to get a free and open source FPGA toolchain like IceStorm for the iCE40 series? What do you mean? Ah, the place and route tool.
With the iCE40, I gave Clifford my findings, and he, together with some other guys, created the place and route tool, and I just provided them with the information, the documentation, basically. Hope that answers the question.
Microphone 2, please. Yes, thanks. You said that you don't do Verilog and VHDL, so what do you use as input for your design tools? The example projects of the vendors. Then I drag around the gates to get some different placing and routing, and with Xilinx you have the block design editor. I also had that in my other presentation. Okay, thanks.
Number one again, please. Hello. Between... Xilinx forces you to use the AXI4 bus between the programmable logic and the ARM cores. Did you figure out if there's another way to connect these parts? What do you mean? They don't force me to use the AXI bus, there's also UR, UOM also have like 64 GPIOs. Oh, yes, to get outside the device. I don't know, between the ARM core and the fabric. Okay, good, thank you. If you want to, I can show you more details on that later. I think there will be a session later.
Microphone 2, please. Could you explain your way of reverse engineering the FPGA? Did you create a bitstream and observe its behavior, or did you just create a bitstream and not run it? I never ran the bitstreams that I created. I only once tried a small bitstream I created by myself, and that was it. So what did you do if you did not run the bitstream? I just got the knowledge out of it. I tried to recreate the same netlist information that I got out of the toolchain only by looking at the bitstream. Thank you.
And microphone 2 again. Can you talk about reverse engineering the non-logic tiles, like the PLL tiles, IO tiles? Is that somehow different from reverse engineering the logic tiles? Sadly, yes. The way you gather the information is almost not automated at all. You have to look at, or I have to look at, the schematic information, look at which internal switches are used, where they go.
Then I have to create another image where the switch is not used, and I can take the difference between the tiles, and then, OK. One detail: do the vendors provide some sort of schematic of the PLL blocks, or is that classified information? With Xilinx, you know almost everything about the device. Cool, thanks.
And another question in the IRC chat. When and where would the later session take place? When and where do you want it to take place? I'm not that good with talks. As you've noticed, I'm more of a conversation guy. Maybe you just come to the front after this talk and you can figure out a place together, maybe at a bar or if there is some free space. It's a big space, actually. Are there any more questions? It doesn't seem like it. So give a warm round of applause to Matthias Lasse.