My name is Niertz, and in the next 20 minutes we're going to talk about hardware reverse engineering, or, to be more precise, about netlist reverse engineering. We're going to talk about how to get from something like this, an unstructured netlist that may contain hundreds of thousands of cells or even millions of gates, to something like this. For that, we developed DANA, our universal data flow analysis tool. We're going to talk about how DANA works and what really cool things you can actually do with it by showing how easily you can identify key registers in the OpenTitan, which is an open-source SoC developed by Google and the lowRISC initiative. So let's get started. We're going to start with an overview of the hardware reverse engineering process in general and how to actually retrieve a netlist. Starting with an ASIC, we first have to decapsulate the chip, meaning remove the epoxy. Once we have the die, we can start delayering, take an image of each layer, and process these images by stitching them together and stacking all images. This entire process is extremely time-consuming and requires a lot of specialized equipment and know-how. But once this is done, we end up with a netlist and can start the analysis. Starting with an FPGA, we have to extract the bitstream from the device, which is usually stored in some non-volatile memory, and then convert the bitstream to a netlist. The bitstream itself is basically the netlist, but in a proprietary format, so the exact bitstream format has to be reverse engineered and understood, which is usually a complex process as well. Once we have the netlist, one might think that we start with something like this, where we see all the module boundaries, how different modules are connected to each other, and what they do. But in reality, we start with something we like to call the sea of gates: a flattened netlist with no module boundaries or any other higher-level information available.
As seen here, the netlist can be compared to a circuit diagram where we can only see the gates and the connections between them. And since modern chips can have hundreds of thousands or even millions of gates, understanding the netlist can be a complex process. Luckily, a netlist can be interpreted as a directed graph where all gates are nodes and the signals, or wires, are the edges. This enables the use of graph algorithms, which usually makes life a little bit easier for the reverse engineer. To sum up, the reverse engineering process consists of two phases. First, there is the netlist recovery, where we extract the netlist. Then we move on to phase two, where the actual netlist analysis takes place. Usually, the netlist analysis is an intertwined process of automated and manual analysis. In our group, the entire netlist analysis is carried out in HAL, our netlist analysis framework, which we have been developing for over four years now. HAL parses netlists from arbitrary sources, for example FPGAs or ASICs, into a graph-based netlist representation and provides the necessary built-in tools for traversal and analysis of the included gates and nets. When we first started to develop HAL, we envisioned it becoming the hardware reverse engineering equivalent of tools like IDA Pro or Ghidra used in software reverse engineering. HAL provides a graphical user interface to assist the reverse engineering process and already comes with quite a few plug-ins to further investigate the netlist. The core itself is written in highly optimized C++, and HAL is available open source on GitHub. Now let's get to the interesting part of this talk: DANA, which was published at CHES 2020. This work was conducted by Nils Albartus, Max Hoffmann, Sebastian Temme, Leonid Azriel, and Christof Paar. In DANA, we start with the unprocessed sea of gates. The general goal of DANA is to recover high-level registers.
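To make the graph view concrete, here is a minimal Python sketch of a netlist stored as a directed graph, with a backward breadth-first search that collects the fan-in cone of a gate. The `Netlist` class and its methods are hypothetical and only illustrate the idea; they are not HAL's API.

```python
from collections import defaultdict, deque


class Netlist:
    """Toy directed-graph view of a gate-level netlist (not HAL's API)."""

    def __init__(self):
        self.gate_type = {}                   # gate name -> cell type, e.g. "DFF"
        self.successors = defaultdict(set)    # gate -> gates its output drives
        self.predecessors = defaultdict(set)  # gate -> gates driving its inputs

    def add_gate(self, name, cell_type):
        self.gate_type[name] = cell_type

    def connect(self, driver, sink):
        # A wire from driver to sink becomes the directed edge driver -> sink.
        self.successors[driver].add(sink)
        self.predecessors[sink].add(driver)

    def fan_in_cone(self, gate):
        """Every gate that can reach `gate`, found by backward breadth-first search."""
        seen, queue = set(), deque([gate])
        while queue:
            for pred in self.predecessors[queue.popleft()]:
                if pred not in seen:
                    seen.add(pred)
                    queue.append(pred)
        return seen


# Usage: ff_a -> XOR -> ff_b
nl = Netlist()
for name, cell in [("ff_a", "DFF"), ("g1", "XOR2"), ("ff_b", "DFF")]:
    nl.add_gate(name, cell)
nl.connect("ff_a", "g1")
nl.connect("g1", "ff_b")
print(sorted(nl.fan_in_cone("ff_b")))  # ['ff_a', 'g1']
```

Once gates and wires are nodes and edges, standard graph searches like this one become the basic building blocks of the analysis.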
This helps the reverse engineer get a better overview of the netlist at hand and provides first clues and indications. For the register recovery, DANA provides two modes: the normal mode and the steered mode. Both of them run completely automated and require no user input. In steered mode, the reverse engineer can additionally incorporate a priori information to assist the tool, in this case the expected sizes of registers. For example, if the reverse engineer knows that the netlist at hand is an AES-128, they would expect to find 128-bit registers for at least the state and the key. In this mode, DANA will internally give higher priority to registers of that size during processing. Furthermore, data flow graphs can easily be generated from the output of DANA. At the end of this talk, we will show that just by analyzing the flow of data, a lot of valuable information can already be revealed. When we first started to work on DANA, we saw a lot of problems with previous works in the area of netlist reverse engineering. Probably the biggest problem of all is that no up-to-date benchmarks are available. A 35-year-old benchmark suite, ISCAS85, is still the most popular benchmark. It is purely combinational, meaning no flip-flops or other sequential elements are present in the netlists at all, which makes it anything but contemporary. Secondly, most works have poor evaluation methods, which are often incomplete or biased. Another thing that we often saw are magic values, which, even when changed slightly, have a huge effect on the outcome of the respective method. This makes their applicability to completely unknown netlists questionable, since there is no way to verify the results. So with DANA, we wanted to do better. We built our own benchmark suite of nine open-source cores, ranging from cryptographic cores to CPUs and one big SoC, which we synthesized for both ASICs and FPGAs.
For evaluation, we used well-established metrics that are often used in cluster evaluation, but more on that later. Most important of all, we did not want to rely on any magic values or deltas. However, a priori knowledge can be applied to assist the tool. So let's have a look at the workflow of DANA. We start with the netlist, which goes through three phases, the preprocessing, the processing, and the evaluation phase, until we end up with the final register grouping. But before we discuss the phases in more detail, we have to introduce some terminology, starting with registers and groupings. A register is a set of flip-flops, as depicted here. A grouping is what we call a set of registers, so all registers combined form a grouping. Another important concept is passes, which are the heart of the processing phase. Each pass takes a grouping as input and, based on its specific metric, outputs a new grouping. A pass can either join registers or split existing registers into smaller ones. Now let's go into more detail on all phases, starting with the preprocessing. In the preprocessing phase, three important things happen. As a first step, we create an abstracted netlist. Since DANA only works on the connections between flip-flops, we first remove all logic. The intuition here is that data flows between registers, so we only consider flip-flops and other sequential elements and their connections to one another. Another important step is the preparation of rules. The intuition here is that, to prevent unreasonable groupings, each pass has to follow a set of rules when creating registers: all flip-flops of a register candidate have to be in the same register stage, and they need to share the same clock and control signals. If a rule is violated, the creation of the register is not allowed.
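The abstraction step can be sketched as a graph search that starts at every flip-flop, walks forward through combinational gates, and stops at the next sequential element. This is a minimal sketch assuming the netlist is given as a plain successor dictionary and the set of sequential cells is already known; the function name `abstract_netlist` is hypothetical.

```python
from collections import deque


def abstract_netlist(successors, sequential):
    """Remove combinational logic and keep only flip-flop -> flip-flop data flow.

    successors: dict mapping each gate to the gates its output drives.
    sequential: set of gate names that are flip-flops or other sequential cells.
    Returns a dict mapping each sequential cell to the sequential cells it
    reaches through combinational gates only.
    """
    ff_graph = {}
    for ff in sequential:
        reached, seen = set(), set()
        queue = deque(successors.get(ff, ()))
        while queue:
            gate = queue.popleft()
            if gate in seen:
                continue
            seen.add(gate)
            if gate in sequential:
                reached.add(gate)  # the path ends at the next register bit
            else:
                queue.extend(successors.get(gate, ()))  # keep walking through logic
        ff_graph[ff] = reached
    return ff_graph


# Usage: ff0 and ff1 feed ff2 through XOR/AND logic; ff2 feeds itself via a mux.
succ = {"ff0": ["xor0"], "ff1": ["xor0"], "xor0": ["and0"],
        "and0": ["ff2"], "ff2": ["mux0"], "mux0": ["ff2"]}
ff_graph = abstract_netlist(succ, {"ff0", "ff1", "ff2"})
print(ff_graph)
```

The `seen` set keeps the search finite even if the combinational logic happens to contain loops.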
In the paper, we introduce a sophisticated algorithm to identify register stages, but we are not going to cover it in more detail in this presentation. Once the algorithm is done, each flip-flop is assigned to a register stage. Control signals and clock signals provide valuable information for the register grouping. The intuition here is that if flip-flops share common control and clock signals, they most likely belong together. But since multiple modules often share the same clock and control signals, grouping by this metric alone would often end up in registers that are too large. As a rule, however, it often prevents wrong registers from being grouped together. All information about clock and control signals is cached for faster lookups, since it is queried often by the passes. In the processing phase, DANA combines nine metrics that process structural and control information while abiding by the set rules. These metrics are combined with each other to create groupings. To get a better understanding of how the processing phase works, we are going to go through it step by step with the help of an example. This netlist has, of course, already undergone the preprocessing, so all logic has been removed. In the initial grouping, every flip-flop is in its own register. This grouping is now the input to all passes. Pass one may now recognize that these flip-flops belong together and create a register of these three flip-flops based on its internal metric. The output of pass one is given to pass two. Pass two now creates new registers based on the output of pass one, again depending on its own internal metric. Pass two may decide that these two flip-flops belong together and create a new register, and maybe also these two. We end up with a grouping that may look like this, but we also end up with many more groupings from all the different pass combinations, which we store and process later in the evaluation phase.
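The interplay of rules and passes can be sketched as follows. This is an illustrative assumption, not DANA's implementation: registers are frozensets of flip-flop names, the cached stage, clock, and control information is a plain dictionary, and the two toy passes (`same_successors`, `same_predecessors`) only join registers, whereas DANA's nine metrics can also split them.

```python
from itertools import permutations


def satisfies_rules(register, info):
    """True if all flip-flops share register stage, clock and control signals.

    info maps each flip-flop to a hashable (stage, clock, control_signals)
    tuple, precomputed once, mirroring the cached lookups from preprocessing.
    """
    return len({info[ff] for ff in register}) == 1


def make_merge_pass(key_fn, info):
    """Hypothetical join-only pass: merge registers with equal key_fn(register)."""
    def run(grouping):
        buckets = {}
        for reg in grouping:
            buckets.setdefault(key_fn(reg), []).append(reg)
        result = []
        for regs in buckets.values():
            merged = frozenset().union(*regs)
            if satisfies_rules(merged, info):
                result.append(merged)
            else:
                result.extend(regs)  # a rule is violated: leave registers unchanged
        return result
    return run


def run_pass_combinations(initial, passes):
    """Apply every ordering of the passes and keep every resulting grouping."""
    groupings = []
    for order in permutations(passes):
        grouping = initial
        for p in order:
            grouping = p(grouping)
        groupings.append(grouping)
    return groupings


# Usage: a0 and a1 both feed b0; b0 sits in the next register stage.
info = {"a0": (1, "clk", frozenset({"rst"})),
        "a1": (1, "clk", frozenset({"rst"})),
        "b0": (2, "clk", frozenset({"rst"}))}
succ = {"a0": frozenset({"b0"}), "a1": frozenset({"b0"}), "b0": frozenset()}
pred = {"a0": frozenset(), "a1": frozenset(), "b0": frozenset({"a0", "a1"})}
same_successors = make_merge_pass(lambda r: frozenset().union(*(succ[f] for f in r)), info)
same_predecessors = make_merge_pass(lambda r: frozenset().union(*(pred[f] for f in r)), info)
initial = [frozenset({ff}) for ff in info]  # every flip-flop starts on its own
results = run_pass_combinations(initial, [same_successors, same_predecessors])
```

Both pass orders yield the grouping `{a0, a1}`, `{b0}` here; in a real netlist, different orders generally produce different groupings, which is exactly what the evaluation phase has to reconcile.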
All registers that were just created in the processing phase are now passed to the evaluation phase. The evaluation phase condenses all information into a single grouping with the help of a specialized majority voting, where the data decides the final output. The intuition here is that registers identified by a majority of passes are most likely correct. In a normal, simple majority voting, we sort all unique register candidates by how often they occur. We then choose the best register from the top of the list and remove all candidates that share any flip-flops with it. We repeat this process until no more candidates remain and all flip-flops are assigned to a register. But this normal majority voting has various drawbacks that make it unsuitable for DANA. One problem that we observed is that registers often had a similar number of votes with insignificant differences. We had to find a way to automatically decide which register is the best choice with respect to the remaining registers. To be more precise, we wanted to avoid fragmentation. In general, a flip-flop can only be part of one register, so once a candidate is selected, the remaining candidates have to be filtered. Thus, a bad choice may result in many small registers, which equals high fragmentation. Furthermore, the normal majority voting cannot make use of a priori knowledge. The solution here is to give priority to specific candidates: first, registers that match our a priori knowledge, and second, registers that result in minimal fragmentation of the remaining candidates. Without going into much more detail, we implemented a specialized majority voting that considers both of these factors and thereby prioritizes data flow. Since data flows between bigger registers, we prioritize candidates that prevent the creation of small registers, in this case registers under 8 bits, since those typically carry control signals.
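A simplified version of such a prioritized voting might look like the sketch below. It is a stand-in under stated assumptions: candidates matching the expected sizes come first, candidates below the small-register threshold come last, and ties are broken by votes and then by size. Preferring larger registers only approximates the fragmentation criterion described in the paper; the function `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(groupings, expected_sizes=(), small=8):
    """Condense many groupings into one (a simplified stand-in, not DANA's exact rule).

    Ranking, highest first: size matches a priori knowledge (steered mode),
    size is at least `small` bits, vote count, then size. Preferring larger
    registers is a crude proxy for minimizing fragmentation.
    """
    votes = Counter(reg for grouping in groupings for reg in grouping)

    def rank(reg):
        return (len(reg) in expected_sizes, len(reg) >= small, votes[reg], len(reg))

    chosen, used = [], set()
    for reg in sorted(votes, key=rank, reverse=True):
        if used.isdisjoint(reg):  # a flip-flop may belong to one register only
            chosen.append(reg)
            used |= reg
    # Flip-flops whose candidates all overlapped a chosen register stay alone.
    all_ffs = set().union(*votes)
    chosen.extend(frozenset({ff}) for ff in sorted(all_ffs - used))
    return chosen


# Usage: three pass combinations disagree about how to group six flip-flops.
A4 = frozenset({"a0", "a1", "a2", "a3"})
A01, A23 = frozenset({"a0", "a1"}), frozenset({"a2", "a3"})
B = frozenset({"b0", "b1"})
runs = [[A4, B], [A01, A23, B], [A4 | B]]
plain = majority_vote(runs, small=2)                         # picks B and A4
steered = majority_vote(runs, expected_sizes=(2,), small=2)  # picks B, A01, A23
```

The usage shows the effect of steering: without a priori sizes the larger 4-bit candidate wins, while asking for 2-bit registers splits it into two halves.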
The exact implementation details can be found in the paper. Once the evaluation phase outputs a single condensed grouping, we compare it to the previous one. If new registers have been created, we set the current grouping as the new initial grouping of the processing phase and repeat the process until no more changes occur. DANA then outputs a final register grouping. The output of DANA can now easily be used to generate data flow graphs. Remember that we wanted to structure the sea of gates. Since this is now done through the recovery of high-level registers by DANA, we can take care of the graph generation. In the data flow graph on the right, the arrows indicate data flow, which in the netlist itself is implemented in the logic, meaning there exists a path from one register to another through combinational gates. All boxes represent registers of a specific size that were recovered by DANA. Let's talk a little bit more about the steered mode, which is a very powerful feature of DANA. In this example, we analyze a completely unrolled DES core with DANA. DANA groups the 56-bit key register and the two 32-bit halves of the state register into one 120-bit register. Of course, this is not wrong, since all 16 round registers are correctly identified, but this might not be what a reverse engineer expects to find when analyzing a DES core. Here, we would expect to find 56-bit registers for the key and two 32-bit registers for the round states. If we now use the steered mode and instruct DANA to give a higher priority to registers of these sizes, we end up with this graph. The great thing about this is that a reverse engineer can now easily distinguish between key and round registers and even recognize the unique Feistel structure of DES. In our experiments, we observed similar behavior for AES-128. Here, in normal mode, DANA recovers 256-bit registers for each round, merging state and key.
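Generating the data flow graph from DANA's output amounts to lifting the abstracted flip-flop graph to registers: every flip-flop-level edge becomes an edge between the registers that contain its two endpoints. A minimal sketch, assuming the flip-flop graph is a dictionary of successor sets and the final grouping is a list of frozensets that covers every flip-flop; `register_dfg` is a hypothetical name.

```python
def register_dfg(ff_graph, grouping):
    """Lift a flip-flop-level graph to a register-level data flow graph.

    ff_graph: flip-flop -> set of flip-flops it reaches through logic.
    grouping: list of registers (frozensets) covering every flip-flop.
    Returns register index -> set of register indices it feeds; an edge
    i -> i means the register is updated from its own previous value.
    """
    reg_of = {ff: i for i, reg in enumerate(grouping) for ff in reg}
    dfg = {i: set() for i in range(len(grouping))}
    for src, dsts in ff_graph.items():
        for dst in dsts:
            dfg[reg_of[src]].add(reg_of[dst])
    return dfg


# Usage: a 2-bit "key" that updates itself and feeds a 2-bit "state".
ffs = {"k0": {"k1", "s0"}, "k1": {"k0", "s1"}, "s0": {"s0", "s1"}, "s1": {"s0"}}
regs = [frozenset({"k0", "k1"}), frozenset({"s0", "s1"})]
print(register_dfg(ffs, regs))  # {0: {0, 1}, 1: {1}}
```

The self-loops in the result correspond to the looping arrows in the graphs shown in the talk.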
Using steered mode, we found the 128-bit registers cleanly separated. Now let's get to the evaluation. As already mentioned in the beginning, we created our own benchmark suite with cores from OpenCores and GitHub. We used a wide variety of designs, ranging from five cryptographic cores to three CPUs and one real-world-like SoC, namely the OpenTitan, which is developed by the lowRISC initiative, whose supporting industry partners include Google and Western Digital. To show that DANA is actually technology-agnostic, we synthesized all cores for both FPGAs and ASICs. In general, the functionality of DANA can be seen as a form of clustering, since flip-flops are grouped into registers. So we had a look at how clustering methods are evaluated and noticed that clustering evaluation methods compare a given output to a ground truth. For the generation of the ground truth, we kept all human-readable names in the netlist. For example, these are the 128 flip-flops of the round-one state register of the AES-128, which are automatically assigned to the same register, just as we would expect DANA to recover it. Of course, the names were not used by DANA itself. To evaluate our findings, we mainly relied on the NMI, the normalized mutual information, a statistical measure used in clustering evaluation. The NMI is influenced by several characteristics, including cluster sizes and coverage of the ground truth. A value close to zero means that the clustering poorly matches the ground truth, while an NMI of one indicates a perfect result. Another valuable measure is purity, since it is a very simple and transparent evaluation measure. Here, we measure how pure a cluster grouping is. For example, if flip-flops that actually belong to different registers end up in one cluster, the purity is low. Thus, a high purity is desirable, but if each flip-flop is in its own cluster, the purity is also one. Purity alone is therefore not a reliable measure, but combined with the NMI it provides valuable information.
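Both measures are easy to compute from per-flip-flop labels. The sketch below assumes the arithmetic-mean normalization of the NMI, 2*I(T;P) / (H(T) + H(P)); other normalizations exist, and we do not claim this is the exact variant used in the evaluation. The usage line reproduces the caveat about purity: all-singleton clusters score a perfect purity but only an NMI of 0.5 in this example.

```python
from collections import Counter
from math import log


def _entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Normalized mutual information between two labelings of the same flip-flops.

    Uses the arithmetic-mean normalization 2*I / (H_truth + H_pred).
    """
    n = len(truth)
    joint = Counter(zip(truth, pred))
    ct, cp = Counter(truth), Counter(pred)
    mi = sum(c / n * log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())
    denom = _entropy(truth) + _entropy(pred)
    return 1.0 if denom == 0 else 2 * mi / denom


def purity(truth, pred):
    """Share of flip-flops that agree with their cluster's majority ground-truth register."""
    clusters = {}
    for t, p in zip(truth, pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(truth)


truth = ["state"] * 4 + ["key"] * 4  # ground-truth register name per flip-flop
singletons = list(range(8))           # every flip-flop in its own cluster
print(purity(truth, singletons), round(nmi(truth, singletons), 3))  # 1.0 0.5
```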
When we evaluated our cores, we discovered that most of our results had an NMI over 0.9, which is close to a perfect recovery. We also observed that applying a priori knowledge almost always results in a higher NMI and thus in an even better recovery of registers. DANA has a short runtime and finishes in a matter of minutes, even on a standard laptop. When we had a look at bad NMIs, like for the SHA-3 in normal mode, we discovered that a bad NMI does not necessarily mean that the result is bad. Taking a closer look at the SHA-3, we saw that DANA identified one large 1,600-bit register, the Keccak state, followed by a rather unexpected chain of registers with around 32 bits each. These registers are actually part of a single intermediate register of the SHA-3 sponge construction; since it was split up into several smaller registers, the NMI is low. A reverse engineer can still see a clear interdependency between these registers and the main state register and thus still recognize them as one unit. Using steered mode, DANA also identified them as one register. Now that we have had a look at how DANA works, we are going to take a look at what you can actually do with it, starting with the OpenTitan. Here, a reverse engineer might be interested in finding security-relevant registers and components. According to the documentation available on the website, the security features are provided by an AES-256 and an HMAC with SHA-256. A reverse engineer might now have the following hypotheses. For the AES-256, we expect the netlist to have at least one 256-bit register for the key and a 128-bit register for the state. For the HMAC, we are looking for a 256-bit key register and a register larger than 256 bits for the message. We expect the SHA-256 to have a 512-bit message register and two 256-bit registers for the digest and state. So we steered DANA to registers of sizes 512, 256, and 128.
Among others, DANA identified the following registers: one 512-bit, four 256-bit, and four 128-bit registers. We expect some kind of connection between the registers, since they should have some influence on each other. Using the output of DANA, we created this data flow graph, drawing only the registers of interest. Groups one and two each contain only a single register, which is not connected to any other register of interest. Thus, we conclude that they are not part of the cryptographic modules, since those consist of more registers. Group four includes a 512-bit register and two 256-bit registers, exactly the registers expected for a SHA-256 implementation. Group three includes 256-bit and 128-bit registers, which we would expect for an AES-256. However, none of the identified groups matched the registers we were expecting to find for the HMAC. Taking a closer look at the suspected AES, DANA found not only two, but four registers in total that belong to this group. From merely studying the graph, it is possible to identify the state and key registers, not only by their size, but also by their connections. The key register, marked in orange, gets updated every round, indicated by the looping arrow, and influences the state register. In turn, the state register is also updated every round, but never influences the key register; hence, there is no arrow pointing to the key register. In addition, we can identify an output register, marked in pink, which is the only register with a connection that leaves the module. Since it is influenced by both the key and the state register, we suspect that the final AES round, which differs from the remaining rounds, is computed on the fly when the register is written. The functionality of the remaining 256-bit register, in yellow, is unknown from this graph alone. Comparing the results we recovered with DANA to the official block diagram from the documentation, one can see that our hypotheses were correct.
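The reasoning used to tell key, state, and output registers apart can be written down as a small heuristic over the register-level data flow graph. This is a hypothetical sketch: the function `classify_roles`, the register names `r1` to `r4`, and their edges are invented to loosely mirror the suspected AES group, and real designs will need more care than these rules.

```python
def classify_roles(edges, leaves_module):
    """Heuristic role labels for registers in a register-level data flow graph.

    edges: register -> set of registers it feeds (a self-loop means the
    register is updated every round). leaves_module: registers with a
    connection leaving the module. A key candidate updates itself and feeds
    a self-updating state candidate that never feeds it back.
    """
    def feeds(a, b):
        return b in edges.get(a, set())

    roles = {r: "output" for r in edges if r in leaves_module}
    looping = [r for r in edges if feeds(r, r) and r not in roles]
    for k in looping:
        for s in looping:
            if k != s and feeds(k, s) and not feeds(s, k):
                roles.setdefault(k, "key")
                roles.setdefault(s, "state")
    for r in edges:
        roles.setdefault(r, "unknown")  # nothing in the graph explains it
    return roles


# Hypothetical graph: r1 ~ orange, r2 ~ state, r3 ~ pink output, r4 ~ yellow.
aes_edges = {"r1": {"r1", "r2", "r3"}, "r2": {"r2", "r3"},
             "r3": set(), "r4": {"r4"}}
roles = classify_roles(aes_edges, leaves_module={"r3"})
print(roles)
```

As in the talk, the fourth register stays unexplained until outside information, such as the official block diagram, is consulted.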
The remaining 256-bit register in yellow seems to be the decryption key register, which enables support for both encryption and decryption with the same logic. Looking at the suspected SHA-256 with the SHA internals in mind, we hypothesize that the 512-bit register in blue is the message register, since it influences the state register, which in turn updates the green digest register, the only register with an output leaving this module. However, the registers of the HMAC were not found. For the HMAC, we were expecting to find a 256-bit key register and a 512-bit message register. We suspect that the registers of the HMAC should be connected to the SHA in some way, since they serve as its input. Since SHA and HMAC are typically closely intertwined, we extended our rendering to include registers preceding the SHA. This immediately revealed several 32-bit registers. Mapping these to the expected structure of the HMAC, we were also able to identify the key and message registers. Finally, comparing our HMAC results to the official block diagram, we learned why DANA was not able to identify these registers as one: both are specified as 32-bit FIFO registers, each with its own control signal. DANA was not allowed to merge these registers due to our rules, but it found the registers exactly the way the designer implemented them. Our successful identification of the cryptographic modules in the netlist now allows us to locate them on the chip. Here, we highlighted all identified registers in the floor plan image of the design, information that can now be used, for example, in side-channel attacks. Our second case study deals with the detection of hardware Trojans. We modified an AES to incorporate a Trojan that is triggered by a certain plaintext. If triggered, the Trojan leaks the key by overwriting the output register. DANA could be used to identify suspicious data flows, but this is limited to a very small subset of Trojans.
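Since this Trojan leaks the key by overwriting the output register, one way to flag it in DANA's output is to compare the recovered register-level data flow graph against flows that the specification forbids. A minimal sketch, assuming the designer can list forbidden register pairs; the function `forbidden_flows` and all register names are hypothetical.

```python
def forbidden_flows(edges, forbidden):
    """Return the forbidden (src, dst) register pairs that appear as direct edges.

    edges: register-level data flow graph, register -> set of registers it feeds.
    forbidden: pairs that, per the design specification, must never be connected.
    """
    return sorted((src, dst) for src, dst in forbidden if dst in edges.get(src, set()))


clean = {"key_in": {"key_round"}, "key_round": {"key_round", "state"},
         "state": {"state", "out"}, "out": set()}
trojaned = {**clean, "key_in": {"key_round", "out"}}  # hypothetical leak path
rules = {("key_in", "out")}
print(forbidden_flows(clean, rules))     # []
print(forbidden_flows(trojaned, rules))  # [('key_in', 'out')]
```

This only catches Trojans that add a visible register-to-register data path, which matches the limitation stated in the talk.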
Here, we see the Trojanized AES netlist after processing by DANA. Knowing AES, we can immediately distinguish the key registers from the plaintext registers. The key registers influence the state register due to the key XOR, but not the other way around, which is why there is no connection from the state register to the key registers. An AES designer analyzing a sample of the fabricated IC knows that there should be no direct data path from the initial key register to the output. However, we see exactly this connection in the output of DANA. To sum up, DANA is a universal data flow analysis technique that is technology-agnostic. It runs fully automated and requires no magic values or deltas as input. In our evaluation and case studies, we showed an almost perfect recovery of registers in modern real-world designs. DANA is also fast and processes even large netlists in only a matter of minutes. Our results show that just by analyzing the data flow, a lot of valuable information can be revealed, which can massively aid the reverse engineer. We implemented DANA as a HAL plug-in, which has been open-sourced as well. Additionally, we released all benchmarks that were used in this project. Thank you for your attention. I hope you enjoyed this talk.