Hello and welcome to this presentation of the paper "Countermeasures Against Static Power Attacks: Comparing Exhaustive Logic Balancing and Other Protection Schemes in 28nm CMOS". My name is Thorben Moos, and this is joint work with my advisor Amir Moradi from Ruhr University Bochum. This presentation deals with static power side-channel attacks. In the past few editions of CHES, multiple works have been presented which showed that the standby power of CMOS chips, also called the static power consumption, reveals information about internally stored and processed data. Consequently, it has been demonstrated that this side channel can be used to extract secrets from cryptographic devices and thereby circumvent critical security features. There are a number of differences between this side channel and the much more frequently exploited dynamic power consumption of CMOS ICs. First of all, the standby power of CMOS chips was extremely small in the past, when ICs were still manufactured in technology generations of 200 or 300 nm. However, with the downscaling of the technology, the leakage currents became larger and larger, and they now account for a significant share of the power budget of devices in current nanometer technologies. Therefore, static power attacks may not have been relevant in the past, but they certainly are relevant now. The second difference from the dynamic power consumption is that adversaries can amplify the information leakage to a large extent by increasing the working temperature and the supply voltage of the device under test. For many physical adversaries, it is possible to influence these external factors. Third, while a traditional power analysis attacker can only exploit secret values at the time they are actively processed in the circuit, a static power adversary can extract information as long as the data of interest is present anywhere in the circuit.
Finally, if an attacker obtains clock control, or if the data of interest remains present in the circuit for a longer period of time without being operated on, measurements with an extremely low noise level can be collected by simply averaging over time. This example illustrates the main security problem in a pretty simple manner. What we see here is a table listing the leakage currents conducted by a common two-input NOR gate in a 22 nanometer technology for different logical values of its input signals A1 and A2. It is obvious that the leakage current conducted through the cell in a stable state heavily depends on the input signals. Leakage tables like this one exist for every cell in each standard cell library. Here, the leakage current for input combination 01 is more than four times larger than the leakage for input combination 11. Such a strong data dependency is exactly the reason why adversaries can exploit the standby power to extract hidden internal information from CMOS chips. Here is another example in the same technology node, but for a D-type flip-flop. Interestingly, the leakage current of the flip-flop depends not only on the inputs D and clock, but also on the output value Q, which does not necessarily match the current value of D. Typically, the registers are not the main drivers of the information leakage, but it is important to remember this output dependency for the countermeasures we explain later. While all previous experimental works on the subject have concentrated on exploiting the static power consumption in the most clever or easy way, we focus here on the constructive side: we develop, practically evaluate, and compare a number of countermeasures against this threat. And we do this with the help of a 28 nanometer ASIC that we have specifically designed and manufactured for this exact purpose.
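To make the table-based view concrete, here is a minimal Python sketch of a leakage lookup for the two-input NOR gate. The current values in `NOR2_LEAKAGE` are hypothetical placeholders in arbitrary units, chosen only so that input combination 01 leaks more than four times as much as 11, as stated above; the names `nor2_leakage` and `leakage_classes` are illustrative and not taken from any cell library.

```python
# Minimal table-based static leakage model of a two-input NOR gate.
# HYPOTHETICAL current values (arbitrary units): a real standard cell
# library provides such a table per cell, but these numbers are invented.
NOR2_LEAKAGE = {
    (0, 0): 2.0,
    (0, 1): 9.0,
    (1, 0): 6.0,
    (1, 1): 2.0,
}


def nor2_leakage(a1: int, a2: int) -> float:
    """Modeled standby current for stable inputs (a1, a2)."""
    return NOR2_LEAKAGE[(a1, a2)]


def leakage_classes(table: dict) -> dict:
    """Group input combinations by their leakage value.

    More than one group means that measuring the static current of this
    cell alone already tells an adversary something about its inputs.
    """
    classes: dict = {}
    for inputs, current in table.items():
        classes.setdefault(current, []).append(inputs)
    return classes


if __name__ == "__main__":
    print("I(01) / I(11) =", nor2_leakage(0, 1) / nor2_leakage(1, 1))
    print(leakage_classes(NOR2_LEAKAGE))
```

With these placeholder values the gate falls into three distinguishable classes, because 00 and 11 happen to share a value; with real library data, the grouping depends on the actual table.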
The countermeasures that we introduce and analyze in the following are all applied to a very compact serialized hardware implementation of the block cipher PRESENT-80, which you can see on this slide. It essentially consists of one 4-bit S-box circuit, a state and a key register, and a bit of control logic. That's all. Now let's take a look at the hiding countermeasures we have evaluated in this work. To be honest, the first one can barely be called a countermeasure at all; it is more of a design strategy that reduces the leakage. In modern CMOS libraries, standard cells typically exist in multiple versions with different threshold voltages to take full advantage of the latency-versus-leakage trade-off. Cells with a higher threshold voltage have a larger latency but consume less standby power. Therefore, implementing cryptographic algorithms using only such high-threshold-voltage cells should reduce the exploitable signal available to static power adversaries. The second countermeasure is the simplest form of shuffling, namely random start index shuffling. Here, in each round, a 4-bit random number decides which state and key nibble are passed through the S-box circuit first, and the rotation starts from that point. As a result, state nibble 5, for example, is processed at a different point in time in each recorded measurement. While the exact timing and alignment of traces is not as important for static power adversaries as for dynamic power adversaries, shuffling still means that state nibble 5 is located in a different part of the circuit in each measurement, which creates noise and reduces the measurement quality. The third countermeasure we evaluate is called symmetric dual-rail logic, SDRL for short. Essentially, each gate in the circuit is duplicated, and the duplicate receives the inverted inputs.
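The scheduling effect of random start index shuffling can be sketched in a few lines of Python. This models only the order in which the 16 nibbles of the 64-bit PRESENT state pass through the single S-box within one round, assuming that the same start index is used for state and key nibbles; the function names are hypothetical, and the sketch says nothing about the hardware realization.

```python
import secrets

NUM_NIBBLES = 16  # the 64-bit PRESENT state consists of 16 nibbles


def processing_order(start: int) -> list:
    """Nibble indices in the order the serialized core feeds them to the
    S-box, rotating from the randomly chosen start index."""
    return [(start + i) % NUM_NIBBLES for i in range(NUM_NIBBLES)]


def time_slot_of(nibble: int, start: int) -> int:
    """Cycle within the round in which the given nibble is processed."""
    return (nibble - start) % NUM_NIBBLES


if __name__ == "__main__":
    for _ in range(3):
        # Fresh 4-bit random start index for every round.
        start = secrets.randbelow(NUM_NIBBLES)
        print(f"start={start:2d}  nibble 5 processed in cycle {time_slot_of(5, start):2d}")
```

Because the start index is uniform over 16 values, nibble 5 lands in each of the 16 time slots with equal probability, which is where the added noise comes from.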
This technique is based on the assumption that each cell leaks a current exactly according to a leakage table like the ones we have seen previously for the NOR gate and the D flip-flop. Under this assumption, the balanced inverter is actually secure, in the sense that an adversary cannot distinguish whether input I is 0 or 1: in both cases, one inverter receives a 0 and the other receives a 1, and the cumulative current gives no information about which inverter receives which value. In reality, of course, there are tiny differences between the leakage of two instances of the same cell due to intra-die process variations, aging effects, and imbalances in the routing of the connected signals. However, these effects are expected to have a rather small impact on the static leakage in stable states, which is why estimating leakage currents with such tables is common practice in IC design. For the NOR gate and the flip-flop, however, the balancing is not perfect even without considering differences between multiple instances of the same cell. For the NOR gate, for example, the input combination 00 for A1 and A2 can still be distinguished from the input combination 01. Still, a reduction of the exploitable signal is expected. The next countermeasure we have considered in our comparison is called Quadruple Algorithmic Symmetrizing, QuadSeal for short. It has been proposed as a countermeasure against both dynamic and static power analysis attacks. Essentially, QuadSeal quadruples the unprotected circuit, where in three of the four copies the S-box table is modified as detailed on this slide. Then, a random permutation of the inputs, keys, inverted inputs, and inverted keys is selected from 24 possibilities, and the values are given to the four circuits.
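Under the table-based assumption, the SDRL argument reduces to adding up two table entries per gate pair. The following Python sketch uses hypothetical leakage values (arbitrary units, not real library data) to show that a dual-rail inverter pair draws a constant current, while a dual-rail NOR pair still separates {00, 11} from {01, 10}.

```python
# Table-based check of SDRL balancing; all current values are HYPOTHETICAL.
INV_LEAKAGE = {0: 3.0, 1: 5.0}
NOR2_LEAKAGE = {(0, 0): 2.0, (0, 1): 9.0, (1, 0): 6.0, (1, 1): 2.0}


def sdrl_inverter_leakage(i: int) -> float:
    """Original inverter sees i, its duplicate sees the inverted value."""
    return INV_LEAKAGE[i] + INV_LEAKAGE[1 - i]


def sdrl_nor2_leakage(a1: int, a2: int) -> float:
    """Original NOR sees (a1, a2), its duplicate sees (~a1, ~a2)."""
    return NOR2_LEAKAGE[(a1, a2)] + NOR2_LEAKAGE[(1 - a1, 1 - a2)]


if __name__ == "__main__":
    print({i: sdrl_inverter_leakage(i) for i in (0, 1)})
    print({(a1, a2): sdrl_nor2_leakage(a1, a2) for a1 in (0, 1) for a2 in (0, 1)})
```

Whatever values a real table contains, the inverter pair always sums the same two entries, whereas the NOR pair sums I(00) + I(11) for one class of inputs and I(01) + I(10) for the other; only if these two sums happen to coincide is the NOR pair balanced.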
The idea behind QuadSeal is to balance all Hamming weights and distances occurring in a cipher implementation, and to rotate the inputs to the balanced structures to account for remaining dependencies due to process variations, aging effects, and routing imbalances. The final hiding countermeasure we analyze is called exhaustive logic balancing, ELB for short. This countermeasure has been newly developed for this work and is essentially a logical continuation of the SDRL technique. Here we make the same assumption as before, namely that each standard cell instance leaks exactly according to its corresponding leakage table. We then build balanced versions of the standard cells in such a manner that, under this assumption, the leakage current is perfectly constant and independent of the applied signals. For the NOR gate, you can see the result here. To give a better impression of the logical behavior of this circuit, we have annotated exemplary input values. For A1 and A2 both being 0, we have two inverters receiving a 0 and two inverters receiving a 1, one NOR receiving 00, one receiving 01, another one receiving 10, and the last one receiving 11. In fact, the same holds for any combination of A1 and A2; take a look here, here, and here. Of course, we now need to replace each NOR gate in the implementation we want to protect by a circuit containing four inverters and four NOR gates, which leads to quite some overhead. By the way, the ZN outputs that are not needed can simply be left unconnected. The situation is even more complex for the D flip-flop. We have seen earlier that a D flip-flop leaks not only about D but also about Q. Since Q is an output, we cannot apply the same technique as for the NOR gate. Instead, we need to choose the input values of the four flip-flops as a function of their output values.
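The ELB NOR can be checked in the same table-based model. The Python sketch below assumes, based on the annotated example values, that the four inverters see A1, A2 and their complements, and that the four NOR cells together see every combination of A1 or its complement with A2 or its complement; the leakage values are hypothetical, and the exact wiring of the published cell may differ.

```python
from itertools import product

# HYPOTHETICAL per-cell leakage tables (arbitrary units).
INV_LEAKAGE = {0: 3.0, 1: 5.0}
NOR2_LEAKAGE = {(0, 0): 2.0, (0, 1): 9.0, (1, 0): 6.0, (1, 1): 2.0}


def elb_nor2_cell_inputs(a1: int, a2: int):
    """Input values seen by the 4 inverters and the 4 NOR cells (assumed wiring)."""
    n1, n2 = 1 - a1, 1 - a2
    inverter_inputs = [a1, a2, n1, n2]
    nor_inputs = [(a1, a2), (a1, n2), (n1, a2), (n1, n2)]
    return inverter_inputs, nor_inputs


def elb_nor2_leakage(a1: int, a2: int) -> float:
    """Total modeled static current of the balanced NOR structure."""
    inverter_inputs, nor_inputs = elb_nor2_cell_inputs(a1, a2)
    return (sum(INV_LEAKAGE[x] for x in inverter_inputs)
            + sum(NOR2_LEAKAGE[p] for p in nor_inputs))


def elb_nor2_output(a1: int, a2: int) -> int:
    """Functional result, taken from the NOR fed with the true inputs."""
    return 1 - (a1 | a2)


if __name__ == "__main__":
    for a1, a2 in product((0, 1), repeat=2):
        print(a1, a2, elb_nor2_output(a1, a2), elb_nor2_leakage(a1, a2))
```

For every input, the inverters see two 0s and two 1s and the NOR cells see each of 00, 01, 10, and 11 exactly once, so the total is the same for any table; the result therefore does not depend on the placeholder numbers.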
When focusing only on balancing the D and Q pins of the flip-flops for the moment, we can achieve balanced input-output combinations using the circuit below. Let's look at some values again. If D was 0 in the previous cycle and D is 0 again in this cycle, then all possible D and Q combinations from 00 to 11 are represented. And again, the same is true for all other possibilities; see here, here, and here. Up to now, we have ignored the data-dependent leakage of the three-input XNOR gate and the two inverters. In order to make them independent of the applied signal values, we have to change the circuit like this, where each ELB NOR is one of the balanced NOR gates constructed previously. Clearly, replacing each flip-flop in a circuit by this huge structure leads to a very significant area overhead. However, assuming the simplified model of table-based leakage currents, we are able to construct arbitrary circuits with a constant, data-independent leakage current. We have also considered a masking scheme to combine with the hiding countermeasures detailed before. In fact, we have chosen a simple threshold implementation of the serialized PRESENT-80 cipher. This implementation uses three shares and a decomposed S-box to offer first-order side-channel security. In the following, we quickly present the target device we have designed and the measurement setup for the practical experiments. Here you can see the 28 nanometer CMOS chip we have developed for this investigation. On the left, the layout is presented, and on the right, a microscope photograph of the manufactured and bonded die of the ASIC is shown. Each side of the chip is only 1.3 millimeters long. For those of you wondering what CASA is: this is the name of the Cluster of Excellence at Ruhr University Bochum through which this research project has been funded over the past years.
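The D/Q balancing of the four flip-flops can be modeled behaviorally. The following Python sketch is a reconstruction consistent with the components mentioned (one three-input XNOR and two inverters), but not necessarily identical to the published circuit: flip-flop 0 stores the real data bit, flip-flop 1 its complement, and flip-flops 2 and 3 hold complementary values whose next D inputs are derived from the current outputs. The reset state `[0, 1, 0, 1]` and all names are assumptions, and the leakage of the XNOR and inverters themselves is ignored, as in the first step described above.

```python
from itertools import product

ALL_DQ_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def xnor3(a: int, b: int, c: int) -> int:
    return 1 - (a ^ b ^ c)


class BalancedDffModel:
    """Behavioral model of four D flip-flops with balanced (D, Q) pairs.

    Invariants (ASSUMED reset state [0, 1, 0, 1]): q[1] == 1 - q[0] and
    q[3] == 1 - q[2]. Flip-flop 0 carries the actual data.
    """

    def __init__(self) -> None:
        self.q = [0, 1, 0, 1]

    def d_inputs(self, d: int) -> list:
        q0, q2 = self.q[0], self.q[2]
        d2 = xnor3(d, q0, q2)          # three-input XNOR
        return [d, 1 - d, d2, 1 - d2]  # two inverters provide 1 - d and 1 - d2

    def dq_pairs(self, d: int) -> list:
        """Sorted (D, Q) combinations present at the four flip-flops."""
        return sorted(zip(self.d_inputs(d), self.q))

    def clock(self, d: int) -> None:
        self.q = self.d_inputs(d)


def check_all_sequences(length: int) -> bool:
    """Exhaustively verify balancing for every data sequence of the given length."""
    for sequence in product((0, 1), repeat=length):
        ff = BalancedDffModel()
        for d in sequence:
            if ff.dq_pairs(d) != ALL_DQ_PAIRS:
                return False
            ff.clock(d)
            if ff.q[0] != d:  # the real flip-flop must still store the data
                return False
    return True


if __name__ == "__main__":
    print(check_all_sequences(8))
```

The XNOR makes D2 the complement of D exactly when Q2 equals Q0: flip-flops 0 and 1 already cover (D, Q0) and its full complement, so flip-flops 2 and 3 fill in the two remaining combinations.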
These 11 different PRESENT cores are implemented on the ASIC, and their post-layout area consumption is listed in the table. Additionally, we have calculated the overhead factor of each countermeasure in comparison to the unprotected circuit. While random start index shuffling is comparably cheap to apply, the balancing techniques already cause an overhead factor of 4 to 8. Combined with the threshold implementation, which by itself causes about a threefold area increase, exhaustive logic balancing comes at an overhead factor of about 23. Of course, this is very significant, but the resulting 60,000 gate equivalents might still be reasonable for a well-protected cryptographic primitive in some applications. This illustration shows the experimental setup we have used to analyze the cipher implementations on our ASIC. The chip sits on a board that is placed inside a climate chamber to increase its working temperature to 90 degrees Celsius. The ASIC is powered through a source measure unit, which supplies an increased voltage and measures the current drawn by the ASIC while the cipher operation is stopped at the end of the first round for about 100 milliseconds. Here are two pictures of the source measure unit and the ASIC sitting on our custom measurement board. I don't have time to show all practical results included in the full paper, but here are the most important ones. In the first step, we performed leakage assessment in a fixed-versus-fixed manner. The left side of the slide shows the results for the unprotected circuit on top, the shuffled circuit in the middle, and the ELB circuit at the bottom. The right side shows the results for the same hiding countermeasures in the same order, but combined with the threshold implementation. For each experiment, a histogram of the two fixed groups is shown, in addition to t-test and χ² test results.
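Fixed-versus-fixed assessment of this kind is commonly based on Welch's t-test, optionally applied to preprocessed samples to detect higher-order leakage. The Python sketch below shows a univariate version with the widely used |t| > 4.5 threshold; it is a generic illustration on synthetic Gaussian data, not the evaluation code or data of the paper, and it leaves out the χ² test. Order 1 uses the raw samples, order 2 the squared deviations from each group's mean, and orders of 3 or more the standardized deviations raised to that power.

```python
import math
import random
import statistics


def preprocess(samples: list, order: int) -> list:
    """Per-group preprocessing for a univariate order-d t-test."""
    if order == 1:
        return list(samples)
    mu = statistics.fmean(samples)
    if order == 2:
        return [(x - mu) ** 2 for x in samples]
    sigma = statistics.pstdev(samples)
    return [((x - mu) / sigma) ** order for x in samples]


def welch_t(group_a: list, group_b: list) -> float:
    """Welch's t-statistic for two independent groups."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))


def detects_leakage(group_a: list, group_b: list, order: int = 1,
                    threshold: float = 4.5) -> bool:
    t = welch_t(preprocess(group_a, order), preprocess(group_b, order))
    return abs(t) > threshold


def synthetic_groups(mean_a: float, mean_b: float, n: int = 2000,
                     noise: float = 1.0, seed: int = 1):
    """Two groups of synthetic static-current samples (HYPOTHETICAL data)."""
    rng = random.Random(seed)
    group_a = [rng.gauss(mean_a, noise) for _ in range(n)]
    group_b = [rng.gauss(mean_b, noise) for _ in range(n)]
    return group_a, group_b


if __name__ == "__main__":
    print(detects_leakage(*synthetic_groups(100.0, 100.5)))  # different means
    print(detects_leakage(*synthetic_groups(100.0, 100.0)))  # same distribution
```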
It is obvious that the unmasked circuits all show detectable first-order leakage. Although the two groups are harder to distinguish for the ELB measurements, it is still very much possible. The masked variants show neither first- nor second-order leakage. It has already been shown in previous works that three-share threshold implementations are mostly vulnerable in the third order with respect to static power adversaries. Indeed, the masked variant without any hiding countermeasure and the shuffled TI show significant detectable leakage in the third order. However, the two fixed groups measured for the combination of ELB and TI seem to be indistinguishable using half a million measurements. After the leakage assessment, we also performed key recovery attacks in order to determine how many measurements are required to reliably extract key information. This number of observations is listed as data complexity, or DC, here. Clearly, the most effective combined countermeasure is also the most expensive one, namely exhaustive logic balancing applied to a threshold implementation. The corresponding attack requires about 3 million traces, which took more than 90 hours of non-stop measurements to acquire. In addition to the absolute DC values, we have also listed the DC per gate equivalent here to judge cost effectiveness. The most cost-effective combination is actually the shuffled threshold implementation, which is about six times cheaper than the ELB threshold implementation. The table also makes clear that neither hiding nor masking alone provides a sufficient security level against this kind of attacker, and that combined countermeasures are certainly required. To conclude, we have seen once again that the standby power of CMOS chips reveals secrets to potential adversaries. We have provided a first comprehensive comparison of countermeasures against static power attacks, and we did so with a dedicated test chip in an advanced CMOS technology generation.
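Why third-order leakage is the expected weak spot of a three-share scheme can be motivated with Boolean masking alone. The small Python sketch below is a generic illustration of 3-share Boolean masking of a single bit, not the paper's PRESENT threshold implementation: it enumerates all sharings and confirms that any one or two shares are distributed independently of the secret, while all three together determine it. In an idealized model, a static power adversary therefore has to combine information about all three shares, which shows up as a third-order statistical moment.

```python
from collections import Counter
from itertools import combinations, product


def share_distribution(secret: int, subset: tuple) -> Counter:
    """Distribution of the shares at the indices in `subset`, over all
    sharings (x1, x2, x3) with x1 ^ x2 ^ x3 == secret and x1, x2 uniform."""
    counts: Counter = Counter()
    for x1, x2 in product((0, 1), repeat=2):
        shares = (x1, x2, secret ^ x1 ^ x2)
        counts[tuple(shares[i] for i in subset)] += 1
    return counts


def subset_reveals_secret(subset: tuple) -> bool:
    """True if observing exactly these shares gives information on the secret."""
    return share_distribution(0, subset) != share_distribution(1, subset)


if __name__ == "__main__":
    for k in (1, 2, 3):
        for subset in combinations(range(3), k):
            print(subset, subset_reveals_secret(subset))
```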
By doing this, we have learned that neither hiding nor masking alone seems to protect sufficiently against this threat, but also that strong protection can be achieved using combined countermeasures. However, those typically come at a significant price in terms of area overhead. Thank you very much for your attention. If there are any questions, feel free to ask them during the live session at CHES 2021 on September 17th. See you there!