Welcome, everybody, to Eurocrypt 2021! My name is Aleksei Udovenko, and I am going to present joint work with Alex Biryukov on dummy shuffling against algebraic attacks in white-box implementations. Enjoy! I will start with a brief introduction to white-box cryptography, followed by gray-box algebraic attacks and algebraic security, which are the main topics of this work. The main part of the talk is about shuffling and how it should or should not be used to protect against algebraic attacks. Let's start!

In the white-box model, the adversary has full access to the cryptographic implementation, typically of a symmetric-key primitive. The main security goal is to prevent key extraction. There are possible additional requirements, such as one-wayness, incompressibility, traceability and others. We focus on the most challenging direction in white-box cryptography, which is about implementations of existing primitives, such as the AES block cipher. This is in contrast with the idea of designing dedicated white-box ciphers. White-box implementations are closely related to the more general framework of cryptographic obfuscation. Indeed, any generic white-box technique would likely be a useful tool for general cryptographic obfuscation.

There are two camps around white-box cryptography. Industry has a strong need for practical white-box implementations for many applications, such as DRM or securing mobile payments. Industry does white-box, but relies on secret designs. In academia, theoretical obfuscation, such as indistinguishability obfuscation, was very recently constructed on top of reasonable hardness assumptions, but it is far from being feasible to implement. There were attempts at practical designs, starting from the seminal work by Chow et al. in 2002 and followed by many repair attempts, but all of them were broken by very efficient attacks.
Even worse, a work from CHES 2016 showed that most of them are vulnerable to classic side-channel attacks, which can usually be performed in an automated way. This created a new line of research aiming to develop white-box countermeasures against very generic attacks, such as power analysis attacks or fault attacks. Our work follows this direction and dives deeper towards white-box-specific generic attacks, more precisely algebraic attacks. I would also like to point out that obfuscation in general has many more applications than just obfuscating AES, and looking at white-box techniques from this viewpoint is much more fruitful and promising.

To put the work in the high-level landscape of obfuscation, consider this graph, which relates implementation complexity to some sort of security assurance. A reference or an optimized implementation is very efficient, but provides no security against white-box adversaries. Classic white-box and code obfuscation techniques can provide security against some attacks at the cost of somewhat heavier implementations. Finally, theoretical obfuscation methods such as iO are fully secure based on some reasonable hardness assumptions, but are far from being reachable in practice. Furthermore, the CHES 2016 work uncovered a barrier for practical white-box implementations, namely the classic gray-box attacks and their extensions applicable specifically in the white-box setting. This work is directed towards breaking this barrier at the cost of some increase in implementation complexity. While we do not claim to fully break the barrier, we provide a tool to protect against a very powerful class of attacks, algebraic attacks, which I am going to introduce briefly.

Let's start with the differential computation analysis attack, or DCA. DCA is basically an application of classic differential power analysis to white-box implementations. What is special in the white-box setting is that there is no measurement noise.
All computed bits can be recorded precisely. And the authors showed that most existing white-box implementations can be broken fully automatically. This somewhat puts the white-box state of the art behind the side-channel state of the art. This feels wrong, and it is natural to try to apply classic gray-box countermeasures to strengthen white-box implementations. The most popular and efficient protections against power analysis attacks are masking and shuffling. What happens if we apply them in the white-box setting?

Let's start with the masking countermeasure. Classic masking schemes are linear. Assume that we protect a sensitive function s. This is done by splitting it into t additive shares. For now, we focus on storing the shares and ignore how to perform operations on them. The problem with storing linear shares in the implementation in the clear is that there exists a linear combination of the intermediate values that is equal to the sensitive function.

Let me illustrate the attack. The adversary runs the implementation on some input and records all computed values, a computational trace. At the same time, the adversary computes the sensitive function. For example, in the case of AES, it can be one output bit of an S-box in the first round, which is a classical target in the side-channel setting. Typically, such a function depends only on a few key bits, which can be easily guessed. The process is repeated until a sufficient number of traces is collected. Now, the linear masking implies that there must exist a subset of columns that add up to the sensitive value. This can be expressed directly as a linear system of equations, namely a matrix-vector equation. After solving it, we obtain the positions of the shares. But more importantly, we also obtain a confirmation that the sensitive value s was indeed computed in shared form in the implementation. In the case of AES, it would confirm the right guess for a portion of the key.
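The linear attack just described can be tried out on a toy target. The following is a minimal sketch, not the implementation from the paper: the 4-bit S-box, the 16-bit trace layout, the share positions and the key value are all hypothetical, and the "implementation" simply hides three additive shares of one S-box output bit among random filler bits. For each key guess, the attacker asks whether some subset of trace columns sums to the predicted sensitive bit in every trace.

```python
import random

# Toy 4-bit S-box; the sensitive bit is the top output bit of SBOX[x ^ key].
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 16                   # recorded bits per execution (hypothetical)
SHARE_POS = (2, 7, 11)   # where the shares hide (unknown to the attacker)
SECRET_KEY = 0xA         # hypothetical 4-bit key chunk


def sensitive(x, key):
    return SBOX[x ^ key] >> 3


def trace(x, rng):
    """One execution: random filler bits plus three additive shares of s."""
    bits = [rng.getrandbits(1) for _ in range(N)]
    a, b, c = SHARE_POS
    bits[c] = sensitive(x, SECRET_KEY) ^ bits[a] ^ bits[b]
    return bits


def solve_gf2(rows, n):
    """Solve a linear system over GF(2).

    Each row is an int: bits 0..n-1 are coefficients, bit n is the
    right-hand side. Returns one solution as a bit list, or None.
    """
    pivots = {}  # column -> row whose lowest coefficient bit is that column
    for row in rows:
        for col in range(n):
            if not (row >> col) & 1:
                continue
            if col not in pivots:
                pivots[col] = row
                break
            row ^= pivots[col]
        else:
            if row:  # all coefficients cancelled but the right-hand side is 1
                return None
    sol = [0] * n
    for col in sorted(pivots, reverse=True):
        row = pivots[col]
        val = (row >> n) & 1
        for j in range(col + 1, n):
            if (row >> j) & 1:
                val ^= sol[j]
        sol[col] = val
    return sol


def attack(inputs, traces):
    """Return (key guess, share positions) for every guess that is solvable."""
    hits = []
    for guess in range(16):
        rows = []
        for x, t in zip(inputs, traces):
            row = sum(bit << i for i, bit in enumerate(t))
            rows.append(row | (sensitive(x, guess) << N))
        sol = solve_gf2(rows, N)
        if sol is not None:
            hits.append((guess, [i for i, v in enumerate(sol) if v]))
    return hits


rng = random.Random(2021)
inputs = [x for _ in range(4) for x in range(16)]  # 64 executions
traces = [trace(x, rng) for x in inputs]
hits = attack(inputs, traces)
print(hits)
```

With 64 traces and only 16 columns, a wrong guess is expected to yield an inconsistent system, while the right guess yields the indicator vector of the share positions. Adding more shares would change only the weight of that solution vector, not the cost of solving the system.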
What are the conclusions? First of all, higher-order masking would not help. The attack does not depend on the weight of the solution. Secondly, the attack is very generic and automatic. It only requires recording the traces and solving a linear system. What do we have? We wanted to protect against DCA, that is, correlation attacks, by applying the masking protection. But it turns out that it is completely vulnerable to this linear algebraic attack in the white-box setting. In the side-channel setting, this attack is prevented by the inherent measurement noise.

Before going into protections against algebraic attacks, I would first like to describe a few generalizations, which we should also take care of later. The first one, naturally, is increasing the degree of the attack, aiming to break nonlinear masking schemes, which I will talk about later. This can be done by expanding the set of columns by all products of pairs of the initial columns. We multiply the first and the second column, then the first and the third column, and so on. Now we solve the resulting linear system and obtain the recovered nonlinear masking. Here the solution has two components, v3 and v1v2, where, as we recall, v1v2 is the product of the first and the second column. This recovers the quadratic masking scheme v1 times v2 plus v3. While this attack is more about finding the locations of shares, in this case it also recovers the actual masking function. This means that the attacker does not even need to know the shape of the masking, only the degree bound. It follows that the attack is very generic. Generalization to higher degrees is straightforward. Note, however, that the increase in the number of columns requires an increase in the number of rows, which is the number of traces. As a result, the attack's complexity grows very fast with the degree of the attack. The takeaway is that higher-degree masking schemes can be attacked, but at a higher cost.
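The degree-2 generalization can be sketched in the same style. This is an illustrative toy under stated assumptions: the trace has six bits, bits 0, 1, 2 play the role of v1, v2, v3 with s = v1·v2 + v3, the other bits are random filler, and the list `values` stands for the sensitive bits the attacker predicts under the correct key guess. The only change from the linear attack is that the columns are expanded with all pairwise products before solving.

```python
import random
from itertools import combinations

N = 6  # recorded bits per execution (hypothetical)


def trace(s, rng):
    """Quadratic masking: trace bits 0, 1, 2 satisfy s = v[0]*v[1] ^ v[2]."""
    v = [rng.getrandbits(1) for _ in range(N)]
    v[2] = s ^ (v[0] & v[1])
    return v


def solve_gf2(rows, n):
    """Solve over GF(2); bits 0..n-1 of a row are coefficients, bit n the RHS."""
    pivots = {}
    for row in rows:
        for col in range(n):
            if not (row >> col) & 1:
                continue
            if col not in pivots:
                pivots[col] = row
                break
            row ^= pivots[col]
        else:
            if row:  # inconsistent equation 0 = 1
                return None
    sol = [0] * n
    for col in sorted(pivots, reverse=True):
        row = pivots[col]
        val = (row >> n) & 1
        for j in range(col + 1, n):
            if (row >> j) & 1:
                val ^= sol[j]
        sol[col] = val
    return sol


def attack(traces, values, degree):
    """Find monomials of trace bits (up to `degree`) that sum to the values."""
    monos = [idx for d in range(1, degree + 1)
             for idx in combinations(range(N), d)]
    rows = []
    for t, s in zip(traces, values):
        row = s << len(monos)
        for j, idx in enumerate(monos):
            if all(t[i] for i in idx):
                row |= 1 << j
        rows.append(row)
    sol = solve_gf2(rows, len(monos))
    if sol is None:
        return None
    return [monos[j] for j, bit in enumerate(sol) if bit]


rng = random.Random(7)
values = [rng.getrandbits(1) for _ in range(200)]  # predicted sensitive bits
traces = [trace(s, rng) for s in values]
linear = attack(traces, values, 1)
quadratic = attack(traces, values, 2)
print(linear, quadratic)
```

Here the degree-1 system has 6 columns and the degree-2 system has 21, which is why the sketch collects 200 traces. The recovered monomial list names both the share locations and the decoding function, without the attacker knowing the shape of the masking in advance.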
In practice, protection against degree 2 or maybe degree 3 attacks seems to be sufficient.

The second generalization is about allowing some noise in the system. It is not motivated by any existing countermeasure, but rather shows a particular requirement for a good protection. The idea is that an implementation might include some noise in the computations, so that the sensitive value is not always reconstructed correctly. But the errors should happen quite rarely. In this case, a simple approach would be to attempt the simple linear attack and, if it fails, to repeat it with a different set of traces. To clarify, the adversary does not see the error vector e, but simply repeats the attack again and again. At some point, we might get a system without any errors and complete the attack. The problem described is an instance of the so-called learning parity with noise, or LPN, problem. And there are more advanced attack techniques than just waiting for a system without errors. The takeaway is that good low-degree approximations of the sensitive function are sufficient for attacks, so a designer must ensure a sufficient weight of the error for all computable expressions.

The third generalization highlights the role of the sensitive value and its relation to the input in the protection. Assume that an implementation computes a function f, which is a product of the sensitive function s and some pseudo-random, unpredictable function r. The basic linear algebraic attack would fail, because when s equals 1, the function equals a random value, which is hard to predict. And this happens half of the time. However, we can observe that when s equals 0, f also equals 0. Can we exploit it? The idea is simple. We restrict the system to the subset of traces satisfying s equals 0. Then we solve the remaining system and find the solution. Note that this is a degree 1 attack, even though the expression is quadratic.
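The restriction trick from the third generalization can be sketched as follows. Everything concrete here is an assumption made for illustration: the toy S-box, the 8-bit trace, the position of r, and the choice to store f = s·r as three additive shares so that there is a nontrivial linear relation to find. For each key guess, the attacker keeps only the traces where the predicted s is 0 and looks for a nonzero set of columns that sums to zero on all of them.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 8                   # recorded bits per execution (hypothetical)
R_POS = 0               # position of the pseudo-random bit r
SHARE_POS = (1, 4, 6)   # additive shares of f = s * r
SECRET_KEY = 0xA        # hypothetical 4-bit key chunk


def sensitive(x, key):
    return SBOX[x ^ key] >> 3


def trace(x, rng):
    bits = [rng.getrandbits(1) for _ in range(N)]
    f = sensitive(x, SECRET_KEY) & bits[R_POS]
    a, b, c = SHARE_POS
    bits[c] = f ^ bits[a] ^ bits[b]
    return bits


def dependencies_gf2(columns):
    """Return sets of column indices whose XOR is the zero vector.

    Each column is an int used as a bit vector over the kept traces.
    """
    basis, deps = [], []  # basis holds (reduced vector, columns combined)
    for j, vec in enumerate(columns):
        combo = 1 << j
        for bvec, bcombo in basis:
            if vec ^ bvec < vec:  # bvec's top bit is set in vec: cancel it
                vec ^= bvec
                combo ^= bcombo
        if vec:
            basis.append((vec, combo))
        else:
            deps.append([i for i in range(len(columns)) if (combo >> i) & 1])
    return deps


def attack(inputs, traces):
    hits = []
    for guess in range(16):
        # Keep only the traces where the guessed sensitive bit is 0.
        kept = [t for x, t in zip(inputs, traces) if sensitive(x, guess) == 0]
        columns = [sum(t[j] << i for i, t in enumerate(kept))
                   for j in range(N)]
        deps = dependencies_gf2(columns)
        if deps:
            hits.append((guess, deps))
    return hits


rng = random.Random(5)
inputs = [x for _ in range(32) for x in range(16)]  # 512 executions
traces = [trace(x, rng) for x in inputs]
hits = attack(inputs, traces)
print(hits)
```

For a wrong guess, the kept subset contains traces where the true s is 1, so the shares sum to an unpredictable r and no relation is expected to survive. Only linear algebra on the original columns is used, which matches the remark that this is a degree 1 attack.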
The takeaway from this generalization is that the protection should not depend critically on sensitive functions. As we have seen, the algebraic attack is very powerful, especially with all of its generalizations.

Now I am going to describe the framework of protections against algebraic attacks. The framework of algebraic security was proposed by the authors of this work at Asiacrypt three years ago. The security model is defined in a setting similar to side-channel security models, such as probing security. The implementation is allowed to use randomness, which in the white-box setting has to be replaced by pseudo-randomness, and also includes encoding and decoding steps, which are not included in the analysis. To define the security requirement, consider any non-constant function in the linear span of the functions computed in the critical part of the implementation. These are functions of the input and the randomness. It is required that each such function has a sufficient error term on any fixed input. That is, for any fixed input, any non-constant function in the linear span must be sufficiently far from the constant functions 0 and 1. In other words, the functions should have a low bias or, equivalently, a large error. Of course, algebraic security of higher degrees can be defined by considering the higher-degree span of the functions instead of the linear span.

The main goal of the model is to achieve secure computations, given that simple encoding and decoding functions are allowed to be out of scope of the attack. In addition, in the real white-box setting, the pseudo-random generator will also be a part of the attack surface. These parts are yet to be included in the model. They cannot simply be included directly, since they contain the input and the output, which leak by definition. Of course, one could put the full implementation into the encoding part and use some encryption to pass the result through the critical part.
For the protection to make sense, the encoding and decoding parts should at least be independent of the implementation that is being protected, and depend instead on the input and output size or maybe the implementation size. Finally, I would like to note that from the theoretical point of view, this problem is not hard. For example, one can use fully homomorphic encryption as a form of masking. It would clearly avoid all low-degree algebraic attacks, and so the difficulty is to find something much more practical, with complexity comparable to classical masking schemes.

One natural countermeasure achieving algebraic security is nonlinear masking, where a sensitive value s can be decoded from shares by a nonlinear function f. At Asiacrypt 2018, we proposed a provably secure minimalist quadratic masking scheme. That is, the degree of f is 2, and it has the smallest possible secure quadratic expression. This year at CHES, Seker, Eisenbarth and Liskiewicz generalized the scheme in two ways. They replaced the quadratic term by a monomial of an arbitrary degree, and also split the linear term into an arbitrary number of linear shares. This allows achieving a combination of algebraic and correlation security more efficiently than by composing the respective masking schemes. However, they only proposed concrete algorithms and a proof for a degree 3 scheme, protecting against degree 2 algebraic attacks, which is likely more than enough in practice. In these works, the security proofs are quite involved, and it seems that reaching larger security degrees is quite hard.

We now move on to our contributions, applying the shuffling countermeasure to protect against algebraic attacks. Let's start with a basic shuffling example. Assume that an implementation computes the same function t times in parallel. A usual example is the AES block cipher, where the same S-box is computed 16 times in parallel in each round. The idea of shuffling is to randomize the order of computations.
The goal is to introduce extra noise against correlation attacks, which would also amplify the security of linear masking schemes. In practice, it can be done in many ways, depending on the computation model. The computations can be shuffled in time or in memory. In our formulation, we consider that the inputs are shuffled, then the functions are applied in parallel to all the inputs, and then the outputs are unshuffled back. Each such application is called a slot. We focus on the core of shuffling, the evaluation slots, as the critical part in the algebraic security model. It seems that shuffling is a complex nonlinear procedure, and so it should protect against algebraic attacks. In fact, here the shuffling information is not even included in the attack surface, and the right order of slots is not recoverable by any attack. But is it actually secure?

I will now show that basic shuffling does not provide reasonable protection against algebraic attacks. As usual, we focus on the critical part. We ignore the initial shuffling and the final unshuffling steps. We only look at the shuffled sensitive values. Note that there exists no function recovering a concrete single value, such as z1, simply because the information about the shuffling indices is not included in the critical part. However, there is still an algebraic leakage. The sum of all values is independent of the shuffling order. And it is also a linear function of the computed values. Therefore, an algebraic attack can target this sensitive function.

I would like to clarify the full setting, which may not be clear from the simplified figure on the slide. Of course, one could easily break such basic shuffling in an ad hoc way by somehow noticing that the same values are being shuffled in the same places.
However, the more general setting that the actual algebraic attacks cover allows these values to be scattered arbitrarily in the implementation and to be mixed by an arbitrary linear function with all other values. This only somewhat exaggerates what happens in a real implementation. The shares can be hidden in different places. All sorts of linear masking schemes can be used on top of shuffling. In this case, it is quite difficult to come up with a non-algebraic attack. So please keep this setting in mind. I will only illustrate minimal cases, but with algebraic attacks, more countermeasures are broken for free. That is why they are so problematic for designers.

Getting back to the leakage of the sum of values. More generally, all symmetric functions are leaked, which are by definition those that do not depend on the input order. Importantly, the attack ignores the implementation of shuffling and unshuffling. But is it exploitable in practice, or is it just a theoretical attack on the model? In AES, the sensitive value would depend on 16 key bytes, which clearly cannot be guessed.

Here we show a new kind of attack, the differential algebraic attack, which allows breaking basic shuffling at a minimal cost, even if it is further protected by some linear masking and mixed with randomness. Consider an AES instance with 16 S-boxes, which are shuffled, and their shuffled outputs are leaked. Again, this means that their sum is leaked in the clear. Now we execute the implementation on a related plaintext, obtained by injecting a difference in the first byte. Again, the sum of values is leaked, but only one of the values is different, which means that we can exclude the unknown terms by subtracting the sums. As a result, we obtain an expression that depends only on one key byte, and it can be attacked by the linear algebraic attack.
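The cancellation behind the differential algebraic attack can be checked on a scaled-down toy. This is a minimal sketch under loud assumptions: four 4-bit S-box slots instead of sixteen AES S-boxes, one output bit per slot, a hypothetical key, and an 8-bit trace in which the shuffled slot outputs sit at fixed positions among random filler bits. The attacker XORs the traces of two executions that differ only in the first nibble, and targets the XOR of the two predicted S-box bits, which depends on the first key nibble only.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
KEY = [0xA, 0x3, 0x7, 0xE]   # hypothetical secret, one nibble per slot
SLOT_POS = (1, 3, 4, 6)      # trace positions of the shuffled slot outputs
N = 8                        # recorded bits per execution (hypothetical)


def sbox_bit(x, k):
    return SBOX[x ^ k] >> 3


def trace(plaintext, rng):
    outs = [sbox_bit(x, k) for x, k in zip(plaintext, KEY)]
    rng.shuffle(outs)  # basic shuffling; the order is never recorded
    bits = [rng.getrandbits(1) for _ in range(N)]
    for pos, y in zip(SLOT_POS, outs):
        bits[pos] = y
    return bits


def solve_gf2(rows, n):
    """Solve over GF(2); bits 0..n-1 of a row are coefficients, bit n the RHS."""
    pivots = {}
    for row in rows:
        for col in range(n):
            if not (row >> col) & 1:
                continue
            if col not in pivots:
                pivots[col] = row
                break
            row ^= pivots[col]
        else:
            if row:  # inconsistent equation 0 = 1
                return None
    sol = [0] * n
    for col in sorted(pivots, reverse=True):
        row = pivots[col]
        val = (row >> n) & 1
        for j in range(col + 1, n):
            if (row >> j) & 1:
                val ^= sol[j]
        sol[col] = val
    return sol


def attack(rng):
    pairs = []
    for x0 in range(16):
        for delta in range(1, 16):
            rest = [rng.randrange(16) for _ in range(3)]
            t1 = trace([x0] + rest, rng)
            t2 = trace([x0 ^ delta] + rest, rng)
            xored = sum((a ^ b) << i for i, (a, b) in enumerate(zip(t1, t2)))
            pairs.append((x0, delta, xored))
    hits = []
    for guess in range(16):
        rows = [xored | ((sbox_bit(x0, guess) ^ sbox_bit(x0 ^ delta, guess)) << N)
                for x0, delta, xored in pairs]
        sol = solve_gf2(rows, N)
        if sol is not None:
            hits.append((guess, [i for i, v in enumerate(sol) if v]))
    return hits


hits = attack(random.Random(11))
print(hits)
```

The sketch varies the injected difference across pairs, because with a single fixed difference the guesses k and k XOR difference would predict the same target bits. The two executions in a pair use independent shuffles, which does not matter since the sum over all slots is symmetric.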
This attack can be implemented simply by tracing the implementation on two related plaintexts, adding (XORing) the respective traces and the respective sensitive values, collecting enough such traces, and running the simple linear algebraic attack. Note that we shifted from the typical random-plaintext attack to a chosen-plaintext attack. But it totally makes sense in the white-box setting, where the adversary can encrypt arbitrary plaintexts. One might say: "Another generalization! How can we hope to protect against all of them?" Luckily, in the paper we show that the algebraic security model already prevents differential algebraic attacks. This highlights the strength and the universality of the model. In fact, it is not difficult to see the proof. The randomness in the model should be independent across multiple executions. Therefore, when noisy functions are added, the error can only grow. And adding the traces only makes it worse for the attacker.

We are now ready to define our main hero, dummy shuffling. Following the basic shuffling described before, dummy shuffling simply adds new dummy inputs, which are chosen independently and uniformly at random. We call the corresponding evaluation slots dummy slots, while the original slots are called main slots. In the algebraic security model, we consider the initial shuffling and the dummy generation part as the encoding step. Note that it is independent of the main computation C. The parallel main computations are considered as the critical part to be secured against algebraic attacks. The decoding step is out of scope, again because including it would require extending the model, as otherwise the outputs y would simply leak directly.

I will now describe the security of dummy shuffling against the linear algebraic attack. Assume we are protecting an implementation C. First we have to compute its property e1, which is the minimum error of a non-constant function in the linear span of C.
Then we can show that the dummy shuffling scheme is secure against degree 1 attacks, with the error lower bounded by the parameter e1 times the ratio of dummy slots to the total number of slots.

Now we have two problems. First, computing e1 can be very hard, as it is similar to computing the minimum weight of a linear code, which is a hard problem. Second, what if an implementation has a very small parameter e1? Can we not protect it? A good protection should be applicable to any circuit. Now we show how to deal with these problems. The idea is to sort of refresh the nonlinear gates, such as AND or OR, in the implementation. This is done by adding a uniformly random bit in a dummy slot and adding zero in the main slot. In order to achieve this, the respective random or zero bit must be prepared in the encoding step of shuffling and then passed as an input to the slot. Note that each gate receives a new independent random bit. In other words, a main slot would accept a real input padded with zeros, while a dummy slot would accept a random input padded with more random bits. Note that this is not a security problem, since the large number of zeros cannot be detected algebraically by a low-degree function with a negligible error.

And we have the following theorem. The dummy shuffling scheme is secure against degree 1 attacks, with the error lower bound of almost 1 over 4, and we can get as close to 1 over 4 as we want by increasing the number of dummy slots. The theorem follows from the previous theorem and the fact that the refreshed circuits have this special parameter e1 lower bounded by 1 over 4.

We are now ready to describe our main result. It turns out that dummy shuffling with refreshed implementations actually provides higher-degree security. More precisely, it provides security against algebraic attacks of degree up to the number of dummy slots.
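As a small worked example of the degree 1 bound as stated in this talk (the parameter e1 times the fraction of dummy slots among all slots), the following sketch evaluates it for a refreshed circuit, assuming e1 equal to 1 over 4 and a single main slot. The function name and the chosen slot counts are illustrative only, and the exact statement in the paper may carry additional conditions.

```python
from fractions import Fraction


def degree1_error_bound(e1, dummy_slots, main_slots=1):
    """Lower bound quoted in the talk: e1 * dummy / (dummy + main)."""
    return e1 * Fraction(dummy_slots, dummy_slots + main_slots)


E1_REFRESHED = Fraction(1, 4)  # bound on e1 for refreshed circuits (from the talk)

bounds = {d: degree1_error_bound(E1_REFRESHED, d) for d in (1, 3, 15, 255)}
for d, bound in bounds.items():
    print(d, bound, float(bound))
```

With one main slot, 1, 3, 15 and 255 dummy slots give 1/8, 3/16, 15/64 and 255/1024 respectively, which shows how the bound approaches 1 over 4 as dummy slots are added.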
Of course, for each degree, the error bound is different, and it gets close to 1 over 2 to the 2d as the number of dummy slots increases. Here we only consider a single main slot, as it is simpler to analyze. This is the most generic case and is applicable to any initial implementation. We simply consider the full implementation as a single slot. A final remark is that the error bound, 1 over 2 to the 2d, is heuristically optimal, at least in the Boolean circuit computation model. That is because the first few nonlinear gates would take uniformly random inputs and so have error 1 over 4. And taking a product of d such functions gives the error 1 over 2 to the 2d. On the other hand, this may be only a limitation of the algebraic security model, which requires all functions to have a sufficient error, even those that are computed from pure randomness and are independent of the input. Whether the achieved error bound is sufficient to prevent LPN-based attacks is yet to be understood, as there has been no study evaluating the feasibility range of such attacks in the white-box setting.

I am now going to briefly compare the resulting protection with previous works, namely the nonlinear masking schemes. The dummy shuffling scheme significantly improves both the implementation complexity and the error lower bound. More importantly, it provides protection against algebraic attacks of an arbitrary fixed degree at a very reasonable cost, which we find quite surprising, because the nonlinear masking schemes induce a rather significant overhead. Nonetheless, we believe that every provable security tool enriches the designer's toolkit and is worth studying in depth. In particular, dummy shuffling itself is very vulnerable to fault attacks, while masking schemes have at least some resistance.
Please find more interesting results in the paper, including an interesting proof-of-concept construction used in one of the winning challenges of the WhibOx 2019 competition. Thank you for your attention.