Thanks for the introduction. I'm Junwei Wang, a PhD student at CryptoExperts. I'm going to talk about how to reveal the secrets of an obscure white-box implementation. This work was done with my colleagues Louis Goubin, Pascal Paillier and Matthieu Rivain. This talk consists of four parts. First, we give a brief introduction to white-box crypto. Then we give an overview of the white-box contest, and then we have a look at the winning implementation of this contest, which is the obscure implementation. In the last part, we reveal the secrets inside it. So, let's start with the introduction. White-box crypto aims to prevent key extraction from software implementations of cryptographic primitives. In this context, a malicious attacker may entirely control the execution environment. Specifically, he can pick inputs for the implementation and run it as many times as he wants. He can record all the execution information, such as memory accesses, values and addresses. He can also tamper with the implementation, for example by injecting faults and altering the control flow. Unfortunately, in the current state of the art of white-box crypto, there is very little theoretical discussion and no provably secure construction. Besides, all the practical constructions are heuristic, and they are vulnerable to some generic attacks known from recent publications. One of these generic attacks was presented at the last RWC. White-box crypto was initially designed for digital rights management applications, but in recent years, mobile payment applications have drawn more and more attention. Given the lack of academically verified constructions and the ever-growing market, users tend to adopt home-made solutions. Here, home-made means their security mostly relies on the secrecy of the design. It is therefore interesting to investigate the security level we can achieve in practice in this context.
In the middle of 2017, the white-box competition was launched by ECRYPT-CSA as a follow-up to the white-box workshop held one year before, affiliated with CRYPTO and CHES, and also as the CHES 2017 capture-the-flag challenge. The motivation was to confront breakers and designers in this secret-design paradigm. The idea was to invite white-box designers to submit challenges implementing AES-128, and to invite breakers to recover the hidden keys. Participants were not required to disclose their identity, nor their underlying design or breaking techniques. The result is that all 94 submissions were eventually broken, with nearly 900 individual breaks, and most of them survived for less than 24 hours. The scoreboard of the challenge is ranked by the surviving time of the submissions. The winning implementation is number 777. It was designed by CryptoLux and broken only by Team CryptoExperts. It survived for four weeks, 2.3 times as long as the second one. From the scoreboard, we can see CryptoLux is not only good at designing; they are also very good breakers: three of the top five were first broken by them. So congratulations to CryptoLux, namely Biryukov and Udovenko, who won the design award, while the breaking prize went to Team CryptoExperts, which is made up of the authors of this talk. So let's have a look at the winning implementation. Our analysis shows that this implementation is protected by at least three layers of protection. The innermost layer is a Boolean circuit, probably with some error detection mechanism. In the middle, there is a bitsliced program with plenty of AES instances running in parallel. And the outermost layer consists of some classic engineering obfuscation techniques; for example, there is a virtual machine inside the program to cover up the underlying implementation details. Together, these three layers of protection make the source code look really highly obscure.
The code is about 28 MB and has 2,300 lines. Twelve global variables are defined in it, and two of them take most of the space: one is a global table used for intermediate computation states, and the other is the program bytecode run by the virtual machine. More than 1,000 functions are defined. They are very simple, but obfuscated, like the ones shown here. Actually, we found that only 200 of them are useful, and they are indexed by an array. Further investigation shows that these are duplicates of only 20 different functions. They can be divided into several categories, such as bitwise operations, table lookups and control-flow primitives. So now let's have a look at how we break it, in five steps. First, we perform human reverse engineering in three sub-steps: readability processing, where we rename all the functions and variables in an understandable way and clean up all the redundancies; then we remove the virtual machine to obtain a bitwise program; and we further simplify the bitwise program into a Boolean circuit. After that, we transform the Boolean circuit into static single assignment form and minimize it. Then we perform a data dependency analysis to extract some structural leakage of this implementation. Finally, we recover the key with the help of some algebraic analysis. Since there is not much technical detail in the readability processing, we skip it in the following discussion. Okay, now let's look at the virtual machine. As mentioned, there is a virtual machine inside this implementation. Basically, it has an interpreter that loads instructions from the program bytecode and runs them. We decided to simulate the virtual machine: while executing it, instead of invoking the interpreted instructions, we print them out. This way we obtain the bitwise program. It consists of plenty of loops; these loops iterate 64 times, and each of them contains a sequence of bitwise operations.
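The devirtualization idea (simulate the interpreter, but print each instruction instead of executing it) can be sketched as a toy in Python. This is a minimal illustration, assuming a hypothetical bytecode format of `(opcode, dst, src1, src2)` tuples and a hypothetical handler table `HANDLERS`; the real challenge's bytecode layout and its 20 or so distinct handlers are not reproduced here.

```python
import operator

# Hypothetical handler table indexed by opcode. The real program indexes
# about 200 handler functions through an array; these three are illustrative.
HANDLERS = [
    ("^", operator.xor),
    ("&", operator.and_),
    ("|", operator.or_),
]


def interpret(bytecode, table):
    """Execute the toy bytecode on `table`, as the VM interpreter would."""
    for opcode, dst, src1, src2 in bytecode:
        _, func = HANDLERS[opcode]
        table[dst] = func(table[src1], table[src2])
    return table


def devirtualize(bytecode):
    """Walk the same bytecode, but emit each instruction as text instead of running it."""
    program = []
    for opcode, dst, src1, src2 in bytecode:
        symbol, _ = HANDLERS[opcode]
        program.append(f"T[{dst}] = T[{src1}] {symbol} T[{src2}]")
    return program


if __name__ == "__main__":
    bytecode = [(0, 2, 0, 1), (1, 3, 2, 0)]
    for line in devirtualize(bytecode):
        print(line)
    print(interpret(bytecode, [0b1100, 0b1010, 0, 0]))
```

The printed lines form a straight-line bitwise program over the table `T`; running the interpreter on the same bytecode is a cheap way to check that the printed program computes the same values.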
So, in order to understand how this bitwise program works, we need to know how the global table is used. Basically, we have a global table of 2^18 elements, each 64 bits long. It can also be viewed as a two-dimensional array of 64 rows and 4,096 columns. Now consider one of the loops. Basically, it cycles 64 times, as mentioned, and it has a sequence of bitwise operations. The operands are taken from the table and the results are put back into it. For example, in the highlighted instruction, two variables from the blue and the green cells are taken, XORed, and put back into the orange cell. The iteration works as follows. When l equals one, it uses these locations. When l equals two, it uses other locations in the corresponding columns, and each location moves by a constant distance from one iteration to the next. Be careful with the green column: when it reaches the top of the column, it cycles back to the bottom. The same happens when we move l from two to three, and so on. The loop iterates 64 times, and all 64 locations in these three columns are used. The same holds for all the other instructions. Generally, if a location is used in the initial iteration, all 64 locations in its column will have been used by the end of the loop, in a predefined order. The other columns are used in a similar way, but possibly starting from a different row. Note that not every column is necessarily used in a bitwise loop, and a column is not necessarily used only for reading or only for writing; it can be used for both. For example, in this bitwise loop, the column is used as an operand in the j-th instruction and as the place to store the result in the i-th instruction. We call these loops memory-overlapping, and we noticed that these loops only swap values inside the column.
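The addressing pattern described above can be modeled as follows. This is a sketch under stated assumptions: the flat 2^18-word table is assumed to be stored row-major as 64 rows × 4,096 columns, and each operand is described by a hypothetical `Loc(col, start_row)` whose row advances by one per iteration modulo 64 (the wrap-around of the green column). The actual layout and the direction of travel in the challenge may differ.

```python
import operator
from dataclasses import dataclass

ROWS, COLS = 64, 4096          # 64 * 4096 == 2**18 table entries
MASK64 = (1 << 64) - 1         # each entry is a 64-bit word


def idx(row, col):
    """Flat index of (row, col), assuming a row-major layout (hypothetical)."""
    return row * COLS + col


@dataclass(frozen=True)
class Loc:
    """An operand location: a fixed column and the row used at iteration 0."""
    col: int
    start_row: int

    def row_at(self, step):
        # Move one row per iteration, wrapping around after the last row.
        return (self.start_row + step) % ROWS


def run_bitwise_loop(table, body):
    """Run one 64-iteration bitwise loop; body is a list of (op, dst, src1, src2)."""
    for step in range(ROWS):
        for op, dst, src1, src2 in body:
            a = table[idx(src1.row_at(step), src1.col)]
            b = table[idx(src2.row_at(step), src2.col)]
            table[idx(dst.row_at(step), dst.col)] = op(a, b) & MASK64


if __name__ == "__main__":
    table = [0] * (ROWS * COLS)
    for r in range(ROWS):
        table[idx(r, 0)] = r
        table[idx(r, 1)] = 7 * r
    # dst = column 2, operands = columns 0 and 1; column 1 starts at the last row
    run_bitwise_loop(table, [(operator.xor, Loc(2, 0), Loc(0, 0), Loc(1, 63))])
    print([table[idx(r, 2)] for r in range(4)])
```

When a destination column is also a source column, the order of the iterations starts to matter; that is exactly what distinguishes the memory-overlapping loops from the others.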
And we realized that their existence doesn't affect the output, which means they can be removed entirely. After all this, we still have a sequence of bitwise loops, but none of them overlap anymore. The beginning is a 64 × 64 bitsliced program, where the first 64 comes from the number of iterations and the second 64 is the word length of the table. Right before the end of the program, there is a bit-combination procedure that takes all the outputs of the bitsliced program and combines the 64 bits into Boolean values. And the end of the program is a small Boolean circuit, probably some error detection mechanism. All these signs indicate that there are 64 × 64 independent AES instances running in parallel. We found that only some of them are real and identical; the rest are implemented with fake hard-coded keys and are pairwise distinct. We select one of the real instances and obtain a Boolean circuit with about 800,000 gates. We verified that this Boolean circuit is functionally equivalent to the original program. Now we transform the Boolean circuit into static single assignment form, which means each intermediate variable is assigned only once and every access to it comes after its assignment. Then we try to minimize the circuit in several respects, since it's super large. Basically, we observe an intermediate variable over many, many executions and check whether it's constant. If so, we consider it a constant variable, replace its occurrences with this constant value and propagate the constant. We also detect whether two intermediate variables are equal to each other over many executions. If so, we consider them duplicates and keep only one copy. Besides, we flip an intermediate variable and compare the output of the program with that of a normal execution.
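The constant and duplicate detection can be sketched on a toy SSA circuit. One convenient trick (our choice for this sketch, not necessarily the authors') is to pack N random executions into one N-bit Python integer, so that a single evaluation of the bitwise circuit runs all N executions at once. The gate format `(dst, op, src1, src2)` and the toy circuit are hypothetical. As in the talk, the result is heuristic: a variable flagged as constant or duplicate has only been observed to behave that way over the sampled executions.

```python
import random


def evaluate(circuit, inputs, n_exec, rng):
    """Evaluate an SSA circuit on n_exec random executions packed into ints."""
    full = (1 << n_exec) - 1
    values = {name: rng.getrandbits(n_exec) for name in inputs}
    for dst, op, src1, src2 in circuit:
        if dst in values:
            raise ValueError(f"{dst} assigned twice: circuit is not in SSA form")
        if op == "xor":
            values[dst] = values[src1] ^ values[src2]
        elif op == "and":
            values[dst] = values[src1] & values[src2]
        elif op == "or":
            values[dst] = values[src1] | values[src2]
        elif op == "not":
            values[dst] = values[src1] ^ full
        else:
            raise ValueError(f"unknown gate {op}")
    return values


def find_constants_and_duplicates(values, inputs, n_exec):
    """Flag variables that look constant, and variables equal to an earlier one."""
    full = (1 << n_exec) - 1
    constants, duplicates, first_seen = {}, {}, {}
    for name, trace in values.items():   # dict order follows assignment order
        if name not in inputs and trace in (0, full):
            constants[name] = int(trace == full)
        elif trace in first_seen:
            duplicates[name] = first_seen[trace]
        else:
            first_seen[trace] = name
    return constants, duplicates


# Hypothetical toy circuit in SSA form.
TOY = [
    ("a", "xor", "x", "y"),
    ("b", "xor", "y", "x"),     # same function as a
    ("c", "xor", "a", "b"),     # always 0
    ("d", "and", "x", "y"),
    ("out", "xor", "d", "c"),   # equals d once c is known to be 0
]

if __name__ == "__main__":
    vals = evaluate(TOY, ("x", "y"), 64, random.Random(1))
    print(find_constants_and_duplicates(vals, ("x", "y"), 64))
```

Here `c` is reported as the constant 0, `b` as a copy of `a`, and `out` as a copy of `d`, which is what constant propagation followed by deduplication would give.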
If the outputs always match over many executions, we consider the variable to be used for randomization, and it can be replaced by a constant value such as zero. After several rounds of detection and removal, the size of the circuit is halved. Then we decided to do a data dependency analysis of the minimized circuit. One way to do it is to plot its data dependency graph. In a data dependency graph, a vertex stands for a variable, and a directed edge means there is a data dependency between two variables; namely, the ending vertex depends on the starting vertex. However, it's too costly for us to plot the data dependency graph of the whole circuit. We do manage to plot, for example, the first 20% of it, but it looks like a mess; we can't figure out anything from it. Then we reduce the size of the part we plot. We plot the first 10% of it, and some structural leakage starts to appear: we can see it's symmetric along the red line. Even though we don't know what this symmetry is, we know we are going in the right direction. Then we plot the first 5% of the circuit. Now everything is clear. The graph is plotted from the center outward. We can see there is a ball in the center and 16 branches coming out of it, divided into four groups by some other structures. If you are familiar with AES, you are probably thinking that the branches are the computations of the first-round S-boxes, and that these structures might be the MixColumns computations. However, we still don't understand the central ball. It might be used for initial pseudo-random generation that we could not remove. Fortunately, there exist some well-known clustering algorithms that help detect and extract the variables in a cluster of a very complex network. We apply such a clustering algorithm and extract all the variables of each individual S-box. We then identify the outgoing variables of each individual S-box.
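Once a clustering algorithm has produced the clusters (the talk does not name the algorithm, so none is reproduced here), identifying the outgoing variables of a cluster is a simple graph query. A minimal sketch, assuming the same hypothetical `(dst, op, src1, src2)` SSA gate format with `src2 = None` for unary gates: build the data dependency graph with an edge from each operand to the result, then keep the cluster variables that feed some variable outside the cluster.

```python
from collections import defaultdict


def dependency_graph(circuit):
    """Directed edges operand -> result: the result depends on the operand."""
    graph = defaultdict(set)
    for dst, _op, src1, src2 in circuit:
        for src in (src1, src2):
            if src is not None:
                graph[src].add(dst)
    return graph


def outgoing_variables(circuit, cluster):
    """Cluster variables used by at least one computation outside it, in SSA order."""
    graph = dependency_graph(circuit)
    return [
        dst for dst, *_ in circuit
        if dst in cluster and any(succ not in cluster for succ in graph[dst])
    ]


# Hypothetical toy circuit: two small "S-box" clusters feeding one mixing gate.
TOY = [
    ("t0", "xor", "p0", "k0"),
    ("t1", "and", "t0", "p1"),
    ("t2", "xor", "t1", "t0"),
    ("u0", "xor", "p2", "k1"),
    ("u1", "not", "u0", None),
    ("m", "xor", "t2", "u1"),
]

if __name__ == "__main__":
    print(outgoing_variables(TOY, {"t0", "t1", "t2"}))
    print(outgoing_variables(TOY, {"u0", "u1"}))
```

Variables such as `t0` and `t1` are consumed only inside their own cluster, so only `t2` and `u1` are reported as outgoing.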
Namely, we extract the variables that are used in further computations outside the S-box. If the clustering algorithm accurately extracts each cluster, then there must exist a deterministic decoding function such that, by applying this function to the outgoing variables, we get the real S-box output. We further make the hypothesis that this decoding function is linear, with some fixed coefficients. We record the computation traces and extract all the outgoing variables of each cluster over T executions with randomly selected plaintexts. For each key guess and each S-box output bit, we compute the hypothetical bit and build this linear system of equations. We claim that if our assumption is correct, the system will always be solvable for the correct key guess. And it works. For instance, we have a cluster with about 500 vertices inside, with 34 outgoing variables. We record the computation traces and extract the outgoing variables over 50 executions. The result shows there is no solution for any incorrect key guess, but for the correct key guess, we can solve the linear system for each S-box output bit. The solutions are listed below, where the variables are ordered by index. We can see that only five consecutive variables among these 34 outgoing variables are used, and the decoding function is actually just multiplying these five outgoing variables by this binary matrix. We repeat this procedure for the remaining clusters, and finally we can extract 14 of the 16 subkeys. For the other two, we just use brute force, which is super easy and fast. So at this point, we have recovered the key of the implementation. In summary, there is no realistic solution for white-box crypto, but the industrial demand keeps increasing, and this makes users tend to adopt home-made solutions, which is not the classical crypto way to solve problems.
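The key-recovery step can be sketched end to end on simulated data. This is only a model under explicit assumptions: the AES S-box is computed from its GF(2^8) definition; the outgoing variables of one S-box cluster are simulated by a hypothetical masked encoding (8 bits of `s ^ r`, 8 bits of the mask `r`, plus 4 noise bits), which a linear decoding function happens to invert; and solvability over GF(2) is checked by incremental Gaussian elimination. For each key-byte guess and each S-box output bit, we ask whether some fixed linear combination of the outgoing variables equals the predicted bit on every trace. The real circuit's encoding is of course not this one.

```python
import random


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def aes_sbox():
    """Build the AES S-box: multiplicative inverse followed by the affine map."""
    def inverse(x):
        result, base, exp = 1, x, 254   # x^254 = x^-1 for x != 0, and 0 for x == 0
        while exp:
            if exp & 1:
                result = gf_mul(result, base)
            base = gf_mul(base, base)
            exp >>= 1
        return result

    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    box = []
    for x in range(256):
        inv = inverse(x)
        box.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return box


SBOX = aes_sbox()


def simulate_traces(key, n_traces, rng):
    """Hypothetical cluster: 20 outgoing bits = (s ^ r) | r << 8 | noise << 16."""
    traces = []
    for _ in range(n_traces):
        p, r, noise = rng.randrange(256), rng.randrange(256), rng.randrange(16)
        s = SBOX[p ^ key]
        traces.append((p, (s ^ r) | (r << 8) | (noise << 16)))
    return traces


def gf2_solvable(rows):
    """rows: (coefficient bitmask, right-hand bit). True iff the GF(2) system is consistent."""
    pivots = {}
    for mask, rhs in rows:
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, rhs)
                break
            pivot_mask, pivot_rhs = pivots[top]
            mask ^= pivot_mask
            rhs ^= pivot_rhs
        else:
            if rhs:  # the row reduced to 0 = 1: inconsistent
                return False
    return True


def recover_key_byte(traces):
    """Key guesses for which every S-box output bit is a linear function of the outgoing bits."""
    return [
        guess for guess in range(256)
        if all(
            gf2_solvable((v, (SBOX[p ^ guess] >> bit) & 1) for p, v in traces)
            for bit in range(8)
        )
    ]


if __name__ == "__main__":
    traces = simulate_traces(0x2B, 50, random.Random(2017))
    print([hex(k) for k in recover_key_byte(traces)])
```

As in the talk, the number of traces (50) exceeds the number of unknown coefficients (20 here, 34 in the real cluster), so a wrong guess, whose predicted bits are a nonlinear function of the encoded value, almost surely yields an inconsistent system, while the correct guess is always solvable.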
The white-box contest was launched by ECRYPT-CSA to increase the openness of research on white-box crypto and to benchmark the state of the art of construction and attack techniques in this secret-design paradigm. Frustratingly, everything was broken in the end, but this could be only the tip of the iceberg. Our novel attack techniques break the winning challenge. This illustrates that resistance against the known generic attacks is not sufficient in practice. And our attack could be generalized to implementations with higher-order decoding functions. More attack techniques will be disclosed in a paper that will appear online. Thanks.

Hi. So my question is: there is a large area in cryptography that aims to construct obfuscation, which is a generalization of white-box cryptography. The state there is that we have many constructions and we also have attacks. But the attacks there are more constrained to one specific primitive, which is multilinear maps. I was wondering whether you are aware of any parallel between the cryptanalysis techniques you presented here and the attacks we have on the algebraic structure of these tools, because these two types of constructions seem very different, and obfuscation in general aims to give you some guarantees.

Okay, thanks. Indeed, we are aware of the parallel research on obfuscation, especially theoretical obfuscation, not the engineering kind. And indeed, one implementation in this contest used these techniques; it was among the submissions. But that submission had very weak security parameters, and it was a modification of the real construction because of the efficiency restrictions of the competition. So we knew how to attack that submission. Yeah, you know there is an implementation that uses iO, which is the fifth one.
But with, how to say, low-degree parameters, and it was broken by others. And I want to say how white-box crypto differs from obfuscation. Obfuscation generally aims to obfuscate a program, to make it unintelligible. But white-box crypto assumes the attacker knows the algorithm, and the target is to extract the key inside the implementation. It's different. There may be some relation between them, but they are different topics. Okay, thank you.

One question. Okay. So you look at this circuit and you see it's AES. Yeah. Does this mean that you could actually have a whole new area of block cipher design, with weird block ciphers that are not so structured, so that your analysis wouldn't be able to attack them? Sorry for my English. Can you repeat your question? Okay, so your attack works very much because you see the 16 things of AES. Yeah. So you could imagine other block ciphers which aren't as regular. Yeah. That would make it much, much harder for you. So this might open up different avenues for block cipher design. Yeah, it could be. Okay. Something to think about. Anyway. Okay. Thanks. Thanks.