Hello everyone. My name is Jing Yang. In this video, I'll present our work on vectorized linear approximations for attacks on SNOW 3G. This is joint work with Thomas Johansson and Alexander Maximov. Thomas and I are from Lund University, while Alexander is from Ericsson Research.

My presentation will follow this outline. I'd like to first give the motivation of our work, with some background about confidentiality and integrity protection in cellular networks. In LTE, there are three standardized algorithms for confidentiality and integrity protection, based on SNOW 3G, AES, and ZUC, and they are all used at the 128-bit security level. When we come to 5G, 3GPP has suggested using 256-bit keys and algorithms to resist quantum computing. One possible solution for the 5G confidentiality and integrity algorithms could be to reuse the existing algorithms, and the advantage is obvious, since existing circuits can be reused. However, the security of these algorithms under a 256-bit key should be carefully investigated, to make sure that they can actually provide 256-bit security. That is the motivation of our paper. We have carried out linear cryptanalysis of SNOW 3G. Specifically, we give a distinguishing attack with complexity 2^172 and a correlation attack with complexity 2^177, and these attacks indicate that SNOW 3G cannot achieve the full 256-bit security.

Next, I will give some details of the SNOW 3G cipher. SNOW 3G is a stream cipher with a linear part and a non-linear part. The linear part is a linear feedback shift register (LFSR) with 16 stages, while the non-linear part is a finite state machine (FSM) with three 32-bit registers. The LFSR is defined over the finite field of order 2^32, so every value in a cell is 32 bits, giving 512 bits in total. The feedback polynomial is given here; it has two special coefficients, alpha and the inverse of alpha, where alpha is the root of a polynomial over the finite field of order 2^8. For the update, in each clock every value in a cell is shifted to the next cell, while S15 is updated from S11, S2, and S0 according to the feedback polynomial. After that, S15, S5, and S0 are used to update the FSM and to generate the keystream block.

The FSM takes S15 and S5 as input, and its output is XORed with S0 to generate the keystream block Z_T. Specifically, Z_T is generated by adding R1 and S15 modulo 2^32, and then XORing with R2 and S0. Then the registers are updated: R1 is updated by adding R2 and the XOR sum of R3 and S5, while R2 and R3 are updated from R1 and R2 through two 32-bit S-box transforms, S1 and S2, respectively. The picture shows the construction of these transforms, which are composed of four byte-wise S-boxes followed by the MixColumn operation of AES. So S1 can be expressed as L1 times S_R, where S_R is the AES S-box and L1 is the AES MixColumn matrix, while S2 can be expressed as L2 times S_Q, where the S-box S_Q is derived from a Dickson polynomial and L2 is the AES MixColumn matrix as well.

Next, I will give some details of the linear cryptanalysis we performed on SNOW 3G. We first present the linear approximation of the FSM that we found; then we show how we use it to mount a distinguishing attack and a correlation attack.
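To make the keystream generation just described concrete, here is a minimal Python sketch of one keystream clock. It is an illustrative sketch, not a reference implementation: the helpers mul_alpha and div_alpha (multiplication by alpha and its inverse in GF(2^32)) and the two 32-bit transforms S1 and S2 are assumed to be supplied by the caller.

```python
MASK32 = 0xFFFFFFFF

def clock(lfsr, fsm, S1, S2, mul_alpha, div_alpha):
    """One keystream clock. lfsr: list of 16 32-bit words [s0..s15];
    fsm: list [R1, R2, R3]; the transforms and alpha-multiplications
    are passed in as callables (assumed, not defined here)."""
    R1, R2, R3 = fsm
    # FSM output F = (s15 + R1 mod 2^32) XOR R2; keystream z = F XOR s0
    F = ((lfsr[15] + R1) & MASK32) ^ R2
    z = F ^ lfsr[0]
    # FSM update: R1 <- R2 + (R3 XOR s5) mod 2^32; R2 <- S1(R1); R3 <- S2(R2)
    fsm[0] = (R2 + (R3 ^ lfsr[5])) & MASK32
    fsm[1] = S1(R1)
    fsm[2] = S2(R2)
    # LFSR feedback: s16 = alpha*s0 XOR s2 XOR alpha^-1 * s11, then shift
    s16 = mul_alpha(lfsr[0]) ^ lfsr[2] ^ div_alpha(lfsr[11])
    lfsr.pop(0)
    lfsr.append(s16)
    return z
```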
Before going into the details, I'd like to first give some basics of linear cryptanalysis of stream ciphers. The basic idea is to approximate non-linear components by linear ones and thereby derive some linear relationships. If such a linear relationship involves both the LFSR state and keystream symbols, it results in a correlation attack; if it involves only keystream symbols, it gives a distinguishing attack. For the linear approximation, if we express the output keystream symbol Z as the output of a non-linear function NF, we can approximate it as a linear function LF plus a noise E, which will be biased. Usually a linear approximation is binary, but in our paper we consider general vectorized linear approximations. Suppose the noise E has distribution D; we can use the SEI (Squared Euclidean Imbalance) to evaluate it. The SEI is defined as the sum of the squared differences between each entry of the distribution and the uniform probability, multiplied by the dimension of the distribution D. It is well known that the number of samples required to distinguish E from random is around 1/epsilon, where epsilon is the SEI, so the SEI of the noise E measures the quality of the linear approximation, and the key point of linear cryptanalysis is to find a good approximation with a large bias.

Recall that the FSM takes S15 and S5 as input, and the output is XORed with S0 to generate the keystream block Z_T. Thus we would like to explore linear expressions involving only S15, S5, S0, and Z over a time set I, such that the linear expression is biased. Here, the matrices applied to each term are the linear masking matrices. The SEI of this linear expression evaluates the quality of the approximation, and our goal is to find a good time set I and good masking matrices such that the SEI is as large as possible. Since the FSM has three registers, we consider three consecutive keystream blocks to cancel them out. If we know the values of R1, R2, R3, and S5 at clock T, we can get the values of R1, R2, and R3 at clocks T-1 and T+1, as shown on the slide. Correspondingly, we can get the keystream symbols at these three consecutive time instances, also shown on the slide. For the keystream block at clock T+1, we use the linear masking matrix L1^{-1}, the inverse of L1, which peels off the MixColumn layer of S1.

We then build a 24-bit linear approximation. Specifically, we combine the first byte of each of the three keystream blocks to build a 24-bit variable, denoted Z_T here. The expression on the right-hand side is divided into three parts. The last part is linear, denoted S_T; it is contributed by the LFSR. The first two parts are non-linear, denoted N1_T and N2_T; they are regarded as the noises, and they are independent of each other. So if we approximate Z_T by the linear part S_T, we get a noise N_T, which equals N1_T XOR N2_T, and we would like to know the distribution and the bias of this noise. Since N1_T and N2_T are independent, we can compute their distributions and biases independently. For N1 it is easy, since we only need to loop over the first bytes of R1 and S15, with complexity 2^16. For N2 it is more complicated: there are four 32-bit variables involved, R2, R3, S5, and S15, and it is impossible to get the distribution by looping over all of them, so we need a smarter way.
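As a concrete illustration of the SEI definition above, here is a small Python sketch applied to a toy biased distribution (the distribution itself is made up purely for the example):

```python
import numpy as np

def sei(dist):
    """Squared Euclidean Imbalance: SEI(D) = n * sum_x (D(x) - 1/n)^2,
    where n is the size of the distribution's support space."""
    d = np.asarray(dist, dtype=float)
    n = d.size
    return n * float(np.sum((d - 1.0 / n) ** 2))

# Toy usage: a slightly biased 3-bit distribution.
d = np.full(8, 1.0 / 8)
d[0] += 1.0 / 64
d[1] -= 1.0 / 64
print(sei(d))  # 2^-8; about 1/SEI samples suffice to tell it from uniform
```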
The smarter way is to split the involved variables, and the noise expression itself, into a smaller field: we compute the sub-distributions in this smaller field and combine them to get the full distribution. The figure shows how it works. R2, R3, S5, and S15 are each divided into four bytes, and correspondingly the functions are divided into four sub-functions. The distribution for every byte can then be computed, and the byte distributions are combined in the end to get the full distribution. We would like to mention here that we need to take the carries between different bytes into account. The fast Walsh-Hadamard transform can be used to speed up this process. The complexity is around 2^41, and the bias of N2 is around 2^-29. The total bias of the noise is around 2^-37, and if we consider the XOR sum of four such independent noises, the bias would be 2^-163.

Then one might ask whether the derived bias is correct, and we answer this by providing experimental verification. Recall that for a distribution P_X with bias epsilon, around 1/epsilon samples are needed to distinguish P_X from random. So if with 1/epsilon samples we can indeed distinguish P_X from random, we can conjecture that the bias of P_X is not much smaller than epsilon. The tool for the distinguishing is hypothesis testing, where hypothesis H0 says that the sample distribution follows the computed noise distribution, while hypothesis H1 says that it follows the uniform distribution. The Kullback-Leibler divergence helps to make the decision. The KL divergence between two distributions X and Y measures the difference between them: the closer X and Y are, the smaller the KL divergence. So if the KL divergence between the sample distribution and the noise distribution is the smaller one, we decide that the sample distribution follows the noise distribution; otherwise, we decide it follows the uniform distribution.

Recall that Z_T equals S_T XOR N_T, so if we XOR Z_T and S_T, the result is N_T, which should be biased. We can therefore collect many samples of Z_T XOR S_T and check whether this sample sequence follows the noise distribution or the uniform distribution. We ran 64 SNOW 3G instances, each for up to 2^40 iterations. In each iteration we build a sample X_T as Z_T XOR S_T (the details are given in brackets on the slide). At the same time, we also build random sequences by collecting the lower 24 bits of the keystream samples. Then, for each of these sequences, either a sample sequence or a random sequence, we check which distribution it follows. There can be two types of errors: type one errors, where our noise distribution is judged as random, and type two errors, where a random distribution is judged as biased. The picture shows the error probabilities under different sample lengths. The error probabilities decrease as the sample length increases: at length 2^40, both error probabilities are smaller than 0.1, and at length 2^41.5 there are no errors among our 21 sample sequences. So with 8 to 16 times 1/epsilon samples we can distinguish these sequences with large success probability, and we can conclude that the bias should be correct. Next, we use this linear approximation and this bias to mount a distinguishing attack and a correlation attack.
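The combination step relies on the standard fact that the distribution of an XOR of independent variables is the XOR-convolution of their distributions, which the fast Walsh-Hadamard transform computes in n log n time. The following is a minimal sketch of that core trick only; the actual analysis additionally has to track the carry bits between bytes, which is not shown here:

```python
import numpy as np

def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform of a length-2^k array."""
    a = np.asarray(a, dtype=float).copy()
    h = 1
    while h < a.size:
        for i in range(0, a.size, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a

def xor_convolve(p, q):
    """Distribution of X XOR Y for independent X ~ p and Y ~ q:
    pointwise product in the Walsh-Hadamard domain."""
    return fwht(fwht(p) * fwht(q)) / len(p)

# Toy check with two biased 2-bit distributions.
p = np.array([0.30, 0.25, 0.25, 0.20])
q = np.array([0.40, 0.20, 0.20, 0.20])
print(xor_convolve(p, q))  # matches the brute-force XOR convolution
```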
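The KL-based decision rule described above can be sketched as follows; the helper names kl_divergence and judge are hypothetical, and the sketch assumes the noise distribution is everywhere positive:

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) in bits; entries where p is zero contribute nothing."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

def judge(samples, noise_dist):
    """Decide H0 (samples follow noise_dist) vs H1 (uniform) by comparing
    the KL divergence from the empirical sample distribution to each."""
    n = len(noise_dist)
    emp = np.bincount(samples, minlength=n) / len(samples)
    uniform = np.full(n, 1.0 / n)
    d0 = kl_divergence(emp, noise_dist)  # distance to the noise hypothesis
    d1 = kl_divergence(emp, uniform)     # distance to the random hypothesis
    return "noise" if d0 < d1 else "random"
```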
A distinguisher can distinguish the keystream sample sequence from random. Recall again that the XOR sum of Z_T and S_T equals N_T. If S_T can be cancelled, Z_T becomes biased, and with enough samples Z_T can be distinguished from random. The main question then becomes how to cancel out S_T. The idea is to find a time set I, usually with three, four, or five time instances, such that the XOR sum of the LFSR contributions at these time instances is 0; correspondingly, the XOR sum of the keystream samples at these time instances equals the XOR sum of the noises, which is biased. This is equivalent to finding a multiple of the generating polynomial P(x) of weight three, four, or five with all coefficients being one. There is already some research on this problem, and we use the method from a paper from 2014 to find a weight-4 multiple K(x) of P(x), with time and storage complexities 2^172.

Suppose the four time instances are t4, t3, t2, and t1; then the XOR sum of the LFSR contributions from these four time instances is 0. Besides, any time shift by t of K(x), i.e., x^t times K(x), is still a weight-4 multiple, which means that the XOR sum of the LFSR contributions from these four time instances shifted by t clocks is still 0. We can therefore build new biased keystream samples for different t values, where the sample at clock t, X_t, equals the XOR sum of the keystream samples at these four time instances shifted by t clocks; the result is the XOR sum of the noises. If we regard these noises as independent, the bias would be 2^-163. However, there is actually some dependency between these noises, so the bias is even larger than 2^-163, and the data complexity is upper bounded by 2^163. So far, we have given a distinguishing attack with time, storage, and data complexities all far below 2^256.

We also give a correlation attack. A correlation attack is usually modeled as a decoding problem over the binary field or a binary extension field; the picture shows the model. The LFSR initial state S, with length L, is regarded as the information word, while the LFSR output U, with a much longer length N, is regarded as the codeword. We would like to mention that each element U_i of U might not be an exact output of the LFSR, but can be some linear combination of the LFSR states. The codeword U is generated by multiplying the information word S with a generator matrix G. The received word Y, which can be some linear combination of the keystream samples, is regarded as a noisy version of the codeword U: every element Y_i equals U_i XORed with a noise E_i. From coding theory, we know that when the code rate R is smaller than the channel capacity, there always exists a decoding method that can successfully recover the information word. Because this decoding problem is usually defined over the binary field or a binary extension field, our 24-bit approximation cannot be used directly. Instead, we build a new 8-bit approximation by XORing the bytes of the noise, with two linear masks lambda and gamma applied to the first and last bytes of the noise. We searched over different values of lambda and gamma, and the best choice is given here, which gives a bias of 2^-41.
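To illustrate the idea of finding low-weight multiples, here is a toy birthday-style collision sketch over a degree-5 polynomial. This shows only the core collision idea, not the refined method from the 2014 paper used for the degree-512 generating polynomial of the SNOW 3G LFSR over GF(2):

```python
from itertools import combinations

def xpow_mod(e, P):
    """x^e mod P(x) over GF(2); polynomials encoded as int bitmasks."""
    deg = P.bit_length() - 1
    r = 1
    for _ in range(e):
        r <<= 1
        if (r >> deg) & 1:
            r ^= P
    return r

def find_weight4(P, max_e):
    """Birthday-style search: a collision x^a ^ x^b == x^c ^ x^d (mod P)
    gives the weight-4 multiple x^a + x^b + x^c + x^d of P(x)."""
    table = {}  # residue of x^a XOR x^b  ->  (a, b)
    for a, b in combinations(range(1, max_e), 2):
        r = xpow_mod(a, P) ^ xpow_mod(b, P)
        if r in table and len({a, b, *table[r]}) == 4:
            return sorted((a, b, *table[r]))
        table.setdefault(r, (a, b))
    return None

P = 0b100101  # toy example: P(x) = x^5 + x^2 + 1
print(find_weight4(P, 31))  # exponents of a weight-4 multiple of P(x)
```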
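As a toy illustration of the decoding model just described (not the attack itself, which instead uses parity checks and fast decoding techniques), the following sketch draws a random generator matrix G, transmits y = sG XOR e over a biased channel, and recovers the state by exhaustive maximum-likelihood decoding, feasible only because L is tiny here:

```python
import numpy as np

def bits(c, L):
    """Little-endian bit vector of the integer c, length L."""
    return np.array([(c >> i) & 1 for i in range(L)])

rng = np.random.default_rng(1)
L, N, p = 8, 200, 0.35  # toy state size, code length, noise P(e_i = 1)

G = rng.integers(0, 2, size=(L, N))   # toy generator matrix
s = rng.integers(0, 2, size=L)        # secret "initial state"
u = s @ G % 2                         # codeword
e = (rng.random(N) < p).astype(int)   # biased noise
y = u ^ e                             # received word

# ML decoding by exhaustive search over all 2^L candidate states:
# pick the candidate whose codeword is closest to y in Hamming distance.
best = min(range(2 ** L),
           key=lambda c: int(np.sum((bits(c, L) @ G % 2) ^ y)))
print("recovered:", bits(best, L))  # matches s with high probability
print("true state:", s)
```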
Then we can get every element of the codeword U_t and of the received word Y_t, and we can see that they are linear combinations of the LFSR states and of the keystream blocks, respectively. The task is then to recover the information word S from the received sequence Y, and it consists of two stages: during the pre-processing stage, many parity checks are generated, while during the processing stage, the decoding is performed. We used the method from the paper of Bin Zhang et al. at CRYPTO 2015 to mount the correlation attack. During the pre-processing stage, many parity checks involving fewer LFSR states are generated; the decoding requires 2^172 parity checks, and the time and space complexity is 2^177. During the processing stage, a distinguisher is built; this distinguisher is biased if a guess of the LFSR states is correct, and otherwise it is random-like. The fast Walsh-Hadamard transform helps to compute the bias of the distinguisher. The decoding complexity is 2^175, and 160 bits are recovered. We also gave a 16-bit correlation attack with similar complexity but fewer parity checks.

So that is our work. To conclude, we have performed linear cryptanalysis of SNOW 3G. We found a 24-bit linear approximation with bias 2^-37, and we verified this bias by collecting a large number of samples. We then used this linear approximation to give a distinguishing attack with complexity 2^172 and a correlation attack with complexity 2^177. These linear cryptanalysis results indicate that if the key length of SNOW 3G were increased to 256 bits, there would be academic attacks on it. We would like to mention, though, that these attacks are not an immediate threat to 5G, since in cellular networks the lengths of the keystreams are usually restricted, so we cannot collect enough data. That ends my presentation. Thank you very much for your attention.