Hello everyone! Today I will talk about ZUC-256 and spectral analysis tools for cryptanalysts. My name is Alexander Maximov, and I am from Ericsson Research in Lund, Sweden. This work was done together with Jing Yang and Thomas Johansson from Lund University. First I will give a short introduction to the ZUC algorithm, then I will present our linear distinguishing attack on ZUC, and then I will speak about spectral analysis tools for cryptanalysis. ZUC is a stream cipher that is used in 4G mobile communications. It was designed and is used in China. The 128-bit key version of ZUC was standardized by 3GPP in 2011. At the moment, a 256-bit key version of ZUC is being considered for 5G and beyond. The ZUC design team proposed to use exactly the same design, but with a 256-bit key and a 184-bit IV. The new version, ZUC-256, was presented at the Eurocrypt rump session in 2018, and there has been follow-up work on the algorithm. ZUC-256 has a linear feedback shift register (LFSR) and a finite state machine (FSM), and it produces a 32-bit keystream word at every clock. The LFSR operates in the prime field modulo 2^31 - 1, while the FSM is built over GF(2^32). The bit-reorganization layer mixes bits from the LFSR and passes 32-bit words to the FSM, denoted as X-terms in the figure. This mix of structures makes ZUC-256 hard to analyze, so standard cryptanalytic methods are not easy to apply. But in this work we found the first and best known academic attack that is faster than exhaustive key search. It is a linear distinguishing attack of complexity 2 to the power 236, which means that the ZUC-256 keystream generator does not provide the full 256-bit security level. I should stress that this attack does not pose a practical threat in 5G; it is only a certificational result. To cancel the LFSR contribution we use a multiple of the LFSR's characteristic polynomial with coefficients 0 and plus minus 1, and the degree of that polynomial is expected to be around 2 to the power 167, based on the birthday paradox.
Then for each 32-bit X-term constructed by the bit-reorganization layer, we can get an equation under two parallel 16-bit additions that is almost true, but with some 32-bit carry noise c. For example, if we consider the signal X1, then that equation would look as shown in the green box. We then found a theoretical result, basically saying that if we take the s-values in equation 1 and extract any consecutive t bits from those s-values, then the sum of those sub-values truncated to t bits generates a random carry with only the possible values 0 or plus minus 1, and the probabilities of these values are the same for any sub-bits that we extract from those s-values. As the consequence in the red box, we then conclude on the distribution of the 32-bit carry noise c. The lower and higher 16-bit halves of c are independent and have the same distribution, such that the probability of 0 is 2/3, and for the other two values plus minus 1 the probability is 1/6 each. Now, having found this concatenation rule for the X-terms, we can proceed with the approximation of the finite state machine. Consider the expressions for two consecutive keystream words. It is clear that two words should be enough to approximate the FSM, where on one side there will be a noise and on the other side there will be X-terms. However, we can use the above rule on the X-terms and derive the total noise while including the rule on X directly into the noise expression, and this gives a larger bias overall. That idea can be used in the analysis of other stream ciphers as well. In our attack we chose a straightforward form of the sampling equation, given in the yellow box, where the attacker only needs to choose a single binary masking matrix M. As a result, we took a simple form of sampling which is then equal to the biased noise expression. We found a masking matrix M that effectively made the bias of the total noise be 2 to the power minus 236.
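As a small illustration, the joint distribution of the 32-bit carry noise can be assembled from the per-half result quoted above. This is a minimal sketch that assumes only the stated distribution (each 16-bit half of c is 0 with probability 2/3 and plus or minus 1 with probability 1/6, and the halves are independent):

```python
from fractions import Fraction

# Per-half carry distribution quoted above: each 16-bit half of the carry
# noise c is 0 with probability 2/3 and +1 or -1 with probability 1/6 each.
half = {0: Fraction(2, 3), 1: Fraction(1, 6), -1: Fraction(1, 6)}

# The halves are independent, so the joint distribution of the pair
# (c_high, c_low) is the outer product of the two marginals.
joint = {(ch, cl): ph * pl for ch, ph in half.items() for cl, pl in half.items()}

assert sum(joint.values()) == 1
print(joint[(0, 0)])  # probability that the whole 32-bit carry is zero: 4/9
```

So the whole 32-bit carry is zero with probability 4/9, and each of the nine value pairs is easily enumerated.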
We actually used the squared Euclidean imbalance for the bias computation, which also means that the complexity of a distinguishing attack is one over the bias. This way we achieved an academic attack on ZUC that is 2 to the power 20 times faster than exhaustive key search. Only two problems remain, as noted in the green box. The first problem is how to compute those 32-bit noise distributions for the noise variables N1 and N2, the exact expressions for which will be given on the next slide. The second problem is how we actually found the linear masking matrix M that gave us the large bias of the total noise. Here we have the complete expressions for the 32-bit noises N1 and N2, where N1 is further split into the sum of two sub-noises N1A and N1B, simply because they use different random variables. Without going into details, these three sub-noises are basically approximations of the arithmetical additions of the X-terms, plus those carry noises c involved in the approximation of the X-terms. If we take, for example, N1A, a naive way to construct its distribution would require a very large loop of size around 2 to the power 553, which is totally computationally infeasible. However, we can utilize and modify a bit-slicing technique so that we can compute the distributions of these noise variables in time around 2 to the power 47. I would like to skip the details of the bit-slicing technique and would rather refer to the full paper for this. The second problem, which is more interesting, was to find the matrix M such that the total bias of the noise expression is maximized, and this we can do with spectral analysis techniques. So in the next part of this presentation I will mainly speak about spectral analysis tools for cryptanalysis. In cryptanalysis we often have to deal with multidimensional expressions for noise variables.
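A minimal sketch of the bias computation: the squared Euclidean imbalance of an n-bit distribution, and the rule that a distinguisher needs about one over the bias samples. The toy 2-bit distribution below is purely illustrative, not taken from the attack:

```python
def sei(p):
    """Squared Euclidean imbalance of a probability vector over an n-bit
    alphabet: 2^n * sum over x of (p(x) - 2^-n)^2."""
    size = len(p)
    u = 1.0 / size
    return size * sum((v - u) ** 2 for v in p)

# Toy 2-bit distribution with a small deviation from uniform.
p = [0.26, 0.24, 0.25, 0.25]
bias = sei(p)
samples = 1.0 / bias  # a distinguisher needs about 1/bias samples
```

Here the bias is 8 times 10 to the minus 4, so the distinguisher would need about 1250 samples.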
It is fair to say that multidimensional approximations give larger biases than binary approximations, simply because a multidimensional distribution contains more information than just a single bit of it. So assume we have an n-bit alphabet, for example a 32-bit alphabet, and we have t random variables x1, x2 and so on. An n-bit distribution table of a random variable x can be converted to the frequency domain by using either the fast DFT or the Walsh-Hadamard transform, depending on the operations that we will later want to do with that spectrum. That conversion can be done efficiently by utilizing fast algorithms; in the case of 32-bit expressions, the time is 2 to the power 32 times 32. The values of the spectrum table can be positive and negative, but note that the first value, in position 0, is equal to 1, which is the sum of all probabilities of the distribution table in the time domain. In case the original distribution table was not normalized, that value in spectrum position 0 will serve as a normalization factor. So why is spectral analysis so interesting, and what can we actually do in the frequency domain? In the frequency domain we can compute the bias of a noise variable in such a way that even very small biases can be handled efficiently, without the need for long-number arithmetic. In the expression for the total noise we often see an arithmetic sum or an XOR of two or more noise variables, and we can compute convolutions of such noise distributions efficiently in the frequency domain, in linear time instead of the quadratic complexity in the time domain. We can also use spectral analysis for approximations of S-boxes, not only small ones but also large composite S-boxes. And we can also search in the frequency domain for a good linear masking that would result in a large bias of the total noise in the end. So spectral tools are very powerful and very interesting methods for cryptanalysis.
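A sketch of the fast Walsh-Hadamard transform mentioned above, running in time 2^n times n for a table of size 2^n; note that spectrum position 0 holds the sum of all probabilities, which is the normalization factor:

```python
def wht(v):
    """Fast Walsh-Hadamard transform: 2^n * n butterfly operations."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

# A normalized 2-bit distribution: spectrum position 0 equals the sum of
# all probabilities, i.e. 1.
p = [0.5, 0.25, 0.125, 0.125]
s = wht(p)
assert abs(s[0] - 1.0) < 1e-12
```

If the input table is not normalized, s[0] simply carries the normalization factor instead of 1.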
Let's have a look at how the bias can be computed in the frequency domain. We compute the bias as a squared Euclidean imbalance, as shown in the first formula above. When the bias is defined in such a way, a distinguisher needs around one over the bias samples in order to distinguish that source of noise from random. This way, if the expected bias is, for example, 2 to the power minus 512, then in the time domain we have to use float numbers with a precision of at least 256 bits. This means we have to use long-number arithmetic, which in turn takes a lot of RAM and computation time. Let's again have a look at the right picture with the values in the frequency domain. We have shown that in the frequency domain the bias is actually the sum of those red boxes at all nonzero locations, but with each red value squared. The value at point 0 is again a normalization factor. The consequence of this observation is that in order to handle small biases in the frequency domain, we only need a few bits of precision, but the exponent field should be there and should be preserved. So the standard C type double works very well for storing the spectrum of a noise distribution, even if the bias is very small, and we don't need long-number arithmetic in the frequency domain. Then, having two or more spectra of noise distributions, we can perform a convolution of those noise variables, which corresponds to the expression where we do an arithmetical addition or an XOR of those noises. The resulting spectrum is achieved by a point-wise multiplication of the spectra of the original noise distributions. This way, for example, we can compute the bias of an XOR of two noise variables in the spectrum domain without having to switch back to the time domain, where, recall, we would need float numbers with high precision.
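Both claims above can be checked on a toy 4-bit alphabet: the squared Euclidean imbalance equals the sum of squared spectrum values at all nonzero points, and the XOR of two independent noises corresponds to a point-wise product of their spectra. A sketch, with a local copy of the fast WHT and randomly generated toy distributions:

```python
import random

def wht(v):
    """Fast Walsh-Hadamard transform (local copy for this sketch)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

random.seed(1)
N = 16  # 4-bit alphabet, small enough to check everything directly
pw = [random.random() for _ in range(N)]
p = [v / sum(pw) for v in pw]
qw = [random.random() for _ in range(N)]
q = [v / sum(qw) for v in qw]

# Bias in the time domain (squared Euclidean imbalance) ...
sei_time = N * sum((v - 1.0 / N) ** 2 for v in p)
# ... equals the sum of squared spectrum values at all nonzero points.
P = wht(p)
sei_freq = sum(v * v for v in P[1:])
assert abs(sei_time - sei_freq) < 1e-12

# XOR of two independent noises: point-wise product of their spectra,
# then an inverse WHT (the WHT is self-inverse up to the factor N).
Q = wht(q)
conv_freq = [v / N for v in wht([a * b for a, b in zip(P, Q)])]
conv_time = [0.0] * N
for a in range(N):
    for b in range(N):
        conv_time[a ^ b] += p[a] * q[b]
assert max(abs(x - y) for x, y in zip(conv_freq, conv_time)) < 1e-12
```

The frequency-domain route uses one multiplication per spectrum point, i.e. linear time, whereas the direct convolution loop is quadratic in the alphabet size.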
However, an important observation here is that the convolution is a point-wise multiplication of the input spectrum vectors, so the largest bias of the result will be achieved if there is a red box in the resulting spectrum with a very high absolute peak value. The peak values in the spectrum contribute the most to the total bias. So let's take a simple case where we have only two noise variables and we want to sum them together. If we take the first noise variable and find a way to rotate or shuffle its spectrum such that the largest red boxes of the first and the second noise distributions match in some nonzero location, then the sum, or convolution, of these two noise variables will result in a large bias, because the peak values are now aligned and their product contributes a large value to the result. This observation can be used for searching for good linear masks, and this motivates us to develop spectral tools for cryptanalysis further on. So consider the general case where we have t n-bit noise variables x1, x2 and so on, and we want to find a sequence of linear matrices M1, M2 and so on, each of size n by n bits, such that the sum expression above results in a new noise variable x with a very large bias. We found how matrix multiplication in the time domain affects the spectrum in the frequency domain, and this result can be used as the basis for an algorithm that finds the spectrum peaks in each of the noise distributions of x1, x2 and so on, and then constructs the M matrices so that the peak values in those sub-spectra are shifted and aligned at certain spectrum points, resulting in a large bias of the total sum of these sub-noises.
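The effect of a binary masking matrix in the time domain can be illustrated on a small alphabet: if y = M times x over GF(2), then the spectrum of y at point w equals the spectrum of x at point M-transpose times w, so an invertible M simply shuffles the spectrum points, which is exactly what lets us align peaks. A sketch, with an assumed row-bitmask matrix representation and a fixed invertible 4 by 4 matrix:

```python
import random

def wht(v):
    """Fast Walsh-Hadamard transform (local copy for this sketch)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def mat_vec(M, x):
    """y = M*x over GF(2): row i of M (an n-bit mask) gives bit i of y."""
    y = 0
    for i, row in enumerate(M):
        y |= (bin(row & x).count("1") & 1) << i
    return y

n, N = 4, 16
# A fixed invertible 4x4 binary matrix, rows stored as bit masks.
M = [0b0001, 0b0011, 0b0100, 0b1100]

random.seed(7)
raw = [random.random() for _ in range(N)]
p = [v / sum(raw) for v in raw]
py = [0.0] * N
for x in range(N):
    py[mat_vec(M, x)] += p[x]  # distribution of y = M*x

P, Py = wht(p), wht(py)
for w in range(N):
    # M^T * w is the XOR of the rows of M selected by the bits of w.
    mtw = 0
    for i in range(n):
        if (w >> i) & 1:
            mtw ^= M[i]
    # The spectrum of y at point w is the spectrum of x at point M^T * w.
    assert abs(Py[w] - P[mtw]) < 1e-12
```

So choosing M amounts to choosing which spectrum point each peak is moved to, and with an invertible M no spectral mass is lost.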
We also found a similar technique for the discrete Fourier transform case, where the sum of noise variables is an arithmetical addition truncated to n bits, but here, instead of searching for binary matrices, we need to search for odd n-bit coefficients, and then we do our best to get the largest possible bias of the resulting sum. We didn't go much deeper into the DFT case, since we couldn't use these results for ZUC cryptanalysis, but we sketched the general way forward for DFT cases as well. Spectral techniques can also be used for the analysis and approximation of S-boxes. Normally there are small-size S-boxes which are then combined with linear transformations so that we get a large composite S-box. For example, in ZUC we have 32-bit S-boxes composed by first applying a 32-bit linear transformation and then 4 parallel 8-bit S-boxes to the individual bytes of the 32-bit input word. In SNOW 3G it is vice versa: first we apply 4 8-bit S-boxes, and afterwards a linear combination matrix on the 32-bit word. So in general, such composite S-boxes can be generalized as a matrix R times parallel smaller S-boxes applied to the product of another matrix Q times the input x. An approximation of such a composite S-box would then be a binary matrix M times the input x. The question here is how to find the approximation matrix M such that the approximation gives us a large bias of the noise. We started our analysis from the usual smaller S-boxes, and we found how an approximation matrix M of such an S-box affects the spectrum of the approximation noise. The spectrum point K of such an approximation by the matrix M times the input x can be expressed as another point, lambda equal to K times M, of the spectrum of the vector b, which is in turn constructed from the given S-box and the index K.
So we know where we want to move the highest peak: it is location K in the spectrum. And we know where the highest peak is currently located: it is lambda. Then we derive the matrix M to make this happen. This is the central idea of the spectral analysis: being able to move high red boxes into a wanted spectrum position. This way we learn how to align multiple noise spectra and get a large overall bias in the end. For small S-boxes we can basically pre-compute all possible values for different K and different lambdas. For example, for 8-bit S-boxes that would have a complexity of 2 to the power 19, which is still doable. Moreover, we will use these observations to handle large composite S-boxes in a very efficient way, as follows. We also found that if a composite S-box is simply a vector of t smaller sub S-boxes, then the spectrum point lambda of the vector b of such a large S-box is simply equal to the product of the corresponding spectrum points of the smaller b vectors of the individual sub S-boxes. So if for each sub S-box we pre-compute all possible spectrum values for the pairs of K and lambda, then spectral probing of the large composite S-box is very simple: it is computed by t table lookups and t minus 1 multiplications. Let me give an example of how this all may be used. For the basic S-boxes S0 and S1 in ZUC we pre-compute lookup tables that contain the spectrum values for all pairs K and lambda. Then assume the composite S-box is expressed as a number of linear transformations R and Q applied in between the parallel sub S-boxes, and we will approximate that composite S-box by a large matrix M times x. Then for any spectrum point K of the approximation noise of such a composite S-box we derive intermediate indices K prime and lambda prime.
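The factorization property can be sketched on a toy composite S-box built from two random 4-bit sub S-boxes (not the actual ZUC S-boxes): the Walsh spectrum point of the parallel composition is the product of the sub S-boxes' spectrum points, so probing one point costs two table lookups and one multiplication:

```python
import random

def dot(a, b):
    """Inner product of two bit masks over GF(2)."""
    return bin(a & b).count("1") & 1

def walsh(S, k, lam, n):
    """Walsh spectrum point of an n-bit S-box at output mask k, input mask lam."""
    return sum((-1) ** (dot(k, S[x]) ^ dot(lam, x)) for x in range(1 << n))

random.seed(3)
S1 = list(range(16)); random.shuffle(S1)  # two random 4-bit sub S-boxes
S2 = list(range(16)); random.shuffle(S2)
# 8-bit composite S-box: S1 on the high nibble, S2 on the low nibble.
S = [(S1[x >> 4] << 4) | S2[x & 15] for x in range(256)]

# Precompute the full spectrum tables of the small S-boxes (cheap), so any
# spectrum point of the composite S-box costs 2 lookups and 1 multiplication.
T1 = {(k, l): walsh(S1, k, l, 4) for k in range(16) for l in range(16)}
T2 = {(k, l): walsh(S2, k, l, 4) for k in range(16) for l in range(16)}

K, LAM = 0xA7, 0x35  # an arbitrary spectrum point of the composite S-box
probe = T1[(K >> 4, LAM >> 4)] * T2[(K & 15, LAM & 15)]
assert probe == walsh(S, K, LAM, 8)
```

The same probe by direct summation over the 8-bit S-box would cost 256 operations per point, so the savings grow quickly with the number of sub S-boxes.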
Then the spectrum value of the composite S-box approximation, or of the noise variable, at the point K will be the product of sub-spectrum points in the corresponding lookup tables that we pre-computed earlier. So this is a very efficient way to probe the spectrum of a composite S-box without actually having to construct the spectrum itself. Coming back to the final step of the ZUC analysis, recall the expression for the total noise given in the first bullet above. We will try to align the peak values of the sub-noises at some position K in the spectrum. Then the expression for the spectrum value of the total noise at location K can be written as given in the second bullet. We then need to search for a pair K and lambda where the product of those spectrum values is maximized, thus achieving a large bias of the total noise. Note that the involved variables are 32 bits long, so we cannot really make an exhaustive loop over K and lambda, as that would be 2 to the power 64. However, we can use the spectrum of the noise N1 to search for promising candidate lambdas where the spectrum values are the largest. Then we can also search through the spectrum of the noise N2 for candidates of the index K. We then do an exhaustive search over all pairs K and lambda from the two lists of promising candidates and compute the complete spectrum value of the total noise for each pair. For probing the S-box approximations we utilize the results from the previous slide and do it efficiently, in constant time, by using a number of small lookup tables. When the best pair K and lambda is found, we then construct the matrix M such that lambda is exactly equal to K times M. Note that here we can actually synchronize the 32 largest spectrum peaks with the same matrix M, but in this case all the K indices, and likewise the lambda indices, must be linearly independent.
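A sketch of this last step: given chosen pairs (K, lambda) with linearly independent K masks, a matrix M with lambda equal to K times M over GF(2) can be built by Gaussian elimination. The bitmask representation and helper names below are illustrative, and the three alignment pairs are made up for the example:

```python
def solve_mask_matrix(pairs, n):
    """Build an n x n GF(2) matrix M (list of n row masks) such that for every
    pair (K, lam) we have K*M = lam, i.e. the XOR of the rows of M selected by
    the bits of K equals lam.  The K masks must be linearly independent;
    unconstrained rows of M default to zero."""
    basis = {}  # pivot bit -> reduced constraint (K, lam)
    for K, lam in pairs:
        for piv in sorted(basis, reverse=True):  # reduce by known pivots
            if (K >> piv) & 1:
                bk, bl = basis[piv]
                K ^= bk
                lam ^= bl
        assert K != 0, "K masks must be linearly independent"
        basis[K.bit_length() - 1] = (K, lam)
    M = [0] * n
    for piv in sorted(basis):  # triangular solve, lowest pivot first
        bk, bl = basis[piv]
        acc = bl
        for i in range(piv):
            if (bk >> i) & 1:
                acc ^= M[i]
        M[piv] = acc
    return M

def apply_mask(M, K):
    """Row vector K times matrix M over GF(2)."""
    out = 0
    for i, row in enumerate(M):
        if (K >> i) & 1:
            out ^= row
    return out

# Three hypothetical peak alignments (K, lambda) on an 8-bit alphabet.
pairs = [(0x80, 0x1F), (0x41, 0xA5), (0x23, 0x0C)]
M = solve_mask_matrix(pairs, 8)
for K, lam in pairs:
    assert apply_mask(M, K) == lam
```

With n linearly independent K masks and n targets lambda, the same elimination pins down all n rows of M, which is the situation described above for synchronizing 32 peaks at once.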
In the end, we verified our results by applying the derived matrix M and constructing the complete distribution tables of the S-box approximations and the other involved noise distributions. The result confirmed that we indeed got a distinguishing attack on ZUC-256 of complexity around 2 to the power 236, which is faster than exhaustive key search. We think that the presented spectral techniques may also be useful in the analysis of other algorithms, not only stream ciphers but also block ciphers. We believe there could be further improvements to our results, and we encourage future research on these topics. Thank you all for your attention and for listening to my talk.