Hello and welcome to this presentation. In this video I'll be presenting a new approach to linear cryptanalysis. Linear cryptanalysis is one of the most important cryptanalytic techniques in modern cryptography. What we are interested in here is a linear combination of input bits, so something of the form u transpose x, where u is a mask, which is essentially a bit vector, and a linear combination of output bits. We want to know the probability that this linear combination of input bits is equal to that linear combination of output bits for a uniform random input. We do this for a permutation or a function f, which can be for example a block cipher or a cryptographically secure permutation. If this is a strong block cipher, then we would expect that this probability is very close to one half. So we can define a measure of quality as two times the distance of that probability from one half, and this is what we call the correlation of that linear approximation. A well-known fact is that with about one over the squared correlation samples, we are able to set up a linear distinguisher. So it's not so surprising, given the importance of linear cryptanalysis, that a large number of variants and extensions have been proposed. I list some of those on the slide, but it's not so important if you don't recognize all of them. What I'm interested in in this presentation is the theoretical analysis of such a function f. In particular, I want to know, for a given approximation, what the correlation is, because that also allows us to search for the best possible masks u and v. And I want to do this for those extensions and variants as well. I'm less interested in statistical aspects, such as which set we should sample the inputs from, or what the data complexity and success probability of the attack are. 
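To make the definitions concrete, here is a minimal sketch of computing the correlation of a linear approximation; the 4-bit PRESENT S-box is used purely as an example permutation, and the masks below are arbitrary choices.

```python
# Correlation of a linear approximation u^T x = v^T S(x) (toy sketch).
# The 4-bit PRESENT S-box serves purely as an example permutation.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(x: int) -> int:
    """Parity of the bits of x; parity(u & x) is u^T x over F_2."""
    return bin(x).count("1") & 1

def correlation(u: int, v: int) -> float:
    """c = 2 * Pr[u^T x = v^T S(x)] - 1 over a uniform random input x."""
    matches = sum(parity(u & x) == parity(v & SBOX[x]) for x in range(16))
    return 2 * matches / 16 - 1

# The trivial approximation u = v = 0 always holds:
print(correlation(0, 0))  # 1.0
# A linear distinguisher needs roughly 1 / correlation(u, v)**2 samples.
print(correlation(0x9, 0x4))
```

The exhaustive loop over all sixteen inputs is only feasible because the example is tiny; for a real cipher the correlation has to be estimated or derived analytically, which is exactly what the rest of the presentation is about.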
In this presentation I will try to give a first introduction to the geometric approach to linear cryptanalysis, and I'll do this by discussing three of its main goals. The first one is how we ensure that all those different variants of linear cryptanalysis, which are based on different kinds of input and output properties, can be represented in a uniform way. After that I will use this description to generalize some concepts from linear cryptanalysis, such as approximations and trails. So what are input and output properties? An input property could be, for example, that the input is an element of a set S, so we have a subset S of the domain G of the cipher. An output property could then be that we want to count how often the output is in a certain set R, and this basically boils down to computing the size of the intersection between R and F of S. Another way to think about this is to use indicator functions of sets instead of the sets themselves. The indicator function of the set S is just a function which is zero everywhere, except when the input is in the set, where it is one. This is a complex-valued function; in fact it's just a real-valued function, but later on it will be convenient to also consider complex-valued functions. Here I just want to introduce one notation: I will say that this indicator function is an element of C superscript G, which is the set of all functions from G to the complex numbers, and that is actually a vector space. So what is the output property then? Well, we have a set R, so we could say that it's also just the indicator function of R, but that's not really what we want, because what we actually want is to count how many values are in R. 
That means that if we think of the output as being itself an indicator function of some set, which just means the output is in some set, then we want to count how many elements of that set are also in R. So we actually need a function from C superscript G to the complex numbers; it maps functions to complex numbers. In particular, for this example, we will make a function that maps a given function f to the average of that function over the set R, up to a multiplication by the size of R. This can also be expressed as taking the inner product of the indicator function of R with f. If we apply that function to the indicator function of F of S, then indeed we get the size of the intersection between R and F of S. Generalizing this, we could say that an input property is like a state: it represents what the state is, and it is mathematically represented by a function from G to the complex numbers. This is like saying our input is a collection of values, and to each of those values we assign a weight, and that weight is a complex number. An output property then corresponds to an observation of the state. The way we represented it above was using a linear functional, and the important part is that this is always going to be a linear function; that's the requirement we impose. So this is an element of (C superscript G) star, which is the space of linear functionals, or the dual space. There is actually a way of simplifying this, because for any such functional g star we can find a g such that g star of f is the same as taking the inner product of g with f. That means we can identify the dual space with C superscript G by choosing an inner product. And this basically means that both input properties and output properties can be represented by functions from G to the complex numbers. 
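This bookkeeping can be sketched in a few lines (the permutation and the sets below are hypothetical toy choices): the input property is the indicator function of S, the output property is the linear functional mapping a function to its inner product with the indicator of R, and applying the functional to the indicator of F(S) counts the intersection.

```python
# Toy sketch: |R ∩ F(S)| via indicator functions and an inner product.
F = [3, 0, 2, 5, 7, 1, 6, 4]   # a hypothetical permutation of {0, ..., 7}
S = {0, 1, 2}                  # input property: the input lies in S
R = {0, 2, 3, 6}               # output property: count outputs landing in R

FS = {F[x] for x in S}                            # the image F(S)
ind_FS = [1 if y in FS else 0 for y in range(8)]  # indicator function of F(S)
ind_R = [1 if y in R else 0 for y in range(8)]    # indicator function of R

# The linear functional f -> <1_R, f>, applied to 1_{F(S)}:
overlap = sum(r * g for r, g in zip(ind_R, ind_FS))
print(overlap, len(R & FS))  # both equal |R ∩ F(S)| = 3
```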
Something else that we need, of course, is to know how an input property transforms when we apply a certain function. In fact, this just means that we need to track how every value in this collection is mapped to another value, and we then need to keep track of what weight was assigned. There is a simple relation between f and the output state, which is just a multiplication with a matrix, or in other words a linear operator. And what does that operator do? Well, it maps the delta function delta x, which is just zero everywhere except when the input is x, to the function delta f of x. This completely defines that operator. One thing I want to mention, which is not so important for this presentation but is very useful in practice, is that we can represent this matrix T superscript f in a different basis. One example of a change of basis is the Fourier transformation, and if we take the Fourier transformation, then this matrix T superscript f should be replaced by a matrix C superscript f, which is also known as the correlation matrix. So what is the basis that we use in the case of the Fourier transform? Well, it consists of group characters, which are homomorphisms from G, which we now assume to be a finite abelian group, to the complex numbers with multiplication. The Fourier transformation goes from the delta function basis to the basis of group characters, and since we don't want to give an arbitrary labeling to those group characters, we simply index the basis functions by elements of the group G hat, which is the group of all the group characters. The big advantage of this in practice is that if you have a constant addition, so a translation f of x equals x plus a constant d, then the corresponding correlation matrix C superscript f will be a diagonal matrix. That's because of the properties of this basis of group characters. So far I've discussed only one-dimensional input properties, but in fact we can generalize this. 
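Before moving on, here is my own sketch of those two matrices for G equal to F2 to the n: the transition matrix sends delta x to delta f(x), and conjugating it by the Walsh-Hadamard transform, which is the Fourier transformation in this group, yields the correlation matrix, which is indeed diagonal for a constant addition.

```python
import numpy as np

def transition_matrix(f, n):
    """T^f: sends the basis vector delta_x to delta_{f(x)}."""
    T = np.zeros((2**n, 2**n))
    for x in range(2**n):
        T[f(x), x] = 1.0
    return T

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix: the Fourier transform on F_2^n."""
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H / np.sqrt(2**n)

n, d = 3, 0b101
T = transition_matrix(lambda x: x ^ d, n)  # constant addition (translation) by d
H = hadamard(n)
C = H @ T @ H.T  # the correlation matrix C^f of the translation

print(np.allclose(C, np.diag(np.diag(C))))   # True: C is diagonal
print(np.allclose(np.abs(np.diag(C)), 1.0))  # True: the entries are (-1)^{u.d}
```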
We can say that an input property or output property is, more generally, a subspace of functions. For the input, that just means we consider all possible states in the subspace at once, and similarly, for output properties, we consider inner products with all possible functions from the vector space. This is necessary, for example, for multiple linear cryptanalysis, and also for a broad class of input and output properties which are known as projection functions, which I'll discuss on the next slide. Before that, I just want to mention that everything I will say in this presentation is independent of the choice of basis for V. This is very important, because sometimes two different-seeming properties are actually the same, but are just represented by a different basis for V. So what are these projection functions? It's a framework introduced by Wagner at FSE 2004, and Baignères et al. at Asiacrypt 2004 used it to provide a general statistical analysis of optimal distinguishers, that is, of these projection-function-based distinguishers. What does it do? It says that we are building a distinguisher based not on the values of the input and output themselves, but on a projection of them, and that projection is just a function, a sort of compression, from the domain of the cipher to a much smaller set S. What I want to show here is that these projection functions can all be described within the geometric framework. So we need to associate to them a vector space of functions that accurately represents the property. This is the space that I'll make correspond to it: all the functions from S to the complex numbers, composed with the projection. Just to illustrate what that means, let's take a look at the case of linear cryptanalysis. In linear cryptanalysis, p is a Boolean function, a linear Boolean function from F2 to the n to F2. 
Then this V is just the span of delta 0 composed with p and delta 1 composed with p. For the input property, for example, this means that we can use both the set where p evaluates to zero, which is a hyperplane, and the complement of that set; those can both be applied to the cipher. And if we take the sum and the difference of those two basis functions, we get a new basis consisting of the characters chi 0 and chi u. So this is what V looks like for linear cryptanalysis. One note about this: usually in linear cryptanalysis we do not consider chi 0, because for permutations it is always preserved, as it basically corresponds to the uniform distribution, so it doesn't provide additional information. Other examples include multidimensional linear cryptanalysis, where you would have a projection function from F2 to the n to F2 to the m, where m is usually much smaller than n; that would be a linear function as well. But multiple linear cryptanalysis, as opposed to multidimensional linear cryptanalysis, does not actually fit that framework, because of a subtle difference: in multiple linear cryptanalysis, while we have several linear combinations that we observe, the distinguisher is not able to combine them in an arbitrary way. So now this provides a way of describing different variants of linear cryptanalysis in a uniform way: all of those input and output properties can be represented by vector spaces of functions from G to the complex numbers. Based on that, we can generalize the notion of a linear approximation and then give some links between different types of approximations. An approximation is just a pair of subspaces, one subspace U corresponding to the input property and one subspace V corresponding to the output property. For such a pair we can define an approximation map, which is a map from U to V, and it is a composition of three maps. 
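As a quick numerical check of that change of basis (the dimension and the mask u below are hypothetical choices): the sum and difference of delta 0 composed with p and delta 1 composed with p give exactly the trivial character chi 0 and the character chi u.

```python
import numpy as np

n, u = 3, 0b101  # hypothetical dimension and mask

def p(x: int) -> int:
    """The linear projection p(x) = u^T x over F_2."""
    return bin(u & x).count("1") & 1

d0 = np.array([1.0 if p(x) == 0 else 0.0 for x in range(2**n)])  # delta_0 . p
d1 = np.array([1.0 if p(x) == 1 else 0.0 for x in range(2**n)])  # delta_1 . p

chi_0 = d0 + d1  # the all-ones function: the trivial character
chi_u = d0 - d1  # x -> (-1)^{u^T x}: the character indexed by u

print(np.array_equal(chi_0, np.ones(2**n)))                                # True
print(np.array_equal(chi_u, np.array([(-1.0)**p(x) for x in range(2**n)])))  # True
```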
The approximation map is a composition of three maps: the first is the inclusion map from U into C superscript G, then there is the transformation by the transition matrix of f, and finally we do an orthogonal projection onto V. The underlying idea of this combination of three maps is essentially to compute all the inner products between functions in U transformed by f and functions in V. This is the basis-free way of expressing that; of course it can also be expressed in terms of the correlation matrix if we take the Fourier transformation. The first thing I want to mention about this is the notion of principal correlations, which is a concept that generalizes ordinary linear correlations. They are defined as the largest singular values of the approximation map; we only take the largest few because some of them are necessarily going to be zero. What this means geometrically, in the case that f is an injection, is that we look at the principal angles between Tf U and V. So in this example there is a common line here, which means the first principal angle will be zero, and the cosine of that angle, the first principal correlation, is going to be one. Then there is a second principal angle, a sort of minimal angle, which here is nonzero, so the second principal correlation is positive but less than one. Let's take a look at that in the case of linear cryptanalysis. Here we have the vector spaces U and V of the form that I described before. Let's assume that f is a permutation, just for simplicity. Now the approximation map is going to look like this: it's a two-by-two matrix when we express it in the bases for U and V consisting of the characters described before. The off-diagonal elements are zero because nonzero linear functions are balanced, and the diagonal elements are one and some constant c. That means of course that the singular values are one and the absolute value of c. 
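This two-by-two matrix can be computed explicitly for a toy example (my own sketch; the 4-bit PRESENT S-box and the masks are arbitrary choices): build the approximation map in the character bases of U and V and take its singular values.

```python
import numpy as np

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # example 4-bit permutation

def chi(u):
    """Normalized character x -> (-1)^{u^T x} as a vector in C^G, |G| = 16."""
    return np.array([(-1.0)**bin(u & x).count("1") for x in range(16)]) / 4.0

T = np.zeros((16, 16))
for x in range(16):
    T[SBOX[x], x] = 1.0  # transition matrix: delta_x -> delta_{S(x)}

u, v = 0x1, 0x1  # hypothetical input and output masks
U = np.stack([chi(0), chi(u)], axis=1)  # orthonormal basis of the input space
V = np.stack([chi(0), chi(v)], axis=1)  # orthonormal basis of the output space

M = V.T @ T @ U  # the approximation map: projection . T^f . inclusion
print(np.round(M, 3))                      # diagonal: 1 and c; off-diagonal: 0
print(np.linalg.svd(M, compute_uv=False))  # singular values: 1 and |c|
```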
This constant c is actually the correlation of a linear approximation, because it is equal to this expression, which is indeed exactly the same as the correlation of that linear approximation. So this means that principal correlations are a strict generalization of ordinary correlations. Now I want to discuss a few possible geometric situations for approximations. The first case is that of perfect approximations, where the space Tf U is included in the output property space V. If f is a permutation, this is the same as saying that all the principal correlations are equal to one. One interesting case here is when V is equal to U; that case is what I'll call invariance, and it includes among other things invariant subspaces and nonlinear invariants. One interesting thing to note is that if T superscript f is diagonalizable, or equivalently the correlation matrix is diagonalizable, which is normally the case, then any invariant subspace must split into one-dimensional vector spaces spanned by the eigenvectors of T superscript f. After taking a Fourier transformation, this means that invariants are a span of eigenvectors of the correlation matrix of f, which is a characterization of invariants that I introduced in previous work at Asiacrypt 2018. So those are perfect approximations. Now I want to discuss the opposite case, which is zero-correlation approximations. Here we have a situation where Tf U is perpendicular to the output space V, and this is again equivalent, in terms of principal correlations, to all of them being zero. This covers, for example, zero-correlation linear cryptanalysis. I just want to mention one interesting result about this: namely, if (U, V) is a zero-correlation approximation, then (U, V perp), obtained by replacing V with its orthogonal complement, is going to be a perfect approximation. 
This is clear if you look at this figure: if you take the orthogonal complement of the plane V here, that's just a line parallel to Tf U, so it's a perfect approximation. This is a very simple result, but it's very powerful, and one of its special cases, for example, is the well-known link between multidimensional zero-correlation linear cryptanalysis and integral distinguishers. So those are kind of edge cases, which are closely related by this link of taking the orthogonal complement. Now I want to discuss the general case in the remainder of this presentation. The general case basically depends on what the approximation map is, so it includes a description of nonlinear approximations, of multidimensional and multiple variants of those, as well as of things like partitioning cryptanalysis. The important point here is that we are again talking about probabilistic approximations. But the big question is: how do we compute that approximation map? We now know that approximations are pairs of subspaces with an approximation map, but in general we need to know what that approximation map is. The way to do this for most ciphers is using trails, and then we need a corresponding piling-up principle. So what are trails in ordinary linear cryptanalysis? The idea is essentially that we have a sequence of intermediate masks, and these define a number of one-round approximations. We assume that our function is actually a composition of smaller functions fi that are called round functions, and we have a linear approximation for each of those with a known correlation. Now we want to glue them together in order to obtain an approximation for the composition of all those functions. The question then is, of course, what the correlation of that approximation is. This is not a simple question to answer in general. In our general setting we have something very similar. 
We are asking for the approximation map of an approximation over the composition of those functions, given the approximation maps of approximations over the individual round functions. Here we will also call the sequence of subspaces u1 up to ur plus 1 a trail through this structure. The traditional piling-up principle is the way to glue together these one-round approximations. What we do there is, first, note that we are interested in the probability that the left-hand side of this equation is zero. We can write that as a sum of similar expressions for one round, and then everything drops out except the first and the last term. Let's call these terms zi. Now, if the zi were independent, which is not true, it cannot be true, then the correlation of this approximation over r rounds would be the product of the one-round correlations. But as I said, they cannot be independent, because they all depend on these xi, and the xi are deterministic functions of the same x1. So how can we motivate this if it is not true? Well, there have been two main ways. One is the Markov cipher assumption. The Markov cipher assumption is essentially equivalent to saying that, on average over independent round keys, the correlation of this approximation is equal to the product of the one-round correlations. But this requires us to put a mask on the key, because otherwise the average would be zero. The downside of this is that, first of all, round keys are not necessarily independent, and second of all, and this is more important, we are not really interested in the average correlation: we are always interested in the correlation for a fixed key. So that's the downside of the Markov cipher assumption. 
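One way to see what the product of one-round correlations misses is the following toy check (my own sketch with two arbitrary toy rounds): since the correlation matrix of a composition is the product of the round correlation matrices, summing the product of one-round correlations over all intermediate masks w gives the exact two-round correlation, while any single term is only an approximation.

```python
# Toy check: summing over all intermediate masks gives the exact correlation.
S1 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
      0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # first round (example S-box)
S2 = [S1[x ^ 0x9] for x in range(16)]           # second round: add 0x9, then S1

def corr(f, u, v):
    """Correlation of the approximation u^T x = v^T f(x)."""
    s = sum((-1) ** bin((u & x) ^ (v & f[x])).count("1") for x in range(16))
    return s / 16

u, v = 0x3, 0x5                     # hypothetical outer masks
F = [S2[S1[x]] for x in range(16)]  # the two-round composition

exact = corr(F, u, v)
trail_sum = sum(corr(S1, u, w) * corr(S2, w, v) for w in range(16))
print(abs(exact - trail_sum) < 1e-12)  # True: the sum over all trails is exact
```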
The Markov cipher assumption is also especially problematic for cases such as cryptographic permutations, where piling up is regularly used as well, but where of course there is no key. The second approach has been the dominant trail hypothesis, and this essentially follows from a result that first appeared in Joan Daemen's work on correlation matrices. The result is that the correlation of a linear approximation is the sum of the correlations of all linear trails, so you need to consider all possible linear trails to get the exact correlation of the approximation. But if one of those trails has a much larger correlation than the others, then it can be considered dominant, and then this piling-up principle might be true. This is quite a nice motivation, but the problem is that in the nonlinear case it's not so clear what "all trails" means, because there is no unique way, as there is for linear approximations, to represent all the trails. That is one issue. Another issue is that for some of the extensions, such as multiple linear cryptanalysis, this dominant trail hypothesis is also rather underdeveloped. The general piling-up principle proposed in this paper is a theorem that says that the approximation map for the composition of f1 up to fr is equal to the composition of the one-round approximation maps, one for each of the approximations in the trail, plus an error term e, and that error term can be computed explicitly if we want; that is also written in the paper. The motivation here is very similar to the dominant trail hypothesis, but it has a geometric flavor. In particular, recall that at the end of each approximation map we have an orthogonal projection onto the output space. So if we start with something in u1, it will get transformed by Tf1 into something which is not in u2, but we will project that orthogonally onto u2. Now when we do that, there is also a part of the output which is not in u2, but which is in the orthogonal complement of u2. 
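This successive-projection picture can be sketched numerically (my own toy illustration, with random round permutations and arbitrary character subspaces as the trail): composing the one-round approximation maps, with an orthogonal projection onto the intermediate space in between, differs from the exact approximation map of the composition by precisely the error term.

```python
import numpy as np

rng = np.random.default_rng(0)
perm1, perm2 = rng.permutation(16), rng.permutation(16)  # toy round functions

def transition(perm):
    """Transition matrix: delta_x -> delta_{perm(x)}."""
    T = np.zeros((16, 16))
    for x in range(16):
        T[perm[x], x] = 1.0
    return T

def chi(u):
    """Normalized character x -> (-1)^{u^T x} on 4 bits."""
    return np.array([(-1.0)**bin(u & x).count("1") for x in range(16)]) / 4.0

T1, T2 = transition(perm1), transition(perm2)
# A trail of subspaces u1, u2, u3, each spanned by chi_0 and one other character.
B1 = np.stack([chi(0), chi(0x3)], axis=1)
B2 = np.stack([chi(0), chi(0x6)], axis=1)
B3 = np.stack([chi(0), chi(0x5)], axis=1)

exact = B3.T @ (T2 @ T1) @ B1                # approximation map over both rounds
piled = (B3.T @ T2 @ B2) @ (B2.T @ T1 @ B1)  # pile up: project onto u2 between
print(np.linalg.norm(exact - piled))         # the error term e, generally nonzero
```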
That part in the orthogonal complement of u2 will be essentially ignored, in the sense that we move it to the error term e, and we just continue with the part which is in u2. We transform it with Tf2, and then we get something which is not in u3; again we project it orthogonally onto u3, and the part which lies in the orthogonal complement is moved to the error term. So this motivates piling up as a process of successive orthogonal projections, and it is of course possible that something goes wrong here, in case there is a good approximation between one of those orthogonal complements and the output space. But this, similar to the dominant trail hypothesis, tells us where to look when something goes wrong. To conclude, I've given a rough overview of the geometric approach to linear cryptanalysis. The first step was to represent different kinds of input and output properties, all by vector spaces of functions; this provides a uniform description for them. Based on that, we generalized the notion of a linear approximation to a pair of subspaces, and for such a pair we defined an approximation map, which has some nice properties; for example, its singular values behave like a generalization of ordinary correlations. Then there were several special types of approximations, such as zero-correlation and perfect approximations, but in the general case it's necessary to compute that approximation map, or at least to estimate it, and for that we need trails and a piling-up principle. As I discussed, this piling-up principle in the general setting takes the form of a geometrically intuitive process of successive orthogonal projections. As I said, this is only a brief overview, and there are more results and more details in the paper, along with some specific applications. All the source code for the results in the paper can be found at the link shown here on the slide. Thank you for your attention.