"Generating Graphs Packed with Paths" is joint work between Philip Vejre and Mathias Hall-Andersen, and Mathias will give the talk. Thank you.

Thank you for the introduction. So the talk is divided into five sections. First, we will have some motivation about what we are trying to do, and some of the folklore around linear cryptanalysis. Then we are going to rephrase linear cryptanalysis as a set of graph problems. Then we will look at some heuristics for finding good subgraphs, and why this is of interest for SPNs. Then we will look at some plots and results, and lastly, some future work.

So first, some motivation. Since the 90s we have known about linear and differential distinguishers, and generally, when you suggest a new design, one of the things you have to do is provide some sort of argument for its resilience against linear and differential cryptanalysis. What is of interest is to determine the optimal parameters of these distinguishers, as well as the expected power of the distinguishers over the key space. In this presentation we will primarily focus on linear cryptanalysis, both because of the session, and because the differential case is largely analogous for the work presented here.

For an r-round iterated cipher, we have the notion of a trail, which fixes an approximation over every round function. The idea is that we can calculate the correlation of the approximation over each round function efficiently, because it decomposes into relatively small and simple non-linear components for which we can enumerate the full domain. We then have the notion of the correlation contribution of a trail, which is simply the product of the correlations of the approximations over all the rounds. Under suitable assumptions, the correlation of an approximation between alpha and beta over the entire cipher is simply the sum of the correlation contributions over all trails between alpha and beta.

For key-alternating ciphers, we generally assume that if you consider the squared correlation contribution, the analysis becomes independent of the key; the key dependence is essentially just a sign. We then have the notion of the ELP, the expected linear potential, which is the expected squared correlation of our distinguisher, and which we can estimate, again under suitable assumptions, as the sum of the squared correlation contributions of the trails.

What happens in practice is that we sum over some small subset of trails, calligraphic U. Often this calligraphic U is merely a singleton, in which case you are assuming that there is a single dominant trail. The idea is that the dominant terms of the summation will give us a good estimate of the entire sum. But of course, the performance of a single trail, or indeed of any small set of trails, is not necessarily a good indication of a design's susceptibility to linear and differential cryptanalysis, and this is the problem we are trying to mitigate. Current methods for considering multiple trails as a whole are generally linear in the number of trails, and for designs which have a very large number of good trails this is suboptimal. We would like to get some sublinearity, which is the goal of this work.
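The formulas are not spelled out in the spoken text; the following is a reconstruction of the standard definitions, with f_i the i-th round function and T = (u_0, ..., u_r) a trail from u_0 = alpha to u_r = beta:

```latex
% Correlation contribution of a trail T = (u_0, ..., u_r):
C(T) = \prod_{i=1}^{r} \operatorname{corr}_{f_i}(u_{i-1}, u_i)

% Correlation of the approximation (alpha, beta) over the full cipher:
\operatorname{corr}(\alpha, \beta) = \sum_{T :\, \alpha \to \beta} C(T)

% Expected linear potential (key-alternating cipher, suitable assumptions),
% estimated in practice over a small subset U of trails:
\operatorname{ELP}(\alpha, \beta)
  = \mathbb{E}_K\!\left[ \operatorname{corr}_K(\alpha, \beta)^2 \right]
  = \sum_{T :\, \alpha \to \beta} C(T)^2
  \approx \sum_{T \in \mathcal{U}} C(T)^2 .
```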
To get into this frame of mind, we are going to rephrase linear cryptanalysis as a graph problem. The graph is what we call a multistage graph. In this case, we have a cipher of three rounds. It is a directed graph going from left to right, but I omit the arrows. The vertices are parities for linear cryptanalysis, and similarly for differential cryptanalysis. The edges then naturally become the approximations over the round functions, and the length of an edge is the squared correlation of the approximation of the round function between the source and the destination of the edge.

With this notation, if we consider the lengths multiplicatively, then paths through the graph become trails, and the length of a path becomes the squared correlation contribution of that particular trail. With this, we define the notion of the weight of the graph between an alpha and a beta, the input and output nodes. This corresponds to the hull: the sum of the lengths of every path between alpha and beta.

One very simple observation here is that you can calculate this in time linear in the number of edges, basically using bottom-up memoization. You start by calculating the hull between alpha and every u-node, which is trivial in this case. Then you calculate the hull between alpha and every v-node by computing, at each v-node, a weighted sum of the hulls between alpha and its predecessor u-nodes. We then proceed similarly for the beta nodes, again calculating a weighted sum over the hulls between alpha and the v-nodes. Also note that we get the weight between alpha and every beta node for free, so we do not need to consider each pair separately. A sketch of this computation follows below.
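Here is a minimal sketch of this bottom-up memoization. The u64 parities and the HashMap-based stage representation are assumptions of this sketch, not CryptaGraph's actual internals:

```rust
use std::collections::HashMap;

/// One stage of a multistage graph: for each vertex (a parity, here a
/// u64 mask), its outgoing edges to the next stage together with their
/// length, i.e. the squared correlation of that round approximation.
type Stage = HashMap<u64, Vec<(u64, f64)>>;

/// Computes the hull (the sum over all paths of the product of edge
/// lengths) from a fixed input parity `alpha` to *every* reachable
/// output parity, in time linear in the number of edges.
fn hulls_from(alpha: u64, stages: &[Stage]) -> HashMap<u64, f64> {
    // Invariant: hull[v] is the sum of path lengths from alpha to v
    // over the stages processed so far.
    let mut hull: HashMap<u64, f64> = HashMap::new();
    hull.insert(alpha, 1.0);

    for stage in stages {
        let mut next: HashMap<u64, f64> = HashMap::new();
        for (u, h_u) in &hull {
            if let Some(edges) = stage.get(u) {
                for &(v, len) in edges {
                    // Weighted sum of the hulls of the predecessors of v.
                    *next.entry(v).or_insert(0.0) += *h_u * len;
                }
            }
        }
        hull = next;
    }
    hull // the hull for every output parity beta, all in one pass
}
```

Each edge is touched exactly once, which is the linear-time claim, and the final map contains the weight between alpha and every beta simultaneously, which is the "for free" observation above.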
Of course, this is all nice and fine, but the problem is that the graph is far too large. If we naively define the graph like this, then sure, the algorithm runs in linear time, but the graph has exponential size, so it is of no use. So the question is: can we find some suitable subgraphs that contain, in a sense, most of the good trails? And the observation here is that we are not necessarily looking for good trails; we are looking at the more general problem of finding these good subgraphs, i.e. we want the weight between an alpha-beta pair to be large. For this we have devised some heuristics, and here we will focus primarily on SPNs, for which we have the best heuristics.

The overall method is the following: we pick some disjoint families of edges, which I will describe; we prune the families using an approximation of the graph defined by these edges; we then expand the families to a full graph; and at the end we do some cleanup work on the graph. One immediate and easy observation is that any edge of length zero can simply be removed: it corresponds to trails with correlation contribution zero, which do not contribute to the hull. And once it is removed, we can remove any vertex that has no predecessor or no successor, because no path can traverse it. But this still does not get us far enough.

Now to describe these families of edges; this is probably best done with an example. Consider a 16-bit SPN, a small toy cipher, with four identical S-boxes, and suppose we have the following two approximations over the S-box: 3 goes to D with squared correlation 2^-2, and 7 goes to 4 with squared correlation 2^-2. Then the S-box pattern, as we call it, (1, 2^-2, 1, 2^-2) is a family. It corresponds to every approximation over the S-box layer which has this particular pattern of squared correlations: the first and the third S-box must be inactive, and the second and the fourth S-box take any combination of the two approximations above. We also naturally have the projections onto the first and the second coordinate, which are the set of all input parities and the set of all output parities of the family.

Given a set of such S-box patterns, we can consider the graph defined by this set in a natural way: we simply expand every member of the set, and these form our edges; our vertices are then just the union of the projections. Suppose we have some set of S-box patterns defining a subgraph of interest. Then we can immediately observe that for any intermediate stage in the graph, i.e. any node v that is not an input or output node, if v does not lie both in the set of input parities and the set of output parities of the expanded families, it is immediately pruned, because it cannot have both a predecessor and a successor.

The problem now is that while we can keep the description of the families in memory, we cannot possibly keep the expansion in memory. So the idea is: can we somehow prune the families before expanding them, so that we have fewer families to expand and the graph becomes manageable? The overall approach, with more details in the paper, is that we generate an approximation of the graph by applying a compression function, very much in the spirit of truncated differentials, chosen such that if there is an edge or a path between two nodes in the original graph, then there is also an edge or a path between their images in the truncated graph, but not necessarily conversely.

The algorithm then iteratively refines the compression until it reaches the trivial case. We start by generating some set of patterns using a heuristic; we have a generic heuristic, but if you have cipher-specific knowledge you could apply that instead. Then we pick a compression factor, generate the truncated graph by applying the compression to every vertex as we expand the members of the set, and prune on the truncated graph. We then remove any S-box pattern whose expanded members are not in the truncated graph, because if a member is not in the truncated graph, it cannot be in the full graph either; the truncated graph is, in a sense, strictly more connected. Then we refine the compression until we reach the trivial case, and finally we expand the graph. Notice that in a sense this procedure is lossless: you are not removing anything that could otherwise have created additional paths. Once you have fixed your set of patterns, the graph you get out at the end will have the same set of paths through the cipher. The two central operations here, expansion and compression, are sketched below.
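The following sketch illustrates the two operations under assumed conventions: 4-bit S-box masks packed into a u64 state, and squared correlations compared exactly as floats (a real implementation would more likely compare exponents). The names expand_pattern and compress are mine, not the tool's:

```rust
/// An approximation over a single S-box:
/// (input mask, output mask, squared correlation).
type SboxApprox = (u16, u16, f64);

/// Expands an S-box pattern, e.g. (1, 2^-2, 1, 2^-2) from the example,
/// into concrete edges: every combination of single-S-box approximations
/// whose squared correlation matches the pattern entry at each position.
fn expand_pattern(
    pattern: &[f64],        // target squared correlation per S-box
    table: &[SboxApprox],   // all approximations over the S-box
    sbox_bits: u32,
) -> Vec<(u64, u64, f64)> { // (input parity, output parity, length)
    let mut edges = vec![(0u64, 0u64, 1.0)];
    for (i, &target) in pattern.iter().enumerate() {
        let shift = i as u32 * sbox_bits;
        // Inactive position: only the trivial approximation (0, 0, 1).
        let candidates: Vec<SboxApprox> = if target == 1.0 {
            vec![(0, 0, 1.0)]
        } else {
            table.iter().copied().filter(|&(_, _, c)| c == target).collect()
        };
        let mut next = Vec::with_capacity(edges.len() * candidates.len());
        for &(a, b, len) in &edges {
            for &(u, v, c) in &candidates {
                next.push((a | (u as u64) << shift,
                           b | (v as u64) << shift,
                           len * c));
            }
        }
        edges = next;
    }
    edges
}

/// One possible compression in the spirit of truncated differentials:
/// map a parity to its S-box activity pattern (one bit per position).
/// Any edge between two parities in the full graph induces an edge
/// between their images, so pruning on the truncated graph never
/// discards a real path.
fn compress(parity: u64, sbox_bits: u32, num_sboxes: u32) -> u64 {
    let mask = (1u64 << sbox_bits) - 1;
    (0..num_sboxes)
        .filter(|i| (parity >> (i * sbox_bits)) & mask != 0)
        .fold(0u64, |acc, i| acc | 1 << i)
}
```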
So suppose we have the following graph at the end. Then we observe a problem: if we now apply pruning to the intermediate rounds after expansion, we lose a large part of the search space. The way we heuristically avoid this for SPNs is to add some vertices that are not in the expansion of the families: for the input and output vertices that would otherwise be pruned, here and here on the slide, we add optimal input and output vertices to ensure that they are not.

Okay. Then we can take a look at some plots and results. We have implemented this heuristic in a publicly available tool that we call CryptaGraph. For SPNs it is very simple to use; it is essentially fire and forget. You give it a linear layer, you give it a non-linear layer, and it can start giving you analysis. One thing you can do once you have generated this graph is plot it, at least for small parameters, so that you can actually look at it with your eyes. We hope this will be of use to cipher designers, so that you can visualize what the hull looks like.

Here is a plot for PRESENT. You immediately see, basically, Ohkuma's observation: you have this huge number of trails, because you have this degree of freedom in your choice of approximations over the round functions. In contrast, here is a plot for GIFT, which is essentially a PRESENT-like cipher with a specially crafted bit permutation designed to avoid exactly the observation Ohkuma made for PRESENT: that there is a very large number of Hamming weight 1 trails which are all roughly equally good. So in particular for PRESENT, you cannot simply upper bound the squared correlation of any particular trail and take that as indicative of the cipher's susceptibility to linear cryptanalysis, because there is a very strong hull effect.

We have applied the technique to 17 different ciphers, for both linear and differential cryptanalysis, and I want to highlight the following results. We have four PRESENT-like designs, and for these we generally find a very, very large number of trails. As an extreme example, for PUFFIN the hull we end up considering between our approximations contains 2^112 trails. For the PRESENT-like designs we generally see about 2^60; for PRESENT we also see 2^60, which would not be feasible if you had to enumerate the trails in linear time. So we improve the analysis of PRESENT by considering a vast number of trails at essentially no additional cost in time. And for RECTANGLE, we also improve the hull analysis found in the original design work by considering a few hundred thousand trails.

Lastly, some points of future work. We would like to add support for ARX ciphers. We need some good heuristics for ARX; the current heuristics obviously do not port, and it is not immediately obvious how to do this. We would also like better heuristics for Feistel networks: we do have support for Feistel networks, but we would like some less generic heuristics. Yes, that is the end of my talk.

Question from Matthias: Thanks for the talk. You said that you intend the tool to be helpful for designers. So if I am now designing a new SPN, what do I have to do to apply this tool to my new design?

For SPNs, it is really, really simple. You supply the S-boxes; you can have different S-boxes in the same layer, but for technical reasons we currently do not support different S-boxes in different rounds. You give it the S-boxes, you give it the linear layer, implemented as a Rust function. And if you also give us the key schedule, we can also generate an approximation of the correlation distribution over the key space.
This is because the method also easily gives rise to a way of quickly enumerating the correlation over the key space: you basically just re-weight the graph according to the particular expanded key that you get from running the key schedule. So for SPNs, it is really simple to apply. Okay, nice. Thanks.

Next question: Does your tool also support partial S-box layers, like LowMC?

I think we do. I am not super sure about that, actually. The question is whether we added that subsequently, because we have kept working on the tool.

I also have a question. Maybe I missed that part, but do you have a bound between the result you find and the actual situation? Could you rephrase that? Yes: do you have a bound between the result you find and the actual situation, an error bound?

So, about the results that we find: if you give me these S-box families, then I will consider every trail that lies in their expansion. If you give me, say, all the approximations between Hamming weight 1 parities over the S-box for PRESENT, then I can give you the hull over every trail that lies in this set. So in a sense, it is more robust. I see. So the heuristic step is choosing this set of patterns, the calligraphic P. Did I answer your question? Yep.

Okay. Any more questions? If not, let us thank Mathias and all the other speakers in the session.