Welcome everybody to Asiacrypt 2021. My name is Aleksei Udovenko, and I will be talking about convexity of division property transitions: theory, algorithms and compact models. Enjoy!

First, I would like to give a quick high-level overview of the work. The work studies division property, which is a cryptanalysis technique for symmetric ciphers. There are many variants of division property, and this work focuses on the earliest and most widely applicable one, bit-based two-subset division property, also called traditional or conventional division property. It was introduced by Todo at EUROCRYPT 2015. The contributions of our work include new theoretical insights, such as close links of division property propagation with graphs of vectorial Boolean functions, which lead to one of the key results: a new compact representation of division property propagation suitable for modeling with SAT solvers. In addition, we describe new algorithms for computing propagation tables and compact representations in quadratic time, improving on previous cubic-time approaches. As a proof of concept, we study the super S-box model of LED, which was impractical with previous methods and was a challenge left open in the recent work of Derbez and Fouque.

Now I am going to introduce all the relevant concepts required for the work, including division property, working with the partial order on bit vectors, and the formalization of division property. Consider a cryptographic primitive built from iterations of a simple round function, for example a block cipher or a stream cipher. In integral cryptanalysis, we want to know whether the multivariate polynomial expression of an output bit of the primitive may contain a chosen monomial of the input variables. If the presence of the monomial does not depend on the involved secret keys, we obtain a distinguisher from a random function, which can be used to attack the primitive. The core idea of division property is very simple.
In order for y0, as a function of x, to contain the monomial x0 x1 x2, there must exist a sequence of monomials in the intermediate states such that at each step the product of functions defined by the output monomial contains the input monomial. For example, here the product of the bits of w with indices 2 and 5, as a function of z, may contain the monomial z1 z3. Such a sequence is called a division trail, and each step in it is called a division property transition. Usually, division trails are searched for using SAT or SMT solvers or MILP optimizers. In this work we focus on studying single-step transitions in depth. In particular, we aim to find compact encodings of possible transitions for such solvers; this process is often called modeling. A notable difference from linear or differential trails is that here the distinguisher holds only if there is no division trail; otherwise, we cannot conclude on the existence of the distinguisher.

I will now introduce some definitions related to the partial order on bit vectors, which are used extensively in this work. The order is defined as follows: we say that a vector u is below or equal to a vector v if, for each bit position, the respective coordinate of u is less than or equal to that of v. On the slide you can see the partial order graph associated with the set of four-bit vectors. Here edges go upwards, and one edge corresponds to flipping one zero bit into a one bit. The partial order can then be reformulated as follows: a vector u is below or equal to a vector v if there is a directed path from u to v in this graph. Monotonicity and convexity are defined with respect to this partial order. A set is called a lower set if there are no edges going from a non-element to an element of the set. An upper set is defined analogously. Extreme elements, which are maximal for a lower set and minimal for an upper set, form a compact representation of monotone sets.
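As a small illustration (my own sketch, not from the talk), the partial order and the extreme elements of a monotone set can be computed directly on bit vectors encoded as integer bitmasks; the set `lower` below is a hypothetical example:

```python
# Sketch: the partial order on bit vectors, encoded as integer bitmasks.

def leq(u: int, v: int) -> bool:
    """u is below or equal to v iff every bit set in u is also set in v."""
    return u & ~v == 0

def maximal(s):
    """Maximal elements: members with no other member strictly above them."""
    return {v for v in s if not any(v != w and leq(v, w) for w in s)}

def is_lower_set(s, n):
    """A lower set contains, with each element, everything below it."""
    return all(u in s for v in s for u in range(2 ** n) if leq(u, v))

# A hypothetical lower set of 3-bit vectors and its compact representation.
lower = {0b000, 0b001, 0b010, 0b100, 0b101}
assert is_lower_set(lower, 3)
assert maximal(lower) == {0b010, 0b101}
```

The two maximal elements here act as vectorial upper bounds: every element of the set lies below 010 or below 101.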
For upper sets, extreme elements can be viewed as patterns with wildcards and ones; for lower sets, the patterns consist of wildcards and zeros. They can be seen as vectorial upper or lower bounds. A set is called convex if it has both lower and upper bounds; in other words, it is an intersection of a lower set and an upper set. Since we aim to model transitions for, say, a SAT solver, we are interested in modeling these monotone or convex sets. Quite naturally, an upper set can be modeled using a monotone DNF (disjunctive normal form), where the number of clauses is equal to the number of minimal elements in the set. For example, the first clause here corresponds to bit vectors with the last two bits equal to one, and all such vectors are included in the considered set. Alternatively, an upper set can be modeled by a CNF (conjunctive normal form) formula; this requires considering the complementary lower set, and the size of the formula is then determined by the number of maximal elements in the complement. It is worth noting that the compact representation is not always really compact, and the gap between the CNF and DNF representations can itself be huge. However, in practice there is a significant chance of a large reduction in the size of the representation, increasing modeling possibilities. Finally, a convex set can be trivially modeled by concatenating the lower and the upper bounds in CNF form; this method, however, does not work with DNF formulas. Notably, and perhaps counter-intuitively, due to the high dimension the upper and the lower bounds can intersect, as shown in the example on the slide. Here the vector 1010 is both minimal in the complementary upper set and maximal in the complementary lower set. This phenomenon introduces some freedom into modeling: we can choose, for the elements in the intersection, whether they should be removed by an upper set or by a lower set. I will now briefly recall the formalization of division property.
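To make the two encodings concrete, here is a self-contained sketch (my own toy 3-bit example, not the one on the slide): a monotone DNF built from the minimal elements of an upper set, and a CNF built from the maximal elements of its complementary lower set.

```python
# Sketch: modeling a 3-bit upper set by a monotone DNF and by a CNF.

def leq(u, v):
    return u & ~v == 0

n = 3
# Hypothetical upper set: everything with the lowest bit set, plus 110, 111.
upper = {v for v in range(2 ** n) if v & 0b001} | {0b110, 0b111}

# DNF: one clause per minimal element; v is in the set iff it lies above one.
minimal_elems = {v for v in upper if not any(w != v and leq(w, v) for w in upper)}
def dnf(v):
    return any(leq(m, v) for m in minimal_elems)

# CNF: one clause per maximal element of the complementary lower set;
# v is in the set iff, for each such m, v has some bit set outside m.
complement = set(range(2 ** n)) - upper
maximal_elems = {v for v in complement
                 if not any(w != v and leq(v, w) for w in complement)}
def cnf(v):
    return all(v & ~m != 0 for m in maximal_elems)

assert minimal_elems == {0b001, 0b110}
assert maximal_elems == {0b010, 0b100}
assert all(dnf(v) == (v in upper) == cnf(v) for v in range(2 ** n))
```

Here the DNF has two clauses (one per minimal element) and the CNF also has two (one per maximal element of the complement); for larger sets the two counts can differ dramatically.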
At CRYPTO 2016, Boura and Canteaut introduced parity sets as another view of division property. Simply speaking, parity sets capture which products of bits sum to one over a given set. Todo's division property of a set can be defined simply as a lower bound on its parity set; here the upper closure simply means the smallest upper set containing K. It can be shown that the parity set is basically equivalent to the algebraic normal form of the set's indicator function. I recall that the indicator function is the Boolean function equal to one on the elements of the set and to zero on the non-elements of the set; in other words, it represents the set as a function. Using this link, we can conclude that division property simply defines vectorial upper bounds on the monomials in the ANF of the indicator of the set.

Recall the division trails I mentioned when introducing division property. For example, in the trail here, I said before that we know that the product z1 z3 may contain the monomial x0 x1 x2 for some key k0. To handle the secret keys added between the round functions in a simpler way, we can relax the propagation rule to allow smaller products of z to contain larger monomials of x. Then we can ignore the secret keys. For example, we consider this transition valid if, say, z1 instead of z1 z3 contains x0 x1 x2 x5 instead of just x0 x1 x2, where we consider only the function F0 and no keys at all. This manipulation introduces the monotonicity aspect, because we can now add or remove variables from the products, which is equivalent to flipping bits in the exponents.

Now I will describe new insights into division property transitions, including new characterizations leading to compact representations and also explicitly exhibiting the convexity of minimal transitions. First, recall the definition of the graph of a vectorial Boolean function, which is simply the set of all valid input-output pairs of the function.
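The parity-set/ANF link can be checked on a toy example. The sketch below is mine (the exact bitwise-complement convention is spelled out in the code, and the set X is a hypothetical 2-bit example): it computes the parity set of X by definition and compares it with the monomials in the ANF of the indicator of the complemented set.

```python
# Sketch: parity set of X = {u : #{x in X : u <= x} is odd}, and its link
# to the ANF of an indicator function (up to bitwise complements).

def leq(u, v):
    return u & ~v == 0

def parity_set(X, n):
    return {u for u in range(2 ** n)
            if sum(1 for x in X if leq(u, x)) % 2 == 1}

def anf_monomials(indicator, n):
    """Moebius transform of a truth table; returns exponents of ANF monomials."""
    a = list(indicator)
    for i in range(n):
        for lo in range(2 ** n):
            if not lo & (1 << i):
                a[lo | (1 << i)] ^= a[lo]
    return {u for u in range(2 ** n) if a[u]}

n, mask = 2, 0b11
X = {0b01, 0b10}                          # a hypothetical 2-bit set
comp = {x ^ mask for x in X}              # bitwise-complemented elements
ind = [1 if x in comp else 0 for x in range(2 ** n)]
# Parity set of X = complements of the ANF monomials of the indicator of comp.
assert parity_set(X, n) == {u ^ mask for u in anf_monomials(ind, n)}
```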
One of the main results of this work provides new characterizations of valid division property transitions. In particular, it turns out that the set of valid transitions through F coincides, up to negating the input part, with the division property of the graph of F viewed as a set. It is an interesting connection between the division property of a set and transitions of such division properties. In addition, it draws attention to the extreme elements, which are the minimal elements of the parity set of the graph, or alternatively, transitions u to v with maximal u and minimal v. Moreover, thanks to the ANF formulation of division property, we can deduce that this extreme set coincides, again up to some negations, with the set of maximal monomials in the ANF of the graph indicator function. This link is particularly interesting due to a recent work by Carlet deriving new methods for degree bounds based on the degrees of the involved graph indicators. I will focus on the extreme set of transitions highlighted by these characterizations, as it will play the role of the compact representation.

And that is exactly how we define the compact representation, which, due to its utility, we shall call the division core. The division core of a vectorial Boolean function is simply the set of minimal elements of the parity set of its graph. Equivalently, it is the set of transitions with maximal u and minimal v, with the first part negated. Alternatively, it is the bitwise negation of the set of maximal monomials in the ANF of the graph indicator. I would like to emphasize the difference from the notions in the literature, since it may be confusing. Classic propagation of division property focuses on transitions which are called minimal or reduced, which only require that the output vector v is minimal for every fixed input vector u. The idea of considering every possible input vector comes from the table-based modeling method.
First, the table of all transitions is computed, where for each input vector all possible output transitions are enumerated, and then this table is modeled by an appropriate method. Here we instead avoid full enumeration of input vectors and record only constraints in some form, which is achieved by considering transitions with a maximal input vector u. A simple formulation is then that any valid transition should be equal to, or lie above, some vector from the division core. The compactness here comes from the fact that we consider the set of minimal vectors. I remind you that this does not come with strong guarantees on the size, but it provides very reasonable results in practice, at least compared with the full enumeration of input vectors. Finally, I recall from the beginning of the talk that extreme sets lend themselves naturally to CNF or DNF modeling, and so does the division core. It is notable that solely from the division core we can deduce the full set of valid transitions or the full set of minimal transitions. Even more, the division core contains information about division transitions through the inverse of the function, if it exists. This highlights the completeness of the division core as a representation of the behavior of the function with respect to traditional division property.

One last bit of theory covers minimal transitions, where by minimal I mean the usual minimality under a fixed input vector u. These transitions are important due to their utility in modeling, where we want to reduce the search space as much as possible. If we look at the full set of possible transitions, invalid transitions form a lower set and redundant transitions form an upper set. It follows that minimal transitions form a convex set: it has both a lower and an upper bound. Therefore, we can model it by a CNF formula, by removing the redundant upper set and the invalid lower set.
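As a toy illustration (my own sketch with a hypothetical 2-bit function, using the definitions and the negation convention as stated in the talk), the division core and the resulting validity check can be computed directly:

```python
# Sketch: division core of a toy 2-bit function F, computed as the set of
# minimal elements of the parity set of the graph of F.

def leq(u, v):
    return u & ~v == 0

def parity_set(X, nbits):
    return {u for u in range(2 ** nbits)
            if sum(1 for x in X if leq(u, x)) % 2 == 1}

def minimal(S):
    return {v for v in S if not any(w != v and leq(w, v) for w in S)}

n = 2
F = [0b00, 0b01, 0b11, 0b10]               # hypothetical 2-bit bijection
graph = {(x << n) | F[x] for x in range(2 ** n)}
core = minimal(parity_set(graph, 2 * n))

def valid_transition(u, v):
    """u -> v is valid iff (NOT u || v) lies above some division-core vector."""
    w = ((u ^ (2 ** n - 1)) << n) | v
    return any(leq(c, w) for c in core)

assert valid_transition(0b00, 0b00)        # the trivial transition
assert valid_transition(0b11, 0b11)        # full monomial through a bijection
assert not valid_transition(0b11, 0b00)
```

This brute-force computation is only for illustration; the efficient way to obtain the core is via the transform framework described later in the talk.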
However, due to the high dimensions involved, the covering of non-minimal transitions by an upper set and a lower set is not unique. We saw that already in the example of a convex set, where the maximal complementary lower and upper sets actually intersected. In practice, we noticed that removing the maximal complementary upper set usually yields more compact models. In this case, these side parts are removed twice: by the lower bound and by the upper bound. While this is a simplified picture, it nicely conveys the idea. All these sets and their extreme parts can be computed from the division core. I would like to emphasize that here the minimal set is defined by considering each fixed vector u separately, while in the figure we consider the full set of transitions. This is precisely what allows us to obtain more compact models.

Here are examples of model sizes for some S-boxes. For S-boxes, we consider modeling the set of minimal transitions precisely. In the table, you can see the number of minimal transitions in the set and the size of the CNF representation using our methods, and also using an optimal modeling method, the Quine-McCluskey algorithm. We can see that our representation is about twice as large as the optimal one. The problem with the optimal one is that it does not scale well: obtaining even suboptimal results with the Quine-McCluskey algorithm quickly becomes infeasible for larger S-box sizes, while our method scales reasonably well. Notably, we observe very compact representations for heavy S-boxes, that is, those having the most high-degree monomials in their output bits or in small products of output bits. For example, the AES S-box has a quite compact model. The MISTY S-boxes, however, have algebraic weaknesses, which were exploited by Todo in the attack on full MISTY1, and so they have a more complex and less compact structure of transitions. For larger S-boxes, I focus only on the number of constraints necessary to remove the invalid trails.
Redundant trails can be removed imprecisely by much smaller formulas, such as cardinality constraints. We can model super S-boxes with varying success. For example, the strong LED super S-box can be modeled by about 300,000 constraints, while the weaker Midori super S-box requires 2 million. For modern SAT solvers this may still be in the feasible range, but it really depends on the exact setup. An interesting case is the heavy linear layer of LED, which is based on an MDS matrix. It was the source of complexity in the work by Derbez and Fouque. It can be modeled by just 30,000 constraints using our method. Finally, merely as a proof of concept, we managed to model a randomly generated 32-bit S-box. As I mentioned before, heavy S-boxes tend to have very compact representations, which is nicely illustrated by this case, where we need only 3,000 CNF clauses to model all valid transitions.

The compact representation is practice-oriented, so it is necessary to understand how to compute it and work with it in practice. To this end, we propose an elegant algorithmic framework capturing all the necessary manipulations. The key algorithmic component we use is a slight generalization of the classic ANF computation algorithm, or the fast Fourier transform. These algorithms transform an array of size 2^n in quasi-linear time. The basic function we use, called transform, takes an array of size 2^n, which can be the truth table of a function or a set's indicator vector, and transforms it two elements at a time, such that the indices of these elements differ in exactly one bit. The parameter f of the algorithm is a function on pairs of values that describes the manipulation. Depending on this function, we can obtain many useful bit-oriented transformations. The function XorUp, which XORs the first element into the second one, corresponds simply to the Möbius transform, that is, the classic ANF computation.
XorDown, which XORs the second element into the first one, computes the parity set of a set. This transformation is particularly important in view of our theorem, which links the set of division property transitions with the parity set of the graph of the function. OrUp and OrDown simply compute the upper closure and the lower closure, respectively, which are the minimal upper or lower sets containing the given set. Finally, LessUp, which replaces the second value with the AND of itself and the negation of the first value, computes the set of minimal vectors when called after the upper closure; similarly, we can compute the set of maximal elements. Again, all these algorithms work in time n times 2^n.

Using these algorithms and our theoretical insights, we can deduce a simple algorithm for computing the division property propagation table of a function, which improves previous algorithms, mainly due to the non-heuristic min-set step. As a result, we can compute all relevant sets for an n-bit S-box in quadratic time, while previous algorithms work in cubic time. In addition, this framework can be implemented very efficiently in a bit-sliced fashion.

Finally, I will briefly mention our application to LED. Unfortunately, while compact modeling greatly expands the possibilities of modeling division property, we could not find other interesting applications with large S-boxes. LED is a lightweight block cipher published at CHES 10 years ago. The best integral distinguisher is on 7 rounds, due to Hu, Wang and Wang, who used an SMT solver to model the linear layer precisely for the first time. I recall that with our method we can model it with CNF directly. Recently, Derbez and Fouque applied an ad hoc division trail search using super S-box models of ciphers and linear combinations of bits at the input and at the output. Their method was not practical for LED due to its complex linear layer. Therefore, the question remained open whether the super S-box model with linear combinations can find an 8-round distinguisher or not.
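The framework can be sketched in a few lines. This is my reconstruction, not the authors' code: the rule names follow the talk, while the `transform` signature and the toy examples are mine.

```python
# Sketch: the generalized butterfly transform over an array of size 2**n.
# Pairs of entries whose indices differ in exactly one bit are rewritten
# by a rule f; different rules give different bit-oriented transforms.

def transform(arr, f, n):
    for i in range(n):
        bit = 1 << i
        for lo in range(2 ** n):
            if not lo & bit:
                arr[lo], arr[lo | bit] = f(arr[lo], arr[lo | bit])

xor_up   = lambda a, b: (a, a ^ b)       # Moebius transform: ANF computation
xor_down = lambda a, b: (a ^ b, b)       # parity set of a set
or_up    = lambda a, b: (a, a | b)       # upper closure
or_down  = lambda a, b: (a | b, b)       # lower closure
less_up  = lambda a, b: (a, b & ~a)      # after or_up: keep minimal elements

# ANF of OR(x0, x1): truth table 0,1,1,1 -> monomials x0, x1, x0*x1.
tt = [0, 1, 1, 1]
transform(tt, xor_up, 2)
assert tt == [0, 1, 1, 1]

# Minimal elements of the upper closure of {01}: just {01} itself.
ind = [0, 1, 0, 0]
transform(ind, or_up, 2)                 # upper closure: {01, 11}
assert ind == [0, 1, 0, 1]
transform(ind, less_up, 2)
assert ind == [0, 1, 0, 0]
```

Each call touches every array entry once per bit position, giving the stated n times 2^n running time; the bit-sliced variant mentioned in the talk packs many such arrays into machine words.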
Our compact models for LED are rather large but reasonable, and they fare well with modern SAT solvers such as Kissat. The 8-round model for a fixed linear combination is solved in about one minute. We exhausted all linear combinations but found no 8-round distinguishers. Therefore, one has to go beyond this model to find them, if they exist. Here we can see an example division trail for LED. The first column in the first transition covers a subset of all possible linear masks alpha that correspond to a constant component at the input super S-box. Similarly, the first column in the last transition covers a subset of all possible linear masks beta that correspond to the considered output component. We found that all possible pairs of masks (alpha, beta) can be covered by 255 different columns on each side. This means that all combinations can be checked by running the SAT solver almost 2^16 times, which is quite a lot. In practice, we found that a small set of discovered trails is sufficient to cover all possible pairs, so that the SAT solver has to be run only a few dozen times.

To conclude, please look for more results in the paper and check out the code repository, which has convenient implementations of the described techniques. There are still many interesting problems left, such as compressing CNF models into compact MILP models, or using more advanced techniques to find, or prove the non-existence of, an 8-round distinguisher for LED. And of course, we hope that the new techniques will find more applications. Thank you, and I will be happy to answer your questions.