Welcome to my presentation of the paper "Analyzing the Linear Keystream Biases in AEGIS". This paper is joint work by Maria Eichlseder, Robert Primas, and me, Marcel Nageler. Today we are going to talk about our motivation, that is, why we chose to analyze AEGIS in the first place; then we will quickly recap the design of AEGIS and previous analysis; afterwards we will show new bounds and attacks on the keystream bias; and then we will conclude with some final remarks.

So why did we analyze AEGIS? I think most of you have heard about the CAESAR competition, which aimed to find authenticated encryption schemes. When it concluded in 2019, AEGIS was selected as part of the final portfolio for use case 2, high-performance applications. This cipher was designed by Hongjun Wu and Bart Preneel, who actually proposed a family of authenticated ciphers with three members: AEGIS-128, which was part of the final portfolio; AEGIS-128L, which features a more efficient design; and AEGIS-256, which claims 256 bits of security. All of these members have very high software performance thanks to the AES-based state update function.

Previously, AEGIS was analyzed by the designers, who focused on the initialization and finalization parts of the cipher; using a conservative bound based on the number of active S-boxes, they were able to show resistance against differential cryptanalysis. It was also analyzed in the nonce-misuse setting. The most notable analysis was performed by Brice Minaud in 2014, who used linear characteristics of the AES round function to find linear keystream biases in AEGIS. His keystream biases have a squared correlation contribution of 2^-154 for AEGIS-128 and 2^-178 for AEGIS-256. You may note that for AEGIS-256 this leads to an attack which is more efficient than a generic key-guessing attack. Of course it has very high data requirements, but the data can be collected across different keys, which is a plus.
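As a quick back-of-envelope check on these numbers (an illustrative sketch, not code from the paper): the data complexity of a linear distinguisher is roughly the inverse of the squared correlation contribution, which is what makes the AEGIS-256 bias beat generic key guessing while the AEGIS-128 one does not.

```python
def data_complexity_log2(log2_sq_corr: float) -> float:
    """Approximate log2 data complexity of a linear distinguisher:
    roughly the inverse of the squared correlation contribution."""
    return -log2_sq_corr

# Minaud's keystream biases (squared correlation contributions):
aegis128 = data_complexity_log2(-154.0)  # needs about 2^154 keystream words
aegis256 = data_complexity_log2(-178.0)  # needs about 2^178 keystream words

# Compare against generic key guessing with 2^(key size) work:
print(aegis128 < 128.0)  # False: 2^154 > 2^128, no better than key search
print(aegis256 < 256.0)  # True:  2^178 < 2^256, beats generic key search
```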
In the meantime, Minaud's analysis inspired attacks on a similar CAESAR finalist, MORUS. First, in 2018, a team of researchers proposed a linear distinguisher, which they found by hand, and discussed how it could be exploited in practice. About a year later, another team of researchers used a mixed-integer linear programming (MILP) model to substantially improve the power of the linear distinguisher, and by substantially we mean an improvement of the estimated complexity from 2^146 to 2^76. While MORUS has quite a different round function, it has a somewhat similar mode to AEGIS, and this begs the question: can we apply similar ideas to the AEGIS ciphers?

Let's quickly recap the design of AEGIS. As an authenticated encryption scheme, it expands key and nonce into a much larger internal state. Then, for each message block that needs to be encrypted, a keystream extraction function is called, which generates 16 bytes of keystream (or 32 in the case of AEGIS-128L). After the keystream is XORed onto the message, the state update function is called, which mixes the message into the internal state. Once this has been done for each message block, a finalization routine is called, which outputs the tag that is used to ensure authenticity.
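The encryption mode just described can be sketched as follows. This is a simplified illustration of the mode, not the AEGIS specification: `init`, `keystream`, `update`, and `finalize` are hypothetical stand-ins for the real initialization, keystream extraction, state update, and finalization functions.

```python
from typing import Callable, List, Tuple

def encrypt(init: Callable, keystream: Callable, update: Callable,
            finalize: Callable, key: bytes, nonce: bytes,
            blocks: List[bytes]) -> Tuple[List[bytes], bytes]:
    """Sketch of an AEGIS-style encryption mode: extract keystream,
    XOR it onto the message block, then absorb the block into the state."""
    state = init(key, nonce)            # expand key/nonce into a large state
    ciphertext = []
    for m in blocks:                    # one 16-byte block per step
        z = keystream(state)            # keystream extraction function
        ciphertext.append(bytes(a ^ b for a, b in zip(m, z)))
        state = update(state, m)        # mix the message into the state
    tag = finalize(state)               # finalization outputs the tag
    return ciphertext, tag

# Toy stand-ins just to exercise the skeleton (NOT the real AEGIS functions):
toy_init = lambda k, n: bytes(a ^ b for a, b in zip(k, n))
toy_keystream = lambda s: s
toy_update = lambda s, m: bytes(a ^ b for a, b in zip(s, m))
toy_finalize = lambda s: s
ct, tag = encrypt(toy_init, toy_keystream, toy_update, toy_finalize,
                  bytes(16), bytes(16), [bytes(range(16))])
```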
Let's take a quick look at the inner workings. For AEGIS-128, we have five substates, where each substate is like an AES state: 16 bytes arranged in a 4-by-4 matrix. To generate the keystream, we apply a Boolean function to the internal state to arrive at 16 bytes of keystream; afterwards, a single round of AES is applied to each substate and XORed onto the neighboring substate. For AEGIS-256, the state is a bit larger, so we have six substates and a slightly modified keystream function, but essentially the same state update function. AEGIS-128L offers more performance by generating two keystream blocks at the same time.

So how can we use this knowledge to find a linear keystream distinguisher? Let's first talk about the notation we use. We exploit the bias of a selection of keystream bits, and we specify that selection using linear masks called lambda. To calculate the data complexity, we use the correlation contribution, which is the product of the individual correlations; this leads us to a data complexity of about the inverse squared correlation contribution. We also need internal masks that specify the path the linear characteristic takes through the cipher.

Which results did we get? Let's quickly recap the results of Brice Minaud: he found two keystream approximations, for AEGIS-128 and AEGIS-256. Today we will show three different models. We start with a simple mixed-integer linear programming model that uses the usual constraints for AES-like ciphers; while we are able to find some bounds, these are quite weak, and we are also unable to find valid corresponding bitwise characteristics. To fix that, we propose an improved model that identifies linear relationships between substates, which we will talk more about later on. Using this improved model, we are able to find linear approximations of the keystream in AEGIS-128, although the gap between the bound we identified and the best keystream approximation is still quite large.

To make that gap smaller, we propose a partially bitwise model with bitwise modeling of MixColumns, the XORs, and the AND gates. This way we can decrease the gap significantly: the bound for AEGIS-128 is now 2^132, and using our constraint programming model we are able to find a characteristic with squared correlation contribution of 2^-140. These models also apply to AEGIS-256, where we have a bound of 2^152 and are able to find a characteristic with an estimated cost of about 2^162. For AEGIS-128L, the bound comes down to 2^140, and the best characteristic we were able to identify is 2^152. Also note that the distinguishing complexity is likely a bit lower; for example, Brice Minaud estimates a complexity of 2^140 for his characteristic with squared correlation contribution of 2^-154. There are also some dependencies: we disregard the effect of the key and the nonce, and we evaluate F and G separately, which might lead to a linear hull effect.

So how did we find those bounds and attacks? First of all, we start with the simple truncated model: we have a variable for each active byte in the state, and then we add constraints for all the linear operations in the cipher. We model MixColumns using its branch number 5, which corresponds to the MDS property of the AES MixColumns, and we model the linear branches using their branch number of 2. Because it is a mixed-integer linear programming model, we require an optimization goal, so we minimize the overall weight, where the weight is the logarithm of the inverse squared correlation contribution. For the S-box, we have a weight of 6, which corresponds to the best squared correlation contribution of
2^-6, and for the AND gate this corresponds to the best squared correlation contribution of 2^-2. We also need to fix the number of blocks that are used in the approximation: before the first keystream extraction we set all masks to zero, and after the last keystream extraction we also set all masks to zero.

Which results did we get for that? Well, as expected, we did not find any solutions for fewer than two keystream blocks, which would be a single state update function. The best results we found were for two state update functions, or three keystream blocks, and we were able to prove a bound of 2^92 for AEGIS-128 and 2^116 for AEGIS-256. But unfortunately, it is impossible to find valid corresponding bitwise characteristics, and we will get to the reason for that in a second. As a quick note, for more keystream blocks, the cost turns out to be much higher than that of the previous results by Brice Minaud.

So how does such a truncated characteristic look? Here you can see a picture: we draw an X for all bytes that have a non-zero mask and no X for all bytes that have a zero mask. The inconsistency that I am going to talk about mainly concerns the MixColumns operation. If you take a look at the MixColumns operation, you have some input mask mu and some output mask beta, and we know that the input mask mu is determined by the output mask beta via some function f. We might also have a second input mask mu prime that is a function of another output mask beta prime. Because the MixColumns function is linear, this relationship f between the linear masks is also linear, so the difference of two input masks must be f applied to the difference of the two output masks. This holds if we look at the bitwise masks; and if we look at the truncated masks like we have here, the differences must meet the MDS constraint, that is, the branch number 5.

Now let's take a look at where this is violated. We are going to focus on these two MixColumns operations. If we take a look at the input difference, we have a single byte of input difference and up to four bytes of output difference. Right now this does not violate anything, but note that these bytes are non-zero and these bytes are non-zero, so the difference might actually be zero; we do not know for sure yet. To know for sure, we take a look at these two branches. We have these masks on the side: because we know that the input here is zero, it is equal to this mask, and through these XORs it is also equal to these masks; again we have a zero here, so it is equal to these masks here. Because the branch imposes an extra relationship between the masks, we know that the difference between this sigma and this sigma prime is the same as the difference between mu and mu prime. If you look at the difference, you can see that there is exactly a single byte of difference, so the difference between mu and mu prime must also be exactly this single byte. And if you have a single byte of difference at the output and a single byte of difference at the input, this violates the MDS property of the AES MixColumns.

There is also a second example. Again we focus on this MixColumns and this MixColumns. Take a look at the input: we have a single byte of difference; at the output, same as before, up to four bytes. However, if you take a look at these two branches, you can see that this mask is shared, so the difference between this mask and this mask is the same as the difference between this mask and this mask. As you can see, we have two bytes of difference here, and therefore we also have exactly two bytes of difference here. And if we have two bytes of difference at the output and a single byte of difference at the input, this also violates our MDS property.

So how are we going to fix that? We use the improved model to fix those inconsistencies: we look at the differences of the masks in MixColumns, and these differences must of course satisfy the branch number of MixColumns. So we use alternative expressions of the output masks beta and then constrain those differences against the input differences using the branch number 5. If we apply this to two state update calls for AEGIS-128, we are able to improve the bound by 2^10, to 2^102, and we are also able to find a bitwise characteristic with squared correlation contribution of 2^-140. However, as you can see, the gap between the bound and the found characteristic is still quite large. Why is that? It is mainly due to the AND gates, because each AND gate has a squared correlation contribution between 2^-16 and 2^-2, depending on how many bits are active.
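To see where these AND-gate numbers come from (a quick illustrative check, not code from the paper): for a single-bit AND c = a & b, the best linear approximation with an active output bit has correlation 1/2, i.e. squared correlation 2^-2, and k independently approximated active output bits multiply up to 2^-2k, which for a byte-wise AND gives the range 2^-16 to 2^-2. The single-gate case can be verified exhaustively:

```python
from itertools import product

def correlation(la: int, lb: int, lc: int) -> float:
    """Correlation of the linear approximation la*a ^ lb*b ^ lc*(a & b)
    over all four inputs (a, b) of a single AND gate."""
    count = 0
    for a, b in product((0, 1), repeat=2):
        approx = (la & a) ^ (lb & b) ^ (lc & (a & b))
        count += 1 if approx == 0 else -1
    return count / 4

# Best squared correlation over all approximations with an active output bit:
best = max(correlation(la, lb, 1) ** 2
           for la in (0, 1) for lb in (0, 1))
print(best)  # 0.25, i.e. 2^-2 per active AND output bit
# With k active output bits the contribution is (2^-2)^k,
# hence between 2^-16 and 2^-2 for a byte-wise AND.
```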
However, to derive a valid bound, we need to be pessimistic and assume the best squared correlation contribution. To remedy that, we propose a partially bitwise model with a bitwise specification of the AND, the XOR, and MixColumns. For the S-box, we allow any transition with a squared correlation contribution of 2^-6, which is not too far away from reality, where about 93% of the transitions are possible, with squared correlation contributions between 2^-12 and 2^-6. This substantially improves our bounds: for AEGIS-128, we are able to find a bound of 2^132 for the estimated data complexity, and a characteristic (the same as for the improved model) with an estimated data complexity of 2^140. For AEGIS-256, the bound improves to 2^152, and the actual best characteristic we found has an estimated data complexity of 2^162. For the final variant, AEGIS-128L, our bound improves to 2^140, and the estimated data complexity for the best characteristic is 2^152.

Of course, this model is very good at finding bounds; however, it has the drawback that it is really slow. Why is it really slow? Because it has a lot of variables. First, we modeled the branches using this constraint: for each three bits a, b, and c, their sum must be equal to two times some dummy variable, and this dummy variable is either 0 or 1, so the whole sum can be either 0 or 2, which is exactly what we want for such a three-forked branch. However, we were then able to use an alternative model where we eliminate this dummy variable and instead add a constraint for each invalid combination of those three bits. Using this alternative model, we got a speed-up of about 100 times in solving the model: it took about 80 minutes instead of five days to solve the model for AEGIS-128.

Using the output from that model, we then use a constraint programming model to find good characteristics, where we use the output of the MILP solver to fix the truncated characteristic, because otherwise there would be too many active S-boxes and the model would blow up. For each active S-box, we only allow a few transitions with high probability, because otherwise we would also end up with way too many constraints for the CP solver. Finally, we use soft constraints to minimize the overall cost: soft constraints allow us to associate a cost with a constraint, and whenever a constraint is violated, it incurs a predefined cost; the solver then tries to minimize the overall cost.

Which results did we get? As you can see, this is our best keystream approximation for AEGIS-128, and this is our best keystream approximation for AEGIS-256; for AEGIS-128L, too, we were able to find a good keystream approximation.

Some final remarks: when experimentally verifying our results and our characteristics, we found a linear hull effect between the AND gate and the S-box that must not be neglected. If you take a look at this example, when you evaluate the AND gate and the S-box separately, you arrive at a squared correlation contribution of about 2^-10, but in reality the squared correlation is 2^-7.7. Even more starkly, in this example, if you evaluate both operations independently, you arrive at a squared correlation of 2^-14, but in reality it is zero. We tested all possible variations of these masks and found that the best possible squared correlation is 2^-7.4 instead of 2^-8. As a quick note, while these results do influence our bounds, we think that the effect is not too large, because this linear hull effect is most starkly pronounced when there are many active bits at the output of the AND gates; however, in our characteristics, and in most good characteristics, we only have very few active
bits at the output of the AND gates.

To conclude, today we showed improved keystream approximations for all members of the AEGIS family, and we have also shown upper bounds on the squared correlation contribution below 2^-128 for all three variants. We saw that straightforward models only produce very weak bounds and do not provide valid solutions. If you are interested, I invite you to read the paper and take a look at our code, which is hosted on this git repository. If you have any questions, I invite you to ask them during the live session. Thank you.