Hello everyone, today I'm going to talk about my work on Fast Near Collision Attacks. This is joint work with my two PhD advisors, Pierre-Alain Fouque and Patrick Derbez. Fast Near Collision Attacks are a new generic attack against stream ciphers, first introduced at Eurocrypt 2018 by Zhang, Xu and Meier against Grain v1; after that, Zhang published another paper applying the technique to A5/1 at Asiacrypt 2019. The basic idea of a Fast Near Collision Attack, which is an internal state recovery attack, is divide and conquer: you split the internal state into two parts. The core part you search for and retrieve first, using the near-collision property and the knowledge of a small number of keystream bits, so very little information on the keystream; the second part, the rest part, is then easily computable from the knowledge of the core part. The papers also extend the state recovery to a key recovery with a very efficient claimed complexity. For example, for Grain v1, the claimed gain is about a factor 2^12 compared to exhaustive search, and for A5/1, your office computer could supposedly run the attack. We were really happy to see that, because we wanted to do something else on A5/1, so we gave the attack to students to implement as a semester project. At the end of the semester, out of ten groups that had to implement the attack, there was only one successful implementation. That is not really surprising: the attack is really hard to understand, with a lot of mathematical details that make it difficult for students, I think. But the one successful implementation was really slow, even though the code wasn't that bad. So we went back to the paper to understand why it was so slow, and doing that, we found errors in the complexity analysis of the core-part recovery.
So that's what I'm going to present here: why the recovery of the core part doesn't work. The central idea of the core-part recovery is the refined self-contained method, a computation split into three phases. The first phase is a precomputation that builds a variant of a differential distribution table, called TD. Basically, TD maps a keystream difference to the "good" differences in the state, a good difference being small and such that, if one state generates the keystream you are searching for, then adding a good difference to that state yields the nearby keystream, that is, the keystream XORed with the keystream difference. This precomputation is fine; you do it once and it's okay. After that you have the online phase, and that is really where the error is, so let me stress what the output of this online phase is: a set X that is supposed to contain, with high probability, the internal state that generates the keystream you are searching for. The idea of the online phase is that you sample a random state that gives you the nearby keystream you are studying, and you use the table from the precomputation phase to add a difference to that state so as to obtain the right keystream. You keep the candidate in the set, and you repeat this many times. After the online phase there is what is called the amplifying phase, which is basically just a repetition and combination of multiple online phases to increase the success probability. But that is not important for us; what really matters is the online phase, and the fact that the only thing you use in the online phase is the knowledge of the keystream. So basically, if you have two states that give you the same keystream, with what you have seen here you shouldn't be able to differentiate between these two states.
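To make the structure of the online phase concrete, here is a minimal toy sketch in Python. Everything in it is a stand-in: `F` is an invented 8-bit-state toy keystream map (not Grain v1 or A5/1), and the table `TD` is built by brute force rather than the paper's actual precomputation; only the shape of the loop (sample a state, look up candidate differences, keep matching candidates) follows the description above.

```python
import random

random.seed(1)

STATE_BITS = 8

def F(s):
    """Toy keystream map: 8-bit state -> 3 keystream bits.
    A stand-in for the real cipher's keystream function."""
    return ((s ^ (s >> 3)) * 29) & 0x7

# Precomputation phase (toy version of the table TD): map each keystream
# difference to the small state differences that can cause it.
TD = {}
for s in range(2 ** STATE_BITS):
    for ds in range(1, 8):                       # "small" differences only
        dz = F(s) ^ F(s ^ ds)
        TD.setdefault(dz, set()).add(ds)

def online_phase(z, samples):
    """Online phase: sample random states, patch each one with the
    differences TD suggests for the observed keystream difference,
    and keep every candidate whose keystream matches z."""
    X = set()
    for _ in range(samples):
        s = random.getrandbits(STATE_BITS)       # random nearby state
        dz = F(s) ^ z                            # keystream difference to z
        for ds in TD.get(dz, ()):
            if F(s ^ ds) == z:                   # candidate matches: keep it
                X.add(s ^ ds)
    return X

X = online_phase(z=F(0b1011), samples=200)
```

Note that, exactly as in the talk, this loop only ever uses `F` and the keystream value `z`: two states with the same keystream are indistinguishable to it.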
And that is really why this method cannot work. As I said, the thing that interests us here is the set X at the end, and more precisely the size of the set X and the probability that the good core part of the internal state is in it. In the attack against Grain v1, you are searching for 12 bits of internal state with only the knowledge of two keystream bits. The claim of the Fast Near Collision Attack is that the set X you obtain with the refined self-contained method has 838 elements, and the probability that the good state is in this set is around 90%, which is a lot better than what you would expect if X were sampled at random. Another suspicious thing with this attack against Grain v1 is that, if you look at how a keystream bit is computed, it is basically the XOR of 5 variables of the internal state, XORed with the output of a function h. In the Fast Near Collision Attack, the output of h is taken to be null. That means they are already constraining the search space without using the knowledge of the keystream; we don't know how or why they can do that, but apparently they can. The same thing is done with A5/1: there they search for 15 bits of internal state with only the knowledge of two keystream bits. The claim of the Fast Near Collision Attack is that the set X obtained after the refined self-contained method contains 7,835 elements, and the probability that the good internal state is in the set is more than 99%, again significantly better than what you would expect if X were taken at random. The claim we make with this work is that these probabilities are wrong, basically because you cannot differentiate between a good and a bad internal state that give you the same keystream.
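The "better than random" comparison can be made concrete with back-of-the-envelope arithmetic, using only the numbers quoted above. Two baselines are worth computing: X as a random subset of all states, and X as a random subset of just the states consistent with the known keystream bits (the latter assumes the keystream map is balanced, which is checked experimentally later in the talk).

```python
# Grain v1 run: 12 core bits searched, 2 keystream bits known,
# |X| = 838, claimed success probability around 90%.
p_any  = 838 / 2 ** 12        # random subset of all 12-bit states
p_cons = 838 / 2 ** (12 - 2)  # random subset of keystream-consistent states

# A5/1 run: 15 core bits searched, 2 keystream bits known,
# |X| = 7835, claimed success probability more than 99%.
q_any  = 7835 / 2 ** 15
q_cons = 7835 / 2 ** (15 - 2)

print(f"Grain v1: {p_any:.1%} (any), {p_cons:.1%} (consistent)")
print(f"A5/1:     {q_any:.1%} (any), {q_cons:.1%} (consistent)")
```

Either way, the claimed 90% and more-than-99% figures exceed what a set of that size can deliver without using more information than the keystream provides.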
This can be said a bit more formally with a simple information-theoretic argument. Take an algorithm A, which for us will be the refined self-contained method, that takes as input a function F, the stream cipher, and an element z, the keystream, and outputs a subset X of the antecedents of the keystream under F. Now observe the probability that one particular antecedent, drawn uniformly at random, is in this set: that probability should be about the size of X divided by the number of antecedents of your keystream. The refined self-contained method satisfies the hypothesis of this theorem (mostly; I'll come back to that in a second). So the probabilities given on the last slide must be wrong; there is no way they are right. A simple experiment to show this is to first choose a keystream value, run the refined self-contained method to obtain a set X, and only then choose a secret, that is, a good internal state that gives you that keystream, and check whether this internal state is in the set. As I said, the refined self-contained method satisfies the hypothesis, with one little detail: the internal state must be drawn uniformly at random among the antecedents of the keystream. Normally, if your stream cipher is well built, that is true, but maybe there is some bias, either in the generation of the keystream from the internal state, or in the mapping from the key to the internal state after the initialization phase. That gives us the two families of experiments we ran: the first to check that there is no bias in the generation of the keystream, and the second to experimentally measure the probability that the good value is in the set X output by the refined self-contained method.
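The sampling order of this experiment is the whole point, so here is a toy illustration of it. Everything here is invented for illustration: `F` is a made-up 10-bit-state keystream map, and `A` is deliberately the dumbest algorithm satisfying the hypothesis (it just picks antecedents at random), because the bound holds for any A that only sees F and z.

```python
import random

random.seed(2)

N_BITS = 10

def F(s):
    """Toy keystream map: 10-bit state -> 4 keystream bits."""
    return ((s * 37) ^ (s >> 2)) & 0xF

def A(z, size):
    """Any algorithm that only sees F and z. Here: return `size`
    antecedents of z chosen at random."""
    pre = [s for s in range(2 ** N_BITS) if F(s) == z]
    return set(random.sample(pre, min(size, len(pre))))

# Step 1: choose the keystream value first.
z = F(123)
# Step 2: run the algorithm to get the candidate set X.
X = A(z, size=20)
# Step 3: only now draw the secret state, uniformly among the antecedents.
preimages = [s for s in range(2 ** N_BITS) if F(s) == z]
trials = 20000
hits = sum(1 for _ in range(trials) if random.choice(preimages) in X)

estimate = hits / trials
bound = len(X) / len(preimages)   # the information-theoretic prediction
```

With the secret drawn after X is fixed, the hit rate cannot exceed |X| divided by the number of antecedents, no matter how clever A is.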
Once we had these verifications, we went back to the paper to understand where the theory was wrong, because the verification did not match what was given in the theory of the Fast Near Collision Attacks. Basically, what we found is that, when they present the attack, there are two independent theorems about the output X of the refined self-contained method: one that computes the probability that the good value, the good internal state, is in the set X, and another that computes the size of X at the end of the method. And the one about the size of the output set is false, and false in such a way that the size of X is always underestimated. That means they end up with a smaller set than what the claimed probability would require, which explains the difference between the experimental values we obtained and what is claimed. Basically, we think the central error in the reasoning is that they assume there is only one good value for the internal state, which is true, but only when you use enough keystream bits, whereas in the attacks they use a very small number of keystream bits, something like 2 to 5. That explains this discrepancy in the probability. To go into more detail on the experiments we did, let me first introduce A5/1 again. It is an old stream cipher that was used in the GSM standard, 30 years ago now. It is a stream cipher composed of three registers with an irregular, majority-rule clocking, represented by the red arrows in the figure, and basically to obtain a keystream bit you XOR the last bits of the three registers.
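Since the rest of the talk leans on A5/1, here is a minimal Python sketch of its keystream generation (initialization from key and IV is omitted). The register sizes (19, 22, 23 bits), clocking-bit positions (8, 10, 10), feedback taps, and majority rule are the standard published description of A5/1; the function names and state encoding are my own.

```python
def a51_keystream(r1, r2, r3, n):
    """Generate n keystream bits from an A5/1 internal state.
    r1, r2, r3: the three LFSRs as integers (19, 22 and 23 bits wide).
    Each step: majority-rule clocking, then output the XOR of the
    three registers' last bits."""
    out = []
    for _ in range(n):
        # Irregular clocking: a register steps only if its clocking
        # bit agrees with the majority of the three clocking bits.
        c1, c2, c3 = (r1 >> 8) & 1, (r2 >> 10) & 1, (r3 >> 10) & 1
        maj = (c1 & c2) | (c1 & c3) | (c2 & c3)
        if c1 == maj:
            fb = ((r1 >> 13) ^ (r1 >> 16) ^ (r1 >> 17) ^ (r1 >> 18)) & 1
            r1 = ((r1 << 1) | fb) & 0x7FFFF       # keep 19 bits
        if c2 == maj:
            fb = ((r2 >> 20) ^ (r2 >> 21)) & 1
            r2 = ((r2 << 1) | fb) & 0x3FFFFF      # keep 22 bits
        if c3 == maj:
            fb = ((r3 >> 7) ^ (r3 >> 20) ^ (r3 >> 21) ^ (r3 >> 22)) & 1
            r3 = ((r3 << 1) | fb) & 0x7FFFFF      # keep 23 bits
        # Keystream bit: XOR of the last bit of each register.
        out.append(((r1 >> 18) ^ (r2 >> 21) ^ (r3 >> 22)) & 1)
    return out
```

For example, `a51_keystream(0x1234, 0x56789, 0x123456, 5)` yields the first five keystream bits from that state, which is exactly the quantity the core part of the attack has to predict.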
So in the Fast Near Collision Attack against A5/1, basically what happens is that you search for 15 bits of internal state knowing only two consecutive keystream bits. The 15 bits you are searching for are in dark blue in the figure, and they correspond to the first two keystream bits that will be output. When you run the refined self-contained method, following the paper, you obtain a set X of 7,835 elements for these 15 variables, with a probability of more than 99% that it contains the good value. And in fact, if you look at the attack in more detail, they run the refined self-contained method on the pair of the first and second keystream bits, the pair of the second and third, the pair of the third and fourth, and the next pair after that, and they merge all the results. At the end, they obtain a set X of 2^16.6 candidates for the 33 bits of the core part, shown in blue in the figure. These 33 bits allow you to compute the first five keystream bits. What interests us here is that this final set X is a lot smaller than what you would expect; you would expect a set of about 2^28. So we ran the experiments. First, the experiment to check whether there is a bias in the generation of the keystream: for any given value of the five keystream bits, there are exactly 2^28 configurations of the core part that generate it, so there is no bias here. The second experiment was to check whether there is a bias in the initialization process. Basically, we sampled a random key and IV, ran the initialization phase, and counted the number of times each configuration of the core part is reached. We did 2^36 initializations, and what we obtained is the three graphics here.
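The size anomaly can be quantified directly from the numbers above, assuming (as the first experiment confirms) that the five keystream bits partition the 2^33 core-part configurations evenly:

```python
import math

core_bits = 33        # core-part bits recovered in the merged A5/1 attack
keystream_bits = 5    # keystream bits those core bits determine

# Core-part configurations consistent with a given 5-bit keystream
# (exactly 2^28, per the counting experiment):
antecedents = 2 ** (core_bits - keystream_bits)

claimed_size = 2 ** 16.6   # |X| claimed after merging the four runs

# If X were a random subset of those antecedents, the probability that
# the good core part lands in it would be 2^(16.6 - 28):
p_random = claimed_size / antecedents
print(round(math.log2(p_random), 1))
```

This gives 2^-11.4, and notably that is exactly the experimental probability reported in the second experiment, which is what the random-subset view predicts.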
In blue, you have what happens when you directly sample your 33 bits of the core part at random. In the middle, you have what happens when you measure the same thing about the core part after choosing a 54-bit key and a random IV; 54 bits because that is one of the ways A5/1 is used in the GSM protocol. And on the right, what happens when you sample a 64-bit key at random with an IV. It looks exactly the same. At one point we wanted to use a chi-squared test to verify that these are the same distribution, but it would cost too much, and we think that if such a bias existed for A5/1 it would already be known, so we did not go further with this experiment. The second experiment refutes the claim about the probability of the Fast Near Collision Attacks. Basically, what we do is choose 5 keystream bits, run the refined self-contained method on them to obtain the set X of 2^16.6 values, and only now choose the key and the IV: we run the initialization process such that the first keystream bits match the chosen keystream bits, and we check whether the core part is in the set X of candidates or not. We ran this a lot of times, and the experimental probability that the good state is in the set X at the end is 2^-11.4, which is a lot less than what is claimed by the Fast Near Collision Attacks. Once we had this result, we went back to the code provided with the attack against A5/1, which was supposed to test some components of the attack. We found some bugs in this code, so we corrected them, and we added a for loop and a simple counter to compute, with the code given by the authors, the success probability of the set output by the refined self-contained method. When we run that, we obtain results in line with our theory, and not with the theory of the Fast Near Collision Attacks.
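The chi-squared test mentioned above was too expensive at the real size (it needs many samples per bin over 2^33 bins), but the shape of such a uniformity check is easy to show at toy scale. This is a generic sketch, not the code used in the experiments: an 8-bit quantity stands in for the 33-bit core part.

```python
import random

random.seed(3)

def chi2_uniform(counts):
    """Chi-squared statistic of observed bin counts against the
    uniform distribution over len(counts) bins."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return sum((c - expected) ** 2 / expected for c in counts)

# Bin 2^16 samples of an 8-bit quantity (toy stand-in for binning
# 2^36 initializations by the 33-bit core-part value).
counts = [0] * 256
for _ in range(2 ** 16):
    counts[random.getrandbits(8)] += 1

stat = chi2_uniform(counts)   # ~ chi-squared with 255 degrees of freedom
```

For a genuinely uniform source, the statistic should sit near its degrees of freedom (here 255); a strong bias in the initialization map would push it far above that.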
That may explain the following statement by the authors: they did the experiments, and "almost all" the results matched the theory. We did the same verification with Grain v1, a stream cipher from the eSTREAM portfolio composed of an 80-bit LFSR and an 80-bit NFSR, where the keystream bits are computed with an output function involving h. We did the first family of experiments to check that there is no bias in the generation of the keystream, by sampling a key and IV at random and verifying that the core part at the end is uniformly distributed, which is the case. After that, we did the simple experiment to refute the probability claim of the Fast Near Collision Attack: we chose a value for the keystream, ran the refined self-contained method to obtain a set X, and only then took a random key and IV matching the keystream and checked, after initialization, whether the core part is in the set X. This gives an experimental probability close to 55%, which is a lot less than the probability claimed by the Fast Near Collision Attack. In particular, the overall complexity of the attack against Grain v1 has to be increased by a factor of about 2^40, meaning that in the end the Fast Near Collision Attack against Grain v1 is slower than exhaustive search. As a conclusion: the Fast Near Collision Attacks are not fast attacks. They are slower than exhaustive search for Grain v1, and for A5/1 they are at most as fast as the attack by Golić from 25 years ago. The reason they are not as fast as claimed in the original papers is an erroneous analysis of the complexity of the refined self-contained method. We contacted the authors about this: Meier was okay with what we did, basically he agreed with what we presented, but Zhang, however, disagreed with what I just showed you.
So we agreed to disagree, and I think that if Zhang wants to prove once and for all that the Fast Near Collision Attack is a really fast attack, the easiest way to do so is to fully implement the attack against A5/1: normally, it should run on your office computer. Thank you for your attention, and if you have any question you can send an email to me, Pierre-Alain or Patrick.