I'm going to present joint work with Ange Albertini, Elie Bursztein, Yarik Markov, Thomas Peyrin, and Marc Stevens, about finding, and actually computing, collisions for the SHA-1 hash function. This is nearly an anniversary talk, because the full collision was actually found on the 15th of January last year, and today is the 11th, so not too far away. And it came after a long series of work, some of the notable milestones being the first practical free-start collision in October 2015, the first theoretical attack in 2005, and the standardization of SHA-1 in 1995. So you can notice this kind of ten-year pattern. Maybe in ten years we'll get the first theoretical preimage attack, and in twenty the first practical one; who knows, if that can motivate you. What I'm going to talk about is, first, a brief recap of what a SHA-1 collision actually looks like and how you can compute one. And I hope I'll have time to say a few words about how you actually measure the complexity of an attack, and what that means in terms of practicality. So, let's start with this: that's a SHA-1 collision. That's 1024 bits, in two blocks of messages. They look kind of random; they have differences in specific locations, in red and blue, and they hash to the same value. So what you can do with this, you can do that: you can have two PDFs, one showing Calvin and the other Hobbes, and they actually hash to the same value. I could have cheated on this slide, of course, but the PDFs do exist, and you can do this with any kind of pictures you like. Probably one of the best applications of these collisions. So first I'll go over cryptanalytic attacks on SHA-1. You don't really need all of the details on these slides; if you know them, that's better, but it's not really necessary. As I said, SHA-1 was standardized in 1995. It's quite similar to MD4 and MD5, if you know them.
Two things, without going into details, will be useful to understand the structure of the attack: SHA-1 uses Merkle-Damgård as its domain extender, and its compression function is a block cipher in Davies-Meyer mode. And it has 80 steps. Remember that, because there will be some attacks on reduced versions with, say, 75 steps, which is nearly the total, but not quite. There was also a SHA-0, which was very similar but much more broken, if that's possible. The hash size is 160 bits, so in principle a generic collision attack costs roughly 2^80 calls to the function. That's the security we would want this function to offer, and it does not. OK, so those were some details, but we want to attack the function, not describe it. And the colors are actually pretty bad here. Oh, sorry, there should be some colors. Anyway, the colors are not important for the first part of this slide. We have two blocks, because the attack uses two blocks of messages, as I said before. Each rectangle represents one computation of SHA-1, with two inputs: on the left, the initial value, and on the right, the message. We want a collision, so everything at the start is identical: we have the same IV, the one of the function, and we want different messages that eventually hash to the same value. So first, we have a zero difference on the top left, because the IV is the same in both cases, and we have a difference ΔM on the message. If everything goes well, at the end of this rectangle that symbolizes the computation of SHA-1, we get a difference ΔC in the chaining value. So at the end of the state, we have ΔC. The feed-forward on the left is the Davies-Meyer feed-forward: we add the IV back, and because there was no difference in the IV, there is still just ΔC in the chaining value.
Then the arrow that goes to the right is the Merkle-Damgård chaining: the chaining value becomes the IV of the next block. So we start the second block with a ΔC difference, and we use −ΔM on the message. If everything goes well, we get −ΔC at the end of the state; then the Davies-Meyer feed-forward gives ΔC + (−ΔC) = 0, so everything is fine, and we get a collision. That's the idea of the main structure of the attack. But of course, it's not so easy to actually find messages that follow the right differences, and that's the goal of the middle part, with the colors that aren't really there. First, we have the nonlinear part: the NL thing is a nonlinear differential path that is hard to find, but easy to satisfy. Then we have the linear part, the L thing, where most of the real probabilistic cost of the attack will be. And in the middle, we should have a blue thing, which is the accelerating techniques, which basically try to delay as much as possible the start of the probabilistic phase of the attack. You don't really need all of these details, but I will mention them again in a few slides, because these are all the steps you need to go through when you actually want to compute a collision. Okay, so if you've attended a crypto talk in the past decade, maybe you've already seen this picture, because that's the structure of the attack from 2005 by Wang, Yin, and Yu. So nothing really new. That's the attack they had at CRYPTO 2005, which led to a theoretical attack with complexity equivalent to 2^69 computations of SHA-1. This was eventually improved; there were a lot of follow-up works, and in 2013, Stevens proposed another theoretical attack with complexity 2^61. So now, what do you need to do if you want to compute your own SHA-1 collision? So first, the colors are back, kind of.
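The two-block cancellation above can be sketched with a toy model. This is only a sketch: `toy_E` is a made-up additive stand-in for SHA-1's real 80-step state update, and because of its simple structure the second-block message difference here is −2d rather than the −ΔM of the real attack; only the Davies-Meyer/Merkle-Damgård cancellation mechanism is faithful.

```python
MASK = 0xFFFFFFFF  # work on 32-bit words

def toy_E(cv, m):
    # Toy stand-in for SHA-1's block cipher (an assumption, for
    # illustration): additive, so differences propagate linearly.
    return (cv + m) & MASK

def compress(cv, m):
    # Davies-Meyer: encrypt the chaining value under the message,
    # then feed the chaining value forward.
    return (toy_E(cv, m) + cv) & MASK

def toy_hash(iv, blocks):
    # Merkle-Damgard chaining over the message blocks.
    cv = iv
    for m in blocks:
        cv = compress(cv, m)
    return cv

iv = 0x67452301
m1, m2, d = 0x11111111, 0x22222222, 0x00000040

# Block 1: inject +d, giving a chaining-value difference dC.
h1  = compress(iv, m1)
h1p = compress(iv, (m1 + d) & MASK)
dC  = (h1p - h1) & MASK

# Block 2: pick the message difference so the state difference entering
# the feed-forward is -dC; the feed-forward then cancels it to zero.
m2p = (m2 - 2 * d) & MASK  # -2d because this toy E is additive in cv

assert toy_hash(iv, [m1, m2]) == toy_hash(iv, [(m1 + d) & MASK, m2p])
print("toy two-block collision!")
```

The toy makes the structure of the real attack visible: the first block creates a difference, and the second block is chosen so that the feed-forward addition cancels it exactly.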
You first need to pick a linear path, which was the L thing, and that comes first in determining your attack parameters. Once you have settled on the linear path, you have to compute the nonlinear path corresponding to it for the first block, the one on the left. Once you have that, you can compute your accelerating techniques, to try to make the attack as efficient as possible. And when you have everything, you can implement the attack, and at that point you can estimate quite precisely how long it will take you to compute a collision for the first block, the first nice difference we wanted. Once you get one of these, you can go on to the second block and do the same. You don't have to pick the linear path anymore, because it's going to be the same as for the first block, but the rest is pretty similar: you again have to compute a nonlinear path and accelerating techniques, and once everything is implemented, you can again compute the time the attack will take. So, on that side, how do you get these estimates? The best way is basically what I just said: you implement the attack. Then you can measure the production rate of what we could call partial solutions, which follow the differential path up to a point. Not up to the end, but up to, let's say, A56. Then you look at how many of these you generate per second. You can quite easily compute the probability that one of these extends to a full collision, and you multiply, well, not by the probability, but by its inverse. That gives you a pretty good estimate of the time it will take you. So that's if you implement the attack. If you can't or won't, then you have an alternative that's not as good, but basically it tells you that everything up to A16 is free, because you can use the freedom you have in the message. That's nice.
Afterwards, you just look at how many conditions in the differential path you need to satisfy, and how many free bits you get from the accelerating techniques, if you have any. Those should roughly cancel out, and that leads you to an estimate of the critical step, which is where most of the work will be done. Everything below it you don't spend much time on; say at A22 you spend most of your time, then you look at the probability that it extends, and so on. That works reasonably well, but not as well as implementing everything, because there are more assumptions behind it. Okay, so that was the recap. Now I will describe some of the steps that led to computing a full collision. And first, some of the things from 2005 to 2011. The first non-trivial attack I could find, or remember, was in 2005 by Biham et al., on 40 steps. That was very efficient; they don't even give a complexity estimate, they just say "within seconds". The same year, in the paper where Wang et al. had the theoretical attack, they also had a practical one on a reduced version of 58 steps out of 80, with complexity roughly 2^33 SHA-1 computations. The year after, De Cannière and Rechberger extended this to 64 steps with roughly the same cost, which was also quite nice. The year after that, Rechberger et al. extended it to 70 steps, and the same year Joux and Peyrin reached 70 steps with a decreased cost: 2^39 instead of 2^44 for Rechberger et al. Then there was nothing for a few years, and in 2010 and 2011, Grechnikov and Adinets had some improvements up to 75 steps, with a quite huge cost increase: 2^57.7 for the latest version. So that was up to 2011, and then there was nothing again for a few years. In 2014, it was maybe time to start working on the final push to the full collision. So that was some work with Thomas Peyrin and Marc Stevens.
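The estimation procedure described above, measuring the partial-solution rate and multiplying by the inverse extension probability, can be sketched in a few lines. All concrete numbers here are hypothetical, purely for illustration; they are not the talk's figures.

```python
import math

def attack_cost_log2(partial_rate, log2_p_extend, log2_sha1_rate):
    # Expected wall time is (1 / p_extend) / partial_rate seconds;
    # multiplying by the platform's raw SHA-1 rate converts that into
    # "equivalent SHA-1 computations".
    log2_seconds = -log2_p_extend - math.log2(partial_rate)
    return log2_seconds + log2_sha1_rate

# Hypothetical figures: 2^10 partial solutions (say, up to A56) per
# second, each extending to a full collision with probability 2^-25,
# on a platform doing 2^31.8 SHA-1 compressions per second.
cost = attack_cost_log2(2**10, -25, 31.8)
print(f"estimated cost: 2^{cost:.1f} SHA-1 computations")
```

The same function can be reused for any attack once you have measured the two rates and the probability on the target platform.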
Because there's a lot to do and the attack is bound to be very expensive, we wanted to warm up a bit, and to do this we considered free-start collisions. I won't go into much detail about what a free start is; there will be a simple picture next to kind of explain it. But in principle, a free-start attack is easier than a full hash attack, so we hoped we could get a free-start collision on the full function rather efficiently. But because there was no prior work on free-start collisions for SHA-1, it was not so clear, and there was some cryptanalytic work we needed to do. So that was also interesting. The first step was to search for a 76-step free-start collision, because that was the largest number of steps that had not been attacked before. It also gave us the opportunity to develop the GPU framework that we used in all of the subsequent attacks. So why is a free start useful? Oh, well, the colors are back, well, selectively. On the left, you have the structure of the Wang et al. attack, the one I described before: you start with the initial value, then you have the part with probability roughly equal to one thanks to the accelerating techniques, and then you start the probabilistic phase with the low probability. If you do a free start, the idea is basically that you don't initialize right at the start; you do it in the middle. You kind of slide everything back, so you delay the start of the expensive probabilistic phase, and you don't really care so much about what happens in the IV anymore. That's roughly the idea. And it worked quite well: in December 2014, we got what we wanted, a 76-step free-start collision, at a cost of roughly 2^50 SHA-1 computations, which was about 2^7.5 times less than the 75-step hash-function attack. So the free start was indeed useful. To be more explicit about what it means to do 2^50 SHA-1 computations: that's roughly four days on a high-end GPU at the time.
So that's not so expensive. If you have 3,000 Singapore dollars, at the time you could buy a four-GPU machine, and then it's just one day. So that's pretty cheap. So okay, now we can try to get more power, because beyond 3,000 dollars you can hopefully get more money. Also, we reckoned that the full attack would be only about 32 times more expensive, which is not so much; it's still manageable. Of course, that was maybe a bit optimistic. Well, it turned out it was. But anyway, even if it had not been optimistic, as I said before, if you don't implement the attack, you never know for sure. A factor of two doesn't seem a lot, and it's a pretty good guess, but from a practical point of view it's quite important, because if that factor of two puts the attack beyond your resources, then you can't do anything anymore. So the idea was to buy more GPUs, like 64, and we still needed to develop the attack, because sadly you cannot just take the code for 76 steps and extend it to 80; that doesn't work, you have to do everything again. So, that was the cluster that was waiting in the office in Singapore at the time. You can see there was an FSE 2013 poster in the back, advertising it a bit late, but anyway. Those were the machines we wanted to use and that we ended up using. In September 2015, we got what we wanted at that time, which was a free-start collision on the full function, a practical attack. Not exactly the real thing, but the closest thing to it. Again with Stevens and Peyrin; I didn't mention it before, but I should have. Same team. This time it was a bit more expensive, at 2^57.5, which is about 680 days on a single GPU, the same one as before, so a bit less than two years. And on a 64-GPU cluster, that's just 10 days, maybe a bit more.
So, that's really manageable. Also, if you want to buy this computing time on Amazon, and you're a bit patient and just use the cheapest instances, that's only about 2,000 US dollars. So it's an actually pretty cheap attack; you could run it yourself if you have 2,000 dollars to spend. This started having some impact, industrially speaking, and it was quite funny: at the time we published the attack, there was a ballot at the CA/Browser Forum, where some of the industry players wanted to push for continuing to issue SHA-1 certificates through 2016. That was exactly when we said, hey, we have this free-start collision. They then realized that maybe it was not such a great idea, and the ballot was withdrawn. Also, some of the major browsers (Firefox at least, I know; Chrome, I'm not sure) had already planned to deprecate SHA-1 and to issue warnings for websites using SHA-1 certificates, but then they kind of sped things up. So that was nice. In some other cases, though, SHA-1 was still used with no issue whatsoever; Git, for instance, no problem. Okay, that's nice, but what if you try even more power? Because what we want in the end is really the actual hash-function attack. So again we can come up with a guess about how much more it will cost: maybe 50 times more, kind of optimistic, maybe. And again, it's hard to know before implementing. What's even trickier in the case of the real attack is that we have two blocks: you cannot implement the second block before you have the very expensive near-collision of the first block. You can implement the first block, and that will give you some idea, but if you don't run the computation until the end, you cannot implement the second block, so you don't really know what the real cost of the attack will be before you have run it nearly to the end. So then, what can we do?
We can buy more GPUs, but if you want 1,000 GPUs as academics, maybe that's a bit too much. So you can also get help, for instance from companies that already have a lot of GPUs. Again, you have to develop the new attack, as before. And because it's also a nice objective, we could add some cool exploitation features, which for instance allow for the colliding PDFs I showed you at the start, something we didn't do for the free start. So that was done in the context of a CWI-Google collaboration. The steps that led to the attack are basically what I described before, except that now you have to start by preparing the prefix for the collision that is used for the exploitation. That has to be done before the computation starts; if you just do a plain attack, you don't care, but now you do. Then you compute the first-block collision. Actually, two of them were computed. Then you compute the second block, and that's it. This time it costs a bit more again: roughly 2^63 equivalent SHA-1 computations. Which was, at the same time, a bit more and a bit less than expected: a bit less because we were a bit lucky and found it earlier than the estimates predicted, but a bit more because the estimates themselves were worse than we had expected. In terms of computation, that was 6,500 CPU-years (core-years, I think), plus about 100 GPU-years. And don't be fooled: the 6,500 CPU-years is actually less computation than the 100 GPU-years. I mean, it takes more time, but it's just less efficient. And now, if you have 100,000 US dollars, you can also do that on Amazon. The second block, actually. So, once we got it, it again had some impact. People finally realized that maybe it was time to move away from SHA-1. We also unwittingly broke SVN; we didn't plan that. And there was further deprecation of SHA-1 certificates. So that was kind of what you could expect.
Okay, so now I will finish with some words about measuring the complexity of such an attack, also to put some nuance on everything I've said so far. I said, well, it's 2^63 SHA-1 computations, and so on. What does that actually mean, and is it really meaningful? I don't know if all of you are entirely familiar with the way we usually measure the complexity of attacks in symmetric-key crypto. Generic attacks are kind of easy: we have the generic birthday attack, we know it takes about 2^(n/2) calls to an oracle to get a collision, and we know ways to implement it efficiently, for instance using massively parallel hardware. That's quite well understood. But a dedicated attack will depend precisely on the attack, and probably the best way is just to implement it and measure the cost. For SHA-1 we could do that, because we ran the attack until the end, so we can measure. But then you need to express this cost in some kind of metric, and what's usually done is to say: you ran the attack on this platform, these GPUs, and it took so much time; then you divide this time by the time it takes to compute the raw primitive on the same platform, and that's it. The idea is, for instance, you say the attack took a time equivalent to computing SHA-1 2^63 times. You know that the generic attack is 2^80, so that gives you the gap between what would have been necessary for a generic attack, 2^80, and what you did, 2^63. And then you can say, well, that's more efficient; it is an attack, because you gain some time. That's usually what we do. As an example of this kind of computation, take the 76-step attack: I said before that it took about four days.
You can measure that from the production rate of A56 partial solutions. You measure the number of SHA-1 computations you can do on this platform, 2^31.8 per second, you multiply everything, and you get 2^50.3. Now, you can do the same computation for CPUs, which we did at the time: on a Haswell Core i5, a reasonably efficient CPU, it takes an estimated 606 core-days to compute the attack. There you can compute 2^23.5 SHA-1 per second, and the same computation gives you 2^49.1. So you get a gap of more than a factor of two between the two, and it's exactly the same attack. If you want to sell the attack, it seems that running it on CPU is better, because you get a "more efficient" attack, but actually the GPU is much better: it takes less time and less energy, so there's really no point in doing that. So you can get this kind of discrepancy, and it's not only about comparing GPU to CPU: for the real attack, different GPUs had different efficiencies when measured with this kind of cost, again more than a factor of two between the most efficient, the Tesla K80, and the GTX 970. And of course, if you optimize the code generically or for some specific platform, these figures may change again. So you could say that's not really an issue: there are some gaps, but we might not care so much. We might, though: if the attack is really expensive, a gap of two or four can make the real difference in the amount of resources you need. But what's even worse is that it's not only about CPUs and GPUs: you can also look at how efficiently the code would run on FPGAs or ASICs. For dedicated tasks these are very fast and energy-efficient, and in fact, for hash-function collisions, they are very well suited to generic attacks, but not really to dedicated ones.
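As a sanity check, the two conversions above can be reproduced in a few lines. Note that "about four days" is a rounded figure, so this gives 2^50.2 rather than the talk's 2^50.3; the CPU figure matches exactly.

```python
import math

def cost_log2(days, log2_sha1_per_sec):
    # Attack wall time divided by the time of one SHA-1 compression on
    # the same platform, i.e. seconds multiplied by the platform's
    # SHA-1 rate, expressed in log2.
    return math.log2(days * 86400) + log2_sha1_per_sec

gpu = cost_log2(4, 31.8)     # 76-step free start on one GPU, 2^31.8 SHA-1/s
cpu = cost_log2(606, 23.5)   # same attack on Haswell cores, 2^23.5 SHA-1/s
print(f"GPU: 2^{gpu:.1f}, CPU: 2^{cpu:.1f}")
```

This makes the discrepancy concrete: the identical attack scores 2^50.2 on GPU and 2^49.1 on CPU under this metric, even though the GPU run is cheaper in time and energy.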
As far as I know, that's because the code is much more complex: you get a lot of branching, conditional things you need to do or not. So if you want to do the generic attack, you actually won't use a GPU; you will most likely use an ASIC. And when you use the kind of cost function I mentioned, you implicitly assume that you would do the generic attack on the same platform you use for your attack, CPU or GPU, but that's actually not true. So it becomes less clear what the dedicated attack actually improves on. So, why wouldn't you use a GPU for a generic attack on SHA-1? Let's see why. If you want a collision within one year, that's 2^80 hash computations, so you need about 12 million GPUs, and that's about 3.1 gigawatts of power for the entire year, and that's if you are very optimistic about the cost of your infrastructure. And what's a gigawatt? Well, for 3.1 of them you need maybe two or three nuclear power plants running for an entire year, just for the attack. So that's quite a lot. For ASIC attacks, we're quite lucky with SHA-1, because we have this thing called Bitcoin. It uses SHA-2, which is very close to SHA-1 in terms of computational resources, so you can just take the mining hardware, which is kind of off the shelf, and assume that you would run your attack on something similar; that's, I believe, a very realistic assumption. Then, for the same attack, you just need about 3,000 devices and four megawatts of power; so if you have a large wind turbine, and wind, for one year, that's enough. The first option, of course, is not really realistic, even for a state actor. The second, I believe, is: if you have 20 million dollars, that's probably something you can do. So, yeah, I'll finish quickly.
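The device counts and power draws above can be roughly re-derived. The per-device rates and wattages below are my own assumptions, in the ballpark of 2017-era hardware (a GTX-970-class GPU at 2^31.8 SHA-1/s and ~260 W; a Bitcoin-miner-class ASIC at ~14 TH/s and ~1.4 kW), and they land close to the talk's 12 million GPUs / 3.1 GW and 3,000 ASICs / 4 MW.

```python
SECONDS_PER_YEAR = 365 * 86400
TOTAL_HASHES = 2**80  # generic birthday attack on a 160-bit hash

def devices_and_power(hashes_per_sec, watts_per_device):
    # Devices needed to finish TOTAL_HASHES within one year, and their
    # combined power draw in watts.
    n = TOTAL_HASHES / (hashes_per_sec * SECONDS_PER_YEAR)
    return n, n * watts_per_device

gpus, gpu_w = devices_and_power(2**31.8, 260.0)   # assumed GPU figures
asics, asic_w = devices_and_power(14e12, 1400.0)  # assumed ASIC figures

print(f"{gpus:.1e} GPUs drawing {gpu_w / 1e9:.1f} GW")
print(f"{asics:.0f} ASICs drawing {asic_w / 1e6:.1f} MW")
```

The roughly four-thousand-fold gap in devices, and the gigawatts-versus-megawatts gap in power, is exactly why the generic attack belongs on ASICs, not GPUs.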
One way to take this kind of effect into account is to use another cost measure, boiling-water security, which was introduced by Lenstra, Kleinjung, and Thomé. The idea is to say: you have this attack, it takes that much energy; what volume of water could you boil with the same amount of energy? For the factorization of the RSA-768 modulus, they say it's two Olympic pools, which is quite a lot. So, here are some figures in this measure. SHA-0 collisions, which I didn't say much about, are about boiling the water in a teaspoon, so not a lot. The 76-step free start is four showers, so it has "shower security". The full free start is 100 showers. The GPU part of the attack on SHA-1 was about pool security, so roughly the same as RSA-768. The first block, because it was done on CPU, is less efficient: three pools. The discrete-log computation in a 768-bit prime field is six pools. And if you do the generic attack on ASICs, using the estimate I showed, it's on the order of a hundred pools (each unit here being the energy needed to bring that volume of water to a boil). So in the end, a full-GPU SHA-1 attack is about one pool of security, that kind of order of magnitude, and it's still about 100 times better than a generic attack on ASICs, conjecturally. So it's quite a meaningful gap, but it's much less than the 100,000 gap you would guess from the typical estimates we usually give. So, in conclusion, what more can you do on SHA-1? If you have time, you can compute a chosen-prefix collision, which would allow even more exploitation; that would be nice. A collision for the combiner SHA-1||MD5, why not; then SVN would be confused again, as it was. And you can also design a SHA-1 cryptocurrency, and use the shiny hardware that people would develop for it to have fun.
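The boiling-water conversion can be sketched as follows. The physical constants are standard; plugging in the ~4 MW ASIC-based generic attack from the previous slide for one year lands in the same "on the order of a hundred pools" ballpark as above (this is my re-derivation, not the talk's exact arithmetic).

```python
C_WATER = 4186.0           # J per kg per kelvin (specific heat of water)
DELTA_T = 80.0             # heating from 20 C to 100 C
POOL_KG = 2.5e6            # ~2500 m^3 of water in an Olympic pool
SECONDS_PER_YEAR = 365 * 86400

def pools_boiled(watts, seconds):
    # Energy spent by the attack, converted into the number of Olympic
    # pools of water it could bring to a boil.
    kg = (watts * seconds) / (C_WATER * DELTA_T)
    return kg / POOL_KG

# One year of the ~4 MW ASIC-based generic attack:
print(f"{pools_boiled(4e6, SECONDS_PER_YEAR):.0f} pools")
```

The appeal of this measure is that it is platform-independent: energy is energy, whether it comes out of a GPU, a CPU, or a mining ASIC.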
And so, those are the papers for the free start and the full attack. The attack code is finally on GitHub, so you can run it if you want to compute your own collisions. If you want more details about the crypto side, there is a video of Marc's talk at CRYPTO, and if you want more details about the exploitation side, which I didn't say much about, you can look at the video of Ange's talk at Black Alps. That's it, thank you.