Thank you for the introduction. So I'm going to talk about SHA-1. This is joint work with Thomas Peyrin. Cryptanalysis of SHA-1 is a very technical topic, and I know it's early and you're all tired after two or three days of conference. So we'll only talk about the high-level aspects of this attack, and I will not go down to the bits and bytes of the bit flips. If you want more details, please read the paper, but the main ideas are pretty high-level anyway, so I think you will get the main points.

So let's start from the beginning. What's a hash function? It's basically a public function that takes as input an arbitrary document and gives you a short output that you can use as a kind of identifier for the document. The security property we want is that this fixed function should look like a random function. In particular, it should be hard to find collisions or preimages, and that's why you can really use the output as a kind of fingerprint. This is very useful in many cryptographic contexts. For instance, for signatures: instead of signing a big document, you first hash it, and then you compute your signature on the hash. It's also used in blockchains; I'm just putting that here so that there are more people in the room.

So I will mostly talk about SHA-1. This is a very important hash function because it was widely standardized and used basically everywhere until a few years ago. It was designed in the 90s, and the state size and the output size are 160 bits, which means you expect security against collision attacks of roughly 2^80. If you look at how it's built, it follows an iterative construction. You have an internal state x here, which is 160 bits. You start with some fixed value that we call the IV, then you cut your message into blocks m0, m1, m2, and you process them one by one. So you have this compression function that takes the message block and the current state and gives you the next state.
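To make the iterated construction concrete, here is a minimal toy sketch. It pairs the construction with a Davies-Meyer style compression function: the chaining value is encrypted under the message block, and the input chaining value is then added back. Everything here is hypothetical and scaled down. The state is one 32-bit word (SHA-1 uses five words, 160 bits). The block cipher is a made-up ARX permutation, not SHA-1's round function. Padding is omitted, and the IV is an arbitrary constant.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def toy_block_cipher(key_words, x: int) -> int:
    """Hypothetical toy cipher on one 32-bit word, keyed by the message words.
    Each round (add key word, rotate, XOR a constant) is invertible, so for a
    fixed key this is a permutation. It is NOT SHA-1's round function."""
    for k in key_words:
        x = (x + k) & MASK32
        x = rotl32(x, 7)
        x ^= 0x9E3779B9
    return x

def davies_meyer(h: int, block) -> int:
    """Compression function: encrypt the chaining value with the message block
    as the key, then add the input chaining value back (feed-forward)."""
    return (toy_block_cipher(block, h) + h) & MASK32

def iterated_hash(iv: int, blocks) -> int:
    """Process the message blocks one by one: x_{i+1} = f(x_i, m_i)."""
    x = iv
    for m in blocks:
        x = davies_meyer(x, m)
    return x

IV = 0x12345678  # hypothetical fixed IV for the toy
msg = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # blocks m0, m1, m2 as lists of words
digest = iterated_hash(IV, msg)
```

The feed-forward addition in `davies_meyer` is what later lets the output differences of two blocks cancel each other.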
And now, what is inside this compression function? We have something that looks like this. It's a construction called Davies-Meyer, and it's based on a kind of block cipher. So here you have a block cipher: you take the message block as the key, you encrypt your chaining value, you get a new value here, and then you add the encrypted value and the initial value, and that's your new chaining value. I will not go deeper inside the block cipher, so this is really all you need to know about SHA-1 for this talk.

In terms of cryptanalysis, you probably know that SHA-1 is broken. It's actually been broken for almost 15 years. There was some really amazing work in 2005 by Xiaoyun Wang and colleagues, who gave the first collision attack on SHA-1, with complexity around 2^69. There has been a lot of follow-up work to try to better understand this attack and improve it, in particular a paper in 2010 giving a better estimate of the complexity and some improvements. And finally, about two years ago, this attack was implemented in practice, and we now have real collisions. It took a long time because 2^69 is actually a very big number, and it's hard to do this kind of computation.

So what's the status today? Well, it's been broken for 15 years, so you would expect that it's not really used anymore, right? The good news is that it's indeed no longer used in web browsers: they have rejected SHA-1 certificates since 2017. It took a long time, but now they do. The bad news is that there's more to security than web browsers, and some applications still use SHA-1. In particular, you can still buy SHA-1 certificates.
If you go to some CA's website, you can buy a SHA-1 certificate, and a lot of clients will actually accept it. Not web browsers, like I said, but look at mail clients. The Mail application in Windows 10, for instance, is perfectly happy to connect to an IMAP server over a TLS connection secured with a SHA-1 certificate. As far as it's concerned, there's nothing wrong with this. And those servers actually exist. Until a few weeks ago, if you went to this machine, which is a mail server of one of the departments of the University of Darmstadt right here, so not just a random machine, it had a SHA-1 certificate. That certificate has now expired, so they replaced it, and it's now a SHA-2 certificate. But yes, SHA-1 is still really used in security applications. Besides certificates, it's also used in Git, it's used in the TLS 1.2 handshake, and probably in many other places, for instance in banking, where they tend to use very old standards. So I think it still makes sense to look at SHA-1 and see how badly we can break it, and that's the point of this talk.

So I said we know how to compute collisions for SHA-1. What does it mean to compute a collision, and what can we do with it? A collision means you start from the IV and you manage to build two different messages, C1 and C2, that give you the same output. The collision attack is a very complex process, and basically those two messages C1 and C2 look like big blobs of random values, so it's hard to do something meaningful with them. You will also probably have to hide these random-looking blocks somewhere in your document, so you need a document format that's flexible enough to let you hide them. But to make exploitation easier, there's a very nice trick: you can add a prefix before your collision and a suffix after it. This works because of the iterated structure of the hash function.
If you put a prefix, it just means that instead of starting your attack from the real IV, you start from the state after the prefix, and basically it's the same attack. And for the suffix: once the two messages collide, if you append the same suffix to both, they still collide. So if you take the messages P || C1 || S and P || C2 || S, this is also a collision, and you can choose P and S freely. You can use this to control your messages, which is very useful for exploitation. The main issue is that you want a collision on two messages that are meaningful; a collision on two random messages is hard to use. But with this prefix and suffix trick, you can get relatively meaningful messages.

In particular, a nice trick is that many document formats allow some kind of conditional branch, so you can build messages that look like this. I'm using pseudocode here, but this could actually be a PDF document. You start with an if condition that tests the contents of the collision block that follows, and this is your prefix. Then you compute a collision. Then you put a message like this as the suffix. Those two messages collide, because this is a collision block with a common prefix and a common suffix. But when you view these documents, or execute them if they can be executed, they do very different things, because the condition is true in one case and false in the other. So now you have two very different documents with the same hash value, and this is what was used to build those two PDFs here.

So this is what we can do now in terms of practical attacks. This is good when the document format lets you use those tricks. In some cases it doesn't, and then the collision attack is not powerful enough to really break the protocol. The nice idea introduced in 2007 is to do something a bit more general than a collision attack.
It would be nice if you could start from two different prefixes, P1 and P2, and somehow get a collision from these two different states. This is what we call a chosen-prefix collision. A challenger gives you P1 and P2, and you have to find two messages M1 and M2 so that P1 || M1 and P2 || M2 collide. If you know how to do this, you can break a lot more: you can break certificates and many internet protocols.

To give you a simple example, take a PKI. How do you certify your key? The idea is quite simple. If Alice wants a certificate for her key, she generates a key and writes a document like this: "the public key of Alice is ...". She goes to the CA, and the CA signs this document. Now, how do you attack this? The idea is that Bob creates two different documents. One says "the key of Alice is ..." and the other says "the key of Bob is ...". He uses a chosen-prefix collision attack to make those two documents collide. Here the prefix is "the key of Alice" on one side and "the key of Bob" on the other. With just a collision attack you cannot do this, but with a chosen-prefix collision attack you can. Now Bob asks for a certification of his key. Because the two documents collide, he can reuse this signature on the document with Alice's name. He can then impersonate Alice: he has a document saying the key of Alice is something he controls, and he has a certification of it.

To summarize, chosen-prefix collisions are a more dangerous kind of collision attack. They really break things in practice: they have been used to create a rogue CA, and they were used by the Flame malware. So it's really a practical threat.
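Both notions from this part of the talk can be illustrated on a toy hash whose state is so small that a generic birthday search finishes instantly. The first is an identical-prefix collision extended with a common prefix P and suffix S. The second is a chosen-prefix collision from two different prefixes. This is only a sketch under stated assumptions. The 20-bit compression function (truncated SHA-256) and the prefix strings are hypothetical. The search is the generic birthday attack, costing about 2^(n/2) for an n-bit state, not the cryptanalytic attacks discussed in the talk.

```python
import hashlib
import os

STATE_BITS = 20  # toy state size so birthday searches finish instantly (SHA-1: 160)

def compress(state: int, block: bytes) -> int:
    """Hypothetical toy compression function with a 20-bit state."""
    digest = hashlib.sha256(state.to_bytes(4, "big") + block).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - STATE_BITS)

def toy_hash(blocks, state=0):
    """Iterate the compression function over the blocks (no padding)."""
    for b in blocks:
        state = compress(state, b)
    return state

def birthday(s1: int, s2: int):
    """Find random blocks b1, b2 with compress(s1, b1) == compress(s2, b2).
    With s1 == s2 this is an ordinary collision search from that state."""
    seen1, seen2 = {}, {}
    while True:
        b1, b2 = os.urandom(16), os.urandom(16)
        o1, o2 = compress(s1, b1), compress(s2, b2)
        seen1[o1], seen2[o2] = b1, b2
        if o1 in seen2 and seen2[o1] != b1:
            return b1, seen2[o1]
        if o2 in seen1 and seen1[o2] != b2:
            return seen1[o2], b2

# Identical prefix: collide from the state after P, then append a common suffix S.
P, S = [b"common prefix P"], [b"common suffix S"]
c1, c2 = birthday(toy_hash(P), toy_hash(P))
assert toy_hash(P + [c1] + S) == toy_hash(P + [c2] + S)

# Chosen prefixes: the two searches start from different states.
P1, P2 = [b"The key of Alice is"], [b"The key of Bob is"]
m1, m2 = birthday(toy_hash(P1), toy_hash(P2))
assert toy_hash(P1 + [m1]) == toy_hash(P2 + [m2])
```

Generically, both problems cost the same. The difference shows up only once shortcuts through the internals of the compression function are used.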
In terms of generic attacks, both have the same complexity, 2^(n/2), so 2^80 for SHA-1. But in terms of cryptanalysis, chosen-prefix collision attacks are much harder. The best known chosen-prefix attack on SHA-1 currently has complexity 2^77, so it's still not really usable. The goal of this work is to reduce the gap between two complexities: identical-prefix collisions, which cost 2^64.7 for SHA-1, and chosen-prefix collisions. So we want to improve chosen-prefix collision attacks to make them more practical.

First, I'm going to talk a little about how you do cryptanalysis on SHA-1 and related functions. The main idea is based on differential cryptanalysis, where you try to control the differences during a computation. Suppose you start from a zero difference, introduce some differences from the message, and at some point cancel them and go back to a zero difference. This directly gives you a collision attack: you just have to find a message following this trail, and that's a collision. Unfortunately, it's hard to find trails, in particular trails like this. But there's a nice trick using the fact that the message expansion of SHA-1 is linear, which lets us build some trails with good probability. I will not go through the details, it's not very important. We know how to build some good trails, but they don't start from zero and they don't go to zero. The next important trick concerns the first rounds, at the beginning of the computation. There you don't have to pay the probability cost, because you can just choose the message so that it satisfies the path. So in those initial steps, instead of using the nice linearized trails, we can modify them and use basically arbitrary trails, even if the probability is terrible.
It doesn't matter, because we don't pay this probability anyway. So we can start from an arbitrary difference and connect to a good trail in the middle. With this, you already get near-collisions, because you can start from zero and get a small difference at the output. The last trick is a multi-block technique: you use two blocks with the same kind of trail, and the output differences cancel because of the feed-forward. It looks like this. You have a good linearized trail from δ_I to δ_O, and you use it twice: once as the normal trail, and once flipped, with a negative sign. You start from a zero difference and get difference δ_O after the first block. In the second block, the trail difference and the feed-forward cancel out. This is how we do collision attacks on MD5 and SHA-1.

Now, how can we do chosen-prefix collision attacks? You need a few more tricks. The main idea is to define a set S of differences that are somehow nice, meaning that starting from one of those differences, you know how to reach a collision. Once you have this, you start after your two prefixes, where the state has some random difference. You process random blocks until you reach this set of nice differences. This is just a birthday attack, and the complexity is about sqrt(2^n / |S|), so if S is big enough, it's not too expensive. Then comes a phase where you use several near-collision blocks to erase the difference, which you know how to do because it's nice. So what kind of differences are nice? Two variants of this attack have been proposed so far. On MD5, we use several different trails that affect different bits of the output. You can therefore cancel differences bit by bit, which gives you a nicely structured set that you can define easily.
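The two-block cancellation and the birthday estimate above can be written out as a short worked derivation. This sketch assumes additive differences modulo 2^32 on each word (SHA-1's feed-forward is a word-wise modular addition). It writes ΔE for the output difference of the block-cipher part and ignores constant factors. The "flipped" trail in the second block is the negated trail ending in −δ_O.

```latex
% Davies-Meyer feed-forward: h_{i+1} = E_{M_i}(h_i) + h_i, so differences add up:
\Delta h_{i+1} = \Delta E_{M_i}(h_i) + \Delta h_i
% Two-block collision: block 1 follows the trail ending in \delta_O,
% block 2 follows the negated trail ending in -\delta_O
\Delta h_1 = \delta_O + 0 = \delta_O, \qquad \Delta h_2 = -\delta_O + \delta_O = 0
% Birthday phase into a set S of nice differences, n-bit chaining value:
C_{\mathrm{birthday}} \approx \sqrt{2^{n} / |S|} = 2^{(n - \log_2 |S|)/2}
% For illustration: n = 160 and |S| = 2^{34} would give about 2^{(160-34)/2} = 2^{63}
```

The last line is plain arithmetic on the formula. It is not the total cost of the attack, since a larger S also changes the cost of the near-collision phase.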
On SHA-1 it's quite different, because you only have very few good trails. You have to start from a single trail, so you cannot really have a nicely structured set S. Instead, you allow a little freedom in the last rounds so that you can reach a few different output values, and this defines your set S. So you get a very small set with no nice structure. The goal of our work is mostly to get a bigger set of nice differences for SHA-1, and for this we combine the two approaches. Because we target SHA-1 and don't have many good trails, we want to use a single core trail. But we also want to use ideas with several blocks, because this gives us a bigger set. That's really the main idea here.

We introduce three tricks, and when you combine all of them you get something relatively nice. The first trick is to look inside the compression function and allow more freedom at the end. The second trick is to use several blocks. The final trick is something we call clustering, which I'll come back to later. Basically, it means we don't fix in advance which blocks we're going to use.

So let's start with the first trick. We start from a differential trail in SHA-1, and as in previous work, we look at the last rounds and allow a little freedom in order to reach a few different output differences. The previous attack used a set of 192 differences. We relax it a bit more and show that we can actually reach more than 8000 differences. Just by using this, you reduce the complexity of the attack from 2^77.1 to 2^74.3, so that's already a nice improvement.

The next trick is to use several blocks. As I explained, we don't have a nice structure on one block, so we cannot describe the set S abstractly, like "these bits need to be zero". But if we're going to use two blocks, we actually
know in advance which values we can cancel. We know that the output difference of each block lies in the set of reachable values, denoted D here. So the first block adds a value from D, the second block adds a value from D, and because of the feed-forward, a zero difference at the end means the initial difference must cancel a sum of two values in D. So you just build a set like this. If you want to cancel δ1 + δ2, the first block needs output difference −δ1 and the second block −δ2, and you will be able to cancel it. You have to compute this set explicitly, since you don't have a nice description. But as long as you can compute it, it's quite easy to build it exhaustively, and you know that all those values are nice. This is with two blocks. Of course you can do it with more, and you get a relatively large set if you increase the number of blocks. Using this, the complexity goes down to 2^68.6, so we're now getting a relatively large gain.

Finally, the last trick is what we call clustering. The idea is to look at this as a graph. We take the set S of nice differences as the vertices of the graph. The edges are the near-collision blocks we can use to move from one difference to another, so the difference between the two endpoints is one of the values in D. An important observation about this graph is that there are many paths from a given point to zero, which corresponds to a collision; there's not a single path. A naive attack would first select a path and then use the blocks corresponding to that path. Instead, we take advantage of the fact that there are many different paths and use them at the same time. There are several reasons why there are
several paths, but the most basic case is that you can just change the order of the blocks: with two blocks, applying δ1 then δ2 or δ2 then δ1 gives the same result. So you have at least this amount of freedom. How can we use it? Look at how the attack is actually performed. You start with the birthday phase, you get a nice difference, and then you want to cancel it. You start with the first block, so you know you have to target some difference δ1. Your collision attack gives you some message conditions for reaching δ1, and then you try many random messages until you hit δ1. But in many cases there are several differences that are useful to you and that have the same message conditions. So you choose several interesting target differences. If they all share the same message conditions, you can search for a message reaching any of them simultaneously. With two possible targets, for instance, it's twice as easy to reach one of them, so the cost of each block becomes much smaller. That's really the trick we use here. Doing it properly is a little tricky, because not all blocks have the same cost, and some differences are farther from a collision than others, so you have to be careful. In the end, though, you can compute the complexity of each target difference. From that you decide which message conditions to use at each step and how to move around in this graph. Overall, the complexity is reduced to roughly 2^67.

So those are all the tricks we use at a high level. Going to a lower level gets a bit ugly, so I will not go through the details. The big idea is that we start from the SHAttered collision attack, because this attack was implemented in practice, so we
know it actually works, which is not necessarily the case for older proposed attacks on SHA-1, and we know the complexity of this attack. In our case we need a little more freedom at the bottom and at the top, so finding a block may be a bit more expensive than in the SHAttered attack. If you're optimistic, you can assume it's the same cost; if you're more pessimistic or conservative, you can add some safety margin. This is how we estimate the cost of the attack. Then you just have to build the set and the graph. This is actually a significant effort, because the set has about 2^34 nodes, there are many edges, and you have to do the clustering. It takes a bit of time, but in the end you can perform this computation. Then there's a trade-off. One option is a smaller set of low-cost differences. Your birthday phase is then more expensive, because you have fewer values to target, but the second phase is cheaper because you only keep the easiest values. The other option is a bigger set, which makes the birthday phase cheaper. Depending on your assumption about the cost of one block, you select different trade-offs. With the optimistic assumption we estimate the complexity at 2^66.9, and with a more conservative estimate we get 2^69.4.

To conclude, what we do in this work is propose a framework for turning a collision attack into a chosen-prefix collision attack, which is a more powerful kind of attack. This is quite generic, and we don't really need special properties of the initial collision attack. We've applied this to SHA-1 and get a pretty significant improvement, from 2^77.1 to around 2^67. We've also applied it to MD5, and we get some results in a specific case, when you limit the number of blocks to just two. What we show is that the gap between collision attacks and chosen-prefix
collision attacks is not so big in the case of SHA-1: a factor between about 5 and 25. That's not a huge gap, and it's much smaller than what was thought before.

Since this paper was published, we have of course kept working on this. We've been looking more at the low-level details, and we now have a better estimate of the complexity. Here I gave a range; as far as we understand now, it will be around 2^67.2. We can also estimate the cost of running this attack. A nice way to do so is to look at how expensive it is to rent GPUs to run the attack. On the Amazon cloud, it would cost about 2.6 million dollars. That's a large amount of money, but probably feasible. And you can actually get GPUs much cheaper than that. Apparently some people bought lots of GPUs to mine cryptocurrencies some time ago. Mining isn't so profitable anymore, so those GPUs are now available relatively cheaply. If you use that kind of GPU instead of the Amazon ones, you could run the attack for around 500,000 dollars. That's still a large amount of money, but it's definitely within reach of a state-level adversary. So this attack can be run in practice, and clearly, if you're still using SHA-1, please stop right now.

Just to conclude with some ongoing, unpublished work: we have new ideas, and we think we can get the cost below 100,000 dollars, which is really within reach even for academics. We're now working on implementing this attack, and we hope to get a real chosen-prefix collision by the end of the year. Of course, we never know what kind of issues can come up. That concludes my talk; thank you for being here.

Thank you very much. Questions?

Oh, that's for the birthday phase. Can you repeat the question?
Okay, so Serge was asking how we reach the set in the birthday phase. Where was it written... here, yes: you're asking about this sqrt(2^n / |S|), right? The idea is that you start from two different states with some random difference. From each of those states, state one and state two, you hash a number of random blocks. Then you look at all possible pairs and check whether the difference of one of the pairs is in the set S. That's the basic idea. In terms of implementation, of course, you don't want to store everything, so you need to use a memoryless algorithm. There are some technical details, but the basic idea is just a birthday search: we look at pairs from this side and that side. The complexity is the number of random blocks we have to try until the difference of one of the pairs is in the set S.

No, not really; you really have to look at the details. It really depends on the hash function: on how you build your differential trails and what the conditions are. One big factor, I think, is also how far your initial collision attack is from the generic one. In SHA-1 the gap is not too big, which is also why we don't lose too much going to chosen-prefix. In MD5 the gap is much bigger; I think we have numbers here. The best collision attack costs only 2^16, far below the generic attack, and there we lose a lot when we go to chosen-prefix collisions. But in general there's no formula, so it's not trivial.

So the question is: during the birthday stage, we need to detect whether the difference here is in the set or not, and how easy is this? In general, yes, it could be hard. In the case of SHA-1 it's not too hard, because of how those values are built. They are sums of values that all lie in some core set, so some bits are basically never affected, or only with very low probability. So we just look for collisions on those specific bits, and then we test the full
difference to see whether it's in the set or not. We need some extra tricks there, absolutely, yes.

So the question is what we mean by theoretical and practical. In this work, it just means implemented or not implemented. That's what I meant when I was listing the attacks on SHA-1 here. The first attacks are theoretical in the sense that they were not implemented, and then in 2017 the attack was finally implemented. So there's no fixed threshold; it depends on how much you're willing to spend to implement the attack. The attack I'm presenting here, with complexity 2^67, is almost practical, I think, in the sense that you could run it if you really have money to spend. But we didn't run it, because we don't have this $500,000. So there's no clear threshold between practical and theoretical.

Okay, so if there are no further questions, then let's thank Tom again.
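The multi-block and clustering tricks from the talk can be made concrete with a toy sketch of the second phase. `build_nice` builds the set of nice differences from a set D of per-block reachable output differences, by exhaustive search backwards from zero. It records how many blocks each difference needs. `next_block` applies the clustering idea to a single step. Among the targets that bring the current difference closer to zero, those sharing the same message conditions are searched together. Since a trial hits at most one output difference, the success probability per trial is the sum of their probabilities. Everything here is hypothetical and simplified. A single 32-bit word stands in for SHA-1's 160-bit difference. D, the condition labels and the probabilities are made up. Block count is a crude proxy for the remaining cost, which the actual attack estimates far more carefully on a graph with about 2^34 nodes.

```python
from collections import defaultdict

MOD = 1 << 32  # toy: one 32-bit word instead of SHA-1's five-word, 160-bit difference

def build_nice(D, max_blocks):
    """Map every difference that at most max_blocks near-collision blocks can
    cancel to the minimum number of blocks it needs. With the feed-forward, a
    block whose output difference is d moves the chaining difference x to x + d."""
    dist = {0: 0}          # zero difference: already a collision
    frontier = [0]
    for k in range(1, max_blocks + 1):
        nxt = []
        for s in frontier:
            for d in D:
                x = (s - d) % MOD   # x + d == s, so one block takes x to s
                if x not in dist:
                    dist[x] = k
                    nxt.append(x)
        frontier = nxt
    return dist

def next_block(x, edges, dist):
    """Clustering step from difference x (which must be in dist).
    edges: (d, condition, prob) triples, i.e. a reachable output difference d,
    a label for the message conditions needed to aim at it, and the per-trial
    probability of hitting d under those conditions (hypothetical inputs).
    Returns (condition, targets, expected number of trials) or None."""
    clusters = defaultdict(list)
    for d, cond, p in edges:
        y = (x + d) % MOD
        if y in dist and dist[y] < dist[x]:   # keep only moves that get closer to zero
            clusters[cond].append((d, p))
    if not clusters:
        return None
    cond, targets = max(clusters.items(), key=lambda kv: sum(p for _, p in kv[1]))
    return cond, targets, 1 / sum(p for _, p in targets)

D = {1, MOD - 1, 16, MOD - 16}          # hypothetical per-block output differences
nice = build_nice(D, max_blocks=2)
edges = [(MOD - 1, "A", 2**-10), (MOD - 16, "A", 2**-10), (1, "B", 2**-8)]
choice = next_block(17, edges, nice)   # two useful "A" targets are searched together
```

From difference 17, both 16 and 1 are one block away from zero and share condition "A". The clustered search therefore costs about 2^9 trials instead of 2^10 for either target alone. The "B" edge leads to 18, which is not nice.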