So yes, I'm quite happy to be here. The topic of my talk today is improving attacks on round-reduced Speck32/64 using deep learning. There are quite a few results in this paper that I won't have much time to tell you about, so I will start by giving you a bird's-eye view of the whole paper. The first thing we do is build differential distinguishers for Speck32/64 using machine learning that are superior to classical differential distinguishers, even ones using the full difference distribution table, which we also calculate, for up to eight rounds of Speck and one chosen input difference. These neural-network-based distinguishers achieve a higher classification accuracy than a differential distinguisher based on the full difference distribution table. The reason they can do this is that the neural distinguisher exploits the ciphertext-pair distribution instead of only the distribution of differences. Having these distinguishers, we then use Bayesian optimization; that is a generic optimization procedure used, for instance, in machine learning to optimize hyperparameter sets. Basically it is a method for functions that are expensive to evaluate but where something about the shape of the function is known: you want to find, say, the maximum, and you try to infer from a few evaluations of the expensive function where that maximum lies. We apply this to the problem of searching for the last round key in Speck encryption using one of our neural distinguishers, and this leads to an attack that uses very few trial decryptions. For our 11-round attack we perform a few million trial decryptions in total, which is roughly a few million times fewer than the best previously published attacks on 11 rounds.
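To make the key-search idea concrete, here is a toy sketch in the spirit of that Bayesian-style search. Everything here is an illustrative assumption, not the paper's actual attack code: the 8-bit key space, the noise-free responses, and the shape of the response profile (decaying with Hamming weight) are made up; the real attack measures the profile empirically on 16-bit subkeys. The point it shows is that, given an expected distinguisher response as a function of the XOR difference to the right key, a handful of trial decryptions lets you rank all key hypotheses by likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
NBITS = 8                 # toy key space of size 2^8 (illustrative only)
true_key = 0xA7

# Assumed wrong-key response profile: expected distinguisher score as a
# function of the XOR difference to the right key.  The decay with Hamming
# weight is a made-up stand-in for the empirically measured profile.
mu = np.array([1.0 / (1.0 + bin(d).count("1")) for d in range(1 << NBITS)])

def response(k):
    # observed score when trial-decrypting with key k (noise-free here;
    # real responses are noisy averages over many ciphertext pairs)
    return mu[k ^ true_key]

# try a few random keys, then rank *all* hypotheses by how well the profile
# explains the observations (least squares, i.e. a Gaussian likelihood)
tried = rng.integers(0, 1 << NBITS, size=24)
obs = np.array([response(int(k)) for k in tried])
scores = np.array([((obs - mu[tried ^ h]) ** 2).sum() for h in range(1 << NBITS)])
best = int(np.argmin(scores))
```

With noise-free toy responses the right key attains the minimal possible score of zero; the real attack repeats such a ranking step a few times with noisy averaged responses and a fresh candidate set each time.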
So the best previous attacks take about 2^45 trial decryptions, and our attack takes a few million. However, evaluating a neural network is expensive, so this advantage does not fully translate into a speedup of the attack; what we get in the end for 11 rounds of Speck is an attack that is roughly 200 times faster than the state of the art. We also use manual cryptanalysis: to define learning tasks, to design the overall attack, and to extend trained distinguishers to more rounds, and you can absolutely try this at home, because the code is available on GitHub. One other thing that is different about this attack compared to previous attacks is that it does not use the key schedule. The best previous attacks generate a large number of candidates for the last four round keys and then test every one of those keys against a known plaintext-ciphertext pair, and this is only possible because the key schedule is efficiently invertible. Our neural-network-based attack does not need that, because we can solve for the round keys one by one. So we would be able to break 11-round Speck with essentially the same complexity even with a free key schedule, or if the key schedule were changed to no longer be efficiently invertible; the attack is a bit more resilient against changes to the key schedule. We also show that the neural distinguishers can efficiently recognize randomized output ciphertext pairs: if you take a pair of 32-bit values whose difference is a high-likelihood output difference, the pair itself need not appear with high likelihood in the output distribution, and the neural distinguishers can recognize that, unlike traditional differential distinguishers. In the paper we show some concrete examples of this.
For instance, we show one example of a ciphertext pair that gets an almost zero neural distinguisher score even though it belongs to an output difference of likelihood 2^-26; in fact, we show that it has zero chance of being an actual output, i.e. it is an impossible output pair. One other thing that we do in the paper is use few-shot learning on cryptographic problems. A common idea about machine learning is that you might be able to do interesting things with it, but you need lots of data, whereas humans can sometimes learn from one example, or a few examples, of some problem and do useful inference about that problem later on. In machine learning this is also possible; there is a range of techniques that go under the heading of few-shot learning, where you show a machine learning algorithm just a few examples of some output distribution and it can then recognize that distribution. We show that, with some prior knowledge, this is to some extent possible on Speck: we can, for instance, train on a few million examples of three-round Speck with a random input difference and then give the network six-round Speck to classify, and after seeing just a few examples of the new distribution it yields a distinguisher whose advantage over random guessing is very statistically significant. We can also use this few-shot learning capability to find good input differences without using prior cryptanalysis.
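As a generic illustration of the few-shot mechanism (not the paper's actual experiment), the sketch below freezes a random feature map as a stand-in for a pretrained network's internal representation, then classifies two previously unseen synthetic distributions by nearest centroid in feature space, using only ten labelled examples per class. All names, shapes, and distributions here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# frozen "pretrained" feature extractor: a fixed random projection + ReLU
# (a toy stand-in for a trained distinguisher's learned representation)
W = rng.normal(size=(32, 64))
def features(x):
    return np.maximum(x @ W, 0.0)

# two synthetic "output distributions" the extractor has never seen
def sample(cls, n):
    return rng.normal(loc=0.5 * cls, size=(n, 32))

# few-shot step: estimate one centroid per class from k examples each
k = 10
c0 = features(sample(0, k)).mean(axis=0)
c1 = features(sample(1, k)).mean(axis=0)

def classify(x):
    f = features(x)
    d0 = np.linalg.norm(f - c0, axis=1)
    d1 = np.linalg.norm(f - c1, axis=1)
    return (d1 < d0).astype(int)

# evaluate on fresh data; accuracy should land well above chance (0.5)
X = np.vstack([sample(0, 500), sample(1, 500)])
y = np.array([0] * 500 + [1] * 500)
acc = (classify(X) == y).mean()
```

The design choice mirrors the talk's point: the expensive part (the feature extractor) is learned once, and adapting to a new distribution only needs a few examples to place the class centroids.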
Finally, the paper also gives some partial insight into the source of the additional signal, basically showing that the neural networks have an internal data representation that groups output ciphertext pairs into a more fine-grained system of equivalence classes than the difference equivalence classes. As an additional benchmark, we also develop differential distinguishers for five-round Speck that exploit the ciphertext-pair distribution perfectly; these are of course a little better than the neural networks, but also much slower. So, Speck is a family of block ciphers designed by the NSA in 2013, and the member of this family that we are looking at today, Speck32/64, is its smallest member. It is a lightweight ARX construction, has 22 rounds, and has a non-linear key schedule that basically reuses the round function. In prior work, the best attacks on 11-round Speck used about 2^46 reduced-Speck encryptions and 2^14 chosen plaintexts on average. There are also attacks on 12 to 14 rounds, but they have higher complexity, and all of these attacks depend on the key schedule being efficiently invertible. Our attack breaks 11 rounds in roughly 15 minutes on average on one thread of a normal desktop computer, which is roughly equivalent to about 2^38 reduced-Speck encryptions. One thing I should probably emphasize is that this represents one time-data trade-off among many: if we used more data we could build faster attacks, roughly a factor of 10 or maybe 20 faster with a lot more data, but with these settings the attack has a data complexity comparable to the best known attacks, which makes it nicely comparable to the literature.
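For reference, the Speck32/64 round function and key schedule are only a few lines. This sketch follows the public specification (rotate by 7 and 2, 16-bit words, 22 rounds, key schedule reusing the round function) and reproduces the designers' published test vector: plaintext 6574 694c under key 1918 1110 0908 0100 encrypts to a868 42f2.

```python
MASK = 0xFFFF  # 16-bit words

def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK

def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK

def speck_round(x, y, k):
    # one round: rotate, add mod 2^16, xor round key, rotate, xor
    x = (ror(x, 7) + y) & MASK
    x ^= k
    y = rol(y, 2) ^ x
    return x, y

def expand_key(key, rounds=22):
    # key = (l2, l1, l0, k0); the schedule reuses the round function,
    # injecting the round counter as the "key"
    l = [key[2], key[1], key[0]]
    ks = [key[3]]
    for i in range(rounds - 1):
        nl = ((ks[i] + ror(l[i], 7)) & MASK) ^ i
        l.append(nl)
        ks.append(rol(ks[i], 2) ^ nl)
    return ks

def encrypt(pt, ks):
    x, y = pt
    for k in ks:
        x, y = speck_round(x, y, k)
    return x, y
```

Round-reduced variants are obtained simply by passing a shorter key schedule, which is how the distinguisher training data for 5 to 8 rounds is produced.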
Now, to calculate the difference distribution of round-reduced Speck and get a baseline for our distinguishers, we do basically just the standard thing: we treat the differential transitions from one round to the next as a Markov chain with 2^32 - 1 possible states, and we calculate the predicted distribution of that Markov process given one known input difference. This gives us one row of the difference distribution table for Speck32/64. We choose the input difference (0x0040, 0x0000), which is known from the literature, and calculate the induced difference distribution. There are various sources of model error, for instance we use double-precision arithmetic rather than exact arithmetic, and there could be dependencies between transitions, but empirically this model works very well; we have checked, for instance, the highest-likelihood transitions, and they are predicted extremely well by this model. The whole process is straightforward but somewhat expensive: it takes a few hundred CPU days of computing time and about 35 gigabytes of memory for the final computed difference distribution table.
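The probabilistic core of one Speck round is the modular addition; its XOR differential probability has a closed form due to Lipmaa and Moriai (FSE 2001), and transition probabilities of a Markov model of this kind are built from it (the rotations and XORs propagate differences deterministically). A minimal sketch:

```python
def xdp_add(a, b, g, n=16):
    # XOR differential probability of n-bit modular addition:
    # Pr[(x ^ a) + (y ^ b) == (x + y) ^ g] over uniform x, y,
    # via the Lipmaa-Moriai closed form.
    mask = (1 << n) - 1
    eq = lambda x, y, z: ~(x ^ y) & ~(x ^ z) & mask  # bits where all agree
    # impossible transitions are filtered by the validity condition
    if eq(a << 1, b << 1, g << 1) & (a ^ b ^ g ^ (b << 1)) & mask:
        return 0.0
    # probability is 2^-(number of non-agreeing bit positions below the MSB)
    return 2.0 ** -bin(~eq(a, b, g) & (mask >> 1)).count("1")
```

For example, a single-bit input difference in the low bit passes through unchanged with probability 1/2 (the carry difference is absorbed half the time), while the all-zero transition is certain.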
Machine learning is an umbrella term for various techniques that aim to make agents or machines learn from experience. It is very useful for some problems, including some high-profile problems, but it is also not difficult to find simple problems where a naive ML approach fails or struggles very badly to find a solution. For instance, if you just dumped 64-bit numbers and their parity onto a neural network without doing anything else, I think learning to calculate the parity would be a fairly hard problem for the network; it can be done, but you have to give it other training data. There have been spectacular successes in recent times, for instance in the game of Go, in poker, and in machine translation, and I think it is important to realize that these mostly use machine learning as just one crucial part of a larger problem-solving system; that is no different here. A neural network is basically a differentiable family of functions parameterized by some weights: you have inputs going in at the input layer, these real-valued inputs are multiplied by real-valued weights, then you apply some activation functions and pass the results on to the next layer, and you can approximate a wide range of functions with these networks. To find a good neural network for a particular problem, you define a loss function, let the network loose on training data, try to find good weights by stochastic gradient descent, and then find out whether your system has actually learned anything by testing it on data that is not the training data. In cryptology, machine learning has not been used very much, and I think the outlook of the community has been fairly bleak in terms of machine learning being useful for cryptanalysis. I have two quotes here, one by Bruce Schneier from his textbook Applied Cryptography: neural nets work well in structured environments, but not in the high-entropy, seemingly random world of cryptography. And then another quote from two machine learning researchers who wrote a fairly high-profile paper on two neural networks learning to protect their communications from a third network that was trying to break them; they wrote that neural networks are generally not meant to be great at cryptography. I think this sums up the prevailing community view fairly well, although there has been a lot of work on side-channel analysis. There have also been some other works on machine learning in cryptology: Klimov, Mityagin and Shamir used neural networks to break a public-key encryption scheme that was itself based on neural networks; Sam Greydanus built a model of the Enigma using a recurrent neural network; and a recent work from ICLR 2018 showed that GANs can break simple short-period Vigenère ciphers in an unsupervised setting, which was work aimed at improving machine translation techniques. So, to train a distinguisher for reduced Speck, we just generate a few million real and random examples of ciphertext pairs, using our chosen input difference for the real pairs; this is very quick, it takes a few seconds. Then we train a deep residual neural network, a type of network that has been quite successful in a variety of applications from image recognition to board games. For five to seven rounds of Speck encryption, a GTX 1080 Ti graphics card gives you a classifier better than the difference distribution table after a few minutes of computation, so this is very quick. For seven rounds you can also use a slightly more expensive training scheme to get a somewhat better network, and for eight rounds basic training fails, but you can use curriculum training, i.e.
designing a sequence of intermediate tasks to learn on, and then you also get a distinguisher. These distinguishers are quite small: they have about 44,000 weights, and if you truncate the weights of the seven-round distinguisher to 16-bit floats, that does not seem to hurt it, so you can pack the distinguisher into about 90 KB of storage instead of the 35 gigabytes for the difference distribution table. The results are quite good: you get better accuracy than the difference distribution table across the board in all of these tasks, for five to eight rounds. The advantages per pair are quite small, but with more than one ciphertext pair the advantage will of course be larger. To build a nine-round attack from this, we take the seven-round distinguisher, add one round at the bottom by manipulating the inputs, which we can do because the first addition of a round key happens after the only non-linear operation, the modular addition, and add one round at the top by brute-force trial decryption. We then use a number of chosen plaintext pairs, in this case for instance 64, and classify either by the difference distribution table or by the neural network. When we do this, we see that the neural network gives much better mean and median key ranks than the difference distribution table: for instance, the median key rank of the neural network is one while that of the difference distribution table is nine, and the mean is roughly five times lower. Then, to build an effective key recovery policy, we look at the wrong-key randomization hypothesis. If the wrong-key randomization hypothesis held, wrong-key decryptions would tell you nothing: you might see one ciphertext pair rated much higher than all the others, but the wrong-key decryptions would give you no information about the value of the real subkeys you are targeting. But if you look at the
distinguisher response as a function of the bitwise difference between the wrong key and the real key, you actually see that there is a lot of information in the wrong-key decryptions. We use that by trial-decrypting a small set of ciphertexts under a few random keys, observing the average distinguisher response, and then deriving a new set of key hypotheses that maximizes the likelihood of the observed distinguisher responses. We repeat this a few times, say three or five times with maybe something like 32 keys, and then one of the keys found will usually be the right key. To build an 11-round attack, we extend this nine-round attack by adding a two-round initial differential trail: we recreate the conditions of the nine-round attack, i.e. a couple of plaintext pairs with the desired input difference, by using some neutral bits of the initial two-round trail, and we apply the key search policy to derive a shortlist of key candidates. We detect success by looking at the distinguisher scores returned; if the scores look good, we derive a shortlist of key candidates for one more round, which are then judged by another neural distinguisher for one round less, and if those scores are sufficiently high, we output a key guess for the last two round keys; otherwise we ask for more data or look again at data already acquired. In conclusion, I think one can say that machine learning worked really well in this instance: the neural network efficiently exploits the ciphertext-pair distribution, and choosing the right learning task and a good model structure is crucial for success. We also used some manual cryptanalysis to derive a competitive attack from the distinguishers, and we derived, more or less automatically, an efficient key search policy that is very helpful for making a fast attack. One thing that I would like to add is
that neural networks have a reputation for being black boxes, but in this context I think it is more appropriate to think of them as gray boxes, because with access only to a black-box distinguisher you would not be able to do the few-shot learning, whereas with access to a neural network that has learned to recognize three-round Speck you can very quickly learn to recognize six-round Speck with a well-chosen input distribution. And again, the code is all on GitHub, and with that I'm done. Thank you very much.

Question: I think an immediate question would be whether this applies also to Simon.

Answer: Whether... what?

Question: Do the techniques apply also to other block ciphers?

Answer: I expect they will apply to other block ciphers. I mean, for true Markov ciphers you obviously should not be able to get a distinguisher that is better than the full difference distribution table, but I think there is nothing Speck-specific here. I did not try it on a million ciphers, I tried it on one cipher, on Speck, and it worked for Speck, so I would definitely expect this to work for other ciphers as well.

Okay, is there any other question?

Question: Thanks for a fascinating talk. I have a question about an alternative technique used by the machine learning community, the autoencoder approach. With an autoencoder, you try to teach the machine to find a kind of condensed representation of the data it is given. Are you aware of anyone who has tried to use autoencoders in cryptanalytic approaches, and whether they were successful?
Maybe you give it a bunch of plaintexts and a bunch of ciphertexts and try to find a concise representation that translates the plaintexts into the ciphertexts.

Answer: I am not aware of anybody who has tried this. I have tried some unsupervised techniques to find connections between plaintext and ciphertext; I have not tried autoencoders, but I have tried some other techniques, and the results were not uninteresting, but I found nothing in that area that was clearly better than known techniques.

Any further questions? Then let's thank the speaker again.