He's a malware researcher at Check Point and comes from the Israel Institute of Technology, and he's going to tell us something about keystream reuse and how to avoid spending nights and nights trying to understand a crypto protocol. But was it worth the time? I mean, in terms of spending your nights developing this other software for automating it instead of checking by yourself. Let's see. Thank you. Please give him another round of applause.

Hi, I'm Ben. I'm from the Check Point malware and vulnerability research group. This was mentioned earlier, but I felt like mentioning it again. What I do is look at the applications of theoretical computer science to problems that we come across in the infosec field. In less whitewashed terms, what this means is that I am an amateur mathematician who has infiltrated the industry somehow. Don't tell anyone, especially not the people working with me. It's hard enough to keep this under wraps as it is, considering that when I handed my boss the first draft of this presentation, he took one look at it and said, "You're really going to present this with all of those formulas? They are going to lynch you." So first of all, please don't lynch me. Second of all, I really tried my best to take away most of the formulas. There's one formula somewhere in one of the slides.

So this talk, as I said, is about automatic detection of key reuse vulnerabilities.
I would like to dive right into that. Unfortunately, first I'm going to have to explain what a key reuse vulnerability is, and unfortunately, to explain what that means, I'm going to have to explain how a stream cipher works. So let's go over it as quickly as we can.

Okay, this is how a stream cipher works. Basically, there's this machinery called a pseudorandom number generator. The pseudorandom number generator accepts a short key and outputs what, to anyone who is not familiar with the key, looks like noise. And you can use it for encryption. You insert the symmetric key known to both parties trying to communicate, say Alice and Bob, and the pseudorandom number generator outputs this keystream that looks like noise. You XOR the plaintext, here it's the smiley image, with it, and you get the ciphertext, which is the plaintext XOR the keystream. To the uninitiated, anyone who is not familiar with the key, it also looks like noise. This is a property of the XOR operation: you take a nice plaintext, you XOR it with what looks like noise, and you also get something that looks like noise.

Next, what can you do now? The ciphertext looks like noise, but if you XOR it again with the same keystream, which is what Bob on the other side of the communication will do, you get the plaintext back again. Why? Because this is another property of the XOR operation: if you XOR something with the same element twice, whether it's noise or not, that element cancels itself out. You XOR with noise, XOR with the same noise again, the noise cancels itself out, and you get the plaintext back.

So what's key reuse? Astonishingly, it is when you reuse the same keystream twice to encrypt two different plaintexts. As you can see, let's take a certain key.
This is the key, and we XOR the first plaintext with the key and the second plaintext, again, with the same key, and we get two bunches of noise. As we said earlier, if you XOR with noise you get something that looks like noise, unless you know the key that generated the keystream. Now Eve is sitting in the middle of the communication. She's eavesdropping on the communication, and she gets this noise and this other noise, and it all looks like noise to her, and of course she can't do anything. So there is no such thing as a key reuse vulnerability, and you can all get up from your seats and go home.

No, not really. This is what happens if you encrypt two different plaintexts with the same key. It is possible for Eve, who has access to both ciphertexts, to XOR the two ciphertexts. And what did we say earlier about having the same element twice in a XOR operation? Here the key appears twice: the first plaintext XOR the key, and the second plaintext XOR the key. If you XOR the two ciphertexts, the key cancels itself out and you get the first plaintext XOR the second plaintext. This is not something that you want to enable. As you can see here, it's very easy to take a look at this thing and have a very good idea of what the original plaintexts were. This is due to the redundancy in the two original plaintexts. In the context that we will be working with, which I'll mention in a moment, it's a bit more difficult than that. It's not so obvious. It doesn't just jump out at you like that.
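The cancellation described here is easy to check in a few lines. The following is a minimal sketch, not code from the talk: the two plaintexts are made up for illustration, and `os.urandom` merely stands in for the output of a keyed pseudorandom number generator.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (truncates to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical plaintexts of equal length, chosen only for illustration.
p1 = b"attack at dawn, bring the maps"
p2 = b"send cash to the usual address"

# Stand-in for the keystream a stream cipher would derive from the shared key.
keystream = os.urandom(len(p1))

# Alice encrypts BOTH messages with the SAME keystream: this is the key reuse.
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# Bob, who knows the keystream, XORs again and the keystream cancels out.
recovered = xor_bytes(c1, keystream)

# Eve never sees the keystream, but XORing the two ciphertexts cancels it too,
# leaving the XOR of the two plaintexts.
leak = xor_bytes(c1, c2)

print(recovered)                    # the first plaintext again
print(leak == xor_bytes(p1, p2))    # True, whatever the keystream was
```

Eve's `leak` contains no trace of the keystream, which is why the redundancy of the plaintexts alone decides how much she can recover.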
In fact, there's a whole effort, and there are algorithms, dedicated to extracting the original plaintexts like that. But this is an example, so you can look at it and see how horrible this vulnerability is if it exists. It's not something very nice.

So what is it good for? Why do we want to detect key reuse? Here's the first example. This is a document from the Venona project. The Venona project ran for 40 years, from the 1940s to the 1980s, initiated by the USA to eavesdrop on Soviet communications. This was started before the NSA even existed. The Soviets were reusing their keystreams, their one-time pads. A one-time pad, when used correctly, is an unbreakable cipher, but they were reusing their one-time pads, and the vulnerability that we saw earlier arose because of that. US intelligence was able to extract information from those encrypted messages for the following 40 years. It isn't the case that the Soviets continued sending vulnerable information for 40 years. It was only for four years, and then they wised up. But for 40 years the Americans continued working on this project and extracting more and more information, and they got all sorts of useful intelligence, including the identities of spy rings and such. So this is one use for it: if you can look at the traffic and say, "Huh, there's been key reuse here," it's helpful. You can then start to attack the traffic and look for information.

This is another use. Now I will be shamelessly promoting my colleague Nittai, because he's awesome. Earlier this year we researched a certain ransomware variant called DirCrypt. It's basically a CryptoLocker wannabe.
It infects your computer and starts encrypting your files. Unfortunately for its author, as Nittai dug deeper into the code, he found that DirCrypt commits key reuse. Basically, it uses the stream cipher RC4. This is a good point to mention that everything I'm talking about right now applies only to stream ciphers and one-time pads, which are like stream ciphers, and not to block ciphers like AES, DES and the like. And Nittai found out that every single file encrypted by DirCrypt using RC4 uses the same key. It was a five-letter key, "black", in all lowercase. This means that all the encrypted files could theoretically be recovered, depending on the redundancy in the plaintext. Anyway, it's not something that the malware author planned on.

Now, the funny thing is that we didn't even have to go that far, because the malware author actually included the key in every file. We don't really know why. Probably it seemed like a good idea to him at the time. But even if the malware author hadn't done this, we could have recovered the plaintext, or a large part of it, anyway. And this is where it comes to use: if we had a way to look at the files and come to an epiphany, "there's been key reuse here," Nittai wouldn't have had to sit in front of the screen looking at IDA Pro, with tears of blood in his eyes, for nights and nights on end. We could have just looked at the files, said "Hey, there's a key reuse vulnerability here," proceeded from there, and saved a lot of time and effort.

This is something similar.
This is traffic from the Ramnit malware. Ramnit came out around 2010. It steals credentials and it was used to commit financial fraud. Ramnit sends its traffic in a special home-brewed protocol over port 443. It's not actually SSL, but it's over port 443. This protocol contains blocks, and some of the blocks can be encrypted. Every single block is encrypted with the same key, with the pseudorandom number generator restarted before every use. So basically here you have key reuse on the block level: every single block is encrypted using the same keystream. So if somebody can look at this traffic and say, "Huh, there's been key reuse here," again, that's useful, because coming into this traffic we don't know anything about it. If, just from looking at the traffic, we could say that key reuse has been committed here, that's interesting. We don't expect key reuse in regular traffic.

And this is our last example. Lest you think that only malware authors and shady characters commit key reuse, Microsoft committed it in the 2003 version of Office, in their document encryption function. Basically, every time you modified the file and saved it again, it was encrypted again with the same key. The key was bound to the file. It was a single key every time, and someone who monitored your directory over a long time could look at the files again and again and see different plaintexts encrypted again and again with the same keystream. This enables a key reuse attack. It took time for people to catch on to this. If we could only look at the files and come, again, to an epiphany, "Oh wow, there's key reuse here," it could have been detected much earlier.

So how do we manage to actually do this thing? I spent four slides explaining how wonderful it would be if we could just look at a heap of bytes and understand there's been key reuse here. That's nice, but how do we actually pull this off?
Well, has any one of you ever got stuck in one of those early-90s quests, like Simon the Sorcerer and Monkey Island? When you are completely stuck and you have no more ideas of what to do, what do you do? That's right: you try everything on everything else until something works. So this is what we're going to do here. Basically, if we take every byte from our original input and XOR it with every other byte, we're going to get this space where every byte is XORed with every other byte. So for example, this square will be the result of XORing the R with the R, and it's going to be a null byte, because anything XORed with itself is a null byte. Every square, every tile in this space, is going to be the XOR of the character from its column and the character from its row. What is this good for? We're going to see in a moment. First of all, this is what it looks like. As you can see, along the diagonal everything is null bytes, right? Because it's a character XORed with itself.

So what's our game plan, based on the above? Let's look at a typical input that we might want to operate on to find something interesting. Let's take this input. There's all sorts of noise in here, and somewhere in here are hidden two ciphertexts, two different plaintexts encrypted with the same key. This is the thing that we are out to find. If we XOR every byte with every other byte, then somewhere in here, right, is the first byte of this ciphertext XORed with the first byte of that ciphertext. Why do we care? Because of the phenomenon that we saw earlier: if we XOR the first bytes of the two ciphertexts, the key byte cancels itself out. The same applies if we XOR the second bytes of the two ciphertexts. And along this diagonal we go up one unit and to the right one unit.
We basically advance to the next character in both strings, so along this white line you will be able to see the two ciphertexts XORed with each other. The two different ciphertexts, XORed with each other, can be read along this diagonal. And why do we care? Again, because we saw earlier what happens if you do that: the key cancels itself out. Earlier you could actually see the smiley face and the "send cash" message. Here it's not going to be so obvious, but it's still a step that helps us.

Why? Because we come into this whole thing assuming something about the plaintext distribution that we are going to see. Plaintext is different from random characters, right? You expect letters, you expect punctuation. At any rate, the distribution is different from that of random characters, which are distributed evenly. Now, this is not the plaintext distribution. This is the XOR-text distribution. This is the distribution that you get if you pick a random character from the plaintext distribution, and then another random character from the plaintext distribution, and you XOR the two of them. You get another distribution, and this distribution is again different from the random distribution. So it looks different to us, right? A byte that came from two different plaintext characters being XORed is going to appear different to us from just a random byte, in the long run.

So what are we going to do? We're going to scan this space that I mentioned earlier.
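To make the XOR-text distribution concrete, here is a small sketch under stated assumptions: the plaintext distribution is estimated from a short made-up English sample (`SAMPLE` is hypothetical, not the model used in the talk), and the two characters being XORed are treated as independent draws from it.

```python
from collections import Counter

# Hypothetical stand-in for "what plaintext looks like"; any text sample would do.
SAMPLE = b"the quick brown fox jumps over the lazy dog, and then it goes home to sleep."

def byte_distribution(sample: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in the sample."""
    counts = Counter(sample)
    return [counts.get(v, 0) / len(sample) for v in range(256)]

def xor_distribution(p: list[float]) -> list[float]:
    """Distribution of a ^ b when a and b are drawn independently from p."""
    q = [0.0] * 256
    for a in range(256):
        if p[a] == 0.0:
            continue
        for b in range(256):
            q[a ^ b] += p[a] * p[b]
    return q

p = byte_distribution(SAMPLE)
q = xor_distribution(p)

uniform = 1 / 256
most_likely = sorted(range(256), key=lambda v: q[v], reverse=True)[:5]
print("most likely XOR values:", [hex(v) for v in most_likely])
print("P(xor == 0x00) =", q[0], "vs uniform", uniform)
print("mass on values >= 0x80:", sum(q[128:]))
```

For ASCII-like plaintext the XOR of two characters never has its top bit set, so half of all byte values are impossible under this model, while a truly random byte hits them half the time. That gap is what makes each byte usable as evidence.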
We're going to first of all construct it, and we're going to scan it in a diagonal fashion. It's like a crime scene, and the damning evidence, where all the bytes are going to look suspicious, like they came out of a distribution other than the random distribution, is going to appear along a diagonal such as this. Of course, we don't have all those colors and this color guide when we come into this. All we have is the input. But we know that if we scan this diagonally, and it just so happens that there's key reuse here somewhere, we're going to come across this diagonal eventually. Basically, each byte is like a little piece of evidence that may be pointing in the direction that there's been key reuse here, and that we are right now looking at two ciphertexts that had been encrypted with the same keystream, XORed with each other. We're going to walk along the diagonals, and each time we come across a byte we're going to pick it up, look at it, and say, "This looks like evidence for our hypothesis," or maybe against our hypothesis. If we walk along a diagonal and find a sufficient amount of evidence that supports our hypothesis that there's been key reuse here, then we raise the alarm and say: this is enough evidence.
There's no way this was a coincidence. There's been key reuse here, and the ciphertexts vulnerable to the key reuse attack are at this offset and this offset, and their length is so and so.

This is the formula I warned you about earlier. This is the TL;DR of the math involved. I kept saying "look at the evidence," so I'm going to have to mention what I mean by that. Basically, there's a quantified rubric for deciding how much a byte, a piece of evidence, influences our hypothesis and makes us think that the hypothesis is more likely. This is basically a computation involving the disparity between the probability that this byte would arise from the XOR-text distribution, in blue, that we saw earlier, and the probability that it would arise randomly. I'm not going to actually go into the formula, but anyway, it's important.

Regarding the question of how much evidence is enough: this is an important question. You can set the bar high, you can set the bar low. For the sake of the proof of concept, we set the bar such that one false positive is what we're willing to live with. Of course you can set it to a lower chance; it depends on the context where you're going to use this thing. And the question is: if we set the bar of evidence that high, can we actually detect something? Because it may well be the case that when we come across the actual thing that we're looking for, the two XORed ciphertexts, we're not going to be able to find them, because there won't be enough evidence. But as it turns out, there's this formula. We found it using what's called Chebyshev's inequality. The long and short of it is that what matters is whether the ciphertext that we're looking for is long enough.
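The slide formula itself is not in the transcript. One standard way to formalize "evidence" of this kind is a log-likelihood ratio per byte, and the following is a sketch under that assumption, not a reconstruction of the slide. The symbols $P_\oplus$ (the XOR-text distribution), $U$ (uniform over 256 byte values), $T$ (the evidence bar), and the independence of bytes along a diagonal are all assumptions made for illustration.

```latex
% Evidence contributed by one byte b, and total evidence along n bytes of a diagonal:
e(b) = \log_2 \frac{P_\oplus(b)}{1/256},
\qquad
E_n = \sum_{k=1}^{n} e(b_k)

% If the bytes really are plaintext XOR plaintext (b_k \sim P_\oplus),
% the mean evidence per byte is a Kullback-Leibler divergence, hence positive:
\mu = \mathbb{E}[e(b)]
    = \sum_{b} P_\oplus(b) \log_2 \frac{P_\oplus(b)}{1/256}
    = D_{\mathrm{KL}}\!\left(P_\oplus \,\middle\|\, U\right) > 0

% Chebyshev's inequality, with \sigma^2 = \mathrm{Var}[e(b)] and a bar T < n\mu,
% bounds the chance of missing a true key reuse of length n:
\Pr\left[E_n \le T\right]
  \le \Pr\left[\,\lvert E_n - n\mu \rvert \ge n\mu - T\,\right]
  \le \frac{n\sigma^2}{(n\mu - T)^2}
```

The bound shrinks roughly like $1/n$ once $n\mu$ clears $T$, so under these assumptions a long enough ciphertext is detected with high probability. If the bar $T$ only needs to grow with the logarithm of the input length, the required $n$ grows logarithmically as well.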
Every byte is going to give us some amount of positive evidence that tells us that this may be the real thing, and basically, if the string is long enough, we're going to have enough evidence. The only question is, it's a numbers game, what is the chance that we're going to fail anyway, even though the string was long enough in theory and we expected it to work? This is what Chebyshev's inequality is for: it bounds from above the probability that something unlikely will happen. And if you look at the formula, you will be able to infer that "long enough" is logarithmic in the length of the input, which is good. If we double the input, we just need one more character in the ciphertext for our alarm to ring. Basically, this is what it is. I don't really expect anyone here to just look at the formula and realize how the hell we came up with this thing. It's just proof that we looked into it: this algorithm should work in theory.

Now, that's all nice and good, but we're going to have to actually show how it works in practice, because I don't think looking at the formula convinced anyone here very much. We're going to look at a sort of heat map. You're going to see what the algorithm sees when it operates. When the algorithm iterates over the space that we constructed earlier, it's going to look at different bytes, and each byte is going to look like more evidence or less evidence for the hypothesis that we're looking at a case of key reuse. The bytes that contribute more evidence are going to be in red, and the bytes that contribute less evidence are going to be in blue. The algorithm looks for diagonals that are basically streaks.
There is lots and lots of evidence for our hypothesis along them, and they're going to appear in red. We're now going to look at the evidence heat map of the Ramnit communication that I mentioned earlier.

Well, first of all, ignore this. This is the main diagonal. This is the diagonal where the input is XORed with itself; it's all null bytes. So of course it looks suspicious to the algorithm. It doesn't look random at all. But if you look a bit further, can you see it? Here it is. This is the unique diagonal where the algorithm is going to detect the key reuse, because there is a streak of evidence here along this diagonal. This is actually an ellipse, for anyone who for some reason can't see laser pointers.

And this is the map for the second case. It's a bit more difficult. I don't know if anyone can see it here, but again, you have the main diagonal, and it's trivial. This is the heat map of two files encrypted by the DirCrypt malware that I talked about earlier, and the two files were encrypted using the same keystream. So we should be able to see the key reuse if we look at this. It's going to appear as a red diagonal somewhere here, and it's here, right? The diagonal is over here. It's the red diagonal where every byte looks suspicious, looks more or less like it came not out of a random distribution, but out of a distribution of plaintext bytes XOR plaintext bytes.

So I don't know if you noticed, but basically we have succeeded in our plan. We have managed to just take the input, not knowing anything about it beforehand, and compute all sorts of properties regarding the XORs of the different bytes, and just like you are able to look at this diagonal and detect where the key reuse is, so can the algorithm, automatically.

Let's see if we have time for the demo. Hmm.
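The game plan can be sketched end to end in a short script. This is an illustrative sketch, not the speaker's tool: the plaintext model (`SAMPLE`), the smoothing constant `eps`, the log-likelihood scoring, the maximum-run search, and the threshold `2 * log2(len(data)) + 10` are all assumptions chosen for the example.

```python
import math
import random
from collections import Counter

# Assumed plaintext model: byte frequencies of a small English-like sample.
SAMPLE = (b"the quick brown fox jumps over the lazy dog. "
          b"we expect letters, spaces and punctuation in plain text, "
          b"and that is exactly what makes it different from noise.")

def evidence_table(sample: bytes, eps: float = 0.05) -> list[float]:
    """Evidence in bits for each byte value: log2(P_xor(v) / (1/256)), smoothed by eps."""
    n, counts, q = len(sample), Counter(sample), [0.0] * 256
    for a, ca in counts.items():
        for b, cb in counts.items():
            q[a ^ b] += (ca / n) * (cb / n)
    return [math.log2(((1 - eps) * q[v] + eps / 256) * 256) for v in range(256)]

def detect(data: bytes, table: list[float], threshold: float):
    """Walk every diagonal (shift d) and keep the best-scoring run if it beats the bar."""
    hits = []
    for d in range(1, len(data)):      # d = 0 is the main diagonal: all null bytes
        best, best_start, best_end = 0.0, 0, 0
        run, run_start = 0.0, 0
        for k in range(len(data) - d):
            if run <= 0.0:             # evidence so far is useless: start a new run here
                run, run_start = 0.0, k
            run += table[data[k] ^ data[k + d]]
            if run > best:
                best, best_start, best_end = run, run_start, k + 1
        if best >= threshold:
            hits.append((best, best_start, best_start + d, best_end - best_start))
    return sorted(hits, reverse=True)  # (score, offset_a, offset_b, length)

# Build a test input: noise, ciphertext 1, noise, ciphertext 2, noise.
rng = random.Random(2014)
def noise(n: int) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(n))

p1 = b"dear bob, please meet me at the usual place tomorrow evening and bring all of the documents."
p2 = b"the quarterly report is attached, let me know if the numbers look right before we publish it."
n = min(len(p1), len(p2))
keystream = noise(n)                   # the SAME keystream is used for both messages
c1 = bytes(x ^ y for x, y in zip(p1, keystream))
c2 = bytes(x ^ y for x, y in zip(p2, keystream))
data = noise(100) + c1 + noise(150) + c2 + noise(100)

threshold = 2 * math.log2(len(data)) + 10   # hypothetical bar, grows with log of input size
hits = detect(data, evidence_table(SAMPLE), threshold)
for score, a, b, length in hits[:3]:
    print(f"score={score:.1f} bits, offsets {a} and {b}, length {length}")
```

The run search resets whenever the accumulated evidence drops to zero or below, so a streak is scored from wherever it starts being convincing. The script is quadratic in the input length, which is fine for a sketch but would need care on large captures.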
I see the other demo isn't working for some reason. Never mind. The other demo wasn't as exciting as this one, because basically it showed you a script running. The Ramnit malware that we saw earlier sends this communication, and the script iterates over it. The exciting part was to actually see the script running along this diagonal and accumulating evidence. It says "I see this amount of positive evidence, this amount of positive evidence," and it grows and it grows. Eventually it terminates, because it runs across the end of the input, and then it says, "Aha, I found it, I found the key reuse," and it outputs the two offsets. So actually this is the more exciting thing. There's no need to look at the demo to understand what's going on here. We have succeeded in our plan by using the game plan that I outlined earlier, and I really hope that this thing will be useful. I plan on uploading it at the moment that I can get the code to a state where I can actually give it to someone to look at and then later look them in the eye. This is going to be uploaded for the sake of anyone who wants to take a look at it. And this is basically it. I hope that you now understand better what key reuse is and how we detect it using this method.

Okay. Any questions at all?

Okay, in the heat map, this one. I was wondering about the totally not suspicious horizontal line at the bottom.

Yeah, the horizontal lines. They are probably an artifact. Basically, what's a horizontal line?
This is a specific character from the plaintext XORed with the rest of the ciphertext. It doesn't have any bearing on what we were talking about earlier, because when you XOR two ciphertexts, the result is going to appear along a diagonal. Probably it's one specific byte of the ciphertext that, when XORed, produced some sort of anomaly. Thankfully, it doesn't appear along a diagonal, so we're not going to suffer any false positives because of it.

Okay, thanks.

Once you identify the segments where the same keystream is reused, what amount of manual decryption do you need to recover the plaintext?

Actually, there is published work on exactly this. There are automatic methods for extracting the plaintext in this case, where you already have two ciphertexts that you know are vulnerable to the attack. It doesn't work in a hundred percent of the cases, but I saw this applied to the actual case of the Word 2003 encryption, and it works pretty nicely. You don't have to do manual work in order to decrypt this; there are automatic methods for doing that.

So you need to know the byte distribution of the plaintext?

Yes, that's right, you need to. And actually these heat maps were generated with a guess. We came in with a guess of what plaintext more or less looks like: you're going to have some uppercase letters and lowercase letters and characters and such. Hopefully our guess is going to be close enough to any case of actual plaintext that the alarm is going to be raised regardless of the actual precise distribution.

There's one question from IRC.

Yes, thanks. The question from the internet: are the docs available somewhere?
The ones you refer to. Are the documents available somewhere?

The documents? No, they're not. But as I said, I'm planning to work this out and upload it, and I will put out a notice the moment it is available.

Thanks for showing these vulnerabilities with stream ciphers. In general, today, I think the recommendation for anyone is not to use stream ciphers at all. So I think if anyone, including malware designers, wants to design their own cryptosystem, which is obviously a stupid idea, then they had better be using these AEAD ciphers, authenticated encryption ciphers.

Yeah, this is true.

Because what you didn't mention is that just encrypting the data in the way you did is, alone, not sufficient. You also need the keyed message digest along with it, which these newer ciphers actually bring. And that's the reason why, in the new version of Transport Layer Security that's being worked on, you don't find any of these stream ciphers anymore: because they're insecure.

Yeah, this is true. Actually, malware authors often use stream ciphers because they're easy to implement. RC4, for example, is very popular because it's easy to implement, and this is why you see it. As you can understand, achieving actual, good security is not the first thing on their minds. I don't know what Microsoft was thinking to themselves, but the malware author doing this basically said to himself, "Okay, I'm going to use encryption," and apparently he didn't go much farther than that.

Yeah, back to three.

Hi, just a quick question about the visualization part. Are you doing this just bit for bit? Because this is like two colors, right? Or is it more colors?
The colors: basically, I looked at a random distribution of RGB values from red to blue, and I computed its standard deviation and its mean, which is of course in the middle. Then I looked at the distribution of the values of evidence here, and I used a matching function from the mean of the evidence array to the mean of the colors, and I computed calibrated grades from the evidence values relative to the mean of the evidence, divided by the standard deviation.

So this is byte for byte?

Yes, this is byte for byte, right.

You said that stream ciphers like RC4 are vulnerable to this problem, but what about when you use AES in CTR mode, in counter mode, or something like this? Is that also vulnerable to this problem?

I'll just explain the class of ciphers vulnerable to this problem. It is every cipher where you come up with a keystream of some sort. Basically, there's a random key, and you get the ciphertext by XORing the plaintext with this keystream. Any other kind of cipher, as I said earlier, block ciphers or anything like that, is not vulnerable to this kind of attack.

But what happens when you use counter mode and you reuse the nonce?

I don't understand. Can you please repeat the question?

When you use counter mode and reuse the nonce, wouldn't this result in the same problem?
I think... when you use AES in counter mode... no, no, okay. No. You mean because there's XORing somewhere within the operation of the cipher? No, this is really a specific artifact of the fact that you do a linear operation. The ciphertext is a linear function of the plaintext and the random key. The moment your encryption involves XOR operations but isn't wholly linear, where there are stages that make sure it is not linear, the result is not vulnerable to this kind of attack anymore.

Okay, thanks.

Well, I guess we're done. So thank you, Ben, again.