And give a great round of applause to Sebastian Eschweiler.

So, hi everyone. I want to talk about how I defeated NotPetya's cryptography. Some might say that NotPetya was Ukraine's scourge. And for those of you who don't know what "scourge" means, this guy right here doesn't know either. Quick trivia quiz: does anyone know what this movie is? What's the name of the movie? In the next scene, Johnny Depp enters. A movie by Jim Jarmusch, with a soundtrack by Neil Young. Dead Man. Right, great. So, great movie, and if you want to know what a scourge is, you could watch it.

So, let's begin with my talk. This is what the official Ukraine Twitter account tweeted at the end of June 2017. There was an outbreak of a ransomware attack that was noticed mostly in Ukraine, but also all over the world. Millions of users, and also companies, large companies, were affected, and the damage went into the billions of dollars. So this is not your everyday ransomware outbreak. I want to give you a short glimpse into the NotPetya universe, and also show how I decrypted the data that was encrypted in this outbreak.

First, I want to draw a demarcation line, because a lot is subsumed under the NotPetya label, and I am really only talking about a small fraction of this whole universe. So I will first distinguish between what my talk will be about and what it will not be about. Next, I will describe NotPetya's cryptography, and especially NotPetya's cryptographic failures, which I will then exploit in the remainder of the talk, to see how everyone can get their vacation photos back. So, what was this whole thing?
The outbreak started, as I said, in June 2017, and it started as a malicious update of a software called M.E.Doc. This is tax software, one of the two official tax software packages in Ukraine, so almost every company has it installed on its accounting computers, and many private persons have it installed, too. The update was pushed and then side-loaded a file, perfc.dat, which was downloaded to the computers and comprises several parts, some more interesting than others.

One component would, after about half an hour, start encrypting files depending on the access level: if there wasn't sufficient access to infect the computer via the MBR vector, it would just encrypt the files accessible to the current user. For lack of a better name, I call this the Mischa component. I know the name usually refers to something somewhat different, but it was the best name I could find. It's basically just a file encryptor using AES. My talk will not be about this part.

The next very interesting component was the spreader. It's based on the EternalBlue and EternalRomance exploits that had been leaked by the Shadow Brokers. My talk will not be about this either; that's a whole different universe as well.

What my talk will be about is the actual NotPetya component. This is an MBR encryptor, and I will show you in the next slide what it's actually about. The user will see something like this upon reboot. If the access rights are granted, that is, if there is a local admin account on the computer or the correct password could be guessed by some attack, the dropper, perfc.dat, would infect the disk by overwriting the master boot record with a custom boot loader. It then reboots the system after some time, usually about 10 minutes, and then the actual NotPetya component kicks into action.
This boot loader shows this fake CHKDSK screen, and in the background it finds and encrypts the files on the file system. The main takeaway of this slide is that we are dealing with 16-bit code here: we're in 16-bit real mode. This means no proper file system access, no 32- or 64-bit code, and no Windows APIs, so debugging and analyzing all this is a pain. On the plus side, we have the BIOS, the basic input/output system, and with it comes a range of interrupts that are very well described and have a really nice interface, and debugging all this was possible with a plugin that had been developed for it.

So, let's analyze the cryptography a bit and see why it was really hard. I want to start with Salsa20 in theory and then compare it with the NotPetya implementation. Salsa20 is a stream cipher: you have a plaintext, a pseudo-random keystream is XORed onto it, and that keystream is generated from four inputs. There is the constant part, which is obviously not variable, and we will take another look at it later; then the key; the nonce; and the counter. What's so nice about Salsa20 is that when you are streaming with Salsa20 and you lose a few frames, you can adjust the counter, that is, the offset into the keystream, and continue decrypting from there. It's a very good feature. The counter itself is 64 bits wide, and the hash function down here creates 64 bytes of output from these input parameters. If you want more details about Salsa20, the person who designed this cipher should be in the room, so I guess you could ask him.
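For reference, here is a minimal Python sketch of standard Salsa20 with a 32-byte key, written from the published specification rather than from NotPetya's code. It shows the 16-word state built from constant, key, nonce and 64-bit block counter, the 20 rounds, and the 64 keystream bytes produced per counter value. The function names are my own, and the default constant is the standard "expand 32-byte k", not NotPetya's replacement string.

```python
import struct

MASK32 = 0xFFFFFFFF
# Standard Salsa20 constant for 32-byte keys (NotPetya swapped in its own string).
SIGMA = b"expand 32-byte k"

# Index quadruples for columnround and rowround, as in the Salsa20 specification.
COLUMNS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
ROWS = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]


def rotl32(v: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    v &= MASK32
    return ((v << n) | (v >> (32 - n))) & MASK32


def quarterround(x: list, a: int, b: int, c: int, d: int) -> None:
    """Salsa20 quarterround on the state words x[a], x[b], x[c], x[d], in place."""
    x[b] ^= rotl32(x[a] + x[d], 7)
    x[c] ^= rotl32(x[b] + x[a], 9)
    x[d] ^= rotl32(x[c] + x[b], 13)
    x[a] ^= rotl32(x[d] + x[c], 18)


def salsa20_block(key: bytes, nonce: bytes, counter: int, constant: bytes = SIGMA) -> bytes:
    """Return the 64 keystream bytes for one value of the 64-bit block counter."""
    assert len(key) == 32 and len(nonce) == 8 and len(constant) == 16
    c = struct.unpack("<4I", constant)
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    # Constant, key, nonce and counter words interleaved as in the spec.
    state = [c[0], k[0], k[1], k[2], k[3], c[1], n[0], n[1],
             counter & MASK32, (counter >> 32) & MASK32,
             c[2], k[4], k[5], k[6], k[7], c[3]]
    x = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        for q in COLUMNS:
            quarterround(x, *q)
        for q in ROWS:
            quarterround(x, *q)
    return struct.pack("<16I", *((x[i] + state[i]) & MASK32 for i in range(16)))


def salsa20_xor(key: bytes, nonce: bytes, data: bytes, first_block: int = 0) -> bytes:
    """Encrypt or decrypt by XOR with the keystream, starting at block `first_block`."""
    out = bytearray()
    for i in range(0, len(data), 64):
        ks = salsa20_block(key, nonce, first_block + i // 64)
        out += bytes(p ^ s for p, s in zip(data[i:i + 64], ks))
    return bytes(out)
```

The `first_block` parameter is the seek feature mentioned above: a receiver that lost some frames can jump straight to the right 64-byte block instead of regenerating the stream from the start.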
It's a very, very solid cipher. What's important is that for every instance of NotPetya's encryption, three of these input parameters stay constant: the key, the nonce, and the constant. The only thing that changes is the counter. So the interesting question was: what is the length of the keystream, meaning the number of different output bytes this hash function can produce? In theory this is quite clear: the counter has 64 bits, and each counter value yields 64 output bytes, so the period is 2^64 × 64 = 2^70 bytes.

One difference between the actual NotPetya implementation and the theory is that the constant was changed: instead of the standard string, NotPetya uses a string reading "invalid sect ID", which breaks the specification.

And the first problem I noticed in NotPetya was something like this one. I could skip it, because I think everybody sees it. So, who sees the fail? Not many hands. Okay, then I'll explain it. I was just kidding; I didn't expect anyone to spot it. Remember that this is 16-bit code. We have these left-shift operations that move a register N bits to the left, and the register here is 16 bits wide, so it has only 16 binary digits, so to speak. And it's shifted by 16, that is 0x10. This effectively sets the register to zero. Even worse, down here an 8-bit register is shifted by 16. This is something you wouldn't expect from a proper crypto implementation, and I was really intrigued why that is. Does this make sense in the source code? Did the authors of NotPetya not know what they were doing? Or what's behind it? So I looked at Salsa20 implementations.
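To make the effect concrete, here is a small Python model, my own simplification and not NotPetya's disassembly, of the typical little-endian load `b[0] + (b[1] << 8) + (b[2] << 16) + (b[3] << 24)` found in portable Salsa20 code, evaluated once with 32-bit and once with 16-bit registers. In C itself, shifting by the full width of the type or more is undefined behavior; the model simply assumes the register ends up zero, which is what the talk observes.

```python
def shl(value: int, count: int, width: int) -> int:
    """Left shift inside a `width`-bit register: bits pushed past the top are
    lost, and shifting by `width` or more leaves zero (assumption, see above)."""
    if count >= width:
        return 0
    return (value << count) & ((1 << width) - 1)


def littleendian(b: bytes, width: int) -> int:
    """Hypothetical model of the little-endian load when every intermediate
    value lives in a `width`-bit register."""
    mask = (1 << width) - 1
    acc = 0
    for i in range(4):
        acc = (acc + shl(b[i], 8 * i, width)) & mask
    return acc


word = bytes([0x78, 0x56, 0x34, 0x12])
print(hex(littleendian(word, 32)))  # 0x12345678 (32-bit build)
print(hex(littleendian(word, 16)))  # 0x5678 (16-bit build: upper two bytes vanish)
```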
I just googled it and found a nice website with a Salsa20 implementation, and there you would see this code. What you see here is the endianness conversion, with these shifts of registers. And you see this type here, uint_fast16_t. So is it obviously a broken implementation? Not quite. You need to know two more facts that break this implementation: you need to compile this code for 16 bits, and you need to look up what Visual Studio specifies for this type. If you look it up, it comes from Visual Studio's stdint.h header file, and there it is simply defined as an unsigned int. In 16-bit code this basic type is a 16-bit register, and suddenly everything makes sense. So in a way, the problem was that the authors didn't check whether the code actually worked: they didn't use test vectors. And the person who wrote the original code didn't check whether it works when compiled as 16-bit code.

Now to the two bugs in NotPetya's Salsa20 implementation. I want to explain both, because both are quite important, and both revolve around the counter variable. Remember that this is the only dynamic input, the only input that actually changes in this Salsa20 implementation. The first mistake is that the counter is fed with a sector number. You have to know how the hard disk looks from the BIOS: like a set of sectors, each a 512-byte block. So if you read one sector number, you get 512 bytes of data. The sector number is obviously not the offset in the stream, yet it is used as the jump into the stream, and the same variable is then used to encrypt the data. I think this is not really recognizable for the cipher implementer, but if you analyze it, it looks like this.
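The first counter bug can be modeled in a few lines. The sketch below assumes, as the talk describes, that the sector number is used directly as a byte offset into the keystream, so sector n+1 is encrypted with the keystream of sector n shifted by one byte. The stand-in byte string is random data, not real Salsa20 output, and `decrypt_next_sector` is a hypothetical helper showing why this is a many-time pad: one sector of known plaintext decrypts 511 bytes of the next sector.

```python
import random

SECTOR_SIZE = 512


def sector_keystream(stream: bytes, sector: int, buggy: bool = True) -> bytes:
    """Keystream slice used for one sector.

    buggy=True models the flaw: the sector number itself is the byte offset.
    buggy=False uses sector * 512, so sectors never share keystream bytes.
    """
    offset = sector if buggy else sector * SECTOR_SIZE
    return stream[offset:offset + SECTOR_SIZE]


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, truncated to the shorter input."""
    return bytes(x ^ y for x, y in zip(a, b))


def decrypt_next_sector(ct_n: bytes, pt_n: bytes, ct_next: bytes) -> bytes:
    """Recover the first 511 plaintext bytes of sector n+1 from sector n."""
    ks_n = xor(ct_n, pt_n)         # keystream of sector n, from known plaintext
    return xor(ct_next, ks_n[1:])  # sector n+1 reuses it, shifted by one byte


rng = random.Random(1)
stream = bytes(rng.randrange(256) for _ in range(4 * SECTOR_SIZE))  # stand-in keystream
ks0, ks1 = sector_keystream(stream, 0), sector_keystream(stream, 1)
print(ks1[:-1] == ks0[1:])  # True: 511 of 512 keystream bytes are shared
```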
So here you have the keystreams of two consecutive sectors. The first starts with FF, then goes on with D7, and so on, and the next sector has all the same bytes, just shifted by one position. That's a big mistake, because it turns Salsa20 from a one-time pad into a many-time pad: the same keystream is used multiple times.

Looking a little closer at the code, the second bug is this long keyword. We're in 16-bit code, so long doesn't mean that 64 bits are pushed, but only 32 bits. Both mistakes are a problem for this Salsa20 implementation. On this slide I took two hex dumps of the memory used for the keystream, basically two snapshots: one right before the endianness conversion and one right after it. What you can see are the different variables: the constants NotPetya uses, the key divided into two halves, and the nonce. And what really stands out is this amount of nulls: the upper part of the 64-bit counter variable is not used; it's not even filled. That's the first problem. And after the endianness conversion, you can see that no real endianness conversion is happening here; instead, bytes are nulled, and the result is that this originally 64-bit variable is suddenly only 16 bits long. Redoing the keystream length calculation, we now have 2^16 counter values times 64 bytes, that is 2^22 bytes, so 4 megabytes in the end instead of 2^70.
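The length calculation above is easy to check. This sketch only does the arithmetic: the number of keystream bytes before the counter wraps, for the specified 64-bit counter and for the 16 bits that survive the bugs, plus a rough measure of why guessing the whole truncated keystream by brute force is hopeless. The helper name `keystream_period` is my own.

```python
import math

BLOCK_BYTES = 64  # Salsa20 emits 64 keystream bytes per counter value


def keystream_period(counter_bits: int) -> int:
    """Distinct keystream bytes available before the counter wraps around."""
    return (1 << counter_bits) * BLOCK_BYTES


full = keystream_period(64)       # counter width from the specification
truncated = keystream_period(16)  # counter width left over after the bugs
print(full == 2**70)                            # True
print(truncated, truncated == 4 * 1024 * 1024)  # 4194304 True

# Guessing the whole truncated keystream means choosing one of 256**truncated
# byte strings; count the decimal digits of that number instead of computing it.
digits = math.floor(truncated * math.log10(256)) + 1
print(f"{digits:,} decimal digits")
```

The short keystream is what makes the known-plaintext attack feasible; it does not make brute force feasible, a point that comes up again in the Q&A.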
That was a very interesting observation, and combined with the many-time pad, meaning the keystream is reused, it makes these keys very easy to break. To sum up: we have a very, very short keystream that is repeated often. For every subsequent sector, the keystream advances by only one byte. Only 2^22 bytes of keystream remain, and instead of a one-time pad we have a many-time pad. I couldn't resist a small joke here: if someone implements it like this, it's not really Salsa20 anymore. Sorry.

The goal, then, is to reconstruct the keystream, and for that the only thing I need is known plaintext. It's very straightforward, and in the rest of the talk I will show how I did it. So, without further ado, let's use these weaknesses and try to reconstruct the plaintext from a NotPetya-infected drive.

NotPetya's modus operandi looks like this. Let's skip the left-hand side of the slide and concentrate on the right-hand side. For those of you who aren't familiar with the details of NTFS, don't worry, it's pretty simple. Every NTFS partition has a so-called master file table, the MFT, which contains metadata about the files, and if a file is small enough, even its content fits into the MFT. If the file is larger, the MFT record contains a pointer, a data run, that points to the clusters on the partition that actually contain the data. Each MFT record is typically 1 KB in size. Now let's zoom out a bit and look at how NotPetya uses this. NotPetya iterates over all these MFT records and checks whether a record points to file data. If so, NotPetya encrypts the first kilobyte of that file, and then it encrypts the MFT itself. This design is good for several reasons.
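Since encryption is a plain XOR, every predictable plaintext byte yields one keystream byte. The sketch below collects such samples keyed by absolute keystream offset, assuming, as in the talk, that a sector's keystream starts at its sector number. The input format (a dict from position in the sector to predicted plaintext byte) and the function name `add_samples` are hypothetical.

```python
from collections import defaultdict


def add_samples(samples: dict, sector: int, ciphertext: bytes, known: dict) -> None:
    """Append candidate keystream bytes (ciphertext XOR predicted plaintext),
    keyed by absolute keystream offset = sector number + position in sector."""
    for pos, plain_byte in known.items():
        samples[sector + pos].append(ciphertext[pos] ^ plain_byte)


# Toy example: two sectors that both start with the "FILE" signature.
stream = bytes((7 * i + 3) % 256 for i in range(2048))  # stand-in keystream
plain = b"FILE" + bytes(508)
known = dict(enumerate(b"FILE"))

samples = defaultdict(list)
for sector in (4, 6):
    ct = bytes(p ^ k for p, k in zip(plain, stream[sector:sector + 512]))
    add_samples(samples, sector, ct, known)

# Offset 6 is covered twice: by sector 4 (position 2) and sector 6 (position 0).
print(samples[6] == [stream[6], stream[6]])  # True
```

When the plaintext predictions are right, all samples for one offset agree; wrong predictions show up as disagreement, which is where the histograms discussed later come in.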
First, efficiency: it only makes sense to encrypt the first kilobyte, because a lot of interesting information is in it, for example the file header. Compressed files, for instance, wouldn't work anymore, because most programs use the header to recognize the file format. Second, the MFT can be considered a table of contents: it holds the pointers to the files. Without it, you don't have any pointer to what you should work on. From an implementation point of view, this is very elegant. It also means that recovering the MFT is the most important step. So I first tried to decrypt the MFT and then checked whether I could continue with decrypting the files; the metadata seemed the more important target to start with.

I am a very visual person, so here I took two disk dumps of one of my test disks. I infected a clean system with NotPetya and took images of the test disk. On the left is the unencrypted version and on the right the encrypted version; I diffed them to get a picture of the encryption process. On the left you can see an indicator of how much of the data actually differs, that is, how much of the disk is being encrypted. You can see that data across the entire hard disk has been encrypted, and the dark part at the bottom is the MFT.

So my idea was something like this. We have the encrypted MFT and the encrypted files. I wanted to analyze the encrypted MFT and extract keystream from it. After this analysis, my goal was to use this little box here, the keystream, to decrypt the MFT, write the decrypted MFT back to the disk, then use it to identify the encrypted files and decrypt them as well. That was my first idea.

Let's start with the MFT decryption, a known-plaintext attack. From the perspective of the keystream, the MFT looks something like this.
On the left side we have records 0, 1, 2, and so on, the input data, and in the columns we have the actual keystream bytes that were used to encrypt each record. Remember that encrypting plaintext with the keystream is just an XOR, so we can swap the roles: XORing the ciphertext with known plaintext gives the keystream. What you see here is that for the first record we have very few keystream samples, but as we move further through the table, more and more samples cover each keystream byte. That should give us a clearer view of the keystream. The question is: does the MFT hold enough known plaintext to start a known-plaintext attack?

Let's look into the specification of the MFT. Every MFT record consists of two parts: a header with a defined structure, and an attribute list, a dynamic structure in which it is more difficult to predict plaintext. So I concentrated on the first part. Here is an entry of this MFT, and as you see, it starts with "FILE" followed by some hex digits. At the bottom of the slide I plotted, for each keystream byte, the number of samples multiplied by the reliability of the plaintext.
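The "certainty" numbers in the next part follow from the layout. This sketch counts how many known header bytes land on each keystream offset, under three assumptions: records are two 512-byte sectors, the sector number is the keystream offset (so each record shifts the keystream by 2 bytes), and the first 8 header bytes are predictable. The header bytes used here, the "FILE" signature followed by an update-sequence offset of 0x30 and size of 3, are common values in NTFS records, taken as an assumption rather than a guarantee for every volume.

```python
from collections import Counter

RECORD_SECTORS = 2  # a 1 KB MFT record spans two 512-byte sectors
# Assumed-typical first 8 header bytes: "FILE" signature, update-sequence
# offset 0x0030 and update-sequence size 3, both little-endian.
KNOWN_HEADER = b"FILE" + (0x30).to_bytes(2, "little") + (3).to_bytes(2, "little")


def coverage(num_records: int, known_len: int = len(KNOWN_HEADER)) -> Counter:
    """Number of known-plaintext samples hitting each keystream offset.

    Record r starts at sector r * RECORD_SECTORS (relative to the MFT start),
    so its header covers keystream offsets [2r, 2r + known_len).
    """
    hits = Counter()
    for r in range(num_records):
        start = r * RECORD_SECTORS
        for i in range(known_len):
            hits[start + i] += 1
    return hits


hits = coverage(200)
print(hits[0], hits[200])  # 1 4: first record vs. entry 100
```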
So for the first record, I have a pretty low certainty of just one for the first two bytes. You normally have 512 bytes per sector, so with two sectors per record, the keystream advances by 2 bytes per entry. For entry number 100, I have a certainty of 4, because I know 8 bytes at the beginning of each entry. The problem was that towards the end I had many unknown bytes, because I had to focus on the first part, the headers, which means that towards the end of the keystream I could not do any meaningful analysis. So for every offset, I calculated a histogram of which plaintext bytes occur at that position, which gives a kind of statistical known-plaintext attack. So I have this histogram for each offset in an entry, and the question is: how do I get many MFT entries? I asked a few colleagues, who gave me copies of the MFTs of their systems. The result was pretty nice. For the very first entries you cannot do much, since there are very few samples, but further into the keystream there is a dramatic change: from the relatively low certainty of 4, I could reach over 30. Very nice.

After doing science, I came up with this table comparing the two attack approaches. Let's start on the right and read it to the left. With the first approach, I recovered about 98% of the records. The nice thing about the test setup is that you have an unencrypted hard drive as a starting point: you let the infection process run, then you can diff against that baseline, take different snapshots, try different key values, and test it all. So I could pinpoint exactly how many records and how many bytes I could decrypt: almost all of the bytes. What is also quite nice is that we have zero false bytes. What is not as nice is that I could only decrypt about 1 to 3% of the keystream bytes.
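The histogram step can be sketched as follows: build per-position byte histograms from a corpus of plaintext MFT records (like the colleagues' copies), then, for one keystream offset, pick the keystream byte that makes all ciphertext observations most plausible. The log-likelihood scoring with add-one smoothing is my own choice for illustration; the talk only says that a histogram per offset was used. Function names are hypothetical.

```python
import math
from collections import Counter


def position_histograms(records: list, length: int) -> list:
    """Per-position byte histograms from a corpus of plaintext MFT records."""
    hists = [Counter() for _ in range(length)]
    for rec in records:
        for pos in range(min(length, len(rec))):
            hists[pos][rec[pos]] += 1
    return hists


def best_keystream_byte(observations: list, hists: list) -> tuple:
    """Choose the keystream byte that makes the observed ciphertext most plausible.

    `observations` holds (position_in_record, ciphertext_byte) pairs that were all
    encrypted with the same keystream byte. Returns (byte, log-likelihood score).
    """
    totals = [sum(h.values()) for h in hists]
    best = (0, -math.inf)
    for k in range(256):
        score = 0.0
        for pos, c in observations:
            # Add-one smoothing: unseen plaintext bytes are unlikely, not impossible.
            score += math.log((hists[pos][c ^ k] + 1) / (totals[pos] + 256))
        if score > best[1]:
            best = (k, score)
    return best
```

With many records per keystream offset, the correct byte quickly dominates, which matches the jump in certainty from about 4 to over 30 described above.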
The whole keystream is about 4 megabytes, and I could only reconstruct about 50 kilobytes of it. Or is that a problem? That was my next question. So I drew another nice diagram. This is the keystream, and this is the MFT in it; the keystream bytes covered by samples span only about 2 megabytes. The question is: are there many files in this area? So I checked how many files I could decrypt and where they are in the keystream. The files use the whole keystream, some regions more, some less, but actually the whole region is used. In a perfect known-plaintext scenario, I could reconstruct everything, but that's the problem here, and in the next part of this talk I will show you how I handled it.

Think about what the file system looks like after NotPetya has encrypted it. Without the MFT, we don't have any pointers to the data, so no nice file names and so on. But after the first stage, the MFT looks really nice: we can reconstruct almost 100% of the entries. That means we have a lot of metadata that we can use to decrypt the rest of the data. For files, the very first kilobyte is encrypted, and since most files are bigger than a kilobyte, the rest is actually not encrypted. So we have all the metadata, and the data itself, to use as known plaintext for the known-plaintext attack.

I have three approaches to handle this problem. The question I asked was what kind of data is there, and you can easily recognize the data type: you have the file extension, and you can assume that is the file type. There is structured data and unstructured data. For unstructured data, say source text, I build a histogram of which bytes are more likely and use it for a statistical known-plaintext attack. For the second kind, structured data, I chose the same approach as with the MFT and quickly checked how many bytes per offset are known. The last approach is to use more of the data: not just the metadata, but also the data in the files.

Let's take a closer look. As I said, only this small amount of data at the beginning is encrypted; the rest of the file is not, and the file name and file size are not encrypted either. So I create a database of known files, like Windows system files; you might remember the background images and so on. There are a lot of such files available if you just search for them. To decide what the correct plaintext is, we have three criteria: the file name, the file size, and the hash of the rest of the file. If all three match, I would say that is the correct plaintext. There can be collisions, so this is not entirely trivial.

So the original idea is extended: we analyze the MFT, and then, to decrypt the files, we match them against the database of known files, an additional stage of analysis that feeds into this box. Then we are hopefully in a position to decrypt the files. Now let's see whether this is a good approach for real data in the real world. But first, something to drink.

What I did here: I have a database of known files from a whole lot of Windows installations, about 340,000 files. Then I built the histogram I already talked about and prepared my test setup, whose files were not part of the database. Then I infected this machine with NotPetya and tried to reconstruct the data. These are the results of four different runs. With the generic histogram alone, the share of files I could reconstruct was small, at least 8%; using the database, it was over 90%.
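The known-file matching can be sketched like this: index clean reference files by file name, file size, and a hash of everything after the first kilobyte (which NotPetya leaves unencrypted), then look up each encrypted file using the name and size from the recovered MFT. A unique match supplies the original first kilobyte as known plaintext. SHA-256, the case-insensitive name comparison, and the function names are my assumptions; the talk only says "the hash of the rest of the data".

```python
import hashlib

ENCRYPTED_HEAD = 1024  # NotPetya encrypts only the first kilobyte of each file


def match_key(name: str, data: bytes) -> tuple:
    """(file name, file size, hash of everything after the first kilobyte)."""
    tail_hash = hashlib.sha256(data[ENCRYPTED_HEAD:]).hexdigest()
    return (name.lower(), len(data), tail_hash)


def build_database(known_files: list) -> dict:
    """Index clean reference files, given as (name, contents) pairs."""
    db = {}
    for name, data in known_files:
        db.setdefault(match_key(name, data), set()).add(data[:ENCRYPTED_HEAD])
    return db


def known_plaintext_head(db: dict, name: str, encrypted: bytes):
    """Original first kilobyte if exactly one reference file matches, else None.

    The tail of `encrypted` is unencrypted, so it can be hashed directly;
    returning None on ambiguity is a simple way to avoid collisions.
    """
    heads = db.get(match_key(name, encrypted), set())
    return next(iter(heads)) if len(heads) == 1 else None
```

XORing the encrypted first kilobyte with the returned head then yields keystream bytes, which feed into the same keystream reconstruction as the MFT samples.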
Combined, almost all files could be decrypted. So much for academia. The problem is that if you apply this in the real world, more issues arise. For example, there were Windows installations that were not very common, or that had a lot of updates applied, and that was very challenging for me. On the other hand, there are a lot of programs whose files you can extract from well-known sources, like .NET or Java installations, especially the JDK, which leads to about 10,000 known plaintexts, and that was really quite nice and quite practical. The drawback was that there was so much data that my first attempts failed: the sheer amount meant I kept needing more memory in my VM, which ended up with 228 GB of RAM in my test VM. In the end, the MFT, the master file table, also gets bigger. For comparison: in my prototype test setup, the MFT was about 26 MB, whereas real-world MFTs were at least 500 MB and can get into the gigabyte range. That means there is a lot more known plaintext, and with it you can reconstruct the full 4 MB of keystream.

To summarize: decryption of NotPetya is possible. On the file systems I looked at, thanks to these bugs, I could reconstruct all the data, and chances are that your vacation photos can be recovered. And that is the end of my talk. To summarize, NotPetya made a few very fatal mistakes regarding its cryptography; I would call it NotSalsa20. It might also be worth looking into the key expansion. There are more cryptographic problems that I didn't look at, and I know that some of you are professors, so analyzing the cryptography might be a nice homework for your students. You should also keep in mind that the cryptographic mistakes are not limited to one sample: these flaws are in every single version of NotPetya, so every NotPetya infection you may have picked up can be attacked this way. And the last point I wanted to mention: if you are ever hit by a Trojan like this, you should put the infected disk in a cabinet, wait for a talk like this one, and hope that someone finds flaws like these, and then you will get your vacation pictures back. That's all from me. Thank you for your attention.

Thank you very much, Sebastian. Now it's time for questions; please queue up by the microphones.

Thank you very much for sharing your findings with us. I'm from Rotterdam, the largest harbor in Europe, and as you might know, we were struck by Petya. The terminals went down for a couple of days, with a couple of hundred million in damage. Your approach sounds quite theoretical, so what about practice: if you had been there in the summer with these findings, would all the companies have been up and running again, or would you have got the data back?

It is a practical approach, and there are some sites where we were able to help customers. It is a very practical thing, and with this slide I wanted to make clear that I am talking about the real world in this scenario. I looked at a lot of hard disks infected with NotPetya, and almost all of them could be decrypted.

Thank you. Microphone 6, please.

Thank you. In the beginning you mentioned that the key was basically shortened to 2^22. From that point on, wouldn't a brute-force attack have been way faster and way more reliable?

No, no, no, don't get me wrong. The length of the keystream was 2^22 bytes, so 4 megabytes. You won't be able to brute-force that. Do you get that? The number of unknown bytes was 4 megabytes, and you couldn't brute-force that.

Yeah, but you mentioned at the beginning that you basically shortened it down to 2^22, and I guess you could calculate that.

Yes, yes, I understand the question, but I think the point you're missing is that it's not the key space: the length of the keystream was 2^22. I'm not very good at converting that into decimal. Computer guys, 2^22 bytes, how many bits is that? Again, a lot. So it's 4 megabytes of keystream, and you couldn't just brute-force that, because you would have to guess every one of those 4 megabytes. Got that? The key is not 2^22; the length of the keystream is 2^22. Brute-forcing it would be like brute-forcing a text longer than the Bible. There's this theory with the apes and the typewriters, but I think you just mixed up two numbers.

A question from the internet, please: does the MFT encryption work the same for Petya and NotPetya?

The algorithm behind it is the same. The cryptography differs in that the constant is different: usually it's "expand 32-byte k", and here it's "invalid sect ID". So in that way it's different, but the encryption, the bytecode, the algorithm, is actually the same.

Any more questions? No? Then please give a big round of applause to Sebastian.