And I got to cheat, all right, and we're live. All right, so we're here with Mike Stay, another fantastic speaker for virtual DEF CON Safe Mode. He is covering how he recovered a six-digit sum's worth of Bitcoin from an encrypted zip file. And I guess if you just wanna like quickly go into your talk, spend just like a minute or two, and then we'll start asking you some questions. Yeah, sure. So the short summary is I used to work as a reverse engineer back in the late 90s. I broke the zip encryption that was used by InfoZip, which was the open source version. And so everybody except PKZip based their encryption on that, particularly WinZip, which had like 95% of the market at the time. And yeah, so then 20 years later, somebody locked up their Bitcoin in a zip file that they made on their Linux box and forgot the password. So they came to me and said, hey, I found your paper. What's the current state of the art? Can you help me with this? And this is your first time talking at DEF CON, right? It is, yes. So we've given him fair warning, but there is a tradition for first-time speakers at DEF CON. They get to take a shot with us on stage. This Q&A session is the closest thing to a stage. So thank you, Mike, for providing DEF CON with some wonderful content. Cheers to you, man. Thank you for having me. Okay, so I was actually kind of surprised. I have never thought about zip encryption as being something that would be difficult to get around. You did go into a couple of different types of encryption. I wasn't surprised that early Word was as difficult as it was, but later on, the 40-bit encryption that was just really difficult to brute force, that one kind of surprised me. Have you worked with any other types of encryption that have been surprisingly difficult to get into for being such a legacy, weird proprietary protocol? Let's see.
There were a couple where they clearly knew enough to be dangerous, but then completely screwed it up. Like early WordPerfect, you know, the founder of the company had broken that one himself. And then when they released their new version saying, oh, now we're using strong crypto, nobody will be able to break this, we went in and found that they took the password and then ran it through DES in the wrong way and got out some vector and then just XORed their file with it or something ridiculous like that. So they had DES, but they didn't use it right. It was so close. Just not quite it. There were ones like Microsoft Access 97, I think, where they had RC4 encryption, but it was a fixed key. And so they would RC4 encrypt the file with this fixed key. And then you'd go to this offset in the file and look up the password. And it was just sitting there in plaintext. Some of the details might be off. It's been 20 years. Yeah, fair. So we'll go ahead. Go ahead, Val. While we wait for people to come up with some really good technical questions to throw at you, I'm going to do one. All right, so let's say that I don't know everything there is to know about encryption out there. Let's say I'm going to ask you to do a similar thing to what you did in your talk. And I know that my password starts with a word and has some unknown thing after that. Are there things that I can provide you that I might know about the password that will help you get through this, or does the encryption work in such a way that that doesn't help? A dictionary attack, right? If the product you're using has strong crypto, if the guys that built it knew what they were doing, then pretty much a dictionary attack is the only option you've got left. And so there is specialized attack software that you can get; one of them is called Hashcat. It's built for running on GPU farms.
That was what we were originally looking at, maybe writing a Hashcat module for this. But it's really designed for processing a key space. And so you can give it a dictionary, and you can then say, take this and then do all alphanumeric strings up to length six after it, or take this and try all different capitalizations, replace vowels with numbers, say an I goes to a one or an O to a zero and so on, E to a three. Whenever you do that sort of thing, there are these rule sets where you can say, okay, Hashcat, this is what you're gonna start with and these are the rule sets that I want you to use when processing it, because this is, to the best of my memory, what the password looked like. On the other hand, if you're doing something like correct horse battery staple from XKCD, you've got too much entropy, and that's really the way to protect your files: just make it longer, right? Because if you go from 26 characters, which is all lowercase letters, to 97, you've roughly tripled the alphabet; that's adding two bits per character to the entropy, right? So if you've got a length-eight password, 26 to the eighth covers all possible lowercase passwords, but if you go up to 97, all printable characters, that's only adding two bits per password, I'm sorry, two bits per character. So on a length-eight password, that's, let's see, 26 is about five bits, that's 32, and two times eight is 16. So it's like adding three characters to the length of your password. Adding printable characters to a password of the same length is just adding a few more bits, but if you go and add a whole bunch more characters to your password, make it a long one, that'll make it really secure. And so- Use a passphrase instead of a short random string. Yeah, and so even if you're using English words, right? If you make it a passphrase rather than a password, that'll make it really invulnerable to a dictionary attack.
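The entropy arithmetic in this answer can be checked with a short calculation. This is only a sketch: it assumes every character or word is chosen uniformly at random, takes the alphabet sizes 26 and 97 from the discussion above, and uses a hypothetical 2048-word list for the passphrase case. The function and variable names are illustrative and do not come from the talk.

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of `length` symbols drawn uniformly from `alphabet_size` options."""
    return length * math.log2(alphabet_size)

lower8 = entropy_bits(26, 8)       # eight lowercase letters
printable8 = entropy_bits(97, 8)   # eight characters from roughly 97 printable ones
gain = printable8 - lower8         # extra bits bought by the larger alphabet

# Number of lowercase letters needed to reach at least the same entropy.
equivalent_lower_len = math.ceil(printable8 / math.log2(26))

# Assumption: a passphrase of four words from a hypothetical 2048-word list.
passphrase4 = entropy_bits(2048, 4)

print(f"8 lowercase letters:  {lower8:.1f} bits")
print(f"8 printable chars:    {printable8:.1f} bits (gain {gain:.1f})")
print(f"lowercase length to match: {equivalent_lower_len}")
print(f"4-word passphrase:    {passphrase4:.1f} bits")
```

Under these assumptions the larger alphabet adds about 15 bits to an eight-character password, roughly what three or four extra lowercase letters would add, which is the point being made: length buys more than character variety.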
There may be other attacks if the crypto's bad, but if the crypto's good, then it'll protect you. So just, this is entirely for my own curiosity. So after you broke through the zip file, you got the password that you could use to decrypt the zip file. No, we didn't recover the password. Oh, you didn't recover the password, okay. Yeah, so the way zip works is it derives a 96-bit key from the password. And it was the 96-bit key that we recovered. Now, if we wanted the password, we could take those 96 bits and then go launch a Hashcat attack using a dictionary and some other stuff that others have worked out to get a few of the initial characters. That's where it fits into the type of password cracking that many of us are familiar with, Hashcat or John the Ripper. Okay, so if you've got the 96 bits, then there's something you can do with the dictionary attack that'll see whether the initialization process gives you those bits or not. That's great. Yeah, what I was gonna ask is if you got far enough to see whether a dictionary attack on the zip file would actually work in less time than you spent brute forcing it. If you didn't... He suspected it was on the order of 20-something characters or more, so that would make it quite a while to brute force. Would this technique that you went through work for any encrypted zip file or... Yeah, yeah, this'll work on any zip file with... So my original attack back in the late 90s required five bytes, five files in the archive with the same password. This one we were able to get away with two because we also knew the timestamp. So if you've got the timestamp and you've got two files, then this will work on any of them. So how does the number of files affect the crackability of the zip file? When... Suppose you don't have the timestamp, okay? InfoZip was meant to run on Unix machines as well as Windows machines.
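The 96-bit key mentioned above is three 32-bit words that the legacy zip cipher updates once per password byte. The sketch below follows the traditional encryption scheme as published in the ZIP format specification, to the best of my understanding; the function names are illustrative, the password is made up, and the 12-byte encryption header that a real archive puts in front of each file is left out.

```python
import zlib

# Fixed starting values of the three 32-bit key words (3 x 32 = 96 bits).
INITIAL_KEYS = (0x12345678, 0x23456789, 0x34567890)

def crc32_update(crc: int, byte: int) -> int:
    """One raw CRC-32 table step. zlib inverts the bits on entry and exit, so undo both."""
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF

def update_keys(keys, byte: int):
    """Mix one byte into the 96-bit state."""
    k0, k1, k2 = keys
    k0 = crc32_update(k0, byte)
    k1 = ((k1 + (k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
    k2 = crc32_update(k2, k1 >> 24)
    return (k0, k1, k2)

def derive_keys(password: bytes):
    """Run the password through the state; the result is what the attack recovers."""
    keys = INITIAL_KEYS
    for b in password:
        keys = update_keys(keys, b)
    return keys

def stream_byte(keys) -> int:
    """Next keystream byte; note that it depends only on the low 16 bits of the third word."""
    t = (keys[2] | 2) & 0xFFFF
    return ((t * (t ^ 1)) >> 8) & 0xFF

def encrypt(keys, plaintext: bytes) -> bytes:
    out = bytearray()
    for p in plaintext:
        out.append(p ^ stream_byte(keys))
        keys = update_keys(keys, p)   # the state is advanced with the plaintext byte
    return bytes(out)

def decrypt(keys, ciphertext: bytes) -> bytes:
    out = bytearray()
    for c in ciphertext:
        p = c ^ stream_byte(keys)
        out.append(p)
        keys = update_keys(keys, p)
    return bytes(out)

keys = derive_keys(b"hunter2")        # hypothetical password
ciphertext = encrypt(keys, b"wallet backup")
print([hex(k) for k in keys], decrypt(keys, ciphertext))
```

Because decryption needs only the three key words, recovering them is enough to open the archive even if the password itself stays unknown.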
So in PKZip, they just allocated some memory and used whatever bytes were there as the random bytes. In InfoZip, on many Unix machines, it would initialize the bytes to zero, and so there would be no randomness there. So they used the process ID and the timestamp to get a little bit of entropy and then fed the XOR of those two into C's rand function and generated a bunch of bytes, but they thought maybe that's too weak, right? There were some known-plaintext attacks and they're like, well, if they brute force the timestamp and the process ID, then they can derive the rest of these bytes. And so they took the password and encrypted those bytes once, and that's what they used as the random bytes when they encrypted that and the rest of the file. But when they encrypted it twice, because of the way the zip cipher works, it produced the same stream byte twice at the beginning. So it encrypted it once and then it decrypted it for the first byte of each file. So when you say that, is it... With five files in the archive, I have every 10th output of C's rand function, and 40 bits were enough to figure out the 31-bit internal state of C's rand function. So once I knew the internal state of C's rand function, I could generate those first 10 bytes of each file. And then I would do a bunch of bit guesses. And because of the way the cipher was designed, not all 96 bits were used when producing each output byte of the stream. So I'd guess like 40-something bits up front. And then because I had five files there and I knew what those bytes had to be, I could filter all of those guesses. I could say, I know which of these 40-bit guesses are correct before moving on to the next stage. Got it. And so by having five files, I could both derive the internal state of C's rand function and filter my guesses and finish one stage before moving on to the next one. And so it was a parallel divide and conquer attack. In this case, I only had two files.
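The seed-recovery step described above can be illustrated with a toy search. This is a sketch under loudly stated assumptions, not the code used in the attack: the generator is the sample `rand` from the C standard rather than any particular platform's implementation, the byte extraction (shift right by 7) is an assumed detail, the timestamp and process ID are made up, and the seed is truncated to 16 bits so the search finishes quickly.

```python
PER_FILE = 10  # random header bytes generated per file, as described in the talk

def lcg_step(state: int) -> int:
    """Stand-in for C's rand(): the sample generator from the C standard (an assumption)."""
    return (state * 1103515245 + 12345) & 0xFFFFFFFF

def rand_byte(state: int) -> int:
    """Byte taken from one rand() output; the shift by 7 is an assumed detail."""
    rand_out = (state >> 16) & 0x7FFF
    return (rand_out >> 7) & 0xFF

def first_bytes(seed: int, n_files: int) -> list:
    """First header byte of each file, i.e. every PER_FILE-th generator output."""
    state, leak = seed, []
    for _ in range(n_files):
        for i in range(PER_FILE):
            state = lcg_step(state)
            if i == 0:
                leak.append(rand_byte(state))
    return leak

def matches(seed: int, leak: list) -> bool:
    """Check a candidate seed against the leaked bytes, rejecting as early as possible."""
    state = seed
    for want in leak:
        for i in range(PER_FILE):
            state = lcg_step(state)
            if i == 0 and rand_byte(state) != want:
                return False
    return True

def recover_seeds(leak: list, seed_bits: int = 16) -> list:
    """Brute-force every seed in the (toy-sized) space and keep the consistent ones."""
    return [s for s in range(1 << seed_bits) if matches(s, leak)]

timestamp, pid = 1_262_304_000, 4242         # hypothetical values
true_seed = (timestamp ^ pid) & 0xFFFF       # toy: keep 16 bits so the search is fast
leak = first_bytes(true_seed, n_files=5)     # 5 files x 8 bits = 40 bits to filter with
candidates = recover_seeds(leak)
print(true_seed, candidates)
```

The same counting argument explains the filtering of key guesses: a 40-bit guess checked against five known bytes (40 bits) leaves about one survivor on average, while checking it against only two known bytes (16 bits) leaves about 2^24.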
So even though I was making a 40-something bit guess, I only had two bytes to filter it with. So that meant two to the 24th wrong key guesses went to the next stage and I had to guess more. And so it just kept getting bigger and bigger and bigger, up to two to the 60-something, before I could start paring it down at the other end. So just for clarification, it's resetting the stream cipher every single time it encrypts a separate file, and that's why you're able to do this? Yes, yeah. So it starts over again with the same password, resets it to the original state and starts from there. Because you want to be able to extract a single file from the zip file without having to extract all of them. That makes sense. I think you just answered one of the questions we got: with the new attack, is there an acceleration in having more files in the archive? Absolutely, I mean, in the original attack, the more files you had, the faster it went. And so this is just a refinement of the original attack. But certainly having more files means more bits to filter with and getting rid of false positives earlier. And sort of closely associated with that, do you know if this kind of attack works with other encrypted archives like 7-Zip and RAR? Most archival software now uses AES, like RAR5 switched to AES-256. So this isn't going to work against anything except zip files. Going for best standards, I like to see that. Even WinZip switched to AES a while ago, so. Fair enough. We had another question. Do you know if your client was the legit owner of the Bitcoin? I can't be certain, but we knew his real name and we looked him up online and found that he had reason to be owning Bitcoin. Didn't seem too shady. It wasn't someone reaching out from across the dark web. It was part of his employment that he would be dealing with Bitcoin. Fair enough. No, that makes sense. So this is really interesting.
Do you expect, with putting this out here and providing this talk, to get more of these requests to crack more things? If you do get more of these requests, do you have an answer pre-built for how you might respond? As far as breaking into Bitcoin wallets, yeah, when I first wrote this up on my blog, we got a whole bunch of requests. And for most of them, I had to say, nope, sorry, the best you can do is a dictionary attack. Many of them said, I bought Bitcoin with a credit card ages ago, but now I can't find my wallet. Can you help me? I've got my credit card records. Oh, no, we need a little more than that. The one that was most interesting was a guy who claimed that his hard drive that had Bitcoin on it had crashed. And so we were working with him to get some data recovery. But after a while, it became clear that he was perhaps schizophrenic or delusional, that he believed that someone was cheating him out of his Bitcoin and had stolen it. Anyway, it was interesting. But there are about four situations where we could potentially recover your Bitcoin. One of them is if you printed out or wrote down the seed phrase for generating the 128-bit key, right? When you generate it, the wallet software always says, keep this in a secure place, right? And it's this 30-odd word phrase that'll generate the 128-bit key. So if you've got that, you can recover the key, you can recover all your Bitcoin. The next case is if you have had damage to your hard drive, right? If the hard drive crashes, then the data in the sector is probably okay. And even if the data in the sector is bad, we only need eight bytes to be okay. That has the encrypted key in it, right? So if we can recover that data, then we can probably recover your wallet. If you have the wallet software, you don't have the original phrase, but you know you used a weak password, then we can try and do the dictionary attack approach. Right. And do that.
And then the least probable: there have been wallets with security flaws that make them susceptible to breaking more easily. And if you happened to use one of those back when they were being used, most of them have been fixed since then. But if you happened to use one that had a flaw, then we could try to exploit that flaw. Yeah, so this was an attack on a zip file, but you're talking directly about Bitcoin wallets. Do they also use some zip-like structure? Have you attacked the Bitcoin wallets themselves? So the Bitcoin wallet takes the key, the private key information that you sign your transactions with, and a password, and generates a symmetric key from the password and some salt and then encrypts the private key. So that private key is really what gets you access to the Bitcoin. What we can do is try to either recover the private key by means of that really long phrase, regenerating that same private key, or we can attack the password if you've got the wallet so that we can decrypt the private key that you had stored, or we can attack some flaw in the cipher where, for instance, when they were coming up with the symmetric key, they didn't use the entropy properly. And so there's a much smaller key space that we would have to brute force. There are very few of those that were out there, but there were some. So there's a possibility we could do that. So I like asking this question of people who know this technology really well, and feel free to tell me no, you're not gonna answer this question, but do you yourself hold any value in any cryptocurrencies, because you seem to understand how it works? I don't, because, I mean, there's no inherent... When you pay taxes, you pay taxes in dollars because the government says you have to pay taxes in dollars. So there is this built-in necessity to own dollars at some point. There is no built-in necessity to own Bitcoin or any other cryptocurrency, right?
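Returning to the wallet structure described earlier in this answer (a symmetric key derived from a password and a salt), the weak-password case can be sketched as a small dictionary attack. Everything here is an illustrative assumption rather than any real wallet's format: PBKDF2 with a deliberately tiny iteration count stands in for the wallet's key derivation, an HMAC tag stands in for "this key decrypts the private key correctly", and the word list, substitution rules, and password are made up.

```python
import hashlib
import hmac
import itertools
import os

ITERATIONS = 1_000  # kept tiny so the demo runs quickly; real wallets use far more

def derive_key(password: str, salt: bytes) -> bytes:
    """Password + salt -> 32-byte symmetric key (PBKDF2 as a stand-in KDF)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS, dklen=32)

def make_verifier(password: str, salt: bytes) -> bytes:
    """Hypothetical check value standing in for a successful private-key decryption."""
    return hmac.new(derive_key(password, salt), b"wallet-check", hashlib.sha256).digest()

# Substitutions mentioned in the discussion of rule sets: I -> 1, O -> 0, E -> 3.
LEET = {"i": "1", "o": "0", "e": "3"}

def variants(word: str):
    """Every combination of the substitutions, each with and without a leading capital."""
    options = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word]
    for combo in itertools.product(*options):
        candidate = "".join(combo)
        yield candidate
        yield candidate.capitalize()

def dictionary_attack(words, salt: bytes, verifier: bytes):
    """Return the first candidate whose derived key reproduces the verifier, else None."""
    for word in words:
        for candidate in variants(word):
            if hmac.compare_digest(make_verifier(candidate, salt), verifier):
                return candidate
    return None

salt = os.urandom(16)
verifier = make_verifier("Passw0rd", salt)            # the "forgotten" password
found = dictionary_attack(["letmein", "dragon", "password"], salt, verifier)
print(found)
```

The cost of each guess is one run of the key derivation, which is why a slow derivation and a long passphrase together push this kind of search out of reach.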
And for Bitcoin and Ethereum, I think that proof of work has shown itself to be susceptible to attacks like Sybil attacks and 51% attacks; Bitcoin Cash and Ethereum Classic have both suffered 51% attacks. They were rebuffed eventually, but if Google wanted to deploy their whole infrastructure, they could completely own Bitcoin. There are existing companies that could do that, not to mention nation-states, right? If the US wanted to take it down, they've got this thing in Tooele here in Utah that they could deploy against Bitcoin. So my personal take, and we designed a system to do this, is that you need to use a consensus algorithm with true finality, like proof of stake and bandwidth. And then after a certain point, when you have enough witnesses, you say this block is finalized, it can't ever change. Ethereum is trying to move in that direction with their proof of stake algorithms, but I don't own any. Yeah, I've heard of proof of stake, but the finality piece is new to me. You've definitely given me a few pieces already that I'm going to need to go Google, I'm sure, now. So we got another question. It's sort of a meta question. One of the people that watched your talk had a little bit of a struggle following your math. They understood all the aspects individually that you talked about, but zooming back out, they seemed to lose pieces in their head. And they want to know, how do you juggle this? And are you aware that some people that follow your talk might have difficulty zooming in and out like that? Yeah, so I had some options when doing this talk. One was to go really deep and really hard on the technical stuff. Another one was to give enough background and the basic idea of how this attack played out and the challenges we faced. And so I chose to be less detailed for the sake of the story rather than go deep into it. If anyone has any technical questions, take them offline.
I'll be happy to talk through them with you and point at lines of code and that sort of thing. That's great, that's awesome. Is there... As far as keeping it in my head, I would have to wake up and then come down and reload everything. I had stuff on whiteboards all over my office, pictures. It was a process; I would even have to remind myself about what was going on because I couldn't keep it all going at once. And it was a months-long process of trying to think through over and over again how things were going wrong and what I might be able to do to fix it. So if you don't get it from one short 45-minute talk, I certainly don't blame you. Makes sense. Did you discuss this at the end? I'm sorry, I missed this point. Did you actually get any compensation for this work? We did, yeah. So when he first talked to us, we said we'd like so much up front. We estimate that the total cost will be about this much. We took longer than we said we would. We expected it to be done in three months. That was October, so November, December, January. It was April, well, late March, before we actually got the key back. But because of all of the extra cryptanalytic work that I did, it took a tenth of the time on the hardware. So the hardware cost ended up being only roughly 10, 15 grand as opposed to the 100 grand that we thought it would take at the beginning. So he gave us a big bonus afterwards, which was nice. Yeah, I actually missed this. Another one of the speaker goons mentioned that it was on AWS. And they wanna know if the 10 to 15 was about what you were expecting for compute cost. No, we were expecting it to take far more, right? The original estimate was around two to the 64th work, which is comparable to finding a collision in SHA-1, which is 100, sorry, SHA-1 was 160; MD5, it was a 64-bit thing. And there was some recent work where to find a collision they had to deploy an enormous amount of work to do it.
I guess MD5 they were able to do because of flaws in the hash function. SHA-1 took roughly 100K of GPU time to break. And so we were estimating it would be comparable to do this. And this is what your company does, like data recovery, or is it specific to- Not originally. Originally we were working on a distributed operating system. We could get clients interested if we could get in the door, but it was right at the time when cryptocurrency was taking off and we didn't have to talk to anybody to get an intro. We started doing some consulting work there, built up a team of about 20 Scala developers that were the cream of the crop, top of their field. And then built the RChain cryptocurrency platform. RChain started having some financial troubles, so we allowed them to hire the devs early. We had a contract that we'd hold on to them for a while and then they could hire them after they'd worked for us for a year, but we let them hire them early. So they've taken over the dev team, and then we started working on some other things, and this particular consulting job came up at a nice time and was a whole lot of fun. So we took it. But right now we're looking for any interesting consulting work that people have. That was exactly what I wanted to segue into now that you've done this talk. Do you have another research item on your to-do list that you're trying to aim at? Sure, at the moment I'm doing some consulting work for the Ethereum Foundation. I've got some consensus algorithm research that I'm working on. We got access to GPT-3, so we're building an adventure game kind of like AI Dungeon but with more structure using GPT-3. We've got various ideas for voice assistants that will be able to call somebody at a restaurant rather than figure out every different restaurant's online ordering system. You just have your assistant call them up and have a conversation. GPT-3 seems to be able to have conversations, so maybe we can use that.
Yeah, this is probably like a really small piece of what you're doing, but I used to be like incredibly into MUDs. So an adventure game that's generated by GPT-3 sounds interesting. Yeah, so we're working on the room generation. For quests we'll have things like you have to convince this character to give it to you. He's got desires and needs. So you'll have to be role playing while you're doing this game, interacting with these characters. There is a person that goes by the handle EvilMog who is running a DEF CON CTF MUD right now. Just a shout out to him. We are really close to being out of time. There are a lot of questions over here about specific pieces, specific technologies. I think I'd like for people to bring those to you on a less moderated basis. So we'll let those go for now. Before I let you go, I want to know what is the thing that you would like us to take away from this, if there is a final idea that we should walk away from your talk with? That attacks on cryptography only get better. At the time MD5 was proposed, 128 bits for a 64-bit attack was inconceivable. And yet within five to 10 years, they were able to attack that one. And then SHA, there are attacks. AES with the biclique attack. They've now broken, I think, seven or eight of the 10 rounds. If there is something that you need to keep secure, choose the best software and have a plan for upgrading your crypto in any product that you put crypto into. Because the attacks are going to get better. You'll need at some point to transition from the broken system to a new one. And so that will come up during the lifetime of your product. So be thinking about it. That's definitely good advice. Falville, you got any more questions you want to sneak in under the hood? No, I think I'm good. I really appreciate the work that you've done here. And thank you for coming to present and giving your time to do this Q&A session.
There are some more people that have some more questions coming in if you're willing to do so. If you would put your contact information in the track one channel and Discord here, we will get that out there. Folks can look you up if you're willing to be available to that. I also recommend you put all of your company information if you're willing to do so because that's a good way for people to find you for those contracts you were talking about. Great, thank you very much. All right, take care.