So here we are. David and I are here to talk to you about MS-CHAPv2, Microsoft CHAP version 2. You know, Microsoft, the designer of such illustrious protocols as NetBIOS and CIFS, did not stop there. They also gave us an authentication protocol, and it has two major functions. The first is to provide mutual authentication, and the second is to provide key agreement. So what that means is that a client can authenticate itself to a server, you know, a client logging in to something, but that simultaneously the server authenticates itself to the client. So the client knows that the server is really who they're trying to connect to, so that no man-in-the-middle attack is possible. And then key agreement provides, you know, key material so that you can set up an encrypted session moving forward after the handshake is complete. Now, this is sort of an archaic protocol. It was actually published in 1999, but really that was just an update to the older MS-CHAPv1, which was done even before that. So this has been around for a while and it's aging, but it's also strangely pervasive. You see it in a lot of places, particularly with PPTP VPNs and WPA2 Enterprise connections, as the inner authentication for WPA2 Enterprise. And so, you know, we want to look at this protocol so that we can attack these two points. So the first question is, shouldn't we all know better, you know? Shouldn't we all know that this isn't secure, MS-CHAPv2, PPTP, these kinds of things? The internet is full of articles like this, right: "PPTP authentication proven to be very susceptible to attack." The problem is that all of these articles are about dictionary attacks, right? The idea is that if you get a packet capture for a PPTP VPN connection, or a WPA2 wireless inner-authentication capture, and you combine it with a tool like asleap, you've got yourself an offline dictionary attack.
And you can try to guess the user's password, and if you're successful, then you have access to their encrypted traffic as well as their login credentials. There's even a paper that was actually published by Bruce Schneier and Mudge way back in 1999 where they look at this protocol, and the conclusion of the paper is basically, you know, we looked at this, and they fixed most of the problems that we found with MS-CHAPv1, but the real problem here is that the security of the protocol is based on the user's password, and we know that users don't choose good passwords, so, you know, this protocol could be susceptible to that kind of attack. And so the internet is actually full of questions like these, where, you know, people say, well, how secure is PPTP really? You know, is it just a matter of choosing a bad password? If I were to choose a good passphrase, is the protocol otherwise secure? And usually there's a response like this one that says, yeah, that's the deal: it's a secure protocol if you can manage to choose a secure passphrase. And, you know, I actually think that's a reasonable conclusion, given the information that's available. I have a friend who runs a kind of high-profile VPN service, and he supported PPTP, and I asked him, so, you know, why did you support PPTP? And he said, well, you know, we looked at the Bruce Schneier analysis, and we concluded that it was just a problem with the passwords. And so we didn't let our users choose passwords. We had a mechanism where we would generate random strings that were the passwords, and we felt that we were secure that way. And, you know, I think that's a reasonable conclusion given the information that's available. A lot of other people agree. I put together a list of VPN providers that currently support PPTP and MS-CHAPv2. There's actually more than this; I just got tired of typing them in. So it's a lot of people.
There are even some really high-profile VPN providers, like iPredator. This is the Pirate Bay's VPN provider, and they only support PPTP for their authentication and encryption. And presumably, they're trying to protect their traffic from government-level observation. You also see it in WPA2 Enterprise setups; it's very common. For instance, the DEF CON wireless network is using MS-CHAPv2 as its inner authentication method. So, you know, what's the deal? Why is this so popular? Well, I feel like there's this cycle where it's really widely supported. It's built into Windows XP. It's the only VPN authentication method supported in Ubuntu by default, out of the box. And so it's what people use. It's really widely supported, and so everybody starts using it. And since it's the thing that everybody is using, when people develop new products, that's what they support. And since that's what's supported everywhere, that's what people end up using. And since that's what everyone's using, that's what people support when they develop new products. It's really a maddening cycle that's just impossible to escape from. But just because it's reasonable, I think, to presume that this is an otherwise secure protocol doesn't mean that it's correct. So let's take a second and look at the internals of how this works, MS-CHAPv2. So this is the MS-CHAPv2 handshake. You look at it, and you feel like they're almost trying to dazzle you into submission. It's like the digital version of hand waving: if we just hash it again, then cryptanalysts will look at this and just wither in defeat. So let's run through it real fast. All right. The client sends a hello. The server sends back a 16-byte random challenge. The client generates its own 16-byte random challenge, and calculates the SHA1 of the server's challenge, the client's challenge, and the username.
Then it calculates the NT hash of the user's password, and then calculates this thing, the challenge response, by encrypting the challenge hash three different times with three different DES keys, which are different sections of the NT hash of the user's password. It sends back the 24-byte challenge response, the 16-byte client challenge, and the username, in the clear, to the server, who then calculates the MD4 of the MD4 of the user's password. Get that? The password hash, hashed. And then it calculates the SHA1 of the NT hash-hash, the challenge response, and the literal string "Magic server to client signing constant." Then it calculates another SHA1 hash of the previous digest, the challenge hash, and the literal string "Pad to make it do more than one iteration," and then sends that back to the client. You get the feeling that maybe the designers didn't know that this was going to be public one day. So what's interesting is that if you really look at this, you know, once you get through all the dazzle, you realize that there's actually only one unknown in this entire protocol, which is the MD4 hash of the user's password. The NT hash is the only unknown thing here. And that one unknown is used as the three DES keys for the encryption of the challenge hash. Everything else is either sent in the clear or can be derived from something sent in the clear, which means that we can sort of rip off all this other stuff and only focus on this one core problem here, this one unknown that we're dealing with, right? And so this is where people usually plug in their dictionary attack, right? They do a packet capture, and then they just start doing MD4 hashes of dictionary words, using each one as the three DES keys to encrypt the known plaintext, and seeing if it matches the known ciphertext. And if you get a match, then you've found the user's password, right? But I don't think we should be satisfied with that, right?
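As a concrete sketch of that first step: per the published description of the protocol (RFC 2759), the "challenge hash" is the first 8 bytes of a SHA1 over the client's challenge, the server's challenge, and the username. This is a minimal illustration with made-up values, not a real capture; `challenge_hash` is just a name for the helper:

```python
import hashlib

def challenge_hash(peer_challenge: bytes, auth_challenge: bytes, username: bytes) -> bytes:
    # RFC 2759 ChallengeHash(): first 8 bytes of
    # SHA1(PeerChallenge || AuthenticatorChallenge || UserName)
    return hashlib.sha1(peer_challenge + auth_challenge + username).digest()[:8]

# Made-up challenges, just to show the shapes involved.
client_challenge = bytes(range(16))      # 16-byte client (peer) challenge
server_challenge = bytes(range(16, 32))  # 16-byte server (authenticator) challenge
print(challenge_hash(client_challenge, server_challenge, b"user").hex())
```

Everything that goes into this hash crosses the wire in the clear, which is why it counts as "known" to an attacker.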
We want 100% success. We don't want to just try to guess a user's password, because, as in the case of the VPN provider that chose random passwords on behalf of its users, that would not be feasible. So let's look at these DES encryptions here, right? This is really all that's stopping us. Now, as a refresher, a DES key is normally eight bytes long, so that's 64 bits of key material, and it should give you a total key space of 2 to the 64. That's a really big number, a lot of keys, right? But somewhere along the way, someone made the dubious decision to turn every eighth bit of a DES key into a parity bit, which means those bits aren't actually used as part of the key material, which means that a DES key is effectively only seven bytes long. That only gives you 56 bits of key material, for a total key space of 2 to the 56, which is substantially smaller. So when we look at our core problem here, we have these three DES encryptions. So that's triple DES, right? Well, the way triple DES works is, back when people were starting to lose confidence in DES, they would use triple DES in order to secure their communication. The way that works is you have this nested construction where you start in the middle and DES-encrypt the plaintext with one key. Then you use the output of that, the ciphertext from the middle encryption, as the input to a DES operation with a second key, and you use the output of that as the input to a third DES operation with a third key. So when you do that, you get this multiplicative complexity, right? Your key space jumps from 2 to the 56 to 2 to the 56 times 2 to the 56 times 2 to the 56, for a total key space of 2 to the 168, which is enormous, right? In practice, there were some attacks you could do to reduce that to 2 to the 112, but that's still quite big, and certainly sufficient. But when we go back and look at our core problem, there's no nesting here.
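As an aside, the parity-bit quirk mentioned above is easy to see in code. This is a minimal sketch (with `expand_des_key` as a made-up helper name) of how 7 bytes of actual key material get expanded into the 8-byte key DES expects, with the low bit of every byte spent on odd parity:

```python
def expand_des_key(key7: bytes) -> bytes:
    """Expand 7 bytes of key material into an 8-byte DES key.

    DES treats the low bit of each key byte as an odd-parity bit, so only
    56 of the 64 key bits actually contribute to the key schedule."""
    assert len(key7) == 7
    bits = int.from_bytes(key7, "big")           # 56 real key bits
    out = bytearray()
    for i in range(8):
        seven = (bits >> (49 - 7 * i)) & 0x7F    # next 7 key bits
        byte = seven << 1                        # low bit reserved for parity
        parity = 1 if bin(byte).count("1") % 2 == 0 else 0  # force odd parity
        out.append(byte | parity)
    return bytes(out)

print(expand_des_key(b"\x00" * 7).hex())  # -> 0101010101010101
```

All-zero key material yields the well-known weak DES key `0101010101010101`: the parity bits are the only bits set, and they contribute nothing to the cipher.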
This isn't a nested construction. One DES operation has no effect on any other DES operation; they're actually totally independent of each other. You know, it's just encrypting the same plaintext three different times with three different keys. So that is not triple DES. That actually gives you an additive complexity, right? Your key space is 2 to the 56 plus 2 to the 56 plus 2 to the 56, which is 2 to the 57.59. So that's basically the number we're dealing with here, and that's still kind of a big number. Now, really, we're doing these three DES operations, and so we need three DES keys. Each key is seven bytes long, so that's a total key length of 21 bytes. We need 21 bytes of key material. But remember, what we're using for the keys is the MD4 hash of the user's password. Now, an MD4 hash is only 16 bytes long, which means that we have 16 bytes of actual key material, and we need 21. So what does Microsoft do? They just pad out the last five bytes with zeros, which means that this third key is effectively only two bytes long. It's a 16-bit key that you can brute-force on your laptop in under five seconds. So if we go back and look at our total complexity, we've reduced the additive complexity to 2 to the 56 plus 2 to the 56 by basically eliminating the third key, for a total of 2 to the 57. So now this is what we're dealing with. That's still, you know, a pretty big number. So if we go back and look at the core problem, we've gotten rid of, effectively, this last DES encryption, so we're dealing with these two DES encryptions here. And, you know, something that's sort of interesting about this is that the plaintext is the same in both cases: for both DES encryptions, we're encrypting the same thing.
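The key-splitting and the resulting work factor can be sketched in a few lines. This is a minimal illustration (with `split_nt_hash` as a made-up helper name and a toy 16-byte value standing in for a real NT hash):

```python
import math

def split_nt_hash(nt_hash: bytes):
    """Split the 16-byte NT hash (MD4 of the password) into the three
    7-byte DES keys MS-CHAPv2 uses, zero-padding the last five bytes."""
    assert len(nt_hash) == 16
    padded = nt_hash + b"\x00" * 5  # 21 bytes of "key material"
    return padded[0:7], padded[7:14], padded[14:21]

k1, k2, k3 = split_nt_hash(bytes(range(16)))  # toy stand-in for an NT hash
print(k3.hex())  # only the first two bytes of the third key are unknown

# Effective work factor: two full 56-bit DES keys plus one 16-bit key.
print(math.log2(2**56 + 2**56 + 2**16))  # ~57.0
```

The third key's five zero bytes are why its contribution to the search collapses from 2^56 down to a trivial 2^16.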
So if you think about how you would implement a naive brute-force attack, what you would do is iterate through the entire key space, and for every key in the key space, you would do a DES encryption of your known plaintext and see if it matches your first known ciphertext. And if it does, then you know you've found the first key. Then you would start over again, iterate through the entire key space again, and for each key in the key space, do a DES encryption of your known plaintext and see if it matches your second known ciphertext. Now, the expensive part of this whole brute-force operation is the DES operations. That's what costs something and makes iterating through the key space expensive. But since it's the same plaintext both times, you can actually reduce this to a single loop, where you iterate through the key space once, do one DES operation for every key in the key space, and just do two compares: you compare the output of your DES operation with your first ciphertext and with your second ciphertext. That effectively reduces the total complexity here down to just 2 to the 56. So if we have an MS-CHAPv2 handshake, we can reduce the entire security of this handshake down to a single DES encryption. So at this point, you know, we thought about it some more, and we were like, all right, can we do some tricks or whatever here? Is there anything we can do? And at some point, we were just like, well, fuck it. Let's just call David Hulton, who runs a company called Pico Computing and knows stuff about brute-forcing keys. So a while ago, we started looking at DES, because part of my job at Pico is trying to find archaic algorithms that we can actually attack with FPGAs that would normally be out of the reach of normal computers. So just looking at DES, I pulled these from the Wikipedia page: it's a Feistel network cipher.
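The single-pass loop can be sketched with a toy stand-in for DES, since the point here is the loop structure, not the cipher. Everything below is illustrative: `toy_encrypt` is a deliberately tiny keyed permutation over a 16-bit key space, not DES, and the "unknown" keys are made up:

```python
def toy_encrypt(key: int, plaintext: int) -> int:
    """Toy stand-in for DES: XOR with the key, then rotate left by 3
    within 16 bits. A bijection in the key, like DES, but trivially small."""
    x = plaintext ^ key
    return ((x << 3) | (x >> 13)) & 0xFFFF

KEY_SPACE = 2**16
plaintext = 0x1234
key1, key2 = 0xBEEF, 0x0BAD  # the two "unknown" keys we want to recover
ct1 = toy_encrypt(key1, plaintext)
ct2 = toy_encrypt(key2, plaintext)

# One sweep over the key space: one encryption and two comparisons per
# candidate key, instead of a separate full sweep for each ciphertext.
found = {}
for k in range(KEY_SPACE):
    c = toy_encrypt(k, plaintext)
    if c == ct1:
        found["key1"] = k
    if c == ct2:
        found["key2"] = k

print(found)  # both keys recovered in a single pass
```

Because the expensive operation (the encryption) runs once per candidate key no matter how many ciphertexts you compare against, attacking both DES keys costs the same 2^56 encryptions as attacking one.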
And basically, back in the day, this was originally developed to run in hardware. You know, this was developed to run in ASICs, and so they really designed it as a flowchart. There are lots of bit permutations and lookup tables and things like that. And so if you zoom into this F function and just look at it: this E-box is a permutation, the S-boxes are all just lookup tables, and the P-box is a permutation. If you look at the P-box and zoom in a little bit, it's basically just transposing bits, input bits to output bits. And so if you're trying to do this on a normal computer, this is a really horrible implementation of a P-box in software, where basically, you know, you have a for loop, and it looks things up in a table and moves bits around and so on. And this is using 32- or 64-bit operations just to move individual bits around; it's horribly inefficient. And sure, there are lots of optimizations that people use nowadays, with bit slicing and things like that, but it's really trying to do operations that should be extremely simple on, you know, something that's made to do all sorts of general-purpose operations. Now, if you take this problem and actually implement it on an ASIC, it's essentially free. It's just routing lines; you're just moving bits from one place to another. So, you know, why are you using billions of gates in order to perform these operations? One analogy that I like to use for this is that doing it in software is kind of like an octopus riding a tricycle versus, you know, a Ferrari or something like that. So I think in this case the winner is definitely the ASIC. People have realized this for a long time.
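To make the "horrible in software" point concrete, here is a sketch of the kind of bit-by-bit permutation loop described above, using the actual DES P table from FIPS 46-3 (bits numbered 1..32 from the left, per the spec). On an ASIC or FPGA this entire function is just 32 wires:

```python
# The DES P permutation (FIPS 46-3): output bit i comes from
# input bit P_TABLE[i].
P_TABLE = [
    16,  7, 20, 21, 29, 12, 28, 17,
     1, 15, 23, 26,  5, 18, 31, 10,
     2,  8, 24, 14, 32, 27,  3,  9,
    19, 13, 30,  6, 22, 11,  4, 25,
]

def p_box(x: int) -> int:
    """Software P-box over a 32-bit word: a loop, a table lookup, and
    shifts/masks for every single bit moved."""
    out = 0
    for i, src in enumerate(P_TABLE):
        bit = (x >> (32 - src)) & 1  # fetch the source bit
        out |= bit << (31 - i)       # drop it into the output position
    return out

print(hex(p_box(0x80000000)))  # -> 0x800000
```

Sixty-four shift/mask/or operations to do what hardware does with zero gates: this is the octopus on the tricycle.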
Back in 1998, the EFF built Deep Crack, where they were trying to basically prove that anybody, for, you know, a small amount of money (in this case, it cost them about a quarter million dollars) would be able to break DES in a pretty short amount of time. In this case, it was about nine days worst case, with about 1,800 chips total that they made; they fabbed them themselves. And so, you know, sure, if you have a quarter million dollars you want to spend on this, and a whole team of people that can design the chip and send it out to a fab and all that sort of stuff, that's all fine and great. But for other people, there's this technology called FPGAs, which were invented in the mid-'80s. And up until recently, they hadn't really found many applications in the general supercomputing space. But now they're finding their way into all sorts of different areas. Here are just a few that I pulled off of the Wikipedia page. Digital signal processing: they're used in base stations; most GSM base stations have some sort of FPGA in them, so they're software-upgradeable in the field. And software-defined radio, because with an FPGA you can basically implement your own ASIC, program it onto the FPGA, and update it however many times you want. People just love this, you know, with the USRP and the GNU Radio project and stuff like that. And so on, in all sorts of other areas. And the one area that I focus on is cryptography. So kind of the general idea is designing your own chip, making it specifically for breaking a certain algorithm, and then programming an FPGA with it. And then you basically have your own custom crypto accelerator that's specifically made for breaking that algorithm. And it's kind of the same thing that works for all these different areas of supercomputing. So, looking inside an FPGA, the bare bones of it is that you have a bunch of lookup tables, which can basically describe any sort of arbitrary logic.
And so there are your gates, basically. And then all you really need is logic and registers, some sort of sense of time, something that can actually store your data. And then you just need to be able to connect everything together. So they provide LUTs, registers, and routing, and then they provide a few other things just to make life easier, like BlockRAM, which is essentially small pieces of RAM inside the FPGA, just because people like to use FIFOs and all sorts of other general-purpose storage elements, and it makes it a lot more efficient to do it that way instead of using up your LUTs. There are also DSP blocks that implement adders and multipliers and things like that, and DCMs for multiplying clock frequencies, et cetera. But the general idea is that anything you can describe in a circuit as an ASIC, you can program onto an FPGA and do programmatically with software, instead of having to actually fab a chip. So on an FPGA, you probably can't see this too well, but to describe that P-box you're just writing an assign statement saying the output of P equals these bits from S. And so there's no shifting involved. You can describe it at the bit level or the byte level or whatever level you want, because it's basically just rerouting signals. So you can kind of see how this can make DES a lot more efficient to run on a specialized chip. And so a while ago we went ahead and implemented DES, just to see how fast we could get it on FPGAs. And this is a picture of one of our servers, down at the bottom here. It's just a 4U rack-mount server; it draws about two kilowatts. And we implemented DES as a real pipeline. So on an FPGA, you can actually implement each one of those stages of DES and then have registers in between them, so as you flow data in, it just gets clocked along through the whole thing. And once the pipeline is full, you get an actual DES operation every single clock cycle.
And we've clocked this up to 600 megahertz; reliably, you know, we're maybe doing 450. So that means for each one of these cores, we're doing 450 million DES operations per second. And each one of the FPGAs can fit 40 cores in it, so that equals about 18 billion keys per second. Then we can fit 48 FPGAs in the server, and so that basically means that we can go through the full 2 to the 56 key space in under 24 hours. And that's worst-case time, so in practice it'll probably be half a day. If you look at equivalent performance, we just pulled up some benchmarks on CPUs and GPUs. On CPUs, to do it in under 24 hours, it would take about 80,000 CPU cores; on GPUs, about 1,800 of them. So, just for an idea of scale: one 4U machine versus a data center full of GPUs or CPUs. So the next question we had is, can we make this faster on an FPGA? We started looking at this, trying to see if there's any way we can speed things up a little bit. One of the things with implementing this on an FPGA is that we have certain data that needs to get set in order to set everything up. If we're splitting this up across many, many cards, we have to take the whole key space and split it up into small sections that each FPGA can crunch on. So we have key start and stop registers set up to tell each one of the cores where to start. And then we also have to provide the plaintext and the ciphertext that we're looking for. So one optimization that we worked on: instead of having this whole bus going to each core so we can set up all these values every time we start up the core, we were thinking, well, we could actually take the bitfile that we load onto the FPGA with the configuration and pre-configure it with these start, stop, ciphertext, and plaintext values.
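Those throughput figures are easy to sanity-check with a little arithmetic; this just multiplies out the numbers quoted above:

```python
# Back-of-the-envelope check of the quoted throughput numbers.
ops_per_core = 450e6       # one DES operation per clock at ~450 MHz
cores_per_fpga = 40
fpgas_per_server = 48

per_fpga = ops_per_core * cores_per_fpga          # keys/sec per FPGA
per_server = per_fpga * fpgas_per_server          # keys/sec per 4U server
worst_case_hours = 2**56 / per_server / 3600      # full key-space sweep

print(f"{per_fpga:.2e} keys/s per FPGA")          # ~1.8e10, i.e. 18 billion
print(f"{worst_case_hours:.1f} hours worst case") # ~23 hours: under a day
```

The worst-case sweep lands just under 24 hours, and on average a key is found halfway through the space, hence "probably half a day" in practice.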
So we program the FPGA, it comes up, and then it sees all these values there and it's like, oh, I know exactly where to start, and it goes off and starts crunching on things. And so we don't have to have all those resources dedicated to just setting everything up. It turns out that Xilinx, which is one of the FPGA manufacturers, actually provides a method for programming specific BlockRAMs inside the bitfile without having to rebuild the whole thing. So we played around with this and were able to get it to work. Basically, how it works is you set up a couple of files, and then you can essentially just run a command, and it'll drop memory into the bitfile and configure these BRAMs, and then you program it and everything comes up perfectly fine. The second optimization: we still need to have a return path saying when it's found a key, and so we created a really minimal bus that basically just says "it found a key," and that's it. And then, based on the time that it took to find the key, because this is all deterministic and we're getting one key per clock cycle, we know with fairly good accuracy exactly where in the key space it was when it found that key. So with these sorts of things, we can reduce quite a bit of the logic and get it down to be a lot more optimal. And that got us about another 20%. So then we were thinking, what else? And this is kind of a pipe dream that we've been thinking about for a while, which we haven't implemented yet, but we're planning on releasing a framework for it sometime in the near future. So FPGAs have this thing called JTAG, which, if you've done any sort of hardware hacking, you're probably familiar with; it's a debug interface that most chips have. And on an FPGA, this is an interface you can use for programming it.
They also have all sorts of things built in for doing on-chip debugging, where you can basically put, essentially, a logic analyzer inside the chip and then tap lines and things like that. And we found that there are actually a couple of commands they have for letting you read and write BlockRAM entirely over JTAG. And so with this, if we do all of our communication with just these little tiny BlockRAMs inside the FPGA, we don't have to have a bus at all communicating with the outside world. All we have to have is JTAG going to the device, and we don't have to use any routing resources at all to talk to any of the cores; we can do it all transparently. So this is one thing that we're working on, and hopefully it will be working soon, but that could probably get us another 20 to 30% speed improvement as well. But anyway, regardless, with all of this, it's no fun to crack MS-CHAPv2 if only we can do it. All these talks that I give, it's always like, oh yeah, you could do this if you get your own FPGA and set it all up and spend a bunch of money on it. But we decided, why not let you guys do it? Hello? So this is one of those things where it's like, David's clearly a genius, and he can crack all this stuff. If there was only some way that we could leverage his genius so that everyone could benefit. And so we put something together: I wrote this tool called chapcrack, and it's a simple tool that you can point at any packet capture, and it will parse the entire packet capture and pull out any MS-CHAPv2 handshakes that exist in the whole capture. And it will give you the things we're interested in here, like the known plaintext and the known ciphertext for that core problem we're dealing with. And it'll even crack that third key for you, the two-byte key. And it tells you the username of the person who's logging in, because that's of course sent in the clear with MS-CHAPv2. And then it gives you this last little thing here: a submission token.
So what is that? Right here, we have everything we need except for the keys, with the exception of the third key, which is already cracked. And so I have this online service called CloudCracker, and it's basically an online password cracker. We support a few different formats; you can submit a hash of a WPA handshake or SHA-512 crypt or something like that, and we spin it out across the cloud, run a cracking job, and mail you your results. And so what we've done is added an MS-CHAPv2 option to CloudCracker. So if you have any kind of MS-CHAPv2 handshake, you can run chapcrack over the handshake, get that CloudCracker submission token, and paste it into the website here. Once you do that, you just submit the job with your email address, and David has been kind enough to put his FPGA cracking magic online, so it transparently sends the job over there, runs it, and you get your results back in less than a day. Which means that anybody should now have the ability to crack MS-CHAPv2. We're hoping that by doing this, we can break out of this fucking cycle of support and use. But in the meantime, have some fun with it. So thanks for listening. And let's all give a round of applause for Marsh Ray, who couldn't be here today but was instrumental in a lot of this stuff as well.