Thank you very much. I guess that's a pretty good reason not to go to Toorcon. OK, so quite a lot of people were over in the other room for my other presentation, right? Yeah? OK, cool. So yeah, this is my presentation. A little background on everything that happened with the other thing: I checked the DEF CON website last week to see when I was speaking, and as you can see on the slide over there, going along the left-hand side, it says July 30th. That's when I thought I was going to speak, but I guess they moved me to July 29th or something like that. So I thought I was speaking tomorrow and had an extra 24 hours to prepare everything. I didn't. In the middle of coding all this stuff together, I got a call from one of our friends saying that I was supposed to be speaking right then, earlier today. And we were over at Caesars Palace, kind of far away. I don't know if you saw this big box over here, but it weighs about 100 pounds and carries all this crap, so it was a little difficult to get over here. And then once we got over here, they just decided to put some other people on and move me somewhere else. But this is my talk. I'm sorry I didn't show up for the other one if you guys waited in line or anything like that. If you did wait in line and you're angry at me, go have some more beer. I think we have some left here. You have to fight George first, I think. OK, so if you want to fight George, feel free to. The winner gets free beer. So this is my talk: I'm going to be talking about cracking crypto on FPGAs and some of the different things people are doing with that. A little about me, if you don't know anything about me: I started this thing called Dachb0den Labs, made bsd-airtools, and did some smart card hacking stuff a while back. I run Toorcon. George helps out with Toorcon, and there are a bunch of other people here who help out with it too. It's in San Diego.
They already gave you the intro info. I'm also starting this company called Pico Computing; we've got our huge logo on the side of this thing. We make FPGA cards that are really expensive. If you have a lot of money, please buy our cards. If you don't, then buy somebody else's. The usual disclaimer: educational purposes only, all that crap. So I'm going to be covering an introduction to FPGAs. First of all, who here knows what an FPGA is? Yeah, oh, wow. OK, awesome, awesome. I can just fast-forward through that then. Some gate logic crap, cracking with hardware — I'm just going to cover some of the stuff other people have done in regards to cracking on FPGAs — some optimizations you can use to speed up your code in hardware, and then I'll introduce this program called Chipper, which we just named a couple of days ago and just started coding a couple of days ago. It does LanMan and NTLM cracking for Windows, about 100 times faster than a PC, on one of our tiny little cards that's the size of a CompactFlash card. I have to safely remove this thing, otherwise our driver stops working; there are some bugs in it. OK, so this is the PCMCIA card that we make. It's got an FPGA on it that cracks stuff 100 times faster than a PC, and yeah, it's just a CompactFlash-sized card. I'll cover more of the details later, but I'll give a demo of what this box does and give you some performance information. So first of all, an FPGA lets you prototype integrated circuits. Basically, you write code, and it translates directly into gate logic that runs on the FPGA. There's the basic Boolean sort of stuff you can do with gates, and you can implement adders and all sorts of more complex functions using just really simple basic components. You can chain them together and do all sorts of crazy stuff.
That's basically what electronics is built on: small little bitwise operations chained together to do more complex stuff. You can also use gates to store values if you want to make memory; flash memory, for instance, is basically a bunch of gates that just store values. So it can all be implemented with electronics pretty simply. Gates on an FPGA can be configured arbitrarily. You have a bunch of different gates on there, and you can say, I want to connect this one with this one, I want to connect this to this other component I have. So instead of designing your whole circuit and burning it to a chip, or having a bunch of small discrete gates that you connect together with wires yourself, the FPGA connects all the wires for you and gives you some basic components to use. With an FPGA you can create almost any type of logic; you're just bound by the size of the device. An FPGA looks kind of like this: you have a bunch of CLBs that contain registers and basic logic and routing. Those are those little — I'm not sure if you can see it, but the red areas are composed of a bunch of small cells. Then there are input/output blocks, block RAM, and digital clock managers. You can also get FPGAs with built-in PowerPCs if you want an actual hard processor. And there's a programmable routing matrix that lets you connect everything together. The pros of using hardware: you can do massively parallel stuff. You have a bunch of gates that are active pretty much all the time, so you don't have to run one instruction at a time; you can have a bunch of things running simultaneously. And you can pipeline stuff, which is basically a form of parallelizing. The other nice thing about FPGAs is that with an ASIC you burn all the gates into a chip and you're done with it, while an FPGA can be reconfigured as many times as you want.
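To make the gate-chaining idea concrete, here's a minimal sketch in Python (the function names are mine, not from the talk) of how single-bit XOR/AND/OR operations chain together into an 8-bit adder, the same way simple logic cells get wired up on an FPGA:

```python
def full_adder(a, b, cin):
    """One-bit full adder built only from XOR/AND/OR gate operations."""
    s = a ^ b ^ cin                   # sum bit
    cout = (a & b) | (cin & (a ^ b))  # carry-out bit
    return s, cout

def ripple_add(x, y, width=8):
    """Chain `width` full adders together, like gates on a chip."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result & ((1 << width) - 1)  # wraps around, like real 8-bit hardware
```

On an FPGA all eight of those adder stages exist as physical gates at once; the Python loop is just standing in for the wiring.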
That makes it easier to prototype stuff, and you can even have them reconfigure themselves and things like that. The cons are that you have size constraints and limitations, and it's a little more difficult to code and debug than software. There are lots of common applications for FPGAs; I'm just going to focus on encryption and decryption. There are different types of FPGAs. There are antifuse ones, where you tell it once how the gates should be configured and it stays that way forever; you can't reprogram them. There are flash-based ones, which you can reprogram and which keep their state even when you power them off. And then there are SRAM-based ones, which you have to configure every time they start up: you feed a bunch of configuration data into a port, and it configures itself that way. SRAM is the most common technology; it scales better for the larger FPGAs. The only problem is that it requires a loader. There are a few different FPGA manufacturers. We use Xilinx FPGAs. There's also Altera, which I think is a lot bigger in academic places. Does anybody here use Xilinx or Altera? Xilinx? Anybody? Altera? One? Two? OK. So yeah, they have pretty much the same sorts of chips; Xilinx is a lot bigger as a company and a lot more commercial. You can also get their parts with an optional PowerPC that will actually run at up to 300 or 400 MHz, as opposed to a soft core, which only goes to maybe 100 or 150 MHz. So how do you program these things? There's a hardware description language called Verilog. You can also use VHDL, but most of my stuff is in Verilog. It has a simple C-like syntax, and it's really easy to learn but difficult to master. So here's a basic code comparison between the two. This is kind of stupid, but if you want to do an OR or an AND in C, you just write it out.
With Verilog, it's kind of the same thing: you assign the AND. I don't know — hey, stop laughing. And if you want to do an 8-bit AND, it's again the same sort of thing. You can have data buses in Verilog where you specify how many bits wide you want a variable to be and then just do operations on it; you're not limited to 8 or 16 or 32 bits or whatever. Then you basically have flip-flops for storage, if you want to save state: on every clock trigger, a flip-flop captures its input. Really, all you have is wires, flip-flops, and the other components that are actually on the device, and you can do a lot with just that.

So I'm going to cover a little history of FPGAs and cryptography. To start, a bunch of guys wrote a paper a while back on minimal key lengths for symmetric ciphers — Shimomura is on there, Rivest, Schneier, all those people. They did some research to figure out the key lengths people should be using to protect against common attackers, from the kid in his basement up through major corporations and the NSA. And this is the conclusion they came to: a kid in his basement could take a tiny $400 FPGA and crack 40 bits in about five hours, or 56 bits in 38 years. And at the very bottom, an intelligence agency with a budget of about $300 million can crack 40 bits in pretty much real time and 56 bits in 12 seconds. That's what they found with their research, and it was published in 1996. So it's kind of scary that the government can most likely crack 56-bit encryption, blah, blah, blah. But everybody's using 128 or 256 bits now, and that's much more difficult than this, so it's not saying much — but it's still a little bit scary. So the EFF in 1998 built a DES cracker.
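As a sanity check on those 1996 figures (my own back-of-the-envelope arithmetic, not from the paper): if a $400 FPGA exhausts a 40-bit key space in five hours, the implied search rate puts a 56-bit space at almost exactly the 38 years quoted on the slide:

```python
# Key rate implied by "40 bits in 5 hours" on a $400 FPGA.
fpga_rate = 2**40 / (5 * 3600)          # keys per second

# Time for the same device to exhaust a 56-bit key space.
secs_56 = 2**56 / fpga_rate
years_56 = secs_56 / (3600 * 24 * 365)
print(round(years_56, 1))               # → 37.4, close to the paper's "38 years"
```

The two slide numbers are internally consistent, which is a nice property of brute-force estimates: each extra key bit exactly doubles the time.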
Well, it ended up cracking DES in three days and winning the RSA DES challenge — sorry, the distributed DES challenge. They claimed they could search about 9 billion keys per second, and it cost them a little less than $250,000, including engineering and all that. Has anybody heard of that? The EFF DES cracker. Then there are these other guys who hacked ATMs using an FPGA, just because they needed to crack a little bit faster. And these guys are really interesting: they actually implemented a linear cryptanalysis attack on DES using FPGAs. Basically, you need to compute a big dictionary and then do lookups. It's a chosen-plaintext attack, so it isn't really that crazy: if you had some device you were trying to extract the key from, you could do this sort of attack, but if you're just trying to decrypt a document, it's not going to help you that much. But they were able to recover a key in 10 seconds with a 72% success rate, as opposed to something like five years by brute force — a really long time. Then there are a bunch of other people implementing stuff and cracking RC4 and all sorts of crazy things. So there are quite a few people out there doing this. Basically, the way you exploit FPGAs to get really crazy performance out of them is this: on a PC, you do a simple for loop or something like that, and it takes a certain amount of time because you have to do it sequentially. In hardware — oh, I lost my slide. Well, basically, in hardware you can do it all in parallel. Instead of having a for loop where you go through each iteration and compute whatever, you can do it all at once. Hope I didn't lose everything else. Oh, damn. OK, just use your imaginations. So yeah, then there's a pipeline example. There is a pipeline there; you just can't see it. It's down there.
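The parallelism argument can be put in numbers with a toy model (my own illustration, not from the slides): a k-stage pipeline finishes n items in k + n - 1 clock cycles, because after the pipeline fills, one result pops out every cycle, while running the stages one item at a time costs n * k cycles:

```python
def sequential_cycles(n_items, n_stages):
    # A CPU-style loop: each item passes through every stage before
    # the next item starts.
    return n_items * n_stages

def pipelined_cycles(n_items, n_stages):
    # Hardware pipeline: k cycles to fill, then one result per cycle.
    return n_stages + n_items - 1

print(sequential_cycles(1000, 3))  # → 3000
print(pipelined_cycles(1000, 3))   # → 1002, nearly 3x the throughput
```

For long streams of keys, the speedup approaches the number of stages, which is why deeply pipelined DES cores on an FPGA can test one candidate key per clock.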
But basically, if you're trying to perform a bunch of different operations — OK, that kind of sucks — you have data that flows in on one end, and a register that stores the value for that stage, where it's some value plus one. Then if you need to subtract another number from it, you do the subtraction in the next stage, so that stage holds the value plus one minus whatever. And meanwhile, over here, the next value is being fed in. So you have all these operations going in parallel, even though each item still passes through them sequentially. Essentially, the speed of the whole pipeline is set by the slowest operation in it: if your addition somewhere is the slowest thing, that's as fast as the pipeline is going to run, plus however many clock cycles it takes for data to flow through the whole thing. That's a really common way of optimizing things, and it's basically just parallelizing. Another nice thing is self-reconfiguration, where you get an advantage over ASICs — oh my God, it lost everything. Well, essentially, if you need to, say, multiply some arrays, you can load a core that multiplies arrays, then swap it out with your RC4 core and do RC4, then swap that out with MD5 and do MD5. You don't have to have three different chips or one huge chip; you can just reconfigure the FPGA with whichever specific piece you need at the time. And there are some special components you can use on the Virtex-4s. Anybody heard of the DSP48 slice? One person. It's a new thing in the Virtex-4s. Yeah? Sweet. It does 18x18-bit multiplies and 48-bit by 48-bit adds, and it runs the 18x18 multipliers at 500 MHz, which is pretty decent. That's not that great compared to a PC, but you have 48 of them, so you can do 48 18x18 multiplies in parallel at 500 MHz. So that's pretty decent. There's also block RAM that you can use for storage.
And the block RAMs are pretty fast. They have FIFO support; if you just need a FIFO, it's all built in. These are the sorts of components people commonly use — FIFOs, adders, multipliers — so Xilinx builds them in as hard blocks, and you can use them knowing they're extremely optimized for what they do. Another really interesting thing is that if you get a Virtex-4 with a PowerPC, there's this new thing called the APU, where you can basically create your own instructions on the PowerPC and have them execute your Verilog code — your hardware code. So you can essentially make a DES instruction: you just run this instruction, and it runs DES on whatever it needs to. There are lots of really cool things you can do with it. Basically, your code has access to all the registers, so it can read from and write to whatever registers it needs, and it's done in however many clock cycles it takes to compute. It's just a lot faster than having to deal with drivers and a bus and stuff like that.

OK, now on to the fun stuff. Chipper is what we call this thing. It was originally PicoCrack, but our boss didn't like PicoCrack because it had a drug name in it or something. So we thought we'd try to start a new term for cracking on FPGAs: chipping. This thing is called Chipper, and it currently supports UNIX DES crypt, Windows LanMan, Windows NTLM, and multiple FPGAs and cards. And Clipper? Yeah, yeah, there you go. It kind of has a ring to it. So if you're going to make a project like this, use our terminology so we can get popular, OK? By the way, about the multiple cards and FPGAs thing: this box was supposed to have 10 of our cards in it. You can kind of see them down here at the bottom. They gave me 12 of them, and three didn't work, so I only have nine in there. But yeah, LanMan hashes. Everybody knows what LanMan is? Yeah, kind of, basically.
It's really shitty crypto for passwords. You have a password that's up to 14 characters long; it splits it in half, encrypts the halves separately, and then appends the two hashes together. So you can crack the halves separately and only have to search seven characters instead of 14. The hardware design: the whole cracking engine runs on the card. There's a key generator that generates password candidates and feeds them to the DES operation. That's basically what it does — generates a bunch of keys, encrypts them, and compares them. You can specify how many bits of the key space to search, so if you want to split the work across, say, four different cards, you can crack four times faster by just splitting up the key space. You can also specify whether you want to search typeable, printable, or alphanumeric characters, basic stuff like that. And we threw together the software interface. I'm going to demo the Windows version; if we'd had an extra day, we would have it running on Linux with all the cards. But yeah, if you guys want to see that some other time, I can show you. By the way, thanks to Rackney, who built most of the GUI code and a lot of the software architecture for this. Round of applause, round of applause. Somebody needs to beat them up later. So we use wxWidgets, which supports Windows, Linux, everything, so once we get this all built, it'll run on pretty much every platform with a nice GUI. Right now it supports cracking 128 keys in parallel on each card — 128 hashes. It also supports a faster mode: if you just want to crack one password, you can do it four times faster, because we only do one compare and can fit four cores. It can automatically load the required FPGA image, so depending on what you're doing, it loads the right card image. And it supports multiple-card clusters, which we'll be demoing. So here's some speed information.
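The split-in-half weakness described above is easy to see in code. Here's a sketch of the LanMan *structure* only (my own illustration): the real LM hash DES-encrypts the constant string "KGS!@#$%" using each 7-byte half of the password as a DES key, and SHA-1 stands in for that step here, so only the shape is accurate, not the actual hash values:

```python
import hashlib

def lm_style_hash(password):
    """Sketch of the LanMan construction: uppercase, pad to 14 chars,
    hash each 7-char half independently, concatenate the two results.
    (SHA-1 stands in for the real DES-encrypt-a-constant step.)"""
    pw = password.upper().ljust(14, "\0")[:14]   # case-insensitive, fixed length
    halves = (pw[:7], pw[7:])
    return "".join(hashlib.sha1(h.encode()).hexdigest()[:16] for h in halves)
```

Because each 16-hex-digit half of the output depends on only 7 characters, an attacker cracks the two halves of the hash independently: two searches of charset^7 instead of one search of charset^14, which is why the card only ever has to brute-force seven characters at a time.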
Right now on a PC — this is probably pretty low compared to some people who write optimized stuff — but has anybody used any sort of NTLM cracker? What sort of performance do you get? Like 3 million crypts per second? RainbowCrack? I always use rainbow tables. OK, yeah, I was running RainbowCrack and it was generating my table at about 2 million keys per second. I think you can probably do maybe up to six or eight million with something really optimized, maybe 10, on a modern P4 or Athlon 64 or whatever. In hardware, right now, each card will do 125 million per second, and if you just want to crack one password, it'll do 500 million. The main problem is that we could clock it a lot faster, but we're running into heat and power issues, so we're going to experiment with liquid nitrogen and other sorts of cooling to cool it down. In theory, we should be able to get it up to about 1 billion keys per second, so it should be a lot faster once we get that running. Here's some basic information: with a 64-character set — that'd be alphanumeric plus some symbols — a P4 can search the whole space in about 25 days. One of our cards will do it in about two hours, and eight of them in about 18 minutes or so. And let's see, 48 characters — yeah, you can just read that. But basically, if you're just doing alphanumeric, you can get it down to about nine seconds on eight cards. I'm just going to show the baseline attack. It'll take about four minutes to go through — let's see, 128 times 9 — about 1,000 passwords it'll be cracking in parallel. So yeah, this is our card. It has a Virtex-4 on it; you can get it with a PowerPC processor. It has RAM, ROM, Gigabit Ethernet, and all that on a little thing like this. It's basically a full-on computer if you get it with a processor.
OK, here's the cool part. This is going to load the bling-bling image here. So now it's running that image. This is our little program. It's still called PicoCrack; it's in the process of becoming Chipper. So I just loaded in this pwdump file, and it's already starting to crack. I'll show you what this file looks like: it's a standard pwdump file with some fake accounts in there. But yeah, it's cracking 64 hashes in parallel right now on just this one card in my laptop, and it's going at 125 million keys per second — so a little bit faster than just using my laptop. So we can let this run, and I'll fire up the next demo. Wow, we've got plenty of time here. All I can demo here is our command-line version, but the graphical version is pretty close to done. Oh, you've got to hear this. This is funny. I'm guessing people use PCMCIA cards in their laptops and know that dinging sound — I've got 10 PCMCIA sockets in here. So that's a pwdump file of 4,000 passwords, and this will just start scrolling a ton of stuff. There we go. Over on the right is a count of how many passwords it's cracked so far. This is the slow mode: 125 million per second on each card, cracking 128 in parallel per card. There's a faster mode where it does basically 500 million times 10 cards — what's that, like 5 billion keys per second — for just one password, but this looks a lot cooler. So yeah, this will be running for a little bit. Whoa. Yeah. Well, with LanMan, it splits the password into two seven-character chunks. And this is actually — oh wow, that's crazy. Yeah, they're 14 characters with no special characters, just alphanumeric out of the 64-character set, which would take a couple of hours or something like that on a PC. But this will take about three minutes. It's basically taking all the hashes from the file and slowly filling them in.
And as they get cracked, it replaces them and continues on, so this thing will run until all 4,000 are cracked. It'll go through the whole key space in about three or four minutes for this. If you were doing the full 64-character search, it would take a bit longer. So yeah, you don't get the crazy special characters here, but those would just take a little bit longer. Oh, OK. Yeah, I know I don't have support for that yet, because we're still in the middle of building everything — we kind of hacked this thing together in the last hour — but I might be able to show you something, if I have the right image on my laptop. Oh, each of our cards? With all the software, they're $2,800 each. If you're a developer or don't need all the software, then probably about $1,000 to $1,500, something like that. You can buy development boards with the same chip on them for maybe $400. They don't really come with any extra stuff, but they're cheap, and you can build a huge array of them if you want to, a lot bigger. Yeah, something like that. And if you buy more than one, the price usually drops. I don't know, it really depends on the sales we need. Oh yeah, so here, all these are cracked, I think. There we go. So yeah, you see, they're all 14 characters long. They aren't really that extraordinary as far as the actual characters, though it shows upper and lower case — well, LanMan uppercases everything when you type in your password, so it's case insensitive. Linux? Linux is a lot more difficult, because with DES crypt, they run it through 25 rounds at least, so you have to run DES 25 times — and often more. If you're doing MD5, MD5 crypt runs pretty slow. So the Linux stuff is a lot slower to crack. I went with this stuff because it's a lot easier. I've thrown together some numbers on that.
I don't have them with me, though; if you email me, I can get you some more information. NTLM is pretty comparable to this. The only differences are that the passwords can be a lot bigger — if you want to crack a 16-character password, it'll take a long time — and it's also case sensitive, so you have to search both the uppercase and the lowercase. Let me see if I have the right image in there. What's that? DES? Yeah, for just cracking straight DES — no, I don't have that built, but LanMan is pretty much just straight DES, so the performance would be the same. What's that? NTLM is about the same as this. It takes up a little more space, so in the fast mode I can only fit three cores on there instead of four, but it's pretty much the same specs. What's that? Yeah, right now the bus is the main limitation for generating rainbow tables, because if you're generating something like 500 million hashes per second, just getting that off the card is pretty difficult. We've been talking with the Shmoo guys about the rainbow cracking stuff they've been doing, and we've been exploring different possibilities; they're really interested in trying to use something like this to generate tables faster. It seems like our best bet would be using the Gigabit Ethernet to get the data off the card, but then you have to worry about finding hard drives or RAM that can cache it that fast. It's one thing we're working on. What's that? Yeah, yeah. There you go. Our cards? They're all PCMCIA — CompactFlash form factor, basically — and there's also an Ethernet dongle at the back. No, no, I've got 10 PCMCIA slots in here: five of the two-card adapter things. Yeah, they're all PCI. We're going to be making a PCI Express board; that's a few months down the road. Oh yeah, that would definitely work for doing that.
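The bus bottleneck mentioned here is easy to quantify (my numbers, under the assumption that each raw hash is 16 bytes, the size of an LM or NTLM hash): streaming every hash off the card at 500 million per second needs about 8 GB/s, roughly 64 times what Gigabit Ethernet's ~125 MB/s ceiling can carry:

```python
hashes_per_sec = 500e6                      # quoted fast-mode hash rate per card
bytes_per_hash = 16                         # assumed: raw LM/NTLM hash size

needed = hashes_per_sec * bytes_per_hash    # bytes/sec that must leave the card
gige   = 125e6                              # Gigabit Ethernet payload, bytes/sec

print(needed)         # → 8e9 bytes/sec, i.e. ~8 GB/s
print(needed / gige)  # → 64.0: the link is 64x too slow for raw streaming
```

This is part of why rainbow-table generation only stores chain start and end points rather than every intermediate hash: the card can keep its output bandwidth tiny while still doing the full hash workload internally.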
Yes, we'll probably work on that once we get a card that can talk fast enough for it. Let me try to see if this works. The Secret Service? What's that? Yeah, yeah, the CIA, he's using that on all their PCs too, right? All right. Oh, OK — missing stuff. Oh yeah, the graphics and stuff. Yeah, I'll post that up there. I don't know why those got lost. You know, this probably won't work. It might work. Whoa. Yeah, I think this is a totally incompatible version, but if you look at the performance here, you can do basically all the printable characters in two hours on one card or 18 minutes on eight. So it'll just take a little bit longer if you want to do a lot more characters. What was that? Oh, for the full length? Yeah, that's for the full length, so up to 14 characters with LanMan. There are just a few more slides here and then I can let you guys go. So I just started openciphers.org. I've got all the source for this stuff; it isn't posted up there right now — there are just some details on it — but it'll be posted fairly shortly. I also wrote a modular exponentiation core, if you guys want to do any RSA or Diffie-Hellman or anything like that, and did an A5 implementation for some GSM stuff I'm working on. As for technology trends: it seems like right now there are always small devices using really weak crypto, just because they're limited by cost and speed. And I don't see simple brute-force attacks living that much longer, just because key sizes are getting really large, but I definitely think there's lots of potential for doing more elegant cryptanalytic attacks on FPGAs, just doing it a lot faster. I really think that's going to be the next generation of everything, and people are starting to do it. That's really the future.
Hardware trends: it seems like FPGAs are scaling according to Moore's law, just because they use the same manufacturing processes that processors use. The factors are pretty similar, but you end up with a lot more density, which is kind of what you're getting in the newer processors like dual cores. And it seems like there are definitely algorithms out there that you can exploit with FPGAs. Some future applications we'll be working on: neural network stuff, attacks on WEP and WPA-PSK, and analysis and correlation — there's some interesting stuff you can do with that. Any feedback, any other questions, anything like that? Yeah, I remember somebody told me about that. I haven't looked at it too much, but I remember they used some FPGAs to speed up cracking the hashes. Oh, cool. I'll definitely take a look at that. Anybody else have any suggestions or comments, anything like that? Yeah, that's a very good suggestion. If you really want to get into this stuff, just come up and we can work something out. But anything else? OK, a couple of shameless plugs here. There's Toorcon and ShmooCon. ShmooCon's an awesome conference, if you guys want to go to that. Everybody know what ShmooCon is? No? Yeah? Oh, shoot — San Diego, I probably just copied and pasted that. It's in February, in DC; I'll have to fix that also. I threw together most of this presentation in the hour before this thing. So: openciphers.org. opencores.org has a bunch of open-source cores that you can use, lots of crypto stuff. You can go to Xilinx, download their eval software, set your clock ahead 10 years, and get it for free. And Pico Computing is the company that I'm a part of. And actually, here, you've got to check this out, this is hilarious. Our hardware developer decided to put these stickers on this thing, and I'm not sure what they mean, but I don't know.
He's kind of weird. But yeah, that's our box, and it cracks stuff at like five billion keys per second. It's basically half as fast as the EFF's DES cracker, I think, and theirs took up a few cabinets and cost $250,000. So that's one of our products. Any other questions? That's it. OK, thanks a lot for coming.