Hey, guys. Welcome to DEF CON 20. So what are we here to talk about? I've got some good news and some bad news. Good news is we are actually going to fix this thing. Who here focuses on blowing stuff up? Who here is on offense? Hacks some stuff? Yeah. You're awesome. We need you. Defense doesn't have offense. You know what happens? Defense gets stupid. We call that compliance. The bad news is we're not going to fix it by doing things the way we've been doing them for 20 years, because look where that got us. There's a lot of dogma in security, and the dogma is useful, because there's a lot of really, really crappy stuff that we shouldn't be doing. But that doesn't mean what we're doing today is entirely correct. Got a riddle for you. What is the fundamental difference between offense and defense? Between an attack and a defense? You can tell when an attack doesn't work. Offense has a quality filter. Put up or shut up: either it works or it doesn't. That doesn't mean we don't get crappy attacks floating around out there. There's a bunch of them. But it's not the same. It's not like it is in defense. Defense is a lot about dogma. There's not much science. We've kind of transmuted "well, you can't defend against everything" into "I don't need to show what I do and don't defend against," and critiques against defenses don't need to show how useless they actually are. So we have this really random discussion with insufficient skepticism. So that's really what this talk is about. This talk is about doubt. This talk is about calling bullshit. If you walk out of this talk believing everything even I have to say, you're doing it wrong. Guys, my goal is to show you some new ideas. And they're out there; by normal theory they may well be wrong. If they are wrong, let's find out. Let's prove it. But let's not just assume that because something violates dogma, it can't actually be a better path to protecting these networks. So let's talk concrete. Here's a fundamental test.
You take 2,000 machines with a given defense. You take 2,000 machines without. You come back in six months. You see: is there a statistically significant difference in the infection rate? Now, I'm not saying that everything, even in the near future, is going to be tested like this, but let's at least recognize a gold standard when we see one. One of these days we're going to be spending as much time and money on security research as we are on medical research. Now, I don't know if you realize this: it took hundreds of years for medicine to get its scientific act together. And they had dead bodies. We've got, just, like, the economy of the world or something at stake here. Which, you know, kind of matters. We've got to fix this stuff. We don't have hundreds of years. We've got to start fixing it fast. So we've got three heads on the security hydra, three reasons why we're in this place. One, we can't authenticate. Two, we can't write secure code. Three, we can't bust the bad guys. We're not going to talk authentication today, because that's DNSSEC, and if you want to talk to me in private, you can do that. We're not going to talk about busting the bad guys. I don't know if you noticed, but there seems to be some lack of consensus on who the bad guys are. Me, I tend to worry about that time the entire Fortune 500 got owned. Aurora was really bad. People have already practically forgotten about it. It sucked, it sucked, it sucked. I'm like, how about the entire stock market? And I'm kind of worried about these small businesses that are getting their payroll stolen, and their banks saying, sorry, your fault. That's what makes me worry. But other people are worried about, I don't know, The Dark Knight Rises. So what are we here to worry about? We're here to talk about the inability to write secure code. Now, "inability," let me clarify. It's not that it's impossible to write secure code. It's not impossible to make an authentication system with certificates using X.509.
It's not impossible to go out and bust criminals and bad guys. It's just not going to friggin' happen. It's totally improbable. Possible is not enough. Not if you want to actually say you fixed the problem. I wish I could just blow stuff up. It's a lot of fun. I do it from time to time. But I'd like to be working on different stuff 20 years from now. New bugs instead of the same existing ones. So what are we going to do today? We've got five different things to talk about for DC 20. We're going to talk about addressing timing attacks. We're going to talk about generating random numbers. We're going to talk about suppressing SQL injection attacks. We are going to detect network manipulation, a little bit of follow-up to N00ter from last year. And just for fun, we're going to go scan the internet really fast. These are all things that are totally possible today. How do we make them more deployable, less expensive, more probable? Let's talk about timing attacks. Timing attacks are a lot of fun. Basically, when we model the security of a system, we look at who says what to whom. And we never look at when. Sometimes, when a message is sent leaks internal information that allows you to break the security of a system. Now, the timing differences we have on computers tend to be small, except for databases; those will have second-level deviations. But it doesn't matter. According to a really interesting paper, we can distinguish 15 to 100 microseconds of latency over the internet, all the way down to 100 nanoseconds of latency over a LAN, with just, like, a thousand probes. This is a really small thing of saying: did that take one second, or did that take 1.000001 seconds? And you can tell, reliably. It's kind of cool. So Nate Lawson and Taylor Nelson did, I think, the best talk on this subject, where they looked at string comparison functions and broke a bunch of things. They broke OpenID and OAuth, a couple of authentication protocols, really nicely.
So the canonical correct fix to timing attacks is you make everything take the same amount of time: the maximum amount of time any individual comparison would take. There's a neat little trick for this. It involves using XOR. When you compare values, you go through the entire string and XOR everything together, and you'll find out if there's a difference or not, and the time will not vary. Job done, move on to the next bug, right? I have never met anyone in my life who has ever implemented this. The problem is that if you want any security advantage, you have to do it every time there's a security-critical comparison. That means you need to identify the security-critical comparisons. You don't get to do this all the time, because it is by definition quite a bit slower. And the difficulty of identifying all the circumstances where you need to do this stuff is enough to make this fix possible but not probable. So what can we do instead? I'd note that we're up against internet-level latency. You know, the internet is slow. Sometimes packets take longer to arrive than others. I note that over the internet we can only distinguish 15,000 to 100,000 nanoseconds instead of 100 nanoseconds. That's three to four orders of magnitude. And internet noise is not random. It turns out you can just tell Linux, on your network interface, be as slow as the rest of the internet, and it'll do it. It's a one-line command. Now, here's the deal. This is a lot easier to deploy. And it turns out that matters. I could probably get someone to go ahead and say, make things take one million to three million nanoseconds longer. I can't say, make it take another second; but another hundredth of a second, I might be able to ask for that. Now, who here thinks they know enough about this to say: bullshit, this is never going to work? Maybe someone. Here's the deal. Let's say this didn't work. I've got some code to show you. This code right here is from the middle of OpenSSH.
And what OpenSSH is doing is it's going ahead and comparing the password provided by the user, through an encryption function, to the password stored on the hard drive. And it's doing it in non-constant time. If the first character is right versus the eighth character is right, it's going to take a different amount of time. Now, the difference is going to be like a nanosecond. Like, it's going to be like one motion of the CPU. If you can find every single timing attack just by getting enough samples, then I just dropped some OpenSSH 0-day on you. Either my attack works, which it doesn't, or I'm on to something with this defense. Now, what we're actually looking for, the actual equation we would want to have, is: how much timing noise, of what nature, will permanently obscure how much timing signal, beyond the point of infeasible return? Look, if I delay it a day, you're not going to find a nanosecond. So somewhere between a day and one nanosecond is an amount of noise that you can insert on a network interface that is going to destroy an entire class of security vulnerability with a single command. That is kind of cool. I like that. Now, there are things that can go wrong with this particular implementation using that one-liner on Linux. It was not really designed to implement a random value precisely between one million nanoseconds and three million nanoseconds. So maybe there's quantization; maybe it's chunky in its delay. Maybe the noise pattern is wrong: it's Gaussian when it should be uniform, or uniform when it should be Gaussian. Maybe the problem is that other protocols like TCP have their own timestamps, so every message that's sent out of a network interface actually has a thing in it that says, I was sent at this time. So even though the interface is delaying the packet, there are actually bits in there that say when it was sent. Maybe that's a problem. These are all things that can be fixed. The question is, is it worth investigating?
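To make the constant-time idea concrete, here's a minimal sketch in Python (the function names are mine, not OpenSSH's): the naive comparison bails out at the first mismatching byte, leaking where the difference is, while the XOR version touches every byte no matter what.

```python
def naive_compare(a: bytes, b: bytes) -> bool:
    # Returns at the first mismatching byte, so the time taken
    # leaks how long the matching prefix is.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_compare(a: bytes, b: bytes) -> bool:
    # XOR every byte pair and accumulate the differences; the loop
    # always runs to the end, so the time does not depend on where
    # the first mismatch is.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0
```

Python's standard library actually ships this trick as `hmac.compare_digest`, which is the thing to use in real code.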
And the deal is, our perfect fix, constant-time comparison, is the enemy of the good. As long as we think we have a fix and we're done, there's no reason to look for something we can actually deploy. This pattern shows up elsewhere. Let's talk about that time RSA was broken. Not the time RSA was broken where the smart cards leaked the private key, or the time RSA was broken where the SecurIDs were stolen by China. No, it was like beat-on-RSA year; I guess it just keeps happening. No, this time I'm talking about some work by Arjen Lenstra and James Hughes, and also by Nadia Heninger, where they found that one out of 200 cryptographic keys on the internet were actually badly generated. And this is a tremendous number. I mean, you've got to think of this in terms of nines of reliability. Something went wrong, and all crypto on the internet went down to two nines of reliability. That's failing. So the question is, who failed? Now, Lenstra and Hughes thought that RSA had screwed up here. Like, oh my God, RSA has this property that allowed us to go ahead and recover keys when they were badly generated. No, no, no. Everything breaks when the keys are badly generated. The problem is that the keys were badly generated. Bad random number generation creates trapdoors in all cryptosystems. If, instead of breaking RSA or ECC or AES or whatever, instead of breaking the thing and figuring out what key was used, you can just guess what key was used, well, that's a lot easier. And it turns out one out of 200 times you had the basis to be able to do that. So they thought that RSA was bad. What they'd actually shown is that, still, in 2012, random number generators are a problem. The stuff that we've been fretting about for 20 years. 20 years ago, someone was saying, I hope I'm not worrying about this 20 years from now. And they're screwed. So, we found that the Debian random number generator was busted. Debian is a Linux distribution.
It had some issues generating random numbers. It made a bunch of bad keys. We spent, like, years making fun of Debian. And it turns out it wasn't just Debian. A bunch of things are broken. Debian was just the tip of the iceberg. Now, weren't operating systems supposed to fix this? Wasn't the OS supposed to do all this magic work? Well, it gave us some things we could talk to: a thing called /dev/random, which would give us bits if it had them and would freeze up if it didn't, and then /dev/urandom, which would do whatever it could to give you something. But where would the actual randomness come from? There are really only four sources of randomness that we right now really allow computers to use. We say there should be a little thing in the CPU, a hardware function that just says, give me random bits. And it runs off of, I don't know, like a little radio, or a bad diode: buggy hardware made a feature. The next things are keyboard, mouse, and, okay, the speed at which the disk spins in a hard drive, because that's impacted by air. Here's the problem. Most computers on the internet don't have a single one of these random sources. Not one. Desktops are okay: they've got a human with keyboard and mouse, and often some disks. Servers, less and less so; they have disks, they don't have any of the rest. Virtual machines? Nothing. Embedded devices? Nothing. Not a single one of those sources is present. Now, people say, oh, but Intel's top-of-the-line, about-to-ship CPU has a random number generator. Yes, the top of the line in 2012 has finally gotten around to it. The rest of all the hardware in the world that developers have to build for has nothing, and will continue to have nothing. There's a rule: the high end keeps getting higher, but the low end never goes away. Now, there are these things called TPMs that are supposed to be for trusted computing in PCs. They got into a whole bunch of things in, like, the mid-2000s.
I'll tell you, everywhere I've ever seen TPMs, people treat these things like they're radioactive. I don't know if it's because they break stuff. I don't know if it's because they're unstable. I don't know if it's because they may or may not be present. But they're treated like radioactive gunk, and they're almost never in embedded gear anyway. You know what I mean by embedded: like, you know, stuff in a rack, your little Linksys box, things like that. So what's kind of happening here, since there are no sources of entropy? Let me give you an analogy. You go on the internet. Always a bad idea. Carbohydrates cause cancer. That's a real link. No, really, they'll tell you. Proteins cause cancer. Fats cause cancer. Alcohol causes cancer. Uh-oh. Man, you realize that would be a hot-button subject here at DEF CON. Now, so you don't consume proteins, carbohydrates, fats, or booze. You know what happens? You starve to death. This is a thing that happens in technology all the time. You get too good at one solution and another failure mode crops up. We are starving for entropy. And the way I know this is because I actually asked some developers. I'm like, dude, what the hell? They're like, look, I've got some code. It depends on the random device. I turn on my little embedded thing. It needs to generate a key. It's just been born. And guess what? There's no hard drive. There's no keyboard, mouse, human, anything. It just locks up. And this is actually found during test, and it goes back to the developer, and the developer says, oh well, let me go ahead and go to one of these not-so-good sources instead. The perfectionists think this isn't what's going to happen. They think the developer will protest, will march into management's office and say, buy us hardware with Ivy Bridge. We need that random number generator. Damn it! No. Actually they say, security people failed us once more. Let's try some crap that actually appears to work. This is reality. You can secure theory. You can secure homework. You can secure your final exam.
Or you can secure the freaking internet. I'm playing the latter game here. Perfectionism caused one out of 200 RSA keys on the net to be easily broken. We have seen the enemy, and it's us. This was our fault. This was our failure. Because we were terrified that someone somewhere might have a bad key generated on their machine, one out of 200 actually did. And by the way, it's worse than one out of 200. It's one out of 200 easily detectable. Way higher if it's only slightly difficult to detect. What can we do better? What we can do is bring back an old hack called Truerand. Humans and computers are not synchronized. That's why we like measuring keyboards and mice. See, the deal is, even if you were trying, you could not hit that key on a keyboard with nanosecond accuracy, the same way every time. Is it one millionth of a second, one billionth of a second? I don't know; whichever one it is, you ain't doing it. Any system with two clocks has a hardware random number generator, and it turns out the biggest lie about your computer is that it is simply one computer. Computers are small networks of interconnected devices, on asynchronous networks, that communicate with each other at their own time and pace. That's how they work. Every single computer has different devices running off different clocks, and these clocks are not synchronized. Even if they had an error of only one part per million, that's a bit per second per megahertz. We have way more than that actually going on. So there's an old hack from '96 by Matt Blaze and D. P. Mitchell. What they do is run the CPU in a tight loop. They basically say, increment by one: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. Every 16 milliseconds, they say, wait, how far did it get? On that interrupt they do some shuffling. Now, here's where the entropy happens. You have a slow clock. It ticks every 16 milliseconds, plus or minus some number of nanoseconds.
And that some-number-of-nanoseconds determines how far the CPU spun in its circle. That's where the entropy comes from. Then they do a bunch of stuff to mix it up: shuffle it around, hash it around, and so on. This is not a bad approach. It is totally disowned. And it's too bad. Because if it had been used, if it had been part of the Linux kernel, those keys all would have been good. Now, why was it disowned? And I mean, seriously, Matt Blaze is horrified that I'm pushing this stuff. I have a tremendous amount of respect for Matt Blaze. So let me tell you, I'm at least honoring his discontent. His position is: we can't model its behavior, and if we can't model it, we don't know how good or bad it is, so we shouldn't do it at all. I understand this attitude. It's a respectable attitude. You should know how good your system is. But what it's done is led to a reduction of the available entropy in the Linux kernel. We used to look at interrupt counts to go ahead and get some entropy. Someone said, oh my God, you might be able to remotely do something weird that would cause this interrupt to fire three times more, and then they'll get all the keys. Ha ha ha. No. It's just not true. So I've been writing this thing called DakaRand. It'll be out next week. It's an update to the old model. We have various modes of generation. So we'll go ahead and do a sleep with the usleep call, and then we'll see how long that sleep actually lasted, using various clocks. We have a thing called the monotonic clock, and the realtime clock. We can actually ask the CPU, how many cycles did you run? You're, like, a 1.5 gigahertz processor, so the cycle count goes up 1.5 billion times a second. So how many did we go through? There's an incrementer: see how many times we can increment an integer within a time period. So we say, hey, just keep incrementing until we've passed 50 milliseconds. What's your number at? And there's the realtime clock on PC hardware.
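The counter-versus-clock idea just described can be sketched roughly in Python (a toy illustration under my own naming, nothing like DakaRand's actual C implementation; the 16 ms interval matches the slow-clock tick mentioned above):

```python
import time

def spin_count(interval_s: float = 0.016) -> int:
    # Spin the CPU in a tight loop for roughly one "slow clock" tick
    # (~16 ms here). The exact count reached depends on nanosecond-level
    # jitter between the CPU and the timer: two clocks racing each other.
    deadline = time.perf_counter() + interval_s
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count

def jitter_bit() -> int:
    # Keep only the lowest bit of the count. The high bits are mostly
    # determined by CPU speed; the low bits, by the jitter we want.
    return spin_count() & 1
```

Each call yields one candidate bit; a real generator would then debias and hash a long stream of these, as the talk goes on to describe.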
Anyone remember IRQs? Yo, there's still one in use. IRQ 8 is hooked up to this thing on the motherboard, and it just fires whenever you want it to. You can just say, yeah, 8,000 times a second, ping me. 8,000 times a second, plus or minus some number of nanoseconds. So there you go: you've got one clock against another. Then there's my favorite of these. Two threads, one int. All right, sounds crappy. Well played. Okay, look. Anyone who thinks computers are completely deterministic devices has clearly never written threaded code. Seriously, it's really true. Guys, it's hard to make a computer do things at the right time, consistently. Time is the one factor in computers we never pay any attention to. We have these things called real-time operating systems that are really difficult to write, because at small time scales, computers do whatever the hell they want. There's nothing a computer likes to do more than screw around for 10 milliseconds. Where'd you go? I don't know. So, you know, it's not a bug, it's a feature. We're using those properties. We're exacerbating them as much as possible, and we're forcing the threads to compete, thus hopefully giving us noise. So, the flow of how all this works. We take whatever bits we get. Doesn't matter if they're good, bad, whatever. We throw them into a hash. You have SHA-256 and you just keep throwing data into it. Whatever the samples are, whatever the nanosecond clock read, you throw it in raw. Why do you throw it in raw? It might be good. Maybe it's bad, but it might be good. Let's not undercount our entropy. However, we only count a bit if it passes what's called von Neumann debiasing. For each sample, we count the number of ones in it: if it's even, we make it a zero; if it's odd, we make it a one. Then we take those bits in pairs. Zero-zero and one-one, we throw away. Zero-one is a zero; one-zero is a one. So we're basically capturing the transitions only.
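The debiasing step just described can be written out in a few lines of Python (a toy version for illustration, not DakaRand's actual code):

```python
def parity_bit(sample: int) -> int:
    # Reduce a raw timing sample to the parity of its one-bits:
    # an even number of ones -> 0, odd -> 1.
    return bin(sample).count("1") & 1

def von_neumann(bits):
    # Classic von Neumann debiasing over consecutive bit pairs:
    # 00 and 11 are discarded; 01 emits 0, 10 emits 1.
    # Only the transitions survive, which strips simple bias.
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:
            out.append(a)  # 01 -> 0, 10 -> 1 (the first bit of the pair)
    return out
```

Note the output rate is variable: a heavily biased stream emits almost nothing, which is exactly why the talk only *counts* entropy when von Neumann is happy.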
Now, we've been putting lots of data into the hash, but we only count that we got a bit every time von Neumann is happy; we also throw the resulting zero or one in there, just for correctness' sake. Once we've counted 256 bits, we finalize the hash, and we have a 256-bit hash. Now we go ahead and run something called scrypt. scrypt is a time- and memory-hard function. What that means is, you've got some value, and you're going to spend a second churning on it until you get another value. It's just that when the bad guy tries to go ahead and do this, they've got to spend a second on every single attempt. Finally, once we have this thing, we put it into an encryption function, AES-256 in counter mode. We get a stream, and that's the result: that's the entropy we were looking for. So, you want to mess with this? Actually, anyone here want to mess with this? Think they can break this approach? Good. Try. You pick the hardware. You pick the platform. You pick the generator; there are seven of them. They can't all work. Something here must fail in some environment. Here's where you can look. User-space and hypervisor scheduling: it turns out all this stuff is not running in the kernel yet. It's running as a program, and a program only gets visited some number of times a second, not at random intervals. So that might screw up my clocks. A bigger issue is what's called auto-clocking. If you time something against itself, you're going to have a bad time. Okay? Clocks tend to be highly correlated against themselves. So let's say you're in a VM, and the realtime clock is being emulated, and it's the same emulator that's running the monotonic clock that Linux is ultimately depending on. Then you might have an entropy problem there as well. Virtual machines, more than anything else, don't have to directly use this stuff. They should be able to ask their host: give me random bits. And maybe the host does all this same stuff, but it's the host that's doing it.
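The absorb, stretch, and emit pipeline just described can be sketched as follows. This is a hedged stand-in under my own names and parameters: the talk specifies AES-256 in counter mode, but since the Python standard library has no AES, a SHA-256-over-counter stream plays that role here, and the scrypt cost settings are illustrative, not DakaRand's.

```python
import hashlib

def derive_seed(samples) -> bytes:
    # Absorb every raw sample into SHA-256, good or bad alike;
    # hashing extra low-quality data can't make the pool worse.
    h = hashlib.sha256()
    for s in samples:
        h.update(s.to_bytes(8, "little"))
    digest = h.digest()
    # Stretch the 256-bit digest with scrypt, a memory-hard function,
    # so a brute-forcing attacker pays real time per guess.
    return hashlib.scrypt(digest, salt=b"sketch-salt",
                          n=2**14, r=8, p=1, dklen=32)

def keystream(seed: bytes, nbytes: int) -> bytes:
    # Emit a pseudorandom stream keyed by the seed (counter-mode style),
    # never the raw pool state itself.
    out = b""
    ctr = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + ctr.to_bytes(8, "little")).digest()
        ctr += 1
    return out[:nbytes]
```

The point of the last stage is exactly what the talk says later: consumers only ever see keyed stream output, not the generator's raw material.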
So, that being said, I've run this in every VM I can find, and it survives just fine, which is kind of crazy. What about VM cloning? Everyone worries about virtual machines being copied. See, here's the deal. Normally, you only get bits for your random source in the operating system really slowly. So you make a copy of this virtual machine: the pool of bits is exactly the same. You go to pull some out, you get the same bits. My approach says, look, you get different bits every time you go to the well. So as long as each read to my version of randomness begins fresh, you're okay. Now, if your virtual machine gets cloned mid-read, I can't help you; but if it's cloned right before the instruction where you say, give me some random bits, you'll still be fine. Okay, what else could go wrong? Is the underlying use of cryptography safe? Well, we have this thing I'm calling modified von Neumann. We absorb a tremendous amount of low-quality data into our hash. We are throwing crap in there. But here's the deal. [To an audience member:] I hate you so much right now. You went and raised one of your hands. Good man. I'm sorry, by the way. This was supposed to be a two-hour talk. Oops. This is what happens: more talking. All right, check it out, guys. You've got 100 gigabytes of zeros. You've got 128 bits of good, high-quality data, and you hash all of that together. Generally we consider the entropy of the resulting hash to be 128 bits. Meaning you can't get worse, but you might get better. So I think modified von Neumann is going to be okay. As for the fact that I'm emitting the result of a pseudorandom stream instead of the raw output of a random number generator: I note that every attack against random number generators involves looking at large amounts of output from them. So you know what I want to do? Not that. That seems to be the way you break these things. Maybe we shouldn't expose the raw data from our generator. It gets us busted. In fact, I wanted to see just how true that was.
So, normally random number generators don't use cryptographic functions. This is a mistake, because cryptographic functions are just as fast, and we keep having security bugs without them. Someone found at Black Hat that, you know, all those forgot-my-password tokens, the ones in the URLs and whatnot: generally those tokens are generated by one of the random functions in PHP. Really, millions of sites have bad forgot-my-password token generators, because they're using any generator but the OpenSSL one. So the question I had was: can you actually trust the output of a cryptographic stream to survive random number tests? I mean, I assumed so, but let's find out. So I went ahead, thanks to Jamie Schwetman, and spent about 16,000 CPU hours with the Dieharder entropy tests, run across 21 ciphers, with inputs of 16 megs of zeros or 16 megs of /dev/urandom. At first it looked like there was actually the ability to differentiate ciphers and so on. Yeah, data lies a lot. We eventually found the bug. There's different padding in different cryptographic functions, so, long story short, the files were of different sizes. 16,000 CPU hours, and what I was detecting was that these two files are differently sized. This is the less embarrassing way of finding out about that. That being said, we're going to release the data anyway. We're still trying to do some machine learning against it. I actually have a neat little tool; mail me if you think it's useful. It lets you take a comma-separated CSV file and just, like, run SQL statements on it. It dynamically generates a database, executes your query, and gives you statistics. It's called C SQL. It's kind of nice. So this experiment didn't find anything, but we did it anyway. Some kernel recommendations; these are things I'm going to tell the Linux guys to do. /dev/random needs to stop blocking. Just no. No. /dev/urandom doesn't block, and CryptGenRandom on Windows does not block. And when I actually go ahead and get all the certificates for RDP, you know what I'm going to find?
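For a flavor of what suites like Dieharder actually measure, here's a toy monobit check in Python: generate a cipher-style stream (SHA-256 over a counter, standing in for a real cipher, since the stdlib has no AES) and verify the fraction of one-bits sits near 0.5. Real test suites run dozens of far stronger statistics than this.

```python
import hashlib

def sha_stream(key: bytes, nbytes: int) -> bytes:
    # Counter-mode-style stand-in for a cipher keystream:
    # hash key || counter and concatenate the digests.
    out = b""
    ctr = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + ctr.to_bytes(8, "little")).digest()
        ctr += 1
    return out[:nbytes]

def monobit_fraction(data: bytes) -> float:
    # Fraction of one-bits; a good stream should sit very close to 0.5.
    ones = sum(bin(b).count("1") for b in data)
    return ones / (len(data) * 8)
```

Even a key of all zeros produces a stream that passes this check, which is the talk's point: the cipher's output is statistically indistinguishable from noise regardless of the input's quality.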
They're not going to be one-out-of-200 bad. They're going to be fine. Don't be so shy about interrupt sources. I don't care that much about interrupt counts. But when your network card is talking to your PC, talking to your CPU, it is not going to be nanosecond-accurate to where the CPU is. That's just not how it works. So you actually have this thing in Linux called ftrace, a really nice framework for tracing when events happen in the kernel. Right now it's microsecond-accurate. Make it nanosecond-accurate, and then occasionally, randomly, use values from there to seed randomness. This is the appropriate approach. Maybe consider modified von Neumann, but really, the time is over for these classes of entropy-starvation vulnerabilities. It has to stop now. So, that being said, our biggest problems in security do not revolve around random number generators; those are just some of our more annoying ones. They revolve around languages. So we've got this thing called language-theoretic security, which is the idea that security vulnerabilities are the consequences of the languages we're writing stuff in. Coined by Len Sassaman and Meredith Patterson. It has a corollary: if language got us into this mess, perhaps language can get us out. One way of looking at language-theoretic security is through the lens of computability. It says, you know, you just wanted to be able to declare some information, but something bad happened, and now you're running these big programs of a complexity class called Turing completeness. That's a valid lens to think through. But I'm going to give you another one. Who remembers diagramming sentences? Who would have thought failure to understand this would cause the majority of all security vulnerabilities ever found and ever written? It turns out diagramming sentences is serious business. See, we have injection attacks. That means SQL injection, that means cross-site scripting, that means pretty much every web vulnerability out there.
What these attacks actually are, when you get down to it, is that one piece of the system thinks it's sending data, and the other piece thinks it's receiving code. One piece of the system thinks code is coming from another program; the other thinks it's coming from the user. So everything is about tree differences. This is what's going on: you have two different parse trees for all injection types. Now, why am I bringing this up now? Why another theory? Well, this simple stuff is kicking our ass. This is the majority of vulnerabilities. We haven't fixed them. If we had fixed them, they wouldn't still be costing us billions of dollars. So this gives us the rules of the game. If we wish to fix injection vulnerabilities, we want to synchronize parse trees. That mess of a tree you saw earlier needs to be the same on the front-end web server and the back-end database server. They need to see the same thing. And more importantly, we want developers to actually use what we write, okay? There's a term for a language no one speaks. It's called a dead language. So what's neat is that this approach explains things that aren't really well understood. Why did XML become popular? I mean, it was this big, huge, oh-my-God, XML is going to save the world. You know why? It's because developers looked at it and said, wow, if I use this, I don't need to spend four months finding out if the third bit of the fourth byte requires me to change my parser entirely. That was life in the early '90s, and it still is life in a couple regions of the market. Why did JSON become popular? XML figured out its own ways to get fiddly, to be honest. JSON just worked. Eval it and move on with your life. The hard truth is that developers are in charge. It's not me, I'll tell you that much. It's not architects; they love ASN.1 and XML and web services. Not academics; they love Haskell. And management? They love money. Performance, reliability, maintainability, features, rapid development: these are the things that make money.
Security may lose you money later, so management doesn't give a crap. They just say, do what works fastest and best by these metrics, which, oddly, notably don't include security. So what is the number one thing that developers like? Code needs to work. And that's why PHP is so damn popular. Anyone here not think PHP is the most popular language on the planet? Because there's, like, a river right there, and I encourage you to figure out which one. PHP is really good at copying and pasting code. Which is a thing, of course, no one would ever do, but everyone always does. You know, you go to the documentation, you have some task, like, okay, I've got to read out of the database. How do I do that? Copy, paste, done. People say, oh, that's the wrong way to do it. I'm going to do it in Java. I'm going to do it in my IDE, my integrated development environment, which is short for a thing that moved copy-and-paste from the edit menu to the file menu. There is a metric of quality for languages that no one measures, which is: when I try something, how often does it actually work? See, normally we ask, does it fail fast? Does the compiler find the bug quickly? Is it discovered in test? No. Devs aren't looking for code that fails fast. They want it to work the first time. And it turns out PHP is an amazing language for that. It just goes. Almost no one tracks this metric of whether it works or not. Processing does, and it's an amazing language for it. And this is why all the successful languages are the brainstorm of one guy. It's Guido for Python, it's Larry for Perl. There's one guy who has it in his head. Art is science before we know what we're doing. The languages that are popular are artistic endeavors, generally by one person, supported by others, but one guy has the vision. PHP beats your favorite language. Which means, if you want to fix security (which you may not; enjoy the status quo), PHP is the most important language in the world to repair.
So what's wrong with it? Well, we've got these things called object-relational mappers, where you don't even go out to a database. You never write database queries with SQL. You just use normal methods from PHP or whatever language you're in, and it makes it work on the back end. This is amazing and wonderful, and I love seeing it. The stuff works right up until the moment where you need to ask a question. Anyone familiar with that? Who knows what language this is? Brainfuck. Check it out, guys. Brainfuck has a rejoinder: there are more things in this world broken by punctuation than just Brainfuck. It's not alone. Look at this ORM query, the $result = from($name)->in($names)-> business: 32 characters of punctuation, nested like an insult. Compare that to this: result = query("select name from names where length(name) < 5"). I have 12 characters of punctuation there, and huge gaps. Which would you rather write? It turns out this really matters. It turns out that SQL is a language that's really good at structured queries. This matters. Now, you've got some classic things that are supposed to fix this. We've got escaping. Now, does anything say "we don't give a shit" like a 25-character command? We have bigger problems. Escaping fails open: if you just don't write it, nothing complains. It's a blacklist, and when's the last time a blacklist actually worked? Supposedly we can do this thing called parameterization. What you do is you write a template. You say, I'm going to put these little holes here, and later I'm going to fill in the holes: this is what I put in for the first hole, this is what I put in for the second hole, and so on. Maybe, if you're lucky, you get to have aliases: you say, I'm going to have a hole, but I'm going to name the hole, and I'll tell you what the hole's name is. So you've got this big batch of code where before you had a single line. How popular is this?
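Mechanically, the holes and the named holes look like this. A minimal sqlite3 sketch; the talk's context is PHP/MySQL, but the parameterization API has the same shape, and the table contents here are invented.

```python
import sqlite3

# "Template with holes": the query structure is fixed up front, and values
# are bound into the holes afterward, so they can only ever be data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE names (name TEXT)")
db.executemany("INSERT INTO names VALUES (?)", [("al",), ("beatrice",)])

# Positional holes: fill in the first hole, then the second, and so on.
rows = db.execute(
    "SELECT name FROM names WHERE length(name) < ? AND name != ?",
    (5, "bob")).fetchall()
assert rows == [("al",)]

# Named holes (the "aliases"): you name the hole and refer to it by name.
rows = db.execute(
    "SELECT name FROM names WHERE length(name) < :maxlen",
    {"maxlen": 5}).fetchall()
assert rows == [("al",)]

# And the part you *can't* parameterize: a hole where SELECT is.
try:
    db.execute("? name FROM names", ("SELECT",))
    assert False
except sqlite3.OperationalError:
    pass  # holes are for values only, never for query structure
```

The last stanza is the limitation the talk calls out: holes exist only where the grammar allows a value, so structural keywords can never be parameterized.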
Well, nobody has ever written a parameterized query in their life without a gun to their head. We know, because we hold the gun. Even in code that audits as secure, the tendency is that the safe things are whatever got written quickly, and only under duress does anyone go through all this work. We threaten to fire people if they don't write it right. That's a data point, guys. If you need to threaten people, maybe you could make a language where you don't have to threaten people. Not to mention, for some reason, databases don't particularly enforce any of this too hard. And there's a lot of stuff that you just can't parameterize. Go ahead, try to have a hole where SELECT is. It's not going to work. SQL, for all of its elegance, builds this really complicated parse tree, and sometimes you get to do stuff to make it safe against injection, and a lot of times you just don't. So I released this thing in 2010 called Interpolique, and it was basically a way of saying: we're going to let devs write code the way they want to write it, and then we're going to figure out how they should have written it, and we'll actually fix it compiler-side. What we're looking at is the fact that you see a string, select * from foo where x=$x and y=$y, and humans can see the separation between data and code. Code is everything that doesn't have a dollar sign in front of it. Data is everything that does. But languages were throwing this out. They were just taking the value of the variable x, putting it in, making a single string, done. The information about the separation has been lost forever. What if it wasn't? What if we kept this data around? So what I did was have this alternate syntax using carets instead of dollar signs: ^^x and ^^y. Then I had a function that was a code generator, and it would look at that stuff and say, anything that begins with a caret, that's a parameter.
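Here is a toy Python rendition of the caret idea just described: treat every ^^name as a parameter, pull its value from the caller's variables, and emit a parameterized query. The function name and mechanics are my own illustration; the real Interpolique was PHP-side code generation.

```python
import re
import sqlite3
import sys

# Toy Interpolique: anything written as ^^name in the query string is a
# parameter, pulled from the *caller's* variables (the "self-scoped" trick)
# and turned into a hole. Invented names, for illustration only.
def safe_query(db, template):
    caller = sys._getframe(1).f_locals          # caller's variables
    names = re.findall(r"\^\^(\w+)", template)  # every ^^name is a parameter
    sql = re.sub(r"\^\^\w+", "?", template)     # replace each with a hole
    return db.execute(sql, tuple(caller[n] for n in names)).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE foo (x TEXT, y TEXT)")
db.execute("INSERT INTO foo VALUES ('1', '2')")

x = "1' OR '1'='1"   # hostile value: stays data, never becomes code
y = "2"
assert safe_query(db, "SELECT * FROM foo WHERE x=^^x AND y=^^y") == []

x = "1"
rows = safe_query(db, "SELECT * FROM foo WHERE x=^^x AND y=^^y")
assert rows == [("1", "2")]
```

The dev writes the one-liner they wanted to write; the rewriting happens behind their back.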
I'll put a hole there and I'll fill the hole. And then you have to eval, meaning run, inline, code that arrived as text: evaluate this new safe function. This works. This works really well. Among the things that are scary about it: what if the developer puts a variable straight into the statement that's going into the b() function? Say they do x=$x instead of my funky ^^x. Now the attacker is one step away from an evaluation of attacker-controlled code. That b() function in the middle: the text that it returns is going to be run as arbitrary code. It could look at what came through and say, maybe I can predict what PHP is going to do when it runs this code. Yeah, that's a bad idea. That's not going to work. Predicting what some other language is going to do given arbitrary input is a fool's errand. So this is what we're actually doing. And Daniel Zulla, who has an amazing talk coming up on Sunday, by the way, has actually implemented this. The thing that does it is called a self-scoped function. Self-scoped functions run with access to all the variables from where they were called. This is heresy in language development. It's also a really good idea. So the idea is that we have this function, mysql_safe_query, and it says select * from foo where x=^^x and y=^^y. Inside that function, it can get the variable for x and it can get the variable for y, and now it can pull those out, make them holes, make it a parameterized query, and run it, just like that. So the dev runs this and the code does what it's supposed to. That's kind of cool. What other things could we do? Well, there's some argument for code rewriting: if we know what devs should write, why don't we rewrite what they wrote and be done with it? Has anyone here ever audited auto-generated code? I'm just not a fan. There's a tainting approach that Zulla has been thinking about, and I kind of like it.
The idea is, when characters come in from the web, you mark them: these are web characters. So as they flow through the application, they're always marked as having come from the web, and then your mysql_safe function says, wait, anything that came from the web is data; that's what I'll parameterize. It's kind of a variant on tainting. This is a neat thing that can work. I've been working with the guys over at Etsy, who have a hilariously good security team. What we've been talking about is based on what they're already doing: instead of doing magic underneath the character set, you know, expanding what the string object is, just hex-encode the stuff, like we would on the web, %41 for an A. And now it goes all the way through the application, you get to the mysql query, it goes to execute, and it says, aha, here are some hex bytes. These bytes are parameters that came from the web. We'll go ahead and parameterize those. This work could even be done in MySQL itself. We could modify the parser in MySQL so that when it got bytes marked in such a way, they could only be treated as a string and could never be understood as code. The final alternative is actually kind of interesting, and this is really a last-minute add. Instead of marking the dangerous stuff as dangerous, we mark the safe stuff as safe. So you actually mark all SQL code using some sort of marking, even if it's just hex encoding. The reason you do this is because the bytes from the network flow through a web application 83 ways from Sunday; they're transformed, they're moved from function to function to function. This is why we have these things called static analyzers: they've got to track all this crap. But the actual code, the SQL, tends to sit right before the actual execution. So that is a thing we could do: mark the actual SQL and let the other stuff go raw. This is not a bad thing.
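The hex-marking scheme can be sketched end to end. The web$...$ marker format here is invented for illustration; a real deployment would pick something collision-proof and do this at the framework edge.

```python
import binascii
import re
import sqlite3

# Sketch of the hex-marking idea: web input is hex-encoded at the edge with a
# marker, flows through the app as boring [0-9a-f] bytes, and only the query
# layer decodes it, always as a parameter, never as SQL. The web$...$ marker
# format is invented for illustration.
def from_the_web(raw):
    # At the edge: encode attacker-controlled bytes so nothing downstream
    # can mistake them for code.
    return "web$%s$" % binascii.hexlify(raw.encode()).decode()

def safe_execute(db, sql_with_marks):
    # At the query layer: every marked run becomes a parameterized hole.
    params = [binascii.unhexlify(m).decode()
              for m in re.findall(r"web\$([0-9a-f]*)\$", sql_with_marks)]
    sql = re.sub(r"web\$[0-9a-f]*\$", "?", sql_with_marks)
    return db.execute(sql, params).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (name TEXT)")
db.execute("INSERT INTO t VALUES ('alice')")

evil = from_the_web("alice' OR 'a'='a")
assert safe_execute(db, "SELECT name FROM t WHERE name = %s" % evil) == []
good = from_the_web("alice")
assert safe_execute(db, "SELECT name FROM t WHERE name = %s" % good) == [("alice",)]
```

Note the payload can be concatenated, passed around, and logged like any string; it only regains meaning as data at the last moment, right before execution.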
That's what LangSec means. What are people trying to express? How can we make it easier to say it? What errors will people make when they try to use our stuff? If you don't care about the crap that you're putting developers through, why the hell should they care about you? So we've got two more things to talk about, and I think 20 minutes left to do it. Suck it. It doesn't matter what code you write if there are parties in the middle that are blocking what you send. Content alteration and blocking is actually becoming a bit of a thing. Like Verizon claiming the First Amendment right to edit your First Amendment rights. Entire countries are blocking web pages. It would be nice if we could figure out what's going on here. So last year I wrote a really, really fun piece of code called N00ter. It did some crazy under-the-surface mad packet tricks to impersonate a stream between some client and some test node out there on the internet. I could impersonate any protocol, any host. It was brutal. It would find everything. Unfortunately, because of its low-level packet nature, I basically needed to ship hardware: an actual physical box that would sit at some customer site. Has anyone here ever shipped hardware? You guys are smarter than I am. That is a miserable experience. You know what's way easier than shipping hardware? Shipping web pages. That's pretty cool. So what else could we do? You could figure, ship executable code, you know, a setup.exe install. OONI-probe takes that approach. There's a web page approach using iframes, where you get a little window that says, in this window, do you see this website? Herdict does that. It either needs user cooperation or a Chrome extension. Is it possible to determine whether content is up or down based on nothing but a web page? Can we crowdsource censorship data? If we wish to do this, that means we want to maximize the data we get per user.
We want to minimize the installation load per user. Okay, what can we do? Well, the browser's same-origin policy usually prevents web pages from doing much to read one another. You wouldn't want Yahoo able to read your Gmail just because the two windows were open at the same time. But there is one exception to this same-origin policy, and the exception is images. Domains are allowed to load images from one another. Beyond that, they're allowed to know that a load was successful. Not merely that there was a particular file at that location, but that it was an image, the image rendered, and here are its dimensions. You actually learn quite a bit. Now, here's the thing. If a website is being censored, images from it are not going to load. And what one image is on like every domain on the internet? favicon.ico. This is the little icon in the upper left-hand corner of your tab. It used to be for bookmarks; that's why it was called the favicon. Pretty much everyone's got one of these. So what we do is we try to load the thing, and if it's there, great. And if it's not, we say the site is blocked. So we've got this site. It's actually up today. I wrote it with Joseph Van Gef and Michael Tiffany at the Wall Street Journal Data Transparency Hackathon, and it is themed like Minesweeper. If you hit play and the image loads, it will drop the favicon. If not, it will drop a bomb. Works kind of nicely. So what's going on behind the scenes? You create an image object in JavaScript: image = new Image(). You create an onload handler. You create an onerror handler. You do this a bunch in parallel, reading from a list of sites that have been validated to have favicon.ico files. Six failures are required before a bomb is dropped on the map. Is this enough? No. No, it's actually not. Web browsers provide really crappy feedback into what's actually going on in their network stack. Flow control really doesn't exist; they just assume they have infinite bandwidth.
My friends told me this was true when they were doing stuff back on modems. I didn't believe them. They were right. For actual reliability, to the point where we could have genuine worldwide monitoring and trust the data we're getting back, we basically need to shut down all other probes while doing final validation, and then do a probe just for something that is known to be up and might be down. And even then, we're going to have a few tenths of a percent that are wrong anyway. That being said, Censorsweeper does work pretty well. Can we do better? All right. Well, let's talk about Flash and its sockets. Once upon a time, in 2007, you could turn any web browser you wanted basically into an open proxy to the internet. You could just say, hey, web browser, go to this site, get me this content, speak this protocol. It would do it for you. It was a bug. We got it fixed. However, networks are more complicated than they may appear at first. We fixed the browsers, we fixed the plugins, so they can only speak back to our own IP. However, there are boxes in between the web browser and our own IP. They're called transparent proxies. Coincidentally, censoring systems are often implemented with transparent proxies. So what this means is that we actually have a thing that is nicely placed for us to start probing. Because what we can do is use Flash, or Haxe, which is the better language to code Flash in, and create a connection back to ourselves on port 80. And then we tell ourselves, hey, I'm looking for Facebook. I'm looking for YouTube. I'm looking for BitTorrent. I'm looking for all this stuff. And it's coming back to our IP address, but it's intercepted by the transparent proxy. The transparent proxy decides to serve whatever the heck it wants. It sends that back to our Flash application, and then we can see the content and go from there. Cool. So, going beyond that. And I have 13 minutes. I'm watching this clock.
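The probe shape can be sketched with both ends simulated locally. In the real system the client side is the Flash/Haxe applet running in a faraway browser, the server is our own machine, and anything other than our marker coming back means a middlebox answered for the Host we named.

```python
import socket
import threading

# Sketch of the transparent-proxy probe: the client connects back to *our
# own* server but names someone else's Host. If a transparent proxy sits in
# the path, it answers for facebook.com; if not, our marker comes back.
# Both ends are simulated locally here, purely for illustration.
MARKER = b"HTTP/1.0 200 OK\r\n\r\nthis-is-really-our-server"

def serve_once(sock):
    conn, _ = sock.accept()
    conn.recv(4096)          # read the probe request
    conn.sendall(MARKER)     # a real proxy would answer with other content
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

def probe(host_header):
    # The request goes to our IP, but names a possibly censored site.
    req = ("GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"
           % host_header).encode()
    c = socket.create_connection(("127.0.0.1", port))
    c.sendall(req)
    data = b""
    while True:
        chunk = c.recv(4096)
        if not chunk:
            break
        data += chunk
    c.close()
    return data

reply = probe("facebook.com")
assert b"this-is-really-our-server" in reply   # no proxy interfered here
```

The diagnostic logic is just string comparison: our marker means a clean path, anything else means a box in the middle spoke up.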
You're not taking it from me. So just as HTTP traffic on port 80 is hijacked, so may HTTPS traffic on 443 be. It turns out not only can you get web content, you can see what certificates are being exchanged. Turns out, when you make an SSL request, you can actually name the site that you would like. The box at the other side can decide, based on what you said, to either pretend to have that certificate, or even proxy to the real Facebook, the real Yahoo, or whatever. The transparent box in the middle that may or may not be replacing certificates doesn't know that it's not actually talking to the browser. So it goes and provides the certificate, and that goes to the Flash applet, and the Flash applet can bust you. So that's kind of cool. Normally, when you ask the browser DOM, it can't tell you what certificate was at a given site. So this is actually the first mechanism that will tell you, for this site, what cert is there. There's a limitation: if the hijacking is at the DNS layer and not at the IP layer, then you won't go to the correct IP address to get the interposed certificate. So, I am not going to be running the big certificate databases, but unnamed parties who are will be getting this code, and I'm sure lulz will be had. Full proxying is also possible. I'll kind of skip past this, but you can basically route every single protocol, have a full HTTP proxy bouncing from a server through a client back to a server. Why would you do this? Because while I could try to emulate all the weirdness to play a YouTube video stream over a Flash application, it's a lot easier to just have a real browser really play it through the Flash side. So that's kind of neat. So the last thing I want to talk to you guys about is scanning networks quickly. There's a thing that everyone should know. It's my definition of actionable intelligence.
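The SNI side of this can be sketched with Python's ssl module: name one site, connect to whatever IP you like, and keep whatever certificate comes back. Verification is deliberately off, since capturing bogus certs is the whole point. Function names here are my own.

```python
import socket
import ssl

# Sketch of the SNI certificate probe. We connect to an IP and name some
# other site in the TLS handshake; whatever certificate comes back is what
# any middlebox chose to show us. Verification is deliberately disabled:
# the goal is to *capture* mismatched or forged certs, not reject them.
def make_probe_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False          # we *want* mismatched certs
    ctx.verify_mode = ssl.CERT_NONE     # and unverifiable ones too
    return ctx

def grab_cert(ip, port, claimed_hostname):
    # server_hostname puts claimed_hostname into the SNI extension,
    # regardless of which IP we actually dialed.
    ctx = make_probe_context()
    with socket.create_connection((ip, port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=claimed_hostname) as tls:
            return tls.getpeercert(binary_form=True)  # DER bytes

ctx = make_probe_context()
assert ctx.verify_mode == ssl.CERT_NONE
assert ctx.check_hostname is False
```

Comparing the DER bytes returned for the same claimed hostname from different vantage points is exactly the certificate-replacement check described above.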
What can an attacker do today that he couldn't do yesterday, for what class of attacker, to what class of victim? If you write an advisory that does not answer this question, it's a bad advisory. What changed? And a big part of determining how big a deal a bug is, is: how many systems actually have this bug in the first place? I've run two major scans for two major pre-announced bugs this year. One was the Telnet encryption bug. One was the RDP attack. And the question was, were either of them widespread? Telnet? Nowhere. There's like a tiny, tiny number of hosts running it. RDP? Yeah, like five million hosts were running that. That was actually a big deal. So this is important. If we're going to rationally respond to bugs, we should know: does anyone actually have them? So how do you actually do these scans? Once upon a time I wrote a network scanner called ScanRand. It was fast because, at the time, I didn't actually know how to slow down. It just kept running in a loop, and it worked. So I send a bunch of connection requests, TCP SYNs. This is now not enough. There are a whole bunch of things that will respond to the SYN with the proper SYN-ACK, but nothing is there. They answer the door, but nobody's home. So you basically have to split your process. You get a bunch of candidates through the old method, where you see if the three-way handshake actually works. And then, only then, you go and do real connections to those IPs. So, a little more detail. You send the SYNs, and you increment the first octet first: 1.1.1.1, 2.1.1.1, 3.1.1.1, and so on. This means the larger the network you're scanning, or the larger the slice of the internet you're scanning, the slower you're scanning each individual network, because you're spending your time somewhere else. So it increases your success rate. Then, in a separate window (I don't even use ScanRand to capture the data), I actually have tcpdump capturing SYN-ACKs.
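The octet-order trick is easy to sketch: spin the first octet fastest, so consecutive probes land on unrelated networks and no single network sees a burst.

```python
# Sketch of the "increment the first octet first" ordering: walking
# 1.1.1.1, 2.1.1.1, 3.1.1.1 before ever touching 1.2.1.1 spreads
# consecutive probes across unrelated networks.
def sweep_order(octet_range):
    # Yields addresses with the *first* octet spinning fastest.
    for d in octet_range:
        for c in octet_range:
            for b in octet_range:
                for a in octet_range:
                    yield "%d.%d.%d.%d" % (a, b, c, d)

first_nine = []
gen = sweep_order(range(1, 4))
for _ in range(9):
    first_nine.append(next(gen))

assert first_nine[:3] == ["1.1.1.1", "2.1.1.1", "3.1.1.1"]
assert first_nine[3] == "1.2.1.1"   # only then does the second octet move
```

The bigger the address space fed in, the longer the gap between two probes hitting the same network, which is exactly the success-rate effect described above.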
There's a neat little command that will do that. And then, once I have my candidates: in the Telnet encryption case, the Nmap guys were so kind, they whipped up a quick check for me. I just fed my IP list to it, and very few were found. When I was scanning RDP, there's an excellent framework called Black Mamba. It's at root food.org. You basically write really simple, straightforward code, and every time you're going to stop in the middle, you yield. So you say yield connect: come back after you get a connection. Then come back after you're able to write these bytes. Come back after there's a response. And Black Mamba makes it work very, very nicely. This gives you about 3,000 IPs a second. I want more. So, I've always wanted to write a user-space network stack: a program that directly talks to the network and retrieves content. HD Moore kind of kicked me into high gear on it. He's got some mysterious new scanning project called Critical.IO. I am not at all beyond supporting any crazy project by HD Moore. Heck yeah. So ScanRand 3 will be out shortly. ScanRand 3 is a new scanner. It doesn't just flood SYNs; it actually connects to nodes and extracts data. The original plan was to have a database inside of ScanRand. I was going to use SQLite. I thought it would be funny, just humorous, to have a network stack where I'm like: select * from sockets where data_sent != data_acknowledged and now() - data_sent_time > 3. To have a database lookup figure out where you needed to send a retransmit. That would be cool. SQLite was useful for this because it's really, really fast, if there's no index. That speed disappears if you add indexes. So if you actually want to be able to query the database, SQLite is no longer fast. So let's just not do that. Can't someone else do it? So check this out. I didn't think this was possible, but it actually works.
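The yield-at-every-block style described for Black Mamba maps directly onto modern Python asyncio, where await plays the role yield did. Here is a sketch of one scan step; the port choice and banner protocol are placeholders, and this is an analogy to that framework, not its actual API.

```python
import asyncio

# Sketch of the Black Mamba style: write straightforward sequential code and
# pause at every point you'd block (connect, write, read) while the event
# loop runs thousands of these concurrently. `await` here stands in for the
# talk's `yield`.
async def check_one(host, port, results):
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=3)   # pause: connect
        writer.write(b"\r\n")                                  # poke it
        await writer.drain()                                   # pause: write
        banner = await asyncio.wait_for(reader.read(64), 3)    # pause: read
        results[host] = banner
        writer.close()
    except OSError:
        results[host] = None
    except asyncio.TimeoutError:
        results[host] = None

async def scan(hosts, port):
    results = {}
    await asyncio.gather(*(check_one(h, port, results) for h in hosts))
    return results

# Scanning only ourselves here; a real run feeds in the SYN-ACK candidates.
out = asyncio.run(scan(["127.0.0.1"], 9))   # port 9 is almost surely closed
assert out["127.0.0.1"] is None
```

Each coroutine reads like a straight-line probe; the concurrency comes entirely from the pauses, which is the property that made the framework pleasant to write against.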
ScanRand did not get its speed by keeping track of all the machines it was speaking to. So why should ScanRand 3? We're going to build a stateless TCP stack. It just sends data. It doesn't remember to whom, and the other guy remembers that he's talking to me. The way you do this: you send a SYN, and you set the maximum segment size and window size to 1460. What this means is that at any given point, there's only going to be one packet outstanding at a time. So now, whenever you receive a SYN-ACK (okay, we get to talk), you send your payload and you send the acknowledgment. Every time you get another acknowledgment with a payload, you acknowledge that. You ignore anything with no payload, and when you get a FIN, you send a reset. You say: I'm done with you. Don't remember me anymore. In every situation except the original SYN being dropped, the other side is keeping the state; if there's a drop, well, whatever, he'll clean it up for you. I can take 3.25 million IP addresses, I can sweep them from my little $200 box on a really nice pipe, and I can get 800 megs of HTTP data. This takes 75 seconds. And this is not even optimized code, right? The goal is for this to get to about 200,000 IPs a second. You end up doing stream reconstruction with a SQL query, which is kind of fun. Security? Right now I've implemented none of it, just like your average developer. For security, you basically put cookies into the sequence number and source port, and maybe the TCP timestamp. The idea is you put little secret data in, so you know that when someone's responding to you, they're actually responding to you, and they're not faking responses, you know, saying that Google's got a bunch of spicy stuff. Some notes: this may not be necessary. Kernels have gotten kind of fast. Non-blocking connect plus epoll should probably be able to do at least tens of thousands of sockets per second, and it'll certainly be easier to code for.
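The sequence-number cookie can be sketched as an HMAC over the connection's identity; the exact field layout here is my own illustration of the idea, not ScanRand 3's actual format.

```python
import hashlib
import hmac
import os

# Sketch of "cookies in the sequence number" for a stateless scanner: derive
# the initial sequence number from a secret plus the target, so a response is
# only believed if its acknowledgment number proves the other side really saw
# a SYN we sent. Field layout invented for illustration.
SECRET = os.urandom(16)

def isn_for(dst_ip, dst_port, src_port):
    msg = ("%s|%d|%d" % (dst_ip, dst_port, src_port)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")      # 32-bit sequence number

def response_is_ours(dst_ip, dst_port, src_port, ack_number):
    # A SYN-ACK acknowledges our ISN + 1; anything else is forged or stale.
    return ack_number == (isn_for(dst_ip, dst_port, src_port) + 1) % 2**32

isn = isn_for("93.184.216.34", 80, 40000)
assert response_is_ours("93.184.216.34", 80, 40000, (isn + 1) % 2**32)
assert not response_is_ours("93.184.216.34", 80, 40000, (isn + 2) % 2**32)
```

No per-connection table is needed: the packet that comes back carries everything required to recompute and check the cookie, which is what keeps the stack stateless.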
But my approach eventually becomes the fastest, because I'm just really lazy. I'm not doing anything; the other guys are. The biggest performance advantage I'm going to get is when I get what's called writev, write vector, where I can give the kernel a whole bunch of packets to send instead of just one. A couple more notes. I can try more efficient stores in SQLite. I can do a giant allocation of RAM and just have fixed offsets per IP. There's a cool project called MemSQL from these old Facebook guys; they're basically converting SQL to C++. So that might actually work. We'll find out. And then there's a merged approach, where I only keep state if I get a certain distance into a session. Now, there are a few servers on the internet that don't actually keep state either. They say, well, the client will keep track, and I'm almost done anyway. These servers tend to be run by Google. And yeah, if the client thinks the other guy's going to do it, and the server thinks the other guy's going to do it, it's a bad day. The most important feature that isn't written yet: I need to implement blacklist support. Most networks don't complain when you scan them. Those that do, you've got to honor the request, and it actually does require a fairly efficient data structure so the check doesn't eat all your speed. Oh, and don't mess with the firewall. Just have your code get its own IP. Your little program ARPs for itself. So that's lots of stuff. I hope you like it. I'm out.
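The efficient blacklist structure might look like sorted integer ranges plus one binary search per candidate IP. A sketch; example ranges are invented, and overlapping ranges are assumed to have been merged beforehand.

```python
import bisect
import ipaddress

# Sketch of efficient do-not-scan support: store blacklisted CIDRs as sorted
# (start, end) integer pairs and answer each lookup with one binary search,
# so the check doesn't eat the scanner's speed. Example ranges are invented;
# overlapping ranges are assumed merged beforehand.
class Blacklist:
    def __init__(self, cidrs):
        nets = sorted(ipaddress.ip_network(c) for c in cidrs)
        self.starts = [int(n.network_address) for n in nets]
        self.ends = [int(n.broadcast_address) for n in nets]

    def blocked(self, ip):
        x = int(ipaddress.ip_address(ip))
        i = bisect.bisect_right(self.starts, x) - 1   # candidate range
        return i >= 0 and x <= self.ends[i]

bl = Blacklist(["10.0.0.0/8", "192.0.2.0/24"])
assert bl.blocked("10.255.1.2")
assert bl.blocked("192.0.2.7")
assert not bl.blocked("8.8.8.8")
assert not bl.blocked("192.0.3.1")
```

At 200,000 IPs a second, the per-probe cost is one integer conversion and one bisect, which is cheap enough not to show up in the loop.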