How are we feeling today? Good. Well, thank you all for coming. These guys have a real unfair advantage as far as promoting their talk goes, because everybody has come in since yesterday morning and said, what in the world is that thing? So it looks like the promotion worked. Congratulations. Kind of an expensive billboard, I guess. So let's learn about how machines are going to replace us all, right? Let's give these guys a big hand. Hello, DEF CON. Thank you for the intro. Yes, we did put a high-performance computer in the hotel this year. We decided that if we were going to bring it back next year, it was good to test. My name is Mike Walker. I'm a program manager for the Defense Advanced Research Projects Agency, DARPA. I'm joined today on stage by Jordan Wiens from Vector 35, which is one of the companies working to build visualization for Capture the Flag and the Cyber Grand Challenge. Hi everybody. I want to point out the URL on the front-left screen, if you're looking at the stage. If we're going to talk about Capture the Flag, we should let you guys play one. So if you go to that URL (you might want a laptop; it will help), there's some source code, a binary, and connection information. Have at it. Hopefully some of you have fun. If not, we may have to give you some hints later on, but you guys shouldn't need them. One little bit of advice: if you connect to it, try not to leave a long-running session, because we're going to visualize what's happening. You'll see why later. So we're running a live Capture the Flag during the talk. Hopefully some of you can hack it and capture the flag. We have a little 3D-printed logo for the first to do it. In the meantime, I want to talk to you a little bit about why we're here. Now, when I walk around DEF CON and say I'm from DARPA, the first thing people say is that they still like ARPANET, and that TCP/IP is holding up well.
We also hear about Cyber Fast Track, Mudge's program from a few years ago. But we are also known for our history of challenges. Starting in 2004, DARPA began holding open grand challenges: global competitions, first in self-driving cars. We offered a million-dollar prize to the first team that could drive an autonomous car across a desert course, and later moved to the 2007 Urban Challenge to try to navigate city traffic. More recently, we did disaster-rescue robotics. Again, a global open competition to develop technology that doesn't really exist yet. So we're here today to talk about bringing autonomy to the hacker sport of Capture the Flag. This talk in a nutshell: next year we're going to take this room, knock down those two air walls, make it three times as big, install seats, and hold a free live event where machines play Capture the Flag against each other in real time, with sportscasting and visualization. Imagine a gigantic e-sports event where all the contestants are machines. So we're here to talk a little bit about how that's going to work, how enormously hard the game they're going to try to play is, and why it's so important. If we're going to talk about computer security and autonomy, we have to recognize that computer security is an adversarial contest of the mind. Bruce Schneier and Dan Geer talk about it as a field defined by an intelligent opponent. And computers have been playing adversarial contests of the mind for a while. We can start with checkers. This is a simple one, because today checkers is a solved game. That means we're actually able to write down every single position in a database, all 10 to the 20th positions, and solve for what perfect play looks like. It turns out that once checkers was solved, the conclusion was that with perfect play, the best result is a draw. The only winning move is not to play. But the big game is chess.
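"Solving" a game means computing the result of perfect play from the starting position. Checkers needed years of computation and huge endgame databases, so here is a minimal sketch of the same idea on a game small enough to brute-force in seconds: tic-tac-toe, solved by plain minimax with memoization. The board encoding (a 9-character string, "." for empty) is an assumption for illustration. Like checkers, it comes out a draw.

```python
from functools import lru_cache
from typing import Optional

# The eight winning lines of a 3x3 board, cells numbered 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def solve(board: str, to_move: str) -> int:
    """Value of a position under perfect play: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # full board and nobody won
    nxt = "O" if to_move == "X" else "X"
    values = [solve(board[:i] + to_move + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == "."]
    # X picks the best outcome for X; O picks the best outcome for O.
    return max(values) if to_move == "X" else min(values)


if __name__ == "__main__":
    print(solve("." * 9, "X"))  # 0: with perfect play, the game is a draw
```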
Chess was proposed as a grand challenge for machines in 1950 by Claude Shannon, the father of information theory. Getting a computer that could beat the very best people at chess took 47 years: it wasn't until 1997 that Deep Blue beat the reigning world chess champion. But there were milestones along the way. Before computers could beat people at chess, they had to play each other. In 1970, the ACM created an all-computer chess league just so computers could play each other, get a little bit better, and learn the foundational lessons: a prototype competition. Seven years later, in 1977, one of those competition computers beat a grandmaster for the first time. It was foreshadowing things to come. So that's chess. Let's talk about some harder games. This is Go. Recently, within the last five or six years, Go computers have started to be competitive with the very best ninth-dan masters, but on this board, which is a beginner board, nine by nine. This is not the board masters play on. They play on this one, 19 by 19. For 19 by 19 we're throwing around numbers like 10 to the 761st power; for comparison, 10 to the 84th power is a pretty good estimate for the number of atoms in the known universe. So that's not infinite, but it is not possible to reason about exhaustively. And on 19 by 19 boards, computers don't have a chance against the best Go players. So as we start talking about search spaces that are effectively infinite, bigger than any computer we can build, machines start to break down when they play against us. This might surprise you, but every game I've talked about so far is actually a very simple game. We're in Las Vegas, so let's talk about a real game. Well, we appear to have locked up, so I'm going to invite Jordan over to deal with the gods of the live demo, and I'm just going to keep talking. So let's talk about poker. Poker is a very difficult game for machines to play well.
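To put the two Go board sizes mentioned above in perspective, a crude count helps: if every point can be empty, black, or white, an n-by-n board has at most 3^(n·n) configurations. This is an assumption-laden upper bound (it ignores which positions are legal) and it counts board configurations, not the game sequences behind the larger figure quoted in the talk; the atom count is the speaker's 10^84.

```python
import math


def log10_board_configurations(side: int) -> float:
    """log10 of 3**(side*side): each point is empty, black, or white.
    This ignores the rules, so it is only a crude upper bound on legal positions."""
    return side * side * math.log10(3)


ATOMS_LOG10 = 84  # the speaker's estimate for atoms in the known universe

for side in (9, 19):
    exp = log10_board_configurations(side)
    print(f"{side}x{side}: about 10^{exp:.1f} board configurations "
          f"(atoms: 10^{ATOMS_LOG10})")
```

Even this loose bound gives roughly 10^38.6 for the beginner board and 10^172.2 for the full board, far past anything exhaustive search can touch.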
And it's difficult for a variety of reasons. The first is that it is multi-opponent. You don't have a single opponent in poker. If I'm at a table playing poker and I'm a little fish, and at the table is another little fish and a big fish, it behooves me to get the big fish to eat the other guy, which means I need to be able to model player-versus-player interactions that are not my own. Thank you. Keep the mic? All right, I'll keep the mic. Oh yeah, we can switch that back to the slides. We'll do it. Additionally, this game is non-zero-sum. What does that mean here? Very simply: if I play 10 games of chess against an opponent, win every single one, and lose the 11th, I'm winning. And you can actually get an education on this if you want. If you win 10 hands of poker in a row for $1 each and lose the 11th for $1 million, you're not winning. So it's actually very difficult for machines to reason about margin of victory and margin of defeat. Additionally, poker is a game of incomplete information. There are a lot of things I can do in a game of chess, but I can't bluff, because you can see all my pieces and I can see all of yours. But in poker, every player has secrets, which means that as a player you have to keep a statistical, probabilistic model of what your opponent holds and the moves they can make throughout the game. So how are computers doing against humans at poker, given these challenges? Well, this year, four top poker players faced off against a poker-playing machine. Three of them beat the machine, and the margin of victory on the human side was 700,000 chips. Do not reproduce. Do not send your robot to Vegas with your money. Since we're not just in Las Vegas, since we're at DEF CON, let's talk about a really hard game: Capture the Flag. Capture the Flag is being played right now in the Bally's conference center with about 15 teams. You can tell from the get-go that this is multi-opponent.
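The poker example above is really about which score you optimize. A minimal sketch of the arithmetic, using the talk's hypothetical ten $1 wins and one $1,000,000 loss: counting wins (the chess habit) says you're ahead, while summing the money says otherwise.

```python
def win_rate(results: list[int]) -> float:
    """Fraction of hands won: the chess-style way of keeping score."""
    return sum(1 for r in results if r > 0) / len(results)


def net_result(results: list[int]) -> int:
    """Total dollars won or lost: the score that actually matters in poker."""
    return sum(results)


hands = [1] * 10 + [-1_000_000]  # ten $1 wins, then one $1,000,000 loss
print(f"won {win_rate(hands):.0%} of hands, net {net_result(hands):,} dollars")
# won 91% of hands, net -999,990 dollars
```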
It is a live network exercise. It is a big team sport. So let's talk about what those teams are doing. Imagine that you have a friend who is not the worst C coder, but he writes lots of bugs. He's written a whole bunch of new servers, new services, and he's given you a server that's running all of them and said, hey, I want you to plug this server into a network filled with the best hackers in the world. That's Capture the Flag in a nutshell. There are three basic tasks once you get that server filled with vulnerable software. First, you have to defend information. It's Capture the Flag, but the flags are made out of data, and the field is made out of new code that nobody's ever seen. You want to keep your flags, which means you want to protect data. So you're doing live patching and writing live intrusion detection signatures, all as fast as you can. Second, you want to take your opponents' flags. As you examine the software and find new vulnerabilities, you want to field a patch, but you also want to take as many flags as you can from everybody who's unpatched, in as short an amount of time as possible. So if somebody hands you that server and says, hey, this is filled with vulnerable software, please plug it into this network, the clever among you might say, I know how to win this game: I'm going to turn everything off. You can't, because there is a referee. This is a networked exercise. The contest organizers are the network in the middle, and I'm going to talk a little bit about what that referee is doing. The referee is basically a gigantic anonymous remailer that strips the sender address off all the data in the game. For a long time, this was traditionally a source-randomizing NAT. The game changes over time, but the upshot is that for every single connection, you don't know who the sender is.
And to top it all off, the referee, the game organizers, are talking to every single piece of software run by every team on the entire network, and making sure that it still works. They're connecting to it: if you have an email server, they're sending emails and making sure the emails work correctly. If there's a web server, they're going to traverse it and make sure all the content is still there and it's up and ready. If you slow the software down or damage it, you lose points. If you turn it off, you lose all your points. So: keep your data, take the data of others, do it as fast as you can, and don't break any of the software you're trying to defend. Simple. This is obviously a game of incomplete information. You don't know what flaws your opponents have, and you're able to keep your secrets about what you know. It is multi-opponent, and because it is a scored game, it is non-zero-sum. Sounds hard. Let's continue. How do teams play this exercise? Well, if you've got to play a live network defense contest, it sounds pretty simple: I'm going to sit down with Wireshark and Nessus and maybe Snort IDS and get to work. And from the get-go, none of those tools will do anything on a CTF network. Wireshark won't be able to decode a single packet; it'll just say data, data, data. None of the detectors work at all. The reason is that it's all new software running all new protocols. Nessus won't have a single vulnerability signature that works. Snort won't have a single signature that works. You have to do binary reverse engineering the entire time. You're given just binary code: no source code, no documentation. The only way to figure out how the software works is to reverse it, to do vulnerability research, to program your own IDS, to write your own patches, and to write your own vulnerability scanner, as fast as you can, while your opponents are trying to do the same thing to you.
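The referee's availability checks described above can be sketched as a per-round score. The talk gives only the rules of thumb (damage or slowdown costs points, turning a service off costs all of them); the formula, the `PollResult` fields, and the 1.5x cutoff below are hypothetical, and real CTF and CGC scoring differ.

```python
from dataclasses import dataclass


@dataclass
class PollResult:
    passed: int       # functionality checks that behaved correctly
    total: int        # functionality checks attempted this round
    slowdown: float   # observed latency divided by the unpatched baseline


def availability_score(r: PollResult, max_slowdown: float = 1.5) -> float:
    """Hypothetical per-round availability score in [0, 1].

    - A service that answers no checks (e.g. turned off) scores 0.
    - Otherwise the score is the fraction of checks passed, scaled down
      linearly as latency grows from 1x to max_slowdown, and 0 beyond it.
    """
    if r.total == 0 or r.passed == 0:
        return 0.0
    functionality = r.passed / r.total
    if r.slowdown <= 1.0:
        performance = 1.0
    elif r.slowdown >= max_slowdown:
        performance = 0.0
    else:
        performance = 1.0 - (r.slowdown - 1.0) / (max_slowdown - 1.0)
    return functionality * performance


print(availability_score(PollResult(10, 10, 1.0)))   # healthy service: 1.0
print(availability_score(PollResult(10, 10, 1.25)))  # slowed down: 0.5
print(availability_score(PollResult(0, 10, 1.0)))    # turned off: 0.0
```

Multiplying the two factors means a service has to be both correct and fast to keep its points, which is the incentive the referee is there to enforce.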
So let's talk about search space. We talked about little numbers like the number of atoms in the universe and the size of a Go board. Let's talk about big numbers. When you want to reason about the number of inputs into arbitrary, unexamined software, we have a pretty good proof that we don't know anything about software in the general case. We don't even know when it's going to halt, let alone how many inputs it has. So the search space of any piece of software is effectively infinite. And it gets harder from there, because if you want to explore the state space of non-trivial software, you have to learn how to have a conversation with it. If I'm an email client calling up a brand-new email server, it might say hello, your sequence number is 50, and I don't know what to do next unless I've reversed out the sequence-number mechanism. Maybe I just need to add one and say my sequence number is 51 and it will keep talking, but maybe I need to hash it, maybe I need to add a magic value. I don't know unless I've done the protocol research. So to even explore the state space, to even know how many positions there are, I have to be able to synthesize logic to talk to the software. So there we are. How hard is it? The search space is effectively infinite, you have multiple opponents, and you're playing a non-zero-sum game of incomplete information. You might think about this and say, well, if machines can't win at Go and they can't win at poker, do machines have a chance of doing this at all? And that's exactly what we're talking about doing: taking the teams away from the CTF table and rolling up a machine. Not just any machine, but this one over here, which I think we're going to fire up for you guys. There we go.
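The sequence-number example in the email-server story above is why "just send random inputs" stalls: each step of the conversation hides behind a rule the client must first reverse out. A minimal sketch with a hypothetical `ToyServer` whose rule is either "add one" or "hash it"; the SHA-256 rule is an illustrative assumption, not any real protocol.

```python
import hashlib
import secrets
from typing import Optional


class ToyServer:
    """Hypothetical service: it announces a sequence number and keeps talking
    only if the client answers with the value its private rule expects."""

    def __init__(self, rule: str, seq: Optional[int] = None):
        self.rule = rule
        self.seq = secrets.randbelow(1000) if seq is None else seq

    def greeting(self) -> int:
        return self.seq

    def expected(self) -> int:
        if self.rule == "plus_one":
            return self.seq + 1
        if self.rule == "hashed":
            # First 4 bytes of SHA-256 over the decimal sequence number.
            digest = hashlib.sha256(str(self.seq).encode()).digest()
            return int.from_bytes(digest[:4], "big")
        raise ValueError(f"unknown rule: {self.rule}")

    def reply(self, answer: int) -> str:
        return "OK, continue" if answer == self.expected() else "connection closed"


def naive_client(seq: int) -> int:
    """Guess that the protocol simply wants the next number."""
    return seq + 1


for rule in ("plus_one", "hashed"):
    server = ToyServer(rule, seq=50)
    print(rule, "->", server.reply(naive_client(server.greeting())))
```

The naive client gets past the first server and is cut off by the second; only a client that has recovered the hashing rule, and can synthesize it, reaches the states behind the greeting.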
This year we brought one of these; next year we're bringing 15 racks. That rack over there is 1,300 Xeon cores and 16 terabytes of RAM. The whole installation is going to pull about half a megawatt, and we're going to run it as fast as we can get all of that heat out into the Las Vegas summer. But hardware, as you can tell, is not enough. This contest is about automation and software, about the systems that are going to try to solve this challenge. So let's talk a little bit about why we believe this is feasible. For the last year we've been running a qualifying round. The results of that qualification are free and openly available, and you can download them at repo.cybergrandchallenge.com. That's everything the machines did in our qualifiers: every binary they patched themselves, every proof of vulnerability they built. So how big a Capture the Flag did we let machines play? For scale, the DEF CON Capture the Flag, which is easily the toughest contest of its kind, has teams of up to 80 people solving 10 challenges in 48 hours: 10 difficult binary reverse engineering challenges. In our Capture the Flag, we had machines do 131 in 24 hours. So it was a machine-scale Capture the Flag. How did they do? They were able to synthesize a proof of vulnerability for 73% of the software we released. When I say they proved a vulnerability, I don't mean that they scrolled through the binary code, spit out a line of assembly, and said we think there's an integer overflow here. We're not talking about false positives. They were able to create the input and the logic required to reproducibly crash binary software they had never seen, for 73% of it. That means the conversation logic and the input that creates the segfault. We also asked them to patch software. Now, obviously we had to put some preconditions on this, because it's very easy to patch software by making it exit as soon as it starts.
So we required them to preserve original functionality. We had a whole bank of unit tests, and the software had to be reasonably undamaged for them to get points. They also couldn't slow the software down too much; it had to perform within acceptable limits. Given those two preconditions: today we know of about 590 bugs in that corpus. Maybe you'll show us some more. But of the roughly 590 bugs we as a field can test for, the machines patched 100%. So we think we have believable autonomy in this space. We have seven finalists who built these systems, and we're going to introduce them to you later. But think about the scale of the Capture the Flag we're going to try to bring on stage next year: 131 binaries. Running this in a day, live, networked, head to head, is a whole ton of data for us to sportscast to you in a live event. So we've been working on how we could go a little deeper into the game from a sportscasting perspective. It would be easy for us to have a race of the bar charts: team 1 has 500 points and team 2 has 501. But we wanted to actually be able to see, in real time, structurally, what a great patch looks like, what armor that interferes with the software looks like, what a great crashing input and a great flag capture look like. To do that, we had to build visualization software that could analyze inputs in real time. And we decided not to bring you screenshots today, because this is DEF CON. Instead we brought a live, internet-connected demo, and we've asked you guys to hack our software in real time. So hopefully somebody has. With that, Jordan, you're on. Thankfully I'm shorter, so hopefully this mic works. All right, let's go ahead and pull up what we're talking about here first. As mentioned, we've got some software that needs to be able to look inside other bits of software.
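The two patch preconditions described above (keep the functionality tests passing, don't slow the program down too much), plus the obvious requirement that the known crashing input stop crashing, can be written as a simple acceptance check. The field names, the 100% test threshold, and the 5% overhead limit are hypothetical placeholders; the talk does not give the qualifier's exact limits.

```python
from dataclasses import dataclass


@dataclass
class PatchEvaluation:
    tests_passed: int         # functionality tests the patched binary still passes
    tests_total: int          # size of the functionality test bank
    original_seconds: float   # run time of the unpatched binary on the test bank
    patched_seconds: float    # run time of the patched binary on the same bank
    pov_still_crashes: bool   # does the known proof of vulnerability still crash it?


def accept_patch(e: PatchEvaluation,
                 min_functionality: float = 1.0,
                 max_overhead: float = 0.05) -> bool:
    """Hypothetical acceptance rule: the patch must stop the crash, keep the
    tests passing, and add at most max_overhead relative run time."""
    if e.tests_total <= 0 or e.original_seconds <= 0:
        raise ValueError("need a non-empty test bank and a positive baseline time")
    functionality = e.tests_passed / e.tests_total
    overhead = e.patched_seconds / e.original_seconds - 1.0
    return (not e.pov_still_crashes
            and functionality >= min_functionality
            and overhead <= max_overhead)


exit_at_start = PatchEvaluation(0, 100, 2.0, 0.01, pov_still_crashes=False)
tight_patch = PatchEvaluation(100, 100, 2.0, 2.04, pov_still_crashes=False)
print(accept_patch(exit_at_start), accept_patch(tight_patch))  # False True
```

The "exit as soon as it starts" patch is very fast and never crashes, but fails every functionality test, which is exactly the loophole the first precondition closes.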
I first want to show you the old way of doing it. If you've done binary reverse engineering before, this will be pretty familiar to you. I'm looking at an strace here, a sample syscall log of a program running. This is actually a fairly high-level summary of information. I'm able to see which kernel system calls were made and what arguments were passed in, but I don't see the logic. I don't see why it did it. I don't see comparisons. So to get that, I'm going to have to pull it up in a debugger. I'm going to have to step through it in GDB and actually look at what it does. CTF is a command-line game, and understanding it is a command-line exercise. This is GDB single-stepping. The state of the art, of course, is IDA Pro, one of the most popular reverse engineering tools. That gets you a bit of a graphical interface, but fundamentally you're still looking at disassembly. You're looking at x86. Don't bother trying to read that; it's just hello world. It's not actually interesting. This one, though, is the challenge you're running right now. If you went to that URL and you've seen the source, hopefully you're working hard at it. I would love to see somebody get a crash on it. We've got some structures up top. It's 300 lines of code. There's clapping for something they saw there. Oh, that's good. So you all know we have a little tradition here at DEF CON to welcome new speakers, and some traditions are different than others. But are these guys doing a good job? So we have a couple of patches here for these guys that we want to give to them, and we have one for the competitor over there. Thank you, guys. Congratulations. Give another round of applause. Nice. That makes more sense than source code. I was wondering if my source code would be getting applause. So this isn't really all that useful; I'm not going to scroll through it the whole time.
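The strace view described above is easy to summarize mechanically, which also makes its limits obvious. A minimal sketch that counts calls in the common one-line `name(args) = result` shape; the sample lines are made up to resemble the message-board session and are not real output from the demo binary.

```python
import re
from collections import Counter

# Common single-line strace shape: name(args) = result
STRACE_LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)")


def summarize(lines):
    """Count system calls by name. Note what is missing: strace records which
    calls happened and with what arguments, but not the branches in between."""
    calls = Counter()
    for line in lines:
        m = STRACE_LINE.match(line)
        if m:
            calls[m.group("name")] += 1
    return calls


SAMPLE = [
    'read(0, "Mike\\n", 4096) = 5',
    'write(1, "Welcome to our message board\\n", 29) = 29',
    'read(0, "1\\n", 4096) = 2',
    'exit_group(0) = ?',
]
print(summarize(SAMPLE))
```

Everything the program decided between those calls (the comparisons and loops) is invisible here, which is why the next step in the talk is a debugger.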
Let's actually look at what the program does. I'm going to go ahead and connect to that sample program that's running. This is just a classic sample application, much like you'd see in a CTF. It's got two vulnerabilities that we know about, possibly more. That's the beauty of C. I'm going to put my name in here. Let's view some threads. Welcome to our message board: just a text-based menu. Let's reply to this one: your software broke. I might spell it wrong, like most customer support requests. All right. Now let's check out our thread. Good, it's got my message. So we can add new messages and we can see the messages that are on there. Very simple service. Let's go ahead and exit back out. Now we get to the fun part. Instead of looking at that in GDB, the actual execution I just ran is loaded up live in here. Maybe it's 37. I'm going to look at Rusty. Let's see if this one's mine. So do you want to tell us technically what this is, Jordan? This is pretty looking, but what does it mean, right? First I want to double-check that this is me. I'm looking at somebody else's. That's good. This particular view we call the tracker. It's a memory trace viewer. If you've used Pin or other memory and execution tracing tools, that's what we're looking at. This is software running over time, and time is of course on the X axis, because that's how it's done. If we start here at the beginning, we can see the disassembly down here at the bottom left. The program ran in the dynamic analysis sandbox, and the events were recorded. What you're seeing left to right is execution over time, and the structure created by that data being executed by that software. The fancy explanation is that we've got a relative-address instruction trace mapped onto a Hilbert curve to preserve locality. What that means is it's a picture that shows you what the program did. So it's not a static analysis.
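The "relative-address instruction trace mapped onto a Hilbert curve" described above can be sketched with the textbook distance-to-coordinate conversion. Vector 35's implementation isn't shown in the talk, so everything beyond `hilbert_d2xy` is an assumption: the grid size `n`, the linear scaling of code offsets onto the curve, and the `(time, x, y)` point layout are hypothetical choices for illustration.

```python
def _rotate(s: int, x: int, y: int, rx: int, ry: int) -> tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> tuple[int, int]:
    """Map distance d along a Hilbert curve filling an n x n grid
    (n a power of two) to grid coordinates (x, y)."""
    x = y = 0
    s = 1
    t = d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def trace_to_points(addresses, base: int, code_size: int, n: int = 256):
    """Hypothetical layout: each executed instruction becomes (time, x, y).
    time is its position in the trace; (x, y) is the Hilbert cell for its
    address relative to base, so nearby code lands in nearby cells."""
    cells = n * n
    points = []
    for t, addr in enumerate(addresses):
        offset = addr - base
        if not 0 <= offset < code_size:
            raise ValueError(f"address {addr:#x} outside the traced image")
        d = offset * cells // code_size  # scale the code range onto the curve
        points.append((t, *hilbert_d2xy(n, d)))
    return points


print(trace_to_points([0x08048000, 0x08048004, 0x08049000],
                      base=0x08048000, code_size=0x2000))
```

A Hilbert curve is a natural fit for "locality is preserved": consecutive positions along the curve are always adjacent grid cells, so instructions that sit near each other in the binary tend to land near each other on the plane, and a far jump shows up as a visibly far move.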
It's not just showing a program sitting there. We took the program and fed it input. In this case, the input was what I just did: when I typed into that sample application, that's what we're looking at right now. The other traces that show up here, these other tracks, are what you guys in the audience are doing as you interact with the server, which incidentally lets me know no one has managed to crash it yet, because I'd see a little red one. I've got some samples out here where you can see the red. So if somebody manages to figure it out, we'll see it there. We see a lot of other things about the application here, though. We can step through this track. We can let it run. We can change the layout in my view. I found my name in there, right? That's the receive system call. And I've got transmit, so I can see output from the program. I can see when it allocates memory: here at the beginning, there's an allocation. I can see when it frees memory. All of these DECREE system calls (DECREE is the operating system used in the Cyber Grand Challenge, a thin layer on top of Linux) are shown here in the GUI. We've actually added a little bit of annotation, because we cheated: we had source code in this case, so we could label more than just the system calls. I can see this is view a thread. This is a word-wrap function. And looking at this, you can actually see structure, right? You can tell that the piece that outputs data is over here, and input is coming in over here in these system calls. This is the region of memory, the piece of code, that's actually doing word wrapping. It looks like it's iterating: it's reading, bouncing, doing some comparisons, bouncing back. And when it gets to the end, it comes back out. We've got a bit of logic, right? You can spot comparisons even without the source.
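Because DECREE exposes only seven system calls, the syscall events in a trace like the one above are easy to tally. A minimal sketch, assuming a hypothetical trace format of `(instruction_index, syscall_name)` pairs; the sample trace is invented to resemble one message-board session.

```python
from collections import Counter

# The seven system calls DECREE exposes to CGC challenge binaries.
DECREE_SYSCALLS = ("_terminate", "transmit", "receive", "fdwait",
                   "allocate", "deallocate", "random")


def summarize_syscalls(events):
    """events: (instruction_index, syscall_name) pairs; a hypothetical trace format.
    Returns per-call counts and allocate calls not matched by a deallocate
    (a crude count that does not pair calls by address)."""
    counts = Counter()
    for _, name in events:
        if name not in DECREE_SYSCALLS:
            raise ValueError(f"not a DECREE system call: {name}")
        counts[name] += 1
    unmatched_allocations = counts["allocate"] - counts["deallocate"]
    return counts, unmatched_allocations


trace = [(120, "receive"), (340, "allocate"), (900, "transmit"),
         (1500, "receive"), (2100, "deallocate"), (2300, "transmit"),
         (2400, "_terminate")]
counts, unmatched = summarize_syscalls(trace)
print(dict(counts), unmatched)
```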
When Jordan says that locality is preserved, what that means is that code that's close together is grouped together in the structure. So a far jump is a far jump. Likewise, a tight loop down here, or something that looks really thin, is a bit of code that was close together in the original disassembly of the program. Execution over time is cool, but this tool becomes amazing through its ability to compare traces. We're still hoping that somebody's going to capture our flag and give us a crashing trace. In the meantime, we can cook up a comparison and show you guys what it looks like when you compare two traces. Let's go ahead and do another one. This time we're going to send Mike's name, so it's a shorter name. Should we drop a hint? Yeah, we probably should drop a hint, too. Let's go back to that source. I was scrolling through it, and maybe, C auditing 101, structure lengths are super interesting to me. There might be a subject with a length of 30, so I might want to try edge conditions around that: minus one, plus one. Interesting things might happen. All right, anyway. I'm going to generate a different use of the program right now. This time I'm going to post a new thread. I didn't do that before, so it should exercise different code. My subject: software is better, and my spelling is, too. All right, let's go back and look. Oh, that one's not mine. I don't think that's mine. Somebody else, it looks like... Let's go take a look at that. Do we have a name in there? Yeah, so here's mine. It's not interesting, right? That's the different one. That's far less. Let's go look at this. Pull them both up. The colors changed. So we can see right away that the green trace has the red X of doom. Doom. That means a security-harmful crash, right? Same software, different inputs. A red X is a security-harmful crash: segfault, signal 11, generally bad. Well done.
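Deciding that an input produced "segfault, signal 11" is the mechanical core of a proof of vulnerability. A minimal POSIX-only sketch using Python's `subprocess`, which reports a child killed by signal N as return code -N. The challenge binary isn't available here, so `CRASHER` is a hypothetical stand-in: a child Python process that sends itself SIGSEGV. Real CGC binaries run under DECREE rather than plain Linux.

```python
import signal
import subprocess
import sys


def run_and_classify(argv: list[str], stdin_data: bytes = b"", timeout: float = 5.0) -> str:
    """Run a target once on one input and report how it ended (POSIX only).
    subprocess reports death by signal N as returncode -N."""
    proc = subprocess.run(argv, input=stdin_data, capture_output=True, timeout=timeout)
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name  # e.g. "SIGSEGV" for signal 11
    return f"exit {proc.returncode}"


def reproduces_segfault(argv: list[str], stdin_data: bytes, runs: int = 3) -> bool:
    """Count the input as a crash only if it segfaults on every run."""
    return all(run_and_classify(argv, stdin_data) == "SIGSEGV" for _ in range(runs))


# Stand-in target: a child Python process that sends SIGSEGV to itself.
CRASHER = [sys.executable, "-c",
           "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]

if __name__ == "__main__":
    print(run_and_classify(CRASHER))                          # SIGSEGV
    print(run_and_classify([sys.executable, "-c", "pass"]))   # exit 0
```

Requiring the crash on every run is what separates a reproducible proof from a one-off fluke.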
So someone was able to crash the software with a memory protection violation. Anyone like to own up to it? We have your name. I'm not going to highlight it with the name you put in. If you can tell us your name to validate that you're the person, then this is yours. You will be the first to solve our Capture the Flag. It was you? What was your name, sir? Yeah. Funny thing, that. Trust but verify. Sorry. So if you can come up to the table and tell us quietly what you signed in as. You don't have to. If you can also tell us: there's a little bit of a flag, a root password that leaks back, too. It does leak a root password, which we'll try not to reveal. So do you want to scroll through what happened in the error case here? We need to go quick. Let's actually look at this. In this case, whoever was able to crash it, you can see outputs. We already know this little bit here was output. We see transmits. We see data. Somebody viewed a message board thread. But in this case there's a lot of extra data, a ton more. In fact, what it actually did was leak out extra data. If your subject line is exactly 30 characters long, there's an off-by-one: your null byte overwrites a length. When it goes to read out the data, it sends way too much. We start reading other hidden messages, one of which includes a root password, until it runs off the end of memory and causes a segmentation fault. So it's a memory over-read. It spits data back out just like Heartbleed and crashes with a memory safety error: very similar to a crashing Heartbleed. So let's actually look at a patch for that. We actually had Rusty write one; we want to see what it looks like when it's fixed. I'm going to close these down. Before, we were using the same software to compare different inputs. Now we're going to put it in a different mode: the same input into different software. So the crash generated by the crowd, going into patched and unpatched versions.
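The bug described above (a 30-character subject whose NUL terminator overwrites a length, followed by an over-read) can be modeled in a few lines. This is a hypothetical model, not the demo binary's code: the memory layout, the one-byte length field, and especially the unsigned `length - 1` step (one plausible way a zeroed length becomes a huge read) are assumptions, since the talk says only that the null byte overwrites a length and too much data is sent.

```python
SUBJECT_LEN = 30          # subject field size, from the hint in the talk
MASK32 = 0xFFFFFFFF       # emulate 32-bit unsigned arithmetic as in C


def post_thread(subject: bytes, body: bytes, hidden: bytes) -> bytearray:
    """Hypothetical layout: subject[30] | length (1 byte) | body | hidden data.
    The copy bounds the characters to 30 but always writes a NUL after them."""
    mem = bytearray(SUBJECT_LEN) + bytes([len(body)]) + body + hidden
    data = subject[:SUBJECT_LEN]
    mem[:len(data)] = data
    mem[len(data)] = 0      # with a 30-character subject this lands on `length`
    return mem


def view_thread(mem: bytearray) -> tuple[bytes, bool]:
    """Send `length - 1` bytes of the body (hypothetically, to drop a trailing
    newline). Returns (bytes sent, whether the read ran past mapped memory)."""
    start = SUBJECT_LEN + 1
    count = (mem[SUBJECT_LEN] - 1) & MASK32   # 0 - 1 wraps to 4294967295
    sent = bytes(mem[start:start + count])    # Python stops at the buffer's end...
    ran_off_end = start + count > len(mem)    # ...where C would keep going and fault
    return sent, ran_off_end


BODY = b"Your software broke.\n"
HIDDEN = b"[hidden thread] admin note: <secret>"

# Edge conditions around the field size: minus one, exact, plus one.
for n in (29, 30, 31):
    sent, crashed = view_thread(post_thread(b"A" * n, BODY, HIDDEN))
    status = "leaks hidden data, then faults" if crashed else "ok"
    print(f"subject length {n}: {status}")
```

In the model, 29 characters behave normally while 30 and 31 both clobber the length, which matches the "minus one, plus one" hint: the bounded copy stops the characters, but not the terminator.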
You can tell immediately which one is unpatched: it has the red X of doom, with a huge memory leak that scrolls off toward the horizon there. The patched one ends normally, with an error code, and if we overlay them we should see the moment they diverge. This is a little bit hard to see, but right in here, right after the subject line is read, there's the exit with a separate error code. Let's look at a different one. Here's a patch for a different program, so not the same program now. Notice, though, that that was a very tight patch. Whoever wrote it knew exactly where the bug they were trying to fix was. They didn't change the program's major structure. There wasn't a whole bunch of testing. This is software that is not connected to the Internet. This is a completely different program. You can tell that right away: if you just look at the shapes, it's clearly doing different stuff. This one has a much longer initialization. We have this flat initialization here, a really small loop, and then these broader peaks and valleys, but the two traces look similar. We have an input that triggers a crash, and a version patched against it. Blue has the red X of doom; it's unpatched. Green is patched. Let's see what we can learn about this patch. Anybody see the patch? Remember, locality is preserved: a far jump is a far jump. At the very beginning, the patched version jumps as far away as it can, calls allocate, and jumps back. That, to me, looks like a classic jump-hook patch. I've inserted a jump, I've jumped far away to new code inserted by the patch author, I'm doing something around allocate to try to protect the program, and I'm jumping back. You would see something like that in a CTF, but never this fast. It would take you quite a while with IDA Pro to determine that was happening. Then here at the end, nice and easy. Something happened at the beginning: a nice patch to instrument the allocation.
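The jump hook described above shows up in the trace as a far jump out and a far jump back because of how it is built: overwrite a few bytes at the hook site with a jump to a "code cave", run the new check plus the relocated original instructions there, then jump back. A minimal sketch of the x86 encoding, using the standard 5-byte `E9 rel32` near jump; all addresses and sizes are hypothetical, and the talk does not show the actual patch bytes.

```python
import struct


def jmp_rel32(src: int, dst: int) -> bytes:
    """x86 near jump (opcode E9) located at src that lands on dst.
    The 32-bit displacement is relative to the end of the 5-byte instruction."""
    disp = dst - (src + 5)
    return b"\xE9" + struct.pack("<i", disp)


def jump_hook(site: int, overwritten_len: int, cave: int, cave_body_len: int):
    """Bytes for a classic jump hook (detour), with hypothetical addresses.

    site            address of the original instructions being hooked
    overwritten_len bytes of original code replaced at site (>= 5)
    cave            address of the new code added by the patch
    cave_body_len   size of the new check plus the relocated original code

    Returns (bytes written at site, jump placed at the end of the cave)."""
    if overwritten_len < 5:
        raise ValueError("a rel32 jump needs at least 5 bytes")
    at_site = jmp_rel32(site, cave) + b"\x90" * (overwritten_len - 5)  # NOP pad
    back = jmp_rel32(cave + cave_body_len, site + overwritten_len)
    return at_site, back


at_site, back = jump_hook(site=0x08048100, overwritten_len=6,
                          cave=0x0804A000, cave_body_len=0x20)
print(at_site.hex(), back.hex())
```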
Maybe it's tracking where memory is being legitimately allocated. It has this nice little jump out, and it exits cleanly, although with an error code that indicates something. Is that a stack cookie detection? It looks like a detection: a jump away to error-handling code and a clean exit. We have only one more surprise for you about this patch and how it was made. This wasn't written by Rusty. This one was written by a computer. This is a binary from the qualification round we recently finished, and this is software patching software with no human intervention: completely autonomous patch generation. This software was unknown to the system that generated the patch. It was given no access to source code or documentation. It did all of this automatically. It decided to submit this to us as its best approach, and we found that it actually healed the vulnerability, and did so without damage and without performance slowdown. That's because it looks like exactly what it is: an incredibly tight patch around a flaw that it probably knew about. But that's not always what happened. So we have trace number 3. In the previous case, it was a tight patch exactly at the moment of the flaw: it was able to detect the problem right there, and instead of crashing, it cleanly exited. If the font is readable there, you can see from the instruction counts that those runs are about 200,000 total instructions, while this third one is about 900,000. So right off the bat we know this is a much longer execution. But it's still the same program, and that's what I like about this visual approach, even though if you were to pull it up in IDA it would look very different.
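Three of the comparisons made by eye in this part of the demo can be sketched as simple operations on address traces: where two runs first diverge, which addresses run constantly (the "rails" along the bottom of a trace), and how much longer one run is than another. The trace format (a plain list of executed instruction addresses) is an assumption; the 200,000 and 900,000 counts are the talk's approximate figures.

```python
from collections import Counter
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Index of the first step where two address traces differ, or None if they
    are identical. A trace that ends early diverges where it ends."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i
    return None


def hot_addresses(trace: Sequence[int], top: int = 3) -> list[tuple[int, int]]:
    """Most frequently executed addresses: the 'rails' running along a trace."""
    return Counter(trace).most_common(top)


def instruction_ratio(baseline: Sequence[int], other: Sequence[int]) -> float:
    """Instruction count of `other` relative to `baseline`; a rough cost proxy."""
    return len(other) / len(baseline)


unpatched = [0x10, 0x14, 0x18, 0x40, 0x44]
patched = [0x10, 0x14, 0x18, 0x90, 0x94, 0x40]   # hypothetical detour at step 3
print(first_divergence(unpatched, patched))       # 3
print(instruction_ratio(range(200_000), range(900_000)))  # 4.5
```

Instruction count is only a proxy for run time, but a checking-everywhere patch that executes roughly 4.5 times as many instructions is a strong hint of the performance cost the speakers point out next.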
If you were to pull it up and step through it in GDB, it would have a lot of different functionality. But when you look at it here: peaks and valleys, the same exact kind of peaks, the system calls, no other interaction until the very end, where we've got some data being transmitted and received, much like the others. So it's the same shape; it's the same program. Well, in this case it's actually, excuse me, a different program that's doing the same thing. It's got the same input, so you get an overall shape that looks the same, but there's a lot more to it, and we can see what it's doing hidden underneath, like the railroad tracks on this one. Again, where something sits in this XY plane (if I click along any of these spots, that little blue plane you see there) is where it was in the binary. The exact address doesn't matter; the point is that if it's in the same spot, it's the same code. So these bottom rails, this line right here, are called constantly throughout the entire execution of this program. It was doing the same little check. We could hone in if we wanted to look at the disassembly, but we don't really need to, because we know this program was able to successfully defend against this particular flaw. Instead of a tight little patch exactly at the flaw, it's checking everywhere. And we know, of course, it was a longer run, so it had to have some performance impact, but it was able to do that without knowing exactly where the flaw was. I like looking at the traces. We've been digging through them by looking at the structures, and they almost have a personality, because you see certain approaches that different systems take. It's like The Matrix: you're looking at the code and you just see it. But here anybody can see it. You can look at these and tell right away where the changes are and what's happening. So in that third trace, what you can see is a machine grappling with uncertainty,
and I find it amazing that Jordan and his team, when they look through this thing, can actually pull out differences between approaches and say, I know which system built that. So which system did build this? And more importantly, who built these systems? Because we started this whole thing with about a hundred teams registered from around the world, and we qualified the top seven scoring teams, and they are mostly in the audience with us today. So I want to take a moment to appreciate that. We asked, could machines do this stuff, in an adversarial format that's never been done before? And we have researchers who spent a year of their lives saying maybe it's possible, and maybe we can do it. So I want to talk about the teams. You can see the submissions of their systems, again as open source, and you can run them on our Capture the Flag operating system, available at cybergrandchallenge.com and on our GitHub. But I want to invite those teams to come up right now and join us in front of the stage. So if you are a finalist in Cyber Grand Challenge and you're here today, come on up, guys. When I call you out, please step forward and let everybody give you a round of applause. Our seven finalists, in no particular order: joining us from Charlottesville, Virginia, Team TECHx, a partnership between GrammaTech and the University of Virginia, led by Dr. David Melski. Team TECHx, step forward. David Melski, everybody. All right. From the home of Carnegie Mellon University, Pittsburgh, Pennsylvania, led by Dr. David Brumley, a team with deep CTF roots and deep program analysis roots: ForAllSecure. Team CodeJitsu, from Berkeley, California. Now, this team picture is a little bit deceiving, because this is a much bigger team than was able to make it to our site visit. They're a very international team, with a whole bunch of folks calling in over Skype, led by Dr.
Dawn Song, authors of the BitBlaze framework, amongst other things. Team CodeJitsu, everybody. Team Disekt: if you play on the CTF circuit, you've probably heard these guys' name. They hail from Athens, Georgia, led by Michael Contreras. Team Disekt. From the University of Idaho at Moscow, this is the Center for Secure and Dependable Systems, the CSDS team, led by Dr. Jim Alves-Foss, who's with us today. Say hi, Jim, wave to everybody. Team Shellphish: if you play Capture the Flag, you know Shellphish, working out of Dr. Giovanni Vigna's lab at the University of California, Santa Barbara, led by Yan Shosh... I'm sorry, Yan. Shoshitaishvili. All right, all right, step forward, Yan. And Team Deep Red, led by Dr. Tim Bryant; I think we're joined today by Brian Knudsen, his deputy. Say hi, everybody. These are our seven finalists. And I want to close with a few parting thoughts. First, why are we doing this? Why are we trying to teach computers to play this game? Because anybody who plays Capture the Flag will tell you it's not just a game. It is one of the toughest applied reverse engineering competitions on earth, and these are the hard skills of computer security, the hard skills where right now machines have no chance. I'm from DARPA; our mission is to create and prevent technological surprise. That means we try to invent the technologies of the future, and it's really easy to be at this conference and understand why we need something that can react to a new threat, to a new attack, in real time. There was a time for allow and deny, maybe, before, but we've gotten to cars now. So we're trying to build future autonomy, and these are the pioneers who signed up to do it. I also want to send out a big thank you to the DEF CON conference, who worked so hard to be gracious hosts: DT, Shirelle, everybody on the team, who've let us fit this big logistical imposition into their move to a new hotel. Thank you guys very much; we're looking forward to working with you next year. I want to close with the most important word I've
said all day. It's not cyber, and it's not DARPA. It's Thursday. This contest, this live event where machines play Capture the Flag, is on Thursday, August 4th, 2016, before DEF CON. That means if you want to come back to this room when we've tripled its size, when we've put in sportscasters, when we've put in 15 of these racks and they're going to play Capture the Flag together, you need to show up one day early to make it. It's going to be a free event. I hope to see you there. And I would like to close with a round of applause for these teams, who are going to play this game on that machine, with this visualization, live in this room next year. Machines play Capture the Flag. Thank you very much. And I think we've actually closed this up with time for questions. Any questions? If you have questions, come down here. I have the mic. Who has the questions mic? Okay, that guy. This guy, that guy there. There's one important question. Looks like this is interesting. So what do you guys think about inviting the winning team from this event to play against humans in the DEF CON Capture the Flag next year? I'm going to repeat the question. So I believe we have the captain of the highly Legitimate Business Syndicate, which runs DEF CON Capture the Flag, with us, and he's asked if machines will play humans next year. Well, I actually can't ask these teams to do that, but, well, you can. So, show of hands: would you guys do that if they set it up? So if you guys will play, we'll keep the computers on. If you will basically tell everybody that it is a fair and open contest for machines and people... Good? Okay, we've got a deal. For the record, I think it's probably early, but if you guys will make the game open to the machine winner entering, and our winning team wants to play, we'll keep the machines on. Absolutely. Awesome. More questions?
Great presentation. So just a couple of observations and then two questions. I wanted to start off with this: it seems like you guys framed a lot of the discussion here in a very domain-specific way. You talked about checkers, chess, Go, poker, and then CTF. It also seems like you're throwing a lot of power at this. Is it safe to say that this is mainly a brute-forceable type of problem space, or is it more like Demis Hassabis over at DeepMind saying, no, you can create game-playing AIs in a much more naturalistic sort of way that uses heuristics from neuroscience and that type of stuff? Or is that just science fiction? Sorry, a lot of rants there. So, multifaceted question, and I will admit I was having trouble hearing you, but I think the question is basically: how feasible is this, and what category of problem does it fit into? I'll give you the best answer I have, which is that the world's top program analysis minds are literally standing right in front of me, so you should totally ask them. We don't know what type of problem space it fits into; we're hoping to find out. Was that it?
That was a cool visualization, by the way, the visualization of the tracing. I was just wondering, what did you use to trace every instruction and create that? Did you use an emulator, did you just trace it, or how were the traces generated? In this case, we've got a patched QEMU that we're generating the traces with right now, but you could substitute that with anything: you could use Pin, or any other system that can do execution tracing. Are you going to make your visualization software available after the conference, or after the contest? So far, everything that's been part of the infrastructure of Cyber Grand Challenge, from the operating system we built for CTF up to all the challenges, is going to be released to the public. We're also releasing everything that happens during the final event, so you can reproduce the event. We have an open source track for almost everything we're building, but that software is Jordan's, so we'll let him answer that question. That's actually joint work by voidALPHA, a gaming company, and Vector 35. Bits of it we're still working on, mainly because it's still a very early prototype. Right, so we have a long way to go before we get to next year, but we wouldn't rule it out, that's for sure. Thanks. Thank you, guys. Let's give them one more big round of applause. Thank you.
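The trace visualization discussed in this talk places every executed instruction according to where it lives in the binary, so the same code always lands in the same spot, and a check that runs throughout a program shows up as a rail under the whole execution. Below is a minimal Python sketch of that idea, not the software shown on stage: it assumes a trace is simply an ordered list of executed instruction addresses (the kind of data a patched QEMU or Pin could produce), and the function names, grid width, window count, and threshold are all hypothetical, illustrative choices.

```python
from collections import defaultdict


def plane_position(addr, base, width=256):
    """Map a code address to a fixed (x, y) spot on a plane.

    The layout is arbitrary (hypothetical); all that matters is that the
    same address always lands on the same spot.
    """
    offset = addr - base
    return offset % width, offset // width


def layout(trace, base, width=256):
    """Turn an ordered list of executed addresses into (x, y, t) points,
    using execution order t as the third axis."""
    return [(*plane_position(addr, base, width), t) for t, addr in enumerate(trace)]


def persistent_addresses(trace, windows=10, threshold=0.9):
    """Return addresses seen in at least `threshold` of the time windows,
    i.e. code that runs like a rail under the whole execution."""
    if not trace:
        return set()
    size = max(1, len(trace) // windows)
    chunks = [trace[i:i + size] for i in range(0, len(trace), size)]
    hits = defaultdict(int)  # address -> number of windows it appears in
    for chunk in chunks:
        for addr in set(chunk):
            hits[addr] += 1
    return {addr for addr, n in hits.items() if n / len(chunks) >= threshold}


if __name__ == "__main__":
    # Synthetic trace: a check at 0x8048100 runs before every unit of work,
    # while the work itself moves through different code as time goes on.
    check = 0x8048100
    trace = []
    for i in range(100):
        trace.append(check)
        trace.append(0x8048200 + (i // 10) * 0x10)
    points = layout(trace, base=0x8048000)
    print(points[0])  # (0, 1, 0)
    print(sorted(hex(a) for a in persistent_addresses(trace)))  # ['0x8048100']
```

Windowing the trace, rather than just counting hits, is what separates a rail from a hot loop: a loop that runs thousands of times in one burst shows up in only one window, while a check guarding every input shows up in nearly all of them.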