Okay, hi everybody. I'm Roger Dingledine, the project leader for the Tor project, and I'm going to tell you a little bit about attacks and vulnerabilities on Tor, past, present, and future. There are sort of three categories to what I'm going to talk to you about today. One of them is solved and solvable problems. A lot of them are code problems or implementation problems. The sort of thing where we look at that and say, hey, wait a minute, that line right there means that Tor doesn't provide any security or anonymity or privacy at all. Wait a minute. So we've got some samples of those to give you some idea of how we've screwed up in the past. We also have some tough ongoing practical issues, things like application-level vulnerabilities and all sorts of ways that using real live applications can screw you. And then we've got some research things in terms of the academic community coming up with ways to attack Tor, some pretty esoteric but some pretty effective ways. And then a few thoughts on where we're going to have problems down the road. So how many people here already know how Tor works? Perfect. Okay. So I'm going to give you a very brief crash course on how it works and then we can go into some more details. So it's a free software program. Tor is an anonymity system. It's a program that you install on your computer and it takes care of providing anonymity or privacy to you. So somebody watching you locally can't figure out where you're going, and somebody watching the website or IM system you're connecting to can't figure out where you're coming from. So it's open source, no patents. It comes with a full specification and documentation, which means that other groups have built their own compatible Tor clients out there, and we'll see in a few slides that that's really coming in handy. We've got about 1,500 relays all around the world relaying traffic for some number of users. Probably somewhere between 100,000 and 500,000 Tor clients are running right now. 
As of the end of 2006 we're an official 501(c)(3) U.S. non-profit, which means that we've grown a little bit. We've got some funders from a variety of places. There are seven people being paid to work on Tor right now, which is pretty cool because in April we had three people. So we'll see how we keep growing at that point. So we also, in addition to those seven people, have maybe 50 or 100 people who spend more than an hour a day just helping out with the community or running relays or answering questions, that sort of thing. We started out funded from the Naval Research Lab and then we were funded by the Electronic Frontier Foundation for a year. So our claim to fame is that we're the only project funded by both the DoD and the EFF for the same free software privacy system. Since then we've been funded by another U.S. government group called the International Broadcasting Bureau. They're the folks who run Voice of America and Radio Free Europe and Cuba and so on. They've got some websites that some folks in the world can't reach and they'd like to fix that and we sure wouldn't mind fixing that. So when I talk to my parents I say that I'm working on privacy systems. When I'm talking to Google and Walmart I say that I'm working on business security or communication security systems. When I'm talking to governments, they think anonymity is dumb, they think privacy is irrelevant, but they really do need traffic analysis resistant communication networks. So part of the fun here is trying to explain to each of the different groups what Tor is for them and phrasing it in terms of security properties that they're looking for. And we have another category we've added recently which is blocked users, people in China or Iran or behind various firewalls all around. And if they can't reach the Tor network, who cares how good the Tor network's anonymity is? 
And if they can't actually get to CNN or VOA or other sites then they can't learn or post what they'd like to learn or post on the internet. So there are a lot of people who are not using Tor for the anonymity properties but rather for the reachability properties. They want to be able to get to the websites that they were able to get to last time. So how do you build one of these? The easy answer is you put a relay somewhere and everybody relays through it. Alice one, Alice two, Alice three all show up and say give me this website and the relay fetches them and sends it back. And that's great but what if there's a bad guy anywhere in the system? Maybe there's a compromised relay, somebody's wiretapping it, you bribe the CEO, you threaten the CEO, you put a guy named Guido on an airplane, the list of ways to attack a single point goes on and on. So what we'd like to do is have distributed trust. The idea of Tor is that you relay your connection through more than one relay, through more than one hop and that means that no single hop learns both where you are coming from and where you're going. So if the first guy is bad, if R1 is bad, he knows that Alice is using Tor but he doesn't know what Alice is using Tor for. If R3 is bad, he knows that somebody is talking to Bob but he doesn't know who is talking to Bob. And if R1 and R3 are colluding then we're screwed, we'll get into that down the road. So far so good? Okay. And there's crypto, I won't talk about that much. I guess the key point to remember is that we've got perfect forward secrecy which means that if Alice and R1 established this green key and then later on they both throw it away meaning that they're done with the connection and they throw away the key, it doesn't matter if you've logged what's going on back and forth because you can't break into Alice or R1 later and force them to decrypt. Though as we'll see in a few slides, the Debian random number generator flaw pokes a big hole in that. Okay. 
And we also have a directory system that lets all the users know about all the different servers out there. The way it works basically is each server has its own public/private key pair and it builds a little thing called a server descriptor that says this is my address and exit policies and so on. And it signs that and sends it to the directory authorities. Right now there are six of them. And they compute a consensus that says here are all the servers in the network. If you want to use Tor, get this list first and then build your path through them. Okay. So that was Tor in a nutshell. And there are lots of other things we could talk about. Feel free to ask me later on if you've got questions, but I'm going to plow forward. So the first fun example, I'm going to start from the distant past and then work my way up to last week or maybe next week, depending on which bugs we get to. So once upon a time OpenSSL did not ship with AES. And even after it added AES, it didn't ship with counter-mode AES. So we wrote it ourselves: we took the stock AES implementation, and since it didn't have counter mode, we added it. Turns out one ampersand and two ampersands are very different when you're doing bitwise operations. And it was not until we had a second implementation. There were some nice folks writing a Java client at Dresden University and they showed up and said, so I've been writing my compatible client and it works great except after the first cell, all the encryption is different. What am I doing wrong? And it turns out they were using AES counter mode correctly. And we were using something a lot closer to 16-bit AES counter mode. And for the crypto people in the audience, 16 bits is really not enough to get the security you're looking for. So a second implementation, you hear from all the IETF people that you have to have two implementations before they'll believe something. Turns out that's a really good idea. Another fun one. 
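The ampersand mix-up described above is easy to reproduce. Here is a hypothetical C sketch, not Tor's actual code, of a counter-mode counter increment: the bitwise `&` masks the carry correctly, while a typo'd logical `&&` collapses each byte to 0 or 1, and in this variant the counter never advances past its first value, so the keystream repeats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Correct: increment a big-endian 128-bit counter byte by byte,
 * using bitwise AND to keep each byte in range. */
static void ctr_increment_correct(uint8_t ctr[16]) {
    for (int i = 15; i >= 0; i--) {
        ctr[i] = (uint8_t)((ctr[i] + 1) & 0xff); /* bitwise: keeps low byte */
        if (ctr[i] != 0)
            break; /* no carry into the next byte */
    }
}

/* Buggy: logical AND yields 0 or 1, so every incremented byte
 * becomes 1 and the counter gets stuck. */
static void ctr_increment_buggy(uint8_t ctr[16]) {
    for (int i = 15; i >= 0; i--) {
        ctr[i] = (uint8_t)((ctr[i] + 1) && 0xff); /* logical: always 0 or 1! */
        if (ctr[i] != 0)
            break;
    }
}
```

With a stuck counter, every AES-CTR block after the first is encrypted with a repeated keystream, which is why the Dresden folks saw everything diverge "after the first cell."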
I was talking to people about, hey, let's work on fuzzing Tor. Let's try to figure out what sort of inputs can mess you up. And then I was trying to give them an example of a situation that you might want to look at in the code to say, hey, maybe this is something that we could fuzz. And then I looked at it and I thought, you know, once upon a time, Tor's cells were 256 bytes, which means that you could specify the length in one byte. And then we moved it to 512 bytes because we wanted to be more efficient with using the network. And then we needed two bytes to hold that length, but we never checked to see if the specified length of the cell was less than 512 bytes or was 50,000 bytes or you could specify the cell length as much as you'd like. And that's, you know, it's a bug, OK, except for the fact that the exit relays were writing the cells out to the network at whatever length you specify. So the cell shows up and says, hi, I've got a great payload, it's 45,000 bytes. And the exit relay says, OK, I'll write 45,000 bytes of memory starting here onto the network. And who knows what's in that memory, but it got written out. So that was kind of bad. We noticed that one and solved it pretty quietly, I think. So there are sort of two categories of bugs here. One of them is the ones where we look at the code and we say, holy cow, oh my god, we did what? This has been around for how long? OK, let's fix it. And then after we fix it and people have upgraded, then we'll tell people about the problem and make sure that they upgrade. And the other category are the fun ones where some guy shows up on the IRC channel or the mailing list and says, hey, I think I found a problem. And then we have to fix it rather quickly and make sure that we fix it and make sure that people upgrade. So another example, I was at What the Hack back in 2005. And I was telling people, hey, if you find any problems, we'd love to hear from you. 
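The missing bounds check can be sketched in a few lines. The names and exact constants here are illustrative (the 498-byte payload roughly matches Tor's relay cells, but treat it as an assumption): the two-byte length field can claim values far larger than the payload that actually exists, and without the check the relay writes whatever memory follows.

```c
#include <assert.h>
#include <stdint.h>

#define CELL_SIZE 512
#define MAX_PAYLOAD 498 /* illustrative: cell size minus header bytes */

/* Returns the number of payload bytes safe to write out, or -1 if
 * the claimed length is impossible and the cell must be dropped. */
int checked_payload_len(uint16_t claimed_len) {
    if (claimed_len > MAX_PAYLOAD)
        return -1; /* the fix: refuse lengths the cell cannot hold */
    return (int)claimed_len;
}
```

Before the fix, a cell claiming a 45,000-byte payload would cause the exit relay to dump 45,000 bytes of its own memory onto the network.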
I heard from a few who were explaining, hey, my job doesn't let me report vulnerabilities to anybody except for my job. And I responded, dude, I wrote an anonymity system. Maybe you should use it. So the next week, after I came back from What the Hack, some guy showed up on IRC, pasted a GPG-encrypted blob, said, get this to Roger, and left. And it was a great bug. It turns out that OpenSSL doesn't check the keys as much as we thought it would. So if somebody does a man-in-the-middle attack, for example, they are your first relay, and they accept the connection, and then they pass it on, but they change the keys so that they can read everything going back and forth. And in fact, they don't have to just read everything. They can pretend to be the whole rest of the network. They can imitate anything they want to. You're basically talking in plain text. And we had no idea this bug was around. OpenSSL has fixed it since then, so now when it's doing the Diffie-Hellman operations, it can check. But that was an exciting week where we were trying to figure out exactly what sort of attackers would be able to attack the users. And you'll notice a little trend here. This one was August 8. You'll notice a trend of a lot of these bugs being fixed in the first or second week of August. We'll get back to that one in a little bit. So another fun attack that some researchers found, Paul Syverson and Lasse Øverlier: they were working on how do I attack a hidden service? How do I figure out the location of a Tor hidden service? And they figured out that if you run a couple of servers and they're fast, and you keep making the hidden service make more circuits, make more connections, then eventually it's going to pick your node as the first hop. And then you win, because you look at it and you say, hey, you're not a Tor server. I'm pretty sure you're the hidden service that I have been inducing to make connections. 
So what this means is if you run two or three fast Tor servers, and you just wait, and you're patient, and every user keeps building a new path through the network, after a while the user is going to pick your node for the first hop and your node for the last hop. And then she loses. We'll get into how the actual traffic correlation stuff works later on. So the defense against this is what we call entry guards. The idea is every client picks three or four entry nodes, three or four Tor servers, as her first hop always. And she sticks with those three or four until they disappear. So the previous situation was, as Alice uses Tor more and more, the probability that eventually she is screwed goes up to one. Whereas once we're using entry guards, she's either screwed at the beginning, meaning she picks a bad entry guard, or she's never screwed, because she never rotates onto the attacker's nodes. So that was a fun patch. And it was several years ago, but there was a Black Hat talk in DC detailing how the attack worked and how to do the attack. And it turned out it was pretty efficient. You could locate a hidden service in a few minutes. And there was another group at MIT a few years ago who did the same attack, not just on hidden services, but on actually finding the location of Tor users going to their website. Then another exciting one. You'll notice also it's the next year, August, beginning of August. We noticed that clients in Tor are pretty much the same code base as servers. So it's just a configuration option. Do I want to be a client? Do I want to be a server? So the way that building circuits works is the user shows up to the first hop and says, here's a create cell. I want to do a handshake with you. And then after that, she shows up and says, here's an extend cell. Please make that circuit another hop. And she sends a few more extend cells. And then she has a circuit. 
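The entry-guard idea can be sketched in a few lines (invented names; nothing like Tor's real path-selection code): the client draws a small guard set once and keeps reusing it, instead of re-rolling the first hop for every circuit.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal sketch of entry guards. Relays are just integer IDs here.
 * The guard set is chosen once; every later circuit's first hop is
 * drawn from that same fixed set, so an attacker's relays either
 * land in the set immediately or never see Alice's first hop. */

#define NUM_GUARDS 3

static int guard_set[NUM_GUARDS];
static int guards_chosen = 0;

int pick_first_hop(int n_relays) {
    if (!guards_chosen) {
        for (int i = 0; i < NUM_GUARDS; i++)
            guard_set[i] = rand() % n_relays; /* choose once, then keep */
        guards_chosen = 1;
    }
    return guard_set[rand() % NUM_GUARDS]; /* always from the same set */
}
```

This is exactly the "screwed at the beginning or never screwed" trade: the per-circuit gamble is replaced by one gamble at install time.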
Turns out clients never checked if they got sent a create cell or extend cells. So we used to say, you could be a client. You don't have to relay traffic. Or you can click this button, and you'll be a server. And you can relay traffic. But actually, clients would relay traffic too if anybody bothered to ask them to. As far as I know, nobody bothered to build circuits through them. You couldn't make them exit nodes. They would trigger an assert if you tried to exit from them. But you could build a circuit through them. And I don't think anybody actually tried this. We fixed it before that. But it's another example of a bug where if you make really general designs and you have the same software that can be used for client or server or hidden service or directory authority, then you might run into problems. Another fun attack. This was from a group of researchers at the University of Colorado. So the way clients choose their paths is they look at a couple of different flags that are provided in the directory. We say these servers are stable. You should use them for instant messaging or IRC or something like that. And the way we assign stable flags is we line up all the servers by uptime. And if you've got the median uptime or better, you get the stable flag. Otherwise, you don't. And the reason we did this is so that we could be dynamic. So if all the servers have low uptime, then we can still say, well, these half are better than that half. Or if all the servers have really good uptime, then we can still say, you want to use this half because it's even more stable. The problem with this is what if you get yourself a botnet or something like that, and you sign up 2,000 servers and they all claim to have 10 years of uptime? You've just claimed all of the stable flags. You just bumped the stable flags off of everybody else. OK, so that's stable. Who cares? That's just for IRC. Actually, the same thing worked for the guard flag. 
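The median-uptime rule, and why it was gameable, can be shown with a toy version (function and variable names invented): every relay at or above the median uptime gets the Stable flag, so 2,000 fake relays claiming huge uptimes push the median above every honest relay.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_long(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Fills flags[i] with 1 if relay i gets the Stable flag, 0 otherwise:
 * median uptime or better wins. */
void assign_stable(const long *uptimes, int n, int *flags) {
    long *sorted = malloc(n * sizeof(long));
    for (int i = 0; i < n; i++) sorted[i] = uptimes[i];
    qsort(sorted, n, sizeof(long), cmp_long);
    long median = sorted[n / 2];
    for (int i = 0; i < n; i++)
        flags[i] = uptimes[i] >= median; /* relative cutoff: dynamic, but gameable */
    free(sorted);
}
```

Because the cutoff is purely relative, an attacker who controls enough claimed uptimes controls who gets the flag; the fix described next for the Guard flag adds absolute thresholds that fake claims can't bump honest relays below.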
So the directory authorities specify: these are the servers that you should use as a guard. These are the servers that you should use as the first hop in your path. And if the bad guy shows up and says, here are 2,000 servers that are better to use as the first hop of your path, and he bumps out all the real ones, then he is guaranteed to be chosen for the first hop of the path. That turned out to be a bad move. So the solution for that is now we say, if you have a certain amount of uptime and a certain amount of bandwidth, we're going to give you the guard flag anyway. And that means that the attacker can't show up and give you 2,000 servers and just bump everybody. So far so good? And then a fun one that Kyle found last year. Notice also the beginning of August, 2007, was the control port issue. So Tor has a control port. It listens on localhost so that we can have other applications. Hopefully people here who use Tor have seen Vidalia, which is the cute little GUI we've got that lets you configure your Tor and figure out how you're doing and where your circuits are and so on. And Tor is written in C. It's cross-platform. You can put it anywhere. We wanted the controller to be a separate program written in whatever language you want to write it in. So we listen on localhost, and we only bind to localhost, which means we're safe, right? Nobody should be able to connect to localhost except for trusted applications. Turns out the users have these things called web browsers. And web browsers can be induced to do all sorts of crazy things. So the easy version of this is you have a Java plug-in, and you hand them the Java applet. And the Java applet just connects to localhost and talks to whatever it wants to talk to. In this case, you can connect to the control port, and you can change the configuration. You can look up who Alice is. You can do all sorts of crazy things. But in fact, you don't need to use a Java plug-in for this. 
You can just use a form, a simple HTTP form. You give the user a little post thing. And when she clicks post, then it connects not to the website she thinks she's talking to, but to localhost, port 9051. And then you can have whatever conversation you want with it. You can dump commands there. So it turns out that Firefox has a list of ports that it won't let you connect to locally, like port 25. Because if you're running a mail server and a browser, you don't really want the attacking website to be able to force you to send mail or receive mail. And there are a couple of other ports that it refuses by default. Last I checked, Safari doesn't have any list like this. So if you are running something open on port 25 and you're using Safari, a website that tries to attack you can induce you to send mail. Hopefully Safari will fix that at some point. If you know any Safari people, have them come talk to me. So this is still a little bit tricky, because how do we do the authentication now? We had authentication. We had password authentication, cookie authentication, if file systems can be trusted. But we didn't use it because it was a hassle. And now we've set it up so that we do use it. There's a password that Vidalia knows and Tor knows. But if you want to set up your Tor as an NT service, as something that boots before anything else boots, how do you write down your password? Because it turns out there are other fun attacks where you can induce the browser to go look up any file on the local file system. So if we store the password anywhere on the local file system, then a bad guy can trick our browser into going and looking it up, and then we lose. So that's still an ongoing hassle that we don't have a good usable handle on. We'll get to that in a few slides. So another one that Kyle also found, and we have pretty much solved by now. So the default exit policy in Tor. 
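The browser-side defense being described is essentially a port blacklist. Here is a sketch of the idea, using a small illustrative subset of ports rather than Firefox's actual list: a browser consults a table like this before letting page content open a connection, which is what stops the HTML-form trick on port 25, and why an unlisted port such as Tor's control port (9051 at the time) stayed reachable.

```c
#include <assert.h>

/* Illustrative subset of ports a browser refuses to let page
 * content connect to (telnet, SMTP, POP, IMAP, ...). */
static const int blocked_ports[] = { 21, 23, 25, 110, 143 };

int port_is_blocked(int port) {
    for (unsigned i = 0; i < sizeof(blocked_ports) / sizeof(blocked_ports[0]); i++)
        if (blocked_ports[i] == port)
            return 1; /* browser refuses the connection outright */
    return 0;
}
```

A blacklist defends only against the attacks its authors anticipated, which is why Tor moved to control-port authentication instead of relying on the browser.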
Every node has an exit policy that specifies these are the addresses and ports you're allowed to connect to. The default exit policy says nobody can connect to internal networks. If anybody saw Dan Kaminsky's talk last year where he tricked a web browser into crawling the entire corporate intranet, you could do that with the Tor server also. Except if you ask for 192.168 or localhost or 10.1.1.1, the exit relay will say, no, I'm sorry, I don't want to let you connect to that sort of thing. But the problem is you could still connect to the IP address of the relay from the relay. And typically that was some Linksys that had the default password, and that means that you could break into the relay's Linksys router and then change its DNS and start your phishing attacks and redirect them and do whatever the heck you wanted to do. So the answer is now the default exit policy refuses connections to the local computer, both by its private IP addresses and by its public IP addresses. And that's a hassle because we really liked the fact that you could exit from a relay to itself. Indymedia, for example, has been running a Tor exit relay that only exits to themselves. And that means you get end-to-end encryption, you get end-to-end authentication. It's all for free, except that it conflicts with the fix for this attack. And we figured that by default we should not screw the dumb users who don't realize that their Linksys has the default password. So trade-offs again. And then the Debian random number generator flaw. So raise your hand if you don't know what this attack is, what this problem is. I see roughly, I see a few hands. OK. So it turns out that some nice guy on the Debian security team was looking at Valgrind output of OpenSSL and there were a few lines that confused him. It was something about this has unpredictable input or something. I don't like unpredictable input. Let's remove those lines from OpenSSL. 
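The tightened default exit policy can be sketched like this (IPv4 addresses as host-order integers, helper names invented, ranges simplified): the old behavior refused only the private ranges and loopback; the fix also refuses the relay's own public address, so a circuit can no longer be bounced into the relay's own router admin page.

```c
#include <assert.h>
#include <stdint.h>

#define IP(a,b,c,d) (((uint32_t)(a) << 24) | ((b) << 16) | ((c) << 8) | (d))

/* Returns 1 if the exit may connect to dest, 0 if the default
 * policy refuses it. */
int exit_allows(uint32_t dest, uint32_t relay_public_ip) {
    if ((dest >> 24) == 10)  return 0;                      /* 10.0.0.0/8 */
    if ((dest >> 20) == (IP(172,16,0,0) >> 20)) return 0;   /* 172.16.0.0/12 */
    if ((dest >> 16) == (IP(192,168,0,0) >> 16)) return 0;  /* 192.168.0.0/16 */
    if ((dest >> 24) == 127) return 0;                      /* loopback */
    if (dest == relay_public_ip) return 0;                  /* the fix: not myself */
    return 1;
}
```

The last check is the new one, and it is also the check that breaks the Indymedia exit-to-yourself pattern mentioned above: a deliberate trade-off in favor of operators who never changed their router password.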
And it turns out that unpredictable input, if you're generating random numbers, is really helpful. So for years Debian shipped an OpenSSL that had, I don't know, was it 12 bits of entropy or 16 bits? It wasn't anywhere near what we would have wanted. So that means that anybody who generated an RSA key, anybody who logged in with SSH, anybody who generated an SSL certificate, generated one that I can break. I can break it with a napkin at this point. And that means that we had 300 Tor servers running Debian or Ubuntu or something based on this OpenSSL library. And that means that suddenly, one day, 300 of the 1,500 Tor relays were running with weak keys. Now fortunately, we were tipped off to this problem a little bit before the official announcement went out. So we had a release that came out 30 minutes after the official security advisory came out. But it was still a little bit tricky. The reason why it's so bad is because if anybody decided to log the traffic that they saw going through the Tor network and a client happened to pick three bad nodes, all three bad nodes for her path, then you can just decrypt it later on. If I've got logs from somebody using Tor from two years ago, I can read them all right now. I can know what Alice did, what she received, what she sent. That sucks. Now you have to have all three bad nodes. If you had one good node, then the crypto stays and you can't decrypt it. So we're doing a lot better than a lot of single-hop proxies out there that were just totally broken for years. And the other issue was the new version three directory authorities. We had just generated a fine new set of identity keys, and we were shifting over to the new directory design where we would build consensus directories better. And so the way it worked was each client would pull down the consensus network status and look for signatures on it. And if she saw four out of six signatures, then she'd say, oh good, this one's signed by enough servers. I'll trust it. 
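To see the scale of the weakness, here is a toy model. The key generator below is entirely made up; only the size of the keyspace mirrors the real flaw: when key generation depends on nothing but a roughly 15-bit process ID, an attacker recovers any key by trying all of them.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for "a PRNG seeded with nothing but the PID":
 * a trivial xorshift-style mixer, NOT real key generation. */
static uint32_t toy_keygen(uint16_t pid) {
    uint32_t x = pid;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Enumerate every possible PID; returns a PID whose key matches
 * the observed key, or -1 if none does. */
int brute_force(uint32_t observed_key) {
    for (int pid = 0; pid < 32768; pid++)
        if (toy_keygen((uint16_t)pid) == observed_key)
            return pid;
    return -1;
}
```

32,768 candidates is nothing: that loop finishes in microseconds, which is the "break it with a napkin" point. Real RSA keys from the flawed OpenSSL were recovered the same way, by pregenerating every possible key.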
Three of the six keys were generated on Debian, which means that anybody who wanted to could produce signatures for three out of the six directory authorities. Now that's not four, so we're fine, right? That's probably not a good assumption. There are certainly some attacks you can do if you can almost get the majority. So that was kind of bad. It wasn't awful because this was still the development tree, and hopefully if you're using the development tree, then you're expecting things like this, especially based on the last 10 slides I had. So it wasn't as bad as it could be, but it was still pretty bad. OK, another exciting attack that we're still tackling: turns out that clients can just keep on extending their circuit as long as they want to. And the reason for this is that servers can't figure out where they are in the path. So they can't say, hey, wait a minute. I'm hop number 20. What the heck are you doing? I'm not going to let you extend more. All they see is somebody connecting and asking to extend. They don't know if they're the first hop, or the 20th hop, or the 50th hop, or what. So first of all, it's a DoS opportunity. I'm one client. I can build a path that has 4,000 or whatever hops if I can actually get one to extend that far and work. And then I send one cell, and it produces a cell on every single relay in the path. So if I'm some dude with 20 kilobytes of bandwidth, I can knock over a Tor server at Harvard, and MIT, and BU, and so on, because I just keep bouncing back and forth between them, and they are forced to use up that bandwidth. So it's a DoS multiplier. OK, I'm fine with that. Turns out it's also an anonymity attack. There are a couple of ways of turning it into an anonymity attack. One of them is if you run, say, 5% of the nodes, and then you beat the crap out of the other 95% of the nodes, so they disappear, now you look like the part of the Tor network that's working. So OK, that's one attack. 
There's an even more subtle attack you can do with that, which Christian Grothoff and Nate Evans are going to talk about in this room in about four or five hours. So there's a plug for them. And we're on the way to fixing it. We've made it so that you can't make a path of more than about eight or 10 hops. And the way we do that is a little bit complex. I'll talk about it afterwards. So we've done all of the steps of fixing the bug, except for enabling servers to actually refuse people who are doing it the old way. So that means you can still do the attack right now until everybody's upgraded. And once people have upgraded, then we'll flip the switch and we'll start enforcing the path limits. OK, so those were some solved or solvable problems. How about some practical problems? I assume a lot of people here heard the Slashdot articles or CNN or Wired or whatever it was from the Swedish guy last year. If I use the phrase embassy password thing, then I'll see a lot of people nodding. So there was some dude in Sweden. And he ran a Tor exit relay. And he watched all the traffic that came out. And then he started saying, hey, wow, I'm seeing some great things. I'm going to go to the press and produce lots of news articles. So it's a little bit tricky here. First of all, he was breaking Swedish law. He was wiretapping. He's since been raided. And I talked to somebody a few days ago who said, oh yeah, nobody's heard from him since he got raided. So if you've heard from him, I'd love to hear from you, because I'm a bit curious what happened to this fellow. So the first thought is, if you're planning on breaking your law, talk to a lawyer first. There are plenty of EFF people who are hanging around here who would be happy to advise you on exactly how vague and obsolete and yet still enforced the wiretap laws are. The other thing to keep in mind: Tor hides your location. It provides anonymity. It doesn't magically encrypt all the traffic on the internet. 
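The path-length defense can be sketched as a per-circuit budget of extend-capable cells, which is roughly the idea behind the RELAY_EARLY mechanism Tor adopted (simplified here, with invented names): each relay only honors extend requests inside a small budget, so a circuit can grow a bounded number of hops no matter how long the client keeps asking.

```c
#include <assert.h>

#define MAX_EXTEND_BUDGET 8 /* illustrative cap, in the 8-10 range mentioned */

typedef struct {
    int extends_seen; /* how many extend-capable cells this circuit has used */
} circuit_t;

/* Returns 1 if the extend request is honored, 0 if refused. */
int handle_extend(circuit_t *circ) {
    if (circ->extends_seen >= MAX_EXTEND_BUDGET)
        return 0; /* budget exhausted: no 4,000-hop DoS circuits */
    circ->extends_seen++;
    return 1;
}
```

Note the relay still never learns *which* hop it is; it only learns how many extend-capable cells have passed through this circuit, which is enough to bound the total length.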
So if you remember that slide a few slides back where we had the encryption lined up, the last step was in plaintext. So if I'm talking to CNN and I'm just using HTTP, that means that whatever comes out of the exit relay, going to CNN, anybody in between there gets to read it or change it. It's the same sort of thing you've got if you walk into Starbucks and you don't use SSL. So if you do use SSL, then it works fine over Tor. So I guess the key point is if you want end-to-end encryption or end-to-end authentication, get it. Tor does not provide it by default. And it does protect against a local attacker. If there's somebody watching your local network, Tor will use encryption inside the circuit. It's only once it pops out of the exit relay that it is however you sent it. So it's a little bit confusing. What are some things we can do here? One of the tricky things is a lot of people view HTTPS as a premium feature. So I was reading a Wired article last year where they were slamming the Tor users, saying the Tor users are so stupid. They don't even use SSL. They don't even click the button to encrypt their traffic. So I went to the Wired.com site. And they've got a little login thing. So I click it. I turn it into SSL. They don't use SSL. They don't even support SSL. So what's going on here? There are a lot of groups out there who say encryption should be used on the internet, but nobody actually supports it. So in Wired's case, they use Akamai. And I guess Akamai charges them an extra $100,000 or whatever to provide Akamai as SSL. So they said, oh, heck, that's not worth it. We're not going to do that. There are a couple things we could handle, though. We could make Tor say, hey, you're making an outgoing connection to port 109. Probably you're about to send a password and username for POP. So I'm going to warn you about that, or maybe I'm just going to block the connection entirely. That might work well for POP or IMAP. What do we do for port 80? 
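The port-warning idea can be sketched like this. The port list here is illustrative, and Tor later grew WarnPlaintextPorts and RejectPlaintextPorts options in this spirit: connections to ports that almost certainly carry a cleartext password get flagged or refused before they leave the client.

```c
#include <assert.h>

enum verdict { ALLOW, WARN, REJECT };

/* Ports where a connection almost certainly means a plaintext
 * password is about to cross the exit (telnet, POP2/POP3, IMAP). */
static const int plaintext_ports[] = { 23, 109, 110, 143 };

enum verdict check_port(int port, int reject_mode) {
    for (unsigned i = 0; i < sizeof(plaintext_ports) / sizeof(plaintext_ports[0]); i++)
        if (plaintext_ports[i] == port)
            return reject_mode ? REJECT : WARN;
    return ALLOW; /* including port 80, which is the unsolved case */
}
```

As the talk notes, this works for protocols that are recognizable by port; it says nothing useful about port 80, where plaintext and harmless traffic look identical.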
Firefox has a little pop-up that says, hey, you're sending stuff unencrypted. Somebody might be able to see it. Every one of us has learned to click through that stupid button; we know the internet is not encrypted. So what are you going to do? Another answer is maybe we should use TorFlow. Mike Perry is somewhere around here, and he's been working on a tool that can check, from an exit relay, what it thinks the web page looks like. And if it thinks the web page has a whole bunch of backdoors and JavaScript and other crap in it, then maybe you should be suspicious about what that exit relay is doing. Should we educate users? That trick never works. It even gets more complex than this, because, so for example, when you log into Gmail, you use SSL for the authentication step, but it switches you back to plain text after that, which means that anybody who's watching you from Starbucks or Tor or wherever, unless you force Gmail to stay using SSL, they get to read all of your traffic going back and forth, all your mail. They get to see your Gmail cookie. They get to become you. They get to log in. They get to do everything they want to. And it's worse than that, because with Gmail's cookies, they could say this cookie is only allowed to be used for SSL connections, but they don't. They let you use it for SSL or non-SSL because they want to be usable. They want to make sure that none of those users out there get confused and can't read their Gmail. So this is certainly something that we need to talk more about. Mike Perry has a talk, I think, tomorrow slamming them and putting out an exploit that automatically attacks all of this. So we'll see how that one goes. And it's a bit tricky. Some research on exit traffic properties is legitimate and useful. Sometimes we really do want to know how many users there are, what sort of protocols they talk. On the other hand, it's illegal in a lot of jurisdictions. How do we get the right balance? 
So the next issue, who actually runs the relays? How do we know that they're safe? Once upon a time, you had to send me mail saying, hey, Roger, I'm that guy you met at that conference, and you convinced me to set up a Tor server. Here it is. That scaled even less well than I thought it would. So what we've done in the meantime is we've set it up so there are two steps to checking. One of them is, is this server actually working? Is it reachable? Does it pass traffic? And the other step is, is this a bad guy? Is this somebody that I shouldn't trust? And we've automated the first step. But the second step is still a big mess. How do you know when some server shows up from Korea, whether it's a nice guy who really wants to help relay traffic, or it's some dude who just read the Swedish embassy thing and wants to be a copycat? So there's still a tension. I mean, if Tor's security comes from having a really diverse set of servers all around the world, because then not very many people can attack it, we really do want these people I've never heard of to be running servers. We've got servers in China. We've got servers in Venezuela. We've got servers all over the place. If we only allowed servers that I've heard of, or in places that I know, we'd have a small Tor network, and we wouldn't have much security. So it gets more complex than that, even. There are a lot of volunteer Windows exit nodes, and they run the latest and greatest antivirus thing. And that transparently intercepts HTTP going across their network, and it fixes it. It removes JavaScript. If you don't want JavaScript, it replaces websites with, I'm sorry, this might be a phishing site. Don't see it. And that is kind of a hassle for the folks at CMU who are doing phishing research over Tor, but it's kind of good for the users who don't want to get phished. So there's certainly a trade-off there. What if your exit relay is in China, and you're trying to read BBC? Doesn't work. 
So as we get more fast servers in China, it becomes more and more apparent that the internet is not flat. It looks different from different places. You can't even reach certain sites from certain places. Google will give you different answers depending on whether you're popping out of China or the US or Germany. And they will censor different things in the US than in Germany than in France than in China. And it gets worse than that. We had a Tor server in China a little while ago that was doing SSL man-in-the-middle attacks on everybody popping out of it. If you tried to go to PayPal or eBay, it would scrub off the old certificate, the real one, and put in its own self-signed certificate saying, yep, you're really connecting to PayPal, can I watch all your traffic, please? And some users would notice the little pop-up that Firefox would show. It would say, this is not the real certificate, click here. And so they would click here, and that was that. But a lot of people wouldn't even care about that sort of thing. We originally thought, oh man, there's a guy in China who's attacking Tor, he's trying to do SSL man-in-the-middles. It turns out it was his ISP. The ISP in China was doing SSL man-in-the-middle attacks on all of its customers in that area of China. And the Tor user in Idaho was collateral damage of this policy that China has for its citizens. So that was something interesting to learn. And then even worse, I've been hearing rumors that, so you know in Firefox, you've got like 200 different certificate authorities that you're willing to trust. And plenty of them went out of business in 2000, but they're still in the list of CAs that you trust. I've been hearing rumors that China is looking to buy one of these keys. Once they do, they're going to be able to do SSL man-in-the-middle attacks, and your Firefox won't pop up anything. It'll say, oh good, it's signed by a real person I trust.
That really is PayPal. What happens if some guy who runs an exit node gets hold of one of these official CA keys? Now he can spoof the whole SSL internet. It turns out that SSL might not be as end-to-end secure as we hope. A couple of years ago, there were 10 or 15 Tor relays that showed up, all from 149.9.something.something. And they were fast. And they didn't have any contact information. And nobody told me, hey, I'm the person running these 15 fast servers. So we went to whois and looked them up. And they were registered under PSINet near Washington, DC, which some people freaked out about, because there are plenty of conspiracy theories to go around. So we contacted PSINet and said, hey, these IP addresses, can you tell us who owns them? And they said, whoa, those are ours? Wow, those aren't in any list that we've got, thanks. So that was kind of weird. And we eventually did track down who's running them. And they're a reasonably well-known security organization. Probably some of them are in the audience right now. And they said, oops, I'm really sorry, I probably should have told you, can you not publicly embarrass us? We'll try harder in the future. So these nodes are still running. We know who they are. But what do you do when 15 fast servers show up and they all want to help out? So the first trick is that we have a new path-building algorithm. If you have a bunch of servers from the same /16, you only use one of them in your circuit. And that means that you won't end up saying, I'm going to route through this unregistered block in PSINet to this unregistered block, and then at my third hop, I'm going to use this unregistered block also, and I'll have great anonymity. That turns out not to be what people should do. And it gets a little bit trickier. The Colorado researchers said, hey, why don't you make it hard to run your 2,000 or 3,000 servers all on the same IP address? And that was a good point.
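The same-/16 rule can be sketched in a few lines. This is an illustrative approximation using Python's ipaddress module, not Tor's actual path-selection code (which also weights relays by bandwidth, honors declared families, and applies other constraints); the relay addresses are made up apart from the 149.9.x.x block mentioned above:

```python
import ipaddress

def same_slash16(a: str, b: str) -> bool:
    """True if two IPv4 addresses fall in the same /16 block."""
    net_a = ipaddress.ip_network(a + "/16", strict=False)
    return ipaddress.ip_address(b) in net_a

def pick_circuit(candidates, length=3):
    """Greedy sketch: take relays in order, skipping any that shares
    a /16 with a relay already chosen for the circuit."""
    circuit = []
    for ip in candidates:
        if any(same_slash16(ip, chosen) for chosen in circuit):
            continue
        circuit.append(ip)
        if len(circuit) == length:
            break
    return circuit

# The PSINet-style case: many fast relays from 149.9.x.x can only
# ever contribute one hop to any given circuit.
relays = ["149.9.0.1", "149.9.12.2", "149.9.88.3", "18.0.0.5", "81.2.3.4"]
print(pick_circuit(relays))  # ['149.9.0.1', '18.0.0.5', '81.2.3.4']
```

The point of the rule is that an attacker who controls one netblock can no longer supply two or three hops of the same circuit, which is what the traffic-confirmation attack needs.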
So now you can run only two relays per IP address, which means if you want to sign up 1,500 separate relays, then you need to come up with 750 IP addresses, ideally on different /16s if you want to attack us as well as you can. But that's not the whole story. What about separate ASes? I wrote a paper with Nick Feamster a few years ago, and we'll get into that in a few slides. But the basic idea was: generally, when we're talking about anonymity systems, we make gestures like this. We say, hopping around the internet. But if you look at the actual tier-1 ISPs that people go through, like AT&T and Level 3 and so on, the real gesture should be like this: hopping around the internet on Level 3, back and forth over Level 3. And that means that certain attackers who are able to watch large tier-1 ISPs like AT&T, and I think we've heard of some three-letter agencies who get to snoop AT&T lately, are in a pretty good position to break Tor. What about IXes, the internet exchanges in Amsterdam and London and so on? These are great centralized places to be able to watch the traffic. So even if there are different IP addresses involved in the circuit, if all the traffic keeps going over the same actual physical pipe, then we're potentially in bad shape. OK. So let's switch from design issues with how to pick your path to application-level issues. Let me give you an example. If you are using Tor with Torbutton and you go to a website, you're using Tor and you're all safe, they don't know who you are, but you're running JavaScript. Then they hand you a little JavaScript timer that says: every 60 seconds, please refresh. So there you are with your browser. Maybe you go get a cup of coffee or whatever. Every 60 seconds, you load the page again. You're still using Tor, so you're fine. And then you say, OK, screw this, this is too slow. You click the button, you disable Tor, now you're not using Tor anymore.
60 seconds later, your browser automatically refreshes the page again. Now you go directly to that website that was attacking you, and you say, hey, I'm the guy who's been refreshing every 60 seconds, and this is my real IP address. That's bad. And there are a lot of other versions of that attack. There are cookies. There are ways of looking up your history. You can say, well, I don't know who this anonymous person is because they're using Tor, but they visited the following 71 interesting websites over the past day. That's not the sort of thing you want to reveal. Browser window size: it turns out that at least IE, and probably Firefox also, actually I think it's just IE, specifies when you connect to a website exactly how many pixels by how many pixels your browser window is. And that's a pretty good tag if you want to recognize the client later. And then user agent: do you have the brand-new daily build of Firefox? Does anybody else have that brand-new daily build? That's a good way to track you. Are you the guy who says, I really prefer Japanese, but German and Portuguese are OK too? There aren't all that many Tor users who broadcast that in their HTTP headers when they make web requests. And then HTTP auth information: if I log into a forum and provide some credentials, now I've got auth in my browser, and if I turn Tor off and go there again, it knows who I am. So a lot of these problems are application-level issues around switching from using Tor to not using Tor or back. If you just want to lock down the browser and say, I'm always going to use Tor, then it gets a lot simpler. But there are a lot of people out there who want to use Tor sometimes and then stop when they want to do other things. So Mike Perry has a new version of Torbutton that he put out last week or so. Notice the proximity to the first week in August there again. And it tackles a lot of these problems, but not all of them.
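How these individually harmless attributes combine into a tracking tag can be sketched like this. The attribute values and the hashing scheme are hypothetical, chosen only to illustrate the point that each value is common on its own but the combination is close to unique:

```python
import hashlib

def browser_tag(user_agent, accept_language, window_size):
    """Combine a few 'harmless' browser attributes into one tag.
    Any single value is shared by many users; the exact combination
    usually is not."""
    raw = "|".join([user_agent, accept_language, window_size])
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

# A rare language preference, plus a daily-build user agent, plus
# exact pixel dimensions: close to a unique identifier, Tor or not.
tag = browser_tag(
    "Mozilla/5.0 (hypothetical nightly build)",
    "ja, de;q=0.8, pt;q=0.7",
    "1278x934",
)
print(tag)
```

A website sees the same tag whether or not the requests arrive over Tor, which is exactly how the toggle-Tor-off scenario links the two sessions.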
So there's certainly a lot more we need to work on there. And there are some Firefox privacy bugs that are still around. One of them is there's no way to configure or spoof time zones. So that means that if you're running in Windows and you say, hey, I'm in Pacific Time, and the website you go to says, hey, here's some little JavaScript, can you tell me your date? Then you'll tell it the date, and you'll specify what time zone you're in. And now you've narrowed down where in the world you might be. There are a lot of other tricky things. For example, if you've got client-side SSL certificates, not that anybody does, I think, because nobody uses that feature, then your browser will happily present the SSL certificates to whichever website asks for them. Maybe they have to spoof the domain so you'll be willing to hand them over, but that's pretty easy. So there are some bugs, and I'd be happy to chat about those later on. I imagine Mike Perry, the Torbutton developer, would be too. So, some more application-level woes. There are some applications out there that are really bad at obeying their proxy settings. Pretty much all of the plugins out there can be told to ignore those proxy settings. So if you set up your Firefox to make sure to proxy through Tor, and then you also run Flash and Java and so on, then the website gives you a Flash applet, and the Flash applet says, screw this proxy stuff, I don't want to do that, I just want to make a direct connection. Then you're in bad shape. H.D. Moore put out an attack like this a few years ago, and Kyle has had an unending stream of attacks like this. And I was talking to somebody yesterday, and they said, well, yeah, but can you just close all your applications and then toggle Tor? Maybe. One of the challenges here is that Windows and IE are built into each other so much that you can't, really. Basically, if you want to switch from using Tor to not using Tor: reboot. That's the only real answer.
And we'll talk about that in a bit more detail. So the challenge here is how we switch from Tor as a SOCKS proxy, where you point your applications at it, to Tor as a transparent proxy, where it intercepts all the outgoing connections and takes care of your privacy for you. This is easy to do in pretty much every OS except Windows. There are simple system calls. There are simple iptables or pf rules. You basically say, hey, everything that goes out under this user, or not under this user, can you please redirect it over here? And then I'll do a kernel call to ask: that connection you just gave me, where did you mean it to go, really? So that works, except in Windows. And in Windows it's a big hassle to do a kernel shim or intercept stuff and so on. So we've got two models that we're heading towards right now. One of them is: you get a Linux kernel and you run it inside QEMU. And you run a Tor client there also. And it does the iptables thing, so it transparently redirects stuff. And you do some sort of interception magic to try to get all the Windows connections to go through it. And at that point, you're great. You've got a Tor client. It is your network, from Windows' perspective. And all the stuff that goes out from whatever applications you're running just goes into the Tor client, and Tor either sends it through Tor or drops it if it can't handle it. So that's one approach, and it's certainly a promising approach. We haven't found any setup yet that is well documented as well as usable, as well as explored in terms of specification and what the actual security properties are. But I think we're going to get there in hopefully not too long. Another approach: there's a group called Incognito. They're working on a live CD. And now they have an option for: I want to boot my live CD inside Windows, inside QEMU. So basically, there you are in Windows, and you click, click.
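On Linux, the "kernel call" that asks where a redirected connection was really headed is the SO_ORIGINAL_DST socket option from netfilter. Here's a hedged sketch: the parsing is demonstrated on a hand-built sockaddr_in buffer (with an arbitrary example address), since the getsockopt path only works on Linux behind an actual iptables REDIRECT rule, and the rule shown in the comment is one plausible setup, not a vetted configuration:

```python
import socket
import struct

SO_ORIGINAL_DST = 80  # from <linux/netfilter_ipv4.h>

def parse_original_dst(sockaddr: bytes):
    """Unpack the 16-byte sockaddr_in returned by getsockopt:
    2-byte family (skipped), 2-byte port in network order,
    4-byte IPv4 address, 8 bytes of zero padding."""
    port, raw_ip = struct.unpack_from("!2xH4s", sockaddr)
    return socket.inet_ntoa(raw_ip), port

def original_destination(conn: socket.socket):
    """Ask the kernel where a REDIRECTed connection meant to go.
    Only meaningful on Linux behind a rule along the lines of:
      iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner tor \
               -j REDIRECT --to-ports 9040"""
    sockaddr = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_original_dst(sockaddr)

# The parsing can be shown without iptables: a sockaddr_in for
# 93.184.216.34 port 443 (an arbitrary example destination).
fake = struct.pack("!HH4s8x", socket.AF_INET, 443,
                   socket.inet_aton("93.184.216.34"))
print(parse_original_dst(fake))  # ('93.184.216.34', 443)
```

This is exactly the hook a transparent-proxy Tor setup uses: the redirect rule funnels everything to a local port, and the proxy recovers the intended destination per connection and opens a Tor stream to it.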
And now an entire Linux OS pops up, with a Tor client set up, with iptables set up correctly, with all the applications you might want configured correctly. And then you just use the Firefox inside your live CD running inside Windows. And then you're all set. You don't have any of those issues with applications being able to get around the proxies. So I'm not sure which of these strategies will be better. Certainly running an entire OS inside Windows is kind of a hassle. The ideal approach would be to pop up a bunch of different VMs. So you've got your Windows running in this VM and your Tor running in that VM, and that one takes care of your network, but you can shut this one down and start a new one if you want to. So there are a lot of ways we can play with running a whole lot of different VMs. I worry that if we try to get these working well on the Windows 98 computers in internet cafes and the like, it'll be a real challenge. So this is certainly something we need to keep on tackling. Another issue we need to work on: filtering connections to Tor. I had a talk here at Defcon last year on that. There are a lot of different ways that people can attack connections to the Tor network. For example, I can look at the six directory authorities and block those IP addresses. I can pull down the whole list and block all 1,500 IP addresses. I can look at the actual SSL connections and say, hey, you're talking Tor, I'm gonna block that. Or I'm gonna block the website. China, a few weeks ago, finally blocked torproject.org. So we're trying to tackle that. The approach that we're working on now is bridges, and go see last year's talk for that. The basic idea is you have a lot of users who can relay traffic from the blocked user through their Tor client into the Tor network. Okay, so a few last thoughts on research. Traffic confirmation. This is the big attack against Tor.
If you can see the connection into Tor and you can see the connection out of Tor, then you win. Simple math means that you look at the patterns and you say, hey, wait a minute, these are the same flows. I'm pretty sure this Alice is talking to this Bob. And there are some research designs out there, like defensive dropping and adaptive padding. They kind of sort of work, but not really, because the attackers keep producing better and better versions of the simple math. For example, Steven Murdoch had a paper a little while ago where he said: internet exchanges get to see a lot of traffic, but they can't log all of it, because they see too much. So they sample. I'm gonna do a traffic confirmation attack when I only get to see one out of every thousand packets. And he still produced a good attack that worked quickly. You didn't have to have a very large flow. So that's really scary. Website fingerprinting is another one. It's been well known for a while that if you look at an SSL connection, you can guess what's inside based on how long the SSL connection is. You can say, hey, you've just fetched an eight-kilobyte thing. I'm pretty sure the eight-kilobyte thing comes from this website. So does the attack work on Tor? Can you look at the traffic coming out of a Tor client and say, well, she fetched an eight-kilobyte thing, so it wasn't CNN's front page, but it might've been BBC's front page? I don't know, open research question. Some people believe that they've gotten it to work. Some people haven't. I haven't seen anything convincing either way. But it can probably be made to work, and that's kind of bad. It can be made to work even faster if you not just look at one fetch, but you say: Alice pulled down an eight-kilobyte thing and then she pulled down a 32-kilobyte thing. How many websites have eight kilobytes on their front page and 32 kilobytes on their top news story? Not so many. So I bet that will be a good attack, and we don't have good defenses against it.
Clogging and congestion attacks. So once upon a time, Steven Murdoch and George Danezis had an attack in 2005 where they said, we're gonna try to identify every relay that Alice is using. So they built a circuit through all the relays in the Tor network and sent constant-rate traffic through them. And when Alice connected to their web server, they dumped as large a flow as they could on her and looked for which three relays had a hiccup. And then they said, aha, these three got affected, got congested, got clogged when Alice started using the network, so these are the three that she's using. But that was okay, because they didn't actually identify the location of Alice. They just identified the three servers that she's using. You'll hear more about that attack this afternoon from Christian and Nate. Nick Hopper actually has a cool attack from last year where he looks at the latency of these things. If you can guess what the three servers are, build your own path through those three servers. You know the latency from your Tor client to the web server. You know the latency from your Tor client to the entry guard. Now you can subtract, and now you know the latency from the entry guard to your victim. Now, are there latency tables out there? Can I say: given that the latency from the entry node to my victim is 87 milliseconds, I know what IP address she's got, what network she's on? I don't know, maybe. That's kind of worrying. And there's actually a group at Columbia who has a newer version of this where they look at the bandwidth. They've got a way of remotely measuring bandwidth, and they can say: here, I'm gonna look for little hiccups. So I'm not just gonna look at the relays for hiccups. I'm gonna actually step back one step at a time on the actual routers and try to trace the flow back to Alice. Does that work? Maybe, I don't know. They've got like six data points. Some of them work.
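The subtraction in the Hopper-style latency attack is simple arithmetic. A toy sketch with made-up round-trip times (the 87 ms result just echoes the example figure in the talk; real measurements are noisy and need repeated probes):

```python
def victim_to_entry_ms(victim_rtt_ms, attacker_rtt_ms, attacker_to_entry_ms):
    """Estimate the victim's RTT to the entry guard: take the victim's
    end-to-end RTT through the circuit, and subtract the circuit's
    internal RTT, which the attacker measures by building his own
    path through the same three relays."""
    circuit_internal = attacker_rtt_ms - attacker_to_entry_ms
    return victim_rtt_ms - circuit_internal

# Made-up numbers: the victim shows 300 ms to our web server through
# the circuit; our own probe through the same three relays takes
# 250 ms, of which 37 ms is our own hop to the entry guard.
print(victim_to_entry_ms(300, 250, 37))  # 87
```

The remaining question, as noted above, is whether an 87 ms figure plus public latency data is enough to narrow down the victim's network.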
We're working with them to try to get some more data points. Okay, so I'm going to talk about a few issues that we don't have a good handle on yet. Traffic correlation is just gonna get better. If you can see the flows coming into Tor and going out of Tor, there are so many different ways to do the math. So for example, let's imagine some very innocent ISP operator puts up an MRTG graph of how much bandwidth they've pushed over time. What happens if the entry node puts up an MRTG graph and the exit node puts up a graph, and some smart researcher says, hey, I can do math on this one and that one and identify who's been talking when? That'd be bad. Or somebody sets up SmokePing data. SmokePing is this thing that pings a whole lot of servers and puts up cool graphs and so on. If you can look at the latency, are you doing a congestion-detection attack all the time against all of these servers? I have no idea how well any of this will work, but it scares me. I'd be happy to chat with you about it later on. Countries are gonna keep blocking the Tor network more and more. There are a lot of press articles out even this week, from The Guardian and elsewhere, about China and the Olympics, and how China keeps saying, well, we're totally open, we've opened up our whole network except for all those bad things that nobody would wanna look at. And there are a lot of journalists who are kind of upset about that. We'll see how that goes. But once they start blocking the Tor network, then we have to deploy our bridges design, and that's gonna be an arms race. There have been some bugs that we fixed already in terms of having an attacker easily be able to enumerate all the servers out there, and there are doubtless gonna be more bugs like that that we have to fix. And then another issue that we really need to pay a lot of attention to is data retention.
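The math a researcher could do on two published bandwidth graphs is ordinary correlation. A toy sketch with invented per-minute byte counts, just to show why two MRTG graphs that echo each other's bursts are dangerous:

```python
def correlation(xs, ys):
    """Pearson correlation of two equal-length traffic time series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Made-up per-minute byte counts, MRTG-style. The exit's graph echoes
# the entry's bursts (plus a little noise); an unrelated relay's
# graph does not.
entry     = [10, 80, 15, 90, 12, 85, 11]
exit_node = [12, 78, 18, 88, 14, 83, 13]
unrelated = [50, 51, 49, 50, 52, 48, 50]

print(round(correlation(entry, exit_node), 3))  # close to 1
print(round(correlation(entry, unrelated), 3))  # close to 0
```

Real attacks use finer-grained timing than per-minute totals, which is the point: even coarse public graphs leak, and finer data leaks faster.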
In Germany right now, and all across the EU, they're working on laws that say you have to write down all the traffic headers you see, and we're not gonna tell you what a traffic header is, for six months or two years or something. And Germany goes several steps farther. So how many layers of logging are there? Does the Tor server have to log? Does the ISP have to log? Does the tier-1 ISP have to log? We hear every week about a new company in California that just accidentally lost its whole database about customers. Are these guys gonna be able to keep these huge traffic databases safe? Probably not, especially if they need to provide real-time access to law enforcement. Whatever real-time means, whatever access means, and whatever law enforcement means. So this is gonna be a little bit tricky. I reused this slide from my December talk at 24C3, and the bottom of it says: but hey, they're not going to enforce it until 2009, so we've got plenty of time to overthrow the law. We're running out of time to overthrow the law. So we're gonna have to start thinking about designs where we say, I'm sorry, no Tor servers in Germany, they're worse than China. Or, I'm not sure how we can balance these things and still make use of servers in a country that logs everything. And then a few last lessons learned here. A lot of these are hard research problems and good attacks against all low-latency anonymity systems. If you're trying to get your web page from the website through the anonymity system to the user quickly enough that your user doesn't get pissed off and leave, then you're gonna be vulnerable to these end-to-end correlation attacks. If somebody can see both sides, they're gonna be able to win. So it's not just Tor that's vulnerable to these; it's a lot of other systems out there. And this is a little bit tricky in terms of educating the masses.
I was talking to a journalist a while ago, and she wrote a nice article about Tor, and then at the end she said: the Tor developers recommend you use something else for now, because Tor isn't really perfect. No, we don't recommend you use something else for now. All of those others are broken by these various attacks too. Tor is the best approach that we know of right now, other than: if you really need good anonymity, consider not using the internet. I'd like to have a better answer than that. There are a lot of people who really need to use the internet, so they need to understand the trade-offs. So another lesson here: a lot of these bugs were found and a lot of these attacks were reported because we have all of these design documents, we've written down exactly how Tor works, and we've made it easy to build controllers and to do research. That means that people use Tor as their test bed, and they come to us and say, hey, here's the research I did, I think I can break Tor like this. So the lesson is: make sure that you provide openness, and then people will be able to understand how your system works, and they'll be able to provide attacks and hopefully defenses. And the other lesson there is: we'd love to hear from you if you have any thoughts on attacks. And then the last lesson, as my goon comes up to pull me off stage, is that pretty much any Tor bug here turns into an anonymity bug. There have been a lot of simple ones: maybe certain keys don't get built correctly, or maybe you can build an infinite-length circuit, or maybe you can do some other attack. Each time we come up with a bug that looks like it's just a simple hassle we should fix, somebody comes up with an anonymity attack that turns it into a security problem, not just an oops-I-built-it-wrong. So there are a lot of lessons we can learn about that. Anonymity is tough. Okay, so I'm out of time, and I'd love to chat with any of you later on. I'm gonna be in room one-oh-five.
Directly down the hall. Thank you. Thank you.