Okay. Hi, everybody. I'm Roger Dingledine, and I'm going to tell you a little bit about stuff we're adding to the Tor network to try to provide some more security features. Unfortunately, I'm tethered to this fixed mic, so rather than pacing around, I guess I'll be really energetic moving from two inches to one side to two inches to the other. So Tor right now is an anonymity system designed to let people browse the web or instant message, stuff like that, without people being able to track them down. There are a lot of people around the world who are using it for all sorts of things. The ones I'm going to talk about today are people in Iran and China and Thailand and maybe the U.S. and other countries that have some censorship issues. There are maybe 30,000 people in China using Tor right now. And they don't mind the anonymity properties, but I imagine a lot of them are using it for the circumvention properties. They want to look at BBC or their web comics or whatever they were able to look at last week before it got filtered by the firewall. Now, China hasn't blocked this yet, but it wouldn't be so hard to block it. So what are we going to do next? That's what this talk is about. And hopefully, there we go. Okay, so there are a lot of things I can talk about today. I'm going to give you a very brief crash course on Tor, and then I'm going to talk to you a little bit about what the goals are for what we're trying to do, what assumptions we've got, what threat model we're looking at, and then give you some background on what the components we've got in Tor are right now and what they can give you, and then talk a little bit about current solutions, stuff that's out there that we can crib interesting ideas from or reuse or just straight out use. And then all the components we have to add to Tor, there isn't too much by the time I've gotten to that.
And then all the exciting stuff that comes up that makes everything way more complicated than the theory that goes with it. Okay, so the big picture, we are a free software open source anonymity system. We're unencumbered, no patents, you can get it, reuse it, give it to your friends, change it. It comes with a specification and full documentation. We tell you exactly how to build a compatible Tor client, and several groups in Germany have built their own Tor clients. Talk to me later on if you've got questions about the usefulness of a second implementation. If the phrase 16-bit AES means anything to you, it's useful to have other people build Tor clients too, because it turns out they find bugs that you really do want to know about. Another nice feature about this is we're pretty much the de facto research network. If you want to write a research paper on how anonymity works, then we're the ones that are documenting every step of the way, and we've got everything set up so that you can play around with a Tor network and find out if your attack works. Ten years ago there were all these theoretical attacks on anonymity, and nobody had a good idea of which ones worked and which ones didn't work, and now people can actually attack the Tor network and find out, and it turns out that most of them work, which is unfortunate, but also worth knowing. So I guess that's the point behind being able to document everything. We were also picked by the European Union Prime Project as their low-level transport anonymity. That's a 30 million euro privacy project in Europe to try to explain how privacy could work in the world if you set everything up right. And the exciting part about that is that there were some European anonymity systems that they didn't think were as flexible, scalable, well-documented as the one built in America. And we've got some number of users. It's a bit hard to tell for sure, maybe 200,000. 
How many people here were at Nick's talk in the last session, or Nick's talk in the session before that? Okay, quite a few hands, but not all of you. Great. And we've got about 1,000 servers around the world on six continents. I was going to say five, but there's one in Morocco today, so six continents. They come up, they go down. And we push maybe a gigabit per second of traffic, maybe a bit more than that. So there are a lot of different users doing a lot of different things. And PCWorld magazine picked us as number 40, I think, of their top 100 products of 2005. We beat out Wikipedia, we beat out the iPod. I think this year they're putting us in their top 25 products you've never heard of. So I don't know how the balance there works, but they're still noticing. A lot of different magazines have been doing different things. We were on the front page of the Wall Street Journal a couple of years ago. Okay, so there are a lot of different people who care about this anonymity stuff. When I talk to researchers, I work on anonymity systems. When I talk to my parents and my grandparents, I work on privacy systems, because that anonymity stuff is scary, but privacy is worth having. When I talk to corporations, Google, Walmart, stuff like that, I work on communication security or business security, because they don't want that stupid privacy stuff, and they're scared of that anonymity stuff, but they really do need their security. They want to be able to look at their competitors' websites without having everybody know exactly what they're looking at. They want to be able to go out and buy stuff on the internet without having their ISP keep track of where their suppliers are. And when I talk to governments and military and stuff like that, we work on traffic-analysis-resistant communication networks, because they don't care about that anonymity stuff or that privacy stuff, and they've got plenty of security, but they do need to resist traffic analysis.
So the fun part of working on all of this is we've got all these different communities that care about the same security properties for quite different reasons, and we need to bring them all into the same network so they can get anonymity with each other, so they can blend with each other. So we've got military, law enforcement, corporations, individuals, activists, human rights, civil rights, all sorts of communities going into the same anonymity system, which is where the anonymity comes from. Okay, so how do you build an anonymity system? The very easy answer is you put a computer somewhere out there, say, in San Diego, and you get all the users to show up, each of them connects to it as a proxy, and they say, hey, I want CNN.com. I want Indymedia. I want Voice of America. And then the proxy goes out and fetches them and gives them back, and the websites don't know which Alice has asked for which page. So this is great. It's really easy to set up. It's pretty much how most of the commercial anonymity systems work, except for that big vulnerability in the middle, the big central point of failure, point of vulnerability. There are so many different ways that a single point can be attacked. Maybe you get a colo box next door and you watch the traffic going in and out of it. Maybe you get a job as the janitor in the colo. Maybe you bribe the janitor. Maybe you call the CEO up and threaten the kneecaps of his daughters. Maybe you send a guy named Guido on an airplane. The list goes on and on of single points of failure, ways that things can go wrong. So what we're trying to do in Tor is we'd like to not have that central point of failure. The basic idea is we distribute the trust. You go over three different hops, and as long as they're not collaborating, then you're in business. So, yes, the first guy might be bad. If the first guy is bad, then he knows that Alice is... I'm cut off over there. Oh, well, that still works.
He knows that Alice is using the Tor network, but he doesn't know what Alice is doing. He doesn't know which Bob she's trying to go to. On the other hand, if the last guy is bad, then he knows that somebody is talking to Bob, but he doesn't know who is talking to Bob. And if they're both bad, if R1 and R3 are both collaborating to try to figure out who is talking to whom, then we're in bad shape. If you're seeing both sides of the circuit, then you can do traffic correlation attacks to say, hey, wait a minute, the traffic looks exactly the same on both sides. I'm pretty sure that Alice is talking to Bob. So the security that we get from the Tor network is from having a large enough network and a diverse enough network that not very many attackers are going to be in the right places to see. If you happen to be observing the correct places, we assume you've bought a statistician. We assume that you can do the traffic analysis and win, but if you don't happen to be in the right places, it doesn't matter how good you are at math. You don't have the data. There's nothing you can do to learn that it's both Alice and Bob. So far so good? Okay. And there's a lot of crypto. I'm going to gloss over the crypto. Feel free to ask me questions later on. The idea is that you build a tunnel one step at a time and you establish a different encryption key with each person in the tunnel. And that means that by the time you're sending a packet to R3 that says, please connect me to cnn.com port 80, the guys in the middle can't read it. They just know where to pass that encrypted packet. Okay. So that was the really quick crash course in Tor. And feel free to jump up at any point and ask questions if something I say doesn't make sense because we've got almost two hours here. So there's plenty of material to cover, but I'm also happy to get sidetracked a little bit here and there if we need to. Okay.
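As a rough illustration of the "different encryption key with each person in the tunnel" idea, here is a minimal Python sketch. It is not Tor's actual cryptography: the `keystream_xor` toy cipher, the function names, and the keys are all assumptions made up for this example, and a real client negotiates keys with each hop one step at a time rather than starting with them in hand.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR data with SHA-256(key || counter)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def wrap(message: bytes, hop_keys: list) -> bytes:
    """Client side: add one layer per hop, last hop's layer first (innermost)."""
    cell = message
    for key in reversed(hop_keys):
        cell = keystream_xor(key, cell)
    return cell


def peel(cell: bytes, key: bytes) -> bytes:
    """Relay side: remove exactly one layer using this relay's own key."""
    return keystream_xor(key, cell)


# Hypothetical keys Alice shares with R1, R2 and R3 respectively.
keys = [b"key-with-R1", b"key-with-R2", b"key-with-R3"]
request = b"please connect me to cnn.com port 80"

cell = wrap(request, keys)
after_r1 = peel(cell, keys[0])      # R1 still sees only ciphertext
after_r2 = peel(after_r1, keys[1])  # R2 still sees only ciphertext
after_r3 = peel(after_r2, keys[2])  # only R3 recovers the request
```

In this sketch R1 and R2 each remove a layer and forward what is left, and only after R3 removes the last layer does the request become readable, which is the property described above: the guys in the middle just know where to pass the encrypted packet.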
So the stuff we're going to be talking about today, people can block connections to the Tor network. It doesn't matter how good the anonymity is for Tor itself if the user can't get there. If a great firewall of whichever country it is prevents its citizens from getting to the Tor network, then they're not going to get whatever anonymity, privacy, security, traffic-analysis-resistance properties they're looking for. So there are a lot of different ways that they can do this, they being the attackers. One of them is you block the directory authorities. And I'll tell you a little bit more about what directory authorities are in a bit. And some of you already know from Nick's talk. So there are five of these. They're hard-coded, the IP addresses and keys, in the Tor code. So you go get those five IP addresses. You put them in your filtering list and you're all done. And if you don't want to do that, then the Tor directory itself is public. There are about a thousand servers out there. The IP addresses are published. They're updated every couple of minutes and that's all there is to it. You go get that list. You put it in your filter and you're done. And if that isn't enough, right now the Tor handshake, the Tor TLS handshake that we do, we actually tried to follow the specs. The X.509 stuff says, put your organization name here and put your common name there and describe what that certificate is for. That turns out to be a bad idea because we've got little strings like Tor in our handshake. And all you have to do is look for the string Tor during the TLS handshake and you know what sort of communication it is. Okay, so the goals we're going to talk about today, we need to figure out some way to attract a lot of different relay addresses that aren't the 1,000 of them sitting in the public directory and figure out how to make use of them. And we have to normalize Tor's network fingerprint.
We have to make sure that when you're actually using a Tor connection on your network, nobody can look at it and say, hey, wait a minute, I know exactly what protocol that is. We have to solve the discovery problem. Yes, I have a big list of relay addresses I could give you, but how do you learn about them? Do you go to that central point and ask the central guy for a relay address? That's not going to work because that central point is going to get blocked pretty quickly. So we need to somehow do the bootstrapping problem of how do you learn anything to begin with. And the most challenging of these, don't screw up our anonymity properties in the process. This is really tricky because we don't really know how to measure anonymity or what we mean by screwing up or not screwing up. So we need to sort of try to get an intuition about that as we're going. Okay, so there are a lot of assumptions that we might make. One of the challenges here is that every time people work on a circumvention system or something to deal with China or Thailand or Iran, they always come up with a different set of assumptions. A lot of them in the DEF CON community say, okay, imagine the adversary is super powerful. He's got $100 billion dedicated to tracking me down, and he wants to make sure that no citizen will ever do anything bad in any country. And then they try to build a system, and it turns out they can't build a system for that because that's really hard, and then they fail and they disappear. And the history of circumvention systems at DEF CON is filled with stuff like that. So I'm going to try to give a sense of, from talking to people in China and Iran and other places, of what the actual issues are, what the assumptions we can make about our attacker are, and hopefully that'll make things a little bit easier to tackle. 
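To make the blocking options and the resulting goals concrete, here is a small Python sketch of the censor's side as described earlier: block the hard-coded directory authority addresses, block every address in the public relay list, and look for the string Tor in the TLS handshake. The function names are hypothetical and the IP addresses are reserved documentation addresses, not real authorities or relays.

```python
def build_blocklist(authority_ips, public_relay_ips):
    """Everything the censor can learn just from the Tor code and the public directory."""
    return set(authority_ips) | set(public_relay_ips)


def should_block(dst_ip, handshake_bytes, blocklist, marker=b"Tor"):
    """Return True if the connection would be dropped by either simple check."""
    if dst_ip in blocklist:
        return True                    # address-based blocking
    return marker in handshake_bytes   # handshake fingerprint blocking


# Hypothetical data for illustration only.
authorities = ["192.0.2.1", "192.0.2.2"]
public_relays = ["198.51.100.7", "198.51.100.8"]
blocklist = build_blocklist(authorities, public_relays)

listed_relay = should_block("198.51.100.7", b"", blocklist)
unlisted_but_marked = should_block("203.0.113.9", b"...O=Tor...", blocklist)
unlisted_and_neutral = should_block("203.0.113.9", b"...O=Example...", blocklist)
```

Under these assumptions, only the last connection gets through: it goes to an address that is not in the public directory and its handshake carries no recognizable string. That is roughly why the goals above are relay addresses outside the public list plus a normalized network fingerprint, with discovery as the remaining problem.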
Okay, so the first one, like good security researchers, we'd like to build something that can handle stronger adversaries than we predict right now, and that might conflict a little bit with what I talked about before. We're not talking about a super powerful adversary that can do anything, but we also don't want to go up against a weak attacker. For example, right now China doesn't block the Tor network, but that's not the attacker we're looking for. We're not looking at an attacker that doesn't care at all because against that attacker, we're done. We'd like to do something a little bit stronger than that, so that when the attackers we're looking at take notice, then we're in better shape. We also have a lot of different users in mind. Maybe there are people in China who need this. Maybe there are people in Thailand who need this. Maybe there are whistleblowers in corporate networks. I've talked to a variety of huge corporations, and they are one by one saying, hey, wait a minute, my employees could use the internet from my network. I don't want them to do that. And so they're slowly locking down every possible proxy, every possible other thing. I suspect that they will be collateral damage if we succeed at making China unable to block this, but we can certainly get into that later on also. And then future oppressive situations. I don't know what's going to go on next. I don't know if the INDUCE Act is going to get passed in the U.S. and suddenly building these sorts of things will be illegal, or maybe we'll get our own variation of the fine new German law, the one where nobody knows what it means, that says it's illegal to have or look at or be within 100 yards of security software, or whatever the law actually turns out to mean. And then another assumption we need to make is attackers are going to be in different stages of the arms race.
Maybe some people will have lots of funding and lots of motivation and they'll really figure out how to block a certain set of techniques, and maybe some other people aren't going to care as much or won't have bothered to go that route yet. Okay. What's the attacker actually trying to do here? There are sort of two big goals that we're looking at. One of them is he's trying to restrict the flow of information. He's trying to restrict embarrassing information, maybe corruption, rights violations, stuff that he really doesn't want the rest of the world to start publishing. Such and such country has a horrible problem with X. They don't really like that sort of thing to get published. And then another one is opposing information. Maybe somebody is organizing a protest. Maybe somebody is publishing, hey, let's all meet at the street corner at midnight and then we'll grab our pitchforks and torches and go find the embassy. So these are sort of two categories of things that they tend to focus on. And of course, they don't actually have to focus on all of these. They just have to give you the impression that they are. They can make people self-censor just by saying, oh yeah, we're watching everything and we make sure to censor most of it. And at that point people are going to say, do I really want to post this? Do I really want to mention that I'm going to go to the street corner? Because I don't know for sure, but I'm pretty sure they're watching me. So one of the key things to keep in mind is the attacker doesn't have to actually do all of these things. He just has to make people either pretty convinced that he's doing all of them or totally unable to predict what he is doing. The laws with respect to what's legal and illegal on the internet change wildly in China, so that makes certain attacks easier. Okay, so there are a lot of other goals we might look at. One of the big ones is complete blocking is not a goal.
We don't have to block every possible bad website out there. We just have to block most of them, and then the people who want to go to the rest of them are already going to be in a smaller set, and we just focus in on those with human resources, people on the ground and so on. And similarly, you don't have to block every single circumvention tool. You don't have to say, oh crap, somebody just announced some design at DEF CON and I really have to make sure to block that right now, otherwise one guy in my country might use it. There are only sort of two categories of stuff that they have to actually look at. One of them is if it's really popular and really effective. If it's got lots of users, they have to look at it. And the other one is if it's really high profile, then they have to deal with it anyway. Even if it doesn't work at all, if there have been enough New York Times articles and Wall Street Journal articles saying, such-and-such circumvention project declares war on China, then they have to do something about it because their citizens are saying, hey, you're doing an awful job of filtering our internet because I can even use this thing that I just read about in the newspaper. Or maybe the bosses of the folks doing the censoring say, hey, I just read about this thing and it doesn't look like you've censored it yet. You're not doing a very good job. So those are sort of the two categories of things that they mostly care about. And I'll talk a little bit later about how we're hoping to stay a little bit under the radar with respect to the second one. So if you're a journalist in here writing down every word I say, eager to publish this on the front page of your newspaper, please don't, because we will be able to go a lot farther in the arms race if we can actually continue to work on the design and get feedback from the developers and engineers and security people here rather than having a front-page press article that says, Tor project declares war on China.
So I'll talk a little bit more about that at the end of the talk also. Okay, so another assumption that we have to make, and it's a good thing to be able to make, is if you're just a passive consumer of information, if you're just reading the Internet, nobody's going to bother you much. I talk to a lot of people who say, but all they're going to do is enumerate Tor users and kill them all. What happens when you go through the firewall and you read a page that they didn't mean for you to read is they say, oops, and they filter it. That's it. They don't hunt you down. They don't care what you looked at. Now, it's very different when you're publishing things. If you go somewhere and you say, I just saw the following and I want to tell everybody about it, that's a different story. That they really do care about. But if you accidentally read the wrong page, nobody cares. And that makes everything a little bit easier because we don't have an adversary that really is trying to kill every single person who wants to read the webcomics on the Internet. Another thing to keep in mind, censors and governments have a lot of different incentives, economic, political, social incentives, not to block the whole Internet. When we started out doing this, we were thinking, if we follow the arms race really far and they block the whole Internet, they just shut it off, did we succeed or did we fail? If we force a large country in Asia to not use the Internet anymore, are we actually helping their citizens learn more about what's going on and helping them coordinate and learn about democracy and learn about what's going on in their country? Probably not. So fortunately, from what we can tell, many of these countries don't actually want to shut the whole thing off. They've got a lot of financial interest in keeping it up. Now that's not true for all countries. If you talk to Burma, I just talked to the fellow in charge of the Democratic Party of Burma. He can't ever go to Burma.
And he was explaining that modems are illegal there. In fact, they just leveled like three blocks of a city hunting for a modem a little while ago. So in that sort of situation, we're in bad shape. You really don't want to be found with a Tor client or a computer at all in that area, and there's not really much we can do for you. And another point, they don't mind collateral damage. A little while ago, India said there's a blog up there that's going to make our citizens upset because it talks about religion. Please filter it. They sent the IP address of the blog to all the major ISPs in India. And it happened to be an IP address for a little thing called Blogspot. And it turns out that India blocked 60,000 blogs for two or three days. And after a while, a lot of people started yelling at them and they decided to change it so they're now blocking less. But this is India. Everybody thinks of them as a fine democracy. They're close enough to First World. They've got a lot of smart people there. So I guess the lesson there is sometimes they don't mind collateral damage if there's something really exciting on the IP address they want to block. Speaking of which, how do they actually do these blocks? There are three sort of high-level categories of how they actually do filtering right now. One of them is they get an IP address or a port that they don't like and they put that on their firewall filter and out it goes. Whenever you talk to a certain IP address, if you ever talk to a certain port, then they just hang up on you. Another one is keyword searching in TCP packets. So if you send a TCP packet and inside the payload is the word Falun Gong or democracy or something like that, then they notice and they send resets in both directions, and as long as you follow the TCP protocol, then you hang up and you can't talk anymore. And then the third one is they intercept DNS requests and they give you other answers.
If you search for a certain thing on democracy or Falun Gong or something like that, then they give you a legitimate-looking DNS answer and they send you to a page saying, it seems that you've been looking at this stuff. I think you might be an immoral person and maybe you should reconsider. We are watching you. That helps people have a good sense of how the firewall works and what sorts of things they're concerned about. So those are sort of the three categories of things that people are doing right now in terms of filtering. Another assumption. The network firewall doesn't have an unlimited amount of CPU and attention for every connection that goes through it. It can't look at that and say, hey, wait a minute, you look like an SSL connection but actually your random number generator is weird because you don't have quite enough entropy over in that field. They don't have time to do that for every connection that's going through, and that means we don't have to defend against it. I talk to a lot of people working on this sort of design and they say, ah, but I have to have a totally random connection, and I can put it in the fourth byte and they'll never look there. We don't want to deal with that sort of stuff because that will never work. That whole field is dead, at least with respect to this sort of attacker. So it's a good thing we don't need that. Also, there's a time lag between attackers sharing notes. Maybe a certain large country will find out how to block several of our techniques, but they're not going to tell all the other countries immediately. There will be a while before they share, or maybe somebody else will, you know, show up and go to a seminar or something. Now the counterexample to that is American corporations. Cisco has made a habit of saying, we have built a new way of oppressing your citizens and we'd be happy to sell it to any country who wants it.
So that's a way that a single attack gets spread very quickly to a lot of different countries and we can sure talk more about the upsides and downsides of letting American corporations oppress other people's citizens. And another piece of that is the insider threat probably is not a worry initially. If I show up here and I have a little design and I start deploying it a little bit, the attacker is not going to sign up 80% of the first volunteers who show up. He doesn't care that I'm doing this. He has no idea that it's even happening at this point. At least that's the assumption I'm going to make. Hopefully it's true. Okay, another perspective. Censorship is not uniform within a country. There are a lot of different ISPs in China and they all have a lot of different policies. As far as I can tell for China in particular, the directive that they have from their government is the guy at the top says to each different ISP, don't embarrass us. That's it. That's the whole set of instructions. And the poor guy at the ISP says, well, what does that mean? What should I block so I don't embarrass anybody? Well, I guess I'll go block this stock market stuff and I guess I'll go block this porn stuff and I guess I'll go block this freedom stuff and then maybe the next day he'll realize he should block something else or maybe the next day he'll change his mind and unblock the stock market stuff. So there's a lot of unpredictability in terms of what gets blocked where. And that really is because there isn't any set of instructions to block. In India, I think they've sent out, these are the 18 IP addresses that are bad, bad, bad, block these and only these. Whereas in China, everybody sort of makes it up as they go. And then another thing to keep in mind is that the attacker can influence other countries and other companies to help them keep track of users. 
A year or so ago, China went to Yahoo and said, hey, there's this guy, he's a citizen of China and he's working against China. And Yahoo said, oh yeah, certainly. I'll go check my logs. Yeah, here's the IP address. Here's all the mail he sent. That sucks. But it's how the world works. And Google is also trapped right there, where China says we'd be happy to have you in China but we really do want you to filter everything that we want you to filter, and we can kick you out whenever we change our mind. So that's also a challenging set of assumptions to work with, but it's where we are. Okay, and we also need to assume that the users are not attacked by their hardware and software. If you walk into an internet cafe and there's a video camera behind you and there's a guy looking over your shoulder, this may not be a solution for you. We need to assume that there's no spyware installed, no cameras, stuff like that. I was talking to a group of dissidents from an eastern Asian communist country a couple of months ago and they brought me there and I was about to do a security talk for them, and I was all set to say, so here's how crypto works, 40-bit bad, 128-bit good, let me teach you about which software is compromised, and maybe Windows may not be the best choice for you. And they were getting phone calls saying, so and so just got arrested, what do we do? And so I realized that I wasn't there to talk to them about InfoSec. I was there to talk to them about OpSec, and I don't know very much about OpSec. There are situations where InfoSec isn't going to help you very much. They were explaining after a while, yes, when I go to the bathroom people walk into my house and they answer my Skype calls. There's a guy across the street with a parabolic microphone listening to everything I say. They steal my laptop, they install stuff, they put it back. I can't help these people. I'd love to be able to, but one impossible problem at a time.
So yes, there are some situations where we work on stuff and there are a lot of situations that are still beyond our reach. And then another assumption we want to make, assume that the user can get an actual copy of Tor. If you happen to be in Thailand and they give you Thailand's new official Tor and you use that one instead, then there's nothing I can do. You're using their software, you lose. So maybe we can help this a little bit with PGP signatures. I've got business cards up here that have the PGP key that signs the Tor source code. That works great for people who've met me. It works okay for people who've met people who've met me. It doesn't work so well for some guy in India or Thailand who doesn't ever leave his country. So there is a big challenge there in terms of how we get the real software out and how people know which software is the real software. Okay, so far so good. That's a little bit of background on what we should start out assuming and what sort of goals we've got in mind. Okay, so what does Tor offer now? What do we have that we can build off of? The first step is we've got an anonymity system, and there are I guess three pieces of the anonymity that we want to pay attention to here. The first one is a local attacker on your network can't watch you and figure out what you're doing, and he also can't influence what you're doing. He can't say, that page, no; oh, you want to go to that page, that's okay. So this is clearly useful for circumvention or blocking resistance. It's the property that we want most. But there are a couple of other ones. The second one is no single link, no single relay in the path learns about both you and your destination. And that means that there isn't as much of an incentive for the attacker to show up and sign up a lot of relays. If we were just a one hop proxy we'd expect the attacker to say, oh yeah, I signed up 80% of their network, feel free to use the Tor network, it'll be great. I'll be watching everything you do.
And in fact they don't even have to do that, they just tell people they did. They just say, oh yeah, the Tor network, that's a one hop design, yeah, we took care of that. So by having multiple hops we make it harder for people to actually attack the network in that way. And then the third one, the destination you're looking at can't figure out where you're coming from. This may not quite matter yet, but imagine if the destination could figure out where you're coming from. Then imagine that China goes to Google and says, so I know that you really want to be liked by China, so why don't you go ahead and filter everything from people coming from China whether or not they're using the Tor network. Anybody who comes in from China, anywhere they happen to be coming from, even if they're actually being relayed through Ohio, go ahead and filter them anyway or we'll start hating you. So, I mean, this goes into a debate about network neutrality, but we really want to make it so that people coming to websites don't reveal where in the world they're coming from. Okay, so there are a lot of other things that we have to build on from what we've got so far. We've got a well analyzed, well understood discovery component, the directory servers that we were talking about before. So they automatically aggregate and test and publish information about all the different servers in the network, and then the clients can go and fetch these lists and then they can learn which relays they should use. And it's important that they fetch the same lists so that they're able to use the same network, otherwise you've got a lot of anonymity people here and here and here and they don't overlap, and that means they can't blend with each other very well. And so another piece of this is that the directory information is cached throughout the network.
It's maybe a trust bottleneck, and Nick talked about that a little more in his previous talks, but it's not so much of a bandwidth bottleneck. Okay, so another important feature: yes, the five directory authorities are hard-coded. You've got the defaults right in the code. If you don't specify any, you'll use the default Tor network, but if you do specify some, then you go off and use some other Tor network. So you can say, I don't want to use that one, I want to use these five directory authorities, my own; I want to run a separate Tor network. And it's pretty easy to set up your own Tor network. Several people have done it for research purposes, and somebody started one up inside China for a little while. So it's pretty flexible in that respect: you don't have to use our network, we just give you a tool and it can hook up to any network it wants to. But you don't want to split the anonymity set too much. You don't want everybody running their own networks, because then you won't be blending with each other very much. Okay, so another feature: Tor automatically builds paths if you give it a set of servers. So we're really just a tool. You tell me about some servers and I'll build a path through them, and I don't care how I learn about the servers. Maybe I learn about them through this directory thing, maybe I learn about them through some other mechanism for inputting server information. There's a project at Harvard called Blossom, from a grad student who graduated recently. He doesn't care about the anonymity properties of Tor, he cares about the reachability properties of Tor. He wants to say, I want to see CNN from Iceland, so Blossom will use the Tor network to build a path that ends in Iceland and then go to CNN.
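The "build a path that ends in Iceland" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Blossom's or Tor's actual algorithm: the `Relay` fields, the `build_path` name, and the uniform random choice are all hypothetical, and real Tor path selection takes other factors into account that are ignored here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    name: str
    country: str       # two-letter country code (assumed field)
    allows_exit: bool  # whether the relay lets traffic leave the network


def build_path(relays, exit_country, hops=3, rng=random):
    """Pick a path of distinct relays whose last hop exits in exit_country."""
    exits = [r for r in relays if r.allows_exit and r.country == exit_country]
    if not exits:
        raise ValueError(f"no exit relay available in {exit_country}")
    exit_relay = rng.choice(exits)
    # The earlier hops can be any other relays; they only pass traffic along.
    others = [r for r in relays if r != exit_relay]
    if len(others) < hops - 1:
        raise ValueError("not enough relays to build the path")
    return rng.sample(others, hops - 1) + [exit_relay]


if __name__ == "__main__":
    network = [
        Relay("a", "US", False),
        Relay("b", "DE", True),
        Relay("c", "IS", True),
        Relay("d", "FR", False),
    ]
    print([r.name for r in build_path(network, "IS")])
```

The point of the sketch is that the caller supplies the server list and the constraint; where that list came from (the directory or some other mechanism) does not matter to the path builder.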
He wants to be able to say, give me the resource called the website from the location called Jeff's apartment, and it will automatically figure out which Tor server is running in the apartment, exit from there, and ask for the website. He calls this a perspective access network, and it's also useful for people who want to study filtering around the internet, because there are maybe 20 or so fast Tor exit nodes in China right now and you can exit from them and see what works and what doesn't. So there are researchers at Harvard and other places who are looking into using this to do automated scanning to figure out what's filtered today and what's filtered tomorrow. Okay, so another feature: we separate the role of internal relay from the role of exit relay. Another way of phrasing that is that Tor has these things called exit policies. Every Tor server can specify which IP addresses and ports it wants to let people connect out to on the rest of the internet. So some people say, yeah, I'll let anybody browse the web through me, and some people say, my admin already hates me enough, I don't want him to harass me anymore, I'm just going to be a middle node, a non-exit node. So maybe half the people out there say, sure, I'll run the default exit policy, and half of them say, I don't want to deal with any of that, I'm just going to relay traffic from Tor to Tor. That means we've got twice as many servers, three times as many servers, as we would have otherwise. And remember that the security property for Tor is that the larger and more diverse the network gets, the better we are against an adversary who wants to see both sides of some circuit. So yes, this diversity is what gives us our anonymity, and it's helpful to have a lot of different servers out there. Okay, so another feature we've got to work with: we exist, we're pretty
sustainable. We're based on volunteers. A lot of anonymity systems in the past have said, okay, we'll start with $30 million of VC funding, we'll get a lot of servers out there at ISPs and pay them all thousands of dollars a month, and then we'll collect 50 bucks from every user and everybody will get rich. Those systems are pretty much dead by now. So without getting too much into the commercial side of that, we think we're a little bit more sustainable in that we rely on volunteers. There are a thousand people all around the world who say, anonymity is really important, I've got this extra bandwidth, I've got the server here, I'm going to set it up and help save the world. And hopefully there will be more people next year if I can't keep doing this next year. That sort of community effort means that it's more than just me and Nick working on this. It means there are hundreds of developers who show up. Some of them are anonymous. There are three or four really good developers and I still don't know where they are. They use Tor to go to the IRC channel, they use Tor to communicate their patches. I don't know who these people are, but they work very carefully at their patches, they're really good people, and they find bugs. So we've got a big community of people who help out in a lot of different ways, and a lot of this is because of the open design. We tell you exactly how it works, and that means some security researcher who doesn't care about code can show up and look at our design and our specifications and say, I'm pretty sure you've got a problem here. It also means that some security researcher can show up and look at the code and the specs and say, I don't know what this is for, but it doesn't do what you say it does, and I think there's a problem there. So hopefully that means we'll be around for many more years. Yes, question? If these developers are anonymous, how do you know there are three or
four? One answer is, I don't. Another answer is, they seem to have different PGP keys. That doesn't mean there are only three or four, but they seem to be different personalities. If they really do have enough time to have lots of different personalities and to dupe us, I wish they'd spend more time working on Tor. There are some great anonymous contributions. Nick mentioned a little while ago the fellow who pointed out the Diffie-Hellman key exchange bug for us. This was right after What the Hack a few years ago. I talked to an audience of 1,500 people and said, hey, if you find any problems in Tor, please let us know, and several people said, yeah, well, I'd like to, but my employer doesn't let me do that. And I said, hey, we have an anonymity system. Then the next week some guy showed up on IRC using Tor, pasted a little GPG-encrypted blob, said "get this to Roger", and left. That was the big handshake vulnerability that would have let an attacker man-in-the-middle any Tor server. Whoever that was, I'm happy that they told us about it, and I imagine we'll have more anonymously contributed bugs now that we've got an anonymity system. Did that answer the question? Okay, great. So that's hopefully something we can build off of: we've got a sustainable community, so even if one or two developers disappear, Tor's not going away anytime soon. Okay, other features: we have users out there now. This gives us two things. One of them is more sustainability. It means that if people start threatening anonymity legally or socially, we're going to have a lot of people standing up saying, hey, you can't take away my communication security, or hey, I use that traffic analysis resistant property, you can't make Tor go away. There are navy soldiers in the Middle East right now who are using Tor, and they don't use it for anonymity. They don't care about that, or privacy, or whatever. They want to check their mail in Maryland and they want to
do it securely. They really do want to authenticate, but they don't want somebody watching the house there in Iraq to learn what their affiliation is, and they don't want somebody watching the mail servers in Maryland to learn what their location is. So as long as we've got a large community of military, law enforcement, corporations, individuals, activists, human rights, civil rights, and the list goes on and on, it's going to be a lot harder for people to say anonymity is only for bad people and we've got to outlaw it, because there are so many people using it simply for good data hygiene. I mean, do you really want to be in Amazon's logs? I don't know what they're going to do with their logs in two years, or whether they're going to leak them like AOL. I can list 40 corporations that accidentally lost their whole database, but that's also a separate talk. So one feature of having all these users is that sustainability. Another feature is that we've got a whole heck of a lot of IP addresses that we can work with, and maybe we'll be able to make use of those in later parts of this talk. Okay, and a little aside. I talk to a lot of people working on circumvention systems and they say, anonymity is stupid and hard and I don't want any of that, I just want to do circumvention. So let me tell you a few scenarios where it might matter. Let's imagine Alice is a factory worker in China and she really wants to blog about what's going on in her country. Her uncle happens to live in Ohio, so she calls up her uncle and says, hey, can you run a relay for me? And he does, and she posts her blog through him. Then the big bad adversary says, oh dear, somebody posted a blog entry about this, I really want to know who it is. I tracked it down to this guy in Ohio. Wait a minute, he's related to somebody at the factory. I know who published that. So that's an area where anonymity really matters in addition to the circumvention. Not only do you want some way of getting the packets
out, but you also need to make sure that the origin of the packets is not correlated with where they end up. And then another side of this is, if you don't have any anonymity in your system, then the attacker really should sign up as much of the system as they can, and then they can say, well, this Alice posted about this, this Alice posted about this, and so on. They can just start building profiles of the users in the network, or just spread suspicion that they have. They don't actually have to attack things, they can just tell you they did, and now everybody will say, I'm not really sure I should use that Tor thing, maybe I'll use nothing instead and just try it directly, or who knows what the alternatives are. I suspect I'll have plenty of time for Q&A if we keep going at this pace, so feel free to interrupt me whenever you like. And I'm going to talk a little bit about... yes, question? So the question is, can people tell that somebody is coming from Tor? Generally the answer is yes. For example, for Wikipedia, we give them a very simple script and say, run this little thing over the directory, compare the IP address you're looking at to the directory, and it'll tell you whether that's a Tor exit node they're coming from or not. The reason for that is we don't want to bully every internet service in the world into accepting anonymous users. We really want to let every website out there decide whether they can handle anonymous users or not. And plenty of them can. They have this weird concept called accounts, where they let people log in and then they can interact, and they don't care where in the world they're coming from. But there are some services out there, like Wikipedia, that really don't want to work on the account model. They really do want to let random people from around the world show up and scribble on their web page, and we don't want to force them into doing that. We're trying to work with them to teach them about this weird
new modern notion called authentication, and once they get it then hopefully they'll allow Tor users, and until then we don't want to beat them up. Yes? Yep, it's worse than that. If they're actually watching you and they're watching the website, I don't think it matters that there are other Tor users out there in the world. I don't think it matters that other people are using Tor at the same time. If they're watching your traffic and they're watching your destination, you're in bad shape. Unfortunately this is a property of all low-latency anonymity systems. If you're fast enough that your users don't hate you and leave, then you're going to be vulnerable to this end-to-end correlation attack. It's an open research question whether we can address that, but I haven't seen any approaches yet that don't involve saying, how about you wait three hours and then your web page may or may not arrive, and nobody really likes that design. So yes, you're right, you're vulnerable. Let me give you another example, more extreme. Let's imagine there's a poor person in the Sudan, and they're in some camp because they happen to be the wrong community or gender or race or something like that, and they have a blog. There's one person in the Sudan who knows how to blog and knows how to use Tor. They use Tor, something shows up on a blog, posted using Tor, and then the authorities think, gosh, I don't think I want that to have been published, who is that person who knows how to use Tor again? At that point she's in bad shape too. Part of that, I think, can be addressed by the design I'm going to tell you about in a bit, where it's not nearly as obvious that you're connecting to the Tor network. And I should also clarify, answering the question over there: there's a difference between connecting to the Tor network, and trying to prevent people from or allow people to do that, and connecting from the Tor network to a website. We really want to allow websites to decide whether they allow Tor connections, but we
really want to allow Alice to decide whether she can get to Tor. So it's a matter of personal choice, allowing people to do things. There are many questions. Yes? Great. Okay, so the question was, can you talk a little bit about Vidalia, which is a Tor GUI, and how it protects DNS queries, which is a Tor issue. Those are really separate things. Vidalia is a program that gives you a slick little GUI. You've got a little map of the world, it puts little red dots where the Tor servers are, and it draws little arrows and stuff for which Tor servers you're using. It also lets you configure things. There's a great little window that we put in recently with a checkbox that says, turn me into a Tor relay. So that's Vidalia. Separate from that is the DNS leak issue, which has nothing to do with the Tor GUI. It has to do with the applications that you use. I'll cover it very briefly because it isn't quite on topic here, but it's worth knowing about. One of the problems with the very early configurations that people had for Tor: if you, for example, point your Internet Explorer directly at Tor, Internet Explorer does the DNS resolve itself and then goes to the SOCKS proxy. That means that from the perspective of somebody watching you on the network, you shout out, hey guys, what's the IP address for google.com, and then you anonymously go somewhere using Tor. This may not be the property you're looking for. So if you configure your applications correctly, you're in better shape. There are also some new designs that involve maybe VMware images running on the same system, where you can transparently intercept everything. You're also able to do that on every OS other than Windows with just a few lines of code and a call to the kernel. There's this notion of transparent proxying, where the kernel redirects any outgoing TCP connection to your port, and then there's a little system call where you say, hey, that connection you just gave me, what was it supposed to
go to, really? And then you pretend that you just intercepted it yourself, and you send them to Google, and there is no DNS leak. So that sort of addresses that, but again, that relies on users not shooting themselves in the foot, which is rarely a good move. Yes? So the question is, is fixing the DNS leak problem related to the circumvention problem? And the answer is yes, because it's yet another way that some poor guy in China can screw himself. In particular, in China, if they do DNS interception, then they give you a different IP address, and you would anonymously go to that, and you would anonymously go to the page that says, we are watching you, we really don't want you to learn about that. And that's probably also not what you had in mind. Okay, so I'm going to rush forward a little bit and do some more, and I'll answer your question maybe at the next section, because maybe I do have enough material to fill two or three hours. Okay, so what do we have to work with here? What are the various components that we can take advantage of to work on our solution? Let's take a step back. There are sort of two pieces to each of these circumvention tools. There's the relay piece, which is, how do I build the paths, how do I get the crypto right, how do all these mechanisms actually work. And then there's the discovery component, which is, how do I learn the first place to connect to, how do I bootstrap into the system. We really can separate these two, because a lot of the different systems have one relay component and then some other discovery component, and we can look at these separately and mix and match. Okay, so one thing to look at: centrally controlled shared proxies, stuff like anonymizer.com. They've got a computer out there, or maybe they've got 25 different proxies out there, and they control all of them centrally. Another feature here is that they aggregate a lot of users into each of those proxies. So they've got weak security compared to distributed trust designs, compared to
something like Tor where you have several hops, because every one of those proxies is a point of vulnerability. On the other hand, they're really easy to deploy, because the user is already totally trusting the proxy, so you don't even need any client software. You just set up the proxy and say, that's the address, go ahead and tell that guy everything you want and he'll take care of you. Users love that because it's easy to do, except when you have anonymity problems. So this is something we can build with, and it's certainly something that's out there right now. On the other end of the spectrum are things like Circumventor or Psiphon or CGIProxy. These are totally independent personal proxies. They're the "Alice gets her uncle to run a relay" type of thing. So they've got the same relay strategy: there's a proxy and you use it. But the discovery strategy is no longer, we tell you the domain name and we hope that you can reach it. The new discovery strategy is, I'm going to tell you personally about this relay address, and then you can use it, and please don't tell the world. So these are great for blocking resistance. Nobody's ever going to know that this guy in Ohio is running a relay, because only one or two people are using it. On the other hand, there's a huge scalability question. Let's imagine you don't know somebody in Ohio and you really need a relay. What do you do? How do you find somebody to run one of these things for you? And on the flip side, let's imagine you're in Ohio and you really want to help somebody. How do you meet up with somebody who needs a relay? So it's great for blocking resistance, but there's a scalability issue, and we'll try to tackle that one also. And then there are open proxies. I don't have to tell the DEF CON crowd what an open proxy is, but Google for "open proxy list" and you'll get plenty of them. There are Russian corporations that refine the lists. They test them all, and they give you ones that are guaranteed to be up and guaranteed to
be at least 50 kilobytes a second, and I'm not sure, but they probably give you ones that are guaranteed to be inside US military institutions, and, you know, whatever lists they happen to want to give you. So they've got a lot of different bandwidth and stability and reliability. It's hard to say legally whether you're supposed to be using them, whether people meant for you to be using them. Open proxies really didn't have you in mind. And they're not encrypted in a lot of cases, which means that keyword filtering is still going to work. And then a lot of people say, aren't these a little bit too convenient? Who's running them? Why are we using them? I was actually just talking to an NSA person last week, and he was lamenting that a lot of people in his lab were saying, oh yeah, we get great anonymity, we found this thing called open proxies and we're using them, and we're using SSL so we're all secure. And he's like, you're using SSL to some guy on the internet who said he'd be happy to relay your traffic, something's wrong here. So there are many lessons to be learned all around, but that's another building block, and maybe we'll be able to make use of it. And then there's an anonymity system in Germany called the Java Anon Proxy, or JAP. They're quite similar to Tor in a lot of respects. They use several hops and things like that. We can certainly get into discussing the details of why our design is more scalable and more robust than theirs, but they are also working on a blocking resistance scheme, some sort of, how do we let the folks from Iran who want to get to the JAP network actually get there. They basically have a CAPTCHA mechanism where you go to the website and they give you a little picture with some letters that are hard to read, and if you type the letters correctly, then they tell you an IP address that they haven't told very many other people. So this is a great mechanism. I mean, it's not perfect. I hear that some large governments
can employ people to fill out CAPTCHAs, but it's a start. And then there's Skype. It does port switching and encryption and lots of other stuff, so it's not trivial to filter. On the other hand, it has a central login server. I was talking to a woman from Iran a little while ago who said, Skype is really great, but I can't do any of those things that cost money because I can't get to the place to pay Skype, so I can only do the free Skype things, and that doesn't actually let me talk to all the people I need to talk to. So we can certainly take lessons from it, but it also has problems, and it doesn't solve the whole thing. And then there's Tor itself. Tor's website is blocked in a lot of places. It's blocked in Iran. It's blocked by the LA Times, which is sort of an adventure, because I was doing an interview with a reporter from the LA Times a few years ago and he wanted to go look at the website and he couldn't, because they were contracting out to one of those filtering companies that doesn't let you look at scary things on the internet, and apparently we were a scary thing on the internet. So it's blocked in a lot of places, but the Tor network itself actually isn't blocked in very many places. I think UAE very recently decided that they cared and they're starting to do something about it, but China doesn't block it at all. So the question is, why is that? If there are 30,000 people using it right now, why haven't they done something about it already? The short answer is, golly, I don't know, I haven't asked them. But there are a couple of guesses we can make. One of them is, tens of thousands of users, who cares? We've got a billion people here, that's nothing, nobody's using it. Another answer is the perception. We've been talking to people for a while, and a lot of people figure that Tor is for experts. There's this weird thing, it's called a text file, and you sometimes have to edit it if you want your Tor to work right, and that means that most users are never going to touch it. And
as long as the perception is that Tor is for experts, they don't have to worry about blocking it. It's not a threat to them. And the corollary to that is, we haven't publicly threatened their control. As long as I keep talking to people about how Tor is for civil liberties and free countries, and it's about not getting into Amazon's database and not being the one that AOL makes fun of when they try to anonymize their database and publish 40 million records or whatever it was they published... Pardon? As long as we're talking about stuff in the free world, nobody's going to care over there. As long as we're not saying this is for human rights, not civil rights, we don't look like the sort of thing that they have to filter. We're not threatening them. But I guess the key point here is that we should realize we're already in the arms race. It's not that we're up against an attacker who is all-powerful and all-caring. We're already in the arms race: there are 30,000 people using Tor right now and they haven't taken action. So maybe that's something we can take a lesson from also. Okay, so those are the current systems, and I'll take another little break because I saw some questions, and we'll go from there. Does Tor have anything to prevent DoS attacks? I guess there are two questions in there. One of them is, can you DoS people through Tor? The answer to that is, golly, it'll be slow, I don't think they'll mind. And the other half of that is, Tor only transports correctly formed TCP streams, so it's not like you can send a bunch of UDP packets through it. You're going to be making a connection, and then a little while later you'll be making another. If you're the sort of person who likes to DoS people, there are a lot of better options out there. Go build yourself a six-node botnet and you'll be in better shape. So that's the first answer. The other question is, does Tor have any protection against people DoSing it? One answer is, there are a thousand
servers out there, and DoSing a thousand servers, especially when many of them are run by ISPs, is a little bit tricky. There are five directory authorities, and nobody's ever knocked down all of them at once. Don't take that as a challenge, please. On the other hand, sometimes Tor exit nodes do get knocked down. If a Tor exit node happens to allow connections to IRC, and some dude shows up on IRC and starts making fun of people, then the community reaction in IRC is, I hate you, I'm going to make you go away. If the exit node happens to be at a university, then nothing happens and the guy doesn't go away, and if the exit node happens to be on Comcast, then he vanishes and you never hear from him again. It really depends on the exit node. What do you mean, tell the difference? Okay, so the question has to do with padding. The suggestion is, why don't we pad the link from Alice to the Tor network all the time, and that way the attacker won't be able to do this end-to-end correlation attack as well. The first answer to that is, golly, that's expensive. If every Alice is padding every link to the Tor server she's going to enter the network at, then suddenly the gigabit per second of average traffic we push is going to be hundreds of times higher than that, and Tor is going to suck even more than it does right now. So we're sort of at a tension between trying to provide good anonymity and trying to provide enough performance that the several hundred thousand users we've got are still going to stick around. It's still an open research question how much padding you have to add until you get any resistance to this end-to-end attack, but as far as I can tell it's quite a bit. It isn't add ten percent more, it's add thousands of percent more, and that gets really expensive. And the other side of that is, even if you do that, even if you've got full padding, the internet's not perfect. You're still going to have little dips in the link from Alice to the Tor network, and so rather than looking for
presence of packets and doing the correlation there, now it's just look for absence of packets and do the correlation there, and the math is the same. So you have to be perfect if you want to do padding, otherwise you're still vulnerable to these attacks. And then the question is, okay, but how long does it take? Does it take any less time or more time if you do that? I don't know. That's still an open research question, but I'm not optimistic. Other questions, or should I move on to the next section? Oh, I see a big hand in the back, and you're going to have to shout really loudly. What sort of trust is necessary to become an exit node, or really any node, in the Tor network? Once upon a time you had to send me mail and say, hey Roger, I'm that guy you met at the conference who said he was going to set up a Tor server. That scaled even less well than I thought it would, so it was back in 2003 or so that we stopped doing that. Now you can show up to the Tor network and run your server and you automatically join, and if you're an exit node then you automatically are used as an exit node. See Mike Perry's talk at, I think, five o'clock on what he's working on in terms of scanning the Tor network to try to find evil people who are lying to users. But yeah, if you want to sign up to the Tor network, go ahead. If we notice you're doing something wrong, then we'll either break your kneecaps, if we know you, or we'll lock you out of the network by setting flags on the directory authorities. But that's certainly a big issue, because we're busy developing Tor rather than trying to police the current Tor network, and if you've got any ideas on how to make that work better, please let us know. Okay, I'm going to go to the next section and then go for a few more questions. Okay, so I talked a lot about what we've got to work with, and now the big question is, what do we add so that we can actually start providing some blocking resistance? So the
first thing is bridge relays. We've got a few hundred thousand users out there. What would happen if we gave them a little button in the GUI, in Vidalia, that says "Tor for Freedom" or something like that? I used to imagine we should call it "Help China", but that sounds too much like "Tor Project declares war on China". So let's imagine there's a button that says "Help Tor for Freedom", and you click the button and you sign yourself up to be a relay. You don't sign yourself up in the public directory, because that's easy to block: they just fetch the list and block it. Instead you sign yourself up on a different directory authority, a bridge directory authority, and you don't give that whole list out to people. You just give it out in some way that I'm going to talk about later. These guys don't have to be exit nodes. They can just be relays. They just bridge from some person in a blocked area to the rest of the network, and they can also rate limit to maybe 20 kilobytes a second or something. So here's sort of a diagram of how it might work. We're still in the research phase of deciding whether we need R1 there, or whether we can cut that out and just have three hops including Alice for the blocked user. I think we can get away with that, which will make them a lot happier. So that's the basic idea. We've got all these users out there, and we can turn each of them into what we call a bridge, a little Tor server that's not in the main network. It's not an exit node. They just move a little traffic back and forth, and 20 kilobytes a second is cheap for somebody on Comcast and it's a whole lot for a guy in Iran on a modem who's got no other connection. Okay, so on top of that we also need bridge directory authorities. We need some sort of specialized directory authority that behaves almost like the current ones. You want it to collect server descriptors, you want it to do tests to see which ones are up, and you want it to answer questions when somebody shows up and says, I know
about a relay, here's its identity key, tell me the newest descriptor you've got. That has stuff like the exit policy, the IP address and port, the keys, useful things like that. So we want it to answer questions if you already know the identity key you're asking about, but we don't want it to answer questions if you show up and say, hey, can you tell me every bridge, because I'm just curious. That's pretty easy to do. A few weeks ago I implemented the bridge stuff and also the bridge directory authority stuff, so those are out and you can play with them right now. We want to eventually choose an official bridge authority, so that people can actually have a bridge authority hard-coded, so that when they say, I'm a bridge, please upload this to the bridge people, it all goes to the same place, and that's the same place that people go to when they want to ask questions. I thought for a while about running the bridge authority on my computer at MIT, but it's already running two of the main directory authorities and we shouldn't centralize everything on it. So if you know me and you've got a fast, stable computer and you want to run a bridge authority, let me know afterwards. And then bridges should publish using Tor, so the attacker can't just go monitor the network of the bridge authority and watch as everybody shows up saying, I want to be a bridge, I want to be a bridge, I want to be a bridge, because that would be a great way to build a list of bridges you should filter. Okay, so there are a couple of interesting points here that make the problem a little bit easier. One key point is, one bridge is enough. Once you've got one working bridge, once you've somehow bootstrapped into knowing about a bridge you can use, then you're in business. From there you can get to the main Tor network and you can get out to any website that Tor can get to. From there you can get to the main Tor directory authorities. And all of this happens in the background.
Once the blocked user gets some bridge, then she's in business. So we've sort of switched the challenge. It used to be, how does the poor user in whatever country this is circumvent her firewall for every single transaction that she does, and trust the web pages she gets back, and hope that she doesn't get DNS spoofed, and all the other problems. We've changed that to, how do we learn about one working bridge? How do we bootstrap into the system? We want to get around the firewall somehow once, but after that it's all automated. Okay, so another issue, taking a little aside, is the Tor network fingerprint question, and I'm going to skip over this because we don't have three hours, we've only got two. But there are a couple of issues here. One of them is, right now in Tor we speak two different protocols on the wire. We speak HTTP to do directory fetches, and we speak HTTPS, or SSL, or TLS, to actually do connections to the Tor servers. So the first one was a really bad idea in terms of blocking resistance, because right now if you're a Tor client, the first thing you're going to do is you're going to start up and you're going to go to some place, it might be port 80, it might not be, and you're going to say GET /tor/server/something-or-other. And it doesn't take very many smart filtering companies to figure out that if they look for that string, then they've found a Tor client, and they can just lock that Tor client out, and it never gets any directory answers, and that poor guy is never going to bootstrap because he'll never actually talk to the directory authorities, and then things will go bad. So the first step is we need to change it so that all the directory fetches are tunneled inside the TLS connection. So what you do when you want to do a directory connection with somebody is you make a TLS connection, you make an ordinary Tor connection, and then inside of that you send a little packet saying, hey, I want to do a directory request, and
this is the string that I'd like to ask. So now it's all encrypted and it's all authenticated, and it's a little bit harder to filter by a fingerprint. Okay, so that's step one, and we've done that, though it's not default yet, because I'm not sure I want to push the arms race that quick. There are a few filter companies here and there who have said to themselves, oh good, we finally took care of that stupid Tor thing. I'd like them to keep thinking that for a little while. And then the next step: maybe we want to pick a good default port, maybe port 443 or something like that, because right now the default port for Tor connections is 9001, which is just, you know, some port we picked out of nowhere, but it doesn't look much like you're browsing your bank's website if you connect to port 9001, even if you use SSL. And then the next question: we really want to make the TLS handshake look a lot more like an ordinary secure web browsing session. So first of all, we really want it to go, you know, you show up as a client and you say, no, I'm just a web browser, I don't have any certificates, and then the web server says, I'm just a web server, here's one certificate. Right now in Tor's handshake we give two certificates as a client and receive two certificates as a server, and that's so we can get our authentication and our perfect forward secrecy and all the great security properties that people browsing the web don't get. But there's a trade-off there. So I think what we want to do is we want to make the handshake look like browsing the web, and then inside that we want to exchange a lot of other things to make sure we get our crypto right, after we've done the handshake that looks like we're connecting to a bank. And that's great, except if they really look at the communication and they say, you know, when people connect to the bank and do a secure transaction, usually what they do is they send a little thing and then they get a big thing and then they hang up. This guy has sent a big thing and got a big
thing and sent a big thing, and he's been doing that for a while now. That's probably not somebody logging into his bank. So this is a little arms race in itself, and I'm not sure what we'll do about it if anybody gets to that stage. It may be that looking like an SSL transaction, looking like you're logging into a bank, is not the best sort of behavior to blend with. Maybe we need to take a look at how Skype works, because Skype has maybe a different pattern of its timing and packet volume and stuff, and maybe it's easier for us to blend with that pattern than it is to blend with the I'm-logging-into-my-bank pattern. Of course, we don't want to look like a secure file sharing system, because people already think that maybe they're not sure they want to support Tor, and if we disguise ourselves as that thing that they already hate a whole lot, that's probably also not a good move. Okay, so that's a short summary of the arms race for hiding our network signature or fingerprint. Feel free to ask questions about that one afterwards too. Okay, so how do we discover working bridge relays? So I talked a little bit about the relay component before. Most of what we worked on with Tor is the relay component: how do we build the paths, how do we get the crypto right. And then we just tacked on a little discovery component: so, well, how do we learn about the servers? Let's have directory servers, they'll tell you about the servers. So Tor is really two different pieces. It's the relay piece and the discovery piece. And I just told you about how to extend the relay piece: you take what we used in Tor before, and then you add a little bridge component, so there's one more hop, or at least a different hop. So then how do we do the discovery component? How do we fix this whole directory server stuff? An easy answer is we can do anything, we can do any discovery component we'd like. Unfortunately, it's still going to be an arms race, but we have fixed the problem a little bit with this bridge relay thing
and the big list of possible bridge relays. We changed the arms race from, how do I keep a thousand public IP addresses out of the hands of the Chinese government, which seems like an impossible problem, to, I've got 30,000 or 40,000 IP addresses, how do I give them out one at a time to the good guys without letting the bad guys learn all of them? And that's an arms race that I hope I can handle. Okay, so the bootstrapping step. We assume that users already know how to get around their firewall a little bit. It's not hard to SSH through certain firewalls. You can use Skype, you can use SMS, you can use instant messaging. There's a whole community of people on World of Warcraft that connect, and they don't play the game, they don't care how the game works. They all get in one place and they tell each other IP addresses and ports that work to get through their firewall. And there are whole communities of people who work this way, and they've got great ways of getting around their firewall, and we're going to take advantage of that. I don't know what the best way is of getting around whatever filters there are in Thailand, but I hope that some guy in Thailand does know how to do that, and our goal is to make it a lot more convenient for him after he's done it the first time. He does whatever it is he does normally to get his first bridge address, his first bridge relay, and once he's done that, then Tor takes it from there, and it can bootstrap and learn more and so on. So we hope that the user knows how to get around his firewall at least once, and if he doesn't, then hopefully he has a friend who does, and if he doesn't know anybody, then he should get out more and meet a few more people. So what are some other discovery components we could use here? One of them is independent bridges with no central discovery, the Iproxy and the Psiphon model that I was talking about before. So some guy in Ohio runs a bridge, and the user is bootstrapped by knowing him. Maybe my uncle is in Ohio and I use his bridge, and
I call him up on the phone and I learn his IP address and port, and I plug that into my software, and now I'm good to go, I can use his bridge. Or maybe you learn it from a friend down the street. You go over and you say, hey Roger, I know you've got a bunch of bridges because you've got all these relatives everywhere, can I have one? Can I learn one? Because I don't know anything. Can I bootstrap from you? And that's great, because that means that a social network can help people be able to bootstrap. And the incentives are also pretty well lined up, because if you're the sort of person who has really bad security and you always screw up and publish everything to the authorities, then I'm not going to tell you my bridge, because soon after I tell you, it's going to stop working, you're going to get it filtered. I'm only going to tell you if I trust you enough. And those incentives are good. But there's a downside here, which is you're helping the attacker map your social network. Let's imagine I know a bridge, and it's a good bridge and it works great, and I tell my 17 closest friends in my dissident community, and now we're all using the bridge and it's great. And then the attacker learns that somebody's using this bridge, and they go watch the bridge, and they over time see 18 different people connect to it, and the attacker says, hey, I didn't know those 18 people were friends, that's interesting new information. So that's a downside of the social network approach for learning about things. But this is a building block and it's a start, and it's already built at this point, so people can use it if they want to, because it's really easy. This is a trivial discovery approach. The discovery approach is, go tell somebody your IP address and port. So this is great, but there are a couple of problems with it. One of the big ones is, what happens if your uncle, who's on Comcast with a dynamic IP address, reboots his router? He gets a new IP address, he's gone, you have no idea where he is now. So the answer is you
have to bootstrap again. You have to call him up on the phone and say, hey, can you read your IP address to me again, or something like that. So the next answer is families of bridges. The idea is maybe I've got a lot of computers, I run 30 of them, or maybe the Linux users group in Las Vegas each runs one bridge, and they are all in one big family, and you tell the user all of these bridge addresses. So now he's got 30 bridges to use. And the idea is that as long as one of these bridges is still working, then you can use that bridge to re-bootstrap. You can use that bridge to connect up and say, hey, I know there are 30, and like 27 of them have probably changed IP addresses by now, can you tell me where the new ones are? And remember that this automatically happens in the background, so the software can look at it and say, hey, we're running low on bridges, it's time to re-bootstrap, it's time to learn new information. So this is a fine plan, and we're probably going to build this one next, because it seems pretty easy to do, and at this point we're going to be doing pretty well compared to the currently deployed solutions. Okay, so that's great for the no-central-discovery approach, but there are a lot of people around the world who don't have contacts in free countries who want to set up relays for them, and there are a lot of people in free countries who really want to help out, but they haven't been to whatever country they want to help out. So what about bridges who don't know users? What about users who don't know bridges? What can we do about them? And that's where the arms race gets exciting. So the idea here, and I'll sort of sketch it out in a nutshell, is we take all these volunteers, let's imagine we've got 40,000 volunteer IP addresses, and we break them up into a bunch of different distribution buckets. And the idea is that each bucket has a different strategy, a different scarce resource that we're going to try to make the attacker prove that he has. So maybe one of them is you've got a lot of IP
addresses, one of them is you've got a lot of time on your hands, and so on. And as long as there are a bunch of different strategies, then hopefully the attacker won't be able to break every one of them. So I'll give you some intuition about what sort of strategies we might choose. The first one is time-release bridges. So we take our bucket with a thousand bridge addresses in it, and we break it up so that between noon and 1pm you only get this subset of the bucket, and between 1pm and 2pm you get this other subset, and between 2pm and 3pm you get this other subset. So this means whenever Alice shows up and says, I need a bridge address, she's going to get some address, but whenever the adversary shows up and says, I need all of them, give me all of them right now, he's only going to get that little subset, and he has to come back next hour and do it again to get some other subset. So this seems like a fine start. Unfortunately, I hear that some attackers can hire people to wait an hour and do things again, so it's not going to be a perfect solution, but at least it's going to help us bootstrap until it gets blocked, and it won't be blocked at the same time by every adversary. So this is a good start. And then the next one is by IP address. So you take the pool of bridge relays just like before, and depending on where on the internet users are coming from, you give them addresses from a certain subset. So if you're coming from MIT's net, then you get this subset, if you're coming from Bangladesh, then you get this subset, and so on. And this means that, again, if Alice wants some bridge address, she shows up and she gets one, but if the adversary wants all of them, then he has to show up from a lot of different places on the internet in order to learn all of them. So far so good? Okay. And then we really don't want to build the first two on their own, there's no point in that, because we should combine them. And now whenever you show up from a certain IP address and at a certain time, there's a deterministic answer for you: you
keep asking and you keep getting the same answer. And that means, again, whenever Alice shows up, she's going to get an answer, but if the attacker wants to learn every possible answer, then he has to keep showing up from different places on the internet at a lot of different times of day and night to learn all the possible answers. So hopefully this is a way that makes it pretty easy for a lot of users to be able to learn one bridge without making it easy for a large attacker to be able to learn all of them, or at least a medium-size attacker. Okay, so those are three strategies we might use. There's another one. There's a system out there called Circumventor, and they have this newfangled concept called a mailing list. People sign up to the mailing list, and every three days or so they send out bridge addresses, and mostly they use this against China. And right now they've found that it takes China about three days to block the bridge addresses they send out. So every three days they send out a new list, and people on the mailing list use those until they get blocked, and then they wait for a new mailing list post. So this is fine. It might work, I have no idea. I would speculate that as it becomes more popular, more and more firewall people will sign up to the list, and they'll try to block them as quickly as they can. Maybe it'll move from three days to two days, and from two days to one day, as it gets popular. And then it's unclear to me if it's going to move down to, it gets filtered within three and a half seconds, or if there's always going to be some lower bound of, it takes them at least 30 minutes to get around to filling out the form. I don't know. Maybe it'll work, maybe it won't work, but we should deploy it and find out. And then another strategy we might use: you send us an email and we send you a bridge address. And of course, we don't ever want to answer a given email address more than once, so we keep track of who we've answered, and if you send us a new email address, then we answer.
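Going back to the combined time-and-address strategy described above, here is a rough sketch of how a distributor could give a deterministic answer for a given client address and time slot. It is only an illustration under stated assumptions: the function names, the one-hour period, the grouping of clients by /16 (IPv4) or /32 (IPv6) network, and the use of a keyed HMAC with a server-side secret are all choices made for this example, none of them taken from the talk.

```python
import hashlib
import hmac
import ipaddress
from typing import List


def client_area(ip: str) -> str:
    """Group nearby clients together (assumption: /16 for IPv4, /32 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 16 if addr.version == 4 else 32
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def pick_bridges(secret: bytes, bridges: List[str], ip: str, unix_time: int,
                 period: int = 3600, count: int = 1) -> List[str]:
    """Return the same bridges for the same network area within one time slot."""
    if not bridges:
        raise ValueError("the bucket has no bridges to hand out")
    slot = unix_time // period  # e.g. the "noon to 1pm" window
    message = f"{client_area(ip)}|{slot}".encode()
    # Keyed hash (an assumption of this sketch), so an outsider who knows the
    # scheme still cannot compute the mapping without the distributor's secret.
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    start = int.from_bytes(digest[:8], "big") % len(bridges)
    return [bridges[(start + i) % len(bridges)] for i in range(count)]
```

Asking again from the same area in the same hour returns the same answer, so an adversary learns more only by showing up from other networks or in other time slots.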
And of course, we don't want to make it trivial for you to make 100 million email addresses, so we want you to fill out a CAPTCHA. We want you to get a little picture, and you type in the letters you see in there. And I don't want to build any CAPTCHA stuff, but fortunately there are great people at Google and Yahoo who work hard at making sure that people can't create too many Gmail accounts and too many Yahoo accounts. So I'm just going to leave it to them to rate-limit the number of Gmail accounts and Yahoo accounts that get created, and if you send me an email from Gmail or Yahoo, then I'll send you a bridge. And good thing there are smart experts at Google and Yahoo who can help me out here. So that's another strategy. I don't know if it'll work, but I bet it'll work for a little while. And then, just to give you a sense of what we can make, imagine a social network reputation system. So the idea is, imagine we have 20 people that we really, really trust. So we give each of them 50 bridge addresses, and we give each of them maybe 10 little delegation tokens, and each token lets the fellow they give it to connect to the database and build a new account. And the idea is, accounts in good standing learn about new bridge addresses, and if you're not in good standing, then you don't learn about new ones. And what do we mean by good standing? Did you get the last ones blocked? If we gave you a lot of bridge addresses and they don't work anymore, then we blame you, and if they still work, then we give you more. This isn't perfect, there are some flaws to it, but it's certainly something that's a fun open research question, and we could imagine all sorts of pieces of the arms race there. Maybe people could track reputation between accounts, and if somebody gets something blocked, then you blame him and the guy who recommended him. Or maybe that's a bad idea. It gets really messy. But that's an example of other stuff we could do to try to force the attacker to have scarce resources compared to the user
community. And then the other strategies we'd like to hold in reserve. I don't know what they should be yet, but I really don't want the attacker to break strategies 1, 2, 3, 4, 5, 6 and put out a press release saying, you know that Tor thing, we took care of it, it's all settled, nobody can use it anymore. Because if they do that, then we have to rush and pick a new strategy and come up with some scarce resource that we're pretty sure they aren't able to break, and even if we do that, we've got no bridge addresses to work with, because they just learned all of ours. So if we keep some of these in reserve, then we can come up with new tricks on the day that they break all the old tricks, and we'll have a lot of bridge addresses we can use. So I'd like to ask all of you to start pondering scarce resources that are things we haven't listed already. For example, SMS messages. Gmail used to use this for rate-limiting people creating accounts: you have to submit a new phone number that people haven't submitted before and demonstrate possession of it. Or maybe, there was a fellow at Cambridge University who said, what about a proof-of-work scheme? I give you something encrypted with a very small key, and you have to break the encryption, and then inside is a relay address. So this means that Alice is willing to download the thing and spend six hours cranking on it, and then she learns an address, but maybe the adversary doesn't want to buy a few dozen supercomputers to do this, or at least it forces him to actually do that, it forces him to spend those resources in trying to break that. So that's maybe another way that we can leverage the fact that there's one adversary and many, many users. One of the great things here is that deploying all the solutions at once actually makes everything easier. Usually I'm on the defense side. Usually I'm the guy reading the new attack from Cambridge University, who says, I finally found a problem with their protocol, and here it is, and I can exploit it. Usually
I'm the guy who has to defend every single possible way. Now I'm finally on the offense. I've got eight different strategies, and as long as one of them works, then I'm in business, and the attacker has to figure out how to allocate his resources between all of these. I don't know which ones are going to be hard to defend against or which ones are going to work really well, so I presume that he doesn't know that either. He's going to have to guess which ones to spend his money on, and if he guesses wrong and doesn't spend enough money on strategy three or five or whichever one he misses, then one of them works, and those users are happy, and we can continue to send those bridge addresses out without all of them being learned. So by deploying all of them at once, and this is sort of an economic side argument, all of them become more secure, because they distract him from all the other ones. So far so good? Okay, so that was what we need to add to Tor. I've got a little bit of time left, so one or two questions, and then I'll hit all the parts that make me wrong about what I've said so far. Yes? Yep. Okay, so the question is, that's great, Roger, you've got these fine strategies for giving out relay addresses, but why don't I just nail you on looking at the fingerprint on the network, and then I know you're using Tor, and I don't care how you learn the bridge address. You're right, we need to solve both of them. We need to solve all of these at once. If I only solve the network fingerprint and I don't solve the discovery problem, then I lose, and also if I only solve the discovery problem and I don't solve the network fingerprint problem, I also lose. So you're right, there are several arms races we're doing in parallel at this point. Okay, so that was an easy theory in terms of how things work, but what are the ways in which things are broken? One of them is, how do we learn if a bridge address has been blocked? We can imagine active testing. We can imagine there are some users inside China, and we build a
path through them, like Blossom does, and then we connect out to the bridge, and if it works, then it's still working, and if it doesn't work, then I guess it's blocked. So that's a bit tricky, because how do we choose who to test through? It's a bit tricky because if we have determined, dedicated testers, then I imagine the adversary will say, hey, there's this guy who keeps scanning the whole internet, and maybe I should watch him and learn who he's scanning, and then I'll know the bridges. Or we could do some passive testing. I imagine that the bridges should install GeoIP databases, and now they can look at all the different countries they get hits from, and then they can say, yes, today's summary was 73,000 connections from China, 200 connections from Iran, and 12 connections from Uzbekistan, and tomorrow's summary is zero connections from China, 2,300 connections from Iran, and so on. And at that point, if we see something going along high and there's a sharp dip, then we can get suspicious and wonder if nobody in that country is able to get there. Or maybe that will never work, because we will never have numbers that high. The numbers will be 2, 2, 0, 1, 0, and when we get a zero, I don't know if that means that the guy who uses Tor in Burma is asleep, or he couldn't get there, or what. I'm not sure how that will work. I imagine that we'll probably want some combination of active and passive testing. And it gets complex also because it's not just one big country. There are a lot of different zones that are blocked in a lot of different ways. There are a lot of different places that could be a problem for the bridge. Another attack I worry about is, the attacker could sign up thousands and thousands of great bridges that he's already blocked, and now Alice shows up and she gets a bridge address and it doesn't work, and she hates me, and then she gets another bridge address and it doesn't work, and she gives up and she says, that Tor thing is stupid, those security guys say it works, but it clearly doesn't. Do we pre-test
everything? How much testing do we do? I guess that will depend on how many attacks we see, but eventually it will be some combination of active and passive, I guess. Another thing that comes up a lot: people talk about how using Tor in oppressed situations means that they're going to get hunted down and killed, and the more oppressive the situation gets, the fewer and fewer people are going to be willing to use Tor. I think it's exactly the opposite. I think that as countries get more oppressive, there are going to be more and more people who say, I was able to read my webcomics yesterday, and now my stupid firewall doesn't let me read my webcomics. There's a guy down the street with this little tool, I'm going to go use that. So it isn't that you're going to create more and more dissidents who are using Tor, it's that you're going to create more and more ordinary citizens who happen to be using it to read their websites. So the median use is going to become more acceptable as the firewall cracks down. That's my theory. I guess we'll see how it goes. And then another challenge: internet cafes, USB packages, live CD packages, VMware packages. How do we make Tor more usable in a situation where the guy is using the village's Windows 98 computer? There are a lot of very challenging security issues in there. So far I've been talking about how Alice takes her laptop, and in some countries that works great, and in some countries they don't have laptops, and they don't have their own hardware, and they don't have their own internet connection to their own hardware. How many bridges do you need to know about to stay connected? Let's say you have five bridges and they're on Comcast. What's the rate of churn of Comcast IP addresses? How quickly do they go away? What's the rate of churn of Windows users on Comcast IP addresses? How quickly do they reboot or disappear? How long is a bridge going to be around? How many bridges do you need to know about with this natural churn? And then
there's the question of how often is it that a bridge gets filtered or blocked? How do we come up with these numbers, so we figure out if five is a small number and we really need to get more, or if five is plenty and two is a small number? All this quantitative stuff is going to be really hard. And then the other side of that is, how often do I fetch updates? Do I wait for a day and then say, hey, did you move? Or do I say every six seconds, hey, did you move yet, did you move yet, did you move yet? I would imagine I'd be more well connected with the latter, but that might not be good. Cable modems don't usually run big websites. If all my bridges end up on Comcast, and Comcast or Verizon or whatever wants to do business with China, and China says, we'd be happy to have you show up here, but you've got to get rid of those bridges... If all the bridges are on user-style internet connections, then people may have no problem just blocking the whole Comcast range. It's not like they run useful things anyway. That's a worry to pay attention to. And then scanning resistance. What if the attacker wants to check whether a given computer is a bridge? Should I make port 443 say, hello, you've connected to an unconfigured Apache web server, unless you give it the secret handshake? How do I give it the secret handshake without looking different from some guy connecting to his bank? And then the other question there is, what if the attacker wants to search the whole internet for bridges? And again, he doesn't have to search the whole internet for bridges. He just searches Comcast and Verizon and, you know, the common places where you find bridges, and if he finds 95% of those and filters them, then this isn't going to work. So we need some way of making it so that you can't check, or at least can't check easily, whether a bridge is a bridge. Maybe another answer is, you show up and I ask you for authentication. There are plenty of people on Comcast
who are running secure web servers that display all the photos they've taken in the past six years, and they demand that their friends log in. And maybe I'll look like one of those, and I'll demand that my friends log in, and maybe that'll be common enough. Or maybe China will just block everybody who demands a password on port 443 on Comcast. Golly, I don't know. And then the other challenge here, of many, is publicity attracts attention. A lot of circumvention tools launch with a huge media splash, and they say, we're going to take on all the communist governments in the world, and we're going to take them all down with this new funding we got from these VCs and this new tool we got with this undergrad who's writing it. And the media loves this. They love writing articles about how two guys in a basement are taking on a government. Unfortunately, let's go back to that slide I had towards the beginning of the talk. There are only really two categories of tools that end up getting blocked. One of them is if they work really well, and the other one is if they are high profile. And we can choose the pace of the arms race. We can control whether they consider us high profile or not. So if you're an engineer or a security person, you're the audience I'm talking to. I'd really love to have some feedback on whether you think this would work. Please run a bridge, please help out. If you're a journalist, I beg of you, please don't describe us as taking on the Chinese government or other large government, because that will harm the goals of the project. I'd be happy to give you other news articles. I can tell you all about civil liberties. I can tell you all about how dumb AOL was a few months ago. There are plenty of other fine stories about privacy around the world, and I'd be happy to help you out with those, but please don't write about this one. Okay, so there are a lot of different things that I've told you about, and we'll have a little bit of time for questions, but I should point out, as
a summary, technical solutions are not going to solve the whole problem. Right now these firewalls are socially very accepted, socially very successful. It's not like all the people in Thailand or Iran or whatever are saying, I hate my government, I hate the fact that they filter the internet, I really want to be able to read BBC. Very few of them are saying that. Most of them are saying, I'm so glad my government filters the internet, otherwise I'd be seeing all that scary stuff that nobody should be seeing. And as long as most of them are saying that, then a technical solution is not going to solve the problem. Now, there are many people, millions of people, in each of these countries that do care a little bit and find it worthwhile to be able to read a little bit more about what's really going on in the outside world. So that's what we're aiming for now. Hopefully the technical solutions will be a piece of the bigger puzzle. There are plenty of other people working on the social side of things. And we've deployed some of it. Go check out the mailing list post I did a few days ago for details of how to configure yourself as a bridge, or details of how to use somebody else's bridge. And I'll be building the distribution strategies soon, and then you can just click a button that says, hey, I want to be a bridge and I don't want to deal with meeting anybody, or, hey, I need a bridge and I don't know anybody, and we can go from there. So, another brief discussion: Tor itself needs to survive. One of the big questions that's going on around the world these days is, privacy or security or anonymity or whatever you want to call it, is it worthwhile? Don't only bad people use anonymity, while good people don't need it? A while ago we had the same discussion about crypto, the whole crypto war discussion in the late 90s, where people said only bad people need security. But now you wouldn't dream of logging into your bank or your E*Trade or whatever without that little button at the bottom of the web browser that says
this is an SSL connection. So maybe in 5 or 10 years people wouldn't dream of using the internet without that other little button that says this is a secured connection, or a private connection, or whatever. We're a long way from doing that, but we need to live in a civilization where that's a reasonable thing to want, rather than a civilization where they assume that the only people who want actual security on the internet are the criminals, and the good guys are happy going in every single database and happy to be advertised to and happy to be compromised all over the place. And then data retention. It's an issue in Europe right now. A lot of people over here say, well, it's in Europe, that's not my problem. It is your problem, for a couple of reasons. One of them is Europe's not actually that far away, and they're collecting your data too, and they're putting it in huge databases and then losing it. So if you're worried about your privacy over here, you should be worried about your privacy over there. The other thing is, I read an article that somebody wrote a little while ago saying that the US is looking at forcing search engines to retain traffic data. And so I read that and thought, okay, search engines, maybe that's reasonable. Search engines such as Microsoft, okay. Verizon, wait a minute. Comcast, hey, wait a minute. So they're phrasing it as search engines, but they're not talking about search engines. They're saying, wouldn't it be nice if every user-oriented ISP around the country wrote down everything that happened, because, hey, that might be handy sometime. So that's a fight that we're probably going to have in the next couple of years, and if I weren't working on anonymity systems, then I'd be working on fighting data retention. So hopefully we won't have to have ten talks in a few years at DEF CON on why data retention is so horrible. And we need your help, if you want to help out with coding, or finding bugs, or running servers, or volunteering, documentation, funding. If you know any large organizations who need their
anonymity or their privacy or their security or their traffic analysis resistance, let me know. And I think we've got some time for questions. There's one in the way back. You've got to shout.
Okay, so the question is, can I run a Tor server in the main public directory as well as a bridge? And yes, but it wouldn't work very well, because that IP address is going to be in the first list that they filter. So it would be extra smart to run your bridge somewhere else. But in terms of testing things, by all means run them on the same system, and if one of them falls apart in an interesting way, please let me know.
Yes? How are we going to stay low profile and attract a lot of users? Well, we're pretty low profile right now, and we've got two hundred thousand people using the Tor network. I mean, the DEF CON audience knows about us, but if you go up to some random person on the street, then they have no idea what Tor is. I mean, much of what we've been doing lately is, I want to build an anonymity system that actually works, that's usable, that's scalable. We've only got two hundred thousand users right now, not because two hundred thousand people in the world care about anonymity and the rest of them think it's stupid. We've only got two hundred thousand users because that's all the network can handle. There are millions of people who've tried Tor and said, this is not for me yet, and they're sitting around waiting for us to get it better. So I want to make Tor good, and then we'll have solved the problem, and the world will save itself from there. And in terms of making the right people know about it, social communication is the way it works. Go tell your friends who need one of these, or go tell your friends who are in a position to run a bridge, and word can spread through that without having to be on the front page of any newspaper. And I imagine the social connections are much more robust, at least in America, to infiltration and observation from various attackers. I guess I'll find out.
I saw another hand around somewhere. Yes? Okay, so the question is, if you're already going to be using these nodes as bridges, and they're temporary but they seem pretty useful, why don't you just make the Tor network bigger? Why don't you just have a much larger Tor network? So one answer is, sometimes people don't want to be listed in the Tor network. Sometimes people say, but I'm in that big list, and somebody's going to think that I like anonymity, and I'm not willing to commit to that. So we've got a use for those. Another answer is, bridges can be quite small. They can be 10 kilobytes a second, 20 kilobytes a second. That's not all that useful on the main Tor network, where we have 50 servers that are 5 megabytes a second. But if you're small and you want to be useful and you're not willing to be in the main directory, then being a bridge is the perfect thing to do. And that brings up another challenge, which is, what I really want to do is have a button that says "help out", and it should decide in what way you help out. It should check out your bandwidth and check out your stability, and if you're really good and really stable, then you should jump up and be a real Tor server, and otherwise you should be a bridge. And the only challenge to that is that I really shouldn't have it so everybody joins the main Tor network and, if they suck, then they become a bridge, because then we just made a list of all the people who are about to become bridges. So it's a little bit tricky, but I think I can manage it.
Other questions, comments, thoughts? Okay, so the question is, have you thought about making it harder to get the list of main Tor servers, or have you thought about making it harder to get the whole list? We thought about it for a while, but I guess there are two issues with that. One of them is, Tor clients really do get a list of the Tor servers, because otherwise they don't know who to build their circuits through, and they really don't want to get a different set for each user. If
you looked at Nick's talk in the last session, if you end up with this Alice knowing this subset of the network and this Alice knowing this subset of the network, and they don't overlap completely, then you start leaking information about, well, somebody used that server, and only these guys knew about that one, and they came from here. A very small subset knew about both this one and this one. Suddenly you can start to narrow down who your users are based on what subset of the directory they know about. So we really need to make sure that every user is working on the same set of information, and the easy way of doing that is making sure they all know all of it. A more exciting way of doing that, which we'll be working on in the next few years, is maybe we can break the directory up into ten chunks, and then Alice flips a coin, and if she flips a four, then she goes and behaves like all the Alices that also flipped a four. So it isn't that everybody has a random subset. It's that we partition the directory and we partition the users, and then we can start wondering about how do we blend them together, how do we make the people who chose four share the same anonymity set, or some anonymity, with the people who chose six. That's also a tough scalability question, because right now we've got a thousand servers, and every server can connect to every other server, and that's a file descriptor per server, and the default number of file descriptors you can have open is 1024 on Linux and BSD and so on. And it's easy to fix, you just go edit a text file, but boy, do users not like the phrase "you just go edit a text file". So we'll have to do something about that. Did that answer the question? There were many versions of it, and I answered many things. Try again if it didn't.
Other questions, comments, last thoughts? I'm going to a breakout room. There's a big guy in a red shirt over there who's going to drag me off, and if you follow me, you'll find out where I end up. Also, I don't know
where it is, but it'll be breakout room four, I guess, or Q&A room four. So yeah, as another announcement, there was an interesting new attack that we found out about just a few days ago on the Tor network, and we have patched it with the new stable and the new alpha releases. And if you are a Tor user and you also use Vidalia or other things that involve having an open control port, and if you don't know what that is, you are one of those people: upgrade. No, really, upgrade. We'll tell you the details in a few weeks, but we're giving users a little while to upgrade. Isn't DEF CON fun? Okay, last chance. Thank you. I'll be around for the next