Okay, so I'm Roger Dingledine from the Tor Project, and I'm going to tell you a little bit today about how Tor works and why the performance problems are what they are. There are a lot of different things that we could talk about. I'm mostly going to start with a crash course on Tor, just to give you the three-minute overview of what it is, and then I'm going to walk you through all sorts of problems that we've been trying to figure out. Why is it slow in this way? Why is it slow in that way? How can we improve it? So Tor is free software, open source. There are about 1,500 to 2,000 relays all around the world pushing traffic for a few hundred thousand users. It's hard to say for sure, because it's an anonymity system; we've been working lately on how to actually count the number of users we have in a safe way, and I'll talk about that a little more later. One of the neat things about Tor is that it comes with a full specification and documentation, RFC style: this is how we built it, these are the security properties we think we get, and here's why we think we get them. Another interesting point: a few years ago Tor was just a couple of people working on development, and at the end of 2006 we turned into a real live US non-profit, a 501(c)(3). There are now a little more than eight full-time people being funded to work on Tor, so that's pretty cool, and we've got a lot of neat things that we've been working on from that. So we started out funded by the US Department of Defense, the Navy, and from there we were funded by the Electronic Frontier Foundation. Our claim to fame in that respect is that we're the only free-software privacy system to be funded both by the DoD and the EFF. And part of the fun of that is going to each of those different groups and trying to explain why they all need privacy, security, and so on. So when I'm talking to researchers, I tell them I'm working on an anonymity system.
But when I'm talking to my parents and my grandparents, I tell them I'm working on a privacy system, because anonymity, I'm not sure about that, but privacy, yeah, that's good and wholesome. And when I'm talking to Google and Walmart and so on, we work on communication security or network security, because privacy, that's stupid, that's just for individuals, but security, good point, I do need that. And then when I'm talking to governments and law enforcement people and so on, I'm working on traffic analysis resistant communication networks. Again, it's the same stuff, the same security properties and so on; it just depends how you phrase it for them. Because they've already got all this snake oil security that all these corporations around DC have been selling them. So they're all set on security, but traffic analysis resistance, good point: I do need to send my agents to Israel and not have anybody know which countries they're talking to. And then the fourth category that we've learned about recently are blocked users. People in China and Iran and Thailand, and the list is growing at this point. People who can't reach all the websites they'd like to reach. And it turns out, if you've built an anonymity system, something that prevents a local observer from watching the user and figuring out where they're going, then they can also use it as a circumvention system. If you can't tell what website I'm going to, then you can't prevent me from looking at the BBC. So how do you build one of these? The easy answer is you put a computer somewhere and everybody relays their traffic through it. And it works fine for a lot of commercial companies. They promise they're not going to look at any of the traffic. Okay, okay, they look at all the traffic. They promise they won't log any of the traffic. Okay, okay, they write it all down. They promise they won't tell anybody what they see. Okay, sometimes they tell people what they see. We want something stronger than that.
We want something where the middle relay isn't able to break the whole system, rather than one that just promises it won't. So the goal for Tor is distributed trust. The idea is, since we're routing the traffic over more than one relay, over more than one hop, no single relay in the network gets to learn about both Alice and Bob, gets to learn that this Alice is looking at this website. So if R1 is bad, then he knows that Alice is using Tor; he just doesn't know what she's doing. And if R3 is bad, he knows that somebody is talking to Bob, but he doesn't know who. And if they're both bad, then we're screwed; see last year's talk for more on that. So far so good? Okay, and there's crypto, but I'm not going to talk about that. So that was your crash course on Tor. Hopefully everybody here knows how Tor works now. So there are a couple of interesting things to point out. How many people here have used Tor lately? Excellent. Okay, so we pulled down a 50 kilobyte file through Tor 13,000 times over the past two months, and now we've got a histogram of what sort of download times we actually got. On average it took about eight seconds to get our file, and look at that huge tail down there: it goes out to like 20 or 30 seconds. Quite a few times, if you're trying to load a website, it's going to take quite a while. And then we've got another graph. That previous slide is this little blotch down here; this is the 50 kilobyte file. We've got a box plot where the median is eight seconds or something, one standard deviation below is like five seconds, and one standard deviation above is like 15 seconds. Now let's pull down a one megabyte file every five, 10, 15 minutes or something. How long does that take? And the answer is about 50 seconds on average to pull down your one megabyte file. So these things suck in both directions. First of all, eight seconds is way too slow. And second of all, what, 50 seconds to pull down a big thing? That means there's a lot of capacity.
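The distributed-trust idea can be sketched in code. This is a toy illustration only, not Tor's real construction (Tor negotiates AES keys per hop over TLS with telescoping circuits); here XOR stands in for encryption, and the key values are made up:

```python
# Toy layered ("onion") encryption: each relay can strip exactly one layer,
# so R1 learns only Alice and R2, and only R3 ever sees the request --
# never who sent it. XOR is a stand-in for real per-hop crypto.

def xor_layer(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def build_onion(payload: bytes, hop_keys: list) -> bytes:
    onion = payload
    for key in reversed(hop_keys):   # innermost layer belongs to the last hop
        onion = xor_layer(onion, key)
    return onion

# Alice shares one key with each of the three relays in her circuit.
keys = [b"key-for-R1", b"key-for-R2", b"key-for-R3"]
onion = build_onion(b"GET http://example.com/", keys)

# Each relay peels its own layer; only after R3's layer comes off
# is the cleartext request visible.
for key in keys:
    assert onion != b"GET http://example.com/"   # still wrapped
    onion = xor_layer(onion, key)
assert onion == b"GET http://example.com/"
```

The point of the layering is exactly the property in the talk: a bad R1 sees only ciphertext bound for R2, and a bad R3 sees a request but not Alice.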
There's a lot of bandwidth, but it's bad for web browsing. So we started out originally developing Tor saying, let's try to provide the most bandwidth we can to the users, because obviously that will mean you get good enough latency and good enough other network properties. And it turns out that's not true. Providing really good bandwidth means the folks who want lots of bandwidth end up getting good service out of the network, and the folks who want other properties, like low latency, don't get as good service. So we've got two problems here. First, web browsing takes like eight seconds; that's no good. Second, the people who are doing bulk transfer over Tor: you can pull down a five megabyte file over Tor and the average is, what, three minutes, four minutes? That's not how it was supposed to be. So there are a lot of different things that we need to do to address this. I've got a list of six problems that we need to look at, and I'll walk you through a few examples of these six problems over the next hour or so. And I've ordered them by priority: we need to solve the first one before it matters that we've solved the later ones. Okay, sounds good. So the way that Tor works, you've got the first Tor relay and the second Tor relay, and there are a bunch of users trying to send traffic back and forth between them. The way Tor works is you've got one big TCP connection going between each pair of relays. From an engineering perspective, that's easy to build, easy to analyze, easy to know what's going on. But it's really bad from the flow control and congestion control perspective. Because if you've got 50,000 streams going back and forth, all on this one TCP connection, TCP has only one way of backing off, of telling you, you're talking too fast, you should slow down. And that is: I drop a packet, and you're supposed to notice that the packet got dropped, and you're supposed to slow down.
And that means that big TCP connection slows down for everybody. So if one guy is talking too fast out of those 50,000 streams, they all get punished for it. So the solution is kind of complex. We have to switch from pairwise TCP to some sort of end-to-end datagram, UDP-type protocol. And there are a lot of pieces to this. One of the big pieces: if we end up doing this, we're going to grab the IP packets at the client end, encapsulate them somehow and do some crypto, get them to the other end of the Tor network, decapsulate them, and push them onto the network. And you remember all those OS fingerprinting tools, where you look at an IP packet and you say, that came from Linux 2.4.3 running patch level such-and-such? As soon as we start sending the actual IP packets from here over to there, people are going to start being able to fingerprint Tor users. They're going to start saying, well, he's anonymous, he's using Tor, but I know that he's using FreeBSD 2.7.0-something, and he's the only guy in the network who does that. So what we really need to do, if we're going to solve this, is figure out how to come up with a secure, maintained, free software, portable, user-space TCP stack. Those are a lot of properties that we need, and there isn't one out there. So I'm going to skip over the rest of this high-priority item because, God, we don't know how to work on it very well. I talk to a lot of systems people who say, oh, let me tell you exactly how to solve it, you just do this. And then I go to another systems professor and they say, oh, that's stupid, you don't want to do that, you want to do exactly this, it's obvious. And I've talked to like 15 systems professors, and they all have a totally different idea of what should be done and what is totally stupid. So more work remains. Hopefully in a few years I'll be able to show up here and give you a good idea of how to build a secure datagram approach for Tor.
OK, so another piece of this whole problem number one, congestion control, flow control, and stuff. Here's how Tor does its flow control. You build your circuit, and now you're going to do a stream: say, I want to pull down the BBC or something like that. So you make your request to the other end of the Tor circuit, and it says bbc.co.uk. And you have what's called a window, a circuit window; TCP has these too. The goal is, it dictates how many cells that side is allowed to send before it needs some sort of acknowledgement that all of those cells have gotten through. The default circuit window we picked was 1,000 cells. Should be plenty; it's good performance. Remember, we were focusing on bandwidth. So that means that when I make my request for bbc.co.uk, port 80, it can send as many as 1,000 cells before it needs an acknowledgement from the client saying, yeah, all of those got through. And the goal here was congestion control. Imagine we didn't have that window, or imagine that window were 20,000 cells. Then I ask for the 20 megabyte file, and it dumps the whole 20 megabyte file onto the network, and that big lump of bytes works its way through the network. That's bad, especially if a lot of people do it at once. So the goal for this whole circuit window thing was: we're going to limit the number of bytes that a given stream or circuit gets to put onto the network at a time. And 1,000 cells is 512 kilobytes. That's a whole lot. So imagine you're a small relay; you do 25 kilobytes a second on average. And you have a whole lot of people who are opening streams, and each of them adds half a megabyte to your buffer. So at this point, the buffers on the exit nodes are really big, and each of those lumps tries to move forward, and basically there's a whole lot of data queued inside the Tor network right now.
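To make the numbers concrete, here's the arithmetic behind that half a megabyte, as a sketch. The cell size and window values are the ones from the talk; the function name and the 20-stream scenario are invented for illustration:

```python
CELL_BYTES = 512        # a Tor cell is 512 bytes
CIRCUIT_WINDOW = 1000   # default circuit window, in cells
PROPOSED_WINDOW = 100   # the smaller window floated in the talk

def worst_case_buffered(window_cells, concurrent_streams):
    """Bytes that can pile up in a relay's buffers if every stream
    dumps a full window before any acknowledgement comes back."""
    return window_cells * CELL_BYTES * concurrent_streams

# One stream at the default window: half a megabyte queued at the exit.
assert worst_case_buffered(CIRCUIT_WINDOW, 1) == 512_000

# The proposed smaller window caps each stream at ~50 KB instead.
assert worst_case_buffered(PROPOSED_WINDOW, 1) == 51_200

# A small 25 KB/s relay with 20 fresh streams: ~10 MB of backlog,
# i.e. several hundred seconds of queue at its full rate.
backlog = worst_case_buffered(CIRCUIT_WINDOW, 20)
assert backlog // 25_000 > 400   # seconds of queue at 25 KB/s
```

That last assertion is the whole story of this section: a handful of freshly opened streams can queue minutes' worth of data at a slow relay before a single acknowledgement arrives.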
So our hope is, if we reduce that 1,000 cells to 100 cells or something, then every time you make a new request through Tor, you're not going to dump a whole lot of data onto the Tor network, you're just going to dump a little bit. And maybe that means there will be less congestion, less stuff queued. Hard to say. Okay, so that was the first issue, and I mostly skipped over it because there are hard problems that we don't have solved. Let's talk a little bit more about some other problems. The number two problem with Tor is that there are a few people doing bulk transfer: file sharing, RapidShare, stuff like that. I did some research with a fellow at Rice University, in Dan Wallach's group, a while ago, and we had a simulated Tor network with like 10,000 web browsing users and like 20 BitTorrent users. And the 20 BitTorrent users ruined the whole thing for the rest of them, because they just went out and started pulling bytes as quickly as they could. So we're seeing the same thing on the internet right now. We shouldn't be too surprised that a small number of file sharing users could break a whole network of other people trying to browse the web. So there are some other lessons we should learn here from economics. If you increase supply, network capacity, then users will show up to take it. And that seems pretty reasonable. We used to think that there was going to be an equilibrium. We would say, okay, imagine the Tor network has the following capacity. Once we get slashdotted, a whole lot more users are going to show up, and performance is going to get worse. But then a lot of those users are going to say, man, this sucks, and they're going to leave, and then performance is going to get better. And there should be some equilibrium, some number of seconds of latency that users are willing to tolerate. And if there are too many users, then they'll leave and the latency gets better.
And if there aren't enough users, then more will show up and the latency will get worse. So there should be some equilibrium. The problem with this is that different user classes have different tolerances for latency. If you're a web browsing user, you click, and you want your web page to show up. If you're a file sharing user, you click, and you go to sleep; you hope you have your DVD in the morning. And while you are asleep ruining the Tor network, there are a lot of people in Iran trying to click, and they can't get their website. So that's the fundamental problem here, and we need to figure out how to solve it somehow. The first approach is, maybe we need to find the circuits that are pushing a whole lot of cells and slow them down. We need to say, you're talking too fast; I'm not going to let you talk that fast, because there are a lot of people talking more slowly, and they should get priority. So the way this actually works is, a lot of the relays out there are rate limited, and that's a good thing; they should do that. So they only deliver a certain number of cells per second. So the question is, let's imagine I've got 5,000 circuits all waiting, and they really want to send some bytes. Which one do I choose? Right now the answer is we do a round robin. We say, well, I picked you last time, so I'm going to pick the next one, and the next one, and we just cycle through. And that means that the circuits that are always active, that always have some cell they want to send, are going to be pushing out most of the cells onto the network. So if you don't always have something to send, the rate limit, the number of bytes I can write onto the network this second, is going to get used up right quick by the fast guys, and the slow guys, the ones who don't have a cell waiting right now, are going to have to wait for the next second. So we've got some actual graphs from measurements we ran.
We ran these nine Tor relays up here, and we tried to figure out how many cells each circuit sends on average. So we ranked the circuits by deciles, by which ones are the loudest, how many cells are sent on each circuit. Over here we have the slowest ones; the circuits that send the fewest cells only send one cell or two cells or something. And over here, the fastest 10% of the circuits are sending about 5,000 cells. So that's two and a half megabytes or something like that. So the really fast circuits over Tor are on average sending several megabytes. And if we look at it a little bit closer: this graph is the same ranked set of circuits, and on the y-axis we've got the average number of cells queued, waiting in the buffers, that can't get out because the circuit is just trying to push too much stuff. So over here, the top 10% of the circuits have a whole lot of cells queued. They've got a lot of stuff buffered up. They're always saying, ooh, me, me, me, I want to send the next cell. And all the rest of these guys generally don't have anything waiting. So that means that if we're rate limiting, the guys that are really loud are going to get their cells out, and they're going to fill it up for the rest of the world. So we need to come up with some algorithm for doing priority better, for load balancing better, for being not quite so fair to the folks who are trying to take up more of the network. What algorithm should we use? How do we know if we've gotten it right? You're going to see this theme come up every so often: we need some sort of Tor simulator, or a really brilliant guy or something, to imagine what happens if we use this algorithm. The only option we've got right now is deploy it, make everybody upgrade, and see if we did it right. That's not really the right option. Okay, so there are a couple of other options to deal with really fast people. One of them is, maybe we should throttle bandwidth at the client.
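One candidate algorithm keeps an exponentially decaying count of how many cells each circuit has sent recently and services the quietest circuit first, instead of plain round robin. This is only a sketch in the spirit of the EWMA circuit scheduling later explored for Tor; the half-life value and the class names are invented:

```python
class Circuit:
    def __init__(self, cid):
        self.cid = cid
        self.recent_cells = 0.0   # exponentially decayed activity count

def decay(circuit, elapsed_s, halflife_s=30.0):
    """Halve a circuit's activity score every `halflife_s` seconds, so
    old bursts stop counting against it."""
    circuit.recent_cells *= 0.5 ** (elapsed_s / halflife_s)

def record_send(circuit):
    circuit.recent_cells += 1.0

def pick_next(circuits):
    """Service the circuit that has been quietest lately, so bulk
    transfers stop starving interactive web browsing."""
    return min(circuits, key=lambda c: c.recent_cells)

bulk = Circuit("bulk"); bulk.recent_cells = 5000.0   # loud bulk-transfer circuit
web = Circuit("web");   web.recent_cells = 3.0       # quiet browsing circuit
assert pick_next([bulk, web]).cid == "web"

decay(bulk, 30.0)                    # one half-life later...
assert bulk.recent_cells == 2500.0   # ...the loud circuit's score has halved
```

The decay is what makes this gentler than a hard quota: a circuit that was loud a minute ago but is quiet now regains priority automatically.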
Maybe your Tor client should by default rate limit you to 20 kilobytes a second. You can burst higher than that, but the average should be enforced at 20 kilobytes a second. On the one hand, that's a horrible idea, because clients out there will say, well, I don't want to do that; it's free software, open source, I'm going to fix it, and now I'm not rate limited. Most Windows users don't know how to do that, so it might be easier than we think. I worry that somebody will fork Tor and say, I'm going to give you the fast Tor, because the slow Tor is slow. On the other hand, maybe it's a good idea to rate limit for security. There's an attack that I talked about last year, that some folks at Columbia University are working on, where the faster you use Tor, the faster they can hunt you down. So if you're pushing 80 kilobytes a second all the time, then you're much more noticeable, and they're gonna be able to identify where you are on the network. But if you're always pushing at most 20 kilobytes a second, then you blend in really well and they can't figure out where you are. So maybe there's a good idea behind the whole throttling thing. Okay, so I'm busting through various examples. Am I going too slow, too fast, just right? Perfect. Okay, so the third issue: not enough capacity; we need more relays. You guys should all run relays. Why am I putting this at number three? Because one of our funders a couple of months ago said, Tor is slow; we're gonna stop paying for development, for code and improvements and so on. We're just gonna run a lot of relays, and then Tor is gonna be fast, right?
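A client-side throttle like the one described is a classic token bucket: tokens accumulate at the average rate, capped at a burst size, and each write spends tokens. A minimal sketch with an injectable clock (the 20 KB/s average and the burst size are just the talk's numbers; the class is illustrative, not Tor's implementation):

```python
class TokenBucket:
    """Enforce an average of `rate` bytes/sec while allowing short
    bursts of up to `burst` bytes."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst       # start full: an initial burst is allowed
        self.last = 0.0

    def allow(self, nbytes, now):
        # Refill tokens for the time elapsed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

bucket = TokenBucket(rate=20_000, burst=40_000)   # 20 KB/s average
assert bucket.allow(40_000, now=0.0)        # initial burst goes through
assert not bucket.allow(20_000, now=0.5)    # only 10 KB refilled so far
assert bucket.allow(10_000, now=0.5)        # ...but 10 KB still fits
```

Passing the clock in as a parameter rather than calling a time function internally makes the throttle deterministic and easy to test.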
It turns out it doesn't work that way, because if this congestion control problem doesn't get solved, then we get a lot more capacity, but we're still gonna have that one TCP connection that's slowing everybody down when anybody talks fast. And if the bulk transfer users aren't somehow made more fair, they're just gonna fill up whatever capacity we add. We could double the capacity of the Tor network, and some BitTorrent dude out there is gonna get really happy, and the people in Iran are gonna stay pretty sad. That's the theory, at least. On the other hand, we got to test this a little with the Iran stuff lately. So here's the number of running relays; look mostly at the black line up here. The number of running relays from mid 2006 to right about now. Over the course of 2008, the number of relays dropped, and we'll see in a bit that's because there are a lot of people in Germany saying, I'm not sure what this data retention stuff is gonna mean for me, but maybe I shouldn't run a relay anymore. And then right here, the number of relays spiked from like 1,500 to like 1,900, and that was a lot of people saying, activists in Iran really need to be able to use Tor, because Tor is the only safe tool that's working right now, and if I run a relay, I'm gonna make Tor faster. So the question is, did they? Because here we moved from like 1,500 to 1,800 in a few weeks. So we should have some idea of whether adding a lot more capacity actually changes things. And the answer is, yeah, maybe it does. Here's a look at the median 50 kilobyte latency over the course of that time. It starts at nine seconds or something and slowly works its way down to like seven or eight seconds. That's not a great improvement, but the answer is yes: if we add more capacity, we actually do end up with a faster Tor network. And if we look at the one megabyte graph, it's even clearer. It starts at the 70-second mark and ends up down at the 50-second mark. So that's great.
Add more capacity, get more performance. But we're getting better performance improvements for the file sharing people than we are for the web browsing people. So more capacity is good, we need to do it, but we also need to solve the first couple of problems, or we're not actually gonna end up making Tor better for the folks who want to browse the web. So what other approaches can we work on for capacity? Jake and I have been doing talks all around the world, trying to teach people how Tor works so that maybe they wanna help out and add some more relays. Everywhere Jake goes, he leaves a stream of 30 or 40 people running relays. So that's a good start. We also wanna provide some tools that actually allow people to know how their relay is working, and get reminded. So maybe we should set up a mailing list just for relay operators. Every so often I get mail from the fellow running the Boston University exit relay saying, I'm getting DDoSed, and it's pretty cool; they're DDoSing me at like 150 megabits per second, and they've been doing it for 48 hours. Wow, this is neat. But he wants to know if anybody else is getting DDoSed, and my answer is, well, I've only heard from you, so I guess it's just you. But if we had a mailing list for these people, then they could talk to each other and say, I'm having problems with this configuration; what do you think about this? Do you have any scripts? And then another one we've been working on is a CGI called Tor Weather, where you show up to the website and you say, this is the email address that you should mail when this Tor relay goes down. And if we had a simple database like that, checking whether the relays are up and reminding you when one goes down, it would help, because we hear from a lot of really enthusiastic volunteers who say, I really want to run a relay. I set it up; it's great.
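The Tor Weather idea reduces to a small notifier loop: poll the directory for which relays are running, and flag each subscribed operator whose relay has dropped out. A minimal sketch; the data shapes, fingerprints, and addresses here are made up, and a real service would parse a directory consensus rather than take a set:

```python
def find_down_relays(running_fingerprints, subscriptions, already_notified):
    """Return (email, fingerprint) pairs that need a 'your relay is down'
    reminder: subscribed relays missing from the latest consensus that
    haven't been mailed about this outage yet."""
    to_notify = []
    for fingerprint, email in subscriptions.items():
        if fingerprint not in running_fingerprints and fingerprint not in already_notified:
            to_notify.append((email, fingerprint))
            already_notified.add(fingerprint)
    return to_notify

subs = {"ABCD1234": "alice@example.com", "EF567890": "bob@example.com"}
running = {"ABCD1234"}            # Bob's relay rebooted and never came back
notified = set()
assert find_down_relays(running, subs, notified) == [("bob@example.com", "EF567890")]
assert find_down_relays(running, subs, notified) == []   # no duplicate nagging
```

The `already_notified` set is the one design decision worth noting: without it, a relay that stays down would generate a reminder on every poll instead of one.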
And then their computer reboots, and they didn't set it up to start on boot, and they think they're running a relay, but they're not. So if we had some way to actually help remind them of these things, maybe we'd be doing better. So here's the graph I talked about earlier: the number of relays over the past couple of years, by country. This sort of orange blotch here goes up a lot, in response perhaps to one of my talks in Germany a few years ago, and then goes down in response to maybe the data retention stuff. And then we can see that the US was actually the one that spiked from the Iran thing. There weren't a lot of people in Sweden and various other countries who cared particularly about that and said, I gotta set one up now. The France spike is coming from their fine new three-strikes-and-you're-off-the-internet-for-a-year law. Chat with me afterwards if you wanna learn more about that. But there are a lot of people in France saying, well, wait a minute, I have to preserve the privacy that I've got. How else can we deal with capacity issues? Another one is some sort of incentive mechanism. Wouldn't it be great if we could say, because you're a relay, you get priority, you get faster service, your packets get through faster? And there are a couple of approaches here. One of them is a very crude approach that I've been working on, which is basically two divisions: either you're a good guy, you've been a relay lately, or not, and the good guys get first dibs. And then a more complex one would be a micropayments approach, where you earn micropayments in exchange for providing service, and then you get to spend them to get better priority for your circuit. So these are all great. The design of them seems to work; we looked at the architecture, the systems side, and they do in fact provide better service. The problem is an anonymity attack that we don't know the answer for. So imagine you get better service if you're a relay.
So here you are, Wikipedia or something, and you want to know who your users are, and you see somebody connecting to you really quickly. You say, I don't know who he is, he's anonymous, but he's getting good service, so I bet he's a relay. So you go and look at the list of 5,000 relays, or whatever it is at that point, and you take a snapshot, and you say, I don't know which one he is, but he's one of these 5,000. A week later, the same guy logs into the same Wikipedia account, and he's got fast service, and you say, I don't know who he is, but he's one of these 6,000 people from this list. And you take an intersection of the first list and the second list, and now there are only a thousand people who were up both times. And then he logs in again and again over the next while, and pretty soon you've narrowed it down: these six people were the only ones who were relays every time this guy was getting fast service while logging in. And that sucks, and I don't have an answer for it. The intersection attack is a big problem in anonymity systems, and has been for years. But are there any ways to do incentive mechanisms where you're primarily rewarding people who aren't publicly listed as being rewarded? A fine open question. Eventually we want everybody to be a relay. One problem is the problem I was just talking about: if everybody's a relay, then you're in the list of who's online right now or not, and that means that somebody can take a list of the 300,000 users who were using Tor yesterday, and the 350,000 who are using it today, and the 250,000 who are using it tomorrow, and they can intersect those and try to figure out who's online using Tor every time this anonymous guy logs into this Slashdot account. That's bad news. There's another piece of bad news. There's an attack that people have been working on lately, which is: if you're running a relay, then I can send traffic through you, and I can slow you down and see if the other side slows down.
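The intersection attack described above is literally set intersection, which is why it's so cheap for the attacker: the candidate pool only ever shrinks. A sketch, with made-up relay names:

```python
def narrow_candidates(snapshots):
    """Each snapshot is the set of relays (or users) that were online
    at the moment of one login. The target must appear in every
    snapshot, so intersecting them narrows the candidate pool."""
    candidates = set(snapshots[0])
    for snapshot in snapshots[1:]:
        candidates &= set(snapshot)
    return candidates

logins = [
    {"r1", "r2", "r3", "r4", "r5"},   # relays online at the first login
    {"r2", "r3", "r5", "r6"},         # a week later
    {"r2", "r5", "r7"},               # and again
]
assert narrow_candidates(logins) == {"r2", "r5"}
```

Three logins took the pool from five candidates to two; the attack in the talk is the same computation run over many logins against thousands of relays.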
See my talk from last year for a lot more details on these congestion attacks. But until we know how bad that attack is, I feel kind of bad saying, help out Tor by making yourself more vulnerable. I don't know how bad it is at this point. Right now most of the people who run relays are not running them on their home computer and also using them as a client, so I think that's a perfectly safe approach. But we really want more of the Tor users who are browsing the web through Tor to click the button saying, I want to become a relay now. How do we figure out how to make that safe? Okay, so problem number four: the load balancing isn't doing so well. How's my timing? Do I have a goon over there who has any idea when I started or should end? No? I'll just keep on going; sounds good. So the fourth problem: the load balancing is not as good as it could be. The goal for Tor is, we've got a bunch of relays, and each relay should figure out how fast it is, so it can advertise the bandwidth that it's got, and then clients can weight their choices based on the bandwidth each relay has. So if I'm a one megabyte per second relay and there's a 100 kilobyte per second relay, I should get chosen 10 times more often than he should, because that way the clients are gonna put their load correctly onto the network. There are a couple of problems with this. The biggest one is that we're not actually choosing the numbers correctly. Mike Perry did some actual active tests of the Tor network, I think it was a few months ago. Over here on the left hand of the graph, we've got the really fast relays. These are the ones that are advertising a whole lot of bandwidth, and if you use only those, you get like 50, or even 80 or 90, kilobytes a second on average through Tor. If you're using the rest, you get more like 10 kilobytes a second.
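The weighting described here is plain bandwidth-proportional sampling: a relay's chance of being picked is its advertised bandwidth divided by the network total. A sketch, with invented relay names and numbers:

```python
import random

def pick_relay(relays, rng):
    """Choose a relay with probability proportional to its advertised
    bandwidth, so a 1 MB/s relay is picked ~10x as often as a 100 KB/s one."""
    total = sum(bandwidth for _, bandwidth in relays)
    x = rng.uniform(0, total)
    for name, bandwidth in relays:
        x -= bandwidth
        if x <= 0:
            return name
    return relays[-1][0]   # guard against float rounding at the top end

relays = [("fast", 1_000_000), ("slow", 100_000)]
rng = random.Random(7)     # fixed seed so the sketch is reproducible
picks = [pick_relay(relays, rng) for _ in range(11_000)]
fast_share = picks.count("fast") / len(picks)
assert 0.89 < fast_share < 0.93   # ~10/11 of the load lands on "fast"
```

The load-balancing problem in the talk is not with this sampling step; it's that the bandwidth numbers being fed into it are self-reported and therefore wrong or gameable, which is what the active measurements were meant to fix.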
So we'd rather have something that doesn't look like this, but maybe something that looks like this, something a little bit smoother, more fair, so the nodes that have a whole lot of bandwidth are getting more use, and then the Tor network becomes faster. So this is a problem for a few reasons. When we were doing the bandwidth self-measurement stuff, we had all these academics saying, woohoo, I can attack Tor: I can be a 10 megabyte per second relay and I can just lie. I don't have to provide the bandwidth; I just say that I have it, and then I attract all these users and I can start doing my attacks on them. So we had to cap it at 10 megabytes a second, because there were people who would have shown up and said, woohoo, I can be a two petabyte per second relay, and now I'll be 99.97% of the network and I'll attract all of the users. So we needed to put some upper bound so they weren't able to attack the whole thing. Now that we're doing the active measurement stuff, we could potentially start using better numbers; we could use higher numbers. We've got Tor relays out there that can easily push 50 megabytes per second. Do we let them? There's a trade-off here. I mean, if you're actually pushing 50 megabytes per second and you're attracting that proportion of the users, you're 6% of the Tor network. Do we really want some dude to be able to be 6% of the Tor network just by buying a 100 megabit per second switch and having good connectivity? On the one hand, no, because we really want a lot of different servers so that you can get your anonymity. On the other hand, yeah: if every Tor user could get their web pages quickly, that sounds great. So how do we balance anonymity with performance? And there are a couple of other problems; I'll skip over some. What about one-hop paths?
I still hear from a lot of people who say, I don't wanna do that three-hop thing, that distributed trust stuff. I don't really care about security, actually; I just wanna fetch my BBC videos, and they'll only let me do it if I'm coming from England. So I really just want one hop; I don't care about anonymity. For a long time we'd been saying, no, you shouldn't do that, because our load balancing algorithm expects you to be choosing your paths like this, and if everybody's choosing their paths some other way, then the Tor network is gonna suck even more; you're gonna ruin it for other people. Now that we've got some sort of active measurement going on, so we can actually check who has how much bandwidth and update the bandwidth numbers accurately, that's not so much of a problem: if everybody is clobbering just our exit relays, at least we'll notice, and we'll crank down the capacity that they advertise. But exits are actually scarce: if everybody is doing this one-hop thing, they can only use the exit nodes, because you can't one-hop proxy through somebody who won't let you exit. Right now about a third of the network is exit nodes; if people start beating on those even more, then Tor is gonna slow down for everybody, including the folks trying to use it just as a one-hop proxy. If the bottleneck is that third hop, then don't worry about the first and second hops. Once we solve this congestion stuff and this priority stuff and so on, the reason Tor is slow is not gonna be because you're doing multiple hops. I mean, the speed of light around the world is kinda limiting, but we're talking way less than a second; we're not talking five seconds. So, hard to say. The real problem here, the real reason why we're not happy with people hoping to use Tor relays as one-hop proxies right now, is the distributed trust design: if you see a connection coming from a Tor exit node, you know that that guy has no idea where Alice is.
If a bunch of our users start using these relays as one-hop proxies, then you don't... I mean, maybe he knows who Alice is. Maybe Alice is only using one hop, so I should go bust down that door and collect that computer and find out, 'cause I might get lucky. So it's really a binary: either the exit nodes know none of their users, or the exit nodes might know a few of their users, and we really need to keep it in that first case, or a bunch of the relay operators are gonna have a lot more problems. And I've heard from a lot of relay operators who say, "I'm not comfortable running a relay if people are able to use it as a one-hop proxy, so you should come up with a way to prevent users from using me in that way." Sounds like they're having fun out there. Okay, so problem number five: it's not just the high latency, it's the high variability. I talk to a lot of users who say Tor is slow, but if I knew it were eight seconds, and it were always eight seconds, then I'd be able to predict what's going on; I'd anticipate it; I could go for a coffee if it's gonna take longer. The problem is that it's really variable. An example of that: Tor's rate limiting uses a token bucket system, and the idea is that every second we put in the next second's worth of bandwidth. So if you're rate limiting to 30 kilobytes a second, then at the beginning of the second you get permission to send 30 kilobytes, and if you've got 30 kilobytes queued right then, you send it right then, and then you wait for the next second to get permission to send more bytes. At the beginning, when we didn't have very many users, this was great, because not very many of the relays ran out of their token bucket, ran out of their permission, so it was all smooth: whenever you get a byte, you send it on. At this point, all of the relays have more than one second's worth of stuff queued at them, which means that at the beginning of each second they send out as much as they're able to for that second
and then they go silent until the next second, and then they send out another burst. This is horrible for TCP; it doesn't like burst, silence, burst, silence. Karsten, the fellow who's been doing a lot of these measurements, did a graph of how long it takes to build a circuit: the red line is how long it takes for the first hop to finish, the green one is how long it takes for the second hop, the blue one is the third hop. He showed up and gave me this graph and he's like, "There's this weird hump at the beginning of every second; that's when most of my circuit hops finish. What's up with this?" And the answer is: it's that rate limiting thing I was talking about before. Tor sends a burst of traffic at the beginning of each second, which means that if I'm trying to extend my circuit and the response is coming back over the Tor network, it's gonna get stuck in some queue and wait until the beginning of a second, and then it gets pushed, and then it'll get stuck in the next queue and wait until the beginning of the next second. So it basically takes one second to get from one relay to the next, which means that we're forcing a minimum of three seconds of latency, just because we don't spread our rate limiting out over the course of a second. Another thing we need to worry about: circuit build timeouts. Right now I think the timeout is two minutes; you have to build your circuit within two minutes or we give up and we're not gonna use it. Some circuits finish building really fast: one or two seconds is all it takes and you've got your circuit. Others take 20 or 30 seconds to build. The first thing to keep in mind: if it takes 30 seconds to build your circuit, that circuit's gonna suck; it's not gonna give you good performance. So we can look at how long it takes to build the circuit and predict how good that circuit's gonna be. The next step is, if it takes too long to build the circuit, throw it away; don't even use it.
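The burst-then-silence pattern described above falls out of refilling a whole second's allowance at once. Here's a minimal sketch of that token-bucket behavior; the 30 KB/s rate is the talk's example, but the queue model is a deliberately simplified illustration, not Tor's actual implementation:

```python
import collections

class TokenBucket:
    """Simplified per-second token bucket: the whole second's allowance
    arrives at once, so a backlogged relay sends one burst per second."""

    def __init__(self, rate_bytes_per_sec=30_000):
        self.rate = rate_bytes_per_sec
        self.queue = collections.deque()  # byte counts waiting to be sent
        self.tokens = 0

    def enqueue(self, nbytes):
        self.queue.append(nbytes)

    def tick(self):
        """Called once per second: refill, then drain what we can."""
        self.tokens = self.rate  # entire second's permission, all at once
        sent = 0
        while self.queue and self.queue[0] <= self.tokens:
            n = self.queue.popleft()
            self.tokens -= n
            sent += n
        return sent  # everything goes out in one burst, then silence

bucket = TokenBucket(rate_bytes_per_sec=30_000)
for _ in range(10):
    bucket.enqueue(5_000)       # 50 KB queued: more than one second's worth
first_second = bucket.tick()    # 30 KB burst at the top of the second
second_second = bucket.tick()   # the remaining 20 KB in the next burst
```

A byte that just misses one burst waits out the rest of the second at each relay, which is exactly why a three-hop circuit extension picks up roughly a second of delay per hop.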
It would be great to have a fixed timeout that says if it takes more than 10 seconds to build your circuit, throw it away; nobody's gonna use it. But if you're some poor Tor user in Madagascar or something, and you've got a satellite uplink and you only get your packets every so often, then if we put a fixed upper bound at 10 seconds, you're never gonna build any circuits. So the answer we've got in mind now is: every Tor client should watch how long it takes to build circuits, and build a little database inside to figure out the median and the standard deviation and so on that it's been seeing lately, and if it takes me more than a standard deviation above what I've been doing lately, throw that circuit away. Hopefully that will make it more adaptive, yet still able to throw away the circuits that are probably not gonna be good. We need to do something similar for stream timeouts. Right now the stream timeouts are hard-coded. A stream is a request for a website, like BBC or something like that. So if I ask for BBC, port 80, and it takes more than 10 seconds to get the "yes, I've got a connection," then I say, well, I guess that circuit sucks; I'm gonna throw it away and try a new one. And if it takes more than 15 seconds for the next one, I'm gonna throw it away and move to a new one. So we've got a fixed timeout of 15 seconds. If there's some guy in Zimbabwe trying to use a modem, or, a more important example these days, if there's some guy in Iran who really wants to get his message out but only has 15 seconds before his Tor gives up, then he's not gonna be able to connect a lot of the time. And he'll start saying, well, I was planning to be patient; I'm willing to spend 30 minutes to get my tweet out, but this Tor thing keeps giving up way before I would. And then the last one... I think I'm probably running low on time. Is my goon back in here? Do I know how much time I have? Okay, okay, sounds great.
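The adaptive-timeout idea above can be sketched in a few lines: keep the recent build times, and discard any circuit slower than the median plus one standard deviation. The multiplier `k` and the `floor` parameter are illustrative assumptions of mine, not Tor's actual constants:

```python
import statistics

def adaptive_timeout(build_times, k=1.0, floor=1.5):
    """Cutoff (seconds) for giving up on a circuit: median of recently
    observed build times plus k standard deviations, never below `floor`.
    A satellite user with uniformly slow circuits gets a high cutoff;
    a broadband user gets an aggressive one."""
    med = statistics.median(build_times)
    sd = statistics.pstdev(build_times)
    return max(floor, med + k * sd)

recent = [1.2, 1.5, 2.0, 1.8, 9.0]      # seconds, as seen by this client
cutoff = adaptive_timeout(recent)       # roughly 4.8 s for this sample
discard_slow_circuit = 30.0 > cutoff    # a 30-second circuit gets thrown away
```

The same per-client adaptation would address the stream-timeout complaint: a patient user on a slow link would learn a long cutoff instead of being cut off by a hard-coded 15 seconds.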
Okay, so the sixth category we need to look at: this whole directory overhead stuff. How many people here have heard a previous Tor talk, or heard me talk about directory stuff and things like that? Some hands, not quite as many, okay. So the basic idea for Tor's directory is that we wanna somehow give every client a list of all the relays out there: their addresses, their keys, exit policies, stuff like that. And there are a couple of contradictory goals here. One of them is we really want every client to learn about every relay ASAP, because if there's a relay on a dynamic IP address and it's gonna disappear in a little while, then we really wanna use it right now, before it disappears. On the other hand, we really want every client to have the same view of the network as every other client, because if each client has a different set of relays that it knows about, then you can start partitioning the users: you can say, well, I know that Alice didn't know about that relay, so the person who just used that relay wasn't Alice, and these attacks get a lot more complicated and potentially a lot more deadly. And then the third goal is we really wanna scale it, so that you can have 20,000 or 30,000 relays. So there are a bunch of different approaches we've been using for this over the years. At the very beginning, when we had 50 relays, it was just a big text file: a big pile of server descriptors, where each server descriptor says this is my address and keys and exit policies and so on, and at the top of that big blob was a little summary of these are the relays that are up, these are the relays that are fast, stuff like that. We switched a few years ago to what's called the version three directory design, where we have a network status, which is that summary; it's about 150 kilobytes or something, and for each relay it says this is the server descriptor you should get, here's the address, this is whether it's up, here's the capacity, stuff like that. The client fetches that pretty often
and then fetches the server descriptors, with the actual keys and stuff like that, only as it needs them. Each relay puts out a new server descriptor once a day, which means that the clients are only gonna update their information about that relay once a day. So this is a lot better, but we're still having clients fetch 150 kilobytes every couple of hours, and then something like three megabytes daily, spread out over the course of the day. That works fine for us, but what about that poor person in some other country who spends 20%, 30%, 90% of his online time downloading Tor's directory information? That's not what we're meant to do. So the next step is we're gonna make it so clients don't need to fetch descriptors at all. Everything that's in a descriptor, we can condense down and either put in the consensus, which means you get it every couple of hours automatically, or put in this new thing called a microdescriptor, which just has really long-term information that doesn't change except maybe once a week or even less often. So the clients, when they're first bootstrapping, will have to fetch this, but after that they won't have to update it daily. There are a couple of other things we could talk about. One of the challenges we've got: how do we know whether we should do a given design change? I've talked about a whole lot of different ideas here: there's a problem with congestion, we should change to this; there's a problem with circuit windows, we should change to this. Which one should we actually do? When I'm talking about giving different priorities to different cells based on which circuits have been really loud lately, or not so loud, how do we actually tune that? How do we figure out, well, if you've been 90% louder than the other one, then you go first, otherwise not? And how do we know whether we've gotten it right? So it'd be really great to have a Tor network simulator that knows enough about what cells are in circuits and so on
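The directory-overhead complaint above is easy to quantify with the talk's own rough numbers (a ~150 KB network status every couple of hours, ~3 MB of descriptors daily). The link speeds and session lengths below are illustrative assumptions, not measurements:

```python
def directory_time_fraction(link_bytes_per_sec, online_sec_per_day):
    """Fraction of a user's online time spent just downloading Tor
    directory information, using the talk's back-of-the-envelope figures."""
    consensus_bytes = 150_000      # "about 150 kilobytes"
    fetches_per_day = 12           # "every couple of hours"
    descriptor_bytes = 3_000_000   # "three megabytes daily"
    daily_bytes = fetches_per_day * consensus_bytes + descriptor_bytes
    download_sec = daily_bytes / link_bytes_per_sec
    return download_sec / online_sec_per_day

# A dial-up user (~4 KB/s) online one hour a day spends about a third
# of that hour on directory overhead; a broadband user barely notices.
modem = directory_time_fraction(4_000, 3600)          # ~0.33
broadband = directory_time_fraction(1_000_000, 3600)  # ~0.001
```

That gap between the broadband and dial-up cases is exactly why moving descriptor contents into the consensus and long-lived microdescriptors matters so much for users on slow links.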
but doesn't actually model every single TCP packet going back and forth. There are a lot of systems people out there who are happy to say, yeah, I've got this thing, but it can only simulate the network like a hundred times slower than reality, because it has to keep track of all the sequence numbers and all of that. So it'd be great if somebody could build a simulator that only paid attention to the right stuff. Doing measurements is a really good start: every week or so, Karsten comes up with a new graph saying, "I have no idea why there's a bump over there; what's wrong with our network?", such that we can fix it, and we've got actual data. We've got a person working full-time at this point trying to figure out: how do I measure Tor safely? How do I figure out how many users there are, what countries they're coming from, how large their downloads are, without learning anything about who they are or where they are or what they're doing? This is a challenging research problem, but we're actually starting to put out some graphs and data and stuff like that. And then the real challenge: what about the anonymity implications of all these things? We've got a lot of performance-versus-security tradeoffs to make. How do we choose the right balance? On the one hand, if Tor is not useful except for the guy who's downloading a DVD while he's asleep, that sucks. On the other hand, if Tor is pretty secure, and if you're dedicated and patient it will work for you and provide better security than you'd get otherwise, then that's great. How do we get the right balance? And with that, I've got one last slide, which we put on our blog a little while ago, of the number of users showing up from Iran and from China over the course of June. The Iranian election happened right about here, and from there it spiked up to something like 8,000 people a day connecting. On the one hand, that's kind of small; there are millions of people there. On the other hand, if each one of these
people is trying to get a tweet out, or trying to learn where the next event is, or something like that, then you can coordinate pretty well with a small number of people learning and providing information. And at the same time, I've got a graph from China here. This was the day when China said, no Gmail for you: we're gonna block Google search, we're gonna block Google mail, we're gonna block Google calendar, other stuff. There were a lot of people in China who ordinarily were saying, you know, our firewall isn't so bad; mostly they keep us safe. And suddenly there were a lot of people there saying, wait a minute, they can do that? They did do that? I'm not so sure I like being firewalled anymore. But that's a separate talk. I have a few minutes for questions, I think, and then I'll get bundled off somewhere else. Shout it out real loud. Have I considered using PlanetLab or GENI for testing Tor at large scale? Tor is bigger than PlanetLab. There are a bunch of researchers who are using PlanetLab for their tests. Part of the challenge with all these emergent behaviors with Tor is that Tor worked great when we only had 100,000 users and a thousand relays: everything was fast and smooth and there were no surprises. Then the network got too overloaded, and we ended up with several hundred thousand users, and I don't even know how many users we have, and it all started going bad in funny ways. So the real challenge is, once you push it past its breaking point, how does it break? And stuff like PlanetLab should help us with that sort of thing. One last question before I'm pulled off. Yes, so L7 is the DPI thing that says "I know what protocol that is," right? So the question is, have you thought about using DPI on Tor to censor certain protocols, because then the network would get better? We thought about it, yeah. It seems like a bad idea, though, for a couple of reasons. One of them is that we're big fans of anonymity and we don't want to know what you're doing. Another one is,
once you start doing the whole "I'm looking at your data and deciding whether it can get through" thing, you suddenly switch what sort of liability bucket you're in. Those are the big ones. A third one is that there are a bunch of people doing HTTP downloads who are also hurting the network, so I don't want to blame just BitTorrent. Another one is that the BitTorrent guys are already engaged in this arms race with telephone companies: you've heard of encrypted BitTorrent, and doubly encrypted BitTorrent, and really now it's totally encrypted BitTorrent. I don't want to get into that arms race. So we'd like to solve this without caring what's in the packets. Pardon me: room 104 is where we will all go once we get out through that tiny little door. Thank you.