Hey everybody, this is your noon talk in track 4: technical changes since the last Tor talk. I'm Nick Mathewson. I'm one of the two main Tor developers. The other is Roger Dingledine, who's sitting there in the fourth row eating some pizza, and he'll be talking at two. Two announcements before I start. The one o'clock sessions in track 5 and track... which one? ...and track 3 have been swapped, so that Dan Kaminsky can have a bigger room. The second one is: upgrade to Tor 0.1.2.16. No, we're not kidding. Seriously, we're not kidding. Upgrade. We have reasons. Alright, so let's go back to the future, or 2004, which was the last Tor talk at DEF CON. We had a pretty good Tor back then. It was kind of a small network, but it was up and coming. A lot of cool hackers were using it. Pretty much only cool hackers and very, very determined people were able to use it, because there was no GUI and you had to edit some files, read your logs, hope for the best, and configure your applications. But, you know, we managed to get a couple of DEF CON talks out of the deal, and that was cool stuff. So what have we been up to since then? Mostly we've been hacking on Tor all the time. At least, that's what I'm going to be talking about today; there's other neat stuff we've been up to, but this is the technical changes talk. We've been working on security. It turns out that our software wasn't perfect in 2004. Funny, that. We've been working on scalability. It turns out that writing an anonymity network that can support a couple thousand, a couple tens of thousands of users gives you the sort of software that will pretty much fall over as it gets to hundreds of thousands of users. And as we've added more and more capacity, more users have shown up. We've also worked on usability: fixing usability bugs, adding GUIs. We've worked on performance too. That's not so much on the slides, but I'll talk more about it later. Integration.
We've worked on making it so that you can anonymize applications with Tor that you couldn't before. And lots more. We have a changelog; we record pretty religiously what we've done. Check it out. Our latest estimate is about 200,000 users, although it's kind of hard to tell, since they're anonymous. But we've got about a thousand servers these days. So I'm going to give you a brief, fast introduction to Tor, on the theory that you've probably heard it before. Who here thinks they know how Tor works? Who here has used Tor at some point, ever? Who here is still using Tor from time to time? Cool. Right. Once I've gone through the Tor intro, I'm going to talk about directory and server discovery changes, which are basically making things faster and more secure. I'm going to talk about path generation changes, which are more efficient and less filling, and I'll explain the "filling" part later. I'll talk about ways that we subtly screwed up some of our crypto in the earlier revisions, and how we fixed them. And some fun new tools and features that we've done. I'm really wired on caffeine right now, so if I speak too quickly, someone shout "slow down." So, intro to anonymity. Anonymity networks hide users among other users. The idea is, if you're watching the network, you ought to be able to tell: alright, somebody's talking to Bob1 and Bob2, and Alice1 and Alice2 and Alice3 are using the network, but I don't know whether Alice1 is talking to Bob1 or Bob2 or what. So that's anonymity networks. Here's Tor. There are a whole bunch of servers, where in this case "a whole bunch" is represented by 9, and they're all connected via TLS, which is also SSL; they're more or less the same thing. There's a connection between every server and every other server; I just didn't draw it like that here. And these servers can come up and go down. Some are faster than others; some are slower than others.
These are volunteer-operated servers running all over the network. You can run one yourself if you like. I'm not sure how well it would work if you tried to run one from your desktop in the audience, but give it a shot. Let me know what happens. So they're all connected via TLS pipes, and over the TLS pipes they relay many circuits for clients; each TLS pipe can hold many circuits. When Alice1, who's a client (in this case all of the clients are named Alice, because they're anonymous), wants to use the network, she runs some software on her computer that builds her circuit through the network. She extends it piece by piece: first, she connects to the first server; second, she extends the connection anonymously from the first to the second; then anonymously from the second to the third. Pretty much all of our circuits right now are three hops. I'm not going to explain why, but it's kind of an interesting argument. Right. After Alice has a connection to the last server in her chain, what does she do with it? Well, she probably wants to go somewhere on the net, like to look at some pictures of kittens with captions under them, or whatever else people are using the internet for these days. To do this, she sends a message encrypted with three layers of encryption to the first server, which takes off one layer and sends it to the second server, which takes off another layer and sends it to the third server. The third server gets it, and it says, "connect to Bob," where Bob is an IP address and port. What's actually relayed and supported right now is TCP streams only; it doesn't do arbitrary IP. Also, it relays TCP streams and not TCP packets. If you don't know the difference, you don't have to care about the difference.
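The layered encryption described above can be sketched in a few lines. This is a toy illustration only: it uses a SHA-256 counter-mode keystream with made-up keys, whereas Tor actually uses AES-CTR with keys negotiated during the circuit handshakes.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. Illustration only;
    # real Tor derives AES-CTR keys from the circuit handshake.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(data: bytes, key: bytes) -> bytes:
    # One "layer" of encryption: XOR with the hop's keystream.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def onion_wrap(payload: bytes, hop_keys: list) -> bytes:
    # Alice applies the last hop's layer first, so the first hop's
    # layer ends up outermost.
    for key in reversed(hop_keys):
        payload = xor_layer(payload, key)
    return payload

# Each relay strips exactly one layer as the cell travels forward.
keys = [b"hop1-key", b"hop2-key", b"hop3-key"]   # hypothetical keys
cell = onion_wrap(b"connect to Bob", keys)
for k in keys:            # hop 1, then hop 2, then hop 3
    cell = xor_layer(cell, k)
# cell == b"connect to Bob" at the exit
```

The point of the structure is that each hop can remove only its own layer, so no single relay sees both who Alice is and where the cell is going.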
If you do know the difference, what I'm saying is that your favorite TCP fingerprinting attack doesn't work, because the far end, the exit node, generates a separate TCP stream, rather than relaying the one Alice generates herself talking to the first hop. This general design is called onion routing. One earlier design that did something similar was PipeNet. There are other implementations of the same basic design, but this is the one I know best, because I work on it. The general security properties you get are: if the first node, or someone watching Alice's connection to the first node, is hostile, it can tell that Alice is talking. This isn't steganography; we don't hide the fact that Alice is using the network, but we do hide who Alice is talking to. Now, if the last hop... oh, I've given it away now: Alice1 is talking to Bob2. If the last hop is hostile, it can tell that someone has connected to Bob, but not who. However, two hostile hops can correlate the traffic patterns and tell that Alice is talking to Bob. That is: okay, these connections started at about the same time, they ended at about the same time, they had about the same amount of data, and the data followed the same pattern of bursts and pauses. There aren't any obvious fixes to this in the literature that don't involve slowing your network to a crawl. So when you start building an anonymity network these days, you basically need to decide: are we going to support web browsing and be vulnerable to this kind of attack, or are we going to introduce multi-hour delays between Alice and Bob and resist these attacks, but be useless for web browsing? We took the first approach, because people seem to like this web thing. So, first I'm going to talk about directories and server discovery. You'll recall that Alice needed to build a circuit through three servers. Well, how does she find out that those servers are there?
If you think about it for a while, you'll realize that every client needs to know pretty much every server, because if you just go to one server, like in a lot of peer-to-peer networks, and ask it for a list of its neighbors, then if it's bad, it could lie to you and tell you only about other compromised servers. Also, all clients need to know the same servers; I'll talk about that a little more in a moment. And servers shouldn't be able to impersonate each other: if I can pretend to be the entire network, then I'm always the first and the last hop, because I'm all of the hops. The solution there is that what clients find out about servers is a set of self-signed descriptors. The way clients identify servers is by their public key, and assuming that, as Crypto Barbie says, "factoring large numbers is hard, let's go shopping," you can rely on this to make sure that servers can't impersonate each other. Also, we don't want to use too much bandwidth to tell clients about servers. So why do clients need to know the same servers? I'll talk about this a little more in my next talk, an hour from now, but the basic reason is that if Alice1 knows some servers and Alice2 knows other servers, and I can tell which servers they know, then I can tell that some connections, like the one to Bob1, could only have come from Alice1, because only Alice1 knows the server that's connecting to Bob1. And there are ways you can make this attack even better, but that's the basic idea. So back in 2004, when we last presented this stuff to you, we had a few directory authorities, for "a few" equals three. These were all trusted. Their IP addresses and their public keys were shipped with the source code; you could override them, but we would strongly suggest that you don't.
Each authority would publish a big list of all of these self-signed server descriptors, signed with the authority's key, and clients would go to an authority, download it, go "oh, hey, yeah, that's right," and use all those servers, and they'd do this every few hours or so. This was really slow, and it got even slower as we added more servers. As soon as we're back from DEF CON, we're going to shut down this whole mechanism. If you're using Tor 0.1.0, it will stop working, but that's really old, so no one's using it, we hope. But yeah, this was pretty slow. The files were really big (right now they're up to over a megabyte in size), and old clients are still fetching them too often. Yeah, this sucked. We added some caches, and this did help some. Clients wouldn't have to hit the authorities all the time, and anyone could run a cache; because the documents were signed, there wasn't a security flaw there. But the files were still really big. Also, if even one authority was compromised, it could generate a compromised signed list, maybe containing only servers that were bad; the cache would download it, the client would download it from the cache, and the client would be screwed. Also, most information in these great big directories was redundant. That is, you go at noon and say, "what's the directory?" and you get a big signed list of 100 servers. You go back an hour or two later, say "what's the directory?" again, and you get a big signed list of servers of which maybe only one or two have changed. So around 2005, we split the directories into statuses, which needed to be generated by the authorities, and individual descriptors. You'd ask the cache, "what do authorities A and B say?" and you'd get signed documents from the authorities listing the digests of every single router descriptor.
Then you'd go to the cache again and say, "alright, I don't have the descriptor with digest such-and-such; send it to me." It sends it to you, and you make sure the signature is right and the digest is right. And you decide what to ask for based not on a single authority's opinion, but on the opinion of all of the authorities together. Now, why do we do this by the digest of the descriptor, and not by the digest of the descriptor's key? This was kind of a subtle point that we almost screwed up until we thought about it. There's a fun attack, if you have a bad cache and a bad server working together, where I send one descriptor to the authorities with my identity, and then I send a different descriptor to every single client who asks the caches for my descriptor. In this case, the authority tells the client, "use identity such-and-such"; the client says to the cache, "alright, I'd like identity such-and-such"; and the cache says, "alright, here is a special descriptor for server one that only you will ever know about." Later on, when the client uses this special descriptor, which will have other information that only the client knows, the server will be able to tell that client apart from all other clients. The solution is to make sure that everyone has the same descriptor for every server that they use, so this attack fails. The remaining problem was that a client still had to go to the caches and ask for separate opinions from all of the authorities; depending on when they asked, they would get a different set, and if they missed just one, they would compute a different value of what the consensus was. So what we're implementing right now, and I have it mostly done, but we still need to debug and test and actually deploy it, is a voting system where the authorities (now there are five, but I couldn't fit all five on the slide) get together and vote on a multiply-signed consensus status document.
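The fetch-by-digest step described above amounts to a simple check: the client takes a digest from the authorities' signed status, asks a cache for the matching descriptor, and verifies that what came back really hashes to that digest. A minimal sketch, with a hypothetical helper name (real Tor uses a SHA-1 digest over the signed descriptor body):

```python
import hashlib

def descriptor_matches(descriptor: bytes, expected_digest: str) -> bool:
    # The digest comes from a document the authorities signed, so a
    # cache can't substitute a different descriptor without the
    # client noticing the mismatch.
    return hashlib.sha1(descriptor).hexdigest() == expected_digest

# Hypothetical descriptor body, just to exercise the check.
desc = b"router example 10.0.0.1 9001 ..."
digest = hashlib.sha1(desc).hexdigest()
descriptor_matches(desc, digest)                  # True
descriptor_matches(b"tampered descriptor", digest)  # False
```

This is why the targeted-descriptor attack fails once everyone asks by digest: a cache that hands one client a "special" descriptor produces something that fails this check.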
To do this, they get together every hour or so, and they all send each other their opinions. Once they've got all the votes they're going to get, they figure out what the result of the vote will be, and they sign the result. It should be the same for all of them; if it isn't, they use whatever gets signed by the most authorities. Then they distribute signatures to each other until they have a single consensus with lots of signatures on it. Clients download these from the caches and check the signatures; if they can get one that's signed by more than half of the authorities they know, then they use it. At some point in here, I guess we collected all the underpants. Also, there's more than just yes/no for each server in these statuses. There are lots of different flags that we set. There's Named, which is: do we guarantee that this is the only server called such-and-such, or could there be other servers with that name? Is it Running right now? Is it Valid? Valid is pretty nebulous; it basically means that we don't know any reason to totally distrust it, but we often don't find out about bad stuff, which is why we try to be resilient against some kinds of failures. Is it Fast? Is it Stable, or is it likely to go down pretty often? Is it a BadExit? Is it an Exit at all? Will it relay traffic? Is it a current Authority, and can you use it as a Guard? I'll talk more about guards later. One of the neat things about our architecture now is that although actually determining these flags can be hard, the only thing that's really set in stone and hard to change is how clients interpret the flags. So if Fast means "use this in a circuit that you will need high bandwidth on," then clients will have that behavior, and we can later get more and more sophisticated at the authorities about detecting which servers are likely to be faster, so clients will get better performance without having to upgrade.
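The majority rule at the heart of the voting can be sketched in a few lines. This shows only the core idea (a flag lands in the consensus when a majority of authorities vote for it); the real algorithm, specified in Tor's directory protocol, handles much more than flags.

```python
from collections import Counter

def consensus_flags(votes: list) -> set:
    # votes: one set of flags per voting authority, for one server.
    # A flag makes it into the consensus if more than half of the
    # authorities assert it.
    counts = Counter(flag for vote in votes for flag in vote)
    needed = len(votes) // 2 + 1
    return {flag for flag, n in counts.items() if n >= needed}

# Hypothetical votes from five authorities about one server.
votes = [{"Running", "Fast", "Stable"},
         {"Running", "Fast"},
         {"Running", "Exit"},
         {"Running", "Fast", "Guard"},
         {"Running"}]
consensus_flags(votes)   # {"Running", "Fast"}
```

The same more-than-half threshold governs the client side: a consensus is only acceptable if more than half of the known authorities signed it.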
Although, once again, please do upgrade. Really. The other changes that we've made are in path generation. You'll remember how Alice selected a path through the network herself, because if she asked someone else to do it, they might lead her astray. Well, one problem is that in 2004, all of our servers were chosen with roughly equal probability, regardless of their capacity. So even if your bandwidth was X, and someone else's bandwidth was 2X, and someone else's was 4X, you'd all get chosen for the same number of circuits. That meant that big servers had lots of capacity that no one was asking for, and tiny servers were overloaded. The asterisk on "equal" is that we would also select based on your exit policy, that is, your statement of what you'll deliver at the last hop: if the client wants a web page, we need somebody who will allow an exit to port 80. We wanted to make sure that people could run Tor servers without relaying arbitrary traffic, so we built exit policies so that everyone can select which kinds of traffic they'd like to relay. This means, though, that clients need to know these exit policies, so they can find an exit that relays the traffic they want. But anyway, back to bandwidth. The obvious solution was to choose with probability proportional to bandwidth, which was a pretty good idea, though of course you need to cap it at some believable bandwidth, so that if someone says, "hey, I can push really, really enormous amounts of traffic, no really," you don't give them 99% of the circuits. That's bad first because of trust bottlenecks: they would get all of the circuits. And second, because of resource bottlenecks: you could DoS the network by sending all of the traffic to a server that couldn't handle it. Another problem is unstable servers. Say you've got some servers that go down every hour or so, and some servers that go down every ten days or so.
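Bandwidth-weighted selection with a believability cap can be sketched like this. The cap value and relay names are made up for illustration; real Tor's weighting is more elaborate than a single clamp.

```python
import random

MAX_BELIEVABLE_BW = 1_500_000   # illustrative cap, bytes/sec

def pick_relay(relays: dict) -> str:
    # relays: name -> advertised bandwidth. Weight the choice by
    # bandwidth, but clamp each relay's claim so that one liar
    # advertising "enormous" bandwidth can't attract nearly all
    # circuits (trust bottleneck) or get DoSed by them (resource
    # bottleneck).
    names = list(relays)
    weights = [min(bw, MAX_BELIEVABLE_BW) for bw in relays.values()]
    return random.choices(names, weights=weights, k=1)[0]

relays = {"tiny": 20_000, "medium": 400_000, "liar": 10**12}
# "liar" is weighted as 1_500_000, not 10**12, so it still gets the
# biggest share, but nowhere near 99% of circuits.
pick_relay(relays)
```

With the clamp, the liar's effective weight is about 78% of the total here instead of essentially 100%.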
Well, a server that goes down every hour or so is still pretty useful for web connections, because most HTTP requests don't take an hour to fulfill, whereas you don't want to use something like that for your IM or your SSH, or any time you want a really long-lived connection. So what we've also done is started tracking uptime, and when a server has really long uptime, we label it as Stable, and we remember which ports need stable connections (that's configurable, if you know of one that we don't). So when a client wants to connect to port 22, which is SSH, it will select a path consisting only of stable servers, whereas when it wants a web page, it won't restrict itself to stable servers. Originally, we selected paths at random. This bit is a little tricky; I'll try to go a little slow here. The problem with this is profiling. What you're trying to avoid in a network like this isn't necessarily just that some of your connections get broken; sometimes it's bad if any of your connections get broken. For instance, say you visit cuteoverload.com every morning to see pictures of adorable kittens and hamsters, and you don't want anyone to know this because you're trying to be, you know, really tough, you're in the Hells Angels or something. Then what you're trying to avoid is not any particular eavesdropper occasionally discovering that you go to Cute Overload; you're trying to avoid anyone ever knowing that you go there. If even once your stream there is compromised, you lose. What this means is that if you keep picking a new first and last node every morning, eventually you will get unlucky: eventually you'll choose a first node that's trying to get you and a last node that's trying to get you, and everyone down at the biker bar will find out, and they'll call you Hamster Guy, and that's no good. So the trick here is to choose some fixed first hops.
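The port-based stability rule above is a simple filter at path-selection time. A minimal sketch, with an illustrative port list (in real Tor the long-lived-port list is a config option, and the Stable flag comes from the consensus):

```python
LONG_LIVED_PORTS = {22, 5190, 6667}   # ssh, aim, irc -- illustrative

def needs_stable(port: int) -> bool:
    # Long-lived protocols want relays that won't vanish mid-session.
    return port in LONG_LIVED_PORTS

def candidates(relays: list, port: int) -> list:
    # For long-lived ports, restrict path selection to relays the
    # authorities flagged Stable; for everything else, any running
    # relay will do.
    if needs_stable(port):
        return [r for r in relays if "Stable" in r["flags"]]
    return relays

relays = [{"name": "r1", "flags": {"Running", "Stable"}},
          {"name": "r2", "flags": {"Running"}}]
[r["name"] for r in candidates(relays, 22)]   # ["r1"]
[r["name"] for r in candidates(relays, 80)]   # ["r1", "r2"]
```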
Choose a few that are your favorites. If they are compromised, you lose. However, if they aren't, then you're actually in decent shape, because now you will not eventually choose a bad first hop. This idea isn't original to us; these were originally called helper nodes. We did some user surveys and figured out that nobody could tell what "helper" meant in this case, so we chose "guard" instead. If your guards are good, then you are fine permanently. If your guards are bad, you still lose, but at least this way you've got a real chance that your love of hamsters will be kept secret. So what if the guard nodes go down? I said they were held fixed, but we actually need a way to handle it when one of them goes down. Well, you've got to pick some more. But the trick is that you don't want to eventually cycle through the whole network, so you've got to make sure that when your original guards come up again, you go back to them. So you keep them in order, and you use the first three (for certain values of three) that happen to be up at any given moment. You may have a list of ten: if the first one is up, the second is down, the third is down, and the fourth is up, you use the first and the fourth. It's even more complicated than this, and Mike Perry might talk about it at his talk, or he might already have enough stuff to talk about. Another problem is that old Tor built circuits on demand. You would say, "I want to connect to IM." Then it would go to the first hop and do some public key crypto, extend to the second hop and do some public key crypto, extend to the third hop, fail because the third hop wasn't there anymore, go "oh crap," and start all over. And meanwhile, you are waiting for your website to load. So instead, we predict which ports you're going to want and preemptively build circuits there.
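The ordered-guard-list trick above can be sketched as follows. This is the core idea only (keep a fixed order, use the first n guards that are up, prefer earlier ones again when they return); Tor's actual guard logic has more states than this.

```python
def active_guards(guard_list: list, up: set, n: int = 3) -> list:
    # guard_list: the client's persistent, ordered guard list.
    # up: guards currently reachable. Take the first n that are up,
    # so a guard that comes back online is preferred again and the
    # client never drifts through the whole network.
    return [g for g in guard_list if g in up][:n]

guards = ["g1", "g2", "g3", "g4", "g5"]   # hypothetical guard names
active_guards(guards, up={"g1", "g4", "g5"})   # ["g1", "g4", "g5"]
active_guards(guards, up={"g2", "g3", "g4", "g5"})   # ["g2", "g3", "g4"]
```

Note that when g1 comes back up, it immediately displaces g4 again, which is exactly the "go back to your original set" behavior described above.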
So if you've been asking for port 80, port 22, and port 8001, and you seem to have been online recently, we try to make sure that you've got live circuits ready for use to those ports. Also, in case you ask for something that an existing circuit can't serve, we cannibalize one and try to extend it just one hop and see if that works. That way it's still a little slow, but it's faster than if we had built the circuit from scratch. And how am I doing? Wow, I'm doing really well for time. Must be the caffeine. All right. So previously, we would extend by IP and port. That is, when you were connected to server one and you wanted to extend to server two, you would say, "connect to server two at this IP and this port." Once you were there, you would do some handshaking that made pretty sure, via public key, that you were talking to server two. But on the way there, server one needed to build a TLS connection to server two, and if server one didn't know the key for server two, it wouldn't know which certificate to expect from server two when it did the TLS handshake. Originally, we thought this wouldn't be a problem, because all of the servers would download the whole directory often and know all the other servers. In practice, this didn't work: one server would show up in the directory, and a client would find out about it before all the other servers did, and it was a bad scene. The solution here is to use the identity key as well as the IP, so that you say, "extend to server two with this IP, this port, and this identity key." But you don't want to identify only by identity key, because there's a fun man-in-the-middle attack there, where a bad Alice tells server one, "go connect to server two at this evil IP and this port."
S1 goes there, and the evil server relays the connection. Because of TLS, the evil server can't actually read the contents of the traffic between S1 and S2, but now S1's TLS connection to S2 travels through the evil server, so the evil server can look at all of the traffic patterns and all of the timing and all of the volume between S1 and S2, and try to use this for traffic analysis attacks. This was no good, so you need to use the identity key and the IP and the port. This took a bit of thinking, but it turns out it worked. All right, now we're getting a little crypto-heavy. How many people here know what Diffie-Hellman is and how it works? How many people don't know, but would like to understand the next couple of slides? Okay, cool. Diffie-Hellman is a way that two parties, assuming they know they're talking to each other, can come up with a secret key that no one besides them knows, without ever sending the key over the wire. Basically, we use some generator G, like 2 to the 16th plus 1, and a big prime; it doesn't matter if everyone knows what the big prime is. I send you G to the power of X; you send me G to the power of Y, both modulo that large prime. We keep X and Y secret, so I can now compute (G to the Y) to the X, and you can compute (G to the X) to the Y. We can now both calculate G to the XY, but nobody else can, based on the information we have exchanged. So that's what all of this G-to-the-XY stuff is about, and it'll come up in a minute. One of the problems with our earlier protocol was that it was kind of needlessly slow, in that Alice and the first server in her path had already done all of this handshaking in order to establish their first connection via TLS.
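The exchange just described can be walked through with textbook-sized numbers. These parameters are purely for illustration; the real protocol uses a 1024-bit prime, and Python's `pow(base, exp, mod)` does the modular exponentiation.

```python
import secrets

# Tiny textbook parameters, for illustration only; real Diffie-Hellman
# uses a large prime p so that computing x from g^x mod p is infeasible.
p, g = 23, 5

x = secrets.randbelow(p - 2) + 1   # Alice's secret exponent
y = secrets.randbelow(p - 2) + 1   # server's secret exponent

A = pow(g, x, p)   # Alice sends g^x mod p over the wire
B = pow(g, y, p)   # server sends g^y mod p over the wire

alice_key = pow(B, x, p)    # (g^y)^x mod p
server_key = pow(A, y, p)   # (g^x)^y mod p
alice_key == server_key     # True: both hold g^(xy) mod p
```

Only g^x and g^y ever cross the wire; an eavesdropper who sees both still can't compute g^(xy) without solving the discrete log problem (for a real-sized p).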
And then they immediately did exactly the same handshake again: to establish confidentiality, which they already had from TLS; to establish integrity, which they already had from TLS; and to make sure that they were really talking to each other, which TLS already guaranteed. So we managed to cut the initial first-hop crypto in half, just by making a fast path for creating your first hop inside TLS. That helped server CPU a little bit, and server CPU is way more important than client CPU, because each server has to handle tons and tons of clients. And speaking of cryptography, there was a problem with our crypto. Either we hadn't thought about this, or we sort of hoped that OpenSSL would take care of it, but there are some really bad values for X and Y when you do Diffie-Hellman. For instance, if I pick zero: well, anything to the zero is one, so later on, when you send me G to the Y and I raise it to the zero, I'll get one. And the problem occurred when Alice was extending through a bad server to a good server. She would send G to the X, encrypted with server two's public key. The bad server knows server two's public key, so the bad server can replace her message with G to the zero, encrypted with server two's key. Server two sees that you "picked" one; okay, one to the Y is one. It will send back G to the Y, and the bad server can then replace G to the Y with G to the zero again. Now the client and the server think they have a secret key that no one else knows, and their secret key is one. And while, yes, one is as good a random number as any other number, when it's forced, that's bad. So this was pointed out to us by... hey, Roger, was this one that an anonymous person told us about? Yeah, we got an anonymous tip on this one. We still don't know who found it. If you ever want to fess up, that would be really cool, because we think you're cool.
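The fix for that attack boils down to rejecting degenerate Diffie-Hellman public values before using them. A minimal sketch of that class of check (the function name is mine, not Tor's or OpenSSL's):

```python
def dh_public_value_ok(val: int, p: int) -> bool:
    # Reject public values that force the shared secret into a tiny
    # predictable set: 0, 1 (which is g^0, the attack value above),
    # and p-1 (order-2 element, so the secret is always 1 or p-1).
    return 1 < val < p - 1

p = 23   # toy prime, matching the earlier sketch
dh_public_value_ok(1, p)       # False -- the g^0 substitution attack
dh_public_value_ok(0, p)       # False
dh_public_value_ok(p - 1, p)   # False
dh_public_value_ok(5, p)       # True
```

If server two applies this check before computing one-to-the-Y, the bad server's substituted value is simply refused and the handshake fails safely.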
But anyway, once we fixed this, Ian Goldberg, who's a prof at the University of Waterloo in Canada, managed to prove that this aspect of our protocol, at least, was secure. Also, we sent a patch to OpenSSL for this. So if you're using a recent version of OpenSSL to build your crypto protocol, and you just went "oh, crap": don't worry, OpenSSL will now catch this for you. So, new tools and features. In the old version of Tor, everyone had to speak SOCKS; SOCKS was all that we understood. The way SOCKS works is: you want to make an anonymous TCP connection to a certain IP and a certain port, so you connect to your local SOCKS proxy and say, "connect me to this IP and this port," and it goes off and does it. In Tor's case, it builds all of the circuits; in the normal SOCKS case, it usually just opens a TCP stream to wherever you're going. Well, in some cases this is easy. Maybe your browser speaks SOCKS already. Maybe your browser goes to a proxy that does some useful stuff for you, like Privoxy, which filters out certain kinds of headers, or Polipo, which is pretty fast and can also filter out certain kinds of headers, and that converts it to SOCKS. Maybe you're using an application like Gaim, which I guess is now called Pidgin or something; it's an IM client, it's pretty good, and it speaks SOCKS already. But if you have some monstrous application that does a bad job of SOCKS, what can you do? Well, we had some solutions. They all kind of sucked, but, you know, you can use something like tsocks or dsocks to replace all of the libc calls to connect() and so on with calls to socksified equivalents. This works on most free Unixes. It doesn't work on OS X; if you know how to shim libc calls on OS X, please tell me, because we searched pretty hard and didn't find a good way. dsocks can sort of do it, but not too well. Windows was pretty screwed here.
You could write a network driver that converted everything to SOCKS, but that kind of sucked. So something that we've added recently that's turned out to be pretty useful, especially for integrators: iptables on Linux, and pf on BSD and OS X, support transparent proxying, which means you can tell your firewall rules, "redirect every TCP stream to this local port," which any good firewall can do already. But what they add is the feature where the application can then ask, "hey, kernel, hey, network stack, I just got this connection; where did the application want it to go?" Once you know that, you, as Tor, can build a connection to the right place, and the application never needed to speak SOCKS in the first place. Another neat solution that's getting popular, and is being used by tools like JanusVM, is to actually have a VM be your router. You set your router to a local virtual machine, which uses tricks like this to convert all of your traffic to Tor. And because this machine is your router, you don't need to worry about applications accidentally sending traffic somewhere other than your router. Another problem that we've had for a really long time is DNS leaks. Say you have an application that is written like most applications. You ask it to connect to naughty.com (I probably should have looked that up first, to see what I've just linked to in my slide; oh, well). Your application will do a DNS lookup: "where is naughty.com?" Your DNS server will answer with the address of naughty.com: 1.2.3.4. (This is not the actual address of naughty.com.) Then your dumb application will go to Tor over SOCKS and say, "get me 1.2.3.4." Now you can see the problem here. You just broadcast to your DNS server, "hey, I'm looking for naughty.com," and then you used Tor to go anonymously somewhere. No one can tell. Except your DNS server.
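The "where did the application want this connection to go?" query is, on Linux, the `SO_ORIGINAL_DST` socket option; what it returns is a raw `struct sockaddr_in` that the proxy must parse. A sketch of that parsing (the constant comes from `<linux/netfilter_ipv4.h>`; the actual `getsockopt` call, shown in a comment, only works on a redirected socket on a Linux box):

```python
import socket
import struct

SO_ORIGINAL_DST = 80   # from <linux/netfilter_ipv4.h>

def original_dst(raw: bytes) -> tuple:
    # struct sockaddr_in layout: 2-byte address family (host byte
    # order), 2-byte port (network byte order), 4-byte IPv4 address,
    # then 8 bytes of padding.
    port = struct.unpack("!H", raw[2:4])[0]
    addr = socket.inet_ntoa(raw[4:8])
    return addr, port

# In a real transparent proxy you'd call, on an accepted connection:
#   raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
# Here we craft the buffer by hand to show the layout:
raw = (struct.pack("=H", socket.AF_INET)
       + struct.pack("!H", 80)
       + socket.inet_aton("93.184.216.34")
       + b"\x00" * 8)
original_dst(raw)   # ("93.184.216.34", 80)
```

With the original destination recovered, the proxy opens a Tor circuit to that address, and the application never spoke SOCKS at all.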
And if your DNS server did a recursive lookup, well, yeah. A lot more people are nodding now. So, one solution that we had to recommend for a while: there are variants of SOCKS that take a hostname instead of an IP address. SOCKS4a always takes a hostname; SOCKS5 optionally takes a hostname. So we had to find applications that either supported SOCKS4a, or that claimed to support SOCKS5 and could actually do it right. This is kind of hard to find, hard to explain, and hard to convince application writers to do, because the whole lookup-first-then-connect pattern is pretty firmly hard-coded in a lot of applications, and our user base still isn't quite big enough to, like, force the IE team to do whatever we say. So the new solution is that Tor, as of 0.2.0.2-alpha, can now act as a DNS server. You set Tor as your DNS server, either by saying, "hey, Tor, listen on port 53," or "hey, Tor, listen on port such-and-such; hey, firewall, forward port 53 there." Then you set up Tor on localhost as your DNS resolver, and now when you ask, Tor just does the lookup for you anonymously, sends you back the answer, and when you connect there, you're fine. Another neat feature is that there are some special addresses Tor knows how to handle, like addresses that end with .onion, which are for hidden services, which I won't talk about here.
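The SOCKS4a variant mentioned above avoids the DNS leak by putting the hostname in the request itself, so the proxy resolves the name. The wire format is small enough to build by hand: version 4, command 1 (CONNECT), the port, the dummy address 0.0.0.1 signalling "a hostname follows," then the NUL-terminated userid and hostname. A sketch:

```python
import struct

def socks4a_connect(hostname: str, port: int, userid: bytes = b"") -> bytes:
    # SOCKS4a CONNECT request. The 0.0.0.1 dummy address tells the
    # proxy to read a hostname after the userid and resolve it
    # itself, so the client never touches its local DNS server.
    return (struct.pack("!BBH", 4, 1, port)   # version, command, port
            + bytes([0, 0, 0, 1])             # dummy IP: "hostname follows"
            + userid + b"\x00"
            + hostname.encode("ascii") + b"\x00")

req = socks4a_connect("naughty.com", 80)
req[:2]                            # b"\x04\x01"
req[4:8]                           # b"\x00\x00\x00\x01"
req.endswith(b"naughty.com\x00")   # True
```

Sent to Tor's SOCKS port, a request like this makes the exit node do the lookup, so nothing about naughty.com ever reaches the client's DNS server.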
Now Tor can remember: oh hey, you asked for naughty.com.such-and-such.exit; I'm going to send you back an IP that isn't the IP of naughty.com, it's an IP that I will remember, so that later on, when you ask Tor for that IP, it says, "okay, this means go to naughty.com using such-and-such as an exit." And we choose IPs for this from the high parts of net 127, so that if you screw up and actually try to connect to one of these directly, the connection won't leave localhost. Another problem we've had is editing text files. Alright, it's not hard for the people in this room, most likely, but remember that anonymity networks hide users among users: a system that's only usable by the people in this room is not nearly as good as a system that's usable by nearly everyone, because it's more useful to hide among 200,000 users than it is to hide among a couple hundred users. So what we did is we made a so-called control interface that listens on a local port, and different applications can connect to it and be your GUI. They can reconfigure Tor, help you set up a server, help you see where all of your paths are going, and put up pretty maps of the world with all of the servers listed on them, so you can watch traffic bounce around like in Sneakers. Apparently, watching the traffic bounce around like in Sneakers was a really big user request. Other neat things you can do with this: you can trap different requests to Tor, so if you think you've got a great new idea for a circuit-building algorithm that you want to hack around with, and you want to test whether it's really faster than the one Tor's using, you can override Tor's default
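The .exit address remapping described above is essentially a little table of placeholder addresses. A sketch of the idea, with made-up names (real Tor's automapping logic is more involved, and the 127.192/10 range here is just an illustrative "high part of net 127"):

```python
import itertools

class ExitAddressMap:
    # Hand back placeholder addresses from high 127/8 space for
    # "host.exitname.exit" lookups, and remember the mapping so a
    # later connect to that placeholder is routed through the
    # requested exit. If an application connects to the placeholder
    # directly by mistake, the connection can't leave localhost.
    def __init__(self):
        self._counter = itertools.count(0)
        self._map = {}

    def resolve(self, name: str) -> str:
        host, exit_node, _suffix = name.rsplit(".", 2)
        n = next(self._counter)
        fake = "127.192.{}.{}".format(n // 256, n % 256)
        self._map[fake] = (host, exit_node)
        return fake

    def lookup(self, fake_ip: str) -> tuple:
        # "Okay, this address means: go to host using this exit."
        return self._map[fake_ip]

m = ExitAddressMap()
ip = m.resolve("naughty.com.someexit.exit")
m.lookup(ip)   # ("naughty.com", "someexit")
```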
behavior for all of that stuff. And now, gosh, I went through that really fast, so I have time for a whole lot of Q&A. Let's see, first, just some things you can do later on. Tor is at https://torproject.org or tor.eff.org. Try it out. If you want to run a server, run a server. We have documents and specifications that are pretty darn thorough and can tell you any information I left out here. You can donate to Tor; we are now a US non-profit 501(c)(3) charity, so you can write us off on your taxes. You can donate to the EFF as well. Due to Cindy from EFF doing me a favor last year, I'm doing the dunk tank for them at 6:30, so if you have ever used Tor to do something you didn't like, or if Tor has ever broken in a way that you didn't like and you're mad at the developers for some reason, you can dunk one of them into water for a modest fee. I'm talking at one, which is right after this, on social engineering attacks against anonymity networks. Roger is talking at two about new anti-censorship features that we're doing, and Mike Perry is talking at five (is it five, Mike?) about securing the network and securing applications and stuff. And yeah, if you have a question, shout it out. If you wanted to do the Q&A after the talk: I have another talk right after this one, so I will be in the Q&A room for this track at two, not at one. So, who's got a question? Yeah. Ooh, between now and the next DEF CON? Okay, the question is: what kind of technical changes do I expect between now and the next DEF CON?
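The .exit address-mapping trick described a minute ago can be sketched roughly like this in Python. This is a minimal illustration of the idea, not Tor's actual code; the class name, the exact 127.x pool, and the pool size are all invented for the example:

```python
# Sketch of Tor's "automapped" addresses for foo.exit / .onion names:
# hand back a placeholder IP from high in net 127 and remember the
# mapping, so a later connect to that IP can be rewritten into the
# original special request. Illustrative only, not Tor's real code.
import ipaddress
import itertools

class AddressMapper:
    def __init__(self, pool="127.192.0.0/10"):
        # Allocate placeholders from the high parts of 127/8: if an
        # application tries to connect to one of these directly, the
        # connection never leaves localhost.
        net = ipaddress.ip_network(pool)
        self._free = (str(ip) for ip in itertools.islice(net.hosts(), 1 << 16))
        self._by_ip = {}

    def resolve(self, hostname):
        if hostname.endswith((".exit", ".onion")):
            ip = next(self._free)
            self._by_ip[ip] = hostname
            return ip
        raise ValueError("ordinary names are resolved through a Tor circuit")

    def lookup(self, ip):
        # Called at connect time: recover the special hostname, if any.
        return self._by_ip.get(ip)

mapper = AddressMapper()
fake_ip = mapper.resolve("naughty.com.someserver.exit")
assert fake_ip.startswith("127.")
assert mapper.lookup(fake_ip) == "naughty.com.someserver.exit"
```

The point of drawing from 127/8 is exactly what the talk says: a botched application that connects to the placeholder address directly hits loopback instead of leaking the request onto the network.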
That's a pretty good question. Let me grab my laptop and look at our to-do list, which you can look at too, on the web in our SVN repository. We don't yet have enough people to do a really, you know, polished roadmap like real projects have, but we've got some pretty neat stuff lined up. Okay, so on the to-do list: we're going to get directory voting done, that's a big thing. We're going to get anti-censorship features in. We're going to try to be better about letting you decide, as a Tor server, how much bandwidth you want to give to other people and how much bandwidth you want to keep for yourself. We're hoping to get features done that will create incentives for people to run servers, by giving people who run servers better performance. This has some anonymity issues, because obviously you don't want it to be totally obvious that you are running a server just because you get a fast connection, so that's a little tricky. What else are we up to? Fixing lots of bugs, trying to get more secure. Let's see, voting is a big one. Refactoring. For a long time we've wanted to switch our transport to UDP instead of TCP; I don't think we're going to have time in the next year. Hey Roger, what else? Yeah, blocking resistance is really the big one; Roger is going to be talking about that too. We kind of hope that at this time next year, Tor is censorship-resistant by design and not just by accident. I'll probably think of other stuff too and shout it out as we go along, but yeah, you had a question?
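One of the to-do items above, splitting the bandwidth your relay donates from the bandwidth you keep for yourself, can be sketched as a torrc fragment. The option names RelayBandwidthRate and RelayBandwidthBurst are an assumption here, matching what later Tor manuals call this feature:

```
## torrc sketch: cap what the relay donates, keep the rest for yourself
RelayBandwidthRate  200 KB   # sustained rate for traffic relayed for others
RelayBandwidthBurst 400 KB   # brief bursts allowed above that rate
```

Your own client traffic is not counted against these caps, so browsing stays fast even while the relay is saturated at its configured rate.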
Yes, all of our funders. Do you want me to list the funders, or do you have a question about the Navy in particular? Okay, here's a list of everybody we've been funded by. We've been funded through the Naval Research Laboratory, which is a research group from the Navy. We have been funded by IBB, which is part of the State Department that cares about anti-censorship. Roger is shaking his head; what did I get wrong? They're not part of the State Department, they're their own group. Okay. Is that new, or were they never part of it? They never were. Okay. We've also been funded by EFF. We're funded right now by a European NGO that cares a lot about censorship resistance, and we're also funded by your private donations. Who else? That's basically everybody. And no, no one has ever asked us to backdoor it. Well, actually, morons on IRC periodically ask us to backdoor it, and we tell them that would be stupid. We kind of want people to respect us professionally, and we kind of don't want to be laughed out of the next DEF CON, so no, we will never backdoor Tor. Yeah, Lucky? Ooh, good question. Lucky asked about directory voting schemes, why we chose a simple majority rather than some kind of supermajority, and about the security tradeoffs. So what we do right now is: you believe something if more than half of the authorities say it's so. That's nice and symmetrical: you believe a server is running if more than half say it's running, and you treat it as not running otherwise. You can set that bar in different places, and when you do, there's a tradeoff. If you set it so that every authority needs to say a server is running before you'll believe it's running, then you get a property where any single authority can keep you from using a server. And depending on where you set the bar for how many authorities you've got, this affects
whether it's easier for corrupt authorities to add bad servers to the directory, or whether it's easier for corrupt authorities to take good servers off the directory. It didn't seem like either one was particularly less awful than the other, so we set it at about half. Although, actually, if you have a good reason why it should be a supermajority in either direction, you should use our new proposal process. Our previous approach for modifying our specification was: we would edit our specification in Subversion, then we would edit the code to match the specification. Also, we would argue about the edits that we made and put little notes inside the specification. This created a problem where our specification did not actually specify our software; it specified our software, and our intentions for our software, and our disagreements about our software. Someone once said it reads like the Talmud. Some of you get that. So instead, the specification is now pretty sacrosanctly what we do, and we have a proposal process: you write up a proposal, you send it to our dev list, it gets a number, and then we argue about it for a while. Either you build it and send us a patch, and it's got a good chance, or you convince us that it's such a good idea that we've got to do it, and eventually somebody does. So yeah, good question, weird tangent. Oh yes, guard connections and fingerprinting. Right, so, two questions. First: in what way is the guard selection protocol more complicated than I said? It's more complicated in a few ways. First, we don't choose randomly; we choose randomly weighted by bandwidth. We look for the Guard flag, set by the authorities to say "this will make a good guard." We want to choose guards only from fast, stable servers, because sometimes we want fast connections and sometimes we want stable connections, and if we always build through our guards, then we want our guards to be fast and stable. You want to drop guards not just
for being down, but for being bad guards, or for being no longer listed in the directory, and so on. Now, fingerprinting against guards: this is an interesting topic. We have a lot of trade-offs here, and if you want to come argue about it with us on our dev list, we'd really appreciate it, because there doesn't seem to be an immediately obvious answer. It's clear guards need to be there, both to deal with the profiling attack I mentioned and to deal with some other attacks; for instance, there was an attack on hidden services based on not having guards. However, when you do have guards, you have the problem that anybody watching an end-to-end connection at the third hop can profile you. They don't figure out who you are, but they do eventually get a list of your guards, and this means that if you use the same guards over time, and someone eventually sees you connecting to those guards, they can link it to you. This is a problem. One solution is using a different set of guards as you move around, but that increases your odds of hitting a compromised guard. Another solution is to use different guards for every IP you're at. Another solution is to change your guards over time, but if you change them over time, then eventually you'll get a compromised one. There doesn't seem to be a really trivial answer for this, but it seems like we're doing better now than we were before. Previously, to do the profiling attack you had to run two servers; now you have to run two servers, and do local client eavesdropping, and get lucky, and hope that the client's behavior stays the same over time. Which, again, is not perfect, but we think we're doing about as well as anyone knows how to make a low-latency anonymity network do in this area. If you have good ideas, please get with us; we're keen to hear them. Let's see, I've got another few minutes. Any more questions? I can read off more of our to-do list, I can talk about some pending proposals, I can explain stuff in more detail. Yeah? Yes, good question. I am
not a lawyer; this is not legal advice. That said, the question was: given that you're not a lawyer and can't give legal advice, do people get harassed for putting servers online? It really depends on where you are. I would not run a server in China if I were you. In countries where you can be shot for owning a computer, I would not run a Tor server if I were you, because you need a computer to do it. In the US, typically, if people get with you or with your ISP and say, "hey, you're running a Tor server," or "hey, some traffic came out of you that looks like it was used for something bad, we want to subpoena all of your records," give them all of your records: there are no records on your Tor server that will lead back to a client. We don't log anything interesting. However, if they decide that they're going to do an enormous sweep and try to look for evidence of some crime, like happened in Germany a while ago, where all of the IPs that were being used to commit certain kinds of crime got hit in one massive sting on one day, dumb cops may grab your computer and look for evidence of crimes. If you are a criminal, then please stop being a criminal. If you are a criminal and you are running a Tor server on the same computer that you commit your crimes with, you are a dumb criminal. As for getting other kinds of harassment: certain services block connections from Tor because they use IPs for authentication, and in order to block Tor users, they may also block the IP you are running your Tor server at. If that's also your home computer, you will not, for example, currently be able to edit Wikipedia from your home computer, and you won't be able to log into certain IRC networks. That does discourage some people from running servers, but getting a second IP isn't terribly difficult if you have, well, a job, or a hobby, or a colo, or a lab. It looks like I am running out of time. I hope some of you want to stay for my next talk. If not, I am not going to go to the Q&A room right now,
because I've got another talk next, but I will be in the Q&A room for this track after that talk. Thank you all very much for coming.
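As a closing footnote, the guard-selection heuristics from the Q&A (bandwidth-weighted choice among servers carrying the Guard, Fast, and Stable flags) can be sketched roughly like this in Python. The data layout and function name are invented for illustration and are not Tor's real data structures:

```python
# Sketch of bandwidth-weighted guard choice: keep only servers the
# authorities flagged as suitable guards, then pick among them with
# probability proportional to advertised bandwidth. Illustrative only.
import random

def choose_guards(servers, n=3, rng=random):
    # A server is eligible only if it carries all of the needed flags.
    eligible = [s for s in servers
                if {"Guard", "Fast", "Stable", "Running"} <= s["flags"]]
    guards = []
    pool = list(eligible)
    for _ in range(min(n, len(pool))):
        # Weighted draw without replacement, weight = bandwidth.
        weights = [s["bandwidth"] for s in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        guards.append(pick)
        pool.remove(pick)
    return guards

servers = [
    {"nick": "fastguard", "bandwidth": 5000,
     "flags": {"Guard", "Fast", "Stable", "Running"}},
    {"nick": "slowguard", "bandwidth": 100,
     "flags": {"Guard", "Fast", "Stable", "Running"}},
    {"nick": "notaguard", "bandwidth": 9000,
     "flags": {"Fast", "Running"}},
]
picked = choose_guards(servers, n=2)
assert all("Guard" in s["flags"] for s in picked)
```

Note how the high-bandwidth server without the Guard flag is never chosen: the authorities' flags filter first, and bandwidth only weights the draw among survivors.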