Okay, hello everybody, I'm gonna go ahead and get started here. First off, thanks for coming. I know it's late, but this is the only room you can be in if you wanna hear something, so thank you very much. My name's Nathan Evans, and this is De-Tor-iorating Your Anonymity, which is work that my advisor, Christian Grothoff, and I did. He's not here today, that's okay. It's some research that we've done that partially de-anonymizes Tor users. And with that said, I'll go ahead and get started. So the first thing we need to talk about is what's our motivation for trying to do this? We don't dislike Tor, we really like Tor. So we're trying to promote research and make Tor as good as it can be, so that other people can't de-anonymize Tor's users. Also, Tor is like the biggest anonymity network out there, so it's a big target for researchers and bad guys alike. And again, any research that shows any improvements or anything like that has a major impact, and as researchers we like to have a major impact, so that's good. So all we're here today to say is that we can discover all of the routers that a Tor client chooses to use under certain circumstances, and under these circumstances the user's anonymity is reduced quite a bit. So, some general information about Tor. I'm sure most everybody knows this, but it's okay, I'll say it anyway. Tor stands for The Onion Router. It's a service that attempts to hide users' IP addresses from the destination that they're connecting to, typically. It's a mix cascade, which has two parts to it. The first part is the mix, and a mix server is one that takes in data from all different places and then shoots it out to somewhere else, hiding where it came from. And the cascade part means that there are mix servers used in series to further hide the source from the destination.
Also, there are no artificial delays induced in Tor, so it's a low latency network, which helps us do some of our measurements, and all data in the network is sent encrypted between Tor nodes. So the routing in Tor is an important part. Basically, each Tor relay server accepts incoming connections from other Tor servers or clients, and it just forwards that information on to the next node in the series, or to the final destination if it's the exit router. The client always chooses the Tor nodes to be used, so it's not like the Tor nodes are choosing. And some of Tor's anonymity is achieved from the fact that no adversary can discover all of the routers that a client chooses to use. If all the nodes are known, that particular circuit becomes about as good as using a one hop proxy or just a single mix server. So what we're trying to do is discover all of the Tor nodes that a client chooses. So here's a routing example. The PC looking things are Tor servers. The client is a little Firefox icon and the Apache feather is the web server. And this is just a typical client to server connection. So instead of the client connecting directly to the server, it chooses these three servers from the Tor network. Here it's Tor nodes one, five, and eight. And you can see that all the data that goes from the client to the server goes through those Tor nodes. And also anything going from the server to the client follows the path in reverse. So that's just briefly how Tor works. So what other research has been done in this vein? The most closely related work was Murdoch and Danezis in 2005. If you were at Roger's talk, you probably heard about that. They achieved the same result we did, which is discovering all the routers that a client chooses to use. Again, they did it in 2005. I think the Tor network had like 50 nodes and they actually only did it on 13 Tor routers. So it's important to check whether that works today.
Unfortunately, their work doesn't work as well today, which is good and bad. Good for Tor people, but bad for me. Some of the reasons that it doesn't work today are that Tor is a bigger, higher capacity network. Their research was based on the fact that they could detect the additional load of a single circuit on a Tor node, and it's really hard to detect the effects of just a single added circuit on a Tor server because Tor servers are so busy now. There are clients connecting and disconnecting all the time and there's also a lot of data flowing through the Tor network, and that can lead to false positives with their method. We tried their method. We didn't rigorously test it, but we did try it and we found that it didn't work very well for us, and certainly not as well as it did for them in 2005. So what exactly lets us do our attack? The first thing to understand is that our target user is probably just a Windows user who goes out and downloads the Tor bundle with just the standard install, Privoxy and Vidalia, all that stuff. So uber-geeks probably won't be vulnerable to our attack. But there are some design issues that enable our attack. Again, the low latency thing, which allows us to get good measurements. Also the fact that the path length is small. It's set at three, that's hard coded in the client, so no one can change that. Knowing that, it allows us to run an exit router, and when we're running an exit router, we then know two thirds of the servers in the path, because we know ourselves and we know the previous node. That means we only have to detect one extra router in order to discover all three. Again, with Murdoch and Danezis's work, they had to detect all three routers independently. The other thing that enables our attack is that we can construct these long circular paths through the network, which allows us to put load on any particular server that we want.
Going back to what Murdoch and Danezis did, they were trying to detect the load of a single connection. What we can do is put as much load on any server as we want to. So this is just back to the regular path example. We're gonna be showing the circular path example, but this just shows the regular path again without the extraneous Tor servers. So the client and the server are the same, and this is just the three Tor nodes that the client chose to use, and of course all data going from the client to the server goes through that path. So the first thing that a malicious client does if it wants to build a long circular route is take away the connection to the server from the last hop. So you can see, that's what we did. And then instead of connecting to the server directly, we can just loop back around. So now instead of a path of length two, you have one of length five, because we've added the three circular connections there. And since there's no limit on how many times you can do this, well, there wasn't, you can keep doing it. So now we have a path length of eight, and you can do it again. Now we have a path length of 11, and just to go crazy, this is a path length of 21. And just to give you a little bit of insight into how exactly our attack works: building off of Murdoch and Danezis, they were detecting one line, basically, going through a server. What we can do is increase that arbitrarily. Like in this example, we have seven times the load on any given Tor server. And we find that to be much easier to detect. The other important thing that you can see in this example is that the amount of data that goes from the client to the first node and from the server back along the path doesn't increase at all. So it's a very low bandwidth attack for a malicious person. So here's the full attack example. I'm gonna explain our attack and then go into what you're seeing here.
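As a rough illustration of the circular-path idea just described, here is a minimal Python sketch. It is not the tooling used in the talk. The relay names and the helper functions `build_circular_path` and `load_per_relay` are made up for this example, and the model simply assumes each pass through a relay costs that relay one unit of work per unit of data the attacker sends.

```python
from collections import Counter


def build_circular_path(loop_relays, laps):
    """Return the hop sequence for `laps` trips around `loop_relays`.

    Hypothetical helper: models a circuit that keeps extending back
    through the same relays instead of exiting to a server.
    """
    return list(loop_relays) * laps


def load_per_relay(path):
    """Count how many times each relay appears on the path.

    Under the simple model assumed here, a relay that appears k times
    forwards k units of traffic for every unit the attacker sends.
    """
    return Counter(path)


# Three made-up relay names, looped seven times.
path = build_circular_path(["relayA", "relayB", "relayC"], laps=7)
load = load_per_relay(path)
print(len(path))        # number of hops on the circular path
print(load["relayA"])   # multiplier seen by one relay
```

With three relays and seven laps this gives a path of 21 hops in which every relay is visited seven times, while the attacker still only sends one stream into the first hop.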
So again, our attack extends Murdoch and Danezis's idea that the load from a single circuit can be detected. We do it differently because we use the long circular paths. So the first thing that we do is run an exit server. Again, it knows two thirds of the routers already. And what we do with that exit node is we have it inject JavaScript into the HTML that gets sent back to the client. And what that JavaScript does is just send, basically, ping messages at regular intervals, which we can use to get latency variance measurements. So then what we try to do is induce a load on the first node in the circuit. And if we can do that, we should notice a change in our latency variance measurements. So just looking at this example here, you have the regular client and server. Again, the same pictures as before. The client path is shown by the black directed line, which goes through the network just like normal. And then we have these additional things: all the red screen computers are those that are controlled by us. So you can see, like I said before, there we are with the exit node, which is Tor node three in the example. And then the JavaScript gets injected at the exit node and goes back and runs in the client, and the ping path is denoted here by the dotted blue line. And I'm gonna explain it again later, but the long circular path that we construct is the double green line. And this is the ideal case where we actually pick the correct first node. There's a lot of trial and error, but in this case, this is the ideal. Okay, so now we have a queuing example of how exactly the Tor server uses queues for each circuit. So if a Tor server has three circuits running through it, it has a queue for each circuit, and these things called cells, which is how things are shuttled between servers in the Tor network, are queued in each one of the queues.
So here, the queues are the big black boxes, the cells are little green boxes, and the cell that's currently being processed is the highlighted blue box. So it's simple, I think: everything's just processed in a round robin fashion. If a queue is empty, it gets skipped. So hopefully that example is self evident. But what you can see from this is that any circuit that's running through the server gets one third of the bandwidth and CPU power that's available to the Tor server in general. So what we do by constructing these long circular paths is drastically increase the number of queues, so that we can change how much bandwidth and processing time each circuit gets. So in this example, it's the same thing, only we have five times as many queues. It's the same round robin processing, but as you can see, hopefully, it takes five times as long to get back around to any single circuit, and the legitimate client circuit is the one that we're trying to delay, which is why we do the long circular routing thing. So hopefully this attack example makes some sense now. We have the long circular path in the double green that we're trying to run through the first node again, and then we have our malicious client and malicious server. Hopefully that's clear. If not, you can ask questions in a minute. So how did we implement the attack? Of course, we have to have a modified exit node, which is just a regular Tor node, but it injects that JavaScript into HTML responses. We also have the modified malicious client node, which just creates the long circular routes through the particular routers that we choose and at whatever lengths we choose. Then we also have the malicious web server that just shoots data back down that long circular path just as fast as the Tor routers can handle it, and then we have the instrumentation machine, which is typically the exit server, that just receives the JavaScript ping messages.
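The round-robin queuing effect from the slides can be illustrated with a small simulation. This is a toy model, not Tor's actual scheduler: the function name `slots_to_drain_client`, the backlog size, and the assumption that every other queue always has cells waiting are all choices made for this sketch.

```python
from collections import deque


def slots_to_drain_client(num_queues, client_cells, backlog=1000):
    """Count service slots until the client's queue is empty.

    Queue 0 is the legitimate client circuit. Every other queue is
    assumed (for this toy model) to hold `backlog` cells, so it never
    runs dry during the simulation. One cell is served per slot, the
    relay visits queues in round-robin order, and empty queues are
    skipped without using a slot.
    """
    queues = [deque(range(client_cells))]
    queues += [deque(range(backlog)) for _ in range(num_queues - 1)]
    slots = 0
    turn = 0
    while queues[0]:
        current = queues[turn % num_queues]
        if current:
            current.popleft()
            slots += 1
        turn += 1
    return slots


# Three circuits versus five times as many, ten client cells each time.
print(slots_to_drain_client(3, 10))
print(slots_to_drain_client(15, 10))
```

In this model the client's ten cells take 28 slots with three queues and 136 slots with fifteen, which is roughly the five-fold slowdown described for the slide with five times as many queues.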
So how it really works is we start by recording, so we're running an exit node, we pick a circuit, we record for a certain amount of time, typically 10 minutes for us, 600 seconds, we record when we receive these ping messages and then what we do is choose what we think could be a possible first node in the network and then we create a super long route through that node to hopefully put load on the server and if we are able to put load on the server and it is the correct server, we should notice a change in the latency variance timings that we receive. If it's not the correct server, then we have to pick another one and start all over and do it again until we find the right one. So it's a very one after the other kind of process. So now we have some examples of our data. These aren't the most recent examples but hopefully they get the point across. In these examples there are two things, there's a green line and a red line. The green line is when we're doing the control run which is when we're not trying to put load on any server and the red line is when we are trying to put load on the server and on the x-axis we have from zero to 600 so it's in seconds that we're doing the test and on the y-axis we have the latency variance in milliseconds that we receive. And in this example it's very hard to see the green line because just by chance the Tor node that we were using as the entry node happened to not be doing anything else at the time so it's really close to zero. The latency variance was just pretty much the same all the way across the board but hopefully what you can see here is that when we're performing our attack and stressing out that first node that we see these big jumps in latency variance which is exactly what we wanna see and hopefully it proves to you and me that the attack is working. 
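The measure-then-probe loop described above can be sketched in Python. This is only an illustration under stated assumptions, not the analysis code behind the graphs. It assumes pings are sent at a fixed known interval, treats "latency variance" as how far each gap between received pings deviates from that interval, and uses a made-up decision rule (`factor` and `floor` are arbitrary values chosen for the example). The `probe` callable stands in for running a long circular path through a candidate and recording ping arrival times.

```python
from statistics import mean


def latency_variance(arrivals, interval):
    """Deviation of each inter-arrival gap from the nominal ping interval."""
    return [abs((later - earlier) - interval)
            for earlier, later in zip(arrivals, arrivals[1:])]


def looks_loaded(control, attack, interval, factor=3.0, floor=0.01):
    """Hypothetical decision rule: the attack run's mean variance must
    exceed `factor` times the control run's mean variance. `floor` avoids
    comparing against a control run whose variance is essentially zero."""
    baseline = max(mean(latency_variance(control, interval)), floor)
    return mean(latency_variance(attack, interval)) > factor * baseline


def find_first_node(candidates, control, probe, interval):
    """Try candidates one after the other, as in the talk.

    `probe(candidate)` is an assumed callable returning the ping arrival
    times recorded while that candidate is being loaded.
    """
    for candidate in candidates:
        if looks_loaded(control, probe(candidate), interval):
            return candidate
    return None


# Toy data in seconds: one quiet run and one run with visible jitter.
control = [0.0, 1.0, 2.0, 3.0, 4.0]
runs = {"relayA": [0.0, 1.0, 2.0, 3.0, 4.0],
        "relayB": [0.0, 1.0, 3.5, 4.0, 6.0]}
print(find_first_node(["relayA", "relayB"], control, runs.get, 1.0))
```

Because each probe is a full recording window (ten minutes in the talk), the sequential loop is what makes the search slow when the candidate list is the whole network.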
And we have some more of these examples, and of course they get a lot more jittery on the control run as well, depending on what's going on in the network and how much load the Tor servers have that we're not putting on them. So there's a lot of change in how the data looks, but hopefully what we can all see is that there is a significant difference in the amount of variance that we see when we're stressing out the node compared to when we are not stressing out the node. So there's those examples. And then we have these other ways to represent the data. This is a histogram. So now on the x-axis we have a range of measurements that we received. So for instance, in the control run, which again is the green bar, you can see that we received all 600 of our latency measurements with a latency variance of between zero and 11 seconds. It's in seconds here. And then on the y-axis, sorry, is the number of measurements that we received. So what we're looking for with these graphs is that the green control run should be very close to the left and almost vertical, because we don't wanna see too much latency variance. Of course that changes depending, again, on how much load is on the servers and all that stuff. But for the attack run we're hoping that we get a lot more measurements where the latency variance is very high. And so it's more of a horizontal, along the x-axis kind of thing. And again, as we go through these, hopefully you can see that we are seeing what we think we wanna see. So what exactly do we achieve with our attack? We're not de-anonymizing clients. That's not our attack. But just like Murdoch and Danezis did back in 2005, we're able to discover all three Tor routers that are used in the circuit, which reduces a user's anonymity basically to that of a one hop proxy, or if you're doing it on all of the routers, then it's a network of one hop proxies.
Unlike Murdoch and Danezis, though, our attack works on the current large scale Tor network. And there are a few reasons why our attack is effective. Again, we only need to discover one of the routers, because we're running the exit router and we know two a priori. But the most important thing that we're here to say is that this bandwidth multiplying effect of using these long circular routes is just a really bad thing, because it not only allows people with high bandwidth to perform this attack, but even lets people with low bandwidth not only do our attack but also DoS any Tor server at will. So that's bad. There are some fixes. One of these is making the circuits longer. Of course, if you make the circuit longer than three, which is the minimum for the anonymity assurances that Tor gives, then you're degrading performance for users, and maybe that's not acceptable. You could also randomize the path lengths, say instead of making it always three, make it between like four and eight, or five and 10, or something. That would make more work for the attacker, although we think the attack would still be possible even if we had longer path lengths. Also, for a client, if you're running something over HTTP, the way to stop at least our attack is to use HTTPS. Encrypt everything. You can't inject JavaScript unless you can break SSL, which I can't. I don't know if anybody else can. And then there is the fix that is actually out there now, which I think is too detailed for me to get into. If you were here for Roger's talk you might have heard it, but I can talk about that later. There are some variants of our attack, such as, instead of using JavaScript, using meta refresh tags or using a whole bunch of image source tags. We thought of a lot of ways to do it, but JavaScript was just the easiest for our goals here. Also, if we could get better latency measurements of the first node in the path, we'd be able to greatly reduce the number of possible routers that we start with.
So now we're starting out with all thousand plus, or however many routers there are. If we were able to narrow that down, that would make our attack a lot more effective, because then we don't have to go through all thousand. So that's pretty much it. We have hopefully demonstrated an attack on today's large scale Tor network that reveals all the routers in a circuit that a client uses, and we think this decreases the anonymity provided by Tor greatly. Tor uses the shortest possible paths, of three. Maybe they should be longer. There's been a lot of debate about it. It probably won't happen, but it could. But the main thing that we're here to say, and our goal really is to get this fixed, is that allowing arbitrary path lengths is just a really bad thing, because you can DoS the Tor nodes and do our attack as well. And from talking to the Tor guys, they have implemented most of the stuff for this, and now it's just a matter of upgrading everybody's clients and finishing the final phase to stop this from happening. So with that I will go ahead and open it up for questions. No? All right, go ahead. Yeah, are you just asking about how the whole thing works together? No, no, that's fine. So in this example the regular client path is just the black line. So if you don't look at anything on the right and the bottom, that's just the regular Tor circuit. And then again, we know the third and second node in the path because we're running the exit router. So all we're really trying to do is stress the first node just to change our measurements enough that we can have a pretty clear idea that that's the actual first node. I don't know if that answers your question any better or not, maybe my presentation just sucked. Does that help at all or no? Okay, so yeah, the pings go from the client just to the exit server. It's the dotted blue line there, if you can see it. And that's the only way that we have to get something like latency measurements.
It's really hard to get latency measurements when you're only receiving data one way. So we're not actually getting latency, we're getting latency variance, which is the difference in the times that the pings are received. So that's what that's used for, just for getting the timing data. We're not using it for anything in the actual attacking. Does that help? Yeah, that's a very good question. Oh, repeat the question, good call, speaker etiquette. He asked how long it actually takes to discover any one router out of the thousand routers that are out there, and if there were many more routers, how would that change the attack? And the answer is, right now it takes ten minutes to discover, and then we also do another ten minutes or longer to verify that we've actually found the correct first node. So it's ten minutes times over a thousand. When we've done the attack, we're really just showing that the attack is possible. We've done it where we actually choose the first router ourselves, and we've done it on many, many, many routers, but we haven't done it on the whole network like that. So it would take a long time, and more routers would make it much longer. Yeah, yeah. So the question was, if somebody just uses a new route every, say, five minutes, would that also defeat our attack? Yes, it would. But we're kind of going with the idea that users are usually just browsing the net and aren't paying attention to picking a new route every five minutes or something like that. But maybe that would be something that Tor would do if they thought this was viable. Yeah. When you select a new exit point, does it change your entry point? As far as I know, not necessarily. It doesn't have to, but it could. I don't want to talk too much about how exactly this works because I'm not real clear on it, but I think you have a number of entry guards that you use.
So you could conceivably pick the same entry node and maybe not the same exit node, but you could have the same entry node twice. I don't know if that helps. It's a complex process that I can't entirely explain. It has to do, I think, I mean, the Tor guys know a lot better than I do, obviously. It has to do with uptime and bandwidth, I think, and they try to pick really stable, good first nodes. So I know they do a good job. I'm not exactly sure how it works. Does that answer your question? Nobody else? Well, thank you very much.