Hello everyone. I guess we are about ready to get started. This is a two-hour block, but I'll be dividing it up with two intermissions for Q&A, so be sure to have questions ready, and don't be afraid to ask. If something is on your mind, it's likely that it occurred to at least five other people too. It's very important to make sure that everybody understands everything that's going on, and that I haven't overlooked anything or made some crucial mistake, because obviously the title is in the imperfect tense: securing the Tor network. I'm not saying everything is solved forever and this is it. It's an ongoing process.

Okay, so who the hell am I? I'm a volunteer Tor developer. I work on Tor because I think it's important, for a few reasons that I'll get into later. Professionally I'm a forward and reverse engineer. I write C++ code and reverse engineer the Microsoft Exchange protocol, along with a few others, for Riverbed Technology. This is shameless company plug time, just to berate you people about my employer. We make WAN accelerators. Essentially we make WAN traffic fast for large companies with lots of branch offices all over the place. We get significant acceleration for Windows file sharing that's done across a distributed internet, and also 5 to 50x performance improvements for Microsoft Exchange, which is the protocol that I work on. We also do protocol-independent data reduction, so if you send some data over Exchange, the next time you fetch it via a Windows file share it's also cached and accelerated. And we are pretty good at it.
When we go head to head against any of our competitors we have a 90% win rate. We are outselling Cisco 2 to 1, and they're pretty anti-competitive in their practices: they give significant deals on a lot of their WAN acceleration equipment to people who have Cisco gear, and we're still significantly outselling them. So we're pretty successful. That's the end of that plug. If you're interested in that company and in work on reverse engineering, fuzzing network protocols, C++ development, and network programming, pretty much on Linux of course, come talk to me afterwards and we can discuss setting you up with an interview.

So, preaching to the choir about why I think Tor is important for normal people. There's a lot of talk about how it's a censorship resistance tool, for political dissidents and people who are behind oppressive firewalls. But I think there's a compelling argument for normal people, for basic data hygiene, to keep their IP address out of what are essentially marketing databases. We don't really yet understand the consequences of having all of the thoughts that we enter into Google, whenever we come up with some idea, archived and tied to our IP address, which is essentially our identity. Maybe not in the case of Google, since Google does do significant work to protect the data that they massively acquire on all of us, but at other companies that data is bought and sold to whoever.

The second point: like I said, Google might not be that horribly evil, but I read my ISP's privacy policy, and in fact they tell me that they do gather my data in aggregate for marketing purposes. They specifically say it's not uniquely attributed to me, but they do mention that they gather aggregate data on internet usage for marketing and other performance purposes.
So this sort of infrastructure already exists. People are already thinking about ways to use our data, either to sell us products or to see what sort of things we're interested in, and there are some things we might want to be able to opt out of. If all of this is tied to our IP address, we don't really have a choice without Tor.

This sort of information can come back to get us in unexpected ways. If you're involved in a lawsuit, discovery can cause this information to turn up. If it's subpoenaed and a judge agrees that it's relevant, the other party that may have that data is, to my knowledge of the law, obliged to comply. This can also happen in bitter divorce cases. And marketing spam can just randomly show up at your door, or, if you have products shipped to your work, embarrassing catalogs can start showing up at your work, which has actually happened to me, based on stuff that you have purchased and had shipped there because it's convenient. You have that SouthOrd lockpick set sent to your work so you can just pick it up, rather than risk it being stolen at your apartment. So there are all sorts of sticky privacy issues that normal people need to be concerned about.

Okay, so what is Tor? A quick overview for those who might not be familiar. There have been four hours of Tor talks so far, but just a quick recap. You run a client, the Tor client, that acts as a SOCKS proxy that you connect your applications to. When your applications connect to this SOCKS proxy, their connections are translated into what Tor calls streams. These streams are then multiplexed on top of encrypted paths that Tor calls circuits, and these circuits are in turn multiplexed over TLS/SSL connections between the individual relay nodes that are run by volunteers all over the world. These circuits, the paths that your client chooses, consist of three nodes: what's called the guard node, the relay or middle
node, and then the exit. These nodes are chosen probabilistically by your client, proportional to their bandwidth.

Here's a diagram that illustrates the important multiplexing properties that I talked about before. You have two clients choosing two different circuits, or paths, through the network, both going through this first hop called the guard node. The reason for that term I'll get into later. These two clients are multiplexed over the same TLS connection in this middle hop between the guard and the relay, and then they diverge at the relay onto two different exits. Now, the second client has its TCP SOCKS streams multiplexed over this one circuit routed through the second path. They go through the single circuit, the individual hops of which are unable to see that there are multiple streams on the circuit, other than through timing information, until it is finally split up by the exit node to connect to whatever: maybe a couple of fetches from the same server, or a couple of different servers.

The lower diagram illustrates the crypto behind this, and why it is that the guard node doesn't know who your exit node is. Each node publishes a series of public keys in the Tor directory, and your client uses these public keys to negotiate a layered encrypted channel to each successive hop. First you make the TLS connection to node X, then you negotiate a secret key with node X through that channel. You tell node X, please help me make a connection to node Y and negotiate a secret key with it. Then, encrypting with that secret key, you instruct node Y to extend to node Z and negotiate another secret key. This process of adding additional hops to your path is known in Tor parlance as circuit extending, and you can iteratively extend your circuit effectively as long as you want. I believe there was one
attack to try and essentially DoS the network by making infinite paths. I think there are some protections against that, but I'm not entirely familiar with what those are. Roger can probably offer that up during Q&A.

Passive attacks. So, as far as how to secure this network: in order to secure the Tor network, we really have to look at it from an attacker's point of view. We have to understand the types of attacks, the motivation of the attacker, what the attacker is going to do, and how he's going to do it. First we should classify these attacks by what sort of actions the attacker is going to take. Are they going to just observe traffic, or are they going to actively modify traffic?

Under just observing traffic we have things like packet and connection timing correlation: looking at when the connection starts, how many packets are involved, the pattern of those packets, and when it finishes. We also have things like fingerprinting: noting what a particular website fetch looks like through Tor. A fetch of a particular Wikipedia page may exhibit a particular pattern of cells, of encrypted packets, that a guard node or an ISP is able to just recognize. Intersection attacks are more subtle and more broad. Basically this is something that you can do passively, to catalog a bunch of different attributes of users and add all those attributes up.

Active attacks include things like lying about your bandwidth to get more traffic. As I said, you choose your nodes for your path through Tor probabilistically, according to the advertised bandwidth that a node says it can carry. So if the node says it can carry more traffic, then you have a higher probability of choosing it, and that node should get more clients connecting to it. Failing circuits to bias node selection, that's
another one. You can try to modify how the path is chosen by a client by failing circuits arbitrarily. And you can modify application layer traffic at the exit to do things like insert plugins. We'll be getting into these specific examples in more detail later. This is basically just to understand the classification idea.

The position of the attacker is another variable, another dimension in the classification of these attacks. You have an internal adversary, that is, a node operator, who is able to see past the TLS multiplexing and see the individual circuits that are flowing through their node. At the exit node they're also able to see past the stream-on-top-of-circuit multiplexing and determine which streams are associated with a particular circuit. There are certain attacks that fall out of that ability, which we'll get to later as well.

The external adversary is an ISP or Echelon-style adversary. They are able to observe large portions of the Tor network, or multiple nodes, but they're assumed to be unable to see inside the node-to-node TLS streams. It has yet to be proven, but it is likely, due to a couple of factors, that in my opinion these sorts of adversaries should be frustrated by users who run Tor both as a node and for their client usage. The reason for this is that the three main characteristics for a successful timing attack have been demonstrated to be the time of the start of the stream, the number of packets, and the end of the stream. If you are running a node, you are going to be multiplexing other people's traffic through that node. An external adversary is only going to be able to see these TLS connections. Queuing delays are going to come into effect and cause issues with the adversary being able to tell the exact
start of when your connection started. They're going to make it difficult to determine the number of packets in that connection, even if the adversary is trying to calculate the differences between packets entering and leaving. If your traffic is throttled at a constant rate, this is going to frustrate them to some degree, to some unknown degree, possibly marginal, but I think there's some hope there.

So what does this look like? If we look at that same path diagram that I displayed earlier, obviously there are a lot of places that Tor can be attacked. In fact, you can be attacked pretty much everywhere. Possibly there should even be some arrows for these TLS links. They are only 1024-bit RSA keys, and some nodes have been around for a real long time, five-ish years. The age of the network is, I believe, five or six years, so we're starting to get close to what in my opinion would be a comfortable lifetime for a 1024-bit RSA key. I believe there are plans in progress to allow versioning of the protocol and possibly allow multiple sizes and alternate keys. This was at least discussed on IRC at one point.

To break this down in a bit more detail, since obviously there are a lot of things going on here, we're going to look at individual cases of relevant attacks. Here are some passive attacks, basically extracted from that previous diagram. At the top we have intersection attacks. These are essentially performed independent of the Tor network, maybe with just the knowledge of the fact that a user uses Tor. Then on the other side there is a connection made from a Tor IP to a server, and the user may do things like reveal their time zone, or say that they work at a particular company, or otherwise reveal some level of vague personal information that over time can add up to eventually identify them. At the bottom we have both internal and
external timing correlation attacks. These can be performed, as I said, by an external adversary that is either an ISP or something observing the internet exchanges or backbone routers, or they can be performed by the nodes themselves, who are able to try and match up individual circuits.

Active attacks. These are some of the more deadly attacks. A couple of academic papers revolve around this. One, by Parisa Tabriz, Nikita Borisov, and George Danezis, basically argued that reliability is essentially the same as security when you're talking about mix networks. An adversary who is able to fail a large number of circuits can begin to bias your path selection. In this case you can see that if they own the guard node, they can fail circuits at the guard node if the circuit doesn't extend to a relay node that they control, and then at the relay node they can fail those circuits if they don't extend to an exit node that they control.

They can also attempt to find a side channel in the Tor protocol. One potential side channel is to XOR a piece of the protocol stream at the guard node. As this packet flows through the network, when it reaches an exit node its digest is verified, and if the digest verification fails, that circuit fails. So a malicious exit node that is colluding with that guard node can then potentially un-XOR that same byte in the stream, and those two nodes can guarantee that they're only going to carry traffic that they really want to own.

They can also combine this, as Damon McCoy and a few other researchers from the University of Colorado demonstrated in their work, with bandwidth lying. They basically demonstrated that with very few resources you can combine these two attacks, cause people to try and use you, and continually fail their circuits until they
do choose an exit node that you control. Now, I should clarify that the Tor code at the exit node does print out, I believe, a notice if the digest fails. So this is not something that can just happen all over the place without us knowing. Even at this point, this is something that can be detected. But it's possible that there are other side channels, involving timing information, that are more difficult to detect.

Guard node bias is another layer that can be added to this. An adversary at an organization or a government can try to either block Tor entirely, or, the more sinister approach, just allow access to the guard nodes that they control, which fail circuits that don't extend to the nodes that they're colluding with.

Application layer attacks. Surveillance and confiscation is one, if you live in a country that may have somehow outlawed Tor. Germany has some nebulous laws that were recently passed that may or may not fall in this category. Some other countries actively block Tor, and it's conceivable that others may try to harass users who try to use Tor. In these situations, confiscation of those Tor users' computers may prove profitable, to try and reveal browser history information, cache information, what have you.

Client misconfiguration is sort of a passive attack. If a user uses the SOCKS proxy improperly, using the SOCKS 4 version of the protocol instead of 4a or 5, DNS requests can happen locally over the network, and then the resolved IP is fed to Tor. In this case you've basically leaked the host names of the sites that you're visiting. For the web browser scenario, the Tor extension may verify that things are configured properly, but for other applications it's up to the user to ensure that they're doing this right. Tor does
print out a notice, however, again in that case, if it's being consistently fed only IP addresses. So if you're watching your Vidalia log window or whatever, you can see that this is the case: oh, I've misconfigured things.

Application layer attacks, again, involve things like, as I said, injecting plugins. Plugins are horrible about obeying your proxy settings, which I'll discuss in more detail later, with the individual breakdown of exactly how horrible these plugins are. There are also things like having JavaScript that waits for events for when you disable Tor, or just busy-waits until you disable Tor. Things like that.

Okay, so that's basically a rundown of all the attacks that this presentation is going to discuss. Are there any questions or thoughts on attacks? Anything that pops into your mind about those?

Yeah? I'm unclear on the question. Are you asking what sort of information Tor may be logging at the nodes? No. There is actually a false rumor going around to that effect. There are no back doors in Tor to satisfy that, and there have been no requirements on Tor node operators to actually log any sort of traffic. There have been a couple of waves of seizures of nodes in the Tor network. To my knowledge, all of those machines were returned within a day or two, and I believe most of those nodes are again operational, of course after refreshing their node keys and whatnot.

Yeah? Okay, that's an excellent point, thank you. The point was, for people who may be viewing the podcast or whatever of this talk, that essentially all of Europe has this sort of legislation. I guess the EU has handed it down, and several countries have also implemented similar sorts of legislation. So the European Council... oh, okay.

Oh, so, no. The client is something that you run locally, so it will
be using your IP address to connect to the guard node, so the guard node will see your IP address. Then when the guard connects to the relay, it will be using a different IP address. Additionally, the TCP streams are also essentially reused, so the TCP fingerprinting attacks don't apply. The end-to-end TCP fingerprinting attacks also don't apply.

Okay, I think we'll move on to approaches. How do we address some of these problems?

Well, the first approach, while the network was smaller, was to try and verify that node operators were legitimate. The network maintainers would know who they were, and if there was some trouble with their node, they could be easily contacted. This is no longer the case. Essentially you can just go to Vidalia and say, yes, I want to run a Tor node, and it sets up your configuration appropriately. The network has just grown too large for verification to be applicable at all, and hopefully we can continue to grow it larger.

Path selection hacks: there are a couple of different ways that your path is chosen in a specific manner by the client, and certain restrictions are placed on your path for security.

Tor up from the floor up: this is essentially the slogan of Anonym.OS. The idea is that if something can fit through Tor, shove it through Tor, and if it doesn't, drop it on the floor. So any UDP traffic, anything that might be bypassing proxy settings through plugins or what have you, is dropped. There are a few different projects that do this.

Improving network speed and usability is an important component of anonymity, to have large numbers of users using the network. And scanning nodes for modification: making sure they're not modifying traffic at the exit node, and making sure that they are reliable, that they're not overloaded, and such things like that, and
then finally, securing the applications: trying to make sure that the web browser is not going to magically bypass proxy settings, that JavaScript is not going to do mysterious things. Unfortunately this requires a different threat model, one that applications don't often consider. We'll get into all of these one by one in a bit more detail.

So, starting with path selection hacks. There are a couple of hacks that Tor does. The /16 hack: when you're choosing a path, no two nodes in that path can be from the same /16 netmask. The idea is that if someone is trying to do what's called a Sybil attack, which is running multiple nodes from, say, a colo provider, or even on their cable modem, this prevents them from easily being able to have a large number of nodes on the network that are going to be able to actively compromise users in this way. It also prevents ISPs from being able to easily surveil traffic. Now, the caveat is obviously that some ISPs, especially the larger ones, have disjoint IP ranges that are not always in the same /16. But that's not to say that this is completely ineffective. It is some measure of protection against somebody mounting a trivial Sybil attack with multiple nodes all from the same rack or whatever.

Guard nodes. As you noticed, the first node in that path is called a guard. These are chosen from the top 50% by uptime and top 50% by bandwidth of the network. Initially, I believe, they were called helper nodes, and the idea was to mitigate profiling attacks. As Nick Mathewson mentioned in his talk, if you're some Harley rider, that was his example, and you like to visit Cute Overload, and you continually visit the site every morning, you never want anybody to be able to
determine this. If you are continually choosing new entry and exit points all the time, eventually, with some probability, someone will notice that you're visiting Cute Overload and will be able to tell all your biker buddies and embarrass you, or however you want to generalize that example. So having a fixed set of guard nodes, between two and three is the goal, can make sure that you're not going to be exposed to the whole network in this way. You can in essence build up a sort of trust in your guards. If there is an intimidation attack, where some guards try to capture and harass a bunch of users, and you're not harassed this way, your rational response is: oh well, I trust my guards even more, and I can trust Tor even more, assuming that it was an end-to-end attack.

The problem is that these guard nodes can go up and down in general, so this can be difficult to do right. You have basically a time trade-off of risk. If you are still rotating through the network because these things are going up and down, you can still be exposed to a large amount of the network over a short period of time. There are a couple of hacks in how Tor remembers guard nodes to try and prevent this now, so this is less of an issue than it was about a month or so ago.

Tor routers and live CDs. There are a few of these. One is being presented immediately following this talk. JanusVM, Anonym.OS, and the XeroBank virtual machine are, I think, the major examples. These guys, again, basically drop everything that doesn't go through Tor. The major issue with these is that circuit reuse can be very problematic. If you are using antivirus software that updates with a unique identifier, if you have other ID-based software updates, or you use AIM, or you SSH into a shell that has your name in the domain name, or you connect to an email account that you
don't want linked with your anonymous traffic, an exit node can figure this out and begin to associate them. Now, there is a "new nym" functionality in Vidalia, where you can click and say, okay, I want a new identity now. But it's sort of up to the user to be able to differentiate and remember to click on that button, and it can be problematic, especially if you are routing all of your applications through something like a Tor router.

Now, speed and usability, as I touched on, is a key component of Tor security. You want as large a user base as possible. In fact, there have been cases in the past where users have been harassed. There is an example of a professor who was harassed because there were only two Tor users at his university, and it was suspected that one of them was engaging in an online scam. So the campus police, and possibly some state or federal police, contacted him and asked him if he was a professor who taught about censorship and censorship resistance, asked him which students, if any, he knew of who were still actively using Tor, and basically interrogated him. So there's this risk of harassment.

There's also the risk of, say, a blogger who wants to blog about work, but then brings their laptop to work and is the only Tor user at their workplace. It's very easy for the system administrator to see: okay, this is the only internal IP address that we have connecting to the Tor network, so we have a pretty good idea of who this anonymous blogger, who may or may not be causing us problems, is.

So how do you improve this? Basically, users want the network to be fast, and they want it to be easy. It's my argument that a lot of them don't need such high-grade anonymity. I suspect that a lot of them are concerned about the data hygiene issue: they want to just opt
out of a few Google queries, or when ordering something they don't want marketers to have their IP address, or they don't want IMDB or whatever to know what movies they like, these sorts of things.

For this reason I proposed a two-hop paths proposal. The most deadly attack, from a high-level, theoretical point of view, is the timing correlation attack. This is effective between 95 and 99.9% of the time in simulation. You can get that much certainty when correlating streams that are entering and exiting the Tor network. So from a theoretical point of view there's little reason why there should be three hops as opposed to two, but there are a lot of implementation details that can complicate these things. On the conference CD I have included this proposal, if you want to look through it. I have my reasoning about what some of these major issues are and what can be done about them to try and improve things from an implementation perspective. These include things like the active circuit failure issue, and a user using an exit with a specific exit policy that exits to a particular IP address, and those sorts of issues.

Intelligent path selection is something that's being worked on by one of our Google Summer of Code students. The idea is that you can have the client intelligently determine what the latency is between different Tor nodes and try to build lower-latency paths and higher-bandwidth paths for users who say, well, we don't need as much anonymity, we just want to be able to fetch websites fast, and we just want IP address obfuscation, those use cases essentially.

The last point on improving speed and usability, which we're going to go into in a bit more detail, is ensuring the network is evenly balanced and reliable. It turns out there are a lot of balancing issues with Tor,
which is why it's slow for a certain percentage of users, who experience rough Tor performance.

So how can we try to detect some of these mischievous nodes, and what have we found so far? Centralized network scanning essentially uses what's called the Tor control port. The Tor control port is a port that Tor exports that allows you to build your own circuits through Tor, attach streams to circuits that you build, and get events on circuit failure, on bandwidth usage, and so on. Snakes on a Tor is a node scanner that I've built, and TorFlow is a Python library that interacts with the Tor control port and allows you to selectively build paths.

Essentially the goal of that scanner, Snakes on a Tor, is to get the motherfucking snakes off the motherfucking Tor: to verify MD5 sums of URLs that you fetch and make sure that nobody is inserting malicious plugins or JavaScript, or inserting exploits into documents or Firefox extensions or what have you. What we also verify at the same time is node reliability and bandwidth, to try and detect those circuit failure attacks. It has found some things. For the most part it works against adversaries that are attacking pretty much everybody, but it is vulnerable to detection, and there are issues with adversaries that are only targeting a select portion of the user base, like users who speak a particular language that the scanner does not speak, like Chinese.

So what does this look like from a network diagram point of view? Again, what the scanner is doing and what it's able to detect is laid out at the top. You can detect node bias and connectivity issues, circuit failure and bandwidth as I said, and content modification. But at the bottom we have the arms race. Essentially the adversary is
able to tell that you're not making circuits the same way as a normal Tor user, because you're not using guard nodes anymore. You're connecting to every Tor node that you can to try and verify its reliability. So the guard nodes are going to be able to tell: oh well, this was a short-lived connection from this IP, and it continually connects to me for only a short period of time and then moves on. A node can then note this IP and provide it special selective service: higher bandwidth, no failed circuits, or what have you. Similarly, the exit node can detect behavioral issues in the URLs that you fetch. Maybe you don't fetch the images for a page, maybe you don't interpret JavaScript, or you're missing some HTTP headers. It can also try to evade your scanning, again either using this detection or by only targeting specific users.

Likewise, dynamic and localized content is a problem for scanning. If you're just verifying MD5s and the content of a page is continually changing, like Google News or certain localized websites in particular languages, these become either false positives for the scanner or opportune targets for malicious nodes to attack, essentially with impunity.

But here is stuff we found anyway, and some of this is pretty interesting. We found a Chinese ISP man-in-the-middling SSL. There was a particular Tor node that exhibited self-signed SSL certificates for every SSL site that it visited. Snakes on a Tor does in fact verify these SSL certificates: it stores a bunch of them for a handful of sites and verifies that they're the same. We detected this ISP doing it. It wasn't the node. It turned out that this ISP, for some reason, was just trying to own all of its clients.

Pop-up blocking: some nodes run antivirus software. So far it's been exhibited on Windows nodes only, so it must be some sort of antivirus software that inserts JavaScript into pages that are fetched
to hook the window.open call and block some forms of JavaScript pop-ups. So that's sort of helpful, but it also makes things problematic. Do we whitelist these nodes from scanning? Then maybe they have the opportunity to do other malicious activity, or malicious nodes can mimic this behavior. So that complicates things a bit. One node in particular was also blocking the Google Analytics JavaScript, which I thought was amusing. So the Google Analytics JavaScript wasn't able to add that IP to whatever statistics the website owner would want to get about their users.

DNS spoofing has been detected, and SSH and SSL man-in-the-middle. There were a few Tor nodes, which seemed to come in pairs, that were man-in-the-middling SSH and SSL. These were detected and listed as bad exits. Overloaded nodes have also been detected, and some balancing issues as well, which we'll get into in a bit. It turns out there should be quite a bit more capacity on the network than a lot of users are currently experiencing.

Incidentally, when we do find these nodes, we are able to label them as malicious and prevent clients from choosing them in the future. We have two different tags that can do that: the BadExit tag, and removal of the Valid tag. The BadExit tag prevents them from being used as an exit, and removing the Valid tag prevents them from being used as either a guard or an exit.

So how do we address this arms race with adversaries that are able to detect scanning and provide selective service? There are a couple of different perspectives we can approach this from: client-based decentralized scanning and node-based decentralized scanning. With client-based decentralized scanning, we can use the reliability averages from TorFlow and then alert the user if their guard
is failing more than that percentage of circuits, or that percentage adjusted by, you know, two times the standard deviation or whatever. Also, we can alert them if they can only connect to two guards, for example, or two guards that fit whatever firewall restrictions they may have. We can also potentially observe the bandwidth and latency of their connection, but that gets a little nebulous for different types of users.

Node-based scanning can do things like gather statistics on average capacity and queue lengths, compare that to the node rankings, and make sure that nodes match up to what their expected capacity should be and that they don't have a large disparity between how much they dequeue and how much is being enqueued to them. This sort of node-based information can then be used as a feedback loop to improve balancing as well.

So passive client- and node-based scanning looks essentially like this. You have three different clients in this example that are able to detect a few different things about their network. In the top case, you have a client who's able to detect that he's got a high rate of failure to his first hop, which is maliciously failing circuits. The second one is able to say, "Well, I can only connect to a few guard nodes, and all the other connections seem to fail. No other explanation, like restrictive port-based firewalls or anything else, seems to explain this." Print out a warning, alert the user. At the bottom is the more nebulous case, and it's doubtful to what degree this may be successful: trying to detect a node that is lying about its bandwidth, which may exhibit lower stream rates or maybe a higher number of timeouts than normal. And in the center you have a node scanner that is, again, listening to events on its control port. Potential events that we can add are the queue size,
rate of increase, and rate of drain, to detect overloaded nodes. Lying nodes will probably end up appearing pretty much the same. We can also gather statistics on the rate of failure of circuits that are created through me, and report nodes that fail circuits that way as well.

So, balancing issues. As I mentioned, through scanning we were able to determine that the Tor network is unbalanced. There is in fact a guard bug in versions before 0.1.2.15, bug number 440 on the Tor Project's Flyspray bug tracker (the Bugzilla equivalent). Essentially, the issue is that guard nodes were accidentally being chosen uniformly across the guard node space rather than weighted by bandwidth, and this causes several issues as far as reliability and user experience.

There's also a bandwidth clipping issue. There's a limit on how high a bandwidth we're going to believe for a given node, so that nodes can't show up and say, "Oh, I have a terabyte of bandwidth, send me all your Tor traffic." It's currently clipped at 1.5 megabytes a second. There are about 32 nodes that have capacities in excess of 1.5 megabytes a second, so that's essentially wasted capacity.

One of the results from the scans: it turns out that the top 5% of nodes have room for about seven times more capacity, and the next 10% have room for about three times more capacity. We're going to go through the breakdowns of that by percentile, and how that was done, in the successive slides. There are also high circuit failure rates that drop off at about the point where nodes are no longer considered for guard status, and there are also higher-than-normal extend times up to this 50% mark as well.

So how's the scanning done? Essentially, we divide the Tor network into 5-percentile ranges, about 80 nodes per range, and, in the
case of circuit scanning for circuit failures, build about 500 3-hop paths for each of these 80 nodes and fetch a small file through each path. We count the number of failures and track the extend times too. Bandwidth scanning is similar: fetch a 512K file 200 times over 2-hop paths and average the bandwidth that we observed through them.

So what does this misbalancing look like, and what are some of the effects for users? This is the breakdown of node bandwidth by these 5-percentile ranges. You can see it follows the typical power-law statistics, an exponential drop-off in the amount of bandwidth that nodes are able to provide. It turns out the first 5% of the nodes provide about 45% of the bandwidth, the next 10% after that provide about 30% of the bandwidth, and the rest of these guys provide the remaining 25%.

Now, the average stream bandwidth also seems to follow this statistic, even though in a balanced network every node should be receiving traffic proportional to its capacity. So you have, again, the 7x more capacity. The average stream capacity for the rest of the network past this range is about 10 kilobytes a second, as you can see from this diagram. In the first 5%, the average capacity is 75 kilobytes a second for some reason, and the next 10% after that has about 30 kilobytes a second.

So what does this look like in terms of circuit failure? Again, you can see steadily rising circuit failure until about the 50th percentile of the network, where nodes stop being considered for guard status, and then it drops off. There's a mysterious blip on the radar here at about 25%. I looked at a number of factors, which I'll show in successive slides, and I'm still not sure exactly what the cause of this is. It remains a mystery. It's not time of day or uptime
or anything like this. It could be that this just happens to be a sweet spot for a certain class of users rate limiting their Tor node, and for some reason these users don't exhibit that circuit failure. You also see that extend times are reasonable, again, in this top 15% of the network, but then they start to spike and continue to rise up until about the point where nodes are no longer considered for guard status, and then they drop off sharply once the Guard flag is no longer possible.

So what does this mean from a usability standpoint? I actually did a couple of calculations, and it turns out there's about a 70% chance of choosing a pretty badly unbalanced guard. Where that number comes from: as you see here, where the extend times really begin to pick up is around the 15th percentile, and that continues all the way up to the 50th percentile. Every user is going to choose a guard from the 0 to 50th percentile. So 15 to 50 is 35%, and since guards only come from that top half, double that is 70%. The goal of Tor is to maintain three guards, so existing clients that have already chosen their guards have a 34% chance of having chosen three unbalanced guards (0.7 times 0.7 times 0.7). What this means is that for 34% of users, Tor is likely unbearable.

Anecdotally this is the case as well. I actually convinced a friend of mine to install Tor. She comes back to me about a month later and says, "Well, you know what, I thought my internet connection was broken for a week, and it turns out I had just left Tor on." So 34% of the users are likely experiencing some pretty extreme pain until users begin to update and choose guards based on this bandwidth weighting.

Then, further statistics: three choose two, times the probability of choosing one not-so-highly-loaded guard. You have a 3 times 0.7 times 0.7 times 0.3, or 44%, chance of choosing two out of three bad guards. So for
44% of users, Tor is going to build circuits where 66% of the circuits are going to be slow and 33% are going to be reasonable. Then 19% are going to have one bad guard, and only 3% of the users are probably going to have all reasonable guards.

So, other factors in load balancing. Insane exit policies, such as allowing BitTorrent, peer-to-peer traffic, and SMTP, are big factors, and so are nodes that are failing a lot of circuits, as I was able to determine. Unfortunately, some of these are also hard to see beyond that initial noise of circuit failure due to this balancing issue.

High uptime versus low uptime: this is probably a factor, in that if you're a guard node and you've been running for a while, users are going to choose you and hold on to you for as long as they can. After a while, if you run for longer than everybody else in the network, you're going to attract a disproportionate number of users. So far this doesn't seem to be a definitive statistic, probably because guard nodes do go up and down, but I haven't looked at this in a lot of detail, again because it's hard to see through the noise.

Scarce guard bandwidth: guard node bandwidth, it turns out, makes up about 40% of the network bandwidth. It would be nice to avoid these nodes for the relay choice. That's just a possibility: lower their probability of being chosen for non-guard positions, because they are a somewhat scarce resource. They're greater than a third of the bandwidth, but the remainder should probably be weighted according to that 7% that's left over.

Directory versus node traffic: for nodes that are directory mirrors, the balancing between those two may or may not be an issue. And time of day and location are potential possibilities that could explain some of these blips, like in here. I did run these scans over a number of days, so it's possible this was
daytime, nighttime, daytime, nighttime, daytime, nighttime, so that's a possibility. I did try rerunning this one at a 12-hour offset from when it first showed up, and that was not the issue with this blip.

So, questions about the balancing issues, usability issues with guards, or network scanning? Questions?

Oh, yeah. So if you use Tor and you think, yeah, a lot of the time it is slow, with variations on slowness, what you can do is find this state file and just remove it. That's probably the best thing for you. The newer versions of Tor will then choose new guards proportionally by bandwidth, and you have a higher probability of choosing those less heavily loaded guards. At some point, I believe, we will expire the guards of users that chose them prior to that 0.1.2.15 version. At that point the network should, as far as I can tell, magically rebalance, and hopefully everything will be a lot better. The other piece of math is that there should be about four times more capacity, either for more users or for more bandwidth per user. A current estimate is 200,000 users, so there should be room for maybe 800,000, but there are a lot of factors in there, so you can't really say that for sure. There's an economic factor: some users will put up with a certain amount of speed, and then you have that sort of supply-versus-demand curve. Steven Murdoch posted some nice graphs on that sort of behavior to the or-talk list.

Yeah, so it is done probabilistically, weighted by the bandwidth. Unfortunately, the only metric we have right now is what nodes say they are observing that they are carrying, but we are able to detect liars, as I said, by the fact that they exhibit lower stream
bandwidth. Now, there's potential for a sort of feedback loop there, so you can say, "Oh, well, these guys are exhibiting a higher amount of circuit failure via those decentralized scanning mechanisms, avoid them; these have more capacity," and sort of dynamically rebalance to try to deal with the nuances of what might not be accounted for by just basic probabilistic weighting. But that's something that's further down the road, after we fix these basic issues.

I believe if you search the or-talk archives, the subject was "Oh my god, I found one," or something like this. It was right after I had announced Snakes on a Tor, so I found a snake and I was all excited. So that's the post where you can laugh at me or whatever.

Okay, I should clarify, though: the SSH man-in-the-middle was actually observed initially independent of the scanner. I have a day job and am not able to consistently watch this stuff. It turns out somebody found the SSH man-in-the-middle, and we were able to detect it reappearing at other dates under different names. They gave up shortly after the person reported them on the mailing list, then they tried again a bit later, and we found them then.

Okay, so, the application layer. I've done some work on Torbutton to try to secure the web browser. So what sort of things go into securing the application layer of Tor? Tor basically has a superset of the threat model that most applications were written for: no UDP; unique identifiers are bad; applications have to obey the proxy settings; location information shouldn't be transmitted, so you shouldn't leak your time zone; updates are dangerous; and it is a hostile network, essentially. A couple of these do apply to other scenarios as well. Proxy settings for the web are kind of important, I think, given the JavaScript malware these days. Say you have a corporate intranet that you would like to protect from
JavaScript that is able to scan the hosts on your network. Proxy settings, and a browser that obeys them, would be a nice thing to have. Firefox does have this property. Past versions of IE have questionable behavior in this regard, and we haven't yet reexamined its current behavior. And updates, again: some Firefox extension updates aren't sent over SSL. That was in the press recently.

So what sort of things are dangerous from a web-based attacker? Bypassing proxy settings is the main one. This can be done with Java, essentially, JavaScript events, and plugins. Plugins are horrible at obeying even their own proxy settings. JavaScript and meta refresh are another: waiting for Tor to be disabled by adding a really long meta refresh timer or a continual refresh loop.

Correlation of Tor versus non-Tor activity: this can be done in some surprising ways. Cache is one. You can embed a unique identifier in the document you serve through a Tor node, and then you can inspect the DOM of pages later to see if that particular content element has the unique identifier that you embedded into it.

Cookies are another big one. The cookies of several websites are not tied to SSL. Gmail is one of them: if you visit the HTTPS version of Google, log in through HTTPS, and then continue to use it through HTTPS, you will have a number of cookies. One of them is this GX cookie. This cookie can be fetched out, and you can use it to access mail.google.com outside of SSL. This also means that you can insert a content element when a user on the local network, or through the Tor network, visits something like CNN.com, cause the browser to transmit that cookie, grab it, and then fetch their inbox behind their back, or send mail as them, or whatever. Several websites have this sort of property. I just
ordered some stuff a bit ago, and the shopping cart had the same problem. I was going through SSL, but if you used the Firefox extension CookieCuller or whatever to inspect the properties of the cookies, their cookies were "any type of session," meaning they could be sent outside of an SSL connection. So it's possible in those cases that somebody can go back and review my order and try to grab my credit card information or whatever. Those sorts of things are a danger both for Tor and non-Tor usage.

History disclosure is another big one. JavaScript can in fact examine the DOM attributes of links and determine whether they have the visited attribute or not. It can do this at a very high rate of speed, 10,000 queries a second, to see if you've issued particular Google queries or visited certain websites. The Great Firewall can do this against Tor users to see if they've googled for censored queries. You can even do this without JavaScript. You can do it with CSS: set the style for visited links to fetch an image, and without any JavaScript at all, examine people's history and see if they've queried for certain URLs.

And then general anonymity set reduction essentially comes about because of user agent, location, locale information, and other information. If you're an Iceweasel Debian user, well, there's probably not very many of them, so that definitely cuts down your anonymity set considerably. And history records, for that search and seizure case that I discussed before.

So, the plugin wall of shame. All these plugins can bypass proxy settings in one way or another. Flash is pretty great, because you'll look at it in Wireshark and you'll say, "Hey, it did fetch this thing through my proxy, that's great," and you'll use it for a while, and then all of a
sudden it'll start making some connections outside your proxy, and then it just disobeys the settings for some reason. I don't know the details of ActionScript. Maybe you can just say whether you want to listen to the proxy settings from the browser or not, or maybe it has to do with how the embedded object is referenced in the page, or whether the Flash object is first a movie that then fetches other content itself through ActionScript. Those are possibilities.

QuickTime has a proxy for Real Time Streaming Protocol, which is primarily a UDP media protocol, so that's really irrelevant, and that proxy setting doesn't apply to web streams and other settings. Windows Media Player is probably the most amusing one. It does have proxy settings, and even has a no-bypass option, and it still ignores it. So you can set that in the application, and it'll still use both proxied and non-proxied connections if you're watching in Ethereal. Adobe Acrobat Reader will leak DNS. The MPlayer plug-in obeys proxy settings for AVI files and stuff that you fetch via straight HTTP, but it does support this Real Time Streaming Protocol, which is UDP, and some other things where it's unclear how those proxy settings really apply.

So what's the solution? As I said, I did some work on improving Torbutton, which is a Firefox extension for Tor that lets you toggle your Tor usage. Essentially, I disable all plugins while Tor is enabled. I isolated the dynamic content of a page in a number of ways. CSS can fetch elements based on your hover or whatever, to do CSS-based pop-ups without any JavaScript involved. So I wrote an nsIContentPolicy that basically tags every tab with the Tor state of the load for that tab. If a successive document fetch tries to happen through that tab, and the original load state of that tab is different from the current Tor state, that fetch is blocked. Similarly, I disable JavaScript based
on the Tor state of the initial load. As soon as you toggle Tor, I disable JavaScript in all the tabs with a state different from the current one and re-enable it in all the tabs with the same state as the current one.

Cookie jars: I have an option, via code contributed by Collin Jackson, that saves all your cookies to a cookie jar that can't be accessed during your Tor usage and is restored when you turn off Tor. Cache management: make sure the cache never writes to disk, and make sure that it's wiped when you toggle Tor. History management: this essentially hooks into another interface that was implemented by Collin Jackson in one of his extensions, to prevent, at the rendering engine level, the engine from being informed whether a link is visited or not. So it blocks both the CSS and the JavaScript methods of determining whether or not you have visited a site.

User agent spoofing during Tor usage is done a little bit better than in the other extensions. There's a user agent spoofer Firefox extension that tries to do this, but it misses a couple of JavaScript ways of determining the user agent through the navigator object. Firefox itself has user agent settings that it doesn't apply to certain elements of this navigator object, so you have to hook its methods in order to return appropriate values for Windows when those methods and attributes are examined.

Time zone and locale spoofing is another thing. The date is a bit tricky. It turns out there's no way to specify your time zone in Firefox. You actually have to actively hook the Date object and make it return UTC times. The example snippet of code for that is on the conference CD. If you're a JavaScript guru, please have a look at that. I did try to vet it pretty much every way I could think of, to examine ways to access the original wrapped Date object that
I wrapped via lexical scoping, to try to determine what the original time zone was, with various ways of copying objects and so on. As far as I can tell, the method that I used to wrap that object is, according to every JavaScript reference I read, a valid method of protecting private member variables from external access. But it's possible I may have missed something, so please do have a look at that. The ideal solution is for Firefox to provide a time zone setting that we can guarantee applies.

So a demo of these sorts of things with Torbutton is probably in order. I have a few websites loaded over here with basic information. This particular page has user agent information. I don't know if you can see this font. I can make that a little bit bigger; after a while their CSS starts to fail. But here you can see various properties of that navigator object being queried, and they report that I'm using Linux, a particular build of Firefox, and the operating system and CPU. So if I then turn on Tor and fetch this via Tor, as soon as that loads we'll see that it has been reset.

Some of the other examples I have: the history exploit, which is able to verify that I've been to Google and Slashdot. Time zone information: here's the passive effect, I'm on Pacific Time. Plugin information: a list of my plugins. Here's a CSS-only history disclosure, no JavaScript involved; they're able to tell that I've visited Google. I probably should have loaded all of these already, so I'll run through all of them.

So here you see that the user agent has been reset to Windows. All these properties are Win32, so the user agent is pretty much taken care of. The JavaScript method of detecting history fails to tell that I've been to Google or Slashdot. My history has not been cleared;
it's done, again, through the rendering engine. Time and date information, this one. Plugin information: no plugins have been loaded. The CSS history test isn't able to tell that I have been to Google. CSS pop-ups: I should redo this one. Oh, slowly, slowly, slowly. And there, the time zone is always set to GMT on this one.

So the CSS pop-ups behave a little bit differently; I have to reload this. Here you can see that this thing fetches images whenever I mouse over these elements, based on that hover CSS attribute. If I reload this again and don't hover over those, and then turn on Tor and then hover over them, they don't fetch, because of the content policy.

Interesting technical details: I think probably the most interesting one of these is the JavaScript hooking. If you are a JavaScript guru, again, have a look at the Black Hat handout on the conference CD and have a look at that JavaScript. To my knowledge it is solid. I tried about a dozen ways of attacking it: accessing the scope variable, trying to access the JavaScript source code by modifying toString and such things, and copying various state objects, and all those different attacks seemed to fail.

So, final thoughts, I guess. Oh, we're wrapping up quite a bit early. Final thoughts: Tor security is not equal to internet security. It's a superset. The adversary has different goals. A lot of apps don't consider privacy vulnerabilities the same as regular vulnerabilities. It's, again, a different sort of threat model involved.

So, credits and contributions. The following people have contributed: Scott Squires, the original Torbutton author. Collin Jackson did the history hooking and cookie jars; I borrowed his code. Johannes Renner is the Google Summer of Code student working on TorFlow and intelligent path selection. Nick and Roger, for advice and for Tor in general. And then a shout-out to my
co-workers, I guess. Well, at least one of them signs the expense reports, and the other two are pretty good guys in general.

So, if you're interested in helping Tor: on the conference CD is a Linux script for doing load balancing, to prioritize Tor traffic below all your other traffic. I use this on my own Tor node, and I noticed no impact from the Tor traffic on my SSH traffic or other web traffic. Basically it uses the Linux QoS support, so it's pretty handy: a nice way to run a Tor node without really interfering with your other activity on your shell server or whatever hosting you might be using.

And then, again, please try to raise awareness. Post plugins or patches for your apps to protect against information disclosure. Work to raise awareness about privacy issues; it should be considered part of security measures. I think a lot of the arguments for some of these issues with proxy settings can be made independent of Tor. That's probably the best way to convince somebody that an issue with Tor is a good thing to fix: find the example outside of Tor where it's still a problem.

So that pretty much wraps it up. Questions, comments on those sorts of protection mechanisms?

That is still in pre-proposal form. There's a proposal system in Tor where people write up proposals to work on proposed features. There are a lot of subtleties there, and anonymity issues with providing that. There are also issues where dissidents in China who are trying to bypass their firewall aren't able to do this sort of thing. So, I mean, do they get no service, or are they going to get horribly abysmal service?

Other general questions? Yeah, I guess we wrapped up quite a bit early.
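
As a worked check of the guard arithmetic from the usability part of the talk, here is a minimal Python sketch. It is not from the talk or the conference CD. It assumes, as the talk does, a 70% chance that any one guard is badly unbalanced, three guards per client, and independent choices; the function name `guard_outcomes` is made up for illustration.

```python
from math import comb


def guard_outcomes(p_bad=0.7, n_guards=3):
    """Return {k: probability that exactly k of n_guards are unbalanced}.

    Assumes each guard is chosen independently with probability p_bad
    of being unbalanced (the 70% figure quoted in the talk).
    """
    return {
        k: comb(n_guards, k) * p_bad**k * (1 - p_bad) ** (n_guards - k)
        for k in range(n_guards + 1)
    }


if __name__ == "__main__":
    # Print outcomes from worst (all guards bad) to best (none bad).
    for k, prob in sorted(guard_outcomes().items(), reverse=True):
        print(f"{k} unbalanced guards: {prob:.1%}")
```

Rounded to whole percentages, the four outcomes match the 34%, 44%, 19%, and 3% figures quoted in the talk.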
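
The bandwidth clipping and bandwidth-weighted guard choice described in the balancing part of the talk can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not Tor's actual path selection code: the node names and advertised bandwidths are invented, the helper names are hypothetical, and the 1.5 megabytes a second clip is taken as 1,500,000 bytes per second.

```python
import random

# Assumption: the talk's "1.5 megabytes a second" clip, in bytes per second.
CLIP_BYTES_PER_SEC = 1_500_000


def clipped_weight(advertised, clip=CLIP_BYTES_PER_SEC):
    """Believe a node's advertised bandwidth only up to the clip value."""
    return min(advertised, clip)


def selection_share(nodes):
    """Fraction of selections each node gets under clipped weighting."""
    total = sum(clipped_weight(bw) for bw in nodes.values())
    return {name: clipped_weight(bw) / total for name, bw in nodes.items()}


def pick_guard(nodes, rng):
    """Pick one node at random, weighted by clipped bandwidth."""
    names = list(nodes)
    weights = [clipped_weight(nodes[name]) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical nodes: one claims a terabyte per second.
    nodes = {"liar": 10**12, "fast": 1_500_000, "slow": 500_000}
    print(selection_share(nodes))
    print(pick_guard(nodes, random.Random()))
```

With the clip in place, a node claiming a terabyte gets no larger a share than an honest node at the cap, which is the point of clipping. It also shows the cost the talk mentions: any real capacity above the cap is never used.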