project to tell us everything we want and need to know about this work. Thanks. Hey, I'm Isis. I'm from the Tor Project. As a disclaimer at first, I'm not actually a computer scientist, nor am I a cryptographer. I studied physics, specializing in HEP theory, and also English literature with a specialization in feminist critical theory. I've worked for Tor now for six years as a core developer. Some of my previous projects include the Open Observatory of Network Interference (OONI), which is a global framework and a set of test suites for detecting network anomalies. Mostly I was interested in using it for detecting online censorship, but it does do other things. My current work mostly involves working on Tor's circuit-level code and path algorithms for choosing which relays you use through the network, and doing work on Tor bridge distribution and various censorship circumvention things, pluggable transports. If you want more information, here's my homepage. There's my little ponies if you enable JavaScript, which you shouldn't. Why should you care about privacy? Glenn Greenwald, in a TED talk a few years ago, said: there's an entire genre of YouTube videos devoted to an experience which I'm certain that everyone in this room has had. It entails an individual who, thinking they're alone, engages in some expressive behavior, wild singing, gyrating, dancing, some mild sexual activity, only to discover that in fact they're not alone, that there's a person watching and lurking, the discovery of which causes them to immediately cease what they're doing in horror. The sense of shame and humiliation in their face is palpable. It's the sense of: this is something I'm willing to do only if no one else is watching. This is the crux of the work on which I've been singularly focused for the past 16 months: the question of why privacy matters.
Some people will respond to this and be like: well, that's all fine and good, but I'm not ashamed of dancing, I'm not ashamed of whatever, there's nothing I have to hide. The best response I've seen to this is Ed Snowden's response in his Reddit AMA, saying: arguing that you don't care about the right to privacy because you have nothing to hide is no different than saying you don't care about free speech because you have nothing to say. Further, even if you don't care about your own right to privacy, it's pretty antisocial to say that I have nothing to hide, therefore no one else should have anything to hide. This excludes the work of a whole bunch of other people who are different from you, and basically says that everyone should just be this conglomerate majority where we're all average and normal. It's saying: I don't care about your right to privacy, I don't care about your rights to freedom of speech and expression. There was a quote from last week at the Logan Symposium, put on by the Centre for Investigative Journalism, from Ed Snowden's talk, and he said: privacy is the right from which all others are derived. Privacy is the fountainhead of individuality. Without privacy there is only the collective, there is only society, there is only influence from groups, from large powers that shape every person to bring them into that fold and to make them all alike. So let's assume you have nothing to hide: you've never done anything wrong ever in your life, you've never done anything illegal, not even jaywalking, and you don't care about anyone else's rights.
You should still care about your own privacy, because laws and societal norms can change, or may be interpreted differently over time. And if a recording of everything you do is happening at all moments in time, then when the laws or norms change in the future, someone can look back and say: well, you did this wrong then, so we're going to punish you now, because now the law says that we can put you in prison for jaywalking 20 years ago. In the United States, and I know this doesn't apply as much here, it's actually currently unknowable not only which laws apply but even the number of laws. People have tried to count them, as you'll see in a second, which is a little bit scary: we don't actually even know the number of laws that apply. This is a quote from a Regent University law professor, in a really good lecture that you guys might want to watch. It's on YouTube. It's very entertaining. He's basically explaining why you should never, ever talk to the police: it will only hurt you. And in this he does mention the estimates on laws in the United States Code: there are roughly 27,000 pages of law. Worse yet, they refer to external documents that are hard to track down. There may be about 10,000 more of these, but no one's really sure. A few different groups, including the American Bar Association, tried to count at some point. No one was able to come up with a good number. Further, I would also argue that privacy is essential for the continued open progression of science. Another really good video, if you guys are not familiar with Juice Rap News: if I have time at the end I'll show it. They are a satirical news broadcasting agency that makes rap out of current events, and from their video Big Brother is Watching You, they have a really great quote which lists all the different types of people who at the time would have been considered quite strange, or whose thinking was not acceptable, who later became luminaries.
We're told that we need safety, which is precious, yes, but can a society that can enforce all its laws ever progress? Hindsight shows that many figures guilty of thought crime turned out to be luminaries and heroes before their time. They show some images of people like Martin Luther King, Galileo, Huey P. Newton. But if the surveillance state had reigned then, in this form and design, just think of all the progress we may have been denied. Could lobbies for women's or gay rights have appeared and thrived? Would revolutionary ideals have materialized? Would science have pioneered or even survived, if every word had been monitored by thought police and spies? So this is all fine, right? We have privacy, right? We have TLS; most of our traffic is encrypted these days. It's good. Everything's private. And some things are getting better. You can see that encrypted web traffic has more than doubled since the Snowden revelations in June 2013. Well, it depends what you mean by doubled. If you read the article, it actually says it's doubling from 2.29%, and it's only doubling in North America, really. In other places it's going up much higher, but still, as you can see here, 10% is really not that high. That was 2014 here. 2015: you can see that about 20% of web traffic is encrypted. This year it's predicted to go up quite a bit as Netflix switches to using TLS. These graphs are actually not that good. I should warn that they're by total amount of data sent and not by the number of connections, so it's a little bit skewed, because something like downloading videos can skew the graph that much. Maybe not the best. So imagine we could have a world in which everything was encrypted and authenticated: all emails are signed and encrypted end to end, everybody's using cipher suites that offer high security, crypto implementations are correct, secure, side-channel resistant, formally verified, and expose hopefully-impossible-to-misuse APIs. And applied cryptographers have trouble finding jobs.
They still get the metadata. What is this metadata stuff? This is from the EU's Data Retention Directive, which was actually annulled a few years ago as being against the Charter of Fundamental Rights, thankfully. There are, from what I understand, some measures to reintroduce it, but I've only recently been living in the EU, so I'm not too familiar. It says that member states shall ensure that the categories of data specified in Article 5 are retained for periods of not less than six months and not more than two years from the date of the communication. All right. So what types of metadata would the government be interested in, and why? They want data necessary to trace and identify both the source and the destination of a communication; I assume that's probably the IP address. They want to identify the date, time, and duration of a communication; I assume this would be used for telling what you're accessing or when you're accessing things. Data necessary to identify the type of communication, so whether it's a phone call or email or SMS. And data necessary to identify users' communication equipment, or what purports to be their equipment. This is getting a little bit scarier: obviously they want to know which browser you're using, or if you're on a phone or a laptop. And data necessary to identify the location of mobile communication equipment. Encrypting and authenticating content doesn't prevent any of this. If you're using TLS on your phone, going to Facebook or Twitter or whatever, they still get all of this. But it's just metadata, right? There are a few good quotes about what metadata can be used for, and how it's actually more useful to governments wanting to track the behavior of citizens. Metadata absolutely tells you everything about somebody's life. If you have enough metadata, you don't really need content. It's sort of embarrassing how predictable we are as human beings, says Stewart Baker, former general counsel of the NSA. Metadata is our context.
And that can reveal far more about us than the words we speak. Context yields insights into who we are, and the implicit, hidden relationships between us. There's also been, for a long time, knowledge that various agencies all over the world are overrun with having too much data. A very good example is recordings of voice calls: they're always backlogged on trying to process these things, trying to make transcripts of them. With just the metadata, it's much easier to process automatically. It's machine readable. You can generate graphs of who knows who and who's talking to who much faster than you can learn information by trying to analyze the content of the communications. And also, the United States kills people based on metadata. So is metadata all an attacker gets? A common assumption is that metadata only includes who's talking to who, or which website is being requested, and that metadata doesn't actually include any information about the content. This is not exactly true, because at the protocol layer there is still extra information about the content. For example, in an interview, Jimmy Wales of Wikipedia was asked: you've said that you're going to start encrypting communications on Wikipedia as a result. And he said: you know, we've done this. Hopefully, the only thing that GCHQ can see is that you're looking at Wikipedia, but they won't be able to tell what you're reading. Not exactly true, even with TLS, even if everything's encrypted. As a small experiment, take ten Wikipedia pages and load one at random over TLS. The attacker sniffs the network and tries to figure out which one. There's various information there that could be used to determine which page is being looked at. For example, the difference between a page with no images and a page which has several images. Or the size of the article: you're able to get the number of bytes going back and forth in each direction.
You might also be able to use a limited subset of different pictures, with different lengths of URLs to those pictures: they can count bytes. You can do the same thing for responses going in the other direction. You also get timings of things. So for example, if Wikipedia were malicious, it could give you the response back chunked up into several sections, in very short bursts, in order to signal to the other side that you're requesting a certain page or something like that. So what can we actually do to protect our privacy and our communications? Recently, the UN Special Rapporteur said in his report: states should promote strong encryption and anonymity. National laws should recognize that individuals are free to protect the privacy of their digital communications by using encryption technology and tools that allow anonymity online. Legislation and regulations protecting human rights defenders and journalists should also include provisions enabling access, and providing support, to use the technologies to secure their communications. All right. So there's this other thing called anonymity. First, some terminology. The first thing that we talk about when we talk about anonymity is the anonymity set: given one transaction, given one user sending a message, the anonymity set is the set of all the possible users who could have sent that message. For unlinkability, there are two types. There's absolute unlinkability, which ensures that a user may make multiple uses of resources or services without others being able to link these uses together. Unlinkability requires that users or subjects are unable to determine whether the same user caused certain specific operations in the system. So for example, a user sending a message should be unlinkable from that same user sending a different message, even if it's to the same person.
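To make the Wikipedia experiment above concrete, here is a small sketch of an attacker who can only observe transfer sizes, not content. The page names and byte counts are purely hypothetical, and real traffic adds noise from headers and record framing, hence the tolerance window:

```python
# Sketch: identifying which of a small set of known pages was fetched
# over TLS, using only (hypothetical) response sizes seen on the wire.
# TLS hides content but not approximate length, so if the candidate set
# is small and sizes are distinct, one observation identifies the page.

# Hypothetical total response sizes, in bytes, for candidate pages.
PAGE_SIZES = {
    "Alan_Turing": 412_331,
    "Ada_Lovelace": 287_554,
    "Onion_routing": 96_120,
    "Mix_network": 64_887,
    # ... more candidates would go here ...
}

def guess_page(observed_bytes, tolerance=2048):
    """Return candidate pages whose size is within `tolerance` bytes of
    what the attacker observed (padding and framing add noise, hence
    the tolerance window)."""
    return [name for name, size in PAGE_SIZES.items()
            if abs(size - observed_bytes) <= tolerance]

# An observation of ~96 kB narrows the candidates down to one page.
print(guess_page(96_500))
```

With real pages the attacker would also use the sizes of the image sub-requests and their timing, exactly as described above, but even this size-only version shows why "they can only see that you're on Wikipedia" is too optimistic.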
There's also relative unlinkability, which is basically the same thing, but says that given some attacker's previous knowledge about the state of the system and about what the user's doing, they shouldn't be able to link it to something else. So even if they de-anonymize you one time, even if they know you sent this message, they shouldn't be able to just automatically take that and say: oh, and all these other messages, you sent them as well. There are generally two types of anonymity networks: those which have trusted relays, and those which only have semi-trusted relays. Trusted relays are ones in which the privacy and anonymity guarantees of the network rely on some centralized node. So, for example, if just one node is compromised in this network, the anonymity guarantees fail completely. These were mostly some of the first designs for anonymity networks. Some of them still exist, but I don't understand why, frankly. And then there are those that use semi-trusted relays; that is, compromise of one relay shouldn't result in total de-anonymization of anyone using that relay. These networks also make it safer for operators of the nodes in the network, because it's much harder to coerce them. It doesn't do you as much good to coerce one node: if you send them a warrant or something, you're not going to be able to de-anonymize anyone. So it protects the operators as well. An example of a trusted relay system from quite a long time ago was one of the first email mixers. Johan Helsingius started running a trusted mail relay called anon.penet.fi, providing anonymous and pseudonymous email accounts, in 1993. The technical principle behind the service was a table of correspondences between real email addresses and pseudonymous email addresses, kept by the server. Email to a pseudonym would be forwarded to the real user. Email from a pseudonym was stripped of all identifying information and forwarded to the recipient.
While users receiving or sending email to a pseudonym would not be able to find out the real email address of their anonymous correspondent, it would be trivial for a local passive attacker, or the service itself, to uncover the correspondence simply by correlating the timing of incoming and outgoing email traffic. Another system that's not used very much is IPsec tunneling, where you have two internet gateways running an IPsec tunnel, and then you have two networks behind them; from the internet, for anything going between these two gateways, the nodes behind them are indistinguishable. This isn't used very much, and also you can just send a warrant to one of the gateways to get information about the users behind it. Also, it's potentially a small anonymity set. Another type of trusted relay system is called an anonymous proxy. They're usually application specific: a proxy like an HTTP proxy, where you send it a request, and then it takes the HTTP request, and on the other side it does the request, gets the response back, and shoves the response back to you. There are more generalized ones, for example SOCKS servers, which can handle other protocols on top of them. Requests to the website appear to come from the proxy, and all users behind the proxy are indistinguishable. And there are various problems. One, they're again a single point of failure. Two, it's easy to correlate incoming and outgoing traffic, because you see a request going in on one side and you see the same thing go out on the other side. Three, there's no crypto protecting the connection from the user to the proxy. You could add crypto to it; for example, this is what OpenVPN does. It doesn't solve problems one and two. It does make your connection more private, so someone like on your local network can't see what you're doing, but it's not going to give you any real anonymity.
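To make the timing-correlation weakness concrete, here is a toy sketch of how a passive observer could link senders to pseudonyms on a penet-style relay that forwards mail immediately. All addresses, pseudonyms, and timestamps are invented for illustration:

```python
# Sketch: why a single trusted remailer like anon.penet.fi falls to a
# passive observer. The attack is simply pairing each incoming mail
# with the outgoing mail that follows it most closely in time, since
# the relay forwards immediately. No content is inspected.

incoming = [  # (seconds, real sender) as seen entering the remailer
    (10.0, "alice@example.org"),
    (55.3, "bob@example.org"),
]
outgoing = [  # (seconds, pseudonym) as seen leaving the remailer
    (10.4, "an12345"),
    (55.9, "an67890"),
]

def correlate(inc, out):
    """Link each real sender to the pseudonym whose outgoing time most
    closely follows the incoming mail."""
    links = {}
    for t_in, sender in inc:
        later = [(t_out - t_in, pseud) for t_out, pseud in out if t_out >= t_in]
        if later:
            links[sender] = min(later)[1]   # smallest delay wins
    return links

print(correlate(incoming, outgoing))
```

This is exactly the failure mode that batching mixes (coming up next) were designed to defeat: if the relay holds messages and flushes many at once, the arrival-to-departure pairing stops being unique.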
Again, anyone just sending a warrant to the VPN provider is able to remove all privacy and anonymity that you might have had. Another problem with VPNs is that most VPN client software fails open, in that if you continue sending requests to the VPN and your VPN tunnel has actually broken down, you're going to start sending requests out in the clear, which is a pretty hardcore anonymity fail. You can see from the previous examples, obviously, yeah, of course. [Audience question: a VPN provider claims that what they do is sound practice. What is your take on that?] Well, what happens if the government comes to them with an NDA and says: here's this little black box, we would like you to install it, and also you can't tell anyone? There are technical solutions to having single points of failure, and we can use them these days, and we should. I mean, it's great to use a VPN for privacy reasons, but if they're advertising anonymity, this is a lie. They can't guarantee your anonymity. Does that answer? Okay. So single points of failure can be exploited, legally or otherwise: you could break into the machine and start logging, and the operator thinks they're not logging, but someone else is there watching. Those are single points of failure. The other design for anonymity networks is to use semi-trusted relay systems, the first designs for which are called mix networks. They're routing protocols that enable anonymous user communications by using a chain of nodes, known as mixes, which receive messages from multiple senders, shuffle them up in some manner, and send them back out to the next destination, where the next destination could be another node in the mix. Chaum originally came up with this idea in 1981. The idea is that for all nodes in the mix, there should exist some well-known public key. Messages are split up into blocks and then encrypted to the mix's public key.
Each mix only knows the nodes immediately before it and after it in the chain, making the network more resilient: sending a warrant to one specific node doesn't get you much. They're supposed to achieve bitwise unlinkability between the source of the message and the destination. Bitwise unlinkability is this idea that you can look at the bits that are going in and you can look at the bits going out, and they should be indistinguishable. Sorry, unlinkable. And then the message shuffling also achieves unlinkability between a user sending this message and a user sending a different message. This original design didn't actually work out so well, for several reasons. It didn't achieve the desired unlinkability property, mostly because tagging attacks are still possible. As you'll see in a second, there's a diagram on a later slide: it was using RSA in a way that we now know to be entirely unsafe. It was just applying the modular exponentiation directly to the message. And so in that way you could, for example, trick one of the mixes into signing something while it was doing the decryption operation. And you could also just repeat messages, or switch blocks around within messages, in order to do tagging attacks. This is kind of excusable: most of Chaum's work was done in the late 1970s, and we didn't know very much about RSA at the time. The attack where you trick the mix into signing something when it's doing the decryption operation could additionally be hidden from the mix, by doing some sort of blind signing scheme: blinding the message before sending it for decryption, so that when it gets decrypted they've actually blind-signed it. And now you can strip the blinding factor from the signed message and retrieve a signature on the original message, without the mix ever knowing that this happened. Also, this is a bit excusable, because two years later Chaum invented blind signing.
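The blinding trick described above is easy to demonstrate with textbook RSA and toy numbers. This sketch is illustrative only, not any real mix's protocol; the point is that raw RSA decryption and signing are the same operation, so a decryption oracle is a signing oracle:

```python
# Sketch, with toy numbers: abusing a mix's "textbook" RSA decryption
# as a blind signing oracle. Blinding hides from the mix what it
# actually signed.

p, q = 61, 53
n = p * q                               # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))       # private exponent (2753)

def mix_decrypt(c):
    """The mix's raw RSA decryption: the same operation as signing."""
    return pow(c, d, n)

m = 42                                  # message the attacker wants signed
r = 7                                   # random blinding factor, gcd(r, n) == 1
blinded = (m * pow(r, e, n)) % n        # looks like random junk to the mix
sig_blinded = mix_decrypt(blinded)      # equals m^d * r (mod n)
sig = (sig_blinded * pow(r, -1, n)) % n # strip the blinding factor

# The recovered value really is a valid RSA signature on m:
assert pow(sig, e, n) == m
```

Real systems avoid this by never applying the private key directly to attacker-controlled data, using padding (OAEP/PSS) and separate keys for decryption and signing.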
So at the time that he came up with the first idea, we didn't know about the second idea, and then later people pointed out: hey, your second idea is breaking the first one. So, the way that type two anonymous remailers work (anonymous remailers are a type of mix network applied to just sending one message at a time, usually used for email): assume that Alice wants to anonymously send message M to Bob. She uses an intermediate computer called a mix, and the public keys KB of Bob and KM of the mix. She sends the mix KM(R1, KB(R0, M)), where R0 and R1 are just random nonces. The mix collects a bunch of these emails, and this usually takes quite some time: usually there's just some configuration parameter in the software, and you wait for, say, 10 or 15 emails to come in before you send anything out. So if someone sends an email and no one else is using the system, you're going to have to wait a really long time for your email to actually go out. I remember using these systems back in the day; it would take a week or so for your mail to go somewhere. So the mix collects a bunch of mails, decrypts them, obtaining Bob's address and the encryption KB(R0, M) of the nonce and the message, and then it sends out the mails in lexicographic order to the receivers. The receiver Bob decrypts and obtains M. This achieves anonymity if encrypted messages are indistinguishable; again, you shouldn't repeat the input and output, or the blocks. It has no protection against tagging attacks and replay attacks. It also has quite high latency: the mix needs to wait long enough for enough messages to come in. For the return direction, we need a way for Bob to reply without revealing Alice's address or identity. The way that this is done is that Alice first includes a return address in her message to Bob: KM(R1, AX), where AX is her address, along with a fresh key KX.
Bob can send a response M' by sending the mix KM(R1, AX) together with KX(R0, M'). The mix receives this, recovers R1 and AX, and so knows that it needs to send it to Alice; then it takes the second half of Bob's message, re-encrypts it with R1, and sends it on to Alice. Only Alice can decrypt it, because she's the only one who knows both KX and R1. Chaum noted originally in his paper that using only one node as a mix would cause problems, because, again, any exploitation of that node will compromise the anonymity of its users. So several proposals for different forms of mixing were put forward. The first one was cascade mixing, where you use all of the nodes in the network in a specific order, and always the same order. And this should be secure even if every node is malicious except one: as long as one node is honest. Obviously, this has a bunch of problems: the larger the network is, the longer it's going to take, and everybody needs to agree on a specified order. And so the second way, which is just what we generally refer to as a mix network now, is where a user can arbitrarily pick their path through the network, sometimes including its length. Mix networks generally have some problems. They are known to not offer the same properties that cascade networks offer. In a paper by Berthold, Pfitzmann, and Standtke, they argue that in mix networks, users don't remain anonymous if only one node is honest, in the way that they do in a cascade network. This is due to relays usually being able to determine their position in the chain. So they are able to determine: I'm right next to the user sending the message, or I'm the final node where this email is going out again. And they're able to use this information in various ways to de-anonymize users.
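The basic batch-and-flush operation described a moment ago (collect enough messages, strip a layer of encryption, output everything at once in lexicographic order) can be sketched like this. The "encryption" is a deliberately fake tagged-tuple stand-in so the batching logic stays visible; a real mix would use hybrid public-key crypto and nonces:

```python
# Sketch of a Chaumian threshold mix with toy "encryption".

def enc(key, payload):
    return ("enc", key, payload)

def dec(key, ct):
    tag, k, payload = ct
    assert tag == "enc" and k == key, "wrong key"
    return payload

class ThresholdMix:
    def __init__(self, key, threshold):
        self.key = key
        self.threshold = threshold
        self.pool = []

    def submit(self, ct):
        """Queue a message; flush only once `threshold` have arrived,
        emitting them in sorted order so arrival order leaks nothing."""
        self.pool.append(ct)
        if len(self.pool) < self.threshold:
            return []                     # sender has to wait
        batch = sorted(dec(self.key, ct) for ct in self.pool)
        self.pool = []
        return batch

mix = ThresholdMix(key="K_mix", threshold=3)
# Alice wraps a message for Bob inside a layer for the mix:
to_bob = enc("K_bob", "R0+M")             # inner layer, only Bob reads it
assert mix.submit(enc("K_mix", ("bob", to_bob))) == []       # held back
assert mix.submit(enc("K_mix", ("carol", enc("K_carol", "R0'+M'")))) == []
batch = mix.submit(enc("K_mix", ("dave", enc("K_dave", "R0''+M''"))))
print(len(batch))                         # all three flushed together
```

The latency complaint in the talk is visible here: nothing leaves until the pool fills, which is also exactly what defeats the timing correlation that broke anon.penet.fi.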
Another way that relays or nodes in this network are able to gain extra information about users, in order to de-anonymize them, is if a user uses the same path through the network twice. An example of a type two anonymous remailer was Mixmaster, from 1995, and it has some differences from the previous general idea of type two remailers. Mixmaster supports only forward path anonymity, not reverse path, and you'll see why in a second. Messages aren't encrypted using just RSA, but RSA and Triple DES. And messages can be divided into smaller chunks and sent through independent paths in the network; if several chunks from the same message arrive at the same Mixmaster node, they're transparently recombined. In version two, the integrity of the RSA-encrypted header is protected by a hash, which makes tagging attacks on the header impossible. Later, this was extended in version three: the noise appended to the message is generated using a secret shared between the mix and the sender of the message, and this is included in the header. This allowed a hash of the entire message to also be put in an additional header, which protects the integrity of the entire message, not just the header, preventing tagging attacks. This also makes replies impossible to construct: the body of the response from Bob, in this case, wouldn't be known to the creator of the path, and so it isn't possible to construct the hash on the return path. So between the designs for anonymizing proxies and mixnets, there are some pros and cons. Mixnets offer no single point of failure, which is a pretty important one. Inbound and outbound traffic analysis doesn't de-anonymize users, because of the bitwise indistinguishability. And they offer generally good anonymity guarantees. However, they're really slow. They use slow RSA public-key crypto for everything, and they're high latency. They're not really meant for web traffic. They're not meant for tweeting or SSH or whatever.
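The whole-message hash described above can be sketched as follows. The field names and the use of SHA-256 are illustrative assumptions, not Mixmaster's actual packet format; the point is just that a keyed-path hash lets a relay detect a tagged (modified) body and drop it:

```python
# Sketch: a message-integrity hash defeating tagging attacks. The
# sender computes a digest over the entire body and puts it in the
# (encrypted) header; a relay that receives a modified body sees a
# mismatched digest and discards the packet.
import hashlib

def make_packet(body: bytes) -> dict:
    return {"digest": hashlib.sha256(body).hexdigest(), "body": body}

def relay_accepts(packet: dict) -> bool:
    """Recompute the digest; any bit flipped by a tagging attacker
    changes the hash, so the packet is dropped."""
    return hashlib.sha256(packet["body"]).hexdigest() == packet["digest"]

pkt = make_packet(b"hello bob")
assert relay_accepts(pkt)
pkt["body"] = b"hello bob!"          # attacker tags the message
assert not relay_accepts(pkt)
```

This also illustrates why replies become impossible, as the talk says: the path creator would have to precompute this digest over Bob's response body, which they cannot know in advance.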
They're just meant for sending emails. So anonymous proxies are low latency, and there's no overhead from public-key cryptography, but they're a single point of failure, and they don't really offer any strong anonymity guarantees. This is where the idea of onion routing came from. The idea is to use a cascade of proxies, called Tor relays or Tor nodes, and use asymmetric cryptography to set up an authenticated and encrypted channel, and then, in that channel, use faster symmetric crypto; in Tor's case, we use AES in counter mode. For the asymmetric crypto, we currently use 1024-bit RSA in some places; in some places we're already using 2048-bit. One single node, I think, is using 4096-bit RSA. In one of the last releases of Tor, we switched to using Curve25519. So the way that onion routing works is that you assume the user has public keys for all relays in the network (I'll explain in a second how you get all these public keys). And then, as I mentioned, you use the public keys to set up authenticated encrypted channels, which are used to establish symmetric key pairs with each of the relays in your path. You do this for both forward and reverse paths, so you actually have two symmetric keys per relay in your path. Tor generally uses three hops in a path. So for the entry relay, you have a backward key and a forward key, and the same for the middle relay and each of the others. If you want to anonymously send a request to theintercept.com, you prepare the packet as follows. You write the destination, theintercept.com, and you encrypt with the forward key to the exit relay. And then outside this, you encrypt it again with the forward key to the second relay, the middle node. And then you encrypt this again to the entry relay with its key. And then you send this packet to the first relay. This is also why it's called onion routing: because it has layers, like an onion. The entry relay receives the packet and removes the encryption with its forward key, and it sees the destination of the next relay.
So the entry can see the middle relay, and it can see where the user's coming from, but it doesn't know where the exit is, and it doesn't know anything about theintercept.com yet. And then the middle relay receives this packet and removes a layer of encryption with the forward symmetric key that it has with the user. And it sees that it's going to the exit relay, so the middle relay can now see which entry relay there was and which exit relay there is, but it doesn't know where the user is, and it doesn't know anything about this request. It sends it on, and then the exit relay receives the packet and removes another layer of encryption with the forward symmetric key it shares with the user. And it finally sees the destination, and so it makes a connection to theintercept.com for the user. And for the response, the same thing happens. The Intercept comes back and says, here's the reply, and it's talking to the exit node. The exit node takes this and encrypts it with its reverse key, then the middle relay adds its own layer with its reverse key, and then the entry relay adds its layer, as the cell goes backwards down the path toward the client. I'm not used to this slide, I'm sorry, and I think my notation is wrong here: these numbers should be in the reverse direction, because the cell's going back the other way. The client, at the end, removes all of the layers in turn. So the way that Alice finds out about all the nodes in the network is that, when first connecting, you connect to the directory authorities. Should I explain this right now?
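The layered forward encryption walked through above can be sketched like this. The "encryption" is a tagged-tuple stand-in rather than the per-hop AES-CTR that Tor actually uses, and the key names are invented, but the peel-one-layer-per-hop structure is the real one:

```python
# Sketch of onion routing's forward path: the client wraps the request
# in three layers; each relay peels exactly one layer and learns only
# the next hop, never the whole path.

def wrap(key, payload):
    return ("layer", key, payload)

def unwrap(key, onion):
    tag, k, payload = onion
    assert tag == "layer" and k == key
    return payload

# Forward symmetric keys the client shares with each hop (invented names):
K_entry, K_middle, K_exit = "K1f", "K2f", "K3f"

request = ("theintercept.com", "GET /")
onion = wrap(K_entry, wrap(K_middle, wrap(K_exit, request)))

at_middle = unwrap(K_entry, onion)     # entry peels: sees only "to middle"
at_exit = unwrap(K_middle, at_middle)  # middle peels: sees only "to exit"
dest = unwrap(K_exit, at_exit)         # exit peels: finally sees the destination
print(dest)
```

On the reverse path the same keys are used the other way around: each relay adds a layer as the reply travels back, and the client, who knows all three keys, strips them all.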
We're also supposed to take a break at some point. Do you guys want a break now? Should I go on explaining? OK. All right. So the way that you get the full list of Tor nodes from the directory authorities is you basically just connect, and you get this giant list back. It's called the consensus, and it includes very brief descriptors for each of the nodes in the network: the fingerprints of their public keys, their IP addresses and ports, and that's about it. And so once you have these, you have a full view of the network, and you can contact the directory mirrors, which are the directory servers. So you pick a path through the network, and you ask these mirrors for more. Sometimes you ask them for the full descriptors of the relays, particularly if you want to use the hidden service introduction subprotocol of Tor, which is a little bit too much detail, maybe. So you exchange a symmetric key with the entry node. And then, tunneled through that node, you exchange another key with the middle node: the user exchanges keys with the middle node. And then, tunneled through both of these, so it sort of looks like a telescope, you tunnel through the entry node and then through the middle node to get to the exit node, and you exchange a symmetric key with that node. And then finally, as before, the last node actually communicates with the outside internet. We should take a break now, I guess, and then I will continue in, how long is the break usually? 15? OK. And then so I'll continue in 15. OK. So it's time to restart, right? I think it's been 15 minutes. So there are still some problems with Tor. Tor offers anonymity only up to the transport layer. Anything you do with the applications that you're running through Tor, we can't automatically anonymize. So for example, if I use Tor and I send a tweet as myself and I say: hey, I'm Isis, but my real name's Erda Pfeiffer and I'm 25, and here's my IP address.
Tor is not going to be able to do anything to protect me against being an idiot and posting something like that on Twitter. That's clearly very not anonymous, if I post all my real information and dox myself. Various BitTorrent clients do exactly the same thing as part of the application data. Because it's meant to be peer-to-peer, they send out your IP address so that other BitTorrent clients can directly contact you and ask you for pieces of the file that you're downloading or hosting. For this reason, using BitTorrent with Tor is generally not a good idea if you care about your anonymity. It's also not a good idea for other reasons. The primary example is that the algorithm Bram wrote for deciding how much bandwidth to try to use is really not friendly towards the way Tor works. Basically, it tries to use as much bandwidth as possible; but then there's a bunch of lag because it's going over Tor, so it freaks out, thinks there's no network, and tries to use almost no bandwidth. It ends up bouncing back and forth between various edge cases in Bram's algorithm, either using up way too much bandwidth in the Tor network or using way too little, so your torrent goes impossibly slow. So you probably shouldn't use BitTorrent over Tor. That would actually be a really good use case for a VPN: not to anonymously use BitTorrent, but if, for example, the thing you're torrenting is blocked in your country. Another problem is that various applications, especially browsers, are really trivially identifiable. Using, for example, a normal Firefox with Tor is not the same as using Tor Browser. For Tor Browser, we actually maintain a very extensive set of patches. One is to really lock things down and make sure that absolutely everything is sent through the configured proxy, which in this case is Tor.
We have other patches to decrease the amount of user fingerprinting that can be done. For example, if you're typing into some form on a page that has JavaScript enabled, the page can get pretty accurate timing on your keystrokes, and it turns out that the cadence with which you type is really identifiable. The way we fixed this was with a patch that adds a buffer which stores all your keystrokes for a certain amount of time, I think it's 100 milliseconds, which should be imperceptible to a human. It just batches the keystrokes up and then sends them out, so that an observer doesn't get the cadence and rhythm of someone's typing. You'll still see the difference between someone who types super fast and someone who pokes at the keys with two fingers, but you won't be able to tell, for example, the specific rhythm with which someone types in their password. This turned out to be a weird patch to apply, because we then realized that some banks are already using this typing fingerprinting when you type in your banking password. So we basically fine-tuned the interval at which this buffer is sent out to the point where that stuff was still kind of working, but not to the extent that you could pick out single users. It would just look like you're using Tor Browser and then going to your bank; your bank thinks you're using a weird browser, but it can't tell the difference between all the different Tor Browser users. Another thing Tor Browser patches add: there's a way you can tell Tor to use different circuits for different things, so we have domain-level isolation.
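The keystroke-batching idea above can be sketched as a toy model. The class and method names here are invented (the real patch lives in Tor Browser's code), and the 100 ms window is the figure mentioned in the talk; the point is that only coarse, per-window timing ever leaves the browser.

```python
# Toy model of keystroke batching: events are queued and flushed on a
# fixed tick, so an observer sees one coarse timestamp per window instead
# of the real inter-key cadence.
class KeystrokeBuffer:
    def __init__(self, interval_ms=100):
        self.interval_ms = interval_ms
        self.pending = []
        self.flushed = []

    def on_key(self, char, t_ms):
        # The precise arrival time t_ms is used locally but never forwarded.
        self.pending.append(char)

    def tick(self, t_ms):
        # Everything typed in this window leaves with the same timestamp.
        if self.pending:
            self.flushed.append((t_ms, "".join(self.pending)))
            self.pending = []

buf = KeystrokeBuffer()
for char, t in [("h", 3), ("i", 47), ("!", 90)]:
    buf.on_key(char, t)
buf.tick(100)
assert buf.flushed == [(100, "hi!")]  # cadence inside the window is gone
```

Turning the interval down, as described for the banking case, trades off how much cadence survives against how identifiable individual typists are.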
So when you go to the URL bar in Tor Browser and you go to, say, facebook.com, it's going to make a new circuit. It basically makes a mapping between facebook.com and a random SOCKS username and random SOCKS password, and it sends this to Tor, which tells Tor to create a new circuit specifically for this one SOCKS username and password. The browser then remembers that all of the resources attached to this request for facebook.com, which is probably using some CDN, some image server, maybe some outside scripts, also go over that same circuit. If you open a new tab and go to Facebook again, it'll still use that same circuit. But if you open another one and go to google.com, it'll use an entirely different circuit, also for all the attached resources. So Tor Browser is doing some fancy tricks to help you not shoot yourself in the foot. If you're running other applications over Tor at the same time, well, for one, we can't guarantee the anonymity of that other application. Say you have Tor Browser running, which is running Tor, and there's another Firefox which you've also set to use this Tor as its proxy. Your Tor Browser session should still be safe and should still be unlinkable from the other Firefox's traffic, but at that point we can't make any super strong guarantees, because you're doing something really weird. So in general, just transproxying things over Tor, by which I mean you set up Tor somewhere and just shove all your traffic through it, is a bad idea, because you don't actually know what data about you those applications are sending. As for attacks on the Tor network: most attacks assume or attempt control of both the guard and the exit. These are called traffic confirmation attacks.
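The domain-to-credentials mapping just described can be sketched like this. It's a toy model of the behaviour, not Tor Browser's actual code; on Tor's side the relevant mechanism is the IsolateSOCKSAuth behaviour, where streams carrying different SOCKS5 credentials are never put on the same circuit.

```python
# Toy model of Tor Browser's per-domain stream isolation: each first-party
# domain maps to a random SOCKS5 username/password pair, and Tor puts
# streams with different credentials on different circuits.
import os

class IsolationTable:
    def __init__(self):
        self._creds = {}

    def creds_for(self, domain):
        # First visit to a domain mints fresh random credentials; later
        # requests (including attached resources, other tabs) reuse them.
        if domain not in self._creds:
            self._creds[domain] = (os.urandom(8).hex(), os.urandom(8).hex())
        return self._creds[domain]

table = IsolationTable()
fb1 = table.creds_for("facebook.com")
fb2 = table.creds_for("facebook.com")   # second tab: same circuit
goog = table.creds_for("google.com")    # different domain: different circuit
assert fb1 == fb2 and fb1 != goog
```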
Since the exit can see where you're going and the entrance can see who you are, if you control both of these, most attacks try to find some way to do some sort of signaling between the two: hey, I'm the same bad guy running this other node; you know the user, I know what they're doing, let's put them together. Either by correlating that information later, or by doing some signaling through Tor, across the middle node, to show that each end knows who the other one is. There's something that's sometimes a problem in Tor and sometimes a feature: we have what's called leaky-pipe topology, which means it's not always the case, as usually explained, that the user does this telescoping thing of setting up channels within channels and only the relay at the end talks to the internet. The leaky pipe actually allows various things in the circuit to sometimes talk to various other things in the circuit. This is how the recent CMU attack worked; we had a bug. There are two specific types of commands. One is relay, which is a command in Tor that happens at the circuit level; it basically says to someone else, please pass this along. There's another type of command called relay early. Relay early was introduced a long time ago because there was an attack where you could use these relay cells to carry extend cells, and extends are how you get this symmetric key setup. You could use them to build a circuit of effectively infinite length, almost like a cascade network: a circuit through everything in the network. Obviously, if you do this a few times, it allows you to pretty trivially DoS the network and use up all the resources on the relays.
So in order to fix this, we introduced relay early cells. Relay early cells are, for the most part, exactly the same thing as relay cells; they just say pass this along, but they have a different identifier, so you can count the number of them that you see going through. They work exactly the same way, except you say: we only allow eight of these to pass through on this one circuit. The problem was that when we fixed this infinite-circuit-length attack, we only limited the number of relay early cells in the outgoing direction, from the client towards the exit node. That's a bug, because it allows an infinite number of relay early cells to be sent in the opposite direction. Sorry, I have two computers in front of me and I forgot which one is which, because they're both black ThinkPads, so I've been hitting the key on the wrong one. Here are all the slides I was talking about while I was pushing the button on the wrong laptop. All right, so we've since fixed that bug. Because we didn't limit the number of relay early cells in the reverse direction, the CMU researchers could encode tags by alternating relay and relay early cells going in that direction. At the same time, they also ran a Sybil attack on the network by running a whole bunch of high-bandwidth nodes from their university. We have a setting in Tor where you can declare the family of your node: you can say, I'm running all of these nodes, they're all mine, and then Tor clients won't use two nodes from the same family within the same circuit. Obviously that requires operators to be honest about which nodes they run. But there are various tests we have set up to tell when a bunch of nodes are probably all being run by the same party; for example, Tor relays report their uptime.
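The relay-early accounting above can be sketched as a toy model. The outbound cap of eight is the figure from the talk; the class and method names are invented. The essential point of the fix is that a client should never see a relay early cell travelling backwards at all, so a patched client tears the circuit down the moment one arrives.

```python
# Toy model of RELAY_EARLY accounting. Outbound cells were already capped
# (killing the infinite-circuit attack); the CMU bug was that inbound
# RELAY_EARLY cells were not policed, so alternating relay / relay-early
# cell types inbound could encode a covert tag.
MAX_RELAY_EARLY = 8

class Circuit:
    def __init__(self):
        self.outbound_early = 0
        self.alive = True

    def send_outbound(self, cmd):
        if cmd == "relay_early":
            self.outbound_early += 1
            if self.outbound_early > MAX_RELAY_EARLY:
                self.alive = False  # pre-existing outbound cap

    def recv_inbound(self, cmd):
        # The fix: any inbound relay-early cell kills the circuit.
        if cmd == "relay_early":
            self.alive = False

c = Circuit()
for bit in [1, 0, 1, 1]:  # attacker's tag, encoded as cell-type alternation
    c.recv_inbound("relay_early" if bit else "relay")
assert not c.alive  # a patched client drops the tagged circuit
```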
If you're running a whole bunch of nodes and they all have exactly, to the second, the same uptime, and they're all on the same subnet, they're probably all being run by the same person. So there are various ways we can tell. They were running a Sybil attack in order to try to get users to end up using one of their relays as the entrance and another of their relays as the exit at the same time, so that they could do this tagging trick of alternating cell types and encoding messages back and forth between the exit and the guard, which they also controlled. Other problems with Tor are correlation attacks. Tor aims to be low latency, meaning you should be able to use it for SSH, web traffic, chat, et cetera, so we don't add any timing delays at each node. There's certain timing that happens inherently: relays, when they're sending out packets, basically have a ring that they go around, with connections coming off of it, and they take one packet at a time, going around in circles. So the more traffic coming in from the other connections, the more delay there might be in getting something queued, but effectively there's no real randomization, reordering, or deliberate delaying of packets. This means that timing and correlation attacks are still theoretically possible. We've never seen any particularly accurate ones; as I said, it's still difficult. Also, the distances between the relays, and the fact that the client is choosing them, mean that you might travel from Russia to the US to Chile and then go to a website in Iceland. Because of all the timing variation that inherently happens as cells take these crazy paths all over the world, it becomes a little bit difficult to actually pull off timing correlation attacks.
In general, the way it would work is, if you think of Tor as a whole as one giant proxy, you would look at the timing of requests going in and out. For example, I could use my Tor Browser to visit some malicious website that does something weird in the timing of the packets it sends back to me, a recognizable enough pattern that it would survive going through the Tor network. Tor also currently doesn't really do any padding, though we have some patches coming up. NetFlow records are taken in at large routers, usually at the ISP level and in the backbone. NetFlow basically logs, at a very high level, what is connecting to what: just this IP to that IP, with a timestamp. NetFlow in its default configuration, we figured out, has some timeout; I think it's two minutes or something like that. Since Tor is only changing circuits roughly every ten minutes, this means you would usually see each person's circuit several times, and you would get information about how long they were using that circuit: it changes roughly every ten minutes, but if you continue using the circuit, you keep the same one. Because these records were being taken every two minutes, you would get some extra information: oh, they requested one site, used the circuit, and then stopped using it and did nothing else; or they made the circuit, started doing something with it, and kept going, say an SSH connection held open for an hour. Because of this, we decided to start padding at the TLS connection layer, not at the circuit layer. There are also TLS connections outside of each of these telescoped circuits; it's just OpenSSL, somewhat standard TLS connections.
So we started doing padding at the TLS layer in order to decrease the resolution of the timing correlation attacks in NetFlow records. These patches are going into the next version, 0.2.8, which we should have just released. There are also plans to implement adaptive padding techniques. There's a very amusingly titled paper called WTF-PAD, where WTF stands for website traffic fingerprinting. They kept wanting to change it to something more academic-sounding, but since there are some Tor people on the paper, we kept changing it back, because the title is amusing. Claudia Diaz and one of her students devised a method for using adaptive padding techniques to pad Tor's circuit-level traffic. Going back to the earlier Wikipedia example, of being able to tell how many images are being requested: this would completely remove anyone's ability to do that, because there would be extra padding on the connection. They also came up with a pretty clever method for reducing the amount of overhead the padding requires, by sending two histograms between the client and the relay it's going to pad to: one histogram for the delay, and one for the size of the padding, and/or whether the padding occurs at all. That paper recently came out; I think it's still in submission somewhere, but we've started working on patches for it, and that should also be included quite soon. As far as whether or not Tor is broken: there was a slide leaked by Snowden called Tor Stinks, which is pretty amazing, and we all had a really great laugh about it. It's not so scary. It says: we will never be able to de-anonymize all Tor users all the time. Like, okay, how about some of them? It does say they can de-anonymize a very small fraction of Tor users. However: no success de-anonymizing a user in response to a TOPI request, on demand.
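The histogram mechanism just described can be sketched as follows. The bin edges and counts are invented for illustration, and the real WTF-PAD design has more machinery (burst versus gap states, token depletion) than this toy shows; the core idea is just that padding delays are drawn from a shared histogram so padded traffic mimics typical inter-packet timing.

```python
# Sketch of histogram-driven adaptive padding: when a gap in real traffic
# appears, a padding delay is sampled from a histogram the client and
# relay agreed on, so the padded stream's timing looks like normal timing.
import random

def sample_from_histogram(bins, rng):
    """bins: list of ((low_ms, high_ms), count). Pick a bin weighted by
    its count, then a uniform delay inside that bin."""
    total = sum(count for _, count in bins)
    r = rng.uniform(0, total)
    acc = 0.0
    for (low, high), count in bins:
        acc += count
        if r <= acc:
            return rng.uniform(low, high)
    return bins[-1][0][1]  # float-rounding fallback: top of the last bin

# Invented example histogram: most gaps are short, a few are long.
delay_hist = [((0, 10), 50), ((10, 50), 30), ((50, 200), 20)]
rng = random.Random(42)
delays = [sample_from_histogram(delay_hist, rng) for _ in range(100)]
assert all(0 <= d <= 200 for d in delays)
```

Shipping the histograms themselves, rather than a padding schedule, is what keeps the overhead negotiation cheap.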
The way that we read this is that the NSA is probably running some relays, probably doing the same Sybil attack I mentioned CMU doing, and basically just waiting and hoping that eventually some user uses both an entry node run by the NSA and an exit node run by the NSA, which is quite unfortunate for those users. We don't know of any way so far to defend against this. However, it is good that they can't just select people they don't like and try to de-anonymize them. The client is selecting its path through the network, so if you're unlucky enough to select these two NSA nodes, wherever they are, that sucks; we don't know what to do about it. Correlation attacks: if the time span becomes bigger, it helps the attacker, right? And all these correlation attacks look at the entry point and exit point? Usually, yes. Right now, your entry node, which we also sometimes call a guard node, you usually choose just one and you stick with it for a very long time. The idea there, which is not my idea, is that rather than choosing a random one each time, which I also don't think you should do: if you choose a random one each time, some percentage of the time, for sure, you will accidentally choose the NSA one. So the idea is you'd better just roll the dice once and hope that you roll correctly that one time, because you're going to stick with that dice roll for, I think the current default is six months or something. As I said, I don't exactly agree with the reasoning there. And you're still changing exit nodes, so that probably won't be an issue.
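The dice-rolling argument above can be made concrete with a little arithmetic. The 5% adversary share and the circuit count are made-up numbers for illustration; the structure of the comparison is what matters.

```python
# Back-of-the-envelope for guard pinning: if an adversary runs fraction p
# of guard capacity, picking a fresh entry for every circuit compromises
# you almost surely over many circuits, while sticking with one guard
# caps the lifetime risk at p.
p = 0.05          # adversary's share of guard bandwidth (assumed)
circuits = 1000   # circuits built over the guard's lifetime (assumed)

# Fresh random entry per circuit: P(at least one circuit hits a bad entry).
p_rotating = 1 - (1 - p) ** circuits

# One long-lived guard: a single dice roll, risk stays at p.
p_pinned = p

assert p_rotating > 0.99   # rotating entries: near-certain exposure
assert p_pinned == 0.05    # pinned guard: exposure capped at p
```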
Right, and that's the idea: it should be quite hard for them to pull off this attack, because, one, it requires you rolling the dice wrong when you first connect to the network, and then also, with some probably tiny probability, accidentally choosing an exit node that's bad, or that's in collusion with your guard node. As I said, our assumption is that they meant they're able to do the same thing CMU did: running a Sybil attack and controlling both ends. Not so secretly: they can take that slide and do whatever they want with it; we also think the NSA stinks. So, Tor is also used for censorship circumvention, which is part of what I work on. This is sort of an accident. Since an observer can't see what website a user is requesting, this lets users anonymously request things, and it lets you get past firewalls, for example the Great Firewall of China, which is looking for connections to certain sites and IPs it has deemed inappropriate. This has been used pretty famously in China, Iran, Kazakhstan, and Ethiopia, although Ethiopia for a while blocked all TLS connections entirely, so it also blocked Tor. We're not sure if they blocked Tor on purpose or if they just hate TLS. I mean, it's fair to hate TLS; it's not a very nice protocol, I think. But hating encryption altogether is probably bad. Turkey has been blocking mostly during protests, which is kind of a smarter thing to do from a political standpoint: if you block during a protest, your citizens are probably already pissed off at you anyway, and that's the time when you want to enact a policy where they're not able to communicate with each other to coalesce their anger and actually do something about it. At other times, you leave the network untouched, so they're more inclined to be less angry at you. Tor can also be used to circumvent country filters.
For example, I live in Germany, and if you don't use Tor, it's frequently the case that you try to go to some YouTube video your friend told you about, you're sitting in a cafe, like, oh yeah, I'll totally watch this, and instead you get this big, really ugly notice about GEMA, with some stuff in German, and you're like, no, I wanted my video. If you use Tor, there are a lot of exit nodes in Germany, so you might still be unfortunate and get this ugly GEMA thing on YouTube, but you can pretty trivially get around this GeoIP-based blocking. Censorship of Tor itself turns out to also be quite easy. Since any client can obtain the list of Tor nodes from the directory authorities, a censor can just take that list, pull all the IP addresses out of it, and block all of them. Another way it's been blocked is simply to block the directory authorities themselves, so that clients can't get the list of relays in the network. The solution we originally came up with was Tor bridges, which are just hidden entrances to the network. These turn out to still be quite blockable. For example, the first way in which they were blocked was by the cipher suite list we were sending, which was a rather strange one that, it turned out, almost nothing else on the internet was sending. We've since changed that, but even at the time there were various tricks you could do to get around this. For example, there's a way the TLS cipher suite list can end up split across two separate packets, and if you did this, the censor's detection wasn't very stateful, so it couldn't reassemble the packets and determine that, yes, this is actually the really unique Tor cipher suite list. There was a tool called brdgrd, for bridge guard, that did this for a while. But then, of course, we changed the cipher suite, and they changed their manner of detection. Bridge IP addresses need to be distributed out of band.
Sometimes other connection information too, like the fingerprint and any additional keys. This ups the game to the point where the adversary needs to be doing some sort of stateful DPI, or actively probing bridges. For active probing, you could basically just run a Tor client, which is what the Great Firewall does. They're obviously not running our Tor client; they're running their own implementation, which does some curious things and speaks Tor a little bit differently, and it's kind of fun to mess around with it sometimes. The biggest difference is that it will, for example, start doing a Tor handshake up to a certain step and then just stop, probably to save resources on their firewall, I would assume. Bridges are distributed via a very terrible centralized system called BridgeDB, which is one of the things I work on. You can currently get bridges by visiting a website or emailing an address. It's not my design; I'm currently redesigning it to make it better and more censorship-resistant, because obviously, as it stands, you could just block Tor's website and, okay, none of this works anymore. There are better ways to do this. At some point, while all the bridges were getting blocked and we were changing the cipher suite list and trying to make Tor more indistinguishable, we started to have a discussion about how to make Tor look different. One of the first ideas was: we'll just try to look like Apache. We'll try to make our TLS connections look exactly like a normal Apache server on the internet. This is totally not possible. There's no way a normal Tor connection can ever be made to look like a user interacting with a real Apache server; even if we changed the encryption to exactly match Apache's usage of TLS, there would still be distinguishers. That was argued about for quite a long time, about a year, and then a student came along, my colleague George.
There's a Google Summer of Code program which we sometimes enroll in, and intend to enroll in again, if any of you have projects you would like to work on for or with Tor. George was one of our GSoC students, and he came up with this idea of creating pluggable obfuscating proxies which could wrap other protocols and obfuscate the distinguishers of those protocols. The very first one was called obfs2. I don't know why we started with the number two, but we did. It basically just made a new encryption key and re-encrypted all the traffic coming through it. Before the Tor traffic went out, it would go through this additional proxy, connected over SOCKS; everything would get re-encrypted and then go out over the network. At the other end, on the bridge, there's again an obfs2 proxy listening, which would decrypt everything and pass the underlying Tor traffic along to Tor. George's original idea was that we should try to make transports which could make protocols look like other protocols: for example, making Tor traffic look like Skype, or HTTP, or SSH traffic. That turns out to be rather hard to do. Protocol mimicking is quite difficult, and usually there will still be distinguishers. obfs2 also did a kind of silly thing, since it was a proof of concept: it just sent the key the traffic was going to be encrypted with in the clear, across to the bridge. It was just like, yolo, here's a key, have fun decrypting on the other side. So the purpose wasn't actually to add any extra security through the encryption; it was just to provide the bitwise unlinkability that was mentioned with Chaum's original mix network. That actually makes pluggable transports pretty fun to work on, because you're not using the encryption for security.
You can play with all kinds of weird crypto and just kind of pile things together. As long as it's authenticated, as long as it's encrypted, and as long as BridgeDB does the hard work of making sure the things needed for authentication actually get to the clients, you can basically do whatever you want. So that would also be a project for students, if any of you want to enroll in GSoC or want to start helping out with Tor. I think that's all. If any of you have questions, I can answer them. I can also attempt to get my new Qubes laptop to play this Juice Rap News video, which is probably more entertaining than questions. I will attempt to do that. I'm pretty sure it's not going to work, because I forgot to transfer the file to that computer. Sorry. You guys should probably all go watch that video; it's really good. They also had a live performance last week. It was great. That's all. Questions? How do we resolve DNS? The way that DNS requests work is that you construct a circuit, and then you basically ask your exit node to resolve the DNS for you. It sends the response back, and then you use that same circuit for sending any requests to that resolved IP address. However, optionally, depending on how it's configured, Tor can also cache DNS. So if you're able to hit Tor's DNS cache and you already have the IP, it'll just use a brand new circuit that hasn't been used for any other DNS resolution. We currently only support some record types, which is a little bit annoying: A, AAAA... MX, no, we don't support MX, I think. SRV records. I think that's it. We would like to add more types. Just for resolves, or for the cache? No, I had an overture.
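The resolve-through-the-exit behaviour uses Tor's SOCKS5 extension, command 0xF0 (RESOLVE), documented in Tor's socks-extensions spec. Here's a sketch that just builds the request bytes; actually sending them would need a running Tor listening on its SOCKS port (9050 by default).

```python
# Build a Tor SOCKS5 RESOLVE request: instead of CONNECT (0x01), Tor's
# extension command 0xF0 asks the exit node to resolve a hostname and
# return the address, rather than opening a connection to it.
import struct

SOCKS5_VER = 0x05
CMD_RESOLVE = 0xF0       # Tor extension, not standard SOCKS5
ATYP_DOMAIN = 0x03       # address type: domain name

def build_resolve_request(hostname: str) -> bytes:
    name = hostname.encode("ascii")
    header = struct.pack("!BBBB", SOCKS5_VER, CMD_RESOLVE, 0x00, ATYP_DOMAIN)
    # Domain names are length-prefixed; the trailing port field is unused.
    return header + bytes([len(name)]) + name + struct.pack("!H", 0)

req = build_resolve_request("theintercept.com")
assert req[0] == 0x05 and req[1] == 0xF0
assert req[5:5 + req[4]] == b"theintercept.com"
```

The reply comes back in the usual SOCKS5 response format, with the resolved address in the bind-address field; and because the request rides a circuit, the resolution happens at the exit, never at your local resolver.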