Okay, so I'm from TU Darmstadt and this is joint work with Martijn Stam from the University of Bristol. This talk relates to recent work where we put forward a new security notion for onion encryption, and we used it to analyze a recent proposal for a new onion encryption scheme to be used in Tor. I'll start this talk with an overview of how Tor works. Then I'll describe a class of attacks called tagging attacks, then Proposal 261, which is a proposal that tries to prevent these tagging attacks. And then I'll give a very high-level view of our contribution, which is the security definition that we propose and some results from our analysis. Okay, so as most of you know, Tor is a tool for maintaining anonymity online. The way it works is that there is a Tor network composed of several nodes called onion routers, and a user who wants to use Tor will run a piece of software called an onion proxy. Let's say this user wants to connect to XYZ.com. Then the following sequence of events will occur, which invokes four different sub-protocols, the four main components of Tor. The first one is the link protocol, which is essentially TLS, and this is used to secure the point-to-point communication between the onion routers; all communication occurs over TLS. Then the first thing the onion proxy will do is pick a subset of these nodes. The default number is three, and here they're marked in blue. Through these nodes it will establish a circuit. To that end, it makes a TLS connection to the first of these nodes and then invokes the circuit extend protocol, which runs a series of key exchanges with each node in the circuit. Now that the onion proxy shares a symmetric key with each of the other nodes in the circuit, onion encryption can be used to protect all the traffic flowing through the circuit, and this part is handled by the relay protocol.
Then on top of the relay protocol is the stream protocol, which can be used, for example, by the onion proxy to instruct the last node in the circuit to connect to XYZ.com, say an instruction to establish an HTTPS connection to XYZ.com. The stream protocol also serves to multiplex multiple streams over the circuit. Our focus, however, will be on onion encryption, and hence we will focus only on the relay protocol. So here's how Tor processes data. All data has to be encapsulated into cells of a fixed size, which is 514 bytes, and a cell is composed of a cell header and a cell payload. The cell header includes a circuit identifier, which indicates which circuit the cell is associated to, and the circuit identifier is different on every edge of the circuit. Then there's a single-byte command field, which indicates the type of data being carried in the payload. For the purposes of the relay protocol, this will always be set to either relay or relay early. Then the data in the payload, as you can see from the colors here, is also prepended with a number of fields. The most relevant in our case are the recognized field, which is two bytes long and is set to all zeros, and the digest field, which is essentially a seeded running hash over the whole data, computed using SHA-1 and then truncated to four bytes. Once the encoding is done, multiple layers of encryption are applied to the payload using AES in counter mode, one layer for each node in the circuit. And once the encryption is done, the cell is ready to be transmitted over the circuit, where every node in the circuit will strip one layer of encryption, replace the circuit identifier, and forward it to the next node in the circuit. Also note here that only the last node in the circuit can verify the digest, and as such, integrity is only provided end-to-end. Okay, so that's all about Tor.
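To make the layered encryption concrete, here is a minimal sketch in Python. It is not Tor's implementation: the key names are made up, and a hash-based keystream stands in for AES in counter mode (any stream cipher illustrates the layering equally well, since each layer is just an XOR with a per-node keystream).

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy stand-in for AES-CTR: expand the key into a pseudorandom stream.
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def onion_encrypt(keys, payload):
    # The onion proxy applies one layer of encryption per node in the circuit.
    for k in reversed(keys):
        payload = xor(payload, keystream(k, len(payload)))
    return payload

def peel_layer(key, payload):
    # Each onion router strips exactly one layer and forwards the rest.
    return xor(payload, keystream(key, len(payload)))

keys = [b"guard", b"middle", b"exit"]   # hypothetical per-node circuit keys
cell = onion_encrypt(keys, b"hello XYZ.com")
for k in keys:                          # the cell transits the circuit
    cell = peel_layer(k, cell)
print(cell)                             # b'hello XYZ.com'
```

Only after the last layer is peeled does the exit node see the encoded plaintext, which is why the digest can only be checked end-to-end.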
Let me now describe an attack that can be used to break anonymity in Tor. So we saw that our user has now established a circuit, but let's assume that we have an adversary who controls a subset of the onion routers in the network, marked in red here. In this case, the adversary happens to control the first and last onion routers in the circuit, OR1 and OR3. And this allows him to mount the following attack. Essentially, at OR1 he flips a bit in the cell and forwards it on, and then at OR3 he flips that bit back and checks whether decryption succeeds. Now, because of the well-known malleability of counter mode encryption, these two bit flips will cancel each other out. The plaintext in the cell payload will be restored to its original value, the integrity check will go through, and decryption will succeed. At that point, the adversary has confirmed that the two edges incident at OR1 and OR3 belong to the same circuit. And because OR1 can see who the user is, and OR3 can see what the destination is, the user has been completely de-anonymized. One thing to note here is the similarity that this attack shares with another type of attack called a traffic correlation attack, where essentially the same effect is achieved by matching traffic patterns in a passive way, as opposed to this active technique, which involves manipulating cells by flipping bits in them. Now, what's interesting about tagging attacks is how our view of these attacks has changed over the years. Tagging attacks have been known at least since 2004, and in particular, they were known to the Tor designers, as you can see from this quote from the original Tor design paper. Now, Tor was designed to be a low-latency system, which also means that it will inevitably be vulnerable to traffic correlation attacks. This was a compromise that the Tor designers made, trading security for performance, but it was a conscious decision.
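The bit-flip cancellation can be demonstrated end to end with the same toy stream-cipher model as before (hash-based keystream standing in for AES-CTR, made-up key names): the honest middle router processes the tagged cell without noticing anything, and the two flips cancel at the exit.

```python
import hashlib

def ks(key: bytes, n: int) -> bytes:
    # Toy keystream standing in for AES-CTR (illustration only).
    out, i = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + bytes([i])).digest()
        i += 1
    return out[:n]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

keys = [b"guard", b"middle", b"exit"]    # hypothetical circuit keys
payload = b"GET /index.html"
cell = payload
for k in reversed(keys):                 # onion-encrypt: one layer per node
    cell = xor(cell, ks(k, len(cell)))

# OR1 (malicious): peel its own layer, then flip one bit to tag the cell.
cell = xor(cell, ks(keys[0], len(cell)))
tag = b"\x01" + b"\x00" * (len(cell) - 1)
cell = xor(cell, tag)

# OR2 (honest): peels its layer as usual and cannot notice the tag.
cell = xor(cell, ks(keys[1], len(cell)))

# OR3 (malicious): flip the same bit back, then peel the final layer.
cell = xor(cell, tag)
cell = xor(cell, ks(keys[2], len(cell)))

assert cell == payload   # the two flips cancel out: circuit confirmed
```

If OR3 were honest instead, the tag would survive into the plaintext, the digest check would fail, and the circuit would be torn down, which is exactly the detection question raised at the end of the talk.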
Now, because of the similarity between tagging attacks and traffic correlation attacks, the designers also deemed it pointless to use heavyweight cryptography to protect against tagging attacks when the same effect could be achieved through traffic correlation attacks. So they also took the conscious decision not to protect against tagging attacks. Then in 2009, tagging attacks were rediscovered by Fu and Ling in a paper presented at Black Hat. But this did not have much influence on the Tor project, since they knew about them already. Interestingly enough, what changed the mind of the Tor project was an anonymous post by someone calling themselves The 23rd Raccoon on the Tor developer mailing list. This post was a probabilistic analysis arguing that tagging attacks can actually be more dangerous than traffic correlation attacks. The same person had posted on the mailing list back in 2008, making a very nice point about the base rate fallacy in the context of traffic correlation attacks. And although that post did not make any mention of tagging attacks, it implicitly makes an argument for the higher severity of tagging attacks when compared to traffic correlation attacks. So then in 2012, the Tor project made a U-turn: they decided to revise the relay protocol and aim to protect against tagging attacks. Now let me give you some idea of the arguments made by The 23rd Raccoon. Consider the following example: a network with 10,000 concurrent circuits and a traffic correlation adversary who controls 30% of the nodes in the network. Now, due to noise, a correlation detector will inevitably exhibit some false positives, meaning that two edges that belong to different circuits will be declared to belong to the same circuit when in fact they don't. But this will only happen with small probability, so let's assume this happens with 0.5% probability.
Now, the question I want to ask you is: if the detector detects a match between two edges, what is the probability that these two edges truly belong to the same circuit? The answer turns out to be only 2%. This may seem counterintuitive, and that is because of a well-known effect in statistics called the base rate fallacy, but it's a very simple effect. Although there is only a small chance of error, 0.5%, this test gets run many times, once for every pair of possible edges. So these little errors accumulate and add up to a big chance of error overall. In this case, only one out of every 50 matches is actually correct: for every edge, there are 50 possible candidates, and for every destination, there are 50 possible users. So you can see that traffic correlation attacks are not as bad as originally thought for a low-latency system like Tor. What's interesting is that this effect becomes more pronounced as the network size increases. Tagging attacks, on the other hand, have a zero chance of false positives, so they are immune to this effect and hence scale much better from the adversary's perspective. This is the main argument made in the 2008 post. The 2012 post builds on these ideas and also argues that tagging attacks consume fewer resources from the point of view of the adversary; you can check the post for more details on that. Okay, so let's now look at ways in which we can protect against tagging attacks. The most concrete proposal that has been made by the Tor project is Proposal 261, which was drafted by Nick Mathewson. Here are the main ideas. As I said already, what enables tagging attacks is the malleability of counter mode, so the obvious solution is to try to stop this malleability. And the first thing that comes to mind in order to do this is to apply a MAC tag at every layer of encryption.
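The 2% figure can be reproduced with a few lines of arithmetic. This is a simplified model of the argument: assume the detector always finds the one true match for a given entry edge, and falsely matches each of the other edges independently with the stated false-positive rate.

```python
# Numbers from the example: 10,000 concurrent circuits, 0.5% false-positive rate.
circuits = 10_000
fpr = 0.005

# For a fixed entry edge there is one truly matching exit edge; each of the
# other (circuits - 1) edges triggers a false match with probability fpr.
expected_false = (circuits - 1) * fpr        # about 50 false positives
p_correct = 1.0 / (1.0 + expected_false)     # P(same circuit | match)

print(f"{p_correct:.1%}")                    # 2.0%
```

Doubling the network size roughly halves `p_correct`, which is the scaling observation: the base rate fallacy hurts the correlation adversary more as the network grows, while a zero-false-positive tagging adversary is unaffected.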
And this may help to prevent one problem, but it opens another one, because it leaks some unwanted information. Essentially, the number of MAC tags present in the cell indicates the relative position of the onion router within the circuit, and it also leaks the overall length of the circuit. So ideally we would like to avoid this. It could be avoided by appropriate padding, ensuring that the cell size is constant throughout the lifetime of the cell, but this has a significant impact on the bandwidth efficiency of the overall system. A more bandwidth-conservative approach is to instead use a tweakable wide block cipher. What this means is that the whole encryption will now behave like a random permutation, or rather a family of random permutations. There are a number of possible instantiations being considered, as far as I know. One of them is AEZ by Hoang, Krovetz and Rogaway. Another one is still in progress, I think, and is called HHFHFH, sometimes abbreviated to Hufflepuff; this is a design by Bernstein. And the other one is Farfalle, which is by the Keccak team and is based on sponges. I think all of these are still in play to be decided for this proposal. So what changes with Proposal 261? In terms of encoding the data, there's not much change; the only difference is that now the digest will be set to all zeros instead. Then encryption is replaced with the tweakable wide block cipher, which takes an additional input, the tweak. Now, there's a lot going on in how these tweaks are computed, but for the purpose of this talk, it suffices to say that each layer uses a different tweak, and the tweaks are updated with every cell. For us, it will be relevant that the command field in the cell header is included in this tweak, and we'll see later why.
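As a rough illustration of the three properties just mentioned (per-layer tweaks, per-cell tweaks, header binding), here is a hypothetical tweak derivation. This is not Proposal 261's actual tweak schedule, just a sketch showing what it means for the tweak to bind the layer, the cell position, and the cell header.

```python
import hashlib

def layer_tweak(layer_key: bytes, cell_index: int, cell_header: bytes) -> bytes:
    # Hypothetical tweak derivation (NOT Proposal 261's actual schedule):
    # a distinct tweak per layer (layer_key), per cell (cell_index),
    # with the cell header, including the command byte, bound into it.
    return hashlib.sha256(
        b"tweak" + layer_key + cell_index.to_bytes(8, "big") + cell_header
    ).digest()

# Because each tweak selects an independent-looking permutation, changing
# any of the three inputs changes the tweak, and hence the permutation.
t_relay = layer_tweak(b"guard", 0, b"\x03")   # some command byte (illustrative)
t_other = layer_tweak(b"guard", 0, b"\x04")   # a different command byte
assert t_relay != t_other
```

The intent of binding the command byte into the tweak is that modifying the header changes which permutation is applied, so the cell deciphers to garbage. As we will see, this helps but does not fully close the header channel.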
Now, end-to-end integrity is achieved via an encode-then-encipher approach, which means that upon decryption, the last node in the circuit will verify the recognized field, the digest field, and the seven most significant bits of the length field, and check that these are all set to zero. And this works because the tweakable wide block cipher behaves like a random permutation. Okay, so that's the main idea in the proposal. Now let's try to be a bit more rigorous and see whether this really works, and whether we can prove something about the security of the scheme. There have been prior works which looked at onion encryption and at formalizing security for onion encryption. The first was the work of Camenisch and Lysyanskaya from 2005. However, this focused mostly on the setting of mixnets, and these are quite different from the way Tor works. Essentially, cells in mixnets are routed individually and not along predefined circuits, onion routers are stateless, and the onion encryption is public-key rather than symmetric-key. Roughly speaking, Tor is to mixnets what a secure channel is to public-key encryption, and we know that the latter two are quite different. The other notable work is that of Backes et al. from 2012. This one did focus on the case of Tor, and they even covered not just the onion encryption but also the circuit extend protocol. But this work had a number of shortcomings, and the most prominent of these is that its security notion does not guarantee security against tagging attacks. On the contrary, in this work that vulnerability was turned into a feature called predictable malleability, which actually guarantees that a scheme will be vulnerable to this attack. So as you can see, there was no adequate security model for onion encryption as used in Tor, and this is what we try to fix.
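The exit node's validity check can be sketched as follows. The field offsets here are illustrative, not the exact cell layout, but the checked bits are the ones named in the talk: the 2-byte recognized field, the 4-byte digest field, and the top 7 bits of a 16-bit length field.

```python
def looks_valid(deciphered: bytes) -> bool:
    # Illustrative offsets, not the exact relay cell layout.
    recognized = deciphered[0:2]                      # must be all zeros
    digest     = deciphered[2:6]                      # zeroed at encoding time
    length     = int.from_bytes(deciphered[6:8], "big")
    # The 7 most significant bits of the 16-bit length must be zero,
    # i.e. the length fits in 9 bits (enough for a 514-byte cell).
    return (recognized == b"\x00\x00"
            and digest == bytes(4)
            and (length >> 9) == 0)

# A well-formed plaintext passes the check.
print(looks_valid(bytes(6) + (498).to_bytes(2, "big") + b"data"))   # True
```

Because the wide block cipher behaves like a random permutation, a manipulated ciphertext deciphers to essentially uniform bytes, so a forgery passes this check only with probability about 2**-55 (16 + 32 + 7 fixed zero bits), which is what makes encode-then-encipher sound here.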
So let's step back for a minute and ask ourselves: what should we expect? What kind of security should we expect from an onion encryption scheme? It is natural to expect the usual security offered by a secure channel, essentially confidentiality, integrity, and protection against replays and reordering. But the main goal of Tor is anonymity. However, anonymity in Tor is achieved through a combination of factors, some cryptographic and some not. The cryptographic component is essentially the onion encryption, but anonymity also relies on the size of the network and the traffic load in the network. As such, anonymity is not the right notion for onion encryption. So our goal here was to identify what security the cryptographic component, that is, the onion encryption, can contribute towards anonymity, assuming all other factors to be ideal. And we contend that the answer to this question is circuit hiding, which is the notion that we put forward. At a very intuitive level, circuit hiding says the following: an adversary should not be able to learn any new information about the circuit topology of the network beyond what is inevitably leaked through node corruptions. And this should hold even when the adversary can choose the messages that get encrypted and is able to reorder, inject, and manipulate cells on the network. Note how tagging attacks are a special case of this broader class of attacks: in a tagging attack, an adversary is able to confirm a circuit, thereby learning topological information about the circuits, and he does so by manipulating cells in the network. Okay, so here's how we define circuit hiding more formally. We define it as a game played by an adversary. The adversary starts by specifying a set of nodes, and it then indicates which of these nodes are corrupted, that is, controlled by him. Then, for this set of nodes, it will specify two possible networks, each consisting of a set of circuits.
As you can see from the colors, the circuits in the two networks can vary in different ways, both in their structure, how they connect, and their length; there's a lot of variation here. But they must satisfy the following requirement: the interface to the corrupted nodes, that is, the edges entering the corrupted nodes, the edges going out of the corrupted nodes, and the way these edges connect to each other, must be identical in both worlds. This is what we called before the inevitable leakage: the information that is inevitably leaked to the adversary simply because he controls those nodes. Because it's inevitable, we require it to be the same in both networks. Then, if the two networks satisfy this condition, a network is chosen at random, and the adversary gets to interact with it via the corrupted nodes and try to determine which network it is interacting with. A scheme is deemed secure if no adversary can distinguish between the two. This is a very high-level view; there's much more going on in the actual definition of circuit hiding, but the main idea is this. Okay, so we used this to analyze Proposal 261, and unfortunately we found out that Proposal 261 is not circuit hiding. The reason is that there's a simple attack that can be mounted on Proposal 261, which relates to the command field in the cell header. Remember that the command field can take two values, either relay or relay early, and this can be used to tag the cell, not in the ciphertext, but in the header. And it's a real concern, because a similar vulnerability was exploited in the 2014 incident involving Carnegie Mellon, where there was an attack against Tor's onion services, which essentially resulted in the takedown of the website Silk Road.
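The "identical interface" condition can be made concrete with a toy admissibility check. The representation and helper names here are our own, not the paper's formalism: a network is a list of circuits, a circuit is a list of node names, and the leakage collects, per circuit, each maximal run of corrupted nodes together with its honest neighbours (visible over the point-to-point TLS links), since runs separated by honest nodes are not linkable.

```python
def leakage(network, corrupted):
    # Inevitable leakage: for every maximal run of corrupted nodes in a
    # circuit, the run itself plus the honest neighbour on either side.
    runs = []
    for circuit in network:
        i = 0
        while i < len(circuit):
            if circuit[i] in corrupted:
                j = i
                while j < len(circuit) and circuit[j] in corrupted:
                    j += 1
                prev = circuit[i - 1] if i > 0 else None
                nxt = circuit[j] if j < len(circuit) else None
                runs.append((prev, tuple(circuit[i:j]), nxt))
                i = j
            else:
                i += 1
    return sorted(runs, key=repr)

def admissible(net0, net1, corrupted):
    # The adversary's two candidate networks must leak identically.
    return leakage(net0, corrupted) == leakage(net1, corrupted)

corrupted = {"R"}
net0 = [["U1", "R", "W"], ["U2", "H1", "W"]]   # honest middle node H1
net1 = [["U1", "R", "W"], ["U2", "H2", "W"]]   # differs only behind honest nodes
print(admissible(net0, net1, corrupted))       # True
```

A circuit-hiding scheme must then make any two admissible networks indistinguishable; a tagging attack would distinguish networks that differ in which entry and exit edges share a circuit, even when this check passes.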
And recall that the command field was included in the tweak that is input to the wide block cipher. This seems to be an attempt to prevent exactly this, and while it helps, it does not prevent the attack. The good news, however, is that there are a number of practical factors that limit the exploitability of this attack. The relay early cell type is needed because it enables another mechanism in Tor which is used to limit the maximum length of a circuit, and as such, there is no easy way to fix this without changing that mechanism. So it may make sense in practice to accept this issue and rely on the other mitigating factors rather than try to eliminate it completely, as long as we know that this is the only issue in Proposal 261. And the good news is that this is indeed the case. We looked at a variant of Proposal 261 where the command field is fixed to a single value, relay, and then we were able to show that it is circuit hiding. So this serves to show that the overall design in Proposal 261 is sound and is able to protect against tagging attacks, and for the simple attack on the header, we can rely on these other mitigating factors. Okay, so some concluding remarks. If you want to know more about this, look out on ePrint for our paper, which should be coming out soon; this is the title to look for. I think there's plenty more work to be done on formal analysis of Tor. Unfortunately, TLS takes all the attention, but Tor, on the other hand, is the main tool for anonymity. The first obvious thing to do would be to look at the circuit extend protocol, which we did not look at in our work. And I think there's also an interesting possibility of trying to better understand the observations made by The 23rd Raccoon and to validate them more empirically. That's it, and thank you very much. Please go ahead. So, about the original slide, the original tagging attack:
the input node flips a bit, but then if the output node is not compromised, it doesn't flip the bit back, the MAC doesn't check out, and so that node knows that somebody's attacking the network. So how come this doesn't eventually lead to the compromised nodes being exposed? Right, so the attack, from the perspective of the adversary, only works if he controls the exit node, and that happens with some probability. So I think your argument is that the attack will then be detected by some other node, and so what's the issue? This is a very important point, because it's often used to argue that tagging attacks are weaker than traffic correlation attacks, but this is not really clear. Detection does not mean that you can identify who is running the attack, and you would not be able to stop it; in order to do that, you are likely to de-anonymize some users, because you will get some onion routers to talk to each other. So it really is not a viable solution, I think, and we confirmed this a bit by talking to some people from the Tor project, and they don't seem to have a solution based on detection. Yeah, you're right, I hadn't thought about that, but thank you. And also, in that 2014 case, the attack went unnoticed for months. All right, we have time for another question. Anybody up there? No? Then I guess I'll ask a question. First I'll make a comment: Nick Mathewson actually gave a talk at RWC several years ago and asked explicitly for more analysis, so it's great to see this follow-up work. It still seems like you can get one bit of information by corrupting the ciphertext at the first node, and then you would detect that it failed; you just wouldn't be able to keep the circuit going, right? So does your model handle that? You're saying that there's always an inevitable attack, that you can always corrupt the cell and it will take down the circuit. Yeah, that's something I think cannot be protected against.
Every scheme will be vulnerable to that attack as long as integrity is only end-to-end. So our security notion filters that out, and anything beyond that would be considered a tagging attack. Fantastic. Okay, let's thank Jean Paul again. Thank you.