Hello and welcome to the sixth lecture. By now you've seen a lot of the basics of Bitcoin: how the system works, how mining works, and how to use Bitcoin as a currency. Now let's get to what has been one of the most controversial aspects of Bitcoin, which is its anonymity properties. In fact, there's a lot about Bitcoin and anonymity that you'll hear different opinions on. Is Bitcoin anonymous, first of all? Are anonymous cryptocurrencies even a good thing? Is it good for people who have a stake in Bitcoin? Is it good for society? And what are the various proposals that have been made to improve Bitcoin's anonymity? How well do those work? Which of those should we adopt? And so on. So in this lecture, what we're going to do is help cut through all of that confusion. We're going to discuss where things are, what the options are, and where things seem to be going. So let's start like this. Let's start with a basic understanding of what we even mean when we say anonymity in Bitcoin, and some of the overall concepts: how does anonymity tie into privacy? Is that a good thing or a bad thing? Can we have only the good aspects of anonymity without the bad? A variety of questions like that. And then we'll see a variety of proposals, some already existing and some that may be implemented someday, for improving Bitcoin's anonymity or creating different anonymous cryptocurrencies altogether. What's interesting about them is that they offer increasing levels of cryptographic sophistication as we go down this list, and we'll learn to see what the trade-offs are and analyze the anonymity properties, how deployable these are, and so on. All right, let's get started. If you look online, you'll see there are a number of people and groups saying that Bitcoin is anonymous. There's no shortage of opinions on this. Let me just pull out one quote in particular. This is the WikiLeaks donation page.
It says, in plain and simple terms, Bitcoin is a secure and anonymous currency. Is that actually true? Well, you'll also find a variety of opinions to the contrary. Again, I'm just pulling out one example. This is Wired UK saying Bitcoin won't hide you from the NSA's prying eyes. So how can we resolve this confusion? Let's look at what the word anonymous means. At quite a literal level, anonymous means without a name. And so what does that mean exactly? Well, there are two ways to interpret it. We know that in Bitcoin, addresses are public keys, or rather public key hashes, instead of real identities. You don't need to put in your real name in order to interact with the system. But we can interpret this property of being without a name in two different ways. We can interpret it as interacting without your real name, or we can interpret it as interacting without any name at all. Now, if you interpret it as interacting without your real name, then certainly Bitcoin is anonymous in that sense. But we do have these public key hashes that act as some sort of pseudo-identities. And so when computer scientists look at the situation, they don't use the term anonymous to describe this. They call this pseudonymity. There's a very clear difference between the two, and it's an important one, and we'll see why in a second. You might wonder: even though you're using a pseudonym, which is your public key hash, you can create any number of them. You can have as many pseudonyms as you want. Does that make it anonymous? Well, the answer is not quite, and we'll get into that as well. Okay, so if computer scientists call this pseudonymity, what is anonymity then? Is there a clear definition of what it would take for something to be called anonymous? At a conceptual level, the answer is very simple. Anonymity in computer science is just pseudonymity together with unlinkability. So what is this property called unlinkability?
At an intuitive level, we'll get into better definitions in a little bit. But at an intuitive level, what unlinkability means is that as a user interacts with the system repeatedly, these different interactions should not be able to be tied to each other from the point of view of some adversary. So you have to be talking about a specific adversary for this to even make sense. Now, this distinction here between full anonymity and mere pseudonymity is something that you might be familiar with from a variety of other contexts. And one good way that I like to explain this is to look at online forums. And again here, the distinction between a mere pseudonymous interaction and anonymous interaction comes up in different forums. And Reddit is a good example of a forum where you pick a long-term pseudonym and interact over a period of time with that pseudonym. You could create different pseudonyms, but it's going to be practically infeasible to create a new pseudonym every single time you want to post a comment, and it's not even very meaningful. So Reddit offers pseudonymous interaction. The opposite of that, fully anonymous interaction where you can make posts with no attribution at all is the model that you typically have in 4chan. And there's a similar difference in Bitcoin as well, and Bitcoin is in the pseudonymous model more than the anonymous model. Okay, but let's talk about why this difference is important in Bitcoin. Why is mere pseudonymity not sufficient if you want privacy? After all, if you have pseudonymity, it seems like even if somebody can create a pseudonymous profile of all of your interactions on the system, they can't tie it back to your real identity. Well, here's the answer to that. It turns out that if you have this pseudonymous profile, it's pretty fragile. It's very easy for it to get linked back to your real identity at some point. 
And if that happens at any point, then of course all of your transactions, past, present, and future, have been linked to your identity. So here are a couple of different ways in which that can happen. One is that a variety of Bitcoin businesses, online wallet services, exchanges and others, even vendors in a lot of cases, are going to want your real-life identity in order to let you transact with them. Here's another. You go to a coffee shop and you pay for your coffee with bitcoins. Of course, if you're there in the store, then the person who's giving you your coffee sort of knows who you are, even if they don't actually ask for your real name. And so your physical identity does get tied to one of your Bitcoin transactions. And if that Bitcoin transaction then gets tied to all of your other Bitcoin transactions, then that is a complete violation of anonymity. So this notion of a pseudonymous profile is very fragile. It could easily get compromised in a variety of ways. And even if such a direct linkage doesn't happen, these linked profiles can be de-anonymized through side channels. What do I mean by side channels? Well, here's something that I find intriguing. It might seem like a tall claim, but in fact such things have been known to happen. Maybe somebody looks at a profile of your pseudonymous Bitcoin transactions and finds that you transact at certain times of day. They're able to correlate the times of day when you're active on Bitcoin with the times of day when your Twitter account is posting tweets, and so they're able to find a connection between your Twitter identity and your transactions on Bitcoin. Similar attacks have been known to happen. So this is why the notion of a pseudonymous profile is considered quite fragile, and for real anonymity we want the stronger notion of unlinkability. So let's try to define, in a little bit more concrete sense, what unlinkability means in the context of Bitcoin.
And we can do that in a variety of different ways. One is that it should be hard to link together different addresses of the same user. Another is that it should be hard to link together different transactions made by the same user. Both of these seem intuitive. Look at this one, though: it should be hard to link the sender of a payment to its recipient. This one might sound a little confusing at first, because if you interpret a payment as a Bitcoin transaction, then of course that transaction has inputs and outputs. These inputs and outputs are inevitably going to be in the blockchain, publicly linked together, and so you might think that this is impossible to achieve. But if we interpret this notion of payment in a different way, not as a single direct Bitcoin transaction but perhaps as an indirect sort of payment that goes through a circuitous route of transactions, then one might imagine that the ultimate sender and the ultimate recipient of that payment might not immediately be linkable by looking at the Bitcoin blockchain. So these are all somewhat more concrete, but still intuitive, varieties of unlinkability that one might want to shoot for. But if you look at this last definition, it might still not be entirely convincing. Let's say that you pay for a particular product and it costs a certain amount of bitcoin, and maybe you send that payment through a circuitous route of transactions. Still, you might think, somebody looking at the blockchain must be able to infer something: specifically, that a certain number of bitcoins left some address and a certain number of bitcoins showed up at some other address. These two amounts might be slightly different because of transaction fees and so on, but they'll be roughly equal, and also roughly in the same time period, because there can't be too much of a lag between the sending and the receiving of a payment.
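As a rough sketch of the inference just described, here is how an observer might shortlist the outflows that could have funded a given receipt by matching amounts and times. The `Transfer` record, the fee tolerance, and the one-hour lag window are all assumptions made up for this illustration; they are not taken from the lecture or from any real analysis tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    address: str      # hypothetical address label
    amount_sat: int   # amount in satoshis (integers avoid rounding surprises)
    time_s: int       # timestamp in seconds


def plausible_sources(receipt, outflows, max_fee_sat=20_000, max_lag_s=3600):
    """Return the outflows an observer cannot rule out as the source of `receipt`.

    The thresholds are illustrative assumptions: the outflow must precede the
    receipt by at most `max_lag_s`, and may exceed it by at most `max_fee_sat`.
    """
    matches = []
    for out in outflows:
        lag = receipt.time_s - out.time_s              # money leaves before it arrives
        lost_to_fees = out.amount_sat - receipt.amount_sat
        if 0 <= lag <= max_lag_s and 0 <= lost_to_fees <= max_fee_sat:
            matches.append(out)
    return matches


outflows = [
    Transfer("A", 150_000_000, 1_000),
    Transfer("B", 150_010_000, 1_200),
    Transfer("C", 150_005_000, 9_000),   # leaves after the receipt: ruled out
    Transfer("D", 90_000_000, 1_100),    # amount too small: ruled out
]
receipt = Transfer("R", 150_000_000, 1_500)
candidates = plausible_sources(receipt, outflows)
print([c.address for c in candidates])
```

The number of candidates that survive the filter is a crude indication of how well the payment is hidden from this one naive observer; a cleverer adversary could use more information than amounts and times.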
And so clearly, even if we try to achieve this kind of unlinkability, it can't be unlinkability among all possible transactions, only among some smaller subset of transactions that look like each other. So let's make this a little bit more concrete now, and this is how we quantify anonymity. We usually don't try to achieve complete unlinkability, which is unlinkability among all possible transactions or addresses in the system. Instead we go for something more measured. We try to maximize the size of our anonymity set. The anonymity set is the crowd of other addresses or transactions that we're trying to hide in. So if I can be reasonably sure that, with respect to some adversary, there are a thousand other transactions that look just like mine and the adversary can't tell which one was mine, then we might consider that to be a pretty good level of anonymity. Calculating this anonymity set is not trivial at all. It takes a few steps. You have to first define concretely what your adversary model is, and you have to reason carefully about what that adversary knows, what they don't know, and what they cannot know. There's no general formula for doing this. It requires carefully analyzing each protocol and system on a case-by-case basis. I want to point out that in the Bitcoin community, people often carry out intuitive analyses of anonymity services, for example the mixing services that we're going to see later in this lecture, and often they come up with measures like taint analysis. This is an intuitive measure that tracks the flow between a particular sending address and a particular receiving address. Intuitively it might make a lot of sense, but if we consider it from the point of view of how we actually should calculate anonymity, taint analysis is not a very good measure of how much anonymity you get from a system.
And the reason for that is that it assumes a particular type of attack the adversary might carry out, a rather naive attack that looks directly for quantities of flow between a sending and a receiving address. If your adversary were a little bit cleverer than that, then you might carry out taint analysis and think that you have a lot of anonymity in a certain situation when in fact you might not. So the bottom line from this slide is that quantifying anonymity must be done in terms of the anonymity set, and in some cases probability distributions on top of that anonymity set, and it requires a careful analysis of the protocol and the system. You can't apply a simple formula. Okay, let's switch gears a little bit and talk about the ethics of anonymity. Why do people want anonymity? We've already seen a little bit of the connection between anonymity and privacy, but let's make that very concrete. In blockchain-based currencies, because all transactions are recorded on the ledger, they're totally, publicly, and permanently traceable. And so if your identity ever gets linked to these transactions, you're in a situation where your privacy level is much worse than what you get with traditional banking. Why? Because anybody might be able to carry out this type of de-anonymization attack, not specifically a company or a government that you might be worried about. Any member of the public could. And since your transactions are permanent, a loss of anonymity years down the line could affect all your transactions today, and vice versa. So we really want anonymity even just to get the privacy level of cryptocurrencies up to the level that we enjoy with the traditional system, but people also hope that it can give us a new level of privacy. Of course, we have to acknowledge the concerns as well, and one of the major concerns is money laundering and all of the bad things that it can enable. So let's talk about that. This is definitely a legitimate worry.
I wouldn't be in favor of studying anonymity in cryptocurrencies while ignoring the ethical aspects and saying, oh, that's not something I'm going to worry about, I'm only interested in the technology. I think it's important to consider the ethical aspects. There's one item of comfort that I will offer, though. If you look at how things stand currently in Bitcoin, the difficulty of things like money laundering is not necessarily because the blockchain is not so anonymous, making flows easy to trace. Instead, the difficulty stems much more from the fact that moving large flows into and out of the currency, rather than within Bitcoin, is what is really hard. In other words, cashing out is hard, and so anti-money-laundering efforts have great promise if they're focused on this part of the system. The good news is that all of these attempts to improve anonymity in Bitcoin don't affect this part of the equation in any way. And so I would recommend that Bitcoin researchers and developers coordinate with anti-money-laundering efforts by law enforcement and others, so that the technical aspect of Bitcoin anonymity can be relatively separate from law enforcement and legal aspects and so on. Nevertheless, one could ask: can't we design the technology in such a way that only the good uses of Bitcoin anonymity are allowed and the bad uses are somehow prohibited? Well, this turns out to be quite a common conundrum in computer security and privacy. In a lot of scenarios we want something like this, but it never turns out to be possible. Why? Because these different uses that we're talking about, which we perceive as being very different morally, are going to be almost identical technologically. And if we want to encode some sort of moral rules into the technical rules of the system that are going to be automatically enforced by miners, it's not even clear how to do that.
And hence my recommendation of separating out the technical anonymity properties of the system from the legal principles that we put on top of it in terms of how people use that currency. It's not a completely satisfactory solution, but it's perhaps the best way we have of trading off the good with the bad. I do want to point out that this is far from the first time that we're considering this dilemma. It's come up in the context of Tor, an anonymous communication network. Anonymous communication enables bad actions at least as much as anonymous moving of funds does, and so Tor has really had to grapple with this problem. In one very simple picture, Tor is a communication network that routes messages between a sender and a receiver through a network of nodes, and further, through some clever encryption, ensures that as long as at least some of the nodes in that network are honest, the adversary is not going to be able to link the sender to the receiver. So that's what Tor does, and you can see how it can enable a lot of bad activities. Let's look at some activities, good and bad, that do happen on the Tor network. It's used first of all by normal people who want to protect themselves from being tracked online by marketers, or who want various other privacy properties when they're browsing websites. It's used by journalists and activists and dissidents and so on, and that's clearly an important use case. It's also used by law enforcement, because if they want to do an electronic sting operation, they want to be able to visit websites without revealing that their IP address is coming from a law enforcement block. So clearly there are a lot of activities that we might approve of. But it's also used by botnets, for example for spreading malware between nodes in the network, and unfortunately there is also child pornography in the network.
So distinguishing between these uses at a technical level is essentially impossible. Tor has grappled with this issue, and as a society we have grappled with it, and by and large we've concluded that it's better for the world that the technology exists than that it doesn't. In fact, one of the main funders of Tor is the U.S. State Department. They're interested in it because Tor helps dissidents in other countries who might be fighting oppressive governments and so on. And in fact, recently there was a news story about the FBI having a successful string of sting operations against people using Tor for child pornography. So of course we have to remember that there is a level above the technology, where law enforcement can exploit a variety of ways to get to people who are using these systems for bad purposes, and that preserves a sense of balance. So let's switch gears a little bit once more. Let's look at the history of anonymous e-cash. Even though with Bitcoin these questions are quite controversial, and there are debates about how anonymous exactly Bitcoin is and what the options are and so on, this is not the first time that we have thought about anonymous cryptocurrencies at a technical level. These efforts have quite a long history. In fact, all the way back in 1982, more than three decades ago, cryptographer David Chaum proposed something called blind signatures that helped him develop anonymous electronic cash. So what are blind signatures? Blind signatures are a two-party protocol. Two parties communicate with each other, and at the end of that, one party has produced a digital signature of some input without actually knowing what that input is. I know it sounds a little bit like magic, but I encourage you to look it up. It's not that sophisticated at a technical level. It's quite simple to understand if you work through the details. But since I'm not actually going to go into the details now, let's for the moment assume that this works by magic.
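For anyone who would rather peek behind the magic, here is a minimal sketch of one textbook blind signature construction, based on RSA. The lecture does not say which construction it has in mind, so treat the choice of RSA, the tiny primes, and all of the function names below as assumptions made for illustration; nothing here is secure enough for real use.

```python
import secrets
from math import gcd

# Toy RSA key for the signer. The primes are absurdly small on purpose;
# real keys are thousands of bits long.
P, Q = 61, 53
N = P * Q                              # public modulus
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def blind(message, n=N, e=E):
    """Requester hides `message` behind a random factor r before sending it."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if gcd(r, n) == 1:             # r must be invertible mod n
            break
    return (message * pow(r, e, n)) % n, r


def sign(blinded, n=N, d=D):
    """Signer applies its private key to a value it cannot interpret."""
    return pow(blinded, d, n)


def unblind(blind_sig, r, n=N):
    """Requester strips the factor r, leaving an ordinary signature on message."""
    return (blind_sig * pow(r, -1, n)) % n


def verify(message, signature, n=N, e=E):
    """Anyone can check the signature with the public key."""
    return pow(signature, e, n) == message % n


message = 1234                         # stands in for the value being signed
blinded, r = blind(message)
signature = unblind(sign(blinded), r)
print(verify(message, signature))
```

The reason this works is that signing the blinded value gives (m · r^e)^d = m^d · r mod n, so dividing by r leaves m^d, which is exactly the signature the signer would have produced on m directly. A real scheme would sign a hash of the message rather than the raw number.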
So assuming that we have blind signatures, how can that help us achieve an electronic cash protocol? That's what David Chaum did, and as we go through this protocol, try to see if you can spot any flaws with it other than the anonymity properties or lack thereof. It's quite a simple protocol; I'm going to show it to you in just one slide. This is a protocol for anonymous e-cash through blind signatures. Imagine that there is a bank, and the bank stores various things in its database; in particular it stores these two tables. The first table has a mapping of users to the balance that they have in their bank account. These balances don't refer to any sort of cryptographic currency. It's just a plain old number sitting in a database, just like your actual bank account or PayPal or something like that. In addition, it has another table called spent coins, and you'll see in a moment what this means. Let's say that a user now wants to withdraw an anonymous coin from the system, and this is where the crypto magic is going to come in. So the user wishes to withdraw an anonymous coin of a standard denomination. Let's say that's a $1 denomination, and all of these values refer to dollars. The first thing that the bank is going to do on receiving this request is deduct from this user's balance; that's gone down from 10 to 9 in this example. The next thing the user and the bank are going to do together is execute a two-party protocol, a blind signature protocol, with the user having picked a random serial number for the coin; that's what's being depicted here.
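The bank's two tables and the withdrawal step can be sketched as follows, assuming the bank uses a toy RSA blind signature (an assumption; the lecture treats the signature as a black box). The `deposit` method follows the spent-coins check that the lecture walks through next. The class, method, and account names are hypothetical, and because the sketch signs raw serial numbers with a tiny key, coins are trivially forgeable here; a real scheme would sign a hash under a full-size key.

```python
import secrets
from math import gcd


class Bank:
    """Toy bank: a balances table, a spent-coins table, and a signing key."""

    def __init__(self, balances):
        p, q = 61, 53                       # illustration-sized primes only
        self.n, self.e = p * q, 17          # public key
        self._d = pow(self.e, -1, (p - 1) * (q - 1))   # private key
        self.balances = dict(balances)      # user -> dollars
        self.spent_coins = set()            # serial numbers already deposited

    def withdraw(self, user, blinded_serial):
        """Deduct $1 and sign the blinded serial without learning the serial."""
        if self.balances[user] < 1:
            raise ValueError("insufficient balance")
        self.balances[user] -= 1
        return pow(blinded_serial, self._d, self.n)

    def deposit(self, user, serial, signature):
        """Credit $1 if the signature is valid and the serial is unspent."""
        if pow(signature, self.e, self.n) != serial % self.n:
            return False                    # not signed by this bank
        if serial in self.spent_coins:
            return False                    # double-spend attempt
        self.spent_coins.add(serial)
        self.balances[user] += 1
        return True


def withdraw_coin(bank, user):
    """User side: pick a serial, blind it, get it signed, then unblind."""
    serial = secrets.randbelow(bank.n - 2) + 2
    while True:
        r = secrets.randbelow(bank.n - 2) + 2
        if gcd(r, bank.n) == 1:
            break
    blinded = (serial * pow(r, bank.e, bank.n)) % bank.n
    blind_sig = bank.withdraw(user, blinded)
    return serial, (blind_sig * pow(r, -1, bank.n)) % bank.n


bank = Bank({"payer": 10, "payee": 0})
serial, signature = withdraw_coin(bank, "payer")
first_deposit = bank.deposit("payee", serial, signature)    # accepted
second_deposit = bank.deposit("payee", serial, signature)   # rejected as double spend
print(first_deposit, second_deposit, bank.balances)
```

Note that `withdraw` only ever sees the blinded value, so nothing the bank stores at withdrawal time lets it recognize the serial number when it shows up later in `deposit`.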
This is a serial number for an anonymous coin, and the user was completely at liberty to pick that number. She did, and then they executed a protocol at the end of which the user has received a signature of this serial number, but in such a way that the bank did not in fact learn the serial number. The bank had no idea what number it was signing; it just knew that it was some number that it signed. And now this signed number represents an anonymous token. This is a token that the user can pass around to another user. So let's say that she wants to make a payment to another user. What she'll do is send to that user not only the signed token but also the plain text value of the token, the serial number. And what the receiving user will do is the following: she will immediately contact the bank and try to deposit this anonymous coin, because without actually trying to deposit it, this red user here cannot be sure that the blue user is not trying to double spend. The blue user could be sending that same anonymous coin to a hundred different users. How can they know that they're not being tricked into accepting a double-spent coin? The way they're sure is that when the red user receives the coin, she has to immediately contact the bank to verify whether it's valid or not, and only if the coin turns out to be valid will the red user proceed to complete the rest of whatever transaction she was having with the blue user. So the bank now receives the message to deposit the coin, and note that it now finally gets the plain text serial number as well as its own signature. The bank looks at the signature and verifies that it's a valid signature, and here's the key thing: it also verifies that the serial number that it received is not on the list of spent coins. That's how it knows that this is not a double-spend attempt. This is the legitimate first spend of a coin that the bank signed before, so it's a legitimate anonymous token. And since the bank didn't see the serial number the first time around,
the bank does not know which user initially withdrew this anonymous coin, and that's the key anonymity property. In the period of time between the blue user withdrawing this coin and then, perhaps much later, sending it to the red user, who immediately deposits the coin, many other pairs of users might have deposited and withdrawn coins, and the bank has no way to tell them apart. So coming back to this part of the protocol: the bank verifies that this is a new serial number that it's seeing for the first time, it puts that serial number into its list of spent coins so that it cannot be spent anymore, and it adds one dollar, or whatever the denomination is, to red's account. It then sends back a message saying this is okay, and now the red user has verified that they received a legitimate anonymous coin from the blue user and can proceed to complete the transaction. Right, so this is the entirety of a very simple anonymous electronic cash scheme, and the key property here is that the bank cannot link the two users. I asked you to think about whether this has any drawbacks other than anonymity, and of course the glaring thing that you probably noticed is that all of this depends upon trusting this bank. I mean, look at this part of the system: this is simply the bank keeping numbers in its database of who owns how much money. So this seems to be a trust model that's very, very different from the model that Bitcoin operates under. A lot of the traditional cryptography research on anonymous e-cash was in this model, where you were willing to trust a bank for many things, including keeping your money, but you were not willing to trust a bank with anonymity. You wanted to be sure that the bank didn't know who was interacting with whom. Okay, it's an interesting model, it's a valid model, and many such schemes were developed under this model. But in retrospect it seems that the decentralization problem was a much more important one to solve than the anonymity
problem in order for anonymous electronic cash to become successful. People were willing to accept a decentralized e-cash system with only pseudonymity properties and not real anonymity, and then get to work on maybe improving the anonymity, instead of starting from a fully, provably anonymous electronic cash system that relied on a single central authority. But more generally, anonymization and decentralization, as we'll see repeatedly in this lecture, are in conflict with each other. There are at least a couple of reasons for this. One is that, as we saw in the last slide, for anonymity you might often want to rely on certain interactive protocols with a bank in order to do some blinding, which we saw in blind signatures. That's where you get anonymity from. But how are you going to do that without a central bank to carry out that protocol with? It's not clear. And even if you got rid of this blinding and were willing to accept just pseudonymity instead of true anonymity, you still have the problem that in order to decentralize and still get security properties like resistance to double spending, often the way to go is to record and trace everything in a public ledger, as Bitcoin does, and so you might even further compromise your anonymity and privacy properties. So these are two big challenges to overcome. As we'll see much later in this lecture, Zerocoin and Zerocash are cryptographic, anonymous, decentralized electronic cash schemes that have some similarities to the blind-signature-based protocol that I showed you earlier, but some of the giant challenges that they have to tackle involve these two limitations. All right, I said several times earlier that Bitcoin is only pseudonymous, and so all of your transactions or addresses could get linked together. Let's now go in and see how that might actually happen. Let's in fact start from WikiLeaks again. I showed you a quote from them saying Bitcoin is a secure and anonymous digital currency, and this is actually the page
that it was taken from. This is their donations page, and here you'll see that in addition to this blurb about Bitcoin being secure and anonymous, they have a donation address over here. This is of course the hash of a public key; you've seen things like this in previous lectures. But they also have this interesting refresh button right next to it. What do you imagine this refresh button might do? Well, as you might expect, if you click on that refresh button it'll give you an entirely new donation address. Let's go in and take a look at that. So a totally new address popped up on the page. What is going on here? What WikiLeaks is doing is making sure that each time a person visits the page and wants to make a donation, they send that donation to a totally new public key that WikiLeaks creates just for that purpose. So here WikiLeaks is taking advantage of the ability to create new pseudonyms, new public keys, to the maximum. Every single transaction that they receive, they want to receive at a new address, and in fact this is the Bitcoin best practice for anonymity: always receive new transactions at a fresh address. So you might look at this and think, surely then these different addresses must be unlinkable. You receive a transaction over here, and then much later you spend it by sending it to somebody else. You receive another transaction at this address, and then you send it to someone else over there. So how might somebody link them? Well, here's the key. Let's imagine this scenario. Alice, a customer, goes to a big-box store and wants to buy a teapot. In this scenario Alice has a few bitcoins lying around in these different denominations, and the store lists the teapot for a price of eight bitcoins. That's a pretty expensive teapot at today's exchange rate, so imagine those are centi-bitcoins or something, if you like. At any rate, Alice has these different addresses and wants to pay for the teapot. How is she going to accomplish this? She doesn't
actually have an address with eight bitcoins sitting in there, and so what she's going to do is combine several different inputs into a single transaction in order to pay eight bitcoins to the store. This reveals something. Somebody who's looking at this transaction, which gets recorded permanently in the blockchain, is going to think: aha, two different inputs to this transaction. That could only happen because both of these input addresses are under the control of the same user, who was able to use their wallet software to create a transaction that combined both of them into one. So in other words, shared spending is evidence of joint control of two different addresses. And it doesn't stop there. This is not just about linking two different addresses that are inputs to a transaction. You can do that transitively: every time Alice has a whole cluster of addresses that have been linked and then creates a new transaction that combines one of those addresses with a new address, you can add this new address to the cluster. So this is the first insight behind being able to link transactions together, and we'll see later on that an anonymity technique called CoinJoin works by violating exactly this assumption. But if you assume that people are just using regular Bitcoin wallet software and not doing anything special on top of it, then this technique tends to be pretty robust, and it has been explored in a variety of research papers. Just a note about this lecture: a lot of what we're going to be discussing today gets into the frontiers of research knowledge, so the state of the art may well have advanced in a few months or a few years. So every time I talk about a technique that we know from a particular research paper, I'll give you a reference to that paper. Then you can look it up, you can look up papers that cite it, and you can build up that knowledge on your own. Now, in particular, one of the papers that used this
technique used it for a particular purpose. There was a well-publicized Bitcoin theft a few years ago, and the authors of this paper decided to see how the thief had been moving bitcoins around between multiple addresses of his own. This is the paper in question; it's called An Analysis of Anonymity in the Bitcoin System. This is one of the first major research efforts that did what we call transaction graph analysis. You can use the techniques that I showed you in previous slides, and you can draw a lot of these pretty graphs and deduce that this represents the thief moving money around between his own different addresses, this is the thief sending money to someone else, and various things like that. I haven't yet shown you anything that allows you to link any of these clusters to real-world identities, but let's defer that question for a bit and go back to the scenario of Alice and the teapot. So let's look at it again. Maybe the teapot has gone up in price to 8.5 centi-bitcoins. What is Alice going to do now? She can't combine any subset of her addresses to produce the exact amount necessary for purchasing this teapot. So instead, what she's going to do is exploit the fact that transactions can have any number of inputs and outputs, and create a single transaction that looks like this. It combines these two inputs to produce this output that goes over here, to the store, and another output that goes to an address that she herself owns. This is called a change address, which you saw in a previous lecture. This presents a conundrum for an adversary who's looking at this. The adversary might be able to deduce that these two input addresses belong to the same user. He might suspect that one of these output addresses also belongs to that same user, but he has no way of knowing which one that is. In this particular example the change is a small amount, but it doesn't have to be that way at all. Alice might own an address
that has 10,000 bitcoins, spend a little bit on the teapot, and send most of the rest back to her own change address. And these transaction outputs don't have any particular ordering in the blockchain; that order is not meaningful at all. So it's not clear how the adversary might determine which address is the change address in a multi-output transaction. So what is the adversary to do? There's another pretty cool technique for this, again from a research paper, which I'll tell you about. The authors call this "idioms of use," and they exploit idiosyncratic features of different wallet software. For example, one thing they found is that most wallet software uses an address as a change address only once. This in fact follows Bitcoin best practice for anonymity, in a sense: if you have a new transaction where you need a change address, don't use an address that you've already used before; create a new address and use it for this purpose. Great. Now, not all addresses that are outputs of transactions have this property. Going back to the example of the big-box store, the store might advertise a long-term address at which it wants to receive bitcoins, instead of receiving bitcoins at a different address every time. So not every non-change address has this property of being used only once, but every change address does have that property. They used this, and they found that it works pretty well. On the other hand, this has some limitations. It just happens to be a feature of wallet software, and so a lot of false positives might creep into these clustering techniques if you use heuristics like this, and it required a lot of manual intervention. Nevertheless, they were able to use the technique that I showed you before, which is clustering shared inputs together, as well as a few
heuristics for change address detection, and then they were able to look at the entire Bitcoin transaction graph and create some giant clusters that they hypothesized belong to various major service providers. Here's what that graph looks like after applying these two heuristics, and this is the paper in question. It's by Sarah Meiklejohn and others; there's a whole bunch of authors on this paper. Now, this graph looks very interesting. The sizes of these circles represent the amount of money flowing into those clusters, and the number of edges going out of a cluster represents the number of transactions. Let's just stare at this for a second and see if we can guess what some of these major service providers and other clusters of nodes might be. Take this huge one here that dominates in transaction volume compared to any other cluster. Given that this paper was written in 2013, we might guess that it's Mt. Gox, which was a very prominent exchange at the time and later went under. We might also guess that this little one here, which has only a little bit of transaction volume in spite of having a very large number of transactions, corresponds to the profile of the gambling service Satoshi Dice, because the way it works is that you send a tiny amount of bitcoins and you either win that bet or lose it, so you might get double the bitcoins or none of them. So we might guess that this one is Satoshi Dice, that one is Mt. Gox, and so on. But this kind of guessing is suboptimal. The authors wanted some reliable way of identifying the service providers corresponding to each of these clusters. How did they do that? Well, one idea you might have is to go to the Mt. Gox website and see what address they advertise for receiving bitcoins. That doesn't quite work, because they're going to advertise a new address for every
single transaction. And if you just go to the website, look at the address, and don't actually complete that transaction, that is, you don't send bitcoins there, then they're simply going to discard that address. They're not going to reuse it for another customer. In other words, that address will never get used, and you simply won't find it in the blockchain. So what's the way around this? Well, the only way to reliably infer addresses that are associated with a service provider is to actually transact with that service provider, which is exactly what the authors did. They went ahead and bought a variety of things and interacted in a variety of other ways with a bunch of service providers, comprising 344 transactions in all: mining pools, wallet services, exchanges, various merchants, even gambling sites, and so on. And they got a bunch of cool things to show for their efforts; Meiklejohn informs me that the cupcakes were in fact really good. At any rate, the authors used this very clever technique to label the major clusters in the graph that I showed you on the previous slide, and this is what the labeled graph looks like. This was in fact Mt. Gox, as we might have guessed, and this was Satoshi Dice, but a lot of the others would have been very difficult to guess, and by actually transacting with these services the authors were able to identify most of these service providers. So already we've seen something pretty interesting: going beyond just clustering to being able to put labels on the clusters. The next question is: sure, you can put labels on these major service providers, but can you put labels on individuals? In other words, can you connect little clusters corresponding to individuals to their real-life identities? Well, there are at least a couple of different ways in which that can happen. One is intuitively what I told you right at the beginning: you could simply interact at a coffee shop or with some other merchant, so they learn some transaction or some address that corresponds to
you, and they might use that to tag your cluster. There are at least a couple of other ways in which this might happen. One is that there's a high degree of centralization among these service providers. The intuition here is that most users, in the course of normal usage of Bitcoin over a period of months or years, are going to interact with at least one of those major service providers that were labeled in the previous graph. So if somebody wants to identify a cluster corresponding to a particular user, there's a very high chance that they're going to be able to find a transaction that ties that cluster to a known labeled cluster. Then they can go to that service provider and, if they have the appropriate authority, subpoena it, or if they're a hacker, try to hack into it, and so on. So this is one major avenue by which regular users can get de-anonymized: they eventually, inevitably, interact with one of these major, easily identified service providers. Another one is simply carelessness. A lot of users end up posting address information in forums. They might post one of the Bitcoin addresses that they own, for example to receive donations, when they're posting comments on forums. Now, that might be because these users are not worried about getting de-anonymized. It could also be because they don't realize that posting one of their addresses is almost inevitably going to allow somebody to connect all of their different addresses together. Okay, so hopefully I've convinced you that there are clever ways in which an attacker might not only link different addresses or transactions belonging to a user, but go from there to a real-world identity. And our experience with these de-anonymization algorithms shows that they only get more powerful with time, as more auxiliary information, as we call it, becomes available for attackers to use in getting to users' identities. So this is something to worry about if you care
about privacy. Before we look at how to make things better for anonymity, let's look at a completely different way in which users can get de-anonymized. So far, what we've looked at is all based on what is available to the attacker in the blockchain, the part that is permanently and publicly recorded. But recall that that's not the only part of Bitcoin. There's also a peer-to-peer network in which a lot of messages are sent around that don't necessarily get permanently recorded in the blockchain. In networking terminology, the blockchain is called the application layer, and the peer-to-peer network is of course the network layer. So de-anonymization can happen at this totally different layer, the network layer. How could that happen? Here's an example. This was first pointed out by Dan Kaminsky a few years ago in a talk at Black Hat. Here's the peer-to-peer network. What he noticed is that when a node creates a transaction and wants to broadcast it, it's going to connect to a lot of nodes at once and broadcast that transaction. So if a few nodes on the network put their heads together, they can figure out: hey, this new transaction, this is the first we've heard of it, and all of us first heard of it from this particular node, so this must be the node, this must be the IP address, corresponding to the user who created this transaction. So here you have a linkage not between a transaction or a cluster and a real-world identity; instead you have a linkage between a transaction and an IP address. And of course an IP address is something that's very close to a real-world identity; there are a lot of ways to go from there to the next level of finding the identity. So this is already a serious problem. Luckily, though, this is not a very hard problem to solve. Why? Because this is now a problem of communications anonymity, and communicating anonymously is a problem that has received a lot of attention from the research community. And as we already saw in the introduction, there is a good
system called Tor that you can use for communicating anonymously. Now, there's one little caveat. Tor is intended for what are called low-latency activities such as web browsing, where there is a large volume of traffic and you don't want to sit around waiting; you want the response immediately. So it makes some compromises in anonymity in order to achieve low latency. Bitcoin is inherently a high-latency system, because it takes a while for transactions to propagate through the network and especially to get confirmed in the blockchain. So we don't have this low-latency constraint, and it's possible that we could come up with a more specific, fine-tuned anonymity network for this particular purpose. There are such things; they're called mix nets. The only catch is that Tor is the system that's most widely deployed, analyzed, robust, and functional today. But it's possible that somebody might develop a mix net solution for anonymizing your Bitcoin communications, and if that happens, that would be something to switch to. So let's summarize what we've learned so far. We've seen that based on the information in the blockchain, different addresses could get linked together and could also get linked to an identity. We've also seen that based on the information at the network layer, a transaction or address could get linked to your IP address. Luckily, this latter problem is simple to solve: if you care about your anonymity and privacy when using Bitcoin, it's a good idea to do it through Tor. But the former problem is much trickier, and that's what we're going to spend the rest of this lecture talking about. There are a variety of solutions to inhibit what we've been calling transaction graph analysis, and the first of them is called mixing. So what is mixing? Well, the intuition behind this is very simple. It's the same intuition that comes up in a lot of contexts, which is that if you want anonymity, use an intermediary to
route your communications or your funds or whatnot. So let's look at what that might look like visually. Here's an intermediary, and in a second we'll get to who these intermediaries might be. But assume that there is some intermediary, some service, that allows users to put in bitcoins. The key property that it gives you is that after these bitcoins have been put in, it forgets who put them in and treats its entire store of bitcoins as indistinguishable from each other. In fact, it might further combine them all into one giant transaction, or it might mix them or split them and merge them in different ways, whatever. But the key property is that when users later come in to withdraw their bitcoins, what they get is not tied to the coin that they put in. They're going to get some other, say randomly picked, deposit that the intermediary received. So when these three users come back, they're going to withdraw these coins in a random order. And so somebody looking at this in the blockchain, who doesn't have the records that the intermediary might or might not store, is not going to be able to link, just from the publicly available information, the ultimate input addresses to the ultimate output addresses corresponding to the same user. So that's the intuition behind intermediaries. Now, looking at this, does it strike a chord? Have we seen in previous lectures something that offers similar services, that allows you to deposit bitcoins and then withdraw them at a later time? You might recall that this is exactly what online wallets do. There are services where you can just store your bitcoins online until you need them. And so you might wonder: is that the solution to our problems? Do online wallets provide anonymity? Let's think about that. The answer is not obvious, but I will start by mentioning that it has taken well-known researchers by surprise. Here was a post on the New York Times Bits blog reporting on a preprint of a paper
released by two Israeli researchers saying that there was a link between Dread Pirate Roberts, the pseudonymous creator of Silk Road, which we're going to see more about, and Satoshi Nakamoto. This was of course very surprising. But as it turned out, all that had happened was that they had mistaken a link that went through an intermediary for a direct one, and that intermediary just turned out to be Mt. Gox, which you can think of as sort of an online wallet service. And so a few days later this other post was published at the same venue. See if you can spot the difference. They had to retract their study, and I think they had made a very simple mistake of not accounting for the presence of this intermediary. So it's clear that at least in some sense online wallets provide some sort of anonymity, because somebody tried to make a connection between an input and an output address and completely failed at it. So let's try to understand exactly the sense in which online wallets provide anonymity. I think a good way to do that is to contrast online wallets with the online services that exist specifically for the purpose of acting as these intermediaries for anonymity, and those are dedicated mixing services. We'll talk about mixing services in much more detail, but very briefly, there are two things they promise that you won't get simply by putting your bitcoins into an online wallet and retrieving them again. The first is that they promise not to keep records. It's not just that, as a side effect, they sort of randomly give you bitcoins that came from some other address; they specifically say that they won't keep records. So even if they tried to, they wouldn't know which bitcoins were the ones you put in, and with high probability you're going to get some other bitcoins back. Furthermore, even if someone came knocking for their records, or if they got hacked, and so on, there would be nothing to find; there would be no records. So that's something that a mixing
service promises. And the other thing is that you don't need your real-life identity in order to interact with these services. This is in contrast to most online wallets. Why? Because online wallets are typically reputable, and in fact often regulated, businesses, and this fact has two consequences. One is that they'll typically require your identity. In banking there is the know-your-customer principle, which at a technical level essentially translates to: learn the customer's identity and store those records. The other is that they will in fact keep records. If they receive a deposit, they will keep the link between the identity and the Bitcoin address. If they move money around internally, they will probably keep records of all of that. And just because your bitcoins come from a different address when you withdraw them does not mean that the online wallet does not know the link. That link probably does exist in their records and will exist for all eternity. Even if they don't explicitly ask for your identity, think about this: to even interact with an online wallet, you need a persistent long-term identity. You can't possibly use a different pseudonym every time, because if you did, they'd have no way of associating an account with you, of knowing how many bitcoins they owed you. So even if they didn't ask for your identity, at the very least the online wallet knows the address of every single deposit that you made into the system and, more importantly, of every single withdrawal that you made. And so when you make a series of withdrawals from an online wallet and proceed to spend those bitcoins, the wallet service can connect all of those together in a profile. And of course it's not just the wallet service: people who care about anonymity are also worried about those records getting hacked, about insider attacks, about somebody who has a subpoena getting those records, and so on and so forth. So with respect to the wallet service itself and whoever
they might be cooperating with, you have no anonymity in this context. On the other hand, there is something cool about this. If you are willing to trust them with your bitcoins, then you're going to keep them in the wallet service for much longer than you typically would with a mix service. Why? Because you don't trust a mix service as much: you want to put in your bitcoins and receive them back immediately, from some other address, at an address of your choosing. So with an online wallet service you're going to have a bigger anonymity set. Why? Because from the point of view of someone with no privileged information, someone who's merely looking at the blockchain, your withdrawal could look indistinguishable from every single withdrawal ever made from that service provider. So with respect to the wallet service you have no anonymity; with respect to everybody else you have a bigger anonymity set than you possibly would using a mixing service, or at least a single mixing service. If we look at this, it looks suspiciously similar to the kind of privacy properties that you have with the traditional banking system. There are these centralized intermediaries that know a lot about our transactions, but from the point of view of a stranger with no privileged information we have a pretty good amount of privacy. So even if this gives you some sort of anonymity, it's at best what you get with the traditional system, and people who would settle for that are not the kind of people who are typically looking for anonymity in Bitcoin anyway. If they were happy with the anonymity properties of the traditional system, they would probably have stayed with that system. So generally, people who are looking for anonymity in Bitcoin simply do not want to accept the trust requirements that these online wallet services impose, and they don't want the sort of anonymity properties
that it gives you. They don't want to have to trust that service with their anonymity. In fact, we've seen that there have been a lot of closures of these exchanges and services, so there's good reason to believe that if you put all your trust in an online service, you might simply lose your money. Okay, so having rejected online wallets as an anonymity solution, let's turn to these dedicated mixing services that I told you a little bit about. Before looking at their details, let's talk about the terminology a little bit. I like to call it a mix; some people call it a mixer. These are really the same thing. Some people also call them laundries. I don't like this term at all, and the reason is that it needlessly attaches moral meaning to something that's a purely technical concept. As we've seen earlier, there are very good reasons why you might want to protect your privacy in Bitcoin and use mixes for everyday privacy. Of course we must also acknowledge the bad uses, but it seems a little bit weird to me to use the term laundry, which implies that your coins are dirty and you need to clean them, attaching a negative moral value to the whole thing. For that reason I'm not going to use that term in this lecture; we'll go with the technically neutral term, which is mixing. Now, on the subject of mixing, about six of us, researchers at Princeton, Concordia, and Maryland, including all four of us who are doing this online lecture series, got together, analyzed the existing mix ecosystem, and proposed a series of changes for improving the way that mixes operate, both in terms of anonymity and in terms of the trustworthiness of mixes. So let's look at those principles. Before I show you those principles, a quick reminder: at a very fundamental level, how does a mix operate? It asks for an address at which you want to receive bitcoins, it gives you an address at which to send bitcoins to the mix, and then you both execute that transaction.
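To make that exchange concrete, here is a rough toy simulation of it in Python. This is only a sketch under stated assumptions, not how any real mixing service or Bitcoin client works: the ToyMix class, its method names, the string "addresses," and the fixed AMOUNT are all made up for illustration, and nothing here touches the actual Bitcoin network.

```python
import random

AMOUNT = 1_000_000  # assumed equal deposit size (satoshis), for illustration only


class ToyMix:
    """Toy model of a centralized mix. Not a real Bitcoin client."""

    def __init__(self, rng):
        self.rng = rng
        self.awaiting = {}   # deposit address -> output address, until paid in
        self.queue = []      # output addresses owed a payout; link is forgotten
        self.counter = 0

    def request_swap(self, output_address):
        """User supplies a fresh output address; mix returns a deposit address."""
        self.counter += 1
        deposit_address = f"mix-deposit-{self.counter}"
        self.awaiting[deposit_address] = output_address
        return deposit_address

    def receive(self, deposit_address, amount):
        """Record an incoming deposit, then discard the deposit/output link."""
        if amount != AMOUNT:
            raise ValueError("toy mix accepts only the assumed fixed amount")
        self.queue.append(self.awaiting.pop(deposit_address))

    def pay_out(self):
        """Pay every queued output address, in random order."""
        self.rng.shuffle(self.queue)
        payouts = [(address, AMOUNT) for address in self.queue]
        self.queue = []
        return payouts


mix = ToyMix(random.Random())
for output_address in ["alice-fresh", "bob-fresh", "carol-fresh"]:
    deposit_address = mix.request_swap(output_address)
    mix.receive(deposit_address, AMOUNT)   # the user pays the mix
print(mix.pay_out())                       # the mix pays the fresh addresses
```

Notice that nothing in this sketch forces the mix to call pay_out at all; the user is simply trusting it to do so.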
It's a swap, basically. In a second I'll show you what that looks like visually. But what were our principles for running these mixes properly? Well, the very first one is that you might want to use a series of mixes instead of just a single mix. This is a very well-known principle: using a series of routers is the same principle used in the anonymous communication system Tor. It's a good idea because it allows you to not have to trust a single mix; instead, as long as any one of these mixes keeps its promise to delete its records, you have a good guarantee of anonymity. In particular, mixes should implement a standard API so that this is very easy for clients to accomplish, and right now this is not quite the case. This is our paper, for your reference. So now let's look at what a series of mixes would look like visually. Here is a user who starts with a coin at an input address that we assume the adversary has managed to link to this particular user. They're going to send it to the mix at this address and get back a bitcoin at this other output address. They freshly generate this output address and provide it to the mix. The mix will hopefully return the same amount of bitcoins at this output address. There's no way for the user to force the mix to do that; the user has to trust the mix, and this is, as we'll see, a recurring problem with the whole notion of mixes. Then, either immediately or after a time gap, it doesn't matter, the user will take the bitcoins of whatever value they've received at this address and send them to a different mix, which is hopefully not cooperating with the first mix, and repeat this process over and over again. So an adversary looking at the public blockchain is merely going to see, along with all of these transactions, a variety of other mix transactions that other users are executing, and hopefully the adversary will have no way to
tell apart which of those transactions correspond to this particular user and which correspond to other users. So that's the first principle. Now the second one. If you think about what I've just said, in order to make that possible you want to make these transactions as uniform as possible, so that linkability is minimized. What does it mean to make these transactions as uniform as possible? One important consequence is that all of these mix transactions, not only from a particular mix but from all of the mixes in this mix ecosystem, should have the same value. So we think that all mixes out there providing service should agree upon a standard chunk size. Of course there can be multiple denominations, but there can't be too many, and you can't simply allow users to put in whatever amount of bitcoins they wish; that wouldn't work. So you need this kind of standardization. In addition to this, we found that there are a variety of possible attacks in which a clever adversary might use not just the amount but, even if you remove the amount, some other properties, including timing for example, to try to link users' input addresses and output addresses together. This type of linking can be avoided, but human users interacting with the mix are not going to be able to take into account all of those possible linking attacks. So instead, the client side must be automated and built into desktop wallet software, so that the wallet software automatically knows how to interact with these mixes in order to preserve the user's anonymity. That was our third principle. Our fourth principle is a subtle one. Why do these mixes provide this service? Typically it's because they're a business, and if they're a business they want to be paid. How are they going to get paid? Well, it turns out that pretty much the only way for these mixes to get paid is to take a cut of the transaction that the user
is sending to the mix. That seems a bit weird, because if a mix takes a standard percentage, then an adversary might be able to use that to link the input transaction and the output transaction. So some current mixes try to randomize the fee; they might say, we take a random cut between one percent and three percent. We found that this is not a good idea either, because if you put that through a chain of mixes, the value in the chunk is going to dwindle in a predictable way, and this is an important side channel for the adversary. So what is a way to avoid this? We proposed that these mixing fees should be all or nothing. In other words, the mix should either swallow the whole chunk, with a small probability, or return the whole chunk. This is, by the way, very different from the transaction fee that mining nodes charge; this is a mixing fee on top of that. So if the mix wants to charge a 0.1 percent mixing fee, then one time out of a thousand the mix should swallow the entire chunk, and 999 times out of a thousand the mix should return the entire chunk without taking any mixing fee. This is a tricky property to accomplish, because it means the mix has to generate a random number in a way that can convince the user that the mix has not cheated in generating it and has genuinely flipped a coin that has a 99.9 percent chance of coming up one way versus the other. But we do show how to do this using cryptography, in a way that lets both parties be satisfied that it has worked correctly. We think that really all four of these principles are necessary to have anything approaching mathematical confidence in having a large anonymity set and in our ability to resist clever inferential attacks by an adversary that looks at the blockchain to try to link input to output. The sad news is that virtually none of the current mixes follow these principles. They're in a very different
model, where each mix operates completely independently. They have a web interface, and the user interacts with them totally manually instead of automatically through their wallet software. The user will manually put in the amount; instead of a standard chunk size, it's typically whatever amount the user chooses. And the mix will take some cut of that as a mixing fee and send the rest to the user. We don't think this is a situation that gives mix users a lot of anonymity, but we think that by moving to a slightly different model based on these four principles, the anonymity properties of the mix ecosystem can be dramatically improved. All right, so through these four principles we've seen how the anonymity properties of mixing can be improved, but there is still one major problem, which is that users still have to trust these mixes. Again, we talked about a few ways to address this in our paper. Mixes can do several things to improve their trustworthiness. One is that simply by staying in business for a long time and not stealing users' money, they can build up a reputation. You might wonder, does this reputation count for anything? It's simply a matter of he said, she said; in fact, a mix operator can claim that a competing mix operator stole all their money even if that did not in fact happen. Well, reputation systems in the real world generally manage to operate even though conflicting claims can be made. In this context, for example, users might learn to trust only the word of prominent members of the Bitcoin community who they think have the best interests of the ecosystem at heart. Another way is that in the system we proposed, the chunk sizes are going to be so small that in the regular course of mixing users are going to mix a pretty huge number of chunks, or at least the system can be configured that way, so that the chunk sizes are relatively small. In that context, if a mix has even a one percent probability of stealing a
user's chunk, then after a hundred or so interactions with that mix, using small chunk sizes, the user is going to detect the theft, and so the user will learn never to use this mix again. So the system might sort of correct itself through users testing mixes for trustworthiness themselves. An important thing to keep in mind here is that the chunks that users send to mixes have typically already been through other mixes, so the mix can't know which user a chunk is coming from. The only thing the mix can do is essentially steal randomly from users; it can't steal from a particular user. So from the user's point of view, on average they won't suffer losses greater than the average rate at which the mix steals. They don't have to worry that a mix might have it in for them in particular and steal all of their money; there's no way that can happen. That's what I mean when I say users can test this for themselves. And finally, we proposed a cryptographic mechanism where the mix issues a sort of promissory statement to the user: once it receives a chunk at a particular address, it will send a chunk back at some other address that the user provides. If the mix fails to keep this promise, our idea is that the user can publicize this warranty, and everybody will know that a particular mix has cheated, so everybody will stop using this mix and the mix will lose business. In combination, these three mechanisms provide incentives for mixes to act honestly. So these were our calculations, anyway, and our proposal. We haven't proved that this will work in practice; that remains to be seen. All right, on that note, let's quickly look at how things are in practice right now. It doesn't seem that there are any reputable services providing dedicated mixing that users have learned to trust, or at least trust enough to use on a regular basis. In fact, this is from the Bitcoin wiki, where the
original is also highlighted in red, so I didn't take the liberty of doing that myself: "Mixing services may themselves be operating with anonymity," and so if your funds are not delivered you have no recourse; use at your own discretion. So we're proposing moving to a different model, where mixes stay in business, become reputable entities, and so on. That hasn't quite happened yet. And note that there is sort of a bootstrapping problem here. If mixes were reputable entities, they would have a big volume of transactions, so by interacting with them you'd get a pretty good anonymity set, and users would be more confident in interacting with them. Mixes would then realize that they make more money by staying in business and taking a small cut than by trying to steal the small amount of money that they control at any given time, and so they would be further incentivized to stay in business. So you can imagine that once a mix ecosystem gets going, it will be self-sustaining. But whether or not that can eventually happen, we can say for sure that it hasn't quite happened yet. The fact that this mix ecosystem doesn't currently exist is a big part of the reason why many people have proposed decentralized mixing. There are a variety of arguments for decentralized mixing, some of which we've already touched on. The first is that there is no bootstrapping problem. The reason is that in decentralized mixing you don't go through a particular dedicated mix service. Instead, you find a community of peers who all want to do mixing, and somehow, without any central coordination, or at least without a central service that collects your funds, you manage to mix with each other. That avoids the bootstrapping problem, because as long as there is enough interest from Bitcoin users, they can meet each other and start mixing. How to do that, we'll see in a second. Also, theft is impossible, and this is enforced through technical means, because nobody is explicitly sending bitcoins to
another user. Again, we'll see how this is accomplished. It could possibly provide better anonymity, and we'll look into more details on that as well. And finally, I just want to point out that this is more philosophically aligned with Bitcoin. If you can get rid of the need for a centralized service for some purpose, there are a lot of Bitcoin users who find that appealing. So how might this work? The main proposal for decentralized mixing is called CoinJoin, and it was proposed by Greg Maxwell, a core Bitcoin developer whom we'll meet again in the next lecture, actually. What he proposed is different users coming together to create a single Bitcoin transaction. What the outputs of this transaction are, we'll see in a second, but somehow they create a single Bitcoin transaction that combines all of their inputs, presumably of equal value. Now let's think about this for a second. What is necessary in order for these three users to create a single transaction? We might imagine that in order to produce a signature, somebody has to collect all three private keys. That's not actually how it works, though. In Bitcoin, the signatures corresponding to the different inputs are entirely separate from one another. So what this allows the users to do easily is create different inputs that correspond to different users, and also different output addresses that correspond to different users, and randomize the order of each. In this situation, the users participating in the protocol may have to know which input address corresponds to which output address, although we'll see in a second whether we can avoid that as well. But certainly someone looking at the blockchain, looking at only this single transaction, even if they realize that it is a CoinJoin transaction, will not be able to find the mapping between the inputs and the outputs. It's that simple. That's the essence of
CoinJoin. Of course, this is just one round of mixing. On top of this, you have to apply the same principles that we talked about before. The principles that I discussed are not only for centralized mixes. They apply, with very few modifications, even to the CoinJoin scenario. So you want to do a sequence of CoinJoins, you want to make sure that the chunk sizes are standardized so that you don't introduce new side channels, and so on. Okay, but let's look into the single transaction. Exactly how would this work? There are a lot of details that are still not clear, so let's look at this in algorithmic form. If we write it out like this, what needs to happen is that a group of peers who all want to mix somehow need to find each other. That's the first difficulty. Then they have to exchange their input and output addresses with each other. One of these users, it doesn't matter who, will construct the transaction, not yet a signed transaction, but just the transaction that corresponds to these different inputs going to these different outputs. Then they'll pass it around to collect signatures from each of the peers. Now, if the peer who constructed the transaction were disruptive and, for example, left out one of the peers' outputs, then the whole thing will collapse, because when that particular peer gets the transaction in order to sign it, they will simply refuse to sign and the process will not be able to go forward. But if everything is okay and everybody acts honestly, then the transaction is constructed, and now any peer, again it doesn't matter who, can broadcast the transaction to the network. Two of them could do it independently. It doesn't matter. The transaction will of course be counted only once. So that's it. That's the whole protocol. The entire security property comes from each peer checking that their output address is represented, and that their output of course receives at least as much value as went in from their input. So that seems simple enough.
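To make those steps concrete, here is a minimal Python sketch of a single CoinJoin round. It is only an illustration under stated assumptions: the `Peer` record, the function names, and the string placeholder standing in for a real signature are all hypothetical, fees are ignored, and nothing here talks to the Bitcoin network.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    input_addr: str   # address currently holding this peer's coins
    output_addr: str  # fresh address where this peer wants the mixed coins
    amount: int       # value of the input, in satoshis


def build_transaction(peers, rng):
    """Any one peer can assemble the unsigned transaction."""
    inputs = [(p.input_addr, p.amount) for p in peers]
    outputs = [(p.output_addr, p.amount) for p in peers]
    rng.shuffle(inputs)   # randomize order so position reveals nothing
    rng.shuffle(outputs)
    return {"inputs": inputs, "outputs": outputs, "signatures": {}}


def peer_approves(peer, tx):
    """A peer signs only if its output is present with at least its input value."""
    paid = sum(value for addr, value in tx["outputs"] if addr == peer.output_addr)
    return paid >= peer.amount


def collect_signatures(peers, tx):
    """Pass the transaction around; one refusal stops the whole round."""
    for p in peers:
        if not peer_approves(p, tx):
            return False
        # Placeholder string, not a real signature; each input is signed separately.
        tx["signatures"][p.input_addr] = f"sig({p.name})"
    return len(tx["signatures"]) == len(tx["inputs"])


peers = [
    Peer("alice", "in_A", "out_A", 100_000),
    Peer("bob", "in_B", "out_B", 100_000),
    Peer("carol", "in_C", "out_C", 100_000),
]

# Honest round: everyone finds their output and signs.
tx = build_transaction(peers, random.Random(0))
honest_ok = collect_signatures(peers, tx)

# Disruptive constructor drops Bob's output, so Bob refuses to sign.
bad_tx = build_transaction(peers, random.Random(1))
bad_tx["outputs"] = [o for o in bad_tx["outputs"] if o[0] != "out_B"]
cheat_ok = collect_signatures(peers, bad_tx)
```

The point of the sketch is the check in `peer_approves`: no peer ever hands coins to another peer, so the worst a dishonest constructor can do is cause the round to fail.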
What are the remaining problems here? Well, there are three problems. One is, how does this group of peers find each other? The second is that, as I described in the previous slide, this protocol involves each of these peers, or at least one of them, finding out the mapping between inputs and outputs. So that seems like a problem. In fact, I want to point out that this is a worse problem for decentralized mixes than for centralized mixes. Why is that? In the centralized mixing case, you could hope that the different mixes are run by entirely different entities who are not colluding with each other, and at least in some cases these will be reputable real-life entities who you would imagine have incentives not to collude with each other, because they have different goals or for whatever reason. Again, the reasoning is similar to Tor. You have a variety of different types of people running Tor nodes. They don't at all have the same incentives, so we imagine that they're not all going to collude with each other, and also that they're not all going to get compromised by the same attacker. A similar principle holds for centralized mixes, and that only works because you know something about the identities of these mixes. So these mixes having known identities and being reputable entities helps anonymity in this case. We don't have that luxury with decentralized mixes, because we have no idea who any of these peers are. It could be a single attacker creating lots of Sybil accounts, accounts in the sense of just creating lots of Sybils, and trying to get into every single CoinJoin transaction that's ever carried out in order to learn these input-output mappings. So even if you do a series of CoinJoins, it might be the case that in each of those CoinJoins at least one of the participants was an attacker, or was controlled by the same attacker, in which case your entire anonymity is lost. So that seems like a problem. And a third problem, and kind of a
tricky one, is denial of service. What does this mean? Well, it could happen that after providing the input-output pairs, one of the nodes disappears and refuses to sign the resulting transaction, so the transaction is not able to proceed. Secondly, even after creating the signature, before the transaction can get broadcast to the network and confirmed in the blockchain, one of the nodes, who might be malicious, might take its input and spend it in some other transaction that's unrelated to this CoinJoin. This CoinJoin will then look like a double-spend attempt and will be rejected by the Bitcoin network. So that's another way in which you can launch denial of service against CoinJoin. Now let's look at some possible solutions to each of these three problems. For the first one, how to find peers, there is a very simple solution. It's not a perfect solution, but people consider it to be somewhat okay. You simply use an untrusted server. It's sort of like a watering hole where different users can connect and find each other, but the server is not involved in running the protocol in any way that the users have to trust. As we're going to see, each of these steps for solving these problems introduces a little bit of engineering complexity. This already requires a whole peer-to-peer protocol for finding these CoinJoin peers on top of the Bitcoin protocol, and we're going to see similar factors that introduce engineering complexity in solving each of the other problems. So the next one: how do we solve the anonymity problem? Well, there is a simple strawman solution. You can frame the anonymity problem in this way: you need to communicate the set of inputs to all the peers, and you also need to communicate the set of outputs, but break the linkage between the inputs and the outputs. Now this becomes a communications anonymity problem instead of a Bitcoin anonymity problem, because it's simply a matter of communicating these
output addresses in a way that is unlinked from the communication of the input addresses. So a strawman solution, since we've already seen Tor a little bit, is simply this: the peers come together, they exchange input addresses, they disconnect, and then after a while they reconnect over Tor and exchange the output addresses. This is pretty simple, but it may not be very robust in practice. A better solution might be to build a special-purpose anonymous routing mechanism for these participants to use just for this protocol. There are things called decryption mix-nets that allow you to do exactly that, and such solutions have been proposed. So let's move to the third problem, which is the denial-of-service attack. Let's think about it this way: what's a traditional solution to a denial-of-service attack? Well, one possible solution is to make it a little bit expensive for the client to connect to the server and receive service. This is not a client-server model, it's a peer-to-peer model, but we can still try to adapt the same principle, and that's the principle behind the first two of the proposed solutions for denial of service: either a proof of work or a proof of burn. What do I mean by this? Proof of work is simply repurposing the algorithm behind Bitcoin's proof of work to require each of these peer nodes to do a little bit of computational work before they can join a CoinJoin protocol. The rationale is that if the adversary is going to disrupt every CoinJoin that exists out there, they're going to be burning a lot of computing power, which will make it very expensive for them. Proof of burn is a similar concept. It's also sometimes called fidelity bonds in Bitcoin. It allows you to irreversibly destroy some bitcoins that you own by sending them to an unspendable address, thereby proving that you've sent a somewhat expensive signal in order to get into the system. So that's the rationale behind the first two
solutions. The next two solutions, the third and fourth, also share a rationale, which is to identify the one or more malicious participants who launched the denial of service, kick them out, and run the CoinJoin with the remaining participants. That could be done if you trust the server a little bit to carry it out. It could also be done in a purely decentralized manner, as a paper called CoinShuffle proposed. They came up with a cryptographic blaming protocol for doing this, and it involves something called zero knowledge, where you learn the identity of at least one of the players who misbehaved without necessarily learning much more about what happened. The rest of the peers can then redo the protocol. At various points I've talked about side channels, so let's look at an example of that. I want to point out that these side channels can be very tricky. Not all the mixing in the world can save you from what I call high-level flows that could be identifying. Here's a neat example of this. Let's say user Alice receives a very specific amount of bitcoins, let's say on a weekly basis, as income, and has the habit of always automatically and immediately transferring, let's say, five percent of that to her retirement account. Think about the patterns that will be visible on the blockchain here. No matter what she does to obscure the link between the addresses at which she receives her income and the address she uses for her retirement account, the patterns here are going to be uniquely identifying, because the income is a very specific value, and five percent of that is also going to be a specific value. There's also a timing pattern: every time money appears here, money goes to this address as well. So this is a problem. How do we protect ourselves from this? Well, one suggestion that has been proposed, not only in the context of mixing but even in the context of regular Bitcoin wallets where users are not
even trying to do any mixing, is by Mike Hearn, and he calls this merge avoidance. Merge avoidance is a very simple idea. When users want to make payments, the proposal is that instead of creating a giant transaction that combines as many inputs as necessary in order to pay the entire amount to a single address, why not have a protocol by which the receiver can provide multiple output addresses, as many as necessary? The sender and receiver can agree upon denominations, and the sender can avoid combining different inputs and can instead make a variety of different transactions that send money from different input addresses to different output addresses. This avoids the problem of high-level flows, because these multiple input and output addresses cannot be linked to each other, so an adversary might not even be able to observe that a high-level flow is happening. It also avoids problems like the clustering of addresses together because of evidence of shared spending. And this is a proposal that one could think about incorporating right now into Bitcoin-based payment flows in order to improve anonymity for everyone. Now let's turn to Zerocoin and Zerocash, which are a completely different approach to Bitcoin anonymity. The approach is sort of to bake it in at the protocol level, and these are cryptographic heavyweights. Zerocoin was first developed by cryptographers at Johns Hopkins, and later on they started collaborating with other researchers around the world who had been developing a very efficient cryptographic technique that would make some of the cryptographic operations in Zerocoin more efficient, and that resulted in Zerocash. As you'll see, these techniques provide a qualitatively different level of anonymity than mixing solutions that sit on top of Bitcoin. But what's the catch? The problem is that this is not quite backward compatible with Bitcoin. Zerocoin and Zerocash are going to require altcoins. Technically,
it's possible that Zerocoin can be deployed as what is known as a soft fork of Bitcoin, but the practical difficulties are high enough that this is not really considered feasible, and in fact the Zerocoin developers intend to deploy it as an altcoin themselves instead of trying to be compatible with Bitcoin directly. Let's start talking about the details here. Let's review some of the things that I've just said. Zerocoin brings protocol-level mixing, and because it's baked into the protocol, what it gives you is a cryptographic guarantee of mixing. What does that mean? You don't need to trust a single mix, or even a set of mixes, or a set of peers, or anybody at all to ensure your anonymity. You just need to rely on the underlying crypto being solid. You don't even need to rely on the miners enforcing this in order to achieve anonymity. It's purely a cryptographic guarantee. So that's really great. That's qualitatively better than what we have so far. And of course it's not currently compatible with Bitcoin. Here's the paper if you want to look it up. So how does Zerocoin work? I'm going to introduce a concept called Basecoin, and I'm taking a few liberties with the presentation here in order to simplify and clarify the concepts. I'm going to do that by mixing some concepts from Zerocoin and Zerocash, but toward the end I'll make very clear what the differences are between the two. Like I said, Zerocoin is an altcoin, and I'm going to call that altcoin Basecoin. I'm not calling it Zerocoin because Zerocoin is something else. It's an extension of this Basecoin, something that sort of sits on top of this altcoin. The key property that gives you anonymity is that these basecoins can be converted into zerocoins and back again, and when you do that, it breaks the link between the original basecoin and the new basecoin. So think of this as a cryptographic mixing system that's provided by the protocol itself. So how might this work? Another way of looking at a zerocoin is that
it's a cryptographic proof that you owned a basecoin, not anymore, but you owned it, and then you made it unspendable. A zerocoin is something that allows you to assert that to, say, any miner who might care. Miners can verify these proofs, and that's what gives you the right to later redeem a new basecoin in exchange for the zerocoin. The analogy is a little bit like poker chips. So how could that work, and what properties do these proofs need to have in order to enable this? One challenge is how to construct these proofs, and the other is how to make sure that each proof can be used only once to redeem a basecoin, because if you don't have that property, it's going to lead to double spending. So let's see how to do that. It crucially involves a concept called zero-knowledge proofs. What are zero-knowledge proofs? I'm going to tell you at a somewhat intuitive level, so I'm calling it crypto magic again. It's a way for somebody to prove a statement without revealing any other information that leads to that statement being true. A couple of examples are going to make this really clear. You might be able to prove a statement like "I know an input that hashes to this particular value." Notice that if the input you had picked were long and random, and you did the proof in such a way that you don't actually reveal the input, it won't allow somebody else to infer what that input is. A more complex version of this is that you could say "I know an input that hashes to some value in the following set of several possible outputs." The zero-knowledge proofs that Zerocoin is going to use are very similar to this second category. Let's dive in a little bit more. Zerocoins come into existence by minting, and anybody can do this. Zerocoins come in standard denominations. Let's assume for the rest of this that zerocoins are worth one basecoin each. You could
also imagine multiple denominations coexisting. How do you make a zerocoin? Well, we're going to see that in the next slide, but let me just say for now that minting a zerocoin doesn't automatically give it any value. You can't get free money. It only acquires value once you put it onto the blockchain, and putting it onto the blockchain is going to be about as expensive as the value of the zerocoin that you're later going to be able to redeem. So you have some sort of conservation principle here. Okay, so here's how, specifically, in cryptographic terms, we mint a zerocoin. It's something called a cryptographic commitment. Intuitively, you can think of a cryptographic commitment as taking a random serial number that you generated and putting it into an envelope. What does this intuitive notion of putting it into an envelope correspond to cryptographically? It corresponds to generating another random secret r, which you're never going to make public, and computing the hash of the coin's serial number together with this random secret. Now, this is a little bit of a simplification, but it really helps you understand the properties of the system, so let's go with this description. So what just happened here? You generated, arbitrarily, just as you would generate Bitcoin public keys, a serial number for your zerocoin, and if it's long and random, hopefully no one else has ever picked that same serial number before. You also generated this other random number that you're going to keep secret. Intuitively, generating a commitment to this serial number corresponds to putting it in an envelope and sealing it, and mathematically it happens by computing the hash of the serial number together with this random value. Okay, once you've generated this commitment, what do you do with it? Well, the next step is to put that commitment onto the blockchain. That's when the zerocoin sort of becomes real, and doing this requires, in a sense, burning a
basecoin and making it unspendable. In concrete terms, how would that work? You've got the blockchain over here, and one of those transactions might be a mint transaction. If you zoomed in, it would be a transaction that's signed by Alice, who minted this zerocoin. What we saw earlier in the structure of transactions is that over here you would have the recipient's public key or the recipient's address. Instead of that, here you have this cryptographic commitment. And just like before, just like a transaction having a pointer to a previous transaction, the same structure is carried over for Zerocoin transactions as well. So what has happened here? We've spent this basecoin in order to mint the zerocoin, and this commitment, the sealed envelope that we've put into the zerocoin, is what is going to allow us to redeem that zerocoin later in exchange for a basecoin once again. So how does that work? To spend the zerocoin later, you will reveal the serial number that you put inside the envelope. It's the miners' job to verify that the serial number has not been spent before, that it has not already been revealed as the number that was put inside some other envelope. That's what prevents double spending in the system. Next, you'll create a zero-knowledge proof of the kind we just talked about. Specifically, the zero-knowledge proof will say "I know a number r such that the hash of the serial number together with r corresponds to one of the zerocoins in the blockchain." We'll make that statement more mathematically precise in a second, but think about what it says. It doesn't reveal that random number r, but somehow you're proving that you are in possession of a number which, combined with the serial number that you have just made public, results in a zerocoin that was at some point in the past put onto the blockchain. So for somebody looking at this proof, this is all they need to know to verify that you earlier spent a base
coin in order to get to this point. So this should now give you the right to redeem a basecoin. But which basecoin? Here's where the anonymity property comes in. You can pick an arbitrary zerocoin in the blockchain and use that as an input to a new transaction, out of which comes a basecoin, and the miners will allow you to do that. So you put a basecoin in and take a basecoin out, but a different basecoin, and all that anybody needs to know is that you have the right to do that because you put in some zerocoin in the past. It doesn't matter which zerocoin. And you can't do that twice. You can't spend twice against a single mint, because the serial number will now become public, and there's only one serial number corresponding to each zerocoin. And you only know the serial numbers corresponding to your own zerocoins and not anyone else's. Great, so where does the anonymity property come from? Here it is. You've kept this random number r secret, and what is available on the blockchain is a number of hashes, or commitments, corresponding to the different zerocoins that have been put there. Even though you've revealed the serial number, without knowing this other random input r, nobody can try to brute-force this and guess which of these zerocoins corresponded to your serial number. So even after the serial number inside an envelope has been revealed, and it's been verified that this serial number was inside one of the envelopes, we still don't know which envelope it was in. This is the sort of magical property that zero-knowledge proofs in cryptography give us that you wouldn't get in a real-world, physical, envelope-based analogy. The next cool thing about this whole construction is the fact that these proofs are efficient, and I'm putting "efficient" in quotes here. The sense in which they're efficient is that compared to what we know of zero-knowledge proofs and have come to expect of
them, it's quite an achievement that these proofs are as efficient as they are. However, compared to the efficiency of Bitcoin transactions themselves, these are in fact quite slow. So it occupies a space in between those two. So what exactly do I mean by efficient? The reason it's efficient is that it manages to avoid being linear in the number of zerocoins on the chain, even though that is what you would expect. Why is that what you would expect? Think about the statement that the spender is proving here: "I know a random number r such that the hash of the serial number with r corresponds to the first commitment, or the second commitment, or any one of this giant number of commitments that reside on the blockchain." So it's a very long statement that the prover is proving. It's a statement whose length is proportional to the number of zerocoins on the blockchain, and yet the proof is much smaller than that. It's not linear. It's only logarithmic in the value n here, the number of zerocoins. That's part of the magic of Zerocoin. That's what makes it possible to run the system at all. All right, moving on, let's talk about Zerocash now. Zerocash kind of takes the cryptography to the next level. It uses a cryptographic tool called SNARKs, which we won't get into at all, but the upshot of using these more efficient cryptographic constructions for proofs is that the efficiency gets to a point where the authors suggest that you can in fact run the whole system without having any basecoin. All transactions can be done in this zero-knowledge manner. You don't need to have separate, expensive transactions that are used only for mixing, and a set of regular everyday transactions that you use when you don't want special anonymity properties. That distinction is now gone. The claim is that you can run all of these transactions sort of inside these envelopes, and what I mean by that is the following. All transactions are zerocoins, and
so Zerocash becomes untraceable in a sense, because there is no basecoin. The reason for that is that splitting and merging of coins are also transactions that are supported in Zerocash itself, without going to basecoin. In particular, the transaction values, the transaction amounts, can be put inside the commitments. Those won't be visible on the blockchain anymore. The only thing that the ledger records publicly is the existence of these transactions. You know that Alice put in some transaction, and you know that much later Bob redeemed some transaction, and Bob might be the same user or might be a different user. But the only people who need to know what the amount is are the sender and receiver of any particular transaction. The miners don't need to know that. If there is a transaction fee, then the miners need to know that fee, but that doesn't really compromise your anonymity. So the ability to run Zerocoin in this different configuration, where it's not two different coins anymore, not a basecoin with a mix layer on top, but instead an entirely untraceable system of transactions, puts Zerocash sort of on the next level when it comes to anonymity. A lot of the possible side-channel attacks that were true for mixing, and that were true to a certain extent at least for Zerocoin, are no longer true for Zerocash, because the transaction amounts will no longer be visible in the public ledger. But that almost sounds too good to be true: a completely untraceable electronic cash system. It is ledger based, but the ledger doesn't record anything that might compromise anonymity or privacy. Well, there is one catch. Here's the catch in Zerocash: it requires a certain setup process to even set up the system. Specifically, one needs random and secret inputs in order to generate the public parameters. Think of those as public keys, except that these are giant public keys. They're over a gigabyte in size. And not only that, not only is the size a bit of a problem,
these secret inputs, for the security of the system, then have to be securely destroyed so that nobody knows what secret inputs were used to generate the public parameters. That seems like a bit of a problem. The reason that no one can know them is that if somebody knows them, it doesn't mean that they will be able to compromise anonymity, but they will be able to create new zerocoins for themselves and nobody will be the wiser, which is an equally bad problem for the currency. So there's kind of an interesting sociological problem here. How could some entity set up the system and then convince everybody that they have securely destroyed the secret inputs that were of course necessary in order to set up the system? It's not entirely clear how that can be solved. There have been various proposals for it, but at the moment we don't have a very clear idea of how to go forward on this. So what have we seen so far in all of the different efforts to improve anonymity in Bitcoin? Well, if we put them on a line, as I'll show you in a second, we see that there are five clearly different levels of anonymity in the different proposed solutions. What are these? Let's look not only at the levels of anonymity that these systems provide, but also at their deployability. Let's start with Bitcoin, which is already here. It's only pseudonymous. It doesn't even aspire to be really anonymous, and we've seen that quite damaging transaction graph analysis is possible. I showed you many beautiful graphs with clusterings of different addresses, and in many cases how to go from those addresses to identities. So not a lot of anonymity is provided by Bitcoin. The next level is simply using a single mix, sort of in a manual way, which is what people are doing right now with some of these dedicated mix services. That still allows transaction graph analysis, because, as you might remember from the four principles that I gave you, if you don't have this
automated system that has uniform chunk sizes and so on, a lot of transaction graph analysis is still possible. In addition, you have to worry that this mix might not be trustworthy: it might be storing records and sharing them with other people, and again it could get hacked, and so on. The third level that we saw is a chain of mixes, and this can be in a centralized model or a decentralized model. It doesn't matter. Both models give you roughly the same level of anonymity. Where the anonymity improvement really comes in for this one, compared to a single mix, is that you have these standardized chunk sizes, you have a series of mixes, and you have a variety of other bells and whistles on top, like automated clients and so on. For this, some side channels are still possible, though not as bad as before. Transaction graph analysis is no longer that easy, but you still have to worry about an adversary who might collude with multiple mixes or, in the decentralized model, some peers that might be malicious and compromise your anonymity. This is of course perfectly backward compatible with Bitcoin. It could be deployed and adopted any day. It hasn't quite happened yet in a way that we would consider to be truly anonymous. And then we saw Zerocoin, which is cryptographic mixing baked into the protocol. It doesn't depend on anybody promising to destroy their records or anything like that. You just need to trust the math. So that's a whole different level of anonymity, in my opinion. It still has some possible side channels, but they're not as bad as in the other mixing-based solutions that we saw, where it's not baked into the protocol. And Zerocoin, of course, as we saw, is an altcoin, so it's not quite Bitcoin compatible in the way that one might hope. And finally, Zerocash. The difference between Zerocash and Zerocoin is not so much at a fundamental mathematical level, but comes from the fact that you can run Zerocash in a configuration where you get rid of the basecoin altogether, and the efficiency is not too
bad in that configuration. What that gives you is untraceability, which is something on top of unlinkability. So that's a new anonymity property, and there really aren't any anonymity attacks that I can think of, at least. But the downside, of course, is that not only is it an altcoin, but it also has this very tricky setup process that we don't necessarily know how to make progress on. So we've talked a lot about Bitcoin's anonymity in this lecture, but Bitcoin's anonymity becomes even more powerful when combined with other technologies, in particular anonymous communication technologies. We've talked about Tor a little bit, and we've alluded to it several times, but now let's go into more detail. Let's first set up the problem of anonymous communication, though. This is what the system looks like. There are a bunch of senders and a bunch of recipients, and messages are routed from senders to recipients through this network over here. And of course there's going to be an attacker. This is called the threat model. The attacker controls several things. Some of these sender nodes, in red, are compromised by the attacker. Some of the links from honest nodes to the network are also controlled by the attacker, even if the nodes themselves are not. Similarly, some of the recipient nodes over here, and some of the links from the network to the recipient nodes, are also controlled by the attacker. And finally, some of the internal nodes of the anonymous communication network are under the control of the attacker. But crucially, not all of the communication network is controlled by the attacker, and we want to achieve anonymity in this hostile environment. As before, anonymity refers to unlinkability between the sender and the receiver. So how does Tor accomplish this? It's the same old pattern of picking a chain of intermediaries to route your messages through, and here it is in a nice visual form. I have to thank the Electronic Frontier Foundation
for this slide. So what's going on? Alice over here wants to talk to Bob over here. So she pre-selects a path through this set of routers, and the number of routers on that path is fixed in the Tor protocol. It's always three. But conceptually, you can imagine that it would be any number you want, and the more nodes you route through, the more anonymity you get, or the harder it is, I should say, to breach anonymity. So these nodes denoted with a plus are all the Tor nodes, and she picks some subset of three nodes randomly in order to route her message. And the security property that we get is that as long as at least one of these three nodes that she picks is not compromised or colluding with the attacker, then she is sort of safe here, in that Alice cannot be linked to Bob by somebody who's observing some of the nodes in the network. I should say that there are many attacks possible on Tor. One of them, for example, is called an end-to-end traffic correlation attack. There are going to be timing patterns in the flow of traffic between Alice and whatever Bob is, maybe a website. And so if the attacker controls both of these links, then just by observing the correlation in those timing patterns, he might be able to determine that these two nodes are in communication with each other, even if he knows nothing about the route that the message took between them. So one key point here is, how do you hide routing information? What do I mean by that? When a message has gone from Alice to the first router, it has to have the IP address of Bob's computer somewhere in that message. Otherwise, there is no way that this router can appropriately forward that on to reach the right destination. However, we don't want this router to actually learn that IP address, because if the router does learn that IP address, then it knows both Alice's IP, because the message came from her, and Bob's IP, because that's where the message is eventually going. And now this router has the link between the two ends of the communication, and this would be a
problem if this router were malicious. So as you might guess, the answer involves encryption. And as you can see in this picture, these links are in green. They're encrypted connections, and this one is an unencrypted connection. Let's look in more detail to see how this encryption works. It's a specific way in which encryption is used. It's called layered encryption. It resembles an onion, so that's why onion routing is a related concept here. So what is going on here? Alice and router 1 share a symmetric key that's represented in purple. Alice and router 2 share this key that's represented in blue. And Alice and router 3 share the key that's represented in gold. Now, these symmetric keys are not stored long term by any of these nodes. They're established as necessary using key exchange. The only persistent keys are the long-term public keys of these routers, and these routers do in fact have long-lived identities and public keys and so on. Alice, of course, does not need to have any long-term public key. When she picks a path of these routers, she finds their public keys, executes key exchange protocols, and obtains these shared symmetric keys. And what she's going to do is, when she sends the message to router 1, it's going to be triply encrypted. The outermost layer of encryption is a symmetric encryption between Alice and router 1. And so what this allows router 1 to do is peel off that layer of encryption, like peeling a layer off an onion. And when router 1 peels off that layer of encryption, inside it's going to find the IP address of router 2 and an encrypted message to send to router 2, and it's going to forward that. Router 2 peels off a further layer of encryption and forwards the result to router 3, which peels off a further layer of encryption. Now the message is unencrypted, consisting of the plaintext message as well as Bob's IP address. And so router 3 now sends that message in plaintext to Bob. Of course, what you probably want to do is further layer a protocol like HTTPS, or secure web browsing, on top of Tor, so that even this message from router 3 to
Bob is encrypted. But the Tor protocol itself doesn't guarantee that. It has no way of guaranteeing that, because Bob might be a regular web server that doesn't even speak the Tor protocol. And so there's no way that Tor can be responsible for the encryption between R3, which is called the exit node, and the ultimate recipient of the message. I'll leave you to think about why this wouldn't quite work if there were only one layer of encryption. For example, if Alice tried to encrypt the message all the way from her to R3, it wouldn't quite work. The routing would not quite work out. But as it is, the very neat property that you have is that R1 only knows Alice's IP address and R2's address. It does not know R3's or Bob's address. And similarly, every node knows only the addresses of the node that was one hop before it and one hop after it. And in fact, when the message gets to this point, the IP address of Alice is not even present anymore, whether in encrypted form or not. So that's really how you get anonymity here. If any one of these, R2 for example, were compromised, then it would learn R1's and R3's IP addresses, but not Alice's or Bob's. So that's how Tor works. And now let's talk about Silk Road, and in particular the problem that a site like Silk Road has to overcome, which is this. Silk Road is what is known as a hidden service. In other words, the Silk Road server wants to hide its address, for obvious reasons. If you haven't heard about Silk Road, let me just say a sentence about it briefly. You're going to see it in more detail in the next lecture. Silk Road was a website that operated for a couple of years. It was an anonymous marketplace. It sold a variety of goods, but the thing that it was most known for is selling drugs. And because of the pervasive anonymity, or at least pseudonymity, in the system, the idea was that it was very hard for law enforcement to go after it. And the story of what happened next I will leave to the next lecture. But let's look at the technology that made something like
Silk Road possible and the implications of that. So here is a simplified algorithm by which a server can keep its identity hidden and yet provide services through Tor. What it does is it connects, through Tor, to what is called a rendezvous point, which is one of the Tor routers. And then what it's going to do is publish the mapping between its name, its domain name, and the address of the rendezvous point through directory services that the Tor system offers. And these domain names are not your regular DNS domain names. That wouldn't work, because it's this whole parallel system of routing. And so these are called onion addresses, and they're going to look like this: a long string followed by .onion. And notice that it looks a lot like a Bitcoin public key, and it's for sort of the same reasons. It's because anyone can generate one of these. And now the client will have to learn the onion address of the site that it wants to visit. When Silk Road existed, if you wanted to go to Silk Road, you couldn't type in silkroad.com. That wouldn't make any sense, because Silk Road is not even available over the regular web. Instead, you would have to find its onion address through some manner, and this was a widely known address. This is not Silk Road's address, by the way. This is the onion address of DuckDuckGo, a search engine that offers privacy and anonymity. But you would find a similar address that belonged to Silk Road and put that into your Tor-enabled browser. And what your client would automatically do is look up the mapping for the address of the rendezvous point, connect to that rendezvous point, and through that rendezvous point have an anonymous and encrypted connection to the ultimate server, without the server having to publish its actual IP address. So that covers some of the technology behind Silk Road, in particular anonymous communication, and how you do anonymous payments, which is of course with Bitcoin. But still, you need more technology in order to
make this whole system work. You need security. In other words, how can you be sure that when you pay someone on Silk Road, they're going to actually sell you the goods? Silk Road had a reputation system for that. And how do you do anonymous shipping? The site pretty much left this to the participants and advised buyers to provide an anonymous P.O. box, for example, to ship goods to. So let's take a step back. We've covered a lot of technology in this lecture. Hopefully you've understood that Bitcoin anonymity is a very powerful thing, and it gains in power when combined with other technologies, in particular anonymous communication technologies. And also, anonymity is a deeply morally ambiguous thing. There are many moral distinctions that we would like to make that we're not able to adequately express at the technological level, and so some of this moral ambiguity appears to be inherent. Hopefully it's also been clear that anonymity is very fragile. One mistake can create a link that you're trying to hide. But also, anonymity is an important thing to protect. It's worthwhile protecting. It has a lot of good uses in addition to bad uses. So most of the things that we've talked about today are either at the forefront of research technologically or a topic of serious ethical debate. None of this is really settled, and so this is an ongoing conversation, an area of ongoing research. We don't know which anonymity system for Bitcoin, if any, is going to become prominent or mainstream. And so this is a great opportunity for you, either as a developer or in thinking through the ethical implications, to get involved in some of these issues. And hopefully what you've learned in this lecture has given you the right background for that.
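
As a supplementary illustration of the layered encryption described in the Tor part of this lecture, here is a minimal Python sketch. It is a toy under stated assumptions, not Tor's actual protocol or cell format: the cipher is a made-up XOR keystream derived from SHA-256 (not secure), the names `toy_cipher`, `build_onion`, and `peel` are hypothetical, the addresses are plain labels rather than IP addresses, and the three symmetric keys are simply assumed to exist already instead of being negotiated by key exchange.

```python
import hashlib
import os


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Toy only, NOT secure.

    Applying it twice with the same key returns the original data,
    so it serves as both encryption and decryption.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(message: bytes, destination: str, route: list) -> bytes:
    """Wrap the message in one layer per router, innermost layer first.

    route is a list of (address, key) pairs in forwarding order.
    Each layer holds "next hop address | inner packet". Addresses are
    assumed not to contain the "|" separator.
    """
    packet = message
    next_hop = destination
    for address, key in reversed(route):
        packet = toy_cipher(key, next_hop.encode() + b"|" + packet)
        next_hop = address
    return packet


def peel(key: bytes, packet: bytes):
    """Remove one layer: return (next hop address, inner packet)."""
    plaintext = toy_cipher(key, packet)
    next_hop, inner = plaintext.split(b"|", 1)
    return next_hop.decode(), inner


# Hypothetical three-router path with per-router symmetric keys.
route = [("r1", os.urandom(16)), ("r2", os.urandom(16)), ("r3", os.urandom(16))]
packet = build_onion(b"hello bob", "bob", route)

# Each router learns only the next hop, never the whole path.
hops_seen = []
for address, key in route:
    next_hop, packet = peel(key, packet)
    hops_seen.append(next_hop)

assert hops_seen == ["r2", "r3", "bob"]
assert packet == b"hello bob"  # the exit node ends up with the plaintext
```

In this sketch, router 1 can read only the address of router 2, and only router 3, the exit node, sees Bob's address and the plaintext, which matches the property described in the lecture that each node knows just the hop before it and the hop after it.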