Hello and welcome. My name is Tom Ritter and I work for iSEC Partners. If you don't know who Zax is, you will by the end of the talk. So this talk is about an anonymity network that was started in the fledgling days of the cypherpunk era, the early 1990s. This book, what many of you will probably call the Bible, had not even come out yet. But the first edition had, and while you could export the book itself, the U.S. government had determined you could not export the floppy disk that the code had come on. In fact, the U.S. was actively investigating Phil Zimmermann for violating the Arms Export Control Act for making the first few versions of PGP available. Dan Bernstein and the toddler-aged EFF went on the offensive, taking the U.S. government to court and suing over the export controls on crypto. And another group of people ultimately printed out the source code for PGP, exported the book to Europe, scanned it and OCR'd it in '97, releasing a version of PGP that bypassed the export controls. Alt.anonymous.messages was forged in the heyday of the cypherpunks, and really has changed very little in the intervening decade since it was last shaped in any major way. But in that decade, what we have seen is a monumental focus of the nation's spy agencies not on what was thought to be the most critical piece of information to encrypt, the content itself, but rather on metadata. The people who know won't talk, and the people who talk don't know, but leaked court orders require Verizon to turn over call records, local and abroad. Now, I'm talking here, so I don't know anything and I'm just speculating, but the most straightforward thing to do with this data is to build communication graphs: analyze the metadata, look for patterns, identify people of interest, and figure out who they talk to. And the metadata around an encrypted channel tells volumes.
So SSL is the most widely used encrypted channel on the internet today, and even ignoring the numerous attacks we've seen in the past few years, and even ignoring how it breaks just about every cryptographic best practice there is, there's a wealth of information that you can learn from observing an SSL session. There are protocol-level leaks in SSL itself. It says a lot about the type of client you're using and the version, and it even includes what you think the local time is. So here's hoping your clocks are synced. But from an information-theoretic perspective, an adversary can see that you're sending packets and communicating. That seems obvious, you know, of course they know that you're communicating, but it's important to bear in mind for the future. Ideally, the adversary wouldn't even know that you are communicating. Secondly, SSL makes no attempt at hiding who you're talking to, so the fact that you're on Facebook is straightforward to see. And similarly, the adversary knows when you're on Facebook, when you are sending data, and when you are receiving data, and the resolution on this goes down to the microsecond. So they know exactly when, but they also know exactly how much data you receive. SSL doesn't have any real padding, and I don't know of any website that adds variable-length padding to frustrate length analysis. So how many of you stayed through Runa's talk? A few? Thank you. So let's talk about Tor. Tor is an implementation of onion routing, where you pass messages along a chain, each node peeling off a layer of encryption until an exit node talks to the intended destination. The destination responds and it's routed back. Onion routing specifically aims to disguise who is talking. An adversary observing you can't see that you're talking to a website or a service, and an adversary observing that website or service can't see who is talking to it.
But it doesn't stop an adversary from knowing you're talking to someone, knowing when you're talking and how much you're saying. Tor doesn't really do padding. What little it does is not intended to be a security feature; Tor explicitly leaves out length padding. And if you stayed through Runa's talk, you know that Tor cannot protect you if an adversary can see the entire path of a circuit. Let's say, hypothetically speaking, that New Zealand, Australia, the US, Canada, and the UK were to, say, conspire on some sort of spy program. Well, if your circuit went through these countries, Tor can't help you, at least not information-theoretically. The adversary can track your traffic and find out who you're talking to. I'm not saying this is actively happening. I'm saying it's been proven in papers that it's possible and that it's explicitly outside of Tor's threat model. And a slightly more difficult version of that attack is if the adversary can see you, and then see the last leg of your path later on. Like, say you're in China visiting a Chinese website; well, they can do a similar attack and track you down. It requires a little more math, a little more correlation, but again, it's been proven possible, and it is again outside of Tor's threat model. And this is particularly concerning seeing as I, like probably most of you, happen to live in the US, and so much of what we do happens to be hosted in Amazon EC2 in Virginia. So if either of those two cases applies, we're basically back at SSL, because the adversary can tell who you're talking to. And at this point I think it's worthwhile to show a couple of attacks on metadata. So IOActive built a proof-of-concept traffic analysis tool that looks at your SSL session with Google and figures out what part of Google Maps you're actually looking at, all based off the sizes of the tiles that you're downloading over SSL.
And it's worthwhile to note that this is an attack on a client, on someone browsing Google Maps at that moment. Let me show an alternate example. You're sitting on Facebook with Facebook Chat enabled, all over SSL, heck, all over Tor. Well, Facebook Chat turns you into a server. You are able to receive messages from people, and they will be pushed down to you. The attacker, not you, determines when you will receive a message, and that's a pretty powerful capability: it can lead to time-based correlation attacks. An adversary sends you a message, looks at all the people connected to Facebook or Tor, and sees who receives a message right after that. And, even easier, because Facebook Chats tend to be huge, it can lead to size-based correlation attacks. Not only do I send you a Facebook Chat, but I send you a huge Facebook Chat. With only a couple of trials you can be pretty confident that the user whose internet connection you're monitoring is the same anonymous Syrian dissident that you're messaging on Facebook. And it's interesting to note that a very similar attack was used to de-anonymize Jeremy Hammond, who is currently awaiting trial for allegedly dumping Stratfor's mail spools. The police staked out his home, watched him enter, saw some Tor traffic, and whoop, the username that they thought was him popped onto IRC. Classic traffic confirmation attack. And I've gotten some comments that they also might have cut his internet connection and seen him drop off. I haven't been able to personally confirm that in the police logs; I haven't had time. But if that's true, that's another type of traffic confirmation attack on a low-latency connection. Now, the good news is that even if the adversary can see the start and end nodes, or even the entire path, there is a way to disguise who you're talking to. And that's mix networks. Mix networks introduce a delay while they collect messages into a pool, and then fire them all out.
Collecting messages prevents an adversary who's observing the mix from knowing what message went where. It introduces uncertainty. And I really like mix networks and I want to encourage their research and adoption, so I actually want to take a quick moment to demonstrate one to you live on stage. So right now I'm going to be a Tor node, or an onion routing node, or a low-latency anonymity network, and I'm going to receive a packet and then send it right out. Now I'm going to play a mix node or a remailer node: I'm going to collect a packet, stick it in my bag, collect another packet, stick it in my bag, and collect another packet and stick it in my bag. I'm going to shuffle these up. I'm going to peel off the outer layer of encryption, and now I'm going to send them out all at once. So you, the global passive adversary who can observe my computer and see all the traffic I send and receive, you saw that I received three messages and you saw that I sent out three messages, but you don't know which message went where. That's the uncertainty. So mix networks demonstrate that we've gained back a certain amount of protection against figuring out who was communicating with whom. Given enough time or low enough traffic volume, an adversary can perform the same types of attacks I described against Tor, correlating messages, but it takes a lot more observation. The easiest thing to learn, the thing that takes no time or analysis, is the fact that I'm communicating. You know, we don't disguise the if, we also don't disguise the when, and we also don't disguise how large it is. So enter shared mailboxes and alt.anonymous.messages. That's a bit of a mouthful, so I'm going to abbreviate alt.anonymous.messages to AAM. A shared mailbox is what it sounds like. Imagine an e-mail account where everyone in the room has the username and password, but it's read-only access. You can't delete messages, you can't send them.
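The on-stage demonstration above can be sketched as a toy threshold mix: pool messages until a threshold, shuffle, then flush everything in one batch. This is a sketch of the idea only; `MixNode` is my own name, and real mixes like Mixmaster also strip a layer of encryption and pad messages to uniform sizes.

```python
import random

class MixNode:
    """Toy threshold mix: pool messages, shuffle them, flush all at once."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.pool = []

    def receive(self, message):
        self.pool.append(message)
        if len(self.pool) >= self.threshold:
            return self.flush()
        return None  # nothing leaves until the pool fills

    def flush(self):
        batch, self.pool = self.pool, []
        random.shuffle(batch)  # an observer can't match inputs to outputs
        return batch

mix = MixNode(threshold=3)
assert mix.receive("to-alice") is None   # goes in the bag
assert mix.receive("to-bob") is None     # goes in the bag
out = mix.receive("to-carol")            # third message triggers the flush
print(sorted(out))                       # all three leave together, shuffled
```

Contrast this with a low-latency node, which would return each message immediately in `receive`, preserving the input/output ordering an observer needs for correlation.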
All of the messages are encrypted, so what you do is you download them all, as one of the people with access to this inbox, and then you try to decrypt each one of them. The ones that you can decrypt are to you, and the ones that you can't decrypt aren't, and you don't know who they're to. Well, someone watching this encrypted connection, watching you accessing this mailbox and downloading all the messages, they can see that you're accessing the mailbox. That's certain. And they know that you downloaded all the messages, but they don't know if you were able to decrypt any of them. And because of that, they don't know when you received a message, who it was from, or how large it was. All they know is that you're checking the mailbox, not that you're actually getting mail. At the cost of a lot of bandwidth, receiving messages via a shared mailbox provides an awful lot of security comparatively. Now, shared mailboxes are an awesome anonymity tool, but the difference between an awesome anonymity tool and an anonymity tool that's actually used is the answer to the question: can I interact with the rest of the world? Tor is wildly successful compared to any other anonymity system because you can browse the actual internet with it. It's not a closed system where you only interact with hidden services. So for a shared mailbox to actually be used, it needs to interact with normal e-mail, and that's where nymservs come in. The simplest nymserv, and the newest and easiest to use, receives a message at a domain name and then just posts it immediately to alt.anonymous.messages. This is a nymserv written by Zax, and it's on GitHub. The much more complicated type 1, or Ghio, nymservs can forward the mail to another e-mail address or directly to alt.anonymous.messages, or they can even route it through a remailer network to eventually wind up in one of those two places, and I'll talk more about this type of nymserv later on.
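The download-everything, trial-decrypt-everything flow can be sketched like this. The "encryption" here is a toy stand-in (a SHA-256 keystream XOR with a `MAGIC` marker playing the role of the internal structure that tells PGP a decryption succeeded); the point is only the access pattern: every reader fetches every message, and only trial decryption reveals which ones are theirs.

```python
import hashlib

MAGIC = b"PGPOK"  # stand-in for the structure that marks a successful decrypt

def keystream(key: bytes, n: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    data = MAGIC + plaintext
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def try_decrypt(key: bytes, ciphertext: bytes):
    data = bytes(a ^ b for a, b in zip(ciphertext, keystream(key, len(ciphertext))))
    return data[len(MAGIC):] if data.startswith(MAGIC) else None

# The shared mailbox: everyone downloads *all* the messages...
mailbox = [encrypt(b"alice-key", b"hi alice"), encrypt(b"bob-key", b"hi bob")]
# ...and Bob learns which are his only by trying his key on each one.
mine = [m for c in mailbox if (m := try_decrypt(b"bob-key", c)) is not None]
print(mine)  # [b'hi bob']
```

An observer of the connection sees only that Bob fetched the whole mailbox, never which (if any) messages decrypted for him.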
So with nymservs added to handle sending mail in, shared mailboxes have awesome anonymity for the recipient. And when you send a message to a nymserv that uses a shared mailbox, you're ideally using an onion router or a mix network, although you don't have to, and thus you would have those security properties: an adversary can see that you're sending, when you send it, and how large it is. So now that I've walked through the security properties of the different types of anonymity networks, let's actually dive into AAM. It should really have strong security; after all, it's the most theoretically secure. But if you've never looked at it before, this is what it looks like, at least in Google Groups. It's Usenet. How many people are old enough to have used Usenet? All right, good, good. So there's a whole bunch of, this is what it looks like today, a whole bunch of hexadecimal subjects, all posted by anonymous or nobody, and any individual message usually looks like a PGP message that may or may not have a version string. Today there are about 190 messages posted per day, but what's interesting is that while the average has certainly decreased over the last decade, it's held somewhat steady in the last five years. So the data set that I worked off of was about 1.1 million messages from the last 10 years. Now, we can really see some shortcomings here already. Over half of the messages in my data set go through two people. The network diversity is horrible, and if you stayed through Runa's talk, you know that's kind of important. If either one of these folks, Zax or dizum, got subpoenaed, shut down, or just retired, the whole network would be thrown into disarray. And to the person who asked about directory authorities in Tor: dizum runs one of the directory authorities in Tor, and he's not affiliated with the Tor Project; he's just someone that they trust. Now, this looks pretty bad; it's actually way worse. That 53.5% statistic was over the entire data set.
Today Zax and dizum make up virtually all of the messages posted to AAM. I don't mean that they're sending them all; I mean that they are the exit node for all the messages posted to AAM. And that dip, that weird dip, that was 7,800 messages sent through Frell, which operates a remailer and a news gateway. It was a unique subject; it didn't have any unique headers. I couldn't get a whole lot out of it aside from correlating those 7,800 messages uniquely. So, with network diversity pretty clearly abolished, let's take a look at the data and see what type of analysis we can actually do. I don't think I can say anything as ironic as this quote: keeping the ciphertext around in public, at least for a short time, sounds like a good thing anyway. And that's from 1994. So here we are, just shy of 20 years later. And the first thing to do is break it up by PGP versus not-PGP. You can see it's overwhelmingly PGP messages, but what are the not-PGP messages, real quickly? I was trying to come up with a nice way to say crackpots. I'm not sure if I succeeded, but there are several people who have posted, and continue to post, just random rants about, I'm not even really sure, some of them are definitely the lizard people. And there are actually frequently asked questions that have sprung up in response to these guys, because people are just getting flat-out confused by them. And besides those, there are some other non-PGP messages. I think the most interesting is a set of about 10,000 messages with a subject of Operation Satanic or Satanic Operation. What's interesting about these messages is that they're clearly ciphertext, but it's alphabetic. If you look at a single message, you might think that it's like a Caesar cipher or a Vigenère or some sort of polyalphabetic thing. But if you look at them as a whole, you see that it's a perfectly even distribution over a 16-letter alphabet.
In other words, I think it's a substitution cipher into hexadecimal, and that it's actually ciphertext. There are other message clumps that are similar to this, so if you're into this sort of analysis, have at it. The next thing to look at is what percent of messages were delivered to AAM via a nymserv or via a remailer. Now, these numbers are going to be a little off, since some of the remailer messages are actually to nyms, and some of the PGP messages may be through remailers I don't know about, but it's something. And we can see that a large portion are messages to nyms, which will be important when I tell you how many nymservs are actually still running. Okay, so those somewhat interesting statistics aside, let's start diving into all of those hundreds of thousands of encrypted messages. So, if you didn't know, OpenPGP consists of packets, and each packet type does something slightly different. There's a packet type for a message encrypted to a public key and a packet type for a message encrypted to a password. So what are these packet types? Well, these graphs show the popularity of each of the different packet types, for example, packet type 1 followed by packet type 9. And the top five, the ones on the bottom, are the ones that you'd expect to see. Packet type 1 is messages encrypted to a public key. Packet type 3 is messages encrypted to a passphrase. The actual ciphertext of a message is 9 or 18, for old-style or new-style. And I separated out the messages to a single public key versus messages to multiple public keys. Now, there are two that are just kind of weird. These are the packet types you expect to see after you've decrypted a message; these are plaintext packets. There are actually a small number of messages that look like OpenPGP data, they've got the whole BEGIN PGP MESSAGE header and they're base64, but they're actually just plaintext sitting in plain sight. And if we look at packet type 8, this is what we get.
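The packet types the talk is segmenting on come straight out of the first byte of each packet. Here's a minimal tag extractor following the OpenPGP header encoding (RFC 4880, section 4.2): old-format packets carry the tag in bits 5-2, new-format packets in bits 5-0.

```python
def packet_tag(first_byte: int) -> int:
    """Extract the OpenPGP packet tag from a packet's first byte (RFC 4880 §4.2)."""
    if not first_byte & 0x80:
        raise ValueError("not an OpenPGP packet header")
    if first_byte & 0x40:                # new-format packet: tag in low 6 bits
        return first_byte & 0x3F
    return (first_byte >> 2) & 0x0F      # old-format packet: tag in bits 5-2

# The tags discussed in the talk:
#   1 = public-key encrypted session key    3 = symmetric-key encrypted session key
#   8 = compressed data                     9 = symmetrically encrypted data
#  18 = new-style (integrity protected) encrypted data
assert packet_tag(0x84) == 1   # old-format, tag 1
assert packet_tag(0x8C) == 3   # old-format, tag 3
assert packet_tag(0xA4) == 9   # old-format, tag 9
assert packet_tag(0xD2) == 18  # new-format, tag 18
```

Counting these tags across a corpus is exactly the kind of histogram the slides show: which packet sequences appear, and which messages are "bare" type 9 packets with no preceding session-key packet.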
It really is just compressed plaintext data. Unfortunately, it's also nonsense. I don't know if there's a code there or not. I didn't spend a whole lot of time on it after I looked at it. It read something like "organizing bizarre sabbatical." It probably came out of some Markov generator somewhere, so I kind of moved on. And what I moved on to was messages that were sent to public keys. Now, it's super obvious to do analysis based on the public key that's in the message, and I promise you it gets a little more complicated later, but let's look at the key IDs. Obviously, they're a pretty powerful segmenting tool, but I wanted to illustrate a couple of examples where key IDs can tell us more. And I've anonymized most of the specific data in this, because de-anonymizing people kind of isn't cool. So there was one key ID that messaged very reliably through a nymserv, except for two messages sent through Easynews. And if you track down that very unique Easynews gateway and the user agent, well, we find out that person also sent messages to another key ID, and we can start making inferences across multiple types of metadata. Now, I mentioned that I separated out the messages that were sent to multiple public keys versus the ones sent to a single one. If a message was sent to a single key, we don't know too much about it, especially because they usually throw away the key ID, so it's just all zeros. But if a message is sent to more than one key, then we can draw communication graphs. Now, it's not a strict communication graph in the sense that a message was sent from Alice to Bob; technically, it's that Alice and Bob received the same message. But in most situations, people will encrypt a message to themselves so they can read their own sent mail. I started drawing these pictures about the same time as the PRISM scandal started breaking, so I was feeling really uncomfortable that this is probably what the NSA is doing to me and my friends.
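Building the co-recipient graph described above is a few lines: every message encrypted to multiple key IDs contributes an edge between each pair of recipients, weighted by how often the pair co-occurs. A minimal sketch, with messages represented simply as lists of recipient key IDs:

```python
from collections import Counter
from itertools import combinations

def recipient_graph(messages):
    """Weighted co-recipient graph: an edge for every pair of key IDs
    that received the same message. Not strictly 'Alice sent to Bob',
    just 'Alice and Bob received the same message'."""
    edges = Counter()
    for key_ids in messages:
        for a, b in combinations(sorted(set(key_ids)), 2):
            edges[(a, b)] += 1
    return edges

msgs = [
    ["alice", "bob"],
    ["alice", "bob"],
    ["alice", "bob", "carol"],
]
g = recipient_graph(msgs)
print(g[("alice", "bob")])  # 3: alice and bob co-received three messages
```

Edge weight maps directly to the line widths in the slides, and node degree to circle size; since senders usually encrypt to themselves, heavy edges are a decent proxy for actual correspondence.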
But nonetheless, quick reference: green means I was able to get the public key off of a key server. A circle means that a key received messages to it individually, as well as to it and multiple other people. And the size of the circle and the width of the line is how many messages they received. So there's this very nice symmetrical five-person graph, and we've got these much larger communication networks here, and a real big one here. And we've got a couple of interesting graphs with central communication points; you can kind of infer from that what you want. And then we've got a couple of more interesting networks. I think these are interesting because they imply that not everybody knows everybody else. This graph and the next one may really be a model of the actual internet, where people e-mail people in a complex, interconnected, but not fully connected way. This is a fairly low-volume network, and this one has quite a few higher-volume folks participating. And then there's, like, the rest: the simple two-person communications going on. But let's talk about brute forcing ciphertext. Now, if you'll recall this graph, you saw that packet type 9 was by far the most common packet type found. There are over 700,000 of them. This packet type is really interesting, so let's dive a little into the OpenPGP spec. This packet is the actual ciphertext of the message. It is only the encrypted data. It doesn't say what algorithm it is, and it doesn't explain how to get the key. So where's the key? The key is in another packet: packet type 1 for public keys or packet type 3 for passphrases. But if you recall from that graph, there aren't any packets that precede packet type 9. We've got a disconnect between what the spec says and the data that we actually see, until we find this: the IDEA algorithm is used with the session key calculated as the MD5 hash of the password. Yeah, the MD5 of the password.
This is absolutely legacy, and we've had better ways of doing this in OpenPGP since the late '90s. So while in the very beginning of AAM this might have been excusable, the fact that my dataset was from 2003 onward makes this a pretty horrible situation. And we know how to do MD5s really, really fast. But that's only half of it. We also have to do an IDEA decryption, and then we have to detect whether what we decrypted was the actual plaintext or just random. And while you could run randomness tests, they're slow, and we're brute forcing here, so we want to go as fast as possible. This is all my way of trying to justify that I spent a lot of time writing GPU-powered code and running it for months and killing my home desktop. But I did get results out of all this GPU cracking. And in fact, one of the first few dozen messages that we got was this one, which did not make me feel terribly good about myself. But I kept going. And I got some HTML pages. I got some weird SMTP logs. I got a lot of partial remailer messages. But overwhelmingly, what I got after I decrypted a message was another encrypted message. Recursively encrypted PGP messages. And in fact, here's a breakdown of how many recursions I hit. I got about 10,000 decryptions into a public key message and another 2,200 that went into another password-protected message. So I went and cracked those, and I got about 49 messages that were two layers deep, and then I cracked some more of those and went deeper still. And then there was this one bloody message that was four layers deep that I still couldn't crack. So it's pretty damn recursive. Now, for the number of messages I was trying to brute force, something like 700 or 800,000, the fact that I only got about 10,000 cracked is not really great. Password crackers would consider that an abysmal failure. I'm not the best cracker; I'm sure people can do better. But what I do want to defend myself with is: I'm not trying to crack passwords.
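The shape of that brute-force loop can be sketched as follows. The real parts are the legacy key derivation (session key = unsalted, uniterated MD5 of the password, which is what makes GPU cracking feasible) and a cheap plausibility check instead of slow randomness tests. IDEA isn't in the Python standard library, so a repeating-key XOR stands in for it here purely as an illustration; the password list and plaintext are obviously made up.

```python
import hashlib

def derive_key(password: str) -> bytes:
    # The legacy scheme: session key = MD5(password). No salt, no iteration.
    return hashlib.md5(password.encode()).digest()

def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Stand-in for IDEA (illustration only): repeating-key XOR is symmetric,
    # so the same function encrypts and decrypts.
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(data))

def looks_like_plaintext(data: bytes) -> bool:
    # Cheap recognizer: a wrong key almost never yields all printable ASCII.
    # The real check would look for valid OpenPGP packet structure instead.
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)

ct = toy_cipher(derive_key("hunter2"), b"the actual plaintext of the message")
hit = next((pw for pw in ["password", "letmein", "hunter2"]
            if looks_like_plaintext(toy_cipher(derive_key(pw), ct))), None)
print(hit)
```

On a GPU you'd run the MD5, the cipher, and the recognizer as one fused kernel per candidate, which is essentially what the talk describes hand-rolling.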
I'm trying to crack encryption passphrases of the most paranoid people on the Internet. So I think I did decent. Now, I haven't explained why there are so many recursively encrypted messages. Like, what the hell? And to explain that, I have to talk about remailers. How many people have ever used a remailer? All right, so about two dozen. So the tools that you've probably used, Mixmaster and Mixminion, are dubbed type 2 and type 3 remailers. That means there must be a type 1 remailer somewhere, right? Well, they're basically dead, but the protocol itself lives on in Mixmaster. And boy, what a protocol. This is a manual of how to use most, but not even all, of the options supported by type 1 remailers. Some of the directives are on the left. Now, what's the difference between Mail-To, Remix-To, Anon-To, and Encrypt-To? I sure as heck don't remember, and I had studied this stuff for a while. So to use type 1, you actually have to type all of these out yourself. It's not like a GUI where you just click a checkbox. Now, I had talked in the beginning about type 1 nymservs. Well, type 1 nymservs are the main recipients of these directives. You string together a mix network chain of directives encrypted to different nodes. You'd type that all out yourself, by the way. And that would be your reply block. And when someone e-mails your nym, the nymserv would basically execute your reply block, sending the message off through each of the steps, ultimately coming out to your real e-mail address or to alt.anonymous.messages. And we're still seeing these messages posted. But there are only two type 1 nymservs operating. One is Zax, of course. The other is Paranoici, run by a group of Italian hackers in Milan. They run Autistici and Inventati, which you can kind of think of as an Italian version of Riseup, if you've ever heard of Riseup. So, in conclusion, what are those nested PGP messages?
They're type 1 nymserv messages, where the key ID is the ultimate nym owner's. If I don't have a key ID, then there's another layer of symmetric encryption which I haven't cracked yet. And when you, the nym owner, download type 1 nymserv messages, you know all of the passwords: you peel them off one by one, and finally you use your private key. And these are all the recipients with more than five messages. It's pretty top-heavy towards just a few nyms. So communication graphs and brute forcing are really just the first quarter, I would say, of the analysis that I did on AAM. A majority of my time was spent doing correlation. Even if I don't know who a message is to or what it says, it's valuable to know that it's to the same person as another message, or that it was sent by the same sender. And why is that valuable? Well, let's go back to this slide. You can't tell if someone has even received a message in a shared mailbox. But if I can correlate one message with another, then I can start determining that some unknown person has received a message. And once I know that two messages are related, well, then I can start paying attention to their timestamps and their lengths. And this goes even further, because people tend to respond to messages that they receive. And since I know if someone has sent a message, it might just be that they are replying to a message that they just received. So let's talk more about correlation and some more analysis of what's going on in AAM. First off, it's obvious that you can correlate messages that use a single constant subject. And there are a lot of messages like these: nearly half of all the messages posted to AAM have a constant English subject. They don't use that hexadecimal stuff. They do tend to be the older messages, and they've tapered off recently, which makes sense. But you can look at these numbers: 22,000 messages in a cluster, 18,000 messages in a cluster. But let's talk about those random hexadecimal subjects.
Now, there are two algorithms to generate these subjects. They're called encrypted subjects, or e-subs, and hashed subjects, or h-subs. And the point of these is to quickly identify which messages are for you and which messages you should ignore. For the folks who used Usenet back in the day: you could download just the headers and not the whole bodies. Now, personally, I think we're at the point where we could probably cut this step out, but nonetheless it's still there, so let's break it. E-subs have two secrets, a subject and a password. H-subs have a single secret, a password. It's considerably more difficult to brute force the e-subs, and I ran out of time, so I just focused on the h-subs. H-subs were created by Zax, and as his services are used more and more, they make up an increasing percentage of the subjects. Now, h-subs have a random piece in them that you can kind of think of as an initialization vector or a salt. And while I could try to shoehorn these into the existing SHA-256 crackers, it'd be painful, you have to truncate the output, so I just wrote my own GPU cracker again. And I cracked about 3,500 h-subs. Better than the percentage of messages I brute forced, but again, not a great percentage, but again, these are the passwords of the most paranoid people on the internet. And I found an interesting set of messages with the h-sub password "Danger Will Robinson," which was used by some, but not all, of the messages that were sent to a couple of particular key IDs. I cracked all the h-subs of another key ID with the passwords "testicular" and "panties." And if you don't know what smegma is, don't Urban Dictionary it. So, if h-subs and e-subs are used to let a nym owner identify their own messages, can we do something similar? Let's say we want to target the nym Bob. Well, what we can do is send a particularly large message to Bob, full of nonsense, and then we wait for a large message to pop out in AAM.
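The h-sub structure the talk describes (a random piece acting as IV/salt, plus a truncated SHA-256) can be sketched and attacked like this. The exact field sizes here, an 8-byte IV and a 48-hex-character subject, are my reading of the published hsub scheme and should be treated as an assumption; the candidate passwords are just illustrative.

```python
import hashlib
import os

def make_hsub(password: bytes, iv: bytes = None, hexlen: int = 48) -> str:
    """Sketch of an hsub subject: random IV, then truncated SHA-256(IV + password)."""
    iv = iv if iv is not None else os.urandom(8)
    digest = hashlib.sha256(iv + password).digest()
    return (iv + digest).hex()[:hexlen]

def check_hsub(subject: str, password: bytes) -> bool:
    iv = bytes.fromhex(subject[:16])   # first 8 bytes of the subject are the IV
    return make_hsub(password, iv, len(subject)) == subject

# A nym owner posts this as the Subject: header...
subject = make_hsub(b"testicular")

# ...and a cracker runs a wordlist against it. The per-subject IV is why
# off-the-shelf SHA-256 crackers need modification: each subject is salted
# and the hash is truncated.
crack = next((pw for pw in [b"panties", b"hunter2", b"testicular"]
              if check_hsub(subject, pw)), None)
print(crack)
```

The nym owner runs `check_hsub` with their own password over all subjects to find their mail; a cracker runs it with a wordlist over one subject, which is the 3,500-crack attack described above.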
Zax's nymserv is near instantaneous, so this size-based correlation is pretty easy. Type 1 nymservs are not necessarily instantaneous, so it's a little more difficult, but not too difficult; you can just do it a couple of times. And this works, and it works pretty easily and effectively. What we get is a specific message that we know is to a particular nym. And at that point, we can target it for h-sub cracking. So, I'm not done. But unlike everything I presented before, what I'm going to talk about now is probability-based attacks. That is, I come up with a hypothesis that I can correlate messages with a probability better than random if I look at property X, whatever X is. Well, how many of you like the scientific method? I don't really have controls. So what I'm doing is coming up with a hypothesis and running it across the dataset. Then I look at the clusters of messages that pop out, and I see if I can figure out something else that correlates them. And if I can see something else that correlates them, I call it a success. That's how I kind of simulate controls. So let's say I think that if a message has a header value of X, that's a unique sender; one sender is sending that value of X. So I run that analysis, and I get clusters of messages encrypted to a single public key. Well, if there was no correlation at all, I would probably get a distribution that looks more random, messages encrypted to random public keys. But with such nicely segmented public keys, I kind of think that this worked. And even if I could have found that cluster by just looking at the public keys, the data implies that I could use that trick, that hypothesis, to find a cluster of data when there is no other distinguishing characteristic. So that's how I try to preserve some semblance of the scientific method.
So my first example is message headers. That's a pretty big one, so let's look at these. Now, there are a few headers that are in nearly every message, but a long tail of headers that are in only a few. But these mostly unique message headers are not necessarily the goldmine that you might think they are, and that's because headers can be added at the client, at the exit remailer, at the mail-to-news gateway, or by the Usenet peer. So to really go after the distinguishing headers, what we have to do is subtract out the headers that were added by all the other parts of the path, which we can do by clustering by the exit remailer, seeing which headers are on all of those messages, and subtracting those out. And here are some great examples of headers that were specified by the client: User-Agent, obviously, X-Post-Type-ID, X-No-Archive. If you've used Usenet, you know that X-No-Archive is a client preference. Now, these three particular strange headers all formed a distinct clump of messages with the unique subject "we will save the planet," and that's an easy example of how unique message headers can correlate messages. Now, X-No-Archive: this means don't save it on Usenet. It's a client request that most Usenet servers will obey. It's also not the word that I have on the screen; this is a misspelling of the header. And there is one person, or at least I'm claiming one person, who has messed this up and completely distinguishes their messages from everyone else's. All 17,300 of them. So this is what you want, right? No, capitalization matters, and this is not the correct capitalization. What's interesting about this one is that it shows up on several long-running threads on AAM, comprising nearly 28,000 messages. And initially I thought each of these threads was relatively independent of the others, but after finding this little bit of information, I'm starting to seriously doubt that. This one isn't right either.
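The subtraction heuristic just described, attribute to the exit whatever appears on every message through that exit, and treat the remainder as client-specified, fits in a few lines. A sketch over toy data; the exit names and headers here are examples, not real observations.

```python
def client_headers(messages):
    """For each exit remailer, headers present on *every* message through it
    are attributed to the exit path; what's left is likely client-specified
    and therefore distinguishing."""
    by_exit = {}
    for msg in messages:
        by_exit.setdefault(msg["exit"], []).append(set(msg["headers"]))
    result = []
    for msg in messages:
        common = set.intersection(*by_exit[msg["exit"]])
        result.append(set(msg["headers"]) - common)
    return result

msgs = [
    {"exit": "dizum", "headers": {"Path", "Message-ID", "User-Agent"}},
    {"exit": "dizum", "headers": {"Path", "Message-ID", "X-No-Archive"}},
]
print(client_headers(msgs))  # [{'User-Agent'}, {'X-No-Archive'}]
```

With enough messages per exit, the intersection converges on the headers the exit genuinely stamps on everything, and the residue, misspelled X-No-Archive variants, odd User-Agents, and so on, is exactly the long tail that correlates senders.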
There are 1,500 messages posted with this header, including some test messages that were posted with someone's real name. This is actually the correct version, and there are about 135,000 messages that have it, a little more than 10%, which makes it distinguishing in and of itself. So, just out of curiosity, another show of hands: has anyone ever used a type 1 nymserv? I don't see any hands. Okay, so Encrypt-Subject is a directive for type 1 remailers that should be processed by the remailer. It should never make its way onto Usenet. This is a bug; this is a client or a user messing up. But I can't really blame them, because type 1 is so horribly difficult. There are over 10,000 messages like this. And when you reuse subjects like these, you make the messages without the Encrypt-Subject stand out. That's the one on the far right. Or even worse, mess it up once, then figure out how to do it, but keep using that same subject and password. That let me identify 52 eSub messages that were otherwise secure, but the sender messed up once and sent the subject through in plaintext. And then there's Encrypt-Key, another header that should never make it onto Usenet but does, because type 1 remailers are so hard to use. There are over 10,000 of these messages. And let's look at another header: Newsgroups. Just like mailing lists, you can post a message to more than one newsgroup. But if you do, you're wildly in the minority, and that segments you. Like this newsgroup: there are 34 messages posted with it, and thank you so much to Comcast for making your users extremely distinguishable. And what about this value, AAM with four commas at the end? I thought this was a correlation, but after tracking it down, it was actually a bug caused by remailer.org.uk for one week in January of 2006. Just some random trivia I pulled out. How about this one, with duplicated newsgroups?
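Since hSub and eSub keep coming up, here is a rough sketch of why weak hSub passphrases are crackable. I'm assuming the common hSub construction (a random 8-byte IV followed by a truncated SHA-256 of IV plus passphrase, all hex-encoded); the exact lengths vary by implementation, so treat this as illustrative.

```python
import hashlib

def hsub_match(subject_hex, passphrase):
    """Check whether an hSub-style subject matches a candidate passphrase.
    Assumed construction: subject = hex(IV[8] || SHA-256(IV || passphrase)
    truncated to fill the subject). Lengths here are assumptions."""
    try:
        raw = bytes.fromhex(subject_hex)
    except ValueError:
        return False
    iv, trailer = raw[:8], raw[8:]
    digest = hashlib.sha256(iv + passphrase.encode()).digest()
    return len(trailer) > 0 and digest[:len(trailer)] == trailer

def crack(subject_hex, wordlist):
    """Dictionary attack: a guessable passphrase falls to a simple loop,
    which is why uncrackable passphrases matter so much here."""
    return next((w for w in wordlist if hsub_match(subject_hex, w)), None)
```

Once a message is pinned to a nym by the size-correlation trick above, this is exactly the kind of offline guessing it gets targeted with.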
These were sent through a large variety of remailers and have no obvious correlation besides this value and the fact that they have English subjects. The English subjects were another example of the control I used to confirm that using a unique Newsgroups value is a bad idea. Now, humans are creatures of habit, and as flaky as remailers have been, a lot of people find a configuration that works for them and then stick with it. Well, if I partition people by the remailer and the news gateway they use, which is what the colored squares are, what was previously an anonymous discussion thread suddenly makes it very easy to pick out who is saying what, and who is agreeing with themselves. And it's even easier if I add in the header signature on the far right. And then here's a really interesting pattern that I observed. There are a host of messages that have subjects with a 1 or a 2 in them, like "soggy" and "soggy 2." Well, I looked at these and found they were being posted really close together. Then I realized one of the options in type 1 remailers is to duplicate a message for redundancy: send the message down two different remailer chains just in case one becomes unavailable. And while that gains you some measure of availability and redundancy, it's also quite distinguishing. You could target a nym like I described earlier with huge messages, and if you see two huge messages appear, well, you know that that nym's reply block duplicates messages. Then look for all the possible duplicate messages and you've got a candidate list of messages to that nym, even if you're unsuccessful doing an hSub or an eSub attack. And a similar pattern I saw was these. Look at each pair of messages in the slightly different backgrounds. The second message comes out of Dizum about five to six hours later than the one that comes out of Panta Ray. Now, I don't know what this means, but it did stand out as distinguishing.
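The duplicate-hunting step above reduces to a simple pairing rule: flag messages posted within some window of each other whose sizes nearly match. The window and tolerance values below are my own illustrative choices, not parameters from the talk.

```python
from datetime import datetime, timedelta

def candidate_duplicates(messages, window=timedelta(hours=6), tolerance=0.02):
    """Flag message pairs posted close together in time with near-identical
    sizes -- candidates for type 1 'duplicate down two chains' redundancy
    copies. Window and tolerance are assumed, tunable parameters."""
    msgs = sorted(messages, key=lambda m: m["time"])
    pairs = []
    for i, a in enumerate(msgs):
        for b in msgs[i + 1:]:
            if b["time"] - a["time"] > window:
                break  # sorted by time, so no later message can qualify
            if abs(a["size"] - b["size"]) <= tolerance * max(a["size"], b["size"]):
                pairs.append((a["id"], b["id"]))
    return pairs
```

Run against a flood of oversized messages aimed at a suspected nym, two flagged copies appearing together is the tell that its reply block duplicates.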
The subject for all of these was, again, "we will save the planet." Also, some messages from Frell were mixed in with no obvious correlation to other messages. There were a number of hypotheses I tried that did not turn up interesting data, and there are more queries that could be run across this dataset, but I need to start wrapping up. It all comes down to metadata. What we saw in AAM is the obvious mistakes we'd expect. It also suffers a bit because it hasn't taken into account the lessons we've learned in the 10 to 15 years since it was developed; that's a lifetime in anonymity technology. But I do think there are some traffic analysis lessons that we haven't codified as best practice that we should. So what does the future hold for AAM? Well, the security of a well-posted message is good, with a lot of caveats. If you use uncrackable passphrases, only use nymservs that output key-stretched packets, post through remailers with no distinguishing characteristics, and are willing to be in a very small anonymity set, go for it. I don't know how many people are using AAM today, but I don't think it's a lot. That means if the government asked for a list of everyone who uses it, they could probably get a really short list of names and dig fairly deeply into each of their lives. And AAM crucially relies on remailers and news gateways, and these services are dying. Remember that two parties, Zax and Dizum, post more than 98% of the traffic to AAM. It's also text-based, with very limited bandwidth. And the nymservs themselves are pretty crappy, architecturally speaking. We give single-hop proxies like VPNs and Ultrasurf a lot of shit because their architecture is not nearly as strong as Tor's, but nymservs are in that same category of "trust this guy not to roll over on you." I feel compelled to mention that the alternative is to use Tor, which you do trust, to send email via throwaway accounts on a service you do not trust.
And while this is a practice that pretty much everyone in this room has probably used, or at least thought of, it's also a really shitty architecture. Now, the good news is we have something better: a very strongly architected nymserv design. Pynchon Gate was developed by Len Sassaman, Bram Cohen, and Nick Mathewson, and uses private information retrieval instead of a shared mailbox. It exposes less metadata and resists flooding and size-based correlation attacks. However, it's not built. It's been started, but it's got a very long way to go. And it also requires a remailer network to operate, and we don't really have a remailer network. What we've got is Mixmaster and Mixminion. Now, Mixminion is a bit better than Mixmaster, which has no link encryption, doesn't resist certain known attacks, and uses old crypto with no chance of upgrading. But both of these services suffer from the fact that we don't have a good solution to remailer spam and abuse, we don't have good documentation for them, and they both have horrible network diversity: under 25 people running Mixmaster, under five running Mixminion. So if we like Pynchon Gate, the path forward also involves fixing Mixminion. And Mixminion needs love. Mixminion is currently unmaintained, but we have a to-do list that includes the items I've got here. Some of them are extremely complicated, like moving to a new packet format. Others are relatively straightforward, like improving the TLS settings. Others give you the opportunity to practice writing crypto, designing a distributed-trust directory system, or writing a complete standalone pinger in any language or style you want. So if you're interested, I think there's a cool opportunity here. But what I keep coming back to is the fact that we have no anonymity network that is high-bandwidth and high-latency. We have no anonymity network that would have let someone securely share the Collateral Murder video without WikiLeaks being their proxy.
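To give a flavor of why private information retrieval beats a shared mailbox, here is a toy two-server XOR PIR sketch. This is the textbook construction for illustration only, not Pynchon Gate's actual protocol; the key property is that neither server alone learns which record the client fetched.

```python
import secrets

def pir_queries(db_size, index):
    """Client side: build two random-looking bit vectors that differ only
    at `index`. Either vector alone reveals nothing about the wanted slot."""
    q1 = [secrets.randbelow(2) for _ in range(db_size)]
    q2 = q1[:]
    q2[index] ^= 1
    return q1, q2

def pir_answer(db, query):
    """Server side: XOR together the equal-size records the query selects."""
    acc = bytes(len(db[0]))
    for record, bit in zip(db, query):
        if bit:
            acc = bytes(x ^ y for x, y in zip(acc, record))
    return acc

def pir_combine(a1, a2):
    """Client side: XOR the two answers; every record selected by both
    queries cancels out, leaving only the record at `index`."""
    return bytes(x ^ y for x, y in zip(a1, a2))
```

As long as the two servers don't collude, neither can tell which message you retrieved, which is a far stronger position than "trust this guy not to roll over on you."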
You can't take a video of corruption or police brutality and post it anonymously. Now, I hear you arguing with me in your heads: use Tor and upload it to YouTube. No, YouTube will take it down. Use Tor and upload it to Mega, or some site that will fight fraudulent takedown notices. Okay, but now you're relying on the good graces of a third party, a third party that is known to host the video and can be sued. WikiLeaks was the last organization willing to take on that legal fight, and now they are no longer in the business of hosting content for ordinary people. And you can say "hidden services," and I'll point to size-based traffic analysis and the intersection attacks that come with a low-latency network, never mind Ralf-Philipp Weinmann's recent paper that pretty much killed hidden services. We can go on and on like this, but I hope you'll at least concede the point that what you're coming up with are workarounds for a problem that we lack a good solution to. So, if I've been able to entertain you, I'm glad. If I've been able to inspire you to work on anonymity systems, I'm overjoyed.