Thank you all for coming. This is the traffic analysis panel, and I apologize a bit for not having a lot of details in the program, but thanks for coming anyway. I'm very fortunate to have a really good group of speakers here. At that end we have Raven Alder, who's a hacker extraordinaire, capable of doing nearly anything. On the other end I have Riccardo Bettati, who is a professor at Texas A&M University and has done a lot of work in this field, so he has some great slides to show. I'm Jon Callas. I am CTO at PGP Corporation. I have been doing security and cryptography for entirely too long. We also have Nick Mathewson, who is one of the developers of Tor. So between the four of us, I think that we have a really, really good group covering a broad spectrum of experience and ability to talk about what really is the most difficult problem that we face in security today. So what is traffic analysis? Very simply, traffic analysis is signals intelligence that ignores the content. It is looking at the problem of figuring out what your opponent is doing and saying, screw it, I'm not even going to bother to try to read it. I'm just going to look at patterns in various dimensions and then see what goes on. You can consider it to be metadata analysis, and metadata-only analysis. It looks at all sorts of things like who's talking to whom, how long they're talking, where they are, who they call after that, and so on and so forth. And then you start constructing models of what you know about the opponent based upon the metadata only. It also extends to some very interesting things like analyzing social networks. Now, it's really, really important because it's extraordinarily easy to do. Traffic analysis is looking at things that are hard to hide. All the time there is more data available.
The more that we use wired devices, the more we use wireless devices, the more that we are doing signaling, the more there is for somebody who's doing traffic analysis to get a base of data to start constructing things out of. Also, it's extraordinarily hard to protect against. The defenses that you can mount are frequently extraordinarily expensive, there are ways to break the defenses themselves, or they really aren't very pleasant for you to use. I mean, for example, you could get rid of your cell phone, you could get rid of your wireless card, you could stop using the Internet, but what fun would that be? So, let me give you a scenario. Let's just suppose that you have a worldwide data collection system. I mean, it could happen. I mean, somebody might have one. Let's also suppose that there's ubiquitous crypto and that it's pretty good crypto. So, the major problem that you're going to have is a data reduction problem, because you have this worldwide network and there's all sorts of things going on and a lot of it's being encrypted. So, you need to, in many cases, throw stuff out and figure out what it is that you need to throw out, and that is in fact one of your biggest problems. So, how would you solve the problem, with your worldwide collection network, of coming up with as much information as you could? And the obvious answer is traffic analysis. So, why are we having this panel? I have been personally concerned about traffic analysis for a long, long time, because as a PGP architect and developer, I have known that any time you send an encrypted message using PGP or anything else out on the wire, it basically screams, hello, I'm an encrypted message. So, actually sucking up the encrypted messages is the easiest thing there is. I mean, the irony is that cryptography is not a help here.
There is increasing evidence that what they, and by they, I mean intelligence organizations, oh, you know, marketing people, the mob, and so on, they are doing traffic analysis for many of the reasons that I described before. And I think that it is in fact the key threat to privacy. So, I also think that we need to shift our mindsets about the way that we look at this. There's a tendency to look at things like the NSA wiretapping scandal and say, oh, they're listening to our phone calls, and in fact, they're not. I've been paying close attention to it, and that was my inspiration for this: the more I looked at this, the more I said, ooh, they're doing traffic analysis. And I mentioned this to Jeff Moss and said that we really ought to start talking about it, because that seems to be the shift of where things are going, and it may very well be that the answer to why, over the last six or seven years, there has only been an increasing liberalization of restrictions on cryptography is that the intelligence organizations have decided, screw it, we don't care anymore. We're going to shift our tactics away from the old school and into the new school, and that means that we need to shift the way that we think about this. But also, every technology can be used for good or ill. You can take a hammer and you can drive a nail, or you can smash in somebody's head. So the idea that technology itself might somehow be particularly evil might apply to certain things like nuclear weapons and chemical weapons, but if we applied it to things like fire, we'd have to give that up too. So it's best to look at these things squarely. So the way I'm going to structure this is that we want to have a conversation about traffic analysis. Riccardo has some slides that he's going to show after I'm done here in a moment that describe a little bit of what's going on in the world of traffic analysis, and then we're all going to talk about things.
You're welcome to ask questions, but we really only have 50 minutes here, and this is a topic that we could talk about for days. So we've been talking over the last week or so. If you've got a question, ask it, but if I cut you off, please don't take it personally. Now lastly, here's a really good source. This is by George Danezis. He's got a paper that he wrote late last year, an introduction to traffic analysis, and also his slides for the talk that he gave. Read them on the way home. If you're on the net now, download them now, but read them later. It is a really good introduction to exactly what's going on, some historical context, and so on and so forth. So that's it for me, and let me get the next set of slides here. Hello, my name is Riccardo Bettati. I'm from Texas A&M University. The background of our group is originally, from a long time ago, in real-time systems. So time for us is an important topic, and so we are looking at traffic analysis from a timing analysis point of view. We kind of look at the underlying signal mechanisms that then support traffic analysis on top of them. So I would like to show maybe a few hopefully reasonably scary scenarios and capabilities that we have today with timing analysis, and what can go wrong if you try to design countermeasures. Now this is a well-known result that says if you listen to an SSH conversation, you basically can break it with no sweat. You basically hear the typing, and that's a long story. So that's one well-known timing attack. If you observe a 250 millisecond inter-packet time on an SSH channel, that gives you roughly two bits of information. That's without any prior knowledge about what is typed. So if you knew that what is typed is English, then you have a lot of a priori information, and you break the conversation with no sweat at all.
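To make the "bits of information" claim concrete: the leak from keystroke timing can be estimated as the Shannon entropy of the observed inter-packet delays. Here is a minimal sketch; the delay values and the 50 ms bin width are invented for illustration, not measurements from a real SSH session.

```python
import math
from collections import Counter

def timing_entropy(delays_ms, bin_width=50):
    """Estimate Shannon entropy (bits) of inter-packet delays by binning.
    Higher entropy means the timing channel carries more information."""
    bins = Counter(d // bin_width for d in delays_ms)
    n = len(delays_ms)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())

# Hypothetical inter-keystroke gaps captured from an interactive session.
delays = [120, 250, 90, 240, 260, 110, 255, 95, 245, 130]
print(round(timing_entropy(delays), 2))  # → 1.97
```

With a model of English digraph timings as prior knowledge, the attacker needs far fewer observations than this naive estimate suggests, which is the point of the SSH result above.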
Here, this is the result of a term paper that I had a student write, where we basically listen to a machine that is sending data, across a bunch of possible configurations of machines, different configurations of network interface cards and operating systems, and we run a bunch of classifiers on them. This is basically an afternoon's worth of work, and it's absolutely easy to figure out what the operating system and hardware configuration of that machine is, without looking at the packets that are sent. We just count packets over intervals. Another application that we have kind of looked at recently was bot detection, right? Everybody would like to know if the participants at the poker table are real or bots. Here is an application where we look at Honeyd, and with very few packets we can figure out whether it's a honeypot or not, and this is without looking at the packets, without looking at the protocols. We treat the entire machine as a black box. It's just the timing, and the results are trivial to get, because the underlying timer is really poorly implemented, right? And we see this. So what are countermeasures? Link padding is a classical countermeasure. You have an underlying traffic pattern, and what you put on top of it is some sort of a cover pattern, and it turns out that link padding is really, really, really hard to implement well. And here, and it's actually very difficult to see, we can basically listen on the radio. The covert channel that leaks over a padded link, even to a passive observer, is a very high capacity channel. Other scenarios: here we are looking at a naive implementation of an anonymous networking scheme. If we look at a naive implementation of a so-called mix network, the problem is those networks interfere with the delivery of TCP traffic.
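As a toy version of the "count packets over intervals" experiment: bin a capture into fixed windows, use the per-window counts as a feature vector, and classify against known profiles. Everything here is invented for illustration (the profile names, the count values, the 0.5 s window); a real study would train on many captures and richer features.

```python
def count_vector(timestamps, interval=0.5, total=5.0):
    """Feature vector: packets counted per fixed interval (no payload needed)."""
    n_bins = int(total / interval)
    vec = [0] * n_bins
    for t in timestamps:
        if 0 <= t < total:
            vec[int(t / interval)] += 1
    return vec

def nearest_profile(sample, profiles):
    """Classify by squared Euclidean distance to known per-OS count profiles."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(profiles, key=lambda name: dist(sample, profiles[name]))

# Hypothetical training profiles: average packet counts per 0.5 s window.
profiles = {"linux-e1000": [9, 9, 10, 9, 10, 9, 10, 9, 9, 10],
            "winxp-rtl":   [6, 12, 5, 13, 6, 12, 5, 13, 6, 12]}
sample = count_vector([i * 0.055 for i in range(90)])   # a steady ~18 pkt/s sender
print(nearest_profile(sample, profiles))                # → linux-e1000
```

The striking part of the real result is that this works as a black box: no headers, no payloads, just counts over time.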
So if you start mucking around with TCP, TCP becomes very highly visible. And so it's very easy to identify flows and where they go. Now, again, this is just a naive implementation. If you make larger networks, you muck with TCP even more, and the results get even worse. There's no need to do end-to-end analysis. What we basically can say is that even if we just measure aggregates in a network, we just have a bunch of measurements that do nothing else than count packets. We don't look at the packets. We don't know where they come from. We don't see headers, nothing. We just count packets. By counting packets, we can identify individual flows. All we have is access to aggregates, and we can dissect the aggregates into individual flows without even knowing what those flows could be. This is a signal processing technique called blind source separation. Think of it, for people familiar with tomography, as tomography on steroids, or tomography with no safety net. Even if we have crowds, we have access to only aggregate data and we have no models of the traffic. All we do is count packets, and it's very easy to get end-to-end connectivity. It gets even worse. When we go into wireless networks, we can, even with simple sensors that can only count packets, they don't know where the packets come from, they have no MAC addresses, they have no payload information, no nothing, separate all the traffic into individual flows. We can identify, because of reachability, where they come from with very high accuracy. That's physical geography. We have sensors, think of them as 802.11 receivers, that we scattered in the field. We just have a bunch of 802.11 senders that are not distinguishable in any way. We can still locate them to within a few meters. It's even worse. We can reconstruct the path. In this particular case, we know that the sender is in the middle of that white blob.
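The full blind source separation machinery is beyond a short sketch, but the underlying signal, per-window packet counts that survive aggregation, can be illustrated with simple correlation. All the count series below are invented; the point is only that a flow's count pattern remains visible inside an aggregate that also carries unrelated traffic.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length count series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Per-window packet counts: two candidate ingress flows, and one egress
# aggregate that actually carries flow_a plus unrelated background traffic.
flow_a = [12, 3, 8, 15, 2, 9, 14, 4]
flow_b = [7, 7, 6, 8, 7, 6, 7, 8]
background = [5, 4, 6, 5, 4, 5, 6, 5]
egress = [a + n for a, n in zip(flow_a, background)]

print(round(pearson(flow_a, egress), 2),
      round(pearson(flow_b, egress), 2))   # → 0.99 0.03
```

Blind source separation goes a step further than this: it recovers the individual flow series from several aggregate measurements without ever seeing the candidate flows, but the exploitable structure is the same.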
We know the possible locations of whoever that sender is talking to, because the same signal, although it's again hidden in aggregates, can be separated out, and we can guess roughly where the receiver is. Now, the receiver happens to be, I'm not sure how well you can see this, but the receiver happens to be where the other blue star is. We have an estimated region where the receiver is. This is just a little overview of some of the techniques that could be used to provide the underlying signal processing for traffic analysis. Go ahead. Pardon? Again? Go to my website, and there is a paper that has extremely good references. Pardon? You want to know the URL? Do a Google search. Google Bettati, there is only one. Yeah. I'll go back and get that. I'm going to start off the discussion here. We have a Tor developer here. Nick, what can be done? We've painted a very bleak picture here, but what we've also been talking about is worst cases. We really have been talking about what happens when you have a lot of large-scale surveillance. But there are very effective things that you can do in a lot of normal cases. So what is possible and useful? Well, let's start. Actually, I think I'd like to start with an example, and is this mic on? Good. Can people hear me? Okay, good. So I'm going to claim that we should start by resisting comparatively weaker attackers. A lot of the stuff that is going on in the literature right now is ways to, given certain amounts of data, do interesting things to larger systems or systems that try to take specific steps. But something that hasn't really reached public understanding, and I don't think even the community here has given enough attention, is what you can do just by looking, because most people are not taking active steps to resist traffic analysis. Running Tor helps. But a lot of people aren't. And because of that, you don't need to do any kind of analysis on data streams. You don't need to do any sort of timing analysis.
You can simply look at the traffic on the network, and yes, it's encrypted, but you know where it's going. The size of data being transmitted is not obscured. And for stuff that isn't encrypted, you can combine individual non-encrypted pieces of data to come up with an interesting profile of people. For instance, say I can tell that you're SSHing into this machine here, which is also running the website of so-and-so and no one else's website, so probably that's you. So now I know who you are. I see you visiting another website, downloading an SSL file of a certain size. Well, that website only has a certain number of pages, and only a few pages actually match that particular size, so now I know what you did. That page is a form. Well, all right, there's only a few options. You sent something in that makes me think you sent in a dozen characters or so. That probably means you chose options such and such. And later on you visit some LiveJournal friends page. Maybe that's you. Maybe that's an account you didn't want linked. You visit a bunch of other sites, and basically I can get a pretty good profile of you from just observation right now if you aren't taking steps against it. So yes, back to the original question, what can be done? It kind of depends against whom. Against a large intelligence agency that is eavesdropping a whole lot and maybe might know somebody who knows statistics and can read papers? Not a lot right now, sorry, short of carrying on conversations in person in quiet tones. But against the folks in this room, you can do pretty well, I think. Anyone want to add anything? Did I even answer the question? One of the other things that I've seen recently, and something that you guys can do something about: many sites, particularly social networking sites, have sort of a peer model.
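The SSL-size trick described above reduces to a lookup: the attacker fetches the target site in advance, records the size of every page, and then matches observed transfer sizes against that table. A toy sketch; the page names, sizes, and tolerance are all invented.

```python
def likely_pages(observed_size, site_pages, tolerance=64):
    """Which pages on a known site could an SSL transfer of this size be?
    Only the observed byte count is used; no plaintext is required."""
    return [page for page, size in site_pages.items()
            if abs(size - observed_size) <= tolerance]

# Hypothetical page sizes scraped from the target site ahead of time.
site_pages = {"/index": 18432, "/contact": 4210, "/donate": 9175,
              "/members-only": 9170, "/faq": 27001}
print(likely_pages(9180, site_pages))  # → ['/donate', '/members-only']
```

Note how a site with only a handful of distinct page sizes gives the observer almost as much as the plaintext would; defenses have to pad responses to common sizes to blunt this.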
You can link to people, they can link to you, and there's differentiation between who does that, and that can be a really valuable source of information. For those of you on LiveJournal, you've seen people stomp off and delete their LiveJournal and have a tantrum, and then suddenly, like two days later, someone pops up with all the same friends, and it's not really hard to figure out. There are options on some of these sites for obscuring who is allowed to see who your friends are, et cetera, but a lot of the time there are backend databases that are publicly minable that do this sort of thing as well. We were discussing at the prep dinner last night that we haven't seen a lot of cross-site correlation, but if I'm raving at LiveJournal, I'm not, but maybe I'm also raving at Orkut or raving at LinkedIn or what have you, and you can look at those and see if there are similar data patterns across those as well. One of the things that I found, being privacy aware, is that I've tried for years to make sure that when you Google my name, what pops up is mostly respectable. You get my postings to LinuxChix, and you get my own webpage and a couple of articles that were written, and I was very pleased by that. So last year, there was a piece in Slashdot that was written about me, and that caused a million of my well-meaning, not-so-security-aware friends to write posts that were public on their blogs or other forums going, oh, hey, my friend Raven's in Slashdot, linked to an article that contains my full name, and oh my God, I remember when we went to Lesbian Ninja Pirate Weekend and did blah, blah, blah. I'm like, ugh. This is not what I need future employers to be Googling on. So the lesson there is that even if you are paranoid and careful, you may be linked to people who are not paranoid and careful, and just by looking at those patterns, you're still in trouble.
So I went around and whacked all my friends with the security bat for a couple of weeks, telling them I didn't want them doing this, and had to wait for it to expire from the Google cache, but I'm sure archive.org's got it. So don't be me. So I'll add in a little too, which is that there still are, nonetheless, tools that you can use, and Tor is a valuable one. You just have to understand what it's good for. I mean, for example, suppose that you don't want anybody to know that you got a copy of PGP. Well, if they're really after you, that might be kind of difficult, particularly if you start sending a lot of emails using it, because they will all say BEGIN PGP MESSAGE at the beginning, but downloading the software via Tor doesn't put you in our web logs. Similarly, anywhere else you go, I mean, Tor has this ability to remove or blunt some of the signals that people are going to get from log analysis, and particularly because logs are very likely to be retained for God only knows how long. Relatively few people have policies about how quickly they burn them. So if you're going to be there and you don't want anybody to know what IP address you downloaded something from, it's very simple. I mean, use Tor. Tor will protect you from that level of analysis, where there's lots of stuff lying around in tracks, analysis that can be done from IP address and geolocation after the fact. It will also be helpful for you guys to just think about the metadata that you're sort of scattering in your wake as you go around living your daily lives. I think probably half of you have cell phones that are probably turned on, and that will give away, hey, this phone was registered to the network in roughly this area with respect to a tower at this time, and if you don't want people to know you're hanging out at DEF CON, that might not be the most brilliant thing to do.
Every time you use your credit card or something like that, you leave a little bit of a data footprint, and even if you don't look at, okay, how much was spent or where it was spent, you can still see, okay, the credit card was in use at this time, and that provides tracking information about you that you want to be aware of. So just start thinking about these things. I'm sure most of you have already had some thoughts along those lines. So the next thing is I want to get back to you all just for a bit, and that's to talk about what we as a group can start doing with this, because traffic analysis, properly applied, is an extraordinarily powerful tool, and we could start using it for doing things like fighting bot networks, fighting fraud, getting rid of phishing sites, doing all these sorts of things that are good for the world, and we can start coming up with ways that we would monitor bad guys' traffic just like, you know, there are going to be people who might be bad guys monitoring ours, and we haven't thought about this at all. So I'll toss that open as, what can we start doing to use this technique ourselves? Anybody? Yeah? I've seen some work in intrusion detection systems, things that capture all the packets going across your network and try to do some sort of protocol analysis, not on, like, the headers and the contents, but simply on, okay, who speaks what protocols to whom. So if you see a box suddenly spawning 10,000 new connections and it's never talked to any of these machines before, that might be something that you want to alert on, and there are several IDS systems in the field that do that sort of thing, you know, okay, maybe this box has been infected by a worm and is now scanning everyone, and it can be helpful to divine particular new network events that are anomalous against the background. There are a number of bad implementations of this sort of thing, but they are improving.
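That IDS heuristic, alerting purely on who-talks-to-whom with no payload inspection, can be sketched in a few lines. The threshold, host names, and addresses below are arbitrary placeholders; a production system would also age out old peers and weight by port and protocol.

```python
from collections import defaultdict

class NewPeerAlert:
    """Flag a host that suddenly contacts many never-before-seen peers,
    using only (src, dst) pairs observed on the wire."""
    def __init__(self, threshold=100):
        self.seen = defaultdict(set)   # src -> peers contacted historically
        self.threshold = threshold

    def observe_window(self, src, dsts):
        """Record one time window of destinations; True means raise an alert."""
        new = set(dsts) - self.seen[src]
        self.seen[src] |= new
        return len(new) >= self.threshold

ids = NewPeerAlert(threshold=100)
ids.observe_window("10.0.0.5", ["10.0.0.%d" % i for i in range(3)])   # normal
print(ids.observe_window("10.0.0.5",
                         ["192.168.%d.%d" % (i // 256, i % 256)
                          for i in range(10000)]))                    # → True
```

The worm-infected box trips the alert because nearly all 10,000 destinations are new, while a chatty but stable host never does.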
They're certainly better than they were three or four years ago, and for a good network administrator, it can be very useful to see, you know, the pattern of connections as your boxes get infected. Also, I'm not entirely certain this qualifies as traffic analysis, except I think it does under your definition: authorship analysis is fun and easy, and everyone should try it. I think that a generalization of it to program authorship is probably doable, and it would be interesting for linking authors of malware. Throwing that out. Informally, things like that do happen. You notice that this one dude in Russia always leaves one space after a period in his comments, or what have you, and you can get an idea of, okay, the style of this is done similarly. And the formal techniques that work are even harder to obscure, because you can try to fool random people by, well, I'll type and I'll always misspell some word, or I'll punctuate like this or wrap like this, but the techniques that you want to use for automated stuff tend to be the relative distribution of function words, like and, if, of, but, for, the, which, that. I know which words I spell right and which words I need to look up, but I have no idea how my relative frequency of using the and a and for compares to Raven's. Can we take questions? Yeah, I'll take a question there. It's for, like, tracking botnets and that sort of thing. There's a paper I read recently, out of, I think, Portland, that does just that: looking at hosts on their residential, the campus residential network, and looking at, all right, you know, how many packets do they send to an IRC channel versus how many they're getting back, and how many different clients are joining channels, and basically detecting botnet command and control planes via that. So it's definitely happening. I don't know how much is actually in production, but certainly in academia, they're looking at it.
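A minimal sketch of the function-word technique described above: represent each text as the relative frequencies of a few function words, and compare by distance. The sample texts here are invented, and real stylometry uses much longer samples, larger word lists, and better distance measures.

```python
FUNCTION_WORDS = ["the", "a", "and", "of", "if", "but", "for", "which", "that"]

def style_vector(text):
    """Relative frequency of common function words: hard to fake because
    authors don't consciously control how often they use them."""
    words = text.lower().split()
    n = len(words) or 1
    return [words.count(w) / n for w in FUNCTION_WORDS]

def distance(a, b):
    """Euclidean distance between two style vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

known = style_vector("the code and the docs for the release that we shipped ...")
sample = style_vector("the tests and the build for the branch that I merged ...")
other = style_vector("one must consider whether such schemes succeed in practice ...")
print(distance(known, sample) < distance(known, other))  # → True
```

Deliberate misspellings and odd punctuation leave these frequencies untouched, which is why the formal techniques are harder to defeat than the informal tells.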
Yeah, that's a very good point. I'll bet that, based upon the other research, you can also do things like detect somebody who's running an IRC botnet on a non-standard port, because you could separate the flow out and say, this resembles an IRC flow as opposed to something else, and I don't care what port it was on. Actually, I was going to amend that. I know, again, this is academic, but they have a working version. There are two systems that do roughly similar work. One's called Early Bird, and it's not traffic analysis per se, but it just looks at the simple spread of where things are sent from and to, plus header information, and it's used for automatic detection of novel worms. It looks for small kernels of invariant packet content and a sudden widespread distribution, to try to indicate where things are going, without using signature-based detection. So if suddenly one sender is sending packets with a small but invariant chunk of information, something that's not header information, that's not your expected invariant, to a large number of targets, that's a flag. It may not be true traffic analysis, because it's not just raw traffic through a point, but it's the idea of doing basically target address spread analysis. I believe one is called Early Bird; I can't recall the name of the other one. So there is some interesting work in worm detection that way as well. I think that counts as traffic analysis.
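The Early Bird idea, invariant content fanning out to many destinations, can be sketched roughly as follows. This is not the published algorithm (which uses sketches and Rabin fingerprints for efficiency); it is a naive illustration with invented traffic, thresholds, and addresses.

```python
from collections import defaultdict

def worm_suspects(packets, kgram=8, spread_threshold=50):
    """Hash fixed-size payload substrings and flag content that appears
    headed to many distinct destinations: content spread, not signatures."""
    spread = defaultdict(set)          # content fingerprint -> destinations
    for dst, payload in packets:
        for i in range(len(payload) - kgram + 1):
            spread[hash(payload[i:i + kgram])].add(dst)
    return {fp for fp, dsts in spread.items() if len(dsts) >= spread_threshold}

# Hypothetical traffic: one invariant exploit string sprayed at many hosts,
# amid varied single-destination legitimate payloads.
exploit = b"\x90\x90\x90\x90SHLL"
packets = [("10.1.%d.%d" % (i // 256, i % 256),
            b"GET /" + exploit + bytes([i % 251])) for i in range(200)]
packets += [("10.2.0.1", b"normal request %d" % i) for i in range(40)]
print(len(worm_suspects(packets)) > 0)  # → True
```

The legitimate payloads never trip the detector because each repeated substring only ever heads to one destination; the exploit's invariant chunk heads to 200.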
One of the very early things: like a lot of the technologies that we use, this comes originally out of military uses, and in World War II one of the things that they would do is signal analysis, in a relatively quick, rudimentary form, to tell how many planes were flying to a certain place. You could get roughly a signature of this cloud of airplanes based upon the radios that they would use to talk back and forth, so that even if they were masking it, you could frequently tell, aha, this is the same radio as that other radio. Yes? So one obstacle to latency-based defenses to traffic analysis is that users really hate latency. And I'm particularly concerned about this for voice applications, where it almost seems like a lost cause because people are so sensitive to latency in voice applications. But I'm wondering, as a general matter, what do we know today about the tradeoffs between particular latency padding strategies and the amount of traffic analysis resistance that you get? Is there anything that's clear? People have the sense that there is a tradeoff. So I've looked at this a whole lot. I've worked on two major anonymity systems, Tor and Mixminion, and I've tried to stay pretty current with the literature. Basically, if you hold the volume of traffic constant, if we assume that there are end users sending M bytes total through the system with some relatively consistent pattern between the two cases, then in a system that has higher variability it's possible to achieve stronger resistance to traffic analysis, at the expense of higher variance in latency. So a system like Mixminion or Mixmaster or something like that can, basically by obscuring the connection between message arrival time and message sending time, keep you from doing timing correlation.
Not indefinitely, but for a matter of thousands or perhaps hundreds of thousands of rounds, at the expense of delaying traffic by on the order of 30 minutes to several hours. Whereas a system that tries to be low latency, if you want to be useful for web browsing, say, much less voice, you need to be fast enough that most traffic analysis techniques we know of will succeed in probably under a minute. Usually several seconds of traffic is probably enough. This, of course, is not me saying everyone go out and implement high latency systems, though, because when I said in the beginning of this, holding the amount of traffic constant and the number of users constant, well, that's a dirty lie. Most of you would rather browse the web and get an answer back now, and click on a page and then click on a whole bunch of other pages, than have the first one you clicked on arrive at a random time between half an hour from now and tomorrow. You may as well just transport the pages by FedEx. This is why right now Tor has on the order of 200,000 users, and Mixminion has, as near as I can tell, on the order of several hundred users.
It's not just a matter of popularity. The lack of popularity hurts the anonymity of an anonymity network, because if you get a message from Mixminion, you can't tell which of several hundred Mixminion users sent it, and you would need to be very clever indeed and have a whole lot of traffic to tell which of your several hundred suspects it is. But if the message is in Chinese, probably most of them don't speak Chinese, and that cuts it down. If the message is about a particular city, most of them probably don't live in that city, and that narrows it down further. So really the anonymity set, which is a technical term for the set of people who might have sent a particular message or done something, gets very small indeed when you start with a small number. So basically what we know is that you can resist traffic analysis for a while with lots of latency; we don't know a good practical way that you can deploy on the real Internet with little latency, and people have been doing analysis against low latency systems for a long time; and nobody seems to want to use high latency systems in real life, although people keep telling me I might be wrong about that, so I might be wrong about that. On the order of 200,000, although it's kind of hard to tell, they're anonymous. Could I bring a comment on that too? The operational definition that we use in our group for low latency anonymity networks is: as long as the latency that is added to packets does not muck with TCP, as long as it doesn't break TCP, for us it's low latency communication. And using that operational definition, the larger the delay that you add, the worse the anonymity that you get at the end. To be fair, no currently deployed anonymity system actually transports raw IP frames, so nobody is actually doing TCP over Tor, or TCP over really any of these, I think. I couldn't tell you about Zero-Knowledge; I suspect that they did something clever to try to work around this, but their product, the Freedom network, is no longer with us, so it's not necessarily relevant. Anything else? Well, I mean,
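The narrowing described above, language cuts the set, city cuts it again, is just repeated set intersection. A toy sketch with entirely invented users and attributes:

```python
def intersect_attack(suspects, *observations):
    """Each side channel (language, locale, timing, ...) rules some users out;
    intersecting the survivors shrinks the anonymity set fast."""
    remaining = set(suspects)
    for keeps in observations:
        remaining &= keeps
    return remaining

users = {"u%d" % i for i in range(300)}            # 300 nominal suspects
speaks_chinese = {"u3", "u17", "u42", "u99"}       # hypothetical attributes
lives_in_city = {"u17", "u42", "u150", "u201"}
print(sorted(intersect_attack(users, speaks_chinese, lives_in_city)))
# → ['u17', 'u42']
```

Starting from 300 users, two weak observations leave two suspects, which is why a small user base is so dangerous: each extra bit of side information halves (or worse) what little cover there was.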
Actually, sorry, I just got interrupted, and I wanted to say that the protocols that are used in anonymity networks are of course not totally naive, right? You don't want to use a tight-loop, high-timing-footprint feedback control mechanism like TCP when all you want to do is hide your footprint. The other point, which may be related to the question that you're asking, and it's a pet peeve that I have in general with this whole area of traffic analysis, is that we really should focus on finding metrics that allow us to think in a structured fashion about how much protection we get. Anonymity systems are relatively well-defined systems, right? You have an anonymity set, you would like to hide the identity of users, so how much information do you leak, and all these kinds of things. But still, I always compare it to encryption. Encryption is so pretty; the story is so simple you can tell it. You can tell a customer, you can say, there is some complexity analysis that tells me that the effort that an attacker has to make to do a particular type of attack is bounded by some whatever. And that does not exist yet in traffic analysis. We don't even know how to think about it, right, at this point. The moment we could come up with some metrics that allow us to tell a simple story, we would already be in much better shape. All right, I'll take the next question. A minute ago you were talking about RF fingerprinting, and certainly after the Johnny Cache talk, RF fingerprinting seems a pretty powerful form of traffic analysis, and one that just keeps getting easier and easier. I know that several people have been playing with GNU Radio and trying to look at cell phone RF fingerprinting based on cheap hardware, and we also know that OS stack client fingerprinting is very easy with passive observation, and in theory it should be pretty easy to do geolocation via fingerprinting, just this much delay. Is there anything public that you
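One candidate for the kind of metric being asked for does exist in the literature: measuring anonymity as the entropy of the attacker's probability distribution over possible senders (proposed independently by Serjantov and Danezis, and by Díaz et al., both in 2002). The distributions below are invented for illustration.

```python
import math

def anonymity_bits(probabilities):
    """Entropy (bits) of the attacker's distribution over senders.
    Maximum is log2(N) for N equally likely users; 0 means identified."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [1 / 8] * 8   # 8 equally likely suspects: the best case
skewed = [0.90, 0.05, 0.02, 0.01, 0.01, 0.005, 0.0025, 0.0025]
print(round(anonymity_bits(uniform), 2),
      round(anonymity_bits(skewed), 2))   # → 3.0 0.68
```

The appeal is exactly the "simple story": both sets contain eight suspects, but the skewed one offers well under one bit of protection, which quantifies how side information like the Chinese-language example degrades a nominally large anonymity set.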
know of that is working on actual client IP and TCP-based fingerprinting based on timing and things like that? Because it seems to me like this is tractable. Do you know? I'm personally not aware; we kind of do a lot of that as a hobby, but I don't know of anything published. How plausible do you think it is, actually being able to look at the stream and start working out, this person is probably in the San Jose area, and working down from that? This is the same one I saw last week; I would be surprised if it didn't work. Yeah, I would expect it to work, and I would expect that there are some effective countermeasures that would work very well, and some that would work well enough but would still let you know some things. I mean, I'll bet that if, say, I were using a VPN back to San Jose, you could geolocate the exit IP and conclude that I am not in San Jose, and come up with a set of likely locations, but that could include both Las Vegas and London, you know, based upon knowing there's a fast path here and a slow path there. Right now my default assumption about any form of traffic analysis is that it will work. Right now in the research field, there are two kinds of interesting paper. One is, here's a form of traffic analysis no one thought of before. The really interesting kind is, here's a countermeasure that works, because countermeasures are much harder, and mostly they don't work. Do you want to do the next question? Yeah, I think I'm going to take one more, then I'm going to move on a bit. Okay, there was an interesting comment earlier about Early Bird, and I'm sure all of you are analyzing the data differently. Are there any particular algorithms you're using, like entropy algorithms or LCP algorithms or natural language processing algorithms, that you find are showing a little bit more promise than just pretty much mean-mode kind of analysis? They all work, but I mean, I'm sure you're all looking at different algorithms. Is
anybody playing with anything interesting?

Well, the general structure we have is that we use entropy-based algorithms, because they are so good at filtering out noise and outliers. So we have the entropy-based algorithms. Until now we have always been treating the underlying system that we are attacking as a black box: we have not the slightest idea what those guys do, we have not the slightest idea what the countermeasure is, we just count packets. The moment you start having an idea about what the countermeasure could be, you bring in hidden Markov chain models; when you don't know too much, when you just know a little bit, that's when you bring in the hidden Markov models.

Are you using tuples with those? Like an n-tuple algorithm with your Markov models, or is it pure Markov?

The whole variety. And on top of these, just in the last year, we have been playing with blind source separation schemes, where we just use minimization of mutual information: all kinds of pretty standard techniques that are extremely powerful. I mean, scary.

So we have a really interesting problem that nobody actually knows a whole lot about, a problem where there are some extraordinarily effective things that people can do, and this is a group of hackers. So one thing I'd like to suggest is that the people out there, you and the people you talk to, start thinking about how you might construct systems that use traffic analysis to get some interesting real-world results, so that we start learning more about this. In a lot of research there are a bunch of difficulties, because you would need informed consent and so on and so forth, and it would be difficult to construct an experiment, because you would essentially have to get everybody's permission, say, on a campus network, before you could analyze their traffic, and that's not going to work. On the other hand, there are a lot of places, like DEF CON, where it's well known that all sorts of stuff is going on and you're crazy not to
expect it. There are a lot of us who go to events like this where there is an implicit informed consent that people are doing weird stuff. What could you start doing? I'm going to toss that out for some suggestions among us.

Well, I'm going to make a hopefully mildly controversial proposal and claim that the Wall of Sheep is totally 1995: putting up anybody who isn't using encryption, or who is using someone else's user account and password, just as a ha-ha-gotcha on the board. Encryption is like pants: if you're not wearing them, you probably know it. This is 2006. On the other hand, I bet very few people are taking active steps about the traffic they send that is unencrypted, really thinking about linkability issues and profiling issues across all of their traffic. I bet comparatively few of us are thinking about, well, if someone were looking at the timing of the SSL frames arriving to me, I bet they could tell pretty well what I was looking at on Amazon, what I was doing on eBay, something like that. Think of how neat it would be to have a display, some sort of readout, of what everyone was doing, that actually surprised people for once about how well it worked.

One of the things I've seen in people who are considering this sort of thing is that they consider the human behavior modifications. People might go, "All right, I'm at DEF CON; I'm not going to go to my corporate portal website, because that'd be bad," and they may make those decisions, but they aren't considering the underlying automated things like the ones we've been talking about earlier. And these things are getting mapped. I'm sure some of you have also had the experience, about three or four months ago, where you suddenly started getting a whole bunch of spam where the subject lines were from people you know, people you know who have pretty uncommon names. I know I got a whole bunch of it, and there's no way it's just randomly generated. So people are out there mapping these
networks and relationships and using them for their financial gain. We need to start looking at that.

Another quick thing. Everyone knows at this point: don't go making ciphers until you've spent a lot of time learning cryptanalysis. Likewise, you might not want to go out and start designing countermeasures to traffic analysis until you've done some traffic analysis for a while, because if you don't, it's not all that hard to design something that surpasses your own understanding of what you know how to break. If anybody comes up to me with an unbreakable scheme for traffic analysis resistance in the next month or so, experience suggests it probably won't work. Then again, maybe we'll solve this problem really soon and I'm wrong about that.

"Oh, but I have this badass perfect encryption algorithm that I designed myself." And I have this bridge in Brooklyn.

"Does it use chaos and fractals?" Oh my God.

Okay, we've got five more minutes, so let's make good use of it. I'll take the next question.

I have two kinds of pessimism that have just come to mind that I wanted to inject. One is the cost of countermeasures, not just in terms of added latency, but in terms of the fact that if you want to send cover traffic, you need to send more data on the network. Briefly, no one has an affordable cover traffic scheme that the average web user is willing and able to implement, that a volunteer network can implement, and that actually helps resist any kind of worthwhile traffic analysis.

You are right; this is an active research topic. And certainly for people who are running Tor, even if there were a technically feasible way of doing it, people already feel lucky just to be able to dedicate as much network capacity as they do, even without injecting more network capacity requirements.

Exactly. As the Tor network is currently structured, it seems very unlikely that people would be willing or able to actually provide bandwidth for cover traffic in any amount that would actually
help.

The other thing is that even if your packets are encrypted, the nice graphs we saw before, about flow separation and all those nice results, are without looking at unencrypted protocol headers. But the unencrypted protocol headers are actually present, so people can intercept them and use that data as another input. So even on top of what we've seen, a real attacker who wants more data inputs would also be able to use source and destination IP addresses, maybe TCP ports, things like that. There's even more data available when you want to go beyond just counting packets.

So it's scary to see what you can do without even needing that data. Ricardo was telling me yesterday about a system they had built, an IPsec system that ran at constant load, and after they built it they started using these timing techniques and could then start extracting information from it; all of that constant traffic, running everything full bore, was for naught.

So I'd like to make a suggestion for productive work in the future. One of the themes that's gone on here over the last three days has been the threat posed to you and me and everyone else from organized crime: identity theft, spam, phishing, and so on, across the net. Vixie, and I don't know if he's in the room here, makes a fairly interesting argument that technological countermeasures to botnet command and control centers and so on, in terms of shutting them down and making them more difficult to deploy, will only do what the same sorts of measures did to spammers, which is create better botnets and better phishers and better identity thieves, and that potentially the more productive long-term approach is to sit back, listen to them, find out where they are, and send the fuckers to jail. If you want to participate in something like that, it strikes me that an extraordinarily productive white-hat use of traffic analysis and signals intelligence is to start listening to these
fuckers and find out where they are.

Next? Hello. A couple of avenues came to mind while we were discussing all this today, out of some high-level things that you touched upon. You hit briefly upon baselining, and it sounds like what we need to do is come up with some new techniques for baselining what's going on. I think it is possible to baseline encrypted protocols and then do comparative analysis to figure out more interesting things about what's changing in the baseline from day to day, minute to minute, second to second, session to session, what have you. We probably need some kind of paradigm shift for baselining; I don't know what that is. The second thing, another avenue I thought about: Raven mentioned that there's a little bit of apathy when it comes to people's ideas about traffic analysis, or their lack of awareness of it. Somehow traffic analysis needs to be made sexy.

Damn cool. We're trying; go do some and frighten people with it. And there's a huge opportunity here, because we're still at the stage where we don't know what we don't know. We are reaching into the dark and coming up with continual surprises of "oh my God, this is incredibly useful, more than I ever thought." So what we need to do is start figuring out what's going on, and that means that if you want to do something, the chance that you'll find something useful is really high.

From a public perception point of view, we're in the 1980s, and your manager doesn't yet know that you can look at Ethernet traffic that isn't your Ethernet traffic. "You can read my email? You mean it's not secure, because it's on computers, on your computers?"
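The baselining-and-comparative-analysis idea above can be sketched very roughly. Assuming, hypothetically, that all you can observe of an encrypted flow is its packet sizes, you can bucket them into a distribution as a baseline and flag later sessions whose distribution diverges from it. The bucket width, the KL-divergence measure, and all of the packet-size data below are illustrative assumptions, not anything the panel described:

```python
# Hypothetical sketch: baseline an encrypted flow by its packet-size
# histogram, then score later sessions by how far they diverge from it.
import math
from collections import Counter

def size_distribution(packet_sizes, bucket=64):
    """Bucket observed packet sizes and normalize into a probability distribution."""
    counts = Counter(size // bucket for size in packet_sizes)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def kl_divergence(p, q, epsilon=1e-9):
    """D(p || q): how surprised the baseline q is by the new session p."""
    buckets = set(p) | set(q)
    return sum(
        p.get(b, epsilon) * math.log(p.get(b, epsilon) / q.get(b, epsilon))
        for b in buckets
    )

# Made-up packet-size traces: a baseline day, a similar day, and a
# session where everything was padded to roughly one size.
baseline = size_distribution([1500, 1500, 1500, 576, 576, 40, 40, 40])
today    = size_distribution([1500, 1480, 576, 560, 40, 44, 40, 40])
padded   = size_distribution([100, 104, 96, 100, 108, 100, 96, 104])

# True: today resembles the baseline far more than the padded session does.
print(kl_divergence(today, baseline) < kl_divergence(padded, baseline))
```

A real system would of course baseline timing and direction as well as sizes, and the entropy-based methods mentioned earlier are considerably more sophisticated than this toy divergence score; the point is only that comparative analysis needs no plaintext at all.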
Something else, just a crazy idea that bobbed up: you were talking about entropy-based methods of countering; maybe there's something we can do with the apathy, and come up with apathy-based methods of countering. Just a wacky idea.

Well, I think it's not all apathy. I think there's some lack of awareness and some sort of feeling of helplessness, and by advancing research in the field we'll help deal with both of those.

Real quick, because I know we don't have much time left.

Yeah, this is probably our last one.

On the perception of the information that makes up our digital fingerprint, like Greg Conti's talk on the information being collected by Google, Yahoo, and so on: will it help if we reduce our digital fingerprint, to make it more difficult for traffic analysis to associate people's activities down to a single person?

Reduce, or lie. One of the things that came up over dinner last night was the existence of those frequent-shopper cards you get at grocery stores, and one of the best things I've seen for that: I know there are people who swap their cards around, and then you just end up with someone else's fingerprint associated with your name, effectively. But there's this fantastic guy out in California who has this clone army of people: if you send him an email, he will send you a printout of his barcode on a sticker, and you just slap it on your card. So there are ten zillion of him feeding that fingerprint, and that's pretty awesome. I'm sure that if you really dug into it, you could start separating out data flows within the Rob Cockerham frequent Safeway shopper, but I don't think Safeway is quite that sophisticated.

Yeah, rising above the baseline of doing nothing helps. Keep in mind, though, if anyone really cares, simply lying without a pattern is not necessarily going to help a lot for relationship stuff. If I can link, say, Raven to some aliases of Raven's because of one thing she did once, well, it's linked no matter what she
does in the future.

Yeah. All right, thank you very much, and thank you for coming.