Hey, everybody. Thanks for coming out. Okay. Well, I got a lot of slides, so I'm just going to try to burn through them. We're just going to power through. So try to pay attention to like the first five minutes of slides so that, you know, you'll be there with me when we're hitting through this stuff. Okay. So I'm Brandon Wiley. I've done some stuff. I wrote a thing called Freenet in like 2000, DEF CON 2000. Oh, thank you. Raise your hand if you have ever run a Freenet node. Yeah, my people. Thank you. National heroes, every one of you. So yeah, my first talk ever, I was 18 years old. It was at DEF CON in 2000. I presented about Freenet. The entire description of my talk was, "This is about Freenet." I drew the slides with crayons. And that was it. It was like a packed room of people that came to go see a talk based on that information. And then at Black Hat, like 2003, I presented Curious Yellow, which was my superworm design that was designed to destroy the Internet. Purely theoretical, as you can tell, because the Internet is still here. You can read more about that: Charles Stross has a book called Glasshouse in which Curious Yellow is the thing that destroys humanity. So that was a great moment for me when he put that in there. And then I used to work at BitTorrent. So like I was there when BitTorrent bought µTorrent, so I apologize for that. But yeah, I did a lot of stuff at BitTorrent. And when I was at BitTorrent is when I first saw deep packet inspection being used to block BitTorrent. In fact, when we noticed that Comcast was blocking BitTorrent, before any of the press heard about it, I was the guy that they sent to Comcast to try to reason with them. And, well, you know how that worked out. So I've been working on, you know, kind of anonymity stuff, and mainly on the censorship resistance side of things, for a long time.
So I know the folks from Tor from back in the day, and I've been helping them out more recently with their new obfuscated protocols, because Tor is being blocked in a lot of places. So they need a new protocol that's not blocked. And then finally, I wrote part of a book called Peer-to-Peer for O'Reilly, like a long time ago. So anyway, those are my credentials. Who cares, whatever. I'm just putting this up so that I can establish some credibility with you guys, so that when I start showing you pictures of cats, you don't just go, what is this? I'm out of here. Because there's a lot of pictures of cats in my talk. So, thank you. Cool. Cool. All right. So let's get into it. So my slides are taken from two different sources. One is my children's book on internet freedom called Free as in Kitties. And the other slides are from my PhD dissertation. So I kind of meshed them together. We'll see how it goes. Right? So we're going to start out with the internet. What is it? Let's define some terms. Hopefully you guys have checked it out. If not, it's pretty cool. You should get on there. There's a lot of stuff on it. A lot of cats and stuff. And then, how do we internet with this internet once we know what an internet is? And then we just get straight up into binary classifiers using Bayesian statistical inference. That's from the children's book. No. And then fooling binary classifiers with polymorphic protocols. And then, you know, Dust, which is what this talk is about, which is the polymorphic protocol engine. And then I got some infographics. And then if we have time... I forgot to start my timer. There we go. Then I want to talk a little about realistic threat models versus the threat models that everybody else uses. So, yeah. So first of all, the internet. The internet, as we all know, is the greatest technological marvel of our time and the pinnacle of civilization.
It's an unprecedented way to deliver pictures of cats. So I know what you're thinking. You can't take a real cat and transmit it over the internet. Believe me, I've tried. It doesn't work. That's an analog cat. Okay. So the first step is we have to turn it into pixels with what they call pixelation. So we get it into pixels, and then that's a digital form that we can transmit over the internet. So if we take this exact cat, we make it into pixels. We have this. It's a pixel cat. Fun fact, if you go on Google image search, or just on Google, and you're looking for things like 8-bit cat, pixel cat, low-res cat, you'll find a lot of OkCupid profiles of girls that live in Oakland. It's a true story. Okay, great. So we got this cat. Now we need to turn it into numbers because, as we know, computers use numbers and stuff, so that's pretty easy. We have all these various color spaces and things. So we get like a number mapping for each color, and then we run it through there, and then we get, you know, a map of numbers. Okay. So now we're good. Now we have something computers can understand and we can transmit it. So the first thing we got to do: for some reason, when they designed the internet, they didn't think that it would be handling, you know, big chunks of data like cat pictures. So it can only handle very tiny chunks of data. So we split all of the data into all these kind of randomly sized different things that we call packets. And then we transmit them over a possibly unreliable medium. Great. And then they all arrive, maybe. Maybe they arrive, maybe they don't arrive, at some point. And then we try to kind of cut and paste, stitch them back together to get the picture. And then on the other end of the pipe, after all of this magic has happened, we get a pixel-perfect exact replica, sent through the internet, of the cat that we started with. There we go. Yay internet. Internet's great. Okay.
So what's the problem? I mean, the internet's great. We can look at cat pictures. It brings us all a lot of love and joy. Like, who would ever want to try to stop this? Well, robots. Since the beginning of time, there's been a war between cats and robots. No one knows why. All we know is that robots have been programmed to hate cats. Okay. So here's how binary classifiers work. Okay. Robot looks at something. It looks at the packets and it says, is that a cat? Yes or no? Those are all the options that we have. That's why it's called a binary classifier. That's the decision it's trying to make. Cat, not a cat. Okay. Now, because they hate cats, if it is a cat, they replace it with a sad panda. Okay. All cats, all cats are replaced by sad pandas. Now, if it's not a cat, don't care. Don't care. Just pass it through just exactly as it was. Bananas, whatever. It doesn't even know what bananas are. They just know about cats and things that aren't cats because they're binary classifiers. So don't care. Pass it on. Okay. So the question is, how do we fool a robot so that we can transmit pictures of cats over the internet without having to replace with sad pandas? That's the question. How do we fool robots? Well, I think if you've been paying attention, remember I said, pay attention to the first five minutes. You already know the answer, right? Right? You got to make cats look like bananas and then robots don't care. All right? So here's the secret code to my talk. Don't take a picture of this slide. This slide is not on the internet version of the talk. This talk is just about cats and bananas. So kittens are free speech. Sad pandas are censorship of free speech. Robots are filtering hardware that's made in America and then sold to companies all over the world to make it so that people can't access the internet and find out things about like news about what's going on in their own country during elections and other critical times like that. 
Bananas are just messages that filtering hardware doesn't care about. And then banana cats are free speech which is encoded so that it will get past the filtering hardware. Okay? So, yeah. So we're talking about some serious kind of deep stuff here, right? Like this is like really important sort of stuff because the internet needs to be free. But, you know, I just kind of wanted to segue in this. So now I hope that we're all at the same level. Like we all are on the same page and understand the code, right? So now that you know the code, I can tell you about my project. Dust makes cats into bananas in order to fool robots so we don't have any more sad pandas. Okay? All right? So, yeah. So that's the intro. Now let's get into a little kind of some details here. So how do robots see cats? So robots can't see cats the way that you and I see cats where you look. You're like, hey, it's a cat, right? They only see the packets. They see the grid of, excuse me, they see the grid of numbers and then they have to use some kind of like statistical or like rule-based because they're robots, right? They only know logic. So here's one mechanism, right? Which is you just look at the lengths of the packets, right? It's all grouped into these kind of randomly sized packets. You just kind of count like the first one is like 38 numbers in it and you say, you know, if things are in this kind of configuration, then it must be a cat. Now this probably sounds really dumb, you know? You think that's not going to work. That has nothing to do with whether or not it's a cat. So we're going to do a little audience participation test to see if you guys can classify traffic based on packet lengths. Okay? Are you ready? Here we go. This is a graph of HTTP packet lengths. Now that thing on the far right side, that is not the border. That's actually a giant spike in the graph. There's a giant spike over there. 
If you know about TCP, that's because of the Nagle algorithm, which takes little packets and then, just helpfully for you, bundles them into big packets. So since that's not turned off in HTTP, you have kind of this spike in the largest possible size packets. Okay? Now this is HTTPS. HTTPS disables the Nagle algorithm in TCP by setting the no-delay option, and therefore it has this totally different statistical profile. It still has, you know, a lot of fairly big packets. It doesn't have that spike on the end, and it has kind of this other spike around like 400 or so. I don't really know why. I just look at the graphs. Okay, so I have just showed you two different graphs. Now I'm going to show you a chart, and I'm going to ask you if you can guess which one it is. Okay. So raise your hand if you think this is a chart of HTTP. Okay. Raise your hand if you think this is a chart of HTTPS. Okay. Congratulations, you are all robots. It was Dust, my project, pretending to be HTTPS. So, yeah. So it did a pretty good job, right? I kind of tricked you though, because I didn't give you the option of, is this something pretending to be HTTPS? You might have picked that, because that's kind of the obvious choice since that's what we're talking about. So yeah, so packet lengths work as a way to determine if something is one protocol or another protocol. And the reason that we care about this is because these days the way they block the internet is they don't say, hey, you're looking at this thing that we don't want you to look at, so we're going to block it. They say, hey, you're using BitTorrent, blocked. Hey, you're using Tor, blocked. You're using SSL, blocked. You're using a VPN, blocked. They just block it by the protocol regardless of what you're doing.
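As a rough illustration of the packet-length idea just described (not the code of any real DPI box, and not Dust's code), here is a minimal sketch in Python. It builds a normalized histogram of packet lengths for each known protocol and labels unknown traffic with the closest profile. The bucket width, the function names, and every sample length below are made-up assumptions chosen only to loosely resemble the two graphs in the talk.

```python
from collections import Counter

BUCKET = 100  # hypothetical bucket width in bytes


def length_profile(lengths):
    """Normalized histogram of packet lengths, bucketed by BUCKET bytes."""
    counts = Counter(length // BUCKET for length in lengths)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


def distance(p, q):
    """Total variation distance between two histograms (0 means identical)."""
    buckets = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in buckets)


def classify(lengths, profiles):
    """Return the name of the known profile closest to the observed lengths."""
    observed = length_profile(lengths)
    return min(profiles, key=lambda name: distance(observed, profiles[name]))


# Made-up training samples: "http" piles up at the maximum packet size,
# "https" has a bump around 400 bytes and no spike at the end.
profiles = {
    "http": length_profile([1460] * 60 + [300] * 20 + [80] * 20),
    "https": length_profile([400] * 40 + [1200] * 30 + [90] * 30),
}

unknown = [410, 395, 1210, 85, 420, 1190, 95, 405]
print(classify(unknown, profiles))
```

A classifier like this never looks at what the bytes mean, which is why traffic whose lengths are drawn from the "https" profile lands in the "https" category no matter what it carries.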
And that's crazy because you could be doing all kinds of things, but, you know, if they can't look at what you're doing to determine whether or not they like it, they're just going to go ahead and block it by default. And so they do it based on protocols. So for instance, there are situations in which SSL has been totally blocked and you can only use unencrypted HTTP. Well, that's okay if you can make your traffic look like unencrypted HTTP even if it's not, right? So yeah, so Dust removes packet length information, but it doesn't just randomize it. It randomizes it according to a target distribution of whatever you want. So you pick a protocol and Dust will make your packet lengths look like that protocol. Any protocol, doesn't matter. Just give me some sample traffic. I'll sample it and I'll make a profile and I'll make it look like that. So here's one of the tools that I've made for looking at deep packet inspection hardware and trying to figure out how it's doing classification so that we can, you know, circumvent that classification. I made this tool called Shaper. You give it a model of a protocol, a statistical model, so for instance a model of packet lengths. It then does the trick from before and makes traffic that looks like that, just infinite traffic that looks like whatever you want it to look like. And then we pass it through and we say, hey, is this such and such or not? And then we get the answers back, and then we can tell how good the different hardware is at classifying protocols. And then once we can do that, we can get better at making encodings that hide stuff from the classifiers. And so that's one of my open source tools. You can use it. If you have some hardware, you can throw traffic at it and test it and see how it's doing classification. Okay, so the second type just looks and says, hey, there are some statistical properties of this traffic. Like for instance, I see a whole bunch of sixes.
I'm going to count the number of sixes. If there's like a bunch of sixes, then that means that it must be, you know, some particular type of traffic. So here's some examples of that. So this is an English dictionary, and I looked at the probability of different bytes occurring in that dictionary, right? So the one on the far left is just newline, because it was just a list of words. So don't pay attention to that. I didn't clean the data because real data is dirty. So I'm showing you the dirty data. And so, yeah, this is the main thing. This is lowercase letters of the alphabet, right? So you can see there's definitely a spike. To the left is a little spike. That's uppercase letters. There's a lot of uppercase letters in the dictionary, more than you would think, but a lot less than lowercase letters. So yeah, clearly there's statistical structure there. If you're looking at a UK English dictionary, it's a slightly different sort of thing. This is HTTP. Oh my gosh. It's the same spike. Why is that? It's because HTTP traffic actually has a lot of, you know, ASCII letters in it as well. Like HTML elements are often lowercase letters. A little bit of a bigger spike in the uppercase letters. But yeah, so you can see this bleeds through. Like we know that this was English HTTP traffic, or at least HTML HTTP traffic, right? We know this was not images, because we can just look at this distribution, right? So I feel like a lot of people think that, you know, if you kind of wrap your traffic in something, it hides it. But a lot of stuff actually bleeds through. Here's HTTPS. Oh my gosh, it has the same spike. Why does HTTPS, which is encrypted, have the same spike in English letters? It's because SSL is encrypted, but the header is not encrypted, right?
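The byte-counting idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample payloads are invented, and a real classifier would compare whole distributions rather than a single "share of lowercase letters" number.

```python
from collections import Counter
import string


def byte_distribution(data: bytes):
    """Probability of each byte value (0-255) occurring in data."""
    counts = Counter(data)
    total = len(data) or 1  # avoid dividing by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def lowercase_share(data: bytes) -> float:
    """Fraction of bytes that are lowercase ASCII letters."""
    dist = byte_distribution(data)
    return sum(dist[ord(c)] for c in string.ascii_lowercase)


# Invented samples: a scrap of HTML, and a buffer with every byte value
# exactly once, standing in for ideal random-looking ciphertext.
html = b"<html><body><p>pictures of cats</p></body></html>"
uniform = bytes(range(256))

print(round(lowercase_share(html), 2), round(lowercase_share(uniform), 2))
```

The HTML sample is mostly lowercase letters while the uniform buffer gives them only 26 of 256 values, which is the kind of spike that bleeds through when any plaintext part of a stream is left visible.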
And the header has a bunch of information in there that uses normal English letters, like the name of the website and stuff like that, the SSL common name, as they call it. And that's how they get you with the SSL. That's how they get you with the encrypted traffic: they look at the unencrypted headers, and then it's actually super easy to tell what protocol you're using, even if you're using an encrypted protocol, if there's an unencrypted header. So I think people have this idea, let's just encrypt everything with SSL. Well, that doesn't work, because you can tell it's SSL, and people block SSL. So yeah, so Dust fixes that too, right? Dust removes the statistical content information. I use this thing called reverse Huffman encoding, where I encrypt everything to make it random, and then I reverse Huffman encode it to make it not random, to make it just whatever. Like if you say the only bytes you can use are f and a, I will give you a stream of just f's and a's that encodes your traffic. Whatever probability distribution you want, I'll make it look like that. And then the final one, and I know you guys are going to be like, that's stupid, no one does that. But yeah, this is the most popular way of classifying traffic. You look for a sequence of bytes at a particular offset in the file, and then that's it. For instance, with HTTP traffic, you know it starts with HTTP GET, HTTP POST. They just look at the first bytes. If it's HTTP, they classify it as HTTP traffic. That's it. And that is like 90% of all DPI classification that's actually deployed and used for censorship: it's just doing that. So yeah, so we remove that, right? Because, you know, that's not going to work. So along those lines, I have this other tool that I made that's part of the kind of suite of tools, which is to figure out what these byte sequences are, because these signatures, they call them signatures, are not public.
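The reverse Huffman step mentioned a moment ago can be sketched as follows. This is a toy reading of the idea, not Dust's implementation: build an ordinary Huffman code for the target byte distribution, then run the Huffman decoder over the random-looking ciphertext bits so the output symbols follow that distribution, and run the Huffman encoder on the receiving side to get the bits back. The function names, the example weights, and the choice to pass the original length as a separate argument are all assumptions of the sketch. Huffman codes can only hit probabilities that are powers of one half, so the match to the target is approximate.

```python
import heapq
import os
from itertools import count


def huffman_code(weights):
    """Build a prefix code {symbol: bitstring} from {symbol: weight}."""
    if len(weights) < 2:
        raise ValueError("need at least two symbols")
    tiebreak = count()  # unique index so the heap never compares the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, zeros = heapq.heappop(heap)
        w1, _, ones = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in zeros.items()}
        merged.update({s: "1" + c for s, c in ones.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


def shape(random_bytes, code):
    """Run the Huffman *decoder* over random bits to emit shaped symbols."""
    decode_table = {bits: sym for sym, bits in code.items()}
    bits = "".join(f"{b:08b}" for b in random_bytes)
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in decode_table:
            out.append(decode_table[current])
            current = ""
    while current:  # pad a trailing partial codeword with zero bits
        current += "0"
        if current in decode_table:
            out.append(decode_table[current])
            current = ""
    return bytes(out)


def unshape(shaped, code, length):
    """Huffman-*encode* the shaped symbols to recover the original bytes."""
    bits = "".join(code[sym] for sym in shaped)[: length * 8]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Made-up target: mostly 'a', some 'f', a little 'x' and 'y'.
weights = {ord("a"): 4, ord("f"): 2, ord("x"): 1, ord("y"): 1}
code = huffman_code(weights)
ciphertext = os.urandom(32)  # stand-in for already-encrypted traffic
shaped = shape(ciphertext, code)
assert unshape(shaped, code, len(ciphertext)) == ciphertext
```

The output is longer than the input, since frequent symbols carry only one bit each; that expansion is the price of forcing the byte distribution into a chosen shape.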
Like they don't want to tell you what bytes they're looking for, because it would make it easy to obfuscate your traffic, right? So if you have some DPI hardware, I have this tool that will take some sample traffic and then replay it with all these different variations where it blanks out certain bytes, and then you can look at the results and you can find the exact string that they're looking for. Okay, so to break it down for you, what Dust does is: if you define a set of properties that deep packet inspection hardware is looking at to filter, and you define, you know, which things go in which category based on those rules, then for whatever property that is, Dust will randomize that property to remove all information, and it randomizes it according to a probability distribution to force it into the category. So you tell me what categories your hardware has and I can make arbitrary traffic get put in any of those categories. The reason you want to do this is because you want to get into the category that's not being blocked, whatever that is, right? Like there was a recent instance where an adversary was blocking everything except for HTTP, and HTTP connections could only be 60 seconds long and then they were automatically closed. So, 60-second HTTP connections, let's do it, and then Dust encodes all the traffic that you have over, you know, that protocol. So yeah, so basically if you let any messages through then you have to let all messages through, because we'll just encode into the set of messages that are allowed. And then the ultimate point of all of this is I have this message server that you give arbitrary messages, it encodes them to look like bananas, they're passed through, and then people are reunited with the cats that they love. And that's really what it's all about: just letting people get to the content they want to get to, post what they want to post, read what they want to read, and just have free speech on the internet. Cool.
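The byte-blanking replay idea described above can be sketched against a stand-in classifier. Everything here is hypothetical: `toy_dpi` is an invented rule that plays the role of a black-box DPI device, and `find_signature` is an illustrative name, not the actual tool from the talk. The sketch blanks one byte of a sample at a time, replays it, and records which offsets change the verdict.

```python
def toy_dpi(payload: bytes) -> bool:
    """Stand-in for a DPI box: True if the payload is classified as HTTP.

    Hypothetical rule for this sketch: the first four bytes are 'GET '.
    """
    return payload[:4] == b"GET "


def find_signature(sample: bytes, classifier, blank: int = 0x00):
    """Blank one byte at a time and collect offsets that change the verdict."""
    baseline = classifier(sample)
    offsets = []
    for i in range(len(sample)):
        variant = sample[:i] + bytes([blank]) + sample[i + 1:]
        if classifier(variant) != baseline:
            offsets.append(i)
    return offsets


sample = b"GET /cats.png HTTP/1.1\r\nHost: example.com\r\n\r\n"
offsets = find_signature(sample, toy_dpi)
signature = bytes(sample[i] for i in offsets)
print(offsets, signature)
```

Against a real device each variant would be sent over the wire and the verdict read from whether the traffic got through, and signatures that match any one of several byte strings would need multi-byte blanking rather than this one-byte-at-a-time loop.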
So that's the end of the linear part of my talk, and now I have several bonus slides depending on how much time we have. I think I've burned through those pretty quick, so let's go through them, and then when we do Q&A maybe some of the questions will also be related to these slides. So sometimes people ask me about various other projects and how Dust is different from these other projects. I don't really think of them as competitors. People are going to choose one kind of encoding or another for their traffic to get it past this filtering hardware, but just use whatever works. I mean, all you want to do is get past the filtering hardware, so if something works, do it, and if it stops working, then switch to something else. So I worked with Tor on obfsproxy, which is their obfuscating protocol, and that's an example of a protocol where it just obfuscates: it just makes everything look totally random. And that's good, that's pretty good, that will give you a lot of things. But some of the hardware can tell that stuff is random-looking, at which point you can make a custom rule that says, hey, if it's random-looking, block it. If you can't classify it, that's okay, just block everything that has high entropy. You guys have heard about the entropy attacks? Those are really awesome attacks that work really well. They're not really widely deployed, but you can custom-configure them in some of the hardware. So that's the issue with just obfuscating stuff: you need the second layer where you can get it whitelisted. A lot of people are doing a lot of research on mimicking specific protocols, especially HTTP. People are trying to make stuff that steganographically hides information in HTTP. So the problem with that approach is that people always choose the most common protocols, the ones where they think, no one will ever block this protocol because it's too important. People used to say that about SSL, and now people are really focusing on HTTP. The problem with
that is that the DPI hardware has the most visibility into HTTP of any protocol. There are actually whole boxes that just do HTTP interception and do semantic parsing of all of the headers and all of that kind of stuff, so you have to do a lot of work to look like HTTP. In fact, there was this paper recently called The Parrot is Dead in which they say they're pretty sure that, given any kind of traffic that mimics some other kind of traffic, a test exists that can differentiate the two, because there's going to be a difference between your HTTP implementation and a real HTTP implementation. So people are trying to do this crazy stuff. They're trying to get an actual browser, like they're trying to get Firefox and make Firefox load pages, and they encode information in which pages you choose. That's fine, it's just a very slow protocol, and you don't need to do any of that, because like I said before, the DPI hardware is most of the time just asking, are the first four bytes HTTP? And then that's all you need to do. A lot of the hardware only looks at the first packet because they're trying to scale, and so basically they're cheating in their design, right? Instead of looking at all the packets, because they want to be able to push more throughput and be able to tell the people who are buying it, oh yeah, we can handle your whole country's traffic, and you don't need that many boxes, it'll be fine, they just look at the first packet and they classify it and then forget it. It's been classified, so they just stick with that classification. I was talking to a DPI vendor who said that for some protocols they have to look at like 20 packets. Oh no, 20 packets before they can classify it! So it's a lot easier than trying to actually be exactly like this protocol. And then there's a really cool project called Format-Transforming Encryption that you give it a
grammar for a protocol, like for instance you say HTTP or FTP or SMTP, and then it will generate random messages that conform to that grammar. That's a pretty cool project, so check that one out. So the difference in what I'm doing is that I'm not writing a protocol. obfs3 is Tor's current protocol for obfuscation. If you look at FTE, that's kind of a protocol engine. But most people are thinking, let's make one protocol that can never be blocked, and I gotta tell you, that doesn't exist. There is no one protocol that cannot ever be blocked by anybody. It just depends on your settings. Your attacker, your adversary, is gonna have some configuration on their hardware for block this, don't block this, and it's gonna be different for everybody. There is no one protocol. So instead I wrote a protocol engine where, instead of updating it with each revision when it gets blocked, you can use the settings. You say, okay, before we were making traffic look like HTTP, now let's do some UDP-based thing. Let's just get crazy, let's use UDP, let's make it look like Skype, whatever. And then if they block that, then again just switch it up. Switch it up every day. In fact, don't even just mimic protocols. I have this idea that I can't really convince anyone of: you take, I don't know, SMTP and NTP, and then you just kinda smush them together and you get this protocol that people are like, I don't know what that is. And just keep them busy. They got guys, they gotta configure this hardware. They first have to notice your anomalous traffic, then they have to figure out what you're doing, then they have to make a configuration, and then they have to make sure that it evenly splits out your traffic. So, you know, just keep it rolling. In fact, you could even just use a probability distribution. You could make up just random distributions. You could be like, in this protocol everything's always gonna be 5 bytes long or, you know, like 1400
bytes long. I don't think there's any protocols like that. You know, another thing is my thing is purely statistical, because that's how all the classifiers actually work: they look per packet. So my stuff is per packet. The Parrot is Dead paper, they actually reference my work and they say, I think, we've determined in this paper that packet-based stuff like Dust is just never gonna work. And it's like, right, it's not gonna work against a bunch of CS professors and all of their grad students in a lab looking at two different pcap files, sure. But against the actual deployed hardware it works awesome. I know because I have hardware and I pass it through there and it works awesome. So I think that's kind of one of the differences there. And, oh, thank you, thank you. Right, and so another difference is, with FTE, Format-Transforming Encryption, it's a great project, but you need a protocol specification so that you can follow that grammar. With Dust you just give me some sample traffic. In fact, the best thing is you give me some sample traffic that was blocked and some sample traffic that wasn't blocked, and from that I can make you a protocol that will be guaranteed to not be blocked. Well, not guaranteed, but it won't be blocked, without having to even know a protocol. I don't even need to know what protocol it is. I just need you to give me the pcap files and I just process them and then we're done. Another thing is, a lot of people that are doing these specific protocols, they model the protocol and they say, what does the protocol look like? Let's look exactly like this. What I do is I model the filtering hardware and I say, what does the filtering hardware think that HTTP looks like? Let's look like that, and then not do any more work than necessary, so we get maximum efficiency while still definitely getting past that hardware. You give me some different hardware, I might come up with a different protocol. And I think this all comes down to, I'm aiming for a
realistic, I want to base my threat model on what's deployed and what's being used to censor countries. And then one more thing I just added right before the talk is that there's no shared secrets. Everything's totally public. The source code is out there, you can get it, and even the protocol doesn't have any kind of shared secrets or anything. So you can know that people are running Dust; it doesn't help you figure out who's running Dust, because the traffic by definition looks like the traffic that you don't care about, right? So even if you download it and run your own experiments, unless you know what settings people are using, it won't help. But even if you know what settings, the battle is that you have to make a better rule for your filter that can tell between the mimic traffic and the real traffic. So it's no longer a war of technology, it's a war of who has the better information, who has the better models. So, talking about threat models. In the academic world, the hierarchy of threats is: if someone just published a paper and won a best paper award, the adversary in that paper is the adversary that you need to defend against, right? And then otherwise, if there's a recently published attack, you should defend against that. Otherwise, if an attack was published before 2003, no one cares. No one is working on that in academic research at all. So that's kind of my issue with academic stuff. They're really good at classifying traffic in the lab, but I mean, who cares, because until it makes it to hardware, until it's deployed, until it's being used for censorship, it doesn't really matter. I have a slide about open source threat models, and I just want to say, this is my experience working on Freenet, working on open source projects: the biggest threat is whatever you come up with, whatever you can think of. That's what I defend against, because I thought of it, and so it's like
probably a pretty serious attack. And then secondly, if someone on the mailing list comes up with it, then you know it's pretty bad. Or if it's on Reddit, like if somebody attacks your system in a Reddit thread and they're like, your system sucks, it's totally broken, I know because I broke it, because I made this attack, then that's what people defend against. And then finally, everybody always adds plausible deniability. I've been there. Everybody just always thinks you got to add plausible deniability, and I think that this is a bad road to go down as well. So my threat model is based on: is this attack actually being done in the wild to censor traffic, a lot? An example of that would be the static byte sequence matching. That's the number one thing. If you don't defend against that, then we don't even need to talk about it. And there are actually still obfuscating protocols that begin with a magic number in the handshake, and so if you just put that magic number into the filter, then that protocol is gone. And then, you know, if you see it occasionally, that's good too, we'll do that. And then finally, if the capability is in hardware but just hasn't been used, then that's lowest priority, but I'll still do that. And there's some really awesome hardware that sounds totally sweet. No one's using it, but if anybody ever buys it... So one of the things about DPI hardware: it's old, it's really old. No one ever upgrades. So a lot of these countries that are filtering, they're using like 10-year-old hardware. So the 10-year-old hardware is the first thing we need to defend against, and you would be surprised at the protocols that are coming out that fall instantly when thrown against 10-year-old hardware. People are reading papers, they're going on the mailing list, rather than looking at the actual hardware. Let me flip through, see if I have
some more slides here. Let's see. Yeah, okay, that's a good question. So yeah, you have to have a client and you have to have a server, and they both need to be speaking the protocol. You need the public key of the server. You need that because I need to be able to do a handshake where we don't have to communicate anything that's not purely random bytes. So let me go, let's see, I won't really get into the key exchange, I don't have a lot of time, but the key exchange and everything is all purely random, so you need to have the public key ahead of time. So when you find out the address of the server, you need to find out its IP, its port, its public key, and then also the configuration for what specific protocol you're going to be speaking. So that all needs to be out of band, in the invitation. And I know that's kind of not the way that people usually do it. People like to do these things where you connect and then you just handshake everything right there. That's kind of a more popular way to do it, and I just feel like that way doesn't work. You need to have a little bit of information transmitted out of band beforehand in order to have all the properties that we want to have. Let's see. Yeah, okay, let's just do questions. If we have slides that are referenced by questions, that's fine. Anybody got any questions? Oh, we got a mic. That's good, because it's a big room. Come, Jimmy. I don't have a long cord, and shockingly no wireless here.
So how do we run a Dust server to help out? Is there a community setup or such, or EC2 instances or anything like that? How can we make those endpoints that people can connect to?
Right, so that's a good point. So Dust right now is not actually a service. It's a protocol, and it's an implementation of that protocol which is designed for other people to use. So for instance, with Tor I worked with them on obfsproxy, which is part of their pluggable transport system, where basically anybody can
make a new transport for Tor. So that's one of the targets: a Tor wrapper that uses this. And then also I'm trying to make it into a library that you can use in your own protocol. There's currently no system for just doing open proxies that are based on Dust. I think that's not really the model that I want to go with, just because I know, from knowing the Tor guys from way back when, how much work it is to run a community of volunteer nodes well. With Freenet we had that issue as well, though it was actually pretty low maintenance. People just ran it, and there wasn't a lot of coordination. But right now, let me go to the slide on whether or not you should put real traffic on it, which is: no, don't put real traffic on it, because this is a purely experimental sort of thing. So yeah, I don't have a good answer for that yet, but that's a good question. I'm going to work on that.

I guess this is more of a general question for all obfuscating protocols, but couldn't the attacker just notice that you're only communicating with one machine all the time, and it's always HTTP, and you never get anything blocked, and then just block all access that way?

Right, I see what you're saying. You're talking about your connection patterns being anomalous, right? You're making long-lived connections to a single machine. So one of the things going into the next version that I'm working on is to put the traffic from one conversation over multiple connections to multiple machines. Some protocols actually use multiple different ports. If you look at OpenVPN, it uses 443 and 1194. I already have that as part of the statistical model, where you can say, yeah, use 443 80% of the time and 1194 20% of the time. So you can take that to hosts too. You can say: split your traffic among this set of hosts, using these ports, with this probability distribution. So yeah, I'm
totally, I'm totally working on that. Also, I'm working on a thing where you can split your traffic over simultaneous TCP and UDP conversations, using different profiles and different protocols with different hosts, and it all just gets funneled back together into one stream on the other end. That's a lot of work though, so it hasn't come together yet. It's just a lot of bookkeeping and stuff. Yeah, that's the next step.

The obvious escalation for the hardware manufacturers is to just move up the chain and start classifying distributions of bigrams, trigrams, hashes of tokens in HTTP. Have you seen any evidence that they're moving that way, or are you sort of banking on the fact that that's a lab, CS-world theoretical attack and not likely to be deployed in practice?

Well, to come back to the basic principle of Dust: if you define a property of the connection, I will randomize over that property. So if you move from a first-order probability model for content, where you're just looking at individual bytes, to looking at bigrams or trigrams, and that's deployed, and I see that, I will simply randomize on the bigram and trigram level. And I can do that a lot faster than the hardware people, who need to build all that stuff, test all that stuff, and then get people to buy it, and then get people to roll it out. I could do that today, but it's not deployed. And also, today specifically, I'm really busy doing some of the DEF CON contests. So, you know. Yeah, we're not done yet, no clapping.

So how do you specify what's allowed through? Do you have the client email, out of band, some pcap data for the things that they were able to do and the things they weren't able to do? What are the actual details of how that gets specified?

So there's kind of two parts there. There's how do I make a model of a protocol, and then how do we communicate that model to the client so they can connect to the server. In terms of modeling the protocol, I have some tools that take pcap files and
then actually boil them down into a statistical model. It takes out all of the individual packets and just gives you the statistical model, and it makes that into a tiny little file that you can email to somebody. You bundle that up into what I call a packet, which has the IP and the port and the protocol configuration information all in one thing. So all you need to do is tell Dust, here's my invitation, and then it will connect to the server and do everything right. And in terms of how you make those, what I do is I have deep packet inspection hardware, and I look at what gets through and what doesn't get through. Now obviously it depends on how you configure it, like what kind of traffic you're against. So what I do is I look at real-world instances of filtering. I find out what they're using, I get that hardware, I configure it to reproduce the reported behavior, and that's how I try to make a realistic model.

Which brings me to something I want to say about contribution. Here's a bunch of ways you can contribute. Everything's written in Haskell, and my Haskell-to-C is really weak, so if anybody knows Haskell-to-C, I could really use some help making my Haskell C bindings not suck. And then also, if anybody has any DPI hardware, that would be cool, because I have some but I don't have it all. In particular I need some Huawei. So if anybody has got any Huawei gear that they want to let me send some packets through, you can help save the internet from being censored. So, you know, it's on the DL.

You're saying Huawei's, like, maybe a security problem there?

Was that, you're saying Huawei's a potential security risk for my project? No. In general? I wouldn't say in general. I mean, they have good stuff. They're really good at filtering stuff. So I don't know if my stuff works against Huawei or not, because I don't have a Huawei box. Yeah. Anyway, more questions?

Do you think it's possible to put a client in the filter
so the message can be decrypted, I mean, automatically? Yeah, we can use a key exchange there, but the protocol, I mean, it's relatively more constant than that. So if we just reverse engineer the protocol...

I mean, reverse engineer my protocol? Oh, you don't need to reverse engineer it. You can just download the source code. It's right there, you know.

Yeah, I was thinking, I mean, just trying to put a defense mechanism in the filter, so things can be automatically decrypted. Just put a client in the filter so you can understand the meaning of what has been passed through.

I don't totally understand your question, so let's talk after, and then I'll get it.

You mentioned some academic work which sort of questions whether, in the long run, your protocol can fundamentally work, because eventually they can adapt to your protocol. Can you please give more details about it?

Yeah, so that was the "Parrot Is Dead" paper, in which they say that packet-based approaches to obfuscation won't work, because they've already got some stuff that they have done where they look at the whole connection, and then they're able to classify stuff a lot better. Which makes sense, right? If you're not looking at just one packet, if you're looking at all of the packets, there's more information that you can use to classify. So yeah, sure, that's true. Here's the thing, though. If you are looking at the whole sequence of all of the packets, unless you delayed them, well, not even then, that means you passed them. That means you passed the packets on to the server, and then you got responses, and you recorded the whole conversation, and then you classified it. I won in that case, right? The message got through. Now, maybe you had to burn that IP. Maybe that IP is blocked now, and I had to go to a new IP because they said, oh, you're doing crazy stuff, so we're going to block it. That's
already a problem, right? That's already a problem that Tor deals with all the time, which is that you've got to churn through new IPs all the time. So I consider victory to be any time that I get the message through. I don't care about anything else. I don't care about people reading the messages, I don't care about them decrypting the messages, if it's afterwards and they couldn't use that information to block the packets. The academic people are like, can we classify traffic, yes or no? And my question is, can they block the traffic, which they do through classification?

This will be the last question. If anyone else wants to talk to our man here, we're going to take him over to the chill-out cafe. So, one more.

Okay, only one more, so I'll make it count. Can you multiplex traffic across multiple protocols and multiple endpoints, is the first part, and the second part is, are you IPv6 ready?

So, good questions. The first part, that is in the next version I'm working on: multiplexing over multiple protocols, multiple IPs, multiple ports, and also between TCP and UDP, which nobody's doing, so I think that's cool. Most people just don't like UDP. I don't know why, it's rad. And IPv6 ready, it's funny you say that. Actually, the first version of Dust was IPv6 only, and people had to talk me down from that. They had to be like, look, Brandon, people don't have IPv6. I'm like, well, they better get it. Thank you, yes, IPv6 is cool. So the new version I actually have just done in IPv4, but I'm going to add IPv6, obviously, because actually one of the best ways to avoid deep packet inspection is to use IPv6, because they haven't gotten around to implementing most of the stuff for IPv6. Yeah. Another great thing you can do is there's a thing called Teredo, which is IPv6 over IPv4 UDP with built-in hole punching and stuff, and it's really sweet. It's actually built into Windows 7, so if you're on Windows 7 you already have it. You can just go to IPv6 addresses. That's another
thing where they just don't know what that traffic is, so you just use that and then everything's fine. There's a lot of cool little shortcuts to getting your traffic through: use a weird protocol, you know, stuff like that.

All right, thank you. So yeah, I'd be happy to talk to everybody. See you guys at the Q&A room, or if you just see me around, you know, let's hang out, let's get a beer, invite me to some parties. Cool. Thank you.
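
The port-weighting idea from the Q&A (use 443 for 80% of connections and 1194 for 20%) and the out-of-band invitation (IP, port, public key, protocol configuration) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dust's actual code, which is written in Haskell: the `Invitation` class, its field names, and the `pick_port` helper are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Invitation:
    """Hypothetical out-of-band bundle; the field names are illustrative only."""
    host: str
    public_key: bytes
    # Maps a port number to the fraction of connections that should use it.
    port_weights: dict = field(default_factory=dict)
    # Name of the statistical protocol model the client should imitate.
    protocol_model: str = ""


def pick_port(invitation, rng):
    """Choose a port for the next connection according to the weights."""
    ports = list(invitation.port_weights)
    weights = [invitation.port_weights[p] for p in ports]
    return rng.choices(ports, weights=weights, k=1)[0]


# Example: an OpenVPN-like profile, 80% on 443 and 20% on 1194.
invite = Invitation(
    host="192.0.2.10",            # documentation-only address, not a real server
    public_key=b"\x00" * 32,      # placeholder key bytes
    port_weights={443: 0.8, 1194: 0.2},
    protocol_model="openvpn-like",
)
```

Calling `pick_port(invite, random.Random())` once per new connection spreads connections across the two ports in roughly the stated proportions. The same weighted choice could be applied to a set of hosts.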
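
The "first-order probability model" for content that comes up in the Q&A, where only individual byte frequencies are considered, can be sketched like this. It assumes the packet payloads have already been extracted from a capture (no pcap parsing is shown), and it only samples filler bytes that follow the observed distributions. It does not show how Dust encodes real data to match a model. The function names `build_model` and `sample_packet` are hypothetical.

```python
import random
from collections import Counter


def build_model(payloads):
    """Boil payloads down to byte-frequency and packet-length counts."""
    if not payloads:
        raise ValueError("need at least one payload to build a model")
    byte_counts = Counter()
    length_counts = Counter()
    for payload in payloads:
        byte_counts.update(payload)          # iterating bytes yields ints 0..255
        length_counts[len(payload)] += 1
    return {"bytes": dict(byte_counts), "lengths": dict(length_counts)}


def sample_packet(model, rng):
    """Draw a packet whose length and byte values follow the model."""
    lengths = list(model["lengths"])
    length_weights = [model["lengths"][n] for n in lengths]
    length = rng.choices(lengths, weights=length_weights, k=1)[0]
    if length == 0:
        return b""
    values = list(model["bytes"])
    byte_weights = [model["bytes"][v] for v in values]
    return bytes(rng.choices(values, weights=byte_weights, k=length))
```

Moving to bigrams or trigrams, as the questioner suggests a classifier might, would mean counting pairs or triples of adjacent bytes instead of single bytes. The resulting model stays small either way, since it holds only counts rather than the captured packets themselves.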