Hi everyone, thanks for coming. My name is Robert Deaton, and I'm going to talk to you about a piece of software I've been working on for the past few months called DivaShark. But first I want to warn you about the perils of trusting your friends to help you if you ask them to, say, proofread your abstract before you send it in. This is the abstract that I believe is printed in the book, and the two highlighted lines were added at the very last minute by a friend of mine when I asked him to proofread it, and they make me look much cockier than I actually am. So make sure you proofread after you have someone proofread for you, or things like this can happen. The inspiration for this project really came out of playing Capture the Packet last year, and some other CTFs over the past few years. If you're not familiar, Capture the Packet is a competition run at DEF CON every year, or at least it ran last year and is running this year too, and its focus is live network traffic analysis. They give you ten or fifteen Jeopardy-style questions, you plug into an Ethernet cable, you watch the traffic, and you have to try to find the various things they're looking for on the network. To make it difficult, they put lots and lots of noise on the wire, and you have an hour to answer all the questions. Some example questions: a user at this IP address logged in to Reddit, and he wasn't using HTTPS, so what was his username and password? Which email service did a user with this IP address visit? Or: two users had a Google Voice conversation, and if you can find it, you can actually listen in on the conversation after the fact and find out what they were talking about. The standard tool that everyone uses when playing Capture the Packet is Wireshark. It's the industry standard for network analysis, and I think literally every team at Capture the Packet last year was seen using Wireshark.
It was just the thing everyone used. But I argue that it's really the wrong tool for the job, and I'm going to show you why. This is a screenshot of Wireshark with some of the Capture the Packet traffic from last year open. As you can see, there's a lot of information. This is one view of the 146,000 packets, and I think this is about a quarter of the traffic dump for the entire hour-long round, so there's tons and tons of noise. And if you were to open up a view on any individual packet, or look at the little pane that's usually down there that I have hidden, you get all kinds of information about every individual packet: everything from the TCP sequence numbers, to frame numbers on the Ethernet line, to some analysis of HTTP headers if that's what it is; every little detail about every layer. And when you're looking for something high level, like a user logging into a website over an insecure connection with a username and password, you don't really care about the TCP sequence numbers, or whether the packets came in order or out of order, or whether things had to be retransmitted. It doesn't matter at all. So I'm going to use the example of finding the Reddit login to show what we ended up having to do at Capture the Packet last year. First, we need to figure out which IP addresses are communicating here. You know the user's, because it's given to you, but you don't know Reddit's offhand, and you can't really look it up on your own, because DNS round robin or whatever could point you in the wrong direction. So we decide we'll see if the DNS lookup happened in the traffic that we captured. We can apply a filter to get just DNS traffic, and we still end up with a few hundred requests for the fifteen minutes of traffic. Then you can do a search on the string "reddit" to maybe help you narrow it down and find it that way.
And of course, this time we couldn't be so lucky; it must have been cached. So instead we'll do a full-text search for the string "reddit" across all the traffic, which, as you can imagine, takes a minute or two when you're looking through 146,000 packets. And you come up with this: it just brings you to the first place it finds "reddit", and the request it found is a request for an advertisement that happened to be on Reddit's sidebar. You can follow that TCP stream, and it gives you a little bit more, and it actually shows that, yeah, that was pulling an ad, and it's gzipped content and all that stuff. We don't really care. So you can keep searching for "reddit", and you'll get another packet that had the string, and you can follow that TCP stream, and you can keep doing that. In this case, if you do it about forty times, you'll finally find the one place where the user tried to log in. Or you can try filtering by the IP address that they pulled the ads from and hope that it's the same one the login server is on. Or you can just keep filtering by various things and searching different strings: filter by searching for the string "reddit", then filter again searching for the string "login". Eventually you'll find it, but it took a whole lot of effort. So, to make it better: do we really care about packet-level details? Do we really need to see all that stuff? For something where we're looking at high-level stuff in application protocols, it's really not important. We're not writing these protocols, we're not looking for bugs in the protocols; we really just care about what was sent at the application layer. We don't need to see every packet, we don't need to look at them individually, and in some cases we don't even need to look at one TCP connection individually.
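That search-and-refine loop can be sketched in a few lines of Python. Everything here is invented for illustration: three toy streams stand in for a capture, and the function mimics the manual "search, follow the stream, search again with another string" routine.

```python
# Hypothetical packet records: (stream_id, payload) pairs standing in for a capture.
packets = [
    (1, b"GET /ad.js HTTP/1.1\r\nHost: ads.example.com\r\nReferer: http://reddit.com/\r\n"),
    (2, b"GET /static/logo.png HTTP/1.1\r\nHost: reddit.com\r\n"),
    (3, b"POST /login HTTP/1.1\r\nHost: reddit.com\r\n\r\nuser=alice&passwd=hunter2"),
]

def search_streams(packets, *needles):
    """Return ids of streams whose payload contains every needle (case-insensitive),
    mimicking the repeated full-text-search-and-filter done by hand in Wireshark."""
    hits = []
    for stream_id, payload in packets:
        text = payload.lower()
        if all(needle.lower().encode() in text for needle in needles):
            hits.append(stream_id)
    return hits

# Searching for "reddit" alone surfaces every stream that merely mentions it...
print(search_streams(packets, "reddit"))           # -> [1, 2, 3]
# ...so you keep narrowing until "reddit" AND "login" isolates the one you want.
print(search_streams(packets, "reddit", "login"))  # -> [3]
```

The point of the sketch is the shape of the workflow: every refinement is another full pass over the capture, which is exactly the tedium described above.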
There's a lot of information you can coalesce from a few different requests to the server. So this is where DivaShark comes in. It's a tool that tries to make this live network forensics easy, and in addition it's a framework for building application-level protocol dissectors and classifiers, similar to Wireshark's, but ones that don't care about the low-level details. The idea is really to get you the information that you want as quickly as possible without getting bogged down in small details. This is the analog to Wireshark's home view: we show source IPs and destination IPs, packet counts, protocols (which the program detects for you), and source and destination ports. The big picture is that DivaShark will do the same sort of traffic capture that Wireshark does. You can load in from a PCAP, or do live capture from a card in promiscuous mode. It automatically tries to follow TCP streams and UDP streams, even as they come in, and group them together into individual streams. Afterwards, it runs the streams through a port-independent protocol classifier, because you never know if someone's hiding things by using different ports. It prioritizes common ports for common protocols, so it will look for HTTP running on port 80 or SSH running on port 22 first, but if those classifiers don't match, then it starts running through the other classifiers. And once a stream is classified, there are protocol dissectors, which go along with the classifiers, that are designed to get you the information that you care about.
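The port-prioritized but port-independent classification idea can be sketched like this. This is not DivaShark's actual code or plugin registry; the classifier list, the signatures, and the ordering trick are all my own illustration of the approach described.

```python
# Hypothetical classifier registry: (name, common_port, match_fn) triples.
# Real signatures are more careful; these are deliberately crude sketches.
CLASSIFIERS = [
    ("http", 80, lambda data: data.startswith((b"GET ", b"POST ", b"HTTP/"))),
    ("ssh",  22, lambda data: data.startswith(b"SSH-")),
]

def classify(port, data):
    # Try the classifier registered for this stream's port first (False sorts
    # before True, so the matching-port classifier floats to the front)...
    ordered = sorted(CLASSIFIERS, key=lambda c: c[1] != port)
    # ...but still fall through to the others, since protocols can hide on odd ports.
    for name, _common_port, match in ordered:
        if match(data):
            return name
    return "unknown"

print(classify(80, b"GET / HTTP/1.1\r\n"))     # -> http, via the fast path
print(classify(8080, b"SSH-2.0-OpenSSH_8.9"))  # -> ssh, despite the odd port
```

The sort is just a cheap way to express the priority rule: content decides the classification, and the port only decides which classifier gets the first look.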
So, for example, the HTTP dissector will show you information such as the user agent, the page that was requested, a chart with all the GET and POST parameters, and the option to just download the file or page that was transferred and save it out right off the bat. If you've ever tried to do this in Wireshark: you can follow the TCP stream, and then you get the conversation going back and forth, and then you can save that whole conversation and carve the file to get rid of the request headers that you don't care about, and then you have the file you're looking for. It's that bit of tediousness that we want to get rid of. We really want to be able to just say: hey, this user downloaded a file, here it is, don't worry about anything else. And probably one of the most important parts is the powerful ability to filter traffic. We don't want to be looking through every little stream trying to find the one that matters to us. We want people to be able to filter by the protocol that's running. But even better than that, the protocol classifiers have the ability to give the program information that it can filter on at an application level. So, for example, the HTTP classifier and dissector let you filter your HTTP traffic by user agent, by whatever URL was requested, by whether or not it had gzip encoding, anything like that. It's that ability to really narrow down on the type of traffic that you want that I think is the most important part and the big picture for the software. And unfortunately and embarrassingly for me, while much of what I've told you is finished, the part that's not finished, and was not ready to present to you today, is the user interface, which is of course the most important part.
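A toy version of the HTTP dissection described above might look like this. It's a minimal sketch, not DivaShark's dissector: it pulls the request line, headers, and GET/POST parameters out of one reassembled request, and the sample request is made up.

```python
from urllib.parse import parse_qsl

def dissect_http_request(raw: bytes):
    """Toy HTTP request dissector: extract method, path, User-Agent, and the
    combined GET/POST parameters from a single reassembled request."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = dict(line.split(": ", 1) for line in lines[1:] if ": " in line)
    path, _, query = target.partition("?")
    params = dict(parse_qsl(query))           # GET parameters from the URL
    if method == "POST":
        params.update(parse_qsl(body.decode("latin-1")))  # POST body parameters
    return {"method": method, "path": path,
            "user_agent": headers.get("User-Agent"), "params": params}

req = (b"POST /login HTTP/1.1\r\n"
       b"Host: reddit.com\r\n"
       b"User-Agent: Mozilla/5.0\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"\r\n"
       b"user=alice&passwd=hunter2")
info = dissect_http_request(req)
print(info["user_agent"], info["params"])
```

Once a stream is reduced to a record like this, showing a parameter table or saving out a transferred file is a display problem, not a carving problem.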
But the dissectors and the analysis and all that stuff works; it's done; it's all written in Python, and it's still incredibly fast. The screenshot from back here is actually the same PCAP from Capture the Packet last year that I showed in Wireshark, and DivaShark was able to load the whole file and classify it, with the roughly six protocol classifiers I have in, before I could time it. I think it's a great proof of concept that we don't need to worry about writing these in C or other low-level languages: Python has shown that it's fast enough, and it gives the ability to very quickly and rapidly prototype and write new protocol classifiers and dissectors and get them out, so that for competitions like Capture the Packet we have whatever protocol classification and dissection we need. And the last bit of info I have for you is that the slides will be up soon. Release is imminent; probably within the next few days I'm going to put up the first versions, even if they don't have the user interface. The project is going to be open source, and I'd really love everyone's help. I spent a long time trying to think through and put together a good Python API so that classifiers and dissectors can be written easily, without caring about the low-level details, and I'd love to get as many people involved as possible, get really cool dissectors written, and make this a project that makes a competition like Capture the Packet look easy and simple, no longer a tedious hunt to see who can filter through Wireshark the fastest. We can do better. I think a tool like this has the ability to make Capture the Packet almost not a competition anymore, because the playing field should have everyone able to find every answer immediately. So that's what I have for you. I guess I have time for questions, if people have questions.
Why is it called DivaShark? Well, the Shark part comes from Wireshark, because it's a similar product and I wanted to keep that around. As I was trying to think of names, I thought of the ability to monitor network flow, DivaCup came to mind, and so I combined the two into DivaShark. And that gentleman is also the gentleman who decided to insert things into my abstract while proofreading it. Yeah, right now I have I think six protocol classifiers, for HTTP, SSH, Samba, and a couple of others, but there's also a naive port-based classifier that it will fall back on. I have a list of the most common services run on different ports, and if it fails to classify a stream, it defaults to showing you that information, at the very least, for the time being. There's a Python API available: you basically implement three or four functions that take in different parts of the network traffic, and you specify a few options. For instance, a classifier can say, I need n bytes before I can confidently classify this stream; you'll get that information back, and the classifiers just return whether or not they match. Yeah, sorry, what's that? Okay, so you asked how it's different from flow analysis and flow tools. There are tools like tcpflow that take in network traffic and just spit back out the individual flows as PCAPs or whatever, and really the difference here is that DivaShark brings together that ability to do flow analysis with the ability to do application-level analysis on the flows.
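The plugin contract just described ("I need n bytes before I can commit, and I return whether I match") could look roughly like this. The class layout, attribute names, and engine function here are my own guesses for illustration, not DivaShark's published API.

```python
# Sketch of a classifier plugin: declare the common ports, declare how many
# bytes of the stream are needed before answering, and implement match().
class SSHClassifier:
    name = "ssh"
    preferred_ports = (22,)
    bytes_needed = 4          # the "SSH-" banner is enough to be confident

    def match(self, stream_bytes: bytes) -> bool:
        return stream_bytes.startswith(b"SSH-")

def try_classify(classifier, stream_bytes):
    """Engine side: hold off until the classifier has the bytes it asked for."""
    if len(stream_bytes) < classifier.bytes_needed:
        return None  # not enough data yet; ask again as the stream grows
    return classifier.name if classifier.match(stream_bytes) else False

c = SSHClassifier()
print(try_classify(c, b"SS"))                   # -> None: need more bytes
print(try_classify(c, b"SSH-2.0-OpenSSH_8.9"))  # -> "ssh"
```

The three-way result (not yet / no / yes-with-name) is what lets classification run incrementally on live traffic instead of waiting for a complete capture.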
The best part is not that it's able to put together the TCP and UDP sessions; it's that we can take that assembled information and pass it on to more powerful tools that look at the application level and no longer care whether it came from a TCP stream or a UDP stream. We care what's happening at the application level. Yeah, I'm not familiar with that, but it does remind me of one other thing that I wanted to mention and forgot to: the API also has the capability to correlate data between more than one TCP session. If I understood your question correctly, there's a piece of software which actually lets you bring in more than one network stream and correlate data between them. I haven't thought about doing that, but here's what we can do: for instance, in the main user interface, we often don't care about the IP address, we care what domain it's talking to. We actually have the ability to monitor all the DNS queries, build a DNS database, and show the host that an IP address belongs to in the network traffic, even if the lookup was made at a different time. It can also get that information from an HTTP request, because of the Host field. So there are a bunch of different ways to take a bit of information that we probably don't care about, the IP address, and show you the information that we do care about, like the domain name being connected to. Sorry, can you repeat the last part of that? Yes. So I think your question is: in Wireshark there's a feature where you can view graphs of the traffic as it comes through, which gives you information on which protocols are more popular at the time, and you're asking if that information is used at all to help.
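The cross-session IP-to-hostname idea is simple enough to sketch directly. The records and addresses below are invented; the point is only the mechanism: harvest names from DNS answers and HTTP Host headers as they go by, then label later traffic with the name instead of the bare IP.

```python
# ip -> domain name, populated opportunistically from multiple sessions.
hostnames = {}

def note_dns_answer(name, ip):
    """Called when a DNS response maps a name to an address."""
    hostnames[ip] = name

def note_http_host(ip, host_header):
    """Called when an HTTP request reveals the Host its server IP serves."""
    hostnames.setdefault(ip, host_header)

def label(ip):
    """What the UI shows: the learned name, or the raw IP if we never saw one."""
    return hostnames.get(ip, ip)

note_dns_answer("reddit.com", "151.101.1.140")   # made-up answer record
note_http_host("93.184.216.34", "example.com")   # made-up request
print(label("151.101.1.140"))  # -> reddit.com
print(label("10.0.0.7"))       # -> 10.0.0.7 (no name ever observed)
```

Because the table outlives any one stream, a lookup captured early in the hour can name a connection that happens much later, which is exactly the correlation being described.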
I don't use anything from Wireshark at all, but what you can do is, in the options, specify which protocol analyzers you want to load. By default it loads all of them, but if you say you don't care about HTTP ever, you can go ahead and unload that so it doesn't run. It also lets you order the protocols in the priority you want them to be checked, so if you care most about finding HTTP traffic, you can put that protocol classifier and dissector first. Yes. It's going to be GUI only; I feel like for a tool like this the user interface is probably going to be one of the most important parts. Part of the reason it wasn't done is that I've tried to spend a lot of time making sure that everything behaves as you would expect, that you can customize and change things, and that the information you care about is what's being shown. Yeah. I haven't decided where it's going to be hosted yet. I can tell you that it will be Git-based, but I'm not sure if I'm going to host it on GitHub; at the very least you're free to mirror it on GitHub and send pull requests through GitHub. [An audience member says: thank you for getting started on this, and given how much you're still going to do, thank you even more.] Thanks. Yeah.
Also, it really is going to depend on the different protocol classifiers and dissectors, because each one is written as a plug-in, and they can provide any sort of filters that you want, including an interface to design those filters. For HTTP, the filters will support filtering on anything that comes in the HTTP headers; for other protocols it will vary, but any information which a classifier and dissector deems important enough to filter on, they can provide the interface to filter on. So you really get the ability to drill down within one application's traffic and figure out which streams you want, but it depends on the application and how much the classifier exposes. Also, his question was how you pull out the application-level data, like the username and password. In the Reddit example I used before, the idea is that you'd be able to filter for, say, a POST request to reddit.com, because that's likely how the login details were submitted, and once you found it, if you open up the window with all the HTTP information, it will show you a table of all the parameters that were passed via POST. So you'll get a table that has "username" and the username, and "password" and the password. The goal isn't to provide a way to look at all the traffic and say, oh, someone logged in with this username and password; but if you know what you're looking for, you should be able to find it immediately. Okay, I don't see anyone else, but if you have any other questions, feel free to email me or grab me afterwards.
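To close the loop on the Reddit example: once streams are dissected into application-level records, the filter described above is just a predicate over fields. The records here are invented, and this is a sketch of the idea, not DivaShark's filter engine.

```python
# Dissected streams as simple field dicts (invented for illustration).
streams = [
    {"proto": "http", "method": "GET",  "host": "ads.example.com", "params": {}},
    {"proto": "http", "method": "GET",  "host": "reddit.com", "params": {}},
    {"proto": "http", "method": "POST", "host": "reddit.com",
     "params": {"user": "alice", "passwd": "hunter2"}},
]

def http_filter(streams, **wanted):
    """Keep HTTP streams whose fields equal every wanted value."""
    return [s for s in streams
            if s["proto"] == "http"
            and all(s.get(k) == v for k, v in wanted.items())]

# "A POST to reddit.com" goes straight to the login, and its parameter
# table holds the username and password directly.
hits = http_filter(streams, method="POST", host="reddit.com")
print(hits[0]["params"])  # -> {'user': 'alice', 'passwd': 'hunter2'}
```

Compare this with the forty rounds of full-text search from earlier: the same question, expressed once, over fields the dissector already extracted.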