I'd like to welcome everyone to track 5 at DEF CON. In consideration of the other attendees, please turn the ringer on your cell phone off. Thanks. Also, there's been one schedule change: track 4, 4 PM, there's a substitution, owning the WAN with Xan. Hello. I'm glad so many of you showed up. I'm Raffael Marty. I work for a company called Splunk; a lot of you have probably seen the shirts around. I have a few shirts here, and I will hand them out during the workshop; I will tell you when I give them out. You've probably heard of secviz.org at this point. If you haven't, go check it out; it's a portal I started around security visualization. I'm also the author of AfterGlow, if you've seen that tool before to generate link graphs; I will show that in a little bit. And it happens that I'm also the author of the book you see here, Applied Security Visualization. It just came out on Tuesday. The bookstore here had it before I had seen a copy, which was pretty exciting. And I'm presenting here with Jan. Yes, I'm Jan Monsch. I'm from Switzerland, and I'm a senior security analyst with a well-known Swiss pentest and security review company. I'm the father of DAVIX: I initiated the project and engineered the CD. So the things you're going to see during the workshop, that's the stuff I did. All right, to get the workshop going, you need the ISO image. There's a guy down there, Martin, a colleague from work; he will distribute 30 CDs among the crowd. Please copy the ISO image and the manual, which is on the disk, onto your hard drive and hand the CD on immediately, so that everybody gets a chance to get the image. So, who did not get the chance to download the image? OK, there's quite a few. All right, so Martin is distributing disks along the aisles. Just pass them on. And to set up. Too much security here. Yeah.
All right, the recommended setup is that you use VMware and run it in there. In the manual, chapters 6.1.1 and 6.1.2, you'll find instructions on how to set up a VM if you don't know how to do it. Then we suggest you activate networking, because the manuals and tutorials for the tools are located on the internet; there are links on the CD, and we'll discuss this later on. To get to that content, you need internet access, so we suggest you NAT your VM to the network, or even bridge it if you like. OK, so what we're going to do is kind of an experiment for us, and I think for DEF CON as well. We want to talk a little bit about visualization and about the DAVIX CD itself. And then we have the hands-on analysis, and that's what you need the CDs for, so you can start up the live image and do some analysis yourself. While you guys are playing with the capture file we gave you on the CD, or that you download, we'll roam around a little and try to figure out what you're up to, and whether there are any questions or things we can help with. And then we'll probably show some things up here. If anyone comes up with interesting things they find in the captures, or ways to visualize the data, we encourage you to come up here and show everybody what you did. So while you're doing the analysis, keep track of what you're doing, so you can come up here and show people. For people that come up with something we're impressed with, I have four books here that I'm going to give away. So I encourage you to come up with something interesting. And why does this go blank here? Here we go. All right. So the goal is that you get a feeling for how to visualize security data and how to operate the DAVIX CD itself. Let me ask a few questions. Who has done log analysis before? Raise your hand. Oh wow, awesome. I think I've never had so many hands show when I asked this.
Who visualizes their logs when they analyze them? Ah, not that many. Okay, that's good; I hope after this it's going to be a few more. Has anyone used DAVIX before? A couple. Okay, cool. Have you heard of secviz? A few. All right, so everybody that does visualization of log files, I encourage you to go to secviz and submit some examples. It's a portal with a whole gallery where people submit graphs and ideas for how to visualize log files. So you might get inspired, and if you submit some things, you can inspire other people. All right, I'm going to hand over to Jan, and he's going to talk a little bit about the CD and how it's organized. All right, so I'll give you a short introduction to DAVIX. The first question is: what is DAVIX? DAVIX is a live CD built on SLAX 6. Maybe some of you know BackTrack is also built on SLAX, so we use more or less the same technology that comes with SLAX 6. A good thing about SLAX is that it's very modular: you can have software in packages and organize them pretty easily, and it's easy to customize the CD. So if you want to change single files on the CD or in the live system, it can be done really easily. And you can not just run it from CD and DVD; you can also run it from a USB stick if you like. There are instructions in the CD manual that tell you how to do that; it's really just copying files and running a batch file, and that's it. And maybe some of you would like to have it installed on a hard drive. You can do that as well: we have included the BackTrack 3 installer, so you can set up a VM and run everything like a normal Linux. Okay, the CD contains a collection of tools. These are mostly tools for data processing, classic things like grep, awk, Perl, things like this. And the most important part is the visualization tools. We've included like 25 or 30, I don't remember the exact number of tools on the CD.
This is actually based on the research Raffi has done for his book, so the tools are the same ones you can also find in the book. All right, the tools are ready to run, so you don't have to compile anything; you really just start up and start using the tools. Then the CD comes with documentation, as I just mentioned. There's a quick start guide on how to get things running and how to get things visualizing; there are really basic examples on how to do that. All right, next slide. Why did we build DAVIX? Well, first of all, there is no free out-of-the-box solution which contains all these tools, and it's quite cumbersome to get lots of these tools running. There are tools which require certain versions of GCC, or libraries which are rarely used, so getting things to run can be really difficult. Then there are different runtime environments: there are tools written in Java, in Ruby, in Perl. So you need a huge environment to get the tools working, and we've done all that work for you. So the goals for our project are to have a running environment which is really simple to use, so you can start analyzing things without doing much compilation work and installing stuff. And another thing is you can add stuff easily thanks to SLAX, so you don't waste too much time adding new things. The user interface in DAVIX is based on KDE. As you see on the right-hand side, this is the start menu of KDE. The top entry is the documentation link; you go there and you get access to the documentation. Then you see three categories, called capture, process, and visualize, and this represents the information visualization process. Each of those tabs contains a set of tools related to capturing, processing, or visualizing. A tool can be in different areas, like AfterGlow, which is a collection of scripts that allow you to take a tcpdump capture and make a CSV out of it; that's the processing part.
And then there is a part which does the pre-processing for the visualization tools, like generating dot files for Graphviz, for example. Okay, then we have additional services. MySQL is installed on the CD, so if you want to do analysis with a database, you can do that. Although the space is limited by your memory, because everything in DAVIX runs in memory; the file system runs in memory too, so the disk is only as large as your memory, actually. And we have Apache on it, so if you need a web server or something, it's there. Let's have a look at the tools which are there. This is just a small set of tools I put here so you get an impression of what's on there. First, in the capture category, there are network tools. Of course Wireshark, the most important one, don't miss it. Argus for netflows. Snort is installed with Bleeding Edge signatures. Then we have logging things, so you can actually direct syslog to DAVIX, and these things get written to file and you can analyze them immediately. And the third thing, we have lots of tools which allow you to grab files from other systems, like FTP, SCP, and all that kind of stuff. Then we have the processing part. I already mentioned the shell tools, so I'll skip that one. Then we have generic processing, for AfterGlow and LGL, for example. And things like data enrichment: maybe you would like to have your IP address looked up into a country; maybe that's interesting for your analysis. There are tools included on the CD which allow you to do this, or you can do whois queries and whatever. By the way, whoever finds a typo in the book gets a t-shirt. Good idea, Raffi. Okay, so let's move on to the visualization things. There are tools which are specific to network traffic analysis, like EtherApe and InetVis. And there are tools which are more generic, like AfterGlow and the R project. R, for example, is a statistical analysis suite; there you can analyze any data you like.
Let's have a look at the PDF manual next. It contains a quick start guide, information about setting up the network, like wireless access, and links to external resources. And probably the most interesting part is all the customization topics, like how to create custom ISO images, creating new modules, or installing to a USB stick or hard drive. The manual is organized as follows. You can access it on a global level, like you say, I want to see the whole manual, or you say, okay, I just want to have a look at a certain chapter. Or when you look at the tools, like in the example here, you see Cytoscape. When you open up this section, you see the tool on top of it. And then on the bottom, there's a thing called DAVIX example. This is the quick start example to get things visualizing in Cytoscape, for example. And at the bottom are the external links to the tool's documentation on the internet. Okay, we also have a little disclaimer here. The manual is not really an introduction to security visualization methodologies. So if you're interested in how the process of visualizing things really works, then we suggest you have a look at Raffi's book. In there, there are detailed use cases about how to do things and how to work with the tools really hands-on, with script examples and whatever. So the documentation on the CD is really a rough thing to get you going; it's not about methodologies and stuff like that. Good, just a word on customization, so you can see how it works. When you look on the DVD, there are different folders. There's a folder, for example, for the SLAX modules. In there are all the modules which are on DAVIX. So if you create your own module, you can put it on the ISO in that directory and it will automatically load. And if you would like to overwrite a single file, maybe there's some config file which doesn't suit your needs, you can do that as well: there's a thing called rootcopy.
In there, you can add the individual files you want to overwrite on the CD. So how DAVIX, or actually SLAX, works is that it first loads all the modules and then copies everything that's in rootcopy into the memory file system. So that's a really nice way to customize. That's it from me; Raffi's up. Thank you very much. All right, so I'm quickly going to talk about the very basics of visualization, so you guys get a bit of a start on how to analyze the capture we gave you. What you see here in this slide is the information visualization process. It's fairly simple. If you start on the left-hand side, the first thing you want to do is define the problem you have, right? You have a certain data set, you have a certain use case, something you want to do with the data you have. It's always good to really know what you're after. So if you have a PCAP or a tcpdump capture or something, you might want to figure out: what are the communications in there, who is talking, what are the machines on there, what are the services running? Maybe you want to figure out whether there's an attack in there. The second thing is you might want to figure out whether you actually have the data for that use case. If I want to find attacks, I probably need the packet captures or some IDS log files. If I only have a firewall log file, I'm very restricted in terms of what attacks I can find. Then, as you move along with the process, if you have all that data, you might have two gigs of packet captures from your border routers or something. Well, you probably want to start filtering already. You want to figure out what is really necessary to accomplish that use case. So if I'm looking at the border router and I want to see the attacks happening internally in my network, it's probably the wrong data source.
If I want to look at things that happen from my internal people towards the outside, I might want to start filtering for just those sessions and not the things that are coming into the network. So that's a way of already reducing the amount of data that you have to look at. The next step is that you have to start normalizing the data, or you have to extract the information from the data that you have. For example, if you have a packet capture and you want to look at communications, you want to extract the source IPs and the destination IPs, or whatever endpoint information you want. Maybe you want to take the source MAC and the destination MAC. So you have to start pulling those out. A lot of times what I do is I just write some Perl script or an awk command or something to pull those things out. A lot of times I go and get some parser from somewhere; there are a few of them on secviz.org if you're interested, and in the AfterGlow distribution there are a few also. So I can very easily extract those individual components from the log files that I need. And here a word of caution. If you're looking at tcpdump traffic, and you're probably going to run into this if you start analyzing the capture we gave you, be aware of the source/destination confusion. Meaning: if you look at tcpdump output and you start doing an awk, for example, to get the third column, it's always the first IP address that you see in the output. You will end up with communications going from a client to a server and from the server going back. So you will think that those individual nodes talk to each other, because what happens is, if you look at the tcpdump output, the SYN packet, the session initiation, will come from the client to the server. Now the next line is going to say the server talked back with a SYN-ACK to the client. In the capture it looks like both of them are source addresses, but really they're not; the second packet is inverted.
So you need to make sure that you turn those around, so you don't get confused about who is talking to whom. For that reason I usually use parsers that already have that logic built in; there's one in the AfterGlow distro called tcpdump2csv, which uses some heuristics to figure out who's really the server and who's the client, to output the right thing. Then, once you have the information extracted, you want to visualize it somehow, so you do a visual transformation and generate the first graph. And there you want to think about how the output has to look, and that can be all kinds of things. It can be that you want to generate a pie chart, or a line chart, or a bar chart. Maybe you want to generate a link graph, where you actually see nodes and edges connecting them, to see who is communicating. And once you generate the first graph, most of the time it's not going to be your end product. You're not going to be satisfied. There are probably going to be too many nodes, too many bars; they might not be scaled the right way; you want to add some color. So you go through a visual or view transformation and start adding properties to that graph, or change it up. And then in the end you might have certain findings, and you want to start interpreting them: you want to figure out whether there is really an attack in the traffic, where it came from, and so on. And that might point you to restarting the whole process and saying, well, I have other things I need to know now. All right. Let's talk about data formats quickly, so that you can actually start using the visualization tools on the CD. Unfortunately, there's not a standard data format that all of these visualization tools use. Some of them use CSV files, comma-separated values, where you have like IP comma port comma IP comma something. So it's just comma-separated values, and you sometimes have to transform your data input into this format.
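The swap logic described above can be sketched in a few lines of awk. This is only an illustrative toy, not the actual tcpdump2csv heuristic (which, as mentioned, looks at who sent the SYN); here the assumption is simply that the endpoint sitting on a well-known port (below 1024) is the server, and the sample addresses are invented:

```shell
# toy source/destination fix: input lines are "sip sport dip dport";
# if the source port is a well-known port (< 1024), assume this line
# is the server's reply and swap the endpoints, so the client always
# ends up in the first column
printf '10.0.0.5 33211 192.168.1.1 80\n192.168.1.1 80 10.0.0.5 33211\n' |
awk '{ if ($2 < 1024) print $3 "," $1; else print $1 "," $3 }'
```

With that normalization, both directions of the same session collapse onto one client,server pair instead of two apparent talkers.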
And again, in the AfterGlow distribution I have a lot of parsers that take, say, a Snort alert log and convert it into a CSV file. Then one tool uses the TM3 format. This is the Treemap tool from Ben Shneiderman. The data format here is slightly different from a CSV or TSV file: it adds a header with the name of each field or data element and its data type. So the first header row gives each column a name, an arbitrary name you want to give it, and the second one gives the data type. The Treemap tool needs this because it treats integers differently from strings, and treats those differently from floats or timestamps. So you need to give it the data type as well. A lot of times I just start out with strings for everything, or ports are going to be integers, but I'm not very specific about how I categorize those; usually strings and integers are enough. Then there is another interesting format, the dot format, which is basically a description of link graphs. A link graph has nodes and edges connecting the nodes. The dot description you see here is basically defining all the individual nodes. It says there's a node A, there's a node B, and you can give them certain properties: you can say node A is colored green, or B maybe has a gradient on it, or maybe another label or something. And then you define the edges: you say these are the nodes that are connected. That's really what this complicated-looking format here is, and I will show you in a second how to generate it automatically. Then some other tools use maybe GML, or GraphML; there are other data formats that some of these tools use, and you might end up having to convert one to the other and to the next one. I don't have automatic converters for everything, but most of the conversions are fairly simple to do. Let's look at an actual example here.
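To make the shape of that dot format concrete (the node names and colors here are invented for the example), a minimal file declaring two styled nodes and their edges looks like this:

```
digraph example {
  A [label="A", style=filled, fillcolor=green];
  B [label="B", style=filled, fillcolor=orange];
  A -> B;
  A -> C;
}
```

Feeding a file like this to one of Graphviz's layout programs, such as dot or neato, renders the link graph.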
So if you're using AfterGlow and you want to generate things like the one on the right-hand side there, it's very small but I think you get the idea, with nodes and edges. What you do first is you have some data source, something that generates the data. Then you parse it, basically extracting the fields that you want from the data, and in this case you generate a CSV file. Now, if you have your data in a database, for example, you can just run a MySQL command, extract the columns that you want, and generate the CSV output. So at least you have the input to start with; or if you take a tcpdump, then use something that pulls out the fields you want. And then you're ready to use AfterGlow. AfterGlow takes those descriptions here; I have an example where there's a username and then an activity this user executed. So one user executed a ping, or Anna accessed some pages. You just have this two-column input, and now you can translate it with AfterGlow into a graph description. And on the right-hand side here is the dot description. So you don't have to write the complicated structure of the dot file yourself and figure out how to actually do it correctly; AfterGlow does that transformation automatically. And while doing the transformation, you can also give it a configuration file that adds certain properties to the nodes and edges. So you can say: if the IP address is in this subnet, make it green; if it's in that subnet, make it orange. If the port number is above 1024, make it dark blue; if it's below 1024, make it light blue. So once the graph is generated, you see certain properties in the data right away. And then if you run this through Graphviz, it will generate a graph that looks like this, where you have the four users and the four activities associated with them. It's a fairly boring example, but I think it gets the point across. So let's go through a very quick example analysis of an actual data set that I worked on.
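The CSV-to-dot step that AfterGlow automates can be sketched with awk. This is a toy stand-in, not afterglow.pl itself (the real script also applies the property/configuration file for colors and such), and the two-column sample data is invented:

```shell
# toy two-column CSV to dot converter, in the spirit of afterglow.pl:
# each "user,activity" row becomes one edge in a directed graph
printf 'anna,ping\nbob,wget\n' |
awk -F, 'BEGIN { print "digraph g {" }
         { printf "  \"%s\" -> \"%s\";\n", $1, $2 }
         END { print "}" }'
```

The resulting text is exactly the kind of dot description shown on the slide, ready to be laid out by Graphviz.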
In my book I cover exactly this example, and when I introduced the section I was saying, well, I don't really like to analyze worms, because it's so easy to find worms in your network; there's so much traffic they generate, it's really boring. So I tried to mix it up a little bit and said, let's at least look at an interesting data set. I had the opportunity to work with a cell phone carrier that was interested in finding worms in the cell phone network. What they have are these call detail records (CDRs) that you see here, and they look really ugly; a lot of these entries, I don't even know what they mean. There are numbers in here that I have no clue about, but I don't need to. I just want to know who is sending multimedia messages, pictures, maybe even files around. So I might send someone, or you, an image in an MMS message, and that generates one of these MMS CDRs. I'm just interested in the phone numbers, and those are the fifth and the sixth columns there, the ones in bold; those are the phone numbers that were communicating in this instance. So what I do now is I just look at the CDR records and I use awk to extract the fifth and the sixth element of that data set. That's all I need; I need the phone numbers. What I want to do then is generate a link graph that shows me the caller, then the message, and then the callee. So, who was sending a message to whom? And if I do this, you get something like this, and it's fairly horrible in terms of the quality you get here. Let me try this, man, if I, whoa. Never try something you haven't tried before. The Mac is going to come back. Here we go. Not sure if this helps. That does. Here we go. So, the quality of this picture is really horrible, but I think you can see what happened. You see all these communications. The yellow nodes were sending a message to the white nodes, right? And that's what you expect.
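The extraction step here is a one-liner. The sample CDR line below is invented (the real records are much uglier), but it shows the awk call for pulling out the fifth and sixth fields as a two-column CSV:

```shell
# pull the caller and callee phone numbers (fields 5 and 6 in this
# hypothetical record layout) out of a whitespace-separated CDR line
printf 'a b c d 15551230001 15551230002 x y\n' |
awk '{ print $5 "," $6 }'
```

That caller,callee output is then ready to feed into AfterGlow to draw the caller-to-callee link graph.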
I might send an image to someone, and they might send another one to someone else. So I'm just one of these yellow ones. There are some people that communicate a little more and some a little less. Now, if you look down here, there is one of these kind of stars showing up, or snowflakes or something. At first I was like, oh, this is interesting: someone sends a lot of messages, and this was in a fairly short amount of time. So, let's see who this is. And it turned out those were service numbers. You can send a text message to some service number requesting a background image or a ringtone or something, and that's absolutely legitimate traffic. So we started excluding all those, because we just generated a list of known service numbers. And what happened then is I got a graph like this. I'm going to go back into presentation mode again. I still ended up with some of the service numbers here; I'd probably want to look at what they are. But I also found this pattern on the top right here, where you find long chains of messages. So one message travels from one person to another, and that person sends the same message on. And I know it's the same one because I look at the size and the content type. So these go to multiple people, and then they fade out again, and it generates this huge kind of chain in here. It is really, really hard to find this algorithmically. You can do it: you can generate spanning trees and calculate them and figure out whether there are any of these long chains in there. But in the end, if you find one of those chains, you have to go back, figure out what exactly it was, and then regenerate it again, and exclude and include. This way is very, very quick; it took me a very short amount of time to figure out what's going on in here. All right. So let's get you started here. Okay, and now it's time for analysis. So it's your turn, actually. Are there any people here who did not get the CD?
Does everybody have the ISO image now? Please hand the CDs on to the people who haven't got the image yet. And the people that don't have a laptop: just keep the CD, maybe that helps too, I don't know, take it home. Okay, so now the pcap file is located in slash root slash davix, oh, that's the name of the file. So the file is in the /root directory, called davixworkshopcaptures.pcap. If there are guys who do not have the DEF CON image, then they should probably copy the ISO image from somebody else; it's the easiest thing to do. All right, then I guess it's Raffi's turn. Yeah, so just hammer away if you have questions. Either yell, or it's probably going to end up in chaos, but you already have a question. You can download the ISO off the internet, but it's quicker if you have the CD. There are two versions, the official release and the defcon16 ISO. They're both downloadable from our website, that's davix.secviz.org, and you can also find links in the DEF CON forum. Pardon? So only the davix-defcon16 image has the capture in it; the regular DAVIX image doesn't. We made an extra image for this. Okay, this gentleman in the first row. So the question is whether we have any performance numbers for how long it takes to generate any of these visualizations. No, I don't. It depends on a lot of things. It depends on the machine you're running this on, right? It depends on what the entropy in your data is, for example. If you're looking at a pcap file and there are 100,000 different source addresses, that's much different than if there are only 100 different ones. Then it depends on the tool you're using. If you're using AfterGlow, I can guarantee you it's probably slower than anything else, because it's just a dumb Perl script that keeps a lot of state. The bigger the file gets, there's probably exponential growth in the runtime.
I tried to optimize things with caches and so on, but it's not a commercial product; you don't have time to optimize it more. Feel free to do it. But the other tools, I don't have any control over either. So every tool is going to vary, and it varies with the data types you have and all that. So sorry, I have no idea. So, maybe if you're kind of stuck and you're like, well, where do I start? Here are a couple of hints. The first thing is probably to run some kind of tcpdump on your data, right? It's a pcap file. I mean, if you're really leet, you can probably use a hex editor, go in there, and do some magic; I can't. So I'm using tcpdump to do the protocol interpretation for me and get ASCII output, right? So you can run that over it and you get some ASCII output of the communications. You could also run Snort on it and have the rules trigger on it, for example, and visualize that somehow. The tcpdump2csv Perl script, you can just run it, it's in the path. You pipe the tcpdump output into it and it will do the parsing. If you look at the top of the file, it will show you how to run the command. You need to run tcpdump with certain parameters; I think there are like four t's and -e, -nn, -l, something, I don't remember, but look in there to get the correct command so it actually parses the output correctly. Then if you run afterglow.pl -h, you get the help for AfterGlow. It's fairly simple to run; you don't really need any command-line parameters if you don't want to do anything special. There are also a few Perl tools on there that I hacked together. They're horrible quality, but they get the job done. So if you want to generate a bar chart, for example, if you have basically a one-column output, just pipe it into bar.pl and it generates a PNG file of a bar chart of the top 10, or I think you can even control whether it shows all of them or not. So, are there any other questions?
Okay, let's get started. So for those without a laptop, this is going to get really boring right now. I will probably just hook up DAVIX here, and for those who want to watch, I'm going to generate some graphs, and if you want to yell and say, hey, do this, please go ahead. Everyone else who finds something, just come up and show it. I see Greg is already frantically working on some stuff; I hope it's actually related to this. So you have to compete with Greg, who is, by the way, the author of the security visualization book by No Starch Press, the black one with the green image on the cover. He plugged me, I plugged him. I will officially get an image on here also. So if there are any people who do not have the DAVIX image yet, please get it from someone who has the CD. Has somebody got the image on a USB stick? There are some CD-ROM-drive-less users. So when we were preparing, just half an hour before the talk, I was trying to look at the capture quickly and find some things to look at, and I wanted to know how to run one of the tools, and Jan caught me looking in the book to figure out how to run one of my own tools. I can't even remember how to run them; I had to look in the book. So this might happen again, that I might have to use the book to do something here. Oh, and Jan is playing tricks on me. I like this, the Swiss keyboard layout. Oh, this is going to be great. No, I'll help you. So if you want to change the language, you can do it on the bottom part here. I like the challenge, let's leave it. Really, shall I put it to French or something? No. I might actually remember how that looks; I used to type on a Swiss keyboard all the time back in the day. What was the idea here? There's this key on here called AltGr. I wonder who came up with that. Okay, so search file directory. This is what I said before: I'm going to try to find the command-line parameters I need for tcpdump2csv. I love not using my laptop.
It's going to be much more efficient here. All right, this is lame, sorry. Is it shift? Okay. All right, so I'm going to read the capture file. What is it called? davix. And for those who want to see how that looks, let's get this bigger. So, everyone seen pcap output from tcpdump before? All right, so now we've got to extract something. The easiest way, probably, and you know what? Pipe it into tcpdump2csv. What kind of shell is this? This is weird. All right, this is going to be really funny. How do you do that, is that raw? Oh, I don't want to do that. I want to have it big. I know, it was not installed. Let's just use that one. If I can now actually copy and paste. We need to do this here again. We're going to get somewhere. Yeah, now the cool thing here is that I just say I want to see the sip and the dip, so the source IP and destination IP. And just to verify what we get here, do a head -10. So I have CSV output of sources and destinations. This is kind of boring so far, but now I want to see a bar chart of this, and I need to give it an output file. Oh, wow, this capture is too big. So what I usually do is I just cap it somewhere, here at, say, 2000. You've got to start somewhere. And then we use GQview here, for example, to look at the file. So this is the first one. Oops, that's not it, that's not it. It's this one. Boo, this is nice too, right? So now, at least we see. No, we don't see; this is not what I wanted. Okay, who can tell me what I just did? No, what is this output here? It's a bar chart, great. It's the count by source IP. That's really what I wanted to do, but it's not. I can't hear you; you raised your hand, but you're not saying it. Oh, frequency of connections, I like it. What size are you? What size? Double XL. All right, so maybe I'll do this because I didn't really sleep that much the last few days. I thought I was doing okay today, but apparently I'm not.
So this one here is actually for the sources, right? So I have four sources in there, and remember that the tcpdump2csv parser actually takes care of the source/destination confusion. These are all the clients; it's not necessarily the IPs that show up in the first column of the tcpdump output. So you could do the same with the destinations and maybe with the services. Let's do that quickly. I usually like to see what services are running on the network, so that's called dport. By the way, who can tell me how I know these field names that I'm typing in there — sip, dip, sport, dport, timestamp, whatever — where do I find them? Someone here was doing that. They're at the beginning of the Perl script. Eric, do you want a shirt? What size? XL? I heard that one's from Microsoft, though. Yeah, so if you want to know the field names, go and look at the source of tcpdump2csv. You will see them there. So if we look at the destination ports now — we need to refresh this, here we go — you see there's mostly HTTP traffic. There's some proxied stuff here. And then I have two interesting ones. So this is probably already a finding that someone could have uncovered here, that there are two interesting port numbers going on in here. By the way, I'm gonna give a book away to the person who can tell me what this capture file is. Where does it come from? If someone can nail it exactly, where and which one. Who said Capture the Flag first? Is that Trey? All right, did you already get a book? Did you already get a book? No? I'll get you one. And the person who can tell me what year that was is gonna get the second book. 8888? That might be, but not necessarily. If you look at the capture, you can find out, right? Just by the port, you might not know. Actually, I'm pretty sure it wasn't that. Nice, all right, I guess you got it. Trey got it, all right. Get him afterwards, I'll put one aside.
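The same aggregation works for destination ports; on the real capture this is where the two odd port numbers would pop out. The port values below are invented:

```shell
# Frequency of destination ports from the dport CSV column (sample data).
printf '%s\n' 80 80 80 3128 8888 3333 80 8888 |
sort | uniq -c | sort -rn
```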
I still have to, so. All right, it was the 2004 Capture the Flag. It's only a very short capture that I gave you guys. I like the idea of the Tomcat there, but I'm pretty sure it wasn't. If you can actually show me the capture traffic where a Tomcat server responds, then I will believe you, but I believe there was another service running, which was actually something really weird. You guys should investigate that. Just by looking at the port numbers, a lot of times you don't really know what the traffic is, right? You can see port 8888 traffic, but you can run anything over that. And if you actually look at this traffic — let's just do that. Let's see whether this is actually Tomcat traffic. So what I'm doing is I'm using a capital A here, right? I will show you what that does in a second, if I find the pipe key again here. It gets you the ASCII output. Now, why is this not useful right now? Anyone? A BPF filter? Sure, but why is the ASCII output not really useful right now? I like that. I'm not capturing the full packet right now. By default, the capture length is, what is it, 68 bytes? Something like that. What size are you? What shirt size do you want? Medium? All right, so what you have to do is use -s, and I always confuse which one it is. That was not it, that's the small one. Yeah, so here now you actually see some of the traffic. You see here application/octet-stream. Looks really weird to me. You could probably try to extract just the ASCII part. System.Runtime — this looks Java-like or something. Or .NET, so it wouldn't be a Tomcat, I guess. There are some GET URLs in here. User agents: curl. So there was a wiki running there apparently, OpenWiki. So, to take the other suggestion, I need to filter just that traffic, right? Most of it looks kind of funky. A LambdaCore database.
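If you want to pull just the readable part out of a payload, as suggested here, tr can strip the non-printable bytes. The payload below is invented stand-in data, not from the real capture:

```shell
# Strip non-printable bytes from a (made-up) raw payload, keeping newlines.
printf 'GET /wiki HTTP/1.1\r\n\001\002\177User-Agent: curl\r\n' |
tr -cd '[:print:]\n'
```

On a full-snaplen dump (tcpdump -s 0 -A), piping the output through a filter like this leaves only the human-readable protocol text.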
Yeah, connect through this to log in. So did anyone play Capture the Flag 2004? No? Okay, so then you guys don't know what this is. I fail to remember what it was. But maybe someone finds out by looking at the capture and can tell me later what it was about. I'll leave that open for now. So this is kind of interesting, right? You can go through here and figure out what these things are, and you see some text in here, from the responses probably. But if we wanna continue visualizing — that's really why we're here. So another thing I usually do as one of the first things when I look at a capture: I look at an afterglow graph, and you can probably tell I have done this before. Not on Jan's keyboard, obviously. So that looks sort of boring. What I did is just source-destination communications, right? So I see who's talking to whom, no color, nothing. I see that there is this one machine here, 4.160, that goes to all these different IP addresses, and then there's some other thing going on up here, where there's a little more traffic. These two machines here, obviously, for some reason are talking to each other. Now let's add some color here, I guess. That might be interesting. So how I go about that is — let me find sample.properties. Well, locate is not on there. Jan, that's another tool to add, locate, that might be nice. If you guys have suggestions on what to add to DAVIX and things that you're missing, Jan loves getting suggestions. Doesn't mean he's gonna implement them. So, /usr/local/share — let me do this — /usr/local/share/afterglow, sample.properties. Does it read IIS logs? Are they comma separated? They're space separated, okay. So what you can do is use an awk command to extract the columns that you want, and then format them comma-separated in the output.
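The awk trick for space-separated logs looks roughly like this; the sample lines and field positions are invented for illustration, not a real IIS layout:

```shell
# Pick columns 1 and 5 from a space-separated log and emit CSV for AfterGlow.
printf '%s\n' \
  '192.168.4.160 - GET 200 /index.html' \
  '192.168.4.161 - GET 404 /admin' |
awk '{print $1 "," $5}'
```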
So here is a sample property file from afterglow, where it basically colors the individual nodes. So I can have individual assignments for the source node, the event node, and the destination node. The graph I did before only had two nodes, the source and the destination node, and in that case the event node wouldn't matter, but a lot of times you have three-column output — source, destination, destination port, for example — and then you can address all three types of node individually. So what I'm doing here is I say, well, it's yellow if the first field — there's an array called fields that's implicitly there in the property file that you can address. If you say $fields[0], you can put a regular expression on it. This is really just Perl code here. So I say, if it's 192.168-anything, then I want to color it yellow. If it's in the 10 subnet, it's greenyellow; lightyellow4 if it's in the 172.16 space; and red otherwise. Yeah, so the suggestion would be that we support different data inputs. With the parsers that are on there — we can actually quickly look at them, that might make sense here. I guess they're probably in /usr/local/share/bin — oh, /usr/local/bin. So let me see, star, 2csv — that's not it, that's the earlier one. We're speaking encrypted here; some people can decode it, but some can't, but that's on purpose. So there are roughly 13. So these are the parsers that are in there. I have parsers for Argus, IPFW, PF, Snort alert logs, and tcpdump. Actually, it looks like there's a couple missing; I might have other ones. I'll show you. So the gentleman is asking whether there is a way, or whether we support a common event format or something across all these tools. So keep in mind, these tools are not controlled by us. We just bundle them onto the CD, that's it, right? So we don't really control what kind of input they take.
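Written out, a property file along these lines might look as follows. The syntax follows AfterGlow's sample.properties as described in the demo — each value is a Perl expression over the implicit fields array, first match wins — but treat this as a sketch, not the shipped file:

```
# color.properties -- first matching assignment wins
color.source="yellow"       if ($fields[0] =~ /^192\.168\./)
color.source="greenyellow"  if ($fields[0] =~ /^10\./)
color.source="lightyellow4" if ($fields[0] =~ /^172\.16\./)
color.source="red"
```

You would then pass it to afterglow with -c, as shown later in the demo.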
And different tools have slightly different requirements in terms of how the data has to look. A graph, for example, needs to have nodes and edges that are connected, and needs to have properties and so on, whereas a bar chart doesn't need that. We could probably have parsers that go from the native formats to a common format, and then you just have parsers for the common format. Well, have you guys seen those MSRC guys running around with the marshmallow guns? Have you seen them, no? They had this fight going on yesterday with marshmallow guns. I would actually shoot you if I had one, because you mentioned IDMEF. For those who don't know what IDMEF is, it's the Intrusion Detection Message Exchange Format, which is a standard for intrusion detection systems to log in a common way. I think there's an extension for Snort to do that. I have very strong feelings about that format, and as you can tell — we're actually working on another standard right now called CEE, Common Event Expression. It's one of those horrible acronyms from MITRE. They have everything from CVE to CWE to CCE to CEE to CME — I'm not making them up — and CPE. It's horrible, but with CEE, Common Event Expression, we hope we're gonna address the shortcomings of IDMEF and come up with a decent standard. Anyways, no, we don't have parsers from all the formats to a common format. This is something that commercial tools are trying to do. If you buy a SIM, for example, they have all kinds of parsers, and they spend a lot of time and money maintaining those things, because the sources change, right? Like, I don't know, Snort might change their log format. At some point, someone decides that the syslog output needs to have another field in there — well, you have to change your parser to support it. Every update. And on a PIX, for example, every message needs a different parser. Every message looks different. So you have to have these slews of regular expressions, or some way of parsing that thing.
That's really not what I have time for, and I'm not really motivated to do that. I'm encouraging you guys to submit the parsers you have on secviz.org. There's a section for parsers where you can submit some, and I think Michael Rash — he's unfortunately not here anymore, but he submitted some for iptables, I believe. So there is a place where you can collect them; put your things up there. I put mine up there. You can have them, and if you contribute some, maybe someone else is gonna benefit. You can also use Splunk. This is gonna be hopefully my only plug for Splunk today, but Splunk is free if you use it up to 500 megabytes a day, so you can go download it and use it, and there are field extractions — they're sort of like parsers, but not really — that will help you extract the individual fields that you need. So there's a few there. So these are some of the ones that are on the CD: the tcpdump one, Snort alert — it's the ones I mentioned, I think. Yeah, if you have more, we'll bundle them. I mean, we're happy to do that. Now I lost track of where I was before. All right, where was I? Oh, here. So, what I was gonna do here — I actually know a little bit about this capture. Let's just do this. You guys feel free to interrupt me again if you have anything you wanna say or share or ask. Can you repeat that, please? A capture from an OpenBSD? OpenVMS? No. What was it? No, I don't think the captures were from OpenVMS. No, I'm pretty sure it wasn't. So if you look at this file again that I generated earlier, there was some structure in here. Actually, four. So, maybe I'll tell you a little more about this capture file. The capture was — maybe I have to explain a little bit of how Capture the Flag works. Who doesn't know how Capture the Flag works here? Hands up for those who don't know how it works. One person, thank you. There's probably 90% here who don't, I'm assuming. Who wants to explain how it works?
Anyone? Great, we have an interactive crowd. All right, whoever dares to explain how it works and comes up here can have another book. Whoever wants to, come up here; you fight over who's first. He has to put his shoes on first — that's the advantage of being in the first row. All right. Okay, so there's a bunch of teams. Each team is supposed to compromise the other teams' servers while maintaining their own servers. And each server has services on it that have a flag. So it could be a website where, you know, a picture of their flag is on the webpage. The ones I've watched before, they had a MUD — I think it was a Perl MUD. So you'd connect to the MUD, and one of the rooms in the MUD would have the flag, and you could view it. And if you could hack the MUD, then you could change the flag. And all these services would be running on the servers — well, it depends on which year it is, which services are running. And if you can break into Telnet or SSH or whatever, crack their password, then you can change one of the flags. So the longer you own the flags, the more points you get. And of course you lose points if your flags are captured. And there's a NOC, and they monitor everything and keep score, and they record all the tcpdumps so that people can look at them later. What do you know about the network setup? Anything? Which year? 2004. The idea is always fairly similar. Do you know anything about it? I know each team gets a class C — 192-dot-whatever. It's actually a — yeah, it's a C, yes, you're right, sorry. And there's a router in the middle, like a Cisco router, I believe, and a bunch of 100-megabit or gig switches. Yeah, so, all right, thank you for being very brave. You're welcome. I'm gonna expand a little more on that. So, as the gentleman said, there are eight teams, each with a class C.
And it's kind of important to see — you can see that in the capture here. 192.168.4 is one team, 7 is another one, 5 is another one. So you see here — what team was this capture taken from? I don't need the name, but come on. Can you even read the labels here? Four, right. So, the gateway of the team is — be careful. So each team can capture their own data. And whoever runs the game, like Kenshoto or whoever, they can capture everything, but obviously the teams can't. So this capture is taken from team four, which I was playing on. We had two servers, dot one and dot two. That was, I think, the year they screwed us all over and gave us Windows machines to protect and keep up and hack. I was playing on the Immunix Linux team, and we brought all the Unix operating systems, all the latest patches, everything with us. We were ready for any kind of Unix flavor. They gave us Windows. We went to the room and started downloading Windows patches over some weird link, because we didn't trust the networks here. It took us forever to get the patches, and we actually ended up getting one of my friends from my previous employer — we said, hey, you know Windows, can you come and administer that for us, please? We were all these Unix geeks; we had no clue what we were doing. So obviously we didn't do too well in that game. So, if you look a little more here: these are two servers, and there seems to be this one machine, 4.160, that connects out to the other teams. So this is one of our attackers going out to the other teams; he's attacking those machines. Now, the interesting thing is, all the traffic coming into the network comes from one IP address. The teams don't see who's actually connecting to the network. They don't know what team is attacking them right now. And even worse, you don't know whether the scorebot is coming along or not. Kenshoto, or whoever organizes the game, runs a scorebot to figure out whether your services are up.
So you don't know what is attack traffic versus legitimate scorebot traffic, because otherwise you could just run a firewall and block that stuff. What we did was try to do some analysis on the captures and figure out whether there's anything in there that indicates when the scorebot comes along. Every year people try to do that. That year it was fairly simple, and this is maybe something someone is gonna find out. If not, I might lift the secret later. Anyway, so we have some structure here; it might be interesting to use that knowledge in your further analysis. What we can do now is — I was actually working on this one here. What I really wanna do is color my servers differently, right? So if it's 4.1 — actually, let's do this here — one, or actually two, then I wanna make it yellow. And actually, let's just comment these guys out. Okay, what do you want, green? What's that? I'm missing a backslash — where? Here. I'm telling you, it's a real challenge to type in vi on a Swiss keyboard, because I think the developers of vi had US keyboard layouts in mind. They put the shortcut keys that you need a lot, like pound or dollar, in easy spots. On a Swiss keyboard, you have AltGr-3 for a pound sign. Why, you ask? It's your choice. Yeah, but then I don't remember the US layout and I have to do it blind. What? Someone get me one of those marshmallow guns, please. Seriously. Emacs is not on there. Here, this was the command I ran earlier. What I'm doing here is just adding this property file with -c, right? Let's see what happens if I do this. And actually, I think it's just an old habit that I put things in /tmp. All right, so now we have green nodes, and this is not what I wanted. No one caught the mistake in my regular expression. Seriously, guys. Here we go.
And we only did sources right now, so we can probably change that from color.source. Did you get that? Why didn't it work before? It went kind of fast, but: I didn't put the dollar at the end here. I have to terminate the regular expression there, right? Otherwise it matches everything that starts with one or two, which is a lot of addresses. So now I changed color.source to apply to everything, and now everything is red or green. Now what I also want is probably a color per team — actually, let's do this: blue. So we could go and code every team as a different color. I'm just gonna do one; you get the idea. Did I catch one? It wasn't in there, five? What's wrong here? It's not right. What am I doing wrong? It is refreshing it. Oh wait. Is there anything wrong here? No, I didn't miss the dollar, because I want all the fives. There's a five, right? There's one. Yeah, five. nvi is here. All right, I'm learning DAVIX here, I like this. What is nvi? No color? Whatever, I like vi. Shouldn't really matter, but. So what does the red color affect? It's basically first match wins, right? So the most specific should be first: I color all the fours, then I color the fives blue, and then the rest would be red. Yes, I do. Trey, when did you go to bed this morning? This is the demo effect, I like it. I'm just gonna glance over it and maybe try six. Probably not gonna work either. I really don't see it. It could be because — yeah, that might actually be it. No, I don't like that. I give up. All right, next. What else do we want to do here? Let's generate one. Anyone come up with anything yet? Anything interesting you want to share already? Anyone want me to do something specific here? The question is what format of Snort log I would write. I would probably do an alert log, because I have a Snort-alert-to-CSV converter.
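The missing dollar sign is easy to reproduce with grep: without the end anchor, the gateway pattern also swallows 4.160.

```shell
# Unanchored: matches both addresses. Anchored: only the gateway.
printf '%s\n' 192.168.4.1 192.168.4.160 | grep -c '^192\.168\.4\.1'
printf '%s\n' 192.168.4.1 192.168.4.160 | grep -c '^192\.168\.4\.1$'
```

The same applies to the Perl regexes in the AfterGlow property file: first match wins, so an unanchored rule higher up silently steals nodes from the rules below it.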
So if you run snort -A full over your capture — we can probably, you're really pushing me here a little bit. Okay, let's try this. So please correct me if — didn't I say dash? Can you read that? Why don't you come to the first row here? Then you can actually read it. That was exactly it. Actually, Marty's not here because he's traveling. Otherwise — oh yeah, you need to tell me to do something. All right, let's do that. He works for you? Oh, okay, that might actually work. Here we go. Whoa. So let's just capture this into a file in /tmp, snort — wow, there were a lot of alerts triggering. What is in here? This is not the output I want; this is tcpdump-type output. Oh, here we go. Ta-da — empty. Here we go. Here comes this challenge with the keyboard again. I love it. Starting with a pound sign. You're gonna kill that. Okay, editing the includes — this is gonna take a bit. I really wanna do that, because there are different rules and different variables, I guess. So I just enabled all the rules, and one of them asked for the SSH ports. No? But was it really complaining about that? I think it was actually complaining about something else. It was something like, /etc/snort/rules, bleeding-botcc, unknown rule type. There's something wrong in that file. Rules, bleeding — you know what, let's see. So the question is whether we can change the edges to encode the amount of traffic. The problem is — yes, you could change the color of the edge right now. If you say color.edge, you can change that. You cannot change the thickness, because GraphViz doesn't support that. If you have another tool that reads the dot file, you could change the thickness of the edges, but afterglow doesn't support it right now, because I'm going with the feature set of GraphViz.
But if you send me an email and say, hey, look, this tool here takes dot files and does edge thickness, then I will add the feature to afterglow. I get that question a lot. Unfortunately, I don't know why GraphViz doesn't support it. Maybe it's just hard to deal with all the thick edges and then scaling them down, I don't know. Yeah, the graph would get messy, I agree. But sometimes it's really useful, especially if you scale it right. It's really nice to see that you have this one big pipe with a lot of traffic going on. I try to use color for that a little bit, but I get it. Gee, this guy's good. Is that what you're looking for? Yes. Oh, this is great. All right. What did you do? Well, not really magic. I just recalled the original config files — I copied back the original one and pointed it at the capture, and that's it. All right, we've defeated it. So now what we can do is take /var/log/snort/alert and run it through the Snort alert to CSV parser. I hope this works. It might not, because the format might have changed. No, it does, yes. So we can now — well, sip and dip are not gonna be interesting, because we already saw the communications, right? We took the tcpdump output; we know who's communicating. We probably want to take the signature name and the sip, maybe. Let's see if that works. All right, so what we want to do is just see what source triggered what signatures, right? And then we use afterglow to do this, I guess. Interesting, all right. So we have a few signatures being triggered. I don't like this layout — just a preference. So you see that 4.1 triggered a few signatures here. It's a little messy, the graph — kind of overlapping stuff here. But there was a SCAN nmap from 4.152, which I expect is one of my attackers running a scan against someone. Let's actually see who it was directed at. So what we can do then is say sip, dip, name — no more -t here, because we have three columns coming in.
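The aggregation on the Snort side can be sketched the same way, with invented sip,name rows standing in for the parser's output (the signature names are made up for illustration):

```shell
# Which source triggered which signature, and how often (sample data).
printf '%s\n' \
  '192.168.4.152,SCAN nmap TCP' \
  '192.168.4.152,SCAN nmap TCP' \
  '192.168.4.1,WEB-MISC null byte' |
sort | uniq -c | sort -rn
```

Feeding the same two-column CSV into afterglow instead of uniq -c is what produces the source-to-signature link graph shown here.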
Now we see that — where's the other guy? Here, .152, communicating with 7.2, and he generated all these different signatures. So that's what we find: my attacker here is attacking the 7.2 machine. Then we have one of the servers talking to another one, and he's triggering some signatures. That's actually fairly interesting, because it could mean that 4.1, one of the servers, was compromised and is now attacking the other internal machine. That's probably something to investigate. It might be that, because it's Capture the Flag, these services are already kind of vulnerable, so the communication of one machine with the other was already kind of funky and triggers these Snort signatures. So the next step would be to go in and figure out why these things triggered, right? Which is always a really cumbersome task, but unfortunately you have to do it. But even more interesting — actually, you know what, I was lying earlier. 4.1 is the gateway; .2 and .3 are the servers. So the attacks came in to each of the servers. Sorry — I guess the other graph looked a little different; I didn't see that there was traffic coming from 4.1 to both of these machines. So that's one thing to look at. The problem here is that it's not really useful to look at this, right? Because there are so many attacks coming in. You know there are attacks, and they just come from this one IP. Maybe you can then go back and figure out what exact packets they were, and see whether you can find any kind of identifier that shows whether this is scorebot traffic or attack traffic. So maybe you see something funky in the attacks that you can then use as a filter to filter the traffic later. But it's not even necessarily the case that these are real attacks, because just because I have some null bytes in there doesn't mean it's an attack — the scorebot could just do something really funky, or the flags could actually trigger it or something.
So during the game we actually — we had a Snort box running, but I don't think it was useful. Anyways, I guess this gives you an idea of how to run stuff through Snort. Look at it; maybe you'll find some more stuff in there if you dig a little bit. But I was gonna do something else, actually. If we go back here, what I wanted to do is really generate this graph here and see who's accessing what ports. So let's see — I'll probably change our property file here to reflect my earlier misinterpretation. So I'm gonna mark .2 and .3 as the servers, right? So I'm gonna do this, and do this again. You guys are awfully quiet. Did you find anything yet? Is that right? I think — no. Four — is this right? Yeah, this looks about right. So now the gateway is .1. Then this was one of my servers; it doesn't show up in the communications here, which is kind of interesting, actually. And then you see the ports again, and we already did the bar chart earlier, so we kind of know what ports are showing up here. But at least now you see that this guy here accessed this machine on this high port number — that might actually just be outgoing traffic, and I might not have enough of the capture to figure out which end is the source and which the destination. If we actually went into the traffic right now, this is probably a response on this port, and not a communication going out. So this might actually be a source port instead of a destination port. But because I'm running heuristics on what is the server and what's the client, I might not have enough information to figure that out, so it just assumes the communication went that way. I think that's probably a false positive there. But this gives you a little bit of another impression of what's going on. What I usually do also is make the edge length a little smaller. If you do 1.3 or something like that, it's just a little more compact.
So the graph renders a little differently. That's just something I like to do. All right, I want to show you some stuff in terms of using neato. The things we just did are static pictures, but maybe you want to explore the graph in an interactive way. So instead of calling neato and generating a PNG or whatever, you can call lneato. This gives you an interactive display, so you can actually drag nodes around. If there is some cloud and you want to clear it up, you can just drag stuff around. You can also drop nodes: when you are on a node, you can press 'd' — it's not easy to use; you need to go to the manual to figure it out. So you can delete nodes and re-lay out everything without the nodes you just deleted. So if you see things where you say, okay, that's normal, just delete those nodes and re-lay out; maybe you get a clearer display. There is also a thing where, when you right-click on the window, you can activate the bird's-eye view. This is a small representation of the graph. So now you can zoom in on the big one, and then click into the bird's-eye view to browse around the image. This can be helpful. The problem I found is that when you have really huge graphs, it doesn't really work out well anymore, but it's a nice feature. There are also other tools. Maybe you're into eye candy: there's a tool called InetVis, which does a 3D visualization. So we have here a cube, which you can drag around. And for the different dimensions: the green axis is the ports, from 0 to 65535, and then you have the blue and the red axes, which represent the source and the destination address. So now we can run the capture file in this view. There must be some other GUI — let's search for it. All right, so this is the main view to organize stuff. So let's open our capture file. Browse to the location where it's located and open it up.
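lneato takes the same dot files as neato, so a minimal hand-written graph is enough to try the interactive features; the nodes below are invented:

```shell
# Write a tiny dot file; open it interactively with: lneato /tmp/demo.dot
cat > /tmp/demo.dot <<'EOF'
digraph g {
  "192.168.4.160" -> "192.168.5.2";
  "192.168.4.160" -> "192.168.7.2";
}
EOF
grep -c -- '->' /tmp/demo.dot
```

With afterglow, the same file falls out of the pipeline if you stop before neato and keep the dot output.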
All right, so now it's loading. There's some warning; just click it away. And now we could run it in real time — we could go here and hit play. So now the capture file is running; you see this on the bottom left of the cube window. And now you see the different dots coming up. These are the connections, which are being visualized here in the cube. We can speed things up, like go 10 times faster. Or we could say, okay, let's just jump to the end. So we move the slider at the top here, and then everything gets visualized. So now everything is sort of one line. Probably it's not one line — sort of just one system attacking another on all ports. So probably we have to modify the display to not cover the whole internet space, but just the space of the CTF game. You can do this in the view plotter settings, and here you can define the range of the different dimensions. So it's somewhere in 192.168-something; I'll take a netmask of 20. Then I do the source dimension, also 20, and I accept these changes. There's some warning — go away. All right, so now we have a view of just the traffic within the range we defined. So the nice thing about this is, you not only see vertical scans, meaning a system scanning a particular system on all ports, but you can also see things like a horizontal scan, where the scanner hits just one port, like port 80, but across a huge range of IP addresses. You'll notice those in these graphs pretty fast. There are limits — I don't know how far it goes, but there are limits. There are also tools on the disk which allow you to cut files up. So if you have a tcpdump file which is really huge and crashes the tool, you just cut the file up. There are things like editcap on the disk, and we'll probably add other tools like tcpslice, I think it's called, in the next version.
You can also do things with EtherApe, which allows you to view the traffic in a circle. So let's open our capture file here once again. Say okay. And now it's going through in live mode, so you see how the traffic develops. The tool is nice if you have live traffic and you wanna look into it right now; then this is really helpful. But postmortem, as we're doing it here, it's not very interesting, because you cannot slide through the time dimension and say, okay, let's see what happens in 15 minutes. So all the nodes here are put in a circle, and each of these lines represents a connection between those IPs, and the color represents the intensity of the traffic. And on the left-hand side you see which protocols have been found in the traffic, to get an overview of what's there. I think the color is actually encoding the protocol, isn't it? I think you can even change that — there's a tab that shows the color assignment. Yeah, that's right. So there must be some setting here. Yeah. So you can color things as you like. Did anyone find anything interesting, or any graphs? Well, you can do it live if you like. Just come up; you're in for a book. Eric, do you actually know how to operate a Linux machine? Yes, I can stumble my way through it. Let me make the font bigger. I'm not gonna say who he works for, otherwise people won't see anything. So let's see, font, and large. And now you're making me remember the whole syntax again. We'll help; we're nice guys. Oh, I made it smaller again. Ah, I hate it. I'm sort of challenged here. On your own laptop? Yeah. All right, so I'm ready. 33342, a whole bunch of source traffic, okay? So 3333, that's a service. Jake, help me. That was not the MUD — the MUD was at 8888. No, that was 8888, I think. Then 3333, we just looked at it. Which port did I look at when we saw that .NET stuff? I can't remember.
So there was a weird .NET service running, and then the other one was the MUD thing. The text you saw when I did the -A on tcpdump was actually from that multi-user dungeon that you also mentioned. You're helping out with the tcpdump? Yeah, right, where you could kind of teleport from one room to the other and steal the flags. What were the switches? Oh, tcpdump? What were the switches you were using? Maybe you should set up an alias for that. I actually wrote a shell script earlier. Just before we prepared for the talk, I started a little shell script that generates the most important graphs: it runs through the same capture and generates a graph of the sources, destinations, and services. I did the same thing. I'm gonna publish mine first. I just saved mine in the script. Do I run it on everything? No, that's good enough. Do you want to parse it? That's good, no. This is vi. Actually, oh, you're not in vi right now, I mean vi mode. Just hit enter. This is a non-English keyboard. Could you throw me a bone here? set -o emacs. So, are there any questions in the meantime? Everything clear? That's the one I was using. What about GGobi? Yeah, so that's just a dot plot. The pictures you see in the manual are really simple things, made from data that came along with the tools. So it's not really security-related data you see there; it's just a sample of how to use the application. If I remember correctly... I'm trying to find it in the book. Here, I have to peek. Basically it's a tool that helps you generate bar charts, scatter plots, parallel coordinates, a whole bunch of different types of graphs.
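The "most important graphs" script mentioned here can be sketched along the following lines. This is a reconstruction, not the speaker's actual script: it assumes AfterGlow's documented tcpdump2csv.pl parser and Graphviz's neato are on the path, and the file names are made up:

```shell
# Write a small script that regenerates the usual link graphs from one pcap.
cat > graphs.sh <<'EOF'
#!/bin/sh
PCAP=${1:-ctf_capture.pcap}

# Source -> destination link graph (two columns, hence afterglow's -t):
tcpdump -vttttnnelr "$PCAP" | tcpdump2csv.pl "sip dip" \
  | afterglow.pl -t | neato -Tpng -o src-dst.png

# Source -> destination -> destination port (three columns, the default):
tcpdump -vttttnnelr "$PCAP" | tcpdump2csv.pl "sip dip dport" \
  | afterglow.pl | neato -Tpng -o src-dst-dport.png
EOF
chmod +x graphs.sh
```

Run it as `./graphs.sh my_capture.pcap` and you get the two standard pictures every time, instead of re-remembering the switches.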
The interesting thing about this tool is that with a certain data input you can say, OK, generate me a bar chart of the source IPs, then generate one of the destination IPs, and then show me a parallel coordinate display with both of them and maybe also the destination port. These three graphs will all be linked together. So if I select one of the source addresses in one of the graphs, it will update the others with my selection: you'll see which machines this source has talked to, and all of the views update at the same time. So you can use different views on the data to investigate it and figure out what's going on in there. This is actually a really, really powerful paradigm, these linked views. There's also a concept called dynamic queries, where you do on-the-fly investigations and drill-downs and everything updates in real time. If you start filtering for a certain source address, for example, you get rid of everything else and it's there right away. So you do a visual investigation, or a dynamic query as it's called, to look at what the data is really about. That helps a lot to very quickly see, OK, this source connected to all these machines on these ports, and maybe you then want to pivot back to see what ports these other machines connected to. It's just a very easy way to play around with the data. There are commercial tools too. One that I was allowed to use for the book, and know pretty well by now, is called ADVIZOR. That's kind of the commercial end of GGobi, and I think Mondrian is also a similar tool that has linked views. It's really powerful to do analysis in there. I did one analysis on botnets that I describe in the book where I used ADVIZOR, just because I couldn't find any open source tool that would actually do the job. Otherwise, everything you find in the book is done with open source tools.
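The same select-and-pivot idea, minus the interactivity, can be mimicked on the command line. A hedged sketch over a made-up flows.csv with sip,dip,dport columns (the addresses are illustrative):

```shell
# "Brush" one source address: list every destination host/port it hit.
awk -F, '$1 == "10.13.37.5" {print $2 "," $3}' flows.csv | sort -u

# Pivot back: who else talked to one of those destinations?
awk -F, '$2 == "10.13.37.80" {print $1}' flows.csv | sort -u
```

Linked views do exactly this, except every filter updates all the charts instantly instead of requiring a new query.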
I think that was the only analysis where I actually used something I have to pay for. What does it cost? I have no idea; I think it's fairly expensive. It's a BI tool, business intelligence, and they like to charge for what they have. To be fair, it's a commercial tool, it has a lot of features and different data inputs you can use, and it's fairly powerful, so the price is probably justified. Unfortunately, no one has come up with an open source variant yet that's as good or better. Eric, did you get a result? No? It's still parsing. Oh. Yeah, it wasn't very interesting when I only took the first 2,000 records, so I'm parsing the entire file, which is like 100,000 records, but it takes forever for GGobi to parse all of that. And this is where a lot of visualization tools, even commercial ones, have problems with large data. GGobi specifically has to put everything into memory because it supports dynamic queries, right? If I click on a certain bar, a source address, it needs to know which destinations are linked to it. There are things in data mining, like OLAP cubes and whatnot, but this is not OLAP, this is not a data warehouse where something very powerful runs underneath; it's just putting everything into memory and probably using some really horrible data structures. Yes, so something actually kind of jumped out in this visualization, but I was having trouble getting it to show; it appears to not be showing all the IP addresses, and I was having trouble getting this tool to cooperate. Yeah, so one of the other things is that these tools are all different to operate, and GGobi, I think, is fairly horrible too. At first I was really excited because this appeared to show a lot of activity on port 3333, but then I was a little less excited. So now we can open a second one.
This is a scatterplot matrix where you have... let's see. So here you see... I have a presentation on this: most visualization tools don't really follow good principles, right? Principle number one is just label the freaking axes. And you know what, this is actually a problem with how we generated the data; I shouldn't even blame the tool. In this case, the tool expects the first row to be the names of the columns, and our first row just says 3333. So the tool actually does show axis labels; it just inferred them from the first data record. This here, I guess, is the source address, and you see the distribution over... no, that doesn't make sense, I don't even know what this would show. But this one here shows source addresses with ports going that way; that would make sense. So if we edited that CSV file and put a header row in? Yes. And that, again, is because every tool does it differently; you just have to know that the first row has to be the labels. For other tools it's even worse, because you have to do many, many transformations. One of the worst was probably Parvis, I think, which is a parallel coordinate display. What you have to do there is translate all your data input into values from zero to one, so you have to scale everything. If you have IP addresses, that might work, right? You can probably define a mapping. But what if you have strings? How do you map them in? You'd need a hash table or something, and that was horrible. That's why I ended up not using that tool much. But I just saw a posting on secviz.org: someone posted a parallel coordinate visualization tool in the last couple of days. I still have to look at it, but that might finally be one we can use. So I think we're about done here. Does anyone else want to share some last thoughts? I'm gonna switch this over to my laptop again to conclude.
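The header-row fix is a one-liner. A small sketch with made-up column names and sample data (GNU sed's `-i` in-place edit and `1i` insert are assumed):

```shell
# A generated file that starts straight with data, no header row:
printf '10.0.0.1,10.0.0.2,3333\n10.0.0.3,10.0.0.2,80\n' > flows.csv

# GGobi reads the first row as column names, so "3333" became a label.
# Prepend a real header row so the axes get proper names:
sed -i '1i sourceIP,destIP,destPort' flows.csv
head -1 flows.csv
```

After that, reloading the file in GGobi gives the scatterplot matrix sensible axis labels.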
But anything else? Yeah? So the question, or the comment, was first that this is very network focused, or event focused, and whether we gave any thought to binary analysis. That's actually a perfect segue. So, Greg, here's the image. The book, and what I normally do, is really focused on time-series data, everything you can associate a timestamp with: network traffic, logs, maybe configuration files, all kinds of IT data that carries a timestamp. Greg is sitting over there, if you don't know him. He works a lot on binary analysis, and in his book he also shows a few tools, one of them RUMINT, the tool he wrote, which he uses to analyze binary captures. For forensic investigations you would run a memory dump through it, for example. Greg's presentation is probably something you should look at if you're interested in that: he showed how you can go through memory dumps and visually figure out what's going on in there, which was very interesting. So that was that. Any other final comments? Yes? Which one? Oh, BackTrack 3 has a new visualization tool in it. I guess we have to talk to Muts. Apparently there's a new web crawler in BackTrack 3 that visualizes some of the traffic. Post it on secviz.org, just make an entry there, "something cool in BackTrack," so people can go and see it and we all get the feedback. OK, anything else? tcpdump supports that, so you can send pcaps to a remote station? Are you sure? I haven't played with that. Can any of these tools handle live data? Yes, there are a few on there that are actually kind of interesting. One is called rtgraph3d, from Philippe; I might even have that wrong, he's a French guy. He wrote Scapy also. It's on the CD, it's on DAVIX.
Check that out. What I did is I wrote a little bash script, a while loop, and pumped the tcpdump traffic into it, and I think I submitted that to secviz.org. If you go and poke around there, I show the little script I use. It adds new nodes on the fly and draws these 3D link graphs, and they get a little messy; that's why I think I reset it after a certain time. But it's kind of cool. I used to run that in the background on my machines, or as a screen saver. That's kind of interesting. What I also do sometimes is run, again, just a while loop, pipe like 2,000 lines of tcpdump into a file, then run AfterGlow over that and generate an image every 2,000 packets, or whatever number you're comfortable with. Then use something like GQview, which updates automatically when there's a new file, so you see the traffic at every instant in time. You'll notice that you actually start to recognize the pattern of normal traffic. I ran that at work, again as a screen saver, and I was able to see when someone, or my machine, did something weird, because the pattern just looked completely different. There are some other tools that do real-time traffic as well, but I think we have to wrap up here. If you're really fast, I have about five t-shirts left if you want one. And Eric definitely gets a book. Didn't I give you one already? So we're gonna hook up Eric's display as the final slide. Thanks, everyone, for coming. I hope this was a little useful. It was kind of an experiment to run this at DEF CON as a hands-on workshop. I know it's not the same as running a training where everyone knows to bring their laptop, but I hope it was a little fun for you. I enjoyed it, thanks.
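The every-2,000-packets AfterGlow loop described above can be sketched roughly like this. It is a reconstruction, not the speaker's actual script: the interface, batch size, and file names are assumptions, and it relies on tcpdump's `-c` flag, which stops after the given packet count:

```shell
# A "poor man's real-time view": capture a batch, graph it, overwrite the
# image, repeat. Point an auto-reloading viewer (GQview was mentioned)
# at traffic.png to watch the pattern of normal traffic.
cat > liveglow.sh <<'EOF'
#!/bin/sh
while tcpdump -vttttnnel -i eth0 -c 2000 > batch.txt 2>/dev/null; do
    tcpdump2csv.pl "sip dip dport" < batch.txt \
      | afterglow.pl | neato -Tpng -o traffic.png
done
EOF
chmod +x liveglow.sh
```

The loop exits as soon as tcpdump can no longer capture (for instance, without the needed privileges), so it fails safe.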