Alright, I'm Raffael, I work for a company called ArcSight and for a living I do enterprise security management, or log analysis and correlation. And somehow over time what happened to me was that I got kind of bored of the textual log files and parsing and everything, so I developed a passion for visualization, and that's what I want to talk about today. So the rough outline is going to be that I want to introduce the problem very quickly. I'm going to show you what graphing really is and what I mean by that. And then I'm going to show you a tool that I developed over time, and some of you might have seen my talk last year where I introduced AfterGlow a little bit. Over the last year I have made quite some improvements on the tool and there are new features in there which are quite interesting, and I've gotten some really good feedback from you guys on what I should change and how you're using this tool to actually visualize your own data. And what I want to do in the end, which is kind of the main part, is go through a firewall log file and show you some of the things you can do with it and how you can visualize it. Just to say that up front, and you might have heard this from me already, but I need to do it because of legal issues: IP addresses and host names showing up in event graphs and descriptions were obfuscated or changed. The addresses are completely random and any resemblance with well-known addresses or host names is purely coincidental. So don't try to play with these IP addresses and think it's me. You'd be screwing someone else. Alright, well, the motivation for doing visualization or looking at graphs is really the question here: what would you rather look at? Do you want to look at the left-hand side or the right-hand side? I prefer the right-hand side, because I can't even read the font on the left-hand side. And I guess this is something probably everyone has heard by now, but a picture is really worth a thousand log entries.
I hope to be able to prove that in the next 50 minutes, if I'm not going over time. Alright, so how does graphing work, or what do I mean by that? There are basically two steps if you simplify it a little bit. You have a device that generates a log file, or maybe events. You record them and you want to produce, on the right side, some kind of visual representation: a bar chart or a link graph or a tree map or something. What you have to do is take that log file on the left side and parse that thing. You need to figure out where the source addresses are in there, where the destination addresses are, where the ports are. So you need to parse out all those things. And then you throw it into an event visualizer, whatever that is. You can use AfterGlow, which I will show in a second, and you generate a graph. For graphs today, I don't want to cover all the different types of graphs or visuals that are out there. There are bar charts and pie charts and line charts and so on. What I want to cover is just these two types: link graphs and tree maps. Link graphs are basically a set of nodes, say three nodes, connected to each other, and they show you a certain relationship. Tree maps look a little complicated when you see them the first time. You might not quite understand what they are, but I want to introduce them to you and show exactly what they are in a second. Now if you take link graphs, a lot of times you will see graphs that show you communications between different IP addresses. You see that this machine talked to all these other machines, but the interesting thing is that you can actually reconfigure these graphs to also show you other things in the log file. So what I have here is a Snort event. You don't have to be able to read this, but I basically take five fields out of this Snort event: the event name, the IP addresses and the ports.
And there are different ways to configure a graph now. The regular way would be that you take the source address, then the event name and then the destination address. So you will see what this IP address has done to other machines. However, if you configure this graph a little differently, if you say, well, let's visualize it in a way where I have a source address, a destination address and then a destination port, you will suddenly see port scans, because you see a source machine going to some other machine on all kinds of ports. You wouldn't see that in the first configuration, where you have a source IP, a name and then a destination IP, because you're not showing the destination port. So the point of this slide is really that you can configure these graphs all kinds of ways, and depending on the configuration you will see different things. Now let's have a look at tree maps. I'm not sure how many people are familiar with those, but they're really useful in a lot of cases. So let's assume this rectangle that I have here is all your network traffic, everything you see on the wire. Now what I want to do is see what percentage is UDP versus TCP. Basically I want to see the transport protocol here. And what I do is I just configure the tree map to use the transport protocol as a parameter. So what happens? I have 20% UDP, 80% TCP and it just partitions it that way. So 20% of the rectangle is UDP here. Now if I change the configuration to not just use the transport protocol but also the service, what happens is that this space is subdivided now and the services are shown inside the protocols. So for UDP you might see things like DNS and the rest maybe NTP, and this is probably a ratio of 90 to 10. And then on the other side, the TCP side, you see the partitioning into all the TCP services you have. So you can set up a hierarchy and the hierarchy is going to be displayed inside of this tree map, and you could go deeper.
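To make the partitioning concrete, here is a small Python sketch (not part of AfterGlow; the field names and the 20/80 traffic mix are made up to match the example) that computes the area share each rectangle of the tree map would get for a given field hierarchy:

```python
from collections import Counter

def treemap_shares(records, fields):
    """Compute the area share each leaf of the hierarchy would get
    in a tree map, as a fraction of the total number of records."""
    total = len(records)
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {key: n / total for key, n in counts.items()}

# Made-up traffic mix matching the talk's example
traffic = (
    [{"proto": "udp", "service": "dns"}] * 18
    + [{"proto": "udp", "service": "ntp"}] * 2
    + [{"proto": "tcp", "service": "http"}] * 80
)

# First level only: the 20% UDP / 80% TCP split
print(treemap_shares(traffic, ["proto"]))
# Both levels: DNS gets 90% of the UDP rectangle (18% of the whole)
print(treemap_shares(traffic, ["proto", "service"]))
```

The first call reproduces the top-level split; the second shows how adding the service field subdivides each protocol's rectangle, exactly the hierarchy idea described above.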
I could say now that I want to see the target ports or the source IP address and it will just be subdivided inside of these little rectangles and it will show the percentages again. And you will see a real example later on of what you can really do with this. Now let's switch over to AfterGlow, which you will find either on the CD or on SourceForge. There are two versions right now; it's a little confusing, I apologize for that. There's a version 1.x, the one on the CD is currently 1.5.6, and it's basically a Perl script that generates your link graphs. And then for version 2.0 I started coding in Java, and I used a library and thought I would be able to get all the capability or functionality from 1.x into that new Java framework. But it turns out the library I'm using is a little buggy and the author was not really responsive. I traded a few emails and so far I haven't gotten any patches for the bugs that I found. So for now there are two versions, I'm sorry. Hopefully at some point there's a 3.0 which combines everything. What you will also find in AfterGlow: when I started playing with log files I needed to write parsers, because otherwise I couldn't really visualize my log files. So what I decided to do is put the parsers I'm writing under the AfterGlow distro as well. There are three parsers: one for the BSD packet filter, one for tcpdump, which I updated to 3.9 recently, so if you're using an earlier version of tcpdump it might not quite parse anymore, and then one for sendmail. If you look at the parsers quickly, they might actually be quite useful even if you don't want to do visualization. Here's an example of how to use tcpdump2csv. What it basically does is you pipe tcpdump into it; that's just the long command line on top. And then you tell it what fields you want to extract from the original tcpdump output. And here what I said is basically I want to see the source IP, the destination IP and the source port.
And if you want to know the names of the fields that are available, look at the source code; there's a list of all of them. What this thing also does for you is take care of connections. If you look at tcpdump output and you see a client connecting to a server, you will see a request going over the wire where the source address is the client and the destination address is the server. But then, immediately followed by that, you will see a response. And now the source address seems to be coming from the server going to the client. If you visualize that, you will suddenly see a bidirectional communication, so you're thinking that the server actually connected back to the client. A lot of times you don't really want that; you want to see the real connections, who was connecting to whom, and not the bidirectional thing. So the parser takes care of that for you. It will keep state and remember what initial setups it saw, using some heuristics: if it sees a SYN by itself, then it's probably the session setup; if there's a SYN-ACK, that's the response. So it keeps a hash of all those connections and inverts the output if necessary. Then the sendmail parser. Sendmail was certainly not designed with logging in mind. What it does is generate two log entries for every email. One entry tells you this email came from someone, and the other entry tells you it went to these people. And the way you connect them is the message ID, which is in bold here. So I just wrote a Perl script that basically remembers all those message IDs, and whenever it finds the matching pair, it outputs one entry showing the from and the to in one line, so you can actually visualize connections now. And then the third one is pf2csv, and I'm gonna use that in a little bit, so I won't waste any time on it right now. Now AfterGlow. We saw a similar diagram already on how you graph something.
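The direction-flipping heuristic just described for the tcpdump parser can be sketched roughly like this in Python (a simplified stand-in, not the actual Perl code; real TCP flag handling and the parser's heuristics are more involved):

```python
def normalize_flows(packets):
    """Very simplified sketch: remember which side sent the initial SYN
    and flip any response packets so every row reads client -> server."""
    sessions = {}   # frozenset({a, b}) -> (client, server)
    rows = []
    for src, dst, flags in packets:
        key = frozenset((src, dst))
        if flags == "S":                 # bare SYN: src is the client
            sessions[key] = (src, dst)
        client, server = sessions.get(key, (src, dst))
        if src == server:                # response packet: invert it
            src, dst = dst, src
        rows.append((src, dst))
    return rows

pkts = [
    ("10.0.0.5", "192.0.2.80", "S"),    # client opens the connection
    ("192.0.2.80", "10.0.0.5", "SA"),   # server answers with SYN-ACK
    ("10.0.0.5", "192.0.2.80", "A"),
]
print(normalize_flows(pkts))
# all three rows now read ('10.0.0.5', '192.0.2.80')
```

So instead of a spurious bidirectional edge, the graph only ever sees the real client-to-server connection.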
So basically you have a device outputting some log file. You have a parser. Either you use one of the three I provide, or you use awk or Perl or something to parse the logs into a CSV file, and that file you pump into AfterGlow. AfterGlow outputs not a graph directly but a graph description. It basically uses the DOT description language from the Graphviz library, and it just says: okay, there are these nodes A, B, C, D, E and they're connected this way, so you have an edge from A to B, one from B to C and so on. It just gives the description, and then you use some kind of grapher. I usually use Graphviz to generate the graphs, because it does all the layout. It figures out where to place all the nodes so the overlap of the edges is not too high and you can kind of identify groups and so on. I didn't want to tackle that problem, because other people have already solved it. So what are some of the features of AfterGlow? Well, it generates link graphs. Then you can filter nodes. You could say, well, I can do that with grep -v. Yes, you can, but it's a little nicer to configure it in AfterGlow, and you can do it either on the name, so I can say I don't want to see any nodes that contain 'windows', let's say, or based on the number of times a node was seen: I only want to see nodes that have been seen more than 10 times or more than 100 times or something, so you can filter noise out. Then you can do fan out filtering. Speaking of the fan out of a node: on the right side there, that blinking node has a fan out of three.
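To give a feel for what that graph description looks like, here is a minimal Python sketch that turns (source, event, target) triples into DOT text of the kind AfterGlow hands to Graphviz (the real tool adds colors, clustering and much more; the graph name here is made up):

```python
def csv_to_dot(rows):
    """Turn (source, event, target) triples into a minimal GraphViz
    DOT description: one edge source->event and one edge event->target."""
    lines = ["digraph afterglow {"]
    for src, event, dst in rows:
        lines.append(f'  "{src}" -> "{event}";')
        lines.append(f'  "{event}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

print(csv_to_dot([("10.0.0.5", "portscan", "80")]))
```

The resulting text can be piped straight into dot or neato, which then do all the node placement.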
So there are three connections going to other nodes. You can specify when you draw a graph that you only want to see things that have a fan out of X, so let's say only everything that has more than five connections going out you want to visualize. And you might already have in your mind what you can do with this: it's basically detecting port scans very, very quickly and filtering the noise. Then you can color everything, the edges and the nodes, and finally you can do clustering, which is very useful as well. Especially if you start visualizing firewall log files, you have these hundreds or thousands of machines connected to your network, and you're not interested in every single one, because that graph is just going to be cluttered, so you want to cluster all those together. I will show you what I did with that. If you start AfterGlow you can use -h, and I think the usage is actually up to date, I hope. These are not all the parameters that exist, but the most important ones, and the most important one in here is -c, where you provide a configuration file; I will show you a few of those files in a second. Then you can use -d, and that basically adds the count of how many times a node showed up into the node itself. So you can kind of figure out what was more severe than other things, because you might have seen them a lot of times. Then there's an easy way to change the edge length: you just say -e and then give a number; that's basically the default edge length that it uses to keep the nodes separated. Then you can also use -n if you're not interested in node labels at all; you just get rid of them, and sometimes that's really useful to just see patterns, if you're not interested in seeing all the labels and knowing exactly what everything is but just want to figure out what the data looks like.
And then you have -o. It's a horrible name I gave it: omit threshold. It's basically how many times a node has to show up to actually be in the graph; that's what I described before as the number of occurrences. And then you have -f and -g, which are used for the fan out filtering, and you can specify two different fan out filters: one for the source nodes, how many connections that guy needs to have, and one for the event nodes, the middle nodes, how many connections that guy has to have to show up. -f is for the source and -g is for the event node. Now we're getting into the property file, and this is really where you configure everything for AfterGlow. Let's start out with coloring. In the file you basically write 'color.' and then either source, event, target or edge, and then you give a Perl expression. This gets really evil here. The easiest Perl expression is just a string: you just say "red" and the source, for example, is gonna be red all the time. If you wanna get a little more sophisticated, use the next line: color.event=red if, and then I provide some kind of an expression there. AfterGlow keeps track of the current log entry in the @fields array. So if I write $fields[1] here, that means the second column (it starts at zero) has to match this regex, basically starting with 192-dot-something. In that case I would make the node red. And you can do all kinds of things here; it's really just doing an eval on this Perl expression, so whatever you wanna do, you can read from files and all kinds of things. Then if you wanna filter nodes, you just use the color 'invisible'. Same idea: you say invisible if, say, the first column equals a certain action, and then it won't be shown. The clustering works very similarly: you just say cluster.source and then again you give a Perl expression.
What this does now is evaluate that for every node and basically return or change the node name. So if I have a whole bunch of 192.168 addresses and I wanna cluster them together into one node, I can basically say cluster.source equals some kind of a string, say "192.168.x.x", if the field matches my regular expression 192.168.*. And that would be the node name that is assigned now instead of all the others; that's evaluated every time a node is drawn. We'll see an example of that in a second. So this is kind of my hello world example. My input data is on the upper left; there are four entries in my log file: A,B; A,C; B,C and D,E. If you just visualize that, you get what's on the lower left: the nodes are connected. The command to run AfterGlow is on top, and I think there's a pipe missing in the slides on the top right, but you will notice that yourself. You cat the file, you pipe it into AfterGlow, you give it a property file and you use -t. By default AfterGlow expects a three column input, the first column being the source, the second column being the event node and the third column being the destination node. So it's always a three tier thing, but if you use -t you just give it a two column input, and that's what I have here. Now, to color these nodes, you can guess how that works: you just say color.target=blue. And by the way, that's all my PowerPoint skills; I can't go any further. So the target is blue if the second column is not an E. Look at all the target nodes, the ones that get an arrow pointing to them: they're blue, and the E there is not blue, because it is E. Sorry, it's early. Then the source I wanna make green if it's not D. So that's that one guy there. And by the way, it's first match and then it's done, so it doesn't go and evaluate the rest.
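The first-match rule evaluation just described can be sketched in Python like this (a hypothetical simplification; AfterGlow actually evals Perl expressions from the property file):

```python
def pick_color(rules, fields, default="green"):
    """First matching rule wins, as in AfterGlow's property files;
    `rules` is an ordered list of (predicate, color) pairs."""
    for predicate, color in rules:
        if predicate(fields):
            return color
    return default

# Mirrors the hello-world example: target is blue unless it is "E",
# otherwise it falls through to red.
target_rules = [
    (lambda f: f[1] != "E", "blue"),
    (lambda f: True, "red"),
]
print(pick_color(target_rules, ["A", "B"]))   # blue
print(pick_color(target_rules, ["D", "E"]))   # red
```

The key point is the ordering: once a rule matches, none of the later rules are even evaluated.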
Everything else, all the other sources, are gonna be red, and then for whatever is left I just say color=green; I don't have to say color.target or anything, I just say color, and that applies to all the nodes. Alright, so that's the easy example, and it's getting harder from here. I hope you're all still following, okay? Good. Alright, let's quickly talk about AfterGlow 2.0 in case you wanna play with it; it's a little simpler. You have two command line parameters. One is for the property file and the other, optional one is for the data file. You can also provide the data file inside of the property file if you choose to do so. It's taking CSV input again; it's kind of my mantra. I wanna use CSV files because then you can use your own parsers and I don't have to care about your data set at all. You just give me the CSV file and I try to draw your graph. And the interesting thing about AfterGlow 2.0 is that the tool generates the graph itself. It doesn't just generate a description that you then have to run through some kind of grapher to actually visualize; the Java tool itself generates the tree map right away. And what you get with that is a little bit of interactivity. You can zoom in to the different parts of this tree map if you wanna look at close-ups, and you can change on what level of the hierarchy you wanna apply the color. So if you remember the example from before, my top level hierarchy was the transport protocol. You can say I wanna color this graph by the transport protocol, so you will have basically two different colors in your graph, or you can base the color on the service, and then you will have all kinds of different colors, just as in my example. So you can play with where you wanna take the color from, from what depth in the hierarchy. And a simple output looks like this.
Alright, so let's get a little more practical and see how we can visualize a firewall log file. You gotta stop me if I'm going too fast here. If I'm analyzing a firewall log file, there are a few steps I wanna take. The first step is certainly that I need to parse it. The second step: I probably wanna look at what is coming into my network, what is allowed through the firewall, so I know what is hitting me and I can verify the ACL and figure out whether there are any gaps or holes, things coming into my network which shouldn't. Then I probably wanna look at what is leaving my network. Everything is allowed to go out, and I'm actually quite surprised to see very big corporations in the US still having no outbound ACLs and just letting everything out, which is, I don't know, great security. Then I usually look at outgoing blocked traffic, and that tells me that either I have misconfigured machines which are not configured according to my policy, because they should not be trying to connect outbound if they're not allowed to, or maybe my firewall rule set is misconfigured and there is really a need for this service to communicate out. If you have ever set up a firewall yourself and nothing worked anymore, you might have realized that you blocked ICMP and path MTU discovery doesn't work anymore, or something like that. So communication is gone, especially in the old DSL world. And then, if you're brave, you can also try to visualize or analyze the traffic that is blocked from coming in, and you will see all kinds of things in there. So the parsing part: I have a pf log file here, just two lines. What you do is cat that file and use pf2csv to parse it. In this case, again, I wanna use source IP, destination IP, destination port, and the output is just gonna be those three fields delimited by commas. Now this CSV output is the input for AfterGlow.
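As a toy illustration of the parsing step, here is a Python sketch that pulls source IP, destination IP and destination port out of one pflog-style line (the line format and the regex here are simplified assumptions; the real pf2csv handles many more variants and log versions):

```python
import re

# Toy pattern for one simplified pflog-style line.
LINE = re.compile(
    r"(pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def to_csv(line):
    """Return 'sip,dip,dport' for a matching line, else None."""
    m = LINE.search(line)
    if not m:
        return None
    action, direction, iface, sip, sport, dip, dport = m.groups()
    return f"{sip},{dip},{dport}"

sample = ("Oct 13 12:01:01 rule 1/0(match): pass in on xl1: "
          "192.0.2.7.3312 > 10.0.0.5.22: S 123456:123456(0) win 5840")
print(to_csv(sample))   # 192.0.2.7,10.0.0.5,22
```

Each matching log line becomes one three-column CSV row, which is exactly the shape AfterGlow expects as input.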
Putting it all together, you basically cat the log file into pf2csv, give it the fields you want, pipe that into AfterGlow with a property file, and then pipe it into Graphviz. There are basically four tools that Graphviz provides that use different layout algorithms. One is dot, which uses a hierarchical layout; you will see that in a second in the first example I have. Then there is neato, which does an organic layout. It tries to place the nodes optimally so there are not too many overlaps and similar things fall together. In a few cases that doesn't work out too well, and there are two others. There's circo, which uses a circular approach: it basically starts by putting all the nodes in the middle, and if there is too much overlap, it starts to move some of the nodes out and out and out, in concentric circles, until it finds a layout where the nodes are placed at a decent distance from each other. And then there is twopi, which uses some other weird algorithm to place things. So if you get stuck with the others, you can try twopi and see what comes out of there. Alright, so let's take the first use case we wanna go after here. I wanna look at all the traffic that got passed coming into my network. So what I do is somehow grep all the passes out of my log file, and don't take this literally; I just wrote 'all the pass in' as a placeholder. It really depends on how you filter for that in your log file. What I did in my log file is grep everything that is coming in on my outer interface, which I think was xl1 or something. Then you pipe it into pf2csv. I get my source IP, destination IP, destination port, and then I go into AfterGlow, give it a properties file, and I use -d because I wanna see the counts on the nodes: how many times did these things show up? If something showed up just once, I might just close my eyes and not think about it. And then I pipe it into dot; that's the hierarchical layout.
And that basically looks like this. You have a node on top called 'external', which is already kind of strange, because there is no external IP address, or an IP address called 'external', in my log file. So we'll figure out how we did that. And then you see all the machines in my network that were hit. And the circles at the bottom are the target ports, or destination ports, that were connected to. So how do we get the colors and the clustering? Well, I use a few features; it's not too complicated. The first thing we wanna do is the clustering. So I say cluster.source, for all my source addresses, because I'm assuming the sources are on the internet. That's what I'm looking for: everything that comes into my network. So the source gets 'external' as the cluster name if it doesn't match my internal IP range. And match() is a function that AfterGlow provides you. I told you before that there's this @fields array that keeps track of the entries in the log file for the current entry it looks at. If I just set cluster, match() would look at all three of them and make sure none of them matches; if I say cluster.source, it only looks at the first one. So it always uses the field it's currently evaluating. This is a very simple way of clustering all my external nodes. Then I say, well, I wanna have this thing red, basically if the field is 'external'. I could also use match() here, with 'external' as the regex basically, but I chose to use field(), which is another function I have that just returns the current field it looks at, and then you can compare that to something. Then the events, those are the dark blue nodes there: I wanna have them blue if they match my internal address range. I would hope that all my nodes are gonna be blue now. However, you see there are a couple of light blue ones in the bottom left.
What I did there is say everything else that's not blue is light blue. And that's actually kind of interesting. Why do I have the light blue ones in the bottom left? Well, if you look a little closer, you see that my machines were connecting out to those guys on the internet, and they're using port 20. That's an artifact of FTP: the way pf2csv parses my file, it doesn't know that there is a back connection from FTP, so that still stays in there. But it's kind of nice to see that. So I know that two machines in my network were actually sending FTP data back out to the internet. And everything else is just red. And what I see in this graph now, if I look a little closer, is all kinds of different ports. I see port 25; yeah, I have a mail server. I have port 22; yes, I do SSH administration on there. I have some echo requests; well, hopefully that was me pinging those machines and no one else. FTP, I have web, I have HTTPS, I have some DNS. And then I have port 100. This is actually a log file of mine that I had lying around for like two years, and I visualized it and I'm like, what is port 100? And it showed up only 21 times. And I remembered: yes, that was my backdoor SSH access for whenever I updated SSH on port 22. I didn't quite know back then how to do that without locking myself out when I bounced it. So I just had another SSH on port 100, so I could connect in there, bounce the one on 22, update it and whatever. So I remembered, it was all okay. I was good. Alright, the next use case is a very simple one. I wanna see everything that goes out of my network. And here again I'm using very loose syntax; you don't see the pf2csv up there in the command. So if you run this literally it won't work, but instead of the 'grep pass out', plug your pf2csv in there and the things you wanna do.
So basically, I look at all the things that are going out, and this is the graph that shows up. You can't read the node labels, I know. In the middle, it's an 'external' node again and everything connects to it. Those four machines of mine, the light blue circles that you see, all connect out, and the red ones are all the target ports. I don't see much of concern in here. There's a lot of web traffic. There are some echo replies; maybe I could actually filter that, because I have people pinging me, so I might wanna keep them from knowing that my machines are up. I have some DNS traffic going out, and a little bit of the traffic is port 123, NTP. That's probably okay if I'm syncing with some external source. What I might wanna do is verify what the target addresses of those NTP connections were, whether they were really NTP servers I configured and not some other machines where someone is trying to tunnel something or so. Well, the next thing I wanna try here is look at all the traffic that gets blocked going outbound. And remember, this means either a misconfiguration of my internal machines, or the ACL is misconfigured because there is a service that is needed by the internal machines and it's not allowed. What comes up is this really nice cluster of nodes. What happened? There are a lot of target ports being blocked. So does that mean all my machines are misconfigured, or the ACL is wrong? Probably not. Let's think about this. We have a client and a server, simplified. I have some machines in my network which are the servers; the clients are probably on the internet. I have my firewall with two interfaces. If I have a connection, again, there is this request-response thing, and this is gonna screw you up all the time. So always think about this when you get a graph like this, whether there's something wrong.
So what happened here is that when I built my firewall rule set, I was not very consistent about what interface I'm logging on, and I logged the responses going outside. So not only did I log the requests, I also started logging on different interfaces. So always think about it, already when you set up your firewall: what do you log? Do you log just requests, or responses as well? And on what interface do you log them? It also depends on where you wanna have your ACLs, on the internal interface or the external interface. A lot of firewalls don't give you that option, but especially if you start playing with iptables or pf or any of the other more hand-woven firewalls, you will run into this problem. So think about it before you do it. This is the reason why we have this huge cluster of nodes here. So, what I did is I basically extended my pf2csv, and unlike the tcpdump parser, I didn't wanna have it take care of the swapping right away, transparently to me. What I wanted is for it to just tell me if it thinks an entry should be switched. So if I pass 'reversed' as a parameter to pf2csv, it will write an R in the last column. It was really late at night when I coded this and I haven't changed it yet, so bear with me: if it thinks an entry should be reversed, it just writes an R in the last column. So if I do a grep -v on the R at the end, I won't see all those responses. If I filter out all the responses, and you could use any other method of getting rid of the responses in the log files, this is just one way to do it, then I end up with a more sane graph here. It's still a lot of target ports. If you think about it, for everything that's blocked going out, there really shouldn't be anything. But there are some artifacts, like very high port numbers where the heuristic to determine whether it was reversed probably didn't kick in. It didn't have enough state to figure that out.
You could probably go ahead and improve that heuristic, but I was kind of satisfied with the graph I got here. I can read this and filter the rest manually. But there are some other interesting ports. One is 427, Service Location Protocol. And what this told me is that I had four Macs standing in my network, and they're using, I'm not a Mac person, so I'm not quite sure what this service actually does, but it's some kind of automatic setup, service broadcasting: it advertises itself to the network saying, hey, I have these services, I'm here. So this concerned me a little bit, and these actually showed up like 16,000 times in my log file; that's probably almost all the blocked traffic that I had. If you had to go through this log file manually, textually, you would have seen about 16,000 of these guys, and somewhere intermingled you probably would have gotten tired of finding the other ones, the other ports that were blocked. So this helped a whole lot to actually find this, and hopefully none of this traffic was actually leaving the network and it was all blocked. Alright, now we're looking at the blocked incoming traffic, and yes, it's gonna be way messy. So I'm not even gonna show you the graph that was computed from this. I decided to still tackle the problem and figure out what is in the data set, what is actually trying to hit me. And one of the things is, it's always the low hanging fruit: find your port scans. You can do it a thousand ways; I'm just gonna show you how to do it here. Basically I'm using the -g capability. Remember, that was the fan out filtering, and -g looks at the event node, the middle node. And if I say -g 2, that means everything that has a fan out of two or lower I don't wanna see; everything with a fan out of three or higher I wanna see. So everything where more than two ports were connected to, I wanna show.
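The -g style fan out filtering described here can be sketched in Python like this (an illustrative reimplementation, not AfterGlow's actual code):

```python
from collections import defaultdict

def fanout_filter(rows, threshold):
    """Keep only rows whose middle (event) node connects out to more
    than `threshold` distinct target nodes, roughly AfterGlow's -g."""
    targets = defaultdict(set)
    for _, event, dst in rows:
        targets[event].add(dst)
    return [r for r in rows if len(targets[r[1]]) > threshold]

rows = [
    ("a", "scanner", "22"), ("a", "scanner", "80"), ("a", "scanner", "443"),
    ("b", "quiet", "25"),
]
print(fanout_filter(rows, 2))   # only the three "scanner" rows survive
```

With a threshold of 2, any middle node touching three or more distinct ports stays in the graph, which is exactly the port scan pattern being hunted for.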
So if someone were to port scan me, I would see it, because hopefully a port scan touches more than two ports, or whatever the threshold is set to. It's actually very interesting what shows up on the right side in the graph. I didn't have a port scan. I don't see any fan-out where things are hitting me. I have three nodes at most there, and there were people probably just trying to get to some well-known ports of mine, very specific targeting: port 80, 22, 443. And it's only about three or four attackers that really did this, the kind of yellowish nodes that you see. So that's actually kind of interesting. And the data set was probably a couple of days in a very, very busy web server farm. So this is pretty good. How do you configure this? Actually, I cheated a little bit. If I look at port scans, I'm not necessarily interested in the things that hit me on ports above 1024, because I usually don't have anything running there anyway; no services are up there. Maybe there's some Oracle running on 1521 or some other SQL server, and you might want to include those as an exception here. I didn't, because I didn't have any. What I did do is cluster my target node, the ports, and make one cluster for all the ports above 30,000. So it basically says the node should be labeled ">30000", just that string, if the third column is more than 30,000. That takes care of all the ports higher than 30,000. Then I say, well, everything else, if it's bigger than 1024, I want to cluster it again. And the rest I leave alone; I want to see those individual ones. And again, I used -g, and this basically gave me the graph on the right. This is how you can find port scans. You can do the same with tree maps. What I did here is set up the hierarchy as source IP, destination IP, destination port. So you will see a big rectangle for the source IP, and inside you will see all the different destinations that that source went to.
And inside of that, you'll see the different destination ports that were accessed. Who can see the port scan? Exactly, it's very easy to find. There are just a lot of nodes there. And you can actually zoom in. Now I see just this one source IP, I go into the destination IP, and I see all the destination ports. What is kind of interesting is that there are all these random port numbers on the right side, and then on the left, about 50% of the traffic hit port 6346. I think that's some kind of file sharing. So someone really hit me hard on that port and didn't get in. Something else I usually do is check whether there are any IP addresses that are spoofed or abused, and there is the bogon address space: basically all the IP addresses that are not assigned to anyone by IANA. No one should be using them; nobody is allocated that space. It's published on the IANA web page, and it changes frequently, so make sure that if you do this, you update your configuration. I want to see all the connections coming from those IP addresses to my network, because no one should be using them. So again I just grepped the blocked incoming traffic, and I warn you right now, this config is going to be crazy. If you thought it was complicated before, this is going to hit you. So what do you do in the configuration file? First I need to let AfterGlow know what the bogon address space is. There is a command called variable, and then you just provide any Perl expression you want, any Perl command, and it will just evaluate it. What I did here is build an array, ranges, with all the subnets that are in the bogon address space. You could put anything in there, any Perl code; it will evaluate it, assign the variables, and keep them, and you can use them in further assignments and do all kinds of things. So here I'm just defining a variable, an array. Then for my color assignment, it gets a little complicated.
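Pieced together from memory of AfterGlow's sample property files, the variable and color section might look roughly like this. Treat it as an illustrative sketch: the helper expressions, the regex-prefix stand-ins for the real bogon CIDR blocks, and the internal address range are all made up for the example, not copied from a working config:

```
# Any Perl after "variable=" is evaluated once at startup; here,
# regex prefixes stand in for the bogon subnets (illustrative only)
variable=@ranges=("^10\\.", "^192\\.168\\.", "^224\\.", "^0\\.");

# red if the source address matches any bogon entry
color.source="red" if (grep { $fields[0] =~ /$_/ } @ranges)
# green for the internal network (address space invented for the example)
color.source="green" if ($fields[0] =~ /^111\\.222\\./)
# blue for everything else
color.source="blue"
```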
So basically what I want to do is check every value I get, every IP address, against the array and see whether it's in any of the subnets specified. You start with the subnet check here: you check whether the field is in the subnet we're looking at right now. It runs through all the entries in the ranges array up here, goes through all those entries, and checks whether the field I'm looking at right now is in that subnet. If yes, it just returns one, and I'm basically counting in $value how many of those it was in; it's always going to be one, because by definition the individual entries are mutually exclusive. Anyway, what it does then is say the node is going to be red if it actually found a match, so if value is set. If you don't quite follow how this map thing works, just copy it and use it for your own things. Then I want everything that is in my internal network to be green, so I provide a match for my internal address space, and everything else is blue. We'll see in a second how that looks. But we're not quite done yet, because what's going to happen in this graph is that you'll see a lot, a lot, a lot of source IP addresses that connected. There are going to be a million individual nodes, so you won't be able to find those red ones we just defined. What we need is some clustering. This looks horribly complicated, but that's just because I was a little anal about what I want to see here. Basically, I want to cluster by class A network. I don't want to see each individual address, I just want to see which class A's connected to my network. That's what I do here with this regex_replace. It's a function I provide in AfterGlow, and it just matches the regular expression I give there, the "\\d+".
If that matches, so if the value starts with a number, then it returns that number, because I do a match there. So it will return the first octet, and I add the string "/8", so I can see that it's a class A that I clustered. I do this only if it does not match my internal addresses, because I want to see whether a source is actually one of my internal addresses, and only if it's not one of the bogon address spaces, because I don't want to cluster those together; for those I want to see exactly which IP address hit me, and not just, oh, it was in this class A. You follow me? It's early, I know. All right, and I do the same thing for my targets. Just for your own reference, these are the new features we just introduced. And this is how it looks. It's actually quite sane. You will see the bogon address space as the red nodes. Then you have the regular external addresses clustered by class A. You have my internal machines, the blue ones. Suddenly, there are more than my four machines from before; it's actually kind of funny, I hadn't realized that earlier. But the interesting part is the red nodes. I had a lot of bogon address space, and this is real, I didn't make this up. If you look, there are 10.x addresses, there are 192.168 addresses, there are some multicast addresses. But still, this was in a hosting environment, in a data center. Why would there be things leaking over to my network from 192.168 space and 10.x space? Maybe some other people hosting there were trying to play tricks and see what they could do, spoofing IP addresses that were then leaking over into my connections. All right, I'm actually right on time. Before I give you the summary, I have one thing that I keep doing with AfterGlow, which is kind of interesting. I basically run a Unix command on my box; it's just a while loop. You'll find the command if you go to afterglow.sourceforge.net; in the FAQ, I have a little command line.
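From memory, that command line is a loop of roughly the following shape. tcpdump2csv.pl and afterglow.pl ship with AfterGlow, but treat the exact flags, field names, and paths here as a sketch to adapt, not as the published FAQ entry:

```shell
# The capture loop itself (needs root and the AfterGlow tools installed):
#
#   while true; do
#     tcpdump -nnl -c 1000 | tcpdump2csv.pl "sip dip dport" | \
#       afterglow.pl -c color.properties | \
#       neato -Tpng -o "shots/traffic-$(date +%s).png"
#   done
#
# The date-stamped file name is the part the screensaver relies on --
# each pass writes a fresh image for the slideshow to pick up:
out="traffic-$(date +%s).png"
echo "$out"
```

A cleanup cron job or a `ls -t | tail -n +3 | xargs rm`-style line in the loop keeps the directory from growing without bound.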
So basically what I want to do is show the network traffic in my screensaver. It's kind of cool if people walk by your desk and they see the traffic that's on your box. Some people might start playing with you and sending you traffic so they show up in the nodes, so don't encourage them. Basically you run a while loop, continuous, never stops, run tcpdump, pipe the output into tcpdump2csv, get the fields you want, and then do something like head -1000 or so, so that it continues after every 1,000 packets. Then pipe that into AfterGlow and into neato, and the output file I write from neato is just a file with the date in it; just add the date as a parameter to the file name. It will then write a file for every 1,000 packets. Then you can have a cleanup script that kills the ones that are older, or just keeps two around or something. Configure your screensaver to pick up the pictures from that directory and slide-show them. It's really cool, I like it. All right, so let's go to the summary. We looked at AfterGlow. You can do filtering, you can do coloring, you can do clustering if you get brave. And why would we do all that? Well, I hope I was able to show you a few use cases: I was able to find my port 100 that I connected to, or find the bogon address space. You find outliers, you find relationships among things, who is talking on the network. And I know people in here who use AfterGlow to find people connecting from countries of concern, things of that nature, suspicious activity basically. All right, so don't read log files, visualize them. Thank you. I think I have time for a couple of questions if anyone, yes. Did I try to cluster by country? Yes. Basically, there is a mapping that's published that maps IP addresses to countries.
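A minimal sketch of that substitution, assuming you have already exported an IP-to-country table as CSV (the mapping file, country codes, and addresses here are invented for the example):

```shell
cat <<'EOF' > geo.csv
61.139.77.1,CN
81.29.64.72,RU
EOF
cat <<'EOF' > events.csv
61.139.77.1,10.0.0.5,80
81.29.64.72,10.0.0.5,22
EOF

# Swap each source IP for its country before feeding AfterGlow;
# the graph then clusters by country with no extra configuration
awk -F, 'NR==FNR { geo[$1]=$2; next }
         { src = ($1 in geo) ? geo[$1] : $1; print src "," $2 "," $3 }' \
    geo.csv events.csv > by_country.csv
cat by_country.csv
```

Unmapped addresses pass through unchanged, so partial mapping tables degrade gracefully.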
So just plug that in somewhere, exchange your IP addresses for countries, and automatically it will cluster them by country. Yes. No, not at all. So I just gave you one example, firewall logs. Actually, yesterday I was sitting in one of the sessions and, sorry, I was a little bored. I was basically running Kismet, and I quickly used an awk script to pick out the access points that were there and the machines connecting, and I visualized that quickly. So I see that all these machines go to this access point, and suddenly you see this machine going to another access point. There are really cool things you can do. Do I have any competition? Lots. There are a few people. Greg Conti does some visualization with RUMINT. There's a lot of university work that has done visualization of network traffic. I'm not sure whether anyone has a very simple open-source method of visualizing this; if you know of any, let me know. Have I extended this to look at the amount of traffic, things like bandwidth and so on? Basically, you have to do that yourself. I kind of offload that work to you, because I just visualize your CSVs, right? So you write some kind of an aggregator that aggregates all the traffic; you just awk it and do sums in there. It's not too complicated. Oh, all right, so questions please go to the mic so they're recorded. Thank you. Can you weight the size of the nodes by the number of packets from them? Can I weight them, make them larger if there's a lot more traffic? I didn't put that in. It shouldn't be too complicated to extend AfterGlow if you want to do that; it would take me a little bit of coding. Download it and extend it if you want. I might do it; that's actually a good idea. Thanks. How about parsing server logs? Parsing server, any kind of server logs? Yeah, any kind. Sure. On Linux, for example, you use syslog. It's a pain to write your parser, because every line is different.
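As one hedged example of such a parser, here is a sketch that pulls "source IP, username" pairs out of sshd messages in a syslog-style auth log. The sample lines are typical sshd output, but message formats vary between distributions and versions, so adjust the pattern to your own logs:

```shell
cat <<'EOF' > auth.log
Jun  1 10:00:01 host sshd[123]: Accepted password for alice from 10.1.1.2 port 5000 ssh2
Jun  1 10:00:05 host sshd[124]: Failed password for bob from 81.29.64.72 port 5001 ssh2
EOF

# Emit "source,user" CSV lines ready for afterglow.pl
awk '/sshd.*(Accepted|Failed) password/ {
       for (i = 1; i <= NF; i++) {
         if ($i == "for")  u = $(i+1)
         if ($i == "from") s = $(i+1)
       }
       print s "," u
     }' auth.log > logins.csv
cat logins.csv
```

Walking the fields for the "for" and "from" keywords sidesteps the varying column positions across sshd message variants.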
Every entry has a different format. But you can visualize logins, for example. You just pick all your SSH messages or all your login messages, pick the source address, where did it come from, and, if it's logged, which is not always the case in server logs, the username that came in. You can do all kinds of things, really. This is giving you the toolbox to go off and do your own visualization of your logs. And please send me use cases if you have them; I'm collecting them. Server logs are very interesting in cases where you're not going after the perimeter threat, where things are coming in from the outside, but looking more at insider problems: people logging into machines they might not be allowed to access, or, if you do any compliance reporting, who got into my financial servers, what networks did they come from, things of that nature. Anyone else? Are you trying to make it real-time? Great question. I was trying, yes. The real-time problem is kind of why I started to switch over to that Java library I'm using, because I wanted to have more control. The way I set up AfterGlow right now, it basically takes the log file, the CSV file, and generates one graph, so I lose state. I mean, you can do what I told you about with the tcpdump while loop and just get one picture after the other. The problem with that is that every picture you calculate is going to rearrange all the nodes because of the layout. So what you have to do is a kind of sliding window: you remember the current layout, feed that into the next layout, just add the new nodes, and then drop things off over time to get rid of them. It's not as easy as it sounds, or maybe it already sounds complicated, but I was trying for a little bit; I just don't have the time to do it right now. But I think Dan Kaminsky with Xovi does something of that nature.
He tries to animate it. You can download that and try it; it's slightly different, what he does there. Thank you. All right, I think that's it. What kind of limitations do you have on log file sizes? Your box. Okay. And AfterGlow's memory.