Keep a warm applause going for Chris Kubeck.

Hello, all. I'd like to talk to you about something that I absolutely love: finding things in security logs through visualization. I have some questions for the audience, if I could get a show of hands. Have any of you worked with or been associated with monitoring network, application, or security logs? Has anyone ever had to investigate a security incident, perform forensics, or otherwise deal with awful, raw security logs? Excellent. And how many have wanted to know what attacks or compromises might look like visualized in those security logs? Excellent. Then you're in the right place.

I've been in IT security for about 15 years on and off, and in recent years there have been a lot of new challenges due to emerging threats such as polymorphic and metamorphic code. On top of that, new regulations and laws have appeared that require monitoring of all sorts of logs. What this has created is an overload of logs from many different sources: firewalls, routers, switches, you name it. In this very basic network diagram, there are five firewalls, a web proxy, a database, an email server, two network intrusion detection systems, and a whole bunch of endpoint computers with different operating systems and versions. In a larger organization, all of these things have to be monitored.

Another consequence is that you have to learn the different technologies. You have to learn Cisco, or Check Point, or understand what a Microsoft security log looks like if you're monitoring a database server. It's a lot of information. And with all of these logs, a single firewall cluster can produce one and a half million log entries in a 24-hour period. Trying to correlate them with all of your different assets is very difficult, and near impossible unless you have unlimited time.
In addition to that, with polymorphic and metamorphic code, you end up relying on your weakest link: for example, antivirus detection rates. This is from abuse.ch in Switzerland, which tracks antivirus detection rates for the polymorphic Zeus binary. As of nine days ago, the effective AV detection rate was only 36.35%. That means your antivirus is failing more often than it is detecting this threat.

All of these different logs have to be monitored, maintained, and reviewed by someone, and a lot of things end up getting missed. On top of that, you have to open different consoles. I actually deal with six different types of IPS systems, and I don't want to say I'm lazy, but I like to work more efficiently, not harder: I don't want to open all of these different consoles to look at my network, and I don't want to have to learn Cisco either. Here's a Blue Coat one. It's too much, and this is only a small portion of what I deal with every day. I deal with over 25 different devices.

I was able to meet this challenge by putting my logs into a correlation engine. It let me centralize all of the logs and normalize them, so they were placed in a format I could read, mapped into fields I could search, and I could develop rule sets and detection rules on top of them. It doesn't matter whether the information comes from a router, a firewall, or a network intrusion detection system: the attacker address, the ports, all of that information lines up in the same fields. The correlation engine can also categorize the different events. When you have one and a half million firewall events, you don't really care that one particular endpoint failed to connect five times; but if something fails to connect a thousand times, you might have a network issue.
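The two ideas above, normalizing everything into one common field schema and then aggregating on it, can be sketched roughly like this. The log formats, field names, and threshold are illustrative assumptions, not any vendor's real syntax:

```python
import re
from collections import Counter

# Two hypothetical raw formats; real Cisco/Check Point syntax differs.
FIREWALL_RE = re.compile(
    r"Deny tcp src (?P<src_ip>[\d.]+)/(?P<src_port>\d+) "
    r"dst (?P<dst_ip>[\d.]+)/(?P<dst_port>\d+)")
PROXY_RE = re.compile(
    r"(?P<src_ip>[\d.]+) DENIED (?P<dst_ip>[\d.]+):(?P<dst_port>\d+)")

def normalize(device, line):
    """Map one raw log line onto a common field schema."""
    pattern = FIREWALL_RE if device == "firewall" else PROXY_RE
    m = pattern.search(line)
    if not m:
        return None
    return {"device": device, "outcome": "failure", **m.groupdict()}

raw = (
    [("firewall", "Deny tcp src 10.0.0.42/4123 dst 203.0.113.7/445")] * 1200
    + [("proxy", "10.0.0.5 DENIED 198.51.100.2:80")] * 5
)
events = [normalize(d, line) for d, line in raw]

# Aggregation rule: five failures from one endpoint is noise,
# a thousand is worth looking at.
counts = Counter(e["src_ip"] for e in events if e["outcome"] == "failure")
alerts = [ip for ip, n in counts.items() if n >= 1000]
print(alerts)  # ['10.0.0.42']
```

Once everything lives in the same fields, one rule covers every device type, which is the point of normalization.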
There's also a timestamp issue: the device that records a log entry is the one that timestamps it, and that doesn't mean firewall A is set to the correct time, or even the same time as firewall B. So if you're trying to investigate something, it's very hard to correlate events if the clocks aren't set correctly or you can't map the times to each other.

Here's an example of a proactive approach you can take with a correlation engine once you have all of your logs in, from all of your ingress and egress points. This is from the SRI malware center, which tracks and publishes a filter set for the most prolific botnet command-and-control servers. I took the IP address and port information, plus the fact that one of my clients happens to run antivirus that does not cover this particular threat, put it into the correlation engine, and looked across their entire network. It doesn't matter which technology the events came from: the category outcome is failure, and one endpoint in particular produced 16,000 firewall and web proxy drops trying to reach this particular IP address. That endpoint is infected. The rest are still under investigation.

Some other examples. I had a client with a major DNS issue who called me up one day and said, we can't get any of our DNS requests out of the network. Since I had a unique view of their network, I could see everything coming in and going out. A lot of times it's very difficult to find a breakpoint in a large network; it could be a firewall in a closet somewhere for all you know. What I found was that most of their DNS servers were misconfigured to still make requests to retired root servers, and one DNS server in particular was trying to connect to a suspected compromised DNS server on the Internet. This is a graphical view of what I was able to produce in about 10 minutes. On the left are the client's DNS servers located on the DMZ.
In the middle are the firewall drops and failures where the traffic cannot get out. Most of their DNS servers were still going to the retired root servers, and one in particular to this untrusted DNS server. That external DNS server also had ThreatExpert advisories listing it as possibly hosting a trojan or keylogger, which is not something you want your DNS server talking to.

Looking deeper into the one DNS server we thought might be compromised, I could see that it had port 139 open on the DMZ, it was a public name server, and it was trying to communicate with several bogon IP addresses and networks. It was also using a very suspicious port combination, which is typical of malware infections; more sophisticated malware will try to get out of the network at any cost. So here we have their DNS server, and all of the traffic to these particular IP addresses was coming from source port 139, on a publicly available DNS server that their internal clients used. The DNS server was compromised. They took it down, re-imaged it, and set it back up.

Another way you can use correlation engines to get past a current challenge, payloads encrypted to sneak data out of the network, is to look at what's going on with the encryption itself. When you encrypt a packet, only the header information is in clear text; the rest of the payload is encrypted, very similar to IPsec. And that's a good idea in principle: if you're doing your internet banking at work, you should have a key, your bank should have a key, and no one in between should. But this has become very, very popular with malware developers, because they can hide everything and completely bypass any web application firewall or any sort of deep packet inspection.
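Going back to that compromised DNS server for a moment: a minimal version of the check that flagged it, traffic leaving a service port and heading toward bogon space, might look like the sketch below. The flow records and the shortened bogon list are illustrative assumptions; real bogon lists are much longer:

```python
import ipaddress

# A few classic bogon ranges: reserved space that should never appear
# as a routable destination on the public Internet.
BOGONS = [ipaddress.ip_network(n) for n in
          ("0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8",
           "192.168.0.0/16", "240.0.0.0/4")]

def is_bogon(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BOGONS)

def suspicious(flow):
    """Flag flows sourced from port 139 that head toward bogon space."""
    return flow["src_port"] == 139 and is_bogon(flow["dst_ip"])

flows = [
    {"src_ip": "192.0.2.10", "src_port": 139, "dst_ip": "240.1.2.3"},
    {"src_ip": "192.0.2.10", "src_port": 53,  "dst_ip": "198.51.100.9"},
]
flagged = [f["dst_ip"] for f in flows if suspicious(f)]
print(flagged)  # ['240.1.2.3']
```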
This is just a brief visual of how a compromised endpoint tries to get out past the firewall or the web proxy, and finally gets out once it encrypts the traffic. I used this technique to see whether this was actually occurring, and it turned out it was, on one of my client networks. They had a covert communications channel that had been running for about three months. This was not a good thing, because it was traced back to possibly one of their competitors and one of their factories. All of their security controls were completely bypassed just by the data payload being encrypted. And this is what it looked like: over a 24-hour period, the compromised endpoint tried to get out past the firewall, tried different routes such as Microsoft Update, and finally, once it encrypted the packets, had complete, clear communication both ways.

For my last example, because I'm starting to run a little short on time: the distributed denial-of-service attack in South Korea, which occurred over the July 4th weekend, was very targeted. It went after DNS infrastructure and only certain U.S. government and South Korean websites. It was planned very well, over a long holiday weekend in the United States and on many military bases, including U.S. military bases in South Korea, which were also hit by the attack. July 4th is well known as a drinking holiday, so all the network admins and security admins were out at play. Initial estimates were that the attack involved anywhere from 1,100 to 166,000 computers; I'm more inclined to believe the 166,000 number. It was controlled by various types of malicious code, but it also tended to use endpoints with high bandwidth availability, and the more sophisticated pieces of malware run performance evaluations.
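Returning to the encrypted covert channel described earlier: the telltale pattern, an endpoint racking up egress denies on several channels and then succeeding over an encrypted port, can be hunted for in normalized events. A rough sketch, with hypothetical event fields and a toy threshold:

```python
from collections import defaultdict

# Hypothetical 24-hour stream of normalized events. The tell is an endpoint
# with repeated egress denies on several ports that then succeeds on 443.
events = [
    {"src_ip": "10.0.0.66", "dst_port": 80,   "outcome": "failure"},
    {"src_ip": "10.0.0.66", "dst_port": 8080, "outcome": "failure"},
    {"src_ip": "10.0.0.66", "dst_port": 443,  "outcome": "success"},
    {"src_ip": "10.0.0.10", "dst_port": 443,  "outcome": "success"},
]

def covert_channel_suspects(events, min_denies=2):
    by_src = defaultdict(list)
    for e in events:
        by_src[e["src_ip"]].append(e)
    suspects = []
    for src, evs in by_src.items():
        denies = sum(1 for e in evs if e["outcome"] == "failure")
        encrypted_ok = any(e["outcome"] == "success" and e["dst_port"] == 443
                           for e in evs)
        if denies >= min_denies and encrypted_ok:
            suspects.append(src)
    return suspects

print(covert_channel_suspects(events))  # ['10.0.0.66']
```

An endpoint that only ever talks over 443 is normal; it is the combination with the prior denies that makes the pattern interesting.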
You want to know what type of CPU, how much memory, and what bandwidth is available; you don't really want to take over a computer on dial-up, because you can't do much with it.

So I took a look to see whether some of my client networks were participating. One thing I noticed was that one particular client, significantly backed by an EU government as a result of the recent economic crisis, was participating. I filtered the traffic across their network for some of the targeted IP addresses, and what I found was that about 200 endpoints were actively participating in this attack. This particular client also had some issues with their security policy, due to budget cuts and being partially owned by an EU government. What it ended up looking like is this: all of the red squares, here and here, are the various endpoints and servers attacking this one particular IP address, and all of the blue are firewall and web proxy accept events allowing the traffic out, whether DNS or HTTP. They were clearly participating in this attack.

One thing that's rather important about this: the United States government, for instance, believes that if anybody attacks it, including through cyber warfare, it has the right to attack back. And the last thing anyone in Europe wants is one of their countries tagged as a possible agitator starting, or participating in, a cyber warfare situation.

So these are some of the different ways I was able to use a correlation engine: put the logs in and find something useful in them. The particular correlation engine I used was ArcSight. It's a commercial product, and it's extremely expensive; it doesn't support VM image installations, there's no way to try it out, and it takes a lot of hardware and support. However, there is an open-source correlation engine, made by AlienVault.
It's community-supported, and you can put any type of log into it. You could put robot sensor data in it; you can put your router logs in it, and it will normalize them. It's free, available for download, and actually an extremely good product.

So, in closing: I was able to solve my issues by using a correlation engine, centralizing my logs, putting them into a human-readable format, and doing something useful with them. It doesn't matter which vendor or which version I was dealing with; I knew whether a packet was able to get out and traverse the network, and I was able to see whether any covert communications channels existed on the network. One last thing: the great thing about correlation engines is that once all of those logs are in, if you want to test how an attack might look, with Metasploit, vulnerability testing, or pen testing, you can actually see which assets report the attack and what it looks like. So you can build custom detection rules to protect yourself, or see whether the attack was detected at all.

So, are there any questions? Thank you. If there are questions, raise your hands, please, and I will come to you.

Hello, thank you. Do you have any recommendation for a very large network that would generate, say, gigabytes of events per hour?

Yes, I do. There are a couple of products you can use for really, really large log volumes. ArcSight has a logger that you can grow with the size of your logs, and AlienVault also has a logger system, so you can store very high-volume logs long-term. Any other questions?

Somebody on ArcSight is asking: what's the correlation method used? He names Pearson and Kendall as examples.

Repeat that, please?

Somebody asked: what's the correlation measure used?
He names two examples, Pearson or Kendall. I don't know; I don't really understand the question, I'm afraid.

The question was: which correlation algorithm is used?

Oh, which correlation algorithm is used? Unfortunately, ArcSight will not disclose its correlation algorithms in this particular case. They won't open them up, which is another reason I advocate using an open-source correlation engine: I would like to know how it's actually working, to see whether it's actually working. Any other questions?

First of all, thank you for raising awareness of this kind of log visualization. My question is: are you capable of doing live analysis, or is it just logging, logging, logging and then running something later?

It's capable of live analysis, so I can get real-time information from all the logs. I can also feed in additional information sources, such as reputational information from brand-new black, white, and gray lists, and from various research locations, all at the same time. It can all be done in real time. Any other questions?

Somebody else asked whether you've seen an example where the correlation engine is capable of highlighting the flow of a single packet through different firewalls and different layers.

It can, if it's set up like that. Absolutely. Any other questions?

Thank you. Do you have a contextualization process for the alerts? Do you compare the systems you protect with the alerts coming from the IDS, IPS, and firewalls? For example, if an alert concerns an Apache exploit, do you verify that the target IP address is actually running an Apache service?

I can't understand you completely, I'm sorry. The alerts?
I think the question was: if you have, say, a log coming from an Apache server, do you use external network devices to confirm that Apache is really running on that host?

Oh, yes. Yes, I do. Absolutely. You can also put in real-time information about what the various systems are and what their vulnerabilities are, including data from vulnerability scans: which operating system a host runs and whether it happens to be vulnerable to a given attack. So if a particular exploit comes in, you can alert on the fact that the target is vulnerable and a security incident needs to be opened. You can leverage the different technologies against each other.

How did you deal with all these hundreds of different log formats and product updates? My experience was that whenever a firewall was updated to a new version, it took about four to eight weeks until the vendor updated its parser to work with the new log files, and in the end only about 60% of the devices we wanted to monitor could be integrated, because there was no parser for the different log files. When we asked vendors to create parsers, they offered very expensive professional services, so it was far less useful than we thought at the beginning.

That has been a problem. The commercial product I used, ArcSight, now covers somewhere around 3,500 different devices, so whenever a device comes out or changes version, they include the parsing capability for it. We were able to do a lot more with them because they supported the different parsers, and they usually came out with them at least a month before the new version shipped, because they work with the different vendors. That was very, very helpful. AlienVault does something similar, although they don't support as many products.
They do make it a lot easier to write your own custom parser than going through ArcSight professional services. Any other questions? If there are no other questions... there's one over here. Okay.

Are there vendors using data mining approaches, like k-means clustering?

K-means clustering? I haven't heard of that being used, I'm sorry.

Hello. Thank you for the talk. You mentioned the open-source correlation program; what was the name again?

Sure. It's AlienVault, at alienvault.com.

Are there any other questions? Then please give a warm applause again.