We're going to start our next talk in just one minute. It's going to be by Russell, and it's going to be on effective log and event management. Thank you. All right, thanks, guys. This is great. The turnout here is amazing. Hope everybody sticks around the Blue Team Village. Check out the CTF and everything else that's going on. So I'm here today to talk about effective log and event management. I work for a small federal government contractor in the DC area, and we run systems that have to maintain a FISMA ATO. My department is actually responsible for IT ops and for security, because it's a small company, like I said. So for us, monitoring and log and event management is essential for IT ops, to ensure availability, performance, and utilization, as well as for information security and for compliance, to detect malicious activity and show that we actively monitor our environment and review our logs to satisfy audit and compliance requirements. So in the presentation today, I'm going to share with you the process that my team and I have developed to effectively manage logs and security events in our environment. Because we're not a large business, we had to develop a process rather than purchasing a big-name SIEM. And we prefer to manage everything almost entirely internally, rather than hiring vendors and outsourcing, in order to retain knowledge. So there are going to be lots of examples in the presentation of tickets, actual tickets, output from our environment. I was going to invite everyone to move forward to see, but we're pretty packed. But I'm actually happy with the setup here; with the slides and the wall, you'll be able to see pretty well what I'm talking about. So I'm going to do a quick overview of what we do and why, and then I'll get into lots of examples to show you. It's a lot of slides. I'm going to go pretty fast, because it's a short talk, and I'm probably not going to have any time for questions.
If you want to chat, I'm going to hang around for a while. I'll just move to the back of the room. Please feel free, come up. We can talk about what I do, what you do, pros and cons. It's all good, all right? So briefly, why is monitoring important? Well, first, monitoring improves your IT operations and your knowledge and visibility into your environment. It improves your situational awareness and knowing what your baseline statistics are, i.e. what is normal. The more you monitor and review your IT ops, the better your situational awareness and knowledge of what's normal will be. Second, monitoring is important to information security in order to identify and detect malicious activity and breaches. And proper logging is critical to incident response. Third, monitoring, particularly documented daily review, is required by many compliance frameworks. According to the latest Verizon DBIR, the graphics came out like last week, and you can find a link to it on Twitter, internal log review is the least used method for discovering breaches in 2017. I find that astounding. I don't know if you do, but I find that astounding. According to their data, the number one method for discovering breaches in 2017 was reported by an external customer, followed by employee, then fraud detection, then third parties, such as external monitoring services. So why aren't organizations identifying breaches from internal log review? I suspect it's because few are dedicating the resources it takes to perform it adequately. But without it, you're exposing your business's reputation and allowing breaches to go undetected. Recent FireEye reports have said that it takes 146 days for breaches to be detected. Other reports have said anywhere from 99 to 500 or more days. Monitoring is critical for incident detection. You don't want to wait for a customer or Brian Krebs to notify you that you've had a breach. So daily manual review of log exceptions is a key to security operations and incident detection.
So if this is new to you, you're asking, well, how do I go about daily log review? Well, first you have to decide what to log and monitor. So this is a diagram from a blog post by Jessica Payne called "Monitoring What Matters." And this is what Microsoft's incident response team often finds when they're called in to do IR. Many organizations either log way too much without the proper context, or not enough: nothing other than the system default logging capability. You need to decide what logs to gather and review based on your business needs and the resources you have to analyze them. And then you have to set up tools to manage and process your logs so you're not drinking from a fire hose. So these are some of the log sources you should focus on: your server logs (Windows event logs and syslog), security tool and network device logs, maybe even web proxy logs and DNS request logs. There's lots of information in there. And logs from your applications, especially if you're hosting systems, tracking logins and user activity and looking for other anomalies, and logs from endpoint devices using tools like Windows Sysmon or mobile device management products. So in determining your monitoring strategy, you need to determine what's critical to the business. And that can vary depending on the type of business you are. A retail business is going to have really different needs and resources to spend on monitoring than a government contractor or an information security company. This may be determined for you, depending on compliance requirements, if you have to meet PCI, HIPAA, FISMA, GDPR, et cetera. The hard work is in tuning the monitoring system to your environment. This is where a log management system, a log correlator, or a SIEM comes in. They allow you to centralize your log data, correlate, index, and ultimately do stuff with your logs, like writing searches to track activity from a specific user or IP address, and turn the log data into actionable information.
So at my company, we use Splunk. There are plenty of products in this market, including the Blue Team Village sponsor Graylog. Splunk is free for indexing up to a gig a day. I believe Graylog is free for up to five gigs a day. So you don't have to buy a license to use these tools. It obviously depends on how large your environment is. But after that, they usually tend to charge you based on how much log data you ingest per day. So at their core, these tools are log aggregators and indexers. They give you the ability to search and correlate log data, or just look for stuff. There are also lots of dashboards you can download or purchase to help you analyze the data from different vendors. And then you can write searches to produce the results that you need to review. You can save those searches, and then you can run them on a schedule, like a cron job or a scheduled task. So when you run those searches, you then send the results in an email to some type of ticket management system. You're going to write searches to look for things like daily VPN logins, admin logins, root logins, et cetera. Once you have a log management system and you've tuned the searches, you need to set up a process to review them. And this is where you need to use a ticketing system. So we use RT. Has anyone heard of RT? OK, great. A few people have. So RT is a web-based ticketing system, or issue tracking system, as they call it. It's actually written in Perl, believe it or not, and it was first released in '96, but it's still actively maintained. It's a great, lightweight, free ticket management system. These tickets have standard information, like you would see in most issue tracking systems, like when it was created, the owner, the priority, et cetera. And on the search results screen, you can search based on most of the ticket fields. So our log and event management process is pretty simple. You configure logging and send all of your logs to your indexer.
Then you develop and automate the running of searches that produce output of data that you care about, like administrator logins. Then you email those results into a ticketing system, automated on some kind of schedule, and then you review them on a daily basis. Here's a workflow diagram. I can't see it on my screen, so I'm going to look over here. Like I talked about, on the far left, you've got all your logs: servers, routers, switches, infrastructure, databases. You're feeding them into some type of correlator. We're just using Splunk. That's why that's there. So with the data that's there, we develop searches. We have some one-to-five-minute or real-time searches that look for high severity issues. When those come up, they actually send alerts to our team, and we'll go look and figure out what's going on and resolve the issue. And then we have all of the daily tickets that we review on a daily basis. And so those are like IT ops things, like backup success and failure, security things, system access and authentication, even customer support tickets. Like, you can throw everything in there. So we review these tickets, like I said, on a daily basis. If the original search behind a ticket produces way too much data, like 18 screens of stuff, which isn't reasonable to review every day, or there's not enough, we'll go back and modify the search and output criteria to continually improve the results in the process. And then what's great, if you have to meet compliance requirements, like we do SSAE 16 or SOC audits, or if you have to meet FISMA compliance or anything else, when our auditors come out, they know about this process. So they ask me, Russell, what's changed in the last year? What are you doing that's new? And then they say, OK, give me all your tickets for these five weeks. And I simply create PDFs of everything and throw them at them. And they go through and review them and just ask me questions.
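To make that search-to-ticket glue concrete, here is a minimal Python sketch of the step where scheduled search results get turned into an email that a ticketing system like RT can ingest. The function name and fields are hypothetical illustrations, not the speaker's actual scripts.

```python
# Hypothetical sketch: render scheduled-search results as a plain-text
# email for a ticketing system (e.g. an RT queue's email gateway).
# Names and fields are illustrative, not actual production code.

def make_ticket_email(subject, rows):
    """Render search-result rows (a list of dicts) as a ticket email.

    Returns a dict with 'subject' and 'body'; in practice the body would
    be sent via SMTP to the ticketing system's intake address.
    """
    if not rows:
        body = "No results for this search today."
    else:
        headers = list(rows[0].keys())
        lines = ["\t".join(headers)]
        for row in rows:
            lines.append("\t".join(str(row.get(h, "")) for h in headers))
        body = "\n".join(lines)
    return {"subject": subject, "body": body}
```

A cron job or the log platform's own scheduler would call something like this once a day per saved search, so each morning's review queue fills itself.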
So that's way easier than having to go through and explain all kinds of things, showing them how you review logs. Having all the history documented, showing that you close them every day, at least for us, satisfies our auditors. So we use this process to generate about 50 daily tickets. Here's a list of the subjects for some of those tickets. Next, I'm going to start reviewing some examples in detail until I run out of time. So this is our VPN users ticket. And the output here comes from our VPN endpoint. So in the left-hand column, you see that's the VPN endpoint. And then you've got your username, the action, the timestamp, of course. And then on the far right, we've got the FQDN, rather than the IP address. And the reason that we did that is because we're a pretty small company in the DC area, and we know where most of our people live. So we review this every day. And if we see that Todd logged in from Mexico, and we didn't know he was in Mexico, then we're going to go ask and find out: is he on vacation, or do we need to do IR on this? What's going on? Even more local than that, you can see in some of the FQDNs Baltimore, Maryland FiOS. I think there's one that says DC. So people have actually become accustomed to this. They know we're looking now. And so when people are going on vacation, they come to us and they're like, hey guys, I'm going on vacation next week, so if you see an unusual login, don't go to my boss and ask them what I'm doing. People don't like it when you have to go to their boss and ask, what is Bob doing? All right, this is our RDP logins daily ticket. If you don't know what RDP is, that's Remote Desktop Protocol for Windows. It's the way you remotely log in to a Windows machine. So here we get the timestamp again. And this one, we have a source IP address, but I might have blocked this all out. The username, the server name, whether or not they successfully logged in. Again, this allows us to look for what's normal versus anomalous activity.
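The "does this VPN login look local?" check described above is done by eye, but the same idea can be sketched in a few lines of Python. This is a hypothetical illustration (function name, tuple shape, and the substring list are all assumptions), not the team's actual tooling:

```python
# Hypothetical sketch of the manual VPN review: flag logins whose
# reverse-DNS name (FQDN) doesn't contain any substring we expect for
# our user base, e.g. local ISP or region names. Illustrative only.

def flag_unusual_logins(logins, expected_substrings):
    """logins: list of (username, fqdn) tuples.

    Returns the logins whose FQDN matches none of the expected
    substrings, i.e. the ones worth asking the user about.
    """
    flagged = []
    for user, fqdn in logins:
        name = fqdn.lower()
        if not any(s.lower() in name for s in expected_substrings):
            flagged.append((user, fqdn))
    return flagged
```

The point of using FQDNs instead of raw IPs in the ticket is exactly this: a human (or a crude substring match) can spot "Mexico" or "FiOS Baltimore" at a glance.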
Because we're a pretty small environment, we actually look at all the successes every day, not just the fails. Bigger companies can't do that. It's just way too much information. But as a smaller company, we look at the successes as well, and we tend to know when people are logging in. So if we see something at an unusual time, we might go ask them, why'd you log in at 3 AM last night? We never see you log in after like 7 o'clock. Again, you want to know what's normal in your environment to help you find anomalous activity. This is a system daily ticket that shows automated logins. This is a Linux box. You can see SSHD logins. But it's the same type of ticket, and we review these every day. This is our daily outbound attachments ticket. So these are email attachments from our organization going outbound. It's sort of a high-level DLP, because we can scan the names of all the files leaving the building and look for anomalies or things that look bad. For example, if something goes out from HR that says salary in it, maybe going to the HR person's personal email address, we would go inquire as to what's going on there. Or an executive emailing something called, like, transfer authorization, that wouldn't be normal. You'd want to figure out what's going on. You can just watch out for unusual or potentially bad activity simply by reviewing this. Obviously, this isn't a DLP solution. They do way more than this. But it's a good place to start if you're on a budget. It's also maybe a good manual process if you do have automated DLP, but you're not really looking at what people are emailing out. Again, this is going to depend on the size of your environment, if you can do this. This is a daily OWA connections ticket. So this is actually your iPhone or your Android device. If you're running Exchange in your environment yourself, you can produce this output. And this is showing the IP addresses people are connecting from and their device ID.
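The lightweight attachment-name review just described could be sketched as a simple keyword filter. This is a hypothetical illustration under assumed names (the watchlist, field order, and function are not from the talk):

```python
# Hypothetical sketch of the "high-level DLP" attachment review: flag
# outbound attachments whose filename contains a sensitive keyword.
# The watchlist is illustrative; tune it to your own environment.

SENSITIVE = ("salary", "ssn", "transfer", "confidential")

def flag_attachments(attachments, watchlist=SENSITIVE):
    """attachments: list of (sender, recipient, filename) tuples.

    Returns the tuples whose filename matches a watchlist keyword,
    case-insensitively.
    """
    hits = []
    for sender, recipient, filename in attachments:
        name = filename.lower()
        if any(word in name for word in watchlist):
            hits.append((sender, recipient, filename))
    return hits
```

As the talk notes, this is nowhere near a real DLP product, but scanning only filenames keeps the daily ticket short enough that a human actually reads it.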
So what would be interesting here? Maybe a large number of device IDs when we know the person has a phone and an iPhone and not more than that, or maybe the same IP or range of IPs trying to log in as lots of different users, especially ones that don't exist in our environment. This is an example of a proxy violations alert. So this isn't a daily; this is one of those real-time tickets. And this actually runs, I think, every 15 minutes. But we get this when a user attempts to browse to a site that's blocked by our proxy config. Something else we look at on a daily basis: this is our daily building access ticket. Because we're pretty small, again, it's not a huge amount of data. So we have a ticket I review every day that shows every swipe with every key card in our building. I look at it for unusual activity, like failed swipes into restricted areas, or people trying to access things after hours. We get the timestamp for the swipe, the key holder name, the building location, and the result, access granted or access denied. It's a quick way to look for unusual activity. And I've talked to other folks, and they say, there's no way I can look at that. It's too much. And maybe they'll look at fails, or they get automated alerts when someone tries to go into the data center and doesn't have access. But if your environment isn't that large, you might want to look at who does go in, the successes. We've spotted things that were unusual and had to figure out what was going on. So in order to monitor file integrity on servers, we use Tripwire, as well as some other tools, and generate daily tickets to review the output from those. This is an example of a daily Tripwire ticket. We actually separate them, so we get a ticket every day for Linux servers, a ticket every day for Windows, and a ticket for Solaris, if anyone knows what Solaris is. It shows changes since the last baseline. So these are file changes and things like changes to registry keys.
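The two badge-swipe patterns called out above (denied swipes, and successful swipes after hours) are easy to express in code. A minimal sketch, assuming hypothetical field names and business hours that are not from the talk:

```python
# Hypothetical sketch of the badge-swipe review: surface denied swipes
# and any swipe outside business hours. The 'hour', 'holder', and
# 'result' field names and the 7:00-19:00 window are assumptions.

def suspicious_swipes(events, open_hour=7, close_hour=19):
    """events: list of dicts with 'hour' (0-23), 'holder', 'result'.

    Returns events that were denied, or that happened outside the
    open_hour..close_hour window, in their original order.
    """
    flagged = []
    for e in events:
        denied = e["result"] == "denied"
        after_hours = not (open_hour <= e["hour"] < close_hour)
        if denied or after_hours:
            flagged.append(e)
    return flagged
```

In a small building this filter might still leave the full list reviewable by hand, which is the talk's point: review successes too, not just fails.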
This is our daily DFS ticket. DFS, for the uninitiated, is the Windows Distributed File System. It's basically Windows file shares, but distributed across multiple servers. And so here, we're looking at certain shares. We don't look at all of the changes to DFS, because even in a small company, that's huge. It's really noisy. But we look at the HR documents, or our proposals folder, things like that, and look for unusual activity. This is, again, maybe more for insider threat type review, but it shows us if people are snooping around and trying to get access to things that they shouldn't. We monitor logs from our network security devices and tools, such as our firewalls, routers, intrusion prevention and detection, and web application firewalls, and from our vulnerability scanners. So this is an IPS daily ticket. It shows the number of IPS alerts per signature or control type. We use this to look for different patterns in the type of alerts our IPS is detecting. This is a real-time alert from an IDS. It matched the signature for an exploit kit. I think it was Angler. And so you see we get the timestamp, the destination IP, the URI. And when we get these real-time alerts, we take a quick look at them and use them to determine if we need to dive into IR or do any more investigating. Obviously, you get a lot of false positives. So endpoints are arguably the most important assets to monitor, because that's where your users can do stuff. We use a standard commercial antivirus product for one layer of detection, and it automatically sends us texts and generates a ticket when an AV signature is hit. We also conduct a daily review of pattern file updates and the status of agents. But most valuable in our environment, we've deployed Sysmon across all of our workstations and Windows servers. And Sysmon is configured to alert us to potential malicious behavior on endpoints, based on search queries that we've configured.
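The per-signature IPS rollup mentioned above (alert counts per signature, reviewed for shifts in the mix) amounts to a simple aggregation. A sketch, with hypothetical field and signature names:

```python
# Hypothetical sketch of the daily IPS rollup: count alerts by
# signature name so day-over-day shifts in the mix stand out.
# Field name 'signature' and the example signatures are assumptions.
from collections import Counter

def alerts_per_signature(alerts):
    """alerts: iterable of dicts each having a 'signature' field.

    Returns a Counter mapping signature name -> alert count.
    """
    return Counter(a["signature"] for a in alerts)
```

Comparing today's Counter against yesterday's is the whole review: a signature that jumps from 2 hits to 200 is what you go investigate.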
This is an antivirus alert from one of our endpoints, notifying us of a possible malware infection. For Sysmon, we use the SwiftOnSecurity config as a baseline, and we modified it for our environment, mostly excluding the items that don't apply to us. Before Sysmon, our endpoint visibility was pretty limited, with no ability to monitor behavior and activity on endpoints. We deployed Sysmon and Splunk forwarders to the endpoints. That sends all the data from Sysmon into Splunk, and we run searches to generate alerts on malicious activity, and also for daily review. We're working on integrating the MITRE ATT&CK framework into this as well. So we have Sysmon alerts generated for all of these types of activities: unauthorized image activity, and so on. You can read them all there. I'm going to go through a few slides. This is an unauthorized image activity alert. So this was triggered by, I think, an image launching cmd.exe. This is a Sysmon suspicious child process alert. This is looking for malicious processes coming out of Office docs. This is an example of a Sysmon potential network outbreak. So this is triggered when one endpoint connects to more than four destination IPs in a five-minute period. We tuned this for our environment. This is more than the normal number of outbound connections, so that's the number we landed on for deciding we want to see those, so we can go look and see if this is an indicator of compromise. This is an example of a Sysmon suspicious regsvr32 process. If you didn't know, regsvr32 can load remote malicious script files. This is right out of the MITRE ATT&CK matrix. We also monitor output data, including print jobs, writes to optical media, and USB storage device activity. This is a daily Windows print jobs ticket. It shows us all the Windows print jobs, including the username, the source PC, the number of pages, the printer, the time of the job, and the document name.
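The "potential network outbreak" rule described above (one endpoint connecting to more than four destination IPs in a five-minute window) can be sketched as a sliding-window count over connection events. The thresholds mirror the talk; the event shape and function name are assumptions:

```python
# Hypothetical sketch of the network-outbreak detection: flag any host
# that contacts more than max_dests distinct destination IPs within a
# sliding window_secs window. Thresholds mirror the talk (>4 in 5 min);
# the (timestamp, host, dest_ip) event shape is an assumption.

def outbreak_hosts(events, max_dests=4, window_secs=300):
    """events: list of (timestamp_secs, host, dest_ip), time-sorted per host.

    Returns the set of hosts that exceeded the threshold at any point.
    """
    flagged = set()
    by_host = {}
    for ts, host, dest in events:
        win = by_host.get(host, [])
        win.append((ts, dest))
        # drop entries that have fallen out of the sliding window
        win = [(t, d) for t, d in win if ts - t <= window_secs]
        by_host[host] = win
        if len({d for _, d in win}) > max_dests:
            flagged.add(host)
    return flagged
```

In the real deployment this logic lives in a scheduled search over the Sysmon network-connection events, but the counting idea is the same.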
So again, this is kind of like DLP: looking for anomalies, looking for insider threat, looking for people trying to print things and steal them, things that they shouldn't be. This is a Sysmon USB usage alert. So from Sysmon, we generate these whenever there's USB storage device activity. In this case, it's a Kingston DataTraveler thumb drive. So we block USB mass storage devices, but before we set up these alerts with Sysmon, we had no data when usage was attempted. With Sysmon, we now get that, and we can go in and review. Lastly, this is an example of a Sysmon attempted removable media write, a write to DVD, notifying us of an ISO burning program being launched. Again, we block removable media writes, but we still want to know about the attempts. So what we're also considering: we're evaluating the Splunk App for AWS, which ties directly into IAM and pulls down AWS logs. With these, we're looking for the same types of things, just in the AWS cloud instead of on-prem. And that works because every action in AWS is an API call, like user additions, permission changes, et cetera. It applies rules, almost like an IDS, looking for abnormal behavior, and it pulls from CloudWatch, CloudTrail, AWS Config, and VPC Flow Logs. So first of all, this is the AWS app out of the box. It provides a dashboard giving you a nice high-level overview of user activity. This is a custom search we run that reports on AWS console authentications, both failures and successes. To wrap up, again, this is the slide showing our log and event management process. So I'll leave this here for a couple of minutes. You know, implementing this process, as I've described, will enable you to understand what normal IT ops look like, maximize your detection capabilities, and help you perform incident response. Also very important, again, is meeting compliance requirements for your auditors.
You can implement this with the tools that I mentioned, Sysmon, Splunk, RT, or any other combination of log management and ticketing system. So that's it, and we'll leave that there for everyone if you want to take a look at it. And I'll be hanging around the back. Thank you.