Live from Las Vegas, it's theCUBE. Covering Splunk .conf19, brought to you by Splunk.
Okay, welcome back everyone, live in Las Vegas. theCUBE's coverage of Splunk's .conf. I'm John Furrier, host of theCUBE. We're here, our seventh year covering Splunk .conf. This is the 10-year anniversary for .conf. And of course, security data is at the heart of the core cybersecurity challenges for enterprises and for the public sector. We've got two great guests from SAIC: Tyler Williams, Principal Cyber Security Analyst, and Karthik Subramanian, who's the Principal Senior Cyber Security Engineer. Guys, welcome to theCUBE. Thanks for coming on.
Thanks, great to be here.
So we were just talking before the cameras came on about cyber and the global landscape. Obviously, the threats are everywhere, right? So general cyber is a couple of things. There's national security in the U.S. You've got overseas, it's digital. Packets can come from anywhere. This has been a huge challenge in managing this whole cyber. So you guys have a lot of experience on the public sector side and commercial. What's your take? Why are you guys here? What's going on here at .conf?
So in addition to what you said, and we have done a lot of hardening on things coming into our network, one of the biggest things that we've started seeing now is insider threats. For example, our customer is the FRTIB, the Federal Retirement Thrift Investment Board. They manage a $500 billion sovereign wealth fund for the retirements of federal government workers.
That's a lot of billions.
And in addition to protecting from outside, you want to protect from inside people, either stealing people's data, stealing their money, or anything like that. So we're trying to provide that holistic cybersecurity: we're making sure that we keep people out, but we keep our data in. And that's what we're here to do.
I love that use case, sovereign wealth, sovereignty. The concept of sovereignty and sovereign wealth. Obviously money's involved.
Insider threats are just as important. It's been talked about a lot, but has anyone really cracked the code on this? What are you guys doing? Because anyone can walk out, and there are a lot of human issues around passwords. So how do you guys frame this? How do you guys look at the insider threat landscape and how do you guys attack that problem?
Well, one of the great benefits of Splunk, obviously, is its capability to aggregate data across the organization, across the enterprise. And so we can really identify user behavior and find deviations in that. And when they start to trend towards something that's negative, we can note it, and our SOC can take action as necessary.
Well, one of the things in the notes here: I see you guys are doing a talk here at the event, "Detect and Mitigate Insider Threats Using Splunk's Machine Learning Toolkit and Enterprise Security." Obviously they're shipping 6.0, availability. What is that talk going to be about? How does Splunk play in all this? Can you unpack that? Because people are talking more about the Machine Learning Toolkit, kind of leaning on that heavily, and automation is certainly very important. But what does Enterprise Security 6.0 bring to the table? So can you take us through the evolution of where you guys are at with Splunk?
If you want to handle the Enterprise Security standpoint?
Yeah, generally Enterprise Security has traditionally had really good use cases for the external threats that we're talking about. But like you said, it's very difficult to crack the insider threat part. And so we, leveraging the Machine Learning Toolkit, have started to build that into Splunk to make sure that you can protect your data. And Tyler and I specifically did this because we saw that there was immaturity in the cybersecurity market for insider threat.
And so one of the things that we're actually doing in this talk, in addition to talking about what we've done, is giving examples of actionable use cases that people can take home and do themselves. Like, we're giving them sample code of how to find some outliers.
Give an example of what the solution is.
So the use case that we go over in the talk is a user who logs in at a weird time of day, outside of their baseline, and exfiltrates a large amount of data in a low and slow fashion. But they're doing this obviously outside of the scope of their normal behavior. So we give some good searches that you can take home and look at: How could I make a baseline? How could I establish that there are deviations from that baseline from a statistical standpoint, and identify this in the future and find the needle in the haystack using the Machine Learning Toolkit? And then, if I have a SOC that I want to send notables to, or some notification to, how do we make that happen? How do we make the transition from the Machine Learning Toolkit over to Enterprise Security, or however your SOC operates?
And how do you do that? Do you guys write your own code for that? Or do you guys use Splunk?
So Splunk has a lot of internal tools, and there are a couple of things that need to be pointed out about how to make this happen, because we're aggregating large amounts of data. We go through a lot of those finer points in the talk. But sending those through and making sure that they're high confidence is the challenge.
So you guys are codifying the cross-connect from the machine learning to the other systems. All right, so I gotta ask, this is basically pattern recognition. You want to look at baselining. Can people hide in that baseline data? So if I'm an evil genius, I say, hey, I know these guys are looking for anomalies in my baseline, so I'm going to go low and slow in my baseline. Can you look for that too?
Yeah, there absolutely are ways. Fortunately, there are a lot of different people who are doing research in that space on the defensive side. And so there's a ton of use cases to look at. And if you aggregate over a long enough period of time, it becomes incredibly hard to hide. And so the baselines that we recommend building generally look at a 90-day or 120-day viewpoint. So you really want to be able to measure that. And most insider threat incidents that happen occur within that 30-to-90-day window. And so the research seems to indicate that those timelines will actually work. Now if you were in there and you read all the code and you did all the work to see how all the things come through and you really understood the machine learning behind it, I'm sure there's absolutely a way to get in, if you're that sophisticated. Most of the time they're just trying to steal stuff and get out, or compromise a system.
So are there other patterns that you guys have seen in terms of the areas that are kind of low-hanging-fruit priorities that people aren't paying attention to? And what are the levels of importance to, I guess, get a hold of, or have some sort of mechanism for managing insider threats? It's like passwords, obviously, is one. But I mean, what are the levels?
There have been a lot of recent papers that have come out on lateral movement and privilege escalation. I think it's an area where a lot of people haven't spent enough time doing research. We've looked into models around PowerShell so that we can identify when a user is maliciously executing PowerShell scripts. I think there's stuff that's getting the attention now that it really needs, but it is a little bit too late. The community is a bit behind the curve on it.
And C# is becoming more of a pattern. So you're seeing a lot more C#. PowerShell's kind of hunted down, kind of crippled, or identified. You can't operate that way.
But is an insider going to do that? Do insiders come in with the knowledge of doing C#, or is it going to come from the outside? I guess my question is, what are the sophistication levels of an insider threat?
Depends on the level. So the CERT Insider Threat Institute has aggregated about 15,000 different events. And it could be something as simple as a user who goes in with the intent to do something bad. It could be a person who's converted from the inside at any level of the enterprise for some reason. Or it could be someone who gets really upset after a bad review, who might be the one person who has access.
And he's being socially engineered as well. There are all kinds of different vectors coming in there.
Yeah. And so in addition to somebody malicious like that, there's the accidental: your phishing campaigns. Somebody important clicks on an email that they think is from somebody else important, or something like that. And we're looking for that as well.
And that's definitely, spear phishing has been very successful.
Yeah. That's a hard one to crack.
It is. If they have that malware in there, looking at, say, HR data: oh, this guy just got a bad review. Good time to send him a resume or a job opening, and that's got hidden code built in. We've seen that move many times.
Yeah, and natural language processing and, more importantly, natural language understanding can be used to get a lot of those cases out if you're ingesting the text of the email data.
Well, you guys are at a very professional, high-end firm, SAIC. I mean, the history goes way back in a lot of government contracts. They do a lot of heavy lifting, from development to running full, big-time OSS networks. So there's a lot of history there. What is the state of the art?
What do you guys look at as state of the art right now in security, given the fact that you have some visibility into some of the bigger contracts, relative to endpoint protection or general cyber? What's the current state of the art? What should people be thinking about? Or what are you guys excited about? What are some of the areas that are state of the art relative to cybersecurity, around data usage?
So, I mean, one of the things, and I saw that there were some talks about it, is that natural language processing and sentiment analysis have come a long way. It is much easier to have machines understand what people are trying to say or what they're doing. Take, for example, somebody's web search history: somebody might do a search for "how do I hide downloading a file" or something like that. And that's something that, well, we know immediately as people, but our customer, for example, has 1.2 billion events a day. A billion seconds, that's 30 years. So, it's a big number. We hear those numbers thrown around a lot, but it's a big number, to put it in perspective. And we're getting that a day. And so, how do we pick that out?
It's hard to staff on that problem. You can't put staff on that. These are machines.
The most cutting-edge papers that have come out recently have been trying to understand the logs, having the machine learning understand the actual logs that are coming in to identify those anomalies. But that's a massive computation problem. It's a huge undertaking to kind of set that up. I really have seen a lot of stuff actually at .conf here, some of the innovations that they're doing to optimize that, because finding the needle in the haystack is obviously difficult. That's the whole challenge. But there's a lot of work that's being done in Splunk to make that happen a lot faster. And there's some work that's being done at the edge.
It's not a lot, but the cutting edge is actually looking at every single log that comes in and understanding it and having a robot say, boom, check that one out.
Yeah, and also the sentiment gets better with the data, because it's all across those billions of events. You can get a smiley face or a happy face, depending upon what's happening. It could be, oh, this is bad. But this comes back down to the data points. You mentioned logs; it's now beyond logs. They've got tracing, other signals coming in across the networks. So it's a massive problem. You need automation.
You do.
You've got to feed the beast, right? The machines.
And you've got to do it within whatever computation capabilities you have.
I always say it's a moving train. The target's moving all the time. You guys are staying on top of it. What do you guys think of the event? What's the most important thing happening here at Splunk .conf this year? I'd love to have both of you guys weigh in on that.
There's a ton of innovation in the machine learning space. All of the pipelines, really, that I've been working on in the last year are being augmented and improved by the staff that's developing content in the machine learning and deep learning space at Splunk. So to me, that's by far the most important thing.
Karthik, your take on this?
Yeah. Between the automation, I know in the last year or so, Splunk has bought a lot of different companies that do a lot of things, so that now, instead of having to build it ourselves or having to go to three or four different people to build a complete solution for the federal government or for whoever your customer is, Splunk is becoming more of a one-stop shop.
And I think just upgrading all of these things to have all the capabilities working together, so that, for example, Phantom gives you that orchestration and automation. For example, if we have an ES notable event saying, hey, possible insider threat, maybe they automate the first thing of immediately pulling those logs and emailing them or putting them in front of a SOC analyst, so that in addition to "hey, you need to check this person out," it's "you need to check this person out, and here are the first five pages of what you need to look at."
Talk about the impact of that, because without that SOAR feature, the security orchestration and automation piece of it, where are you? No speed? What's the impact? What's the alternative?
Yes, so right now, when we're giving information to our analysts through ES, they look at it and then they have to click five, six, seven times to get up the tabs that they need to get it done. If we can have those tabs pre-populated, or just have them either one click away or come up on their screen once they open it up, I mean, time is important, especially when we're talking about an insider threat who might be taking a lot of money.
The alternative is not having money fast enough.
Yeah, and the alternative is a 5x increase in time span by the SOC analyst.
Yeah, and no one wants that.
No, they want to be augmented with the data, ready to go, ready to alert on it.
All right, so final, you guys have awesome insights, walking data sets right here. Love the insight, love the insight. So final question: for the folks watching that are Splunk customers who are not as on the cutting edge as you guys pioneering this field, what advice would you give them? Like if you had to shake your friend: hey, get off your butt, do this, do that. What do people need to pay attention to?
What's super urgent that you would impress upon them? What would your advice be?
Why don't you start that one?
All right, so.
Yeah, no, it's fine. One of the things that I would actually say is, we can code really cool things, we can do really cool things, but one of the most important things that he and I do as part of our process is, before we go to the machine and code the really cool things, we sometimes just step back and talk for half an hour, talk for an hour: hey, what are you thinking about? What are we reading, and what are we learning? And we formulate a plan, because instead of just jumping into it, if you formulate a plan, then you can come up with better things and augment it and implement it, versus a smash and grab on the other side of just, all right, here's a thing, let's dump it in there.
So what you're saying is, before you jump in the data pool and start swimming around, take a step back, collaborate with your peers, or get some kind of game plan.
We spent a lot of hours whiteboarding. But I would add to that, to augment that: we spent a lot of time reading the scientific research that's being done by a lot of the teams that are out solving these types of problems, and sometimes they come back and say, hey, we tried the solution and it didn't work. But you can learn from those failures just like you can learn from the successes. So I recommend getting out and reading; there's a ton of literature in that space around cyber.
So always be moving, always be learning, always be collaborating. It's a moving train. Guys, thanks for the insights, epic session here. Thanks for coming on and sharing your knowledge on theCUBE. theCUBE, we're running one big data source here for you, all the knowledge here at .conf. Our seventh year, their 10th year; this is theCUBE's coverage. I'm John Furrier, we'll be back after this short break.
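For readers who want something concrete, the use case described in the interview (a login at an unusual hour combined with an unusually large outbound volume, both measured against a per-user baseline) could be sketched roughly as below. This is not the guests' sample code and it does not use Splunk or the Machine Learning Toolkit; it is plain Python with the standard library. The function names, the 2% rarity cutoff, the z-score threshold of 3, and the synthetic 90-day baseline are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, stdev


def rare_login_hour(baseline_hours, hour, min_share=0.02):
    """True when `hour` makes up less than `min_share` of the baseline logins.

    `min_share` is an illustrative cutoff, not a recommended value.
    """
    counts = Counter(baseline_hours)
    return counts[hour] / len(baseline_hours) < min_share


def volume_zscore(baseline_daily_bytes, observed_bytes):
    """Standard score of one day's outbound bytes against the baseline days."""
    mu = mean(baseline_daily_bytes)
    sigma = stdev(baseline_daily_bytes)
    if sigma == 0:
        # A flat baseline has no spread: any difference is infinitely unusual.
        if observed_bytes == mu:
            return 0.0
        return math.copysign(math.inf, observed_bytes - mu)
    return (observed_bytes - mu) / sigma


def is_notable(baseline_hours, baseline_daily_bytes, hour, observed_bytes,
               z_threshold=3.0):
    """Flag only when both signals deviate, to keep alerts high confidence."""
    unusual_time = rare_login_hour(baseline_hours, hour)
    unusual_volume = volume_zscore(baseline_daily_bytes, observed_bytes) > z_threshold
    return unusual_time and unusual_volume


# Synthetic 90-day baseline for one hypothetical user (made-up numbers).
baseline_hours = [8, 9, 10] * 30
baseline_daily_bytes = [90_000_000, 100_000_000, 110_000_000] * 30

# A 3 a.m. login with 140 MB outbound is flagged; a 9 a.m. login is not.
print(is_notable(baseline_hours, baseline_daily_bytes, 3, 140_000_000))
print(is_notable(baseline_hours, baseline_daily_bytes, 9, 140_000_000))
```

Requiring both signals to fire is one way to read the "high confidence" concern raised in the interview. To approximate the "low and slow" case, the same score could be applied to a rolling multi-day sum of outbound bytes rather than to a single day.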