Welcome everyone to theCUBE's presentation of the AWS Startup Showcase. The theme here is data as code. This is season two, episode two of our ongoing series covering the exciting startups from the AWS ecosystem, talking about the future of data, the future of analytics, the future of development, and all kinds of cool stuff in the cloud. I'm your host, John Furrier. Today we're joined by Ed Bailey, a senior technical evangelist at Cribl. Thanks for coming on theCUBE here. Thank you for the invitation. Thrilled to be here. The theme of this session is the observability lake, which I love by the way, I'm gonna get into that in a second. A breach investigator's best friend, which is a great topic. Couple of things: one, I like the breach investigation angle, but I also like this observability lake positioning, because I think this is a teaser of what's coming, more and more data usage where it's actually being applied specifically for things. Here it's the observability lake. So first, what is an observability lake? Why is it important? It's important because technology professionals, especially security professionals, need data to make decisions, need data to drive better decisions, and need data just to achieve understanding. And that means they need everything. Not just what they can afford to store, not what the vendor's gonna let them store, they need everything. And that's the point of the observability lake: you couple an observability pipeline with the lake to bring in your enterprise's data, to make it accessible for analytics, to be able to use it, to be able to get value from it. And I think that's one of the things that's missing right now in the enterprise. Admins are being forced to make decisions about, okay, we can't afford to keep this, we can't afford to keep that. They're missing things, they're missing parts of the picture. 
And by being able to bring it together, to be able to have your cake and eat it too, where I can get what I need and I can do it affordably, I think that's the future, and it just drives value for everyone. And it just makes a lot of sense. Data lake, or the earlier concept: throw everything in the lake and you can figure it out, you can query it, you can take action on it in real time, you can stream it, you can do all kinds of things with it. Observability is important because it's the most critical thing people are doing right now for all kinds of things, from QA to administration to security. So this is where the breach piece comes in. I like that part of the talk, because "the breach investigator's best friend" implies that you've got the secret sauce behind it. So what is the state of the breach investigation today? What's going on with that? Because we know breaches, we see them out there, but why is this the best friend of a breach investigator? Well, this is unfortunate, but typically there's an enormous delay between breach and detection. There's an IBM study, I think it's 287 days from the actual breach to detection and containment. It's an enormous amount of time. And the key is, when you do detect a breach and bring in your response team, typically without an observability lake, without Cribl's solutions around the observability pipeline, you're gonna have an incomplete picture. The incident response team has to first understand, what's the scope of the breach? Is it one server? Is it three servers? Is it all the servers? You gotta understand what's been compromised and what's the impact. How did the breach occur in the first place? And they need all the data to stitch that together, and they need it quickly. The more time it takes to get that data, the more time it takes for them to finish their analysis and contain the breach. Hence the, I think, 87 to 90 days to contain a breach. 
And so by being able to remove the friction, by being able to make it easier to achieve these goals, what shouldn't be hard, by removing that friction you speed up containment and resolution time. Not to mention, many SIEM administrators simply don't have the data, because they can't afford to store it in their SIEM, or they have to go to their backup team to get a restore, which can take days. There are just so many obstacles to getting resolution right now. I mean, you're crawling through glass there, right? Because think about just the timing aspect. Where is the data? Where is it stored? Is it relevant? And do you have it at all? Yeah, do you have it at all? And then, you know, that person doesn't work there anymore, they changed jobs. I mean, who's keeping track of all this? You guys now have this capability where you can come in and do the instrumentation with the observability lake without a lot of change to the environment, which is not the way it used to be. It used to be: buy a tool, build a platform. Cribl has a solution that eases the struggles for the enterprise. What specifically is that pain point, and what do you guys do specifically? Well, I'll start out with kind of an example of what drew me to Cribl. So back in 2018, I'm running the Splunk team for a very large multinational. The complexity we were dealing with, the complexity of the data, the demands we were getting from security and operations, were just an enormous issue to overcome. I had vendors come to me all the time saying they'd solve my problems, but that meant moving to their platform, where I'd have to get rid of Splunk, or I'd have to do this and lose something. And what Cribl Stream brought was that I could put it between my sources and my destinations and manage my data. And I would have full control over the data. I didn't have to lose anything. 
I could keep continuing to use our existing analytics tools, and that sense of power and control, and I don't have to lose anything. I was like, there's something wrong here, this is too good to be true. And so what we're talking about now in terms of breach investigation is that with Cribl Stream, I can create a clone of my data to an object store. And this is almost any object store. So it can be AWS, it could be other vendors' object stores, it could be on-prem object stores. And then I can house my data, I can house all my data, at the cheapest possible price. So instead of eating up my most expensive storage, I put all my data in my object store, and I only put the data I need for detections in my SIEM. So if, and hopefully never, but if you do have a breach, LogStream has a wonderful UI that makes it trivial to then pick my data in my object store and restore it back into my SIEM, so that my IR team can develop a complete picture of how the breach happened. What's the scope? What was their lateral movement? And answer those questions. It just takes the friction away. Just like you said, no more crawling over glass, you're running to your solution. Now, you mentioned object store, and you're streaming that in. You talked about the Cribl Stream tool, so I'm assuming there we're streaming the pipeline stuff. But is there a schema involved? Are there database challenges? How do you guys look at that? I know you're vendor agnostic, I like that piece. Plug in and you leverage all the tools that are out there, Splunk, Datadog, whatever. But how about on the database side? What's the impact there? Well, so I'm assuming you're talking about the object store itself. We don't have to apply a schema. We can fit the data to whichever object store it is. We structure the data so it makes it easier to understand. For example, if I want to see communications from one IP to another IP, we structure it to make it easier to see that and query that. 
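To make the routing pattern Ed describes concrete, here's a minimal sketch in Python: every event is cloned to cheap object storage at full fidelity, while only the detection-relevant slice is forwarded to the (expensive) SIEM index. The function name, the `sourcetype` field, and the detection list are all illustrative assumptions, not Cribl's actual API.

```python
# Hypothetical sketch of a clone-and-route step: full-fidelity copy to the
# object store, detection subset to the SIEM. Field and function names are
# invented for illustration; a real Cribl pipeline is configured, not coded.

import json

DETECTION_SOURCETYPES = {"auth", "firewall", "endpoint"}  # assumed subset

def route_event(event: dict) -> dict:
    """Return the destinations a single event should be written to."""
    destinations = {"object_store": json.dumps(event)}  # everything is kept
    if event.get("sourcetype") in DETECTION_SOURCETYPES:
        # Only the slice needed for detections lands in the SIEM.
        destinations["siem"] = json.dumps(event)
    return destinations

events = [
    {"sourcetype": "auth", "src_ip": "10.0.0.5", "msg": "login failed"},
    {"sourcetype": "debug", "src_ip": "10.0.0.9", "msg": "cache miss"},
]
routed = [route_event(e) for e in events]
# Every event reaches the object store; only the auth event reaches the SIEM.
```

The point of the split is exactly the cost argument from the interview: the archive is complete, while the premium index holds only what detections need.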
But it's completely vendor neutral, and that makes it so simple to handle. No predefined schema needed? No, not at all. And it made things so much easier. When we enabled this for the enterprise, I think it took us three hours to do, and we were able to start, I mean, start cutting our retention costs dramatically. Yeah, it's great when you get that kind of value. Time to value, it's critical. And all the skeptics fall to the side pretty quickly. I got to ask you, go ahead. So I would say, previously, I would have to go to our backup team. We'd have to open up a ticket, we'd have to have a bridge, then we'd have to go through the process of pulling tape. It could take, you know, hours if not days to restore the amount of data we needed. And we were able to run to our goals and solve business problems instead of focusing on the process steps of getting things done. All right, so take me through the architecture here and some customer examples, because you have Cribl Stream, their observability pipeline, that's key, you mentioned that. And then they build out these observability lakes from that. So what is the impact of that? Can you share the customers that are using that solution? What are they seeing for benefits? What are some of the impacts? Can you give us some specifics? I mean, I can't share with you all the exact customer names, but I can definitely give you some examples. A referenceable customer would be TransUnion. I came from TransUnion, I was one of their first customers, and it solved an enormous number of problems for us. Autodesk is another great example. The idea is that we're able to automate common data practices. Just for example, we were talking about backups. You'd have to put a lot of time into managing your backups and your analytics platforms. And then you're locked into custom database schemas, you're locked into vendors. 
And it's still expensive. So being able to spend a few hours to dramatically cut your costs, but still have the data available, that's the key. I didn't have to make compromises. Because before, I was having to say, okay, we're going to keep this, we're going to just drop that and hope for the best. And we just didn't have to do that anymore. I think it's the same thing for TransUnion and Autodesk: the idea that we're going to lower our costs, and we're going to make it easier for our administrators to do their jobs. And so they could spend more time on business value fundamentals, like responding to a breach. You're going to spend time working with your teams, getting value out of observability solutions, and stop spending time writing custom solutions using open source tools. Because your engineering time is the most precious asset for any enterprise, and you've got to focus your engineering time where it's needed the most. Yeah. And don't underestimate the hassle and cost of ownership of swapping out pre-existing stuff just for the sake of having a functionality. I mean, that's a big part of it. It's pain. And that's a big thing about LogStream: being vendor neutral is so important. If you want to use the Splunk Universal Forwarder, that's great. If you want to use Beats, that's awesome. If you want to use Fluentd, even better. If you want to use all three, you can do that too. It's the customer's choice. And we're saying to people, use what suits your needs. And if you want to write some of your data to Elastic, that's great. Some of your data to Splunk, that's even better. Some of it to Grafana, or whatever you pick. You have the choice to put your own solutions together and put your data where you need it to be. We're not asking you to be only in our ecosystem, to work with only our partners. 
We're letting you pick and choose what suits your business. Yeah, you know, that's the direction. I was just talking with the Amazon folks about their serverless. You can use any tool. They have that core architecture: throw everything into S3, and then pick whatever you want to use, SageMaker, all those other things. This is the new way. That's the way it has to be to be effective. How do you guys handle that? What's been the reaction from customers? Do they roll their eyes and doubt you guys? Can you do it? Are they skeptical? How fast can you convert them over? Right, and that's always the challenge. And the best part of my day is talking to customers. I love hearing feedback, what they like, what they don't, what they need. And of course I was skeptical. I didn't believe it when I first saw it, because I was used to being locked in. I was used to having to put in a lot of effort, a lot of custom code. Like, what do you mean, it's this easy? I believe I did the first, so this is 2018, and I did our first demo. It was like 30 minutes in, and I cut about a half million dollars out of our license in the first 30 minutes of our first demo. And I was stunned, because, I mean, this is easy. I mean, yeah, exactly, this is the future. And then, for example, the security team wanted to bring in a UBA solution that wasn't part of the vendor ecosystem that we were in. And I was like, not a problem. We're going to use LogStream, we're going to clone a copy of our data to the UBA solution. We were able to get value from this UBA solution in weeks, versus what's typically a six-month cycle to start getting value. It was just too easy. And the best part of it, 
the thing that just struck me was that my engineers can now spend their time on delivering value instead of integrations and moving data around. Yeah, and also spending more time preventing breaches. But what's interesting, and counterintuitive here, is that as you add more flexibility and choice, you'd think it'd be harder to handle a breach, right? So now let's go back to the scenario. Say an organization has a breach, and they have the observability pipeline, they've got the lake in place, your observability lake. Take me through the investigation. How easy is it? What happens? How do they start it? What goes on? So once your SOC detects a breach, then typically you're going to bring in your incident response team. So what we did, and this is one more way that we removed that friction, we cleaned up the glass, is we delegate to the incident response team the ability to restore. Cribl calls it Replay: replay data out of your object store back into your SIEM. There's a very nice UI that gives you the ability to say, I want data from this time period to this time period, and I want it to be all the data, or the ability to filter and say, I want this IP, just this IP. For example, if I detected, okay, this IP has been breached, then I'm going to pull all the data that mentions this IP in this timeframe, hit a button, and it just starts. And then it restores as fast as the IOPS of your solution allow, and it's back in your tool. One of the things I also want to mention is we have an amazing enrichment capability. So one of the things that we would do is we would have pipelines, so as the data comes out of the object store, it hits the pipeline and we enrich it. We use GeoIP information, reverse DNS, and it gets processed through threat intel feeds. So the data is already enriched and ready for the incident response people to do their job. 
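The replay-plus-enrichment flow Ed walks through can be sketched as follows: select archived events by time window and suspect IP, then enrich each one on its way back to the SIEM. The lookup tables below are stand-ins; a real pipeline would consult GeoIP databases, reverse DNS, and threat intel services, and none of this is Cribl's actual API.

```python
# Illustrative sketch of "Replay" with enrichment: filter cold-storage
# events by time range and IP, then tag each hit with GeoIP, reverse DNS,
# and threat-intel matches. Tables and field names are invented.

from datetime import datetime

GEOIP = {"203.0.113.7": "NL"}              # assumed GeoIP table
RDNS = {"203.0.113.7": "bad.example.net"}  # assumed reverse-DNS table
THREAT_INTEL = {"203.0.113.7"}             # assumed indicator feed

def replay(events, start, end, ip):
    """Select events in [start, end] that mention the suspect IP,
    enriching each one on the way back to the SIEM."""
    out = []
    for ev in events:
        ts = datetime.fromisoformat(ev["ts"])
        if start <= ts <= end and ip in (ev.get("src_ip"), ev.get("dst_ip")):
            ev = dict(ev)  # don't mutate the archived copy
            ev["geo"] = GEOIP.get(ip, "unknown")
            ev["rdns"] = RDNS.get(ip, "unknown")
            ev["threat_match"] = ip in THREAT_INTEL
            out.append(ev)
    return out

archive = [
    {"ts": "2022-03-01T10:00:00", "src_ip": "203.0.113.7", "dst_ip": "10.0.0.5"},
    {"ts": "2022-03-01T10:05:00", "src_ip": "10.0.0.9", "dst_ip": "10.0.0.5"},
    {"ts": "2022-03-02T09:00:00", "src_ip": "203.0.113.7", "dst_ip": "10.0.0.8"},
]
hits = replay(archive,
              datetime(2022, 3, 1), datetime(2022, 3, 1, 23, 59),
              "203.0.113.7")
```

The payoff is the one Ed names: the responder gets back only the relevant slice, already enriched, instead of raw terabytes to sift through.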
And so it just removes the friction of getting to the point where I can start doing my job. You know, this theme, this episode of the showcase, is about data as code. Which is, you know, something I've been saying on theCUBE since it started, around 13 years ago: developers are going to be dealing with data like they deal with software code. And you start to see it, you mentioned enrichment. Where do you see data as code going? How relevant is it now? Because what you're really talking about is, when you add machine learning in here, that has to be enriched and iterated on too. We're talking about taking things off a branch and putting it back into the core. This is a data discussion. This isn't software, but it sounds the same. Right. And the irony is, I remember the first time saying that to an auditor. I was constantly dealing with auditors, and that's what I describe: I'm going to show you the code that manages the data. This is the data as code. It's going to show you how we transform it, how we secure it, where the data goes, how it's enriched, so you can see the whole story, the data lifecycle, in one place. And that's how we handled our auditors. And I think that is enormously positive, because it's so easy to get confused, so easy to let complexity get in the way of progress. Being able to represent your data as code is a step forward. The amount of data and the complexity of data are not getting simpler, they're getting more complex, so we need to come up with better ways to handle it. Now, you've been on both sides of the fence. You've been in the trenches as a customer, and now you're a supplier with a great solution. What are people doing with these data engineering roles? Because there's not enough data engineering. I mean, if you say data as code, if you believe that to be true, and many people do, we do. 
And you look at the history of infrastructure as code that enabled DevOps, AIOps, MLOps, DataOps. It's happening, right? So data stack ops is coming. And obviously security is huge in this. How does that data engineering role evolve? Because it just seems more and more that there's going to be a big push towards an SRE version of data, right? I completely agree. I was working with a customer yesterday, and I spent a large part of our conversation talking about implementing development practices for administrators. It's a new role, it's a new way to think of things. Because traditionally your Splunk or Elastic administrators are talking about operating systems and memory, and about how to use the vendor's proprietary tools. That's just not quite the same. And so we started talking about, you need to start with getting used to code reviews. The idea of getting used to making sure everything has a comment is one thing I told him. It was like, a function has to have a comment just by default, it just has to. Yeah, the standards of how you write things, how you name things, all really start to matter. And also you've got to start considering your skill set. And this is something, probably one of the best hires I ever made was I hired a guy with a math degree, because I needed his help to understand how machine learning works, how to pick the best type of algorithm. And I think this is going to evolve, so that you move from the gray-bearded administrator to a gray-bearded administrator with a math degree. It's interesting, it's a step function. You have a data engineer who's got that kind of capability, like what the SRE did with infrastructure: the step function of enablement, the value creation from really good data engineering, puts the democratization play back on the table and changes that entire landscape. What's your reaction to that? I completely agree. 
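As a small illustration of the discipline Ed describes, treating an administrator's data transform like reviewed, documented code, here's what a single reviewed pipeline function might look like. The function, fields, and redaction rule are hypothetical, not from any specific SIEM or from Cribl.

```python
# Hypothetical example of "development practices for administrators":
# one transform, one reviewed function, a comment on every function by
# default, so auditors and teammates can see exactly how data is handled.

import re

def mask_card_numbers(event: dict) -> dict:
    """Redact anything that looks like a 16-digit card number from the
    raw message before it is indexed. Kept as its own named function so
    code review and audit can inspect exactly this transformation."""
    masked = re.sub(r"\b\d{16}\b", "XXXX-REDACTED", event["raw"])
    return {**event, "raw": masked}  # leave the original event untouched

event = {"source": "payments", "raw": "charge 4111111111111111 approved"}
clean = mask_card_numbers(event)
# clean["raw"] is now "charge XXXX-REDACTED approved"
```

Trivial as it is, this is the unit that code review, naming standards, and comment-by-default apply to, which is the shift from ad hoc admin scripting to engineering practice.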
So operational security data is the most volatile data in the enterprise. It changes on a whim. You have developers who change things and don't tell you what happened. The vendor doesn't tell you what happened. And so that idea, that lifecycle of managing data to the same standards and disciplines that database administrators have applied for years, is going to have to filter down into the operational areas. And you need tooling that gives you the ability to manage that data, manage it in flight, in real time, in order to drive detections, in order to drive response, all those business value things we've been talking about. So I've got to ask you about the larger role that you see with observability. We were talking before we came on camera live here about how exciting this kind of concept is, and you were attracted to the company because of it. I love the observability lake concept because it puts all that data in one spot. You can manage it. But you've got machine learning and AI around the corner that also can help. How does all this change the landscape of data security and things like that? Because it makes a lot of sense, and I can only see it getting better with machine learning. Right, totally. So the core issue is that when you talk about observability, most people assume observability is only an operational or application support process. It's also a security process: the idea that you're looking for your unknown unknowns. This is what keeps security administrators up at night, I'm being attacked by something I don't know about. How do you find those unknowns? That's where your machine learning comes in. And that's where you have to understand there are so many different types of machine learning algorithms. 
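To make the "unknown unknowns" idea tangible: instead of matching known signatures, you flag behavior that deviates statistically from a baseline. The toy example below uses a simple z-score over daily login counts as a stand-in for the far richer algorithms Ed alludes to; the threshold and data are invented for illustration.

```python
# Toy anomaly detection: flag days whose login count deviates from the
# baseline by more than `threshold` standard deviations. A z-score is the
# simplest possible stand-in for real ML-based detection of unknowns.

from statistics import mean, stdev

def anomalies(counts, threshold=2.0):
    """Return indices of values more than `threshold` standard
    deviations from the mean of the series."""
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts)
            if sigma > 0 and abs(c - mu) / sigma > threshold]

daily_logins = [102, 98, 110, 105, 97, 101, 940]  # last day is the outlier
flagged = anomalies(daily_logins)  # flags index 6, the 940-login day
```

This also echoes Ed's next point about testing constantly: a z-score works on this series but fails on others (for instance, multiple outliers inflate the standard deviation), which is exactly why algorithm choice has to be validated against your own data.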
The guy that I hired started educating me about the sheer number of algorithms, how they apply to different data, how you get different value, and how you have to test your data constantly. There's no such thing as a magical black box of machine learning that gives you value. You have to implement, just like the developer practices, and keep testing over and over again, like data scientists do. The best friend of a machine learning algorithm is data, right? You've got to keep feeding it data. And when the data sets are baked and secure and vetted, even better. All cool. Ed, great stuff, great insight. Congratulations, Cribl, great solution. Love the architecture, love the pipelining of the observability data and streaming that into a lake. Great stuff. Give a plug for the company, where you guys are at, where people can get information. I know you guys have a bunch of live feeds on YouTube, Twitch, here on theCUBE. Where else can people find you? Give the plug. Oh please, please join our Slack community. Go to cribl.io/community. We have an amazing community. This was another thing that drew me to the company: a large group of people who are genuinely excited about data, about managing data. If you want to try Cribl out, we have some great tools. We have a cloud platform with one terabyte of free data. So go to cribl.io/cloud or cribl.cloud and sign up, and it just never times out. It's not 30 days, it's forever, up to one terabyte. Try out our new products as well, like Cribl Edge. And then finally, come watch Nick Heudecker and me every Thursday, 2 p.m. Eastern. We have live streams on Twitter, LinkedIn, and YouTube Live. My Twitter handle is ebaily1367. Love to chat, love to have these conversations. And also, we are hiring. All right, good stuff. Great team, great concepts. Of course, we're theCUBE here. We've got our video lake coming soon. 
I love this idea of having a video lake. Hey, video is data too, right? I mean, we've got to keep it. I love it. I love videos. It's awesome. It's a great way to communicate, a great way to have a conversation. That's the best thing about us, having conversations. I appreciate your time. Thank you so much, Ed, for representing Cribl here on Data as Code. This is season two, episode two of the ongoing series covering the hottest, most exciting startups from the AWS ecosystem, talking about the future of data. I'm John Furrier, your host. Thanks for watching. All right, thank you.