From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020. Special coverage sponsored by the AWS Global Partner Network.

Hello and welcome to theCUBE Virtual and our coverage of AWS re:Invent 2020, with special coverage of the APN partner experience. We are theCUBE Virtual, I'm your host, Justin Warren, and today I'm joined by Ed Walsh, CEO of ChaosSearch. Ed, welcome to theCUBE.

Well, thank you for having me. I really appreciate it.

Now, this is not your first time here on theCUBE. You're a regular here. Lovely to have you back.

No, I love the platform. You guys are great.

So let's start off by just reminding people about what ChaosSearch is and what you do there.

Sure. The best way to say it is that ChaosSearch helps our clients know better. We don't do that with some special anecdotal wizard or a widget that you give to your SecOps teams. What we do is the hard work of giving you a data platform that gets you insights at scale, and we do that by achieving the promise of data lakes. What we have is the Chaos data platform. It connects to and indexes data in a customer's S3 or Glacier account, so inside your data lake, not our data lake, and renders that data fully searchable and available for analysis using your existing tools, because what we do is index it and publish open APIs, like the Elasticsearch API and soon SQL.

To give you an example: based on those capabilities, we're an ideal replacement for commonly deployed Elasticsearch or ELK Stack deployments if you're hitting scale issues. So we talk about scalable log analytics, and more and more people are hitting these scale issues. Let's say you're using Elastic's Elasticsearch or Amazon Elasticsearch and you're hitting scale issues. What I mean by that is you can't keep enough retention, you want longer retention, or it's getting very expensive to keep that retention, or because of scale you're hitting availability problems where the cluster is hard to keep up and running, or it's crashing. That's what we mean by issues at scale.

And what we do is simple: because we publish the open Elasticsearch API, you use all your existing tools, but we save you about 80% off your monthly bill. We also give you, and it's an "and" statement, unlimited retention, as much as you want to keep on S3 or in Glacier. And we take care of all the hassles and the time it takes to manage these clusters, which sit on a database engine called Lucene, as a managed service. Probably the biggest thing is that all of this happens without changing anything your end users are using. We include Kibana, but imagine it's an Elastic API: whether you use the API or Kibana, it's just easy to use the same tools you use today, but you get the benefits of a true data lake. In fact, we're now running Elasticsearch on top of S3 natively, if that makes sense.

Right, natively is pretty cool. And look, 80% savings is a dramatic number, particularly this year, when I think there are a lot of people looking to save a few quid. So it'd be very nice to be able to save up to 80%. I am curious as to how you're able to achieve that kind of saving, though.

You know, you wouldn't be the first person to ask me that. So listen, Elastic came around when you had Splunk, and we also have a lot of Splunk clients, but Elastic was the more cost-effective, open-source way to go after the same problem.
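[To make the API-compatibility point above concrete, here is a minimal sketch of a query against an Elasticsearch-compatible endpoint. The endpoint URL, index name, and query fields are hypothetical placeholders, not details from the interview; the point is simply that any client speaking the standard Elasticsearch _search API, Kibana included, can be pointed at such a service unchanged.]

```python
# Minimal sketch: querying an Elasticsearch-compatible endpoint.
# ENDPOINT and INDEX are hypothetical placeholders; the request
# body is a standard Elasticsearch _search query.
import requests

ENDPOINT = "https://search.example.com"  # hypothetical endpoint
INDEX = "weblogs"                        # hypothetical index name

query = {
    "query": {
        "bool": {
            "must": [{"match": {"status": "error"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-90d"}}}],
        }
    },
    "size": 10,
}

resp = requests.post(f"{ENDPOINT}/{INDEX}/_search", json=query, timeout=30)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"])
```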
But what happens, especially at scale, is this. If it's small, it's actually very cost-effective, but underneath the Elastic stack, the ELK stack, is a Lucene database. It's database technology, and it sits on servers with heavy memory counts, CPU counts and SSDs. You can do that on-prem or in the cloud, but if you do it in Amazon, basically you're spinning up a server and it stays up; it doesn't spin up and spin down. And those clusters are not one server, they're a cluster of servers, and typically if you're at any scale you actually have multiple clusters, because you don't dare put different use cases on one.

So our savings come from the fact that you no longer need those servers to be spun up, and you don't need to pay for the Lucene underneath. You can still use Kibana and your API, but literally it's 80% off the bill you're paying for your service now, and it's hard dollars. We typically see clients save between 70 and 80%, up to 80, but it's right within a 10% margin: you're saving a lot of money.

More importantly, saving money is a great thing, but now you have one unified data lake that your users can go across, some of the data or all of the data, through role-based access. You can give different people different views. We've seen people say, hey, give that help desk person 14 days of this data, but the SecOps team gets to see across all the different machine-generated data they have. And I can give you a couple of examples of that and walk you through how people deploy it, if you want.

I'm always keen to hear specific examples of how customers are doing things. And it's nice that you've drawn that comparison there around what cloud is good for and what it isn't. I often like to say that AWS is cheap to fail in, but expensive to succeed in. So when people are actually succeeding with this and using this broad amount of data, what you're saying there with that saving is that I've actually got access to a lot more data that I can do things with. So yes, if you could walk through a couple of examples of what people are doing with this increased amount of data they now have access to in ChaosSearch, what are some of the things that people are now able to unlock with that data?

Well, it's always good to talk through customers of size, and we can go through as many as we want: Klarna, Blackboard, Alert Logic, Armor Security, HubSpot. I'll start with HubSpot. One of our good clients: they had one cluster they were using Elasticsearch for, to analyze Cloudflare data. They were looking at denial of service, and, as we find with everyone at scale, they got limited. They were down to five days of retention. Why? Well, it's not that they meant to; basically, they couldn't cost-effectively handle that at their scale. They were also having scale issues with the environment, with how they'd set up the cluster and sharding. And when a denial-of-service attack happened, that's when the influx of data came. One thing about scale is how fast it comes at you; another is how much data you have. As the data came in during a denial of service, that's when the cluster would actually go down, believe it or not, right when you need your log analysis tools. So what we did, because they were just using Kibana, was an easy swap. They ran in parallel, because we publish the open API, and we took them from five days to 90 days.
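[As a quick back-of-envelope check on the retention figures Ed gives next (roughly 10 TB a day kept for 90 days, "about a petabyte"), here is a small sketch. The S3 list price used is an illustrative assumption, not a figure from the interview.]

```python
# Back-of-envelope retention arithmetic for the example above:
# ~10 TB/day of ingest kept for 90 days.
ingest_tb_per_day = 10
retention_days = 90

total_tb = ingest_tb_per_day * retention_days  # 900 TB
print(f"Retained volume: {total_tb} TB (~{total_tb / 1000:.1f} PB)")

# Rough S3 Standard storage cost at an assumed list price of
# $0.023 per GB-month (illustrative; actual pricing varies by
# region, storage class, and compression).
price_per_gb_month = 0.023
monthly_cost = total_tb * 1000 * price_per_gb_month
print(f"Rough S3 storage cost: ${monthly_cost:,.0f}/month")
```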
They could keep as much as they want, but 90 days was what they wanted for denial of service. We saved them over $4 million a year in hard dollars on what they were paying for that environment, mostly savings on the server farm and a little on the Elasticsearch stack. More importantly, they've had no outages since.

Now, here's the thing about the use cases. They also had other clusters, and you find everyone does that; they don't dare put everything on one cluster, even though these are not one server but multiple servers. Cloudflare was the first use case: a 10-terabyte-a-day influx kept for 90 days, so about a petabyte. Then they brought another use case on, Netmon, network monitoring, which again had the same scale and retention issues, and they were able to easily roll that on, so now it's one data platform. Now they're adding the next one; they have about four different use cases that used to be different clusters, and they're bringing them together.

To give you the use cases: they got more cost-effective, they got more stability, and they freed up a lot of time, cost and complexity: the time to manage the clusters, get the data in, and deal with the complexity around it. The cost is easy to quantify, but it got better than that. One team only needs access to one set of data, but a second team wants to see across all the data, and now it's very easy for them to see across all of it, where before it was impossible. So now they have multiple large use cases streaming at them. And what I love about that particular case is that at one point they were just trying to test our scale, so they started tossing more things at it to see if they could break us. They spiked us up to 30 terabytes a day, and with Elastic, even 10 terabytes a day makes things fall over.

Now, if you think about what they just did: what we're doing is literally three steps. First, put your data in S3, as fast as you can; don't modify it, just put it there. Second, once it's there, connect to us: you give us read-only access to those buckets and a place to write the index (a sketch of what such a grant could look like appears below). All of that stuff stays in your S3; it never comes out. Third, you set up whether you want live, real-time analysis or want to go after old data. We do the rest: we ingest, we normalize the schema, and we give you RBAC, role-based access control, in a refinery to give the right people access.

So what they did is throw a whole bunch of stuff at it; they were trying to outrun S3. We're standing on the shoulders of giants. If you think about our platform, what's a better data lake for clients than S3? You're not going to get a better cost curve, right? You're not going to get better parallelism or security, and it's in your virtual environment. You can also keep data in the right location. Blackboard's a good example: they need to keep data in all the different regions, because it's personal data, GDPR, so they've got to keep data in each location. That's easy; we just put compute in each of the different areas they're in. The net-net, if you think of the architecture, is shoulders of giants. If you think you can outrun it with sheer volume, or you can find a more cost-effective place to keep data long term, or you have so much data that S3 and Glacier can't possibly store it, then you've got me.
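[To make the "connect to us" step sketched above a bit more concrete: below is one way a read-only bucket grant plus a writable index prefix could be expressed as an S3 bucket policy using boto3. The bucket name, index prefix, and service account are hypothetical, and the exact permissions ChaosSearch actually requires are an assumption; treat this as an illustration of the pattern, not the product's documented setup.]

```python
# Sketch: grant a service principal read-only access to a log
# bucket, plus write access to a dedicated index prefix that
# stays inside your own account. All identifiers are
# hypothetical placeholders.
import json
import boto3

LOG_BUCKET = "example-log-bucket"                      # hypothetical
SERVICE_PRINCIPAL = "arn:aws:iam::111122223333:root"   # hypothetical

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read-only access to the raw log data.
            "Sid": "ReadOnlyLogs",
            "Effect": "Allow",
            "Principal": {"AWS": SERVICE_PRINCIPAL},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{LOG_BUCKET}",
                f"arn:aws:s3:::{LOG_BUCKET}/*",
            ],
        },
        {
            # A place to write the index; it never leaves your S3.
            "Sid": "WriteIndexPrefix",
            "Effect": "Allow",
            "Principal": {"AWS": SERVICE_PRINCIPAL},
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{LOG_BUCKET}/index/*",
        },
    ],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=LOG_BUCKET, Policy=json.dumps(policy))
```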
You'd be at a bigger scale than me, but that's the scale we're talking about. So they spiked our throughput; what they really did is try to outrun S3. We kept up. The next thing they did is toss a bunch of users at us, and in our data fabric we just spin up different ways of doing the indexing to keep up with it; for new use cases you're going after, everyone gets their own worker nodes, which are all expected to fail in place. So again, they did some of that, but really what they said was: you guys handled all the influx. And if you think about it, it's the shoulders-of-giants thing, being on top of the Amazon platform, which is amazing. You're not going to get a more cost-effective data lake in the world, and it continues to fall in price. It's a cost curve like no other, and with all that resiliency, all that security, and the parallelism you can get out of S3 and Glacier, it's, bar none, the most scalable environment you can build on. What we do is a thin layer on top: a data platform that makes your data fully searchable and queryable using your own tools.

Right. And you mentioned there that you're running in AWS, which has broad experience in doing these sorts of things at scale, but there's the operational management side of things: as you mentioned, you take that off the hands of customers and run it on their behalf. What are some of the mistakes you see people making in trying to do this themselves, when you've gone into customers and brought them onto the ChaosSearch platform?

Yeah, so either people are just trying their best to build out clusters of Elasticsearch, or they're going to services like Logz.io, Sumo Logic or Amazon Elasticsearch Service, and those are all basically the same ELK stack. They have the exact same limits; it's the same bits. Then we see people saying, well, I really want to go to a data lake; I want to get away from these database servers, which have their limits. And we see a lot of people who, instead of using Elasticsearch, want to use SQL-type tools, so they put the data into Parquet, structure it, and query it with Presto or a Presto dialect. They go a long way out of their way so that, hey, the data's in the data lake, but they end up building these little islands inside their data lake, and it takes a lot of time to transform the data into a format that you can go after with tools.

We don't make you do that. You literally just put the data there, and we do the indexing and publish the API. Right now that's Elasticsearch; in a very short time we'll publish Presto, or rather the SQL dialect, and you can use the same tools. So we do see one group brute-forcing it, trying their best with a bunch of physical servers. And we do see another group that says, I want to go after Athena-type use cases, or one of the whole bunch of different startups saying "I do data lakes" or "I do data lakehouses," but what those really do is force you to put things into structure before you get insight. True data lake economics is literally: just put it there, then use your tools natively to go after it. That's where we're unique compared to what we see from our competition.

Hmm. So with people who have moved onto ChaosSearch, what's, let's say, if you can pick one, the most interesting example of what people have started to do with their data?
What's new? That's a good one. Well, I'll give you another one: Armor Security. Armor Security is a security services company, you know, thousands of clients, doing great. A beautiful platform, a beautiful business. And they won Rackspace as a partner. So now imagine a thousand clients, but at a massive scale they have to keep up with. That's another example where we were able to come in; they were facing a major upgrade of their environment just to keep up, and what they expose to their customers is how those customers do log analytics. What we were able to do, literally, simply because they didn't go below the API and they use the exact same tool set on top, is replace that use case in 30 days. It saved them a tremendous amount of dollars, but now they're also able to go back and offer unlimited retention. They used to restrict their clients to 14 days. Now they have the opportunity to do a bunch of different things, possible revenue opportunities among them, and it lets them look at their business differently and frees up their team to do other things. And now they're building other things into the same environment with us, because one, it's easy and it scales, but it also freed up their team. No one has enough team to do things.

And then the biggest thing, the interesting things people do with our product, actually happen in their own tools. We talk about Kibana; when we do SQL, we're going to talk about Looker and Tableau and Power BI. The really interesting part is that we did the hard work on the data layer, and I can talk about all the ways we're consolidating and the performance, but what becomes really interesting is what customers are doing at the visibility level, whether in Kibana, through the API, or in Tableau or Looker. The key thing for us is we just say: use the tools you're used to. Now, that might be a boring statement, but to me a great value proposition is not changing what your end users have to use. And they're doing amazing things. They're doing the exact same things they did before, just with more data at a bigger scale, and they're able to see across all their different machine-generated data instead of being limited to going after one thing at a time. Getting that correlation from a unified data lake is really what we get very excited about. What's most exciting to our clients is that they don't have to tell their users to use a different tool, and, you know, you can decide whether that's really interesting in this conversation. But again, I always say: we didn't build a new algorithm that you're going to give the SecOps team, or a new pipeline or cool widget that's going to help the machine learning team (though that's another API we'll publish). Basically, what we do is the hard work of making the data platform scalable and, more importantly, giving you the APIs you're used to. It's a platform where you don't have to change what your end users are doing, so we're kind of invisible behind the scenes.

Well, that's certainly a pretty strong proposition there, and I'm sure there's plenty of scope for customers to come and talk to you, because no one's creating any less data. So, Ed, thanks for coming onto theCUBE. It's always great to see you here.

No, thank you.

You've been watching theCUBE Virtual and our coverage of AWS re:Invent 2020, with special coverage of the APN partner experience.
Make sure you check out all our coverage online, on your desktop or on your phone, wherever you are. I've been your host, Justin Warren, and I look forward to seeing you again soon.