So-called big data promised to usher in a new era of innovation where companies competed on the basis of insights and agile decision-making. There's little question that social media giants, search leaders and e-commerce companies benefited. They had the engineering chops and the execution capabilities to take troves of data and turn them into piles of money. But many organizations were not as successful. They invested heavily in data architectures, tooling and hyper-specialized experts to build out their data pipelines, yet they still struggle today to truly realize the promise of their big data. Data in their lakes is plentiful, but actionable insights aren't. ChaosSearch is a cloud-based startup that wants to change this dynamic with a new approach designed to simplify and accelerate time to insights and dramatically lower costs. And with us to discuss this company and its vision for the future is CUBE alum Ed Walsh. Ed, great to see you. Thanks for coming back on theCUBE.

I always love to be here. Thank you very much. It's always a warm welcome. Thank you.

All right, so give us the update. You guys have had some big funding rounds. You're making real progress on the tech, taking it to market. What's new with ChaosSearch?

Sure, actually a lot of good and exciting things are happening. In fact, just this month we announced some pretty exciting things. We unveiled what we consider the industry's first multi-model data lake platform. In fact, if you want to show the image you can, but basically we allow you to put your data in S3, and then we activate that data. We do a full index of the data and make it available through open APIs. And the key thing about that is it allows your end users to use the tools they're using today. So simply put your data in your cloud object storage, think Amazon S3 or Glacier, with all the different data sets in their natural format.
And then we do the hard work. And the key thing is you get one unified data lake, but it's multi-model access. So we expose APIs like the Elasticsearch API, so you can do things like search or, using Kibana, do log analytics. But you can also do SQL, use Tableau or Looker, or bring relational concepts like joins into Kibana via the data backend. And it also allows you to do machine learning, which is coming early next year. What you get with that, because of the data lake philosophy, is no new transformations and no data movement. People typically land data in S3, and we're on the shoulders of giants with S3. There's not a better, more cost-effective, more resilient platform. There's not a better queuing system out there. And it's on a cost curve that you can't beat. So people store a lot of data in S3, but then what you have to do is ETL it out to other locations. We allow you to literally keep it in place. We index it in place. We write our hot index, which is a full rewrite index, let you go after that, and publish open APIs. What we avoid is the ETL process. Our index looks at the data and does full schema discovery and normalization. We're able to give sample sets. And then the refinery allows you to do advanced transformations using code, think SQL or Regex, to change that data, pull the data apart, hide things, and use role-based access to give that to the end user. But it's in a format their tools understand. Kibana will use the Elasticsearch API calls, but you can also use SQL and go directly after the data. By doing that, you get a data lake, but you haven't had to take the three weeks to three months to transform your data. Everyone else makes you, and you talked about the failure of data lakes. The idea of data lakes was: put your data there in a very scalable, resilient environment. Don't do transformation.
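To make the "open APIs" idea concrete, here is a minimal sketch of the kind of Elasticsearch-style search request a tool like Kibana issues. The endpoint URL, index name, and field names below are hypothetical illustrations, not ChaosSearch specifics; the point is that any tool speaking this query DSL could be served from an index built over S3 rather than from a Lucene cluster.

```python
import json

# Hypothetical endpoint; in the model described above, the same request
# would be answered from an index over S3 rather than a Lucene cluster.
ENDPOINT = "https://example-endpoint/app-logs/_search"

# Standard Elasticsearch query DSL: errors from the last 24 hours,
# bucketed by service name. Field names are illustrative.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"level": "ERROR"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-24h"}}}],
        }
    },
    "aggs": {"by_service": {"terms": {"field": "service.keyword"}}},
    "size": 0,  # aggregations only, no raw hits
}

body = json.dumps(query)  # this JSON is what gets POSTed to ENDPOINT
print(body)
```

Because the request shape is the standard query DSL, end users keep Kibana (or anything else that speaks the Elasticsearch API) unchanged, which is the compatibility claim being made here.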
It was too hard to structure for databases and data warehouses. Put it there; we'll show you how to get value out. That promise went largely undelivered, but we're that last mile. We do exactly that. Just put it in S3 and we activate it, and activate it with APIs that your analysts' tools use today, or whatever they want to use in the future. That is what's so powerful. So basically we're on the shoulders of giants with S3: put it there and we light it up. And that's really the last mile. But it's this multi-model access, and it's also this lack of transformation. We can do all the transformation, but it's all done virtually and available immediately. You're not doing extended ETL projects with big teams moving a lot of data around the enterprise. In fact, most of the time they land it in S3, then they move it somewhere, and then they move it again. And what we're saying is: no, just leave it in place. We'll index it and make it available.

Well, this is interesting, Ed. So the reason they want to move it, I mean, S3 was the original sort of object store for cloud, and it was a cheap bucket, okay? But it's become much more than that. When you talk to customers, they're like, hey, I have all this data in S3. I want to do something with it. I want to apply machine intelligence. I want to search it. I want to do all these things. But you're right, I often have to move it to do that. So that's a huge value. Now, are you available in the AWS Marketplace yet?

Yes, in fact, that was the other announcement we can talk about. Our solution is 100% available in AWS Marketplace, which is great for clients because they can burn down their credits with Amazon.

Yeah, that's super. Great news there. Now, let's talk a little bit more about data lakes. You know, the old tongue-in-cheek joke was that data lakes become data swamps. There's no schema on write. Oh, great, I can put everything into the lake. And then it's like, okay, now what?
So maybe double-click on that a little bit and provide more detail on your vision and your philosophy there.

So if you could put things in a data lake and get after it with your own tools, Elasticsearch or SQL, of course you'd do that, if you didn't have to go through all that. But everyone thinks the status quo is it. Everyone has to put it in some sort of schema in a database before they can get access. That's what everyone does; they move it someplace to do it. They're using 1970s and maybe 1980s technology and saying, hey, I'm going to put it in this database, it works on the cloud, and you can go after it. But you have to go through all the same pain of transformation, which is what takes, as we say, time, cost and complexity. To do a transformation for an end user takes a lot of time, and it also takes a team's time, with DBAs and data scientists, and it's not the only thing they have going on. So it takes three weeks to three months in an enterprise. It's cost and complexity. And with all these pipelines, for every data request you're trying to give them their own data set, it ends up being data puddles all over the place. It might be in your data lake, but it's all separated, hard to govern, hard to manage. What we do is stop that. We literally index in place. Your data is already in S3; typically you're ETLing it out. You can continue doing that; we're literally just one more user of the data. We do read-only access. We do not change that data. And you give us a place in your S3 to write our index, a full rewrite index. Once we've done that, the refinery lets you activate that data. It's immediately fully indexed and performant from Kibana. So you no longer have to take your data, move it, and run a pipeline into Elasticsearch, which becomes kind of brittle at scale.
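The "read-only source, virtual views" idea above can be sketched in a few lines: a transformation (Regex field extraction, masking a sensitive field) is applied lazily on read, and the raw records are never mutated. The log format, field names, and masking rule here are made-up illustrations, not the product's actual refinery syntax.

```python
import re

# Raw records stay untouched (read-only, like objects in S3);
# the structured view is computed on read.
raw_logs = [
    "2023-06-01T12:00:01 10.1.2.3 GET /checkout 200",
    "2023-06-01T12:00:02 10.1.2.4 POST /login 401",
]

# Regex-based "schema discovery": pull the line apart into named fields.
LINE = re.compile(
    r"(?P<ts>\S+) (?P<ip>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d+)"
)

def virtual_view(records, mask_ip=True):
    """Yield structured rows derived from raw lines without modifying them."""
    for line in records:
        m = LINE.match(line)
        if not m:
            continue  # a real system would surface these as schema outliers
        row = m.groupdict()
        if mask_ip:
            # "Hide things" for role-based access: mask the last octet.
            row["ip"] = re.sub(r"\d+$", "xxx", row["ip"])
        yield row

rows = list(virtual_view(raw_logs))
print(rows[0]["ip"])  # 10.1.2.xxx
```

Because the view is a function over the source rather than a copy of it, different roles can be handed different views of the same bucket with no second pipeline to maintain.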
You're able to have the scale of S3 but use the exact same tools you do today. And what we find for log analytics, it's a slightly different use case or value prop than BI or what we're doing with product-led companies, but for logs we're saving clients 50 to 80% in hard dollars every month. They're going from very limited data sets to unlimited data sets, whatever they want to keep in S3 or Glacier. And they're getting away from the brittle data layer, the Lucene environment. Any of those data layers holds you back, because it takes time to put data there, but more importantly it becomes brittle at scale, whereas you don't have any of those scale issues when using S3 as your data lake.

So what are the big use cases, Ed? You mentioned log analytics; maybe you could talk about that. And are there any others forming in the marketplace, any patterns that you see?

Because of the multi-model access, we can do a lot of different use cases, but we always work with clients on high-ROI use cases. Why? The big-bang theory of doing a data lake and putting everything in it has just proven not to work, right? So our first focus is the log analytics use case. Why? By the way, with COVID, everything hit a tipping point, right? People were bimodal: save money here, invest there. It went quickly to, no, no, we're going cloud native, we have to, and then on top of it, how do we efficiently innovate? So the tipping point happened; everyone's going cloud native. Once you go cloud native, the amount of machine-generated data that comes from the environment just explodes. You're not managing hundreds or thousands or maybe 10,000 endpoints; you're dealing with millions or billions. And you need that data to get insight out. So logs become one of the things you can't keep up with. I think I mentioned, we went to a group of end users.
It was only 60 enterprise clients, but we asked them: what's your capture rate on logs, and what do you want it to be? Actually, 78% said, listen, we want to capture 80 to 100% of our logs. That would be the ideal: not everything, but most of it. And then we asked the same group what they're actually doing. Well, 82% had less than 50%. They just can't keep up with it. And every other platform, including Elastic and Splunk, works hard in the process to narrow things down and keep less and less data. Why? Because they can't handle the scale. We just say: land it there, don't transform, and we'll make it all available to you. So for log analytics, especially with cloud native, you need this type of technology. And it feels so good when you stop hitting your head against a wall, right? This ETL process at this type of scale just doesn't work. So that's exactly what we're delivering. That's with the Elasticsearch API, but you can also use SQL to go after the same data representation, and when we come out with machine learning, you'll be able to do anomaly detection on the same data representation too. So for the log analytics use case, for SREs, DevOps, SecOps, it's a huge value prop. Now, the second use case: the same platform, because it has SQL exposed, lets you do what we've come to term agile BI. Think about Looker, Tableau, Power BI, Metabase, all these tool sets that people want to use in the business, where otherwise they come back to the centralized team every single week asking for new data sets. And each of those has to be set up as a data set; the team has to do an ETL process to give access to that data. Whereas because of the way we work, you just land it in the bucket, and if you have access to it with role-based access, I can literally get you access with your tool set, say Tableau or Looker, to these different data sets in five minutes, and now you're off and running.
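The "agile BI" point above is that the same log data, once exposed through SQL, can be queried by any BI tool. A minimal sketch of the kind of aggregation Tableau or Looker would issue, with sqlite standing in for the SQL endpoint; the table and column names are illustrative, not a real schema:

```python
import sqlite3

# sqlite stands in for a SQL endpoint over the same indexed log data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_logs (service TEXT, level TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO app_logs VALUES (?, ?, ?)",
    [("checkout", "ERROR", 310.0), ("checkout", "INFO", 42.0),
     ("search", "ERROR", 120.0), ("search", "ERROR", 95.0)],
)

# The kind of GROUP BY a BI dashboard generates: error counts and
# average latency per service.
rows = conn.execute(
    """SELECT service, COUNT(*) AS errors, AVG(latency_ms) AS avg_ms
       FROM app_logs
       WHERE level = 'ERROR'
       GROUP BY service
       ORDER BY errors DESC"""
).fetchall()
print(rows)  # [('search', 2, 107.5), ('checkout', 1, 310.0)]
```

The point of the multi-model claim is that this SQL view and the earlier search-style access run against one data representation, not two pipelines.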
And if you want a new data set, they give you another virtual view and you're off and running, but with full governance. It used to be in BI that you either had self-service or centralized. Self-service would kind of get out of control; with the centralized team it takes months, but at least I'm getting control. We allow you to do both: fully governed, but self-service.

Right, right. I've got Cognos, I've got Tableau, or I've got Excel, right? And it's like a trade-off on each of the pieces of the triangle.

Right, and they make it easy: we'll just put it in a data source and you're done. But the problem is you have to ETL into that data source, and that's what takes the three weeks to three months in an enterprise, and we do it virtually in five minutes. So now the third use case, which is actually kind of a combination of the two: I love the beers-and-diapers story. Think about the early days of Teradata, where they looked at sales-out data for a business, all of it in a large relational environment, and they crunched all these numbers and figured out that by placing products differently in the store they'd sell more of particular things. And the one analogy that came out of it, which everyone talked about, was beers and diapers: put them together and you sell more of both. Why? Because in the afternoon, for anyone that has kids, you picked up diapers and you might want to grab a beer if you're home with the kids. That analogy is 30 years old. Now, what's the shelf space for a product-led company? It's the website. And what's the data coming from there? It's the app logs. And you're not capturing them, because you can't in these environments, or you're capturing the data but everyone's telling you, no, no, you've got to do an ETL process to keep less data. You've got to select, you've got to be very specific, because otherwise it's going to kill your budget.
You can't do that with Elastic or Splunk; you've got to keep less data, and you don't even know the questions you're going to ask yet. With us, bring all the app logs, just land them in S3 or Glacier, which really is standing on the shoulders of giants, right? There's not a better platform for cost, security, resilience or throughput. Think about what you can stream into it; it's the best queuing platform I've ever seen in the industry. Just land it there, and it's also very cost effective. We also compress the data. Now you match that up with a relatively small amount of relational data and you have the exact same beers-and-diapers analysis, but instead it's: this user's using that use case, and our top users always start with this one, then they use that feature and that feature. Hey, we just did new pricing; it's affecting these clients and those clients. By doing this, you get that insight, but you need the data, and people aren't able to capture it with the current platforms. A data lake, as long as you can make it available hot, is the way to do it. And that's what we're doing, and we're unique in that. Other people make you ETL it and put it in a 1970s or 1980s data format called a schema. We avoid that because we basically make S3 a hot analytic database.

So okay, I want to land on that for a second, because I think sometimes people get confused. I know I do sometimes about ChaosSearch; I sometimes don't know where to put you. I'm like, okay, observability, that seems to be a hot space, and of course log analytics is part of that. BI, agile BI, you called it. But there are players like Elastic, there's Starburst, there's Datadog, Databricks, Dremio, Snowflake. I mean, where do you fit? What's the category, and how do you differentiate from players like that?

Yeah, so we went about it fundamentally differently than everyone else.
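The beers-and-diapers analysis Ed describes for app logs boils down to co-occurrence counting: which features show up together in the same user session. A toy sketch with made-up session data (the session grouping and feature names are illustrative, not an actual product workflow):

```python
from collections import Counter
from itertools import combinations

# Made-up sample data: features touched per user session,
# as would be extracted from app logs.
sessions = {
    "u1": ["pricing", "compare", "checkout"],
    "u2": ["pricing", "checkout"],
    "u3": ["search", "compare", "checkout"],
    "u4": ["pricing", "compare"],
}

# Count every unordered pair of features that co-occurs in a session.
pairs = Counter()
for features in sessions.values():
    for a, b in combinations(sorted(set(features)), 2):
        pairs[(a, b)] += 1

# The most frequent pairs are the "beers and diapers" of the website.
print(pairs.most_common(3))
```

The argument in the passage is that this only works if you kept all the app logs in the first place: you can't count co-occurrences for questions you didn't anticipate if the pipeline already threw the data away.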
Six years ago, Tom Hazel and his merry band of men and women designed it from scratch. They purpose-built it to make S3 a hot analytic environment with open APIs. By doing that, they kind of changed the game. So we deliver on the true promise: just put it there and I'll give you access to it. No one else does that. Everyone else makes you move the data and put it in a schema of some format to get at it. So if you look at Elasticsearch, why are we going after that? It just happens to be an easy entry point: logs are overwhelming you once you go cloud native. You can't afford to put it all in Lucene, the inverted index under Elasticsearch in the ELK stack. Start small, great. But once you grow, it's not one server anymore, it's five servers, fifteen servers; you lose a server and you're down for three days because you have to rebuild the whole thing. It becomes brittle at scale, and expensive. So you trade off: I'm going to keep less, either in retention or in data. So with Elastic, we have no Elastic under the covers, but we'll index the data in S3 and you can access it directly through a Kibana interface or an OpenSearch interface. It's just an API out: open APIs. And by doing that, you've avoided a whole bunch of time, cost and complexity: your team's time to do it, but also the time to results, the delays, and the costs, which are crazy. We're saving 50 to 80% in hard dollars while giving you unlimited retention, where you were dramatically limited before us. And it's a managed service: you don't have to manage that kind of cluster. When it starts small, it's great; at scale, it's a terrible environment to manage. You end up with not one Elasticsearch cluster, but dozens. I just talked to someone yesterday that had 125 Elasticsearch clusters because of the scale.
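For readers unfamiliar with the term, the inverted index mentioned above (the core structure Lucene maintains under Elasticsearch) maps each term to the documents containing it. A toy version, with made-up documents:

```python
from collections import defaultdict

# Made-up sample documents (e.g., log lines with ids).
docs = {
    1: "connection timeout on checkout service",
    2: "checkout service restarted",
    3: "search latency spike",
}

# Inverted index: term -> set of document ids containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(*terms):
    """Documents containing all terms (AND semantics via set intersection)."""
    results = [index.get(t, set()) for t in terms]
    return set.intersection(*results) if results else set()

print(sorted(search("checkout", "service")))  # [1, 2]
```

At cluster scale, this structure is what has to be sharded across nodes and rebuilt when a node is lost, which is the brittleness being described; the approach in this interview keeps the index in object storage instead of on a fixed set of servers.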
So anyway, that's where Elastic fits: if you're using Elastic at scale and you're struggling with the trade-offs of cost, time and scale, we become a natural fit, and you don't have to change what your end users do.

Yeah, but here's the thing, Ed. If people hear this, they'll go, wow, that sounds so simple. Why doesn't everybody do this? The reason is it's not easy. You said Tom and his merry band. This is really hardcore tech, and it's not trivial what you've built. Talk about your secret sauce.

Yeah, so it is patented technology. If you look at our component architecture, a large part of it, 90% of the value, is actually S3. I've got to give S3 full kudos; they built a platform and we're on the shoulders of giants. But what we did is purpose-build to make an object store a hot analytic database. So we have an index, like a database, and we built that layer. We bring a refinery to be able to do all the advanced types of transformation, but all virtually, because we're not changing the source of record; we're changing the virtual views. And then a fabric allows you to manage it and be fully elastic. So if we have big queries, because we have multiple clients with multiple use cases, each with multiple petabytes, we're spinning up 1,800 different nodes to go after a particular environment. But even with all that, we're saving them 58%. It's really the patented technology that does this. It took us six years; by the way, that's what it takes to come up with this. I came upon it because I knew the founder; I've known Tom Hazel for a while. His first thing was to figure out the math, and the math worked out. It's deep tech, it's hard tech. But the key thing is we've been in market now for two years, with multiple use cases in production at scale. Now what we're doing is roadmap: we're adding APIs. So now we have Elasticsearch as a natural proof point.
Now you're adding SQL, which opens up new markets. But the idea, for the person dealing with this: we believe we deliver on the true promise of data lakes. And the promise of data lakes was put it there, don't focus on transforming, it's just too hard, and I'll get insights out. That's exactly what we do, and we're the only ones that do it. Everyone else makes you ETL it to other places. That's the innovation of the index and the refinery: they let us index in place and give virtual views in place, at scale. And then the open APIs, to be honest, I think that's a game changer. Give me an open API and let me go after it; I don't know what tool I'm going to use next week. Every time we go into an account, they're not a Looker shop or a Tableau shop or a QuickSight shop. They're all of them, and they're just trying to keep up with their businesses. And then the ability to have role-based access, where you can actually say, hey, give them their own bucket, give them their own refinery. As long as they have access to the data, they can do their own manipulation. That's the true promise of data lakes. Once we come out with machine learning next year, you'll run through the same index. And the way we structure the data in matrices is a natural fit for things like TensorFlow and PyTorch. But that's next year, just because it's a different persona. The underlying architecture has been built. What we're doing is taking it a use case at a time. We work with our clients and say, it's not a big bang. Let's nail a use case that works well: great ROI, great business value for a particular business unit. And then let's move to the next. That's how I think it's going to be realized. Gartner talks about this: if you think about what really got successful in data warehousing in the past, it wasn't the big bang. It was, let's go and nail it for particular users.
And that's what we're doing now. Because it's multi-model, there are a bunch of different use cases, but even then we're focusing on these core things that are really hard to do with relational-only environments.

Yeah, well, I can see why you're excited. You and I have talked about the API economy forever, and you've been in the storage world so long you know what a nightmare it is to move data. We've got to jump, but I want to ask you, and I want to be clear on this: you're cloud native. I talked to Frank Slootman maybe a year ago, and I asked him about on-prem, and he's like, no, we're never doing the halfway house; we are cloud all the way. I think you have a similar answer. What's your plan on hybrid?

There's nothing about the technology that couldn't go on-prem, but we are 100% cloud native. We're only in the public cloud. We believe that's the trend line; everyone agrees with us. We're sticking there; that's where the opportunity is. And if you're going to run analytics, there's nothing better than the public cloud, like Amazon, to do exactly that. We're 100% cloud native. We love S3, and what better place to put your data than S3? We just let you light it up. And, if I can add the commercial: buy it through AWS Marketplace, which is a business model with Amazon that we love.

It's great, Ed. Thanks so much for coming back to theCUBE and participating in the startup showcase. Love having you, and best of luck. Really exciting.

Hey, thanks again. Appreciate it.

All right, thank you for watching, everybody. This is Dave Vellante for theCUBE. Keep it right there.