Welcome to theCUBE, I'm Dave Vellante. Today we're going to explore the ebb and flow of data as it travels into the cloud and the data lake. The concept of data lakes was alluring when it was first coined last decade by Pentaho CTO James Dixon. Rather than being limited to the highly structured and curated data that lives in a relational database, in the form of an expensive and rigid data warehouse or data mart, a data lake is formed by flowing data from a variety of sources into a scalable repository, like, say, an S3 bucket. Anyone can access it, dive into it, extract water, AKA data, from that lake, and analyze data that's much more fine-grained and less expensive to store at scale. The problem became that organizations started to dump everything into their data lakes with no schema on write, no metadata, no context — just shoving it into the data lake to figure out what's valuable at some point down the road. Kind of reminds you of your attic, right? Except this is an attic in the cloud, so it's too big to clean out over a weekend. Well, look, it's 2021 and we should be solving this problem by now. A lot of folks are working on this, but often the solutions add other complexities for technology pros. So to understand this better, we're going to enlist the help of Chaos Search CEO Ed Walsh and Thomas Hazel, the CTO and founder of Chaos Search. We're also going to speak with Kevin Miller, who's the vice president and general manager of S3 at Amazon Web Services — and of course, they manage the largest and deepest data lakes on the planet. And we'll hear from a customer to get their perspective on this problem and how to go about solving it. But let's get started. Ed, Thomas, great to see you. Thanks for coming on theCUBE.

Likewise. Face to face. It's really good to be here. It is nice face to face. That's great.

So Ed, let me start with you. We've been talking about data lakes in the cloud forever. Why is it still so difficult to extract value from those data lakes?

Good question. Data analytics at scale has always been a challenge, right? We're making some incremental changes, but as you mentioned, we need to see some step-function changes. In fact, that's the reason Chaos Search was founded. If you look at it, it's the same challenge around data warehouses or data lakes: really, it's not just flowing the data in, it's how to get insights out. It kind of falls into a couple of areas. The business side will always complain, and it's kind of uniform across everything in data lakes and everything in data warehousing. They'll say, hey, listen, I typically have to deal with a centralized team to do that data prep, because it's data scientists and DBAs. Most of the time they're a centralized group — sometimes they're in business units, but most of the time, because they're scarce resources, they're kept together. And then it takes a lot of time. It's arduous, it's complicated, it's a rigid process. I have to deal with a team, it's hard to add new data, and it's very hard to share data. And there's no way to do governance without locking it down. And of course they'd like more self-service. So you hear that from the business side constantly. Now underneath, there are some real technology issues: we haven't really changed the way we do data prep since the 2000s, right? So if you look at it, it falls into two big areas. One is how to do data prep. A request comes in from a business unit: I want to do XYZ with this data.
I want to use this type of tool set to do the following. Someone has to be smart about how to put that data in the right schema, as you mentioned. You have to put it in the right format that the tool sets can analyze before you do anything. And then the second thing — I'll come back to the first, because that's the biggest challenge — but the second challenge is how these different data lakes and data warehouses are persisting data, the complexity of managing that data, and also the cost of computing on it. And I'll go through that.

Basically, the biggest thing is actually getting from raw data to something usable. The rigidness and complexity the business side runs into is that literally someone has to do this ETL process: extract, transform, load. A request comes in — I need so much data, shaped this way — and to put it together they're literally physically duplicating data and assembling it into a schema. They're stitching together almost a data puddle for each of these different requests. And any time that has to happen, someone has to do it, and those skilled resources are scarce in the enterprise, right? It's DBAs and data scientists. And then when they want new data — you give them a data set and they're always saying, well, can I add this data? Now that I've seen the reports, I want to add this data, fresher — the same process happens all over again. This takes about 60 to 80% of data scientists' and DBAs' time; it's pretty well documented. And this is what actually grinds the process to a halt. That's what's rigid. They have to be rigid, because there's a process around it. That's the biggest challenge of doing this, and in the enterprise it takes weeks or months. I always say three weeks or three months, and no one challenges me on that. It also consumes the same skill set of people you want driving digital transformation, data warehousing initiatives, modernization, being data-driven — all the data scientists and DBAs you don't have enough of. So this is not only hurting you in getting insights out of your data, like in data warehousing; this resource constraint is hurting you too. The smallest atomic unit is that team, that super-specialized team, right?

Right. Okay. So you guys talk about activating the data lake for analytics. What's unique about that? What problems are you solving? What's the magic sauce you guys created?

Basically there are a lot of things. The biggest one I'd highlight is how to do the data prep, but also how you're persisting and using the data. In the end, there are a lot of challenges in getting analytics at scale, and this is really where Thomas founded the team to go after this. But I'll try to say it simply, and I'll compare what we do to what you'd do with maybe an Elastic cluster or a BI cluster. What we do: your data simply sits in S3. Don't move it, don't transform it. In fact, we're against data movement. You really just point us at that data, and we index it and make it available in a data representation where you can give virtual views to end users. And those virtual views are available immediately, over petabytes of data, and they actually get presented to the end user as an open API. So if you're an Elasticsearch user, you can use all your Elasticsearch tools on this view (a rough sketch of what that looks like follows).
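As a hedged illustration of that open-API idea, here's what querying a published virtual view could look like with standard Elasticsearch query DSL. This is a sketch, not Chaos Search's documented interface; the endpoint, view name, and field names are assumptions for illustration only.

```python
import json
import requests

# Hypothetical Elasticsearch-compatible endpoint for a published virtual view;
# the host, view name, and fields below are illustrative assumptions.
ENDPOINT = "https://search.example.com"  # would be your provider's API host
VIEW = "app-logs-view"                   # a virtual view exposed as an index pattern

# Standard Elasticsearch query DSL: count error events from the last 24 hours,
# bucketed by service. Any ES-compatible tool could issue the same request.
query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"level": "error"}},
                {"range": {"@timestamp": {"gte": "now-24h"}}},
            ]
        }
    },
    "aggs": {"by_service": {"terms": {"field": "service.keyword", "size": 10}}},
}

resp = requests.post(
    f"{ENDPOINT}/{VIEW}/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
    timeout=30,
)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["by_service"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```

The point being made in the interview is that nothing here is product-specific: the same Kibana dashboards or client libraries an Elasticsearch shop already uses would simply point at the view instead of a self-managed cluster.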
If you're a SQL user — Tableau, Looker, all the different tools — same thing, and the same with machine learning next year. So what we do is take it and make it very simple. Simply put, it's already there. Point us at it, we do the hard work of indexing and making it available, and then we publish the open APIs so your users can use exactly what they use today.

I'll give you a before and after. Let's say you're doing Elasticsearch, you're doing log analytics at scale. They're landing their data in S3, and then they're ETL-ing: physically duplicating and moving data, and typically deleting a lot of data, to get it into a format that Elasticsearch can use. They're persisting it in a data layer called Lucene. It's physically sitting in memory, CPU, SSDs — and it's not one of them, it's a bunch of those. In the cloud, you have to stand them up, and because they're persisting, they stay up seven by 24. Not a very cost-effective way to do cloud computing. What we do, in comparison, is literally point at the same S3. In fact, you can run completely in parallel: the data in S3 is being ETL-ed out, and we're just one more use case — read-only — allowing you to get at that data and make those virtual views. So we can run completely in parallel. But what happens is, we just give a virtual view to the end users. We don't need that persistence layer — that extra cost layer, that extra time, cost, and complexity.

Look at what happens in Elastic: they have a trade-off between how much you can keep and how much you can afford to keep, and it also becomes unstable at times, because you have to build out a schema and it lives on servers. The more the schema scales out, guess what? You have to add more servers — very expensive, up seven by 24 — and they also become brittle: you lose one node and the whole thing has to be put back together. We have none of that cost and complexity. You keep whatever you want to keep on S3, a single persistence layer, very cost-effective. And on cost, we save 50 to 80%. Why? We don't go with the old paradigm of setting it up on servers, spinning them up for persistence, and keeping them up seven by 24. We literally ask what you want to keep, we bring up the right compute resources, and then we release those resources after the query is done. So we can do some queries that they can't imagine at scale, but we're able to do the exact same query at 50 to 80% savings, and they don't have to do any of the toil of moving the data or managing that layer of persistence, which is not only expensive, it becomes brittle.

And then — I'll be quick — once you go to BI, it's the same challenge, but with BI systems the requests are constantly coming from a business unit down to the centralized data team: give me this flavor of data, I want to use this analytic tool on that data set. So they have to do all this pipelining. They're constantly saying, okay, I'll give you this data and this data; I'm duplicating that data, moving it and stitching it together. And the minute you want more data, they do the same process all over again. We completely eliminate that. And those requests queue up.

Thomas, Ed had me at "you don't have to move the data." That's kind of the exciting piece here, isn't it?

Absolutely. You know, the data lake philosophy has always been solid, right? But the promise came with that Hadoop hangover, right?
Where, let's say, we were using that platform in a few too many varieties of ways. I always believed in the data lake philosophy. When James came and coined that, I'm like, that's it. However, HDFS — that wasn't really a service. Cloud object storage is a service: the elasticity, the security, the durability, all those benefits are really why we founded on cloud object storage as our first move.

So Ed was talking, Thomas, about being able to essentially shut off the compute so you don't have to keep paying for it. But there are other vendors out there. Snowflake does something similar — separating compute from storage; they're famous for that. And you have Databricks out there doing their lakehouse thing. Do you compete with those? How do you participate, and how do you differentiate?

Well, you know, you've heard these terms: data lake, warehouse, now lakehouse. What everybody wants is simple in, easy in. However, the problem with data lakes was the complexity of getting value out. So I said, what if you could have the easy in and the value out? If you look at, say, Snowflake as a warehousing solution, you have to do all that prep and data movement to get into that system, and then it's rigid, it's static. Databricks, that lakehouse, has the exact same thing: sure, they have a data lake philosophy, but their data ingestion is not a data lake philosophy. So I said, what if we had that simple in, with a unique architecture and index technology that makes it virtually accessible, publishable, dynamically, at petabyte scale? And so our service connects to the customer's cloud storage. They just stream the data in, set up what we call a live indexing stream, then go to our data refinery and publish views that can be consumed in the Elastic API — use Kibana, Grafana — or as SQL tables — Looker, or say Tableau. And so we're getting the benefits of both sides: schema-on-read write performance with schema-on-write read performance. If you do that, that's the true promise of a data lake. Again, nothing against Hadoop, but schema on read, with all that complexity of software, was what made the data swamp.

Well, Hadoop got us started — okay, so we've got to give it props — but everybody I talk to has got this big bunch of Spark clusters, now saying, all right, this doesn't scale, we're stuck. And so, you know, I'm a big fan of Zhamak Dehghani and her concept of the data mesh, and it's early days, but if you fast-forward to the end of the decade, what do you see as the critical components of this notion people call data mesh? And you've got the analytics stack. You're a visionary, Thomas. How do you see this thing playing out over the next decade?

I love her thought leadership. To be honest, her core principles were our core principles, you know, five, six, seven years ago. This idea of decentralized data as a product, self-serve, and federated computational governance — all of that was in our core principles. The trick is how you enable that mesh philosophy. I could say we're mesh-ready, meaning that we can participate in a way that very few products can. If there are gates getting data into your system — the ETL-ing, the schema management — well, my argument with the data mesh is that producers and consumers should have the same rights. I want the consumer — the people — to choose how they want to consume the data, as well as the producer publishing it. I could say our data refinery is that answer.
You know, shoot, I'd love to open up a standard, right? Where we can really talk about producers and consumers and the rights each has. I think she's right on in the philosophy. And I think, as products mature in this cloud and these data lake capabilities, the trick is those gates. If you have to structure up front, if you set up those pipelines, the chance of getting your data into a mesh is the weeks and months that Ed was mentioning.

Well, I think the problem with data mesh today is the lack of standards. When you draw the conceptual diagrams, you've got a lot of lollipops, which are APIs, but they're all unique primitives. So there aren't standards by which, to your point, the consumer can take the data the way he or she wants it and build their own data products, without having to tap people on the shoulder to ask, how can I use this? Where does the data live? And being able to add their own data — that's kind of the future.

You're exactly right. So say I'm an organization and I'm generating data. Wouldn't it be great just to stream it to a lake? And then the service — okay, our search service — makes the data discoverable and configurable by the consumer. Let's say you go to the grocery store: I want to make a certain meal tonight, I want to pick and choose what I want, how I want it. Imagine if the data mesh truly could offer that breadth of information — all the things you can buy at a grocery store — for whatever you want to make for dinner. And if it's static, if you have to call up your producer to make the change, was it really a data mesh-enabled service? I would argue not.

Ed, bring us home.

Well, maybe one more thing on this. Yeah, please, yeah. Because some of this we're talking about for 2031, but largely these principles are what we have in production today, right? Even the self-service, where you can actually have business context on top of a data lake — we do that today. We talked about getting rid of the physical ETL, which is 80% of the work, but the last 20% is done by this refinery, where you can do virtual views, the right RBAC, and do all the transformations needed and make it available. And you can actually give that as a role-based access service to your end users — actually, your analysts. You don't have to be a data scientist or DBA. In the hands of a data scientist or DBA it's powerful, but the fact of the matter is you don't have to be one. In fact, all of our employees, regardless of seniority, whether they're in finance or in sales, actually go through and learn how to do this. And they can come up with their own views, which is one of the things about data lakes: the business unit wants to do it themselves, but more importantly, because they have the context of what they're trying to do. Instead of queuing up a very specific request that takes weeks, they're able to do it themselves. And without having to put it in different data stores and ETL it, they can do things in real time or near real time. That's game-changing, and something we haven't been able to do, ever.

And then, maybe to wrap it up: listen, eight years ago Thomas and his group of founders came up with a concept — how do you actually get after analytics at scale and solve the real problems? And it's not one thing. It's not just S3. It's all these different things. And what we have in market today is the ability to literally just stream it to S3 (a rough sketch of that landing step follows).
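To make that landing step concrete — a generic sketch of streaming logs into a bucket, assuming nothing product-specific; the bucket name and key layout are illustrative conventions, not requirements — batches of JSON log events might be gzipped and written to S3 like this:

```python
import gzip
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-lake"  # hypothetical bucket name

def ship_log_batch(service: str, events: list) -> str:
    """Gzip a batch of JSON log events and land it in S3, date-partitioned."""
    now = datetime.now(timezone.utc)
    # One object per batch; a service/yyyy/mm/dd prefix is a common log-lake
    # convention that keeps later indexing and lifecycle rules simple.
    key = f"logs/{service}/{now:%Y/%m/%d}/{now:%H%M%S%f}.json.gz"
    body = gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
    s3.put_object(Bucket=BUCKET, Key=key, Body=body,
                  ContentType="application/json", ContentEncoding="gzip")
    return key

# Example: land two events; an indexer can then be pointed at the prefix.
ship_log_batch("checkout", [
    {"@timestamp": datetime.now(timezone.utc).isoformat(),
     "level": "error", "msg": "payment timeout"},
    {"@timestamp": datetime.now(timezone.utc).isoformat(),
     "level": "info", "msg": "retry succeeded"},
])
```

Once raw objects are landing this way, everything downstream in the discussion — indexing, virtual views, open APIs — operates against the bucket rather than against a copy of the data.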
By the way, it's simple to do. What we do is automate the process of getting the data into a representation that you can now share and augment, and then we publish open APIs so people can use the tools they want. First use case: log analytics. Hey, it's easy to just stream your logs in, and we give you Elasticsearch-type services. Same thing now with SQL, and you'll see machine learning next year. So listen, I think we have data lake 3.0 now, and we're just stretching our legs. A lot of fun.

Well, you had to say log analytics, but I really do believe in this concept of building data products and data services — because I want to sell them, I want to monetize them — and being able to do that quickly and easily, so people can consume them: that's the future. So guys, thanks so much for coming on the program. Really appreciate it.

All right, in a moment, Kevin Miller of Amazon Web Services joins me. You're watching theCUBE, your leader in high-tech coverage.

This is Thomas Hazel, founder and CTO here at Chaos Search. I'm going to demonstrate a new feature we're offering this quarter called JSON Flex. If you're familiar with JSON data sets, they're wonderful ways to represent information. They're multi-dimensional; they have the ability to set up arrays as attributes. But those arrays are really problematic when you need to expand them, or flatten them, to do any type of Elasticsearch or relational access, particularly when you try to do aggregations. And so the common process is to exclude those arrays, or pick and choose that information. But with this new JSON Flex capability, our system uniquely can index the data horizontally, in a very small and efficient representation, and then, with our Chaos Refinery, expand each attribute as you wish, vertically, so you can do all the basic and natural constructs you would have done if you had a straightforward two-dimensional, three-dimensional type of representation.

So without further ado, I'm going to get into this presentation of JSON Flex. Now in this case, I've already set up the service to point to a particular S3 account that has CloudTrail data — one that is pretty problematic when it comes down to flattening data. And again, if you know CloudTrail, one row can become 10,000 as data gets flattened. So when you first log into the Chaos Search service, you'll see a tab called storage. This is the S3 account, and I have a variety of buckets. I have a refinery — it's a data refinery; it's where we create views, or lenses, into these index streams that you can do analysis on, published in the Elastic API as an index pattern, or as a relational table in SQL. A particular bucket I have here holds a whole bunch of demonstration data sets that we use to show off our capabilities and our offering. In this bucket, I have CloudTrail data, and I'm going to create what we call an object group. An object group is an entry point, a filter of which files I want to index. Now the data can be statically there, or live-streamed in. These object groups have the ability to say what type of data you want to index on. Through our wizard, you can type in a prefix; in this case, I type in "cloudtrail," and you see here I have a whole bunch of CloudTrail data. I'm going to choose one file to make it quick and easy, but this particular CloudTrail data will expand, and we can show the capability of this horizontal-to-vertical expansion. So I walk through the wizard, and as you can see here, we discovered JSON.
It's a GZIP file. I leave flattening unlimited, because we want to be able to expand infinitely, but in this case, instead of the default virtual, I'm going to horizontally represent this information. This uniquely compresses the data in a way that can be stored efficiently on disk, but then expanded in our data refinery upon query or search requests. So I'm going to create this object group. I'm going to call it "JSON flex test," and I could set up live indexing — SQS pub/sub — but I'm going to skip that, skip retention, and just create it. Once this object group is created, you kind of have to think of it as a virtual bucket, because it does filter the data; as you can see here, when I look at the view, I just see CloudTrail. But within the console, I can say start indexing. Now, this is static data — it could be a live stream — and we set up workers to index it. Whether it's one file or a million files, one terabyte or one petabyte, we index the data. We discover all the schema, and as you can see here, we discovered 104 columns.

Now, what's interesting is that we represent this expansion in a horizontal way. If you know CloudTrail — record zero, record one, record two — this can expand pretty dramatically if you fully flatten it, but in this case we represent it horizontally in the index. So when I go into the data refinery, I can create a view. If you know the data refinery of Chaos Search, you can bring multiple data streams together, do transformations virtually, do correlations, but in this case I'm just going to take this one particular index stream we call JSON flex, walk through our wizard — we try to simplify everything — and select a particular attribute to expand. Again, we represent this in one row, but if you had arrays and did all the permutations, it could go from one to 100 to 10,000. We had one JSON object that went from one row to one million rows (a quick sketch below makes that explosion concrete). Now, clearly you don't want to create all those permutations when you're trying to put it into a database. With our unique index technology, you can do it virtually and store it horizontally.

So let me just select virtual and walk through the wizard. As I mentioned, we can do all these different transformations, change schema — we're going to skip all that — and select the order time records event and say create. I'm going to name it "JSON flex view." I can set up caching and do a variety of things; I'm going to skip that. And once I create this, it's now available in the Elastic API as an index pattern, as well as in SQL via our Presto API dialect, so you can use Looker, Tableau, et cetera. But in this case, we go to the analytics tab, where we've built in the Kibana OpenSearch tooling that's Apache 2.0. I click on discovery here and select that particular view. Again — oops — it looks like an index pattern. And I'm going to choose — let's see here — let's choose 15 years, past to present, to make sure I capture when it was actually timestamped. And what you'll see here is, sure, it's just one particular data set with a variety of columns, but unlike that records-zero, records-one representation, now it's expanded. It has been expanded like the vertical flattening you would traditionally do if you wanted anything in an Elastic or relational construct to fit into a table format. Now, the advantage of JSON Flex: you don't have that stored as a blob that you work on with proprietary JSON APIs.
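To make that one-row-to-many explosion concrete, here's a toy illustration in Python. This is just the arithmetic of fully flattening array-valued attributes — the cross-product of the arrays — and is in no way how Chaos Search stores or indexes anything; the record shape is invented for illustration.

```python
from itertools import product

# A toy CloudTrail-like record: one logical row with two array-valued attributes.
record = {
    "eventName": "AssumeRole",
    "resources": [f"arn:aws:iam::123456789012:role/r{i}" for i in range(100)],
    "tags": [f"tag{i}" for i in range(100)],
}

def flatten(rec: dict) -> list:
    """Fully flatten: emit one output row per combination of array elements."""
    scalars = {k: v for k, v in rec.items() if not isinstance(v, list)}
    arrays = {k: v for k, v in rec.items() if isinstance(v, list)}
    keys = list(arrays)
    return [
        {**scalars, **dict(zip(keys, combo))}
        for combo in product(*arrays.values())
    ]

rows = flatten(record)
print(len(rows))  # 100 * 100 = 10,000 rows from a single source record
```

Materializing those 10,000 rows (or a million, with bigger arrays) in a database is exactly the cost the demo's horizontal representation avoids: the expansion happens virtually, at query time.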
You can use your native Elastic API or your native SQL tooling to get access naturally, without the expense of that explosion, and without the complexity of ETL-ing it and picking and choosing before you actually put it into the database. That completes the demonstration of Chaos Search's new JSON Flex capability. If you're interested, come to chaossearch.io and set up a free trial. Thank you.

Welcome back. I really liked that drill-down on data lakes with Ed Walsh and Thomas Hazel. They're building some cool stuff over there. The data lake, we see, is evolving, and Chaos Search has built some pretty cool tech to enable customers to get more value out of data that's in lakes, so that it doesn't become stagnant. Time to dig deeper, dive deeper into the water. We're here with Kevin Miller, who's the vice president and general manager of S3 at Amazon Web Services. We're going to talk about activating S3 for analytics. Kevin, welcome, good to see you again.

Yeah, thanks, Dave. It's great to be here again.

So S3 was the very first service offered by AWS 15 years ago. We covered that out in Seattle; it was a great event you guys had. It has become the most prominent and popular example of object storage in the marketplace. And for years, customers used S3 as simple, cheap data storage, but because there's so much data now stored in S3, customers are looking to do more with the platform. So Kevin, as we look ahead to re:Invent this year — and we're super excited about that — what's new? What's got you excited when it comes to the AWS flagship storage offering?

Yeah, Dave, well, that's right. And we're definitely looking forward to re:Invent. We have some fun things that we're planning to announce there, so stay tuned on those. But I'd say that one of the things that's most exciting for me, as customers do more with their data and look to store more, to capture more of the data that they're generating every day, is a storage class we announced a few years ago — we actually just announced some improvements to it — the S3 Intelligent-Tiering storage class. This is really our storage class, the only one in the cloud at this point, that delivers automatic storage cost savings for customers when data access patterns change. And that can happen, for example, when customers have some data they've been collecting and then a team spins up and decides to try to do something more with it. Data that was cold and sitting sort of idle is now being actively used. So with Intelligent-Tiering, we're automatically monitoring the data, and for customers there are no retrieval costs and no tiering charges. We automatically move the data into an access tier that reduces their cost when that data is not being accessed. We announced some improvements just a few months ago, and I'll just say: look forward to some more announcements at re:Invent that will continue to extend what we have in the Intelligent-Tiering storage class.

That's cool, Kevin. I mean, that tiering concept has been around since back in the mainframe days, but the problem was it was always inside a box, so you didn't have the scale of the cloud and you didn't have that automation. So I want to ask you, as the leader of the S3 business: when you meet with customers, Kevin, what do they tell you they're facing as challenges when they want to do more, to get better insights out of all that data they've moved into S3?

Well, I think that's just it, Dave.
I think that most customers I speak with, of course, have the things they want to do around storage costs — reducing storage costs and just making sure they have capacity available. But increasingly, I think the real emphasis is around business transformation. What can I do with this data that's very unique and different? Unlike prior optimizations that would just help the bottom line, they're saying, what can I do that will actually drive my top line more, by either generating new product ideas or allowing for a faster closed-loop process for acquiring customers? And so it's really that business transformation, and everything around it, that I think is really exciting. And for a lot of customers, that's a pretty long journey, and helping them get started on it includes transforming their workforce — upskilling parts of their workforce to be more agile and more oriented around software development, and developing new products using software.

So when I first met the folks at Chaos Search, Thomas took me through sort of the architecture, with Ed as well. They had me at "you don't have to move your data." That was the grabber for me. And there are a number of public customers — Digital River, Blackboard, Klarna, and others; we're going to get the customer perspective a little later on — that use both AWS S3 and Chaos Search, and they're trying to get more out of their S3 data and execute analytics at scale. So I wonder if you could share with us, Kevin: what types of activities and opportunities do you see for customers like these, who are making the move to put their enterprise data in S3, in terms of the capabilities and outcomes that they're trying to achieve, and are able to achieve, beyond using S3 as just a bit bucket?

Right. Well, Dave, I think you hit the nail on the head when you talk about outcomes, because that, I think, is key here. Customers want to reduce the time it takes to get to a tangible result that affects the business, that improves their business. And so that's one of the things that excites me about what Chaos Search is doing here specifically: that automatic indexing — being able to take the data as it is in their bucket, index it, keep that index fresh — and then allowing customers to innovate on top of that, to experiment with a new capability, see what works, and then double down on the things that really do work to drive that business. I just think that capability reduces the amount of what I might call undifferentiated heavy lifting — the work to just sort of index, organize, and catalog data — and instead allows customers to really focus on: here's the idea, let's try to get this into production, or into a test environment, as quickly as possible, to see if it can really drive some value for our business.

Yeah, so you're seeing that sort of value that you mentioned — the undifferentiated heavy lifting — moving up the stack, right? It used to just be provisioning and managing the storage; now it's all the layers above that, and we're going beyond that. So my question to you, Kevin, is: how do you see the evolution of all this data at scale? I'm especially interested as it pertains to data that's, of course, in S3, which is your swim lane.
When you talk to customers who want to do more with their data and analytics — and by the way, even beyond analytics, we're having conversations now in the community about building data products and creating new value — how do you respond, and how do you see Chaos Search fitting into those outcomes?

Well, I think that's it, Dave. It's about kind of going up the stack, instead of spending time organizing and cataloging data — particularly as data volumes get much larger, with the modern data lakes we're seeing quickly going from a few petabytes to tens to hundreds of petabytes or more. When you're reaching that kind of scale of data, no single person can reasonably wrap their head around all of it. You need tools. S3 provides a number of first-party tools, and we're investing in things like our S3 Batch Operations to really help give the end users of that data — the business owners — the leverage to manage their data at scale, apply their new ideas to the data, and generate pilots and production work that really drives their business forward. And so I think Chaos Search, again, I would just say, is a good example of the kind of software that helps go up the stack, automate some of that data management, and help customers focus really specifically on the things they want to accomplish for their business.

So this is really important. I mean, we've talked for well over a decade about how to get more value out of data, and it's been challenging for a lot of organizations. But we're seeing themes of scale, automation, fine-grained tooling, ecosystems participating on top of that data, and then extracting value from that data. Kevin, I'm really excited to see you face-to-face at re:Invent and learn more about some of the announcements you're going to make. We'll see you there.

Yeah, stay tuned — looking forward to seeing you in person.

Absolutely. All right, great to have Kevin on. Keep it right there, because in a moment we're going to get the customer perspective on how a leading practitioner is applying Chaos Search on top of S3 to create business value from data. You're watching theCUBE, your leader in high-tech coverage.

Hi everyone, I'm really excited to be here today. My name is Jimmy McDermott, and I'm excited to be talking about log analytics and how much Chaos Search has helped us scale our data lake. Just by way of background on Transeo: our overarching mission is to eliminate the pencil-and-paper gaps in educational systems. And what that looks like in reality is storing a lot of data for school districts, because everything that's on paper right now can be converted to some kind of electronic, digital process. We're part of a new ed-tech product category that's been emerging over the last few years called readiness solutions. We pull together all of these disparate data points that schools are housing on students and show them to students in a really consumable and digestible way, so they can understand: how close am I to graduation? Am I falling off track by picking a particular class, or what have you? And so by doing that, you can just kind of start to grasp the sheer amount of data that we're pulling in per student, per district, across the country, at scale, and why logging started to become really, really critical for us. When it comes to the logs themselves, it's actually pretty simple, but the infrastructure and the requirements around it are not simple.
We have one big monolithic service, but we've got many different types of logging outputs — things coming from our database driver, things coming directly from our application layer, our networking layer — and all of those are currently coming into kind of a central repository. We offer retention for data and for logs up to our longest customer's requirement. Our longest customer's data requirement right now is holding on to data seven years post-graduation. Before Chaos Search, we had kind of a mismanaged way of bringing all these different items together. It was truly a mess. We were really kind of at our wits' end, looking for a solution that was actually going to bring all this stuff together. We did consider spinning up a self-managed ELK stack, but it really struggles at scale with that retention and that historical data. It's fine for spinning something up to analyze really hot data that's hot for, like, a day, and then needs to get flushed out of the system so it can stay hot and stay cost-effective — because standing up those stacks yourself was something that was just going to break the bank for us. So we were truly lost looking for the right solution. And then, perhaps most importantly, it couldn't break the bank. Chaos Search met all of those needs, and more.

We stream our logs directly from our Kubernetes infrastructure right into our S3 buckets, which is amazing, by the way, because when we were setting up our new DevOps environment, we had engineers basically saying, why would we do that? Why not just ship it there directly? Why go to the extra effort of setting up a Fluentd connector to move things into S3? And they're all sold now. It didn't take long for them to really see the value of why we were doing that. And then the cool thing is that we don't really have to worry about those retention policies being managed by us anymore, because S3 has all of that built in. Our developers can actually iterate faster now, because they're able to access real-life production logs around certain features and capabilities that they previously couldn't, and so they can make decisions about new architecture components or refactoring that are backed up by data. And that's really at the core of everything we're doing. On a super tangible level, some recent technical diligence that we had actually went way faster because we own our logs. Usually that's not something ed-tech companies are really thinking about, and so making this move actually led to a faster turnaround time for us on that tech diligence, which was really exciting. Then there are the cost savings that you get with a solution like Chaos Search, and the fact that you layer on those enterprise-type features — like RBAC and SSO and these other things that are part of the platform — that, with a different company, you would pay ridiculous amounts of money for. That's incredibly appealing for a company that's dealing with intense data security and data governance requirements, but is also not a super big company, right? We can't afford enterprise contracts. So this is exactly right, and it's exactly one of the reasons we were so drawn to Chaos Search.

Okay, we're back with Mark Hill, who's the director of IT operations at Digital River. Mark, welcome to theCUBE, good to see you.

Oh, thanks for having me. I really appreciate it.

Hey, tell us a little bit more about Digital River. People know you as a payment platform, you've got marketing expertise —
How do you differentiate from other e-commerce platforms?

Well, I don't think people realize it, but Digital River was founded about 27 years ago, primarily as a one-stop shop for e-commerce, right? We offered site development, hosting, order management, fraud, export controls, tax, physical and digital fulfillment, as well as multilingual customer service, advanced reporting, and email marketing campaigns. So it was really just kind of a broad base for e-commerce. People could just go there and not have to worry about anything. What we found over time, as e-commerce matured, is that we've really pivoted to a more focused API offering, specializing in just our global seller services. And to us, that means payment, fraud, tax, and compliance management. Our global footprint allows companies to outsource that risk management and expand their markets internationally very quickly and with a low cost of entry.

Yeah, it's an awesome business. And to your point, you were founded way before there was such a thing as the modern cloud, and yet you're a cloud-native business — which I think speaks to the fact that incumbents can evolve, they can reinvent themselves from a technology perspective. I wonder if you could first paint a picture of how you use the cloud. You use AWS, and I'm sure you've got S3 in there; maybe you could talk about that a little bit.

Yeah, exactly. So when I think of a cloud-native business, you kind of go back to the history. Well, 27 years ago there wasn't a cloud, right? There wasn't any public infrastructure. We basically stood our own data center up in a warehouse. And over our history, we've managed our own infrastructure in co-located data centers; over time, through acquisitions and just how things worked, there were over 10 data centers globally. For us, that was expensive, both from a software and hardware perspective, as well as in getting the operational teams and expertise up to speed. It was really difficult to maintain, and ultimately not core to our business, right? Nowhere does our mission statement say that our goal is to manage data centers. So about five years ago, we started the journey from our hosted data centers into AWS. It was a 100% lift-and-shift plan, and we were able to complete that migration in a little over two years. Amazon really just fit for us. It was a natural place for us to land, and they made it really easy for us — not to say it wasn't difficult. But once in the public cloud, we really adopted a cloud-first vision, meaning we'll not only consume their infrastructure as a service, but we'll also purposely evaluate and migrate to software as a service. I come from a database background, so an example would be migrating from self-deployed and managed relational databases over to AWS RDS, the Relational Database Service. You're able to utilize the backups, the standby, and the patching tools automatically, with a click of a button, and that's pretty cool. So we moved away from the time-consuming operational tasks and really put our resources into revenue-generating products, like pivoting to an API offering. I always like to say that we stopped being busy and started being productive.

I love that.

That's really what the cloud has done for us.

Is that what you mean by cloud-native? I mean, being able to take advantage of those primitives and native APIs — what does that mean for your business?

Yeah, exactly. I think, well, the first step for us was just to consume the infrastructure, right?
But now we're looking at the targeted services they have in there too. So take our data streaming services. Log analytics, for example: we used to keep it locally on the machine. Now we're just dumping it into an S3 bucket, then using Kinesis to consume that data, put it in Elastic, and go from there (a rough sketch of that consuming pattern appears below). And none of those services are managed by Digital River; we're just utilizing the capabilities that AWS has there.

And as an e-commerce player, a retail company, were you ever concerned about moving to AWS as a possible competitor? Did you look at other clouds? Can you tell us about that?

Yeah. So, e-commerce is really mature, right? And we got squeezed out by the Amazons of the world. It's just not something we were doing anymore, but we had a really good area of expertise in our global seller services. But yes, we evaluated Microsoft, we evaluated AWS, as well as Google. And back when we did that, Microsoft was Windows-based, and Google was just coming into the picture — it really didn't fit for what we were doing. But Amazon was just a natural fit. So we made a business decision, right? It was financially really the best decision for us. We didn't really put our feelings into it; we just had to move forward, and it was better than where we were at. And we've been delighted, actually.

Yeah, makes sense. Best cloud, best tech. Yeah. Yeah, I want to talk about Chaos Search. A lot of people describe it as a data lake for log analytics. Do you agree with that? What does that even mean?

Yeah, well, from our perspective, our self-managed solutions were costly and difficult to maintain. We had older versions, self-deployed, using Splunk, other things like that too. So over time, we made a conscious decision to limit our data retention to generally seven days — but in a lot of cases it was zero; we just couldn't consume that log data because of the cost, back in 2018. And because of this limit, we lost important data points used for incident triage, problem management, trending, and other things. So Chaos Search has offered us a manageable and cost-effective opportunity to store months or even years of data that we can use for operations, as well as trending and automation. And really the big thing we're pushing into is an event-driven architecture, so that we can proactively manage our services.

Yeah, you mentioned Elastic. I've talked to people who use the ELK stack; they say there's this exponential growth in the amount of data, so you have to cut it off at whatever — I think you said seven days or less. You're saying you're not finding that with Chaos Search?

Yeah, exactly. And that was one of the huge benefits here too. You know, we were losing out if there was a lower-priority incident, for example, and people didn't get to it till eight, nine days later. Well, all the breadcrumbs were gone, so it was really just a best guess, or the incident really wasn't resolved — we didn't find a root cause.

Yeah, it's like my video camera down at my other house: somebody breaks in, and if I don't find out for two weeks, the video's gone. It's that kind of same thing. So can you give us some more detail on how you use your data lake, and Chaos Search specifically?

Yeah, yep. There are many different areas, but what we found is that we were able to easily consolidate data from multiple regions into a single pane of glass for our customers, both internally and externally.
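As a hedged sketch of the consuming side of that S3-to-Kinesis-to-Elastic pattern, here's what polling a stream with plain boto3 might look like. The stream name is hypothetical, and a production consumer would use something like the Kinesis Client Library with checkpointing rather than this raw loop; nothing here is specific to Digital River's actual pipeline.

```python
import json

import boto3

kinesis = boto3.client("kinesis")
STREAM = "log-events"  # hypothetical stream fed from the S3-landed log data

# Read from the first shard, oldest record first. Real consumers enumerate
# all shards and checkpoint their position; this shows only the raw API shape.
shard_id = kinesis.describe_stream(StreamName=STREAM)[
    "StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

while iterator:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for rec in out["Records"]:
        event = json.loads(rec["Data"])
        # ...hand the event to Elasticsearch or whatever analytics layer...
        print(event.get("msg"))
    iterator = out.get("NextShardIterator")
    if not out["Records"]:
        break  # stop when idle for this sketch; a real consumer keeps polling
```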
You know, it relieved us of the operational support for that data extract, transform, load process, right? It also offered a seamless transition for users who were familiar with Elasticsearch — it wasn't difficult to move over. So those are a lot of the selling points and benefits. And now that we have all this data that we're able to capture and utilize, it gives us an opportunity to use machine learning, predictive analysis, and, like I said, to drive toward an event-driven architecture. So that's really what it's offered, and it's been a huge benefit.

So you're saying you can speak the language of Elastic, you don't have to move the data out of an S3 bucket, and you can scale more easily — is that right?

Yeah, absolutely. And for us, because we're running in multiple regions to drive more high availability, having that data available from multiple regions in a single pane of glass — a single way to utilize it — is a huge benefit as well, not to mention actually having the data.

What was the initial catalyst to rethink what you were doing with log analytics? Was it cost, was it flexibility, scale?

I think all of those went into it. One of the main drivers: last year we had a huge project. We have our ELK stack, and it's probably from a decade ago, right? Version one-point-something — anyway, it's very old. And we went through a whole project to get it upgraded and migrated over, and we found it impossible to do internally. So this was a method for us to get out of that business, to get rid of the security risks and the support risk, and have a way for people to easily migrate over. Consolidating the data across regions had been a nightmare here, so that was a huge thing. But yeah, it was also the cost, right? We're finding it cheaper to use Chaos Search, with more data available, versus what we were doing previously.

Got it. I wonder if you could share any stories or examples that underscore the impact this approach to analytics is having on your business — maybe in your team's everyday activities, any metrics you can provide, or even just anecdotal information.

Yeah. So, coming from an Oracle background here — Digital River historically has been an Oracle shop, right? We'd been developing a reporting and analytics environment on Oracle, and that's complicated and expensive. We had to use advanced features in Oracle, like partitioning and materialized views, and bring in other supporting software like Informatica and Hyperion Essbase. All of these required a large team with a wide set of expertise across separate focus areas. And the amount of data that we're pushing into Chaos Search would simply have overwhelmed that legacy method for data analysis in a relational database — not to mention the human toll, the stress of supporting that Oracle environment in a 24-by-7-by-365 operation, which requires little or no downtime. Just that alone is a huge thing. So it's allowed us to break away from Oracle. It's allowed us to use new technologies that make sense to solve business problems.

You know, Chaos Search is a really interesting company to me. I'm sure, like me, you see a lot of startups — I'm sure they're knocking on your door every day. And I always like to ask: okay, what are they going after? Are they going after a big market?
How are they getting product-market fit? And it seems like Chaos Search has looked really hard at log analytics, kind of maybe disrupting the ELK stack. But I see other potential use cases beyond analyzing logs. I wonder if you agree — are there other use cases you see in your future?

Yeah, exactly. So one area would be Splunk, for example; we have that here too. We use Splunk versus, you know, flat-file analysis or other ways to capture that data, because from a PCI perspective it needs to be secured for our compliance and certification, right? Chaos Search allows us to do that. There were different types of authentication — really a hodgepodge of authentication — that we used in our old environment, but Chaos Search has a more easily usable one, one that we can set up, one that can really segregate the data and allow us to satisfy our PCI requirements too. So Splunk, but then also, I think, deprecating all of our Elasticsearch environments, our homegrown ones, and taking a hard look at what we're doing with relational databases. Twenty-seven years ago, there were only relational databases — Oracle and SQL Server. So we've been logging to those types of databases, and that's not cost-effective, it's not supportable. So it's really about getting away from that and putting the data where it belongs — easily accessible, in a secure environment — allowing us to push our business forward.

And when you say where the data belongs, it sounds like you're putting it in the bit bucket, S3, and leaving it there, because that's the most cost-effective way to do it, and then adding value on top of it. That's what's interesting about Chaos Search to me.

Yeah, exactly. Versus the high-priced storage that you have to use for a relational database — and not to mention the standby and the backup. You're duplicating and duplicating all this data, in an expensive manner.

Yeah, copy, create — you're moving data around, and it gets expensive. It's funny what you say about databases, it's true. Databases used to be such a boring market; now it's exploded. Then you had the whole NoSQL movement, and then SQL became the killer app, you know? It's come full circle, right?

Yeah, exactly.

Well, anyway, good stuff, Mark. Really appreciate you coming on theCUBE and sharing your perspectives. Would love to have you back in the future.

Oh yeah, no problem. Thanks for having me, I really appreciate it.

Yeah, our pleasure. Okay, in a moment, I'll have some closing thoughts on getting more value out of your growing data lakes. You're watching theCUBE, your leader in high-tech coverage.

Innovation, impact, influence. Welcome to theCUBE. Disruptors, developers, and practitioners learn from the voices of leaders who share their personal insights from the hottest digital events around the globe. Enjoy the best this community has to offer on theCUBE, your global leader in high-tech digital coverage.

Okay, so that's a wrap. We're seeing a new era in data and analytics. For example, we're moving from a world where data lives in a cloud object store and needs to be extracted, moved into a new data store, transformed, cleansed, structured into a schema, and then analyzed.
This cumbersome and expensive process is being revolutionized by companies like Chaos Search that leave the data in place and then interact with it in a multilingual fashion, with tooling that's familiar to analytics pros. I see a lot of potential for this technology beyond just log analytics use cases, but that's a good place to start. And really, if I project out into the future, we see a trend of the global data mesh taking hold, where a data warehouse or a data hub or a data lake or an S3 bucket is just a discoverable node on that mesh, governed by automated computational processes. And I do see Chaos Search as an enabler of this vision. But for now, if you're struggling to scale with existing tools, or you're forced to limit your retention because data is exploding at too rapid a pace, you might want to check these guys out. You can schedule a demo just by clicking the button on the site, or stop by the Chaos Search booth at AWS re:Invent. theCUBE is also going to be there — we'll have two sets and a hundred guests. I'm Dave Vellante. You're watching theCUBE, your leader in high-tech coverage.