Welcome to theCUBE's conversation. I'm John Furrier, host of theCUBE here in Palo Alto, California in our studios. We've got a great conversation around open data lake analytics on AWS with two great companies, Ahana and Securonix. Dipti Borkar, co-founder and chief product officer at Ahana, is here. Great to see you. And Derrick Harcey, chief architect at Securonix. Thanks for coming on. I really appreciate you guys spending the time.

Yeah, thanks so much, John. Thank you for having us, and Derrick, hello again.

We had a great conversation around our startup showcase, which you guys were featured in last month, this year, 2021. The conversation continues, and a lot of people are interested in this idea of open systems, open source, and obviously open data lakes. They're really driving a lot of value, especially with machine learning and whatnot. So this is a key, key point. Can you guys take a step back before we get under the hood and set the table on Securonix and Ahana? What's the big play here? What is the value proposition?

Well, sure, I'll give a quick update. Securonix has been in the security business, first user and entity behavior analytics and then the next-generation SIEM platform, for 10 years now. And we really need to take advantage of some cutting-edge technologies in the open source community and drive adoption momentum, so that we can not only bring in data from our customers so they can find security threats, but also store it in a way that they can use for other purposes within their organization. That's where the open data lake is very critical.

Yeah, and to add on to that, John: traditionally we've had data warehouses, right? We've had operational systems move all the data into the warehouse. And while these systems are really well built for certain use cases, the amount of data is exploding, and so are the types of data: semi-structured, structured.
And so as companies like Securonix in the security space, as well as other verticals, look to get more insights out of their data, there's a new approach emerging: you have a data lake, which AWS has revolutionized and commoditized with S3, and there's analytics built on top of it. And so we're seeing a lot of good advantages come out of this new approach.

Well, it's interesting, EC2 and S3 having their 15th birthday, as they say in Amazon. Interesting, the teenage years. But while I've got you guys here, I want to ask you, can you define the SIEM thing? Because the SIEM market is exploding, and it just changed a little bit. SIEM, security information and event management. But again, as data becomes more proliferated, and it's not stopping anytime soon as cloud native applications emerge, why is this important? What is the SIEM category? What's it about?

Yeah, thanks, I'll take that. So obviously SIEM has traditionally been around for about a couple of decades, and it really started with log collection and management and rule-based threat detection. Now what we call next-generation SIEM is really the modernization of a security platform that includes streaming threat detection, behavioral analysis and data analytics. We literally look for thousands of different threat detection techniques, we chain together sequences of events, and we stream everything in real time; it's very important to find threats as quickly as possible. And the momentum we see in the industry: we see massive sizes of customers, we have made a transition from on-premise to the cloud, and we are literally processing tens of petabytes of data for our customers. It's critical that we can ingest data quickly, find threats quickly, and allow customers to have the tools to respond to those security incidents quickly and really get a handle on their security posture.

Derrick, if I asked you what's different about this next-gen SIEM, what would you say? What's the big aha?
What's the moment there? What's the key thing?

The real key is taking off the boundaries of scale. We want to be able to ingest massive quantities of data, we want to be able to do instant threat detection, and we want to be able to search the entire forensic data set across all of the history of our customer base. In the past, we had to make sacrifices, either on the amount of data we ingested or the amount of time we stored that data. The next-generation SIEM platform is really offering advanced capabilities on top of that data set, because those boundaries are no longer barriers for us.

Dipti, any comment before I jump into the question for you?

Yeah, absolutely, it is about scale. Like I mentioned earlier, the amount of data is only increasing, and so are the types of information. The systems that were built to process this information in the past supported maybe terabytes of data, right? And that's where new technologies, open source engines like Presto, come in, which were built to handle internet scale. Presto was created at Facebook to handle these petabytes that Derrick is talking about, which every industry is now seeing as we move from gigs to terabytes to petabytes, and that's where the analytics stack is moving.

That's a great segue, and it's why I've got you here, because people love to hear the experts weigh in on definitions. What is open data lake analytics? How would you define that? And then talk about where Presto fits in.

Yeah, that's a great question. The way I define open data lake analytics is: you have a data lake at the core, which is, let's say, S3, the most popular one, and on top of it there are open aspects. There are open formats. Open formats play a very important role because you can have different types of processing.
It could be SQL processing, it could be machine learning, it could be other types of workloads, all working on these open formats, versus a proprietary format where it's locked in. There are open interfaces. Open interfaces like SQL, JDBC or ODBC are widely accessible to a range of tools, and so they're everywhere. Open source is a very important part of it, because as companies like Securonix pick these technologies for their mission-critical systems, they want to know that this is going to be available and open for them for a long period of time, and that's why open source becomes important. And then finally I would say open cloud, because at the end of the day, while AWS is where a lot of the innovation and a lot of the market is, there are other clouds, and open cloud is something these engines were built for, right? So that's how I define open data lake analytics: analytics with query engines built on top of these open formats, open source, open interfaces and open cloud.

Now, Presto comes in where you want to find the needle in the haystack, right? When you have these deep questions, where did the threat come from, or who was it, you have to ask those questions of your data. Presto is an open source distributed SQL engine that allows data platform teams to run queries on their data lakes at high performance, in memory, on these petabytes of data. So that's where Presto fits in. It's one of the de facto query engines for SQL analysis on the data lake. Hopefully that answers the question and gives more context.

Yeah, I mean the knock on data lakes has been you don't want a data swamp, right? That's what people don't want.

That's right.

At the same time, with big data, finding that needle in the haystack is harder than ever. So there's a constant struggle to get that data, the right data, at the right time.
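To make the needle-in-the-haystack idea concrete, here is a minimal sketch of the kind of forensic SQL a threat hunter might run through an engine like Presto over event data in the lake. The table and column names are hypothetical, and Python's built-in sqlite3 stands in for Presto purely so the example is self-contained; the shape of the ANSI SQL is what matters.

```python
import sqlite3

# In production this query would run through Presto against Parquet on S3;
# sqlite3 stands in here only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE security_events (
        event_time TEXT,   -- ISO-8601 timestamp
        source_ip  TEXT,
        username   TEXT,
        action     TEXT    -- e.g. 'login_failed', 'login_success'
    )
""")
conn.executemany(
    "INSERT INTO security_events VALUES (?, ?, ?, ?)",
    [
        ("2021-09-01T10:00:00Z", "10.0.0.5", "alice", "login_failed"),
        ("2021-09-01T10:00:05Z", "10.0.0.5", "alice", "login_failed"),
        ("2021-09-01T10:00:09Z", "10.0.0.5", "alice", "login_success"),
        ("2021-09-01T11:30:00Z", "10.0.0.9", "bob",   "login_success"),
    ],
)

# Threat-hunting query: find sources with repeated failed logins
# that were eventually followed by a success.
suspicious = conn.execute("""
    SELECT source_ip, username,
           SUM(action = 'login_failed')  AS failures,
           SUM(action = 'login_success') AS successes
    FROM security_events
    GROUP BY source_ip, username
    HAVING SUM(action = 'login_failed') >= 2
       AND SUM(action = 'login_success') >= 1
""").fetchall()

print(suspicious)  # [('10.0.0.5', 'alice', 2, 1)]
```

At petabyte scale the same query shape works because Presto distributes the scan and aggregation across workers; only the engine underneath changes, not the SQL the analyst writes.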
And what I learned in the last presentation, when your teams both presented at the conference, was the managed service approach. Could you guys talk about why that approach works well together for you? Because when people get to the cloud, they replatform, then they start refactoring, and data becomes a real big part of that. Why is the managed service the best approach to solving these problems?

Yeah, and it applies to both of us: Securonix and Ahana both have a managed service approach. So maybe Derrick can go first and I can go after.

Yeah, I'll be happy to go first. You know, making the transition over the last decade from on-premise to the cloud for the majority of our customers, we really found that running a large open data lake requires a lot of different skill sets. And there are hundreds of technologies in the open source community to choose from. Choosing the right blend of skill sets and technologies to produce a comprehensive service is something customers can do, and many customers did do, but it takes a lot of resources and effort. So what we really want to do is package up our security service, our next-generation SIEM platform, for our customers so they don't need to become experts in every aspect of it. Now, an underlying component of that for us is how we store data in an open-standards way and how we access that data in an open-standards way. So just like we want our customers to get immediate value from the security services that we provide, we also want to be able to take advantage of a search service that is offered to us and supported by a vendor like Ahana, where we can very quickly take advantage of that value within our core underlying platform. We really want to make it frictionless for our customers to achieve value as quickly as possible.

That's great stuff.
And on the Ahana side, opening up data lakes, it's really about the ease of use there. It sounds easy to me, but we know it's not easy just to put data in a data lake. At the end of the day, a lot of customers want simplicity because they don't have the staffing; this comes up a lot. How do you leverage their open source participation and/or get them stood up quickly so they can get some value? Because that seems to be the number one thing people want right now. Dipti, how does that work? How do people get value quickly?

Yeah, absolutely. You know, these open source engines like Presto and others came out of large internet companies that have a lot of distributed systems engineers, PhDs, very advanced teams, and they can manage these distributed systems, build on them and add features at large scale. But not every company can, and these engines are extremely powerful. So when you combine the power of Presto with the cloud and a managed service, that's where value for everyone comes in. And that's what we did with Ahana: we took Presto, which is a great engine, and converted it into a great user experience, so that whether it's a three-person platform team or a five-person platform team, they still get the same benefit of Presto that a Facebook gets, but at much, much less operational complexity and cost, as well as the ability to depend on a vendor who can drive the innovation and make it even better. And so that's where managed services really come in. There are thousands of query parameters that need to be tuned; with Ahana, you get that out of the box. You get the best practices that are followed at these larger companies. Our team comes from Facebook, Uber and others, and you get that out of the box. With a few clicks you can get up and running, and so you see value immediately.
In 30 minutes, you're up and running and you can query your data lake, versus with Hadoop and those prior systems, where it would take months to see real value.

Yeah, we saw the Hadoop scar tissue. It's all in the past now, but it took too much resource: standing up clusters, managing them, you can't hire enough people. I've got to ask you while you're on that topic: you mentioned some out-of-the-box capability. How do you solve the problem out of the box? Do you ship templates? Do you think of these as recipes? What are you providing customers to get up and running?

Yeah, so in the case of Securonix, let's say they want to create a Presto cluster. They go into our SaaS console and essentially put in the number of nodes they want, the number of workers they want. There's a lot of additional value that we've built in, like caching capabilities if you want more performance, and built-in cataloging, which again is another single click. And there isn't really so much a template; everybody gets the best-tuned Presto for their workloads. Now, there are certain workloads that might be interactive, or might be transformation, batch, ETL, and what we're doing next is actually giving you the knobs so that it comes pre-tuned for the type of workload you want to run, versus you figuring it out. That's what I mean by out of the box: you don't have to worry about these configuration parameters, you get the performance. And maybe Derrick can talk a little bit about the benefits of the managed service and the usage as well.

Yeah, absolutely. I'll answer the same question, then tie back to what Dipti said. Really, we want it to be very easy for our customers to ingest security event logs. And there are literally hundreds of types of security event logs that we support natively out of the box. But the key for us is a standard that we call the open event format.
And that is a normalized schema. We take any data source into its normalized format, whether from a collector device a customer uses on-premise or elsewhere, and send the data up to our cloud. We do streaming analysis and data analytics to determine where the threats are. And once we do that, we send the data off to long-term storage in a standards-based Parquet file format. That Parquet file is natively read by the Ahana service. So we simply deploy an Ahana cluster and use the Presto engine, which natively supports our open standard file format, and we have a normalized schema that our application can immediately start to see value from. So we handle the collection and streaming ingest, and we simply leverage the engine in Ahana to give us the appropriate scale. We can size up and down and control the cost to give the users the experience that they're paying for.

You know, I really love this topic, because it's not only cutting edge, it's very relevant for modern applications. You mentioned next-gen SIEM, S-I-E-M, security information and event management, not SIM as in memory card, which I think of all the time because I always want to add more. But this brings up the idea of streaming data in real time. As more services go to the cloud, Derrick, if you don't mind, share more on the journey that you guys have gone through. Because I've been in a lot of these conversations about repatriation versus cloud, and people aren't going that way. They're going toward more innovation, where there are net new revenue models emerging from the value they're getting out of understanding events happening within the network and the apps, even as those are being stood up and torn down. So there's a lot of cloud native action going on, where controlling and understanding goes way beyond just putting stuff into an event log. It's a whole other animal.
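The normalization step Derrick describes can be sketched in a few lines: raw vendor logs with different field names get mapped into one common schema before analytics and long-term Parquet storage. The schema and field names below are hypothetical illustrations, not Securonix's actual open event format, and the Parquet write is left out to keep the sketch dependency-free.

```python
# Hypothetical sketch of normalizing heterogeneous security logs into one
# common schema (a stand-in for an "open event format"). A real pipeline
# would then write these records to Parquet for engines like Presto to query.

# Per-source field mappings: raw field name -> normalized field name.
FIELD_MAPS = {
    "firewall": {"ts": "event_time", "src": "source_ip", "act": "action"},
    "auth":     {"when": "event_time", "client_ip": "source_ip",
                 "outcome": "action", "user": "username"},
}

NORMALIZED_FIELDS = ("event_time", "source_ip", "username", "action")

def normalize(source: str, raw: dict) -> dict:
    """Map a raw event from a known source into the normalized schema,
    filling missing fields with None so every record has the same shape."""
    mapping = FIELD_MAPS[source]
    record = {field: None for field in NORMALIZED_FIELDS}
    for raw_key, norm_key in mapping.items():
        if raw_key in raw:
            record[norm_key] = raw[raw_key]
    return record

events = [
    normalize("firewall", {"ts": "2021-09-01T10:00:00Z",
                           "src": "10.0.0.5", "act": "deny"}),
    normalize("auth", {"when": "2021-09-01T10:00:05Z",
                       "client_ip": "10.0.0.5", "user": "alice",
                       "outcome": "login_failed"}),
]
print(events[0]["source_ip"], events[1]["username"])  # 10.0.0.5 alice
```

Because every record shares one shape regardless of the source device, downstream SQL and streaming detection logic can be written once against the normalized fields.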
Well, there are a couple of paradigm shifts we've seen as major patterns over the last five or six years. Like I said, we started with the same streaming ingest platform on-premise, using some different open source technologies. When we moved to the cloud, we adopted cloud native services as part of our underlying platform to modernize and make our service cloud native. But what we were seeing is that many customers wanted to focus on on-premise deployments, especially financial institutions and government institutions, because they were very risk-averse. Now we're seeing even those customers realize that it's very difficult to maintain the hundreds or thousands of servers required on-premise, and to keep the large skilled staff required to keep it all running. So what we're seeing now is that a lot of those customers who deployed packaged products like our own, even our own customers, are doing a mass migration to the cloud, because everything is handled for them as a service. We have one team of experts that we maintain to support all of our global customers, rather than every one of our global customers having their own teams that we then support, so it's a much more efficient model.

The other path that many of our customers went down is building their own security data lake. And many customers were somewhat successful in building their own security data lake, but it's hard to keep up with the innovation. If you look at the analyst groups, the Gartner Magic Quadrant on the SIEM space, the feature set provided by a packaged product is a very large feature set. Even if somebody were to put together all of the open source technologies to meet 20% of those features, just maintaining that over time is very expensive and very difficult. So we want to provide a service that has all of the best-in-class features, but also leverages the ability to innovate on the backend without the customer knowing.
So we can do a technology shift to Ahana and Presto from our previous technology set. The customer doesn't know the difference, but they see the value add within the service that we run.

So let me see if I get this right: Presto's enabling you guys to do threat detection at a level that you're super happy with, as well as giving you the option for self-service. Is that right?

Well, let me clarify our definition. We do streaming threat detection: machine learning based behavioral analysis and threat detection, as well as rule-based correlation. So we do threat detection during the streaming process. But as part of the process of managing cybersecurity, the customer has a team of security analysts that do threat hunting, and the threat hunting is where Ahana comes in. A human gets involved and starts searching the forensic logs to determine what happened over time that might be suspicious. They investigate through a series of queries to get the information that's relevant, and once they find it, they package it up into an algorithm that will do analysis on an ongoing basis as part of the stream processing. So it's really part of the whole life cycle, from hunting to real-time threat detection.

It's kind of like the old adage, hunters and farmers. You're farming through the streaming and hunting with the detection. I've got to ask you, what would be the alternative if you go back? I mean, the cloud's so great because you have cutting-edge applications and technologies, but without Presto, where would you be? What would life be like without these capabilities? What would have to happen?

Well, it's not that we lacked the same feature set before we moved to Presto; the challenge was scale. The cost profile to continue to grow from 100 terabytes to one petabyte to tens of petabytes, not only was it expensive, but the scaling factors were not linear.
So not only did we have a problem with cost, we also had a problem with performance falling off and keeping the service running. A large Hadoop cluster, for example: our first incarnation of this used the Hive service to query data in a MapReduce cluster. It's a completely different technology that uses a distributed Hadoop compute cluster to do the query. It does work, but then we started to see resource contention between that and all the other things on the Hadoop platform. The beauty of the Presto engine is that not only was it designed for scale, it's purpose-built as a query engine, and that's having the right tool for the job as opposed to a general-purpose tool.

Derrick, you've got a very busy job as chief architect. What are you excited about going forward when you look at cloud technologies? What are you watching? What are you getting excited about, or what worries you?

Well, that's a good question. I'm leading up a group called Securonix Innovation Labs, and we're looking at next-generation technologies. We go through and analyze open source technologies, proprietary technologies, and technologies we build ourselves. And that's how we came across Ahana, as part of a comprehensive analysis of different search engines, because we wanted to go through another round of search engine modernization. We worked together in a partnership, and now we're going to market together as part of the modernization efforts we're continuously going through. So I'm looking forward to iterative, continuous improvement over time on this next journey, because the growth in cybersecurity really requires new and innovative technologies working together holistically.

Dipti, you've got a great company that you co-founded.
I've got to ask you: as co-founder and chief product officer, you're both the lead entrepreneur and you hold the keys to the kingdom with the product. You've got to balance that long stare out into the future while driving product excellence, and you've got open source as a tailwind. What's on your mind as you go forward with your venture?

Yeah, great question. You know, it's been super exciting to found Ahana in this space. Cloud, data and open source, that's where the action is happening these days. But there are two parts to it. One is making our customers successful and continuously delivering capabilities and features, continuing on our ease-of-use theme and foundation, to get customers like Securonix and others the most value out of their data, as fast as possible. So that's a continuum. In terms of the longer-term innovation, the way I see the space, there is a lot more innovation to be done, and Presto itself can be made even better; there's a next-gen Presto that we're working on. And given that Presto is part of a foundation, the Linux Foundation, a lot of this innovation is happening collaboratively with Facebook and Uber, who are co-members of the foundation with us. We look forward to making Securonix a part of that foundation as well, and that innovation together can then benefit the entire community as well as the customer base. This includes better performance, more capabilities built in, caching and many other types of database innovations, as well as scaling, auto-scaling, and keeping up with this ease-of-use theme that we're building on. So it's very exciting to work together with all these companies, as well as with Securonix, who's been a fantastic partner. We work together, build features together, and I look forward to delivering those features and functionality to be used by these analysts, data scientists and threat hunters, as Derrick called them.
Great success, great partnership, and I love the open innovation and open co-creation you guys are doing together. Open data lakes, open data analytics, these are great concepts. This is the future: insights coming from the open, from sharing, and from actually having some standards. Love this topic. So, Dipti, thank you very much, and Derrick, thanks for coming on and sharing in this CUBE conversation.

Thank you so much, John.

Thanks, take care, bye-bye.

Okay, this has been a CUBE conversation here in Palo Alto, California. I'm John Furrier, your host of theCUBE. Thanks for watching.