Good morning, welcome back to theCUBE's continuing coverage of AWS re:Invent 2021. I'm Lisa Martin. We have two live sets here. We've got over a hundred guests on the program this week with our live sets, our remote sets, talking about the next decade in cloud innovation. And I'm pleased to be welcoming back one of our CUBE alumni, Tomer Shiran, the founder and CPO of Dremio, to the program. Tomer's going to be talking about why 2022 is the year open data architectures surpass the data warehouse. Tomer, welcome back to theCUBE.

Yeah, thanks for having me. It's great to be here.

It's great to be here at a live event in person. My goodness, sitting side by side with guests. Before we dig into the data lake house versus the data warehouse, and I want to unpack that with you, talk to me about what's going on at Dremio. I know you guys were on the program earlier this summer, but what are some of the things going on right now in the fall of 2021?

Yeah, for us, it's a big year of a lot of product news, a lot of new products, new innovation. The company's grown a lot. We're probably three times bigger than we were a year ago. So a lot of new folks on the team and many new customers.

That's good. Always new customers, especially during the last 22 months, which have been obviously incredibly challenging. But I want to unpack this: the difference between a data lake and a data lake house. Well, I love the idea of a lake house, by the way. But talk to me about what the differences are, the similarities, and how customers are benefiting.

Sure, yeah, I think you could think of the lake house as kind of the evolution of the lake, right? So we've had data lakes for a while now. The transition to the cloud made them a lot more powerful.
And now a lot of new capabilities coming into the world of data lakes really make that whole concept, that whole architecture, much more powerful, to the point that you really are not going to need a data warehouse anymore, right? And so it kind of gives you the best of both worlds. All the advantages that we had with data lakes: the flexibility to use different processing engines, to have data in your own account in open formats. All those benefits, but also the benefits that you had with warehouses, where you could do transactions and get high performance for your BI workloads and things like that. So the lake house makes both of those come together and gives you the benefits of both.

The benefits of both. Talk to me about that from a customer lens: what are some of the key benefits? And how does the customer go from, say, having data warehouses and data lakes to actually evolving to the lake house?

Data warehouses have been around forever, right? And there's been some new innovation there as we've moved to the cloud, but fundamentally, they are a very closed and very proprietary architecture that gets very expensive quickly. And so with the data warehouse, you have to take your data and load it into the warehouse, right? Whether that's Teradata or Snowflake or any other database out there, that's what you do: you bring the data into the engine. The data lake house is a really different architecture. It's one where you actually have the data as its own tier, right? Stored in open formats, things like Parquet files and Iceberg tables. And you're basically bringing the engines to the data instead of the data to the engine. And so now all of a sudden, you can start to take advantage of all this innovation that's happening on the same set of data without having to copy and move it around.
So whether that's, you know, Dremio for high-performance BI workloads and SQL type of analysis, Spark for batch processing and machine learning, or Flink for streaming, there are lots of different technologies that you can use on the same data, and the data stays in the customer's own account, right? So S3 effectively becomes their new data warehouse, if you will.

Okay, so I can imagine that during the last 22 months of this scattered work, and we're still in this work-from-anywhere environment, with so much data being generated at the edge and the edge expanding, bringing the engines to the data is probably more timely now than ever.

Yeah, I think the growth in data, you see it everywhere, right? That's the reason so many companies like ourselves are doing so well, right? There's so much new data, so many new use cases, and every company wants to be data-driven, right? They all want to democratize data within the organization. But you need the platforms to be able to do that, right? And that's very hard if you have to constantly move data around: if you have to take your data, which maybe is landing in S3, and move subsets of it into a data warehouse, and then from there move subsets of that into BI extracts, right? Tableau extracts, Power BI imports. And you have to create cubes and lots of copies within the data warehouse. There's no way you're going to be able to provide self-service and data democratization that way. And so it really requires a new architecture. And that's one of the main things that we've been focused on at Dremio: really taking the lake house and the lake and making it not just something that data scientists use for really advanced use cases, but something where even your production BI workloads can actually now run on the lake house when you're using a SQL technology like Dremio.
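As a rough illustration of the "bring the engines to the data" idea described above, here is a minimal sketch in plain Python. It is not how Dremio, Spark, or Flink work internally. A temporary directory stands in for an S3 bucket, a JSON Lines file stands in for Parquet, and the two "engine" functions are hypothetical names invented for this example. The only point is that the data is written once, in an open format, and two independent consumers read that same copy in place.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for an S3 bucket in the customer's own account.
lake = Path(tempfile.mkdtemp())
orders_file = lake / "orders.jsonl"  # JSON Lines stands in for Parquet here

# The data is written once, in an open format any tool can read.
rows = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 40.0},
]
orders_file.write_text("\n".join(json.dumps(r) for r in rows))


def read_orders(path):
    """Read the shared file in place; there is no load step into a private store."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def bi_engine_total_by_region(path):
    """Toy 'BI engine': roughly SELECT region, SUM(amount) GROUP BY region."""
    totals = {}
    for row in read_orders(path):
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def batch_engine_large_orders(path, threshold):
    """Toy 'batch engine': filter the same rows for a downstream job."""
    return [row for row in read_orders(path) if row["amount"] >= threshold]


# Both engines point at the same file; nothing is copied or extracted.
totals = bi_engine_total_by_region(orders_file)
large = batch_engine_large_orders(orders_file, threshold=100.0)
print(totals)
print(large)
```

In the warehouse model described in the conversation, each of these consumers would typically work from its own loaded copy or extract; here, adding a third engine means writing one more reader against the same file.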
And that's really critical because, as you talked about, every company these days is a data company. If they're not, they have to be, or there's a competitor in the rearview mirror that is going to be able to take over what they're doing. So this is really critical, especially considering another thing that we learned in the last 22 months: real-time data access is no longer a nice-to-have. It's really an essential for any business or organization.

Yeah, I think we see it even in our own company, right? The folks that are joining the workforce now, they learned SQL in school, right? They don't want a report on their desk, printed out every Monday morning. They want access to the database. How do I connect whatever tool I want, or even type SQL by hand? I want access to the data and I want to just use it, right? And I want the performance, of course, to be fast, because otherwise I'll get frustrated and I won't use it, which has been the status quo for a long time. And that's basically what we're solving.

So is the lake house, versus the data warehouse, better able to really facilitate data democratization across an organization?

Yeah, because people don't talk a lot about the story before the story, right? With a data warehouse, the data never starts there, right? You typically first have your data in something like S3, or perhaps in other databases, right? And then you have to ETL it all into that warehouse. And that's a lot of work, and typically only a small subset of the data gets ETLed into that data warehouse. And then the user wants to query something that's not in the warehouse, and somebody from engineering has to spend a month or two months responding to that ticket and wiring up some new ETL to get the data in. And so it's a big problem, right?
And so if you can have a system that can query the data directly in S3 and even join it with sources outside of that, things like your Oracle database, your SQL Server database, your MongoDB, et cetera, well, now you can really have the ability to expose data to your users within the company and make it very self-service. They can query any data at any time and get a fast response time. That's what they need.

That self-service is key there. Speaking of self-service and things that are new, I know you guys launched Dremio Cloud recently, a new SaaS offering. Talk to me about that. What's going on there?

Yeah, we launched Dremio Cloud. We spent about two years working on that internally, and really the goal was to simplify how we deliver all of the benefits that we've had in our product: sub-second response times on the lake, the semantic layer, the ability to connect to multiple sources. But take away the pain of having to install and manage software, right? And so we did it in a way that the user doesn't have to think about versions, they don't have to think about upgrades, they don't have to monitor anything. It's basically like using Gmail, right? You log in, you get to use it, right? You don't have to be very sophisticated. There's not a lot of administration you have to do. It basically makes it a lot simpler.

And what's the adoption been like so far?

It's been great. It's been in limited availability, but we've been onboarding customers every week now. Many startups, many of the world's largest companies, so that's been really exciting, actually.

So quite a range of customers. And it sounds like Dremio has grown itself during the pandemic. We've seen acceleration of that, of startups, of a lot of companies, of cloud adoption, of migration.
How have your customer conversations changed in the last 22 months, as businesses in every industry scrambled in the beginning to survive and now are realizing that they need to modernize to thrive and to have competitive advantage?

I think I've seen a few different trends here. One is certainly that there's been a lot of acceleration of movement to the cloud, right? With how different businesses have been impacted, it's required them to be more agile, more elastic, right? They don't necessarily know how much workload they're going to have at any point in time, so they need that flexibility in the technology. With Dremio Cloud, for example, we scale infinitely: you can have one query a day or you can have 1,000 queries a second, and the system just takes care of it, right? And so that's really important to these companies that are being impacted in various different ways, right? You had companies, the Pelotons and Zooms of the world, where business was exploding, and then of course the travel and hospitality industries, where it went to zero all of a sudden. It's been recovering nicely since then, but that flexibility has been really important to customers. I think the other thing is just that they've realized that they have to leverage data, right? Because in parallel to this pandemic there has also been a real boom in technology, right? And so every industry is being disrupted by new startups, whether it's the insurance industry or financial services, a lot of insurtech, different companies that are trying to take advantage of data. So if you as an enterprise are not doing that, that's a problem, right?

It is a problem.
It's definitely something that I think every business in every industry needs to be very acutely aware of, because from a competitive advantage perspective, you know there's someone in that rearview mirror who is going to be focused on data, with a really solid modern data strategy, who is going to be able to take over if a company is resting on its laurels at all. So here we are at re:Invent. I just came off Adam Selipsky's keynote, but talk to me about the Dremio AWS partnership. I know AWS's partner ecosystem is huge. You're one of the partners, but talk to me about what's going on with the partnership. How long have you guys been partners? What are the advantages for your customers?

You know, we've been very close partners with AWS for a number of years now, and it spans many different parts of AWS, starting with the engineering organization. So a very close relationship with the S3 team and the EC2 team. I was just having dinner last night with Kevin Miller, the GM of S3. And so that's one side of things, really the engineering integration. We're the first technology to integrate with AWS Lake Formation, which is Amazon's data lake security technology. So we do a lot of work together on upcoming features that Amazon is releasing. And then they've also been really helpful on the go-to-market side of things, on sales and marketing, whether it's blogs on the Amazon blog or their sales teams actually promoting Dremio to their customers to help them be successful. So it's really been a good partnership.

And every time I talk to somebody from Amazon, we always talk about their customer-first focus, their customer obsession. It sounds like there's deep alignment from the technical engineering perspective and in sales and marketing. Talk to me a little bit about cultural alignment, because when you're going into customer conversations, I imagine they want to see one unified team.
Yeah, I think Amazon does have that customer-first focus. And obviously we do as well. We have to, right? As a startup, for us, if a customer has a problem, the whole company will jump on that problem, right? So that's what we call customer obsession internally. And I think that's very much what we've seen with AWS as well: the desire to make the customer successful comes before, okay, how does this affect a specific Amazon product? Because any time a customer is using Dremio on AWS, they're also consuming many different AWS services. And they're bringing data into AWS. And so I think for both of us, it's all about how we solve customer problems and make them successful, with their data in this case.

Yep, solving those customer problems is the whole reason that we're all here, right? As we have just a few more minutes here, when we hear terms like future-proof, I always want to dig in with folks like yourself, Chief Product Officers: what does it actually mean? How do you enable businesses to create these future-proof data architectures that can allow them to scale and be really competitive?

Sure, so yeah, I think many companies have experienced what's known as lock-in, right? They invest in some technology. We've seen this with databases and data warehouses, right? You start using that and you can really never get off, and prices go up, and you find out that you're spending 10 times more, especially now with cloud data warehouses, 10 times more than you thought you were going to be spending. And at that point, it becomes very difficult, right? What do you do? And so one of the great things about the data lake and the lake house architecture is that the data stays stored in the customer's own account, right? It's in their S3 buckets in open-source formats like Parquet files and Iceberg tables. And they can use many different technologies on that.
So today, the best technology for SQL and powering your mission-critical BI is Dremio, but tomorrow, there may be something else, right? And that customer, that company, can take that new technology, point it at the same data, and start using it, right? They don't have to go through some really crazy migration process. And we see that with Teradata and Oracle, right? With the old-school vendors, that's always been a pain. And now it is with the newer cloud data warehouses; you see a lot of complaints around that. So the lake house is fundamentally designed for this. Especially if you choose open-source formats like Iceberg tables, as opposed to, say, Delta Lake, you're really future-proofing yourself, right?

Got it. Talk to me, as we wrap up here, about some of the things that attendees can learn and see and touch and feel and smell at the Dremio booth at this re:Invent.

Yeah, I think there's a few different things. They can watch a demo or play around with Dremio Cloud. And they can talk to our team about what we're doing with Apache Iceberg. Iceberg, to me, is one of the more exciting projects in this space because it was created by Netflix, Apple, Salesforce, and AWS just announced support for Iceberg in their products, Athena and EMR. So it's really emerging as the standard table format, the way to represent data in open formats in S3. We've been behind Iceberg now for a while, and so that to us is very exciting. We're happy to chat with folks at the booth about that. Nessie is another project that we created, an open-source project for really providing a Git-like experience for your data, where you have version control and branching, and kind of trying to reinvent data engineering and data management. So that's another cool project that we can talk about at the booth.

Awesome, so lots of opportunity there for attendees to learn even more.
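To make the "Git-like experience for your data" mentioned above a little more concrete, here is a toy sketch of branching and merging over table snapshots. It is not Nessie's API or data model; the `Catalog` class and its method names are hypothetical, and the merge is a simplified fast-forward only. It just shows the general idea that a branch is a pointer to a snapshot, so work on a branch stays invisible to readers of `main` until it is merged.

```python
class Catalog:
    """Toy catalog: each branch name points at an immutable table snapshot."""

    def __init__(self):
        self.branches = {"main": ()}

    def create_branch(self, name, source="main"):
        # Branching copies a pointer to the snapshot, not the data itself.
        self.branches[name] = self.branches[source]

    def append(self, branch, rows):
        # A commit produces a new snapshot; other branches are unaffected.
        self.branches[branch] = self.branches[branch] + tuple(rows)

    def merge(self, source, target="main"):
        # Simplified fast-forward merge: only allowed when the target has
        # not changed since the branch point. Real systems need more here.
        src, tgt = self.branches[source], self.branches[target]
        if src[: len(tgt)] != tgt:
            raise ValueError("target has diverged from source")
        self.branches[target] = src

    def read(self, branch):
        return list(self.branches[branch])


catalog = Catalog()
catalog.append("main", ["row-1"])

# Do the risky work on a branch; readers of main see no change yet.
catalog.create_branch("etl")
catalog.append("etl", ["row-2", "row-3"])
before_merge = catalog.read("main")

# Publish all of the branch's changes to main in one step.
catalog.merge("etl")
after_merge = catalog.read("main")

print(before_merge)
print(after_merge)
```

The design choice worth noticing is that readers of `main` never observe a half-finished load: they see the state either before the merge or after it.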
Thank you, Tomer, for joining me on the program today, talking about the difference between a data warehouse, a data lake, and the lake house. You did a great job explaining that, along with Dremio Cloud, what's going on, and how you guys are deepening that partnership with AWS. We appreciate your time.

Yeah, thank you, thanks for having me.

My pleasure. For Tomer Shiran, I'm Lisa Martin. You're watching theCUBE. Our coverage of AWS re:Invent continues after this.