Welcome to today's session of the AWS Startup Showcase, featuring Dremio. I'm your host, Lisa Martin, and today we're joined by Robert Maybin, principal architect at Dremio. Robert's going to talk to us about democratizing your data by eliminating data copies. Robert, welcome. It's great to have you in today's session.

Great, thank you, Lisa. It's great to be here.

So talk to me a little bit about why data copies, as Dremio says, are the key obstacle to data democratization.

Sure. When people talk about data democratization, what they're really speaking to is the desire for people across the organization to discover and work with the enterprise's data in a more self-service way. And you might say, well, what's wrong with copies? What could be more democratic than giving everybody their own copy of the data? But when you look at how copies tie into traditional architectures and environments, they bring a lot of problems, and those are real impediments. Traditionally in the data warehousing world, numerous sources of data come in, in all different formats and structures. Before people can query them, those datasets typically have to be loaded into some sort of data warehousing tool. Maybe they land in cloud storage first, but before they can be queried, somebody has to reformat and transform them in ways that make them more useful and more performant. This is very common; many organizations do it, and historically it has made a lot of sense, because the formats data is sourced in are hard to work with and very slow to query.
So copying is a natural thing to do, but it comes at a real cost. There's tremendous complexity in doing all of these transformations, there's a real dollar cost, and there's a lot of time involved too. If you could take out all of those middle steps, where you're copying and transforming, then transforming again, then potentially persisting very high-performance structures for fast BI queries, you could remove a lot of those impediments.

So talk to me about, oh, I'm sorry, go ahead.

I was just going to say, one of the things that's in even more demand now is real-time data access. Real-time is no longer a nice-to-have, and I think what we've been through over the last year has really shown that. So given legacy architectures, and copies being an obstacle to true democratization, how can data teams actually get in there and solve this challenge?

Yeah, going back to the prior question to fill in a bit more detail, which will lead to your point: another cost you bear when you have to make multiple copies is that you typically need experts in the organization to write the ETL scripts, do the data architecture, and design the structures that have to be performant for real-time BI queries. Typically those take the form of OLAP cubes, or big flattened data structures with all of the attributes joined in; there are a lot of ways to get query performance, but typically it's not available directly against the source data. So what can data teams do? There are really two ways to go about this.
One is to go all-in on the data copy approach and build yourself a lot of the automation and tooling it takes to transform the data: build UIs for people to go in and request data, and automate the whole process. We've found that a number of large organizations have gone this route, and they've been at these projects in some cases for years and still aren't completely there. So I wouldn't really recommend that approach. The real approach, and it's available today with the rise of cloud technologies, is to shift our thinking: how do we take the features and capabilities you'd expect in a data warehousing environment and bring them directly to the data? That shift requires new technology. Imagine a lot of the traditional data warehousing features, interactive speed, the ability to build structures or views on top of your data, but applied directly to the data itself, without having to transform and copy, transform and copy. That's what we call the next-generation data lake architecture: bringing those capabilities directly to the data on the lake.

So, leaving the data where it is. "Next generation" is a term, like "future-ready," that's used a lot. Let's unpack that and dig into why what you're talking about is the next-generation data lake architecture.

Sure. To talk about that, the first thing we have to discuss is a fundamental shift in technologies that's come about in the last few years. As cloud services like AWS have risen to prominence, there are capabilities available to us now that just weren't three, four, five years ago.
What we can do now is truly separate compute and storage, connected together with really fast networking. We can provision storage and we can provision compute, and from the user's perspective those two things can be scaled essentially without limit. Contrast that with what we used to have to do in platforms like Hadoop or in scale-out MPP data warehouses: not only did we lack the flexibility to scale compute and storage independently, we didn't have the kind of networking we have today. So it was a requirement to push compute as close to the data as we could, which is what you got in a large Hadoop cluster: nodes with compute right next to the storage, where you try to push as much work as possible onto each node before transferring data to other nodes for further processing. With the new cloud technology, we can do away with that requirement. Now we can have very large provisioned pools of data that can grow and grow without the limitations of hardware nodes, and we can spin compute up and down to process it. What we need, though, is a query processing engine built for those dynamics, one that performs really well when compute and storage are decoupled. That's really the trick: once we embrace this new paradigm of separate compute, separate storage, and very fast networking, we can look for technologies that scale out and back and perform queries well in that environment. Now, the very last piece of what I'd call next-gen data lake architecture: it's very common even today for organizations to have a data lake, right?
That contains a tremendous amount of data. But to do actual BI queries at the interactive speed people expect, they still have to take portions of the data from the lake and load it into a warehouse, and then probably build OLAP cubes or extracts into a BI tool from there. So the last piece of the next-gen data lake architecture puzzle is: once you've got that fast query engine foundation, how do you move those interactive workloads onto that platform so they don't have to be in a data warehouse? How do you take those data warehousing expectations and put them into a platform that can query data directly? That's really what next generation means to us.

So let's talk about Dremio now. I see that just in January of 2021 you raised Series D funding of $135 million, and I saw that Datanami actually called Dremio a unicorn, as it's reached a $1 billion valuation. Talk to us about what Dremio is and how you're part of this modern data architecture.

Absolutely. In the technology context, you can think of Dremio as solving the problem I just laid out: we're in the business of building technology that allows users to query very large datasets, in a scale-out, very performant way, directly on the data where it lives. So there's no real need for data movement. In fact, we can query not just one source of data but multiple sources, and join them together in the context of the same query. You may have most of your data in a data lake, but then you may have some relational sources. So there's a potent story there: you don't have to consolidate all of your data into one place, and you don't have to load it all into a data warehouse or a cloud data warehouse. You can query it where it is. That's the first piece.
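The multi-source join described here can be pictured in miniature. The following is a hypothetical Python sketch, not Dremio's actual API; the source names and rows are invented purely for illustration of the idea that two sources can be joined in one query without first copying either into a warehouse.

```python
# Hypothetical illustration: rows from a data lake and a relational source
# joined in a single query, with no consolidation step beforehand.
lake_orders = [  # e.g. files sitting in cloud storage
    {"order_id": 1, "customer_id": 100, "total": 25.0},
    {"order_id": 2, "customer_id": 101, "total": 40.0},
]
db_customers = [  # e.g. rows read from a relational database
    {"customer_id": 100, "name": "Acme"},
    {"customer_id": 101, "name": "Globex"},
]

def federated_join(orders, customers):
    """Conceptual hash join across the two sources."""
    names = {c["customer_id"]: c["name"] for c in customers}
    return [
        {"order_id": o["order_id"],
         "customer": names[o["customer_id"]],
         "total": o["total"]}
        for o in orders
        if o["customer_id"] in names
    ]

result = federated_join(lake_orders, db_customers)
```

In a real engine this join is expressed in SQL and executed by the query planner; the point of the sketch is only that neither source is copied or transformed ahead of time.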
The next piece Dremio provides, as we mentioned before, is an almost data-warehouse-like user experience: very fast response times for things like BI dashboards, really interactive queries, and the ability to do things you'd normally expect to do inside a warehouse. You can create schemas, for instance; you can create layers of views and accelerations, and effectively let users build out virtually, in the form of views, what they would have done before with all of their various ETL pipelines to scrub, prepare, and transform the data into shape to query. And at the very end, we can selectively accelerate certain query patterns by creating something we call reflections: an internally managed persistence of data that accelerates certain queries, but is entirely managed by Dremio. The user doesn't have to worry about setup, configuration, cleanup, maintenance, or any of that.

So do reflections really provide a differentiator for Dremio? You look in the market and see competitors like Snowflake and SingleStore, for example. Is this really that competitive differentiator?

I think it's one of them. The ability to create reflections is certainly a differentiator, because it allows you to accelerate different kinds of query patterns against the same underlying source data. Rather than having to build a transformation for a user that, say, aggregates data a certain way, persist it somewhere, and build and maintain all the machinery to do that, in Dremio it's literally a button click.
You can go in, look at the dataset, identify the dimensions you need to aggregate by and the measures you want to compute, and Dremio will just manage that for you. Any query that comes in, maybe going after a massive detail table with a trillion rows, with a GROUP BY in it for instance, will just match that reflection and use it, and that query can respond in less than a second, where the work that would otherwise have to happen on the backend engine might take a minute. So that's the piece that gives us BI acceleration without additional tools or any additional complexity for the user.

And I assume you're talking about millisecond response times? You said under a second, but I'm sure milliseconds?

Hundreds of milliseconds, typically. We're not really in the one-to-two-millisecond range, that's pretty rare, but sub-second response times are very common against very large backend datasets when you use reflections.

Got it. And that speed and performance is absolutely table stakes today for organizations to succeed and thrive. So is what Dremio delivers a no-copy data strategy? Is that what you consider it?

I consider it that, and actually much more. When you talk to users of the platform, there are a number of layers to Dremio, and we often get asked who our direct competitors are. It's an interesting question, because we're not just the backend high-performance query engine, and we aren't just the acceleration layer. We also have a very rich, fully featured UI environment that allows users to log in, find data, curate data, reflect data, build their own views, et cetera.
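The reflection matching described above can be sketched in miniature. This is a toy Python illustration, not Dremio's implementation; all names and data are invented. It shows the core idea: a GROUP BY query answered from a system-managed pre-aggregation instead of a scan of the detail rows.

```python
from collections import defaultdict

# Detail rows: (region, amount) -- stands in for a trillion-row fact table.
detail = [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5)]

def build_reflection(rows):
    """Pre-aggregate SUM(amount) grouped by region, the sort of structure
    an engine might maintain internally for an aggregation reflection."""
    agg = defaultdict(float)
    for region, amount in rows:
        agg[region] += amount
    return dict(agg)

# Maintained by the system, not the user.
reflection = build_reflection(detail)

def sum_by_region(rows, reflection=None):
    """Answer SELECT region, SUM(amount) ... GROUP BY region.
    If a matching reflection exists, use it and skip the detail scan."""
    if reflection is not None:
        return reflection
    return build_reflection(rows)
```

Both paths return the same answer; the reflection path simply avoids touching the detail rows, which is where the sub-second response times come from when the real table is huge.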
So there's really a whole suite of services built into the Dremio platform that makes it very easy to install Dremio on AWS, get started right away, and be querying data, building these virtual views, and adding accelerations, all within minutes. There's a wide spectrum of services that lets us power a data lake in its entirety, really without too many other technologies having to be involved.

What are some of the key use cases you've seen, especially in the last year, with this rapid acceleration of digital transformation, the adoption of SaaS applications, and more and more data? What are some of the key use cases Dremio is helping customers solve?

Sure. There are a number of verticals, and some I'm very familiar with because I've worked very closely with customers there. Financial services is a large one, and that would include banking, insurance, and investment, along with a lot of large Fortune 500 companies in manufacturing, transportation, shipping, et cetera. Lately I'm most familiar with some of the transformation going on in the financial services space. Companies there typically started with very large data warehouses, and for the last four or five years, maybe a little longer, they've been transitioning to an in-house data lake, typically on a Hadoop platform of some flavor, with a lot of additional services they've created to try to enable this data democratization. But these are huge efforts, typically on-prem, with lots of engineers working full-time to build out this full spectrum of capabilities.
The way Dremio impacts that is we can come in and take the place of a lot of the parts of that puzzle: give a really rich experience to the user, let customers retire some of the acceleration layers they've put in to make BI queries fast, and get rid of a lot of the transformations, the ETL jobs or ELT processes that have to run. So there's a really wide swath of that puzzle we can solve. And then when you look at the cloud: all of these organizations either have a toe in the water or are halfway down the path of exploring how to take all of this on-prem data and processing and get it into AWS, into the cloud. What does that architecture look like? We're ideally positioned for that story. We've got an offering that runs natively on AWS and takes full advantage of the decoupling of compute and storage. So we give organizations a really good path to solve some of their on-prem problems today, and then a clear path as they migrate to the cloud.

Can you walk me through a customer example that you think really underscores what you've just described, Dremio helping customers with this migration and enabling them to find value in volumes and volumes of data?

Yeah, absolutely. Unfortunately I can't mention their name, but I've worked very closely with a large customer, as I mentioned, in financial services. One of the things they're keenly interested in: they've had a pretty large deployment that has traditionally been Hadoop-based, and they've also got several large on-prem relational data warehouses.
Dremio has been able to come in and provide that BI performance piece, the very fast one-, two-, three-second performance people would expect from a data warehouse, but directly on the files and tables in their Hadoop cluster. That project's been going on for quite some time, and we've had success there. Where it really starts to get exciting, though, and this is just beginning, is that this customer is also investigating, prototyping, and building out a lot of these functions in the AWS cloud. The nice thing we're able to offer is a consistent technology stack, consistent interfaces, and a consistent look and feel of the UI, both on-prem and in the cloud. So once they start that move, they've got a familiar place to connect to for their data and to run their queries, and that's a nice seamless transition as they migrate.

What about other verticals? I can imagine healthcare and government services. Are you seeing traction in those segments as well?

Yeah, absolutely we are. There are a number of companies in the healthcare space. One of the larger ones in the government space, which I have some exposure to, is CMS, where we did some work through a partner to implement Dremio. That project was undertaken about a year ago; they implemented our technology as part of a larger data lake architecture and had a good bit of success. What's been interesting, when you talk about the funding and the valuation and the buzz going on around Dremio, is that we really have customers in so many different verticals: certainly financials, healthcare, insurance, and big commercials like manufacturing, et cetera.
So we're seeing a lot of interest across a number of different verticals, and customers are buying and implementing the product in all of those verticals, yeah.

Right. So take us out with where customers, prospects, and even interested investors can go to find out more about this next-generation data engine that is Dremio.

Absolutely. The first thing people can do is go to our website, Dremio.com, and from Dremio.com/labs they can launch a self-guided product tour. That's probably the quickest way to get an overview of the product, who we are, what we do, and what we offer. There's also a free trial on the AWS Marketplace, so if you want to actually try Dremio out and spin up an instance, you can get us on the Marketplace.

Do most of your customers do that, a trial with a proof of concept, for example, to see from an architecture perspective how these technologies are synergistic?

Absolutely. There are a number of ways customers find us: often they may just try the trial on the marketplace, but they may also reach out to our sales team. It's very common for us to do a proof of concept, and it covers not just architecture but performance requirements and things like that. Pretty much all of our very largest enterprise customers go through some sort of proof of concept, with the support of our field teams.

Excellent. Well, Robert, thanks for joining me today and sharing all about Dremio with our audience. We appreciate your time.

Great, thank you, Lisa. It was a pleasure.

Likewise. For Robert Maybin, I'm Lisa Martin. Thanks for watching.