Here we go. Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVersity. We would like to thank you for joining this DataVersity webinar today, which is Data Warehouse or Data Lake: Which Do I Choose? Sponsored today by Ahana. Just a couple of points to get us started: due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataVersity. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To find the Q&A and the chat panels, you can click those icons on the bottom middle of your screen to activate those features. And just to note, Zoom defaults the chat section to send to just the panelists, but you may absolutely change it to chat with everyone. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now, let me introduce to you our speaker for today, Ali LeClerc. Ali has over a decade of experience in open-source software, product, and community marketing. She's currently the head of community at Ahana, the SaaS for Presto company, where she works closely with the Presto Foundation to drive open-source programs. Prior to Ahana, Ali held community and marketing positions at Alluxio, Couchbase, and Time Warner. She holds a degree from Yale University in political science. And with that, I will give the floor to Ali to get the webinar started. Hello and welcome. Thank you, Shannon, for quite a nice introduction. Thanks, everybody, for joining. Looking forward to talking more about data warehouses or data lakes: which do I choose? So why don't we jump into what we'll actually be talking about today? 
So I'm going to start with just a quick introduction: what is a data warehouse? What is a data lake? And then I'm going to touch on the data lake house. Maybe it's a term that you have heard about lately. I think in the last six to eight months, the data lake house has certainly been picking up quite a bit of buzz, so I'll be talking more about that. And then I'm going to talk about Presto for the data lake house and do a quick intro on what Presto is and why we think Presto is the best SQL engine for the data lake house. I'll touch on some real-world use cases, and then I have some time for Q&A at the end. So with that, let's get started. Let's start with the traditional data warehouse. Typically, it had a columnar structure. If we look back to the traditional data warehouse days, it had in-database analytics and was very performance-focused. And really, it was built for structured data. What that really meant was that data had to be modeled, and data modeling was somewhat of an endless task. You had to ETL that data: you had to extract all that data from all of your sources, you had to transform it, and then you had to load it into the data warehouse. And finally, it was all primarily accessed through SQL. So data warehouses were made up of a few different sources and a few different outputs. But for the most part, if you look at the challenges, which I have here, they were, and they still kind of are: they're expensive, very expensive actually. They can be a little bit slower. They can be difficult to manage and very expensive to maintain. And they provide access to just a limited amount of data. So a lot of benefits, yes, but a lot of challenges as well. So let's look at the data lake. If we move to the data lake and take a look at the data lake in and of itself, let me just take a step back. We all remember Hadoop, right? Hadoop was going to replace the data warehouse. 
Years ago, everyone was reading that the warehouse was dead and Hadoop was taking over the world. Just a quick primer on Hadoop. Hadoop was a file system, data storage. It was really inexpensive to use, and loading data into that Hadoop data lake was really cheap. You could store all types of data: structured, semi-structured, unstructured. It was all about ingesting data, loading it in, and creating the structure once you had all of that data in there. And you could keep all of your data in there. One of the big things Hadoop helped with was the separation of storage and compute. So really for the first time, it separated out those two layers: you had storage with HDFS, and then compute went through many iterations and generations even in that Hadoop time frame. It started with MapReduce and then it was Hive. But the point here is that you separated those two tiers out and you could do a lot more compute on your storage. So the primary use cases for the data lake were discovery, text analytics, and data science; notebooks, Python, and other languages became the primary ways to access it, although there was some SQL access as well. So while the data lake was certainly less expensive than the data warehouse, there was limited performance, especially when it came to complex analytics, there was limited SQL access, and in general the data lake was kind of hard to govern. Hadoop didn't really have those enterprise capabilities at first. And I think one of the biggest challenges with Hadoop was that it was really hard to use. It was never really simplified. And technology can be really great, but if it's complicated and it takes too long to get value from, then it doesn't really serve its purpose. Over time, a lot of people struggled with it. So what were the drivers behind the modernization of the data warehouse and data lake? Let me start by saying that cloud is probably one of the biggest drivers of this. 
And I'll cover that in more detail later on. But what we've seen since the inception of the data warehouse and the rise of the data lake with the advent of Hadoop is this really rapid modernization of those platforms, and it's driven by three things. The first is digital transformation. Everything's moved to digital now. There are massive upticks in things like mobile technology. There are more interactive web and mobile apps, and those are way more engaging. And there are tons of new data and data types. Along with that is an uptick in engagement: think about engagement with your employees, with your customers, with your partners, et cetera. Second is real-time events. Everything's moving closer to real-time, and that's created a need to respond quickly to business events of any kind. Third, we're really on the cusp of seeing everything automated. In the world of robotics, for example, there are these massive plants where everything's being automated, and automating everything requires incredible intelligence delivered to machines, to sensors, and to all kinds of devices in the IoT. So this move to smart everything has really been a driver to modernize. So if we look at the data warehouse versus the data lake, these trends have really pushed us to the modernization of both of them. If we look at this slide, modernization is happening in both places, and in all areas. The most modern of both is cloud first, which I alluded to earlier. There are increasingly more companies who are moving to the cloud, and there are increasingly more companies that are born in the cloud. I call these digital native businesses; their entire infrastructure is in the cloud. It could be one cloud, it could be multiple clouds. And then going one step deeper on cloud first, today we're seeing a lot of containerization. So Kubernetes, I'm sure many of you have heard of it, you may even be using it: super popular, very stable. 
And a lot of companies are leveraging container orchestration like Kubernetes to run their business. Even at Ahana, where I work today, we were born in the cloud. We built Ahana cloud first, completely on Kubernetes and containerized, to take advantage of all the flexibility you get in the cloud, the availability of the cloud, and the scalability of the cloud. Moving to the next bullet here, there's a move to more in-memory capabilities. On the data warehouse side, they're now bringing in more complex data types than ever before, and the modern data lake is now bringing in columnar data types. I talked earlier about the separation of compute and storage, and we're seeing that more so now. I won't go through every single one of these bullets in detail, but it kind of gives you a good sense of where we're at today when it comes to looking at the data warehouse and the data lake. I will touch on open formats as well. So formats like Apache ORC and Apache Parquet, specifically within the data lake, can be consumed by many different engines. What's really important about open formats is that you're not locked into one specific technology and you can move from one engine to another. For example, Spark supports them, Presto supports them, and TensorFlow just added support. So with the data lake, you can leverage open formats, which are highly performant, and have multiple types of processing on top of them. And of course the data warehouses are trying to expand and extend themselves to the data lake. But what happens is, when you have a critical path for any product, it's built for a specific type of data. With data warehouses, it's proprietary formats; for cloud storage like S3, there might be an extension. Data lakes are built for the open formats and not for these proprietary formats. So these are some of the considerations to think about as you're looking between the data warehouse and the data lake. 
How open do you want it to be? Is cost a factor? We have the adoption of Amazon S3 on AWS, which is one of the most popular data lakes today. Tons of companies, tons of data getting moved to S3, and with the advent of this simple, very cheap, commoditized storage, S3 is becoming a big driver in the use of the data lake and the move to the cloud. So, combining the best of both worlds. I mentioned the data lake house, but what does it mean to really merge the data warehouse and the data lake and their workloads? How does that work, and what does that even mean? We've talked about the data warehouse, we've talked about the data lake, and the differences between the two, but let's talk about what's going on today in the market: we're at a point where these two things are converging. I'll talk in a lot more detail on the data lake house stack and the architecture later on. But first, let's talk about why we're seeing this trend and specifically how a distributed query engine like Presto is helping drive this. So which one do you want to bet on for the future? Where's the primary path that you want optimized? The reason this is important is because it's going to tell you where your data is going to live. Is 80% of your data in the warehouse, or is it in the lake? That's an important decision, and it's driven by the business requirements you have. What we're seeing is, if you have some dashboards or some reports and you need really high-performance access, then the data warehouse is a good fit, but there's an emerging trend of a different kind of analysis, and of consolidating that into a lake. It gives you the ability to run these future-proof technologies on a lake; there's so much innovation happening on the lake today. So that becomes a fundamental decision. As for the next path, even if you choose one way or the other, the good part is that you can put a layer on top that abstracts that and gives you access to both. 
And that's with SQL access: querying across multiple data sources. At Ahana, we see most of our customers leveraging the data lake, and they query maybe one or two other data sources. So let's look at the list I have here. First is SQL access, which is important. Most companies have data teams that know SQL, that can run SQL queries, and that can use SQL to access data. So leveraging those resources to get better insight into your data lake and your data warehouse, that's bringing the best of both here. Then you have unified analytics, which means you can support more of your business use cases with a distributed query engine. A distributed query engine means you can leverage your existing platforms and your data sources with limitless scale for all data types, which is incredibly powerful. So our hypothesis is that the next enterprise data warehouse is the open data lake house. This convergence of the data warehouse and the data lake, what does it actually look like? If we look at the last few decades, if you wanted to do SQL processing at a fairly large scale, you'd probably be using the enterprise data warehouse. Which makes sense: it was purpose-built for SQL, which is widely adopted, and SQL is a great language for interacting with the abstractions you have, which are things like tables, and for doing things like reports or dashboards. Some of those characteristics: usually it's very coupled. So you have your storage, which you see here, your proprietary storage. This is usually some sort of proprietary format that the vendor stores your data in, and it has a highly optimized engine on top of that, which gives you the ability to interpret the data in that format. Typically you get really good performance in the data warehouse, but it can get really expensive. And this is where we see a lot of companies start to have challenges with their enterprise data warehouse. 
The cost, or price performance, starts getting really high as they use more compute resources for their data. Plus it's a really locked-in type of system; the data formats are proprietary. So now, to address those things, we're seeing what we call the open data lake house gaining a lot of steam. The stack has been separated out. The storage is no longer coupled to the compute, and it's scalable: you can leverage the cloud, be elastically scalable, and it's relatively low cost. The amount you're paying for the storage is very cheap compared to the data warehouse. And now that your storage is no longer tied to a compute engine, you can leverage many different types of compute. And by the way, that storage can hold anything: text, images, videos, files, many different formats, and different engines can read it. So then you have your engines on top; you can run your ML and your AI workloads. Then you have Presto, which we'll spend some time talking about. Presto allows you to access the data in your data lake, like S3, and expose it with a SQL interface, so you can work with that data as if you were working with tables in a data warehouse. You can select star from some table, and it'll access the data in the S3 data lake and then return the results as a table. That's how you get your reporting and your dashboarding, et cetera. So you have your reporting and dashboarding on top of that, and then you have your data science, your ML, and your AI on top of your ML and AI frameworks. And last, you have your governance, your discovery, your quality, your security technologies. Those are technologies like Apache Hudi, Apache Iceberg, Delta Lake, Amundsen, et cetera. They give you more parity with the data warehouse. I won't go into a ton of detail on this, but I wanted to show kind of the whole picture here on how we're looking at the open data lake house. 
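To make that "SQL interface over the lake" idea concrete, here's a rough sketch of what it looks like in Presto's SQL, assuming a Hive metastore catalog named hive; the S3 bucket, schemas, tables, and columns are all made up for illustration, and the MySQL catalog at the end is just a hypothetical second source to show federation.

```sql
-- Register files already sitting in S3 as a queryable table
-- (hypothetical bucket and path; Parquet is one of the open formats mentioned above).
CREATE TABLE hive.analytics.orders (
    order_id   BIGINT,
    customer   VARCHAR,
    total_usd  DOUBLE,
    order_date DATE
)
WITH (
    external_location = 's3://example-bucket/orders/',
    format = 'PARQUET'
);

-- Now the lake behaves like warehouse tables: "select star" just works.
SELECT * FROM hive.analytics.orders LIMIT 10;

-- Federation: join the lake against another catalog (e.g. a MySQL source)
-- in a single query; catalog and table names are illustrative.
SELECT o.customer, sum(o.total_usd) AS revenue
FROM hive.analytics.orders o
JOIN mysql.crm.customers c ON o.customer = c.name
GROUP BY o.customer;
```

The point of the sketch is that no data is moved or ETL'd; the table definition just points the engine at objects already in the lake.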
And really, like I mentioned earlier, the reason we're seeing this shift to the open data lake house is to solve for those two key challenges with the enterprise data warehouse: cost and flexibility. Price performance is typically much better on the open data lake house, and you have a lot more flexibility with open formats, open data formats, open technologies, open source, open clouds, and the like. So, considerations for choosing a data warehouse or data lake. I'm guessing, or hoping, many of you joined today to learn more about what you should be thinking about as you're making some architectural decisions around unified analytics or choosing a data warehouse or data lake. What we put together are just some considerations to think about as part of that process. There are eight areas to drill into, and I won't go really deep into each of these, but I hope they help kind of frame it for you. Starting with data: what kinds of data do you need to support? Can your approach support the breadth of that data: structured, complex data types, textual, streaming? Analytics: you want to be able to support a broad range of analytics, not just SQL, but Python, notebooks, search. You want to plan for the future and set yourself up with a solution that can support a broad range of analytics, not just what you need today, but what you might need in the future. Users: does your solution support a broad set of users on a single platform? Your data engineer, your data analyst, your data scientist, your line-of-business owner, can they all access the data they need to do their daily jobs? Platform: what does your platform support? Is cloud critical? Does it support your enterprise requirements? Is it cost-efficient? Drilling more into cloud: is it elastic? Is it automated? Can you scale it? Is there mobility? Is it global? What can you do as you expand into new regions? Does your platform support that? 
Drilling into the enterprise: security, privacy, governance, unification, a lot of enterprise requirements. Will your platform support those things? Looking at business: does it support business semantics and the logic I want to include in that? Will it allow me to create measurable value for my organization and to optimize? Can you create more value over time? And finally, cost. Will you be able to forecast your costs accurately over time? What is the cost at scale? Anyone that's doing analytics will see their analytics grow over time; will you be able to scale as your business grows without having to break the bank? I think what's important to note here is that when it comes to costs, the way data platform teams today are evaluating technologies is really changing. At Ahana, what we're seeing is a lot more people wanting to start pay as you go, so that you only pay for what you use as you decide what technologies to leverage. So it's usually in a consumption model, and my recommendation is to make sure you have that option, because it gives you the flexibility to try things out without an exorbitant cost. The cloud is allowing you to do that; it's definitely an enabler of this model. All right, so a little bit more about us. Why am I here today talking about data warehouse or data lake, and where do Ahana and Presto fit in? Let's start first with open source Presto. I don't know how many folks have heard about Presto, so let me do a quick introduction here. Presto is an open source project: a distributed SQL query engine for the data lake and the lake house, which you saw in the data lake house architecture slide. It was built for fast analytic queries against data of any size. The project came out of Meta (Facebook) back in 2013, and today it's being used at scale at internet giants like Uber, Twitter, and ByteDance's TikTok, in addition to Meta. 
And you can query data in place, so you don't need to move or ETL data. It supports federated querying, so you can join data from different sources and formats. So, super high level, that's why Presto is so awesome. Today, Presto is governed by the Linux Foundation, so it's a neutral, open source project. And there are a lot of companies, big and small, beyond the bigger names I listed, that are leveraging Presto. I won't go into a ton of technical detail today on what Presto is, but if you are interested in going deeper, there are a ton of resources available on ahana.io that will take you more in depth into the architecture, et cetera. But I will go into some of the more popular use cases and how we see customers using Presto for the data lake and the data lake house today. Starting with interactive, ad hoc querying, which is right in Presto's sweet spot. You can run interactive queries on your data as you need to, and that's all in your data lake. There's a ton of reporting and dashboarding, which should be no surprise given what we've been talking about today; more of the traditional data warehouse workloads. But we also see federation across different sources or different data lakes, and some of these deployments are powering customer-facing applications. So taking advantage of the power of SQL in that sort of way is becoming more popular. And then there's the more advanced functionality the data lake house can provide: data lake house analytics and transformation using SQL. So, high level, these are the five key use cases that we see across our customers and the Presto user community. And let's talk about Ahana. So what do we do? Our mission at Ahana is to simplify the use of Presto and make it accessible and usable for data platform teams of all sizes. As Shannon mentioned in the introduction, we are the SaaS for Presto company. What we've done is we've built it to be fully integrated and cloud native. 
It gives you the best of both worlds: you have full visibility into your clusters and your nodes, and you don't have to worry about installing and configuring anything. Given the vast potential that the open data lake house provides, there are still a lot of challenges in standing up that kind of environment. Really, what we do at Ahana is we want to create the easiest managed service for Presto. We want to enable you to do your SQL compute on top of your data lake, your reporting and dashboarding, your lake house types of applications, and to support data teams of all sizes. It's free to get started, it's just pay as you go, and it's all in the AWS Marketplace. It's a really easy way to get started with Presto on your data lake, or for your data lake house, as you begin that journey. So let's talk about some real-world use cases, starting with Blinkit. For folks who aren't familiar, Blinkit is one of India's top instant delivery services. Their motto is everything delivered in 10 minutes. One of the challenges Blinkit had prior to moving to Ahana was that they were in a cloud data warehouse and their price performance, specifically the costs, was just starting to get unmanageable. They grew very fast, exponentially. The amount of data they were processing and the amount of compute resources they needed grew significantly over a short period of time. And what they found was that their price performance just wasn't making sense for what they needed. So what they did is they moved from their data warehouse to the open data lake house approach, using Ahana Cloud for Presto at the core. Presto is the SQL query engine on top of S3. Today they use this architecture to power over 200,000 orders per day at a much, much better price performance. And we actually did a customer presentation with them a few weeks ago at the AWS Startup Showcase. 
Really great video; check it out if you're interested in learning more about their architecture and why they chose the data lake house approach, and specifically Presto to power the SQL queries for that data lake house. But one of the things they said is that Ahana is providing Blinkit with the SaaS managed service for Presto and giving the company the advanced data management capabilities it needs to meet its instant delivery promise. So a fantastic use case here of a company facing challenges, facing limits with their data warehouse in the cloud, and moving to a more open architecture, the open data lake house approach. And then next is Securonix. Securonix is a security information and event management software company. They use Ahana for their in-app SQL analytics, for what they call threat hunting. Every single day they get millions of potential threat events streaming in, and they stream them into their S3 data lake. On top of S3, they want to be able to run quick queries to see where potential threats may be lurking, and that's why they chose Ahana for Presto. Billions of events get stored in S3, and they, similar to Blinkit, were facing very significant price performance challenges with their cloud data warehouse. Moving to this approach, they saw 3x better price performance with Ahana for Presto on AWS. So just a few use cases there, some validation of the challenges real customers are facing in the cloud with their data warehouses, and why they're moving to this more open approach, open architecture, open data lake house. So with that, I know we have plenty of time for questions, and it looks like there are quite a few here. Shannon, I will turn it over to you to read the questions. Thank you so much. Yes, lots of questions. 
And just to answer the most commonly asked question, just a note: I will send a follow-up email to all registrants by end of day Thursday for this webinar, with links to the slides and links to the recording. So, diving in here. Ali, from an environmental, social, and governance perspective, focusing on the environmental aspect specifically, can you provide any insight on the impact of the data warehouse versus that of the data lake with regard to energy consumption and carbon footprint? So I don't have any metrics or statistics off the top of my head here, and you know, I've never really gotten this question. So, Indra, I'm sorry, I don't have a very good answer for you, but let me take that and follow up with my engineering team to see if we can get you something real; I don't want to just make something up here. So we'll take that and we'll follow up directly with you. Good, it's an interesting question. So, can we have a star schema of a data warehouse inside the data lake? If not, what's the reason for that? Yeah, so we're actually seeing a lot of folks leveraging what are called data marts on top of the data lake. And Roshan, I hope this kind of answers your question, but one of the things that AWS talks a lot about is building your line-of-business data marts on top of the data lake. An architecture that a lot of enterprises are moving to is basically setting up line-of-business data marts on a data lake so that each line of business has full access to all of their data. So that means, for example, the marketing team, their marketing analyst, their marketing ops, their marketing engineer, all have access to one specific data mart on the company's data lake. And HR might have their own tool or data mart. So we're seeing this idea of a mesh on top of the data lake a lot more. 
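To make the data-mart idea concrete, here's a minimal sketch of the SQL shape of a star schema: one fact table joined to dimension tables. It uses Python's built-in sqlite3 purely as a stand-in engine so the example runs anywhere; on a real lake house the same query would run through an engine like Presto against open-format files in the lake, and every table and column name here is invented for illustration.

```python
import sqlite3

# In-memory SQLite as a stand-in; on a lake house this schema would live
# as open-format files (e.g. Parquet) queried by a SQL engine like Presto.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# One fact table plus dimension tables: the classic star schema shape.
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (
    product_id INTEGER,
    region_id  INTEGER,
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_region  VALUES (1, 'north'), (2, 'south');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5);
""")

# A typical data-mart rollup: revenue by product across regions.
rows = cur.execute("""
    SELECT p.name, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(rows)  # [('gadget', 7.5), ('widget', 15.0)]
```

The fact table holds the events, the dimensions hold the descriptive attributes, and each line of business can layer its own marts like this over the shared lake.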
And so I think in that way, it's a little bit of a star schema, in that the data lake is kind of the central repository and then you can set up your data marts on top of it. So it does kind of create a star schema in that sense. There are some really great articles that AWS has written on that, Roshan, so I highly recommend checking those out. I love it. So, it seems like a data lake is more of an ODS but with cleaner data; is this a fair assessment? So, when I think of an ODS, yeah, I guess so. I think it kind of depends on the use case, right? A lot of companies are very organized in their data lake approach: they have their metadata catalogs and they have their data catalog sitting right on top, and that does help it become much more organized and clean. But keep in mind, a data lake is a repository for any and all types of data. You can have your files in there, your photos, your videos. And that's the beauty of the data lake, and specifically of the open data lake house: you can run your SQL workloads on top, but you can also run your AI and your ML workloads on top. That's one of the really powerful characteristics of the open data lake house, that you can run various types of workloads, workloads that you can't necessarily run on the data warehouse. And that is why we're seeing a big shift to this data lake house approach. So, are there tools to build a lake house available in the on-prem world? Yeah, absolutely. So I showed, let me see if I can find this architecture again, I showed our view, because we are cloud native, we are fully in the cloud. Here it is. But you can build a lake house on your own. I mean, that's the beauty of the open data lake house: each one of these components can be self-managed, right? So instead of a cloud data lake, you could do on-prem Hadoop. You could run Presto on top of that. You could run your ML and AI on top of that. 
You could set up your own governance and security. So absolutely. I think it would be more of a do-it-yourself approach, but it's certainly possible. So, how does Presto leverage existing identity and access management infrastructures? Yeah, so there are a few different things. Let me talk about it from an Ahana perspective. One is we actually just announced a lot of security updates and enhancements for Ahana for Presto. Specifically, we now have integration with AWS Lake Formation. If you're not familiar, AWS Lake Formation is obviously an AWS service that provides identity and access management; it helps you build a secure data lake in literally hours. Through that integration, you can do a lot of very fine-grained data access and privacy control. Additionally, we also integrate with Apache Ranger, which is an open source security tool, also giving you identity and access management, very fine-grained permissions, et cetera. So there are a few different routes you can take depending on what you want to do, and that's from an Ahana perspective. Of course, from a Presto open source perspective, you can roll your own, right? Presto integrates with Apache Ranger, so you have the ability to do that as well from an open source perspective. So, are you multi-cloud or just AWS? Yeah, good question. From an Ahana perspective, we are today just available on AWS. However, Presto, the open source project, is available on all clouds or on-prem, so you can run Presto do-it-yourself wherever you want. Is Ahana Cloud FedRAMP certified? I feel like we get this question quite a bit, so I need to bring it back to my product team. We're not FedRAMP certified, but because we get this quite a bit, I think it's something we need to look into. Definitely worth it. So yeah, besides the cost consideration, when do you recommend using Presto on S3 versus Redshift? Yeah. Well, we've got a couple of comparisons. 
And though we are a vendor-neutral company, really, when should you use Presto? Yeah, and I think this kind of goes back to some of the use cases I was describing. For those who don't know, Redshift is Amazon's cloud data warehouse. And I think, besides the cost consideration, it's about what kinds of workloads, and I talked about this earlier, future-proofing yourself: what are the use cases that you're going to need to run? What are the kinds of workloads that you're going to need to run, not just today, but in the future, right? And so, yeah, I talked a lot about price performance, and that's where you'll see immediate results, but down the road, are you going to need to do AI and ML at any given point in time? I think having the flexibility to do that is something to really consider, not to mention open data formats, right? The ability to run various compute engines on top of your data, those are the kinds of things to think about if you're looking at a data lake house approach versus a data warehouse approach. And specifically here, Ernesto asks about Presto on S3 and Redshift. That's what I talked about because I live in the Presto world, but there are other data warehouse technologies out there and other data lake house technologies out there that are going to solve for the same sorts of problems. But Redshift to Presto on S3 is certainly one that we see quite a bit. Awesome, thank you. So, why is Presto performance on Ahana Cloud three times faster than Presto on AWS? I love that question, Carmen. So we've built some proprietary features into Ahana, one of which is what we call data lake caching. We actually have a cache built directly into Ahana that is giving you that performance boost. It's not available in the open source Presto project today. That's typically why we see Ahana Cloud for Presto run faster than Presto on AWS. 
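As an aside, the general idea behind a data lake cache is a read-through cache: the first read of an object goes out to remote storage and is kept locally, so repeat reads skip the network round trip. The snippet below is only a toy sketch of that general pattern, not Ahana's actual implementation; fetch_from_lake and all names are hypothetical.

```python
# Toy sketch of a read-through data lake cache: the first read of a key goes
# to "remote" storage; repeat reads are served from a local in-memory store.
# This illustrates the general pattern only, not any vendor's implementation.

class ReadThroughCache:
    def __init__(self, fetch):
        self._fetch = fetch      # function that reads from remote storage
        self._store = {}         # local cache (real systems use memory/SSD tiers)
        self.remote_reads = 0    # count how often we actually hit the remote store

    def get(self, key):
        if key not in self._store:              # cache miss: one remote read
            self._store[key] = self._fetch(key)
            self.remote_reads += 1
        return self._store[key]                 # cache hit: served locally

# Hypothetical stand-in for reading an object from a data lake like S3.
def fetch_from_lake(key):
    return f"bytes-of-{key}"

cache = ReadThroughCache(fetch_from_lake)
cache.get("orders/part-0.parquet")   # miss: goes to remote storage
cache.get("orders/part-0.parquet")   # hit: served from the local cache
print(cache.remote_reads)            # 1
```

In a query engine, serving repeated scans of hot objects from a local tier instead of object storage is where the performance boost comes from.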
And then, of course, not to mention all the other benefits you get from a managed service, right? The engineering resource overhead of managing, deploying, and tuning Presto. I mean, there are literally something like a thousand different configuration parameters you can tune in Presto, which is a lot. So we've abstracted away all those complexities and made it very easy to get Presto up and running in the cloud; in literally 30 minutes, you're up and running. Oh, that's impressive. So it sounds like you're using virtual star schemas and that people aren't necessarily materializing them. What are they using to do that virtualizing? Does Presto allow us to define schemas and present these to reporting tools? Yeah, so, Richard, let me try to answer your question. Typically what we see is Presto running on top of the data lake, and then, I don't have it in this slide, but your metadata catalog and your data catalog are going to be important components for organizing your data, materializing it, and being able to use that as the crawler, in some sense, of your data. And I'm not sure exactly; "virtualizing" is kind of an overloaded term, so I'll take a guess that by virtualizing you mean being able to identify where your data is sitting underneath the data lake. But anyway, with Presto, you'd actually use the attached catalog to define schemas, and from there, that's right, your reporting and dashboarding tools would use that to pull the data into the reports and dashboards. I love it. So many great questions coming in. And following up on that, you know, most enterprises use Active Directory, maybe Okta, for identity and access management. AWS is a separate world, isolated in that respect. Can we integrate with Active Directory? Yeah, I think you can with open-source Presto.
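The "attached catalog" idea above can be sketched concretely: Presto addresses every table as catalog.schema.table, so a reporting tool effectively gets its schemas just by pointing queries at the right catalog. Here is a minimal illustrative sketch; the catalog, schema, table, and column names are hypothetical examples, not anything mentioned in the webinar.

```python
# Presto exposes each data source as a catalog, and every table is
# addressed as catalog.schema.table -- a BI tool only needs the right
# fully qualified name to query the data lake.

def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build the fully qualified, quoted table name Presto expects."""
    return f'"{catalog}"."{schema}"."{table}"'

def report_query(catalog: str, schema: str, table: str, columns: list[str]) -> str:
    """Build a simple SELECT that a dashboard might issue."""
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM {qualified_name(catalog, schema, table)}"

# e.g. a hypothetical Hive Metastore-backed table sitting on S3:
print(report_query("hive", "sales", "orders", ["order_id", "total"]))
# SELECT order_id, total FROM "hive"."sales"."orders"
```

The same naming scheme is what makes the catalog the natural place to define and organize schemas for downstream tools.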
So if you go to the docs, there's, I think, a whole section on Active Directory and LDAP authentication. Check that out and it will give you more detail. What was that URL again? Just go to the Presto docs at prestodb.io, the open-source project's site, and in the docs you'll be able to see more about Active Directory. And what do you use for metadata management? Yeah, so from an Ahana perspective, Ahana comes pre-bundled with an Ahana-managed Hive Metastore. That's part of the product itself, fully managed and pre-bundled. So Hive, and then Glue is the other one you can also integrate with, on the Amazon side. Some great questions coming in here. So when is the perfect moment to move from a data warehouse to a data lake, taking into consideration the amount of data and the number of sessions connected at a given time in my infrastructure, and what do I need to measure to make that decision? Yeah, it's a good question. What we see in our customer base is that it's when those costs start escalating. As you scale your business, as your data starts growing, as your compute resource needs start growing, that's when you're going to start seeing some price jumps. From our perspective, that's the biggest pain for our customers; all of a sudden, they're paying 10x more in some cases every month because of the compute. When you start seeing that pain, that's a good time. Obviously, before that is even better, so you don't have to pay it. But if you have an expectation that your business is going to grow, your data is going to grow, and your compute needs are going to grow, that's when you should start exploring the data lake. And not to mention, we talk to a lot of folks who are partway through their journey of moving from a data warehouse to a data lake because they have hit those pains, or they're even before the data warehouse.
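The "when do escalating costs justify the move?" reasoning above can be framed as a simple break-even calculation. This is a back-of-the-envelope sketch only; every number below is a hypothetical placeholder, to be replaced with your actual warehouse bill, estimated lake costs, and migration effort.

```python
# Break-even framing for the warehouse-to-lake decision: how many months
# of savings does it take to pay back the one-time migration effort?

def months_to_break_even(monthly_warehouse_cost: float,
                         monthly_lake_cost: float,
                         one_time_migration_cost: float) -> float:
    """Months until data-lake savings cover the migration investment."""
    monthly_savings = monthly_warehouse_cost - monthly_lake_cost
    if monthly_savings <= 0:
        # The lake isn't cheaper for this workload; don't move on cost alone.
        return float("inf")
    return one_time_migration_cost / monthly_savings

# e.g. a hypothetical $50k/month warehouse bill vs. an estimated
# $15k/month lake bill, with $140k of one-time migration engineering:
print(months_to_break_even(50_000, 15_000, 140_000))  # 4.0
```

The non-cost factors from the talk (future AI/ML workloads, open data formats, expected data growth) would tilt the decision further, but aren't captured in a formula like this.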
They're pre-data-warehouse and they're trying to figure out: should I choose a data warehouse or a data lake? And our take is that the data lake is going to give you much more flexibility, much more openness, and help you future-proof your use cases. Our belief is that 80% of data is going to be in the data lake. So take advantage of the data lake; build that data lake house in the right way so that you're set for today and set for the future. Perfect. And so in the fabric-versus-mesh debate, which we used to talk about as data federation, where do you place Presto? Yeah, so it's funny. In the fabric-versus-mesh debate, I think these terms all get conflated: you have data fabric, you have data mesh, you have data virtualization, right? And Presto is powerful in that you can query data where it lives. Presto sits on top of your data sources, and that's the beauty of Presto: you don't need to move data, you don't need to ETL your data, copy your data, et cetera. You can query your data where it lives. Now, that said, our perspective at Ahana is, like I just mentioned, that up to 80% of data is going to live in the data lake. And if that's the case, and if that's where your primary workloads and use cases are going to run, then Presto just on top of S3 is going to take care of about 80% of workloads. And that's what we're building: the best engine for the data lake, for the data lake house. So for us, it's less about fabric and mesh and accessing a bunch of different data sources, and more about building the best-performing SQL engine on the market for the data lake. A lot of questions on schema. Schema changes and data updates are always incredibly painful. Does Presto make this easier?
So when we talk about the data lake house, Presto works very well with technologies like Apache Hudi, Delta Lake, and Iceberg, and it's that layer that would typically be handling the inserts, the upserts, the updates. From that perspective, Presto plus Apache Hudi, for example, on top of S3 is how we recommend our customers do that in the right way. I love it. So I'll give everyone a couple more minutes to put questions in the Q&A, but in the meantime, you know, is Presto ACID compliant? So, I see it because we're in the chat now. From an ACID compliance perspective, you can use ACID tables in Presto, and this goes back to what I was just talking about. At a high level, it is supported, but there are some nuances there, and I am not an expert in that, so I won't go into detail, but I recommend checking out the Presto documentation. And can you speak to your data quality approach? I'm not exactly sure what that question means. Katie, maybe you can elaborate a little bit more. I'll give Katie a moment to elaborate there. In the meantime, can you operate against Delta Lake? Yes. So, Richard, actually next week there is a really great virtual event happening where Denny from the Delta Lake community will be talking about an integration the Presto open-source community just announced: Presto plus Delta Lake. He'll be diving more into the architecture on that, and there are integrations with both Presto plus Hudi and Presto plus Delta Lake. So yes, the answer is yes, you can. And check out the meetup next week, next Tuesday. Sounds great. So back to Katie's question on data quality: tools for data quality, data cleansing, governance? Sure. So from an Ahana perspective, we talk a lot about AWS Lake Formation, which gives you governance on your data lake.
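To make the "that layer handles the upserts" point concrete, here is an illustrative sketch of the merge-by-key semantics that table formats like Apache Hudi and Delta Lake provide on top of S3. This is not their actual implementation (which involves file-level rewrites, commit logs, and transaction metadata); it just shows the upsert behavior described above, with hypothetical record fields.

```python
# Upsert semantics: incoming records update existing rows that share a
# record key and are inserted as new rows otherwise -- the behavior a
# table format supplies so the query engine (Presto) can stay read-focused.

def upsert(existing: list[dict], incoming: list[dict], key: str = "id") -> list[dict]:
    """Merge incoming records into existing ones by record key."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # update if the key exists, insert otherwise
    return sorted(merged.values(), key=lambda r: r[key])

table = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
updates = [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}]
print(upsert(table, updates))
# [{'id': 1, 'qty': 5}, {'id': 2, 'qty': 7}, {'id': 3, 'qty': 1}]
```

In the Presto-plus-Hudi arrangement described in the talk, the table format performs this merge on the lake files and Presto simply queries the resulting table.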
If you are an AWS user, I highly recommend checking out Lake Formation. It's an easy service that gets you up and running on building a secure data lake on S3, and Ahana for Presto integrates very seamlessly with AWS Lake Formation. On the open-source side, like I mentioned earlier, Apache Ranger is good for security and governance. So there are a few tools that are pretty baked in that you can use, and you can also roll your own; as long as it can connect via JDBC or ODBC on top of your data lake, you should be able to use it with Presto. All right. Well, that's all the questions we have, Allie. Thank you so much for this great presentation, and thanks to Ahana for sponsoring today's webinar. And again, just a reminder to everybody: I will send a follow-up email by end of day Thursday with links to the slides, the recording, and the additional links we've posted throughout the webinar. So again, thank you all for joining us today. And Allie, thank you so much. Hope y'all have a great day. Thank you.