Welcome back, everyone. I hope you had a good ten-minute break, got to use the restroom and stretch out a little. Let's see the questions here. Thanks, Rohan, for answering that question.

Okay, let's recap where we've been, because it's a good transition to the next part of our summit workshop. We talked about analytical stores and the data lake, and about wanting to run SQL on top of the data lake with Presto. We talked about the Presto architecture, a distributed system with a coordinator and workers. We talked about the idea of connectors: how they map the underlying data source to a common abstraction of tables that you can then query through Presto. Then we double-clicked on some of the details, like the file formats of the underlying data and the metadata catalog, which plays a crucial role in helping the query engine decipher the files inside the data lake.

This all sounds really great, and a big part of the benefit is that it's scalable and much lower cost; one of the inherent properties of a data lake is that the storage is much cheaper. But it's not perfect, which is the reason we're going to talk about the lakehouse.

So what do we mean by "it's not perfect"? There are a lot of words on this slide; you can read it, and I'll just speak to it. When the data lake first came out, and this technology has been mainstream for about 15 years now, most organizations would have both a data warehouse and a data lake. The data warehouse was for typical SQL workloads (you could obviously run those on the data lake too), and the data lake was more for machine learning and data science. The bottom line is that folks tried to build architectures that would support both the data warehouse and the data lake, and they ran into a lot of problems.
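Since the connector idea keeps coming up, here is a toy sketch of what "map an underlying source to a common table abstraction" means. This is not Presto's actual connector SPI (that is a Java interface); the class and method names here are invented purely for illustration.

```python
import csv
import io

class CsvConnector:
    """Toy 'connector': exposes raw CSV text sources as named tables.

    Not Presto's real SPI; it just illustrates the mapping from a raw
    data source to a common (schema, rows) table abstraction that a
    query engine could then filter, join, or aggregate over.
    """

    def __init__(self):
        self._sources = {}  # table name -> raw CSV text

    def register(self, table, csv_text):
        self._sources[table] = csv_text

    def list_tables(self):
        return sorted(self._sources)

    def schema(self, table):
        # The first CSV row acts as the column list.
        return next(csv.reader(io.StringIO(self._sources[table])))

    def scan(self, table):
        # Yield each data row as a dict: the common table abstraction.
        yield from csv.DictReader(io.StringIO(self._sources[table]))

conn = CsvConnector()
conn.register("orders", "id,amount\n1,9.99\n2,5.00\n")
print(conn.list_tables())     # ['orders']
print(conn.schema("orders"))  # ['id', 'amount']
print(sum(float(r["amount"]) for r in conn.scan("orders")))  # 14.99
```

The point of the abstraction is exactly what the engine relies on: once a source answers "what tables exist, what are their columns, give me the rows", the engine can run SQL over it without caring whether the bytes underneath are CSV, Parquet, or something else.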
For example, when you actually start writing to a data lake, you're working with files. What happens if people are trying to read that data while you're writing it? What happens when you have multiple writers trying to write files at the same time? You end up with pipeline messes that are very hard to deal with, and a lot of that is because the first version of SQL on top of the data lake didn't have some of the more advanced functionality you were used to in a data warehouse. I'll talk a little bit about what that functionality is.

That is where the lakehouse came in. Folks said, hey, we need to add a bit more functionality so we can get the best of both the data lake and the data warehouse. And that's the main takeaway: the data lakehouse is a data management system that sits on top of the data lake but gives you the benefits of both the data warehouse and the data lake.

If we double-click on what it means to make the data lake work like a data warehouse, there are primarily two things you need to tackle. One is making the performance of SQL on the data lake just as good as a data warehouse. There's a lot of work in that area on making the engine faster; Presto is working on that, not only in the core engine but also in another project called Velox. That's the performance side.

Then there is a slew of other capabilities you need to put inside the engine to really make it behave like a data warehouse. Broadly, we call these data management capabilities. First, you want any operation you do on top of the data lake to be reliable. This usually comes up when you want to mutate things. In data warehouses, you have the concept of a transaction, which is all or nothing.
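To make "all or nothing" concrete on plain file storage: table formats generally write the new data files first and then publish them with one small metadata write, so readers either see the whole commit or none of it. Here is a minimal sketch of that publish step; this is not Hudi's actual on-disk layout, and the manifest file names are invented.

```python
import json
import os
import tempfile

def commit(table_dir, new_data_files, version):
    """Publish a set of data files atomically: write a manifest to a
    temporary file, then os.replace() it into its final name. Readers
    only ever see a fully written manifest, so the commit is
    all-or-nothing even with concurrent readers."""
    manifest = os.path.join(table_dir, f"commit-{version:010d}.json")
    fd, tmp = tempfile.mkstemp(dir=table_dir)
    with os.fdopen(fd, "w") as f:
        json.dump({"version": version, "files": new_data_files}, f)
    os.replace(tmp, manifest)  # atomic rename within one filesystem
    return manifest

def visible_files(table_dir):
    """A reader lists committed manifests, never raw data files, and
    unions the files each commit added."""
    commits = sorted(p for p in os.listdir(table_dir)
                     if p.startswith("commit-"))
    files = []
    for c in commits:
        with open(os.path.join(table_dir, c)) as f:
            files.extend(json.load(f)["files"])
    return files

table = tempfile.mkdtemp()
commit(table, ["part-0001.parquet"], 1)
commit(table, ["part-0002.parquet"], 2)
print(visible_files(table))  # ['part-0001.parquet', 'part-0002.parquet']
```

A half-finished write never appears to readers because the temp file doesn't match the `commit-` prefix until the atomic rename lands; that is the basic trick behind reliable writes on a data lake.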
Either everything works and the state of the data is well known, or nothing works and you roll back. You also want to modify the data. It's one thing to just load the data and read it, like we've done. But what happens when you want to insert a new record, or update one? What does that actually mean? The record lives in one particular file, and I want to replace it. Do I overwrite that file? What do I actually do there? So mutating and deleting data becomes a problem when you're dealing with files underneath.

The other thing is data quality. If it's just a bunch of files, what prevents me from writing and overwriting files however I like? It turns out you want to enforce things. You want to be able to say: you're trying to write this file to the data lake, but it isn't correct, its schema doesn't match the existing data. Or: I started with data that looked like this schema and now I want to evolve it, change a column name, drop a column, things like that. How do I deal with that?

You might also want data versioning: what if you want to keep track of different versions of the table? Then you have other concerns, like the gentleman asked about earlier: how do you access the data efficiently? Indexing, data statistics, clustering, file optimization. And if you keep creating all these versions of files, how do you know your data lake isn't full of stale files, and how do you clean that up? So there's a whole set of services involved. Streaming and incremental processing is also a big thing, for example change data capture, which is something Hudi excels at. And then another thing is data authorization. You can expose everything as tables and databases, but how do you restrict what people can see? How do you restrict which databases people can see?
How do you restrict which tables people can see? And can you even restrict the columns and rows people can see? If you remember the transaction table I showed, it has a credit card number. If someone queries that table, do I really want the credit card number to show? There are technologies you can layer on top of the data lake that provide that kind of masking and prevention. So that's really what we mean by data management capabilities.

The predominant way this is being done is with the concept of a table format. In a basic data lake, you literally just have the files for the data. These table formats (and arguably Hudi is more than a table format) add additional metadata to enable some of these data management capabilities.

So let me go back and look at my data lake. Let's go to S3 and reload. No, I want to go to S3; let's go to S3. We'll see this later. I think it's in the workspace. Is that where you guys have your stuff? No, it's not in the workspace; it's probably in the demo. I'll just show you a dummy example from another table in my labs, a Hudi test table. You can see I have this Parquet file here, but alongside it there's another folder, the .hoodie folder. What happens with a lot of these table formats is that they add additional metadata inside the data lake, and the engine is able to leverage that metadata to implement these data management capabilities. It's another layer on top of the files, so that before the query engine interacts with the underlying data, it can ask things like: what version of the data am I looking at here?
Which files do I actually need to read to get this version? Or if I want to make an update or an insert, how do I write it in a safe way that preserves the other guarantees? In the market today, there are primarily three main formats you can choose from: Hudi, Iceberg, and Delta. They all work by and large the same way, though some of the fundamental underlying principles differ. Today we're going to focus on Hudi. Presto supports all of these major formats, but Presto and Hudi have shared origins at Uber, so they work particularly well together. For that part, my colleague Nadine will walk you through it.
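As a rough mental model of how an engine uses that extra metadata (grossly simplified; this is not Hudi's real timeline format, and the file names are invented), you can think of each commit as recording which files make up the table at that instant, with the reader resolving a snapshot before touching any data. That one lookup is what enables both "read the latest consistent version" and time travel to an older one.

```python
# Toy table-format "timeline": each commit records the full set of live
# data files at that version. Real formats (Hudi, Iceberg, Delta) track
# far more, but the reader-side idea is the same: resolve a snapshot
# from metadata first, then read only those files.
timeline = {
    1: {"files": ["a.parquet"]},
    2: {"files": ["a.parquet", "b.parquet"]},
    3: {"files": ["a2.parquet", "b.parquet"]},  # commit 3 rewrote a.parquet
}

def snapshot(timeline, as_of=None):
    """Return the file list for the given version (time travel), or for
    the latest committed version when as_of is None."""
    version = max(v for v in timeline if as_of is None or v <= as_of)
    return timeline[version]["files"]

print(snapshot(timeline))           # ['a2.parquet', 'b.parquet'] (latest)
print(snapshot(timeline, as_of=2))  # ['a.parquet', 'b.parquet'] (time travel)
```

Note that after commit 3 the old `a.parquet` is still on storage even though no current snapshot references it; that is exactly the stale-file problem mentioned above, and why these formats ship cleanup services to reclaim files no retained version needs.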