Databases, a database seminar series at Carnegie Mellon University, is recorded in front of a live studio audience. Funding for this program is made possible by OtterTune and Google.

Hi guys, welcome to another session of the database seminar series at Carnegie Mellon University. We're excited today to have our friends at Snowflake come give three talks in one session, which has never been done before, so let's see how it works. Our first speaker will be Nelena, who is a tech lead of the data lakes team at Snowflake; prior to that she also worked at Microsoft. Then there's Tyler Jones, a software engineer and tech lead at Snowflake, who will be talking to us about streaming ingestion at Snowflake. And finally we have the great Ashish Motivala, who is a founding engineer and the number one programmer at all of Snowflake. It's not the French guys, it's not Marcin, it's Ashish; the entire company is built on his genius. So we're very excited for them to spend the afternoon with us talking about databases. With that, Nelena, go for it. Actually, for the audience: if you have any questions, please unmute yourself and fire away at any time, so this can be a conversation and not just our Snowflake friends talking to themselves for an hour. Okay? All right, the floor is yours, go for it.

Awesome. Thank you, Andy, for that wonderful introduction. As Andy said, we are here to talk about three new features in Snowflake, three very interesting topics: Iceberg tables, streaming ingestion, and Unistore. So let's dive right in.

The first thing we are going to talk about is Iceberg. Snowflake announced native support for Iceberg tables at the Snowflake Summit this year, and it was a pretty big deal. We are going to talk about what it means, why it is important, and all the design changes we had to make to achieve Iceberg support in Snowflake. We will go through a short history of data lakes and the problems in traditional data lakes, talk about how Apache Iceberg solves some of those problems, what we changed in the Snowflake architecture to add support for Iceberg tables, what one needs to do to access Iceberg tables created through Snowflake using external tools, and what new challenges we are going to address in this space.

Let's start with a short history of data lakes. This will be very familiar to anybody who has used data warehousing with Hadoop or Hive or any of the older systems. On the right you can see that we have a storage layer, which in the past was typically HDFS; as cloud services became more popular, it is now S3 or Azure Blob Storage and so on. Typically you would find Parquet and ORC files, or files in some other columnar format, but Parquet and ORC are the most popular ones. And then we have a compute layer. This is the layer which manages all of these data files, processes them, and produces results. There are SQL interfaces and many other programming interfaces that people have come up with. As a result, on the compute layer you can see lots and lots of boxes, Hive, Spark, and so on, and there are so many more that I just couldn't fit everything in there.
But there's one basic thing that underlies all of these systems: they were all built on top of files. Every data lake used the file system as its metadata. All the files for a table were organized in a directory, and any file that showed up in that directory automatically became part of that table. When there was a need to partition the data, we said we would create a nested directory structure, and all the partitioned files showed up in the nested directories. The ownership of a file by a table was very implicit: it was based on which directory the file landed in, and once the file landed there, it was part of that table. This was improved a lot by the query engines, which tried to introduce some transactionality and some consistency guarantees on top of these systems, but the underlying storage assumptions stayed the same.

And, as you might expect, this started causing lots of problems. It is not easy to provide ACID guarantees. Even though there was metadata embedded in the file system, there was also a metastore, which held your catalog information: these are all the tables I have, these are all the partitions in each table, and this is where the data for them sits, like the root directory. So when an update or an insert operation happened, you would need to put the file in the particular location and also update the metastore, and there was no great way to do this in an atomic manner. And there were tools that didn't always go through the metastore; they sometimes went and accessed files directly, and obviously they didn't get a consistent view of the table. Sometimes they saw files that were not properly committed to the table.

This resulted in basic transactionality problems. Multi-partition inserts were not atomic. Isolation, as we described, was best effort: the query engine tried to do the best it could, but if somebody accessed your storage independently, all bets were off. Schema evolution was extremely error prone. Partition evolution almost always needed a complete data rewrite. There was no good way to do access control. Maintenance was difficult: cleaning up orphan files, cleaning up expired or deleted files, identifying which files are referenced where. There were no standard tools for doing all of this, and most companies that used these systems ended up creating their own custom tools, building processes around them, and keeping an entire team busy trying to keep the storage effective and efficient. "Difficult" is the best way to describe it: overall, managing all of this was extremely difficult.

But people liked it. People liked that there was no vendor lock-in. They could use their own storage and had more control over the data. There was a single source of truth; they didn't have to maintain multiple copies of the data for different use cases. And most importantly, they liked using all the different tools that showed up in the open source world: Trino or Flink or Spark on the same data set. For us, though, the absence of a way to provide all of these transactionality guarantees was a deal breaker.
There's no way Snowflake can function without providing isolation guarantees or ACID transactions. This is where Apache Iceberg came into the picture. It was born at Netflix, and it was introduced as a table format. Just as a file format describes how to store data in a file, a table format describes how to store data in a table. It specifies how to perform updates on the table so that all readers have a consistent view of the data. It describes a spec for achieving snapshot isolation. It provides a way to explicitly define the membership of files in a table. Schema evolution is easier, there is a better way to do partition evolution, and there is an explicit metadata layer that you can update, modify, and version, so that you can time travel more effectively across your data and metadata.

So let's take a deeper look into how Iceberg achieves this. In the earlier diagram we saw one flat storage layer where all the data files, mostly Parquet or ORC, resided, and a compute layer that managed all of those files. Here we see a tiered system: a data layer, a metadata layer, and a catalog. The data layer continues to be the same; we still have the Parquet and ORC files. But we now have a very explicitly called-out metadata layer: a metadata file, a manifest list, and manifest files, which together maintain a snapshot of the table. As DML operations happen on the table, newer snapshots of the table appear, and you get snapshot isolation when you access Iceberg tables through the Iceberg catalog. The catalog is simply a key-value store saying that, for table so-and-so, this is the root metadata file, and from that root metadata file you can build up the entire table and get all the information for it. You can find more details about Iceberg in the spec, but for now I want to move on to how this applies to Snowflake and what changes Snowflake made to adapt to Iceberg.

This is a very popular diagram of the Snowflake architecture: we have the data storage layer, we have virtual warehouses as compute, and then we have a cloud services layer. This was published in the Snowflake white paper, and there's a link in this presentation; all the layers are properly defined and separated out. In this architecture, the sections that needed to change to adapt for Iceberg were the metadata storage and the data storage. In Snowflake, data storage happens in FDN files, which is a columnar file format that Snowflake came up with, highly optimized for storing columnar data. In the open source world, an equivalent would be Parquet, which is the most popular columnar file format. The metadata storage for Snowflake has been EP files together with FoundationDB (FDB). EP files are expression property files, where we store statistics about each data file, like min-max statistics, NDVs, number of nulls, default values, and so on. All of that is indexed properly in FDB, and together FDB and EP files give us a really powerful metadata system. Iceberg has a completely different way of storing metadata: it is all stored as files, also in object storage, and there is no database requirement for storing any of the metadata.
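To make the layering just described a bit more concrete, here is a minimal conceptual sketch in Java of the Iceberg metadata hierarchy (catalog, root metadata file, snapshot, manifest list, manifest files, data files). These are illustrative types written for this writeup, not the actual Apache Iceberg library classes:

    // Conceptual sketch of the hierarchy described above; names are illustrative.
    import java.util.List;
    import java.util.Map;

    // A data file (e.g. Parquet) plus the per-file statistics used for pruning.
    record DataFile(String path, long rowCount, Map<String, String> columnStats) {}

    // A manifest file explicitly lists the data files that belong to the table.
    record ManifestFile(String path, List<DataFile> dataFiles) {}

    // A snapshot points at one manifest list; each snapshot is self-contained
    // and does not depend on earlier snapshots.
    record Snapshot(long snapshotId, long timestampMillis, List<ManifestFile> manifestList) {}

    // The root metadata file carries the schema, partition spec, and snapshot history.
    record TableMetadata(String schemaJson, List<Snapshot> snapshots, long currentSnapshotId) {}

    // The catalog is just a key-value mapping from table name to the location of
    // the current root metadata file; committing a new version swaps this pointer.
    interface Catalog {
        String rootMetadataLocation(String tableName);
    }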
So there was a challenge in how we translate, how we draw a mapping, from Snowflake metadata to Iceberg metadata. For data storage it was fairly straightforward: instead of writing Snowflake's proprietary files, we now write Parquet files. But before we go to the metadata mapping, there's also a new concept. All of this metadata storage and data storage needs to happen on the customer's volume, in the customer's bucket. In order to write to a customer's bucket, we needed to introduce a new concept called external volumes. An external volume is simply a storage location that the customer gives us, saying: this is the location where I want you to create the data files and the metadata files, and these are the access credentials for it. We support all the cloud providers, AWS, Azure, and GCP, just by nature of being multi-cloud, so customers can pick where they want their data to be written. When you create an Iceberg table, you associate the volume with the table, and when you insert into the Iceberg table, data shows up on the volume. There's nothing too complicated about it; it's fairly straightforward, but it is really important, because it lets customers say "this is where I want you to write my data," and customers now have complete access to and complete control of the data and the metadata files. They can use external tools to read the data, process it, and do what they want with it. The data isn't locked into Snowflake, and they don't have to export it out.

Next we are going to talk about what happens when we run a query in Snowflake. Let's only consider the right side, the Snowflake part of this diagram, for the first part; we will go bottom up. Say there are several insert operations that happen on a table in a transaction, and all of these insert operations write some data files. Once these data files are written, Snowflake captures the min-max and other statistics for each data file and writes them into our EP files, which are the Snowflake metadata artifact. The EP files are then referenced in the Snowflake metadata layer, which is FDB, and once all of these EP files are registered, we create a new version for the table, associated with those EP files, and this table version is committed. Once the table version is committed, new queries can see all the updates that have happened to the table.

So how does this work for Iceberg tables? We tried to map each of the concepts that we have in Snowflake to an open source concept. When we write data files, instead of writing FDN files we now write Parquet files. In addition to writing EP files, we also write the Iceberg metadata files, the manifest files, where we list all the Parquet files and their statistics. These manifest files are then written into a manifest list, and we create a new snapshot; where in Snowflake we would create a new table version, in Iceberg we create a new snapshot. Then the snapshot gets committed, and all external tools will see the same updates that Snowflake sees.
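As a rough sketch of the workflow just described, seen from the client side over JDBC: register the customer-owned bucket as an external volume, create an Iceberg table on it, and insert. The DDL keywords, option names, account URL, and credentials below are assumptions for illustration, not the exact Snowflake syntax; check the documentation for the real forms.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    public class IcebergTableSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("user", "MY_USER");          // hypothetical credentials
            props.put("password", "MY_PASSWORD");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:snowflake://myaccount.snowflakecomputing.com/", props);
                 Statement stmt = conn.createStatement()) {

                // 1. Register the customer-owned bucket as an external volume
                //    (names and options here are illustrative, not exact syntax).
                stmt.execute("CREATE EXTERNAL VOLUME my_volume "
                        + "STORAGE_LOCATIONS = ((NAME = 'loc1' "
                        + "STORAGE_PROVIDER = 'S3' "
                        + "STORAGE_BASE_URL = 's3://my-bucket/iceberg/'))");

                // 2. Create an Iceberg table whose Parquet data and Iceberg
                //    metadata land on that volume.
                stmt.execute("CREATE ICEBERG TABLE orders (o_id INT, o_total NUMBER) "
                        + "EXTERNAL_VOLUME = 'my_volume'");

                // 3. A normal insert: Snowflake writes Parquet data files plus EP
                //    files, and the Iceberg manifests and snapshot are produced
                //    as described in the talk.
                stmt.execute("INSERT INTO orders VALUES (1, 100)");
            }
        }
    }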
So there is no lag: everything happens in the same transaction, and external tools can access everything in the exact same way that Snowflake accesses those files.

Finally, when we were actually trying to generate this metadata, we noticed that generating the equivalent Iceberg metadata requires us to write at least three additional files, which means three PUT operations on S3, and that gets really expensive for low-latency operations. You don't want a small single-row insert to be doing three PUT operations; it increases the query latency quite significantly. So we had to choose between generating the Iceberg metadata inline in the Snowflake query or generating it in the background, and because of the high latency of writing those three additional files, we decided to generate it in the background. In addition to background metadata generation, we also have a path that generates this metadata on demand. So if the background job has not yet caught up and there is a request through the Snowflake catalog to access a specific table, we generate the metadata on the fly and return it, so external tools still see all the latest updates.

How does this tie into external tools? Now you have an Iceberg table and all the Iceberg metadata; how would one go about accessing it? Say you have a Spark job that needs to access the Iceberg table. Iceberg provides a very convenient SDK that is used by almost all the tools that support Iceberg tables. Spark would load the Iceberg SDK and say: from the Snowflake catalog, I want to read the orders table. Once the Iceberg SDK gets this name, it looks up in the catalog the location of the root metadata file for the orders table, pulls out that root metadata file, builds up the table metadata in memory, and hands it back to Spark, and Spark can now query that table. When Spark queries the table, it is querying the exact same data files that Snowflake created and that Snowflake queries. So there are no two copies of the data, and Spark sees all the updates that are happening to the table as they happen; there's no lag in the updates seen by external tools.

This is all well and good, because Snowflake provides really strong transactionality guarantees. We have ACID transactions, we have isolation, and we have multi-statement, multi-table transactions, and when you use Iceberg tables through Snowflake you get all of that. But in addition, you get all the other features that Snowflake supports: continuous data ingestion for Iceberg tables, clustering, materialized views, search indexes, row-level access control, data masking, and everything else Snowflake supports. So using Iceberg tables in Snowflake makes them really powerful: now you have data in your bucket, you can use your Spark jobs without rewriting them, you can use your Trino clusters and Trino queries without changing any of that, and you can also use everything in Snowflake as is.

Yeah, I think there's a question. Yes, you want to go for it? Yeah, I have a question about the background metadata file generation, the slide you had two slides ago.
How do you get persistence for that? Let's say you have data inserted into the database, but the metadata files were not yet generated. If the server were to crash, how are you supposed to find the files that have no metadata files generated for them?

So, as you see in this picture, if Spark wanted to query your table, it would ask the Iceberg SDK to load the root metadata file from a catalog. Here the catalog would be the Snowflake catalog, and the request would come to Snowflake. If the server crashes, there would be a different server to which that request would come. It would say, "For the orders table, give me the location of the root metadata file," and because the Iceberg metadata has not been generated yet, it would generate the metadata on the fly, because we already have the Snowflake metadata. All we need to do is translate the Snowflake metadata to the Iceberg metadata, and the operation does not commit unless we commit the Snowflake metadata. The translation can happen at any time. That makes sense, yeah: there are two versions of the same metadata, and you're able to translate from one to the other. Cool, thank you.

So we want to do a lot more for Iceberg tables. Data lakes are traditionally left unencrypted; we want to increase security for all of these tables and make sure they are all encrypted, and the big piece for encryption is having a better way to do key management. We want to take on that challenge for high-scale data, for all of this data that already exists. We want to provide a way to migrate existing tables to Snowflake without rewriting any of the data files; this would simply be a metadata translation into Snowflake. Iceberg has introduced row-level deletes, position deletes and equality deletes, and we want to support those, and all the new features in Iceberg, in Snowflake. And we want to contribute Snowflake features back to the Iceberg community. There are lots of new and interesting things we've done in Snowflake that customers love and that we cannot make available for Iceberg tables, simply because the format doesn't support them. We want to give back and say: we embrace this, and we want everybody to be able to use all the features we have in Snowflake.

Question from Chris. Yes. Hi, I was wondering, can you have transactions that span these metadata roots? Can you get snapshot isolation across multiple roots? When you do it through Snowflake, yes, because when it's an Iceberg table in Snowflake it still goes through Snowflake's transaction system, which does allow multi-table and multi-statement transactions. And the metadata translation, as I said, happens in the background: once the Snowflake metadata gets committed, all of that translation happens in the background, so the Iceberg metadata generation is completely disjoint from the Snowflake transaction system. I see, so in your example the Spark query could have snapshot isolation across multiple roots, and that's only through Snowflake, it's not natively in Iceberg? Yes, that's not natively in Iceberg, no. Got it, thank you. Any other questions for Iceberg?
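For readers, here is a rough sketch of the Spark access path described a couple of slides back: configure an Iceberg catalog in Spark that resolves table metadata through Snowflake, then query the orders table. This uses the standard Iceberg Spark catalog mechanism; the Snowflake catalog implementation class and connection properties shown here are assumptions for illustration only.

    import org.apache.spark.sql.SparkSession;

    public class ReadSnowflakeIceberg {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("read-snowflake-iceberg")
                // Register an Iceberg catalog named "snowflake_catalog".
                .config("spark.sql.catalog.snowflake_catalog",
                        "org.apache.iceberg.spark.SparkCatalog")
                // Hypothetical: point the catalog at Snowflake so that root
                // metadata file locations are resolved through Snowflake.
                .config("spark.sql.catalog.snowflake_catalog.catalog-impl",
                        "org.apache.iceberg.snowflake.SnowflakeCatalog")
                .config("spark.sql.catalog.snowflake_catalog.uri",
                        "jdbc:snowflake://myaccount.snowflakecomputing.com")
                .getOrCreate();

            // Spark asks the catalog for the root metadata file of "orders",
            // builds the table metadata in memory, and scans the same Parquet
            // files that Snowflake wrote; there is no second copy of the data.
            spark.sql("SELECT COUNT(*) FROM snowflake_catalog.db.orders").show();
        }
    }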
Well, what was one unexpected problem you had to deal with? Obviously you anticipated problems and planned for them, but once you actually got into the weeds of Iceberg, Parquet files, and this environment, what turned out to be harder than you originally anticipated?

There were many. There were some challenges in doing data type mappings. Some of the native data types were simpler because they mapped directly, but, for example, Snowflake has a timestamp with time zone data type that Iceberg doesn't have, so we had to decide how to represent that. Snowflake has a variant data type, which is a schemaless struct, and Iceberg doesn't have that concept, so we are still thinking about how to handle that well in the Iceberg world. In addition to that, think about the way Iceberg represents metadata: every snapshot is complete in itself, so every snapshot points to a manifest list, which points to a list of manifest files, and a snapshot does not depend on the previous snapshot. That also means that when an update operation happens, the new snapshot needs to contain the entire list of data files. Computing that delta in Snowflake is very efficient, because of the way we laid out our metadata so that updates happen much faster, but that didn't translate directly to how Iceberg lays it out. So figuring out how to make that operation efficient, computing the changed files and producing the full list again while keeping each snapshot independent, was interesting. Thank you.

With that, I think the Iceberg piece is done, and I would like to hand over to Tyler, who's presenting the next section.

Thank you. So let me share my screen. One second. Let's see. Great. So today I'll be talking through Snowpipe Streaming. Before we begin, maybe some context about the history of data ingestion at Snowflake. When Snowflake first became available, we had a few ingestion mechanisms. The first was the INSERT statement, just manually running insert statements from the driver of your choice. We then introduced support for COPY, which allows you to copy a file from some location in cloud storage into a table. We then introduced Snowpipe, which is essentially a continuous copy command: as you put data into a bucket, we read those files and ingest them into the table behind the scenes, and then you can query them. What was missing from our ingestion mechanisms was a way to ingest data in a low-latency, ordered fashion that was also high throughput. Our other ingestion mechanisms, while they offered some of this, did not offer it comprehensively, so folks would often hack together copy commands to ingest data in order, or do it via insert statements and then get throttled, things like that. So we introduced Snowpipe Streaming to overcome some of these challenges.

Snowpipe Streaming is direct data streaming into Snowflake: rows are ingested into tables over HTTPS. No pipe objects are involved, no COPY commands are run, nothing like that. With Snowpipe Streaming you can achieve exactly-once delivery and per-channel ordering.
I'll elaborate on the next slide what a channel is, but essentially we introduced the concept of virtual partitions, in the form of channels, that you can stream data into Snowflake through. You can process this data exactly once, you can process it in order, things like that.

In terms of the ingestion mechanism: once data is presented to the client SDK, which I'll discuss in a future slide, it's queryable on the order of seconds in the common case. Our latency with Snowpipe Streaming right now is on the order of two to five seconds, and the five seconds tends to come from cloud storage tail latencies. With this ingestion mechanism, it's possible to stream gigabytes per second into a single table. It's low overhead, and minimal configuration and setup is needed, like all of Snowflake's features. This is also a low-cost option for both flood and trickle cases: it's possible to aggregate data to make trickle cases cheap, and it's also possible to handle a flood and have that be fairly cheap to ingest as well.

So, the new concepts we introduced as part of Snowpipe Streaming. I mentioned the channel on the previous slide. A channel is basically a logical partition that represents a connection from a client into a destination table. Channels are named, so you can open a channel to a table, give it a name, and stream data over that channel. Channels are owned by a single client; two clients cannot share the same channel. If a channel is opened by one client and then reopened by another, the first client's writes are invalidated, and it can no longer write data to that table via that channel. Channels are roughly analogous to, say, Kafka partitions, however you can have as many of these as you want. They are named, and each has the concept of an offset token. As you write data into a channel, you can provide an offset token that lets you reason about your data source on your side. For instance, if you're reading data from a Kafka partition, you can provide to Snowflake, as a string, the offset number within the Kafka partition. The offset tokens are totally opaque to Snowflake; we hand them back to you with the channel, you can get the latest committed offset token, and you can reason about ingestion rates on your side.

We offer a client SDK as part of this. This is Snowflake-supplied software, which we bundled for convenience into the existing ingest SDK. It accepts data in the form of rows, buffers that data for some period of time, aggregates it across the channels that a particular client has opened, and flushes it to cloud storage. Once it's flushed, it registers the files to Snowflake tables via a REST API, and we'll go through diagrams in the upcoming slides to cover what the flow of data actually looks like. This is only available in Java for now, and it does a fair bit of work on its side to do the buffering, the flushing, and things like that.

We've also introduced the concept of a mixed table. A mixed table is a regular Snowflake table; however, unlike regular Snowflake tables, which are made up only of FDN files, Snowflake's proprietary data format, it contains a mix of BDEC files, the new file type created by the client SDK, and FDN files. So when you query this table, you'll query a set of different file formats, which is why we call it a mixed table.
We do background migration from BDEC to FDN: regular DML, COPY, Snowpipe, things like that, write FDN files, and reclustering and small-file GC migrate the BDEC data to FDN transparently in the background. It's important to note that the data is queryable immediately while it's in BDEC format; we just migrate it in the background for convenience.

Maybe Ashish is going to get into this, but this basically smells like Unistore, right? I'm assuming BDEC is a row format and then you eventually migrate it to the columnar format, is that correct? BDEC is actually a columnar format; it's basically Arrow. However, these files can span data across multiple tables, which is why we don't just call them Arrow blocks or something like that. It's kind of a blob that can belong to multiple tables. So this BDEC thing is also internal, it's not something the user needs to know about? Exactly, exactly, this is totally transparent to the user. Whether they're querying FDN or BDEC, or whether migration has occurred, they don't need to be concerned about that at all; it just works.

Yeah, I was just going to ask: you said that it's basically Arrow, but that it's not, because it has to span these multiple partitions. I'm just curious what the particular technical limitation is; you don't have to go super deep. Yeah, maybe I should have phrased that differently. The data itself is in Arrow columnar format; however, the blob has what we call chunks within it, and a chunk can belong to a particular table. So you could have chunk one of Arrow data that belongs to table one, chunk two that belongs to table two, and so on. So it's not just straight Arrow that makes up the file, it's segments of Arrow data. Got it, so like record batches, but much larger. Exactly, exactly. Okay, cool, thank you. Yep.

So here's a diagram of the client, channel, and table mapping. In this example we have client one, which has opened several channels to different tables. As it writes data to these tables, the data gets buffered among these channels, we generate a blob that belongs to multiple tables, and then we go and register it to those tables. When you query this data, we have offsets associated with the per-table data that tell us to start reading from a particular range in the file. Here, client two only has two channels that belong to table three. If client two were to, say, open channel 11 to table one, client one's writes over that channel would be invalidated; it would not be able to continue writing to table one, and it would have to reopen the channel or create a new channel to write to that table.

Here are some highlights of the Java ingest SDK that we provide. We have client APIs where you use a factory instance to open channels to a particular table; you provide the name, the target database and schema, things like that. There's some authentication-related information that needs to be provided so that you can actually write to the table; we support key-pair authentication as well as OAuth. Once you open a channel, you have a channel object, and via that channel object you can call insertRow or insertRows, where you provide the row in a schematized format along with the offset token that lets you reason about your upstream source with respect to that particular row.
We also have a method that allows you to get the latest committed offset token for a table. Since there is some buffering that takes place both client side and server side, you can call this to know how far Snowflake has actually committed with respect to your upstream source, so you can throw data away or do checkpointing or whatever. And then we also have a close option, which just performs some memory management; this is not really a Snowflake operation, and you don't strictly need to call it.

In terms of what the flow of data actually looks like: we start with the client SDK running in the customer's environment. The user opens one or more channels to a particular table and then provides rows in the schematized format for those tables. As I mentioned before, we buffer this data in chunks client side, and then we generate blobs that are persisted to cloud storage. It's important to note that these chunks are encrypted with per-table keys, so we are not writing plain data to cloud storage; it is all encrypted with keys derived from the base table's keys, so things like Tri-Secret Secure continue to work. Security is paramount for Snowflake, and we are very careful with what we do here.

Once the data has been written to cloud storage, the client SDK calls a REST API that tells Snowflake: go and register these blobs to the table so that they're queryable. This request hits a normal query GS node. I'm not sure which talks have been given here about Snowflake, but internally we have clusters that customers are mapped to, and this request will hit a cluster that the customer is mapped to. Once we've received the request, we do a number of things including authentication, authorization, and privilege checks, but then we fan out to a separate type of cluster that we call a commit service cluster. What this commit service is responsible for is buffering this data and doing some exactly-once messaging checks. It gets the request from the front-end node and responds with "we got this, but it's not yet committed," and that gets propagated back to the client SDK, which is told, essentially: we got the data, however it's not committed, and you can continue sending us more data. It's an asynchronous protocol.

Once the commit service has these chunks, it queues these blob registration requests up. We do some deduplication and some validation; there are a number of sequence numbers that the client SDK passes along that we use to ensure that the client can actually write to this table, that we've seen all the messages we expect, things like that. Once it has buffered a set of blob registration requests for some period of time, it goes and actually creates a new table version by fast-committing via our normal table commit operations. At that point, once the data has been committed, it is queryable in the BDEC format. As I mentioned before, in the background, since this data is not written in our native format, if you have DML, COPY, Snowpipe operations, reclustering, things like that, we then migrate from the BDEC format to the FDN format. This is totally transparent to the user; the data is queryable as before.
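Pulling together the client-side channel APIs described a moment ago (open a channel, insert rows with offset tokens, read back the committed offset, close), here is a minimal sketch in Java. The class and method names follow the open-source snowflake-ingest-java streaming SDK as best as can be reconstructed here; treat the exact signatures, property names, and account URL as assumptions and check the SDK documentation.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    import net.snowflake.ingest.streaming.OpenChannelRequest;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

    public class StreamingIngestSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("user", "MY_USER");                  // hypothetical account info
            props.put("url", "https://myaccount.snowflakecomputing.com");
            props.put("private_key", "...");               // key-pair auth, elided

            // Build the streaming ingest client from a factory instance.
            SnowflakeStreamingIngestClient client =
                    SnowflakeStreamingIngestClientFactory.builder("my_client")
                            .setProperties(props)
                            .build();

            // Open a named channel into a destination table.
            OpenChannelRequest request = OpenChannelRequest.builder("channel_1")
                    .setDBName("MY_DB")
                    .setSchemaName("PUBLIC")
                    .setTableName("EVENTS")
                    .build();
            SnowflakeStreamingIngestChannel channel = client.openChannel(request);

            // Insert a schematized row; the offset token (here a Kafka-style
            // offset rendered as a string) is opaque to Snowflake.
            Map<String, Object> row = new HashMap<>();
            row.put("ID", 42);
            row.put("PAYLOAD", "hello");
            channel.insertRow(row, "1042");

            // Later: see how far Snowflake has durably committed, so the caller
            // can checkpoint or discard already-acknowledged upstream data.
            String committed = channel.getLatestCommittedOffsetToken();
            System.out.println("committed offset token: " + committed);

            channel.close();   // flush and release client-side buffers
            client.close();
        }
    }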
And once you query it, we scan both the Arrow and the FDN data. Elaborating a bit more on what the querying portion looks like: say you query one of your mixed tables. After we do our normal compiler magic... I think we have a question. Go for it; you want to unmute yourself?

Yeah, I'm just curious what kind of operations you can ingest into Snowflake. Can users only insert, or can they also do update and delete? And if users want to do delete and update, do they need to specify something like a primary key? That's a good question. For now this is an insert-only API. If you want to do updates, or some type of CDC, you have a couple of ways of doing that: you can embed an action string in your row and then reconcile that on the Snowflake side. We are working on something called merge tables, which will allow you to do this kind of continuous insert, update, and delete based on a merge with a primary key supplied, but that's a little bit out for now. Thanks. Yeah. Was there another one? Great question. Right.

Okay, so really quickly, querying a mixed table. Once we do our normal compiler magic, we get to table version and metadata resolution. We have a metadata layer that tracks, for every table version that gets created, which files were added to and removed from the table. In this example, say at table version one we added partition P2 (P here abbreviates an FDN partition) and we deleted FDN partition P1. At table version two we added BDEC one and BDEC two, and at table version three we added BDEC three and BDEC four. And, by the way, I forgot to mention: all the BDECs that Snowflake registers also have EP information, pre-generated and passed along by the client SDK, so we can do things like pruning and all our normal optimizations on them. So we have a set of files, we do pruning, we have a set of rewrite rules that operate on these, we generate the plan, and then the plan executes. It scans the mix of the Snowflake partition P2 and BDECs one through four. That's all part of the same scan set; there are no special operations or operators or unions involved. We have a single table scan operator that's capable of scanning both our internal format and this new format. We scan this data, we generate row sets, and we pass them to our other operators. But that's it, that's all I have to present.

So a lot of the infrastructure to keep track of these different files, obviously the Arrow handling and making sure that, okay, I have a bunch of extra locations I need to scan, and the API that takes the inbound requests at the front end, those are new, but it sounds like a lot of this is actually reusing existing Snowflake infrastructure, which makes this fantastic, right? Yep, yep, we tried to plug into as much as we could here. Like you said, we plugged into our existing metadata layer and a lot of our existing scanning infrastructure, things like that. We had to do a ton of work to plumb these files through, though; that was a big part of the difficulty here, as was having files that belong to multiple tables, which was something new for Snowflake.
So this made things like garbage collection, EP management, and encryption very difficult to pull off correctly, so a lot of work went into plumbing that through. Okay, Ryan, did you have a question? Sorry, I think I unmuted by accident. Thanks. All right, cool. Thank you. Yep, I'll stop sharing and hand it over to Ashish.

Ashish, you're actually on track, which is amazing. And I will unmute myself. So, I'm the last and definitely the least of the presenters. What we've told you about today is how Snowflake is expanding data warehousing, how we're improving the data lake, how we're working with open source, giving back to open source and even bringing things in from open source and enhancing them with features and facilities that Snowflake has built. And there's one more thing we want to talk about, which is Unistore.

If you take Tyler's presentation, it talks about how you can ingest a lot more data into Snowflake tables, gigabytes per second. If you look at the first presentation, it talks about how you can query data that's already produced using Snowflake or other open source tools that are out there. So there's one little piece that's missing, which is: how can you do low-latency querying and low-latency transactions on top of Snowflake? That's what I'm here to talk about, and that product is called Unistore for us. That's the marketing name; the better-known name for this in the industry is HTAP. It's very typical to see two different sets of databases: transaction processing databases and analytic databases. Customers are constantly performing ETL and ELT between these two systems, replicating data, making multiple copies, having to monitor all of this, and not getting consistent data across the two sets of systems. This basically hurts innovation for them. So we want to see if we can bring all of this into one system: one database, one connection, one product. As Tyler said, one compiler, one set of operators, one SQL API; make it really easy for customers. That has consistently been our goal at Snowflake, and we keep striving in that direction. We want to get away from all that bureaucracy. So that's what a single transactional and analytical system looks like; we call it Unistore.

Before I step into Unistore, I'll go back one slide. I have a few questions; let's make this a little interactive, we have some time. Snowflake started off as a data warehousing service. What do you think is the current percentage of DMLs versus SELECT statements that Snowflake sees in a day? Yes, including inserts; any inserts, updates, deletes, and merges are all DMLs. Okay, so I got 5% in chat, I got 2% in chat. Anybody else? So we actually see about 60% DMLs, sometimes a little more, and fewer SELECT statements. Next question: the P95 query execution time for a SELECT statement or a DML in Snowflake, keeping in mind these are analytic queries. Five minutes, 200 milliseconds, 10 seconds? So, five seconds is our P95 for any statement, whether it's a DML or a SELECT statement. What is our P99? I won't wait for this one: it's less than a minute for all of these. And our P50, as someone said, is closer to the hundreds-of-milliseconds range.
So what this tells you is that customers are already using data warehousing products as operational data warehouses. The need for HTAP is more real than it has ever been, and whether you want it or not, Snowflake is running an HTAP system today. If we can improve the concurrency by a couple of orders of magnitude, and if we can lower the latency by one and a half orders of magnitude or so, then we can have a really compelling HTAP product for our customers. And that's what we'd love to do.

We've done that by introducing another table type, called hybrid tables. With the same connection, you can create a new table type, and it will serve both operational and analytical queries. It supports critical transactional features like unique keys, primary keys, indexes, and referential integrity constraints, and it also does row-level locking. We don't support these on standard tables in Snowflake today out of the box: you can define a constraint, but we don't enforce it, and we don't really have support for indexes. But we also said that now you've got a new table type, hybrid tables, alongside standard tables, and you need to be able to do transactions across the two seamlessly, so we worked on that. Now, you can imagine that to build something like hybrid tables we need a different storage system; the common columnar storage we have is not going to work for operational data. So you actually have two different kinds of storage systems between hybrid tables and standard Snowflake tables, and you have to be able to do transactions across them. There's a fair amount of challenge in all of this, but that's the beauty: we do all the hard work and the plumbing behind the scenes, new compiler operators, new execution operators for the new storage engine, and it's all seamless to you as a customer.

Let's take a quick look at what Snowflake's architecture looks like today; we've shared this before, and it's a few different layers. We call one layer cloud services; some people, as Tyler did, call it GS, but what it means is cloud services. That's where client queries come in. The infrastructure manager makes sure cloud resources are managed. There's the compiler and optimizer, which compiles a query: a query comes in, gets authenticated and authorized, the metadata gets looked up, what's in the catalog, is the table still there; all of that is in FDB. Eventually, after compilation, the query gets scheduled on query processors, the virtual warehouses or compute that customers provision, where a DAG, which is our query plan, is executed. The result is produced by scanning data from columnar storage in blob storage, and the result is returned to the client going back up the chain.

As I mentioned, columnar storage is not the best for operational data, but row-based storage is, so we added a new row-based storage engine which works underneath the same infrastructure, the same layers, the same stack. The query processing, as I mentioned, has new operators that can scan row-based storage. The metadata manager also has to do more work: it needs to understand how row-based locking can be done, and some of that has to be pushed down to the query processing layer.
Previously in Snowflake, query processing was mostly oblivious to transactions and the metadata manager; query processing was mostly stateless. We've changed that to support row-level transactions down at that level. And we didn't stop there. To get good analytical performance as well, we take the row-based storage and turn it into columnar storage. We're not talking a whole lot about this yet, and this is one of the reasons why, when Andy asked whether I would talk about Unistore, I said I can't say much about the internals of it yet; we're partway through rolling this out and figuring it out, so not enough details can be shared. I'm sorry about that; sometime soon. But we turn this into columnar storage so that you can get good analytical query performance on top of it as well. I didn't want to end that quickly, but we're tracking on time and we wanted to keep about five to ten minutes for more questions, so whoever has questions, please shoot them at any of us.

I see you kept your promise; thank you for finishing on time. I applaud you, as does everyone else. Okay, as Ashish said, we have time for questions, although he will be somewhat cagey about his answers, but that's okay, we can keep pushing him. Ben, I saw your hand first, go for it.

So Unistore apparently adds indexes to Snowflake; how does that work? I know that in Snowflake, for columnar data, you write an entire block to S3. How do you have a mutable index on top of S3? That's a great question, Ben. With Unistore, all writes go to the row-based store first, and in the row-based store, tables are part of one key space and indexes are part of a separate key space. So, just like other databases such as MySQL, we have an index which you can use for lookups.

Yeah, Gavin, go for it. Yeah, hi, first, thanks for the presentation, that was super cool. I have a question: I work in the GraphQL space, and with GraphQL APIs you often do field-wise selection, because you're going to pick a row set and traverse relationships, and it's always a sparse set of columns. If you're implementing a columnar storage layer for this OLTP storage, have you benchmarked the performance differences between row-wise and columnar? We have done some benchmarking; we're not quite at the level of looking at GraphQL-like queries and seeing what happens with sparse column selection, where you've got lots and lots of columns and you select only a few. Okay, yeah, no worries.

Stephen. Hi, so how does the hybrid Unistore storage relate to the FDB that Snowflake uses? It's kind of amazing to me that the Snowflake engine team is planning to maintain parallel development on FDB, Snowflake's existing columnar database format, and also this new Unistore engine. Good question. FoundationDB, FDB, is a key part of our metadata infrastructure, and we make huge contributions to it in open source as well. The way I see it, we're a database company, so we can invest in multiple different database technologies simultaneously, and it helps us make sure our own cloud services run better. Maybe another way to ask Stephen's question is: can you confirm or deny that Unistore runs on FoundationDB? I cannot. Okay, that's fair. Any other questions?
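For reference, here is a rough sketch of the hybrid-table usage discussed in this part of the talk: a table with an enforced primary key and a secondary index, plus a transaction that touches both a hybrid table and a standard table. The DDL keywords, table names, and connection details are assumptions for illustration, not confirmed syntax; check Snowflake's documentation for the real forms.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    public class HybridTableSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("user", "MY_USER");        // hypothetical credentials
            props.put("password", "MY_PASSWORD");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:snowflake://myaccount.snowflakecomputing.com/", props);
                 Statement stmt = conn.createStatement()) {

                // Operational table: enforced primary key plus a secondary index
                // (illustrative DDL for the hybrid table type described above).
                stmt.execute("CREATE HYBRID TABLE customers ("
                        + "  id INT PRIMARY KEY,"
                        + "  email STRING,"
                        + "  region STRING,"
                        + "  INDEX idx_region (region))");

                // One transaction spanning a hybrid table and a standard table,
                // as described in the talk. orders_standard is assumed to be an
                // existing standard Snowflake table.
                conn.setAutoCommit(false);
                stmt.execute("INSERT INTO customers VALUES (1, 'a@example.com', 'EU')");
                stmt.execute("INSERT INTO orders_standard VALUES (1, 1, 100)");
                conn.commit();
            }
        }
    }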
So my question would be: how much did you have to change in the optimizer, or what you call the compiler, to know that it's a row store showing up? And I guess the other question is, what's the strategy going forward for Snowflake? Would you go to customers and say, hey, I see you're doing a bunch of updates on this table, and then recommend, or even automatically migrate them to, a hybrid table? How aggressively will you move people who are doing OLTP-style work on existing standard tables onto the hybrid stuff?

Well, the answer to your second question is very easy. It is the customer who comes and says, look, I want to do operational updates, inserts, and deletes. On a regular Snowflake table you can probably do tens per second today, maybe hundreds if you fiddle with it; you'd have to partition, et cetera. We built this as a data warehousing system. In the Unistore world, we want to take that a few orders of magnitude higher: thousands, tens of thousands, et cetera. So you would instantly know from your use case that this is not going to work with standard tables and will only work with hybrid tables. And we've talked to customers about that.

As for how many changes, and what kind of changes, we had to make to the optimizer: there are a lot of tricky and interesting problems here, because we don't want customers to have to think about their workload, especially when it comes to scheduling on the compute side. You could have mixed workloads that need to run on the same query processing infrastructure, the same virtual warehouse. So how do you decide what is a short query, what is a long query, what is an analytic query, and what is an operational query? We've put some smarts in the compiler and optimizer to figure that out, and we'll start simple and add more complexity as we go. We don't use machine learning for this kind of thing today, but maybe we will eventually. I think other folks have talked about that, using a model to figure out whether a query is short-running or long-running. Okay, very exciting.

Anybody else? Hi, Ryan from Datadog. How does the Unistore storage engine change the way customers pay for Snowflake? It seems fairly different from the pricing model you're supporting today, and obviously the cost structure could be vastly different as well. Yeah, that's a very good question, Ryan; good to see you again, by the way. We're not talking about pricing in a whole lot of detail. We've talked to a few different customers about it, but it's still very much in the works internally. Today, customers pay for compute in Snowflake and they pay for storage, which is on S3; those are the two key dimensions. If you look at other cloud vendors who handle operational workloads, they've got additional dimensions beyond what Snowflake has for data warehousing. For example, AWS operational databases charge for IO, and so we will add similar dimensions for hybrid tables. Thanks, and congrats on that. Thank you.

Okay, awesome. Ashish, Nelena, and Tyler, thank you so much for spending the afternoon with us.
You guys pulled it off: you finished ahead of schedule with three talks, so congrats, that's a new record for us.