Hi, my name is Deep, and my colleague Alex and I will be talking about Zedstore, which is a column store that we've built for Postgres. We both work at VMware, on the Greenplum database team.

So let's get acquainted with the agenda for today. We'll begin by setting goals and expectations for Zedstore and for column stores in general. Then we'll talk about Zedstore's design, and we'll compare and contrast it with the design of heap, which is Postgres's row store. We'll talk about how Zedstore is benchmarked, show you some performance numbers, tell you about the open areas of work we have for Zedstore, and show you how to get involved with the project.

So let's lay down some expectations for a column store in general. Whenever you think about a column store in database land, you usually mean laying out the data for a column physically together on disk. Because of that layout, you expect a high degree of spatial locality for data belonging to the same column, which means the data is much more compressible than if you store it row-wise. So whenever you think column store, you typically think compressibility. Column stores in general handle OLAP-style workloads, where you're projecting a few columns and selecting a lot of data, much better than OLTP-style workloads, where you're doing many small transactions, running updates and deletes, querying entire rows, or fetching just one row out of the database. That is a case a column store doesn't really handle well, but for OLAP-style work it should do well. And because fetching one or a few columns requires far fewer disk pages to be read off disk, index builds will also be faster, and your selects will be faster too. So those are some of the generic expectations.

The more specific goals for Zedstore are: every Postgres user should be able to use it. This is something that has been discussed in the Postgres community for a long time: Postgres doesn't have a column store, we need one, and it's demanded by enterprise customers. We want Zedstore to be optimized for OLAP workloads; at the same time, we want it to perform reasonably well for OLTP-style workloads and to maintain feature parity with the heap storage layer that you get out of the box with Postgres. This is motivated by what we've seen with Greenplum customers. Greenplum has its own column-oriented storage, and customers were initially satisfied with no updates or deletes on the columnar tables, but soon they wanted updates, deletes, and modifications. So that's something we keep in mind. We want full MVCC support for visibility and transaction isolation, as well as crash safety. We also want to be able to automatically absorb improvements to any part of Postgres that doesn't involve Zedstore. The way we get that is that Zedstore is an implementation of the Postgres table access method API, which essentially lets you plug in your own storage layer. Keeping it a strict implementation of that API means we can benefit from other improvements to Postgres without having to change Zedstore's code. At the same time, we want to contribute to the table AM API and make it better while we're developing Zedstore. That's a nice side goal.
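To make the pluggable-storage idea concrete, here is a minimal sketch of what choosing a table access method looks like from SQL. It assumes a Postgres build that has Zedstore registered under the AM name zedstore; the table and columns are just illustrative.

```sql
-- Minimal sketch, assuming a build that registers Zedstore as "zedstore".
CREATE TABLE measurements (
    id   bigint,
    temp double precision
) USING zedstore;                        -- pick the storage layer per table

-- Or make it the default for tables created in this session:
SET default_table_access_method = 'zedstore';
```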
Now let's talk about Zedstore's design. Overall, we'll discuss where the table access method API sits in the Postgres layers, Zedstore's unique B-tree-based architecture, what the file layout looks like, and how we do compression, and then we'll finish things off with a single-row insert example and a single-row select example.

First, some jargon that we'll be using quite often. A tuple is a row; an attribute is a column. A block in Postgres is a page: it's an 8 KB unit by default (a page in the Linux kernel, by contrast, is 4 KB). You can configure the block size, but it's usually kept at 8 KB. Visibility and MVCC basically dictate whether a row is visible or not to another transaction, and in Zedstore we store this kind of information in undo pages. Finally, a datum refers to a single value of a column, and we'll be referring to datums quite often.

So let's talk about Postgres's layers and where the table access method fits into the Postgres backend. You can see that it lies between the query planner and query executor (which handle DML and SELECT queries) and the table commands interface (which handles DDL) on one side, and the buffer manager on the other. The buffer manager manages the shared buffers, which cache a relation's on-disk pages in main memory. The access method API and access method manager lie just in between, and these are some of the implementations: heap is the default row store for Postgres, and Zedstore, which is us. Both heap and Zedstore go through the buffer manager. This means we can utilize all of the code responsible for Postgres's buffer management: buffer hits and misses, eviction policies, and so on; there's a lot of code there we can leverage. The buffer manager delegates to the storage manager, which pulls pages off disk and through the page cache, and we can leverage all of that too. If you implement your own access method, you can opt not to go through the buffer manager; a table AM could be an in-memory AM, for example. That's completely up to the implementer. But we opted to use the buffer manager, because it has a lot of benefits for observability and the like, which means Zedstore tables look and feel the same as heap tables to a database administrator or user.

These are the methods involved in implementing a table AM in Postgres. You can implement any subset of these methods; if your table AM doesn't want to support deletes and updates, don't implement those methods. All of the methods do depend on one little thing, which is that every tuple must have a tuple identifier. We'll go into detail about Zedstore's tuple identifier very soon, but that is really the only big, strict requirement. There's also a link here to a toy table access method, which is basically a stubbed-out version of the entire access method API, and it's a great starting point to see what an access method really looks like. As a quick sketch of how such an AM gets wired in, see below.
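For the curious, this is roughly how a table AM gets registered in Postgres 12 and later. It's a hedged sketch: the handler function name here is hypothetical, and in the Zedstore fork the AM is built in rather than loaded as an extension.

```sql
-- Hedged sketch of table AM registration; the C handler returns a
-- TableAmRoutine struct containing (a subset of) the methods above.
CREATE FUNCTION zedstore_tableam_handler(internal)
    RETURNS table_am_handler
    AS 'MODULE_PATHNAME', 'zedstore_tableam_handler'
    LANGUAGE C STRICT;

CREATE ACCESS METHOD zedstore TYPE TABLE
    HANDLER zedstore_tableam_handler;

-- Table AMs installed in a cluster are visible in the catalog:
SELECT amname FROM pg_am WHERE amtype = 't';
```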
With that, I'll let Alex cover Zedstore's internal design.

Hi, I'm Alex, and now I'm going to talk about the internal design of Zedstore. This diagram is the kind of diagram you'd probably see whenever someone talks about row stores and column stores, so I put one here as well, because of peer pressure. But keep in mind that this is an abstract view, and the actual implementation of a row store or a column store could look different from this. Now let me show you the Zedstore implementation.

You can think of a Zedstore table, or Zedstore relation, as a forest of B-trees, where each tree represents a column, or maybe a column family that consists of multiple columns. Within each tree, each node represents one of the 8 KB blocks we just talked about. So in this diagram, attribute tree number one stores all the data for column one, attribute tree number two stores all the data for column two, and so on. All of the B-trees are indexed by a 48-bit integer called the Zedstore tuple ID, or simply TID. As its name indicates, it uniquely identifies a tuple. A tuple's TID never changes, even though the tuple can be moved around within a block or even between blocks. You may notice that there is a special tree called the TID tree. Instead of storing actual user data, it stores the visibility information for all the tuples: things like the transaction IDs recorded when a tuple was inserted or deleted, and the operations to undo when a transaction commits or aborts.

I want to talk about the benefits of this design. First of all, the B-tree is a fast and versatile data structure. In the Postgres world, and in other databases, it's already used widely for indexes, so we think it's really cool to leverage the same data structure for a column store. It is nice: for example, given a tuple ID, you can descend the tree and quickly locate the block that stores that tuple. Also, because the tuple ID is logical, we can move tuples around easily when we insert new data, do compression, or split pages, whereas for the traditional heap table the TID is tied to the physical location of the tuple. Another design decision we've made is to leverage the standard 8 KB block. That makes the blocks really easy to manage, but it requires the compression algorithm to be able to compress data into a bounded size. In our case, we currently support the LZ4 algorithm, and it has worked pretty well for us so far. And lastly, having a separate TID tree separates the visibility metadata from the actual user data, which is also nice.

So bear with me: since we're talking about B-trees, I'm going to do a quick B-tree refresher. Let's take this example of creating a table foo using Zedstore, with two columns, i and j, and inserting 30K rows. With a nice inspection function, we can see all the blocks that belong to attribute j. As you can see, it has four blocks. Block 9 is the root block; it covers the range from a low key of TID 1 (inclusive) up to a high key of the maximum TID (exclusive), and it stores the downlinks to its children: blocks 6, 8, and 12, each of which stores the actual data for the TIDs, or tuples, it covers. Each leaf block also has a next pointer to the next leaf block.
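Here's that demo reconstructed as SQL, under some assumptions: column i as text and j as integer, and an inspection function along the lines of Zedstore's pg_zs_btree_pages(); the exact function name and output columns may differ from what's shown here.

```sql
-- Hypothetical reconstruction of the B-tree demo from the talk.
CREATE TABLE foo (i text, j int) USING zedstore;
INSERT INTO foo SELECT 'row ' || g, g FROM generate_series(1, 30000) g;

-- List the B-tree blocks belonging to attribute 2 (column j):
SELECT blkno, nextblk, level, lokey, hikey
FROM pg_zs_btree_pages('foo'::regclass)
WHERE attno = 2;
```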
Now let's look at all the blocks within table foo's relation file. The left side illustrates the relationships between the blocks, and the right side sorts all of the blocks by block number. As you can see, the majority of the blocks are B-tree blocks, belonging either to an attribute tree or to the TID tree. And there is a special block, or page, called the metapage. It's always block number zero, and it stores pointers to the root blocks of each attribute tree and of the TID tree. The TID tree, in turn, stores pointers to undo pages, and the undo pages store the actual visibility information. And if you have oversized data, you may also have a pointer stored in an attribute leaf page that points to a toast page, where the payload of the oversized datum is stored. We'll talk about that in a couple of slides; it's not important at the moment. One important thing I want to note here is that the block numbers are the numbers the buffer manager gives us. Two consecutive block numbers mean the two blocks came from two consecutive requests from the buffer manager, through the storage manager, to the operating system. It is not guaranteed that two consecutively numbered blocks are actually physically adjacent on the file system, but that's the best we can hope for.

Now let's zoom in and look at what we store on a page. This is the layout of a generic Postgres block, or page. It was originally designed for the heap table, but it also applies to other access methods and indexes, for example the GIN index or the GiST index. As you can see, at the beginning of the page we store a header. I've pasted the struct of the header here, along with links to the code and the documentation; you can check them out if you're interested. The interesting fields for us are pd_lower, pd_upper, and pd_special. Line pointers are important for heap tables, but not for Zedstore; we'll get to that. So let's think of this as a heap table page for now. Between the page header and pd_lower, we store item pointers to the actual items, and between pd_upper and pd_special, we store the actual tuple items. As we insert new data into the page, pd_lower grows downwards and pd_upper grows upwards. Finally, between pd_special and the end of the page is a special area for the access method; an access method can store whatever it wants there.

Now moving on, let's look at the layout of a TID-tree leaf page. As you can see, the layout is exactly the same as the heap page layout, except for the items we store. Instead of storing tuples, we store a thing called a ZSTidArrayItem. I'm not going to dive into the details of the structure of this item, but the gist is that what it stores is, logically, just a set of TIDs and their corresponding undo pointers. An undo pointer can be an actual pointer into an undo page, or it can be one of two special values: one that marks the tuple as visible to all transactions, and one that marks it as invisible to all transactions. The physical content of the item is more compact than the logical content: as you can see, we delta-encode the TIDs, and we've also done some hacks to make the undo pointers more compact.

Now moving on to the attribute-tree leaf page layout. An attribute leaf page looks slightly different from the heap page or the TID leaf page. Between the header and pd_lower, and also between pd_upper and pd_special, instead of storing item pointers and items, we store a structure called an attribute stream, or attstream. It can be either in compressed format, in the upper stream, or in uncompressed format, in the lower stream. For now, let's just focus on the lower stream and the chunks we store there. Within each chunk, the structure is very similar to what we saw in the TID array items: we store a set of TIDs, but instead of undo pointers, we store the actual column data within each chunk. And again, the physical content is more compact, because we apply some special encoding there.
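Since Zedstore reuses the generic page header, the stock pageinspect extension can show you the pd_lower, pd_upper, and pd_special fields just discussed. A small sketch (note that block 0 in Zedstore is the metapage, so the numbers will look different from a heap data page):

```sql
-- Peek at the generic page header fields with pageinspect.
CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT lower, upper, special, pagesize
FROM page_header(get_raw_page('foo', 0));
```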
And now I want to briefly talk about TOAST. TOAST in the PostgreSQL world stands for The Oversized-Attribute Storage Technique. It's also known as "the best thing since sliced bread"; that's actually written in the PostgreSQL documentation, I'm not lying. Anyway, whenever you're going to write an oversized datum, that is, a datum larger than the 8 KB page, the first thing we try is to compress the datum using PostgreSQL's default compression algorithm, pglz. If, after compression, the compressed datum can fit into the attstream on the page, then we just write it directly into a chunk. Otherwise, we allocate a toast page, or a set of toast pages, write the oversized datum there, and store in the chunk a pointer to the first toast page of that datum.

Now, you may wonder why we store two streams on a page. Here is why. Whenever we insert new datums, we first append the chunks to the lower attstream in uncompressed format. As we keep inserting, the lower stream consumes more and more of the free space, until it has consumed all of it. When that happens, we decompress the upper stream, merge it with the lower stream, and compress them together into a single upper stream again. Hopefully, after that, we have more free space left on the same page. As we keep doing this, at some point the page becomes full of upper stream, with datums in compressed format, and when that happens, we need to allocate a new page.

Now that we've talked about the overall design of Zedstore and the page layout, let's go through an example of inserting a single row into table foo. Let's say: INSERT INTO foo VALUES ('hello', 32). For each backend, we maintain a hash table of tuple buffers, keyed by relation ID, so given table foo we can locate the entry for foo. Within each entry, we store TIDs that may be pre-reserved for this table and, most importantly, the table's attribute buffers, or attbuffers. So let's say we do this insertion, and we do not have a pre-reserved TID for table foo. Then the first thing we need to do is descend the TID tree and find the first unused TID, create an undo record for this transaction and table, and return the new TID, in this case TID 5. We can then use that TID to write to the attbuffers.

You may wonder what exactly we store in an attbuffer. The attbuffer itself has two layers of buffering. The first layer is just a set of simple arrays, with a size of 60: arrays of the TIDs, the datums, and their isnull information. Once we have filled up the arrays with new datums, we write them out to the attstream buffers, which are the second layer of buffering that the attbuffers provide. An attstream buffer is the in-memory representation of the on-page attstream we just saw, so it stores chunks. Whenever we fill up the arrays, we write out the raw datums into the attstream buffer, where they're encoded into chunks. And whenever we fill up the attstream buffers, we need to flush them into the shared buffers of the attribute trees. For insertion, we only flush a buffer when it's large enough, for example when we've buffered one megabyte, but for some other operations, such as a select right after an insertion, we might not wait until that happens; we may flush the buffers sooner than the attstream buffer fills up. Anyway, this is how we do a single-row insert.
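Spelled out in SQL, the single-row insert example looks like this (assuming the foo table from earlier). One nicety of the logical TID is visible even from the outside: the ctid you see is derived from the 48-bit Zedstore TID rather than from a physical (block, offset), though exactly how it's surfaced is an implementation detail.

```sql
-- The single-row insert example from the talk.
INSERT INTO foo (i, j) VALUES ('hello', 32);

-- The tuple identifier the table AM assigned (logical in Zedstore's case):
SELECT ctid, i, j FROM foo WHERE i = 'hello';
```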
Now let's move on to sequential scan with column projection. Let's say we do SELECT i, j FROM foo. When it comes to a sequential scan with column projection, it's really important for the storage layer to understand which columns it needs to scan; otherwise, it has to scan all the columns, and then there's no benefit you can get out of the column store. So this is an API that we added to the table AM API, to let the executor pass down the column information that the storage layer needs. Once we have started the sequential scan, the next thing we do is getnextslot. A slot is just a container for a tuple; don't be confused by the name, just think of it as a tuple. When we do getnextslot, we first descend the TID tree and get the next TID out of it, in this case 5 again, because I made up the example and I'm not creative. Anyway, with TID 5, we call the attribute-fetch function once for each of the attributes that need to be scanned, and fetch the column data for those attributes. Eventually, we stitch all the values we got together into a whole tuple. So this is how a select is done. Now I'm going to hand it back to Deep for performance.

All right. Let's talk about some performance numbers for Zedstore. In this slide, we look at what Zedstore's on-disk footprint looks like. On the very left is our baseline, which is heap: for heap, these tables take so-and-so many bytes. If you look at Zedstore, it takes about half the size, which is a pretty good result. However, this is for when the table is loaded with one COPY session. (COPY is the command for loading data from an external source, such as a CSV file, into Postgres.) But what happens if you run 16 parallel COPYs into the same table at the same time? We get a lot of bloat. As you can see, the size has gone up and is now almost equal to heap, which is pretty undesirable. The reason this bloat happens is that concurrency causes out-of-order inserts into the attribute trees, and that leads to very inefficient page splits. You can get more information about what these splits look like, but long story short, the pages end up underutilized as a result of the splits. A way to mitigate that is to reserve TIDs in batches for COPY commands, in batches of n, where n is a configurable number. As you can see, with an increase in n, the size requirement for these tables goes down drastically, and we beat heap here with n = 1,000. A side effect of this is that the COPY runtime goes down from 100 minutes with n = 1 all the way down to 7 minutes as you increase n. The reason is that we reduce contention on the TID tree, and reducing contention on the TID tree is always a win: in Zedstore, everything goes through the TID tree. So that's what is happening here, and it's a neat result.

So let's talk about some of the monitoring tools that we use often. One is \timing, a psql command (psql is the command-line client for Postgres). If you run any query with \timing on, it gives you the full round-trip time of the query, which, unfortunately, includes not just the execution time but also the time taken to send the tuples over the client connection and the processing time of the client.
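A quick sketch of \timing in action; keep in mind the number it prints includes the client round trip, not just execution.

```sql
-- In psql: toggle client-side timing, then run a projection query.
\timing on
SELECT sum(j) FROM foo;   -- only column j's attribute tree needs scanning
```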
If you want to exclude those two factors and look purely at execution time, you can turn to EXPLAIN ANALYZE. Now, EXPLAIN ANALYZE gives you very detailed output: it gives you the plan, how much time was spent doing input/output, and things like that, and it only takes into account execution time, not the time taken to send tuples to the client or the client's processing time. There is some timing overhead associated with EXPLAIN ANALYZE, so that's a gotcha. On some systems, such as FreeBSD on a Google Cloud Platform VM, you'll notice that EXPLAIN ANALYZE takes a lot longer, because the gettimeofday() calls take too long. For that, turn to pg_test_timing, a command-line utility shipped with your Postgres installation, to test the timing overhead that EXPLAIN ANALYZE would incur. We use EXPLAIN ANALYZE quite heavily, as you will see.

Single-column projection is kind of the best case for a column store, and we're going to scale-test it, looking primarily at on-disk performance. We take one of the tables from TPC-DS, which is a very popular analytics benchmark; it's a pretty wide table, as you can see. We use LZ4 for compression to get the best results, both in compression ratio and execution time. We operate on about 100 gigabytes of raw CSV data. We use tablespaces to place these tables on SSDs and on rotational hard disks, to compare the impact of the disk type on on-disk performance. And we load the data both serially and concurrently; as you could see a couple of slides back, concurrent COPY does have an impact. These are some of the configuration parameters we use, optimized for data loading. track_io_timing, along with EXPLAIN (ANALYZE, BUFFERS, TIMING, VERBOSE), gives you the results you're looking for in this case. We ensure that before every run of an experiment, there is nothing in the database's shared buffers and nothing in the operating system page cache, to truly test out on-disk performance.

This is what sample EXPLAIN ANALYZE output looks like. It gives you the total execution time of the query without showing the query results, so it doesn't spam your screen like crazy when you're selecting a lot of data. It also tells you how many buffers were read; multiply that number by 8 KB and you get the number of bytes read, along with the total input/output time taken to read those bytes. So this is an example; we use this pretty often.
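This is roughly the recipe, sketched out; store_sales is one of the wide TPC-DS fact tables and stands in here for whichever table you're testing.

```sql
-- Enable per-query I/O timing, then measure pure execution time.
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS, TIMING, VERBOSE)
SELECT ss_quantity FROM store_sales;
-- "Buffers: shared read=N" times 8 KB gives the bytes read from disk;
-- "I/O Timings" shows the time spent reading them.
```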
Let's take a look at some numbers. For a table loaded with 16 parallel COPYs, we see that the heap table size is around 112 gigabytes, whereas Zedstore takes up around 59 gigabytes. You can see there's a vast difference in the number of bytes and buffers read, because in Zedstore we have to read far fewer columns, and that is reflected in the input/output time. We're beating heap here quite comfortably, but not as well as we'd expect: the ratio between bytes read, roughly 112 gigabytes versus 1.4, isn't really reflected in the 59-versus-20 ratio. We'd want that number to be a little lower, so that's something to strive towards.

Those results were for the SSD; now let's talk about the results for the HDD. On the HDD, we're not seeing performance as good as on the SSD; in fact, we're being beaten by heap here. If you run iostat and look at the average read request size, you'll see that for heap it's around 336, whereas for Zedstore it's around 8. A high number like that typically represents a sequential workload, while a low number like this represents a random workload. Now, where is this randomness coming from? Well, if you have multiple COPY sessions writing into the same table, they're probably writing into the same attribute tree. You can see that these block numbers are far apart; they're not consecutive, and consecutive numbers imply the blocks are close to each other on disk. The gaps between these numbers only grow if you have more concurrent loading happening. The attribute tree's leaves are scanned in order when you scan a column, as Alex was mentioning, so they need to be as sequential as possible; you really want a picture like this one here, for attribute tree number two. A hard disk definitely suffers far more from random reads than an NVMe SSD does. So this is something we had to work around, and we'll show you how we fixed it.

The way to fix it is that for every attribute tree, we maintain a free page map, and we pre-reserve a list of contiguous blocks. The number of contiguous blocks we reserve is dictated by a reloption that you can specify in the CREATE TABLE statement; we'll call it n for the remainder of this presentation. With an increase in n, you'll see that the runtime and input/output time don't change that much for the SSD, because it's not that important for an SSD: they do random or semi-random reads pretty well. But you get a drastic difference on the hard disk. In the hard disk numbers, you can see we've gone down from 133 seconds of input/output time all the way down to 2.7 seconds. (Input/output time, as a refresher, is the time taken to read the disk pages and bring them into shared buffers.) So this is a big win for us. On closer inspection with iostat, you'll see that the average read request size improved from 8.01 to 290.90 with our fix; the figure we used was 496 for n, the relation extension factor. That's a pretty good result right there.

If you want to look at it deeper, this is a frequency distribution of the read sizes. You can see that for heap, where the average read request size was 336, most of the reads are around 128 to 512 KB, which is pretty high. If you look at Zedstore, with its average request size of eight, all of the reads are actually clustered in this bracket, as you can see, which means the average request size isn't that great. That's what we alleviate by increasing n. Increasing n to 128 alleviates the situation, with more reads of larger sizes, as you can see, and you see even larger reads when you ramp n all the way up to 4096. You can ramp up n as much as you want; typically you want to go as high as possible, depending on your workload, the amount of data you're copying, and the number of concurrent sessions doing the loading. If you don't have any concurrent sessions doing the loading, you don't need to increase n. There's a link here to the mailing-list thread that discusses this, so please take a look.
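As a sketch, the knob might be used like this at table-creation time; the reloption name zedstore_rel_extension_factor is illustrative (taken loosely from the mailing-list discussion), so check the thread for the actual spelling.

```sql
-- Hypothetical reloption reserving runs of contiguous blocks per
-- attribute tree (n = 4096 here), to keep column scans sequential on disk.
CREATE TABLE store_sales_z (LIKE store_sales)
USING zedstore
WITH (zedstore_rel_extension_factor = 4096);
```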
These are some of the results for a warm run, without clearing the shared buffers or the operating system cache first. You can see that Zedstore outperforms heap simply by virtue of the fact that it needs a lot less data to be in memory. If you remember, that's only 1.4 gigabytes, which easily fits into the shared buffers, so you get good performance the second, third, and fourth time you run your query, which is great. And this is SELECT *, which is the worst case for a column store. You can see that, more or less, we do well here too with an increase in n, going from 447 seconds to 236 seconds.

Finally, this is the storage performance test suite. It gives you a holistic view of many elements, such as SELECT, COPY, VACUUM, DELETE, UPDATE, toasting, inline compression, and things like that; it gives you a nice picture of Zedstore's overall performance. All of these numbers are on fairly small tables, not very large, but significant enough to keep in mind while we develop, so this will give you an overview of what Zedstore is like. All of these numbers are to be taken with a grain of salt: change the sizes of the tables and you will get different results. And with that, I'll turn it over to Alex to discuss the open areas of work for Zedstore.

All right, let's move on to the open areas of work for performance. Actually, all of the work is about performance; otherwise, we'd just use the heap table. Before I get started, I want to point out that all of these links point to discussions on the Postgres hackers mailing list, so check them out if you're interested. Because we have a long list here, I'm just going to pick a couple of items to talk about.

First, eliminating toast bloat. Right now, the toast pages sometimes don't have a good compression ratio in our tests. That's because when you have an oversized datum, it may take up two toast pages, and the second page is sometimes almost empty. That leads to a bad compression ratio, and we want to improve that. Then we want to avoid full-table rewrites for ALTER TABLE column operations, such as ADD or DROP COLUMN and SET DATA TYPE, because for Zedstore we don't need to rewrite the entire table for those operations, and today the table access method API doesn't provide an interface for that. Then we want to make updates and deletes faster. This is mostly about implementing in-place updates using TIDs: when you just want to update one attribute, we want to patch only that one attribute, instead of deleting and inserting a new row in every attribute tree (there's a small sketch of this below). We also want faster attstream decoding, because right now the B-tree data structure helps us quickly locate the block containing a tuple, but once you reach that block, we also want fast access to the tuple within the chunks. Moving on, we want to support column families and even a row-oriented layout; we've talked about that before, and we have the infrastructure for it.
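To make the in-place-update item concrete: a statement like the one below touches a single attribute, but today it's executed as a delete plus a re-insert that visits every attribute tree; the goal is to patch only j's tree (plus the TID tree for visibility).

```sql
-- Today: delete + re-insert across all attribute trees.
-- Goal:  patch only the affected attribute tree in place.
UPDATE foo SET j = j + 1 WHERE i = 'hello';
```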
I also want to talk about replacing our undo with the upstream undo framework. Postgres uses redo for crash recovery, and the undo framework aims to provide visibility information, a.k.a. MVCC (the term we usually use, standing for multi-version concurrency control). Every access method needs that. Today, the heap table stores this information together with the data file, but as we add more new access methods, we want to extract that out of the table access method into a common framework that provides the same support to all access methods. So we definitely plan to leverage the undo framework once it's ready. Then we want to explore more compression algorithms: right now we're using LZ4, and we'd like to do more exploration there. And we want to make the query planner aware of table access methods: for example, we want the AM to provide statistics specific to the AM, so the planner can make wiser decisions based on those statistics.

And here are the tools you can use to play with Zedstore. There's the git repo, and some inspection functions; if you're a developer interested in the internals, you may want to check them out. We have an Ansible playbook for the benchmarking, and a more lightweight storage perf suite for development, and this is how to run the regression tests for Zedstore. We have the discussion thread going on the Postgres hackers list, a Slack channel in the open-source Greenplum Slack instance, Twitter accounts, and here's our blog. So finally, a big thank you to everyone who has been involved in the thread and in the development of Zedstore. And we're ready for Q&A; come talk to us.

Okay, we're ready for questions. I'm not yet seeing any questions from the chat or the Slack channel. Our videos are not showing up, but we're both here for questions. We're going to wait here for five minutes, and if you have questions afterwards, we're also in the Slack channel; it's the track's channel for open-source databases. Please paste your questions in while we're here. I can see we have 14 attendees.

Okay, I see a question from Josh: how are you handling cost estimation? Right now, we haven't done much yet. We have proposed a few patches on the thread for column-specific cost estimation; I've also pasted that thread at the end of the slides. Basically, we need more column-specific statistics. Yes, so to Josh's question: right now, the Postgres planner isn't aware of the columnar nature of the storage. As Alex was mentioning, we have a few patches out right now to collect that information and to use it. Specifically, when you do a scan, you're scanning a subset of the pages, and that information needs to be made available to the planner so it really knows how many pages are actually going to be scanned. So that's on our radar.

Then Robert asked a question: what index types do we see being useful with Zedstore? Right now, we do support these index types. I think there's a bit of a performance regression with GIN, and I don't believe we have tried GiST indexes, to be honest, but we do expect all the index types to be fully supported. We definitely do. And I see a question from Cassia: interested in the workflows for the solution. Can you elaborate on this question?
I don't entirely understand what "workflows" or "solution" means here. Maybe Cassia is asking whether this will be good for OLAP or OLTP. We do want it to be good for OLAP workloads, for analytic workloads specifically, and especially where you have queries that operate on a subset of the columns; if you have a wide table, that's great, that's a very high-value use case for a column store. The other workflows where we'd want it to be useful are some OLTP workloads as well: we definitely want to support insert, update, and delete, and to support whatever heap does in terms of features. That's what we're striving for.

Robert asked a follow-up question: it seems like a UUID schema might also bloat, because UUIDs can't be packed in order. So, the TID that we use is an int64, of which we use 48 bits. I'm not sure exactly what's meant by UUID here, but the TIDs are integers, and most often the integers are consecutive, so they're very compressible: in each TID item, in the TID tree leaves and even in the attribute leaves, we can pack the TIDs together. We use delta encoding, and since they're consecutive most of the time, it compresses very well. So we'll continue the discussion on Slack; we'll echo the questions that remain here and continue there. Thank you very much. Thank you.