Hey, I'm Özgün. I'm one of the founders at Citus Data. Prior to Citus, I was a software developer on the distributed systems engineering team at Amazon.com. Today I'm going to talk about cstore_fdw, a columnar store extension for PostgreSQL. Within this talk, I have about 30 slides and a live demo. The slides are fairly technical, so if you have any questions, feel free to interrupt. Before I dive into the talk, I'll spend just two slides on what CitusDB is to put things into context, and then I'll switch back to the cstore_fdw part of the talk. As a quick question, prior to this conference, how many of you have heard of CitusDB? Okay, that's a fairly large number. In one sentence, CitusDB scales out PostgreSQL for real-time workloads. CitusDB shards and replicates your data, and when you send a query to CitusDB, it automatically parallelizes the query across the machines in the cluster. And CitusDB isn't a fork of PostgreSQL. We didn't take a particular PostgreSQL version, say 8.0, fork it from there, and never look back. In fact, before writing CitusDB, we looked into the integration points within PostgreSQL and the APIs that PostgreSQL provides. You can almost think of CitusDB as a distributed query planning and distributed query execution engine that uses the planner and executor hooks within PostgreSQL. For the end user, what that means is you get all the newest PostgreSQL features. You get native support for all the data types, contrib packages, and extensions. In fact, our customers use hstore, HyperLogLog, and other extensions that work out of the box in CitusDB. So as a user, you get the exact same PostgreSQL behavior. A picture is worth a thousand words, and this is the second slide, covering the higher-level architecture. I have an example table here, and that example table is sharded into multiple shards. Each box in this diagram represents a shard, and then we have worker nodes. We have a worker node here; I'll call it worker node number three. You can think of each worker node as an extended PostgreSQL: we have a few user-defined functions to facilitate the creation of shards, a few user-defined functions to move the shards around the cluster, and a few user-defined functions to repartition the shards on a different dimension for large table joins. In a sense, you get PostgreSQL with a bunch of user-defined functions on the worker nodes. And the abstraction we're following here is that one shard maps to one PostgreSQL table. So if you were to log in to this instance and you did a backslash-d with psql, you'd see a bunch of PostgreSQL tables, each representing a distinct shard that holds part of the table we declared in this place. And we also have a master node. The master node is also an extended PostgreSQL. When you send a query to the master node, the distributed query planner on the master node takes that query, transforms it into a distributed computation, breaks it up into many smaller computations, pushes those to the worker nodes, runs the computations, gets the intermediate results, merges them, and gives them back to the user. As a quick note here: we don't keep any data on the master node. All we have there is the shard and shard placement metadata. So at a high level, there is no data stored on the master node. At a high level, does this architecture make sense?
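[Editor's note: as a rough sketch of the master/worker UDFs described above, distributing a table in that era's CitusDB/pg_shard looked something like the following. The table, shard count, and replication factor here are illustrative, and the exact function names and signatures vary by version.]

    -- on the master node: declare the table, then distribute it by a hash of id
    CREATE TABLE events (id bigint, event_time timestamp, payload text);
    SELECT master_create_distributed_table('events', 'id', 'hash');

    -- create 16 shards, each placed on 2 workers; every shard placement is a
    -- plain PostgreSQL table on the worker that holds it
    SELECT master_create_worker_shards('events', 16, 2);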
Those were my slides on Citus; now on to the talk itself. We have five sections in this talk. The first is the motivation section: why do customers want a columnar store? What's the motivation for a columnar store? After motivating the talk, we're going to start with a live demo to see cstore_fdw in action. Then we're going to talk about the file storage layout that we used for the columnar store. It's an open source file storage format, and we're going to talk about the details. Next, how did we integrate the ORC file storage format into PostgreSQL? What were the decisions that we made in integrating it, and why did we make those decisions? Finally, I'll present some benchmark numbers and conclude. And on to the motivation section. This is one of our customers coming to us, and this is their data. They basically have a table of 30 million rows and 700 columns, so it's a fairly wide table. Fairly. And on this data, they're running simple SQL queries: one or two aggregate functions, a few filter operations, and some GROUP BYs. This and queries like it are the typical queries they're running on top of that data. And they're using a row-oriented store. You can think of each run of dots here as 100 or 200 columns; so this is 100 columns, and this is another 100 columns. Now say they run the previous query that we've seen. For the row-oriented store to execute this query, because there is a row after another row after another row, it needs to go over each row, and more importantly each column in it, including all the hundreds of columns that aren't in the query itself. The row-oriented store then gives that data back to the executor, and the executor takes it and does what it does: filter operations, aggregations, and GROUP BYs. So what's the problem here? The cost, obviously, is that in the example with 700 columns, we ended up reading all 700 columns, but we only needed five of them. If you go back to the SQL query, we were only using five of those 700 columns in the query. Their dataset size was 40 gigs, not small, not large, and we ended up reading more than 39 gigs of data that we didn't have to read. If you're using rotational disks, that's a lot of disk bandwidth. Even if you're using SSDs, which tend to give much better disk throughput, you're still spending 60 seconds during the sequential scan reading all of these columns that you didn't need to read. And even if you're going through memory, if you're reading the data from memory and you have very, very high IO bandwidth, you're spending four seconds just bringing the columns from memory into the CPU itself. So that's a pretty expensive operation, and you're wasting lots of IO bandwidth here. Let's go back to the same SQL query and look at the columns that we actually needed to read. I marked them in blue here: we have the ID column, we have the price column for the aggregate functions, we have two columns for the filter operations, and one column for the GROUP BY on weight. Now let's look at the column-oriented store. In this example, we can picture the ID column as being sequentially laid out on disk. So you have the ID column that's sequentially laid out on disk, and the column next to it, the size column, either lives in a separate file or lives after it as a separate chunk. And now when we run the previous query over this column-oriented store, we first read the ID column.
You don't need to read anything from the size column, because these are laid out sequentially on disk; we just jump over to price and read the price column. We again jump over 200 columns here and read the quantity column, and so forth. So basically, when you think of each column as being laid out sequentially on disk, we're skipping hundreds of columns that we don't need to read. And that's more or less the basic motivation for a column-oriented store: you're reading a subset of the columns to reduce the IO bandwidth. This data could be in memory or on disk, but it's most helpful when you're reading from disk, obviously. And because we're grouping columns of the same type together, say we had an integer column, we're putting the integers together, the data also tends to compress really well. If you have text grouped together and integers grouped together, they give really good compression ratios. Within cstore_fdw, we have PGLZ as the default compression algorithm, and the ability to add other compression algorithms as well. When you're writing to disk, you take a group of values and compress them together, so you write less data to disk; because the data is compressed, it occupies less space on disk; and when you're reading those compressed values back from disk, you read less as well. And that's more or less the basic motivation for column-oriented stores. Now that you're super motivated, let's see a live demo. Here's the live demo part, and then I'll go back to the technical part. Any questions up to this point? Did the motivation make sense? [Audience question about the columns in the example query.] You're right. Yeah, you're right. I think it was price and weight, and then it didn't fit into the slides. Okay, so my name is Jason Petersen. I also work at Citus, and I also once worked at Amazon on distributed systems, in the CloudWatch part of AWS. I'm going to demonstrate a quick install of cstore_fdw and some queries using the TPC-H benchmark workload. And then I'm going to hop over to EC2 and show you guys two clusters of Citus machines, just two little worker nodes each, so not very big. But on each, I have the TPC-H dataset loaded with a scale factor of 100. That's roughly 240 million rows in one table that's often joined with another table that has 60 million rows in it. In one case, I've loaded it using just plain Postgres, and in the other case, I've loaded it using cstore_fdw, and those two tables I've specified using columnar storage and the compression that Özgün talked about. So let's get started. I'm going to go ahead and create a brand new 9.4 database for this demo, and I'm going to switch to it and start it up, with a shell script I have. All right. And it's just an extension, like Özgün said, so you can install it using pgxn. That's a pretty lightweight way to do it, so I'm just going to do that right now. It'll pull it down. "cstore_fdw version 1.2 was found," somewhere up here behind the compiler output. There you go. And it's installed. One thing that we need to do before we get started is edit postgresql.conf to add this to the shared preload libraries. The reason for that is that we use some of the hooks in order to intercept the COPY command, so we can actually copy things into the columnar store in an optimized way. So we need to make sure that we have the library loaded whenever the query session starts. So we'll pop into postgresql.conf, we will find shared_preload_libraries, and we will add cstore_fdw.
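[Editor's note: the postgresql.conf edit being described is the standard one from the cstore_fdw README, a single line.]

    # preload cstore_fdw so its COPY hook is installed when each session starts
    shared_preload_libraries = 'cstore_fdw'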
[Audience question about whether the library could be loaded only when needed.] I believe that might work, but the risk there is that you forget to do it one time, and then you're going to have some weird kind of unsupported errors pop up. I'm not entirely sure what would happen, but the COPY command would probably just tell you that you can't copy to foreign tables, or something like that. Or maybe it would do it in an unoptimized fashion. But by getting the hooks in there ahead of time, and it doesn't really have too much of an effect on performance to put it in there, it's just kind of best practice. So I'm going to restart the server, and let's get started. Let me get my schema loaded. The schema here is the TPC-H schema, which looks like this: a bunch of pretty simple tables, and at the very bottom we have the two big tables, orders and line item. And you can see here we specify a server, which is an artifact of the way foreign data wrappers work in Postgres. It doesn't really have too many options for cstore in particular, because we automatically manage where the files are created, which means that you can create a table and not even say where you want the backing chunks of the data to live. And then I also specified the PGLZ compression library to be used on the data. That's particularly important because you can imagine, if you have a big run of integers, especially if they're like a key, you may have huge contiguous runs of the same number. Or if it's strings within a certain data set, they'll be very similar. Those kinds of runs compress very well, which, as Özgün said, can result in less disk space usage, which is also another way of getting less disk IO. So let me start up a psql session and I will load that thing. I need to create the extension; that was at the top of there, right? So let's do TPC-H. Okay. So you can see I've got all the tables, and I'm going to go in here and load the data in a second. I'll do it in parallel to kind of speed it up. Okay. So this should complete in probably 20 seconds, and as you can see, it's copying in all the data, and in the case of the cstore_fdw tables it's actually going to be compressing it as well. I believe we had eight tables, so one, two, three, four, six, seven. Huh? Okay. And as you can see, this is just the standard COPY command, standard COPY output. Up there in the noise of the xargs output you can see the command that's actually being run. The TPC-H benchmark suite produces pipe-delimited value files, so you can just hand those over here. I guess we're on the last one. Okay. Sorry, there we go. All right, so we got it loaded, and I have some queries in here in the queries folder, and we'll pull this up real quick and demonstrate just running them. I have it set to echo them, so you'll get to see the SQL. Let's run query six. I thought I had it set to echo them. Query six. Am I doing echo all wrong? Okay. It's on. Okay. Let's look at the queries. I'm going to run this query. Like Özgün said, often they only care about one big table, and they want to do some aggregations on it. Another query we may be looking at is query three, which does a couple of joins here, but again, it's kind of an analytic workload on a couple of key columns, so just having those laid out in a contiguous fashion is great for this workload. And then also query 12, which again hits the two big tables; line item and orders in my local demo are 6 million and 1.5 million rows respectively, and it does some GROUP BYs and some ORDER BYs. So we'll drop into psql and run those guys, and you can see they complete fairly quickly.
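[Editor's note: the DDL pattern Jason is describing follows the usage documented in the cstore_fdw README; here is a minimal sketch with an abbreviated, hypothetical column list and file path.]

    -- one-time setup: the extension, plus a server object, which is an
    -- artifact of the foreign data wrapper machinery
    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    -- a columnar table; no file location is given because cstore_fdw manages
    -- the backing files itself, and 'pglz' turns on per-block compression
    CREATE FOREIGN TABLE lineitem (
        l_orderkey   bigint,
        l_quantity   numeric(15,2),
        l_shipdate   date
    )
    SERVER cstore_server
    OPTIONS (compression 'pglz');

    -- bulk loads go through COPY, which the preloaded hook intercepts
    -- (assuming a preprocessed CSV; the demo fed pipe-delimited files instead)
    COPY lineitem FROM '/path/to/lineitem.csv' WITH CSV;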
And I am going to pop back out in a second and show you guys the difference in on-disk size for these tables and the way we're actually representing them. So let's look at how big they were to start with. If we look at the line item file, it's 719 megabytes; loaded into plain Postgres it's about a gig of data, so it increases by a bit. Yep. So if we go over to my cstore demo data directory, we will see that there is a cstore_fdw directory, and I think this is the relation ID of the table. Maybe the database. And then we see, yes, here are the relation IDs of the two tables. So one's line item, one is orders; line item is the bigger one, and we see that it's 243 megabytes out of a starting number of 719. Right, so we're saving about 500 megs. That's not just a disk space thing, right? That also affects your OS's page cache, because it doesn't have to cache as many pages, basically. It prevents IO, and it also lets you store more on your machine. So now I'm going to hop over to the big machines. I have one that's plain Postgres and one that is cstore-based. Just to show you guys: the line item table has got 240 million rows, and orders, you'll remember query three and query six both did a join between orders and line item. Apparently some of these table names are plural and some aren't. Orders is 60 million, right? So pretty big data in here. Let's get the TPC-H queries up. And we will have a race between the Postgres-based table and the cstore-based table. I will give Postgres a head start. This guy, we've got six seconds coming back; still waiting over here. I think the number should be about 22... 18. Okay, so it surprised me this time. So you can see the difference, and the only difference between these two clusters, each a c3.4xlarge with two c3.2xlarges behind it, for those of you not versed in the Amazon-speak, we're looking at I think four physical cores on each of the two backend boxes and about 15 gigs of RAM, and they also have SSDs backing their data. The only difference here is that those two tables, line item and orders, have been changed from being regular Postgres tables to using the cstore_fdw wrapper. Two questions. Let's take the questions. Sure. [Audience question about whether the data is just coming from the file system cache.] So it's not actually all in the file system cache, because we're working with close to several hundred gigabytes of data. The whole big benefit of cstore_fdw is the on-disk file layout, right? I have run these before; I mean, these have been running for the past hour, but I ran them before the presentation. So, you know, they're already reading from the cache, or they may be thrashing it. There was a second question. Same question? Yes. So the big thing is you want the data contiguously laid out on disk, because you can't fit your data set in memory. We have a hundred gigabytes of data here, you know, or more; I think 150 or 200. And so it lets you do more. Yeah. This is open source on GitHub; I think Özgün is going to cover that shortly, though. Any other questions about the demo in particular? [Audience question about the schema on the EC2 clusters.] Yeah, the schema that I showed at the beginning is the same one. So it's the same data set, just not on my laptop; they're out on EC2. But yeah, they use the PGLZ compression on those two tables as well. All right. And the cool thing about the demo is it's all Postgres. So instead of creating a table, you just say CREATE FOREIGN TABLE, and then everything else is the same. That was the live demo, and now back to the technical details. So before we started working on the columnar store, we wanted to get a picture of the landscape.
What are other people doing about columnar stores, and how are they doing it? We looked into the people working on columnar stores and grouped them into three buckets. Bucket one was people who took an existing database, forked it, and then wrote their own columnar store engine on top of it. So basically, the database that they worked on became its own self-contained, proprietary database solution. A second group was on the Hadoop side of things. There's been a fair amount of work done on Hadoop, specifically on the Hadoop Distributed File System, to come up with a columnar store. People were writing a lot of data into HDFS, and then they were reading a lot of data from HDFS, and they wanted to read and write less. So there was work done on the Hadoop side of things, and it was open and published work. They were making progress on the file specifications, and we thought that was an interesting trend. And on the PostgreSQL side, with the foreign data wrapper machinery in place, there were a few interesting approaches. For example, you have an external columnar store database, say something like MonetDB, and then you'd write a foreign data wrapper that connects to that columnar store. So when a query comes into the foreign data wrapper, it goes and reads the data from that columnar store itself and gives it back to the PostgreSQL executor. That was the third approach that we've seen people take. And then what we did was we took the best of approach number two and approach number three and combined them. What we wanted was for this to be easily usable within PostgreSQL itself, and at the same time we didn't want to introduce any dependencies on external databases. I'll start with approach number two, which is the columnar store file layout. Rewinding back a few years, the first work that came out of the Hadoop side and became popular was the RCFile format. This is joint work between Facebook and Ohio State University. If you're building a columnar store, let's say you have 100 million rows or records, one way to lay out that data is you put 100 million values of column A, 100 million values of column B, 100 million values of column C, all sequentially laid out next to each other. What these guys, the Facebook and Ohio State guys, have done instead is they put a million values of column A, a million values of column B, a million values of column C, and then the next million values of column A, sequentially laid out on disk. So they first do horizontal partitioning, and then they lay out the columns through vertical partitioning. This work was published at ICDE 2011. In the paper, they talk about their motivation for picking this approach and its performance characteristics. Once Facebook picked this up, it was integrated into Hive, and then it became quite popular. A few years later, there was a second generation of the RCFile called the Optimized RC file, or the ORC file format. The primary motivation for the ORC file format was two-fold. One was indexes: skip indexes to skip over unrelated values. Second, they wanted to enable different compression methods within the same file. So you can use gzip for column A and Snappy for column B, and if certain compression methods do better with certain data types, you can actually mix and match them within the same table. Before I dive into the ORC file specification, here's a slide that repeats the benefits. One is the columnar layout. Two is compression.
So you group values of the same column together, by default 10,000 values, and you compress them, so you get the compression benefits. And another benefit is skip indexes, or min/max indexes, that filter out unrelated values. Here is the specification itself, which is basically a copy-paste from the ORC website. We have the stripes on the left-hand side; here we have three of them. By default in cstore_fdw, 150,000 records are grouped into a single stripe. At the beginning of your stripe, you have your index data, and these are the indexes. Say your query is going over column 2 and column 4: you look at this index data here, then you jump to the offset for column 2 and the offset for column 4, and you don't need to read any of the other data in here. So if your query is only touching column 2 and column 4, you just skip over the rest of the columns by looking at these indexes. And these indexes are very lightweight, so they're tiny. In cstore_fdw tables, the index data is almost always in memory. And because these guys are in memory, you can skip over unrelated data without touching disk at all. So those are the indexes; they are, again, fairly small. And then you have the actual data itself, here. These are the records, or tuples. You have 150,000 of them grouped together, and all of the values in column 1 appear sequentially on disk, all of the values in column 2 appear after column 1, and then the values in column 3 appear after column 2. There is a second level of indirection here, which is that each column, with 150,000 values in it, is grouped into blocks, and by default each block has 10,000 values. There are two reasons to have these blocks. One is compression: we take 10,000 values and compress them together, or actually ORC does. So that's motivation one, the compression groups. The second motivation is the min/max indexes that I just talked about. For each block of values, the ORC file captures the minimum and maximum values, and those get saved into the indexes here. That's the basic ORC data layout. Any questions? And there is something to be said about compression, too. The default compression method within cstore_fdw is PGLZ. This is the algorithm PostgreSQL uses for its TOASTed values. One difference is that in PostgreSQL you take one TOASTed value, apply compression on that value, and then write it to disk. Within cstore_fdw, we take 10,000 values by default, values that are of the same type, and apply PGLZ on them as a group, which improves the compression benefits. One thing I like about cstore_fdw is that it's pretty easy to add new compression methods. This is important because you might want to choose one compression algorithm if you're storing your data on an SSD and another compression algorithm if you're writing the data to a rotational disk. In fact, you can even use different compression methods at the column block level. Obviously this flexibility is pretty cool, that you can use different compression methods, but the natural question is: how well does the default compression algorithm perform when we group 10,000 values together? These numbers are from the TPC-H benchmark that Jason showed. The TPC-H benchmark tables have 10 to 20 columns each, and in each table you have a mix of integer, double, text, and timestamp columns, so different data types. The biggest table in the benchmark is the line item table; that's your event, or fact, table. The other table names are laid out on the X axis here.
I normalized the table sizes to 1 to show the compression ratios across different tables; those are here. And when you look at these graphs, you see you get compression ratios of between 3 to 4x, depending on your table. The line item table was 9.1 gigabytes in PostgreSQL, and then it shrank down to 2.4 gigabytes. So the default compression algorithm compressed nicely here. Okay, so it compresses by 3 to 4x; what does that get the end user, the customer? The answer is that it depends. If your data is in memory, then your effective memory size increases. If you have 1 gig of RAM on your machine, you can now fit a working set of 3 to 4 gigs into the machine's memory; it's almost as if your machine had 3 to 4 gigs of RAM. If you're on SSDs, you may be worried about your SSD costs, that they're expensive. If you have 2 terabytes of disk space on an SSD, compression can increase the effective disk space on your SSD to 6 to 8 terabytes. So you get more bang for the buck on an SSD. On rotational disks, if you're using them, chances are you're bottlenecked on the disk IO. Your sequential disk IO is around 100 megabytes per second on a good commodity disk, and with compression your effective sequential IO increases to 300 to 400 megabytes per second. This is because you're reading the data from disk in compressed format. A secondary benefit, I will say, for rotational disks could be cost savings. If you have a single machine, you probably don't care about the cost of a single disk. But say you have a cluster that stores a petabyte of data with replication enabled, and you have 300 machines to store that data. Then compressing by a factor of 3 to 4x reduces the cluster size from 300 machines down to 100 machines. So you get a cost savings benefit for your big data deployments, too. So this wraps up the second benefit, compression. We had the columnar layout and compression; now let's take a look at the skip indexes. For each column block of 10,000 values, cstore_fdw saves the minimum and maximum values among those 10,000 values in an index. When the user runs a query, we extract all filter clauses from the query. In the previous SQL query that we looked at, we had quantity greater than 100; that becomes filter clause one. And then we had the stock date less than 2013-10-01, and that becomes filter clause two. So we extract those two filter clauses, and we then basically apply PostgreSQL's constraint exclusion mechanism: we create two constraints for those two filters, and we then evaluate the constraints against the minimum and maximum values that we captured. That way, we don't have to write any code to support different data types. When we were writing this, that was really exciting to me, because if you look at the work that the Hadoop guys did, you save yourself from writing a lot of code by just writing one function to create these constraints and reusing the logic within PostgreSQL itself. Okay, the natural follow-up question is: when do these skip indexes help me? These indexes are useful when your data has an inherent order to it, say a time dimension. So you take data that has a time dimension and put it into cstore. You have a query that has a filter clause on that time dimension. With these skip indexes, you can actually find the data that you're looking for and skip over the unrelated values. Or you could sort the data on one dimension and load it that way; that's another alternative.
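[Editor's note: to make the skip-index behavior concrete, here is a hedged sketch with a hypothetical time-ordered table; the table and column names are illustrative, not from the talk.]

    -- rows arrive in time order, so each block of 10,000 values covers a
    -- narrow range of event_time, and the block's min/max entries reflect that
    CREATE FOREIGN TABLE events_cstore (event_time timestamp, quantity int)
        SERVER cstore_server OPTIONS (compression 'pglz');
    COPY events_cstore FROM '/path/to/events.csv' WITH CSV;

    -- each WHERE clause becomes a constraint checked against every block's
    -- min/max index entry; blocks that cannot possibly match are never read
    SELECT count(*)
    FROM events_cstore
    WHERE event_time >= '2013-10-01' AND quantity > 100;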
To summarize, the three benefits of the ORC format are the columnar layout, compression, and skip indexes. Are there any drawbacks? Looking at ORC as a PostgreSQL user, one important drawback is that it only supports integer, double, text, timestamp, and a few other basic types. As PostgreSQL users, this felt very limiting to us. PostgreSQL has about 40 data types, and we wanted to support all of them. We also wanted to support the data types in contrib and extension packages, and new data types that users could write. We therefore wanted to use PostgreSQL's data type representation instead of the one specified in the ORC file format. This helped us in two ways. One is that with the min/max indexes, we didn't have to write any extra code; it came for free with PostgreSQL. Second, this way we don't need to pay the overhead of serializing and deserializing data into another file format; we can just read the data from disk and give it to the PostgreSQL user. So we saved that serialization and deserialization overhead. A second drawback of ORC is that it's designed for the Hadoop Distributed File System, and table join operations and resource usage are an afterthought in Hadoop. The nice thing is that PostgreSQL foreign data wrappers already have an API defined for statistics collection. When the user runs ANALYZE on the table name, that API collects random samples from the data. cstore goes ahead and collects those random samples and gives them back to PostgreSQL. PostgreSQL then constructs histograms, most-common-value frequencies, and other related statistics, and puts those statistics into pg_statistic. cstore then uses those statistics to estimate query costs and choose the best query plan. For the end user, what does that mean; what are the benefits? One is that cstore will respect configuration settings such as work_mem and make more informed resource usage decisions. The second benefit is when you have table joins: because cstore can do cost estimation, PostgreSQL will pick the best join order and the best join method. In summary, we overcome the second drawback of the ORC format by using PostgreSQL's sampling and cost estimation APIs. And now, benchmark results. Hadi from our team gathered these benchmark results; we also have these numbers available on our blog. We used the industry-standard TPC-H decision support benchmark. To be honest, this isn't the best benchmark for cstore, because each table only has between 10 to 20 columns, and our customers who are using cstore have between 100 to 700 columns per table. So the benchmark isn't tailored towards wide tables. But it's a standard benchmark, so we went with it. I'm going to present in-memory and on-disk numbers on 10 gigs of data in EC2. In these benchmarks, we compared vanilla PostgreSQL, the columnar store, and the columnar store with compression. First, I'll start with the rotational disk numbers. In these benchmarks, we just run one SQL query after the other, so we're only using one CPU core per query. And I'll open a parenthesis here: the numbers that we have here are somewhat bottlenecked on disk IO. I'm saying somewhat because we're only using one CPU core. If we had CitusDB, where we parallelize queries across multiple CPU cores, then the disk IO bottleneck would become much more significant. In the case of CitusDB, the difference between the blue lines and the orange lines is much higher.
And here, the orange lines are bottlenecked on CPU, and when we have more CPU cores available, the orange lines become shorter. So I'll close the parenthesis on the CitusDB side and come back to the single-node PostgreSQL numbers. Here we picked representative queries from the TPC-H benchmark. We also dropped the caches on the machine to ensure that the data is coming from disk. The blue lines here are vanilla Postgres, red lines are the columnar store, and orange lines are the columnar store compressed. The runtime benefits end up being about 2-3x when you're using a single CPU core. Again, if the data is coming from disk and you're using multiple CPU cores, the benefit would be notably higher. To put the amount of disk IO saved into context, we also ran iotop on all of these queries. This way we could see how much disk IO was issued for each single query. In this graph, the difference becomes much more dramatic. Query 6, for example, issues 9 gigabytes of sequential disk IO with vanilla Postgres, here. If you look at cstore compressed, there is more than a 10x improvement: from 9 gigabytes down to 0.8 gigabytes. So there is an immediate reduction in the amount of disk IO you're issuing with cstore compressed. For completeness, these are the in-memory numbers. When we're running in memory, the bars are much closer to each other. This is because the queries are basically bottlenecked on the CPU. There are two things to note here for the in-memory numbers. First, the primary benefit in the in-memory case is that your effective memory size increases. This was on a machine with 15 gigs of RAM, and if our dataset size was 30 gigs, with cstore compressed we would still be serving these queries from memory. It's almost as if your 15 gigs of RAM becomes 45 gigs of RAM. The second note, or question, is: can these in-memory numbers get any better, the ones for cstore? There are new techniques coming up for in-memory columnar stores, such as vectorized execution. If you look into PostgreSQL's executor, it goes one row, a second row, a third row, and it does these computations one row at a time. With a vectorized executor, the executor takes a block of values, goes over them in a batch, and does its computation. We in fact started experimenting with a vectorized executor and have a public GitHub repo that shows our tests and the code. In our initial tests, we saw performance improvements of 2-6x in query run times for in-memory columnar stores, too. Those were the benchmark numbers. What's next? cstore is an open source project that's actively being developed. We released cstore_fdw 1.2 just last week. In this release, we reduced cstore's memory usage by 90%. Also, previously we only supported the COPY command as a way to get data into cstore; cstore_fdw 1.2 makes things easier by supporting INSERT INTO ... SELECT. That way, if your data is already in PostgreSQL, you don't need to dump it into a CSV file and load it back in; you can just use INSERT INTO ... SELECT. And we have many other features and bug fixes that are visible in our GitHub repo. The reason we are here is that we are looking for your feedback. If you have one or two features that would make you use cstore, please do get in touch with us. We have a mailing list and our GitHub repository, and we are looking to hear from you. To wrap it all up: cstore is an open source columnar store for PostgreSQL. We used ORC, the second-generation RC file format.
ORC gives us a columnar layout, support for different compression methods, and skip indexes. We then put that together with the best of PostgreSQL: cstore supports all PostgreSQL data types and collects statistics to serve queries faster. And it's super simple; as shown here, you load the extension, you create your foreign table, you copy your data, and you're done. That's all I have. Thank you. Any questions? [Audience question about the version used.] It was 1.2, yeah. [Audience question, partly inaudible, about a query where the speedup on one column was very small, and whether the filter clauses are evaluated together.] So if you have a query that has a clause for A and a clause for B and a clause for C, as we're scanning over the data set, we are going to evaluate all of them together. The thing that's happening is that these are skip indexes: basically, however the data is laid out, we're keeping track of min/max indexes for every 10,000 values. So although they are evaluated together, we evaluate all of them together, if the data doesn't have a natural dimension to it, then we may not be skipping anything. So it depends on whether the data that you're putting in has a natural dimension to it, or whether you sort it. The skip indexes work best when either there is a dimension, say your data is coming in by incrementing time, or you sort the data before you put it in place. One idea that we may incorporate in cstore 1.3 is adding additional indexes, so that even if your data doesn't have a natural dimension to it, you can use external indexes to skip over unrelated data. If, after the talk, we can sit together and take a look at the data, then it's going to become clearer. So if there wasn't an inherent dimension, that might have been why, but all of them are evaluated together. [Audience question about why a table would have 700 columns and 30 million rows.] That's typical in data warehousing workloads: people take all of their data and then denormalize it, and most of these fields can be null. Typically, say you're looking at clickstream data, and you have different properties that you're looking at that are null for a good number of the rows; you take all the properties that you see, you denormalize them, and that becomes your fact table. It depends on what you're looking to do. One thing you're looking to do there is to avoid join operations, particularly if you have a large data set, and if it's a distributed cluster, this is one way to bypass the different joins. People do it a lot; I don't know the percentages, but that's how they denormalize the data and then run their queries. [Audience question about using a document store instead.] That's one alternative. It's really the customer's decision in terms of: do I want to use a document store, do I want to use hstore or JSONB? The denormalizing part is more frequent in, I would say, data warehousing workloads, where you want to impose that structure and avoid the join operations. It could be an alternative. [Audience question about columnar storage in stock PostgreSQL.] I believe PostgreSQL uses the default row-oriented store; the storage engine that comes with PostgreSQL is a row-oriented store. This is one of the three extensions that we have, and it lays out the data in a columnar storage format. I guess you could load this into a PostgreSQL derivative to get a columnar store, if it hasn't diverged a whole lot from regular PostgreSQL. Does that answer your question? [Audience question about replication.] It doesn't play at all.
You could use it with CitusDB, which would replicate for you, or, I guess, with pg_shard once this has inserts; but with regular streaming replication it doesn't play at all, because streaming replication uses the write-ahead logs to replicate the data, and this bypasses the write-ahead logs altogether. [Audience question about what happens to a load in progress if the server crashes.] The data is going to be gone. When the COPY completes, there is a stripe footer, and we automatically update that stripe footer to say this is the new size of the file. So if it crashes, your copy operation would have terminated in the middle of it; that stripe metadata wouldn't have been updated at the very end, in the footer, so you just wouldn't see the data. So that's it, and I'll be here for another 5-10 minutes for other questions. Thank you.