Again, round of applause for D.D. Moisture, thank you. Awesome. Did you get your stuff back? I did. Thank you. I didn't do it. What did I do? You came by and you were the getaway driver. Do you not remember this? I'm saying I drove, but you did all the work. That's true. Yeah, I mean. We're good? They're still at the hospital so I gotta go pick them up. I know nothing about that. Okay, let's give you that. You can pretend all you want. All right. No, legally I have to not know anything. Okay, all right, we're good. All right, cool. So in today's class, quickly, before we go over to the material, some administrative stuff for you guys: Homework 2 and Project 1 will be out today. I'll go over Project 1 in more detail in Thursday's class because parts of it are based on what we'll talk about on Thursday. Homework 2 will be due before then, okay? And P0 and Homework 1 were due on Sunday night. All right, cool. So we spent the last two classes talking about database storage. We talked about the page-oriented architecture, we talked about the log-structured architecture, and we talked about what the values actually look like. In today's class I want to spend a little time talking about ways to represent the data in a slightly different way, to target one class of workloads versus another. We've been dancing around this idea that some workloads want to do a lot of writes and a lot of updates to the database, and some workloads want to do a lot of reads. So I want to spend some time in the beginning just talking about what these categories of database workloads look like, and then we'll see how we can design a storage architecture, or the layout of the data, to be better for one approach versus another. This is a gross oversimplification of what real applications look like, but this is usually how people talk about database workloads in the database world. There are three high-level categories. The first is called OLTP, online transaction processing. These are applications that do a lot of updates or writes to the database, and read some data as well, but the amount of data that each individual query accesses is typically quite small. Think of going to Amazon's website: what do you do? You put things in your cart, you update your payment information, you buy things; you're doing stuff on your account. The total amount of data might be quite large because there are obviously a lot of users on Amazon, but when you interact with the application, the queries that get invoked on behalf of what you're doing don't touch a lot of data. The next category is OLAP, online analytical processing. This is what we think of as data science, business intelligence, or data analysis workloads. You're looking not at individual records but at large swaths of data. Again, think of Amazon: in their OLAP queries they don't care what individual item you bought; they want to look at things like what was the most commonly bought item in Pittsburgh on a rainy Sunday. They're trying to extrapolate new knowledge from the existing database. And then the last category here is sort of a marketing buzzword, but there is some truth to it.
Now, these categories of workloads aren't always cleanly defined; you wouldn't say, oh, I can't do this because it's an OLTP thing or I can't do this because it's an OLAP thing. In some cases, yes, but often people want to do analytical queries where they do their transactions, or do their transactions where they do analytical queries. And so this is sometimes called HTAP. There's this market research or business analyst firm called Gartner, and they put out a lot of these buzzwords; sometimes they stick, sometimes they don't. HTAP seems to have stuck. So sometimes you'll see databases claim to be HTAP because they want to be able to do fast transactions and do analytics at the same time. Another way to think about it is on a spectrum like this. Along the y-axis you have the operation complexity, or the complexity of the query: is it a simple thing like go get one record, or is it doing multiple joins, doing aggregations, and so forth. And the x-axis is the workload focus: is the workload going to be write-heavy versus read-heavy. So you can think of this dividing line here where you have OLTP at the bottom, where there are going to be a lot of writes but they're going to be simpler, and the reads themselves can be simpler as well. Then you have OLAP up here, where you're not doing a lot of writes but you're doing mostly reads. Now, OLAP is another one of these buzzwords, but this is a good example where somebody comes up with a buzzword in the past and then it just becomes the common vernacular going forward. So OLAP was actually coined by Jim Gray. He won the Turing Award for databases in the 1990s. There have been four Turing Award winners for databases. Charles Bachman was the first one, then Ted Codd, then Jim Gray. Everybody know the last one? Not me. Not me. Definitely not me. It's on the slide: Mike Stonebraker, the guy that built Ingres, Postgres, Vertica, a bunch of different things, and who was one of my PhD advisors. He won the Turing Award in 2014. Anyway, the story behind the OLAP term is that he kind of got paid off by a company in the 90s. They were like, hey, we've got to come up with a new term; let's call it OLAP, because they were trying to sell an OLAP database. And he wrote an article saying, hey, here's this new category of workload called OLAP. Then it was found out that he got paid, and they had to retract it. But the term stuck. And again, HTAP would sort of be in the middle here. Okay. So for this lecture, we're really going to be focusing on the OLAP stuff: to really understand this, we're not trying to look up individual records; we're trying to do entire table scans, and joins across them, which we'll talk about later in the semester. All right, so going forward, let's use a really simple example of a database, and this is actually derived from the real MediaWiki software. This is roughly what Wikipedia's database actually looks like, because it's open source, you can go get it. So you have a user table, you have pages, the articles in Wikipedia, and then you have revisions. A revision has a foreign key reference to the user that created it, and a foreign key reference to the page that it corresponds to, the original page or the title and so forth. Right.
And then one little thing they do to make this run faster: because most people just want to get the current version or revision of a page, you can have another foreign key reference going back that says, for a given page, what's the latest revision, so you don't have to do a scan. Right. So the key thing to point out, and to remind you again of what I said in the first lecture, is that the relational model doesn't say anything about how an implementation of a relational database system has to store the tuples on disk or in pages. And for the most part this semester, up until now, we've assumed that all the attributes, the values of the columns for a tuple, are stored together in a single page. We've talked about overflow pages, but we can ignore that for now. We just assumed that somewhere in this slotted-page architecture or the log-structured architecture there'll be a starting point for a tuple, and it'll have the first attribute followed by the second attribute and so forth, and once I run out of attributes, I switch to the next tuple. But this approach may not actually be the best way to store your database for some workloads, and the spoiler is that those are OLAP workloads. So let's see why. For OLTP, these are simple queries that read and write a small amount of data per operation within the database. Think of Wikipedia: I want to go get the current revision for a single page. Or someone logs in and I want to update their last-login field with the current timestamp and maybe the hostname they logged in from. Or they create a new revision and I'm just inserting all the attributes together for that new tuple. And I always say this: for most of you who are like, hey, I think I want to build a startup or I have a startup idea, you typically build an OLTP application first, because you don't have any data to analyze yet; there's nothing to do analysis on, so you don't need OLAP. You build this first and you start ingesting new data. That's typically how it goes. And the other spoiler, if you ask me what database you should use for your startup: the answer is Postgres, okay? We can talk about why later. All right, so now for OLAP, these are going to be complex queries over larger portions of the data that try to extrapolate new information from the data you've already collected. So in this query here, we're trying to count the number of people that have logged in where their hostname ends with .gov. Maybe 20 years ago, in the 2000s, there were congresspeople, people in the US government, who would pay their staff to go update their Wikipedia pages to be more flattering; propaganda BS, right? So this query would basically find all the people that have logged in on a government computer. So again, what are we doing here? We're looking at the entire table of user accounts and examining the last login or the hostname one by one, right.
So the way the database system represents tuples is referred to as its storage model. And up until now, as I said, we've been assuming what is called the n-ary storage model, or a row storage model, where the bits for all the attributes of a single record are stored contiguously, one after another. And this is ideal for OLTP workloads because, again, most of the queries are like, go get the data for this single record or this single entity in the database, and therefore I'm very likely going to need all or most of the columns. If they're packed together on one page, then it's one disk read to go get the page that has all the data; I suck that into memory and I have everything I need to process that query, and things are great. So if I go to my example here, say this is our page; we have some header that says where the starting point of the tuple is, and then I don't go to the next tuple until I have all the attributes listed. And this doesn't matter whether it's the log-structured or the slotted-page architecture; they both work roughly the same way. So now say I have this in a single page on disk, amongst a bunch of other pages. When my query comes along that says select the user account with a given username and password, I can use some kind of index data structure. We haven't really talked about what those are yet; we will in a couple of weeks. But think of it as a lookup that, for a given attribute like the username, tells me the page number and offset where it's located. Again, we'll cover this in lecture eight. So using this index, we've identified that the record we need is in this one page. We bring that into memory, and then our query can jump to the right offset to find the data it needs. And since it's a SELECT * query, I need all the attributes for this user account record, so I just read all the bits at that offset and I have everything I need. Things are great. This is why, for OLTP, the row storage architecture, the NSM storage model, is going to be better, or ideal. Same for an insert: with the slotted-page architecture we find the next free slot and we just insert it there; we're good to go. But let's look at this other query again. We were trying to count, per month, the number of people that have logged in from a government address. What pages am I going to need to touch for this? Thinking more low level, say I have six pages. For the last query I only needed to touch this one here; for this query, what pages do we need to touch? All pages. Exactly, right. Because we're scanning the entire table; we have to look at every single record. So we'll ignore how we actually bring in one page at a time and so forth; just say we're iterating one by one. We start with this page here and bring it into memory. In the WHERE clause we have a predicate on the hostname for the user account, so we need to look at that attribute, and we go one value at a time and do our check. In the aggregation we're doing a COUNT on last login, and we're converting it to per month. So for that portion of the query, I'm not saying how we execute the query just yet, but we know we need this data because we see the column name. So we only need to look at this attribute, right.
So that means all this other stuff here is useless data for this query, but we had to bring it in because it's in the page, right? We said that the database's pages are going to be block-oriented; it could be four kilobytes, eight, sixteen, whatever. And we can't tell the hardware, hey, magically go bring in just this slice of data here, because the hardware doesn't know anything about the database; it doesn't know what the contents of the pages are. So we have to go fetch this entire page, say 16 kilobytes, even though we only need this portion of it. So that's wasted disk I/O, because we brought in things we don't need for this query. So the n-ary storage approach, the row storage approach, is great for inserts, updates, and deletes, and for queries that need all the attributes, all the values or almost all the values, within a tuple. But it's not going to be good for workloads like the one I just showed, where we're doing large scans that only need a portion of the attributes within the table. Yes. So this question is: for this example here, I have an index; do I assume that this is just the primary key index, or could this be an index on any arbitrary attributes? Those would be called secondary indexes; if they're not the primary key index, they're secondary indexes. The answer is it could be either; it doesn't matter. Because going back to my example here, even if I had an index on hostname to do that match, I've still got to go get the last login for it as well, so I've got to fetch that in. Would that mean you don't have to read all the pages? Yes, but it doesn't solve this problem here, where I'm bringing in useless data that I don't need. Yes. Now, there is another approach: you could build an index on hostname and last login together and only do the lookup on hostname. When you get to the leaf pages of the tree, you also magically have last login there and you can use that. Those are called covering indexes; we will discuss them in a few weeks. There will be other disadvantages of that approach versus the alternative to the row store; the other storage model will be better than that index approach, but that's another way to get around this problem. Okay. So for anybody here, maybe you've already heard of or already know the solution, because this is pretty common now. What's another approach? What's another way to get around the row store problem? Yes. He says there are Parquet files that do column-based storage. Who here knows what Parquet is? Looks like less than five percent. So the answer is yes. But what does that mean, though? Ignore Parquet for a second; what does that column-based stuff mean? Yes. So he's exactly right. Again, ignore Parquet; Parquet is basically a database file format that's independent of any one database. Systems like MySQL have a proprietary file format that only that one system is going to read, whereas Parquet is one of these open-source formats where different systems can all read the same file format. Right.
So what he said, and what the answer is, is a column store, or the decomposition storage model, where we're going to pivot or flip the architecture of the system: instead of thinking about individual rows and storing them contiguously, we're going to think about the database, or a table, in terms of columns. And now we're going to store all the values of a single column, or attribute, across multiple tuples contiguously, one after another. This is going to be ideal for OLAP workloads, where we have to do large scans over the entire table but we only need a subset of the columns, because now when we go fetch a page, potentially all the data in that page could be just for the one column that we need. So we go back here: again, this is the row store approach, where the data for each tuple is stored contiguously, one after another. Now say we break it up into columns, across the different attributes, and we store the values within a single column one after another. So now, within a single page in my example here, I only have the values for the hostname field. Yes. So your question is whether each collection of values... yes. The word partition is the one I'm getting tripped up on, because that means something very specific in this business. What do you mean? All right, so the question is: for a given attribute, if it takes 10 pages to store one attribute but I can store another attribute in six pages, would I still have to allocate 10 pages for it, so that tuple one in this column is on page two and tuple one in that column is also on page two? And the answer is no, they're independent, because the columns could be different sizes. We'll talk about how you actually stitch this back together in a second, but it doesn't matter; less is better. Sorry, how do you what? Okay, so this question is how do I identify an exact row if I need to stitch it back together; a few more slides and we'll get there. Yes. Any other questions? All right. So in my simple example here, assume that all the values of a single column, for all the tuples, can fit on an individual page. To his point, in reality it never does, but for simplicity we can assume that's the case. So now, going back to the query we had before, trying to get all the logins from government addresses and doing the aggregation by month: I only need to touch two pages. The first page I grab is the hostname column; I rip through that, apply my WHERE clause, and identify which tuples match and at which offsets or identifiers. Then I only need to go fetch the other page, jump to the offsets of the tuples that matched the WHERE clause, and use those values to compute my aggregation. Yes. So his question is whether this is just better because the reduction in the amount of I/O is so significant that it doesn't matter that there's overhead in doing the matching. The matching is cheap; I'll say why in a second. But the answer is that this is actually going to have many other advantages, not just the reduced I/O. The second half of this lecture is about compression, right? And it sort of becomes obvious now, right?
Like, why are we going to get great compression? Because now all the values within a single page are going to be from roughly the same domain. Just think of an age column: nobody is a million years old; everybody is going to be zero to 120. That's a really small domain of values, and I can compress the hell out of that. So not only am I going to get reduced I/O from only fetching the data that I need, but when I bring data in from disk, the amount of utility I get from it is going to be quite significant. All right. So the answer to his question, which is a good point, is: how do we actually stitch things back together? Because even though I said most queries in OLAP systems or OLAP workloads are going to be doing full table scans, you probably still want to say, what is Andy's bank account, or go get Andy's single record. We still want to be able to do that. So there are two design choices for doing this in a column store, a DSM. The first is to use fixed-length offsets, where you know that the value at a given offset in this column matches the value at the same offset in another column; they correspond to the same tuple. And that means every single value within the column has to be the same length, so that I can do simple arithmetic: I want the millionth tuple, they're 32-bit integers, so one million times 32 bits and I jump to the offset that I want. Right? The other approach is not very common; there are one or two systems that do it, I forget which. Most systems do the first one, because it's the better way to do this. But you can also embed an ID and have a sort of lookup table that says, okay, if I want tuple four, tell me what offset I need to jump to within each column. It's more overhead. You could do it this way; some do, but most don't. Yes. The statement is that with variable-length attributes this doesn't work; we'll solve that in a few more slides. Yes. Absolutely, yes. So his statement is, and he's correct, that the first one is clearly faster because it's just simple arithmetic to jump to the right offset, whereas with this other one you have to do another lookup to go get it. Yes, that's why almost nobody does it that way; the systems that do, I forget why, do it for historical reasons. It's not that common. Yes. So the statement is, if you're doing offsets, doesn't this make, say, deletes harder? Absolutely, yes. But what did I say in the beginning? These workloads are read-mostly, right? So most of the time I'm trying to optimize the system for the common case. Yes, some rando can show up and delete stuff, but how often does that happen versus a bunch of scan queries? So I want to optimize for the common case. Now, the way you can handle the deletes is almost like the log-structured stuff we talked about before: you could have a little buffer area that says, okay, by the way, I deleted this offset and I deleted that offset, and at some later point you do compaction and prune them out. Right. Okay, so the advantage of the column store approach, the DSM approach, is that we're going to reduce the amount of I/O, because we only need to read exactly the data that we need.
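To make the two layouts and the fixed-length offset idea concrete, here is a minimal Python sketch; the table contents and names are made up for illustration, and no real system lays out its pages this literally:

    # Illustrative sketch of NSM (row store) vs. DSM (column store) layouts.
    # Hypothetical data; the point is the fixed-length-offset reconstruction.

    tuples = [
        (1, "andy",      "whitehouse.gov"),
        (2, "matt",      "cmu.edu"),
        (3, "prashanth", "example.org"),
    ]

    # NSM: all attributes of a tuple stored contiguously, one tuple after another.
    nsm_page = [attr for tup in tuples for attr in tup]

    # DSM: each column stored separately, values kept in the same tuple order.
    dsm_pages = {
        "userID":   [t[0] for t in tuples],
        "userName": [t[1] for t in tuples],
        "hostname": [t[2] for t in tuples],
    }

    def dsm_fetch_tuple(offset):
        """Stitch a tuple back together: the same offset in every column
        refers to the same tuple (the fixed-length-offset approach)."""
        return tuple(col[offset] for col in dsm_pages.values())

    # An OLAP-style scan only has to touch the one column it needs.
    gov_logins = sum(1 for h in dsm_pages["hostname"] if h.endswith(".gov"))
    print(gov_logins)            # -> 1
    print(dsm_fetch_tuple(1))    # -> (2, 'matt', 'cmu.edu')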
Again, this is another advantage of a declarative query language like SQL, because the database system can look at the query and say, oh, you're touching these columns. It's not like an eval statement in Python, where you have to run the thing first to figure out what it's actually going to touch; you have to tell the database up front what you want to access. So it can use that information and only grab the pages from disk that have the columns we need. And then you can get better compression and faster query processing because of this columnar format, which we'll cover in the second half today. The downside is what he brought up: this is obviously going to be much slower for inserts, updates, and deletes, especially if the data is kept sorted, because you have to split things up and put things back together. Okay. So this idea is not new; like many things in databases, it goes back to the 1970s, but then nobody did it for 30 or 40 years. The earliest system that used this column store approach was a thing called Cantor. It was more or less a file system that looked like a database, built by the Swedish defense industry in the 70s. There are remnants of it in the literature, but I don't think anybody runs it today. In the 1980s, the theoretical side of what the DSM, the column store approach, would look like finally came out. But it wasn't until the 1990s, as far as I know, that somebody actually built the first real implementation of a column store system, and that was a thing called Sybase IQ. It was a memory-only system, and it was sold as a sort of query accelerator, a caching system. You would have your regular row store system in the back end, and then you'd have this Sybase IQ in front of it; it would take copies of things in memory, store them as a column store, and then you could run your queries on that. And if you ever updated anything, you would go back and update the real row store database in the back end. That approach is called fractured mirrors; I think we'll cover it later when we talk about distributed systems. That's basically how Oracle and other systems work today. And then in the 2000s people actually started building the first modern column store systems. Vertica was started by my advisors, Mike Stonebraker and Stan Zdonik; they took Postgres, forked it, ripped out the bottom half, and rewrote it as a column store system. VectorWise was founded by this guy Marcin Żukowski and his advisor, and Marcin later went on to be a co-founder of Snowflake, so a lot of the ideas from that early column store work in VectorWise are in Snowflake today. And then MonetDB was an academic system, but it's open source, out of the Netherlands; the guy who founded MonetDB actually died a few months ago. But by the 2010s and 2020s, this is super common, and pretty much anybody today who says they're building an OLAP system, a data warehouse, or a high-performance analytical system builds it as a column store, because the benefits are so significant. Yes. So the question is, is there any particular reason why it took so long? This is my conjecture, my speculation: it's because of the internet, because before the internet there was really only a small number of companies that had large data sets. Right.
The banks, sure. And I'm not saying there weren't analytical systems before; there was this thing called Teradata, which was very expensive. And then there was this other approach to modeling analytical workloads, or databases for analytical workloads, called data cubes, which I think the textbook might cover. Nobody uses them anymore. But think of it like this: you have the queries ahead of time, you pre-execute them and store the results as almost a multi-dimensional array, and you run analytics on that. And those things would get refreshed with a batch job, a cron job every night, that would update the cube for the next day. So nobody could do things in real time; you could only do things on whatever got refreshed the night before. Then the internet comes along, a lot more people have a lot more data, and people want to start doing this more immediately, and then the column store stuff was like, oh yeah, this is a good idea, we should do this again. Stonebraker told me that he was at Walmart Labs in the 2000s and they were struggling to scale with Teradata, which is a row storage system you do analytics on, using data cubes. And that's where he got to, oh yeah, this is stupid, we should just do column stores. So that's why he went off and did C-Store and then Vertica. All right. So, any questions about column stores or row stores? Again, if you're building an analytical system today, or choosing one, I can't imagine there's one that's not a column store; if it's not a column store, it's a bad idea. But now you can see why the HTAP stuff is kind of challenging, because Postgres is great, Postgres has a lot of good optimizations for doing OLAP-style queries, but at the end of the day it's a row store, and you're not going to get the performance benefits that you can get in a column store. Yes. His question is whether you could have a database system that has two storage formats. Yeah. So you store the data in both, and then depending on the query that shows up, you figure out whether you want to use the row store or the column store. That's sort of what this fractured-mirrors Sybase IQ thing does, and that's basically how Oracle does it now. You buy their Oracle in-memory column store, whatever, it has some stupid name. Your insert or update query shows up; those always go to the row store, and then in the back end they propagate the change to the column store. Then when your query shows up, it says, okay, is this something I can run on the column store, and it gets rerouted there. Sometimes the data you need is in the column store and sometimes it's in the row store, and it can natively split the query up for you and then stitch it back together to make it look as if it came from a single database. They'll do that for you, and it's very expensive. Yes. So he says, isn't that unnecessarily doubling or tripling the amount of memory you use? If you're already using Oracle, you have a lot of money. I mean, Exadata, I think, starts at at least a million or two million dollars a year for the database system, for Oracle Exadata. Databases are big money; the market is huge. Snowflake, when they IPO'd a few years ago; they're down now because the stock market is a mess, but it is big.
I'm not trying to pick on Oracle; well, maybe I am. The founder of Oracle, Larry Ellison, is like the sixth or seventh richest man in the world, and he owns his own Hawaiian island, paid for by databases, right? There's a lot of money in this stuff. Look it up on a map: he owns like 99% of it. There are like 10 houses on it that he doesn't own; he owns the rest. Okay. All right. So, as we've said many times, just to reiterate: I/O is always going to be the main bottleneck in a database system. When we looked at the performance differences between the different levels of the storage hierarchy, going to disk is like going to Pluto, right? It's going to take super long. And so if we do have to go to disk, because our data is so big that we just don't have enough memory to store it all, then we can have the database system compress the data, so that for every page we have to bring in or write back out, we increase the utility, the amount of useful data we're actually moving, in those I/O operations. Yes, I know you can get file systems that do native compression themselves, but the database system knows better and we can do it better. So the key trade-off we're going to have to make in deciding how to compress our data is obviously speed versus compression ratio. And typically, in a disk-oriented system, especially in a cloud environment where we have to go to something like EBS, the disk is just so slow that we're willing to pay extra CPU overhead to compress the data, because that can potentially reduce the latency of getting all the data we need for a particular query. In some cases, too, if we do our compression natively (I'll explain what that means in a second), then even though we may pay a penalty to decompress data when we bring it in, if our query engine or database system can operate natively on compressed data, that's going to reduce the CPU cost of query execution as well. So not only are we going to reduce the amount of I/O we have to do, we can also potentially reduce the amount of CPU work we have to do to run queries. And why can we do this? Why do you think databases can get great compression on real-world data? Yes. He says it has a very well-defined structure. Yes, but the answer I'd add to that is: a well-defined structure and highly skewed values, meaning there are going to be a lot of repeated values over and over again, and we can take advantage of that in how we design the compression scheme in our system. The most common example I like to use is the Zipfian distribution; think of a power law. This was found in a project at Brown University called the Brown Corpus, where they looked at what they considered to be the classic texts in the English language, like Shakespeare and so forth, and they literally went through every book and counted the occurrences of every single word. And what was the most common word? 'The.' Yes, right. And it appeared so many times. And what's the second most common word? 'A.' Yes, right. But the word 'the' appeared twice as many times as the word 'a'.
And then 'a' appeared twice as many times as the next one, and so on. It's this exponential-looking distribution. So we can exploit that, and a lot of real-world data is going to look like that. Just think: in the US there are more people living in a New York City zip code than there are living in Montana. So if you have a bunch of addresses, there are going to be more people in New York, obviously. So that's one aspect we can exploit in our compression schemes. The other one is high correlation between values in different columns. The most obvious one is zip code and city: a given zip code in the United States can only be in one city, so if you know the zip code, you also potentially know the city. Another one that's more nuanced is order date versus ship date: the time you bought something on Amazon versus the date when they actually shipped it to you is going to be within a day, two days, three days. And therefore the difference between the two values is going to be quite small, so maybe you only need to store a delta instead of the full timestamps. Right. So there are going to be three goals in how we want to design a database compression scheme. The first is that we want to make sure we always produce fixed-length values, or fixed-length data segments. The reason we want this is that we want to use the offset approach to know how to jump within columns just by looking at offsets, so we need every single value within a column to be the same length. For variable-length data, we talked about this before with the overflow storage: we can have a separate set of pages where we store the big blobs, the big binaries, over there, and we just keep a pointer to them; and those we'll just compress with GZIP or Snappy or whatever you want. Yes. So he asks whether what I'm describing is a dictionary encoding scheme, where there's a lookup table to look things up. That is one approach; it's the most common approach in databases, but not the only one. That's the basic idea, though. His question is whether that's how you handle most of the variable-length stuff, like strings. Yes. We'll get to the other ones. All right. Another goal, and we'll talk more about this when we talk about query execution, but it'll come up in some of the conversation today, is that we ideally want the database system to wait as long as possible before it actually has to decompress the data. And that means we want our database system to be able to potentially operate directly on compressed data. Think of it like this: if you GZIP a file on your machine, there are some tools on the command line that can read a compressed text file natively, and you can do lookups without having to uncompress it first and then do the scan. It's not exactly what I'm describing here, but at a high level, think of it the same way. This technique is known as late materialization: you basically want to delay, as long as possible, having to materialize the original value, until you know you need it to send it back to the user or the application.
And then the last one is super important: we have to make sure that anything we choose is a lossless scheme. Does everybody know the difference between lossless and lossy compression? A lossy scheme is one where, if you compress the data and then decompress it, you're not guaranteed to get the exact bits back. Yes. What's an example of a lossy scheme? JPEG, yes; MP3, MP4, all that, right. So in our database compression scheme we're always going to choose a lossless scheme, because people aren't going to like it if you compress their data and then it comes back not the same as what they expected. It doesn't seem like a big deal when you're looking at billions of users, billions of orders, and so forth, but when it comes to your bank account, you're not going to like it if you had a hundred dollars, it gets compressed, and you get 80 bucks back; you're going to be pissed, right? He says, round down to the nearest million. There'll be other issues that come up; there are systems that do things like this, and we'll come back to that later. And that means that in our design, any time you want to do something that is lossy, it has to be done at the application level, because the database doesn't know that you'd be okay with losing some data. For some things it's obvious: instead of storing the exact values of the temperature, very precise measurements over time, so that a year from now I can jump back and ask what the temperature in this room was at exactly this time one year ago, maybe I just care about what the average temperature in this room was over an entire day. So I could compress that down by doing aggregation and only storing that single value. But I know it's okay as a human; the database does not know it's okay, and therefore it won't do that for you. Yes, in the back. So this question is, why should we postpone decompression? Again, for the late materialization stuff we'll see this in a few more slides, but basically think of it this way: say I have a really long string, say it's one megabyte, but it's repeated over and over again, so I convert it to a dictionary code that's 32 bits. Instead of passing along one megabyte from one step of my query to the next as I'm processing it, I just pass around that single integer value, and I don't pay that copy overhead. That's roughly the idea. There was another question. Yes, this question is whether there's an option in a database system where, if you want to store multimedia, you could keep it lossy. So there are some databases that can store things like videos, but typically they're not doing anything special for the video itself. They're storing the metadata: the timestamps, who's in it, the creation time, the metadata about the file itself. Most of those things are just using a relational database and writing the video files themselves to disk. They're not really doing any transcoding or anything like that. Right. I might be wrong; there might be something like that.
But at the end of the day, I don't think anyone is going to build a new relational engine that can physically do something special with video; you just store it as a file and handle it that way. Right. So again, we won't cover lossy techniques. There is another way to do approximate queries, where you sample the data; that's sort of a lossy approach, but it's less common, and we'll talk about it later when we get to query execution. Right. So there are a bunch of granularities at which we can compress data. The obvious one is to just compress a page or a single block: whatever tuples of the table are in it, you compress it and you're done. You typically see this in OLTP systems, and we'll see the MySQL example next. There are some systems that can do tuple-level compression, where within a single tuple I compress it a certain way and the next tuple can be compressed completely differently. Attribute-level would be: if I have a single value, I could put it in my overflow page and then run GZIP on it to compress that separately. And then columnar compression is where you take all the values within a column and compress them together. So let's talk about the first approach here; we'll see why it doesn't work for all the things we want to do, especially in a column store system, and then we'll come back to columnar compression, which is where you can have the native integration, where the system can natively operate directly on compressed data. Right. So with naive compression at the block level, the database system runs whatever off-the-shelf compression algorithm it wants over the data: say I take a single page, I run GZIP on the whole thing, and I store that. There are a bunch of different compression algorithms you can use; the state of the art now is considered to be Zstandard, out of Facebook. What makes these newer compression algorithms different from, say, GZIP is that they make a trade-off: a lower compression ratio in exchange for faster compression and decompression. GZIP, if you pass it the best flag (I don't know if that's still the right flag), will be very aggressive and compress your data down really small, but it's going to be more computationally expensive. Yes. So his statement is: I said before that we want the compressed result of our data to always be fixed length; do any of these approaches do that? No. We'll see how we handle that on the next slide. Right. So again, there's this trade-off when you decide which compression algorithm you want to use in your database system when you're doing this naive compression. And naive means that the database doesn't understand what the bits of the compressed data are. In theory you could write something that knows how to look inside, say, a Zstandard-compressed block and make sense of it, but as far as I know nobody does that; it's just not worth it. So we're going to choose something that has the right trade-off between performance and compression ratio, and at this point Zstandard is what every newer system built more recently will use; Facebook is actually maintaining it and making it better. You might still see LZ4, and Oracle of course has their own proprietary thing that they patented. Snappy came out of Google; it was also much faster but with a lower compression ratio. At this point the Facebook one is the best.
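To see the speed-versus-ratio trade-off for yourself, here is a tiny sketch using Python's built-in zlib as a stand-in for GZIP or Zstandard (which aren't in the standard library); the fake page contents are made up:

    import time
    import zlib

    # A fake 16 KB "page" of repetitive, database-like data (made up for illustration).
    page = (b"hostname=whitehouse.gov,last_login=2024-01-01;" * 400)[:16384]

    # Higher levels squeeze the page smaller but take longer to compress.
    for level in (1, 6, 9):
        start = time.perf_counter()
        out = zlib.compress(page, level)
        elapsed_us = (time.perf_counter() - start) * 1e6
        print(f"level={level}  {len(page)} -> {len(out)} bytes  {elapsed_us:.0f} us")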
So let's see how we handle the problem he brought up, where I take a block of data, it might get compressed to different sizes, and I need to make sure the results are actually fixed length. This is what MySQL does in its InnoDB engine. We take pages and compress them, and the size of each compressed page can vary, but it's always going to be a power of two: one, two, four, eight kilobytes, because the default page size in MySQL is 16 kilobytes. If you compress a page down, you just round up to the next power-of-two size. That means if I take my 16-kilobyte page, compress it, and it comes out at 1.5 kilobytes, I pad it and round it up to 2 kilobytes. That way you can pack these things contiguously on disk more easily and have fewer holes. In addition to the compressed data, there's also a little buffer space at the beginning of the page called the mod log. This is similar to the log-structured stuff we talked about last time: if I have to make changes to the data within the page, I just append them to the mod log, without potentially having to decompress the whole thing. And this is why, when the textbook describes log-structured storage, I don't love the way they do it, because they spend too much time talking about merge trees; this design pattern of using log-structured storage appears in a bunch of different places in database systems, so it's more important to understand the pattern than the merging part. Right. So now when I need to read a page, I can bring it into my buffer pool and leave it compressed initially. Any time I have to do an update where I don't need to look at the data inside the page (it could also be an insert, because I'd know how much free space I have here), I can put it into my mod log. Say an update query comes along, UPDATE table SET value = foo; since that update doesn't need to know what the previous value was, if I know that the tuple exists in this page, I just append the log entry to the mod log. Right. Now any time a query comes along and wants the current value of that attribute, I can potentially get it from the mod log without having to decompress the page. If I do need to get the full tuple back, then it actually decompresses it, and it's always blown back up to 16 kilobytes in MySQL, and they actually maintain both copies. If I never update the page, then at some point I'll want to evict it, but I can still keep the compressed one in memory so that I don't have to go back to disk to get it later. If the mod log gets full, then I have to apply the changes and write the page back out. And basically you would evict the uncompressed one first before you evict the other one. Question. His question is: it goes to 16-kilobyte pages; is that because they never compress anything larger than 16 kilobytes, or is that because of the compression algorithm? That's the default page size in MySQL, so the whole system is set up with the assumption that you have 16-kilobyte pages. And the idea is that when the page gets decompressed back up to 16 kilobytes, the rest of the system doesn't need to know that it came from a compressed page rather than straight from disk.
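Here is a rough, hypothetical sketch of that mod-log pattern. It is not MySQL's actual implementation; it just shows the idea of appending blind changes without decompressing, using zlib and JSON purely as stand-ins for the real page format:

    import json
    import zlib

    class CompressedPage:
        """Toy model of a compressed page with a modification log (mod log).
        Illustrative only; real InnoDB pages are binary and far more involved."""

        def __init__(self, tuples):
            self.blob = zlib.compress(json.dumps(tuples).encode())  # compressed body
            self.mod_log = []                                        # pending changes

        def update(self, slot, column, value):
            # Blind update: append to the mod log without decompressing the page.
            self.mod_log.append((slot, column, value))

        def read(self):
            # A full read decompresses the page and replays the mod log on top.
            tuples = json.loads(zlib.decompress(self.blob).decode())
            for slot, column, value in self.mod_log:
                tuples[slot][column] = value
            return tuples

    page = CompressedPage([[1, "andy", "old-host"], [2, "matt", "cmu.edu"]])
    page.update(0, 2, "whitehouse.gov")   # cheap append, page stays compressed
    print(page.read())                    # decompress + apply log only when needed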
So in this example, we still have to decompress the data before we can actually read or modify it, and that limits how much data we can compress this way: if we compressed the entire table with GZIP or Zstandard, we'd be screwed, because any time we need to touch anything in that table we'd have to decompress the whole thing. So you typically only use these naive compression algorithms for the overflow storage or within a single page. Yes. His question is, how do you know which mod log to append to? There's some index up above that says the tuple is in this page; it's just like before, a page number and offset. Sorry. All right. So, ideally we want our database system to operate on compressed data, and at a high level it's going to look like this. We have some table with two attributes, Matt and me, with fake salaries, and somehow there will be some compression algorithm that produces compressed data; we don't know what it is just yet. But then when our query comes along, say SELECT * FROM users WHERE name = 'Andy', I want to do some kind of database magic and convert the constant 'Andy' into the equivalent compressed value for that constant. So now when I do my scan on the table, I don't have to decompress the compressed data; I can just use the compressed version of the key I'm trying to find. Right. That's essentially what we want to do here, and the column store stuff is going to be more amenable to this, because again all the values within a single column are contiguous. So we have half an hour left; let's see how much of this we get through. These are probably the six most common compression schemes. Again, the spoiler is that dictionary encoding is the most common, and that's what handles the string problem for variable-length things, but I want to talk about a bunch of the others too, and I'm going in this order because we'll see that in some cases, for one compression scheme, we can get even better compression by also applying one of the previous ones we talked about. Okay. All right. The simplest one we can do is called run-length encoding. The idea here is that we reduce a single value that appears over and over again in a column down to a triplet: the value that we're storing, the start position where it appears in this column segment, and the number of elements in the run. Right. Now, we're going to get way better compression if we sort the data, so that all the occurrences of one value appear first, and then all the occurrences of the next value, and so on. But this gets tricky, because however you sort one column, you have to reflect the same sort order in the other columns so that the offsets match. So picking which columns to actually sort on is an NP-complete problem. So here's a really simple example: we have a single table with an ID column and then a sex column; for simplicity, assume those are the sexes. Not a joke. All right.
So with run-length encoding, what we do is replace each run with our triplet: here's the value we're storing, here's the offset where it first appears, and here's the number of values that appear after it, the number of occurrences of this value. So you see at the very beginning we have three M's; at the very top here we say value M, at offset zero, and the run is length three. Right. So now when a query like this comes along, an aggregation where we want to count the number of males and females in our user table, all I need to do is look at these run-length encodings and I can compute the count from them. All right. Yes. So his question is how I would handle a WHERE clause that has multiple attributes. Not this query, but say I have SELECT * FROM table WHERE sex = 'M' AND name = 'Andy'; how would you handle that in a column store? We're jumping way ahead here, but the database system could figure out which predicate is more selective: are there more males than there are Andys? Yes. So then maybe you want to go scan the name column first, because you'll get fewer matching tuples, and then you know the offsets of the matching Andys, and you can use this run-length information to basically check whether those offsets fall within these runs. Yes. This question is whether this only works well with highly skewed, low-cardinality data. Yes, but again, we said before that real data is highly skewed, so that's likely. So in this case here, in my really simple example, it's a binary column with only two values, and I have male, female, male, female. This is kind of bad, because I could store this in a small number of bits, but instead I have these triplets with runs of length one. So in this example run-length encoding is actually going to make the data bigger than it should have been. The way to handle this, as I said before, is to sort it based on this column; now the number of triplets I need to store is two, and you can see you get amazing compression if you do the sorting. Yes. The question is whether you actually have to store the run length explicitly. Some implementations do, some don't; I've included it there for explanation purposes. Yes. All right. So that's run-length encoding. It's very common, but again it only works well if you have a small number of unique values within a column.
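Before moving on to the next scheme, here is a minimal Python sketch of the (value, start offset, run length) triplets just described, with made-up column data; it shows both how sorting shrinks the number of runs and how the count query can be answered from the runs alone:

    # Minimal run-length encoding sketch: (value, start_offset, run_length) triplets.

    def rle_encode(values):
        runs, start = [], 0
        for i in range(1, len(values) + 1):
            if i == len(values) or values[i] != values[start]:
                runs.append((values[start], start, i - start))
                start = i
        return runs

    sex_column = ["M", "M", "M", "F", "F", "M", "F"]      # unsorted: 4 runs
    print(rle_encode(sex_column))
    print(rle_encode(sorted(sex_column)))                  # sorted: only 2 runs

    # The COUNT(*) ... GROUP BY sex query can be answered from the runs alone.
    counts = {}
    for value, _, length in rle_encode(sorted(sex_column)):
        counts[value] = counts.get(value, 0) + length
    print(counts)   # -> {'F': 3, 'M': 4}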
Another approach is called bit packing. The idea here is that if we can recognize that the values stored in a given column are all smaller than the largest value the column's type could actually hold, then maybe we don't need to allocate all the bits that we would normally have to allocate to store that data. Say I have a single column that's a 64-bit integer, and I see my values here, but what's really going on under the covers is that I have all these bits, and the red part is the only distinguishing part of the values. Because, again, my values aren't that big, but I declared the column as an int64, so it goes up to two to the 64. I could potentially store an integer as large as two to the 64, but all my values are super small, so these are all wasted bits. With bit packing you basically recognize that and say, okay, I can actually store this data in 8 bits or some smaller size, and now your reduction is quite significant. In my toy example here, I have five 64-bit values; that's 320 bits. But if I store them as 8-bit integers, that's down to 40 bits. Yes. This question is whether this becomes a problem with aggregations and arithmetic, because the semantics of integers depend on their size. Well, there are instructions to do 8-bit arithmetic and to handle overflows, so the hardware will handle some of that for you and you stitch things back together; one plus one is the same in 8 bits as in 64 bits, and it'll handle the overflows. The database system can handle this, or the hardware handles it for you. It also helps with vectorization through SIMD, which we'll talk about later. All right. So this works great if all your values really are going to fit in 8 bits or 16 bits or whatever, and so forth. But maybe you have outliers, and then this scheme won't work, because you're only allocating 8 bits for every single value. So an extension to bit packing is called mostly encoding; that's what Redshift calls it in their system. Basically you declare a column as, hey, this is mostly going to fit in 8 bits, but they can handle the case when it doesn't. So say I have a 64-bit integer column, but I have this one value here that's super big and can't fit in 8 bits. You can store the column as mostly-8, where you have all the values again as 8 bits, but you just have a special marker for that one outlier, and then you have a lookup table on the side that says, okay, at this offset, here's the original value. So now, as I'm scanning along doing whatever I need to do, if I come across one of the special markers, I know, okay, I'm at this offset, go into the lookup table to find the real value that should be there. And to his point before, depending on the operation, if you're doing some kind of summation or whatever, the database system, in conjunction with the hardware, will know what the data type is and know how to cast things accordingly.
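Here is a small sketch of that mostly-8 idea; the function names and the sentinel marker are made up for illustration, and a real system would keep the packed bytes and the patch table inside its pages rather than in Python objects:

    # Sketch of bit packing with a "mostly8"-style escape for outliers.
    # Hypothetical names; not Redshift's actual implementation.

    MARKER = 255          # sentinel meaning "look up the real value on the side"

    def mostly8_encode(values):
        packed, patches = [], {}          # patches: offset -> original 64-bit value
        for i, v in enumerate(values):
            if 0 <= v < MARKER:
                packed.append(v)          # fits in 8 bits
            else:
                packed.append(MARKER)     # outlier: store marker + side entry
                patches[i] = v
        return bytes(packed), patches

    def mostly8_decode(packed, patches):
        return [patches.get(i, b) for i, b in enumerate(packed)]

    values = [13, 191, 56, 92, 81, 120_000_000]   # one value doesn't fit in 8 bits
    packed, patches = mostly8_encode(values)
    print(len(packed), "bytes instead of", 8 * len(values))   # 6 bytes vs 48
    print(mostly8_decode(packed, patches) == values)          # True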
And yes, in my example here there are only two sexes, so you could just say zero represents female and one represents male. But if you have more possible values, you would have a separate bitmap for each of them. Yes? The question is whether this is the same as mapping one possible value to a bitmap, a vector of bits. Yes, and a one means that at the given offset the value is whatever that bitmap represents. So in my example here, again, it's 72 bits to store the original column, but I get it down to 16 bits plus 18 bits, about 34 bits in the compressed version as bitmaps. Yes? The question is, is there a reason we're using a one-hot vector as opposed to encoding the entire value space as a single number. What would that single number be? Basically, yes, that is dictionary encoding. It's not exactly the same, because here you're doing the offset lookups, but the approach is basically the same thing and you could do it that way too. Right. So this seems fantastic, but it doesn't work when you have a high-cardinality column, and to show you how bad it can get, say we have a simple table of just names, email addresses, and zip codes, and we want to build bitmap indexes on the zip codes. In the U.S. there are about 43,000 zip codes, and say the table has 10 million records. You have to store a single bitmap per zip code, with one bit per tuple, so each bitmap is 10 million bits, roughly 1.25 megabytes, times 43,000 zip codes, which works out to about 53 gigabytes. Right. So this shows that some of these compression schemes can actually make things a lot worse, and you've got to be careful how you use them. Most systems will require you to tell them how you want certain things compressed. What was that? All right, whatever; we're short on time, let me keep going. OK. The other problem with this approach is that every single time I insert a new tuple, I have to extend all of these bitmaps. Adding a new distinct value is easy, because you just create a new bitmap, but inserting a new tuple is problematic for this. So some database systems will let you create bitmap indexes, there are some systems that only store data as bitmaps like this, and there are other ways to encode databases using bitmaps as well; we don't need to get into that right now. Next is delta encoding. The idea here is that we're going to record the difference between values that follow one after another in a single column. Think of a time-series data set where every minute we're recording the temperature outside. For the first value we store it in its entirety, because we always need the base value that our deltas are derived from. Then for the time field we just store a plus one to say we're going forward one minute from the previous value, and the same thing for the temperature: we just store the delta from one reading to the next. We can actually compress this even further, and this is why I was saying we can reuse some of the compression schemes we've already talked about and layer them on top of each other. We see here that we have just a bunch of plus ones over and over again, so we can use run-length encoding to store that in an even more compact form: we store the original time, and then the run of plus ones that comes after it. Right.
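A small sketch of delta encoding on a time-series column like the one described above, with run-length encoding layered on top of the deltas; the numbers are made up, and a real system would also bit-pack the deltas into narrower integers:

```python
def delta_encode(column):
    """Store the first value as-is, then only the difference to the previous value."""
    base = column[0]
    deltas = [column[i] - column[i - 1] for i in range(1, len(column))]
    return base, deltas

def rle_pairs(seq):
    """(value, run_length) pairs - deltas like +1,+1,+1,... collapse to one pair."""
    runs = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

# One reading per minute: times stored as minutes since midnight.
times = [720, 721, 722, 723, 724]
base, deltas = delta_encode(times)   # base=720, deltas=[1, 1, 1, 1]
print(base, rle_pairs(deltas))       # 720 [(1, 4)]
```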
So if we just look at this time column here: the original was 160 bits, the compressed version with just delta encoding gets it down to 96 bits, but if we do RLE on top of it as well we get down to 64 bits. I'm doing this math in bits on small examples to make it more understandable, but obviously think of tables that have billions of records; these are quite significant differences. We can do something similar for strings, and this is called incremental encoding. The idea here is that we want to avoid storing duplicate prefixes over and over for consecutive tuples. Right. So say we have a simple column of string data: rob, robbed, robbing, and robot. We start with the first value and store it in full, because there's nothing before it. Then we go down to the next one and ask, what portion of the previous value can be found at the start of this next string? In this case it's the "rob" part. We just keep track of these shared prefix lengths as we go down: "rob" like this, "robb" like that. So now, knowing these common prefixes, we can go back and encode the data by throwing out the part we know is the same as the previous value, storing just the length of the shared portion, and then storing only the unique suffix. Right. So with prefix lengths and suffixes, we go from 168 bits down to 88 bits for the suffixes plus 32 bits for the prefix information, which is 120 bits. Yes? The question is why the prefix is different for this one versus that one. It's because the prefix corresponds to the value right before it, not to the base value at the top. The reason you do it this way is that if I need to jump to a given offset and reconstruct the tuple, I don't need to replay everything from the beginning; I just look at the one that came right before me, which makes decoding more efficient. I don't know how common this exact scheme is; there's something similar in how Postgres stores keys in the leaf nodes of its B+trees. But for string data, the next one is what everyone does: dictionary encoding. If you know a little about how GZIP and LZ4 work, they're basically doing some variation of dictionary encoding. The idea here is that we build a separate data structure on the side that maps unique identifier codes to variable-length data, and then we replace the values in our column with those identifier codes rather than storing the original strings. And this is a good example of the difference between naive compression schemes like GZIP and LZ4 versus the database's native compression schemes: dictionary encoding in a database does whole-value replacement, meaning it doesn't try to find repeated patterns of byte sequences and assign multiple dictionary codes within a single value.
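A sketch of the incremental (prefix) encoding idea for a run of sorted strings; the (prefix length, suffix) tuple layout is illustrative:

```python
def prefix_encode(strings):
    """For each string, store how many leading characters it shares with the
    PREVIOUS string, plus only the remaining suffix."""
    encoded = []
    prev = ""
    for s in strings:
        shared = 0
        while shared < min(len(prev), len(s)) and prev[shared] == s[shared]:
            shared += 1
        encoded.append((shared, s[shared:]))
        prev = s
    return encoded

def prefix_decode(encoded):
    out, prev = [], ""
    for shared, suffix in encoded:
        s = prev[:shared] + suffix
        out.append(s)
        prev = s
    return out

names = ["rob", "robbed", "robbing", "robot"]
enc = prefix_encode(names)   # [(0, 'rob'), (3, 'bed'), (4, 'ing'), (3, 'ot')]
print(prefix_decode(enc) == names)   # True
```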
It's one value equals one code, whereas the naive compression schemes don't know the boundaries of tuples or attributes within the byte stream of column data, so they're allowed to do any kind of pattern replacement. Dictionary encoding usually means a single code equals a single value; I'm not replacing pieces within the value itself with multiple codes. As I said, this is probably the most widely used compression scheme in database systems. So say we have original data here, a bunch of string fields like this. I'm going to compute a dictionary that maps each original value to a code, just a number, and then in my column, instead of storing the string, I store the fixed-length code. Pretty obvious. And it handles the case we talked about at the beginning: if my query shows up with a constant and I want to operate directly on compressed data, I extract my constant, do my lookup in the dictionary so now I know what the code is, and then I can scan through the data doing comparisons on the codes, which are just integers. That's way faster than doing string comparisons, because the hardware can do integer comparisons very quickly, and I don't have to decompress the column to do the string match in my WHERE clause. Again, I keep banging on about how great SQL and declarative query languages are: the query doesn't know or care how the data is actually compressed; the database system figures out, okay, it's compressed, let me run this query as if it were directly on the compressed data by rewriting it to do the lookup based on the code rather than the string. Yes? The question is whether we're hashing the original value to generate this number. If we did that, then in the event of a hash collision your database system would have some trouble. Yes. And what's another problem? You can't go back. So here's your answer: we need to support encode and locate. That's my first example, where I take the original string and convert it into a dictionary code. But then, as you said, we've also got to go back, because we need to be able to take a dictionary code and produce its original form; in some cases we need to do that. The other thing we want to support is order preservation, and a hash function does not preserve the order of the original values in the dictionary codes, because the codes end up being essentially random. What I mean by this is that for equality predicates, like does something equal something, the hash-based dictionary code would be fine. But if I want to know whether something is less than or greater than something else, the hash randomizes the domain, so we can't do that comparison on the codes. A really easy trick to handle this is to sort the strings and use that order to define the dictionary codes. Right. So we take all the unique values we have, sort them, and assign codes based on the order in which they appear. Now when I want to do a comparison, like is something less than or greater than "Andy", I can make my decision based on the original values, or even directly on the compressed codes themselves. Right.
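A minimal sketch of order-preserving dictionary encoding as described above; assigning codes by sorted position is the key trick, and the sample names and lookup code are illustrative:

```python
def build_dictionary(column):
    """Assign codes in sorted order so code comparisons mirror string comparisons."""
    unique = sorted(set(column))
    encode = {value: code for code, value in enumerate(unique)}
    decode = unique                      # decode[code] -> original value
    return encode, decode

names = ["Matt", "Andrea", "Andy", "Prashanth", "Andy"]
encode, decode = build_dictionary(names)
codes = [encode[n] for n in names]       # the column now stores small fixed-width ints
print(encode)    # {'Andrea': 0, 'Andy': 1, 'Matt': 2, 'Prashanth': 3}

# WHERE name = 'Andy' becomes an integer comparison on the codes.
target = encode["Andy"]
matches = [i for i, c in enumerate(codes) if c == target]
print(matches, [decode[codes[i]] for i in matches])   # [2, 4] ['Andy', 'Andy']
```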
So if I have something like WHERE name LIKE 'And%', doing a wildcard match, I can rewrite that into a BETWEEN clause, which is basically a range scan, because I can look at the dictionary and know the only range of values that could possibly match my string predicate, and that's the only range of codes I need to look at. Yes? The question is whether, for this columnar compression, we're assuming we're not doing OLTP. Yes, we're assuming that, because in my example here we could insert, say, 100 new values that fall in between this range, and now all my dictionary codes are messed up, because I can't guarantee that the order-preserving property still holds. So yes, this won't work well if you do a lot of updates and new values fall outside the existing code boundaries. But again, for OLTP the cost of maintaining this dictionary is kind of high anyway: checking this extra data structure and updating it on every insert or update is not something you want to do, because in OLTP you want to be in and out quickly — do your update, do your insert, and get back right away. If you have to maintain all this compression metadata, it's going to be slow, so you wouldn't want to do it anyway. Going back here, what is the usefulness of being able to decode a code back into its original form? Because you may be transforming the original data in such a way that you need the original value. In my example from earlier with the login timestamps, I took the date and extracted the month; similarly, I might want to run a function on some string, like give me the first five characters, so I have to decompress it first. Okay, so let's look at two different examples of queries we could run on dictionary-compressed data, and we'll see that in some cases we still need to scan the column data, while in other cases we can operate directly on the dictionary alone. For the query SELECT name FROM users with the wildcard on 'And%', we're still performing a sequential scan on the column, because we need to find all the tuples that actually match my predicate. I can do the scan on the compressed version of it, but I still have to look at the column data. But if I'm doing SELECT DISTINCT name, then I don't need to look at all the original tuples; I just need to know which values exist. So for this particular query the database system could say, okay, I don't actually need to look at the column data, I just need to look at the dictionary, and that will produce the answer I need. Which is great, because the dictionary might be a couple of kilobytes, depending on whether it's kept per block or per column or whatever, and the actual column itself could be gigabytes or terabytes. So I can produce the answer I want by just looking at the dictionary. Again, it's an auxiliary data structure, and if the database system knows how to exploit it, because it knows what its contents are and what it's actually storing, we can use it for query processing. Yes? The question is whether the dictionary could also store how many references there are to each value. You can do that if you assume that the block or segment for a given column is immutable; then you could precompute a bunch of these things ahead of time.
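Building on the sorted dictionary, here is a sketch of the two query patterns just discussed: rewriting a LIKE 'And%' prefix predicate into a range of codes, and answering the DISTINCT variant from the dictionary alone without scanning the column. The helper names and the sentinel trick for the upper bound are my own assumptions:

```python
import bisect

def prefix_code_range(decode, prefix):
    """Given a sorted dictionary (code -> value), return the [lo, hi) range of
    codes whose values start with `prefix` - i.e. LIKE 'And%' as a range scan."""
    lo = bisect.bisect_left(decode, prefix)
    hi = bisect.bisect_left(decode, prefix + "\uffff")   # just past all 'And...' strings
    return lo, hi

decode = ["Andrea", "Andy", "Matt", "Prashanth"]   # sorted dictionary
codes  = [2, 0, 1, 3, 1]                           # the encoded name column

lo, hi = prefix_code_range(decode, "And")

# SELECT name FROM users WHERE name LIKE 'And%': still scans the (compressed) column.
print([decode[c] for c in codes if lo <= c < hi])   # ['Andrea', 'Andy', 'Andy']

# SELECT DISTINCT name ... LIKE 'And%': answered from the dictionary alone,
# assuming every dictionary entry actually appears in the column.
print(decode[lo:hi])                                # ['Andrea', 'Andy']
```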
And to the earlier point, yes, the same issue: if I have a really long string in the dictionary and no tuple in the column is actually using it anymore, someone has to do the bookkeeping or vacuuming to clean it up, if you don't just create a new dictionary. Most of the time in OLAP systems the blocks of dictionary-compressed data are immutable. It doesn't necessarily need to be log-structured versus slotted pages; you just declare that the block is immutable, and there are a bunch of optimizations you can apply because of that. All right, so we're over time here. The main takeaway is to make sure you're choosing the right storage model for whatever target workload you're trying to support. This is something you figure out at the beginning — do I want to support transactions or support analytics — and you design your system accordingly. And compression can be much better if the database does it natively versus relying on naive general-purpose schemes, and we showed examples where you can operate directly on the compressed data. So next class we'll talk about the process of actually taking data from disk and bringing it into memory. Okay, all right, hit it.