All right. So today's class is actually one of my favorite classes, talking about compression. Because I think this is one of those really interesting topics that really gets to the heart of how, if we manage our data a certain way inside of our own database system and don't let, you know, the operating system or some other thing that we have to treat as a black box do it for us, we can get much better results. But before we start, I want to jump in and pick up what we missed last class talking about compression, or sorry, system catalogs. But a real quick announcement that totally slipped my mind on Monday. We're having a tech talk next week, here on the fourth floor, from the folks from Oracle that built the in-memory engine we talked about last time. So everything I talked about on Monday, they're going to come talk to you about on Monday next week. It'll be 4:30 on the 26th. So immediately after this class ends, we can just head over to the auditorium and see what the Oracle guys have to say about things. This is a link that takes you to the page with the description and more information. This is optional, you don't have to go. But again, it's just reiterating the things we talked about last class.

So today we're talking about system catalogs. We'll talk about those, then do a quick demo, and then we'll jump into the heart of the compression stuff we want to talk about today. And there are basically going to be two categories of compression we'll cover. The main one is OLAP columnar compression, because this is where you get the biggest win in an in-memory database. And then we'll talk about research we've done here at CMU where we can actually compress the indexes for OLTP database systems.

All right, for catalogs, the way to think about the catalog is that it's the database inside the database. It's the thing that's going to maintain the metadata about your data. And pretty much every single database system out there is going to just piggyback off of the existing infrastructure they already have in their database system and store the catalog as its own database. So you can store the catalog information as tables inside your database. Now, there is sort of a chicken-and-egg problem here, because in order to bootstrap the system, you need a catalog, but you can't have a catalog until you have a catalog. So typically the way this works is to have specialized code inside the system for turning the system on and then initializing all the catalog infrastructure you need to have. So the way Postgres does it is they have a special script that reads a header file and converts a bunch of #defines into these intermediate opcodes that they generate, and then they interpret those, and that allows them to then populate the schema. Again, the issue is that we can't create our catalog tables the normal way, because in order to know what table you're creating you have to look in the catalog, and there's no catalog yet, so you can't do that. So in our own code, we basically have hard-coded C++ code for generating these catalog files, or this catalog database, but I think the way Postgres does it is actually quite interesting and probably what we eventually would want to do.
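To make the chicken-and-egg point a bit more concrete, here is a rough sketch of what a hard-coded bootstrap might look like. All of the names (CatalogTable, bootstrap_catalog, the pg_*-like table names) are made up for illustration; this is not Postgres' or our system's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory description of one catalog table.
struct CatalogColumn {
  std::string name;
  std::string type;
};

struct CatalogTable {
  std::string name;
  std::vector<CatalogColumn> columns;
};

// Hard-coded bootstrap: we cannot go through the normal CREATE TABLE path
// yet, because that path needs to look things up in a catalog that does not
// exist yet. So the schemas of the catalog tables themselves are built
// directly in code when the system starts up.
std::vector<CatalogTable> bootstrap_catalog() {
  return {
      {"pg_class_like",     {{"oid", "int"}, {"relname", "varchar"}}},
      {"pg_attribute_like", {{"attrelid", "int"},
                             {"attname", "varchar"},
                             {"atttypid", "int"}}},
  };
}
```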
Internally, what's going to happen is, since we're writing our system in C or C++, we don't want to operate directly on tuples, because you don't want to say that in order to get the name of a table you have to invoke a select statement or invoke a SQL query and go into the engine to get that. So typically what happens is that there'll be some object abstraction or a struct abstraction around these catalog objects that allows other parts of your program to access their information very efficiently.

So the one thing I want to stress, and I think this is the right way to build your catalogs, and we'll see in the case of MySQL what happens when they don't do this, is that it's really important that the entire database management system be aware of the concept of transactions, the concept of isolation, the concept of concurrency control. If we have this permeate all throughout the system, then we're automatically going to get all the ACID guarantees that we get for regular data; we'll be able to apply them and get them for all of our catalog changes or schema changes. So what I mean by that, and we'll see this when we go through the examples of making schema changes, is that if I add a column, and that's being done in a transaction, then I know that no transaction that came before my change will be able to see that column, because it's not gonna be visible to it in the catalog, because the catalogs are tables. And they fall under the same concurrency control protections as regular data tables. And it ensures that when I actually commit the transaction that made that schema change, every transaction that comes after that will be able to see it. And then because, again, we're storing it as data tables, our logging component can just write out log entries to say, I made this schema change. So when I come back up, I can re-apply it. So again, the right way to build your catalogs is to have everything be built inside the database itself. And MySQL is probably the lone exception to this.

So the way to do a bunch of schema changes depends on whether you're using a column store or a row store. It turns out doing schema changes in a column store is actually really easy, because the columns are isolated from each other. They're stored in memory as their own separate segments, and we don't have to make major changes to other columns whenever we modify something else. In order to add a column, if we're a column store, it's easy. We just make a new column segment. And as I said, if we're doing this in the context of a transaction, we update our catalogs transactionally and then that column becomes visible when that transaction commits. But if we're doing a row store, then what we're gonna have to do is scan through and, for every single tuple, copy it out to a new location and then append the column where it needs to be. So basically you have to copy everything. Same thing for dropping a column. There are sort of two ways to do this. The easiest way is just to do the reverse of this: do a sequential scan on the entire table and copy every tuple you have into a new block that doesn't have the column. In the case of Postgres, what they do is they're a bit smarter: they just add a flag to the column inside of the catalog that says this column has been deprecated, and then it knows that no transaction that comes later on should be able to see it if you do a select star query and there are deprecated columns.
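As a rough sketch of that lazy drop-column idea (the struct and function names here are hypothetical, not Postgres' actual code): the catalog entry for each column just carries a dropped flag, and when a query expands `SELECT *` it skips any column that has the flag set.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical catalog entry for a single column.
struct ColumnEntry {
  uint32_t    column_id;
  std::string name;
  bool        is_dropped = false;  // set by DROP COLUMN; data reclaimed later
};

struct TableEntry {
  std::string              name;
  std::vector<ColumnEntry> columns;
};

// "SELECT *" only expands to the columns that are still visible; the
// physical data for dropped columns stays around until a vacuum-style
// pass reclaims it.
std::vector<uint32_t> VisibleColumnIds(const TableEntry &table) {
  std::vector<uint32_t> ids;
  for (const auto &col : table.columns) {
    if (!col.is_dropped) ids.push_back(col.column_id);
  }
  return ids;
}
```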
Although physically it's still there, you don't get it back in the result, and then when the garbage collector or the vacuum runs at some later point, it just cleans it up and prunes it out. And then again, for a column store it's easy: you just remove it from the catalog and then drop the memory or free the column.

Changing a column is a bit more tricky. A change to a column could be things like changing the type, or changing whether something's null or not null. The easiest thing to change in a column is the name: if you change the name of the column, you just update the catalog and you don't have to touch the actual data. But depending on what kind of change you're making, you may have to do two passes over every single tuple. So for example, if I have a column that allows for nulls and I wanna change it so that it cannot have nulls, I gotta do one pass through it to see whether, actually, that's a bad example. Changing something from nullable to not null, you just do one pass through, and if anybody has a null, you throw an error and don't make the change; but if nobody is null, then you can make the change and you just update the catalog, that's easy. A better example where you have to take two passes would be changing it from a 32-bit integer to a 16-bit integer. So I gotta do one pass through and figure out whether I'm gonna overflow on any value that I have. And then if I'm not, then I can go back again and do the conversion of the data. And the reason you wanna do this first check is that if something would overflow and you would have incorrect values, you wanna throw an error, abort the transaction, and not actually complete the schema change.

For indexes, it's a little bit tricky as well, because as we said before, indexes are this weird thing where they're like a copy of the data itself. And we wanna make sure that when we add or drop an index, it's all done in the context of a transaction. So if you wanna create an index, then you basically have one transaction do a sequential scan through the entire table and start populating the index. Now the issue is, of course, you don't wanna block everything while you do this, because it could take a long time. So you allow other transactions to keep running, and they make changes to that same table, which may affect what should actually be in the index. So what needs to happen is you need to buffer or keep track of all the changes to the table that were applied after you started creating the index, which you can do easily if it's a transaction, because you know what came after you. And then when the scan completes and you've populated at least the initial version of the index, you actually lock the table very briefly, go back and apply all the changes that you missed since you started, and then once that's done, you commit this transaction. Now the index becomes visible to all the transactions and you can start using it, right? Again, there's another advantage of having everything be transactional in our catalog, because if we put the index in the catalog and it became visible to anybody else before we had fully populated it, then other transactions could get false negatives or false positives, which we don't want to happen. Dropping an index is pretty easy to do.
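Before getting to drops, here is a rough sketch of that build-with-a-side-buffer idea. All the names are hypothetical, and a real system would hook the side buffer and the final catch-up step into its transaction and locking machinery rather than a plain mutex.

```cpp
#include <map>
#include <mutex>
#include <utility>
#include <vector>

using Key = int;
using TupleId = int;

struct IndexBuilder {
  std::map<Key, TupleId> index;                       // the index being built
  std::vector<std::pair<Key, TupleId>> side_buffer;   // changes made by others
  std::mutex table_latch;                             // stands in for the brief table lock

  // Phase 1: scan the table without blocking writers. Concurrent
  // transactions append their changes to side_buffer instead of waiting.
  void InitialScan(const std::vector<std::pair<Key, TupleId>> &snapshot) {
    for (const auto &[key, tid] : snapshot) index.emplace(key, tid);
  }

  // Phase 2: briefly lock the table, replay everything that was missed, and
  // then the index can be marked visible in the catalog when the
  // transaction commits.
  void CatchUpAndPublish() {
    std::lock_guard<std::mutex> guard(table_latch);
    for (const auto &[key, tid] : side_buffer) index[key] = tid;
    side_buffer.clear();
    // commit -> index becomes visible to later transactions
  }
};
```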
We just drop the index from the catalog logically, and then it immediately becomes invisible to any transaction that comes after that. But the nice thing is that any transaction that started before we did the drop will still be able to update and modify it as needed, and then at the end, once we know no transaction could be touching the index, we can finally drop it. Question? [Student asks whether creating an index like this is similar to a fuzzy checkpoint.] So the question is, is creating an index like a fuzzy checkpoint, in what way? You know, like recording changes while another transaction is waiting. We'll talk about checkpoints next week. So his question is about this idea of: I'm gonna start populating my index and keep track of all the changes that are still occurring while I'm doing this, and then at the end I essentially do what's called a sort of stop-and-copy, where I stop everything, apply all the changes that I missed, and then allow everyone to proceed. Fuzzy checkpoints are a little bit different. With fuzzy checkpoints you don't have to do that final block at the end. Yeah. Right.

So again, I've talked about indexes because indexes are the most complicated things, but this would also apply to views, materialized views, or any other construct or item you can think of that would be in a catalog: functions, UDTs, things like that. The only thing that deviates from this, from what I'm saying here about how all our schema changes should always be transactional, the only thing that I'm aware of that doesn't actually fit into this paradigm is sequences. So a sequence is basically, if you're familiar with MySQL, they call them auto-increment keys; it's basically a global counter you can maintain in the database so that if you wanna insert, say, a new unique primary key for every single tuple, like every tuple gets a new number, you would use a sequence for this. So the funny thing about sequences is that these are not maintained with the same protections as regular transactions, because they have a different property that we care about that we wouldn't get normally with other catalog changes. So for example, if I create an index, and the transaction that creates the index is now scanning the table and populating the index, and it decides for whatever reason to die halfway through, then I don't want any of those changes to the catalog to get persisted out to the log, because that index never got created; it was not completed in the transaction. But in the case of sequences, what's gonna happen is a transaction is gonna come along and it's gonna update the sequence to get a new value, and then it may or may not commit, but we wanna make sure that value actually gets persisted, because another transaction at the same time may also increment the sequence, and we want the value to keep going up, and if that first guy fails, we don't wanna go back, because we've already given out another sequence value to another transaction. So what needs to happen is, when we update the sequence, you have to create a log entry, and it doesn't have to get flushed right away, but you just need to keep track of the fact that if any transaction reads or writes to that sequence in the catalog, then when it goes to commit, you have to make sure that the commit record for that sequence change gets written out.
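Here is a minimal sketch of that rule, with hypothetical names: advancing the sequence creates a log record right away, and any transaction that touched the sequence has to make sure that record is durable before it can report a commit, even if the transaction that originally advanced the sequence later aborts.

```cpp
#include <cstdint>
#include <vector>

struct LogRecord { uint64_t sequence_value; };

struct Sequence {
  uint64_t next = 1;
  std::vector<LogRecord> log;   // stands in for the WAL

  // Every call creates a log record right away; the record does not have to
  // be flushed yet, but the advance can never be rolled back.
  uint64_t NextVal() {
    uint64_t value = next++;
    log.push_back({next});
    return value;
  }
};

struct Transaction {
  bool touched_sequence = false;

  uint64_t UseSequence(Sequence &seq) {
    touched_sequence = true;
    return seq.NextVal();
  }

  // On commit, the sequence's log records must be durable first, even if the
  // transaction that advanced the sequence ends up aborting.
  void Commit(Sequence &seq) {
    if (touched_sequence) {
      FlushLog(seq.log);  // hypothetical: force the WAL out to disk
    }
    // ... write the commit record ...
  }

  static void FlushLog(const std::vector<LogRecord> &) { /* fsync the WAL */ }
};
```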
So say I have transactions one and two. Transaction one updates the sequence, transaction two reads the sequence, transaction one will abort, transaction two will commit, and it will persist that sequence change, because the thing always needs to be moving forward in time. So we can do a quick demo of this in Postgres if I can switch the screen. Okay, cool. Now, you know what? I'm gonna, I hate typing on the Surface. That's a terrible marketing thing for Microsoft on video, but whatever. All right, so I'm gonna create a simple table that is gonna have a serial key. So tmux attach, all right? So drop table xxx, create table xxx. I'll have an ID field, and the serial keyword is just a synonym for saying I want a sequence. So Postgres knows that it should automatically create a sequence for this, right? So I can look at my table at first. I have nothing in there. So let me start two transactions. So this guy down here. He's gonna begin and he's gonna insert into xxx values default. So default here, in the context of a serial key, basically tells Postgres: go get the next sequence value and use that. And so now when I look at my table, I get xxx, I have ID one, because the sequence went up by one. Now down below, if I start another transaction, insert into xxx values default, and I do a select, I get value two, right? And again, we're both in transactions. This guy hasn't committed yet, so this guy can't see the other one. So now in this case here, if I roll back, right? And this guy commits, if I can write commit, no, I'm aborted. There you go, awesome. Well, that ruined that demo; actually, roll back. So both these transactions are aborted, but now if I insert again, I will see that I should get three, right? Because the sequence got logged. Even though the transactions that updated the sequence, which is just stored as another row in the catalog table, there's a pg_sequence table, that got incremented and technically should roll back because both transactions aborted, the system knows to always keep moving that value forward in time. And it has to be logged, because if I crash and come back, I want to make sure that I don't roll back to zero. Yes? [Student asks what the default keyword stands for.] Default, when I do the insert, the default keyword basically says: go get me the next value. So in theory, I think I can do this. Yeah, let's check that. Now let's see what happens. So let's gamble. I forced it to use 999 and I overrode the sequence. I actually don't know the answer to this. Who says that if I insert the next one and let the sequence increment itself, it'll get 1,000, and who says it'll get four? Who says four? Raise your hand. Could it be five? Why would it be five? Because we have like, so we want new. I just inserted the tuple now and I put 999 in. I think it would be four. All right, he says four. Who says 1,000? Everyone says 1,000. All right, here we go. Four, oh, okay. All right, fail. All right, so that tells you that the explicit insert doesn't bump the sequence in the catalog. All right, so one more thing I want to show you is, oh, good question, right, let's do that. Her question is, what about if I insert five? All right, so it's there, and now I come back and insert default. Who says it'll fail? A few. Who says it'll work? All right, it's 50-50. It didn't like it, there you go. So, all right, so that tells you that the insert is completely disconnected from the sequence.
So, I'm extremely interested in sequences right now, because there's no good paper that says how to do this in an efficient way. Different systems all do different things. CockroachDB does something that depends on, like, the node number, and it's sort of pseudo-incremented. If someone was looking at doing this for project number three, I definitely think there's a paper here; I'm very interested in this. And there's also work we're doing on catalogs in general, of how to do this efficiently. So, the example I said before was like, when you add a column, right, it has to scan through and update every single tuple to add that column in. We think we actually can do this in a lazy fashion where you don't actually update the column right away: you add the new column, and you just keep track of, like, oh, I'm at this old schema version, so that when I read a tuple that's in the old version, I know how to convert it on the fly, and we can do this with LLVM. So I'm extremely interested in this, but we'll come to that later.

I want to show one quick example here of, do I not have MySQL running? So MySQL, it's hard to see, is it? So this is the way to get, I'll just drop, I can see it, maybe it's smaller, oh, fuck it. So MySQL doesn't store its catalog entirely in tables the way Postgres and most systems do. They actually rely on the file structure to tell you what databases it has, right? So just do this real quick, it'll just make it easier to see, drop. All right, so I go back here, show databases, I have 10 databases, right? Well, it turns out, if I just go to where the MySQL directory is, if I go, right, the password, whatever, sorry. All right, so I go to /var/lib/mysql, you'll see there's a bunch of directories here, and these actually correspond to the databases that it thinks it has. So if I just create one like xxx, now when I go back in MySQL and I say show databases, xxx shows up, right? So this brings up a whole bunch of other, a whole lot of concurrency control problems, because now this is something that's outside the control of the database system, it's on the file system, right? So, lo and behold, if I go back over here and I drop my xxx directory, right? It's gone. So the main thing I wanted to say about this is that putting everything in the database is the right way to do catalogs. The question is, does it store the tables in these directories as well? So in these directories, I'm not a MySQL expert, but in these directories there are these FRM files, right? You can't, I'm highlighting, you can't see it. These FRM files, I think, correspond to the table schemas in MySQL. Yes? So this question is, if I have a transaction that's running, I want to insert into that database, but then I delete it, what happens? So first of all, does it even recognize that directory as a database? Because that's just creating basically no metadata. All right, so I think what you're saying, so let's go through the normal process of creating a real database in MySQL, all right? I have to create a table foo with id int, primary key, right? And now if I go back down here, I forget what I called my table or database, xxx, right? So here's xxx. So here are the files that MySQL created, right? And I see it created my table's FRM and that's the schema information. So now your question is, if I start a transaction, insert into foo values one, two, three, I can read my data back, right?
And you're saying if we're just an unholy monster and just delete it, all right? Done. Still there, zero rows affected. Still there, wow. Wait, no, you know why, right? Because the file is still open in the address space of this process: the OS still has a handle to it. This process still has a handle to the file. All right, we spent all day on catalogs, as you can see. So this is the problem I have in this course: I was like, man, I really want to talk about catalogs, I think they're super interesting, but there's no paper you can read because we're trying to write it now. And I don't know if I can spend a whole day of just, like, hey, let's talk about catalogs, right? So real quick, yes? [Student asks why we even want sequences, since they become a global point of contention.] So this question is, why do we want sequences? Because a lot of applications use them, but yes, they will become a global bottleneck, absolutely, right? Who uses them? What's that? What would use them? Oh, auto-increment keys are used very, very often in database applications everywhere. Everyone uses them. Why not just give them, like, the same effect with, like, a dirty read? Give what, the sequence? They're not sequences, sorry. That's essentially what they're doing, right? But you have to make sure that the transaction that updated it will create the log record for it. But that transaction may abort. Other transactions may read it, and you wanna make sure that even though they didn't create the change, the log record that modified the sequence has gotta get flushed to disk before they're allowed to commit. Because you don't wanna come back and say, well, the sequence is now four, but when I ran before, I got it at five and I committed, and you told me I committed, but now you roll it back and someone else is gonna try to insert and get the same sequence value as me, and that'd be a problem. So it's like a dirty read plus some extra stuff.

Okay, let's get to compression, because that's, like I said, it's all good stuff, it's all databases, it's all awesome, right? Okay, right, so again, this is the problem of this course: I wanna talk about catalogs, but then it's a huge jump to say, all right, ignore catalogs, let's talk about compression, but let's just go with it. Okay, so back when we had a disk-based system, the main bottleneck we always had to deal with in a disk-oriented system was disk I/O, right? And in this environment, it was totally fine to take the huge trade-off of: I'm gonna use compression and write my compressed blocks out to disk, because yes, it's gonna cost me some CPU overhead to decompress and compress them going back and forth, but that pales in comparison to the penalty I'm paying to read something from disk. So in a disk-oriented environment, it's always a no-brainer to do compression, because the benefit you get is quite significant. But now in in-memory systems, it's a bit more complicated, because everything's in DRAM and reading data is really fast relative to disk, and so we may not wanna pay a huge computational cost to decompress and compress as we did in a disk system. But we're still gonna do compression, because obviously reducing the size of the database means that we can store more in the amount of DRAM we have, maybe run on a smaller machine, using less energy.
But there are actually some benefits later on too, when we talk about doing query processing on compressed data, because we're now gonna be able to do some queries more quickly because the data is compressed: we don't have to do a large string comparison, we only need to look at simple integers. Or in other cases, if it's a bitmap, we can do vectorized execution to pick out things very efficiently. So the key trade-off, as always, in an in-memory database, or any database system, is always gonna be speed versus compression ratio, and as far as I know, every in-memory database system is always gonna choose speed. That's the whole purpose of going to an in-memory system.

So to understand what we can do with compression, we gotta understand what real data is gonna look like. And there are two key facets of it that we can exploit, or that we'll be able to take advantage of, when we design our compression schemes, to leverage the fact that this is what the data looks like in the real world. The first is that the data we're gonna see in our applications is gonna have highly skewed distributions for the values. What I mean by that is that for a single column, a large percentage of the tuples for a given attribute will have the same values, the same subset of values. So the best example of this comes from the Brown corpus. This was from Brown University in the 1960s, where they basically selected what they considered to be the paragons or the best works of American literature, and they just went through and counted how often each word appears in these different books, right? And it turns out in the English language, the words follow what's called a Zipfian distribution, or a power-law distribution. What that means is that the word that occurs the most, the word "the", occurs X number of times, and then the second most popular word appears half as many times as the first one, and the third one appears half as many times as the second one. So you have this power-law distribution. A lot of data sets in real-world applications are gonna look a lot like this. Or think about Reddit, right? There's a small number of users that actually post most of the articles or write most of the comments, and then there's this long tail where people barely ever chime in or say anything. So if we know this, we can try to exploit it when we design our compression schemes. The second thing is that our data sets are also gonna be highly correlated between different attributes of the same tuple. The best example to think of is, like, on Amazon, the date that the order was made is only gonna be a few days before the ship date, right? Usually Amazon ships something in two to three days, right? So these two dates are gonna be close to each other. So maybe I don't need to store the entire ship date. Maybe I get to say, you know, here's the offset from the order date. Another example would be mailing addresses: if you have the zip code, then you always know what the city is. So maybe I can be smart about how I store these things.

So now when we design our compression scheme for our database, there are three goals we wanna try to achieve. The first is that we always wanna make sure that when we compress our data, so we have uncompressed data, it goes into our algorithm or whatever our compression scheme is, it's gonna spit out a compressed form of that data.
We wanna make sure that the compressed form is always a fixed length. And we want this because, as we talked about last class, we always wanna have these fixed-length data blocks, which allows us to very efficiently jump to a particular offset and say, here's the tuple that I need. The second goal is that we wanna have the database system's execution engine, as it's processing queries, delay the need to decompress the data for as long as possible. Ideally it'd be nice if we could operate our query directly on compressed data all the way up through the query plan, and only when we have to return the result back to the user or the application do we actually go ahead and decompress it. And you'll see in the case of the naive scheme that we'll talk about, you can't do this at all, because the compression scheme's a black box; you don't know what's inside of it. But for the native compression schemes that we're gonna have built inside of our own database system, in some cases we can actually do this. And the last goal is that we wanna have our system use what's called a lossless scheme. This shouldn't be news to anyone here, the difference between lossy versus lossless. In a database system, we always wanna use lossless, because people don't like it when you lose their data. So if I put some data in and I compress it and then I get that data out, I wanna get the same result. So I need a lossless scheme. The alternative is to use a lossy scheme. This would be something like JPEG or MP4, where you can exploit how humans interpret visual or audio data to throw away some frequencies and things like that that we as humans are not gonna be able to perceive. And so the original data goes in, you compress it, and then when you decompress it, it's not gonna be an exact copy of what you put in before. And for video, that's fine. For numerical data, like your bank data, that's bad; people don't wanna do that. So this is why we're always gonna be lossless.

Now, you can do lossy compression in a database system, but it always has to be driven by the application developer. An example would be: say I'm collecting sensor data for the temperature in this room and I'm taking a reading every second. So every single second I have a new entry or new row in my table for what the temperature was. Now, a year from now, I may not care at a one-second granularity what the exact temperature was at 3:36 p.m. Maybe I wanna put it into buckets of 10 seconds, or just take the average of them. So that's a lossy scheme, because I'm storing less data: I'm taking what was 10 ticks and turning it into one tick, and I can't reverse it and get back what the original data was, right? So this always has to be done by the application, because the database system can't know what's an acceptable amount of data loss, right? You as the human have to decide this. The question is: can we use, like, lossy compression for fast operation, and when necessary we just go to, like, the base data to get the real values? The question is, can you use a lossy scheme for the catalog? Yeah, you were saying, like, only for the application level; can we use it at the other levels and, when necessary, just refer to the base data? All right, so I think what you're saying is you sort of have a cached version of it. No. What are you saying? We basically, like, keep the base data and have a lossy compression and work off of that.
All right, so I think what he's saying is: if you have a persistent copy of the entire database, say on HDFS or something, you keep that in its original form, and then in memory you keep the lossy version of it, and when necessary you refer to the base data. Yeah, but when is it necessary? When, like, the operation cannot be performed on top of the lossy compression. Yeah, but it's weird, because, like, how can you execute a query when you don't know what data, like, how can you know what you don't know? Right? Like, so I have my temperature data, I have my time ticks. I'm gonna look at it and say, oh, I have a time tick for this second and then 10 seconds later I have another one. If you don't know that on HDFS you have all that missing data, how do you know what query to ask, right? And furthermore, it means you pretty much always have to go out to disk to see whether it was actually there. So nobody does that; it has to be driven by the application.

What is actually close to what you're suggesting is what's called approximate query processing. And the idea here is you maintain the entire database in a lossless form, but when you execute your query you don't end up actually reading all the data. So let's say I wanna count the number of visitors to my website from yesterday, and it's a very, very popular website. So I could just scan every single row and do a count, or I can sample and get an approximate answer. In a lot of cases people don't care about having the exact answer for aggregations, right? If I had 999,000 users and the data says I have 998,000, do I really care? Probably not. So there are some systems that do this: BlinkDB was a project out of Berkeley, and then it was commercialized as Snappy Data, which is GemFire plus Spark. XDB is sort of a wrapper layer around Postgres that does approximate query processing. And then Oracle in 2017 added something similar: they have aggregation functions where you can say I want an approximate count, right? Does that mean that approximate queries can be supported by lossy compression? So his question is, does this mean approximate queries can use lossy compression? Sorry, lossy or lossless? Lossy, can we use lossy compression, since it's an approximate query anyway? So the issue is, the way these things work is when they give you an approximate answer, they give you a confidence interval, but it's not just, like, rolling dice and sampling; there is actually a lot of heavy math behind this to say, here's your answer within some range of correctness. You wouldn't know what that range is with lossy compression.

Okay, so again this is another thing where, like, I think it's relevant to the discussion, but we can't have a whole lecture on them: zone maps. You should at least know what they are. Zone maps are not a compression scheme, but it's sort of the same idea we'll see later on that's gonna allow us to skip over large segments of data. So it's sort of like approximate query processing, but we're gonna get an exact answer. It's allowing us to avoid reading things we know we don't need to read. So the basic idea is that for every single block in our database, we're gonna pre-compute all the aggregates that we could possibly execute on the data in that block. Then when we run a query, we go check the zone map and say, is it even possible there could be data that I need in here? And if yes, I go actually scan the data; if not, I skip it.
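Here is a minimal sketch of that zone map check, with made-up struct and function names: pre-compute per-block aggregates, then skip any block whose max already rules out the predicate.

```cpp
#include <cstdint>
#include <vector>

// Pre-computed aggregates for one immutable block of a column.
struct ZoneMap {
  int32_t  min;
  int32_t  max;
  int64_t  sum;
  uint32_t count;
};

// SELECT ... WHERE val > threshold: skip any block whose max can't match.
int64_t CountGreaterThan(const std::vector<std::vector<int32_t>> &blocks,
                         const std::vector<ZoneMap> &zone_maps,
                         int32_t threshold) {
  int64_t matches = 0;
  for (size_t i = 0; i < blocks.size(); i++) {
    if (zone_maps[i].max <= threshold) continue;  // nothing here can qualify
    for (int32_t v : blocks[i]) {
      if (v > threshold) matches++;
    }
  }
  return matches;
}
```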
So in this case here, say I have a single table, it has a single column called val, and I pre-compute five aggregates: the min, max, average, sum, and count for everything here. So now my query comes along, select star from table where val is greater than 600. I can look at my predicate and say, well, I'm looking for values that are greater than 600; what's the max value for this table? Oh, I see it's 400. So I know there could be nothing in this table that could match my predicate, so there's no reason for me to even bother scanning it. So we actually support this in Peloton. We had a student last year that took the class and then did an independent study with me in the fall, and we actually now support this in our system. So the zone maps are just stored in the catalog; it's just another data table. And the way it works is that when we mark a block of data as immutable, a thread can come along and pre-compute these zone maps. And then any time you need to modify something in the block, that invalidates the zone map, and because it's all transactional, that happens for free. And then we can re-compute it again. So for some queries, you can get a huge speedup from these. The zone map term usually comes from Oracle. It's used in a lot of different systems. Vertica does this, and SQL does this. I think Greenplum as well.

All right, so we're gonna now talk about what compression is gonna look like for us. And the first question we gotta deal with is: what's the granularity of the data we wanna compress? The way to think about this is, I'm gonna have some compression algorithm and I need to compress a chunk of data. What is the scope of the data that I'm compressing? So the easiest way to do this is at the block level: I take a block of tuples and compress all the data that's inside of it. I can do this at the tuple level, where I go grab an entire tuple, if it's a row store, and just compress the data that's inside of it. I can do this at the attribute level, where I say, for a single tuple, a single attribute, let me compress it. This is what Postgres does for their overflowed variable-length fields: they write them out to this thing called TOAST, which is stored separately, and then they GZIP the TOAST data. That's doing it on a single attribute. The one we're gonna spend most of our time talking about in this class is doing it at the column level, where we take one or more attributes within a single table and compress them all together if they're stored as a column store. So I'm gonna spend a little time talking about the block level, and then we'll spend most of our time on the last one here. And this one's sort of obvious, right? If you have a varlen field, you can compress the things inside of it. This one comes up in disk-based systems when you do record-level compression, because it's better than the block level in that it allows you to jump to a single tuple and decompress it without having to decompress the entire block.

So the first scheme I wanna talk about is called naive compression. And I'm saying naive because the database system doesn't know anything about the compressed data. You take any off-the-shelf compression algorithm, whatever your favorite one is, like GZIP, BZIP, whatever, and you take a block of data, compress it, and then write it out and store that. So, as I said in the beginning, in an in-memory system we care about performance more than compression ratio.
So you actually don't wanna use something like GZIP or BZIP, because those are really slow. Instead, there are these other classes of compression algorithms, like LZO and LZ4, that may not get as good a compression ratio as GZIP, but they can compress and decompress much faster. So as I said, LZO was the first one, LZ4 was an improvement over it, and then Google released their compression algorithm called Snappy in 2011. And then the current standard that everyone says is the best is Zstandard from Facebook, from 2015. So let's see how this would work in a database system. So these naive compression schemes, and really essentially any compression scheme, are gonna fall into two classes. You can have what's called entropy encoding, where we're gonna try to find common sequences of bits in our data, and we wanna encode them using a smaller value, and then the ones that are less common we end up having to store with larger sequences. So if you know Huffman coding, this is the same idea. What's probably more common, and what most of the techniques we'll talk about use, is dictionary encoding. And basically what's gonna happen is we're gonna find all the different values in our byte sequence, and we're gonna extract them and then map them to a smaller encoding. And then any time we wanna decompress, we can go look up in the dictionary and say, for this code, what was the original value? So let's see how to do naive compression in MySQL, because I think this really illustrates the difference between what we'll talk about for columnar compression and what they do. Question? What's the difference between the two? Is it that they use different bits, like variable length versus fixed length? Say it again, sorry. Like the first one. Yeah, this one is variable length and this one can be fixed. Yes. And what would be the other differences? I'd have to go look. I don't think you always need a dictionary for this one; I think there are different ways to store it.

Okay, so this is how MySQL InnoDB does compression. They added this I think in 5.7 or 5.6. So whenever the data is stored out on disk, it's always gonna be compressed. And I think they use GZIP; I think MariaDB can use Snappy, but this is always stored on disk in a compressed form. And each disk page has to be one of four sizes: either one, two, four, or eight kilobytes. So what happens is, when it's in memory and you compress it before you write it out, you take the size of the compressed page, figure out the next bucket or category of page size it should fall into, and then you pad it out so that it fits. And they do that because they wanna organize fixed-length pages out on disk. So then let's say a query comes along and I want to read a tuple that's in a page. I go out to disk, I fetch it, I put it into my buffer pool. The one other thing I'm pointing out here is that every single compressed page has this thing called the mod log as a prefix. And you can think of this as like the delta chain we saw in the Bw-Tree. So whenever I have to make changes to data that's compressed in this block, I can actually just write the entry for the change into the mod log instead of having to decompress the page and then update it. So this allows you to make changes to things that you know are in this page without having to decompress it.
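Roughly, that size bucketing looks like the following sketch. The bucket sizes are the ones just mentioned; the fallback for a page that doesn't compress below the largest bucket is my assumption for illustration, not necessarily what InnoDB actually does.

```cpp
#include <cstddef>

// Round a compressed page up to the next on-disk bucket (1, 2, 4, or 8 KB).
// The remaining bytes get padded so every on-disk page has a fixed length.
size_t DiskPageBucket(size_t compressed_bytes) {
  for (size_t bucket = 1024; bucket <= 8192; bucket *= 2) {
    if (compressed_bytes <= bucket) return bucket;
  }
  return 16384;  // didn't fit any bucket: illustrative fallback, store it uncompressed
}
```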
Now of course the mod log will get too big at some point, and then you have to decompress the page and merge it all together. So again, if I want to update data that's stored in here, I can do it in the mod log and the page stays compressed. But if I now need to read the data inside of it, then I have to uncompress it. And in MySQL the uncompressed page size is always 16 kilobytes. So the thing to point out here is that they keep the compressed copy and the uncompressed copy around together in memory. And I think they do this for internal bookkeeping, so they know that if I read this and never actually modified anything, I can blow the uncompressed copy away and keep the compressed one here. Because if you blow the compressed one away, then if I need to evict this thing I have to compress it and write it back out. But if I know I haven't updated anything, then I can just drop those frames in my buffer pool and reuse the compressed one if need be. So this just reiterates everything I said here, but any time you need to read something, you always have to uncompress it immediately. You can't do any of these sophisticated things we'll see later, because the system doesn't know anything about the compressed data. Like, when I run GZIP on it, right? GZIP is actually gonna be doing dictionary encoding, and it's gonna maintain its own dictionary table, like we're gonna have to do in our own database system, but that's not exposed to the database system. So in order to interpret any of the data, it has to decompress it, always. So that's gonna suck, that's gonna be slow, right?

And then the other issue is that in our queries we're not gonna be able to leverage the fact that we know what the query is actually trying to do, to maybe use the compressed data in a sophisticated way, or a way that allows us to do less work, right? So in the case of some queries where you're doing an exact comparison, or maybe like a natural join, we can actually do the join or the comparison between the two sides if the two sides of the predicate are compressed in the same way, because it's gonna be an exact match. Range queries are tricky; we'll see this later. But what I mean by this is, say we have a table where it's Andy and Prashant and we have our salary, and I use some compression algorithm that stores the values in this compressed form. If I take my query that wants to find all the users where name equals Andy, and I run this constant here, Andy, through the same compression algorithm and end up with the same value, now I can do my lookup in my table directly using the compressed form of the predicate, right? Because otherwise I'd have to decompress every single tuple and then do the check. So this is a good example of why we wanna do compression in our database system ourselves. We don't wanna let the operating system, we don't wanna let some external service, do it for us. We're always gonna be able to do a better job.

All right, so there are six compression schemes I wanna go through. Some of these may have come up in the context of other classes you've taken, but I'm gonna describe them strictly in terms of database systems, which is what we care about here. So we'll do run-length encoding, bitmap encoding, delta encoding, incremental encoding, mostly encoding, which is not truly compression but we'll see why later, and then we'll finish up with dictionary encoding.
All right, so run-length encoding. The basic idea is that in a lot of cases in certain databases there'll be large runs of consecutive values that are exactly the same. Say I'm collecting the log files for my various services and I'm putting them into my table, amalgamating or combining together different services into the same table. So some services will have values for some attributes, and some services will have values for other attributes. So there may be long runs of consecutive zeros in my application. So to store this in that table, instead of storing that zero over and over again, I can instead just store a triplet that says: at this starting point in the table there's a value zero, and it's repeated this number of times. And then when I want to do my query on them, I know how to interpret that triplet and say, all right, well, if I was at position 10, what would the actual value be here? We'll see this in a second, but the way to get the biggest bang for the buck from run-length encoding is if you can sort the columns ahead of time in a way that maximizes the length of all the runs that have the same values.

So in this example here we have a single table with two columns, id and sex. For simplicity, say sex is either male or female, right? But the same idea applies for other things. All right, so if I want to compress the sex column, say I can store it as eight-bit integers, an eight-bit integer like zero or one. So what I'll do, instead of just repeating male, male, male, female over and over again, is I have this triplet now that says I have the value male at offset zero, the starting point here, and it gets repeated three times: one, two, three. Then in the next triplet I have female, and it's at offset three and repeated one time, and so forth. All right, so now if I have a query that says do an aggregation or count the number of males and females, I can just scan through this and find what I need. If I need to say, for position six, is it a male or female? Then I sort of have to scan through and do the math and figure out where I fall in these triplets. Now, this is actually a bad example of run-length encoding, because it ends up storing more data for the compression scheme than the original data, because I have these three values here, such that for a single male or female tuple I have to store the entire triplet. But if I sort my column ahead of time so all the males appear first, followed by the females, now when I do my run-length encoding it's super compact, because now I only have two triplets. So if you can sort things ahead of time you can get a huge win from using run-length encoding. So if I had a table of a billion people, instead of storing male or female for every single person, if I just sort them on male and female, then with two triplets I can represent a billion different values for my entire table. Yes? [Student comments that with a row store, sorting only really helps this one attribute.] So your statement is, if I'm using a row store then this doesn't work? Correct. Well, it would work, but it would suck, right? So when I say sort, like, so if I go back here, like, I'm sorting the id as well. So, like, things got shuffled, so that it still matches, so that the offset here matches what the original value was there. Yeah. So do we lose, like, the benefits of ordering?
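To make the triplets concrete, here is a minimal sketch of run-length encoding as described above (value, start offset, run length), plus the positional lookup that has to walk the triplets. The names are made up for illustration.

```cpp
#include <cstdint>
#include <vector>

struct Run {
  uint8_t  value;   // e.g. 0 = male, 1 = female
  uint32_t offset;  // position of the first tuple in the run
  uint32_t length;  // how many consecutive tuples share the value
};

std::vector<Run> RunLengthEncode(const std::vector<uint8_t> &column) {
  std::vector<Run> runs;
  for (uint32_t i = 0; i < column.size(); i++) {
    if (runs.empty() || runs.back().value != column[i]) {
      runs.push_back({column[i], i, 1});
    } else {
      runs.back().length++;
    }
  }
  return runs;
}

// "What is the value at position 6?" -> scan the triplets and do the math.
uint8_t ValueAt(const std::vector<Run> &runs, uint32_t position) {
  for (const auto &run : runs) {
    if (position < run.offset + run.length) return run.value;
  }
  return 0;  // out of range (illustrative)
}
```

If the column is sorted first (all males, then all females), the encoder above collapses the whole column into just two runs, which is exactly the big win described in the example.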
If you had a row store, this would be bad. Yes. All right. So run-length encoding is cool because it's gonna show up later on, because we actually can piggyback or daisy-chain different compression schemes with each other. So run-length encoding we can actually apply after we use another encoding or compression scheme and get an even better compression ratio.

All right. So next is bitmap encoding, and the idea here is that instead of storing all the values in our column, we're gonna maintain a separate bitmap for every single unique value that I have in my column. And then in the bitmap there'll be a one or zero depending on whether the original tuple at that offset has that particular value. Right? So the ith position in the bitmap corresponds to the ith position in the table. And you don't actually malloc one giant bitmap to do this; like, if I have a billion tuples I'm not trying to malloc a billion bits; you break them up into chunks and allocate them as smaller segments. So as we'll see in a second, this encoding scheme only works if you have a small number of unique values in your column. And we'll see why in a second. So let's go back to our example here, the male/female one. So to encode this I'm gonna have one bitmap for all the males and one bitmap for all the females. And for a particular offset, position four here, since the original value was female, I'll have a one in the female bitmap and not a zero. Right? It's pretty easy to understand. We'll see later on in the semester that there's a variant of bitmap encoding called bit slicing where we can do some pretty cool query optimizations using vectorization, where we exploit the fact that we're dealing with bit vectors and do certain aggregations very quickly. Like, for example, if I want to count the number of people that are male, all I need to do is look at this bitmap, and I can use primitive vectorized instructions just to count the number of ones. And I can do that really efficiently with a small number of instructions, and I don't need to scan and look at every single value by itself.

So bitmap encoding, as I said, only really works if the column that you're trying to compress has low cardinality. To give an example of when it goes bad, let's say I have this customer dimension table and I want to do bitmap compression on the zip code column. So the first question is, how many zip codes are there in the United States? Somebody take a guess. He says 100,000, close. Well, it's about 43,000. So if I have 10 million tuples and I have 43,000 zip codes, if I just want to store the original data, it's 10 million times 32 bits to store these as regular integers. So I can store 10 million entries, those zip codes, in 40 megabytes. But if I want to do bitmap encoding, the total size is gonna be 53 gigabytes, because for every single unique zip code I have to have a giant 10-million-entry bitmap. Yes? Well, each tuple can only have one zip code, so why do you need to write all these bits out in a unary way? Can't you just use, like, a log of whatever your space is? Your statement is, sorry, say it again. So it seems like, if I can only have one zip code. You only have one zip code, yes. Then I don't need to write out all 43,000 bits; I can just use, like, log of that size bits for each value and store a mapping somewhere else.
Yeah, but how big is that mapping? It's the size of the entire space, right? You'd write the space out sequentially and then you'd need, like, log of that many bits. Let's take this offline. I think you're missing something, or maybe I'm not understanding what you're saying. Like succinct encoding? Let's take this offline. Okay, all right. So again, the other issue you have too is maintaining this, because every single time you insert a new tuple, now you have to extend all 43,000 bitmaps, right? And that's bad. So bitmap encoding is also very common, but you as the human DBA have to decide when it is the right thing to do. It is supported in a lot of systems because the speedup you can get is a lot.

The next most common compression scheme is called delta encoding. And the idea here is that instead of storing the exact value for every single tuple, we can rely on the fact that there'll be some temporal locality between tuples that appear next to each other. And we exploit that and only have to store the difference between consecutive tuples rather than the entire value. So let's say that I have my simple application that's recording the temperature in this room every minute. The first thing we see is that in the case of the time, the time only goes up by one minute, and in the case of the temperature, the values are very close to each other; it's not fluctuating greatly, right? It's not going from zero degrees to a thousand degrees all of a sudden, right? It's 9.5, 9.4. So what we can do is we'll select the first value, the first tuple in this table, to be what's called the base value, and we'll store that in its original form, and then all of the other values that come after it are deltas on the previous one, right? So there are sort of two ways to do this, and in the example I'm showing here it's plus one, plus one, plus one, plus one. So if I want to know what this original value was, I have to apply all these changes to get down to here. The alternative is you can just store the difference from each single value to the base value. So in that case here I'd have to store plus four. The reason why most systems actually store it the first way is because, as we see here, we have another example of consecutive runs of the exact same value, where you just keep storing plus one, plus one over and over again. So we can actually do the delta encoding, end up with this form, and then do run-length encoding on top, and store that we have plus one appearing four times. So now we can compress it even further, right? So there's another example where you can piggyback compression schemes off one another and get better results.

All right, so a variant of delta encoding is called incremental encoding. And this looks a lot like the prefix optimization we saw in B+tree leaf nodes a few classes ago. And the idea here is that when we know that our data has common prefixes, then for a single tuple we can just store what portion of the prefix it shares with the previous tuple, and then just its unique suffix. So I have my table, I have Rob, Robbed, Robbing, and Robot. And the first thing we're gonna see is that we start with the first value; we always store that in its original form, so there's no change there. But then we need to go through and figure out, for each tuple, what prefix does it share with the previous one?
So in this case here, Robbed shares ROB with the guy that comes before it, so we know that its prefix is just those three characters, ROB. Same for Robbing: it actually shares four characters, because it shares ROBB. And then Robot just shares ROB. So once we compute the common prefix, our compressed data form now says: here's the suffix that's unique, and here's the prefix length that this entry shares with the guy that comes before it. So in the case of Robbed, it shares the first three characters with the one that came before it, ROB, and then the unique suffix is BED, and that puts me back into the original form. So in this case here, in order to compute exactly what the value for this last one is, you have to apply the changes all the way down. So typically, again, you break these up into blocks, so that way you're not trying to uncompress a billion tuples, it's some smaller amount.

And the next one, again, as I said, is not really a compression scheme, it's more of an encoding scheme, but I like it because it's so simple. This comes from Amazon Redshift. So what you can do is you can declare that a column is a "mostly" integer. What that means is, say you have a column where some of the values might be 64-bit integers, but most of them are gonna be something smaller. And so you say that I can actually store this data as a smaller data type, and any time I have a value that exceeds the max size for that data type, I store it in a separate space. So in this case here, say I have my table, I declare this column as 64-bit integers, and we see that most of them can fit into a smaller integer size, but I just have this one here that's large. So in the compressed form I'll have the mostly8 column, where the values that we had here originally are stored as eight bits, and then for that one entry that was really large, I'll have a special flag that says, for this offset, the data is actually not here, go look it up in some special table based on that offset, and that's how you get the original value. So now as you're scanning along, as you're processing your query, when you come across this you know you need to jump over there. So this works great if most of your values can be stored in this smaller form. It'd be bad if you get this wrong, because then for every single entry you look at, you have to go look up in the table to find the original value, and that defeats the entire purpose of the compression. So like I said, Redshift supports this; I don't know if anybody else does.

All right, so in the remaining time I want to talk about dictionary compression, because this was the paper you guys read, and this is probably the most prevalent compression scheme you'll find in database systems. If a database system says they're doing native compression, chances are they're doing dictionary compression. So the key thing we have to deal with is that we need fast encoding and decoding, because we need to be able to quickly transform our values from the decoded form to the encoded form and vice versa. And we also need to make sure that we can support range queries, because we want to be able to operate directly on the compressed data without having to go decompress everything. And then the other two design decisions we have to deal with are: when should we actually construct the dictionary for our data, and what should be the scope of that dictionary?
All right, so we'll talk about those two and then finish up with how we handle encoding and decoding. There are essentially two choices for when to build your dictionary. The first is to do it all at once: somebody invokes a command, you scan everything, and you build a dictionary for all the tuples at that given point in time. Then, if anybody comes along and updates the table with values that don't fit in the original dictionary, you need to start storing them in a separate dictionary, or leave them uncompressed, and at some later point rebuild the entire dictionary. This is problematic because you basically have to scan the entire table all at once, build the dictionary, and at some later point do it all over again. It's maybe not so bad if you're doing it during something like a vacuum, when you're already going through and cleaning things up, but in practice, for an in-memory database, this is bad.

A better approach is to do it incrementally, and this ties in with the granularity of your compression scope, which we'll talk about on the next slide. The idea here is that we try to be intelligent about how we merge new tuples into the existing dictionary. Meaning, we might leave some extra space in order to absorb new values without having to re-encode everything. At some point, if we run out of space and can't do this incrementally anymore, we can't accommodate new values, then we have to do the entire re-encoding again. Both approaches have advantages and disadvantages, but the incremental one is what everyone tries to do.

The last design decision is: what's the scope of our dictionary? This is sort of like the granularity question we talked about before, but here the question is how much of a column we should look at; we can still compress a single column, but over what portion? Probably the most common approach is block-level compression. The idea is that we take a subset of the tuples in a single table and build our dictionary just for that block, and if there's another block, it has its own dictionary and its own compression scheme. You have to be a little careful now, because if the same value appears in two different blocks, ideally you want it to get the same dictionary code even though you maintain separate dictionaries. There's a little extra management you have to do to make that happen, because you want to know that if I take a tuple with value "a" in this block and a tuple with value "a" in that block, I can tell from their compressed forms whether they're equal. You can't always do that, so it can be tricky. This is most common in disk-based systems, partly for software engineering and recoverability reasons. In the case of Oracle, they famously only do compression within a single block, because they want every page in the database to be self-contained. That means the dictionary itself has to be stored inside the data page, so that if you trash that page, you only affect that one page. If the dictionary were stored in another page and the compressed data in this page, then trashing the dictionary page corrupts a whole bunch of other stuff, because I no longer know how to decode it. So they always make sure that every block, if it's compressed, is self-contained.
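As a rough illustration of the block-level, build-it-all-at-once approach, here is a minimal sketch of a self-contained dictionary block: the dictionary lives alongside the codes it decodes, so the block can be read in isolation. It deliberately ignores ordering (the order-preserving variant comes up later), and all names here are hypothetical, not any specific system's layout.

```cpp
// Sketch of a self-contained, block-level dictionary built in one pass.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct DictionaryBlock {
    std::vector<std::string> id_to_value;                  // code -> value
    std::unordered_map<std::string, uint32_t> value_to_id; // value -> code
    std::vector<uint32_t> codes;                           // encoded tuples
};

// Build the dictionary "all at once" for one block's worth of tuples.
DictionaryBlock build_block(const std::vector<std::string>& tuples) {
    DictionaryBlock block;
    for (const auto& value : tuples) {
        auto it = block.value_to_id.find(value);
        uint32_t code;
        if (it == block.value_to_id.end()) {
            code = static_cast<uint32_t>(block.id_to_value.size());
            block.id_to_value.push_back(value);            // new dictionary entry
            block.value_to_id.emplace(value, code);
        } else {
            code = it->second;                             // reuse existing code
        }
        block.codes.push_back(code);
    }
    return block;
}
```

Because each block carries both `id_to_value` and `codes`, losing one block never leaves another block undecodable, which is the self-containment property described above. The downside, also noted above, is that equal values in different blocks may end up with different codes.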
The next approach is table-level compression, where the idea is that we have a single dictionary for the entire table. This allows us to get a really good compression ratio because we don't maintain separate dictionaries, but it's going to be expensive to update because we may have to re-encode everything. Re-encoding basically means: the data is in a compressed form now, and I have to put it back into a compressed form, but the mapping from dictionary code to value may have changed, so I have to scan through and update everything all over again.

The last choice is to do this across multiple tables, and some systems actually do something like this. It's useful when you want to do things like joins. If I have a foreign key between two tables, and I use a different dictionary encoding for the same shared column in the two tables, then when I want to do the join I have to decompress one of them and re-encode it to match the other table's encoding, or decompress both of them and then do the join, which is the worst case.

There's another technique called multi-attribute encoding. I don't think anybody actually implements this, but it does show up in the literature. The basic idea is that instead of having a one-to-one mapping, like a dictionary per column, you combine multiple columns together and use a single dictionary. So in this case here, say the same pair of values across value one and value two appears multiple times; instead of compressing each of them individually, I can have a single dictionary code represent the two columns together. The example I gave before was zip code and city: there's a mapping from a city to a zip code, so you could store that pair in a multi-attribute compression scheme like this.

Yes? His question is, how would this work when you only access one attribute, when the predicate only involves one of them? Right, so we'll see this in a few more slides, but you actually can evaluate predicates directly on the dictionary itself. Say I want to find all tuples where value two equals some value: I do my lookup on the dictionary, find the entries with that value, get the codes that match, and now I know exactly what I need to look for in the column. Okay.

So, for encoding and decoding, we want this to be fast, obviously, and the idea is pretty simple: we have an uncompressed value, we encode it, we get back the dictionary code for it; and if we hand that dictionary code to the decoding scheme, we get back its original form. The first thing to point out is that there's no existing hash function that's going to do this for us. You can't use something like MD5, because that's a one-way hash function, meaning I put my value in, I get my hash code, but I can't reverse it to go back. And I don't want to use a general compression scheme or, say, a cryptographic hash function like SHA-1 or whatever, because, what's that? It's costly, and a general compression scheme also produces variable-length output: if I have, say, an attribute with four characters versus a thousand characters, or two different thousand-character values, the outputs it generates will be different lengths.
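Going back to the multi-attribute idea for a moment, here is a hedged sketch of how a predicate on just one of the combined attributes can still be evaluated by scanning the (usually small) dictionary first to find the matching codes, then scanning the encoded column for only those codes. The structure and names are invented for illustration.

```cpp
// Sketch of multi-attribute dictionary encoding: one code stands for a pair
// of column values (e.g. city and zip code).
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct MultiAttrDictionary {
    // code i represents the pair entries[i] = (val1, val2)
    std::vector<std::pair<std::string, std::string>> entries;
};

// Find every code whose second attribute matches the predicate value.
std::vector<uint32_t> codes_matching_val2(const MultiAttrDictionary& dict,
                                          const std::string& val2) {
    std::vector<uint32_t> matches;
    for (uint32_t code = 0; code < dict.entries.size(); code++) {
        if (dict.entries[code].second == val2) {
            matches.push_back(code);
        }
    }
    return matches;   // then scan the encoded column for just these codes
}
```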
So, coming back to encoding and decoding: that means we're going to have to maintain our own data structure that lets us map in both directions, both encoding and decoding. And the key property from the paper you guys read that we want to maintain is that our compression scheme is order preserving. That means that if the values have a lexicographical ordering in their uncompressed form, we want their compressed form to follow that same ordering, the same sort order. So if I have tuples here, Andrea, Prashant, Andy, and Dana, the dictionary codes that I generate for these values need to have the same ordering.

Yes? The question is: how would that even be possible in general? It looks like you're fitting more information into the code than you have; there are strictly more strings than numbers you can map them to. All right, so your statement is what, sorry? That, for example, I take Andrea and I just keep appending A's to it? Yes, each one will be greater lexicographically, but you run out of numbers very quickly. Okay, I'm missing what you're trying to say. Why is it possible to map strings to numbers and preserve their order when the cardinalities are different? Are we placing some sort of restriction on the keys? No, there's no restriction. It's just that this code is something we're generating ourselves; this is not a hash function, right? In the paper you guys read, they basically maintain a tree, and they generate these codes for every new key that gets inserted, making sure the codes follow the same lexicographical ordering. And why is this valuable? Because it's now an integer, right? Instead of a varchar, which would be some arbitrary length, I can store this as a 32-bit or 64-bit dictionary code, so it's fixed length. Let's keep going, we're short on time, sorry; we can talk about it afterwards.

So as I said before, one of the advantages now is that we can use the dictionary table we generate to run queries directly on the dictionary to find the things we want. In this case here, if I'm doing a lookup to find all of the names that start with the prefix "and", I can rewrite this as a between query, because I can do a preliminary scan on the dictionary: the entries are in sorted order, so this key, the name Dana, does not match my predicate, and Andrea is the first key I have that does match. So I know I only want to look at tuples whose codes fall in this range, and I can rewrite the predicate as a between query on the codes. We can do this because the dictionary preserves the same ordering as the original values.

Yes? How does it handle reassigning the codes, like if you insert eleven things between two existing entries? Right, so we'll get to this. His question is, what happens if I insert, say, a hundred things in here? You have to re-encode, right? Here I'm showing that I'm leaving a little slack space, so you could insert maybe nine or ten things before you have to re-encode everything. Right? Could you reuse codes if you sacrificed something, like equality predicates?
So his question is, can I reuse a code, like a lossy compression scheme? Right, so if I reuse code 20 for both Andy and Aaron, is this Andy or Aaron? Which one? You'd have to fall back on the base data, but there is no base data; this is the base data, right?

Okay, so let's look at some other examples real quick. If we just want to do a lookup on the name, with the same predicate as before, we still have to perform a sequential scan, because we need every matching tuple, not just the unique names. But if we rewrite this with a distinct clause, in this case we can actually run it directly on the dictionary, and once we find the unique values that match our predicate, we can stop there. The main thing I'm trying to point out here is that for some queries we can process the predicate directly on the dictionary without ever going back to the original data.

All right, I realize we're about out of time, but real quickly: you don't want to use a hash table for the dictionary, because you'd lose the ability to do range and prefix queries. And as we'll see when we talk about hash table implementations, you store the key inside the hash table, but you have to go look in the buckets to get to it. What you want instead is a B+tree, and that's what they use in the paper. It's going to be slower than a hash table, but it still allows you to do all the types of queries we want. The main takeaway from the paper, what makes it interesting, is that they essentially maintain two trees and share the same leaf pages across the two trees. These are sorted leaves that map values to codes. At the top here, you take the original value, do a lookup in the index to find your leaf page, then do a binary search to find the thing you want. In the bottom one, you take the compressed value in, do the lookup, and then you have to do a sequential scan within the leaf, because it's sorted on the original values, and then you can find your value and return its original form, all right? This is also an example of incremental encoding, because I'm leaving ten slots here to allow me to insert a bunch of stuff, and at some point, if I exceed that boundary, then depending on how I organize my pages, I may only need to re-encode the things stored inside this page, all right?

All right, so we're out of time; I'll stop here. On the indexing stuff, I'll just say that there are tricks to actually compress B+trees, but essentially you put them into a static form; I can send the link to the lecture from last semester that talks about this. All right, so next class, oh, you lost it, all right, whatever. Next class we're going to switch over to logging, because we're continuing down the storage architecture: we make our changes to the tables, and we want to make sure those changes always make it out to disk. And then on Wednesday we'll talk about taking checkpoints; the idea there is to take snapshots of the database and write them out to disk as well, because we still need the disk for recovery even though we don't need it to store the main data. Okay? Any questions?
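As a closing illustration of the order-preserving dictionary and the prefix-to-range rewrite discussed above, here is a minimal sketch. It is a simplification: the paper's shared-leaves structure is replaced by an ordinary sorted map, codes are assigned with gaps (the "slack space" mentioned above), and the struct, function names, and gap size are all assumptions made for illustration, not the paper's actual design.

```cpp
// Sketch of an order-preserving dictionary and a LIKE-'and%' -> code-range rewrite.
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct OrderPreservingDict {
    std::map<std::string, uint32_t> value_to_code;   // kept sorted by value
};

// Assign codes in sorted order, leaving slack between codes so a few new
// values can be inserted later without re-encoding everything.
OrderPreservingDict build(std::vector<std::string> values, uint32_t gap = 10) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    OrderPreservingDict dict;
    uint32_t code = 0;
    for (const auto& v : values) {
        dict.value_to_code[v] = code;
        code += gap;
    }
    return dict;
}

// Rewrite "name LIKE 'and%'" into a code range [lo, hi): every matching value
// sorts between the prefix itself and its successor string.
// Assumes a non-empty ASCII prefix whose last character is not 0xFF.
std::pair<uint32_t, uint32_t> prefix_to_code_range(
        const OrderPreservingDict& dict, const std::string& prefix) {
    std::string upper = prefix;
    upper.back()++;                                   // e.g. "and" -> "ane"
    auto lo = dict.value_to_code.lower_bound(prefix);
    auto hi = dict.value_to_code.lower_bound(upper);
    uint32_t lo_code = (lo == dict.value_to_code.end()) ? UINT32_MAX : lo->second;
    uint32_t hi_code = (hi == dict.value_to_code.end()) ? UINT32_MAX : hi->second;
    return {lo_code, hi_code};                        // scan codes in [lo, hi)
}
```

Because the codes follow the values' sort order, the resulting between predicate on codes returns exactly the tuples whose names carry the prefix, without decompressing the column.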