Alright, let's get started. Real quick: has everyone at least started the skip list project? I know one team hasn't; won't name names, that's okay. Has everyone at least looked at the documentation and understood what you're going to be doing? Again, I told you that writing a skip list is easy, and I don't want you to say, "oh, that means I can put it off until the day before it's due." Writing a generic skip list that isn't concurrent and doesn't have to do reverse iteration is easy, but doing those extra steps is a little more challenging, because you have to get the ordering right. So the sooner you get on it, the better. The other thing is, last night I posted an announcement on Piazza about where to get access to the MemSQL machines. You don't SSH directly into Narwhal; there's a web interface where you can reserve one of the machines, and then you can SSH into it through one of the PDL machines. Has anybody tried to do that yet? Okay. When you get access to a MemSQL machine, you will have root access, so you'll be able to install all the packages and everything you need. There are instructions that tell you what gets wiped after 24 hours and how to save data in your home directory so that it's preserved after every wipe. And again, if you're not using the machine, give up the lease so that somebody else can use it.

So for today's class, we're going to switch and start talking about more OLAP things. Up until this point in the semester, we've been focusing on transaction processing — first concurrency control, and then the last two or three lectures on latch-free indexes for fast transactional lookups. Now we're going to switch gears and start bringing in the analytical operations we want to do in our database system, and we'll see what components we need inside the system to accelerate these types of queries, because they're much different from the transaction processing workloads we've looked at so far. We'll start off talking about the different types of OLAP schemas you can have. Then we'll talk about projections and the columnar indexes from the SQL Server paper you guys read. And then we'll spend most of our time after that — the second half of the lecture — talking about bitmap indexes, because these are the most common type of index you'd have for OLAP queries.

To understand the overview of what we're looking at from an application standpoint: the typical way people do analytics on their databases is a setup like this, where you have front-end OLTP silos. This is where all the transactions come in and you update the state of the database. Usually these silos are independent from each other: if you're Zynga, say, this OLTP database would be for one game, this one for another game, and so forth.
Then what happens is you take the change log — the updates that occur in these OLTP databases — and you stream them into an extract-transform-load (ETL) component to do whatever cleaning you need and convert the data from the OLTP silos into a common universal schema. Then you stick that in your OLAP database. This is a giant data warehouse holding all the data from the front ends, and it's where you run your analytical queries. That's typically how people have been doing this for the last 15 or 20 years. Later in this class we'll start talking about how to do more HTAP things — hybrid workloads — where instead of this strict partition between the OLTP side and the OLAP side, we can run some of the queries you'd want to do on the OLAP database directly on the front end. But we're not there yet. Today we're just going to focus on how to speed up queries running on the OLAP side.

So in the paper, they talk about how the columnar indexes they added to SQL Server were designed to accelerate decision support systems, or DSS applications. DSS applications let an organization look at its data and analyze it to help make decisions about future actions. The idea is that you look at all the historical data you've collected from your OLTP front ends — that's where the new information comes in from transactions — and you do analytics on it to answer questions that help you figure out what to do in the future. This is what the promise of big data was really all about when it first came out: we're going to store everything and then do analytics on top of it to figure out what we should do next, or to find interesting trends. Nowadays "big data" has transitioned to also mean the machine learning stuff people are doing, but for this class we're focusing on traditional DSS workloads.

The example I always like to give is actually one from Mike Stonebraker. A classic DSS example: say you're Walmart, and you have a database of every single item anybody has ever bought in the history of Walmart. There's a hurricane coming next week, and it's going to hit Florida. What you want to do is go back into your historical data and ask: the last time a hurricane hit Florida, what items were bought the most in the week before the hurricane and the week after? Because now you can tell your suppliers and whatever logistics companies you work with to make sure those products are available at your Walmarts in Florida in preparation for that hurricane. This is different from just guessing — "oh, I think people are probably going to need water" — and figuring out what to ship to the store. You can actually look at the data and know exactly what people bought. This is a great example of what decision support systems are used for.
So now let's look at how you model a DSS application's database schema, because it can look a lot different from how you'd model an OLTP application or OLTP database. In particular, the paper you read talks about supporting star schemas and snowflake schemas, and star-schema queries. I want to spend a little time understanding the differences between these two, because it motivates why we actually want to build the OLAP indexes we'll talk about today.

Say we have a simple application — the Walmart example again. We're going to have a schema with a center table, and everything emanates out from it. That's why it's called a star schema: you have the center of the star, and then all these things coming out of it. The table in the middle is called the fact table, and it holds all the events, or facts, that occurred in your application. In the case of Walmart, the fact table would be all the items people have bought: every time someone bought something online or scanned something at the cash register, you'd get a new event in this fact table. Notice that inside the fact table, we're not going to store any redundant information about the items that were bought. We'll always have the price and quantity, because those are unique per purchase, but for things like what product it was, what customer bought it, and where it was bought, we'll have foreign key references to the outer tables. So here we have a product foreign key reference to the product table. These outer tables are called dimension tables; this is where the actual data for a particular product or location is stored. Any time you want to, say, find all the products bought at this location that were made by this manufacturer, you do a join between the fact table and the product and location dimension tables. This looks a lot different from how we think about OLTP applications, because an OLTP database usually looks like a tree: a warehouse has districts, a district has customers, a customer has orders, orders have order items. This is modeled differently.

Now contrast the star schema with a snowflake schema. This is the same application we had before, but instead of just one level of dimension tables around the center, we have multiple levels: now a dimension table can also have a foreign key reference to another one. In this case, before we embedded the category name and category description for a product inside the product dimension table, but now we have a foreign key reference from the product dimension table to a separate category dimension table. This is just normalizing the star schema further: before we were in a lower normal form, and now this is 3NF or BCNF. So why would you want to do one versus the other? One trade-off is obvious.
First, in the star schema we're repeating the category name over and over again — this is the normal forms stuff we talked about in the intro class. The other thing is that joins get more complicated: although the snowflake schema has less redundant information, because we have foreign key references to further dimension tables, a star schema has only one level of dimension tables, while a snowflake schema can have multiple levels. So there are two issues to weigh when choosing between a star schema and a snowflake schema. First, when we normalize things out into a snowflake schema, it takes up less storage space, because we're not repeating attributes over and over again. The star schema instead relies on our application code being a good steward of the database and making sure we don't introduce integrity or consistency violations: if we change the name of a manufacturer or a category in our product dimension table under a star schema, we need to make sure we update all of those entries so things don't become inconsistent. The second issue is query complexity. As humans, the snowflake schema looks better — cleaner and nicer — but from a database system implementation standpoint, supporting snowflake schemas is more complicated, because now there are multiple levels of joins to do, and figuring out the right join ordering, as we know, is very difficult. So typically a query on a star schema will be faster than the same query on a snowflake schema. That's why, when you read the literature about how to set up a data warehouse or a DSS database, they usually recommend a star schema: you're willing to pay the storage penalty of denormalizing your dimension tables in exchange for faster runtimes.

So whether we're using a star schema or a snowflake schema, we want to think about how to add an index to speed up queries in our application. It's fairly obvious that for a dimension table it's okay to use a B+tree, because dimension tables are much smaller. Going back to Walmart: the fact table is every single item anyone has ever bought — high millions or billions of entries — but the number of products Walmart sells is much smaller than that. So the dimension table is much, much smaller than the fact table, and we can put a B+tree or a skip list or whatever we want on it and that's fine. But doing this on the fact table is tricky, or very wasteful. Think about having a B+tree on something like a zip code attribute of the fact table: you'd have a not-very-wide fan-out in the B+tree, and then this huge chain of pointers to all the tuples that share the same attribute value. So let's look at an easy example. We have the sales fact table and the location dimension table, and the query we want to run counts all the items that were bought at a store in the location with the zip code 15217.
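To make that concrete, here is a minimal sketch of the query and a naive way to evaluate it; the table and column names (sales.location_fk, location.id, location.zip_code) are my assumptions for illustration, not from the paper:

```python
# Hypothetical star-schema query: count fact-table rows for one dimension value.
QUERY = """
SELECT COUNT(*)
  FROM sales AS s
  JOIN location AS l ON s.location_fk = l.id
 WHERE l.zip_code = 15217;
"""

# Naive evaluation: probe the (small) dimension table for the matching id,
# then scan the (huge) fact table counting rows that reference it.
location = [{"id": 1, "zip_code": 15217}, {"id": 2, "zip_code": 90220}]
sales = [{"location_fk": 1, "price": 9.99}, {"location_fk": 2, "price": 4.50},
         {"location_fk": 1, "price": 1.25}]

ids = {row["id"] for row in location if row["zip_code"] == 15217}
count = sum(1 for fact in sales if fact["location_fk"] in ids)  # full scan of the fact table
print(count)  # -> 2
```

The dimension-table probe is cheap; the pain is that last scan over the fact table, which is exactly what we're about to talk about.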
The way you'd execute this query: you have an equality predicate on the zip code, so we can use a B+tree on the dimension table to find the location ID for zip code 15217. But when we want to join that with the fact table, we either have to do a sequential scan across all the entries — which, again, could be in the billions — or we have a B+tree that jumps to every single item ever bought at this location. Even that's going to be wasteful: traversing the index will be fast, but then you have this huge linear scan over the pointers, jumping to the offsets that have the data you need before you can do the computation. So this is motivating why we want different indexes for our OLAP queries than the ones we saw on the OLTP side: we want to add indexes on our fact tables to speed up queries that have to access elements from the dimension tables.

All right, so the paper you guys read is one I really like, from the Microsoft folks, about adding columnar indexes to SQL Server. It's crazy for me to think about: I remember this paper coming out when I was in grad school and thinking, "oh, this is kind of cool, that's pretty exciting." Now I realize that was seven years ago. This was their first implementation of columnar indexes in SQL Server; there's a follow-up paper, which I'll discuss in a few slides, about how they overcame a lot of the deficiencies in the paper you read. Remember the provenance of SQL Server: it was originally based on Sybase. In the 1990s Microsoft bought a source code license from Sybase, ported the code to Windows NT, and that became SQL Server. Since then they've diverged heavily. Sybase eventually fizzled out and got bought by SAP, and as far as I know it's pretty much in maintenance mode now — they'll maintain it, but they're not adding the new features you'd expect in a modern system. SQL Server, on the other hand, is considered one of the state-of-the-art systems. We talked about the Hekaton stuff a few weeks ago — that's their in-memory OLTP engine — and what they're talking about here is the Apollo engine for doing fast analytics. SQL Server is considered one of the most state-of-the-art systems available today because they're still putting a lot of work into it, whereas Sybase has sort of gone away.

The basic idea of what they're going to do here: they take chunks of rows, and for each column in that chunk they extract the values for that attribute and store them contiguously as a columnar index. In this first implementation, all the original data in the row-store tables is still there; they just add these auxiliary data structures to let you run fast OLAP queries directly on them. The problem, though, is that in this first version you can't update the table: as soon as you say, "for this table I want to add columnar indexes," SQL Server freezes the table and makes it read-only.
The reason they had to do this is that they had no way of mapping an offset in the columnar indexes back to the original row, or the other way around. So if you updated the row store, you couldn't update any of the columnar indexes, because the ordering of the rows in the columns would be different from how they are in the row store. The thing I like about this paper, though — even with all the limitations of a first implementation — is that they made heavy use of the existing components in SQL Server. Rather than building a brand-new database management system just to add support for columnar operations or OLAP indexes, they used the existing parts of SQL Server to integrate these new indexes. I like this because, in the same way that with Hekaton you didn't want to build a brand-new DBMS just to do fast transactions and lose compatibility with all the tools available for SQL Server, with these new columnar indexes you can just drop them in and keep using all the same products that work with SQL Server today — Crystal Reports, Tableau, MicroStrategy, everything you'd do at the application level. It all still works, because at the end of the day it still looks like SQL Server; internally, there are these extra indexes to speed up the OLAP queries.

So it looks like this. Here we have a data table with four attributes, four columns. The first thing they do is split the table horizontally into row groups — each row group is, I think, a million tuples. Then, for each row group, they generate an index containing all the values that correspond to a single column or attribute, and this is what gets stored contiguously. They pass it through a component that does encoding and compression — we'll talk about what that's doing in a second — and then store it in the database system's internal blob storage mechanism. Again, this is an example of reusing existing components of the system to add the columnar indexes: the blob storage is the same place a varbinary or a large varchar would be stored. So they take the columnar indexes they generate, pack the byte stream into a blob, store it in that separate storage, and keep a little segment directory — a lookup table — that says, if I'm looking for row group 3 at column C, where do I go find the column segment I want? It's pretty straightforward. Then, to expose which columnar indexes are available, they had to add two additional pieces of information to the system's internal catalog.
The first is the segment directory, which lets you look up a row group and column offset to get to the blob that holds the column segment. They also maintain some metadata about each column segment in a row group: what size it is, how many rows it has, the min and max values. That's useful because if I'm looking for a value that I know is not within the min/max range for a column segment, I don't even bother decompressing it and looking at it. These are usually called zone maps in databases; we'll see them again on Thursday. You also keep track of how the column segment was encoded — we'll talk about the different encoding schemes in a second. And then there's the data dictionary, which supports dictionary encoding: it maps dictionary IDs back to the original values, and it's a way to get compressed columnar indexes.

So I want to talk a little about dictionary encoding now; we'll cover it in more depth when we do the lecture on database compression. I'll say up front that this is probably the most common compression scheme used in databases. As far as I know, basically anybody that supports compression does something like this: HANA does this, HyPer does this, MemSQL does this, Redshift does this. This technique will show up a bunch of times throughout the semester, but for now I'll just go over it at a high level so you understand what's going on. The way dictionary encoding works is that you have a separate auxiliary table inside the database that maps the unique values of a given column to some kind of unique dictionary ID. The way you sort the entries of that table can vary based on what you're trying to achieve: one way is to sort by frequency — how often each value occurs — and another is to sort lexicographically. I think in this paper they use the first one. Now, for each tuple in the columnar index, instead of storing the original value — the varchar field or whatever it is — we store a 32-bit ID that we can look up in the dictionary table to find the original value. This gives us fixed-length values within a column segment, and the IDs we store are much smaller than the original values. Let's look at an example. We have two columns, an ID column and a city column, and we want to do dictionary encoding on the city column, because we have a bunch of varchar values repeating over and over: New York, Chicago, and Pittsburgh each appear multiple times. The dictionary-encoded column looks like this: we have our dictionary table mapping each dictionary ID to a pair of the original value and a counter for how often it occurs — zero maps to New York, one maps to Chicago, two maps to Pittsburgh — and inside the actual column we don't store the varchar values anymore, we just store the integer IDs.
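Here's a minimal sketch of that idea — the frequency-ordered dictionary and the column layout are illustrative assumptions, not the actual SQL Server format:

```python
from collections import Counter

city = ["New York", "Chicago", "New York", "Pittsburgh",
        "Chicago", "New York", "Pittsburgh"]

# Assign dictionary IDs in frequency order (most common value gets ID 0).
freq = Counter(city)
dictionary = {value: code for code, (value, _) in enumerate(freq.most_common())}
reverse = {code: value for value, code in dictionary.items()}   # the reverse map

encoded = [dictionary[v] for v in city]   # the column now holds small fixed-width ints
print(encoded)                            # e.g. [0, 1, 0, 2, 1, 0, 2]

# A predicate like city = 'New York' becomes one integer comparison per entry.
target = dictionary["New York"]
matches = [i for i, code in enumerate(encoded) if code == target]
print(matches)                            # offsets of the matching tuples
```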
So now if I want to find all the entries where city equals New York, I know I just need to rip through this column doing a comparison — where is the value equal to zero? — and that's much, much faster than doing string comparisons. Yes? Her question is whether you would only do this on denormalized tables. You can do this on anything, actually: sometimes the fact table has a varchar field you'd want this for, and you can do it on the dimension tables too. When we talk about database compression, you'll see there are some databases, like IBM BLU and HANA, that use dictionary compression for everything. So it isn't specific to the fact table or the dimension table; this is a general-purpose example, but it is what they do when they create the columnar indexes. Yes? His question is: if I want to do a lookup on, say, Chicago, do I still need to scan the entire column? Yes. And would that be slow? Well, there are techniques to sort the columns so they're in value order, so you can binary search to jump to the thing you're looking for — but don't get too bogged down in the compression details. Basically, say I want to find all the entries where the city equals Chicago. There's actually a reverse mapping that I'm not showing here, where you take the string "Chicago" and get back the dictionary ID; it doesn't only go in one direction. So now I have the ID, and I can do a vectorized search — a SIMD search — across the column, taking chunks of multiple entries at a time and using a single instruction to do the comparison. So yes, you do have to do a sequential scan, but there are techniques that make it go much faster than string comparisons. Okay, I don't want to get too deep into the dictionary compression stuff, but they talk about it in the paper, so I wanted to bring it up here.

All right, the other two encoding schemes they talk about: the first is value encoding, and the idea is to transform the value domain of an attribute into a smaller set of distinct values. A way to think about this: say I have a bunch of 64-bit integers — one billion, one billion and one, one billion and two. Instead of storing a number around one billion over and over again, I can map them down to a smaller domain and keep track of the fact that the original value is whatever value I'm storing plus one billion. Now I can store the values in a much smaller data type, like a 16-bit or 32-bit integer, and always map back to the original value. There's a related technique called delta encoding, where instead of storing one thousand, one thousand one, one thousand two, one thousand three over and over, I store the base value of my column segment as one thousand, and the subsequent values are just the deltas from that base: one thousand, plus one, plus one, plus one. I'll talk more about delta encoding and other techniques next class, but the main idea here is that, rather than storing the whole value with all of its bits over and over again, we can map it down to a smaller data type and get better compression — and better cache utilization, because fetching some number of tuples from a column segment means pulling less data from DRAM.
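As a rough sketch of those two ideas — the base values and types here are made up for illustration:

```python
import array

values = [1_000_000_000, 1_000_000_001, 1_000_000_002, 1_000_000_007]

# Value (base/offset) encoding: store small offsets from a per-segment base,
# in a narrower type than the original 64-bit integers.
base = min(values)
offsets = array.array("H", (v - base for v in values))   # 16-bit unsigned shorts
assert [base + o for o in offsets] == values             # lossless round trip

# Delta encoding: store the first value, then differences between neighbors.
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
decoded, acc = [], 0
for d in deltas:
    acc += d
    decoded.append(acc)
assert decoded == values
```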
All right, the last one is run-length encoding. This is a pretty common technique outside of databases as well, but it's super useful in the database context. The idea is that within a column segment, we compress runs of the same value into triplets: the value of the attribute, the start position of the run within the column segment, and the number of repeats in the run. To make this work well, as we'll see on the next slide, you want to sort the columns in a way that maximizes the compression opportunities — that's how you get the best bang for the buck from RLE. They talk about the VertiPaq tool in the system, which is basically trying to solve an NP-complete problem: figuring out the sort order across all columns that maximizes performance for the entire workload.

So let's look at the example. Again we have two columns: the ID, and then a sex column where, for simplicity, we'll say there are only two options, male and female. We want to run-length encode this. What you end up with is a series of triplets: the first one says the value is male, it starts at offset zero, and it has a length of three — that corresponds to the male, male, male here — and you continue the same way down the rest. The problem with RLE in some cases is that if you have a lot of runs of length one, you end up wasting space. In this case I have three triplets in a row — female, male, female — where the run length is one. For this example, I have ten entries, and I could store male or female in a single byte per entry, so that's ten bytes. But here I have five triplets at three bytes apiece, so that's fifteen bytes: the RLE-compressed version is actually larger than the original data. That's bad. This is what I was saying: if you pre-sort the column so that all the males appear first, followed by all the females, the RLE-compressed form needs only two triplets at three bytes apiece — six bytes to store what was originally ten. Now, the reason you store the offset is that, the way this typically works, you store the triplets in a bunch of blocks, and you want to be able to jump to some location without decompressing everything that came before. You can binary search over the list of triplets, because each one knows its starting point in the column segment, so you never have to start from the beginning. So again, this is an example of why pre-sorting things ahead of time can get you much better compression with RLE.
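A minimal sketch of those (value, start, length) triplets, including the effect of sorting — the layout is illustrative, not the SQL Server format:

```python
def rle_encode(column):
    """Compress a column into (value, start_offset, run_length) triplets."""
    runs = []
    for i, v in enumerate(column):
        if runs and runs[-1][0] == v:
            value, start, length = runs[-1]
            runs[-1] = (value, start, length + 1)   # extend the current run
        else:
            runs.append((v, i, 1))                  # start a new run
    return runs

col = ["M", "M", "M", "F", "M", "F", "M", "M", "M", "M"]
print(rle_encode(col))          # 5 triplets (~15 bytes): worse than 10 raw bytes
print(rle_encode(sorted(col)))  # [('F', 0, 2), ('M', 2, 8)]: 2 triplets, ~6 bytes
```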
And what's really crazy to think about with RLE: in this example I'm only showing ten entries, but if I had a billion entries for a billion people, I could store the sex of a billion people in only six bytes. That's pretty crazy. So that's what RLE is doing — another type of encoding the SQL Server columnar indexes could support — and we'll see RLE show up again later when we talk about bitmap compression.

All right, so far we've only been talking about the lower storage level of the system: we've built these columnar indexes, we're storing them in blob storage, and we know how to find them by looking them up in the segment directory. But they're useless to the system unless the higher levels of the stack know they exist. In the paper they talk about what they had to modify in the query processing components of the database system so it recognizes these columnar indexes and makes use of them. There are really two parts. The first is that you have to modify the query planner and optimizer to be aware that these special indexes exist and how big they are, so the optimizer's decision-making components can use them and compute the cost of executing a query that accesses them. The other major change is that they added new operators in the execution engine that process column segments a vector at a time, instead of using the Volcano or iterator-style model we talked about at the beginning of the semester. Instead of looking at one tuple at a time, you access a batch of tuples — a vector of tuples — and this enables additional optimizations that they don't really get into, but that we'll cover when we talk about vectorized execution later in the semester. Basically, it reduces the number of function calls: when you call get-next on your child, you get a whole chunk of data, process it, and move on to the next one. They also talk about how they can use the columnar indexes to build bitmaps on the fly to do joins more efficiently: instead of examining every single tuple, you can look at these columnar indexes and build the additional bitmaps you need for the join evaluation.
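To make the iterator-versus-vector distinction concrete, here's a toy sketch — my own simplification, not SQL Server's actual operator interface:

```python
# Tuple-at-a-time (Volcano/iterator model): one get-next call per tuple.
def scan_tuples(table):
    for row in table:
        yield row

# Vector-at-a-time: one call returns a whole batch, amortizing the call
# overhead and letting the operator loop tightly over a contiguous chunk.
def scan_vectors(table, batch_size=1024):
    for i in range(0, len(table), batch_size):
        yield table[i:i + batch_size]

table = list(range(10_000))
total = sum(1 for row in scan_tuples(table) if row % 2 == 0)
total_v = sum(sum(1 for row in batch if row % 2 == 0) for batch in scan_vectors(table))
assert total == total_v == 5_000
```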
All right, so again, the paper you read is from 2010 and describes the first version of these columnar indexes. That first version shipped in SQL Server 2012, which was code-named Denali, and then SQL Server 2014 and 2016 added the updates that right all the wrongs — that fix all the missing pieces in the paper you read. The first thing they added was support for clustered columnar indexes: you no longer have to maintain the row store as well as copies in the columnar indexes, because the columnar indexes can now be the clustering index for the data store, so the system knows how to map back and forth between the original data and the columnar indexes. They added support for more data types — the initial version could only handle basic integers, floats, varchars, and things like that, but the newer versions handle all the data types SQL Server supports. But the key thing they added, again enabled by the clustered columnar indexes, is support for inserts, updates, and deletes. The way they do this is that, instead of modifying the columnar indexes directly every time a transaction updates something, you apply all the changes to this thing called the delta store. If you're familiar with how log-structured merge trees work — LevelDB, RocksDB — it's like the MemTable: you append all the entries to the delta store, and then a background thread eventually migrates the data from the delta store into the columnar indexes. We'll see this again on Thursday when we talk about supporting hybrid architectures and hybrid data models inside one system; it's a common technique. The reason you do this is that an update on a columnar index or columnar table is really, really slow for a transaction: when you insert a new tuple, it's not like a row store, where you just copy the bytes into a new slot. You have to break the tuple up into its different columns and do multiple writes to the different indexes. That's why you want the delta store: you can append new rows there cheaply, and in the background they eventually get migrated out to the columnar parts. To make this work, though, the database system has to be aware that some of your data lives in the delta store and some lives on the columnar side, and it has to seamlessly combine results from these two different segments so queries see a single uniform database. Again, we'll talk more about this on Thursday; it's very common. And then, instead of doing in-place deletes, you mark a bitmap that says the tuple at this offset has been deleted, and you have to consult that bitmap on every lookup to see whether a tuple is actually visible to you or not.
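Here's a minimal sketch of that delta-store-plus-delete-bitmap arrangement, under my own simplifying assumptions (a single column, no concurrency):

```python
class ColumnStoreWithDelta:
    """Toy single-column store: immutable column segments plus a delta store."""

    def __init__(self):
        self.segments = []      # "columnar" data: list of immutable segments
        self.delta = []         # uncompressed append-only delta store
        self.deleted = set()    # delete bitmap, kept as a set of row ids here

    def insert(self, value):
        self.delta.append(value)            # writes only touch the delta store

    def delete(self, row_id):
        self.deleted.add(row_id)            # no in-place delete, just mark it

    def merge(self):
        """Background step: migrate the delta store into a new column segment."""
        if self.delta:
            self.segments.append(tuple(self.delta))
            self.delta = []

    def scan(self):
        """Queries must combine both parts and honor the delete bitmap."""
        row_id = 0
        for part in self.segments + [self.delta]:
            for value in part:
                if row_id not in self.deleted:
                    yield value
                row_id += 1

store = ColumnStoreWithDelta()
for v in [10, 20, 30]:
    store.insert(v)
store.merge()               # 10, 20, 30 now live in a column segment
store.insert(40)            # 40 still sits in the delta store
store.delete(1)             # hide the value 20
print(list(store.scan()))   # -> [10, 30, 40]
```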
Okay, so it should be clear what's going on with these columnar indexes: they're designed to make OLAP queries run faster, and they're much different from the B+trees we've looked at so far, because we don't really care about finding exact tuples — we want to find large segments of tuples that have particular attributes or satisfy a predicate we're evaluating in a query.

All right, now we can transition to bitmap indexes. As I said at the beginning, these are probably the most common OLAP index that a database system will support. As we go along, though, we'll see it's not as simple as saying "I have a bitmap for every value" — there are more complicated variations, as we'll see in a second. Bitmap indexes are pretty old; I think they've been around since the 1970s, and there's a paper from 1987 that lays out what had been done with them up to that point. The basic idea of a bitmap index is that we store a separate bitmap for every unique value of a particular column or attribute in a table, and the offset of a bit in the bitmap tells you whether the tuple at that offset has that value: the i-th position in the bitmap corresponds to the i-th tuple in the table. And instead of storing one giant bitmap contiguously in memory, typically you break it up into smaller chunks — could be ten megs, five megs, one meg, doesn't matter — which makes it easier to allocate pieces incrementally.

So here's a really simple example. Let's go back to the table we had before, with the single column that's either male or female. To build a bitmap index on this field, you'd have one bitmap for male and a separate bitmap for female. A one at a position in a bitmap tells you that the tuple at that position has that value: in this case, for the tuple at offset 4, the value in the male bitmap is zero but the value in the female bitmap is one, so we know that tuple is female. Pretty straightforward. Now, when you want to find all the females or all the males, you only have to rip through that single bitmap; you don't have to look at the others.

Let's look at a more complicated example: say we want to build a bitmap index on the zip code attribute of the customer dimension table. As a quick guess, how many zip codes are there in the US? More than a hundred? Less than a hundred thousand? Throw out a number. I heard a thousand; I heard a hundred thousand. It's about forty-three thousand. Now say our customer dimension table has ten million tuples — ten million customers. If we generate a bitmap index for it, that's ten million times forty-three thousand bits, which is about 53 gigabytes. That's pretty wasteful, because most of the bitmaps are going to be sparse — almost all zeros. Compare this with just storing the attribute directly without any bitmap encoding: ten million times 32 bits, since we can store a zip code as a 32-bit integer, and that's about 40 megabytes. So a bitmap index isn't always the best thing to do: you went from 40 megs of original data to 53 gigs. The other issue is that every time we insert a new entry into the customer dimension table, we have to extend — realloc — all the bitmap vectors to make room for the new tuple, and that's going to be really slow, because you're malloc-ing across all 43,000 bitmap vectors. You can try to amortize the cost by allocating extra space ahead of time, so you're not doing this on every single new customer, but in general this is a bad idea. So now we're going to look at a bunch of different encoding and compression schemes that keep the essence of a bitmap index — more or less the same benefits — without paying that huge storage overhead.

We'll start with the different encoding schemes. The examples I've shown so far — the zip code one, the male/female one — use what's called equality encoding: we have a bitmap for every unique value of the column, and a one or a zero tells you whether the tuple at that offset is equal to the value that bitmap represents.
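Here's a minimal sketch of an equality-encoded bitmap index, with plain Python integers standing in for the chunked bitmap storage:

```python
from collections import defaultdict

def build_bitmap_index(column):
    """One bitmap per distinct value; bit i is set iff tuple i has that value."""
    bitmaps = defaultdict(int)
    for i, value in enumerate(column):
        bitmaps[value] |= 1 << i
    return dict(bitmaps)

col = ["M", "M", "F", "M", "F"]
index = build_bitmap_index(col)
print(bin(index["F"]))            # 0b10100 -> tuples at offsets 2 and 4 are female

# Predicates become bitwise operations over whole words at a time;
# e.g. "sex = 'F' AND active" just ANDs two bitmaps together.
active = 0b01110                  # some other bitmap over the same tuple order
print(bin(index["F"] & active))   # 0b100 -> offset 2 only
```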
But there are three other approaches you can also use. One is range encoding, or interval encoding. The idea here is that instead of one bitmap per value, a single bitmap represents an interval or range of values. Say I want all the zip codes from 15210 to 15219: I can combine those together into a single bitmap that represents all of them. The upside is you save space; the downside is that if I want an exact lookup — find all the customers in 15217 — I'm going to get a lot of false positives with a range-encoded index, because that bitmap also represents other zip codes. So I still have to go back to the original tuples and apply another predicate to check whether each one really matches. Typically, as far as I know, in the commercial databases that support this, you as the DBA have to specify what the intervals are; I don't know of a database system that figures it out for you. The other two approaches are more interesting: hierarchical encoding, where we use a tree to identify the empty key ranges and store less, and bit-sliced encoding, which I think is one of the more interesting approaches here.

The way hierarchical encoding works is that a tree structure tells you whether there's a one anywhere in the nodes below the current location. In this example, the root bitmap is one-zero-one-zero: that says at the first child there's definitely a one somewhere, at the second child there's nothing, at the third child there's a one somewhere, and at the fourth there's nothing. The inner levels don't encode any of the actual values — only the leaf nodes represent values; the inner nodes just tell you whether there's anything below them. So at this node we'd have a one to say there's at least one one in the bitmap underneath, a zero because there's nothing here, and so on — and this child is all zeros, so we know there's nothing down there and we don't store it. The way to understand this is that it's the same bitmap vector we had before, read across the leaves, except you only store the leaf chunks that actually contain a one. So where originally we had to store eight bytes to hold everything, now we can store it in four bytes and do a quick walk down the tree to find the thing we're looking for. As far as I know, no system actually does this — at least none I'm aware of; I think it has cache-affinity problems and branching that make it not as fast as some of the other approaches we'll talk about.
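Even though nothing ships this, a toy two-level version makes the pruning idea concrete — the fanout and layout here are my own assumptions:

```python
def hierarchical_encode(bits, fanout=8):
    """Upper bitmap marks which fixed-size chunks contain a 1; all-zero
    chunks are simply not stored (that's where the space savings come from)."""
    chunks = [bits[i:i + fanout] for i in range(0, len(bits), fanout)]
    upper = [1 if any(c) else 0 for c in chunks]
    stored = [c for c in chunks if any(c)]
    return upper, stored

def test_bit(upper, stored, i, fanout=8):
    chunk_no = i // fanout
    if not upper[chunk_no]:
        return 0                         # pruned: the whole chunk is zeros
    rank = sum(upper[:chunk_no])         # how many stored chunks precede this one
    return stored[rank][i % fanout]

bits = [0]*8 + [0, 0, 1, 0, 0, 0, 0, 0] + [0]*8 + [1, 0, 0, 0, 0, 0, 0, 0]
upper, stored = hierarchical_encode(bits)
print(upper)                                                     # [0, 1, 0, 1]
print(test_bit(upper, stored, 10), test_bit(upper, stored, 3))   # 1 0
```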
The other one that I really like is bit-sliced encoding. In the same way that going from a row store to a column store requires you to mentally transform the table and think in a different direction, with bit-sliced encoding you have to do the same thing: we're going to turn the data sideways and represent it a different way. The data we'll look at is a list of zip codes again — these are actually the zip codes of all the places I've lived: Pittsburgh, Compton, Madison, and a bunch of other places. Take the first one, 21042 in Maryland: if you run it through the bin function of Python, you end up with a bit sequence like this. What we're going to do is store separate bit slices, one per position in the bit sequence, across all the tuples. The first bit corresponds to whether the value is null or not — in this case it's not null, so it's zero — and then, taking a cursor starting at the uppermost bit and sliding across, we add each bit to the slice for its position: position 16 is a zero here, position 15 is a zero, position 14 is a one, and so on. We do the same for all the other tuples. And the way this gets stored is not horizontal but vertical: the vector for position 16 is what's actually stored contiguously in memory.

What this lets us do is take a query like this — select star from the customer dimension table where zip code is less than 15217 — and walk across the slices to figure out which tuples match. We maintain a bitmap that says whether the tuple at each position in the bit slices could still evaluate to true. If we get to the point where all the positions are zero, we know there are no matching tuples; and once we know some tuple's prefix already evaluates to false, we never have to check it again in later slices. In this case, since 15217's upper three bits are zero, I can immediately rule out any tuple that has a one in any of its first three bit slices: this one has a one here, and this one has ones here and here, so as I go across I don't need to evaluate any of those tuples in the later slices, because I already know they'll come out false. Essentially it's the same numerical comparison you'd do between two integers, broken up on a bit-by-bit basis — and, as we'll see when we talk about SIMD and vectorization later on, you're not just asking "is zero less than zero" over and over, one tuple at a time. You can pack a bunch of tuples' bits into a single SIMD register and invoke one instruction that evaluates the predicate across multiple tuples simultaneously. Because we store things as bit slices, this is much, much faster than looking at each value one by one.
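A small sketch of building bit slices and using them for a less-than predicate — a scalar stand-in for the SIMD version, with each slice kept as a Python int so one bitwise operation touches every tuple at once (I've dropped the null bit for simplicity):

```python
def build_slices(values, width=17):
    """slices[p] has bit i set iff bit p (counted from the MSB) of values[i] is set."""
    slices = [0] * width
    for i, v in enumerate(values):
        for p in range(width):
            if (v >> (width - 1 - p)) & 1:
                slices[p] |= 1 << i
    return slices

def less_than(slices, constant, n, width=17):
    """Evaluate value < constant for all n tuples at once, one slice per step."""
    ones = (1 << n) - 1
    lt, eq = 0, ones                      # eq: tuples still tied with the constant
    for p in range(width):
        c = (constant >> (width - 1 - p)) & 1
        if c:
            lt |= eq & ~slices[p] & ones  # tied, and tuple has a 0 where constant has a 1
            eq &= slices[p]               # stay tied only with a matching 1
        else:
            eq &= ~slices[p] & ones       # a 1 where the constant has a 0 means "greater"
    return lt

zips = [21042, 15217, 90220, 15213]
mask = less_than(build_slices(zips), 15217, len(zips))
print([z for i, z in enumerate(zips) if (mask >> i) & 1])   # -> [15213]
```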
There are other tricks you can do, too, to compute aggregate functions without looking at every single tuple one by one. For a sum, all you need to do is count the number of ones in each bit slice, multiply that count by the numeric weight of that bit position to scale it back to its original base value, and add these up as you go down the slices. That's a faster way to compute the sum than taking two numbers and adding them over and over, because again we can pack multiple bits into a single SIMD vector and invoke one instruction to process them. And there are instructions the CPU provides — popcount, the Hamming weight — that count the number of ones in a word with a single instruction.
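A sketch of that sum trick, reusing build_slices from the previous sketch (int.bit_count stands in for the hardware popcount and needs Python 3.10+):

```python
def bitsliced_sum(slices, width=17):
    """sum(values) == sum over positions p of popcount(slice_p) * weight(p)."""
    total = 0
    for p, s in enumerate(slices):
        weight = 1 << (width - 1 - p)      # numeric weight of this bit position
        total += s.bit_count() * weight    # popcount: tuples with this bit set
    return total

zips = [21042, 15217, 90220, 15213]
assert bitsliced_sum(build_slices(zips)) == sum(zips)
```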
There's a bunch of architecture-level optimizations like this that make bit-slicing really fast, and we'll see a paper later in the semester about the crazy tricks you can do to make all of this stuff go really fast. So, to be clear, bit-slicing is another way to represent a bitmap index: instead of storing all the entries for a single value, you store the individual bits of the values in these bit-slice vectors.

All right, now we can talk about different compression schemes. The first approach is to just use an off-the-shelf, general-purpose compression algorithm — pick your favorite: LZ4, Snappy, I guess the hot one now is ZSTD. The idea is that you're already breaking your bitmap indexes or columnar indexes — doesn't matter what it is — into chunks, so you just take a chunk, run Snappy on it, and compress it. The problem is that any time you need to read from that bitmap, you have to decompress it first. If it's a disk-based database system, this is probably a good trade-off, because writing out to disk is super expensive. But for an in-memory database it's a bad trade-off: you'd be spending more CPU cycles on decompression than it would cost to just take the cache miss and bring the data from DRAM into your CPU caches. So typically this is not what you want to do. Instead I want to talk about two approaches that I think are more clever. The first is Byte-aligned Bitmap Codes — BBC encoding. This is the technique Oracle used in the 90s and early 2000s; they don't use it anymore, but I think it's cool to look at. Then we'll finish off talking about Roaring bitmaps.

So, BBC from Oracle: this is the technique they came up with in the 1990s to compress bitmaps. They have a newer technique now, and — which is also kind of crazy — because Oracle bought Sun, they control the SPARC architecture, and the newer SPARC chips like the M7 actually embed Oracle's compression algorithms for bitmaps and other things directly in the hardware, which is way faster than doing everything in software. The reason I want to discuss this technique is that it's an example of exploiting something we know about how databases use bitmaps to get better compression than you'd get from a general-purpose algorithm: we exploit the fact that in our bitmap indexes, most of the values are going to be zero, so we can have a compression scheme that takes advantage of that sparse representation.

What we're going to do is divide our bitmap into chunks based on the categories of bytes they contain. We designate a byte as either a gap byte, when all its bits are zero, or a tail byte, when at least one bit is one. Then we encode chunks consisting of a sequence of one or more gap bytes — again, all zeros — followed by one or more tail bytes: the gap bytes get compressed with RLE, and the tail bytes are stored uncompressed, verbatim, unless it's the special case where the chunk has exactly one tail byte with only a single bit set.

So let's see an example. We have some bitmap — we don't care what it's storing; all we know is that only three bytes in it have any ones. This one has a one right there, this one has a one here, and this one has two ones. We break this up into chunks: remember, a chunk is a run of gap bytes followed by one or more tail bytes. In our first chunk we have two gap bytes, all zeros, followed by one tail byte containing a single one; the rest is the second chunk. To encode the first chunk, we always store one header byte that describes what's in the chunk. The first three bits record the number of gap bytes — as long as it's less than seven, since the all-ones pattern is reserved for bigger counts. Then there's one bit that says whether the tail is "special," meaning the chunk has exactly one tail byte with exactly one bit set. If the special bit is zero, we store the tail bytes verbatim — exact copies — after the header; if it's one, we just store the position of that single one bit in the last four bits of the header. So here, the first three bits are zero-one-zero, telling us we have two gap bytes; the next bit is the special flag, and it's one because the tail byte has only a single one; and the last four bits tell us that the position of that one bit is position four. So we've taken what was three bytes of bitmap and stored it in a single byte.

Now let's go to the next chunk, which is more complicated: we have 13 gap bytes followed by two tail bytes. Since the number of gap bytes is greater than seven, we can't pack it into the first three bits like before, so we store the escape pattern one-one-one, which says the gap count is too big for the header and you have to look at the byte right after the header to find the actual number of gap bytes — here, 13. Then there's the special bit, which would be one only if there's exactly one tail byte with exactly one one in it; here we have two tail bytes, so it's zero. Then the next four bits say how many verbatim tail bytes follow — in this case, two — and then come the verbatim tail bytes themselves, exact copies of those two bytes. So what used to be 18 bytes — six rows of three bytes — can now be stored in this compressed Oracle format in five bytes. That's quite significant.
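Here's a toy encoder for the format as described in lecture — the bit-position numbering (counted from the most significant bit) and the single overflow byte for large gap counts are my simplifying assumptions, not Oracle's exact on-disk format:

```python
def bbc_encode(bitmap: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(bitmap)
    while i < n:
        gaps = 0
        while i < n and bitmap[i] == 0x00:              # run of gap bytes (all zeros)
            gaps, i = gaps + 1, i + 1
        tail = bytearray()
        while i < n and bitmap[i] != 0x00:              # run of tail bytes (some bit set)
            tail.append(bitmap[i])
            i += 1
        # Special case: exactly one tail byte containing exactly one set bit.
        special = len(tail) == 1 and tail[0] & (tail[0] - 1) == 0
        header = (min(gaps, 7) << 5) | (special << 4)
        if special:
            header |= 7 - (tail[0].bit_length() - 1)    # bit position, from the left
        else:
            header |= len(tail)                          # verbatim tail count (sketch: <16)
        out.append(header)
        if gaps >= 7:
            out.append(gaps)                             # overflow gap count (sketch: one byte)
        if not special:
            out.extend(tail)
    return bytes(out)

# The lecture's example: 2 gaps + a special tail, then 13 gaps + 2 verbatim tails.
bitmap = bytes([0, 0, 0b00001000]) + bytes(13) + bytes([0b01000001, 0b00100010])
print(len(bbc_encode(bitmap)))   # 18 bytes in, 5 bytes out
```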
Now, the downside is that any time we need to do a lookup in our bitmap index, we have to decompress from beginning to end. The other downside is that it's actually really slow on modern hardware; as I said, Oracle's current system doesn't do this anymore. I like it because it shows a clever way to encode these bitmaps at the level of individual bits, and you get good compression — but because you have to check all those bits to figure out what's actually stored, and their contents determine where you branch, on a modern superscalar architecture with long pipelines, if the branch predictor guesses wrong, you flush the pipeline and have to fetch everything back in, and that's really, really slow. There are better implementations of this idea now — the Word-Aligned Hybrid (WAH) encoding is one scheme — but nobody actually uses it because it's patented. There are newer techniques that overcome the encoding problems we have here.

Yes? Could you go over, in general, where and when compression is useful in memory? So the question is whether I can go over when in-memory compression would be useful. Yes. Memory is not free, memory is not cheap, and memory takes a lot of energy, so if you can store as much data as possible with the least amount of DRAM, that's usually a huge win. When we talk about compression later on — I think it's next week — you'll also see there are techniques where you don't have to decompress the data to execute some operations and some queries on it, depending on how it's encoded. The downside I'm pointing out here is that to do any lookup with BBC you have to decompress the bitmap, whereas with dictionary encoding you can wait until the very end to do the decoding, so you can operate directly on compressed data, and that speeds things up. You typically don't do compression on the OLTP side of the database, because you'd be paying the decompression penalty all the time; you typically only do this for the OLAP stuff. And again, we'll see this on Thursday: the delta store is usually kept uncompressed, so when transactions update the database or insert new records, the data is stored in an uncompressed format, and then the background thread that migrates things from the delta store to the regular database is what actually does the compression. A student asks: but does the OLTP side of the database include all the records?
His question is: does the OLTP side of the database include all the records? In modern HTAP systems, where you have a single database instance that lets you do both the transactions and the analytics all in the same box, yes, you'll have the complete history. In the bifurcated environment I showed before, typically what happens is that over time you migrate data off the OLTP side and delete it, but it's still stored in the data warehouse. You see this all the time on websites you use. On eBay, for example, auctions get deleted after 90 days; that's because they move things off the front end, and the data is then actually stored in the back end. Sometimes, in applications that record events, you don't want to keep the last year of events, because that would take a lot of space, so maybe you have some TTL process that slowly expires things over time. So the front-end database holds the minimum amount of data you need to support the application, but you still archive everything else. In the HTAP world we're talking about here, though, you don't actually need to delete anything; you can store everything. And for the analytical queries you never need to go back and say, find me exactly Matt's record; you want to look at Matt's information aggregated with everyone else's. So in that case you can store it in a compressed manner in the back.

Yes sir? I might have lost track of what's the benefit of doing a bitmap index. What's the benefit of a bitmap index? So I had the one example here: it's going to let you filter. Say I want to find all the sales records from the store location in zip code 15217. I could have a B+tree to do a quick lookup in the location dimension table, because that table is going to be small, and that gives me the location ID. Now, if I have a bitmap index, depending on how it's encoded, I can do a lookup really quickly in the fact table to find just those entries, and that lets me avoid doing a sequential scan across the entire thing. And again, maybe I should have motivated this more, we're distinguishing between OLTP and OLAP. In OLTP you're doing lookups to find single records, or looking up foreign keys to find the records that correspond to another single record: for a single customer, find me all their orders. You don't do that in OLAP queries. In OLAP queries I want to look at all the orders that were bought at this location in this time range, those kinds of attributes. So I'm not looking up a single record; I'm looking at an aggregate over records, and bitmap indexes let me do that more quickly. So, is filtering all that you want to do? His question is whether filtering is all you'd want to do with a bitmap index. I mean, that's the point of an index, right? That's what an index does: it avoids a sequential scan.
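To make that filtering path concrete, here's a minimal sketch, with made-up table and column names, of building one bitmap per value over a fact table and then walking only the set bits for a predicate; a real system would store these bitmaps in one of the compressed formats above rather than as one big Python integer:

```python
# Hypothetical fact table; a tuple's position in this list is its position
# in the heap table, which is what the bit positions refer to.
sales = [
    {"location_id": 3, "amount": 10.0},
    {"location_id": 7, "amount": 25.0},
    {"location_id": 3, "amount":  5.0},
    {"location_id": 2, "amount": 40.0},
]

# One bitmap per distinct value of the low-cardinality column;
# a Python int serves as an arbitrarily wide bitset.
bitmaps = {}
for pos, row in enumerate(sales):
    bitmaps[row["location_id"]] = bitmaps.get(row["location_id"], 0) | (1 << pos)

# Predicate: location_id = 3 (say, the ID we got from the dimension table).
# Walk only the set bits instead of scanning every tuple.
bm, pos, total = bitmaps.get(3, 0), 0, 0.0
while bm:
    if bm & 1:
        total += sales[pos]["amount"]  # fetch the matching tuple by position
    bm >>= 1
    pos += 1
print(total)  # 15.0
```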
Okay, yes? Is it true that these indexes are only useful when the cardinality is low? His question is whether you'd only want to use these indexes when the cardinality is low. In the case of dictionary encoding it doesn't matter, because instead of storing varchars you're always going to store individual 32-bit integers. For bitmap indexes, yes: if the cardinality is super high, then the size of your indexes is going to be way larger than the original data. What I'm showing you here are some techniques to maybe reduce the overhead of bitmap indexes, but yes, if I have a billion unique values for a column, I wouldn't want to build a bitmap index on it, because it's going to be useless. So low cardinality is when these things work. Again, think back to the dimension tables; it's easier to look at the star schema one. Say I want to filter on location: how many locations could I possibly have for a store? Maybe hundreds, maybe thousands. So if I want to do fast lookups on this, my bitmap index is not going to be really wide, because I don't have a lot of distinct values for the location, and it lets me jump exactly to the data that I need rather than doing a sequential scan. We'll also see, and I don't want to jump too far ahead in the semester, some papers that argue you actually don't want any indexes at all: if you have a fast vectorized query engine that uses query compilation, sequential scans are super fast as it is, and indexes don't really help, because the cost of maintaining them is overshadowed by the performance you get from a super fast engine. We'll get there; we're not there yet.

Yeah? I'm a bit confused: if you use a tree as the index, the time complexity is log n, right, whereas scanning sequentially is always linear, even if you vectorize. So his statement is that he's confused, because a lookup on an index is log n. But that's log n to get to the key-value pair, to get to the bottom of the tree. What would be at the bottom of a B+tree index on the location? A giant linked list, so you're still scanning linearly, and that linked list contains pointers back to the original tuples. If I have a bitmap index, yes, it's a linear search, but it's still going to be much faster, because you can rip through the bitmap much more quickly: you can process it in a vectorized manner with SIMD.
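To give a feel for that, here's a minimal sketch, assuming 64-bit words and made-up predicate names, of evaluating a conjunction of two predicates by ANDing their bitmaps one machine word at a time; a real engine would do the same thing with SIMD registers, covering even more tuples per instruction:

```python
WORD_BITS = 64

def bitmap_and(a, b):
    """Intersect two bitmaps stored as lists of 64-bit words."""
    return [x & y for x, y in zip(a, b)]

def matching_positions(words):
    """Yield the tuple positions whose bits are set."""
    for i, w in enumerate(words):
        while w:
            low = w & -w                             # isolate lowest set bit
            yield i * WORD_BITS + low.bit_length() - 1
            w ^= low                                 # clear it and continue

# Tuples matching location_id = 3, and tuples matching a second predicate.
loc_bitmap   = [0b1010_0110]
month_bitmap = [0b1000_0011]
both = bitmap_and(loc_bitmap, month_bitmap)
print(list(matching_positions(both)))                # [1, 7]
```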
And for the bitmap, you have to maintain the same order as the row-wise table? His statement is that for the bitmap index, you have to maintain the same order as the row-wise table. Yes, if the base table is stored in a row-oriented format; we'll talk more about column stores versus row stores next class.

All right, so to finish up real quickly: again, the Oracle BBC format doesn't support random access. Any single time you want to check whether a value is present, you have to decompress starting from the beginning. So I want to talk about a more modern variant of storing a compressed bitmap index, this thing called roaring bitmaps. This is a newer data structure that came out, I think, two years ago, from this guy Daniel Lemire in Canada; he's a professor up there. I highly encourage you to go read his blog; he's one of those hacker professors who does awesome stuff, and instead of just writing papers he writes blog entries about what they're doing. We'll see his hash function later on when we talk about hash joins, but roaring bitmaps is something he talks about on his blog, and there's a paper that came out two years ago that describes it.

The basic idea is that we're going to store 32-bit integers in this two-level indexing structure, where sometimes we store things as dense chunk bitmaps and sometimes as just regular packed arrays. And we do the same sort of value encoding you saw in the columnar index paper from Microsoft, converting really large values into smaller 16-bit representations. Roaring bitmaps is open source, Apache licensed, and it's actually used now; there are implementations in a bunch of different languages, and it's used in Lucene and Spark and Hive and a whole bunch of other Hadoop-ecosystem systems.

So this is basically how it looks. Take a real simple example where we just have four chunks, or four partitions, and each of them points to a container. In some cases the container will be a bitmap; in other cases it's just a regular array. For each key we insert into our roaring bitmap, the first thing we do is figure out which chunk it goes to by dividing the key by 2^16. Then, since we know the base number for that chunk, we only have to store the key mod 2^16, and that's enough to reconstruct the original value; it's similar to the encoding scheme you saw from Microsoft. Then, if the number of values in a container is less than 4096, we keep it as an array; otherwise we store it as a bitmap. The idea is that below 4096 values it's actually smaller and faster to store things as an array, and only above that threshold do you want to switch to a bitmap.

Let's look at an example. Say we want to store the key 1000. First we divide it by 2^16; that gives zero, so it goes in the first chunk, in this container, and what we store inside the array is just the value mod 2^16, which here is still 1000. To get the key back out, we just reverse this. Now say we want to store a larger key, in this case 199,000. Same thing: dividing by 2^16 tells us it goes to partition three, and modding by 2^16 gives us the offset within that container, which is 2392. In this case we're storing that container as a bitmap, so we set the bit at that offset to one. Now any time we want to do a lookup and ask whether a key exists, we check exactly that position. Again, it's just another way to represent bitmaps.
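Here's a minimal sketch of that two-level structure; the function names, the dict-based container map, and the eager array-to-bitmap flip are simplifications I'm making for illustration, not the actual Roaring library's API (which also has run containers and many optimizations):

```python
import bisect

ARRAY_LIMIT = 4096
# chunk id -> ("array", sorted list of 16-bit offsets) or ("bitmap", int bitset)
containers = {}

def insert(key: int) -> None:
    chunk, offset = key >> 16, key & 0xFFFF       # divide / mod by 2^16
    kind, data = containers.get(chunk, ("array", []))
    if kind == "array":
        i = bisect.bisect_left(data, offset)
        if i == len(data) or data[i] != offset:
            data.insert(i, offset)
        if len(data) > ARRAY_LIMIT:               # container got dense:
            bm = 0                                # flip the array to a bitmap
            for v in data:
                bm |= 1 << v
            containers[chunk] = ("bitmap", bm)
        else:
            containers[chunk] = ("array", data)
    else:
        containers[chunk] = ("bitmap", data | (1 << offset))

def contains(key: int) -> bool:
    chunk, offset = key >> 16, key & 0xFFFF
    kind, data = containers.get(chunk, ("array", []))
    if kind == "array":
        i = bisect.bisect_left(data, offset)
        return i < len(data) and data[i] == offset
    return bool((data >> offset) & 1)

insert(1000)    # 1000   // 2^16 = 0 -> chunk 0, offset 1000, array container
insert(199000)  # 199000 // 2^16 = 3 -> chunk 3, offset 2392
assert contains(199000) and not contains(199001)
```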
All right, so the last thing I want to talk about is another secondary index called column imprints, and this builds on top of all the bitmap stuff we've been talking about today. This is not something you do differently, or in exchange for having a bitmap index; it's something you do in addition to having a bitmap index. The basic idea is that you store one small bitmap per cache line of values in the table, with a bit that says whether any value in that cache line has a one at a given bit slice or position. So say our original data has the keys 1, 8, and 4, and this is the regular bitmap index we had before: these are all the different values, and we have a one at each row or offset where the tuple has that value. The first tuple has a 1, the second tuple has an 8, and the third tuple has a 4. What the column imprint does is store an additional bit vector that says, for the bitmap index, whether there's a one at each bit slice within that cache line. The idea is that you can now do a bunch of additional pruning, for comparisons like min and max, sort of like the zone maps I talked about before: all you need to do is look at the column imprint to decide whether you could possibly find a match for your predicate in the original bitmap index. So it's a compressed representation, an approximation, of what's in the bitmap index, stored in this column imprint, and you can have multiple levels of these things. It's all transparent to the application, because the database system does it internally, under the covers. This is only actually implemented in MonetDB, and there's a paper that came out two or three years ago showing it gives you a big performance win for DSS queries.
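Here's a minimal sketch of that skip-or-scan behavior, under assumptions I'm making for illustration: small integer codes so a plain Python int can serve as a per-cache-line mask, and a toy cache line of four values (real imprints work over bit-sliced value ranges and real cache-line widths):

```python
CACHE_LINE = 4  # values per "cache line" (illustrative; real lines hold more)

def build_imprints(column):
    """One mask per cache line: bit v is set iff value v occurs in that line."""
    imprints = []
    for i in range(0, len(column), CACHE_LINE):
        mask = 0
        for v in column[i:i + CACHE_LINE]:
            mask |= 1 << v
        imprints.append(mask)
    return imprints

def scan_eq(column, imprints, v):
    """Equality scan that consults the imprints to skip whole cache lines."""
    for line, mask in enumerate(imprints):
        if not (mask >> v) & 1:
            continue                    # imprint proves v can't be in this line
        base = line * CACHE_LINE
        for j, val in enumerate(column[base:base + CACHE_LINE]):
            if val == v:
                yield base + j

col = [1, 8, 4, 4,  2, 2, 1, 1,  8, 8, 8, 8]
imps = build_imprints(col)
print(list(scan_eq(col, imps, 4)))      # [2, 3]; the last two lines are skipped
```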
Okay, so to finish up: the key thing that was brought up is that for all these bitmaps to work, the position of a tuple in the bitmap index, no matter what encoding scheme you're using, has to correspond to its position in the actual heap table, the base table in the database. This becomes tricky when we use something like the append-only method in MVCC, because every single time you update a tuple, its position changes. That's the reason you don't want a bitmap index directly on the OLTP side of the database: those things are changing all the time, and the positions keep moving, which means you'd constantly have to update all your bitmap indexes. So typically you have the bifurcation again: here's the OLTP side, and maybe I'll use a B+tree for that; here's the OLAP side, and I'll use a bitmap index for that. And as was brought up multiple times, it's really wasteful if the cardinality is really high, because then you just have all this wasted space, and again, if positions change all the time you have to update everything. We're also ignoring multi-dimensional indexes, which are also very common for OLAP queries; these are things like R-trees, k-d trees, and so on. I think Christos teaches those in 826; that's a whole other ball of wax that we don't want to get into now. Any questions about OLAP indexes? Okay. So for next class, we're going to talk more about the difference between OLTP and OLAP inside of an HTAP system. We'll talk about different layout schemes for your actual data, and this will finally address the floating-point issues some of you were having with the first project. Then we'll talk about the different storage models: how you would store a row store versus a column store, and then a hybrid approach that lets you do both of them together, which is going to be the delta store thing we talked about earlier. Okay, awesome guys, I'll see you on Thursday.