Alright guys, it's cold out, so I'm going to spit some fire about some hot databases. So today, now that we're at the storage layer, we're going to talk about how you actually represent a database in memory. We're going to start at the lowest level first: how to actually organize the individual bits and bytes of the attributes for each tuple. Then we'll go into the different storage models, the different ways to represent tables in their entirety. We've been sort of dancing around column stores and row stores, so now we'll go into a little more detail about what it means for something to be a row store, what it means for something to be a column store, and what the performance implications of that are. And then in the last 15 minutes or so, Joy will spend some time talking about how to do profiling and optimization in our database system, so that when you do the BwTree for the second project, if you want to understand why you rank lower than other people, why you're worse than Prashant, you can use the tools that Joy is going to teach you to potentially get as good as Prashant, though probably not all the way.

So this is the picture I showed at the beginning of the course about how an in-memory database is organized, right? We said there's this index, a B+tree, a BwTree, it doesn't matter what it is. In the value portion of the tree there's going to be a pointer to some location of a tuple in a fixed-length tuple block. And then for any attribute of that tuple that is larger than 64 bits, we don't want to store it inline, because that would mess up how we allocate the space; instead we have a pointer to a variable-length block where we can store varchars, text fields, and other things like that. So that's the high level of what we were talking about at the beginning. Now let's go into a little more detail and actually understand how we organize the data within a tuple itself.

So you can essentially think of an in-memory database as just a large byte array, right? You just allocate a bunch of bytes, and the database system knows how to interpret the bits at different locations and derive the actual typed value for a particular tuple. It does this through its schema. When you call CREATE TABLE, it's going to use that information about how you defined the different attributes to specify where the starting point of each attribute is within a single tuple, within this byte array. And in general, a tuple is always going to be prefixed with some kind of header that contains metadata about what's going on for that particular tuple. We saw this when we talked about the Hekaton version of MVCC, because they were storing the begin timestamp, the end timestamp, and a pointer to the next tuple in the version chain. Sometimes there are flags to say whether a tuple is dirty or not, and some other things we'll talk about when we get to nulls in a second. So for every single tuple, the system is going to allocate some space within the fixed-length tuple block, set aside a little portion of that space for the header, and then lay out the data for all the attributes of that tuple sequentially after it.
And the reason we want to store the fixed-length stuff together is that it makes it easy to jump to a particular location in our fixed-length tuple block just by computing an offset. If you know the starting address of the block and you know you're looking for the 10th tuple, you take the size of each tuple in that block, multiply it by the tuple's position, add that to the starting address, and jump to that exact location. Then you can do whatever operation you want on the data you're looking for.

So in general, the way we represent data for the different types in in-memory databases pretty much follows the C++ standard. In the case of any integer, big int, small int, medium int, tiny int, we just use the same type representation that C++ uses for signed and unsigned integers. We do the same thing for numerics, decimals, and reals, but the thing to understand is that the C++ standard follows what is called the IEEE 754 standard for how you represent floating-point numbers in a system. If you're not familiar with this, it basically specifies the memory layout for 32-bit and 64-bit floating-point numbers. The spec also tells you what to do for rounding, how to represent non-numbers, and if you have a 32-bit float and a 64-bit float, how you add them together and what the outcome should be. We're not going to do anything special here because we're going to rely on C++'s implementation of the standard. But if you're ever curious why you get rounding errors, why things don't turn out exactly the way you expect when you deal with floating-point numbers inside a database system, you would go look at this standard to see what it actually says the computer should do for the operation you're running.

For the variable-length fields, we talked about this before: if the size of the attribute is larger than 64 bits, then you don't want to store it inline, and instead you store a pointer to the variable-length data pool. At that address there will always be a header that says how big the chunk of data you're looking at is. And if the attribute itself is broken up into different segments, different chunks within the variable-length pool, then you would also have a pointer in the header that says here's the next segment to fetch if you want to concatenate these things back into the original attribute. In the case of VoltDB, they don't do that. I don't know what MemSQL or some of the other systems do. But if you wanted to chain these things together, you would need to spend another 64 bits for the pointer to the next piece.

For timestamp attributes, there's a bunch of different ways you can do this. In the case of something like VoltDB, they store every timestamp, whether you want the date or the time or not, as a 64-bit integer of the number of microseconds since the Unix epoch. In a system like MemSQL, for example, you can specify time and date separately and get a 32-bit integer. I don't actually know how they represent that, but you can imagine that if you're using the Unix epoch you don't have to keep microsecond granularity; you can use something smaller. And in general, if you use the Unix epoch, it's pretty straightforward to do all the standard operations you'd expect: computing deltas, and doing subtraction and addition on timestamps.
And then, obviously, there need to be wrapper functions to take timestamps given as strings in a SQL statement and convert them to the internal representation. All of this is hidden from you as the application user; you just know you get a timestamp field back, and of course it gets converted in the driver you're using to talk to the database system.

So here, let's look at how we're going to lay this out in memory. You can think of a tuple, again assuming we have this giant byte array, as just a subset of that byte array, so we can represent it with a char array. Let's say we have a table with two attributes: an int for the ID, that's 32 bits, and then a value which is a big int, which is 64 bits. There'll be a header at the starting point here, say it's 32 bits, and then immediately afterwards in the byte sequence we'll have our ID followed by the value. So to access, say, the first attribute of this tuple, we know where the starting point of the tuple is because it's fixed length, and then we jump 32 bits past that to get to the starting point of the ID, right? We just do this with arithmetic on the address: because we have the address of the tuple, we add 32 bits and we land here. And then inside our implementation of the database system, we call reinterpret_cast to take that address and cast it to a pointer of whatever type we want.

So who here knows what reinterpret_cast does in C++? Nobody, okay. So reinterpret_cast basically doesn't create any instructions; it doesn't emit any assembly. It's a compiler directive that allows the code to take whatever was at this address and treat it as whatever type you're casting it to. The key thing is that we don't have to copy any information; we're just allowing the code to access this address, after we cast it, as a 32-bit integer. We have to do this because C++ is strongly typed. So this is how we can take an arbitrary byte string and put the structure of a type around the particular address we're looking at, and reinterpret_cast is the way we do that. It doesn't require any additional instructions on the CPU; it's just something that makes it all work inside the C++ code. You'll see this in our implementation, which is based roughly on how things are done in H-Store and VoltDB: we take byte arrays, jump to addresses, and then cast them this way to get the actual type we're looking for.
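To make that concrete, here's a minimal sketch in C++ of both steps: jumping to a tuple's slot in a fixed-length block, and casting an attribute at a known offset. The names (TupleBlock, GetTuple, GetId, GetValue) and the 32-bit header are hypothetical, just for illustration; this is not the actual Peloton or VoltDB API.

```cpp
#include <cstddef>
#include <cstdint>

// A fixed-length tuple block: a raw byte array plus the (constant) size of each tuple.
struct TupleBlock {
  std::byte *data;         // start of the block's byte array
  std::size_t tuple_size;  // size of every tuple in this block, header included
};

// Jump to the n-th tuple: base address + slot * tuple size.
inline std::byte *GetTuple(const TupleBlock &block, std::size_t slot) {
  return block.data + slot * block.tuple_size;
}

// Hypothetical schema from the example: 32-bit header, 32-bit id, 64-bit value.
constexpr std::size_t kHeaderSize = 4;
constexpr std::size_t kIdOffset = kHeaderSize;
constexpr std::size_t kValueOffset = kIdOffset + sizeof(int32_t);

// reinterpret_cast just tells the compiler to treat these bytes as the given
// type; no data is copied and no extra instructions are generated.
inline int32_t GetId(std::byte *tuple) {
  return *reinterpret_cast<int32_t *>(tuple + kIdOffset);
}

inline int64_t GetValue(std::byte *tuple) {
  return *reinterpret_cast<int64_t *>(tuple + kValueOffset);
}
```

(Strictly speaking, dereferencing a cast like this can run afoul of C++'s aliasing rules, so some codebases memcpy into a local variable instead, but the pointer-arithmetic-plus-cast pattern is the idea being described here.)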
So how we store nulls is actually a tricky point in an in-memory database; actually, in any database it's pretty tricky. There are essentially three ways to do this. The first way is that you designate a particular value in the range of values supported by a type to represent null. So for example, say you have a 32-bit integer. You could say that the lowest integer you can represent for this type will be the null value. This reduces your range of values by one, because that lowest point is just treated as null. The advantage of this approach is that you don't have to store any extra information, like a bit flag, to say whether the attribute is null or not: if it's this low value, you treat it as null. And obviously you have to do some extra work in the upper levels of the system so that if someone actually tries to insert that 32-bit minimum value, you throw an error and say you're trying to store something outside the range supported by this type, right? It's the same thing you'd have to do if someone tried to store a 64-bit integer into a 32-bit integer attribute.

Another approach, probably the most common one, is that in our tuple header we maintain a bitmap or bit vector that says, for the attribute at each offset in that tuple, whether it's null or not, right? So if we have ten attributes and a ten-bit array in the header, we can easily flip bits on and off to say whether something's null. This is what Oracle TimesTen does. This is what Hekaton does, because it's what SQL Server does as well. It's probably the most common way to do this. The downside is that as you access the tuple you now have to check this bitmap in the header to see whether the attribute is null or not, and you have to store some extra space; if you have a really wide tuple, you'd have a really long bitmap.

The third approach is to store a single null flag per attribute, embedded with the attribute itself. The idea is that for each attribute, you'd have a little space right in front of where you actually store the value in the tuple, a little bit that says whether it's null or not. And this is actually what MemSQL does. But the tricky thing is, if you take this approach, you have to make sure that your attributes are word aligned. So what do I mean by that? Say this is the doc page in MemSQL that describes the sizes of their different types, and we're just looking at integers. Here you have the size of all the integer types when they can be null and when they're not null. So in the case of integer, an integer is just a 32-bit, four-byte value. If it's not null, then it's four bytes, because it doesn't need that extra flag. But if you declare it as nullable, it doubles in size: now it's eight bytes, right? And that's because they have to be word aligned. You can't have this be 33 bits, because that would cause the CPU to do more work any time you access it, right? Same for the big int: if you have eight bytes to store the regular value when it can never be null, then if it can be null, they add another four bytes for that particular flag. So even though it's a one-bit flag saying whether something's null or not, they have to spend 32 bits to allocate space for it, right? And this is because we have to be word aligned.
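Going back to that second approach, the header bitmap, here's a minimal sketch of what checking and setting a null bit might look like, assuming the bitmap sits at the very front of the tuple's header with one bit per attribute. The layout and function names are hypothetical, not any particular system's.

```cpp
#include <cstddef>
#include <cstdint>

// Assume the null bitmap occupies the first bytes of the tuple header,
// one bit per attribute, attribute 0 in the lowest-order bit.
inline bool IsNull(const std::byte *tuple, std::size_t attr_idx) {
  const auto *bitmap = reinterpret_cast<const uint8_t *>(tuple);
  return (bitmap[attr_idx / 8] >> (attr_idx % 8)) & 1u;
}

inline void SetNull(std::byte *tuple, std::size_t attr_idx, bool is_null) {
  auto *bitmap = reinterpret_cast<uint8_t *>(tuple);
  const uint8_t mask = static_cast<uint8_t>(1u << (attr_idx % 8));
  if (is_null) {
    bitmap[attr_idx / 8] |= mask;                          // mark the attribute as null
  } else {
    bitmap[attr_idx / 8] &= static_cast<uint8_t>(~mask);   // clear the flag
  }
}
```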
So, back to word alignment: if you're not familiar with what word aligned means, it basically says that you line up your data in such a way that the CPU can be really efficient about reading it. I'm going to go through an example of what I mean by word alignment within a tuple, but I want to caveat that what I'm about to describe is not exactly how real CPUs work, right? In a real CPU, you have to worry about alignment across cache lines, and a cache line is 64 bytes. The example I'm going to show does alignment across 64-bit words, because it's just easier to show at the smaller size. So be mindful that in the real world it's actually a much larger alignment unit you have to deal with.

Okay, so say we have a table with four attributes, and we have the byte array from before, but now we're going to break the byte array up into 64-bit words. When we go and allocate space for this particular tuple (we're going to ignore the header for this example), we see in the first slot we have a 32-bit integer as the primary key, so the ID gets stored as half of this first word. Then we have a 64-bit timestamp, and we store that next. And here we see that the timestamp for this attribute now spans two different words, right? It's going across the boundary, because it starts halfway through the first word. Then we have our 16-bit char(2), and we don't need to store a length for that, right? We know exactly how many bytes we're using for it because it's always going to be the same size. And then we have a 32-bit zip code integer field, and again we see that it spans the boundary of two words.

So what's the problem with this? What can happen? Why does the architecture make this a bad idea? Alright, so what happens if, say, I want to read the date attribute? I do a load on the memory controller to go fetch this tuple. What happens? How many reads do I have to do? Two, right: you have to fetch this first word to get the first half, and then fetch the second word. Now, that assumes your CPU can even do this; not all of them do. So say we do this fetch of a 64-bit value that is not word aligned. There are three things that can happen. The first, and I think this is what x86 and x64 do, is that we do two separate reads to get the two words, throw away the lower 32 bits of the first one and the upper 32 bits of the second one, combine them into a single 64-bit value, and store that in a register so the program can access it and do whatever it wants. On other CPUs, this could be completely undefined behavior, right? It could grab the first word and completely ignore the second one, or grab the second and ignore the first. It's completely up to the architecture to decide what it wants to do with your program. Maybe the compiler would be nice and tell you not to do that, but it may not be able to check for it and prevent it. And this is terrible, right? This would be you storing data in your database, going back to read it, and getting garbage. We never want that to happen. The third possibility is that the CPU basically says: you're trying to make me do something that's going to be bad, it's going to be slow, and I can't handle it, so I'm just going to throw an exception and deny your read. I think this is what ARM processors actually do.

So this is obviously bad, right? We're doing more work than we need to, to go fetch one thing that should be really fast. For every read in this example we're doing twice the amount of work. Now, the CPU does this a billion times per second because it's really fast, but if you're doing this over and over again, it's going to slow you down. So the way we fix this is, essentially, padding.
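Here's a minimal sketch of the arithmetic behind that fix: rounding an offset up so an attribute never straddles a word boundary, and padding the end of the tuple so the next tuple starts on a word boundary. The helper names are made up, and as noted right after this, many systems skip the in-between padding and only pad at the tail after reordering attributes.

```cpp
#include <cstddef>
#include <vector>

// Round an offset up to the next multiple of `alignment` (a power of two).
inline std::size_t AlignUp(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Given each attribute's size in bytes, compute offsets so no attribute
// straddles a word boundary, and a padded tuple size so the next tuple
// starts on a word boundary.
inline std::vector<std::size_t> ComputeLayout(const std::vector<std::size_t> &sizes,
                                              std::size_t word_size,
                                              std::size_t *tuple_size) {
  std::vector<std::size_t> offsets;
  std::size_t offset = 0;
  for (std::size_t size : sizes) {
    // If the attribute would cross a word boundary, pad up to the next word.
    if (offset / word_size != (offset + size - 1) / word_size) {
      offset = AlignUp(offset, word_size);
    }
    offsets.push_back(offset);
    offset += size;
  }
  *tuple_size = AlignUp(offset, word_size);  // tail padding
  return offsets;
}
```

With the example's schema (4-byte id, 8-byte timestamp, 2-byte char(2), 4-byte zip) and 8-byte words, this pads after the id and at the end, giving a 24-byte tuple.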
So in this case here, because the ID is 32 bits and we have 64-bit words, we just put 32 bits of zeros after it to pad it out, so that the timestamp field that comes next fits nicely within our 64-bit word. Same thing for the char and the integer over here: we pad a few bits at the end so that the next tuple starts exactly at the next word, right? And then we don't have this problem anymore. And again, remember that in this example I'm assuming we fetch things as 64-bit words. In a real, modern CPU, you fetch things as 64-byte cache lines, so that's 512 bits. So basically, if I want to fit a tuple within a single one of those, as long as it's less than 512 bits I'm okay, and then I pad out a little more so that the next tuple starts at the beginning of the next one, right? And all the in-memory database systems are going to do this, because the performance difference you get is quite significant. So hopefully it's clear what word alignment is. And again, in our case you usually don't do the padding in between different attributes; you do it at the end of the tuple, because it's less bookkeeping than having to track whether there's a pad at every position. Okay, so now when we do our read, it's a single fetch and we're good to go.

Okay, so that's basically all you really need to know about how a database system represents tuples in memory. We can leverage as much of C++'s behavior as possible for the different data types, timestamps, and other things like that. There's a little bit of extra work we have to do: we have to make sure we're word aligned. But in general there's not much extra stuff; it's not magic. We're just taking the bits you would normally store in a C++ struct and storing them in a byte array. And this also assumes we're using NSM.

Okay, so now I want to talk about the storage models. The paper you guys were asked to read was an example of a hybrid storage model, but I liked how they lay out the pros and cons of the two extremes of storage models, or data layouts, for database systems. So we're going to go through NSM and DSM one by one, show you what they mean, show you what their impact is and what design decisions we have to make, and then that'll segue into understanding why a hybrid storage model is, I think, the better way to go.

Okay, so the NSM storage model — I guess that's sort of redundant; saying NSM storage model is like saying ATM machine, right? So in NSM, the database system stores all the data for a single tuple contiguously in memory. It's basically the example I just showed you: we have a byte array, we put all the attributes in one by one, and when we're done, we switch over to the next tuple. So this is ideal for transactional or OLTP workloads, because these transactions usually deal with single entities in the database. And when I say entity, I mean a logical idea within the application itself, right? Think of Amazon: you as a customer are a single entity. You have orders, you have payment information, you have shipping information. All that data corresponds to your single entity, but it may be spread across multiple tables. But any transaction that you invoke on the front end of Amazon is just going to be accessing your data.
It's not going to be doing full scans across all the different entities. So by storing all the attributes for a single tuple together with the NSM approach, it makes it really easy to do these types of queries, because you're doing something like a SELECT * to get all of your customer information. You don't have to stitch anything together; you just jump to some location, some offset, and then you can scan everything in memory right there, and that's the tuple you want. It's also good for insert-heavy workloads, right? Because you don't have to split up new data that comes in; you just append it to the table, writing the tuple out in its entirety. You don't have to jump to different column locations and write pieces out separately. And again, when you think of OLTP, think of lots of updates and lots of inserts, right? Think of Amazon: you're placing new orders, you're adding new payment information, you're adding new items to your cart. All of those are just inserts, so the NSM approach is much better for this. And of course we're also going to be using the tuple-at-a-time iterator model, because that makes sense in a tuple-oriented layout: if I'm doing a query that operates on just one tuple, I make one call to the operator below me in my query plan tree and I get all the bytes for that tuple. So making an invocation per tuple is not bad in this model, because we're usually only looking at a small number of tuples at a time.

So there are two different design decisions we can make about how we actually store an NSM database, and these are described as either heap-organized tables or index-organized tables. In a heap-organized table, it's essentially the example I showed at the beginning, where the tuples are just stored in some separate data pool, and they're not necessarily sorted on any particular attribute of the table. As opposed to an index-organized table, where the tuples are actually stored inside the index itself, right? Think of a B+tree where the leaves of the B+tree contain the actual tuples and not pointers to tuples, like we showed at the very beginning. I'm going to go through this in more detail, but I want to point out that this is not the same thing as a clustered index as you would learn about in an intro class, right? You can have a clustered index on a heap table without storing the actual tuples inside the index itself. So we'll go through this. A clustered index basically means you're saying to the database system: I want my table to be sorted on some particular attribute, like a primary key. This allows you to do certain kinds of optimizations, like doing binary search during scans to find the data you're looking for without having to do a full sequential scan. And you can use a clustered index with either a heap-organized or an index-organized table. If you have an index-organized table, then by definition it always has to be using a clustered index, right? So MySQL does this: when you create a table in MySQL, whatever you set as the primary key is going to be the clustered index for that table, and the tuples themselves are stored inside the leaves of the B+tree. In the case of something like Oracle, you can define whether you want a heap-organized or an index-organized table, as the sketch below illustrates.
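To picture the difference between the two organizations, here's a rough, simplified sketch of what a B+tree leaf entry might hold in each case. The struct names and the two-attribute tuple are hypothetical, just to show where the tuple data lives.

```cpp
#include <cstdint>

// Simplified two-attribute tuple from the earlier example.
struct Tuple {
  int32_t id;
  int64_t value;
};

// Heap-organized table: the index leaf stores the key plus a pointer
// (or block/offset) to where the tuple lives in a separate tuple block.
struct HeapLeafEntry {
  int32_t key;
  Tuple *location;  // points into the heap's fixed-length tuple block
};

// Index-organized table: the leaf stores the tuple itself, so the table
// is physically ordered on the index key (a clustered index by definition).
struct IndexOrganizedLeafEntry {
  int32_t key;
  Tuple tuple;  // tuple embedded directly in the leaf
};
```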
And Oracle will make different choices in its optimizer based on what your table looks like, based on how you set it up. The one thing to point out, though, is that clustered indexes are nice; some systems, again like MySQL, always use them, while other systems, like Postgres, can't use them at all. We talked about this, I think, in the last class. And the reason is that if we're doing the append-only method in MVCC, any time you modify a tuple we're always appending the new version to the end, and that would wreck the sort order if we had a clustered index, or it would make updates very expensive, because we'd have to move things around to make sure the new version gets inserted into the right location. If you're using the rollback segment approach, the delta approach that MySQL and Oracle use, then this is not a big deal, because you can still maintain the sort order for the master versions of the tuples, and you're writing all the changes you make to a rollback segment that doesn't have to be sorted. So in the case of Postgres, you can't use a clustered index. There is a CLUSTER command, and what that basically does is force the database system to sort the table as if you had a clustered index, but it's a one-time operation. If you do it and then modify the database, it'll go out of order again.

So the main advantages of NSM are that it allows us to do fast inserts, updates, and deletes, it can support index-organized table storage, and it's good for any kind of query that wants to access the entire tuple. The disadvantage, and we'll see this in a second, is that it's not good for OLAP queries that have to do large table scans or that only need to access a subset of the attributes, right? We talked about this when we talked about decision support systems, where they may only be accessing a subset of the columns but want to scan the entire table to compute some kind of aggregate or roll-up table that gets presented to a human.

So we contrast NSM with DSM, which is the column store stuff that we talked about with the bitmap indexes and the columnstore indexes in SQL Server. The basic idea here is that for a single attribute in a table, we're going to store all its values across all tuples contiguously in memory. This is sort of the same thing as vertical partitioning, where we're splitting up the table into vertical chunks, and we can have our query optimizer choose plans that are optimized to execute over those large segments. This works better for read-only queries, because we don't have to worry about updating the thing all the time, which would be expensive: we'd have to take our tuple, split it up, and write it out to different locations in memory. And for this we want to use the vector-at-a-time approach that we talked about before, because we can let a query plan operator grab a chunk of data, a vector of values from one of these columns, and pass that up the query plan, and not pass around data we're never going to end up using. So the great thing about DSM, if you have a good query planner, is that it lets you access only the data that's actually needed in the query plan, and you don't waste time fetching things you don't need.
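As a rough mental model of the difference, here's the same three-attribute table sketched both ways in C++: NSM as an array of whole tuples, DSM as one array per attribute. This is just an illustration of the two layouts, not how any particular engine declares its storage, and the attribute names are made up.

```cpp
#include <cstdint>
#include <vector>

// NSM ("row store"): each tuple's attributes sit next to each other,
// and the table is an array of tuples.
struct RowTuple {
  int32_t id;
  int64_t order_total;
  int32_t zip;
};
using RowTable = std::vector<RowTuple>;

// DSM ("column store"): all values of one attribute sit next to each other,
// and the table is one array per attribute, aligned by position.
struct ColumnTable {
  std::vector<int32_t> id;
  std::vector<int64_t> order_total;
  std::vector<int32_t> zip;
};

// Scanning a single attribute (say, summing order_total) touches only the
// bytes it needs in the DSM layout, but drags id and zip along in NSM.
inline int64_t SumTotalsDSM(const ColumnTable &t) {
  int64_t sum = 0;
  for (int64_t v : t.order_total) sum += v;
  return sum;
}
```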
That benefit is less of an impact in an in-memory system, because we're not worried about disk I/O, but in a disk-based system especially, switching to DSM for OLAP queries makes a huge, huge difference. The advantages we do get here are things like better cache locality, which aren't as pronounced as avoiding disk I/O but can still make a big difference in the implementation.

So, although column stores or DSM systems seem like a recent phenomenon in database systems, they're actually really old. The first known DSM database system was called Cantor, which came out of a Swedish defense research group or something like that. It was an internal product in Sweden; it never actually got commercialized and isn't sold today. There are only two papers that talk about what it actually does, and part of the reason people don't know about the system is that when you read the papers, they don't talk about it being a column store or having a vertical architecture. They talk about it in terms of transposed files: basically taking what was a row store and transposing it to make it a column store. They used slightly different language because it was the 70s, but this is the first known DSM implementation.

Then in the 1980s there was a paper by some academics, I think in Texas, where they talked at a theoretical level about the idea of a decomposition storage model, these column store systems, and they proposed or thought about how query optimization could be improved if you assume all your data is stored in columns instead of rows. It wasn't until the 1990s that the first commercial implementation of a column store came around. This was done by Sybase, and it was a system called Sybase IQ, which is still around today. Part of the reason it didn't take off as much as the later column stores is that Sybase was treating it like a query accelerator or a cache, right? It wasn't treated as the authoritative copy of the database. You would still have Sybase ASE, or whatever it was called back then, and you would put this thing in front of it to accelerate certain queries, but it wasn't considered a standalone system.

Then in the 2000s we saw the rise of column stores as we know them today. This is when they became really big, and it became pretty obvious to everyone right away that switching to DSM for OLAP makes a huge difference. So Vertica was the commercialized version of C-Store, which was the column store system from Stonebraker and people at MIT and Brown. VectorWise was an optimized version of MonetDB. MonetDB actually came first, but it used mmap and a materialization-heavy execution model, and VectorWise switched to vectorized execution and a more refined buffer pool manager than what MonetDB had. So these are the first ones that came out and said, yeah, column stores make sense, look at this huge performance boost you can get. Vertica got bought by HP; they actually have a lab here in Pittsburgh. And VectorWise got bought by Actian, which is the new name of Ingres, the original Ingres company. And now if you take Ingres and select that you want a column store, you end up getting VectorWise.
Part of the reason these systems took off at this time is that this is the point when all the major internet companies started having huge, huge data sets. It used to be, in the old days, that only a few of the Fortune 50 or whatever had these really, really large databases and were hitting the scalability bottlenecks that people are facing now. Once the internet came along and everyone had a lot of users and was collecting a lot of data, it became obvious to more people that this was a huge problem, why the NSM approach was not going to work, and that you had to switch to something like DSM. So in the case of Vertica, Stonebraker actually spent time at Walmart and saw them dealing with the problems of their Teradata database, and that's where the idea of C-Store came from, which eventually became Vertica.

So now, in the 2010s, all the major database vendors, the big three of IBM, Microsoft, and Oracle, all have some kind of DSM or columnar extension or product in their database systems. You guys read about SQL Server's columnstore indexes; we'll talk about Oracle's and IBM's fractured mirrors in a second. So basically everyone now says yes, column stores are the way to go for OLAP, and they have some way to support them. And there's also now a bunch of newer systems that are column stores as well, going beyond what was done in the early days. Amazon Redshift was based on ParAccel. Cloudera Impala is a newer system built on top of Hadoop and HDFS. SAP HANA and MemSQL we've talked about before, and these also let you declare tables as being column stores. So the DSM model is becoming more prevalent now in hybrid systems, because you need to support fast transactions and the analytical stuff on the column store data.

So another question you may have is: if I have DSM data, can I still use a clustered index? And the answer is yes, and a lot of systems actually do this. We talked last time about sorting data based on some attribute to maximize the amount of run-length encoding compression you can get; this is the same idea as a clustered index. You're probably not going to have an index-organized table in a column store system, because that doesn't really make sense: how would you store a large column in a B+tree leaf? That's kind of weird. So what you normally have is that you pre-declare, for a particular table, the sort order you want for your columns, and that's how the attributes will be sorted and organized. And again, you don't do this to improve the performance of point queries in transactions; you do it to maximize the amount of run-length encoding, the compression ratio, you can get. We'll talk about compression later in the semester, beyond the dictionary encoding stuff we covered before, but the basic idea is the same. One interesting thing to note: in the case of Vertica, for example, it doesn't have any indexes at all, because it pre-sorts everything. Once all your columns are pre-sorted based on some attributes, you can use binary search to quickly jump to the ranges or attribute values you're actually looking for. So yes, you can have clustered indexes in a column store, and for the most part it's all inherent and implicit in the way the data is organized; it's not a separate index structure that you create manually.
So the next design decision we face in a column store is how we identify tuples across our columns. Remember how I talked about the first implementation of columnstore indexes in SQL Server, where you had no way of mapping an offset in a column back to the original tuple in the row store? In a column store, we need to be able to identify, across columns, that we're dealing with the same logical tuple. And there are two ways to do this. The first is to just use fixed-length attribute values in each column, so that you can easily jump to a particular position in a column and know that you're dealing with the same tuple. So in this case here, if I want to get to this tuple, I know I need to jump, say, three places past the starting address of my column, and I can go across and do the same thing for all the other columns. So if I'm looking at this position here, I can get the corresponding positions in the other columns. The thing we have to be careful about is that when we want to use compression and other encoding schemes, we may have to sacrifice some of the compression we could get in order to keep the columns fixed length, and maybe use some padding or something to make sure we can always jump to exactly the right place in each column.

The other approach is to use embedded IDs. This is probably less common; I actually don't know of any system that does this. The basic idea is that we embed some kind of tuple identifier next to the value for every single attribute of every single tuple in our column. So here at position zero we store ID zero, at position one we store ID one, and the idea is that as we're doing a join or need to stitch a tuple back together, we know how to match up the tuples going across all the columns. This would allow us to do variable-length encoding of the values in a column, but we pay a penalty because now we're storing a 32-bit or 64-bit integer ID for every single value, and that's really wasteful.

Yes? So the question is, if you sort the columns in the offset case, does that mess up the ordering? The way it works is that it's a global ordering across all attributes. You can't say, sort attribute A based on this and sort attribute B based on that. It's like when you write an ORDER BY clause: you specify the attributes separated by commas, and it sorts in that order. So let's say we're going to sort on A, B, C, D. We would first sort on A, and that in turn determines how B gets laid out as well, so the offsets have to match all the way across. And again, this is why they were saying it's an NP-complete problem to come up with the optimal sorting scheme for your columns that maximizes run-length encoding compression, because the best way to get run-length encoding compression on A may be the worst way for B. So a DBA usually has to decide, or you have that tool in the case of SQL Server, the VertiPaq thing or whatever the trademarked name in the paper was, which is always kind of weird. That tool will tell you the right order to sort things in, so you may want to sort on C first, then B, then D, then A, right? And the point is, however I specify I want A sorted, B has to end up laid out in that same global order. Yes? Yeah, again, as far as I know, everyone does the offset thing. But when you read the literature, they talk about how, oh yeah, you can also do embedded IDs; I don't know if anybody actually does that. With offsets you potentially give up some compression, because everything has to be fixed length, but in exchange you don't have to store the IDs, and it's super easy to just go to a position, jump across, and get everything you want.
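Here's a minimal sketch of that offset approach: because every column holds fixed-length values, the same position identifies the same logical tuple in every column, and stitching a tuple back together is just indexing each column at that position. The struct and function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One fixed-length array per attribute; position i in every column
// belongs to the same logical tuple.
struct CustomerColumns {
  std::vector<int32_t> id;
  std::vector<int64_t> balance;
  std::vector<int32_t> zip;
};

struct CustomerTuple {
  int32_t id;
  int64_t balance;
  int32_t zip;
};

// Stitch the logical tuple at a given offset back together; this is what a
// SELECT * against a column store ends up doing for each result row.
inline CustomerTuple Stitch(const CustomerColumns &cols, std::size_t offset) {
  return CustomerTuple{cols.id[offset], cols.balance[offset], cols.zip[offset]};
}
```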
So, the advantages and disadvantages of DSM: again, we're going to reduce the amount of work we have to do in our system. We're not really reducing I/O, because we're not reading from disk, but we're reducing the amount of data we have to pass around from one operator to the next. Yes? The question is about replication; we're not talking about that here. You could have replicated columns. We'll take that offline, but yes. Okay. So we reduce the amount of wasted work when we process queries, because we can avoid walking through and accessing data that isn't even needed for the query. And as we saw last class, if you store things in a columnar format, as in DSM, you're going to get better compression, because all the values for a single attribute are stored contiguously and chances are they're going to be very similar. So in the case of storing the two-letter state code of someone's address, all those values are going to be pretty similar going down the line, so you'll get great compression. The disadvantage is that it's going to be slow for anything you'd want to do in an OLTP transaction, because it requires you to split tuples up when you store them and then stitch them back together when you need to produce an answer for the application. If you call SELECT * on a column store, you have to go to all those memory locations, jump to the offset you want for that tuple, copy it all back together into a single byte array, and then hand that back up to the application.

Okay. So I want to make an observation now about how databases actually get used. We've been talking as if you have some data that you want to store in DSM and run analytical queries on, and some other data you want to store in NSM and run transactions on. But what actually happens is that you want to split up data within a single table, possibly in different ways, because not all the data in a single table gets accessed in the same manner. The example I like to think of is eBay. When you first create a new auction on eBay, a new listing, it gets inserted into the database, and then it's mostly used in transactions: it's mostly accessed as a single entity for that listing. You post the auction, maybe you modify the listing, people make bids on it, and for the seven days while the auction is going on, all these transactions are accessing that entire listing and making modifications to it. Then the auction ends, and yeah, maybe you update some payment information, some shipping information, but it's not getting all the bidding action it had before when it was live. And over time it gets even colder, because now you're not even updating it anymore. The auction's over, you've shipped the item out, they've paid for it, everything's fine, and you're not going to go back and look at old listings.
So then what happens is that in the back end, eBay is going to run queries that say, if I sold dirty socks, what are all the dirty socks that were sold on eBay in this period, or in this date range, or in this location? So as the data gets colder, they're running analytical queries where it's not accessing my one listing for dirty socks, it's accessing all the listings for dirty socks. My listing ends up being used in an aggregation rather than in a point query. But it's all still in the same logical table, right? It's still one giant table of listings. My auction was added in the beginning and accessed with transactions, but over time it's only being used for the OLAP stuff in the back end.

So the way people normally set this up is with what I call the bifurcated environment. On your front end you have your OLTP data silos, where all your new data comes in and all your transactions are updating state, and then you aggregate all that information into a back-end data warehouse where you run your OLAP stuff. This is pretty much the standard way people set up their database architectures now. What happens is you stream updates out of the data silos, the front-end systems, and run them through a component called an ETL tool: extract, transform, and load. The idea is that you extract data from the silos, transform it into some universal schema, and then load it into your data warehouse. So think of eBay: if you post a listing, after 90 days they remove the listing from the website, and that's because they took it out of the data silo and put it in their back-end data warehouse.

In a lot of applications this pipeline can be very time consuming. Some companies do this at the end of every day: they dump everything that occurred during the business day, run it through the ETL program, and load it into the warehouse at night. Other people do it maybe every hour, sometimes people try to do it every five minutes, but it's not immediate, right? If I insert something into the front-end OLTP silo, I can't immediately run an analytical query on it back here, because the warehouse hasn't seen it yet. And what people want to do is take all the information they have in these data silos, do some analysis on it, and then update the front-end applications with new information so they can make better decisions about the things that are going to happen next. This is really common in online gaming applications. So at Zynga, each of these data silos represents a single game: people are clicking on crap and buying things in FarmVille, and all those are new updates that get inserted here, and then they stream it out to a back-end data warehouse where they want to run some kind of analytical program, a machine learning model, on it, and come up with different ways to make you buy stuff in the game. And then they send the updates back to the front end, to the application, so that when you start playing, it has newer information and makes better decisions on how to get you to buy stuff.
The example I always hear is something like, say you're playing a puzzle game like Candy Crush. If you're playing the game and you get a puzzle that's really hard, you go, oh, screw this, I'm not playing this anymore, and you put it down. Then three days later you come back and start playing again, and they want to make sure they give you an easy puzzle and not a hard one, so that you get hooked again and keep on playing, because if you get a hard one you go, oh, I remember why I stopped playing this in the first place. So they do tricks like that, and that means you want to be able to do all this analysis immediately as the data comes in, because it lets you make better decisions. But again, the problem is that just as there's a wait time to get data from the front end to the back end, there's a wait time to get data from the back end to the front end. You're marshaling data back and forth.

So this is what the hybrid storage model is designed to solve. We want a single logical database instance that allows us to store both the hot data and the cold historical data: the hot data as it comes in, plus some large portion of our history, and do our analytics on it without having to move data back and forth. We'll store the new data in NSM, because we said that's fast for transactions, and we'll store the older data in DSM, because that's more efficient for analytics. And the key thing here is that I'm saying it's a single logical database. It could be stored across multiple machines; we don't care. It just means the application only sees one database, one table that has all its information, but physically it can be stored in any way, or in any location, that the system wants.

To implement the hybrid storage model there are basically two approaches. The first is to use different execution engines that are designed to operate efficiently on either NSM data or DSM data. The other approach is to use a single flexible architecture that can handle both types of data efficiently without having separate code bases for NSM data and DSM data. I'll go through each of these one by one.

With separate execution engines, the way to think about it is that you have almost two separate database management systems inside your system, each optimized to operate on either NSM or DSM data. Think of it like this: I take MySQL, that's an NSM database system, and I take Vertica, that's a DSM system, and I write a middleware wrapper around them that combines the two systems and presents a single logical database to the application. Internally, when a query comes in, the wrapper has to make a decision: should I push it to MySQL or should I push it to Vertica? That's essentially what these systems are doing. And the trick is that you have to be able to combine query results from the two different engines to make it look like a single database. And if there's ever a transaction that makes modifications that span the two separate engines, you have to use an atomic commit protocol, like two-phase commit, to make sure it's done in a consistent manner, because you don't want torn updates across your two engines. Within this architectural approach, there are actually two ways to implement it: fractured mirrors and delta stores.
So we'll go through each of these. The basic idea of a fractured mirror is that you're going to have a complete copy of your database stored in a DSM format. We have our regular row store system, and then we have a separate copy of the database, running in the same process, the same database instance, that's stored as DSM. The NSM copy is the primary location and the DSM copy is called the mirror. Any modification that comes in always goes to the NSM side first, and then there's some kind of background process that moves it over to the DSM side. So now if we have an OLAP query, we can have it operate directly on the DSM data. It's called a fractured mirror because it's a mirror of the database that's split up in a different way, a different storage layout. The idea was first proposed at the University of Wisconsin around 2002, but it's since been implemented in both IBM's DB2 BLU and Oracle's in-memory column store extension. In the case of Oracle, this mirror is completely in memory and it's not durable, because it's always updating itself based on what's coming out of the NSM side. If the database goes down and crashes, this thing gets blown away, but then you turn the system back on and it just rebuilds it over time. And as the application developer, you don't know that this mirror exists inside of it. You just write queries against the regular database system like before, and there's an optimizer that makes a decision about which of the two copies it should execute on. The reason somebody like Oracle would do this is that they have such a large customer base relying on the NSM side of things that they wouldn't want to go through and make major changes to all of that. This lets them get the advantages of a column store, the same way SQL Server did with their columnstore indexes, without making major changes to the rest of the system. If anybody doesn't want this feature, everything still works the same, and you still get to reuse the whole database ecosystem you were relying on as an Oracle customer.

The other approach with separate execution engines is to use what's called a delta store. The basic idea is that we're not going to keep two complete copies of the database stored in different ways. We'll have a smaller NSM portion where we store the deltas, the things that get changed by transactions. Any time you have an update, you apply the change to the delta store, and then there's a background process that moves those changes and applies them to the main database, which is stored as a column store in DSM. And again, same thing: a query accesses the database, it doesn't know the data is split across two different storage models, and the system knows how to combine the results from both of them. Again, in this case you need an execution engine that knows how to operate on NSM and one that operates on DSM. SAP HANA is probably the most well-known system that does this. And the reason they went this route is not so much a scientific decision as a software engineering one: SAP bought a bunch of database companies and sort of mashed the products together, and that ended up being SAP HANA.
So they bought P*TIME, they bought TREX, and they bought MaxDB. The front-end part is based on P*TIME, and the back end is based on TREX. So now they have two separate mini database systems running inside a larger database system, and therefore any time you make a change that can span both sides, you have to use two-phase commit to make sure things work out nicely and smoothly, so you can maintain your ACID guarantees across both of the data stores.

The other way to implement a hybrid storage model is what we'll call a single flexible architecture. This is a single database system with an execution engine that's written in such a way that it can handle both NSM data and DSM data. You don't have two separate runtimes, and you don't have to run two-phase commit across two engines, because the one engine knows how to operate efficiently on both. I'll note that a database system can still use the delta store approach I just talked about in a single-engine architecture: you can treat the NSM side as a delta store rather than its own sort of table space, and this is what HyPer does. But again, the main idea is that you don't have separate pieces of code executing queries and stitching the results together.

So this brings us to the paper you guys read, the H2O storage manager. Understand that it's not a full-fledged database system; it's more like a storage manager that can do the bare-minimum operations you need in a database system, like transactions, updates, and queries, but you don't have support for SQL and the networking and all the stuff on top. And it's actually not clear from the paper whether they support durable writes to disk or logging. In H2O, the basic idea is that we're going to monitor the access patterns of queries as they come in and see which attributes in our tables they use together, and then dynamically modify the layout of the database so that attributes that are used together are stored together. The idea is that you can reduce the number of reads and the amount of stitching you have to do if you combine columns in a smart manner. The basic way they do this is that any time a new query comes in that wants a different layout, they make another copy of the table, stored decomposed in a different way, right? So it's a bit like the fractured mirror approach in that you always keep the original copy of the data, but then you can make multiple copies of it that are each optimized for the different layouts your queries need.

So let's see what that looks like. Say we have three queries here that want to access attributes in different ways, and this is our original data. We'd see that this first query is accessing B and C together, and the second query is also accessing B and C together. They run this background algorithm that monitors these access patterns, and I think we have A here being used by itself, so we'd come up with a decomposed storage layout where B and C are stored together, because they're often used together in queries. And we still have to maintain the original data.
And as they show in the paper, as your analytical operations get more complex, this kind of hybrid layout does better for queries than either of the two extremes of everything-as-NSM or everything-as-DSM. The problem with the H2O approach is that, if you read the paper carefully, they don't actually support transactions. They don't support any updates; all the queries in their workload are read-only. They're also doing this copying of the data, which I think is kind of redundant: every single time you have a query that wants to use a different layout, you're making a new copy of the data, and I think that's bad. The other thing is that they perform the layout changes as they execute the query. As they execute the query, they use query compilation to come up with a very efficient implementation of the query execution plan, and they use that as the means to copy things as they go along. The idea is that you pay a penalty the first time you execute a query, to reorganize the table, but then when a query comes along that accesses the same attributes in the same manner, the second time around you have an optimized layout for it and you get better performance that way.

And the key thing to point out about the H2O paper is that it assumes all the tuples in a table are going to be accessed in the same manner. It has no way to categorize things as hot and cold. So in Peloton, our system, what we're able to do is identify not just which attributes are used together in certain kinds of queries, but also which data itself is used together in those queries. We can examine the actual WHERE clause, the access patterns of the queries themselves, and recognize that this query here is updating hot data in a transaction, while this other type of query is accessing cold data. So rather than splitting a single table up the same way across all its tuples, we can split it up so that some of the data is stored as NSM and some of it is stored as DSM. In this case here, the hot data is in NSM and the cold data is in DSM. And this is what that logical tile stuff you guys were using in the first project is all about: abstracting away from your operator what layout is actually being used for each individual tuple. Because otherwise you'd have to write specialized code in the database system that says, alright, I'm NSM so I do my join this way, I'm DSM so I do my join that way. The logical tiles hide all this physical layout from the operator implementations. So again, when you guys wrote your hash join, you didn't care which one it was underneath. You don't know, and it doesn't matter, right? And this is why I think the H2O stuff is a good first start, but what we're working on now in our own system, Peloton, is making this work efficiently. We think query compilation is something we have to use to make this work much better than what we can do now, but that's something we'll talk about later in the semester.

Okay, so to finish up real quickly: the key thing that's missing from all of this is not just how to identify which attributes are used together in queries, but how to identify the individual groups of tuples that are used in those queries. And there are three basic approaches to how you categorize data.
We'll talk more about this when we talk about larger-than-memory databases later on in the semester, because doing this is a big part of how those things work as well, right? You identify, here's the cold data, and I'll shove it out to disk because I probably won't need it later on in transactions. And the same concept is something we have to worry about in our tile architecture.

So the first approach is you have a DBA specify manually that some table is gonna contain cold data and therefore it should be stored as DSM versus NSM. So this is what MemSQL currently does in their column store implementation, and this is what the SQL Server columnar indexes were doing: you specify that this table is read only and therefore it can be stored as DSM. This sounds terrible, but it actually is used in practice quite often. The second approach is you have sort of an offline component that can monitor the access logs of your database system to see how individual tuples are used, and then categorize them based on that. So this is what Hekaton does in their Siberia implementation. And then where we think we really wanna go in our system, though we're not quite there yet, is an online approach that is able to monitor tuples as they're actually being accessed, and then have some component integrated in the database system that can make decisions and categorize blocks of data based on this access pattern, without having to look at logs offline.

Okay, so what did we talk about? We talked about the different layouts, how you organize data in an in-memory database. We talked about DSM and NSM, and then we spent some time at the end talking about how you actually support a hybrid storage model, and why I think the H2O approach is not completely there, because it doesn't support updates. So it is my opinion that having a single flexible storage architecture, or execution architecture, for a database system is the way all the new systems are going to be built going forward. This is the major trend I think that's coming now. Before, we had these specialized systems, like Vertica versus VoltDB, one supporting OLAP and the other OLTP. Now we're seeing these hybrid workloads, or hybrid database systems, that are able to support transactions and analytical operations all in a single database. And the current systems that are out there, for the most part, are using the separate execution engines, because they're mashing together legacy components into a single system and have to make it operate correctly. But I think newer systems like HyPer, like ours, and like MemSQL will be able to support both these types of layouts in a single execution engine. And I think once we achieve this, a relational database system is going to be able to support all known workloads that are out there today, with the exception of machine learning with matrices. So that means if you have a graph database, a key-value store, a document database, all these things can be represented efficiently in a relational database. And you can do, for the most part, all the operations you'd wanna do in those systems or in those applications directly in a hybrid database system that supports a flexible storage model. Any questions?

Okay, so Joy is now gonna spend some time talking about how to do profiling for project number two. Here's the mic. Okay, thanks. So, what's profiling?
I'll first just give a brief explanation of what I mean by profiling. So normally, if you do program analysis, you can do it in two ways. You can do it without running the program, and people normally call that static program analysis. The other way is to actually run the program and do your analysis; that's called dynamic program analysis. Profiling is another name for dynamic program analysis.

So I'll give some motivation for how one would go about doing this before I give more details on how you can use it on Peloton in particular. So consider a program with just two functions, foo and bar. How can we actually speed up the program with just a debugger, like, let's say you have GDB? One way to do it is to run the program under your debugger and randomly pause it while it executes. So it runs for some time, you pause it, and you basically collect the function call stack. Then you resume it, and after it runs for some more time, you again pause it and collect the call stack. So consider this scenario: you have collected 10 call stack samples, and let's say six of them contain the function foo. So what percentage of time do you think is spent in the function foo? You have 10 call stack samples and six of them are in foo. Yeah, so that's the basic idea: 60% of the time is spent in function foo. Now it might actually be the case that the program spends most of its time in bar, and just at the moment when you collect the call stack sample, it happens to be in the function foo. So it may not actually be correct. The accuracy of this entire process increases with the number of samples: the more samples you collect, the more accurate it is. So this method is basically poor man's profiling. You can also call it the random pause method. And in fact, this is the technique that's used in GNU's profiler, gprof.

Okay, so let's say we optimized foo to run two times faster. What is the expected overall speedup? So the 60% of the time spent in foo basically drops in half. Let's assume that foo and bar are not correlated, so the 40% of time spent in bar is unaffected. So the answer, you can get it by using Amdahl's law. The overall speedup is 1 / ((1 - P) + P/S), where P is the fraction of time spent in the optimized task and S is the speedup for the optimized task. So the P portion drops by a factor of S, and the remaining (1 - P) fraction is unaffected. Here, the 60% drops in half and the 40% is unaffected, so the program speeds up by about 1.4 times; the arithmetic is spelled out below. So you can use Amdahl's law to get an estimate of the performance improvement you would get by optimizing a specific function, once you know the amount of time the program spends within that function.

Okay, so that's some general context. Unfortunately, the random pause method does not scale well; you can't use it to profile Mozilla or Peloton. So these are some common profiling tools that you can use for identifying bottlenecks in Peloton. The first approach is basically to use Valgrind. Valgrind is a heavyweight instrumentation framework, and it has a lot of tools. You've probably used it for figuring out memory bugs. It also has nice visualization tools to understand the collected information. The other approach is to use perf. Perf is a lightweight tool; it's basically a Linux-specific subsystem.
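Writing out the Amdahl's law arithmetic from the foo/bar example above (P = 0.6, S = 2):

```latex
\text{speedup}
  = \frac{1}{(1 - P) + \frac{P}{S}}
  = \frac{1}{(1 - 0.6) + \frac{0.6}{2}}
  = \frac{1}{0.4 + 0.3}
  = \frac{1}{0.7}
  \approx 1.43
```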
You can use it to collect different kinds of information about the program. And it only has rudimentary, console-oriented visualization tools, at least for now.

So how can you use Valgrind? Valgrind is a dynamic binary analysis tool. It's an extensive framework: you can use its Memcheck tool to figure out memory errors, and you can use its Callgrind tool for generating call graphs. In our case, we'll be using the Callgrind tool. You can use Callgrind to profile a specific program in this way: you just mention the tool, say that it should trace all the children, and specify the program, so something like valgrind --tool=callgrind --trace-children=yes ./your-program. You can also do it for the entire Peloton system using this command. The basic idea of how Valgrind works is that it emulates a virtual processor. You give it a program, just a binary; it converts the program's binary into an intermediate representation, and within that intermediate representation it instruments the code, and then it runs it on this virtual processor. And then it collects information and gives it back to us. So it's pretty heavyweight, because it actually has to do just-in-time compilation, dynamic recompilation, and stuff like that. But it is actually very useful.

So you can use a tool called KCachegrind to visualize this information. You will get something like this if you run Valgrind and then use KCachegrind to view it. On the left-hand side, you have this cumulative time distribution. So it basically says 90% of the time was spent in DL start; that's the distribution you get. So within a specific function, like this one in the BwTree index, 4.95% of the time is spent within that function. So that's the time distribution. And on the right side, there's a nice function call graph. You can actually see that, say, 70% of the time was spent within that function, and within that function, like 30% was spent in this other function, and so on. So you can use the call graph to identify how the time was spent in different functions. One gotcha is that it counts CPU instructions in the emulated virtual processor, so it doesn't really take IO into consideration. So if your program is actually IO-intensive or something like that, then Valgrind might not give you the entire picture. So yeah, that's Valgrind.

The other choice is to use perf. Perf takes a different approach. While Valgrind was using heavyweight instrumentation to collect this information, perf is a lightweight framework. It doesn't do any instrumentation, it just does sampling. It uses the hardware performance counters that are available in modern processors to collect this information. So let's say you have this binary. Again, it's a dynamic binary analysis tool. You just do perf record, and then you basically sample some event and collect samples with some frequency, so something like perf record -e cycles:u -F 99 ./your-program. That's the information that's given there: you figure out which event to sample and at which frequency you want to sample things. And it uses the hardware performance counters for tracking these events. So every time a performance counter overflows, the kernel collects a sample, at that frequency. The sample contains information about the actual state of the program execution, and the events can be anything. So one of them might be cycles, just CPU cycles, specifically at the user level, not at the kernel level.
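If you're curious what perf is doing under the hood with those counters, here's a small self-contained sketch that uses the Linux perf_event_open syscall to count user-level CPU cycles around a piece of code. This follows the usual pattern from the perf_event_open man page; it is not anything Peloton-specific, and the measured loop is just a placeholder.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Thin wrapper: glibc does not provide a perf_event_open() function.
static long perf_event_open(perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
  return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main() {
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_CPU_CYCLES;  // count CPU cycles
  attr.disabled = 1;                       // start disabled, enable explicitly
  attr.exclude_kernel = 1;                 // user-level only, like "cycles:u"

  // pid = 0 (this process), cpu = -1 (any CPU), no group, no flags.
  int fd = static_cast<int>(perf_event_open(&attr, 0, -1, -1, 0));
  if (fd < 0) {
    std::perror("perf_event_open");
    return 1;
  }

  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

  // The code you want to measure goes here; this loop is just a stand-in.
  volatile long x = 0;
  for (long i = 0; i < 1000000; i++) x += i;

  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  uint64_t count = 0;
  if (read(fd, &count, sizeof(count)) != static_cast<ssize_t>(sizeof(count))) {
    std::perror("read");
  }
  std::printf("user-level cycles: %llu\n",
              static_cast<unsigned long long>(count));
  close(fd);
  return 0;
}
```

This is just to show what a hardware counter event looks like at the syscall level; for the project you would normally just use the perf command line, which is what the next part is about.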
So if you use perf record, you can actually visualize the result using something called perf report. It's a console-oriented tool, so you basically get something like this. So here, this is the program name, those are the functions, and this is also a time distribution. The difference is that it's using sampling instead of instrumentation, so it's the real thing: the actual time spent during the entire program execution is what's given here, and it takes IO into consideration as well. Valgrind, on the other hand, is a virtual processor, so it only takes CPU instructions into consideration.

You can use perf in other ways as well. You can use perf to figure out where in the program you are incurring a lot of cache misses or branch misses. To see the list of events that are supported on your processor, you can use the command perf list; you'll get a list of hardware events, software events, et cetera. And this is another usage example: you can monitor both cycles and last-level cache load misses at some frequency, so something like perf record -e cycles,LLC-load-misses -F 99 ./your-program, and you'll get both pieces of information when you visualize things.

So those are the two approaches: you can use perf and Valgrind. There are some more references here. The quick start guide will give you more information on how Valgrind actually works and how you can use it, and it has nice tips about common gotchas. Since we are using dynamic analysis, not static analysis, the information we collect depends only on the code we actually execute, so it also depends on the input. You might want to run it on different representative inputs to get a more holistic view of the program. And here are some more examples; this one in particular has a lot of one-liners for how you can actually use perf. This was posted by Brendan Gregg, who works at Netflix. At Netflix they use perf extensively; they collect a lot of flame graphs and use them for optimizing the entire system. So, yep.