This will be the last lecture we're going to do on multi-version concurrency control, and today we're going to focus on garbage collection, because that's super important for an MVCC database system. We've sort of already covered this in the last two lectures, but let's go into a little more detail about what's actually going on.

The reason we obviously need garbage collection in an MVCC system is that we have to identify physical versions that are reclaimable and then remove them, because otherwise we'll just run out of space. I think I said in the beginning that Postgres, when it was first created in the 1980s, didn't do any garbage collection at all, because they wanted to support time travel queries: you could run queries on the database as it existed back in time. Of course, in the 1990s, when people really started using Postgres outside academia, the first thing they did was add garbage collection back, because you run out of space pretty quickly if you have a lot of churn in your database. So it's sort of obvious why we need to do this.

Our definition of reclaimable is going to be a physical version that no active transaction in the system can see, meaning it's not visible to any transaction under snapshot isolation, or a version that was created by an aborted transaction; we don't want that sitting around forever, so we want to go ahead and clean it up. The great thing about multi-version concurrency control is that because we're already recording timestamps to provide snapshot isolation, we can use those same timestamps to determine when tuples are visible or not. The idea is that the timestamps we use to give transactions their global ordering are the same timestamps we use to mark the lifetime of physical versions, and we just say: if no one can see this, then we want to go ahead and remove it.

One thing we do need to talk about, and this was in the paper you read from the HyPer team, is the complications that arise once you start having transactions or queries that run for a long time. Again, in an OLTP environment transactions are almost always short-lived: update Andy's account on Amazon, commit the transaction, and you're done. In that world, the transactions that might need to read old versions aren't sitting around for a long time; there's no long-running transaction that needs to see the database as it existed an hour ago. In a pure OLTP environment we don't have that issue, but once we start throwing in analytical workloads and analytical queries, we have to care about this. Under snapshot isolation, I need to see only the versions of tuples that were created by transactions that committed before I started. So if my query is going to take an hour, then I need to see the snapshot of the database as it existed for that entire hour. Now, if I run under a lower isolation level like read uncommitted, then I can read whatever I want, who cares, but if you want to provide snapshot isolation then you need to do this.
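To make the visibility rule concrete, here's a minimal sketch (my own illustration, not code from any of the systems discussed) of the two checks an MVCC engine performs, assuming each physical version carries a begin and end timestamp as described in the previous lectures:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical per-version header: [begin_ts, end_ts) is the range of
// transaction timestamps that are allowed to see this physical version.
struct VersionHeader {
  uint64_t begin_ts;                                       // commit ts of the creator
  uint64_t end_ts = std::numeric_limits<uint64_t>::max();  // commit ts of the superseding txn
};

// Snapshot-isolation visibility: a transaction reading at `read_ts` sees the
// version whose lifetime interval contains its timestamp.
bool IsVisible(const VersionHeader& v, uint64_t read_ts) {
  return v.begin_ts <= read_ts && read_ts < v.end_ts;
}

// "Traditional" reclaimability: a superseded version is garbage once the
// oldest active transaction started at or after the version's end timestamp.
bool IsReclaimable(const VersionHeader& v, uint64_t min_active_ts) {
  return v.end_ts <= min_active_ts;
}
```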
So the issue is going to be what we talked about in the first lecture with what I call traditional garbage collection: I look at the timestamps of all my active transactions, and if a version fell out of visibility before the smallest active transaction timestamp, then I know I can go ahead and prune it. But now, if I have these queries that sit around for an hour, that minimum is going to lag behind for a much longer amount of time.

Yes? I don't know if I covered HTAP in the intro class. So there's OLTP and OLAP, and in the 2000s people basically figured out that you want specialized systems for each of these: run your transactional workload on this database and your analytical workload on that database, and that way they can make different design choices for different goals and you can build a system that's optimized for each of them. This HTAP stuff, hybrid transactional/analytical processing, is a newer concept, and the idea is that I want to be able to run analytical queries as soon as data arrives. So instead of using some ETL process to offload data from the OLTP side into another database system, I run the analytics immediately as the data shows up. That's a more common thing now, because the longer it takes: like, if you're playing a game and I want to trick you into buying crap, if I have to run the analytics on some back-end machine and it takes a long time for the data to get transferred over, then I may lose out on the sale.

Yes? The question is why we care about running these analytical queries under snapshot isolation at all. That's a perfectly good point: for these analytical queries, do we really need snapshot isolation, or can we run at a lower isolation level and is that good enough? In many cases, yes. This is one of those places where the perception in academia is quite different from what actually happens in the real world. In the academic world we say, of course you run under serializable or snapshot isolation, but in the real world most people run at read committed, because that's the default you get in PostgreSQL and MySQL. So yes, there are cases where you do want snapshot isolation, and you almost never want serializable for this analytical stuff; most of the time you don't need it. But it does occur often enough that we have to solve the problem. We did a survey of DBAs two or three years ago, and basically 50 to 60 percent of the papers in SIGMOD or VLDB assume transactions run at serializable isolation, but if you ask real DBAs, something like 10 percent of transactions run at serializable. Everybody runs at read committed because that's the default you get in real database systems, and people don't bother to change it unless they know what they're doing. So I'd say snapshot isolation is the bare minimum you'd actually want for analytics; you almost never want serializable analytics, it doesn't make sense, snapshot isolation is good enough. Actually, under snapshot isolation I see everything as committed, and I can't have any anomalies anyway because I'm not doing writes.
So yes, snapshot isolation is the highest you'd ever want to go. All right, I sort of already covered this, but what are the problems with having these old versions around, assuming we want to achieve snapshot isolation for our analytical queries?

Obviously we're increasing our memory usage, because we're creating new versions and our version chains are getting longer, but we can't reclaim that memory, so the amount of storage space the database system is consuming just keeps growing. And if we can't reuse the memory from older versions because we can't recycle the space, then we have to go back to our allocator, and potentially back to the operating system through malloc, and ask for more memory, and that call is not free. Going to malloc and asking for more memory will definitely become a bottleneck if you have a lot of threads doing it at the same time.

Our version chains are going to get longer, and that means transactions that have to traverse the version chain to find the right version are going to take even longer. Now, if you're doing newest-to-oldest and most of your OLTP transactions are just touching the newest version, this is not that big a deal. For the analytical queries, yes, they may have to traverse the whole chain to find the right version. The only system I know of that does oldest-to-newest is Hekaton, so they would have this problem: with oldest-to-newest it's a long traversal just to put the new version at the end. That's only a problem in Hekaton, but for analytical queries it still takes longer to find the version they want.

Another big issue, which we haven't really talked about much, is the notion of consistent, stable performance in your database system. If you have these long-running queries that take an hour, and then the query finishes and now you need to go back and clean up all the old versions that are finally reclaimable, you're going to have a huge spike in CPU usage from your garbage collection threads, because they're going to say, look at all these versions I can clean up, let me rip through them and start throwing them away. Queries running at the same time might see a dip in performance, because now there's contention on the CPU from the garbage collection. In the real world, a lot of companies and organizations are very conservative about taking on new database technology because they want consistent performance. It doesn't help to say, here's the latest, greatest version of a new database system and it'll make 95% of your queries go much faster, but the other 5% are going to be randomly slow. People don't want that; they'll stick with the old stuff they actually know.
The last one is going to be an issue when we talk more about locality and compression. If I have all these old versions scattered around my table space, and I'm doing this garbage collection all at once, then I end up with a bunch of holes in my tables that I can refill with other objects. But when I want to do compression, I want to take a bunch of old data and compress it down because it's effectively read-only; if the versions are scattered around because I keep reusing the same space over and over again, then I lose that locality and I have to do a bunch of extra work to combine objects that are related to each other in time so that they can be compressed. This won't make sense right now; I'll cover it at the end of the lecture.

All right, so for today I want to spend a little bit of time at the beginning talking about deletes, which is something we didn't cover in the last two lectures but should have; then we'll focus on the different design decisions you have for garbage collection, which is what the HyPer paper you read covers; then we'll talk about block compaction, which is the thing I was just describing about combining unused space across the data tables in order to compact them and free up memory; and then, although I'm missing the bullet point here, I'll finish off with a tutorial on perf, which is what you need for project one.

All right, so we've talked about doing inserts and we've talked about doing updates, but we didn't really talk about how to do deletes. Inserts are easy: it's the first physical version of a tuple, I find a free slot in my data table and insert it, no big deal. And updates we already know how to handle, depending on what versioning scheme we're using, whether we use delta records or append-only, and whichever direction the version chain goes.
Deletes are a little tricky, because you need to record that this logical tuple has been deleted, and even though someone may come along and insert the same tuple all over again, that's technically a new tuple in another snapshot, and you don't want to reuse the version chain. So you need a way to say: this thing is now deleted, and no other version should come after it. Again, we can't allow any write-write conflicts; it's first-writer-wins, so if my transaction deletes this tuple and, before I commit, you try to update it, I beat you. We need the same correctness semantics as before. The question is how we're going to record that the tuple was logically deleted at some point in time, because we can't just delete the version chain; then the evidence of its existence is gone.

There are two basic approaches. The first is to maintain a separate flag somewhere that says this logical tuple has been deleted. You can either store it in the tuple header we talked about before, alongside the timestamps, or you can have a separate column that's just a bitmap saying the tuple within this block at this offset has been deleted. That means when a transaction starts scanning and reading the database, it always checks this thing first to see whether it's going down a path that's been deleted. In our system we store it as a separate column, a separate bitmap field.

The other approach is a tombstone tuple. The idea is that we store a special physical version at the end of the version chain (or at the beginning, depending on which direction we're going), and somehow indicate in that special version that it represents the tuple being deleted. You still store all the timestamps and everything you would before, and that gives you the information about when the tuple was actually deleted, so anybody reading an earlier snapshot still sees the older versions. One way to improve this, instead of polluting and wasting space in your fixed-length data pool, is to keep a separate data pool just for these tombstone tuples, because you don't actually need to store the whole tuple: the version chain just points to this thing, and you look at it, see some bit pattern inside, and say, oh, this is actually a tombstone, not a regular tuple. This is really only an issue if you're doing append-only. Think about it: if I'm doing append-only and I have a table with a thousand columns, and I delete a tuple and want to create a tombstone, one approach is to make a special tuple, in the same table as all my other tuples, that has a thousand attributes; that's wasted space just to record that the thing was deleted. Or I can have a separate tombstone space that says this represents a deleted tuple, and that space can be shared across different tables, because we're not storing any attributes in the tombstone; it's just a marker saying this thing was deleted.
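A minimal sketch of the first approach, the separate deleted-bitmap column (all names here are made up for illustration, not taken from any particular system):

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kTuplesPerBlock = 4096;

// Hypothetical per-block metadata: one bit per slot marking logical deletion,
// kept alongside the regular fixed-length columns. In a full system the delete
// itself is still timestamped so older snapshots continue to see the tuple;
// this sketch only shows the bookkeeping.
struct BlockDeleteBitmap {
  std::bitset<kTuplesPerBlock> deleted;

  void MarkDeleted(std::size_t slot)        { deleted.set(slot); }
  void UnmarkDeleted(std::size_t slot)      { deleted.reset(slot); }  // e.g. the deleting txn aborted
  bool IsDeleted(std::size_t slot) const    { return deleted.test(slot); }
};

// During a scan, a transaction checks the bitmap before it bothers to walk
// the version chain for that slot.
bool SlotQualifies(const BlockDeleteBitmap& bm, std::size_t slot) {
  return !bm.IsDeleted(slot);
}
```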
So in Peloton, our old system that we killed (I wouldn't say beloved, but whatever), we did it the tombstone way: we used the special tombstone pool, because we were doing append-only like Hekaton, and if we had created tombstone tuples in the same data table it would have been a big waste of space. In the newer system we use a separate column as a deleted flag.

Yes? The question is why there need to be different tombstone tuples for different deleted tuples: if I delete three tuples from the same table, why do I need a separate tombstone for each of them? Because in the tombstone tuple you're recording the begin and end timestamps; I need to know when each one was deleted. And I think your other question was why we can have a single tombstone table shared across multiple data tables: because we're not storing any attributes in the tombstone, it just says, you're dead as of this time, so it doesn't matter which table it corresponds to.

Yes? I think part of the issue is going to be first-writer-wins, so who cares, you mark it as deleted. There might be some ordering issue, like, can I record that it's been deleted and then abort? If I do that, then if the transaction that deleted the tuple aborts, I have to make sure I go back and remove that marker. If I don't modify the previous tuple other than maybe the pointer, then when I abort I just don't update the pointer. Yeah, you might be able to get away with something like that; I hadn't thought about it. The student who built this piece of the system didn't exactly like to cut corners, but he basically did everything in the most efficient way possible, which is not always the best engineering approach, so if there was a way to make the hack you're proposing work, I suspect he would have done it. We can take it offline, maybe do it on the board, and figure out why it wouldn't have worked.

Okay, so again, for this one I don't think anybody does the tombstone pool other than us, because if you're doing a delta store with newest-to-oldest, it doesn't buy you anything: you just store in the header for that tuple that the thing has been deleted. Yes? The question is, what about just updating the end timestamp, would that be enough to denote that it was deleted? You still have to have a flag somewhere that says this thing was deleted. If the end timestamp is updated, no new transaction can access that version, but you need something to represent that it was specifically a delete; maybe you take the first bit of the timestamp and have that represent deletion, you could do that. If I just see an end timestamp in there, I don't know whether that's a delete or whether there's actually a newer version after it.

All right, so let's now talk about the different design decisions. First I want to talk about how we actually clean up keys from indexes, which wasn't in the paper. There are only really two papers on garbage collection for MVCC systems: I had you read the one from the HyPer group, which just came out a few months ago, and there's another paper from 2016 that I had the students read last year from the SAP HANA team. They're both okay papers.
What I don't like is that they define the same concepts using different terms: the HyPer folks call things precision and frequency, while the HANA folks call them identification and so on. The concepts at a high level are the same; it's just that the nomenclature they use to describe them is slightly different, so I may mix up bits from the HANA paper and the HyPer paper as we go along, but hopefully it will all make sense. We'll talk about how we track versions, the frequency with which we invoke the garbage collector, the granularity at which we look at potential versions to remove, and how we compare timestamps to decide whether it's okay to prune them.

All right, so for indexes: as my transaction runs and creates new tuples and new versions, I have to record them in the index, because if I try to go back and read the same thing I just wrote, I want to be able to go through the index and see my own writes. If you're doing OCC with a private workspace you don't do this, because you stage all the writes until the end; but what we've been talking about is the timestamp-ordering approach, where you apply writes in the global version storage as you go, because that allows speculative reads by other transactions that are ahead of you in logical time. Now the problem is that if I need to abort, or if I need to clean up old versions, I have to make sure I also remove any index keys that correspond to the versions I'm removing. The way to do this is that while a transaction is running and updating indexes, we record which keys it inserted into or invalidated in each index, and then when the transaction commits or aborts, the garbage collector kicks in and cleans up the index entries that nobody should see anymore. The way HyPer got around this is that any time you modify an indexed attribute, you treat it as a delete followed by an insert within that transaction: you don't have to worry about going and finding the key and updating its pointer; you just remove the old key from the index and insert a new one.
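Here's a minimal sketch of that bookkeeping, assuming a hypothetical in-memory index interface (none of these names come from HyPer or Peloton); the point is only that every index entry a transaction adds or invalidates gets remembered so that commit or abort cleanup can undo it:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct TupleSlot { uint32_t block; uint32_t offset; };

// Hypothetical index interface.
struct Index {
  virtual void Insert(const std::string& key, TupleSlot slot) = 0;
  virtual void Delete(const std::string& key, TupleSlot slot) = 0;
  virtual ~Index() = default;
};

// Per-transaction log of every index entry the transaction added or invalidated.
struct IndexWriteSet {
  struct Entry { Index* index; std::string key; TupleSlot slot; bool inserted; };
  std::vector<Entry> entries;

  void RecordInsert(Index* idx, std::string key, TupleSlot slot) {
    idx->Insert(key, slot);  // new entry is visible immediately
    entries.push_back({idx, std::move(key), slot, true});
  }
  void RecordInvalidate(Index* idx, std::string key, TupleSlot slot) {
    // The old entry is NOT removed here: older snapshots may still need it.
    entries.push_back({idx, std::move(key), slot, false});
  }

  // Abort: physically remove everything this transaction inserted.
  void OnAbort() {
    for (const auto& e : entries)
      if (e.inserted) e.index->Delete(e.key, e.slot);
  }

  // Commit: the invalidated entries become garbage-collector work; they are
  // removed only once no active transaction can still see the old versions.
  std::vector<Entry> OnCommit() {
    std::vector<Entry> garbage;
    for (auto& e : entries)
      if (!e.inserted) garbage.push_back(std::move(e));
    return garbage;
  }
};
```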
We did not do the delete-plus-insert thing in Peloton; we did something really, really stupid, and I don't know why. This is one of those cases where the student did it this way because it looked like it made performance better on our benchmarks, and it wasn't until later, when I had to go modify the code to fix some things, that we realized how bad it was. Here's what we did in Peloton. Again, we were doing append-only with oldest-to-newest ordering. If a transaction comes along, updates tuple A, and sets its key to 222, we would append the new tuple version into the table space and then add a new entry in the index saying, for key 222, here's the version. So even though logically they're the same tuple, from the index's perspective they look like separate things. Where we got into trouble was when the same transaction updated the same key again on the same tuple: instead of making a new entry in the index, we would overwrite the previous version we had created, so I'd replace 222 with 333 and update the index to point to it as well, and we kept doing this every single time we did an update; the next one sets it to 444 and updates the index entry to point there. The problem is that when we go to abort this transaction, we had no idea that the earlier keys, 222 and 333, had ever been inserted. We could delete 444, because that's the version we see in the dirty tuple list for our transaction, but we didn't know we'd left other keys behind in the index. So we'd run benchmarks for a while, and all of a sudden, if it was a unique index, you'd get back an error saying a key already existed even though it didn't exist in the table, because we were leaking these keys from aborted transactions. This is embarrassing, this is stupid, and it's not really anything specific to the paper you read; it's just showing that it's hard to get these things right even with the best intentions: if you don't keep track of all the keys you modify in the index as you go along, you can end up losing things. We don't do this anymore, which is good.

Okay, so let's talk now about how we're going to keep track of the versions that transactions create. The first two approaches we covered back in the first lecture. Background vacuuming is where a separate garbage collection thread goes through the tables and identifies versions we can prune. Cooperative cleaning is where the transactions or queries themselves, as they run, notice versions that are not visible to any transaction and clean them up; Hekaton did this because they were doing oldest-to-newest, so as transactions traversed the version chain to get to the newest version, they would see a bunch of old versions along the way that they knew were reclaimable, and they'd remove them right there. The transaction-level approach is what we used to do in Peloton: transactions keep track of all the versions they invalidated, and when they commit they hand that information off to the garbage collector, which has a view of what transactions are running and what their timestamps are, and can then identify which of those versions are prunable. The last approach is epoch-based, and it's basically the same thing as transaction-level: you execute a bunch of transactions, not as a batch in the sense of running them all at the exact same time, but grouped under the same epoch. There's this other counter that's always moving forward in time, and when we go from one epoch to the next, we can tell whether any transaction could still see something from a previous epoch; if not, then we know that anything invalidated in that epoch can be reclaimed.
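Here's a rough sketch of the epoch-based idea, with made-up names and a simplified single-threaded flavor (a real implementation has to make the epoch counter and the garbage lists safe for concurrent access):

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// A unit of deferred reclamation: "free this version once it is safe".
using Garbage = std::function<void()>;

class EpochManager {
 public:
  // A transaction registers under the current epoch when it starts...
  uint64_t Enter() { return active_[current_]++, current_; }
  // ...and deregisters when it finishes.
  void Exit(uint64_t epoch) { --active_[epoch]; }

  // Versions invalidated during an epoch are queued under that epoch.
  void Defer(uint64_t epoch, Garbage g) { garbage_[epoch].push_back(std::move(g)); }

  void BumpEpoch() { ++current_; active_.try_emplace(current_, 0); }

  // Reclaim garbage from every epoch older than the oldest epoch that still
  // has an active transaction: nobody can see those versions anymore.
  void Reclaim() {
    // Drop leading epochs with no active transactions (keep the current one).
    while (active_.size() > 1 && active_.begin()->second == 0) active_.erase(active_.begin());
    const uint64_t oldest_active = active_.begin()->first;
    while (!garbage_.empty() && garbage_.begin()->first < oldest_active) {
      for (auto& g : garbage_.begin()->second) g();
      garbage_.erase(garbage_.begin());
    }
  }

 private:
  uint64_t current_ = 0;
  std::map<uint64_t, uint64_t> active_{{0, 0}};       // epoch -> # active txns
  std::map<uint64_t, std::vector<Garbage>> garbage_;  // epoch -> deferred frees
};
```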
We'll see more of this technique when we look at the Bw-Tree on Monday in the paper you're reading.

All right, so let's walk through version tracking at the transaction level. My transaction comes along with timestamp 10 and does an update on A, creating a new version. Because I know which version was the latest one I saw before I created my new one, I know that if my transaction commits, that old version of A is potentially reclaimable, so I record it in my transaction-local list of old versions; it's just a pointer to that location, and we can use raw pointers because we're not moving the version around; we're not doing compaction that would move it from one block to another. Then I do an update on B, same thing: I note the old version, create my new version, and add the old one to my list. When my transaction commits, I pass this information along to the garbage collector. The collector looks at it and says: the commit timestamp for this transaction is 15, so anybody with a timestamp less than 15 should still be able to see these old versions; if there is no active transaction with a timestamp less than 15, then these versions are removable. This is what I think the paper calls the low watermark (or high watermark, depending on which way you're looking at it): for these tuples I just need to know the lowest timestamp of any transaction that could still see them, and if no transaction is below that timestamp, I can remove them.

Yes? The question is, what about another transaction running with a begin timestamp of, say, nine (it can't be 10, since this one already has 10). So that transaction comes along and wants to read A: it would read the old version, because nine falls between when that version was created and 15. So yes, it reads the old A. The garbage collector has to know what other transactions are running; in the HyPer paper they talk about maintaining a linked list sorted by transaction ID, and you can just look at the head or the tail, depending on the order, to find the lowest one; that's your watermark. The idea is that I know what transactions are running, so if there's a transaction with timestamp nine, nine is less than 15, and therefore it could potentially still read these versions, and I can't reclaim them. It may never actually read B, but I don't know that, because I don't know what the transaction is going to do, so I have to be conservative and leave everything around. And again, you could use the sorted linked list like HyPer does, or just record a single value; there are different ways to do it.
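A minimal sketch of that transaction-level handoff (illustrative names only): each transaction collects pointers to the versions it superseded, the list gets tagged with the commit timestamp, and the collector later frees a list once the minimum active timestamp (the low watermark) has passed it:

```cpp
#include <cstdint>
#include <vector>

struct Version { /* tuple data, begin_ts, end_ts, next pointer, ... */ };

// What a committed transaction hands to the garbage collector.
struct GarbageBatch {
  uint64_t commit_ts;                  // versions become invisible to txns starting at or after this
  std::vector<Version*> old_versions;  // the versions this transaction superseded
};

class GarbageCollector {
 public:
  void Submit(GarbageBatch batch) { pending_.push_back(std::move(batch)); }

  // min_active_ts = lowest begin timestamp among all running transactions.
  // A batch is reclaimable once every active transaction started at or after
  // its commit timestamp (assuming the exclusive end-timestamp rule from the
  // earlier visibility sketch).
  void Collect(uint64_t min_active_ts) {
    std::vector<GarbageBatch> still_pending;
    for (GarbageBatch& b : pending_) {
      if (b.commit_ts > min_active_ts) {
        still_pending.push_back(std::move(b));      // someone may still read these versions
      } else {
        for (Version* v : b.old_versions) Free(v);  // nobody can see them anymore
      }
    }
    pending_ = std::move(still_pending);
  }

 private:
  // In a real system: unlink from the version chain and recycle the slot.
  void Free(Version* v) { delete v; }
  std::vector<GarbageBatch> pending_;
};
```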
Okay, so the next question is how often we're going to invoke the garbage collector, and again there's a trade-off. If we're very aggressive with garbage collection, then yes, we can free up space as quickly as possible, assuming there aren't transactions with old timestamps sitting around for a long time; but while we'll reclaim space more quickly, we could end up slowing down transactions, because if the garbage collection threads run in a background setup they start using the CPU, and that eats up cycles and makes your queries and transactions run slower. And obviously, if we're less aggressive and run it too infrequently, the size of the database gets larger because we're not reclaiming versions as quickly as we should, the version chains can get longer, and it takes longer for queries to find the exact version they want. So it's a delicate balance between the two. I actually think the HyPer approach is better than background garbage collection.

So you can run it periodically, or you can run it continuously, which is what HyPer does (and what we do, and what other systems do). Periodically means at some fixed interval, or when some threshold is met: if I know that 20% of my memory is being used by reclaimable versions (assuming I can somehow compute that), then I kick off the garbage collector to go find things. The JVM does something similar: when the heap usage gets past a certain percentage, the collector kicks off. So this is just running a background thread every so often, and some systems, like Hekaton, can detect that the churn rate for versions is really high and run it more frequently.

The HyPer approach is to run it continuously, where the garbage collection procedure is just part of the normal transaction processing or query execution steps. In the original HyPer they did it on commit: any time a transaction committed, that thread would go through, see what it could reclaim, and clean up the stale versions. In the newer version of HyPer it's tied to query execution, which I'd call the same thing as cooperative cleaning, the way Hekaton does it: as I'm running my queries, if I see things that need to be cleaned up, I go ahead and clean them up. But because HyPer is newest-to-oldest, they don't have the dusty-corners problem Hekaton has, where versions that are never visited never get reclaimed; if you read it, you'll see it, and you can reclaim it. I would like to do this, although the way our system is set up now, our garbage collector is integrated with another background cleanup process that does a bunch of other memory-management things, so I don't think we can switch over; I think we're stuck with what we have. But I like the continuous approach better, because it's self-regulating: if I have a lot of queries doing a lot of updates, then I'm creating a lot of versions, but those same queries will clean up the versions that are reclaimable as they run; and if queries run slower because there are a lot of versions to reclaim, that in turn slows down the rate at which I create old versions. So I like that model.
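As a toy illustration of the "periodic with a threshold" policy (the numbers and names are made up, not from any of these systems), the trigger for a background vacuum thread might look like:

```cpp
#include <cstdint>

// Hypothetical trigger: wake the vacuum thread either on a fixed interval or
// when reclaimable versions exceed some fraction of allocated memory.
struct VacuumPolicy {
  uint64_t interval_ms = 1000;  // fixed period
  double   threshold   = 0.20;  // e.g. 20% of allocated bytes are dead versions

  bool ShouldRun(uint64_t ms_since_last_run,
                 uint64_t reclaimable_bytes,
                 uint64_t allocated_bytes) const {
    if (ms_since_last_run >= interval_ms) return true;
    return allocated_bytes > 0 &&
           static_cast<double>(reclaimable_bytes) / allocated_bytes >= threshold;
  }
};
```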
All right, so now the question is how we're going to organize, internally, the metadata for our garbage collector to determine whether we can go ahead and reclaim things. Again, there are trade-offs between having fine-grained tracking information that says, here is a single version that can be reclaimed given this timestamp, versus combining versions together and keeping one timestamp across multiple tuples, which amortizes the storage cost of the tracking information in exchange for maybe not reclaiming things as fast as we possibly could.

That's really all there is to it. Single-version tracking means that for every single tuple I know its versions and their timestamps, and when the garbage collector kicks off it can make a decision at that granularity about whether it's okay to reclaim. You get this essentially for free if you're doing continuous or cooperative cleaning, because as I scan along a version chain to find the version I want, I end up cleaning things as I find them, and I'm already storing that metadata in the headers of the tuples or records anyway, so I'm not storing anything extra. Group-version tracking is what I showed before in the example with the vacuum thread: here's a bunch of tuples that were invalidated by a transaction at some timestamp, so any transaction with a timestamp less than that could still possibly see them, and otherwise nobody can. There's less tracking overhead, but it may delay the point at which we can reclaim something and get memory back.

There's a third approach that was in the HANA paper, which you didn't read, but I think the HyPer paper mentions it: you can reclaim all the versions for an entire table if you know that no transaction running right now could ever access it. How does this work? Remember the earlier example: there could be another transaction at timestamp nine that might read A and might read B, but we don't know, because we don't know what queries it's going to execute. In some cases, though, if you execute your transactions as prepared statements or stored procedures, you potentially know all the queries they could ever execute. A prepared statement is a predefined function you install in the database system that runs some logic for a transaction, with invocations of queries inside it, so you can see all the queries ahead of time; not always, but sometimes, if everything's pre-declared. So I can look at the stored procedures running as transactions in my system, and if I know what queries they could possibly execute, then I know what tables they will touch; and if I see that they can never touch a particular table, then I can reclaim all the versions for that entire table without doing any fine-grained tracking. In the case of HANA, they're doing time-travel storage, so for a given table all the old versions are sitting in another table space, and I can blow away that entire time-travel storage space without examining any timestamps; it's basically doing a drop table and recreating it, which is super fast. Again, this is a special case, a corner case: if your application mostly invokes stored procedures you can do this, but most systems
cannot do this. And I don't think it would work if you're doing delta storage or append-only storage, where the old versions are mixed in with the regular tuples or stored as delta records; this is a special case for HANA, as I said.

All right, the last thing we'll talk about is how to determine whether something is reclaimable. Ideally, in order for our system to be scalable, we want to be able to examine what the active transactions are and what old versions we can reclaim without acquiring any latches. This is what I was saying before: in the HyPer paper they maintain a latch-free linked list that they can keep sorted pretty efficiently, and they use that to figure out what the current transactions are. An important concept to understand is that, unlike when we're actually running queries under snapshot isolation, where we can't have false positives or false negatives (we can't miss data we're supposed to see), for garbage collection we can be a bit loosey-goosey, and it's okay if we end up missing something. If my garbage collector runs, and at that exact moment another thread commits a transaction with a bunch of physical versions that are reclaimable, and the collector misses them during its current pass, who cares: the next time it comes around, it'll see them. So we don't need super-tight protection over the critical sections where we decide what to reclaim; it's okay to maybe miss something, because we'll catch it the second time around.

So this is the important part, one of the main contributions of the HyPer paper, although they did not invent the interval idea; it actually comes from the HANA paper. How are you going to determine whether something is reclaimable or not? With the timestamp approach, which they call traditional garbage collection in the paper, I just know the minimum timestamp of all my active transactions, and anything that fell out of visibility before that timestamp is not visible to any of those active transactions, so I can go ahead and reclaim it. The interval approach is where we're a bit more crafty: we look at ranges of timestamps and identify parts of the version chain that are not visible to anyone at all, and instead of waiting to take the oldest version out first and pruning along in timestamp order, we can excise the range that's not visible, reconnect the version chain, and everything is perfectly fine, because everybody still sees exactly what they're supposed to see. The tricky part is how we identify these ranges and how we do the consolidation of the version chain to remove those invisible ranges.
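The core test in the interval approach can be sketched like this (my own illustration, not code from the Steam paper): instead of comparing a version only against the global minimum, you ask whether any active transaction's timestamp actually falls inside the version's lifetime interval:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// active_ts: begin timestamps of all running transactions, kept sorted
// (e.g. via an ordered, latch-free transaction list as HyPer describes).
// A superseded version that lived during [begin_ts, end_ts) is dead if no
// active timestamp lands inside that interval; the versions on either side
// of it can then be linked directly to each other.
bool IntervalReclaimable(uint64_t begin_ts, uint64_t end_ts,
                         const std::vector<uint64_t>& active_ts) {
  auto it = std::lower_bound(active_ts.begin(), active_ts.end(), begin_ts);
  return it == active_ts.end() || *it >= end_ts;
}

// Contrast with the "traditional" check, which can only trim from the old end
// of the chain: reclaimable only if end_ts <= min(active_ts).
bool TraditionalReclaimable(uint64_t end_ts, const std::vector<uint64_t>& active_ts) {
  return active_ts.empty() || end_ts <= active_ts.front();
}
```

With the example coming up next (one reader stuck at timestamp 10, a version that lived from 25 to 35), the traditional check refuses to reclaim, while the interval check sees that no active timestamp falls inside [25, 35) and reclaims it.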
So let's say we have a simple example like this. Transaction one is going to read A, and transaction two is going to update A, so we now have an older version, A1, plus the new version transaction two creates, which commits at 25. Then another transaction comes along, updates A again at 30, and commits at 35. At this point A2 is reclaimable, because the only other transaction running is the first one, and he's at timestamp 10: he can never see the version that lived between 25 and 35. He can't see A3 either, but under snapshot isolation he's not allowed to anyway. So we have this version in the middle that we want to go ahead and remove. If we're doing the timestamp comparison, the garbage collector can't remove A2, because the low watermark, the lowest timestamp of an active transaction, is 10, and 25 is greater than 10, so we can't remove it. But if we're doing the interval-based approach, then we can reclaim it, because timestamp 10 does not intersect the lifetime range of 25 to 35 for this version; 10 can never see this thing, so we can go ahead and remove it.

For this approach, if you're doing append-only storage, this is easy to do: going oldest-to-newest, I just update the pointer from A1 to point to A3, and A2 is now unlinked and I can reclaim it, because everything I need to reconstruct the tuple at a particular timestamp is contained in each version itself; an append-only version has all the attributes. This is harder to do if you're doing delta storage, because a delta record may not have all the attributes.

So let's look at an example. Say I have one tuple whose master version is at timestamp 60, and then I have a long version chain of delta records behind it. My first transaction is running at timestamp 15, so it needs to see versions of tuples as they existed from transactions committed at timestamp 15 or earlier; it can read A10 down at the bottom of the chain. My other transaction is reading at 55, so it sees anything committed at 55 or less. That means this chunk of delta records in the middle is potentially reclaimable: not the one at 50, but all the other ones in between. The problem is that, because we're doing delta storage, some of these deltas touch attribute two and some touch attribute one, so when we do our consolidation we need to take a union of these delta records and keep only the latest modification to each attribute in the final compacted, consolidated delta record. In this case attribute two was updated three times, with 77, 88, and 99, so when I create my consolidated version the latest value of attribute two is 99, and I can discard the other delta records for it; but this one down here modified attribute one, and I need that in order to recreate the version of the tuple at timestamp 50, so I have to carry it into my consolidated record as well. The timestamp I assign to the consolidated delta record is the max timestamp of what I consolidated. So now the transaction reading at 15 can still get to A10, and the one reading at 55 can still find this record, which has all the information from the delta records that occurred after timestamp 50. To install it, I do a compare-and-swap on the version vector to point to the consolidated record. Again, first-writer-wins: we treat this like an update, just an internal one, so if somebody else updates this version vector while we're doing our consolidation, we fail and have to restart.
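A sketch of that consolidation step, under the simplifying assumption that a delta record is just a map from attribute id to value (real delta records are packed byte ranges, and the install step is the compare-and-swap just described):

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified delta record: which attributes this version changed, plus its
// commit timestamp. Newer deltas appear earlier in the input vector here.
struct DeltaRecord {
  uint64_t ts = 0;
  std::unordered_map<uint32_t, int64_t> changed;  // attribute id -> new value
};

// Merge a run of delta records that no active transaction can see into one
// record: keep the newest value of every attribute any of them touched, and
// stamp the result with the maximum (newest) timestamp in the run.
DeltaRecord Consolidate(const std::vector<DeltaRecord>& newest_to_oldest) {
  DeltaRecord out;
  for (const DeltaRecord& d : newest_to_oldest) {
    out.ts = std::max(out.ts, d.ts);
    for (const auto& [attr, value] : d.changed)
      out.changed.emplace(attr, value);  // emplace keeps the first (= newest) value seen
  }
  return out;
}
```

With the lecture's numbers, consolidating the three deltas that set attribute two to 77, 88, and 99 plus the older one that set attribute one yields a single record carrying attribute two = 99, attribute one's last value, and the newest timestamp of the run.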
But let's say it succeeds: now my version vector points to the consolidated record, and I can blow away the rest of the version chain.

Yes? The question is whether I require that these versions are committed. Again, under delta storage with first-writer-wins, the head is always either the committed version (if the version vector is null) or the latest committed version, and everything else behind it is already committed as well; it's not like there's an in-flight transaction in the chain, because no two transactions can create delta records on the same logical tuple at the same time; the first writer wins.

Yes, sorry? The question is how I actually do the union, the consolidation of these delta records. You go backwards in time: you know the table has some number of attributes, so as you make your pass, as soon as you've seen all of the attributes, you know there's nothing further back that you care about, and you can start the process. You follow the version chain and decide what you're going to reclaim, but you're not updating the existing delta records themselves with any new information. We did this as a class project last year, and I'm trying to remember why it was tricky. I think it was something in the ordering of reclaiming variable-length data that made it challenging: the slide shows the string embedded in the delta record, but if it's a large string it's actually a pointer into the variable-length data pool, and I think we got into trouble trying to get the ordering of that right. This would be something we could explore again this semester; if you do this project it really forces you to think through the versioning information and all the corner cases, because this is obviously an oversimplification of the problem.

Okay, yes? The question is, for doing garbage collection on delta records, do we have to do this consolidation or compaction first before we can clean anything? No. Say thread one goes away: if nobody can read anything past a certain point, everyone is always going to read the version up at the head. I'm only doing this consolidation because I know that reader down there could still see this older version and I need to be able to reconstruct what came after it. The interval consolidation is a nice thing to have, but you don't need it for correctness; without it, I just have to wait for that transaction to commit or go away before I can prune everything else.

Yes? The question is whether I really need to consolidate: couldn't I just remove these other delta records, keep this one, and would that be enough? Yeah, that would actually still work. So instead of consolidating into a new record, I keep this one, remove the others because they're overwritten by it, and just update the pointer to point to it; that would work, because when I scan I still get back the correct version. Yes? Yeah, I don't think the paper covers that; you could obviously devise different synthetic benchmarks to exercise different update patterns, but I don't
know what real workloads actually do, so I can't say one way is better than another. When we did this last year, I think we created a new delta record, and the tricky part, right, I remember now: the issue was that these delta records are sitting in the thread-local memory of the transactions that created them, and normally only that thread would reclaim that memory; but now you have this consolidation thread in the background that also wants to reclaim things, so you have to take latches to protect the memory space, and it actually made things go slower. I don't know the exact details; we have notes from before.

Okay, all right. The next thing to finish up this discussion: in all these examples it's sort of obvious that reclaiming memory is a good idea; if we don't need the data, we should free up the space. But now the question is what we actually do with the memory we just freed. Say I insert a billion tuples and then delete a billion tuples: what should happen? Should I give the memory for that billion tuples back to the OS, or give back some of it, or keep it all myself? I think the answer is actually somewhere in the middle: you want to keep some of it, but you also want to give some back, because people will think the system is broken if they insert a billion things, delete a billion things, and the memory usage doesn't go down.

For the variable-length data pool we can always just reuse the memory, because we're already solving a bin-packing problem: for anything new we need to store in the variable-length pool, we just find a free slot and put it there. For the fixed-length data slots it's a bit trickier, because if we start reusing slots for tuples as we reclaim versions, the temporal dimension of our data ends up getting randomized. What do I mean by that? Say I have an application where most of the time people touch the latest data, like Reddit or Hacker News: people mostly comment on the latest posts; nobody goes back and writes a comment on a post from five months ago (I don't think it will even let you). That means that, ignoring the multi-versioning stuff for a moment, as I create new tuples for all these comments, they're going to be located close to each other and have roughly the same creation time; I'm inserting new comments more or less chronologically, so they're all close together in time. It's not like I'm inserting a comment on an article from five months ago and having it interspersed with comments from today. The reason that matters is that in OLTP workloads, the probability that a tuple is going to be updated depends on how recently it was inserted or last accessed: if the first version was inserted today, the probability that I'll update it is higher today than it will be five months from now.
Most of the time you can only update the latest things. So if the data within a block was all created at roughly the same time, I know it all has roughly the same probability of being updated, and therefore of being invalidated. So if I have data from five months ago in one block, I can compress it, treat it as read-only, and not worry about having to decompress it to update something and then rerun the compression scheme all over again. Does that make sense? If data that's located together was all created at the same time, it's all going to be updated with the same (low) probability, and I can use compression to reduce the size of that block without worrying about having to recompress it later. You don't want to compress the newer things; you want to compress the older things, because the older things are effectively read-only.

So these are the two sides of the trade-off. If I reuse slots any way I want as I create new versions and reclaim old ones, then the physical layout in memory gets randomized: some stuff in a block will be new, some will be old, and if I compress that block there's something in there that could still be updated, which forces me to redo the compression; we lose that temporal locality. Or we could just leave the reclaimed slots unused, leave holes, and that's bad because we end up with a bunch of space we can't use even though we've still allocated the memory, so at some later point we have to go back, do compaction, and consolidate multiple blocks that each have a bunch of holes, combining them so we get better utilization and less fragmentation. There's a third approach that's sort of tied to this: the TRUNCATE command. Truncate is basically a delete without a WHERE clause, but you can special-case it: instead of scanning through, examining tuples, and marking them deleted, the easy way to do truncation is to drop the table and recreate it, blow away all the indexes and the table spaces and just recreate them, and then you don't have to worry about any of this locality bookkeeping; you just start from scratch. But truncate is a special case.

Yes, sorry? The question is, for the five-months example, why am I assuming I would compact the old data together rather than just delete it? You would do both. There are some websites that only expose the last 90 days of data to you, so they're essentially doing a TTL, deleting data as it gets older. In that case, if all the data in a block was created within the same time range, then when I do that pruning the whole block gets blown away and I can recycle it; if it's all intermixed, then I end up deleting some tuples that are five months old while other tuples that are only three months old have to stay, and now I have a bunch of holes.
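A toy sketch of the compaction step itself (all names made up): take blocks that are mostly holes and copy their survivors into fresh, full blocks so the old blocks can be recycled. In practice you'd also group the candidates by age, using the policies described next, so a merged block stays temporally homogeneous, and the whole thing has to be done transactionally (a delete followed by an insert per tuple) so concurrent readers never see a tuple missing or duplicated.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Tuple { uint64_t insert_ts; /* attribute data ... */ };

struct Block {
  std::vector<Tuple> live;  // surviving tuples (reclaimed slots already squeezed out here)
  std::size_t capacity = 4096;
  double FillFactor() const { return static_cast<double>(live.size()) / capacity; }
};

// Merge sparsely filled blocks into full ones; blocks above the fill threshold
// are left alone.
std::vector<Block> Compact(std::vector<Block> blocks, double min_fill = 0.5) {
  std::vector<Block> out;
  Block current;
  for (Block& b : blocks) {
    if (b.FillFactor() >= min_fill) { out.push_back(std::move(b)); continue; }
    for (Tuple& t : b.live) {
      if (current.live.size() == current.capacity) {
        out.push_back(std::move(current));
        current = Block{};
      }
      current.live.push_back(std::move(t));
    }
  }
  if (!current.live.empty()) out.push_back(std::move(current));
  return out;
}
```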
Again, for compression, assuming we want to keep all the data, what we want is a block of data that is never going to be updated again. We could still update it if we had to, we have to support that, but it should be unlikely, so we can use heavyweight compression to reduce its size, and if anybody does try to update it we'll deal with it; we just want to avoid that. So if a block was all created around the same time, it's unlikely to be updated again in the future. Compaction, again, is the basic idea of refilling these partially empty blocks, blocks that have holes in them: this one's half full, that one's half full, and rather than having two half-full blocks I combine them into one full block. Ideally, as I said before, if tuples are likely to be accessed together in the same window of time, we want to put them in the same block, because when we compress them the likelihood that one gets updated while the others don't will be low. There's another technique we can talk about later in the semester: if we know this data is not only unlikely to be updated but also unlikely to be read, then we can start shoving it out to disk and save memory space; we still keep some information in memory so that if you try to read it we'll go fetch it from disk, but the primary location is out on disk. At the beginning of the semester I said we want to assume everything is in memory so we can run fast; this is bringing the disk back in an intelligent way: some data can be pushed out, most of the time we're in memory, but if we ever need to spill to disk we can handle it. I think we covered that in an earlier lecture. And again, we're not bringing back the buffer pool, because that's slow; it's just secondary storage.

All right, so there are three ways we can figure out what to compact. We can look at the timestamp of when a tuple was last updated; we already have that information because we store the begin timestamp in every single tuple, and we can look at it, say these things are all from roughly the same time, and use that to decide what to consolidate or compact. The second approach is to look at the last time a tuple was accessed: the more recently something was read, the more likely it is to be accessed again in the near future compared to something accessed a long time ago; there's a decay effect. For this one, if you're doing basic timestamp ordering with a read timestamp, where you record every time the thing was read, you can use that; otherwise you have to maintain additional metadata, probably at the block level, because maintaining an access timestamp per tuple would be too expensive: every read would then turn into a write in order to update the timestamp. The third approach, which as far as I know nobody actually does but a bunch of people want to do, is that if you can infer how data is related across tables, or within the same table, then you can group together things that are going to be accessed together frequently into the same block,
The third approach — as far as I know nobody actually does this, though a bunch of people want to — is that if you can infer something about how the data is related, across tables or within the same table, then you can co-locate the things that are going to be accessed together most frequently in the same block, because then you can apply the same compaction or compression scheme to all of it. A foreign key is an obvious example: if I have two tables with a foreign key between them, the likelihood that I access the parent table, follow the foreign key, and then go get the matching rows in the child table is very high, so maybe I want to put those close together in memory and compact and compress them together. Another example I heard once was an order-processing system that wanted to keep everything in memory but shove cold orders out to disk. If an order's status was marked as open — and an order could stay open for months — then at some later point they're going to come back and access it again. So if the system can learn "if order status equals open, keep it in memory; anything with a closed status gets shoved off to another location," it can do that kind of optimization.

All right, in the interest of time: I think we already talked about truncate. Truncate is basically a delete without a WHERE clause — delete everything, run your garbage collection once you know nothing is visible anymore, and then recreate the table. We'll talk in a week or two about transactional catalogs. The catalog stores the metadata about what tables exist and what attributes or columns they have. If your catalog is transactional, then when I call DROP TABLE, anybody with a timestamp before my drop transaction can still see the table, and once they're all gone I can reclaim the space. If all of that is transactional, this truncation approach is super easy. The same thing is actually true for compaction: compaction is taking two blocks and combining them, which just moves the physical location of the tuples in those blocks to a new location, so it's really a delete followed by an insert (there's a toy sketch of that view of compaction below). If I can do that transactionally, I don't have to worry about false positives or false negatives from transactions running at the same time as my compaction. In our system the catalog is entirely transactional, which makes all of this easy — we just haven't built it all yet.

All right, so to finish up: this is just the classic trade-off we see in computer science and in databases of storage versus compute. You can be more aggressive with garbage collection and reclaim more memory, but that slows down your transactions; or you can let old versions accumulate, which saves computation cycles but spends more memory or storage to hold them. In talking with people who run these systems, especially in-memory systems, everybody is willing to pay a performance penalty in exchange for reducing the memory footprint. RAM is expensive — not just expensive to buy, but expensive to maintain, because you have to pay for the energy to keep it powered. So if I can run my database a little slower but use a lot less memory, that can be a big cost savings. That's why I like the HyPer team's approach in the Steam paper you read: they run garbage collection as they're running queries, so the queries run slower, but they reclaim versions as soon as possible and reduce the memory footprint.
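And here is the toy sketch of compaction as a transactional delete-plus-insert. Everything in it (TupleVersion, the timestamp fields, the commit timestamp parameter) is a made-up model, not the course system's code; the point is only that old copies get end-stamped like a delete while the new, consolidated copies get begin-stamped like an insert, so readers with older snapshots keep seeing the originals until garbage collection reclaims them.

```cpp
#include <cstdint>
#include <vector>

// Toy MVCC tuple version; the fields are invented for illustration only.
struct TupleVersion {
  uint64_t begin_ts;
  uint64_t end_ts;  // UINT64_MAX means "still the live version"
  int payload;
};

using Block = std::vector<TupleVersion>;

// Merge two half-full blocks as if inside a transaction committing at commit_ts.
// The "delete" half end-stamps the old versions; the "insert" half creates fresh
// copies in the new block. Older snapshots still see the originals until GC runs.
Block CompactTransactionally(Block &a, Block &b, uint64_t commit_ts) {
  Block merged;
  for (Block *src : {&a, &b}) {
    for (TupleVersion &v : *src) {
      if (v.end_ts != UINT64_MAX) continue;               // already dead; leave it for GC
      v.end_ts = commit_ts;                                // the "delete" half
      merged.push_back({commit_ts, UINT64_MAX, v.payload});  // the "insert" half
    }
  }
  return merged;  // one full block replacing two half-full ones
}
```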
Okay, any questions? All right — tips for profiling. I'm assuming you might have covered this kind of thing at some point; I don't think 15-213 covers it, but it's a recurring theme throughout the semester: we want to determine whether our system is running slow, and if so, why. Let's look at a really simple example. I have two functions, foo and bar, in my program, and I want to speed it up — I want to figure out why they're running slowly and what I can do to make them faster. A naive way to do this would be to run the program in a debugger — gdb, or LLDB on Apple — and every so often hit pause, look at the stack trace, record what function we're in, and keep doing that. Over time I'd have a record of which function was on the stack most often. It's crude, but it works. Say I did this and collected ten call-stack samples, and six out of the ten times I paused we were in foo: based on those measurements, roughly 60 percent of our program's time is spent in foo. Obviously the accuracy of that estimate increases the more samples you take — if I could automate hitting pause over and over, I'd get more samples and a more accurate measurement.

So most of the time is spent in foo, and that should be our first optimization target: we have two functions, foo and bar, and we should optimize foo first because that's where most of the time goes. Now say we're able to make foo run two times faster — what's the expected overall improvement? Who here has seen Amdahl's law before? Less than half — okay. Amdahl's law is a way to calculate the expected improvement if we know what percentage of the program's time is spent in a particular part of the code. If we make foo 2x faster, and we know 60 percent of the time is spent in foo, then we can cut that portion in half, but we're still spending the other 40 percent of our time in bar. Amdahl's law gives us a formula for the overall expected speed-up based on the fraction of time spent in the part we're improving and the speed-up we expect to get. Plug and chug the math and we get a 1.4x improvement: even though I made the function where the majority of the time is spent 2x faster, the overall system speed-up is only 1.4x.

So we can use Amdahl's law, when we look at our database system, to figure out where we're spending our time, how much effort it would take to make a particular function faster, and then do a back-of-the-envelope calculation of the improvement we'd expect to see. That's useful when you're doing performance profiling during development, because if a function isn't where the time goes, I probably don't want to bother improving it — and note that it's not how many times a function is called that matters, it's how much time you're spending in it.
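Writing out the calculation from above with the standard Amdahl's law formula, where p is the fraction of time spent in foo and s is the speed-up we give it:

```latex
S_{\text{overall}} \;=\; \frac{1}{(1 - p) + \frac{p}{s}}
\;=\; \frac{1}{(1 - 0.6) + \frac{0.6}{2}}
\;=\; \frac{1}{0.7} \;\approx\; 1.43
```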
That lets us find the high poles in the tent — the parts of the system where we're spending most of our time — and those are the things we should target. As we assess how to improve them, we can use the formula to decide: I think I can make this 2x faster, so what's the real benefit I'm actually going to get? There's an old adage in systems that you want to avoid premature optimization. Yes, there might be a fancy new lock-free algorithm we could use for some part of the system that would be better than what we have now, but if that part of the code isn't executed very often, we're just wasting our time — there are other things we should worry about.

So now we need to figure out how to actually get this information. My little example of hitting pause over and over on the keyboard is silly, but there are real tools that do this for us. The two main ones we're going to focus on are Valgrind (specifically Callgrind) and perf. Valgrind is a heavyweight instrumentation framework: it injects instrumentation into the binary as it runs so it can collect information about what part of the program it's in. That makes your program run much slower, but it shows you, down at the source-code level, where you're spending your time. perf, on the other hand, works at the hardware level: it reads the low-level performance counters that Intel provides on x86 and records things like how many cycles you're spending on individual lines of code or assembly instructions. So the two are useful for different things: perf gives you lower-level information like cache misses and cycle counts, whereas Callgrind gives you instruction counts — it doesn't know how long things actually spent in the CPU, whereas perf does. You really want to use both, but I'll give a demo of perf.

Valgrind is actually a toolkit with a bunch of tools. Memcheck checks for memory errors — that's what the original Valgrind was — and it finds things like memory leaks; we use AddressSanitizer instead, which is part of Clang and GCC and came from Google, and is a more lightweight way of catching those problems. Callgrind shows you where you're spending your time in the source code. Massif shows you, within the total heap space of your process's address space, which parts of the program are allocating the most memory. For us the database itself — the actual data — will always be the largest part, but within that we can drill down and see how much space we're spending on the data structures that keep track of things, on indexes, and so on. When we first built the Bw-Tree, it allocated a lot of memory just by being turned on: even with zero data in the table, Massif showed a huge block of memory being spent on the Bw-Tree, because it pre-allocated its mapping table without actually using it yet.
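To make the foo/bar example concrete, here is a throwaway toy program (entirely made up, not part of the course codebase) where foo does noticeably more work than bar. If you build it with optimizations plus debug symbols (something like -O2 -g with GCC or Clang) and run it under Callgrind or perf, the profile should attribute most of the time to foo, roughly matching the 60/40 split discussed above.

```cpp
#include <cstdint>
#include <cstdio>

// volatile sink keeps the compiler from optimizing the loops away entirely.
static volatile uint64_t sink = 0;

// foo does more iterations than bar, so a profiler should attribute
// the majority of samples / instruction counts to it.
void foo() {
  for (uint64_t i = 0; i < 60'000'000; i++) sink = sink + i;
}

void bar() {
  for (uint64_t i = 0; i < 40'000'000; i++) sink = sink + i * i;
}

int main() {
  for (int round = 0; round < 10; round++) {
    foo();
    bar();
  }
  std::printf("%llu\n", static_cast<unsigned long long>(sink));
  return 0;
}
```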
So let's see how to use Callgrind. For Callgrind I think you need root privileges, because you're instrumenting the binary, and you run it from the command line. There are instructions on the wiki for how to compile in release mode without asserts but with debug symbols, so that when you look at the trace in Callgrind or KCachegrind it shows you the source code — the function call, the line number in the source — for everything it measured. Compiling in that mode keeps all the symbol information without the extra asserts that slow the system down, so it's as close as you can get to running the real system. The other thing — I pushed this to the project #1 branch last night — is that you can now pass in an environment variable for how many threads to run the benchmark with. By default, if you just run the slot iterator benchmark, it uses one thread; you can set the variable to however many threads you want. Remember, for identifying the bottleneck, it won't show up with one thread; it shows up when you have more threads, and then it's really obvious, so you want to run with a higher thread count. You run this for a bit, it spits out a Callgrind output file, and then tools like KCachegrind give you a nice visualization and breakdown of what's calling what. This screenshot is actually from the benchmark we gave students last year, but the high-level idea is the same: you can see the time distribution at the top of the call stack — what percentage of the time the system spends in these different function calls — and there's a call graph you can drill down into: this function invoked that function a million times but only accounts for 0.34 percent of the total time; this one is nine percent but was invoked nine million times. So you can drill down and see which functions you're actually spending the most time in. But again, this shows you invocations and overall time; it's not going to show you cycle counts, which are useful as well.

All right, so perf. Again, perf collects the low-level performance counters that x86 provides, exposed through Linux. I think it should still work on a Mac — I haven't tried; if it doesn't, we'll figure something out. Basically there are a bunch of different events you can collect from the hardware about your running process, and you set how often you want to sample — here we're going to take a sample every 2,000 occurrences of the event — and as your program runs it records everything into a perf.data file that gets generated afterwards. For this one you definitely have to run with root permissions, because you're asking for low-level hardware counters, and that's something the OS normally doesn't expose, since you could use it against any process that's running. What happens is that there are internal counters in the hardware, and whenever one of them reaches the threshold you set for its event, perf records a sample containing information down at the level of individual lines of code — actually at the assembly level — about what called what. And again, you want to compile in release mode with debug symbols so that when you look at the perf output it shows you the actual lines of code.
Yes — what will happen is that if you run with a lower core count, the bottleneck may not appear, because not everyone is hitting it; on a higher core count it definitely shows up, and it's really obvious. You can try it on your laptop first. Actually, I'll show you right now what it is — it's a latch; it's not a mystery, at least not in the slot iterator. Anyway, if you run perf with a higher thread count — I tried this this morning — it takes a long time, the benchmark won't even finish, and the output file gets to be something like ten gigabytes, so you can kill it off early. There are also nice third-party visualization tools like Hotspot — it's for Linux, I don't know whether it works on a Mac — that give you flame graphs and views that are a bit easier to read than the perf tool itself.

Let me give you a quick demo. I compiled this with debug info; that's htop at the bottom; this is the machine we have in the lab with forty cores. I've already set the environment variable — actually I did not, let me go back and do that — so I'm setting the terrier benchmark threads environment variable to 16, which makes it run with sixteen threads. Then, instead of running the slot iterator benchmark directly, I run it under perf: I record cycle counts, collect a sample every 2,000 events, and point it at the benchmark binary. At the bottom right you can see I'm using sixteen cores, as expected. I'm maxing out at a hundred percent utilization, but that's because there's a bottleneck in our system where the threads are trying to do something over and over again without making forward progress. So even though the utilization looks fantastic, it's actually terrible, because everything is stuck on this bottleneck. This could run for a long time, so I hit Ctrl-C and kill it; it tells you how many times it got woken up to write out data, and the perf trace was already 32 megabytes for only a few seconds of running. In my directory it generates the perf.data file, and now I can run perf report — again, you have to be root for this — which processes the file and gives you a list of where your program is spending all its time. Because I ran with debug symbols on, I can see everything, and lo and behold, at the very top of the slot iterator benchmark I'm spending 48 or 49 percent of my time in one function and 45 percent in another; the rest is where it actually did work. That is obviously the bottleneck. So I can drill down — perf has a nice annotate tool — and annotate the slot iterator's operator++, and now I see source code and can see that I'm spending 94 percent of this function's time in a test operation. Why? Because I'm trying to acquire a spin latch: I'm spinning on that latch trying to grab it, and that's where the 94 percent goes. That's the bottleneck.
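For reference, this is not the actual latch from the benchmark, just a generic sketch of a test-and-test-and-set spin latch. It shows why, under contention, perf annotate attributes almost all of the samples to the "test" — the load in the inner spin loop — rather than to the code that does useful work.

```cpp
#include <atomic>

// Generic test-and-test-and-set spin latch (a sketch, not TBB's implementation).
class SpinLatch {
 public:
  void Lock() {
    while (true) {
      // "Test" phase: spin on a plain load until the latch looks free.
      // Under contention, threads sit in this loop, which is why a profiler
      // attributes nearly all of the samples to this load/compare.
      while (locked_.load(std::memory_order_relaxed)) { /* spin */ }
      // "Test-and-set" phase: try to actually grab it.
      if (!locked_.exchange(true, std::memory_order_acquire)) return;
    }
  }
  void Unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};
```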
I can go back — I think it's Escape — and annotate the other one, and now I'm spending 91 percent of my time on the same spin latch. TBB is Intel's Threading Building Blocks; that's the library we use to provide the spin latch. It's actually a pretty good implementation, but like anything, if you use it incorrectly you're going to have problems: even a great spin latch implementation is going to be slow if you put it at a contention point. Okay, I'll send out a link — there's a great video, about an hour long, on how to use perf in more depth — and the same goes for the tools I showed you that produce flame graphs like this one, although for this benchmark the problem is pretty obvious. There are many other events you can collect, and there are more links on the slides. All right, next class we'll pick up with indexes. We'll spend a little time at the beginning talking about T-Trees, because they're interesting from a historical perspective; we'll spend most of our time talking about a latch-free index, the Bw-Tree from Microsoft; and then we'll talk about how to do versioned latching for B+ trees. Okay, any questions?