This is the second lecture on multi-version concurrency control. Remember I said last class that part one of the MVCC lectures was to discuss the high-level ideas of the protocol. Today's lecture is where we get down into the weeds and talk about how it's actually implemented in real systems. So for today's lecture, I'm going to start off talking about Hekaton. Then we'll talk about how HyPer does MVCC. And then we'll finish off with the CMU Cicada system.

I had this big debate with myself over the weekend to figure out how exactly I wanted to present this, because I realized that this is the paper I had you guys read, but in order to understand why this system does certain things, you kind of have to understand the things that came before it. I decided to have you read it anyway because I think it's a really good paper, and it just came out in 2017, but I'll cover these two other systems as well. Then you'll see why there are certain design decisions they make in Cicada that overcome some of the deficiencies, in particular with Hekaton, in their implementation. And then, as promised, I'll go through and actually discuss project two for real, okay?

So the first thing I'm going to discuss is Hekaton. I mentioned Hekaton a little bit earlier, I think in the second or third lecture when we were talking about query compilation. Hekaton is this influential system that came out of Microsoft and was led by two really awesome database people, Paul Larson and Mike Zwilling. Paul Larson was at Microsoft Research. He's been involved in a lot of major database advancements over the last 20, 30 years; he invented linear hashing in the 1980s. And Mike Zwilling, I mentioned, was the guy who was at the University of Wisconsin and helped write the Shore system that you read about in the "Looking Glass" paper. He was later hired by Microsoft to lead, or be part of, the team that was going to port Sybase to Windows NT. So SQL Server was originally based on the Sybase code. Microsoft bought out the license, which allowed them to fork it, make a bunch of changes, and sell it as a separate product. The legacy of SQL Server is that it came from Sybase, but certainly now SQL Server has vastly surpassed the capabilities and performance of Sybase. Sybase had a new version come out last year, but SQL Server is what I consider the cutting edge.

So in 2008 they started this new internal project, a secret project at Microsoft, to build a new in-memory OLTP engine for Microsoft SQL Server. The way to think about this database engine is that, in the same way that in MySQL you can replace InnoDB with RocksDB or other engines, or in MongoDB you can replace WiredTiger with RocksDB, they essentially wanted to do the same thing: they wanted the ability to plug and play different database engines all within the same SQL Server system. And they had two key design constraints that they had to deal with when they were building Hekaton, and these influenced how they designed their MVCC protocol. The first is that the new engine had to integrate seamlessly with the rest of the SQL Server system. That means they couldn't make a brand new system and sell it as a separate product, right?
And the reason is that there was already this vast SQL Server ecosystem that people were using that you didn't want to replicate, right? Things like administrative tools, Crystal Reports, and other utilities that already existed for SQL Server. If you put out a new system, you didn't want to have to replace all those things. So they had to make sure that Hekaton slid in seamlessly with SQL Server and used all the existing stuff.

The second issue they had to deal with is that they needed to make sure the new engine provided predictable performance for all OLTP workloads. What I mean by that is they couldn't have a new engine where 90% of the applications got much, much better performance, but then some unlucky 10% got worse performance than they would with regular SQL Server. You saw this behavior in the Silo paper when I showed that one graph with the partitioned version of Silo, which is essentially based on the H-Store or VoltDB protocol: if all your transactions are single-partition, it gets amazing performance, but once you go beyond something like 15% multi-partition transactions, the performance starts to degrade. Because of that, they couldn't choose the same kind of partitioned approach that H-Store or VoltDB was using, because it would be really hard to sell a product where 90% of your customers are going to do awesome and an unlucky 10% are going to do worse, right? Like, you couldn't sell a product where 90% of people are going to be fine and 10% of people get cancer. You can't sell that. Especially in the database world, where the customer wouldn't know whether they were in the unlucky 10% until they actually bought the new system. So because of that, they're going to forgo the optimal performance you can get when everything is single-partitioned, as with H-Store, in exchange for more reliable, more predictable performance.

So let's go through how they do MVCC. One difference we're going to see here from the previous protocols we talked about is that transactions are not going to have just a single timestamp. We saw with the basic timestamp-ordering protocol that a transaction was assigned a timestamp when it started, and we saw with OCC that a transaction was assigned a timestamp after it got past the validation phase. Now, in Hekaton and their version of MVCC, each transaction is given a timestamp when it begins, and then it's assigned another one later when it commits. We use these two timestamps to figure out the visibility of the various tuples.

Just like how transactions have two timestamps, the tuples themselves also have two timestamps. The begin timestamp will correspond to either the begin timestamp of the active transaction that created the version or the end (commit) timestamp of the committed transaction that created it. And the end timestamp is either the begin timestamp of the active transaction that created the next version, thereby invalidating this one, or infinity, meaning no transaction has superseded it yet, or the end timestamp of the committed transaction that created the next version. So the timestamps inside the tuples are either the begin timestamps or the end timestamps of transactions, and we'll still be able to figure out based on these timestamps whether a tuple is visible to us or not.
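To make that concrete, here is a minimal illustrative sketch in C++, not Hekaton's actual code, of a two-timestamp version header and the visibility test a reader would apply while walking a version chain. The struct and function names are assumptions for illustration.

```cpp
// Minimal sketch (not Hekaton's actual code) of the two-timestamp version
// header described above, plus the visibility check a reader performs while
// walking a version chain. All names are illustrative assumptions.
#include <cstdint>

constexpr uint64_t kInfinity = UINT64_MAX;

struct VersionHeader {
  uint64_t begin_ts;    // timestamp of the transaction that created this version
  uint64_t end_ts;      // timestamp of the transaction that superseded it, or infinity
  VersionHeader *next;  // chain runs oldest -> newest in Hekaton
};

// A committed version is visible to a transaction iff the transaction's begin
// timestamp falls inside the version's [begin_ts, end_ts) lifetime window.
// (Timestamps that still carry an active-transaction flag are handled
// separately, as described next.)
bool IsVisible(const VersionHeader &v, uint64_t txn_begin_ts) {
  return v.begin_ts <= txn_begin_ts && txn_begin_ts < v.end_ts;
}
```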
So let's look at an example, and hopefully this will make things more clear. Say we have a really simple table with two attributes, and we have a begin and end timestamp in each tuple. And because Hekaton is doing append-only storage going from oldest to newest, we also need a pointer that says, here's the next version in my version chain.

All right, so say we have a simple transaction that wants to do a read followed by a write. It calls begin to start the transaction and is assigned a begin timestamp; the way Hekaton allocates timestamps, there's a shared counter that they just increment using a compare-and-swap. Now I want to do a read. I'm going to find the record where the attribute equals John. So I do my lookup in my index, and it takes me down to the oldest version in the version chain for this key. In this case I land on this tuple, and now I do my comparison to see whether my transaction's begin timestamp falls in between the begin and end timestamps of this tuple. So 25 is not between 10 and 20, and we know we don't want this version. We follow the version chain, land down here, and 25 is between 20 and infinity, so we know we can read this one.

Now we want to do an update. Same thing: we go into our index, do our lookup, and land down here. What we're going to do is replace the end timestamp of this tuple with a special placeholder for our transaction; here I'm writing it as transaction 25. What Hekaton does is reserve a single bit in every timestamp that specifies whether the value is a real timestamp from a transaction that has already committed or a placeholder ID for a transaction that's still running. So we take our transaction ID, flip a single bit in the most significant position, and that tells us this timestamp corresponds to an active transaction. That way we don't always need a separate lookup to figure out what the value means. You may still have to do a separate lookup to see whether that transaction has committed or not, but if the bit is not set, then you know it's a real timestamp from a committed transaction. So they're reducing the extra lookup you would otherwise have to do to see whether this version was created by a committed transaction, simply by setting that single bit.

Now I can go ahead and create my new version. Same thing: in the begin timestamp I put my special transaction ID with that bit flipped, and I set the end timestamp to infinity to say that this is the newest version. Then we update the version chain, and then we can go ahead and commit our transaction, and now I get my end timestamp here. Then what I have to do is go back to every version that I modified or created as part of this transaction — so the old version and the new version — and flip the placeholder transaction ID that I put in those timestamps over to my commit timestamp. At that point the transaction is considered to be fully committed. So once we get past the validation phase, we get our end timestamp, and then we go back and update these timestamps to record the final commit timestamp of the transaction. Is this clear, right?
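Here is a rough sketch of that bit trick and of the shared timestamp counter. It reserves the most significant bit of a 64-bit timestamp word so a slot can hold either a real commit timestamp or the ID of a still-running transaction. The names and layout are assumptions, not Hekaton's real implementation.

```cpp
// Sketch of the flag-bit encoding and the shared timestamp counter described
// above. Names and layout are assumptions for illustration.
#include <atomic>
#include <cstdint>

constexpr uint64_t kActiveTxnFlag = 1ull << 63;  // MSB set => "active transaction ID"

std::atomic<uint64_t> global_ts_counter{0};

// Hekaton bumps one shared counter for begin and commit timestamps alike;
// fetch_add expresses the same atomic increment as a compare-and-swap loop.
uint64_t AllocateTimestamp() { return global_ts_counter.fetch_add(1) + 1; }

inline uint64_t MakeActiveTxnTimestamp(uint64_t txn_id) { return txn_id | kActiveTxnFlag; }
inline bool IsActiveTxnTimestamp(uint64_t ts) { return (ts & kActiveTxnFlag) != 0; }
inline uint64_t StripFlag(uint64_t ts) { return ts & ~kActiveTxnFlag; }
```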
So this looks a lot different from what we saw when we talked about basic timestamp ordering or MVCC before, because now my transaction has two timestamps and I'm setting this little bit to say that a timestamp belongs to an active transaction, right? I was sort of hand-wavy before, when I showed the example of how basic timestamp-ordering MVCC worked, and I mentioned that, well, you hold these write locks on the tuples but other transactions may still be able to read them; then you have to go check whether the writing transaction has actually committed or not. So this is what I was talking about. By using this little bit to mark that the timestamp is from an active transaction, I know I have to go check some other data structure to figure out whether that transaction has committed. But once the transaction finishes and that bit is flipped back to zero, I know it's committed and I don't have to do that extra check.

Yes? [A student asks what happens if another transaction comes along and tries to read this version before the first transaction commits.] Right, good, excellent — that's my next example. I'm going to rewind, because I didn't know how else to show this, and go back to exactly what you said. So the first transaction hasn't committed yet, and now we have another transaction that comes in and wants to do a read and a write. This transaction starts, its begin timestamp is 30, and it wants to do a read on John. Same thing: we hop on our index, we check here, we're not in the range; we check here, we're also not in the range. Again, we know that this flagged value can be treated as 25, just like a regular timestamp would be — we just have this little bit saying it's also from an active transaction. So we can't read that version, and we land down here.

Now, in Hekaton they allow speculative reads. This transaction is going to be allowed to read the version created by that other transaction, even though that other transaction has not committed yet. Then later on, when we talk about validation, it has to go through and ask: what data did I read, and did the transactions that wrote it actually commit? Because otherwise you would violate serializable order — you'd be doing dirty reads, right? So there's a big difference from the regular timestamp-ordering MVCC we talked about last class: we're allowing transactions to do speculative reads, but we pay a cost in terms of additional metadata to keep track of what we actually read.

All right, so now let's say this second transaction wants to do a write, and it wants to update that same record, John. It would be happy to try to create a new version here, but it's not going to be able to, because that transaction bit is essentially treated as a write lock on this physical version. And in Hekaton, the first writer wins, so this second transaction will be immediately aborted. It would come along the version chain, find this version, see that the begin timestamp has that bit flipped — meaning it was created by a transaction that has not committed yet — and therefore it can't create a new version. This second transaction immediately gets aborted and rolled back. So in Hekaton, they allow speculative reads, but not speculative writes. The first writer always wins: anything that comes after it and tries to write a new version of the same logical tuple gets aborted.
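Here is a minimal sketch of that first-writer-wins rule, reusing the illustrative VersionHeader and timestamp helpers from the earlier sketches. It is a hedged illustration of the behavior described above, not Hekaton's real code.

```cpp
// Sketch of first-writer-wins: before installing a new version, a writer
// inspects the end timestamp of the current newest version; if another
// active transaction already stamped its ID there, the newcomer aborts.
enum class WriteResult { kInstalled, kAbortFirstWriterWins };

// `latest` is assumed to be the newest version of the logical tuple.
WriteResult TryInstallNewVersion(VersionHeader *latest, uint64_t my_txn_id) {
  if (latest->end_ts != kInfinity && IsActiveTxnTimestamp(latest->end_ts)) {
    // Somebody else's flagged transaction ID is already here: they hold the
    // implicit write lock on this version, so we abort ourselves.
    return WriteResult::kAbortFirstWriterWins;
  }
  // Claim the version by stamping our flagged transaction ID into its end
  // timestamp; a real system would do this with an atomic compare-and-swap.
  latest->end_ts = MakeActiveTxnTimestamp(my_txn_id);
  // A new version is then allocated with begin_ts = our flagged ID and
  // end_ts = infinity, and linked into the chain (elided in this sketch).
  return WriteResult::kInstalled;
}
```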
All right, is this clear? Okay. So we talked about this before at a high level, but now we can get into more detail about the global map we need to maintain to keep track of what's going on with all the active transactions. Essentially there's a global hash map inside the database system that keeps track of every single transaction and its state. Some of the states are sort of obvious. If your state is active, it means you're still in the process of reading and writing and you haven't tried to commit yet. If you're in the validating state, it means the application has told the database server, I want to commit this transaction, start the validation process, and you go through the normal OCC checking we talked about before. If you get past that, your state gets flipped to committed, and the transaction has finished. I'm not talking about logging here, but presumably we've already logged its changes out to disk so we know everything is durable, and we can tell the application at this point that the transaction is committed. One caveat: if you did speculative reads, you would not get past validating until you know whether the transactions you read data from have actually committed. So at this point you can return control to the application, even though you may not yet have gone back and updated all those timestamps we set in the different versions to clear the bit saying our transaction was active. This is why I was saying before: when you're reading a version whose timestamps still carry that active bit, the transaction may have actually committed, so you can check the global hash map and ask, is this transaction still active or has it actually finished? If you see that it has committed, then you know you can go ahead and read anything it wrote without having to re-check later when you go to validate. And at the very end, the terminated state basically says there's no other transaction reading from us that we have to care about, there's no other transaction waiting for us to commit, and we've successfully updated all our different versions; at that point a garbage collection process can go through and start cleaning things up. Right? Yes?

The question is about which of the two structures a thread consults: there's the version's own timestamps in the tuple, and there's the global hash map that stores the state of transactions. When one thread has updated a tuple and another thread comes along and tries to update that same tuple, does it check the version of the tuple or does it check the global hash map? In this case here, I would come along and see that the begin timestamp for this version — the other transaction essentially wants to create another version here — has the active bit set to true. Then you could do a lookup in the map to see whether that transaction has actually committed or not. If it hasn't, then you just kill yourself immediately, right? First writer always wins.
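Here is a loose sketch of that global transaction-state map. The enum mirrors the states from the lecture; the map type and the mutex are simplifications for illustration, since Hekaton would use a latch-free structure here.

```cpp
// Sketch of the global transaction-state map described above; a reader that
// hits a flagged (active-transaction) timestamp can look up the owner here.
#include <cstdint>
#include <mutex>
#include <unordered_map>

enum class TxnState { kActive, kValidating, kCommitted, kTerminated };

struct TxnEntry {
  TxnState state;
  uint64_t begin_ts;
  uint64_t end_ts;  // assigned when the transaction enters validation
};

class TxnStateMap {
 public:
  void Register(uint64_t txn_id, uint64_t begin_ts) {
    std::lock_guard<std::mutex> guard(mu_);
    map_[txn_id] = TxnEntry{TxnState::kActive, begin_ts, 0};
  }
  TxnState GetState(uint64_t txn_id) {
    std::lock_guard<std::mutex> guard(mu_);
    return map_.at(txn_id).state;
  }
  // Transitions to kValidating / kCommitted / kTerminated are elided.
 private:
  std::mutex mu_;  // Hekaton would use a latch-free map; a mutex keeps the sketch simple.
  std::unordered_map<uint64_t, TxnEntry> map_;
};
```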
Okay, so what is some other metadata? What do we need to keep on a per-transaction basis to figure out whether it's actually safe to commit or validate our transaction? This is going to be important later on when we start discussing HyPer, because HyPer is not going to maintain all this extra stuff that Hekaton does. HyPer chooses to record less information because it wants to be optimized for OLAP queries, whereas Hekaton is trying to optimize for OLTP. HyPer still supports OLTP workloads, but for OLAP queries some of these things, as we'll see later on, become expensive.

So Hekaton keeps the read and write set for every transaction. The read set is basically, for every single tuple that I read, the physical version that's visible to me — I don't care about the versions I walked past along the chain that weren't visible; it's the one I land on that actually gets exposed to the application. I record a pointer to that version in my read set. The write set maintains pointers to different versions depending on what you did. If you updated an existing version, you need both the old and the new, because you're flipping those timestamps to keep track of what's visible, so you need to know how to find them again. If you delete, you just keep the old version, and they put a little tombstone at the end of the version chain to say this thing has been deleted. And if you insert a new one, obviously you just point to your own new version.

Two additional things we have to maintain are the scan set and the commit dependencies. Commit dependencies are sort of obvious: it's the list of all the transactions that we know are waiting for us to commit. If one transaction does a speculative read on data that we wrote, then it adds itself to our commit dependency list, and when we commit, we go through the list and notify them, sort of in a pub/sub manner: hey, I've committed, do whatever you need to do. The scan set, though, is a bit tricky. The scan set is all the information we need in order to re-execute any scans that happened while our transaction ran, for any query the transaction executed. This is different from the read set: the read set is the individual versions, the individual tuples, that I read; the scan set is more like the where clause of a query that was executed. We need this because we have to go back and re-execute those scans to figure out whether anybody inserted or deleted something in the range of things we scanned.

I'm not going to get too far into Hekaton's storage architecture, but essentially the tables are organized as hash tables, and the buckets in your hash table contain all the tuples. In Hekaton, they don't actually support scanning the raw tables; you always have to walk through the hash table. So you need to know all the range queries you did on that hash table in order to figure out later on whether there were any phantoms. This is essentially the same thing we saw in Silo: Silo had to maintain scan sets, and when we want to validate our transaction, we have to run those scans again.
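Putting those pieces together, here is an illustrative, simplified layout of the per-transaction bookkeeping just described — read set, write set, scan set, and commit dependencies. These are assumed names and shapes, not Hekaton's structures; the scan-set predicate is shown as an opaque std::function purely for the sketch.

```cpp
// Assumed, simplified per-transaction metadata for Hekaton-style validation.
#include <cstdint>
#include <functional>
#include <vector>

struct VersionHeader;  // from the earlier sketch

struct WriteEntry {
  VersionHeader *old_version;  // null for inserts
  VersionHeader *new_version;  // null for deletes (tombstone at end of chain)
};

struct ScanEntry {
  // Enough information to re-run the scan at validation time: which index
  // (hash table) was probed and the where-clause predicate that was applied.
  void *index;
  std::function<bool(const void *tuple)> predicate;
};

struct TxnMetadata {
  std::vector<VersionHeader *> read_set;   // versions actually returned to the query
  std::vector<WriteEntry> write_set;       // old/new version pairs
  std::vector<ScanEntry> scan_set;         // scans to repeat for the phantom check
  std::vector<uint64_t> commit_dependents; // txns that speculatively read our writes
                                           // and are waiting for us to commit
};
```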
So in order to do transaction validation, there are two steps we need to do. There are no write-write conflicts to worry about, because if I try to write to the same tuple as somebody else I get aborted; it's really the read-write conflicts we have to deal with. The first check is called read stability: we go back and see whether every version of a tuple we read is still considered visible at the end of our transaction. We have to do this because we allow speculative reads — we may end up reading a version that later gets rolled back because the transaction that created it aborts. So we go back and do an additional read to see whether that version is still visible to us. And then, to avoid phantoms, we essentially have to re-execute all the scans we ran before to see whether the result set changed between the first time we ran them and now.

You may think this kind of sucks, because it could be really expensive. The way to think about it is this: if you have a complex query that computes some kind of aggregation, or invokes a UDF that computes the millionth digit of pi or whatever, you don't have to re-execute all of that. Whatever is in the projection list or select list you don't re-execute; it's really just the where clause I have to reapply. I go back to the raw data itself — the actual hash table for the table — and re-execute the scan to see whether I get a different result. So that part isn't so bad, but if you scanned the entire table, this would obviously be a really bad thing to do. Hekaton is not designed for analytical, OLAP queries; it's a pure OLTP engine.

Now, how much validation you have to do for these two additional steps depends on what isolation level you're running at. For serializable isolation you have to do both. For repeatable read, you just need to do the read stability checks. For snapshot isolation and read committed, you essentially don't have to do either, because the snapshot by itself guarantees these things. In the case of read stability, I couldn't have read anything from a transaction that was still active when I started, so I'm guaranteed a consistent snapshot of the database. I don't have to check whether a version I read is still visible when I go to validate, because if it wasn't visible I wouldn't have read it in the first place — so that one is sort of obvious. And for read committed, essentially it's the same story.
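Here is a hedged sketch of how the amount of validation varies with the isolation level, per the description above. The two helper functions are placeholders standing in for the read-stability re-check and the phantom re-scan; their names are assumptions and they are not implemented here.

```cpp
// Sketch: how much validation each isolation level needs in this scheme.
enum class IsolationLevel { kSerializable, kRepeatableRead, kSnapshot, kReadCommitted };

bool CheckReadStability(const TxnMetadata &txn);    // is every version we read still visible?
bool RerunScansForPhantoms(const TxnMetadata &txn); // re-execute each scan, compare results

bool Validate(const TxnMetadata &txn, IsolationLevel level) {
  switch (level) {
    case IsolationLevel::kSerializable:
      return CheckReadStability(txn) && RerunScansForPhantoms(txn);
    case IsolationLevel::kRepeatableRead:
      return CheckReadStability(txn);
    case IsolationLevel::kSnapshot:
    case IsolationLevel::kReadCommitted:
      return true;  // the consistent snapshot already provides these guarantees
  }
  return false;
}
```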
All right, so I'm going to show one graph, a performance measurement from their paper. For this, they're comparing their multi-version optimistic concurrency control protocol against a pessimistic variant. When people say, oh, we use the Hekaton protocol, they essentially mean the optimistic one. The paper as it's written starts off describing that protocol, and then there's another section that comes afterward that says, by the way, here's how to do basically the same thing in a pessimistic manner. So the optimistic protocol is the multi-version OCC doing all the things we talked about before, with the phantom checks and the read-stability checks. Their pessimistic version is essentially strict two-phase locking, with shared and exclusive locks on the records as well as on the hash table buckets. That's one way they avoid phantoms: the data is logically organized into these buckets, and I know how to lock a bucket to prevent somebody else from inserting an entry into a range that I scanned, because you know where the insert would go, so you don't need any additional validation. And they're doing deadlock-detection-based two-phase locking, so they need a separate background thread that periodically looks at the waits-for graph, decides whether two threads are deadlocked on each other, and then breaks them up.

So this is the one graph I want to show you. It's a really small table with only 1,000 tuples, and they're running 80% read-only transactions and 20% update transactions. Along the x-axis they're scaling the number of threads used to execute transactions. Up to around six threads the two are essentially the same, but beyond that we see that the optimistic version of Hekaton performs the best. You might think the gap at 24 threads isn't that much, but you've got to look at the y-axis scale. They're executing about 1.5 million transactions per second with the optimistic version, and for the pessimistic version with two-phase locking they're executing just a little over 1.1 million. So the difference between these two lines at that point is something like 330,000 transactions a second. That's a lot. We've run similar experiments — maybe not on the exact same hardware — with PostgreSQL and MySQL, and they can maybe do 30,000 or 40,000 transactions a second. So the gap between these two lines alone is 10x what MySQL and PostgreSQL can do. Pumping 1.5 million transactions per second through a single box is really, really fast. I think they're not actually running through the full SQL Server stack when they show these numbers, and I also don't think they're doing logging, so this is pure in-memory performance, but it's quite significant. If we have time I can show you numbers at the end, but in a lot of the talks about Hekaton that came out around the time the paper was released, they show examples from real-world workloads where they're getting about a 5x performance improvement over what regular SQL Server can do. They also show how the latency is cut down significantly — of course, because everything is in memory, you can write things really fast.

So the main lessons you can take from the Hekaton work — the two main takeaways they argue for — are, first, that when you design your in-memory transaction processing database system, you should strive to only use lock-free data structures. That means no latches, spin locks, critical sections, or mutexes for your indexes, your transaction state map, your memory allocator, or your garbage collector's internal state. I actually disagree with this, and we'll see why next week: it turns out that using lock-free or latch-free indexes isn't as great as they claim. In particular, one of the big contributions of the Hekaton system is this thing called the Bw-Tree — I think that's the assigned reading for Monday next week. This is the index we use in Peloton, and as we'll see from experiments we've done here at CMU, it gets crushed by indexes that are not latch-free.
And then the skip list, which is what you guys are implementing, is even worse. So we'll try to understand next week why, okay?

They also argue that you want to minimize the number of serialization or coordination points you have in your database system. The only place they explicitly coordinate transactions is through the timestamp counter: they have a shared counter used by all the threads for both the begin and end timestamps, so everything is on the same time scale, and they just do a compare-and-swap to increment it. Now, we saw in the Silo paper that they argued that's a bad idea, because if you have a lot of cores, every time you compare-and-swap and increment that counter, any other core whose thread has read that counter gets a cache invalidation message. And if you're executing a lot of transactions per second, that's a lot of traffic on your chip. In the Silo case they avoid this, or try to minimize it, by doing epoch-based batching; Hekaton doesn't do that.

So, any questions about Hekaton? In the back, yes. The question is how the commit dependencies are communicated between threads. If I do a speculative read and I read something from a transaction that hasn't committed yet, I know which transaction that is, because I see its transaction ID in the begin timestamp. So I say, all right, I read this, I do my lookup in the transaction map, get a pointer to the commit dependency list for the transaction I read from, and I add myself to it. Then when that transaction commits, it notifies me: hey, I've committed, go ahead and do whatever you need to do.

Next question: if I do a speculative read on data from a transaction that hasn't committed yet, and then that transaction aborts, what happens to me? Well, you wouldn't get past validation — you can't validate until you know whether the transaction you read from actually committed. So you can't commit until the other guy commits. And yes, the check is essentially what you'd expect: you go back and read that same version again, and you check whether the begin timestamp has been flipped to say the transaction committed. I think they spin, waiting to see whether the other transaction commits.

In the back, yes? The statement is that speculative reads seem like they would lead to a higher abort rate. Again, it's a cop-out, but it depends on the workload. If everybody's trying to update the same record and you read that record, then it's likely the thing you read is going to get aborted. So yes, but there are certainly cases, when there's low contention, where it's a good trade-off. The follow-up: not having speculative reads would surely lead to fewer aborts, so why not just have the transaction read one version earlier instead of allowing speculative reads?
So your suggestion is, instead of allowing a transaction to read the uncommitted latest version speculatively, have it read the version before. Let's go back to this example. You're saying that for this transaction, instead of allowing it to read this version speculatively, it's better to read the earlier one. You can't, right? My begin timestamp is 30; I need to read the version that came after 25, because the older version is not the one that should be visible to me. If this transaction read the older version instead, that's not serializable, because I should read the version that would appear if I executed these transactions in serial order, this one followed by this one. I can't read the old one when I should have read the new one; that violates serializable order.

The follow-up: doesn't it depend on the isolation level? Yes. If you run at read uncommitted, you read the uncommitted version and you don't care if it aborts or not — who cares, right? If you run at the next level up, read committed, you would read the older version, because it's the committed one. And if you try to read it again after that other transaction commits, you may end up with the new version, which is not repeatable, but that's okay under read committed. There was another question that broke up a little — I believe it's whether they have some way to wait on a write-write conflict instead of immediately aborting. We'll get to that, because Cicada has something along those lines; we'll see it at the end. Okay, any other questions? I like the Hekaton paper because it's well written and I think it explains the trade-offs and design decisions for implementing a protocol like this in a really clear way. At least I can understand it.

So let's make some observations about how Hekaton works. The first is that the read-set and scan-set validation we have to do is expensive if we have transactions that access a lot of data. In their experiments, when I showed the graph, transactions are reading and writing maybe a small number of tuples, like fewer than ten. And if you're executing the YCSB workload, most transactions only read or write a single tuple. Maintaining the read set for that is cheap, because it's a single pointer to a tuple, and the scan set is essentially cheap as well, because it's a point query into the index to find the one thing I want. But if I'm trying to execute an analytical query that scans a large segment of the table, then recording every single tuple I look at in my read set is going to get expensive. If I have a transaction that reads a billion tuples, I have to put a billion pointers in my read set. Now, we said before that you can maybe declare your transaction as read-only and disable all of this read-set and scan-set validation. I don't know how common that is in real applications — we don't have customers, we don't know what real workloads look like — but certainly storing every single pointer is expensive.

The next issue is that, again, if I'm doing large scans, then I always potentially have to chase these pointers down my version chains to find the version I should actually be reading. That means you have branches, you have to jump to different locations in memory — and that's going to hurt your cache locality, that's going to hurt your instructions per cycle.
And the last observation is that doing record-level conflict checks — did I write to the same tuple that you wrote to — may actually be too coarse-grained for some transactions, and may end up leading to false aborts, where I think there's a conflict when there actually isn't one.

So, based on these observations, the HyPer guys came up with their own version of MVCC. HyPer is designed for HTAP workloads, so they want to support fast transactions and fast analytics in the context of the same database. The design decisions that the Hekaton guys made, like maintaining pointers for every single version I read, are going to be insufficient, because if you're scanning a lot of data you're going to have to record a lot of pointers. So HyPer does delta-record versioning, with version chains going from newest to oldest, and they try to do in-place updates for any attributes that are not indexed. If I update an attribute that is indexed, then I do a delete followed by an insert, like we talked about last class when we handled primary key updates, and that essentially creates a new version chain. But the key is that they're not going to do the scan checks or the bucket locks the way Hekaton does when they want to enforce serializability. We'll see what they do in a second, but essentially they have a more efficient way of checking these things without having to re-execute every single scan over and over again. And just like Hekaton, they avoid write-write conflicts by aborting any transaction that tries to update a tuple that somebody else is already updating.

So here's a high-level overview of what the architecture looks like. As I said, it's a column store, so every attribute is stored contiguously in a block of memory. Then they have a separate column, called the version vector, that holds pointers to the older versions of each logical tuple. One difference from the high-level scheme we talked about before is that HyPer's rollback segments — the undo buffers holding the delta records — are organized on a per-transaction basis. Rather than having a global space where we put all the old records or deltas, every transaction has its own local memory pool, and as it creates new versions it copies the old version, or the delta of it, into that local memory. They do this for performance reasons: there isn't a single global region of the heap where every transaction has to do a compare-and-swap or acquire a lock in order to insert a new delta record. Each transaction knows that nobody else can be writing to the same memory, so it can just append new entries without coordinating with anybody else.

The key thing about this architecture — the reason they want it — is that when you do analytical queries, most of the time you can rip right through your column store and not follow the old versions. There are cases, as we'll see when we talk about version synopses, where you do have to follow the version chain to make sure you read the right thing. But if you're storing your entire database in this system, only a small portion of it is actually going to be modified at any given time.
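Here is a simplified sketch of that layout: columns holding the newest in-place values, a version vector pointing to undo records for the few rows that have older versions, and per-transaction undo buffers holding the before-images. All names and the fixed-width value type are assumptions for illustration, not HyPer's actual code.

```cpp
// Assumed, simplified HyPer-style column store with per-transaction undo buffers.
#include <cstdint>
#include <vector>

struct UndoRecord {
  uint64_t version_ts;     // timestamp of the transaction that made the change
  uint32_t column_id;      // which attribute this before-image belongs to
  uint64_t before_value;   // the old value (fixed-width for the sketch)
  UndoRecord *next_older;  // delta chain runs newest -> oldest
};

struct TransactionUndoBuffer {
  // Each transaction appends only into its own buffer, so creating a new
  // delta record needs no global latch or compare-and-swap.
  std::vector<UndoRecord> records;
};

struct ColumnBlock {
  std::vector<uint64_t> values;          // master (newest) values, scanned directly
  std::vector<UndoRecord *> version_vec; // per-row pointer to older versions, mostly null
};
```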
All your cold data — things from days, weeks, months ago — isn't going to have extra versions, so you can rip through those columns very efficiently without having to chase pointers. That's why they go with this architecture.

Now, the dirty secret about HyPer is that while the paper claims it's multi-threaded, to the best of my knowledge from talking with their developers, transaction execution is actually single-threaded. I don't know what's in the commercial version at Tableau, but the academic version, as far as I know, runs single-threaded transactions and multi-threaded analytical queries — and that was so fast that it was good enough for what they were targeting. For our purposes, it doesn't matter.

All right, so let's see how they do validation. Again, like Hekaton, they do first-writer-wins, and that means we don't have to check whether the write sets of transactions overlap. And the version vector always ends up pointing to the latest committed version, because if I go to create a new version and see that the master version in the main data table is from a pending transaction, then I know I'm trying to write something that somebody else has already written to and hasn't committed yet, so I can immediately abort myself. So there's essentially only one uncommitted version of a tuple in the system at a time.

But the way they check for read-write conflicts is actually kind of cool and much different from any other system. What they do is check the delta records generated by other transactions. If I'm a transaction that's validating, I check the delta records created by all the transactions that committed after I started, to see whether any of my queries overlap with the data they wrote. They use a technique called precision locking. What's really cool about this is that the precision locking paper is from the early 1980s, from before I was born, and basically no one has ever used it. The paper has something like 55 citations, which is ridiculous for a 35-year-old paper. It was essentially forgotten; the HyPer guys somehow pulled it out and said, oh, this is actually what we want to use, and that's what they're using. Precision locking is a variation of predicate locking, but instead of checking whether predicates overlap with each other, you just check whether the other transactions' write sets overlap with your predicates. I'll go through an example of this. It means we don't need to record all the individual pointers to the tuples we read, and we don't need to record the scan set like we did in Hekaton; we just record our predicates and compare them against the write sets of the other transactions.

So let's look at an example. Say this is a transaction that I want to commit, and it executed three analytical queries. Over here we have the delta storage: the delta records that were created by transactions that committed after this transaction started. Say this transaction started at a given timestamp, and all these other transactions committed after that. Now I need to go check whether the where clauses of these queries overlap with their write sets.
I don't have to check any transaction that committed before I started, because under snapshot isolation I would have been able to read its writes already; it's only the ones that committed afterwards. So in this case, I look at the where clause. This one says attribute two greater than 20 and attribute two less than 30. I look in the delta records for this first transaction and see that it modified attribute two. So I pluck out those values, substitute them into my where clause, evaluate my predicate, and it comes out false. Therefore I know I don't have an overlap here. Same thing for the next one: I substitute in the attribute-two values it generated, evaluate my where clause, and it's false, so no overlap. Same for the last one down here — false. So I just keep going down the line, and I do this for every predicate and every record in the delta storage. Then I get to the last one here, where the predicate is a wildcard match on "Ice" and the delta record contains Ice Cube, so it evaluates to true. Therefore I have an overlap here, which means I have a phantom, and I have to abort this transaction. So what's going on? When I ran this query the first time, I didn't see this record, because that transaction hadn't committed yet. But now that I'm validating, I run this check again and I see Ice Cube, which I didn't see before. So I know I have a phantom and I have to abort.

They do a bunch of things on top of this, like ordering the delta records they evaluate by commit order, starting from the oldest committed ones, and there are other tricks they can do to speed things up. And underneath the covers, the way they represent these precision locks is essentially a tree data structure: they convert the where clauses into a tree, and then you can apply the delta values against it at runtime very efficiently.

Yes? The question is about the three transactions we're checking against — they all have commit timestamps that come before ours, right? So again, these transactions have already validated and committed; they're done. We still maintain their undo buffers because there may be a transaction, like ours, that started before they committed, and therefore they might overlap with it, so we have to check them. But once the system knows there's no other active transaction that started before these guys committed, it no longer needs to maintain their undo buffers — their versions are already installed in the main data. So again, this avoids having to do the scan-set checks that Hekaton and Silo did.
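Here is a rough sketch of that precision-locking check: evaluate each of the committing transaction's predicates against the values recorded in the delta records of transactions that committed after it started, and abort on any match. Types and names are illustrative assumptions, not HyPer's implementation.

```cpp
// Sketch of a precision-locking validation pass: predicates vs. delta records.
#include <cstdint>
#include <functional>
#include <vector>

struct DeltaValue {
  uint32_t column_id;  // attribute that the committed transaction wrote
  uint64_t value;      // the value it wrote
};

using Predicate = std::function<bool(const std::vector<DeltaValue> &)>;

// Returns true if the transaction may commit; false means a predicate "fired"
// on somebody else's committed write (a read-write conflict or phantom).
bool PrecisionLockCheck(const std::vector<Predicate> &my_predicates,
                        const std::vector<std::vector<DeltaValue>> &recent_commit_deltas) {
  for (const auto &delta : recent_commit_deltas) {  // one entry per committed txn
    for (const auto &pred : my_predicates) {        // one entry per where clause we ran
      if (pred(delta)) {
        return false;  // overlap detected -> abort and restart
      }
    }
  }
  return true;
}
```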
The other cool thing that HyPer does that I think is pretty clever is what they call version synopses. The basic idea is that they keep a small extra structure per block that tracks the positions, or offsets, in the columns where the first versioned tuple starts and where the last versioned tuple stops. The way to think about this: every position is an offset, and these columns are fixed-length, so I can easily compute my offset when I want to jump to a particular position. They organize things in blocks of 1,024 tuples — these are actually called morsels; we'll see that later on in another paper. So say my synopsis for this block says that the first versioned tuple starts at offset two and the last one ends at offset five. Then I know that anything outside that range has no older versions, so when I code-gen the scan on this table, I know that up to a certain point I don't need to check the version vector at all. I don't have to do any branching, I don't have to go down the version chain and figure anything out; I can just rip through the data and process it very efficiently. Then, when I get within the range defined by my version synopsis, I know the generated scan has to do a bunch of extra checks. So this is another example of an optimization they do in HyPer to speed up analytical queries while still supporting multi-versioning and transactions.

Okay, any questions about HyPer? HyPer is a system we're going to see all throughout the semester. Yes? For precision locking, you mentioned they use a sort of tree to represent the locks — is there a reason they don't just compile those down, like they do with query compilation? They probably do compile it; everything in HyPer is compiled. In the paper they talk about it in terms of trees, but you can easily compile that down, and they probably do.
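Before moving on, here is a hedged sketch of the version-synopsis scan described a moment ago: only rows inside the synopsis window need the slower check against the version vector; everything else is summed straight out of the column. Field and function names are assumptions for illustration.

```cpp
// Sketch of a version-synopsis-aware column scan over one block.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct BlockSynopsis {
  size_t first_versioned;  // offset of the first tuple with an older version
  size_t last_versioned;   // offset of the last tuple with an older version
};

// Sums one column of a block; only rows inside the synopsis window consult
// the version vector and (possibly) follow the undo chain for visibility.
uint64_t ScanBlock(const std::vector<uint64_t> &column,
                   const std::vector<const void *> &version_vec,
                   const BlockSynopsis &syn,
                   const std::function<uint64_t(size_t)> &resolve_visible_value) {
  uint64_t sum = 0;
  for (size_t i = 0; i < column.size(); i++) {
    if (i >= syn.first_versioned && i <= syn.last_versioned && version_vec[i] != nullptr) {
      sum += resolve_visible_value(i);  // possibly-hot row: follow the undo chain
    } else {
      sum += column[i];                 // cold row: no branch into version logic
    }
  }
  return sum;
}
```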
Okay, so now we can talk about Cicada. Cicada is an OLTP engine in the same vein as Hekaton, but it's a modern variant of it. They're doing optimistic multi-versioning with newest-to-oldest, append-only storage. The high-level idea is essentially the same as what we saw in Hekaton and what we talked about last class for multi-versioning, but they have a bunch of optimizations that I think are interesting and want to focus on. One of the key design decisions is that they want the system to work well for both low-contention and high-contention workloads. They talk about how, under low contention, a single-versioned system works better and is more efficient than a multi-versioned one, because you don't have transactions that need to go back and read older versions — most transactions can just read the latest version and that's good enough. So they're trying to balance both cases, low contention and high contention, and there are a number of optimizations they do to handle that. I'm going to focus on the inlining, the validation, and the index nodes, because those are the really interesting parts. The loosely synchronized clock stuff is one way to do timestamp allocation; TicToc is another way, Silo is another. The only thing I'll say is that what they do is very clever: they're basically applying techniques from distributed systems but making them work on a single-node system, which I think is very, very cool. For the sake of time, I'm not going to go through the details, but essentially they allow the per-thread clocks to drift and then have a way to sync them up every so often with low coordination.

All right, so Cicada is an append-only system, but as we saw in the case of HyPer, HyPer makes the argument that having to traverse the version chain can be really expensive. So what Cicada decided to do instead is have a single indirection structure that keeps track of the head of the version chain: essentially an immutable pointer array for the logical tuples, where if you jump to a given offset, there's a pointer that gets you to the first (newest) version in that tuple's chain. And what they're going to do is actually try to pack the latest version of a tuple into this centralized data structure itself. So this essentially ends up looking like a delta-storage scheme, but it lets them decide dynamically, on the fly, whether to inline the latest version there or not. In this example, say I have three tuples: for the first one, the pointer points out to the version chain, which is some other tuple in our table storage; but for the other two, the pointer ends up pointing to a version that's embedded right in this data structure. The paper talks about how, if the tuple is the right size and is read-mostly, then you want to pack it in here. So this essentially allows them to be as flexible as append-only storage while still getting the benefits of a delta-storage versioning model, by having the master tuple — the latest version — live in a centralized data structure. They're not targeting analytical queries, but it makes it easier to scan through and find exactly what you're looking for rather than having to follow the chain, because that indirection layer, jumping from one version to the next, hurts your cache locality and becomes expensive. So they try to reduce that with inlining.

The next thing they have is a way to do fast validation. The interesting thing they do is keep track of the amount of contention in the system and avoid a bunch of additional checks that you would normally do in the regular protocol. The idea is that you relax some of the validation steps; you're not violating your serializability guarantees, you're just being less paranoid about the checks. The first is that they order the entries in the write set you need to validate against by how recently they were modified. In TicToc and Silo, they sorted them based on the primary key to ensure there were no deadlocks, so that all the threads validating at the same time validate their tuples in the same order. Cicada instead orders them by how recently they were modified. The idea is that if there's a highly contended tuple that everybody's always trying to modify, you want to check that one first to see whether there's a conflict, because you want to abort your transaction as early as possible.
Right, because otherwise, if it's the last tuple you check, you do a bunch of validation and checking — wasted instructions — only to find out later that your transaction is going to abort, and you've wasted a bunch of work. Likewise, they do pre-validation before they actually install any global writes. This is essentially the same consistency-check idea; it's derived from the TicToc paper, which we didn't cover, but again, since global writes into the shared database are expensive, you try to kill yourself as soon as possible before doing wasted work. And the last one is that they do an incremental version search, where they keep track, for a tuple you've already read, of which version in the chain you actually wanted, so that when you read that same tuple again you can jump straight to that position rather than checking every single version. So the main idea here is that by being contention-aware — if you know your most recent transactions have committed successfully without having to abort and roll back — then you can skip these steps. And you still don't violate your serializability guarantee, because you still have to do the full validation at the end.

Right. So now, the other really cool thing about Cicada — and as far as I know nobody else actually does this; when I read it, it was sort of tucked away in the paper, and I was like, oh shit, this is actually really cool. I feel like they should have promoted this more than the loosely synchronized clocks. To me this is really novel, and I don't actually know whether it's a good idea in a general system, but it's certainly something we should think about. We'll see this on Wednesday, but there's this big tension in a transactional database management system between the concurrency control protocol we have for the tuples in the tables and the concurrency control protocol we have in our indexes. The issue is that with indexes we don't want to hold locks or latches for the duration of the transaction, because that would prevent other transactions from accessing the index. And we don't actually care about the physical layout of an index, so we don't care if the underlying data structure gets modified between reads within the same transaction; we only care whether the logical contents have changed. Meaning, if I have a B+tree, I can reorganize the nodes during your transaction, and as long as you look up a key and get back the same result — as long as the reorganization always produces the same logical answer — that's all we care about. We'll see later in this class that there's all this extra stuff people do — crabbing, gap locks, and things like that — to handle phantoms in a data structure whose underlying physical storage can change.

So in Cicada they throw all that extra crap out and just say: store the index inside the table itself and treat its nodes as regular tuples. We're already doing concurrency control for the data table tuples; now we do the same concurrency control for the index nodes. The way to think about this is that the index node itself is an actual tuple, and because it's a multi-version database, we can now have multiple versions of our index nodes. All Cicada stores for a node is a blob — some block of memory that has the key array and the value array, plus the low and high keys you would store in a regular B+tree — though you could actually store these things in separate fields.
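Here is a speculative sketch of that idea: a B+tree node stored as just another multi-versioned record, so index reads and writes flow through the same MVCC machinery as ordinary tuples. The layout is an assumption based on the lecture's description (key array, value array, low/high fence keys packed into a blob), not Cicada's actual code.

```cpp
// Sketch of an index node stored as a versioned record in the data table.
#include <array>
#include <cstddef>
#include <cstdint>

constexpr size_t kFanout = 16;

struct IndexNodePayload {
  uint64_t low_key;                      // fence keys bounding this node's key range
  uint64_t high_key;
  uint16_t num_entries;
  std::array<uint64_t, kFanout> keys;    // sorted keys stored in the node
  std::array<uint64_t, kFanout> values;  // child pointers or record identifiers
};

struct VersionedIndexNode {
  uint64_t begin_ts;          // the same version header a regular data tuple gets
  uint64_t end_ts;
  VersionedIndexNode *older;  // older versions of this node, newest -> oldest
  IndexNodePayload node;      // the "blob" mentioned above, stored inline
};
```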
So now you can do other sophisticated things like, well, I updated the keys but I didn't touch the values, so I don't need to do validation on that part. This is really fascinating, because essentially you get the serialization protocol you'd have for regular tuples, but you get it automatically in your indexes, because you stored the indexes in the table. And as far as I know, as of 2018 — unless somebody else does this and it's in a paper I missed, or a commercial system does it — I can't think of any other system that actually does this. I think it's really clever, and I think it's one of the key contributions of this paper. You don't have to do the predicate locks and precision locking from HyPer, and you don't have to do the scan-set validation you did in Hekaton. If the node is consistent — if the version of the index node you read is the same one when you go to validate — then you know nobody could have inserted a new entry, and you're good. Now, you have to be careful about things like the fact that you really only care about whether the leaf nodes changed; the internal nodes can change, because you're reorganizing. There's some extra stuff you have to do to make this actually work nicely, but again, you don't need that extra phantom-avoidance machinery. I thought that was really clever.

Okay, so let's look at the results from the Cicada paper. Hyeontaek Lim is a postdoc here at CMU; he did his PhD with Dave Andersen, and he is an amazing hacker. He basically took all the state-of-the-art MVCC protocols — except for HyPer's, because it's a column store — implemented them in his framework, and ran experiments to see how well they perform as you scale up the number of threads. The first workload is low contention: it's YCSB, where you're reading or writing a single tuple per transaction, with 95% reads and 5% writes. We can't have conflicts on read-read accesses, so there isn't much validation to do. What you see is that there's a cluster of protocols that do quite well: Cicada; FOEDUS, which came out of HP Labs from Hideaki Kimura; his optimized version of Silo; the original version of Silo; and I think TicToc is in there as well, which is the protocol from the paper I worked on with a student at MIT. All of these protocols work pretty well as you scale the number of threads. No surprise, two-phase locking performs poorly, because you're holding locks — some transactions hold exclusive locks and you can't do speculative reads against them. I forget the details of why ERMIA and Hekaton perform poorly here. But this shows that with a low-contention workload, even though some of these systems are single-versioned, Cicada is multi-versioned and performs just as well as the single-version ones.

The real difference, though, is when you go to high-contention workloads. For this, they run something very extreme: the TPC-C workload, with all five TPC-C transactions, but with only one warehouse.
And so, you'll learn more about TPC-C over the course of the semester, but a large portion of the workload is this thing called the NewOrder transaction. It's like placing a new order on Amazon. And the reason why the number of warehouses is significant is because every NewOrder transaction has to update this counter inside the warehouse table. We only have one tuple in the warehouse table, so everybody's trying to update this single counter. And what you see is that all the protocols now perform poorly. ERMIA, which was actually one of the worst ones before, is now doing the best up until about 24 threads. But beyond that, Cicada actually performs the best. It still plateaus here at some point; if you pushed this to more threads, I think it would stay flat, essentially because it ends up being serial validation and only one transaction can commit at a time. But the overhead of getting that one transaction to commit is much lower than the other guys. So, any questions about Cicada? Again, I think the main contribution is storing the index nodes as tuples. I think the best-effort inlining is interesting, but it essentially ends up being the same thing as the delta storage model. And the loosely synchronized clocks are an efficient way to do timestamp management.

Why does Silo fall off so sharply? The question is, why does Silo fall off so sharply? I don't remember, yeah. I think this version of Silo came from Xiangyao, who did TicToc with me, so I think there's something in his implementation of Silo that doesn't match exactly what the original Silo did. Like, I know it also wasn't using Masstree, which is the index that Eddie Kohler built for Silo, whereas Silo Prime, this one here, that's actually with Masstree, and there are some other optimizations they do as well. Yeah, but I don't know why it actually falls off. Yes?

So, with the index nodes now also being objects in the read set and write set, does that mean two transactions that previously accessed different objects could now conflict because those objects happen to be on the same leaf node? So, his question is: if I'm now storing the leaf nodes as tuples, and I have two transactions that access the same leaf node, and to be clear, is it a read or a write? A write. Right, so they both write to the same leaf node. If they both write, say both transactions are updating and creating a new version, then because the pointer is gonna point to that master version list, I don't want to update that every time I create a new version; that's the indirection layer we saw last class. But certainly, if two guys try to insert into the same leaf node, there'll be a conflict. But you would have that anyway with index locking, as we'll see next class, right? So, it's the same thing. Yeah, in the back.

So, Cicada shows great performance across all these workloads, so what's the catch, and why isn't it being used everywhere? Okay, so his question, or statement, is that in these charts here, Cicada does amazing, right? So why isn't everybody rewriting their system to actually use this? I would say that I don't actually know how it performs for analytical queries, and I think that would be a big drawback.
And I feel like the Hyper guys make certain design decisions for analytical queries that are really much better, right? I think in the paper they only look at TPC-C and YCSB, and those are OLTP workloads. Yeah, so I think that's the answer. But certainly I think storing indexes in the tables is a good idea that other people should consider.

Okay, so the protocols we showed here today and last class were all about maintaining a serializable ordering of transactions. They had different ways of checking for phantoms, and again, phantoms are always an issue when you have range scans. We'll see in the next lecture on Wednesday what I'll call the more traditional ways to deal with phantoms, by actually doing locks and latching inside of your indexes. It's also my opinion that Hyper and Cicada are the state-of-the-art in-memory MVCC protocols as of February 2018. Hekaton is very good, but again, they're not focusing on that anymore. MemSQL doesn't support serializable isolation, and they're more worried about distributed OLAP. But in terms of single-box, in-memory, multi-version concurrency control, I think these two systems are the best. And I regret that our system doesn't do these things, but let's take that offline. Okay.

All right, so project two. You guys are gonna implement a skip list in Peloton. Again, a skip list is a latch-free data structure. Skip lists are actually from the 1990s; they've been around for a while, but they only really came into vogue when people became interested in lock-free or latch-free data structures and algorithms. The story actually comes from the Hekaton guys. When they first started building Hekaton, they were super into skip lists, and they had all of these presentations internally at Microsoft about how great skip lists were, right? And the MemSQL guy was there at the time, working on the SQL Server team, not on Hekaton, just the general team, and he saw all these talks about how great skip lists were. Then he left about halfway through the Hekaton project, went to Facebook for a while, then went off and founded MemSQL and started building their system. And since he'd seen all those talks at Microsoft about how great skip lists were, that's why MemSQL is big on skip lists today. The problem is he didn't see the second half of the talks from the Hekaton guys that said skip lists are actually bad, and then they came out with the Bw-Tree and said that was better. There are better things than the Bw-Tree, but the reason why we're having you guys implement skip lists is because it covers all the important things that we care about in latch-free data structures and garbage collection, and it's something you guys can build in a month, okay?

So the spec is on the website. We're providing you guys with headers for the index API that you have to implement. Your index needs to be a drop-in replacement for the Bw-Tree. We already have a bunch of tests for the Bw-Tree, and essentially there's an index factory: you change the flag and say you want a skip list instead of a Bw-Tree, and your thing should just work exactly the same. We'll provide some additional tests for you that already sort of set things up for you.
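Just to illustrate that drop-in idea, here's a rough C++ sketch of an index factory. The class names, method signatures, and the enum are hypothetical, not Peloton's actual index API; the point is just that the skip list implements the same interface as the Bw-Tree and a flag picks which one gets built.

```cpp
// Hypothetical sketch of a drop-in index factory; not Peloton's real API.
#include <cstdint>
#include <memory>

class Index {                       // the common interface the tests call
 public:
  virtual ~Index() = default;
  virtual bool Insert(uint64_t key, uint64_t value) = 0;
  virtual bool Delete(uint64_t key, uint64_t value) = 0;
  virtual bool Lookup(uint64_t key, uint64_t* value) = 0;
};

class BwTreeIndex : public Index {  // existing implementation (stubbed here)
 public:
  bool Insert(uint64_t, uint64_t) override { return true; }
  bool Delete(uint64_t, uint64_t) override { return true; }
  bool Lookup(uint64_t, uint64_t*) override { return false; }
};

class SkipListIndex : public Index {  // what you build for project two (stubbed here)
 public:
  bool Insert(uint64_t, uint64_t) override { return true; }
  bool Delete(uint64_t, uint64_t) override { return true; }
  bool Lookup(uint64_t, uint64_t*) override { return false; }
};

enum class IndexType { BWTREE, SKIPLIST };

// Flip the flag and the rest of the system (and the tests) should work the same.
std::unique_ptr<Index> MakeIndex(IndexType type) {
  if (type == IndexType::SKIPLIST) return std::make_unique<SkipListIndex>();
  return std::make_unique<BwTreeIndex>();
}
```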
So there are a bunch of design decisions that you guys are gonna have to deal with as you build this thing. You can ask me and we'll discuss the trade-offs of the various approaches, but there's no one right answer, right? How you wanna organize your garbage collector, how you wanna organize the towers in your skip list, that's entirely up to you guys. Now I'll say also, I'm sure if you go look on GitHub, there are students that took the class in previous years and you can go find their skip list implementations. I have them all too, and we can use the MOSS tool on AutoLab to check whether you're exactly the same. So please don't look at other people's code; try to think this through yourself, because I think you'll get more out of it. I've also been told, I don't wanna say where, but at some database companies, questions about skip lists and lock-free garbage collection and things like that come up often. So basically this class is essentially the interview questions at some companies, right? But it's not by design.

All right, so again, we're gonna provide you a skip list test, and then you also have the Bw-Tree to compare against. And just like in the first project, you guys are gonna want to extend the tests we provide and do your own additional testing. I think we do provide you multi-threading tests, but we're not gonna check all the different corner cases you have to deal with in your index, okay? We're also gonna want you guys to write documentation in the code and explain all the different parts, because Prashant and I will look through and see whether the assumptions you make and how you design certain things are actually valid and correct. So you should write as much documentation in comments as possible; I think it's good practice to always do this. Unlike the first project, we're actually gonna try to do speed tests on this project to check how fast you are. The problem with AutoLab is that I think it only provides a single thread, so we may have to work something out with them or do additional testing on the side, but we'll provide bonus points for the top three groups that have the best performance. So there'll be a functionality test we'll grade you on, and then there'll be additional speed tests where we'll insert a billion things and then delete a billion things and see how fast you are. And then we'll use the leaderboard on AutoLab to figure out who actually is the best, okay? And then, as always, you wanna use Valgrind or the sanitizer flags in GCC to make sure that you don't leak any memory, and use clang-format to make sure that you're following our formatting guidelines.
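To give a flavor of the latch-free design decisions mentioned above (towers, garbage collection, and so on), here's a minimal sketch of inserting into a single level of a skip list with a compare-and-swap instead of a latch. It's illustrative only, not the project solution: a real implementation also needs the towers, delete marking, and epoch-based garbage collection that the spec talks about.

```cpp
// Minimal latch-free insert at one level of a skip list using CAS.
#include <atomic>
#include <cstdint>

struct Node {
  uint64_t key;
  std::atomic<Node*> next{nullptr};
};

// Insert 'node' in key order starting from the level's sentinel 'head'.
// If another thread slipped a node in first, the CAS fails and we retry.
void InsertAtLevel(Node* head, Node* node) {
  while (true) {
    Node* pred = head;
    Node* curr = pred->next.load(std::memory_order_acquire);
    while (curr != nullptr && curr->key < node->key) {    // find insert position
      pred = curr;
      curr = curr->next.load(std::memory_order_acquire);
    }
    node->next.store(curr, std::memory_order_relaxed);    // link new node to successor
    if (pred->next.compare_exchange_strong(curr, node,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
      return;                                             // linked in successfully
    }
    // CAS failed: someone else modified pred->next; retry from the head.
  }
}
```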
So I think everyone should be in a group. I think there's one student who's not here today who's looking for a group, but I think everyone else is on the spreadsheet, right? I think I saw 12 groups, and we have 38 students. Okay, so if you're not in a group, send me an email and we'll deal with it. Now, the due date on the slide is way wrong; it says March 2nd, and I think it's actually March 12th, whatever the Monday during spring break is, that's when it's actually due. So ignore what's on the slide. It also says 2017, which is wrong too. You know why? It's because I split the MVCC slides between last class and this class, so the dates are way off. I'll fix it. It's due March 12th, okay?

All right, so next class: again, this is it for concurrency control, but it's gonna come up throughout the semester when we start talking about analytical queries, because you wanna think back in your mind, how would I actually do this when I know transactions could be updating the database at the same time? So we're gonna look at, again, what I'll call the traditional methods that do index locking and latching, and then we'll see how to do latch-free data structures where you don't have to do that, but then we'll see why that's bad, and we'll go back to adding locks in a more efficient way in the third indexing lecture, okay? Any questions?