Hello everybody, glad to be here. I hope you're having a good time. It's always an exciting event for me, and anything I can drive to from Philadelphia is even better, because I traveled about 101 days last year and I don't want to repeat that. So yeah, I'm very happy to be here. I think it's been a really nice event; I thought the keynotes were very professional and gave a good professional feel to everything. This is not that. This is a talk about a very complex part of Postgres called multiversion concurrency control (MVCC). Don't worry if you don't know what it is, because I'm going to explain it. It is one of the magic pieces that makes Postgres what it is, and makes relational databases what they are. There's been a lot of hype about relational databases, that you need them and so forth, and the bottom line is that if you're dealing with data, relational databases make application developers' jobs very easy. They don't have to worry about concurrency, they don't have to worry about locking, they don't have to worry about durability. There are constraints, there's an optimizer, a whole bunch of stuff. MVCC is just a small part of that, but I think a very important part. Those of you who are my age and grew up in the IT industry of the 80s or 90s remember a time before MVCC, when relational databases were a little harder to manage. I'm going to talk about what MVCC is and why it's important, and I think at the end either you're going to be confused or your head is going to be spinning, because it is a very complicated talk; it's my most complicated talk. But let's get started. I think you're going to enjoy it. The slides for this presentation are actually right here, so if you go to that web address, or you can manage to spell my name halfway correctly in a search box, you'll find them.
There are recordings of this presentation and 30 other presentations, so if you like this and want to hear it again, or you want to hear another presentation, they're there. We're recording this one too, so there will be a new recording of this presentation up as well. It's really kind of cool. There are about 430 blog entries there about various aspects of Postgres, broken up into categories, lists of all the conferences I attend, and a lot of other interesting stuff about me: my email address, my cell phone number, blah blah blah.

So let's talk about the title here: unmasked. What is unmasking? I think it's a cute photo. I saw it years ago and I was like, wow, that's really weird. I recognize a couple of the people here, but there's a bunch of people I don't. In fact, the reason is that you've seen these people in movies, but you've only seen the faces of half of them, because the people you don't know are Chewbacca, Darth Vader, and R2-D2. Which, you know, now makes a little more sense to me; I think this one was R2-D2 if I had to take a guess. But that's what's really interesting about unmasking: you now see the picture differently, and I hope at the end of this presentation you'll see Postgres differently because of that unmasking.

So why worry about MVCC? Why should you care what's happening inside the database? There are a couple of good reasons. One is that when you understand what's happening in the database, you can better predict behavior; you obviously understand the behavior a little better. Secondly, there are performance effects of MVCC, and it's good to know those. And there is also storage overhead from MVCC, and it's good to know that too. Again, can you run Postgres without knowing these things? Yes. Does it help to know them? Yes.

So here's what we're going to talk about today. First, I'm going to introduce what MVCC is.
It is not exclusive to Postgres. Then we're going to talk about how Postgres implements MVCC, with some really cool diagrams, I promise. And in the third part we're going to talk about cleanup requirements, also with some cool diagrams in there. So what is MVCC? There's a big block of text on the slide, but the part you really need, in terms of user-visible behavior, is that readers never block writers and writers never block readers. That is pretty fundamental, particularly for those of you who used relational databases before MVCC. I remember having to do work in law firms and do large updates, and we had to do those at night when nobody was in the system. And if you updated too many records, you overflowed the lock table. Anybody remember that? Then you'd have to do it in batches. I remember I'd say, okay, the first million I'm going to update, and I'm going to sit there for 40 minutes, and then I'm going to do the next million, sort of to get through this kind of problem. Because prior to this, without MVCC, you effectively had to lock the rows you updated, which required them to be tracked in a lock table, and it also inhibited people from looking at the data while you were updating it. What we're saying here is that doesn't happen anymore. Your readers can go blithely by no matter how many writes are happening in the database. It seems impossible, but it happens. And your writers can continue to write no matter how many reads are happening. Now, that doesn't mean writes aren't going to block other writes, okay? We're not saying that; that would be magic. But I am saying that the readers and writers end up in different pools, and the only real thing you have to worry about is writers blocking other writers for specific rows, because Postgres does not have page-level locks; everything is at the row level.
Okay, so we will see an example of that a little later, maybe. Who has MVCC? Oracle has MVCC. DB2 has sort of a partial implementation; I'm not going to get into that, but if anybody's really interested, ask me about the DB2 part. MySQL has it, Informix has it, Firebird has it, and MS SQL Server has it, but there it's optional. One of the reasons it's hard to retrofit MVCC into a system is that there's a certain behavior profile that everyone's used to, and if you try to retrofit MVCC into an existing database, even if it's a better solution, you can't always do it without causing a lot of very angry users. That's why it's either not implemented or optional in some of these. But in terms of feature capabilities, MVCC is really the way to go if you're starting from scratch. There are even NoSQL stores that use MVCC; I believe Cassandra is one that does. And there's even a programming language that uses MVCC. Anyone want to help me here? It's on top of Java. It's Clojure, thank you. Thank you, whoever gave me that. Yes, I went to a Clojure talk three weeks ago; I can't remember it, great. But yeah, Clojure actually uses MVCC to control variable changes, because it's a language built on immutable data: when you change something, you create a new version. Anyway, we'll get to that. This next slide is one you want to remember, okay?
And again, all these slides are online; you could go download them right now. What this slide highlights is two things. One, each row has some invisible fields. I understand somebody already talked about xmin and xmax in an earlier presentation, and we'll talk about those in a minute, but each row has two invisible fields. We're going to call them "create" and "expire," though in fact those are not the real names. The second thing you should see from this is that an update is effectively a combined delete and insert. I know that sounds kind of odd, but conceptually you can see how it makes sense, right? You've got an insert operation, you've got a delete operation; what do you do with an update? It's effectively a delete of the old row and an insert of the new row, okay? And this is where the "multi-version" starts to make sense, because now you can see that when we do an update, we potentially have two rows around for a period of time. I'm going to talk about that in a minute, but the idea is that we've got two rows: one is the old row, one is the new row. You can start to see where this makes sense: all of a sudden the readers can keep looking at the old row, and the writers can work on the new row. That's really why it's called multi-version; that's the magic behind it. We're going to see specific examples, but even from this one slide, which you would think would be pretty content-free, you get two concepts: some kind of tracking information on every row, and the idea that an update effectively has an old and a new version. That's going to carry through; you're going to see specific examples of it. Okay, here's the first term we have to go into.
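The create/expire idea can be sketched in a few lines of Python. This is purely illustrative: the class names, and the idea of a Python list standing in for a heap, are invented for this sketch, not Postgres internals.

```python
# Toy model of row versioning: an UPDATE is a DELETE plus an INSERT.
# The old version is only stamped with an "expire" xid, never removed
# in place, so readers with older snapshots can still find it.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Tuple:
    value: str
    xmin: int                   # "create": xid of the inserting transaction
    xmax: Optional[int] = None  # "expire": xid of the deleting transaction

class Table:
    def __init__(self):
        self.tuples = []

    def insert(self, value, xid):
        self.tuples.append(Tuple(value, xmin=xid))

    def delete(self, value, xid):
        for t in self.tuples:
            if t.value == value and t.xmax is None:
                t.xmax = xid    # mark the old version; do not remove it

    def update(self, old, new, xid):
        self.delete(old, xid)   # expire the old version...
        self.insert(new, xid)   # ...and create a new one
```

After `update`, both versions coexist: the old row carries the updater's xid as its expire, and the new row carries it as its create.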
Again, I'm not going to read the whole slide; if you want, download it and read it. It's the concept of an MVCC snapshot, and this is pretty powerful. Before MVCC, systems just looked at the current data. Doing a sequential scan, you would just run through the data from beginning to end, and if you got to a row that was locked, you stopped and waited for that row to be finished, and then you'd keep going. There were two problems with that. First, you would have to stop because a row was locked. Second, by the time you got to the end of the table, the data you read at the beginning of the table might not be current anymore. So what did you have to do? "Can everyone log out during lunch, between 12 and 1? We want to run a report." That's the kind of environment you were in, because you couldn't run financial reports during the day while people were changing the data. So you'd have to tell everyone. If it was a crisis and you had to run a report, it would be like 10 o'clock and we'd be like, oh, we didn't run that report last night. Okay, let's tell the staff to all take lunch from 12 to 1, and then we'll run the report. Great idea, right?
What MVCC snapshots do, and this is a very weird idea, is give each user in the database a different definition of truth. Okay, I don't want to sound post-modern, but... yeah, some people got that. When you look at a database, you think the row is either there or it isn't there. Or when you think of a particular value: either the customer's in the database or the customer's not, either the customer's balance is a hundred dollars or it's three hundred dollars. And you would think that anybody looking at the database would see the same number of customers, the same balances for all those customers, and so forth. But in MVCC that's not how it works, and frankly, in the old system before MVCC it wasn't possible either, because we had to kick everyone out if we wanted a consistent view of the database. What these snapshots do is this: when you start a transaction, and I'll be showing you some code in a minute, Postgres takes a snapshot, and that snapshot defines the visibility you have for that particular query or that particular transaction. You maintain that consistent snapshot through the entire query or the entire transaction. That's how we can run financial reports and get accurate results even while people are changing the data. Now, is the number going to be accurate? If I start at noon and it finishes at one, is the number accurate as of 1 p.m.? No, the report is accurate as of noon, because noon is when I took my snapshot, so I'm going to see a consistent view of the data as of noon. We can't go forward to one o'clock and find out what changes happened, but we can take a point in time and actually guarantee that that user gets a consistent view of the data. It won't necessarily be the most recent data, but it will be consistent. Okay, how do we do that?
Take a look to the right. This is the previous slide kind of blown up. To create a snapshot, we record two pieces of information. We find out what the highest transaction counter is, okay, and then we find out what transactions are currently in progress: what transactions are running but haven't finished. What does that mean? We're going to assume every other transaction has completed, whether committed or aborted. So what effectively happens is that when I go to look at rows, based on the snapshot rules I have, some of them are visible and some of them are invisible. This is where we get into the multi-version part. This is where we get into that post-modern idea that different people have different views of the data, because my snapshot rules determine what I can see in the database.

Okay, now let me digress a little bit. How many people have heard of the repeatable read or serializable transaction isolation levels? Wow, great. Okay, so here's what's really cool, and you may not have thought of this. By default, Postgres runs at the read committed isolation level. Were you aware of that? I know it's not the SQL-standard default, but that's the Postgres default. In read committed isolation, every time you start a new query inside a transaction, you get a new snapshot. That is effectively what it means to be in read committed mode: you're re-taking a snapshot for every single query inside the transaction block. If you ask for repeatable read, hmm, that's an interesting term. What would "repeatable read" mean? It means I'm reusing the same snapshot, because I can repeatedly read the same rows. So when you say SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, or SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, you're basically telling the system: take a snapshot at the beginning of my transaction and don't change it.
I want to run that entire transaction with a single snapshot, and I want to see consistent data from the first query I run in that transaction block to the last. Why is that useful? Reports, great: you can run a report in the middle of the day, run the whole thing in a single transaction block, and you're going to get consistent results, no matter how long that thing runs. In addition, that's how we do logical backups. When you do a pg_dump, if you look inside the pg_dump code, or you look at the queries that are actually sent to the server when you run pg_dump, the first thing it does is BEGIN WORK; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ. Effectively, a pg_dump is a single-snapshot logical dump of your entire database. Kind of makes sense now, right? Once you have the concept of a snapshot, it makes sense: that's what repeatable read is, that's what serializable is, and that's how pg_dump is able to generate a logical dump without interfering with other work going on. You can be doing all sorts of writes in the database while pg_dump is running, no problem.

Okay, so if we go back to this snapshot and we look at these rules, let's take a look at the top row. The row was created by transaction 30, and transaction 30 is not one of the currently running transactions.
And in fact it's less than 100, so it must have finished, right? And there's no expire, so that row is visible. Let's take a look at the next one, though; that one's invisible. Why? It was created by transaction 50, and transaction 50 is one of the three transactions that had not finished when my transaction started. It was in progress, so I cannot assume it has finished. Same thing over here: this one's not visible, and the reason is that the transaction started after I took my snapshot. Similarly, down here these rows have both a create and an expire. This one is invisible because the create is valid and the expire is also valid. This one is visible because the create is valid but the expire is one of those active transactions, so we don't honor the expire. And this one is visible because the expire happened after our snapshot.

Let me give you another way of looking at this. This is a graph of all the transaction IDs, from 100 down and up, and this is when we took the snapshot, right here. The only transactions that are valid are the green ones, the ones that started and completed before we took our snapshot. Anything that was in progress, or started after, is not valid. Okay, now you might think this is crazy. It kind of is; there's a lot of pacing around the house after you read this code. In fact, there's a quote from Mike Olson about the snapshot visibility code, and it says: the tests in this routine are correct; if you think they're not, you're wrong, and you should think about it again. I know it happened to me. And that is really profound. It takes you a long time to get your head around this, so feel free to look back at the slides again. But it's really cool, and it's one of the things that makes Postgres cool. At this point, I'm going to take a risk.
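The visibility rules just walked through fit in a few lines. This is a toy Python model, assuming a snapshot holds only the next-transaction-ID horizon and the in-progress list; the names (`Snapshot`, `is_visible`) and the sample numbers are invented for illustration, while the real logic lives in the server's C code.

```python
# Toy model of PostgreSQL snapshot visibility: a row is visible if its
# creating transaction committed before the snapshot, and its expire
# (if any) did NOT commit before the snapshot.

from dataclasses import dataclass

@dataclass
class Snapshot:
    xmax: int          # next xid at snapshot time; xids >= this are too new
    in_progress: set   # xids that were running when the snapshot was taken

    def committed(self, xid):
        """A transaction counts as committed for this snapshot only if
        it is below the horizon and was not still in progress."""
        return xid < self.xmax and xid not in self.in_progress

def is_visible(row_xmin, row_xmax, snap):
    # The creating transaction must have committed before our snapshot.
    if not snap.committed(row_xmin):
        return False
    # No expire: the row was never deleted.
    if row_xmax is None:
        return True
    # An expire only hides the row if the deleter committed before our
    # snapshot; an in-progress or future deleter means nothing to us.
    return not snap.committed(row_xmax)
```

With a snapshot taken at xid 100 while transactions 25, 50, and 75 were running, this reproduces the slide: a row created by 30 is visible, a row created by 50 or 110 is not, a row created by 30 and expired by 80 is invisible, and a row expired by 75 or 110 is still visible.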
I'm going to take one or two questions. I have a lot of material here, as you can see; I have only 90 slides. But let me just take one or two questions. Yes, sir. Okay, so the question is: what would be in that snapshot? It seems like it would be huge. The snapshots are only the transaction IDs; it's not the row numbers or anything like that. So if we have a server with a max_connections of a hundred, then technically we can't have more than 99 other in-process transactions, so the maximum size of this open-transaction list would be max_connections minus one. It's not that big. What you're not doing is tracking which rows have been changed, because, as I said before and will show you here, the actual information is stored in the rows, not in this special area. The snapshot comes from the shared-memory proc array of Postgres. When you start your transaction and take your snapshot, you have to scan the proc array to find all of the active transaction IDs, but then you just keep using that list: every time you look at a row, you match it against your list of active transactions. Other questions? Yeah, I'll take this one.
Yeah, so the question is: when you have repeatable read or serializable, there's this disclaimer in the docs that says you might need to retry. That's kind of beyond the scope of what I wanted to cover, I'm afraid, but it relates mostly to serializable and write transactions. I've been giving you reporting examples, which are read-only, so it doesn't matter there, but in write transactions you can potentially create sort of loops where one person is trying to modify what the other is modifying, and it can't be serialized. The definition of serializable is that every transaction has to behave as though it were done in some serial order, and if you do things that are impossible to serialize, the system will throw an error and require you to redo those transactions. But it's only for write transactions. Okay, one more. Yes, sir. Yeah, so this is actually C code right here. If you look in the C code, this is the name of the file right here. So if you go to the C code, we're running this C code effectively for every row. That's right, yeah, good point, very good point, thank you. I'm going to continue on. I'd like to take more questions, but I'm a little cognizant of time. So, having talked about that, we're going to actually show this in action, because so far I've given you the big high-level idea; now let's show you exactly what's going on. To do that,
I'm going to create a little demo table, and I'm going to create the pageinspect extension so I can show you what's going on inside of Postgres. And here it is. In fact, if you don't believe me, you can go to this website, download that SQL file, and run it on any version of Postgres, and it'll look exactly like this presentation. In fact, I wrote the queries and then I wrote the presentation by copying the output into the slides. You write SQL, you run it, you copy the output into the presentation, then you add headings, and boom, you've got a presentation. So everything you're seeing here is literally exactly what you would see in the output. I didn't have to construct this; well, I colored it to make it easier. Most of the important stuff is in red, so you can kind of look at the red parts.

So here we're going to delete everything from the table, then we're going to insert one row, and we're going to query two fields you haven't seen before: xmin and xmax. xmin is the "create" field I showed you earlier; xmax is the "expire" field. It's very similar to what we had in the earlier slides. And what's interesting is that every table has these columns. You may not see them, because SELECT * doesn't expand them, but if you ask for them explicitly, you do see them. And you can see here that this row right here was created by transaction 5409, because that's the creation ID on the row, and there is no expire on the row, so boom, I can see it, right? That's really simple, but it's the groundwork of where we're going, okay? It matches this slide right here; in fact, I'm not going to move, because of the camera. So what we're doing here is this example right here: we are creating a row, and there's no expire. That's the example right there. Okay, let's do another one with delete.
We're going to remove all the rows, we're going to insert a row, and we're going to look at the insert. Here it is: 5411, a new transaction ID. Then we're going to get a little tricky; just follow along. We're going to create a multi-statement transaction, we're going to delete from the table inside that multi-statement transaction, and if you look inside the multi-statement transaction, the delete is valid. And this is where we start to get to, you know, the post-modern idea where people see different things. Indented, I have run another session; any time you see something indented, it means it's another session. And what's cool is that in the other session I still see the row, and look at that, I now see an xmax. But the reason I still see that row is because 5412 was in my snapshot as running when I took the snapshot. If you follow that, you're doing really well. Okay, but I'll say it again: the reason I can see that row is because 5412 was in that list right there, and when I ran it through the rules, it said: I know there's an expire on it, but hey, that transaction means nothing to me; I see that row. And, which is really cool, if I go back to the first session and ask, what is my current transaction ID? 5412, bingo. It's not a coincidence that they match. And then when I commit, obviously, the other session can't see the row anymore. That's this case right here: valid create, valid expire, and the row goes away. I will get to that.

Updates. Remember, we did insert first, then delete; now we're going to do update, because an update is a combination of insert and delete, right? So we delete the rows, we put one in, we have a row, and we begin a multi-statement transaction.
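The delete-inside-an-open-transaction demo above can be replayed in a small sketch. The helper names and xids here are made up to match the narration; the point is only that session B's snapshot, taken while 5412 was running, keeps the row visible.

```python
# Two sessions, one row: session A deletes inside an open transaction
# (xid 5412); session B's snapshot listed 5412 as in progress, so B
# still sees the row until A commits and B takes a fresh snapshot.

def committed_for(xid, snap_xmax, in_progress):
    return xid is not None and xid < snap_xmax and xid not in in_progress

def visible(xmin, xmax, snap_xmax, in_progress):
    if not committed_for(xmin, snap_xmax, in_progress):
        return False
    return not committed_for(xmax, snap_xmax, in_progress)

row = {"xmin": 5411, "xmax": None}

# Session A: BEGIN; DELETE ...;  (xid 5412, still open)
row["xmax"] = 5412

# Session B took its snapshot while 5412 was running: row still visible.
assert visible(row["xmin"], row["xmax"], snap_xmax=5413, in_progress={5412})

# After A commits, a fresh snapshot no longer lists 5412: row is gone.
assert not visible(row["xmin"], row["xmax"], snap_xmax=5413, in_progress=set())
```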
We change one to two, and here we go: the multi-statement transaction sees the two, my separate session sees the one. And I will bet dollars to donuts that this multi-statement transaction is indeed 5414, because notice I did the update there. See, there's 5414. No, no, sorry, here it is: this is 5413 right here. Okay, so this is my original row, and this is the transaction that expired it, because this is the transaction block. This is the new row, this is the old row. So this is exactly what you'd expect to see. Again, there's no magic here; this is not a special Postgres binary, this is stock Postgres. And it's this example right here: we've expired a row and we've created a new one. And you can see how somebody reading from the database could see the old data here, even while somebody else is doing updates. Yep? You know, I'm sorry, I can't get into that; I'm watching the time, sorry about that.

Okay, let's take a look at aborted transactions; somebody already asked about this, right? I can answer you after the class. So we delete the rows, we insert a new row, we begin a transaction, we delete everything from the table, and we do a ROLLBACK. And here we have our transaction, here's our query, and notice: we've got an expire here. But the reason this isn't valid is that we have marked that particular transaction number as aborted. So one of the things we go through in that snapshot heuristic is: when we see a number here, we not only look at our snapshot to see whether it was running, we also have to identify whether it was committed or aborted. In this case the delete was aborted, so that number is ignored.

Okay. We also use the xmax field for row locks, and I'm not sure I'm going to have a lot of time to get into this. But, for example, when I do a SELECT ... FOR UPDATE on a particular table,
I actually get a special value here, which is not technically an expire; it's actually a special bit that I'll show you later. So if you see an xmax, you cannot always assume it's a delete; it might be a row lock, because this is the way we do row locks in Postgres. Exclusive row locks, I'm sorry.

Okay, multi-statement transactions. I'm going to zoom through this and just give you a general idea of what's going on. Effectively, there is a third field that you haven't seen before, and it's called cmin. The way cmin works is that when you insert, update, or delete multiple rows inside a multi-statement transaction, we have to identify which statement in the multi-statement transaction did the insert or the delete or the update. Was it the third query, was it the seventh query, was it the 500th query in this transaction block? So effectively, we use the cmin field to track which query in the block modified the row, for various reasons. Same thing with delete: here's an example of delete where we've got individual numbers indicating which of the queries did the delete. Okay, same thing with update. Here are the original rows, here are all the new rows; these were done with three separate queries inside a transaction block, and this was done with command number three, technically the fourth statement inside the transaction block. So the first one did this one, the second one did that one, the third one did this one, and the fourth one did the update.

Okay, let's see. You can also do other cool things. I can insert into a table and get a number, and then I can start another transaction block and add new rows, so here we have a mix of transaction IDs in a table. But when I do the update, what happens is that they all get new transaction IDs, and the old rows stay around.
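The cmin bookkeeping can be sketched as follows. The `Transaction` class and its fields are invented for illustration, assuming command IDs simply count statements from zero within one transaction block, which matches the slide where the fourth statement shows command number three.

```python
# Toy model of cmin: every statement in a transaction block gets a
# command ID, and each row version records which command created it.

class Transaction:
    def __init__(self, xid):
        self.xid = xid
        self.cid = 0      # command counter, starts at 0
        self.rows = []

    def run_insert(self, values):
        # Every row inserted by this one statement shares the same cmin.
        for v in values:
            self.rows.append({"value": v, "xmin": self.xid, "cmin": self.cid})
        self.cid += 1     # the next statement gets the next command ID

tx = Transaction(xid=5420)   # made-up xid
tx.run_insert([1, 2])        # command 0
tx.run_insert([3])           # command 1
tx.run_insert([4, 5])        # command 2
```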
Again, you might need to study this later, but what has happened is that I see the row I inserted originally, outside the transaction block, but I can't see the new rows; I can't see this activity outside the transaction block because it hasn't completed yet. That transaction block hasn't committed. So when I go to run this, the only thing I can see is the one I did before, because my snapshot won't let me see the rest. Okay. I will also apologize: I tend to go backward a lot in this presentation, because I'm trying to link together parts you've seen. It's like a wave, you know; we keep going backward to give you the frame, so you can see as we go. We're showing you something, and then we're giving you more detail, and more detail, and so forth.

Combo command IDs. I'm not going to go into this; it's again related to how cmin behaves. But effectively there is a special field, which you can see in this particular output, that tells me the row was modified twice: created and expired in the same transaction block. That involves something called a combo command ID, because we can't put two numbers in one field, right? So we effectively have that capability. These numbers are pretty big: xmin is four bytes, xmax is four bytes, and cmin/cmax are also four bytes, so you can have four billion commands inside a transaction block in Postgres without a problem. Okay, so let me take a question or two before we get to cleanup. Yes, sir.
Okay, so the question is a good one. Somebody earlier today recommended not to use xmin or xmax in applications, because effectively they're internal aspects of Postgres, and the xmin counter can wrap around. Your example was that sometimes you want to use it to make sure a row hasn't changed since the last time you read it. Yeah, you can do that, but we would normally recommend something like a share lock on the row, because if you take a share lock, you're guaranteeing that the row can't change between when you look at it now and when you look at it later. Whether xmin is a great idea for that, I don't think so. Part of the problem is that even if you read the xmin, you only know what the xmin was at the time you read it; you can't do anything with that data, because it could change right after you read it. See what I'm saying? You read the xmin, somebody goes and changes the row, and then what do you do with that information? It's almost useless, because you can't update the row knowing that nobody came in between your read and your update. I don't want to get into that here, but I can talk to you about it later. Other questions?

So the question is: if you delete a row but the transaction gets aborted, then you've got this xmax with a non-zero value, okay? So effectively what happens is that when we read the row, we check and find out the delete was aborted, and there's a special bit on the row which says the xmax is invalid; I can't remember the exact name of it. So there's a special bit on the row that we set to say: there's an xmax here, but it's not really valid. Okay, and they're called hint bits.
That's technically what they're called. Effectively, the row just carries that hint bit along until we do a vacuum freeze, which is what allows us to remove the old commit log (clog), because over time the clog gets really big. So every 200 million transactions or so, we do a vacuum freeze, we clean those up, and then we can delete the old clog. That's usually the way the cleanup works. Other questions? Yeah: is there any optimization for temp tables? So, temp tables exist local to your session, but they behave exactly the same way. I don't know of any optimization; I think the code is pretty much the same. I suppose we could do some optimizations, but I'm not sure there would be measurable value in it. Yeah.

So let's go to cleanup; we can have some questions at the end. Because there are some issues here. Originally, Postgres kind of got beaten up over MVCC, because it was considered to be slow and to use a lot of space. That's not much of an issue anymore, but let's talk about cleanup requirements in general. Every relational database is going to have to do some level of cleanup; it's just the nature of the beast. Typically, you need cleanup for deleted rows: if somebody deletes a whole bunch of rows, we have to clean them up somehow at some point, MVCC or not; you deleted some rows and we want to reuse that space. Secondly, if you insert a bunch of rows but the transaction aborts, again, you've got a bunch of rows in there and they're dead; nobody can see them, and they need to be cleaned up.
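The two kinds of dead row versions just described, aborted inserts and committed deletes that no one can still see, can be sketched like this. The function and its arguments are invented for illustration, assuming a lookup table of finished-transaction statuses standing in for the commit log.

```python
# Toy test for removable ("dead") row versions: a version is dead if
# its inserting transaction aborted, or if its delete committed and is
# older than the oldest snapshot still running (so nobody can see it).

ABORTED, COMMITTED = "aborted", "committed"

def is_dead(xmin, xmax, status, oldest_active_xid):
    """status maps xid -> COMMITTED or ABORTED for finished transactions."""
    # Inserted by a transaction that rolled back: never visible to anyone.
    if status.get(xmin) == ABORTED:
        return True
    # Deleted, the delete committed, and no open snapshot is old enough
    # to still need the old version.
    if xmax is not None and status.get(xmax) == COMMITTED \
            and xmax < oldest_active_xid:
        return True
    return False
```

Note how a long-running transaction (a small `oldest_active_xid`) keeps otherwise-deleted versions alive, which is exactly why long-running transactions delay cleanup.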
Okay, MVCC adds additional cleanup requirements. In particular, if you do an update, you're going to create a new version of the row, and the old version eventually has to be removed. But the old version can only be removed once no one can see it, right? You can't delete a row version if it might be part of somebody's snapshot, so those rows have to remain around for the duration of the snapshot. And this is one of the problems with long-running transactions: they will delay cleanup, because we can't get rid of the old rows while somebody might still do a SELECT and see them. Okay, and that adds a little more complexity to the cleanup requirement than just removing rows. But obviously Postgres has been around a long time; this has been optimized pretty well.
So fortunately, Postgres has two types of cleanup. One is an on-demand cleanup of a single page during row access, and I'll show you an example of that. Secondly, there's a bulk operation called vacuum, which runs either manually or in the background, and that cleans up not only the rows but the index entries as well; I'll show you some examples there.
Okay, so what is there to clean up? There are three things. We can clean up the data, specifically the heap tuples, as we call them, and that's actually the largest part, right there. The second part is the item pointers, and I'll show you what they look like. And third, the index entries; I'll show you an example of that. So this is what a Postgres page looks like. Effectively I have three rows; we call these tuples, so tuple, row, it's kind of the same thing. So we've got three rows here, and we've got three item pointers which point to those rows, so there's a little bit of indirection here. Okay, the reason for this item pointer thing is that the indexes point to these item pointers; I'll show you that in a minute.
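The rule that an old row version can only go away "once no one can see it" can be sketched in a few lines. This is a deliberate simplification, assuming each active snapshot is represented by the oldest transaction ID it considers in-progress; the function name and rule are invented for illustration, and the real Postgres horizon computation is more involved.

```python
# Toy model: a dead row version is removable only when it is older than
# every active snapshot's horizon (illustrative simplification).

def removable(deleted_by_xid, active_snapshot_horizons):
    # If any open snapshot was taken before the delete, it may still
    # need to see the old version, so cleanup must wait.
    horizon = min(active_snapshot_horizons, default=float("inf"))
    return deleted_by_xid < horizon

# A long-running transaction (snapshot horizon at xid 50) holds back
# cleanup of a row deleted by xid 70:
print(removable(70, [50, 200]))   # False: the old snapshot may still see it
print(removable(70, [200]))       # True once the long transaction finishes
```

This is exactly why long-running transactions bloat tables: every dead version newer than their snapshot stays on disk until they finish.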
So effectively we can move stuff around, because the item pointers are going to remain in place. Okay, so anything external is always going to read the item pointer first and then go to the data. All right. In fact, this is a good example; I'm sorry I didn't show it to you earlier. The reason the item pointers are there is that effectively they allow us to reshuffle the 8kB page and not touch the indexes at all, because we know exactly what those slots are, all right?
What happens when we recycle? So for example, let's suppose I get rid of the two old versions of the row. I can actually mark those item pointers as dead, and I've freed up quite a bit of space down here, because the rows are the big part; the item pointers are really small, I think 16 bits or something tiny. So effectively what we've done is we've gotten rid of two versions of the row and we've marked the item pointers as dead, but we still have an active one over there. Okay, so this is kind of cool: we can actually clean up the table without touching the index at all. Okay, and eventually, once you run vacuum, we can then mark those item pointers as unused and reuse them.
Okay, now you might think I'm making this all up, so I've got to show it to you. So here's an example. We're going to create our table and we're going to truncate it. Truncate is a little different than delete: it actually zeros the file to zero length. Then we're going to fill the table until it's less than 10% free, okay, and then we compute the free space at 6%, so the table is kind of full. You know, it's kind of like it had a big meal, right? Then we're going to insert a row in there, and we're going to take a look at the function we defined way up at the beginning, which effectively shows us the actual row. So we're going to look at that row. Here's the item pointer; it's a normal one. Here's my xmin; it's an insert.
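The indirection just described can be modeled in a few lines of Python: index entries hold only a slot number into the item-pointer array, so the page can reclaim tuple space and mark slots dead without any index entry changing. This is a toy illustration with invented names, not the actual 8kB page format.

```python
# Toy model of a heap page: stable item-pointer slots in front of the
# (much larger) tuple data, so cleanup never invalidates index entries.

class Page:
    def __init__(self):
        self.item_pointers = []   # slot -> "normal" / "dead" / "unused"
        self.tuples = {}          # slot -> row data (the big part)

    def add(self, data):
        self.item_pointers.append("normal")
        slot = len(self.item_pointers) - 1
        self.tuples[slot] = data
        return slot               # this is what an index entry would store

    def prune(self, dead_slots):
        # single-page cleanup: free the tuple space, keep slots stable
        for slot in dead_slots:
            self.item_pointers[slot] = "dead"
            del self.tuples[slot]

page = Page()
s1, s2, s3 = page.add("v1"), page.add("v2"), page.add("v3")
page.prune([s1, s2])              # old versions gone, big space reclaimed
print(page.item_pointers)         # ['dead', 'dead', 'normal']
print(s3 in page.tuples)          # True: index entries still resolve
```

Marking the slots "unused" (so they can be handed out again) is the part that has to wait for vacuum, because only vacuum also removes the index entries pointing at them.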
Here's my xmax at zero, and here's the pointer for the chain. Now I'm going to take the one, delete it, and put a two in. I know it's kind of awkward, but that's the easiest way to do it: I'm basically deleting the row with one and inserting a row with two. Okay, and if I now look inside the page, using a special function from pageinspect that allows me to look inside the page, I actually see the two rows here. Okay, and look at this: this row has the xmax set, like we'd expect, because we did a delete, right? So that row is not visible to a query, but it is visible to my special inspector function. And then we have this normal row right here. Okay.
Now I'm going to delete the row again; I'm going to change the two to a three. Okay, this is where it gets interesting. The one, which was this row, is now dead. We have actually recycled that row as part of the insert operation. Because the page was already kind of full, the system says: oh, that page is feeling kind of full, I know you want to do an insert, but let me look and see if I can get some space while I'm here. And it cleans it up right away. Really cool. Okay. Here's the row that says two, which is actually deleted, and here's the version that says three. Okay, and if I then do a SELECT again, just a SELECT, I didn't change anything, look what happens: item one is dead, item two is now recycled (the item pointer is still there, but the data is gone), and item three is right there. Okay, and that's effectively exactly what this slide shows: we have deleted versions one and two, and we've marked the item pointers as dead.
Here's another one, where we actually look at the free space map. So we're going to do a delete here, then a vacuum, and when we do a vacuum, what happens is the item pointers are cleaned up, because remember, vacuum is a batch operation. So we've now cleaned up the item pointers.
These were dead; now they are unused, and in fact this is exactly what that slide looks like. The interesting thing is there was no bulk vacuum operation going on; it was just somebody looking at the table. It didn't have to be the same person who did the delete or the insert or anything. It's like: hey, I'm looking at the table, and while I'm here, let me just reshuffle some stuff a little bit. And this is such low overhead that you never notice it; it does not slow anything down. So you're like, why not, right?
The free space map. The free space map is something used in Postgres to figure out where things can be added, effectively. So effectively what we've got here is a free space map telling us where it thinks there's some free space. If I truncate the table and vacuum it, it says there's no free space. But if I insert some data, there's quite a bit of free space there; as I insert more, the free space goes down, and if I do a delete and a vacuum, the free space goes back up again. So the free space map is always telling us where our free space is.
This is a kind of cool vacuum operation: if the pages at the end of the table are empty, Postgres will ftruncate the table. So if you've got three empty 8kB pages, we'll just give that 24kB back to the operating system. That's what we're showing right here: we insert, we delete the rows, we run a vacuum, and the actual size is zero, because there was nothing left in the table, so it's just gone. But that will also work if there's some data but the pages at the end are empty. Vacuum full actually rewrites the table; we're not going to cover that too much. Let me just show you... sorry, yeah. Okay, we'll hold the questions. Okay.
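The free-space-map idea above can be sketched as a per-page record of roughly how much room is left, consulted when placing a new row. The structure and numbers here are invented for illustration; Postgres actually keeps this in a separate file fork with its own binning scheme.

```python
# Toy free space map: page number -> approximate percent free.
fsm = {0: 6, 1: 95}   # page 0 is nearly full after the big load

def pick_page(needed_pct):
    # Find a page with enough room; otherwise the table must be extended.
    for page, free in fsm.items():
        if free >= needed_pct:
            return page
    return None

print(pick_page(10))   # 1: skip the nearly-full page 0
fsm[1] -= 40           # inserts consume space on page 1 ...
fsm[0] = 60            # ... and a DELETE + VACUUM gives space back on page 0
print(pick_page(50))   # 0: the reclaimed space is advertised again
```

That last line is the whole point of vacuum updating the map: reclaimed space is useless until inserters can find it.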
Let's take a look, a little bit toward the end here, at single-page optimizations. So here's an example of the initial state; we're going to look at the chain here. I know we've only got a couple minutes left, but effectively the way the system works is that when you update a row in a way that does not change any indexed fields (that's the key), it's what's called a HOT update. Okay, we actually reuse the index entry, and we point the old row at the new row, so we can basically cycle through as we go from version two to version three. We kind of chain them together, and we create a chain of versions that all rely on the same index entry. It kind of just goes back and forth like that, and eventually it can recycle all the rows. What's really cool is we don't have four index pointers; we have one index pointer. So this greatly reduces the amount of index bloat. Okay.
Here are some examples. Here I'm doing an update, and again, you're probably going to have to look at these slides online as we go. Here's an example of an update.
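The chain just described can be modeled like this: the single index entry keeps pointing at the root slot, each heap tuple carries a link to its successor, and a lookup walks the chain to the live version. All names are invented for illustration; this is a sketch of the idea, not the real heap-only-tuple code.

```python
# Toy HOT chain: one index entry, a chain of row versions behind it.

class Version:
    def __init__(self, value, xmax=None, next_slot=None):
        self.value = value
        self.xmax = xmax          # set once a newer version supersedes this
        self.next_slot = next_slot

heap = {
    1: Version("v1", xmax=101, next_slot=2),  # root: the index points here
    2: Version("v2", xmax=102, next_slot=3),  # superseded again
    3: Version("v3"),                         # live version
}

def fetch_via_index(root_slot):
    # Follow the chain from the root until we hit the undeleted version.
    slot = root_slot
    while heap[slot].xmax is not None:
        slot = heap[slot].next_slot
    return heap[slot].value

print(fetch_via_index(1))   # v3, reached through the one index entry
```

This also shows why the optimization only works when indexed columns don't change: if they did, the existing index entry would lead to a chain whose key no longer matches it.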
This is the old version. It gets a redirect pointer to two, and two actually points to three, as you can see right here; that's exactly what you're seeing in this slide. The same thing happens when we do an update again: the pointer from the index now points here, and this one now points to this one, which is exactly the kind of crazy thing you see there. Okay, and then finally, if we just do a SELECT, you can see that it's actually been able to mark this one as unused, because nothing points to it, and we're able to recycle that item pointer right away. This is what we call HOT updates.
Similarly, it just stays the same in that state; that doesn't get cleaned up until you run the vacuum command, and then effectively what happens is it will recycle everything and get rid of all your dead data, which is kind of the standard thing. The problem with HOT updates, and something we can talk about, is that if you change indexed columns during the update, then you're not going to be able to use this chaining technology, because we can't have index entries pointing into the chain when the index values change inside the chain. We are working on something called WARM updates for Postgres 10; I'll talk about that tomorrow, I have a panel discussion about it. But effectively, just to give you a graphic of what happens when you have created indexes... okay, where is it?
Let me reword that: I create an index here. Yeah, when you create an index on that table, all of a sudden you don't get that cleanup anymore, see? You can get rid of the data, but the item pointers have to stay, and you have to run vacuum to clean them up, because effectively, when you have indexes pointing to those item pointers, you just can't recycle them; they're still being pointed to by the indexes.
Okay, so that is pretty much all we have. Again, limited ability to ask questions, because it's 90 slides. But I wanted to give you this sort of view of what's happening inside of Postgres: a view of what happens during transactions, a view of what these snapshots are, what transaction isolation levels are, how the system works, how we give you consistent snapshots, and also how the cleanup works. Because whether it's a single-page cleanup or a vacuum, Postgres is always kind of working behind the scenes to minimize the overhead of the MVCC system, because we do feel that it gives developers a lot of flexibility to not have to worry about a lot of problems that can potentially happen in a multi-user environment. So thank you very much, and I'll be here.