It worked, hey, sweet. OK, so now I have the library loaded, and it will start collecting information. I haven't really hit the real-world use case where I need this, but we can look and see what's in there, just for fun. I kind of like this extension. It was added in 14; it's a brand new extension, and you don't get those very often. Hopefully you never actually need to use it, but if you do, it is pretty handy. So, this new extension, pg_surgery: if you ever get a corrupted page in a table, this is probably the tool you'd reach for.

As an example, I'm just going to pull up the first row in this actor table. I don't want to get too deep into the weeds on Postgres internals, but there are two main pieces to call out. There's xmin, which is, let's call it the transaction ID of when that data got loaded. And there's ctid, which is the physical location of that data on disk. So this is the first row on that page: (0,1) is the first one. I can show you two of these; (0,2) would be the second one. As things get added and deleted and modified, the ctids change all around, depending on how big the table is and where the data goes. The data was all loaded in one COPY statement, so the xmin is the same transaction for every row.

Now, with pg_surgery we get this first function, heap_force_freeze, which lets you freeze tuples. If you've ever heard the term vacuum freeze, or had to deal with any of the freezing settings in Postgres: as you're doing vacuums on a table, you're trying to mark pages as reusable, or reclaim some space, and what vacuum is often doing is freezing tuples to say, yes, this is visible to everybody in the system. Normally you do that with vacuum; with this function we can freeze a specific tuple, specific rows. I'm passing in the table and then an array of ctids, the physical locations on disk in that table that I want to freeze. That means it worked, I think; we're gonna find out. Yep, so there's a "secret" transaction ID, number 2 (the frozen transaction ID), which means this tuple is frozen: every query, every transaction in the system can see it. The data is still in the same physical location, I haven't moved it or anything; it stayed where it was, but I've now told the system this data is frozen and everybody can see it. And just to confirm, it's still the first row, actor ID number one. So that's heap_force_freeze.

Now for the more interesting and much more dangerous one. If you want to screw somebody over, not in a good way, this is the function for you: heap_force_kill. Copy-paste is hard. What this does is mark the data at that location on that page as bad data. I'll show you how it works first, then do a little more explaining. So that ran; again, I'm passing in the table name and an array of physical locations, in this case just the first row.
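For reference, the commands behind that demo look roughly like this. The actor table is from the Pagila/Sakila sample schema used throughout the talk, and '(0,1)' is simply the tid of the first tuple on the first page:

```sql
CREATE EXTENSION pg_surgery;

-- Where is the row physically, and which transaction wrote it?
SELECT ctid, xmin, actor_id FROM actor LIMIT 2;

-- Force-freeze one tuple by physical location; xmin then reads 2,
-- the frozen transaction ID, so every transaction can see it.
SELECT heap_force_freeze('actor'::regclass, ARRAY['(0,1)']::tid[]);

-- The dangerous one: mark the tuple dead so scans simply skip it.
SELECT heap_force_kill('actor'::regclass, ARRAY['(0,1)']::tid[]);
```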
Now if I ask for the first row from this table, you'll notice I see number two. When Postgres goes to pull that data, it looks at that first row, and we've killed it; we've marked it as no longer valid, so it skips past it and shows me the second row. If I pull the whole table, you'll see. So now I'm doing WHERE actor_id = 1. Remember, that first row was Penelope Guinness, if I recall correctly. Yep, and now I get nothing back: there is no more actor ID 1.

Now you might ask, what is the purpose of this? It's not just for when you're mad at your employer and quitting and want to mess with all the data; that's not the only reason. The real reason you'd want this, and I've definitely run into this before, and it's always been a real pain to work around, is corrupted data in your tables. Usually I'm going to blame the file system or memory, because it's never Postgres. But if you have corrupted data in there, you'll see messages, usually when you're doing a vacuum or maybe a larger sequential scan, something like "error: could not access data in ..." or along those lines. You'll see something that looks really bad and indicates table corruption. The problem is there was really no way to get around it. There may be other good data on that page that you could reach if you could work around the bad row, but because there's a row of bad data in there somewhere, you had no way to do it.

So the old way around this: one option was to open the file directly on the file system with a hex editor and do some surgery, aka pg_surgery, on that file, marking those pieces of data bad manually. But obviously, when I say "open a hex editor on your database files," you all know that's not the right answer, even if it might have been the only answer. This extension gives you a user-space way to deal with that kind of problem, doing some of the same things you'd otherwise have had to do directly on the file. So that's particularly useful. Yes sir? Probably the main use case I can think of: I haven't had to do that one, but I've been in cases, usually a wraparound situation, where there are old tuples in the system, maybe a portion of the table that's older data, and I don't really need to vacuum the whole table if I have a way to freeze just those. This gives you that way.

I want to note again, that's an extension included with Postgres in the primary package, so you should have it on any platform. I would not be surprised if cloud providers didn't actually give you access to this particular extension, but I haven't checked. So this isn't exactly homework, but depending on what you like to do in your spare time: if you have a cloud provider and you want to go look today and see if it's there on their version 14, you could probably do all kinds of crazy tricks in those databases, and then they'd be like, wow, we probably shouldn't let people have access to that.
If anyone here works for a cloud provider and knows they don't allow that extension, I'd be interested to hear it; but I see nobody volunteering that info, so I think it's time to go mess with some people's databases, maybe. Again, I hope you never have to use it, but it's good to know there are options, and they're way better than what we had. I remember one of my earliest corrupted-data situations: we used to have a setting, zero_damaged_pages. Is that still in there? Yeah, okay. With that, you'd have to kind of write off the whole page, however many rows were on it; you really didn't know. What I would do: you'd figure out, oh, I've got damage, it says (0,2) is bad. Well, I could SELECT * FROM actor WHERE ctid = '(0,1)', get that first row, then zero out the rest of the page and get anything after it. So that was one way you might have handled this in the past, but you were sacrificing every other row on that page, and those might have been good. This really lets you pinpoint it: that was the leeches method and this is the surgery method, so certainly a better option if you need it. And again, hopefully you won't, but I think it's interesting; it gets a bit into what's going on underneath PostgreSQL, so it's a fun extension to have.

All right, I'll just work my way up here. Oh, psql tricks. How many people here do not use psql for managing their PostgreSQL? Nobody? Really? That's fantastic. Well then, I will show you these psql tricks, because apparently everybody will use them. I mean, I don't have a GUI to use anyway, but I could have skipped this if everyone had said, no, we all use a GUI or whatever.

What are we doing on this one? \df sum, oh right, okay. So backslash df lists functions; here I'm just picking sum, the built-in function to add numbers together. And you can now do better filtering of it. When you run \df sum, there are eight different versions of sum, depending on data type. New in 14, you can narrow it: if I do \df sum smallint, it shows me just the smallint one. So if you're trying to figure out, for a specific data type, how to make something work, that helps. It should also work for operators with \do, though I don't know if I have an example in here; this was changed for both functions and operators in Postgres 14. Let's do \dT; the capital T is types. Yes, and I'm doing the same kind of thing here. Oh no, this is slightly different, sorry. This shows types, and the change, again in Postgres 14's psql, is that when you search using what are essentially aliases, the shorthand spellings of a type name, it will still pop up. So I can search for int, and integer will pop up. I think bigint is another one that shows up. There aren't actually that many that work that way, but there are a few in the system, and it makes the search a little smarter. It's, I believe, the standard regexy pattern matching that psql uses.
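Condensed, the psql bits from that stretch look like this (output omitted; the alias-matching behavior is as described in the talk):

```
\df sum            -- lists every variant of sum()
\df sum smallint   -- new in 14: filter \df (and \do) by argument type
\dT int            -- new in 14: shorthand spellings like int find integer
```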
I'm gonna create an extended statistics object called WhatWare on the address table in my database, and then I'm gonna analyze it. The specifics here aren't terribly important; what I wanted to show you is that there is now a \dX, capital X, to show extended statistics. These are generally used when you want statistics on the correlation of different columns within a table. So if I were doing a query on address where city_id equals blah and postal_code equals blah blah, Postgres has no way to correlate those two columns and realize, oh, there's maybe some relationship between those two pieces of data. So I can create a statistics object that says, go look at these two columns and figure out what the relationship is, which can help queries I'm running on that table. If you're doing that kind of thing in Postgres, and you can go look up CREATE STATISTICS, there are a bunch of different options for it, previously we didn't have a really good way to show what you'd created; this is now a simple way to show it. Don't confuse it with lowercase \dx, which shows extensions; this is capital X.

Film: here we've added a column to the \di+ and \dt+ output, the access method. I picked this table in particular because we have different index types on it. You can see it shows btree, btree, then a GiST index, then another btree. Previously this information wasn't hidden, exactly, but it wasn't in the \di+ output. So we've added it, in case you're trying to find specific types of indexes. We've also added it for tables, because tables can have different access methods too. By and large, all you'll generally see is heap as the table access method on most systems, but, I don't know, maybe if you're using Citus it might show you something different. So that's another new thing added to psql. A few little tricks; psql, still getting better. Apparently it's the best interface ever. I actually do believe that.

All right, vacuum. Let's talk about vacuum. How many people here have ever been upset with vacuum in Postgres? Okay, I know you're all lying. If you didn't raise your hand, you're lying. So the first thing I'll show: there's a new option that's been added to VACUUM, called PROCESS_TOAST, and what it basically does is tell vacuum whether or not to also process the TOAST table belonging to the table it's vacuuming. There isn't much to see; it's one of those cases where I show you it not doing something, because if you look, it doesn't say toast anywhere in here. Let me set it to true and see if we get different output. Probably should have done that the other way around. Oh, hey, look at that, okay, so there, right? Normally when you vacuum, it does look at the TOAST tables and vacuums them; but if you pass PROCESS_TOAST as false, it will not. And you might say, is that actually useful? It's definitely useful; I'll talk about that in one second. I actually think the vacuum changes in 14 are why this is the best release ever. But I wanna show you two new settings real quick.
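Pieced together from the demo, roughly (whatware is the demo's name for the statistics object; address and actor are Pagila tables):

```sql
-- Extended statistics on correlated columns, then the new \dX to list them:
CREATE STATISTICS whatware ON city_id, postal_code FROM address;
ANALYZE address;
\dX

-- New in 14: skip (or explicitly include) the table's TOAST table:
VACUUM (PROCESS_TOAST false, VERBOSE) actor;
VACUUM (PROCESS_TOAST true,  VERBOSE) actor;
```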
So there are two new settings, vacuum_failsafe_age and vacuum_multixact_failsafe_age, and here's why they're super important. One of the big problems people have now is vacuum wraparound. If you have a high transaction load, or even if you're just unlucky and have really large tables, especially large tables with really large indexes, it can sometimes take days to vacuum a given table, and you really can run into danger. We still hear reports of people hitting this problem, though definitely less than we used to. But I'll go out on a limb: from the perspective of regular vacuum age and transactional load, I think 14 pretty much does away with that problem, and the reason is these failsafe age parameters.

Now, I can't quite say the same if you're hitting the multixact side, and here's the difference. The regular failsafe covers normal transactional workload: you're inserting, updating, deleting, a heavy workload like that. Multixact is more about nested transactions in stored procedures and that kind of thing; it's a different counter you can run into problems with. And I'm not 100% sure we've gotten rid of that problem, because with the right code, say some crazy looping inside a stored procedure, it's pretty easy to burn through those multixact slots. I don't find many people doing that, but I think that's mostly because in Postgres we just don't tend to put that level of logic inside our functions; it's more a sign of the times, really, than a Postgres thing. I will say, I've seen folks moving from Oracle, or even SQL Server, into Postgres with large batches of stored procedures, and a lot of those have looping logic and that kind of stuff, so that's often where you run into multixact trouble. There are other cases as well, but okay.

So, getting back to why this is so significant: both the PROCESS_TOAST option and, even more so, this failsafe age turning off index cleanup. There was an option introduced in 13 that let you run a vacuum on a table without actually vacuuming the indexes. That was super useful because, even if you regularly vacuum your tables, when you hit the transaction wraparound problem and start seeing those messages, you have to vacuum the whole thing, it has to scan the indexes, and that can take a whole bunch of passes. What we've usually found in the field is that the majority of the table is often already vacuumed and frozen; especially in an older table, all the data has been frozen except the very recent stuff. The problem is the normal vacuum process has to scan everything, including those indexes. So the fact that we can now skip the indexes, and also skip TOAST data, which often isn't churning nearly as much as the main heap table (if you have a last-updated column, some TOAST data may be getting modified, but it usually isn't the same level of issue), matters a lot. And autovacuum will now turn that stuff off by default when you get into the wraparound situation.
It realizes: hey, I'm in a wraparound situation, let me ignore the indexes for now. I don't need to do cleanup; I really just need to worry about transaction age. If you knew that option was there in 13 and you ran into the problem, you could run the vacuum yourself manually with index cleanup off, and it would just look at the heap. In most cases that's much faster, because again, a lot of those rows are probably already frozen, and that's marked inside vacuum's internal bookkeeping, so it can focus on a really small piece of the heap and ignore everything else. But you had to know to turn it off. The beauty of 14, to me, is that this is now automatic. If you don't know any of this stuff, if you're thinking "I don't know what you just said for the last five minutes," you're exactly the person who would otherwise learn it the hard way: getting transaction wraparound messages and scouring through docs trying to figure it out on 13. On 14, that problem largely just goes away. Most of us have been doing this a long time; we've seen those blog posts where a company says, I ran into this problem, my database was down for three days, and you think, well, if you'd been paying attention, that would have been easy to fix. But you had to actually know the things that make it easy. It's easy when you know how, not so easy when you don't. This, I think, for 90-plus percent of cases, means you don't have to know how anymore, because autovacuum does it for you. So I'm really excited about that.

We don't think of Postgres as having long-term releases, the way a lot of operating systems designate an LTS. To me, Postgres 11 was the last release where I said everyone should try to get onto it; when it came out, there were a few features I thought were critical for everybody, and it made a good baseline. 12 and 13 are great releases, and I like a lot of what's in them. But 14, to me: if there were a long-term release, 14 would be it, just because of this autovacuum change. It means 90-plus percent of people will never have to worry about vacuum wraparound. And that's one of the biggest scare points when people talk about moving to Postgres: I'm worried, because I hear these stories about wraparound. And again, it's usually somebody who's been running 9.6 in business for ten years. This makes that go away. So, super excited about that.

Yes? So imagine you had a terabyte-sized table with, say, ten indexes that are half a terabyte each. Normally vacuum would have to scan at least six terabytes of data. If you can turn off the index cleanup, you're down to one terabyte instead of six. That's a massive speedup. Even better, Postgres stores the freeze information for the table, so even within that one terabyte, if you've ever vacuumed it before, it knows it doesn't have to look at most of that data; it only has to look at what's been recently changed.
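For reference, the knobs and the manual escape hatch being described; big_table is a stand-in name, and the defaults shown are Postgres 14's:

```sql
SHOW vacuum_failsafe_age;            -- 1600000000 (1.6 billion XIDs) by default
SHOW vacuum_multixact_failsafe_age;  -- 1600000000 by default

-- The manual version the talk describes; 14's failsafe switches
-- a wraparound-endangered vacuum into this mode automatically:
VACUUM (INDEX_CLEANUP off, VERBOSE) big_table;
```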
Right, so that shrinks it down even further. Instead of six terabytes, and that assumes a single pass over the indexes when a lot of times you need three or four passes, so maybe 20 terabytes of scanning, you're looking at maybe 100 gigabytes of recently changed data in just the heap. No matter how fast your file system is, that's the difference. And it really will get you out of the wraparound situation. You'd still want autovacuum to come back and run on that table again, because it will probably want to do index cleanup at some point. But for the wraparound purpose, it automatically switches to that mode, runs it, and then goes back and runs the rest separately once you're out of danger on wraparound. So yeah, I was extremely happy to see that get in.

All right, getting low on time; I jabbered too much, so I'm gonna blow through a few of these pretty quick. There are some new utility functions: pg_wal_replay_pause. This is for hot standbys, where WAL is streaming over and being replayed. I get an error here because I'm not on a standby, I'm on the primary. There are a number of functions for managing WAL replay that were added or improved in Postgres 14, so if that's the kind of topology you run, definitely check those out. They let you pause and resume replay and step your way through to see where things are going.

Oh yeah, let's do this one, because there are people in the room who are hackers, and then I get to complain about the work they do. There's a new config parameter called enable_memoize. How many of you know the question: would you rather fight 1,000 duck-sized horses or one horse-sized duck? Yeah, okay. How many people say, I want to fight the horse-sized duck? Actually, you do want to fight the horse-sized duck, just FYI, not the 1,000 duck-sized horses. That might be the most important tip you get today. So be aware. I have a little demo that shows this. I'm gonna create two tables, one named duck, one named horse, and I'm gonna create an index (new keyboard, still breaking it in) on horse, on the fight column. So now I have these two tables, and an index on horse. And I'm going to turn the parameter off. By default it's on; if you've ever used something like enable_seqscan or enable_bitmapscan, it's one of those types of parameters. Also, there's a typo on my slide. Okay, so set enable_memoize off, and I'll show you the EXPLAIN when I join these two tables together. The basic idea is that one side is really big and the other is really small when it comes to the spread of the data. I'm putting 100,000 rows in each table, but one of them has only about 10 distinct values while the other has 10,000. So it's not the size of the data; it's a small number of distinct values on one side and a lot of distinct values on the other. If I run an EXPLAIN on this, it's all still quick, because it's only 100,000 rows on each side.
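A reconstruction of that setup under the distributions just described; the table and column names are per the demo, but the exact INSERT statements are my guess at what produced those distributions:

```sql
CREATE TABLE duck  (fight int);
CREATE TABLE horse (fight int);

-- ~10 distinct values on the duck side, ~10,000 on the horse side:
INSERT INTO duck  SELECT (random() * 10)::int    FROM generate_series(1, 100000);
INSERT INTO horse SELECT (random() * 10000)::int FROM generate_series(1, 100000);

CREATE INDEX ON horse (fight);
ANALYZE duck, horse;

SET enable_memoize = off;          -- it is on by default in 14
EXPLAIN ANALYZE
SELECT count(*) FROM duck d JOIN horse h ON d.fight = h.fight;

SET enable_memoize = on;           -- back to the default
EXPLAIN ANALYZE
SELECT count(*) FROM duck d JOIN horse h ON d.fight = h.fight;
```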
I'm getting a sequential scan on each side, with a hash join that puts it all together. Okay, that's great. I have that a second time; good enough. Oh, that could be it, that would make sense, yeah. So I'll set it back to the default, which is on. So now it's on. And I was, I think, probably watching Avatar when I made this slide. Thanks, yeah, okay. So now, this is how it comes out of the box: on 14 you get this plan; on 13 you'd see the previous plan; and on 14 with the parameter turned off, you also get the previous plan. Now we get an index-only scan on that index I created on horse, which is probably what you'd expect to want; it memoizes that data; duck still gets a sequential scan; and a nested loop pulls it together. So you get a different plan. The specifics probably aren't that important, other than realizing there's now a faster way to do queries like this, where one side has a small number of distinct values and the other side has many more. That tends to happen a lot: think of countries and addresses, or films and rentals. Your set of films is never going to be as large as the number of rentals against them. There will be a lot of cases like that.

And I really wanted to highlight this one because it lets me complain about the fact that Postgres does not have hints. Because obviously Postgres doesn't need hints; it needs these weird parameters that are hard to use, so that if you actually need different behavior in your explain plan, it's really cumbersome to get it. So thank you for that, anyone who has worked on this. Yes, sir? Yeah, exactly that. Since I only have 10 distinct values, as it scans that table, or even better, scans just the index, it keeps track of those values, and as long as that number is small enough, we only need to pull those 10 entries into memory. So it's pretty cheap. But don't lose sight of the fact that we're all going to keep complaining that we don't have query hints. This is what he's talking about, right here: it only needed five kilobytes of memory. Way lower memory usage than hashing the whole table. So don't get me wrong, the performance optimization is really nice. I may have lost sight of that in my desire to rail against the powers that be, but it is a genuinely nice performance benefit.

And as a reminder, there's a lot of stuff that isn't really demoable. There are improvements to sorting and other things in Postgres 14, so I think for most people, you just install it and your performance is going to be better. That's the usual expectation for a release, but there are specific things in 14 that I feel strongly will help you on the performance side. That, the autovacuum stuff, pg_surgery, the query ID: there's a lot of great stuff in here. I actually have a very long table of contents that we did not get to, so I will post this online. Let me just run through and see if there's anything else I want to hit.
Oh yeah, pg_locks: there's a new wait-start column, for when you wonder how long your lock has been waiting. Sometimes you can derive that from pg_stat_activity, but it's not always obvious, and sometimes it's trickier than you think. So that's in there. There's a lot of good stuff in this release, including more for stored procedures and all that. If you want the whole script and all the demos, you can find me on Twitter; I'll post it at some point on my blog or my GitHub or something, or I can email it to you. I'm out of time, which I don't think means it's back-to-the-future time. So thank you all for coming. Hopefully you saw a few things that were interesting; if you're not on 14 yet, maybe this was new, maybe it was inspiring. Seriously, get on Postgres 14. If you don't, then you'd better be on 15 by tomorrow. So that's it. Thanks, everyone.

There's gonna be a recording later on. And I have it in the title, you're right: "Enable Memo-wise," that's the typo. Triple-clicking on a line, and you select the whole line in one click; that's really convenient, yes, for that kind of thing. Double-clicking is the worst, by the way; I was doing a lot of double-clicking, but yeah, triple-clicking would definitely be better. So it's really funny. Is this specifically what you're asking about? No; so, I don't know if you can see it here. You can't see it, I think, because of this thing. People ask if the notch bothers me. I usually set the menu bar to auto-hide, but when it auto-hides, the notch doesn't go away, because that's where the camera is. There's actually an app you can buy that hides the black notch: it paints the bar black where the notch is, since part of that is all software anyway. I thought that was the dumbest app ever, but now I'm actually using it. It's a bit weird, though, because the shell stuff isn't configured the way I would normally configure it; it's almost like a loaner laptop. I needed a laptop for this presentation at the conference, so this is the brand new M2, the MacBook Air 13-inch with the M2 chip, and it's actually really nice so far. But it uses zsh as the default shell, which I'd been a holdout on and never switched to on any previous Mac. (I'm using it on my M2 MacBook; I've found it really easy to switch to.) Yeah, it's not totally foreign territory, but there are a few things. To get back to the start of a line, I used to do function plus back-arrow, and that would get you to the start of the line in the shell; that doesn't work in zsh. I probably can configure it, but then you've got to go figure out how to configure the thing. Pardon? So if I'm on the command line and I type a bunch of stuff and want to get back to the beginning of the line, in Bash you can do that; here I don't have it. (Doesn't that set up a vi-style modal shell?) Oh, right, yeah. I don't actually like that, which is weird, because I am a vi user; on the shell, I've always found it awkward, for some reason. I see.
Yeah, so if you want, I could just email it to you if you have an email address; maybe that's easiest. I think they are, although because I'm filling in, we need to do some surgery around the schedule to get this in there. No. Pardon? Oh, yeah, let me get out of the way here. I am not; I am Robert. Yeah, let me grab my power cord there in a second. What's the, sure. Yeah, up here. Yeah, N-T-A-R-C-N-T-A-O. Oh, two Ns, you had one. Yeah, and then I'll ask email about that. OK. All right, I will email that to you; I will fix the typos, and then I will email it to you. No, so I was the first one, but the speaker who was going to do a talk in here was going to do a Postgres 14 talk and couldn't make it. So they said, hey, somebody has a Postgres 14 talk, and I said, well, I have my demo talk that I can do. So you're Robert? I'm Robert, yeah. Yeah, give me one second. Oh, yeah. There's that.

Hello, hello, yeah? A bit louder? I'm not sure. Hello, can you hear me? Could you hear me from the back? Oh, OK, I can't hear myself. Yeah, yeah, yeah. Oh, I'm jet-lagged; I flew in from London a couple days ago. I don't know why I woke up at 5 o'clock in the morning, when I need my rest. Yeah, I came in late last night. Where did you come in from? Atlanta. Oh, Atlanta, OK. What's happening? Hello? Can you hear me? Yeah? OK. I was getting it in the back there; I'll try not to be too loud. Hello, hello.

Greetings, everyone. Welcome to the 11 o'clock session in our track at the Southern California Linux Expo. We would like to thank our track sponsor, EnterpriseDB, which helps organizations get the most out of Postgres. Nearly 1,500 customers worldwide have chosen EDB software, services, and support, and offices worldwide enable us to deploy our global expertise locally. And our speaker is here to present Solving PostgreSQL Wicked Problems.

Hi, good morning, everyone. Of course, I'm not Alexander; he couldn't be here, a combination of travel, war, vaccines, everything. Hopefully next year he'll be able to travel and present in person. We did give this talk last year, at SCaLE, virtually, and I will reference that talk, which is a little different but has a lot more detail. So this is a talk about solving some of the thorny problems that some users hit. Unfortunately, they hit them at the wrong time, after they've been very happy with Postgres for a long time; then suddenly they run into these problems, and vacuuming is one of them. So this is a talk about how some of these problems can be solved, and about a real-world implementation of solving them for Postgres. Very quickly: Oriole, as in the Baltimore oriole, the bird. It's a bird that shows up in springtime, so you can think of it as, oh, this could be a nice springtime for Postgres, future things to look forward to. And of course, birds are dinosaurs too, right? Very quickly about Alexander: he's a long-time Postgres contributor and committer. He has made a lot of very interesting improvements to Postgres that everyone uses today. And me, I'm a community advocate. I've been helping Alexander a little bit, helping him give these talks, sometimes in person, sometimes as a hybrid talk. Many, many years ago, when I was first an undergraduate student, the first database I found on a VAX was Ingres, and then I decided I wanted to become a database person. Anyway, so: why?
What's the basic problem we see? If you go to DB-Engines, Postgres is beloved, very popular, people love it and all of that. But if you look at that list, there are more than 300 databases. Who really needs 300 databases? Most enterprise customers we see are running anywhere from six to ten databases. That's pretty bad. And then every month, someone comes up with a new idea and builds yet another database, just creating a zoo of things.

So again, why open source? A lot of users are moving over to Postgres because they've found it does a lot of things really, really well for what they need. And this quote is from Mike Olson. I put him here as a co-creator of Postgres because he was the first one who converted the original Lisp code at Berkeley, the Postgres prototype, into C, and then it became usable as an actually fast database. He was also the co-founder of Cloudera. He talked about how customers just don't want their data trapped in something they have to pay for every time they try to use it. Postgres has now become the default database engine where it's safe for you to keep your data, because you can move it around, and you don't need anyone's permission to run it in a new environment.

Then Stephen O'Grady, a developer-focused analyst: recently he wrote a blog about how there are way too many databases and everyone is starting to migrate back to relational databases. People are falling out of love with NoSQL databases. They found out NoSQL does certain things well, but now everyone is trying to bolt SQL onto these new databases that supposedly did not require SQL, and people are deciding to move back to Postgres and relational databases, because they've been proven to do a lot of things, and they're getting better and better.

This is just one of the original papers about Postgres, talking about how different storage engines can be used, different types of hardware, and all that. I won't go into much detail, but the idea for OrioleDB was: can we go back to the original design principles of Postgres, that it was extensible, and that you could plug in different types of models, different data types, different query types? Nowadays we just don't have DBAs anymore; we have data engineers and data platform engineers and data scientists. The world of databases has really grown, and people have a lot of tools. The problem is that if certain tools only work with certain databases, then you end up with, again, a multiplication of all these tools, and it becomes a zoo. So what if Postgres could be like a power tool, with adapters, where you can do different kinds of things in your environment? Postgres has the potential for that. It's very good, and it can be even better.

So what can be done? These are some of the things that have been done in the cloud. Aurora is the most popular example, where the Amazon team took the storage layer of Postgres and replaced it with something completely proprietary, their own design for their environment. This does improve a lot of things in Postgres, but it's not open source, so it has no effect on the community edition; it's just something locked away, only available on Amazon. Azure went a slightly different way.
They acquired Citus and built Citus into their service, and about a month ago, Citus announced that everything was going to be open source: not just the extension itself, but all the proprietary enterprise features they had. But again, that's an extension built on top of Postgres; it doesn't do anything to the Postgres engine itself that moves the state of the art forward. And then GCP recently came out with AlloyDB, which is basically trying to do what Aurora did, the Google way, using Google's massive infrastructure and all the things hidden inside Google Cloud. Again, those things are not open source.

So, open source efforts. Yugabyte has been around for a while. I worked with the Yugabyte founders very early on, when they did their first version and decided they wanted a SQL layer on top of Yugabyte. Instead of them building their own SQL, I said: the Postgres code is there, why don't you try to use it and figure out what you can do with it? At that time, pluggable storage engines were an idea, and some of the hooks were starting to appear, but it was not far enough along to be useful for Yugabyte. So they ended up doing their own thing, and they did release it and make it open source. But again, it's their opinionated model of how a distributed storage engine could work with Postgres, and some of it may be useful or not. They are very interested in contributing those ideas back to the community if it fits and makes sense, and that might still happen.

CockroachDB: I'm from New York, one of the organizers of the New York Postgres Meetup, and very early on they came and showed what they were doing, building a Postgres-compatible engine. But because they were using Go, it became a completely different animal. And then, many years later, they just decided to change the license, so it's not really open source. You can have access to the code, but because it's all written in Go and Postgres is written in C, there's no real synergy there. Neon is fairly new, but it's not built by modifying the Postgres core itself. It works alongside Postgres, taking the WAL records, replicating them into S3, and doing all kinds of very interesting things with them. So it is open source, but again, it doesn't try to change core Postgres.

So I'm here to talk about OrioleDB, and what OrioleDB really is, is modifying and enhancing the table access methods framework in Postgres so that you can build different storage engines optimized for different infrastructure, different storage devices, and different workloads. Think of MySQL very early on: when it came out it had MyISAM, which was kind of okay, interesting as a database but not terribly useful. Then InnoDB came out, and suddenly people woke up and said, okay, we have transactions and more advanced methods, and MySQL really took off. So table access methods, if the framework can really mature, could change the way Postgres is used in different places. It also gives people a way of customizing the Postgres core without actually messing with the Postgres core; Postgres has this extension mechanism, and ideally this is a model that should be carried forward, because it can be very, very powerful. So, what OrioleDB does: a lot of these things were discussed over time, but the feeling was that Postgres shouldn't change things in the core, especially undo logs.
Most databases, especially Oracle, have undo logs and redo logs and those mechanisms; Postgres doesn't, because it uses a different method. So OrioleDB implements that. It implements some interesting changes in the way the buffer mapping is done, and for wraparound it uses 64-bit transaction IDs instead of 32-bit. I won't go into too much detail here, but if you don't know the difference between undo logs and the way Postgres currently does it, you can ask questions; maybe a couple of people can volunteer that. This turns out, perhaps, to be a more efficient way of managing MVCC. Did I get that right? And you can use it side by side, for the specific tables that need it, while all your other data stays in regular Postgres. That's really the benefit of it.

And then buffer mapping: instead of having a separate buffer map in memory, where you have to go through and find out whether the blocks are in memory, and when you don't find them go back to the disk and load those blocks, this method uses dual pointers. You can go find the block, and with one lookup figure out whether it's on disk or in memory; if it's on disk, you load it. I can't go into the details on this, but it turns out to be more scalable for larger machines with multiple cores and more memory, and I'll show some results afterwards.

Let's talk about some results. There was a very infamous blog post written by the Uber team. Originally the Uber team started with MySQL, then they switched to Postgres and went pretty far with it; then another team came in and wanted to do something different, Postgres somehow didn't fit what they expected, and they wrote this blog about how terrible Postgres was and why they moved back to MySQL. This caused a big firestorm in the community, because half of what they wrote may have just been wrong or a misunderstanding, and the other half were legitimate complaints about the way Postgres worked and why it didn't work for them. If you Google "Uber Postgres," this is the first thing that comes up, and even though it's about five years old, it's still the number one hit. So for a lot of people coming new to Postgres, if they stumble across this, it gives them the wrong impression about whether they should move to Postgres or not.

OrioleDB addresses this by design, so that you don't have the choppiness you see over here, and you get much higher throughput on the same infrastructure. Same thing down here: the blue line you can see is very, very choppy. If you don't want performance issues, you have to oversize the storage IOPS you provision in order to get stable performance. OrioleDB has much less variance, so you get much better performance on the same machine. And then storage efficiency: this is otherwise referred to, I think in the blog, as write bloat, where as you use Postgres for a long time, things pile up, and the amount of storage you're actually using tends to be higher than what you really need; then you have to go through some of the cleanup that Robert talked about in his previous talk. So this was an experiment that was done, and I think by October we'll probably redo the experiment once V15 is ready, and then we'll compare again.
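As a concrete picture of the table-access-method angle, this is roughly the shape of OrioleDB's published quick-start; the table definition is illustrative, and the extension also needs to be loaded via shared_preload_libraries:

```sql
-- postgresql.conf: shared_preload_libraries = 'orioledb'
CREATE EXTENSION orioledb;

-- An ordinary table opts into the engine through the table access method:
CREATE TABLE blog_post (
    id    int8 NOT NULL,
    title text NOT NULL,
    body  text NOT NULL,
    PRIMARY KEY (id)
) USING orioledb;
```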
But what we were able to do is use an Optane storage device, which is persistent memory, treat it as a block device, and run databases on it. A few years ago, Dr. Stonebraker was doing a sort of road tour talking about databases. At that time he had just built a new in-memory database, and he was talking about everything you can throw out of the code and the database architecture when you don't have to deal with spinning disks and all the locking and latching that has to be done. When he showed the efficiency of the code and what the database was actually doing, about 55% of the database operations had to do with storage devices. And at that time he hinted that Intel was working on something really interesting, which could be persistent storage memory, where you could store data without a power-loss problem, and that Intel had given him a grant to do some work with it. That eventually became what's now known as Optane. You can now get Optane devices at eight terabytes, maybe even 19 terabytes; that's a pretty good-sized database.

So the blue line is just plain PostgreSQL, and you can see that as the number of connections increases, the performance nosedives. This is just pgbench. The green line is OrioleDB on that same device, but through the file system: using the OS to talk to the device, so the OS treats it as a storage device and doesn't know anything about its memory characteristics. And the red line is treating the device as block storage directly, bypassing the operating system. This creates some interesting use cases, where you can take specific tables, pin them to these devices, and get much, much better performance. You can leave the medium-level workloads in the traditional Postgres engine and aim just the high-write workloads at these new devices, which reduces what you'd otherwise need in extra IOPS from your cloud provider, bigger machines, or more memory. So this changes the economic equation of how you use Postgres. And as you see more and more cloud providers, and especially SaaS companies, using Postgres as the default engine for their customers, a tremendous amount of their overhead cost goes to paying Amazon or Microsoft or whoever, because of some of these overheads Postgres has. If you can reduce that, more people can benefit; the economics of using Postgres just get better.

Any questions I can take here are very good. Record-level WAL, again, is a change: Postgres currently does block-level WAL, and I think that's the tradition for most databases. There's a little extra here, in that record-level WAL makes it easier to potentially build a multi-master version of Postgres, where the entire information is on all the other nodes; when you need to fail over, or write to multiple nodes, it becomes easier to deal with that information. I think this also needs a detailed blog post about its benefits.

So, what are we doing next? One is that we're tracking Postgres. The extra code and hooks needed to make OrioleDB work in Postgres are not part of core, and hopefully they can be upstreamed at some point; we're working on that. So what we did is take all the code necessary to run OrioleDB and backport it. OrioleDB was originally designed for version 14.
We were able to port it to version 15 and test it working. We were also able to backport it to version 13, because we needed to run it with one cloud provider and they were not ready with version 14 yet. And we're working on version 16. Then we had one potential customer, a large Postgres-based SaaS company. Because they had just migrated from 9.6 to 12, they were not ready to try anything else. So they said: if you could backport it to 12, we can test it side by side with the same exact workload and traffic and see the benefits. That would also help them decide which version of Postgres to jump to next: do they want to go to 13, do they want to go to 14? And because they get a lot of cloud traffic, it would be easy for them to set up a separate Postgres OrioleDB deployment, pipe traffic directly to it, test it, and give us feedback.

There were a couple of patches submitted in the CommitFest that's ongoing now; I don't have too many details on that at the moment. And for upstreaming into Postgres, we're trying to contribute some of it into version 16. All the code is on GitHub, so people are running their own pgbench tests, giving us feedback, opening pull requests, and telling us where things don't work. One thing we're planning is to work with large companies that are using Postgres, so they can run it alongside their production environment and give us performance feedback.

So OrioleDB, the storage engine, is all there on GitHub, in the open; people are putting in pull requests and reporting issues there. There's a separate GitHub repo, called postgres, which is our patched version of Postgres with all the hooks the extension needs. Over there you'll also find Docker containers for several operating systems and all the different versions we're building for, so you can just download the containers and run it, rather than trying to build it from source. When we released this, I think it was in February, we made sure from the first release to use exactly the same permissive Postgres license, so anyone is free to use it on their own; we don't really need to know about it. But of course, people who want to use it will come back and ask us for help, because now they're using something a little different from the community edition. Within the first week or so, we got about a thousand stars, and that attracted a lot of attention; it hit number one on Hacker News, as in: look, all the things you thought Postgres would have problems with, here's an attempt to solve them. So a lot of people are aware of it.

In terms of education, we've done a lot of presentations at conferences like this one. The second thing has been very interesting: we work with certain large startups who want to use this, and the relationship we offer is not just helping them understand the code, or them taking the code and trying to run it themselves; we also started a coaching program to help them hire engineers and turn them into Postgres experts. We're using the code here as a learning lab to train people, because one of our goals is to expand the number of people who contribute code back.
At one of the conferences I work on, a lot of customers come and say: we plan to move to Postgres, we're going all in, but we need to hire X, Y, and Z kinds of people. Can you help me recruit a Postgres committer or contributor? In the end, there just aren't enough of them. So the advice to customers and users like that is: hire good engineers who want to work for your company, and we can teach them how to become Postgres hackers. That's a more viable way of expanding the universe of people who are not just using Postgres but knowledgeable about it, and who can then become productive contributors. Office hours: we haven't really gotten that started yet, but we're hoping, as OrioleDB gets more public traction, to do a one-hour session every Thursday about the Postgres code: how to commit code, how to write code, how to understand it, and what the release cycle is. Again, trying to train people so they can start thinking about becoming contributors to the Postgres community.

These are some of the online activities. If you put Alexander Korotkov into YouTube, you'll find two people: one guy who's into games, and the other is our Alexander, so add Postgres to the search and you'll find a lot of past presentations. A really good one was the talk we did for the Timescale community day. Timescale is probably one of the most impressive new databases built on top of Postgres using the extension mechanism. Back in the day, when they first started, they managed to get everything they wanted done; but now, as they try to stretch the state of the art, they also wish they could go in and modify some aspects of Postgres to make these new things easier. The last time the Timescale CEO spoke, they committed to hiring about a dozen engineers and paying them just to be Postgres contributors. That's the level of investment new companies are making in contributing and making Postgres better. And from last year, there's probably a recording of the original talk Alexander gave; I'd advise you to go check that out. We also gave a talk at Postgres Vision, which was in June; Alexander had the last talk, and the title was The Future of Postgres. It's not necessarily the future of Postgres, but an idea of some of the things that might affect it, and this is one way of doing it.

What else is coming in the future? This was a challenge Dr. Stonebraker gave in the Postgres Vision keynote. He said Postgres is too complicated; it takes too much experience and too much knowledge for people to run it well; try to simplify it. One way to do that is, instead of having to tune every aspect of Postgres, to have specific modules designed for specific environments, and those should not need tuning: out of the box they should come with better defaults, so you can just use it without having to be a Postgres expert. Some of the other big challenges we see as wicked problems: connection pooling. Postgres is probably one of the few databases with five or six different connection poolers; every other database decided to just build it into the core, and then people don't have to think about it.
You just turn it on, and you don't have problems with too many connections. Multi-threading is something many community people have talked about: Postgres continues with its process architecture, and maybe it should move to multi-threading. A couple of people want to try to solve this, and we're hoping we can contribute something to that. Sharding is something the community is also working on, and our position is that for sharding to be really successful, Postgres as a core engine also has to get much, much better; because when you have 1,000 engines running, trying to orchestrate and manage all of that becomes a worse problem. The core problems need to be fixed so that the future of scale-out Postgres can be even better. And I was talking to some people who work at banks, and data encryption is still a very big one: most IT departments are satisfied with encrypting at the file system level, but the DBAs prefer to have encryption in the application model. So these are the other big challenges we see. As we try to evangelize Postgres and build out the community of developers, hopefully more people will come in, contribute toward these, and accelerate some of the development.

Here's a startup I'm working with: they built a really nice cloud-based service where you can take a sample database, run it in a specific cloud, and it will do all the math and show you the cost of that cloud service running that database: your cost per query, cost per operation, things like that. We're hoping to use them as a partner, to put OrioleDB in as an engine they can deploy across multiple clouds and show the difference and the benefits. So that's a cool thing. And again, the best way to deliver a database is as a service. So we're working with a company called ScaleGrid, and what's really nice is that you can go to ScaleGrid, turn on a 30-day trial, and get a nano instance; that same service runs on everything from DigitalOcean to Oracle Cloud, Amazon, Google, and Azure. It also runs on VMware. You can go online and try it out, and we have a one-and-a-half pager where you log in with SSH as a superuser, shut down the running Postgres instance, delete the binaries, install OrioleDB, bring it back up, and voila: you have an OrioleDB database as a service. It's a quick way for anyone to try things out.

You can reach Alexander at orioledb.com, and you can reach me at postgres.thinkx.com if you're interested in trying this out; we would love to work with people to test OrioleDB. And that is the end of my talk. I'm ThinkX on Twitter. Normally I give this talk with Alexander online, but it was a little shaky how well the Wi-Fi would work, so I'm flying solo. If you have more technical questions and want to learn more, everything's on GitHub: there are design documents explaining the architecture, the design decisions, and things like that. All the code is out there. Feel free to open an issue if you have a question about a feature, or if you read the documentation and want to learn more about something; everything is there on GitHub. I hope you'll give it a try, and if you're interested in participating, contributing, learning more about the code, or maybe trying it in your environment, I'd be happy to talk with you about it. And I can take some questions, yes?
Yes. Right, right, yeah. It's an incubator, right? To try out a new idea. No, no. No, we've taken some of those — all the extra code that enhances the table access methods. What Alexander explained to me was that they were very, very biased and oriented towards heap-type storage, and that made it difficult to use them for anything else. So, just speculatively, that may be one of the reasons why some of the other engine efforts have — not failed, but just had difficulty getting traction. That's why he went a bit further off and started to build this new engine: he knew these problems could be solved, and then he worked backwards to figure out what was missing in the table access methods to make things happen.

Right, so the engine today is a monolith, and all the pieces are very, very interconnected. We had a discussion before about whether some of that could be useful for anything else, and we're still thinking about how that could be done. But right now, you have to take the Postgres v15 code, add these extra thousand lines of code, which are spread out over about a dozen different functions, and then you have a version of Postgres that can run OrioleDB. A couple of these patches have been committed into v15, and hopefully more will go in — some had to do with things like red-black trees, simple things like that. And then there are a couple of other things we're working on trying to get into 16, because these are generic things that will help other companies or other developers model their own storage engines. Yeah, these patches have value in themselves, but the new buffer architecture and all of that is very, very closely tied together; that's why it's a monolith right now.

So we talked to Alexander: can other people see this and either get inspired by it, or come back and say, well, if you make some more abstraction changes, it can be more reusable? That's why we're gonna be talking to some of the AWS people, we're talking to the Yugabyte people, we're talking to the Timescale people — because these are all companies that would like to see Postgres customized for their use, but they want to stay within the community and don't want forks. Everyone wants to come back into the community stream, and the table access methods are one way to get what you want while staying inside the Postgres world. So we're having these discussions and hopefully things will get better. I mean, a few years ago I was talking to a startup that was building their own Postgres engine designed for AI/ML running on GPUs, and they had to hack everything themselves without any guidance from anyone. Having these kinds of models available, and showing how Postgres can be extended in other ways to solve other problems, will bring a big army of developers and talent back into making Postgres better. So that's the hope. Any other questions?

So with this, thank you for your attention. Apparently I stand between you and lunch — I don't know if they're serving lunch yet — but thank you for coming to SCaLE, and we're very happy to be back in person. Please. Yeah — follow me on Twitter, my DMs are open, you can reach me anytime there. Thank you. I came in two minutes under my time.
Testing, one, two, three. Hello, my name is Bruce Momjian. Great, thank you everybody. Wonderful to be here at SCaLE. This is my eighth open source event since COVID, so I'm feeling strong. I've actually done 12 events, but eight of them were open source; the other ones were church. But it is kind of odd — every time I come, I'm like, what's it gonna be like? I just don't know anymore. But the eight have been very good and this one looks to be good as well, so I'm having a great time and hope you are too; we're just getting started. This event can be really crowded and really exciting, so you can probably expect that to be coming.

My name is Bruce Momjian. I am one of the Postgres core team members. I've been with Postgres since 1996 and with EnterpriseDB since 2006, so I've seen a lot. My website is right here, in case you're looking for a copy of these slides or the video; there are 58 presentations and 98 videos or some crazy number like that. So again, a wealth of information. The QR code does work, people say — maybe you have to get closer, but it does take you to that website.

I am really excited about giving this talk today. I always like this talk. It started as a series of blog entries, because I was just baffled by nulls: there's this very quirky behavior to nulls, and it became a series of 10 blog entries on my website. In fact, I have 650 blog entries, all categorized, so if there's a topic, I've probably discussed it, and nulls is one of them. But once I wrote the blog entries, I said, you know, I really should bring this together; it felt like it had gelled, and I made this presentation out of it. So I hope you enjoy it. Glad to take questions at any time. There are really no sections here — I don't wanna say it's a stream of consciousness, but it's sort of a journey through the null landscape. Hopefully from here you're gonna get a sense of how nulls behave, where nulls can get you in trouble, and then how to wrestle nulls into behavior that's acceptable to you.

I don't know how many of you are application developers. I was an application developer in the 90s, and there were a whole bunch of cases where nulls bit us. We're like, why does this query have the wrong total? I know there's this amount in the accounts, but when I run this query, I don't get the same number. Why is that? And often the culprit was nulls. So I'll highlight that as I go, to give you places to watch out for. Nulls are incredibly powerful; they're also unavoidable, and they can easily get you in trouble. So hopefully from this talk you're gonna come out a little better prepared, okay? Any questions?

Okay, so: null. Well, null in English means nothing — but if you get philosophical, is nothing something, right? It's sort of a weird concept. In computer science, null typically means undefined. You see that in C and a variety of languages: if you haven't defined a value for something, it just becomes null, which usually works. But in the data space, nulls can be kind of complicated. For example, what value do you place in a field if it doesn't have a value? If you don't have nulls, well, if it's a string, that's kind of easy: a zero-length string is kind of a good null, right? Like, okay, there's no string length, so it's a null, right?
But for numeric types, for example, what value do you use? You don't really have a zero-length equivalent for numerics. And then you start to get into magic values, like minus one, or minus 99 — I don't know how many of you have seen those kinds of numbers. Everyone uses a different number in the same application, right? Because without a null, there's no predefined idea of what to put for an undefined value. Hopefully as an organization you pick one and keep it consistent, but it doesn't work out that way a lot of the time. Same thing with dates: there are a couple of famous databases that have weird magic dates for nulls. Postgres fortunately never went there, but you can imagine a date has a similar problem. So we've got the concept of nulls: it's in the SQL standard and it's well supported in Postgres. What I'm gonna talk about here is Postgres, but it relates to all the other relational databases too. Yeah, there are some Postgres-specific things in here, but in general you should get a lot out of this.

One of the problems right away with null is that it has three purposes. And that is a serious problem: it has three different purposes and they all look the same, so you kind of get into trouble. In fact, I think one of the SQL standards people was actually arguing we should have three different nulls to represent these three ideas. It never went anywhere, but you can kind of see the logic of why you might wanna do that. The first purpose is unknown values — that would be a use for a null. The second is an inapplicable value: it's not that you don't know it, it just doesn't apply, right? And the third is an empty placeholder. Let me give you some examples. Suppose we have an employee spouse column; I think this is the clearest one for me. If you put a null in the employee spouse column, does that mean we don't know the spouse's name — the person hasn't reported it? Or the employee's not married and therefore there's no spouse — not applicable would be a representation of that? Or we're doing an outer join with the employee table, there's no matching row, and the spouse column just comes back null? So even if you decide as an organization that we're never gonna let nulls into our system, your outer joins can still produce nulls, right? So this is sort of a problem. Avoiding them completely is not a realistic solution; you really have to buckle down and take it on. And hopefully, as I show you some of the gotchas, if you keep those gotchas in your head, that'll get you 90% of the way there.

As I said before, nulls can cause their own problems: don't use them in inappropriate situations. There's that Biblical verse — every day has enough trouble of its own, don't worry about tomorrow. Every null has enough trouble of its own. Don't add more of them if you don't have to, because it can be a very, very complicated place to be. I love this quote. It's by Joe Celko out of Austin, big Postgres fan, and he's writing about another book that discusses nulls, by C.J. Date — obviously most people recognize that name. Basically the idea is that nulls are more trouble than they're worth and should be avoided: they're strange, and their inconsistent behavior causes all sorts of confusion. Very true.
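As a concrete sketch of that outer-join case — the table and column names here are illustrative, not from the talk:

    CREATE TABLE employee (id int PRIMARY KEY, name text NOT NULL);
    CREATE TABLE spouse (employee_id int PRIMARY KEY, name text NOT NULL);
    INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO spouse VALUES (1, 'Sam');

    -- Every column is declared NOT NULL, yet the outer join still
    -- manufactures a null for the unmatched row:
    SELECT e.name, s.name AS spouse_name
      FROM employee e
      LEFT JOIN spouse s ON s.employee_id = e.id;
    --  name  | spouse_name
    -- -------+-------------
    --  Alice | Sam
    --  Bob   | (null)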
And he basically says: try to avoid using nulls, but if you have to use them, use them judiciously. I think this is a wise approach. You just can't use SQL without at least understanding some of the traps that nulls can put you into.

So, there's gonna be a lot of text here. Anything you see in red is what you should be looking at, because you'll see a lot of text on the screen — if something's red, like fingernails, you look at it right away. So here you can see the red right here. What I'm gonna do is start really slow and then work up into weird, okay? We're going there. Any questions? Okay.

First thing: select null. By default, psql just reports nothing for a null, and that can be quite confusing, because nothing is hard to distinguish from a zero-length string, or from no rows at all, right? You can see the confusion there. So one thing I did for this talk is a \pset, and I made null display as that string right there. Now if I do a select null, I get something I can see. Again, this is for illustration; if you haven't done that and you run these in psql, you will see different results, so be careful.

So how do nulls get into your system? The first way is to literally insert them. Here's an example: I'm inserting a null into the second column, and as you can see, when I do a select, I get my null back. Pretty straightforward. Here is another way nulls get into your system, but here we're doing it implicitly. How? We are specifying only one of the two columns of the nulltest table — remember back here I said there's an x and a y column, right? Here I'm inserting only into the x column, so by not specifying the y column, I implicitly inserted a null into that field.

You can also block nulls, and I do recommend this — frankly, I have a blog entry about it. Normally when people are creating tables, they think: column name, data type; column name, data type; column name, data type, right? That's usually the way I type my create table statements. One of the things I recommend is to think of it as column name, data type, constraint; column name, data type, constraint, okay? And the reason is that you're losing a lot of the power of SQL if you aren't constraining columns that shouldn't be null, columns that should be unique, columns that maybe should have a check constraint with a pattern, right? There are foreign key references; there's a whole bunch of things we can do with constraints that really increase the power of relational systems, and if you're not using them, you're only getting part of the rich relational experience. Yeah, I can't believe I just used that word, but anyway, I think it's actually true. So if you take a look up here, I created another table, nulltest2, and I added not-null constraints to both columns. This might be the user name — you don't want the user name to be null. This might be the user ID — you don't want that to be null either. There's a whole bunch of reasons you might wanna do that. Now, sometimes it's implicit: if you use primary key, that is unique and not null on its own. It's like a package that comes with primary key.
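In runnable form, the display setting and the two kinds of inserts look roughly like this (table names are illustrative):

    \pset null '(null)'
    -- by default psql prints nulls as empty strings; now they're visible

    CREATE TABLE nulltest (x int, y int);
    INSERT INTO nulltest VALUES (1, NULL);   -- explicit null in y
    INSERT INTO nulltest (x) VALUES (2);     -- implicit null in y

    -- blocking nulls with column constraints:
    CREATE TABLE nulltest2 (x int NOT NULL, y int NOT NULL);
    INSERT INTO nulltest2 VALUES (1, NULL);
    -- ERROR:  null value in column "y" ... violates not-null constraint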
But if you're not doing that, then for every column you don't want to be null, you should really say: I don't want a null here. I'm talking about how to avoid problems — problems not only from SQL, but problems when applications misbehave: they forget there's a new column and don't insert into it, right? You wanna see these errors early on rather than collecting a bunch of garbage data and having to go back and clean it up, because I used to have to clean up that data and it wasn't fun. Typically at the end of the month we had to do the data cleanup; I don't know if anybody else has that experience. So for example, if I insert into nulltest2 and use a literal null, I get an error: violates not-null constraint. If I do an implicit null in the final one, again it violates the not-null constraint. So just be aware of that, okay?

So that's the basis of where we wanna go now. I'm gonna show you some behaviors that will hopefully give you a pattern of where we're going. If I add one to a null, I get a null, right? And that actually kind of makes sense: it's an unknown value, or it's a not-applicable value, or it's a value from an outer join that doesn't exist, so adding one to it doesn't make sense. Concatenating it doesn't make any sense either; no matter which order you do it in, you get nothing out. And there were many cases when we wrote applications — remember I was talking about guarding fields that shouldn't be null — where you're concatenating columns together and one of them's null, and all of a sudden you're like, why does my report have this blank there, right? And then I'm the guy who has to go in and figure out why, how that got in there, and which application is not filling that data in. You can see why it gets confusing. So again, a lot of data sanity would say not to allow nulls if you're gonna be concatenating certain columns together.

Null is basically an unknown value, in a way. So here I'm inserting some values into an inctest table: 30, 40, and null. And as you can see with x plus one, if x is null, you get null out — we just showed that on the previous slide, right? That makes sense. Now for comparisons, it gets a little tricky. Unlike C and most of the languages we work in — that I work in, at least — where there's usually a true/false understanding of comparisons (either a value's true or it's false, it's non-zero or it's zero or it's 't' or whatever; there's always some kind of true/false dichotomy), SQL doesn't have that. It has what's called three-valued logic, which means a comparison can return true, can return false, or can return null. There are some ways around that which I'll show you in a minute, but effectively, when you're logically thinking, okay, either it's false or it's true — that's where you're gonna get into trouble. I'll show you some queries where we come at it that way — oh, either this is true or that's true — and no, no, no, there's this third value that can pop up and cause all sorts of problems. For example, if you compare null to one, which seems logical, it returns null. It doesn't return false, because it's an unknown value: it may be one, we don't know. So again, this unknown concept comes across. Does it equal a zero-length string? We don't know. Does a null equal another null? We don't know that either.
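A quick sketch of those behaviors, runnable in psql — the inctest table name is my stand-in for the one on the slides:

    SELECT NULL + 1;         -- null: adding to an unknown is unknown
    SELECT 'abc' || NULL;    -- null: concatenation too, in either order

    CREATE TABLE inctest (x int);
    INSERT INTO inctest VALUES (30), (40), (NULL);
    SELECT x, x + 1 FROM inctest;  -- the null row yields null for x + 1

    SELECT NULL = 1;         -- null, not false: it might be 1, we don't know
    SELECT NULL = '';        -- null
    SELECT NULL = NULL;      -- null: two unknowns may or may not match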
Maybe they do equal — we don't know. They could have come from two different places; even if they came from the same place, it's still null. And this is where, as an application developer, you start to get confused. Like, well, logically, if I compare one null column with itself, they're both null, so they're both the same value. No — it doesn't think that way. It's not a thinking thing. If you compare them, it's null. Even this last one, which I love: is null less than null plus one? Well, no matter what value null has, that's gonna be true. But no, it returns null here. Okay, so again, this is one of the tricky things you start to get into. Remember, null can represent unknown, not-applicable, or unassigned. So imagine a not-applicable value: if you increment it by one, it's still not applicable, right? So maybe that's the way it should be. That's the problem: there isn't a one-to-one mapping between unknown, not-applicable, and unassigned, and the behavior that you see. That's the primary problem here.

In queries, it gets interesting. Select 1 where true: you get a one, right? Select 1 where false: you get nothing. But select 1 where null: you also get nothing. You don't get a null row — you get nothing. So really, the where clause only returns rows where the condition is true, not where it's non-false. See what I'm saying? Non-false is not always true, at least the way we think of it, because a non-false could also be a null; there are three possible values for that conditional expression. So if I say it's not false, you're like, oh, it must be true — no, it could be a null too. Kind of interesting. Null is not false either: if I say true AND null, I get null, because it doesn't know. It could be true, it could be not; we don't know, so it's null. NOT null is actually null, which I don't even know where to go with. It's taking a null and negating it, right? It's reversing it, but it's still a null. You know how not x equals one flips the result, right? But this is NOT null — not IS NOT NULL, just NOT null — and it just doesn't know, so negating it doesn't flip it into anything. Any questions? Aside from the head hurting? Okay, yeah, I know. We're only on slide 18, so we have a lot more to go, and it gets weirder. That's why I like this deck: your head is just, wow — like, I just don't want to shut my eyes at night, kind of thing.

So this is one area that really got me when I was an application programmer, and I can't tell you how many times we'd be like, I don't understand, these numbers don't add up, right? I used to write financial applications for law firms, so we'd have to do the end-of-month close and reconcile all the reports, and they would say, why doesn't this add up? And I'm like, I don't know. I'd have to figure out why, and a lot of the time nulls were at the base of the problem — or my misunderstanding of nulls at that time was. So we're back to the inctest table, and we have 30, 40, and null, right? So this is where your brain kind of starts to fall over, right?
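The behaviors just walked through, in runnable form:

    SELECT NULL < NULL + 1;  -- null, even though it "must" be true
    SELECT 1 WHERE true;     -- one row
    SELECT 1 WHERE false;    -- no rows
    SELECT 1 WHERE NULL;     -- no rows as well: WHERE keeps only true
    SELECT true AND NULL;    -- null: could be true, could be false
    SELECT NOT NULL;         -- still null: negating "don't know" is "don't know"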
So you do, okay: I'm going to take all the values greater than or equal to zero and put them over here, and I'm going to put all the values less than zero over here, right? Simple — I've got all my values, right? Either they're over here or they're over here. What do you see? All your known values, exactly. Because if you look, I've got three rows here, right? Thank you so much. Three rows at the top. The ones that are greater than or equal to zero: two rows. How many rows are less than zero? None — right here, no values less than zero. Even if I combine the two conditions, you look at that as a programmer and think: it's got to be everything, right? No, it isn't. And this is some of the quicksand you get into: that only returns the non-null values. So as you're writing queries, if you expect nulls to be in there, you actually have to be proactive about what you want to do with the null values in that table, because if you don't, you're going to end up in trouble like this. Another example: not equal to 10, that's another great one. How many rows don't equal 10? Two rows. And the last one is crazy: how many rows either equal 10 or don't equal 10? You would think that would be all your rows — it's still just two.

Some of these mistakes my colleagues made, so I didn't make all the mistakes. A guy would bring me in: Bruce, this query doesn't return the right number. And I'd be like, okay, walk me through this, and we'd eventually figure out why. He'd made some assumption like this: either the person's in Pennsylvania or they're not in Pennsylvania, and we're doing a report for taxation, maybe. But the two numbers don't add up to the grand total of all the customers. How can that be? Well, it's because we've got null states in there — nobody had put a not-null constraint on that column, so we got a couple of null states, and now we have to scramble around to find the person who entered the data. Where is that customer? I don't remember, I put them in six months ago, let me look them up — yeah, you're in for a world of hurt, right? A not-null constraint would have fixed that, honestly.

Not in — if you thought that was bad, not in is just its own problem. A lot of people think of not in as a set operation: is it in this set, or is it not in this set, right? So at the top you'd say: is one not equal to two, and one not equal to three? That's pretty easy, that's true, right? But then if I say: is one not equal to two, and one not equal to three, and one not equal to null — that's null. Because we don't know if it's not equal to null. We don't know if it's equal to null either, right? That part is pretty clear, I can get that through my head; but we also don't know that it's not equal to null. So the answer has to be I-don't-know, and I'm gonna show you some queries in a minute where we do in clauses, and all of a sudden you do a not in and you don't get any results back. You're like, what happened? I should get something. Right — and that's usually the indication that there's a null in the subquery that you didn't expect. So here's probably the canonical example: is 'A' in a set containing just a null? We don't know, obviously. And is 'A' not in a set containing just a null? We don't know that either.
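The splits just described, against the same hypothetical inctest table:

    SELECT x FROM inctest WHERE x >= 0;             -- 30, 40
    SELECT x FROM inctest WHERE x < 0;              -- no rows
    SELECT x FROM inctest WHERE x >= 0 OR x < 0;    -- still only 30 and 40
    SELECT x FROM inctest WHERE x <> 10;            -- 30, 40: the null row never qualifies
    SELECT x FROM inctest WHERE x = 10 OR x <> 10;  -- still not all three rows

    SELECT 'A' IN (NULL);        -- null: we don't know
    SELECT 'A' NOT IN (NULL);    -- null: we don't know that either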
Okay, so that one I can kind of see, because I can see the null right there, right? But when you start to do this kind of thing, it gets a little more confusing. So here we're saying: is 'A' in ('A', null)? True, right? Everyone good with this? Any questions? Okay, so is 'A' in ('A', null)? Well, it's in there, we know it, we're good, we're great. Okay, but the next one is harder: is 'A' not in ('A', null)? We know that's false, because we know it's there, right? It's asking whether it's not in there, and we can see the 'A' right there, so we know for sure it is there. Even though there's a null there too, we don't care, because we stop as soon as we see there's a match.

But the next two are where everything falls apart. Is 'A' in ('B', null)? Well, 'A' doesn't equal 'B', so we're good so far. Then: is 'A' equal to null? I don't know — so it returns null, right? And the last one: is 'A' not in ('B', null)? Yes, 'A' is not 'B'; but is 'A' not equal to null? We don't know — three-valued logic — it returns null, and we don't know whether it's not in there. So you may have a whole bunch of values in there, and as soon as there's one null, that not in becomes an I-don't-know. And then, because the where clause doesn't return anything that is not true — it returns neither false nor null — you don't get any rows. This is a very common cause of problems.

In queries, we can expand these a little bit. 'A' in ('A', null) expands to: is 'A' equal to 'A', or is 'A' equal to null? Remember, with an or, as soon as we hit a true, we're done — so that one's true. But 'A' in ('B', null) expands to: 'A' equals 'B', which we know is false, so we have to keep going with the or; is 'A' equal to null? We don't know. So we don't return false, we return I-don't-know, right? And similarly at the bottom with not in: 'A' not equal to 'B', that's true; but remember, with an and, you have to keep going as long as it's true — as soon as it's false, you know you're false. So 'A' not equal to 'B' is true, 'A' not equal to null is I-don't-know, and it returns null, okay? That's why not in queries are often problematic: the evaluation keeps going, it hits a null, and everything just falls apart, even though you kind of know it's not equal to any of them. I'll show you a little later how to fix that.

And here it is. First: how do you compare nulls? This is how we're gonna get out of our problem here. Does a null equal a null? As we saw before, we don't know — it returns null. But the powerful tool is the is null clause, okay? Null is null — that is true. That's how you test for a null. You don't compare it to a zero-length string, and you don't compare it to another null. This looks really clear, but where people get stuck is they'll write: where column not equal to null, or where column equals null. Doesn't work — we just saw it right here: a null doesn't equal a null, it doesn't know. So you get caught up in a lot of that, where people write column equals null and you have to say, that doesn't work; and if you're comparing two columns and one of them's null, that doesn't work either.
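That pitfall in runnable form:

    SELECT NULL = NULL;                      -- null: equality can't answer it
    SELECT x FROM inctest WHERE x = NULL;    -- no rows, ever: a classic application bug
    SELECT x FROM inctest WHERE x IS NULL;   -- the right way: returns the null row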
So that's why there's this special is null construct and is not null construct, where you test specifically — not with an equals comparison but with an is comparison. Null is null? True. Null is not null? Remember, this is the negation, so that's false. And we're getting actual true/false values out here, okay? So is null and is not null are the canonical ways of doing this.

So when we talked about doing subqueries and having these nulls come into the in clauses and not in clauses: 80% of the time, the fix was to add a clause saying and the column is not null. In a lot of cases, when we were checking for group membership via a subquery that could potentially return null values, we basically didn't want the nulls considered, and as long as we said in that subquery column is not null, the problem went away; we got results that matched and everything was good. But you have to remember to do that. You have to remember that the nulls are possibly there, and if you don't, you get caught, and then you have to go pepper your queries with a lot of is not nulls to get them working again. I know — I'm two-thirds of the way through the presentation and I'm thinking, why did they even invent this thing, right? It just feels like bad news all around sometimes. But as I said in the beginning, there's a need: we need to represent unknown values and values that are not applicable, and null is a good way of doing that. But as application developers, we have to remember they're there unless there's a not-null constraint, and we have to specifically plan for them when writing our queries. And certain things, like a not in query or an in query, are just ripe for this kind of confusion, because your query might involve an outer join, and then some of the fields will be null — how do you want it to behave, right? It gets very, very tricky.

Questions? Yes, sir. I'm sorry, say it again — 15.2? Okay, so let me repeat the question. What he's referring to is in Postgres 15, which is coming out in September or October, and which we're gonna talk about tomorrow — I'm not gonna cover it here just because it's not out yet. When you create an index — and I'm gonna show you this in a couple of slides — you can have multiple nulls in a unique index, and there's a new parameter that lets you say I only want one null in the index; I'll show you that later. It doesn't really relate to where we are here, because here we're talking about queries and that's about index creation; it was specifically added for compatibility with other systems that have that behavior. Yeah, good point. Other questions?

Okay, so let's get a little more interesting. We have our inctest table — 30, 40, null — and I'm gonna be more specific. Here I say select where the column is null, and I get my null row out, right? And down here I say is not null, and I get the other rows. So if you wanna split your table between nulls and not-nulls, this is the classic way of doing it. Another thing that I think is really powerful, and I'm kind of excited about even though it's awkward looking, is a way of doing a comparison that papers over the null problem. I guess we can thank the SQL standards committee for this. Remember how we have is null and is not null, and that's how you're supposed to do null comparisons.
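Before moving on, a sketch of that subquery fix — the ids table here is hypothetical:

    CREATE TABLE ids (id int);
    INSERT INTO ids VALUES (1), (2), (NULL);

    SELECT x FROM inctest WHERE x NOT IN (SELECT id FROM ids);
    -- no rows at all: the null in the subquery poisons the NOT IN

    SELECT x FROM inctest
     WHERE x NOT IN (SELECT id FROM ids WHERE id IS NOT NULL);
    -- 30 and 40 come back once the nulls are filtered out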
Well, there's this comparison called is distinct from. Yeah, it's verbose, I'm not a super fan of it, but it basically takes our three-valued logic for null comparisons and maps it down to true and false. It's effectively like a funnel: it takes that null and maps it to a definite answer. That's how I think of it. And it does it in a way that's at least clear to the user, because they can see it. So if you understand the weird behaviors of null, then when you see is distinct from, you know those behaviors have gone away. That's how I think of it, in my understanding.

So here I say: is two distinct from one? Absolutely, right? No question that's true. The next one actually has a surprising answer: is null distinct from one? Now, if it's an unknown, is it distinct from one? We don't know if it's distinct from one — we could go back to that old confusion we had early on. But distinct basically says: we're not going there. An unknown value, in terms of distinctness, is not a literal one. That's the only way I can think of it: a null is not a literal one, so it's distinct from a literal one. It could be one, but it isn't one right now, so it gets a true, okay? And then null is distinct from null actually gives a false, so the nulls effectively equal each other in this case. There's also is not distinct from, which does the opposite. So again, if you want to map your nulls down to a true-and-false space, that's what is distinct from and is not distinct from do for you. For example, at the bottom: if we do the not-equal comparison ourselves, we get null; but if we use is distinct from, then we actually get what we would think of as two-valued logic with our nulls.

And we can use this in queries. This is kind of interesting, and it actually does what we want, which is somewhat exciting in the world of nulls: we have a clause that takes a null and some other values and splits them up into two pieces, right — not three. So here we say: give me all the rows that are distinct from 30, and we get the 40 and the null — remember, there are three rows — because both of those are distinct from 30. And if we say not distinct from 30, we get the 30, right? This is kind of nice, and it works in every supported version of Postgres. It takes a lot of getting used to, though, because is distinct from is really is-not-equal-to, right? That's kind of what it is. It's a little awkward, because — okay: is distinct from, which does not have the word not in it, means not equal to. And is not distinct from, which has the word not in it, means equal to. Did everyone get that? Okay. I'm not happy about that, and I'm not gonna sing the praises of it — I'd love it if it were the other way around — but we have to follow the SQL standard.

Here's another example, which is kind of interesting, where we actually compare the first column to the second column. We have one and one, two and three, and null and null, and we say: select x from this table where x is not distinct from y. Remember, a null is not distinct from another null.
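In runnable form — the disttest table is my naming for the slide's three rows:

    SELECT 2 IS DISTINCT FROM 1;         -- true
    SELECT NULL IS DISTINCT FROM 1;      -- true: an unknown is not a literal 1
    SELECT NULL IS DISTINCT FROM NULL;   -- false: here the nulls "match"
    SELECT NULL <> 1;                    -- null, with the ordinary operator

    CREATE TABLE disttest (x int, y int);
    INSERT INTO disttest VALUES (1, 1), (2, 3), (NULL, NULL);
    SELECT x, y FROM disttest WHERE x IS NOT DISTINCT FROM y;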
So we actually get this row and that row — the (1,1) row and the (null,null) row — because in each one the two values are not distinct from each other. This is really useful for joins: if you're joining on two nullable columns and you want the nulls to match, this is gold, right? Okay, any questions? Anyone wanna run away? Here we go, okay.

Here we have another example, where we have a bunch of values — a two, a one, and a null — and we're going to order them by their values. You can see that the order by automatically groups the nulls together. They aren't technically equal to each other, but they go to the same place in the ordering, right? I guess in some philosophical sense they should randomly appear in the output, but we're not gonna do that, the SQL standard doesn't really wanna do that, and I don't think there's any value in it — though it would be kind of interesting as a way to guarantee that you really don't know where the value belongs. But we don't do that. They appear at the end; they're effectively treated as equal for ordering purposes, meaning they show up together.

Yes, sir. If I did it with a union instead of a union all, would I get one null? Yeah, I think so. Can somebody try it for me, please? I'm gonna go on with my union all right now. There's an argument, which I think is valid, that says you should have to add keywords to do more work, not less — and union all does not follow that, because union all will not remove the duplicates and plain union will. Now, there are some set-theory reasons why union all is required, but as an application programmer, every time I wrote the word union I would basically write the word all, unless I had a reason not to. And that catches a lot of people too. The classic case: somebody does a union between two queries and the results don't add up, because two customers have the same balance, and you do a sum on it and you're like, what? It doesn't match, right? The battle scars I have from monthly closings. So somebody will have to test it: select null union select null, see what you get. Somebody can raise their hand when they have the answer. Thank you.

One other thing you can do is put the nulls first if you want. There's a clause, nulls first, and it'll put them toward the front instead of the back, in case you want the ordering to be different. Indexes — and Dave Stokes already talked about something coming in PG 15 — by default, nulls are not treated as equal, and therefore if you create a unique index, we can have as many nulls as we want; but there is a setting in PG 15 which changes that. As you can see, I inserted two nulls into that table even though we had a unique index on the column; the system had no trouble doing that for me. Okay. So the question is: is there a use case for having only a single null in a table? There are some database products which allow only a single null in a unique index, and some applications rely on that behavior. Now, there's a way of simulating that in Postgres by creating an index on just the nulls — you can do a distinct on or something, there's a way of getting around it — but the easier way is just to use Postgres 15 and set it when you create the index. Yeah.
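A sketch of both behaviors; note the NULLS NOT DISTINCT clause requires Postgres 15 or later:

    CREATE TABLE uniqtest (x int);
    CREATE UNIQUE INDEX ON uniqtest (x);
    INSERT INTO uniqtest VALUES (NULL), (NULL);  -- allowed: by default nulls aren't equal

    -- Postgres 15+: treat nulls as equal, so only one null fits
    CREATE TABLE uniqtest2 (x int);
    CREATE UNIQUE INDEX ON uniqtest2 (x) NULLS NOT DISTINCT;
    INSERT INTO uniqtest2 VALUES (NULL);
    INSERT INTO uniqtest2 VALUES (NULL);
    -- ERROR:  duplicate key value violates unique constraint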
So yeah, there are some applications that rely on: I have a table and I only want one null in it. I don't know why — nobody's ever explained it to me. Yes — thank you so much: select null union select null returns one null. Thank you. And yeah, select distinct will also return one null even if there are multiple. Appreciate that research. Other questions? Okay. Come on in, kids, we're talking about nulls, you're gonna love it. We're just saying hi and we're glad you're here. There you go. Feel free to come in or not. I always used to bring my kids to these events so they got to meet everybody; we loved it.

Okay, finally I wanna talk about some of the more esoteric examples of nulls, and I think this is, again, good to know. The first one is aggregates — nulls and aggregates. I'm gonna create a table called aggtest and insert three rows: seven, eight, and a null. Great. And here I do a query right up here, and you can see right away that the two red markers are different. This is pretty standard SQL: if you say count star, it just tells you how many rows you have, right, makes sense. If you say count with a column name, it counts how many non-null rows you have. So if you're trying to count the number of non-null rows, that's a really great way of doing it. Sum and min and max, as you can see, just pretend the null row doesn't exist. The sum is 15 — we simply ignore the null. Now remember, if we had added these up literally, the result would be null, okay? If we took seven plus eight plus null, the result is null. But for aggregates, we're in another world, right? So we get the seven, the eight, and the 15, which kind of makes sense. The average is where things go a little wacky, right? I can understand, okay, this 15 ignores the null; the seven is the min, okay; eight's the max, got it. But the average pretends there are only two rows — it's like the third row just doesn't exist. And that's just the way average works; live with it, that's what the SQL standard has decided.

Another bizarre, interesting, kind of cool outcome: if you delete all your rows and run the query again, you get zero for count star, makes sense; count of x is zero, makes sense. Sum is null — not zero. What, right? There are no rows, so surely the sum should be zero? But no, we're gonna get a null out of this. That can bite you pretty badly: you might write a query asking, is the sum zero or not? Maybe, maybe not. Min and max and average behave sensibly there — I like those; the sum returning null instead of zero is where I start to lose it.

Group by is kind of interesting, because all of a sudden we are treating the nulls as equal to each other, right? Remember how I showed that you add a null to something and you get a null — but for some reason, when we do a group by, we're in a different world and the nulls now group together. So we have two null rows landing in one group, and the count of x for that group is zero, because they're null. You might have expected count of x to return null, but it returns zero, yeah, right? It is what it is. I'm sorry, this is a little smaller, but I had to get it all on one slide. Another thing that comes to your rescue, particularly for strings, is coalesce.
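Before the coalesce examples, here is the aggregate behavior above in runnable form (aggtest named as on the slides):

    CREATE TABLE aggtest (x int);
    INSERT INTO aggtest VALUES (7), (8), (NULL);

    SELECT count(*), count(x), sum(x), min(x), max(x), avg(x) FROM aggtest;
    -- count(*) = 3 but count(x) = 2; sum = 15, min = 7, max = 8, avg = 7.5
    -- every aggregate except count(*) simply ignores the null row

    DELETE FROM aggtest;
    SELECT count(*), count(x), sum(x) FROM aggtest;
    -- the counts are 0, but sum is null, not zero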
Coalesce — and I know I'm going through this fast, we've only got 50 minutes, I'm doing my best, but you may wanna come back to these slides. Sometimes when I have to get down and dirty with nulls again, I have to go back and remind myself: oh, it's this for this and that for that. So don't feel bad if you have to go back and look; you don't lose any points, right? Coalesce allows me to map a null to a specific value. And it's not just strings — I'm giving strings as an example, but here I'm coalescing null to zero, okay, which is great. So if you have a query and you know a column might have a null in it, you just use coalesce: give me the value, but if it's null, give me zero, or give me 'I am null', or give me 'N/A', right? Here's an example of a query where I have a null column and I say, if it's null, just call it 'NA', and I get a nice NA there. Kind of nice, right? If I'm doing concatenation and I think I might have a null, I just make it a zero-length string. Beautiful, right? Same thing down here with a sum — this is the classic, it's how you get out of that sum problem: if the sum is null because there are no rows and I want a zero, that's what you gotta do. I wish there were just one button I could tell you to press to get nulls to work the way you want, but it ends up being this crazy grab bag of stuff. That's what we've gotten from the SQL standard; I don't know if I could have done any better. It is what it is.

You can also go the opposite way. You can say nullif, which means: if the string is 'NA', make it a null. That's what you have right up here — the 'NA' that I inserted up here is now coming out as a null. And this is the combination: we're coalescing null to 'NA', and then we're making it null again. We've gone out and around in a loop.

Nulls in arrays. If the value of the array itself is null and you say is null, you get true, because the whole value is null. But if the array is empty and you ask is it null, that's false. And if you have a null inside the array, that's also false. Okay, remember: one is a null value, the other is a null inside an array.

Row expressions — this is kind of interesting, and very Postgres-specific. Is an empty row null? Yes it is. Is a row with a null in it null? Yes it is. And is a row of two nulls null? Yes it is, right? When you mix things up, it gets more confusing. A row with a value and a null — is that null? No, because there are values in it, right? And is it not null? Also false, because there's a null in there. And obviously a row of ordinary values — is it not null? Yes it is. Again, it's really hard to explain: the same row can have is null be false and is not null also be false, right? Figure that one out.

Finally, subqueries in the target list. You probably don't use this very much, but if a query in the target list returns no rows, it actually returns a null, right there. So my select from emptytest returns null; the value of a query with no rows is null, kind of cool. And I can even check whether it's null — in fact it is, it says true, okay? This is the conclusion — I think it'll make sense to you, and I'm just gonna read it.
This is from Jeff Davis — I don't think he's here, but he's one of our Postgres contributors. He says: when you see individual behaviors of null, they look systematic, and your brain quickly sees a pattern and extrapolates what might happen in other situations. That's everybody, right? Often that extrapolation is wrong, because null semantics are a mix of behaviors. I think the best way to think about null is as a Frankenstein monster of several philosophies and systems stitched together by a series of special cases. And I think I've shown that here today. That's why nulls trip people up: you see different behaviors that are not necessarily consistent. I don't know why it was done that way, but it's probably not gonna change, and we just have to learn to use it. Hopefully this talk has given you a little more strength to understand when you might get into trouble with nulls. So thank you very much — I am out of time, so I appreciate it. I can take one question, I guess, or you can just come up afterwards with questions. So thanks very much, appreciate it.

...EDB software, services, and support. Our offices are worldwide, and we deploy our global expertise to work with and support our customers more efficiently. So with that, we'll turn it over to Rob.

Alrighty, hey everybody. I see nobody that came to my first talk is here at this talk, so that tells you something good about the first talk. I guess we'll jump on in — there's a fair amount of stuff to go through. The talk today is advanced int-to-bigint conversions: basically dealing with data type problems at scale, in generally high-traffic, 24/7 environments, all that good stuff that makes all your problems harder. As a quick intro: Robert Treat, I help people do Postgres stuff. I will post this online later; if you follow me on Twitter at RobTreat2, I'll put a link up. There's probably a version of these slides available publicly now, I think, but if you ask me, I can definitely send you a copy. Quick ground rules: okay to ask questions; again, slides should be online at some point; feel free to take notes if you want to, but it's obviously not required — it's up to you. All the code you're gonna see in here — and I will warn you, there is some code — is not designed for you to actually run. I probably did use it in production at some point, like late at night when something was broken, so this is all based on real code, but some of it is stuff I went back and pulled out of my notes or the server log. So your mileage may vary. I would strongly encourage you not to copy and paste this into prod, and I've put some subtle bugs in here and there just to make sure nobody does that. But it should give you at least enough of an overview to see what you'll have to accomplish if you want to try to use this stuff.
All right, so let's talk a little about the overflow problem — the classic version of this data type problem that people still tend to run into. Usually the first sign is this error message: integer out of range. And I don't know that it's necessarily common; you have to be running things long enough to get to it. The reason it comes up is the way that Postgres handles "numbers" — I'm gonna put that in quotes — in general. There are other types around, but these are the three most common that people run into: smallint, bigint, and integer. Usually what I find when people are doing schema design: they look at smallint as, if I need numbers that are very small — it goes up to around 32,000 in value — that's fine. Then they'll look at bigint, which goes to, I believe, nine quintillion or something like that — a very large number — and say, well, I'm never gonna need that much, so that's too big. And then they look at the last one, integer, which is generally the default most people think to use. It goes up to about two billion, and you generally would say two billion ought to be enough numbers for everybody. If you think about it: if I could have two billion customers, that would be great. I would love to have two billion customers, or two billion users; if I could sell two billion widgets or whatever, that would be good. So most people pick that two billion number thinking it will be good enough for a very long time. I'll make a note that the minimum value of these types is not zero, and we will come back to that before we're done.

So how else do people get into this? I said most people end up picking the integer type, but there are other ways this sometimes gets picked for you. If you've heard of the serial type, that's also commonly suggested as an ID column in a table. You pick serial and it will auto-generate the numbers for you: the first row gets a one, then two, three, four, on and on, and it's great. You'll see this in code as id serial primary key. Under the hood, it's basically creating an integer column, creating a sequence, and tying the two together — automating that piece. There is a bigserial type also available in Postgres, but I find people don't use it — probably more realistically, most frameworks don't make use of it; they just use the serial type, again assuming two billion will be enough. If you do bigserial, it ties it to a bigint column, so it's one way to avoid this problem ahead of time; but again, most people don't end up using it. There are also identity columns: id integer primary key generated always as identity — very wordy. That might also work, but you might still end up with the wrong type, so keep that in mind. You may also see schemas where you're not picking this yourself — some code generation tool has generated the SQL for you because it's easy to port to other databases or whatever — and so you can end up with these integer columns.
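A sketch of what those declarations amount to — the table names here are illustrative:

    -- "id serial primary key" is roughly shorthand for:
    CREATE SEQUENCE users_id_seq;
    CREATE TABLE users (
        id integer PRIMARY KEY DEFAULT nextval('users_id_seq')
    );
    ALTER SEQUENCE users_id_seq OWNED BY users.id;

    -- bigserial does the same wiring but makes the column a bigint:
    CREATE TABLE accounts (id bigserial PRIMARY KEY);

    -- with an identity column the type is still whatever you write,
    -- so this one can still overflow at two billion:
    CREATE TABLE widgets (id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY);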
In most cases it'll probably be fine, but just realize this is where I'm going: you'll have an integer column, and you'll start to get out-of-range errors on it. I don't really want to debate natural versus surrogate keys in this talk — that could be a different talk for other people, or a topic at the bar. I'm also not going to talk about UUID-based primary keys, which are also a thing and could also have many slides spent on them, or beers had over whether they're a good idea or not. So I'm not going to talk about those things. We're talking about real-world problems: somebody picked the ID as an integer, and now you're dealing with it.

Okay, so remember the nature of the problem we're talking about: you have a table, it has an integer column. Usually when you hit this problem, the first thing is surprise, because somewhere it said two billion is always enough and you have somehow run out of it. You'll see it in, say, a users table and think: we don't have two billion customers, so why do we have an integer-out-of-range error, right? That should not have happened. The other thing is, it often takes years to actually get to this problem. So you might say two billion will be enough — but maybe "enough" is not the life of the company. The average tenure of a DBA at the same company is, I think, like two and a half years, so you figure it'll at least last three years, and then it'll be somebody else's problem. There are many reasons why somebody might pick this and why you might get stuck with it. The other thing, of course: remember that it can be that you actually do have that much data. If you have two billion rows in your table, the table is going to be fairly large, which will make it more difficult to work with the data when you're trying to get out of the problem, right? That just makes everything harder. So keep all of this in mind: as you're trying to deal with this problem, the unfortunate thing is it'll be more difficult because of the size of the data, and probably your production system is broken and now suddenly the pressure is on.

So we say: well, could we have avoided the problem? People always like to go back to talking about that, because if we could just avoid it, that would be better. And there are some ways people try. You can try to use bigint where needed. But again, because this is usually surprising, people don't really know where they're gonna need it, right? There'll be some feature that they didn't think was ever gonna need two billion values, and then suddenly people use it in a different way. A good example: a long, long time ago, Etsy had a shopping cart ID that was supposed to be per person, and they thought it would never get that big — but then people started using the shopping cart as a bookmarking mechanism. You would find people with hundreds of thousands of items in their cart who never attempted to buy them; they just wanted to save them someplace so they could come back later. I use that same technique at Amazon. So I don't know about you, but there are features like that out there. Yeah, Twitter is another good example: it has a like button, and I use the like button as a bookmarking feature, so I only have like 10 likes.
I know other people who have a script or something, and every tweet that ever goes out, they just like it automatically, and they've got hundreds of thousands. So you might think your table's gonna be used one way, and then users come along and bring you some really interesting use cases, and a lot more of these IDs get churned through than you expected.

Another big one that I've run into a fair amount: bugs in the ORM code. I said ORMs like to generate this stuff by themselves. There was a very unfortunate bug in Django a few years back where, if you picked bigserial as a primary key, it would create the sequence as a bigint sequence — so it would happily generate values up to nine quintillion or whatever — but it would still create the column in the table as an integer field. That's really unfortunate, because if you look at the sequence, it looks like it's gonna be fine, and you don't realize you're gonna run out of space until you hit it. So that bug was in Django for a while, and people were trying to do the right thing without realizing something bad was happening under the hood.

The other big thing we see is artificial escalation. Sometimes, if something happens — there's a failover event, or you're doing, say, a logical replication upgrade, where part of the process is managing the sequences yourself — we'll see people just bump the sequence values: oh, maybe we're at 300 million after three years, and we do some operation, there's a system crash, we think we have corruption — just bump all the sequences up by 250 million more values. So you skip all those, and you get up to the big numbers much faster than you otherwise would have. So we've seen artificial escalation as a big problem. Or one I've also seen: people say, we're gonna shard our database in a couple of years, so we're only gonna use every other number — odd numbers on this database, evens on the other one. They never actually shard the database, but they leave the old code in that's only using every other number, so you're escalating much faster than you would have expected in that scenario.

So okay, can we eliminate that problem? Here's another really subtle one: you get artificial escalation on errors with rollbacks. You're doing inserts into a table and they're erroring out — well, they could still be burning through the sequence values even though you're not putting data in the table. So you have to also be aware of those kinds of things, and it really depends on how your application developers have decided to handle that stuff. I like the approach of putting foreign keys into a system so that errors prevent problems and you can rely on the database to help you with that. But that same approach can be a problem if they say: well, we're gonna rely on errors from the database and not do any checking ahead of time for the validity of our numbers. That's the kind of thing you see here: we insert three different values and roll it back. There are no rows in the table when I count them, because I've rolled it back — but if I select from the sequence, the last value is three. So it's used up three values even though we have no rows in the table.
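That scenario in runnable form, with a hypothetical seqtest table:

    CREATE TABLE seqtest (id serial PRIMARY KEY, v text NOT NULL);

    BEGIN;
    INSERT INTO seqtest (v) VALUES ('a'), ('b'), ('c');
    ROLLBACK;

    SELECT count(*) FROM seqtest;            -- 0: the rows are gone
    SELECT last_value FROM seqtest_id_seq;   -- 3: the sequence doesn't roll back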
So that's another kind of thing: even if you picked the right type, or you think you have, you have to watch for these issues. This is also an issue with ON CONFLICT. If you ever use INSERT ... ON CONFLICT DO UPDATE, or DO NOTHING, or whatever, people don't realize those inserts are still burning through sequence values. Here I've got one, two, three with an INSERT INTO using a UNION ALL of multiple rows, just so you can see the extra bits: I've burned through a few values even though my ON CONFLICT says do nothing. It's too late by that point; the sequence has already handed the values out. (There's a little sketch of this at the end of this bit.) So ON CONFLICT comes along, your application developers adopt it, and you think it's really awesome because it simplifies a lot of code, but now you're suddenly burning through IDs at a rate you didn't think you were. And again, sometimes people do this on purpose with setval or ALTER SEQUENCE; that's where you also see the numbers jumped up deliberately.

The other option, then: if we can't use bigint "where needed" because we don't understand what "needed" means and we'll probably guess wrong, we could use bigint everywhere. There are reasons people don't want to do this, and most of it comes down to disk space, or memory, or network traffic. There are more bytes involved with a bigint than a regular integer, and people get concerned about that. I think that's an argument that can be made; I'm not sure how legitimate it really is. Other databases actually do handle it that way. As an example, in CockroachDB, integer is just an alias for bigint, so everything is a bigint, and nobody actually seems to complain. It's a feature, not a bug, in that database. That's more of a philosophical stance on the part of the database creators, but it does suggest that maybe it doesn't matter so much. Granted, that's a distributed database, so its properties around memory and storage are a little different; take that with a grain of salt. But that would be one answer: we just don't allow integer fields anywhere, certainly not for an ID column, and we use bigint and accept the extra space. Of course, you could also use UUID-based primary keys, but I already said we're not going to talk about those. Did you have a question, sir? No? Okay.

All right. So if we can't stop it ahead of time, if we're not going to guess the future well enough to avoid this problem ever, then the next "solution" would obviously be: if we just monitor the system well enough, we can stop any production problem. And here I have to remind people, because I think we often forget this, that all of the systems we work on are complex distributed systems. It may not seem like your website thing, WordPress or whatever, is that, but at scale it certainly becomes that, and even before then it's still a distributed system.
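Here's that ON CONFLICT sketch I promised, again with made-up names:

```sql
CREATE TABLE emails (
    id    serial PRIMARY KEY,
    email text UNIQUE
);

INSERT INTO emails (email) VALUES ('a@example.com');  -- id 1

-- The id default is evaluated before the conflict is detected,
-- so nextval() fires even though no row gets stored.
INSERT INTO emails (email) VALUES ('a@example.com')
ON CONFLICT (email) DO NOTHING;                       -- burns id 2

INSERT INTO emails (email) VALUES ('b@example.com');  -- gets id 3, not 2
```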
So it's definitely hard to reason about these systems and have enough knowledge to make sure your assumptions are always correct. And I would tell you that there is no possible way you can do enough monitoring to avoid all production outages. I don't think that should be a hot take, but some people seem to think it is. That doesn't mean we shouldn't monitor at all; just have limited expectations about how far it gets you.

As an example, one thing you could do is a quick monitor that just selects the max ID from every table in your system. Graph it over time, or set up an alert that says if it gets over 1.5 billion, somebody goes and looks at it, and we address the problem before we actually hit the integer overflow. That's probably fine. It falls down really quickly, though, when you realize that if we have IDs, we probably have foreign keys. And you have to be very careful about those foreign key columns, because you'll have to build extra indexes on them that you may not have needed otherwise. You may not generally have been querying against them on the child table; you just needed the referential integrity. You will definitely have to add them, because Postgres does not create those indexes by default. Make sure you have them, or that monitor will still be sequentially scanning tables, and that may cause issues. You can kind of work around that, though I don't like many of these workarounds. In big write-heavy systems I've found people often just drop the foreign keys, because that gets rid of a whole set of problems you might run into, usually around locking or performance. Of course, the problem then is that it's harder to even find which columns are foreign keys, because you haven't declared them.

This monitoring approach also falls down if you have integer arrays, an array of values referencing another table, and with externally referenced IDs. As you really start to scale up, you get multiple databases and you're reusing IDs, maybe from external APIs, and putting those in your system, and it's much harder to track where those numbers are even coming from. You can maybe still graph them, but watch out for that. And of course you can have integer columns that aren't an ID and aren't part of a foreign key; you still have to monitor those too. So that's great if you know all those cases, but again, you're sort of assuming you'll be able to catch every use case in your monitoring and set up these types of queries.

I did put together a query that is fairly simple, if not easy to read, and will not possibly fit on any given slide, but you get the slides online, and this is probably the one piece of code from this talk that will actually work if you copy-paste it. It looks at the statistics information within Postgres. Basically, it goes and looks in pg_attribute, pg_class, and pg_namespace, and says: give me every column that is an integer, a bigint, or maybe an int array; you can fudge that as necessary to get the columns back. Then it grabs the min and max values from pg_stats, which is collected when you run ANALYZE.
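The full query is on the slides; as a rough idea of its shape, here's a simplified sketch, not the slide version, that only handles plain int and bigint columns:

```sql
-- Find integer-ish columns and ask pg_stats how big the sampled values get.
-- (Simplified: no int arrays, no output formatting, and histogram_bounds
-- can be NULL when the statistics are thin.)
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       a.attname AS column_name,
       format_type(a.atttypid, a.atttypmod) AS data_type,
       (SELECT max(b::bigint)
          FROM unnest(s.histogram_bounds::text::text[]) AS b) AS approx_max
  FROM pg_attribute a
  JOIN pg_class     c ON c.oid = a.attrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
  LEFT JOIN pg_stats s ON s.schemaname = n.nspname
                      AND s.tablename  = c.relname
                      AND s.attname    = a.attname
 WHERE a.atttypid IN ('int4'::regtype, 'int8'::regtype)
   AND c.relkind = 'r'
   AND a.attnum > 0
   AND NOT a.attisdropped
   AND n.nspname NOT IN ('pg_catalog', 'information_schema');
```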
So if you're not aware: under the hood, when you run ANALYZE on a table, Postgres samples that table, looks at the smallest and largest values it has seen, and stores that information in pg_stats. So we can go use that and ask, what does Postgres think the largest and smallest values in there are? We smash all that data together, so I have to do this for each different column type, then put it all together and select it back out. That last piece, as written, will give it to you nicely if you run it in psql. And because it's built from CTEs, you can obviously change the output to be whatever you need. If you were doing a Nagios check, you'd probably want a count, or some filtering in there to return error or not-error, depending on what the system you're running needs.

So that works, and it's definitely much faster, because pg_stats is not that big and you're not scanning the large tables themselves. I felt really smart when I finally worked that query out, and there are some definitely tricky bits in it when it comes to data type matching and such. But even this is not foolproof. It is only as good as your last ANALYZE. If you're not analyzing your tables regularly, the largest value it finds is from whenever the last analyze ran. On most systems at that level of operation we analyze a lot; sometimes that leads to different problems. So watch out for that. This also doesn't really help you with negatives, and I'll talk a little more about that in a minute. And it isn't going to protect you from artificial escalation, depending on how quickly you jump through those numbers: you may catch it, but you may not have time to respond. So it's very hard to really stamp this problem out entirely. Probably the only real way, specifically with integers, is to say we're not going to use the integer data type at all. But a lot of the techniques you're about to see will also help if you run into problems with other data types and need to convert them.

So, the overflow itself. You see this "integer out of range" in production. Somebody calls you and says the system is not working anymore, we can't sell our widgets, we're getting this error, what does that mean? And you say, you saw that in test, right? And they say no. Okay, so now your production system has basically stopped taking orders, and that's bad. If you need an emergency brake, a pull-this-handle-and-get-the-business-running move, this will probably work: do an ALTER SEQUENCE, pass in whatever the sequence name is on that table, and change the min value and the restart value to the negative limit for that data type. In this case, for an integer, I'm going to set it to approximately negative 2.1 billion. What will happen is the numbers go negative and then start counting back up toward zero. So however long it took you to get to two billion the first time, if you're lucky, you now have that much time to resolve the issue and fix whatever needs to be fixed, or change the data type, which is probably what you're going to need to do.
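The handle itself looks roughly like this; the sequence name is hypothetical, and double-check whether this brake has already been pulled once on the table:

```sql
-- Flip the sequence to the bottom of the int4 range and count back toward zero.
ALTER SEQUENCE users_id_seq
    MINVALUE -2147483648
    RESTART WITH -2147483648;

SELECT nextval('users_id_seq');  -- -2147483648, then -2147483647, ...
```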
I will say, be careful with that brake. Just like an actual emergency brake: if your problem is that you're going too fast, the emergency brake helps with that, but when you stop like that, it's really hard to control the car, like pulling the emergency brake at 20 miles an hour. Don't ask me how I know that. So you will be spinning a little bit out of control. Yes sir? Say again? Right: you'd just want the maximum integer value that's less than zero, and that will give you the last one you used. Yeah, don't do that, right?

There are other things that can break here too, if you're playing interesting tricks. If you're assuming that increasing IDs always match insert order, like "here's the latest row, and we know it's the latest because it's the highest number in the system," going negative will break that. So this can still break things in your application, for sure. But the hope is that if you're inserting data again, you've at least bought yourself some time to fix this in a more proper way. (There are actually a few other cases this breaks too, and I'll get to those.)

So let's talk about some of the more proper ways. We'll go through these relatively quickly, and there's some pseudocode here; don't worry too much about it, I'll walk you through it. Imagine we have a table m with an integer column y and a jsonb column j (we ported this from Mongo at one point), and we run out of space on the integer. The first way to fix this: add another column, future_y, as a bigint, to eventually replace our y column. So now the table has future_y as a bigint. Then you create a view, and this is one of the nice things about Postgres: you have transactional DDL, so you can do some really neat tricks. In one transaction, you alter the old m table and rename it to something like other_m, then you make a view named m that looks like the previous m table, where you mash the bigint and the integer columns together with a coalesce and return that back as y. The view looks like the old table, but under the hood, as you can see in the highlighted part, we now have a view, public.m, with that coalesce of the two fields from the other_m table.

Now, like I said, it's pseudocode, and I don't want you trying to use it in production, so I'm not giving you the full deal. You may have to add triggers for insert and update or delete, depending on exactly what you're doing, and I say "may," which means you probably will. Maybe not all of them: if it's an insert-only table, you obviously don't need triggers for update and delete, but you'll probably need an insert trigger that maps the data around. This buys you time to start backfilling all the data from the first column into the new one we built. That's the slightly painful part: you do an update of every row in the table, and it's only two billion updates, so hopefully it doesn't take too long, but at least you're taking new data in while that's going on. Then, when you're done with that, we make use of transactional DDL again: we drop the view we had, we alter other_m, rename the future_y column to y, rename the table back, and commit all of that. Now you look at the m table, and you've changed it from y as an integer to y as a bigint.
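To make the shape of that concrete, here's a sketch only: hypothetical names, triggers hand-waved, batching omitted, and no promises about your particular schema.

```sql
-- Step 1: add the wide column and hide the swap behind a view.
ALTER TABLE m ADD COLUMN future_y bigint;

BEGIN;
ALTER TABLE m RENAME TO other_m;
CREATE VIEW m AS
    SELECT coalesce(future_y, y::bigint) AS y, j
      FROM other_m;
-- You'll almost certainly need INSTEAD OF INSERT/UPDATE triggers on the
-- view here so the app keeps writing while the backfill runs.
COMMIT;

-- Step 2: backfill (in real life, batch this by key ranges).
UPDATE other_m SET future_y = y WHERE future_y IS NULL;

-- Step 3: swap everything back in one transaction.
BEGIN;
DROP VIEW m;
ALTER TABLE other_m DROP COLUMN y;
ALTER TABLE other_m RENAME COLUMN future_y TO y;
ALTER TABLE other_m RENAME TO m;
COMMIT;
```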
Watch that sequence, though, because if you haven't addressed your sequence problem, that can bite you; just be aware of it. If you catch this early enough, you can do the swap without really having to mess with anything. If you've already flipped it negative and now you're doing this, it will still work, but you need to make sure you flip the sequence back to something that's over two billion. So there are definitely corner cases to watch out for with this method, but it's certainly doable. Again, I've done it in production more times than I really want to admit; we've had to use it quite a bit.

You can also do this at the table level. That was at the column level, adding a new column onto the same table; you may have cases where you need to do it some other way. Here we have public.m again, where x is a bigint primary key, but y is an integer field, maybe a foreign key going into some other table. We create a table future_m, and in this one we change y from integer to bigint, because maybe that foreign key field is now getting values that are too big for it. And we do the same kind of trick: we alter the table, rename it to original_m, create a new view that coalesces the columns across the two tables, and present that to the user. So when they look at the view, they get public.m, and it now looks like there are two bigint columns. Of course, season with triggers to manage updates and inserts as needed. It definitely depends; here we probably need them, because we've got two tables in play. We need triggers on m to handle the inserts and updates going over to the other table, so if you insert through the view, we populate the right fields in the right place, and you may actually want triggers on the other one doing back-population into the first table. In any case, once you backfill, meaning you update your two billion rows and all the columns have all the data, we do the same transactional DDL trick: we drop the view, we rename the table back, and we commit it. Now you've got x and y as bigints. Again, watch out for sequence values. (There's a sketch of this variant below, too.)

You can play the same trick with whole tables using logical replication or foreign data wrappers; it's just more complex. And I definitely want to say I glossed over a lot of things in those examples. I just wanted to give you the basic outline: you can make new columns with new data types, and those could be anything. If you were using a text field and realized you're actually just storing UUID values in it, you could use the same basic method to turn it into a uuid column: make a new column, put some triggers in place to do the data changes, and have that happen. But I will say, the more complex your schema, the more likely this is to be a real pain, because if you have a lot of dependent views, or triggers on the tables, or constraints and whatnot, you have to account for all of that in the DDL transaction piece. If you have, say, 20 dependent views on the table in question, you've got to actually go reconfigure all of those views to look at the new table, and that takes more time and effort.
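And the table-level flavor, sketched the same hedged way, with the triggers again left as an exercise:

```sql
-- Hypothetical names throughout; y is the column being widened.
CREATE TABLE future_m (
    x bigint PRIMARY KEY,
    y bigint
);

BEGIN;
ALTER TABLE m RENAME TO original_m;
CREATE VIEW m AS
    SELECT o.x, coalesce(f.y, o.y::bigint) AS y
      FROM original_m o
      LEFT JOIN future_m f USING (x);
-- INSTEAD OF triggers on the view route writes into future_m (and, if
-- needed, back-populate original_m) while the backfill runs.
COMMIT;

-- Backfill whatever hasn't made it over yet, then swap.
INSERT INTO future_m (x, y)
SELECT o.x, o.y
  FROM original_m o
 WHERE NOT EXISTS (SELECT 1 FROM future_m f WHERE f.x = o.x);

BEGIN;
DROP VIEW m;
ALTER TABLE future_m RENAME TO m;
COMMIT;
```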
The upside is that that time and effort doesn't all have to happen while the system is live in production. You get the inserts flowing again, get the system up, and then all of that is stuff you can test out on a test system and make sure the code is going to work when you go deploy it. So, believe it or not, I do sometimes test stuff not directly in production.

There are definitely other problems to consider; I don't want to make it seem like I've solved them all. The first thing is your app code. As noted already, some people do really funny things in their applications that they shouldn't be doing. That's me with my full-on DBA hat, because I love telling developers "you shouldn't do that." When I am the developer, I usually tell the DBA, "well, no, your database should just work better," but in this case the DBAs are off the hook and the app developers are the ones doing it wrong, right? So if your app was based on an ORM schema, if you've used something like Rails or Grails or Django or any of those, you usually have a section where you map out what the columns on the table are, and the data types, and all of that. You have to make sure you go back and update those models within the Django or Rails code, not just within the database. You do the database change in production, out of band of your app structure, but then also make sure you go and update the app.

Another really fun thing you might run into: what number types does my language support, and are they the same as what my database supports? Say you're using a language that uses unsigned ints. The simple way to think about it: if an integer is roughly negative 2.1 billion to positive 2.1 billion, an unsigned integer is 0 to 4.2 billion, all in the positive space. So you do that emergency brake move, flip it to negative, and figure that solves all your problems; well, if your programming language doesn't know how to deal with a negative number, because it thinks it's an unsigned int, you have not actually solved your problem. You're in real trouble now, and I apologize, but the Postgres job market's pretty hot, you'll be all right. Just saying.

Also, again, modern systems these days are complex distributed systems, which is to say they're like ogres: they have many layers, as we all recall. Say you're feeding values into or out of an API from an integer field, and you published a public spec that says this API returns integer values. Now you switch it to bigint values, you fix all of that in your database, and your app starts passing values larger than 2.1 billion out to the public, and their apps start breaking. If someone has a compiled app where it's compiled in to believe it's getting an integer, and now you're giving it values that aren't that, they have to recompile the app. And if you'd like to put those two ideas together, imagine you have a mobile app that has been told it's going to get integer values, and now you need to recompile the mobile app. The good thing about that is, when you recompile a mobile app with an integer overflow fix, there's a little flag you can set when you push it to Google or Apple that tells them, "I have an integer overflow problem, please put this out immediately to all people downloading our app," and it gets through the approval process basically instantaneously; they just fast-track it.
I'm making all of that up, of course. So if you have that kind of situation, with a distributed application or code you've shipped out to people, you have to figure out how to get them the new code so they can actually handle the new values from the API. And if you have multiple apps hitting that database, every app has to be checked to make sure it's still going to work.

That leads me to another little piece of code I've had to use: creating a new primary-key-selecting function. If you think about the normal definition, you have the ID column and then a default of nextval off the sequence. nextval is just a function, and in Postgres you can sub in pretty much any other function, as long as it does the right stuff, meaning it gives you data that fits the data type. So we've made different versions of this where we generate the primary key ID ourselves and randomly seed values through the system. In this setup, we pick which sequence we actually pull the value from. We have our regular pk_id sequence, maybe the default one that was on the table, the one that ran you out of space, and now we also have an experimental primary key ID sequence. On that one, we can set the sequence value negative; we can put it at, say, four billion to see if there's an unsigned int problem; we can go to nine billion and make sure that works; whatever we need. You can play around with those values, and the idea is you just feed them through your system, so, say, one percent of your traffic gets these. And because this is in a function, you can change the function definition without having to do DDL on the table, so it's really easy to turn the experiment on and off, and you can decide whatever the right percentage is. We usually went with one percent or less; maybe you want ten percent. You can put more logic into it if you want: maybe if we see this as an insert on an employee versus a customer, send the fake values through, that kind of thing. But it allows you to seed those values through your whole system. Here's a rough sketch of the kind of thing I mean.
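All the names here are made up, and the percentage is whatever you want it to be:

```sql
CREATE SEQUENCE pk_id_seq;              -- the normal source of IDs

CREATE SEQUENCE experimental_pk_id_seq  -- parked at scary values: negative here,
    MINVALUE -2147483648                -- or move it past 4.2 billion to smoke
    START WITH -2147483648;             -- out unsigned-int bugs

CREATE FUNCTION next_pk_id() RETURNS bigint
LANGUAGE plpgsql AS
$$
BEGIN
    -- ~1% of inserts get an experimental ID. Tune or disable by simply
    -- redefining the function; no DDL on the table is required.
    IF random() < 0.01 THEN
        RETURN nextval('experimental_pk_id_seq');
    END IF;
    RETURN nextval('pk_id_seq');
END;
$$;

-- Sub it in where nextval() used to be:
ALTER TABLE m ALTER COLUMN x SET DEFAULT next_pk_id();
```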
Because honestly, I've found that most people just don't know. If you tell them, hey, we need to switch this positive number to a negative number, is that going to break your mobile app? The mobile app people are just not going to know, I'm sorry. And it's not even their fault; they're seven different API layers away from the database. They think it'll work. So the only way to really know for sure is to seed the values through, make sure they get all the way through, and see what interesting effects you get. If you have some weird ORDER BY somewhere, because you're a bad person and that's how you code, you can see the effects of that weird ORDER BY. It happens.

So I think that's mostly it. I do have some additional reading you can look at, and I should say where some of these ideas were generated from. Totally coincidentally, DoorDash has a couple of blog posts they've put up, and I include these in the slides. I'll put the slides online, or if you search for int to bigint conversions you can probably find them; I think I already have them on my SlideShare account. There are a couple of different posts where they talk about these same techniques, using triggers and multiple views and that kind of thing, to reshape their data and clean up situations. In the heat of an outage, you'll probably stay focused on just solving your problem, but when you do the new-table variant, you can actually go fix up a lot of other things while you're in there, because you can add NOT NULL clauses, or put in new defaults you'd been needing, or change data types on other fields that aren't part of the integer problem; anything you want to clean up, you can do. So there are a few public posts from a couple of companies that have done this kind of thing and actually talked about it publicly, and those go into more of the code side of it. Other than that, I think that is all for now. Any questions? Anybody want to heckle me some more? Okay, good. All right, well, then I guess we're good. Thank you, everybody.

[Mic check] Hello? Test, check, check, testing, check one, check, check. Hey, did y'all know that EnterpriseDB is a sponsor of our PostgreSQL track here at the Southern California Linux Expo? EDB helps organizations get the most out of PostgreSQL. Nearly 1,500 customers worldwide have chosen EDB software, services, and support, and our offices worldwide enable us to deploy our global expertise locally and support our customers more efficiently.

Hi, everybody. This is the PostgreSQL Ask Me Anything session. We've got a distinguished panel of PostgreSQL experts here; let's let them introduce themselves, starting from here and going that way. Hello, I guess this is on, y'all hearing me? I'm Robert Treat. I've done PostgreSQL a long time, and nobody will hire me to do it anymore, I guess, so I come to conferences and just talk about it. I'm Stephen Frost, one of the committers and major contributors of PostgreSQL, and I've been working on it for a long time. Bruce Momjian, working on PostgreSQL since '96. Hi, my name is Jaime Casanova, I'm from Ecuador, and I'm the CEO of a Latin America-based PostgreSQL support company that specializes in Spanish-language support. I'm Mark Wong, I live in Portland, and I think I've been involved since 2003, doing random stuff. Cool. So this is an AMA. Do y'all have any burning PostgreSQL questions?
I have some I can use in case things slow down a little, but let's start with y'all: who's got a question for the PostgreSQL folks? I'll play Phil Donahue here. Yeah, yeah. How does somebody get a patch accepted into the PostgreSQL code base? Great question. Oh, sure, I have two now. There you go, you do have two. That'd be interesting.

So, the common way is that you post the patch to the hackers mailing list and you add it to what is called our CommitFest application; that's commitfest.postgresql.org. There are a number of wiki articles and things like that on wiki.postgresql.org. One that seems reasonable: if you search for "developer" on the wiki site, you can find the "so you want to be a PostgreSQL developer" page, and that's got a lot of different resources on it. But the main way of getting a patch into PostgreSQL is putting it up on the hackers mailing list, registering it in the CommitFest, and then following along as people respond, make comments, and shred it for the first couple of rounds; that's usually how it goes. I don't know if anybody else wanted to add to that?

Yeah, one of the common mistakes I see is people just showing up with a patch and then trying to go forward with it. If you look at the to-do list, there's a flow we like to follow, because what we find is that if somebody shows up with a patch out of the blue, we often have to go backward with it: okay, I understand you have a patch, but do we really want this patch? What is it trying to solve? Is this the best way to solve that problem? Is there a history of how we've dealt with this issue before that makes the patch maybe not the right direction? We can give you guidance before you actually start coding something. Then we have to talk about the implementation: how would we go at it, and what would the user API be? Once we have clarity on that, the patch is much more likely to be accepted. A lot of people who are new show up with a patch, small or big, and there's no explanation of why they care about this change, what it does, or why they chose that user API, and then you end up having to undo a lot of stuff to get to the core of the problem. So starting at the top of the to-do list in terms of desirability, then implementation and the user API, and then going to the patch, is much more likely to work. Talk first, then patch, is usually the better approach.

Yeah, I certainly agree with that. The only thing I was trying to lay out was the mechanics of it: if you have a typo patch, post it to the list and register it in the CommitFest, right? If you're thinking about developing some new capability or new feature, it definitely makes sense to float it on the list first and have a conversation with the hackers about it, ideally to avoid any wasted effort. That's the idea there.

What I'd recommend: why don't we have them just ask the question, and we'll repeat it, so we're not sending the mic back and forth. Yeah, go ahead. So, the question is about common security issues you might end up dealing with in maybe a default Postgres configuration. From my side, at least, running the application as superuser is the number one pitfall that people just automatically fall into all the time.
You really want to lock down your users and what permissions they have, right from the get-go, as part of developing your schema and your application and all of that. The other big one that used to be a gotcha, which we've actually made progress on, is CREATE access on the public schema: revoking the CREATE right on the public schema from PUBLIC is a common hardening step. Yeah, in 15 we changed that, so it won't be granted by default going forward, which I think is a big deal. There are probably others; yeah, the pg_hba defaults. On most of the distributions it's okay, but you definitely don't want to use the trust method for authentication, so that's a common gotcha if you're building from source. There's also a talk on this tomorrow, I'm told, so come to that talk and ask the same question. One other quick thing, if you're into security really seriously, I'll pimp it: search for the Postgres STIG, S-T-I-G. There are materials for it, I think primarily developed by Crunchy Data, and you can find that and align it to your organizational needs. Yeah, there's both a STIG and a CIS Benchmark. The STIG is DoD, from DISA, and the CIS Benchmark is more or less the same thing, but for commercial use. Yeah, go ahead.

But definitely use SCRAM for your passwords. By the way, there is this opinion that a new password should not be allowed to be similar to the old one, and implementing that in Postgres is actually insecure, because you would need to know the plain text of the password to do it, and the way Postgres manages passwords, it never knows the real text of the string. So please don't try to implement that; you are not actually doing security. I love certificate authentication, so, just saying. Yeah, and if you're really smart, Kerberos; if not, certificates.

So the question is: there's a lot of talk about distributed databases, particularly Cockroach and Yugabyte. What's the Postgres project's take on implementing a distributed database for the community? (I was just trying to remind you to repeat the question; okay.) I mean, I don't know that I would necessarily speak for the entire project; I'm sure each one of us has our own opinions on that question. There are trade-offs with it, right, just like there are with a lot of other things. When you go into a distributed database environment, you're dealing with a lot of additional latency between the nodes, and if you have to provide the kinds of guarantees you get with SQL and an RDBMS and ACID compliance and all of that, you end up having to do things like a lot of lock management, which can be really painful between different nodes. So I think there are a lot of trade-offs there. I don't necessarily know that they're bad or good, but it's a different type of system design. In terms of what Postgres is actively doing: we're certainly doing a lot of improvements to postgres_fdw, and a lot of improvements around things like partitioning, so you can start building your own kind of federated environment using the various foreign data wrappers that are available. That's a really capable way of building out the kind of horizontal scalability that a lot of people are actually asking for when they look at a distributed database environment.
It's not that they want it to be distributed; they're looking at it for scalability. Of course, even with FDWs and things like that, you're going to have to deal with the complications that come with that, but I would say the project continues to move in a direction that allows us to leverage multiple systems concurrently, as long as you're willing to accept the trade-offs that come along with it.

Yeah, I had a customer ask me this last week. They said, basically, where do you see Postgres can't go, or has trouble going? Certainly embedded: I think going to an embedded environment like SQLite would contort Postgres so much that it would not be a good move, because we're more of a generic database. Distributed is kind of the other direction; to go there with Postgres at this point would also contort Postgres in a way that would make it harder and more complicated to use as a generic database. I think as we grow, though, as we go to the foreign data wrappers, as we maybe get an ability to do distributed locks better, which would probably come with sharding, I can start to see Postgres moving organically in that direction. Sharding is going to be foreign data wrappers plus parallelism plus partition pushdown, right? So would distributed eventually look like that, with something like logical replication? EDB has a distributed Postgres, which is sort of taking Postgres and morphing it in that direction. I think it's really complicated when I look at it now, and I'm not sure we would go that way. When I look at Yugabyte or Cockroach, they took a dramatic step and went right to distributed, and kind of left the non-distributed behind. I don't see us doing that. I think the real question is, could we grow organically into that space? And I don't know the answer; it'd be interesting to find out. I didn't think we'd grow into sharding organically, and I think we will. So whether we'll grow organically into distributed is a great question. (No, but that's a different question. That was my thought, but it wasn't. Not really a different one, no.)

I think one of the interesting things being worked on in that same space is client-side encryption, which Peter Eisentraut has been spending a good bit of work on. I think that's a great direction for us to be going in as one option. I don't think it's the only option; I think TDE is another way we want to go. There are some other databases out there that provide those kinds of primitives for client-side encryption, but I wouldn't consider it as prevalent as the more typical TDE you see in the other products that are out there. There are certainly advantages to doing encryption on the client side if you're able to, and there are disadvantages; it's trade-offs. But I think it's an interesting feature that's currently being worked on.

So, yeah, Bruce mentioned table access methods. I think it'll be very interesting once we get to the point of having an alternative table access method in core besides the traditional one we have, which is called heap. This is where you're talking about things like zheap or the Citus columnar store.
I think columnar in general will be interesting, but again, it's not really something one would argue is differentiating. JSONB would be a good example of something we really took and ran with early on, when it wasn't as prevalent among the other systems. Some of the work that's been going into implementing features from the SQL standard could be, though; I think PERIOD is one that has gotten some work. It's funny, because the standard will sometimes run off and write things and nobody implements them. So there are things in that area that we may end up getting before other database systems, and that might be something that differentiates us, even though the feature already exists in the standard. Some of these features have been in the standard, like match expressions and things like that; maybe one other database did them, but most did not. Those could be interesting things, and I know there's been interest, but I haven't seen a lot of patches yet, so it's all very early in the thinking around those kinds of things. I actually feel like I don't keep up with some of the new development as much; how's that temporal data type looking?

So the question is: has FDW performance improved since 9? Is postgres_fdw performing well enough today to look into? Are you doing Postgres to Postgres? Yeah, so that's the big thing related to sharding, actually. The Japanese developers and a bunch of other people have been pushing things like WHERE clause pushdown, ORDER BY pushdown, join pushdown, parallelism across the shards, meaning parallelism across foreign data wrappers in a partitioned setup, and pushdown of aggregates. In 9, we just threw the whole query down, like, go do everything and we'll filter it when it comes back, and that was very naive. Because we've been pushing toward sharding, it has gotten better every year. My website has a presentation about sharding, and there's one slide that basically says: in 9.6 we did this, in 10 we did this, 11 this, 12 this, 13 that, and 14 that. So there have been dramatic changes every year. I would say read-only sharding is certainly something that can be tested now to see what numbers you get. I haven't seen a lot of people excited about it, unfortunately, but I think it's ready. For read-write sharding, we still need a better distributed lock manager and a distributed snapshot manager. We have solutions for those, but we need to get them to square one, and we haven't really seen enough people wanting to test it, yeah.

One of the things I think may end up distinguishing Postgres, for good or for bad, is the cloud deployments of Postgres. It's Google, it's Amazon, it's Alibaba, it's Microsoft, right? Everybody's got their own custom cloud deployment, and I think the concept of some of the storage layer of Postgres moving into literal cloud infrastructure customization is a distinguishing feature of Postgres, because it's open source. Not necessarily done by the community, but it is a distinguishing feature of the currently deployed cloud versions of Postgres, and I think that could matter dramatically.
In fact, Maboo talked about using, and you've talked about using, specialized hardware and such, so there's a lot of this lower-level storage change that I think is going to be distinguishing for Postgres in the coming years. I think that's pretty clear, because nobody else is doing it but us, and the numbers are crazy good, right? So the table access method might allow people to make a cloud-specific or hardware-specific version of Postgres that doesn't have to fork the code; it just plugs into the table access method. That's absolutely correct, yeah. So it gets back to the question we had earlier: table access methods, in a cloud or specialized-hardware environment. We're doing columnar and stuff like that, but in terms of performance, this could potentially be orders of magnitude more important, I think.

So, the question is about Postgres in containers, and I like to share a little anecdote: I was doing Postgres in VServers, back when VServers were a thing. I don't know how many people here remember that, but it was a patch you had to apply to the Linux kernel, which was kind of the precursor to containers. So Postgres in containers is not actually that new, and there are certainly trade-offs to doing it; some people are not big fans. I would argue that the issue with containers is less about the fact that it's a container, and a lot more about the fact that a lot of systems have been designed around the idea that containers, and often the storage underneath them, are ephemeral: they just go away, and that's okay, because we'll re-download or re-image or whatever. So you really have to be cognizant of the storage you're using underneath Postgres, which is true even in the non-container space; it's true anywhere. It's just that outside containers, it's a lot more typical to have traditional storage where fsync works. In some container storage systems, fsync quite intentionally does not actually fsync things. So you really have to make sure you're using the right storage modules underneath Postgres, and that they actually do what Postgres needs them to do. There's more to think about when you're deploying Postgres in a container environment, so make sure you've thought through those issues, or work with somebody who has.

There are other things around memory management that can also be challenging in containers, which is arguably a Linux kernel issue, at least in my view. The kernel people have been given patches to try to make it better, and unfortunately they have yet to agree that they make sense, so that's an issue. But it's something we're also working on from the Postgres side, to find a way to deal with it and avoid runaway queries causing problems. In a reasonably managed environment, where you don't have arbitrary queries coming in and sucking up tons of memory, it's not something you have to stress about; you can work around it and address the risk that way. Of course, you have the same problem on a misconfigured Linux server if you don't go set overcommit_memory, right?
You're going to potentially have the same problem of the OOM killer running around killing things. It's not actually a new thing; it's just that in the container world you don't have as many tools easily available to prevent it, because overcommit_memory, if we're talking about Kubernetes or something, you can't set it, and you can't set it on a per-container basis. So it becomes a bit more of a pain if you don't have the ability to control what queries are being sent to the database. But, yeah.

(I'm not going to repeat that one; yeah, go ahead.) So, in Leipzig there was a talk, or maybe not a full talk, but there was a discussion about this kernel issue, and one of the approaches somebody had taken was to essentially override malloc and handle it inside malloc by enforcing a cap, which is basically the same kind of thing I'm talking about. We're actually working on a patch, which has also been discussed on the hackers mailing list, for tracking how much memory each backend has currently consumed, keeping that in shared memory, and then saying: once we get past a certain point, we start failing user-facing backend allocations. And that works just fine in Postgres; it always has. In the event that a memory allocation fails during a hash join or something, we'll just happily free up all the memory, fail the transaction, and off you go; everything's fine. So it's just a matter of getting something like that into core, to address the fact that the kernel isn't doing it for us, unfortunately. And it's funny, because the kernel patch is only a few lines; it's odd that there's that much pushback. So if anybody happens to know any memory-oriented kernel developers who are around and want to talk about this, that would be really cool. But yes, we're aware of it as a concern. And there are lots and lots and lots of people running Postgres in containers.

So, Mark keeps sliding the question in; I guess one of the challenges to Postgres adoption that I would see is exactly the kind of issue I was just talking about: the server-side configuration pieces you have to think about. Making sure your server respects everything, making sure your server will fail a malloc allocation, or having enough control over the system to avoid the risk of running out of memory on a box where the kernel won't fail the allocation for you. A lot of people end up just deploying Postgres without thinking about these things, and then discovering that, oh, fsync doesn't actually work, so everything breaks in the event of a crash; or they have the OOM killer come along and kill the Postgres system, and they end up with a bad experience. So I think one of the challenges is working with the kernel folks and the storage folks to make sure we have a way of telling them what we need. In the Kubernetes world, for things like the storage engines that exist there, we need to be able to say: we actually care about fsync here.
Please make sure fsync actually works, all the way down, to persistent storage. There are some things like operators out there that try to help with this, but not all of it exists today in the Kube APIs. It would be nice if more of it did, and there's some progress being made, but it really comes down to making sure people have properly configured systems underneath Postgres, so Postgres can provide the guarantees we try to provide. We need those OS-level things set up in a way that allows us to make those guarantees, because otherwise people get a bad taste in their mouth.

The question was about running Postgres on Windows. I will say it's not something I would typically recommend. However, there are folks doing it, and it's gotten better; we continue to make improvements to Postgres on Windows, just like we do in other environments. We're actually working on porting pgBackRest over to Windows, finally. I know that's something a lot of people have been asking after, and we have someone actively working on it now, so that will provide another solid backup tool for Windows. There are other ones out there, but I think that'll be helpful. One of the arguments I've always made when people ask me about Postgres on Windows is that Postgres itself could certainly be improved on Windows, but there's also a lack of a lot of the tooling that goes around Postgres, which doesn't exist for Windows, and that ends up being a real problem when you're trying to deploy there. It's not the core server that's necessarily going to have issues or break, because we test it all the time and the obvious things are tested and checked and work; you just don't have the same ecosystem of tooling around Postgres that you do on a Linux platform today. Anybody else want to comment on that? No? All right.

I just wanted to add that I think it's good that we run Postgres on Windows. When we started doing that, we found a lot of bugs, because Windows doesn't respect many standards, so we started to find places where the code didn't protect against this or that. But unless Windows actually improves itself, which I don't think will happen (I mean, the blue screen has been around since what, 95, 98, and okay, it's now black, but it's still there), and I remember reading a paper saying that SQL Server doesn't even use the Windows memory API, it uses the primitives, I don't feel Windows is something we can run on really smoothly. It can be done; I just wouldn't expect it to be fantastic.

It's a sad story. In Latin America, well, I think it's everywhere, people think about money first, and while open source is actually less expensive, its adoption is not zero-cost. It costs, and people don't want to invest in it, because it's a two- or three-year investment before you actually see the money come back. And in Latin America you can't really plan on one-, two-, or three-year investments, because governments change the rules too quickly. So it's difficult; we need to learn to teach people. But some big companies are actually using Postgres now. Actually, tomorrow I'm going to talk about a Latin America-based company, which is part of the financial system, and it moves money for lots of companies in Latin America, the West, and Europe.
And with those case studies, people start using Postgres, start considering it. Sadly, those companies don't want their names to be mentioned, so I could be inventing things; there is no way to confirm it.

Yeah, so the question is: are there enough Postgres professionals to really keep Postgres moving forward? And compared to Oracle, yes, we're a tiny group in terms of availability. And that gets to the larger adoption question. I understand, Stephen, about people who don't want to have a bad experience with Postgres, but there's a whole swath of people who are never going to experience Postgres at all, for a variety of reasons, and they kind of fall into two camps. The first camp, which fortunately is getting smaller, basically says that only relational databases from the big three are enterprise-ready: IBM, Microsoft, and Oracle. Obviously, the other big ones, like Sybase and Informix, have gone away, and Ingres is on life support, whatever. They understand those have left, but because these other three are still around, they still think, for some reason, that those are really the place to go, and that Postgres is just something you use if you want to save money. The fact that IBM and Microsoft are themselves supporting Postgres doesn't register in their thinking; they still believe those are the better solutions. And it may be because they feel they have better professional options. They may think: I'm not going to get involved with Postgres, because that's going to be a step down in my career. I won't command the same pay, or I won't be important in the organization anymore. It's a cost-cutting database, so it goes in the low-level apps, and all the important apps stay where they are. So this is a psychological problem with adoption.

When I talk to people, I basically say: listen, Oracle is not going to be the gravy train forever. It already isn't. It's clear new people aren't really choosing Oracle; a startup would never choose Oracle, or almost never, right? It's almost a legacy database, all three of them, and obviously, if you work in the legacy field, you end up like a COBOL programmer. They're not there yet, but the writing's on the wall, I think. So when I talk to people, and you're absolutely right that we don't have enough people, the only way we're going to get enough is to get the people who understand those other databases to learn Postgres. I've pushed training for these organizations: bring people in and do significant training, a week-long training targeted at the gap between their old database and Postgres, using the terms that are the same and the concepts that are the same, so they can transition from one relational system to another. That's really the only way we're going to build this community quickly enough. We probably can't build brand-new people from scratch; we're going to have to pull from these other databases and get them up to speed on Postgres. And I think it's been pretty successful; you do see people coming over from other relational systems.
The other area where we've had a challenge in terms of adoption is people who think that relational is old, that relational is not where technology is going, and that somehow NoSQL is going to solve all the world's problems. That is actually in decline as well. You see fewer people thinking that NoSQL is easier long-term. It's probably easier short-term, sort of like putting together a go-kart is easier short-term, but don't take it on the highway. So I think people are seeing the problems with NoSQL, and I think Postgres is benefiting. Over time, we're getting there, but it's always slower than I want it to be.

I just wanted to make a quick note: I mean, if you look at DB-Engines, yeah, I think we are in a situation where we really need more people getting involved in Postgres and helping to move it forward, but I don't think it's really holding us back that much in terms of actual adoption. When you look at things like DB-Engines and the huge markets that are out there for Postgres, it isn't actually holding us back all that much. Sorry, somebody else wanted to add something? I feel like I should give the mic to Dave, because he's literally doing a series on this. For those who don't know, Dave's been posting Postgres content on YouTube, a series on Postgres for MySQL DBAs, so there is a resource for people trying to make that transition. Yeah, Dave Stokes, our esteemed, I guess, person in the audience.