So welcome to the second talk in the Postgres Devroom. Our speaker is Stephen Frost. He's a Postgres committer and works as CTO at Crunchy Data. And he's going to talk about hacking Postgres. Thank you. Well, thank you all very much. I don't know that that was necessary, but we'll get there. So as mentioned, I'm not going to hit too much on this, but CTO at Crunchy, committer, major contributor, blah, blah, blah. I worked on default roles. I did a lot with row-level security, column-level privileges. And I actually implemented the role system itself way back in 8.1, and maybe some other stuff. So for those of you who want to kind of follow along at home if you're interested, here's my little Git crash course. So if you're wondering, where's the Postgres Git repo? It's on git.postgresql.org. We do have a mirror up on GitHub. I think there's actually a few on GitLab as well for those who want to go there. But the basic gist of it is that you can pull this down and then you can start to look at all the wonderful directories and things I'm going to talk about here next. But it's pretty straightforward. When you're hacking on Postgres, we typically do things on feature branches. So you do a git checkout -b, which creates your local branch. And then you hack on Postgres, right? Make changes. Do what you want. And then the way we tend to operate is you add the files, you commit, and then you use git format-patch to actually create a patch that then gets sent up to the hackers mailing list. And we'll talk a little bit about what that looks like later on in terms of how you register things inside of what we call the CommitFest application. So now let's start talking about the actual source tree. So Postgres has a lot of different components. And as we're going through this, feel free to ask questions and let me know if you have questions as we go through. I'm happy to take them as we go, and we'll have some time at the end as well.
So in our top-level source directory, we have a config system. This is pretty straightforward. We do use autoconf and all the autotools and what have you for our build system. We also have what's called contrib. So for those of you who were in the prior talk, all of the extensions that come with Postgres are included in this contrib directory. So these are all contrib modules. Now you don't have to use the contrib modules that are in contrib. They're all optional. But they typically are going to be installed when you install Postgres, and they'll be available through just using CREATE EXTENSION. Some of the great ones are things like pg_stat_statements, which is really, really handy and is included in contrib. There are also extensions outside of Postgres that are not included with the core Postgres source code that you may be interested in, things like PostGIS, which is a fantastic external extension. But if you're looking at how to write an extension, going into contrib and looking at some of the examples is the way to go. What I'm going to be mostly talking about is actually hacking the backend, hacking the grammar, hacking the parser. And we'll start talking about all those details after we go through this rundown of what is in the source tree. We then have doc, which is all of our SGML documentation; it uses OpenJade, for some versions of Postgres, as the build toolchain. So that's pretty straightforward. If you're wondering how to build the docs, I cheat. The Debian maintainer has done a fantastic job and has everything you need to build the docs. So if you just apt-get build-dep, you're good. We then have src/backend. That's what we're going to be working in today. That's where we're going to start talking next. So let's go over a few of the other top-level items, though. You have src/bin, which is where psql, pg_dump, all of what we consider the client utilities live. These are client-side tools.
These are things that are going to be installed with your client package. So if you have postgresql-client installed, that's where all of that comes from. Other things in there that are interesting are initdb and pg_upgrade. Kind of a little-known secret that maybe not everybody is aware of: pg_upgrade uses pg_dump in a special mode called binary upgrade. So if you're looking at how does pg_upgrade work, the gist of it is actually mostly in pg_dump inside of this magic binary-upgrade system. So that's something to be aware of. We then have src/common. We've actually started to build out this kind of libpgcommon thing, which is code that's common to both the frontend and the backend of Postgres. Unfortunately, today, a lot of things like logging are done completely differently in the frontend than in the backend. So we're looking at making changes, maybe, to unify those interfaces and put both of them eventually into possibly src/common. We then have src/fe_utils. So these are frontend capabilities, right? So certain things that make sense to be in the frontend that we don't use in the backend go here. So if you're wondering about some new thing you're building, should the common code go into fe_utils or should it go into common? Well, the question is, is it going to be used in the backend? If not, put it into fe_utils. src/include is pretty straightforward. It's all of the include header files. The one big thing here that people should be aware of is that we have something called src/include/catalog. Who here knows what the Postgres catalog is? All right, fantastic. A few of you, so let me go over it then. The Postgres catalog is that set of tables inside of pg_catalog that define how the rest of the system works. So if you're wondering, in Postgres, we have all these tables, right? And tables have columns. Well, how does Postgres keep track of that? We have tables and we have columns, right?
So we actually have a table called pg_attribute, right? That's one of the pieces of the catalog. The pg_attribute table has a row in it for every column of every table in the system, right? pg_class is another one of those catalog tables. pg_class has a row in it for every table in the system, right, as well as every view and every index, right? Because there's a lot that's common to those. We consider all of those classes in Postgres, or more traditionally, relations in Postgres. So if you wanted to, say, add a new column into one of the Postgres catalogs, right? Say you're adding some new feature and you need to add some new information that's gonna be tracked by the database system. Where would you add that column? You add it in src/include/catalog. And in fact, when you go and modify src/include/catalog, magic happens. You just add the column in, you rerun the build system, and magically, suddenly when you initdb Postgres, you have that new column in that table. Now, nothing's happening with it. There's nothing being done with it. But you have it there, right? And then a bunch of things like #defines and whatnot. Some of them get automatically created for you. Some of them you'll see inside of those include header files; you may wanna define some yourself for specific values that go into those columns, for example. We then have src/interfaces. So src/interfaces is the set of libraries or other pieces of code that interface to Postgres directly. libpq is the big thing here, right? That's kind of huge. And then there's also ECPG. I'm not sure how many people here are familiar with the idea of this, but there was a time once when people would write C code with embedded SQL in it and run it through a system like ECPG. And that system would actually take all of that and figure out how to make all the connections to the database to run that SQL code and get the results back.
So it made the development of C and SQL code a little bit easier as you're working through writing some new program. I would say it's not used very much today. How many people here use ECPG? Exactly. For the record, nobody raised their hand. Maybe I should have; I played with it a little bit, but that's about it. And then we have procedural languages. Now, src/pl has the core procedural languages. You can also install procedural languages as extensions. So this is nowhere near the full, complete list of all of the procedural languages that are available for Postgres. But the big ones are here, right? PL/pgSQL. So this is similar to PL/SQL. It's kind of like T-SQL if you're into that. But it's a traditional kind of procedural SQL, right? That's what PL/pgSQL is. But you can also write code in Perl, right? And when I'm talking about procedural languages here, what I'm talking about is the fact that you can write a bit of Perl code, shove it inside of a CREATE FUNCTION call inside of the Postgres database, and then suddenly you have a new function inside the Postgres database, and when you call that function, it runs Perl, right? It loads up the Perl interpreter and runs your Perl code on the database server, right? And that's true for PL/Python, PL/Tcl. And then we also have things like PL/R, right? So if you are familiar with the R statistical language and you wanna run R on your Postgres server, you can use PL/R. There's also a PL/Java. There's also a PL/V8, which is JavaScript. So if you wanna run JavaScript inside your database server, on the server, in a backend process with direct access to the data, you can do that with Postgres. Very cool stuff. We then have src/port. That's pretty straightforward. It's just a bunch of platform-specific hacks, because Postgres is supported on a stupidly large number of different platforms. We then have a test system. The regression test system is actually really straightforward in terms of src/test.
It is: write some SQL, run it, take whatever the results were and shove them into an expected file, and that's your regression test, right? When you're adding new tests to Postgres, that's how it works. We then have some pieces of code that we suck in from other places. In particular, src/timezone comes directly from IANA. So IANA actually publishes code for dealing with time zone garbage, and we import that, and that's where it lives, in src/timezone. We then have src/tools, which includes things like pgindent. So of course, Postgres being Postgres, we couldn't use any kind of standard indenting system. We had to reinvent our own. So we have pgindent, which, it's interesting, we've actually got someone hacking on BSD indent to add in features so that our indent doesn't have to exist except as a wrapper around it. So that part's pretty cool. I forget who was doing that offhand, but it's really neat. Questions on the top-level source directory for Postgres? Did I forget anything? Probably. All right, so let's start talking about the backend code of Postgres. All right, so these are all of the different directories that exist inside of that src/backend directory. This is all server-side code, and they all have different components. They do different things, right? So let's start talking about them. The access directory includes all of our access methods. These are methods for the heap, right? So the heap is where all the table data goes, right? When you create a table and store data into it, it goes into the heap. When you go and create an index, right, like a B-tree index, that's the default index. The access method for B-tree indexes and all of the code around B-tree indexes lives inside of this access directory. And then GiST and GIN also live there. If you're thinking about creating your own index for Postgres, this is what you want to go look at, right?
Because what you need to do to implement your own index for Postgres is basically just define a few methods, right? And then basically register it with Postgres, and you can do all of this in an extension if you want, right? And then once you've done all of that, Postgres will happily let you create a table with your new index and off you go. In fact, GiST is another level of generalization around the ability to have alternative indexes in Postgres. We then have bootstrap. This is basically for initdb to kind of kickstart things, because we kind of eat our own dog food in a way, right? When we start up Postgres with initdb, we kind of run Postgres to actually go and create things inside of the data directory. That's part of what initdb does, and that bootstrap mode is what initdb is running. We then have catalog. So we talked before about what the catalog was. All the definitional pieces for the catalog are in src/include/catalog, but all of the code for working with the catalog exists inside of src/backend/catalog. That's where that logic lives. We then have the commands directory. So commands are kind of your top-level DDL commands, right? Things like CREATE TABLE, ALTER TABLE, all of that code lives in src/backend/commands. So if you wanted to add some new feature to, say, CREATE TABLE, then that would be where you would go look, in src/backend/commands, and you would look at the DefineRelation function. We then have the executor. Okay, so who knows what the executor is? All right, a couple of people here and there. All right, so the executor in Postgres, the way Postgres works, is that you have all of your queries come into something called the traffic cop, right? We'll talk about that on the next slide, but that traffic cop, what it does with that query that you've sent is it first runs the planner, right? All of the actual planning happens inside of the optimizer down here, right?
The query optimizer basically takes in that query after it's been parsed. So first it actually goes through parsing, which is here. Look at that, it works for once. Sorry, most of the time with these TVs it doesn't work. So the parser, here we have the lexer and the grammar, which is how Postgres understands the queries you send it. We parse it first. We then send it to the query optimizer, right? Which generates a plan. That plan then gets sent to the executor, and the executor actually runs the plan, okay? So whenever you do an EXPLAIN on a query and you see all those different nodes, right? The planner created that structure, created all of those nodes. The executor is where all of the code lives to actually execute those nodes, right? Like an Append node, for example. There is a file in that directory called nodeAppend, right? That's the code that does that Append node. That's its job, right? So if you're wondering about hacking on SQL itself, that's where you would go. I'm just gonna say this once right now, but people, sit down. The reason I say that is because FOSDEM unfortunately will be very upset with us if we have a whole bunch of people standing in the aisles, because the fire marshal will come and complain at us. So as you come in, please, if you need to come across the front to go sit down on this side, please do, that's fine. But let's just get it done. Otherwise we're gonna have a whole bunch of people lined up over there, and that's not okay. And I don't want FOSDEM to get mad at me. Thank you, everyone. And if people can shift over maybe a little bit too, that would be great. Something, let's tell people when they come in. Robert, just tell them to come across the front. Like, I mean, I don't want them messing with the camera, but we have to have people sitting. In case you're not sure, I'm also one of the bouncers you all are gonna be dealing with outside if you wanna come back into this room after this.
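To make that executor idea concrete, here's a minimal sketch in plain C of the pull-based node design: every plan node exposes a callback that yields its next value, and an Append-style node just drains its children in order, the way nodeAppend.c does with tuples. This is illustrative only, not Postgres's real executor structs; all of the names and types below are made up for the sketch.

```c
/* Toy "Volcano-style" executor: each node returns its next value
 * (an int here, a tuple in the real executor), or -1 when exhausted. */

typedef struct Node Node;
struct Node {
    int (*next)(Node *self);   /* pull the next value, -1 when done */
};

/* A leaf "scan" node over a fixed array of values. */
typedef struct {
    Node base;
    const int *vals;
    int nvals;
    int pos;
} ScanNode;

static int scan_next(Node *self)
{
    ScanNode *s = (ScanNode *) self;
    if (s->pos >= s->nvals)
        return -1;
    return s->vals[s->pos++];
}

void scan_init(ScanNode *s, const int *vals, int nvals)
{
    s->base.next = scan_next;
    s->vals = vals;
    s->nvals = nvals;
    s->pos = 0;
}

/* An Append node: drain child 0, then child 1, and so on. */
typedef struct {
    Node base;
    Node **children;
    int nchildren;
    int cur;
} AppendNode;

static int append_next(Node *self)
{
    AppendNode *a = (AppendNode *) self;
    while (a->cur < a->nchildren) {
        int v = a->children[a->cur]->next(a->children[a->cur]);
        if (v != -1)
            return v;
        a->cur++;              /* this child is exhausted, move on */
    }
    return -1;
}

void append_init(AppendNode *a, Node **children, int nchildren)
{
    a->base.next = append_next;
    a->children = children;
    a->nchildren = nchildren;
    a->cur = 0;
}
```

The point of the design is that the top of the plan tree just keeps calling next() and every node type, Append, Sort, HashJoin, hides its own logic behind that same interface.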
So that's part of why I'm being particular about it. All right, thank you very much. All right, let's move on. So we talked about the executor. Let's talk about foreign data wrappers. So Postgres has a system called foreign data wrappers. This allows Postgres, when you execute a query against the database, to go out and reach into another server and get data from it, right? That includes another Postgres server. We can push down joins. We can push down WHERE clauses, right? It's great. That also includes things like, I don't know, Twitter and Google. If you wanna do a Google search from within your database, you can implement a foreign data wrapper to do that. The baseline code for how foreign data wrappers work lives inside of foreign. That's not where the foreign data wrapper itself lives; this is all of the backend code. The foreign data wrapper for, say, postgres_fdw lives inside of contrib. We then have just-in-time compilation for Postgres, right? This is where Postgres can take a query of yours and actually, for some portions of it, build a little program in memory, compile it, optimize it, and run it using LLVM. This is the infrastructure for that. lib. So, lib is just a kind of placeholder for a lot of stuff. We dump a bunch of different stuff in lib for backend code. libpq. So, we talked about libpq being under src/interfaces, right? Why is it here? This is the backend half, right? You have a frontend and a backend, right, when you're talking the protocol. So, the thing that talks the frontend side of the protocol is src/interfaces/libpq. That's what clients use. And when the backend is communicating back and forth, that's src/backend/libpq. main. main is where main() is defined, starting Postgres. And in fact, when we start Postgres, we often start up a whole bunch of processes with it, right? We start up a checkpointer process. We start up autovacuum. We start up a bunch of other stuff.
That's where everything kind of gets kicked off. nodes. When you see an EXPLAIN plan, you see those different nodes; we have a generalized node infrastructure for doing that. That all lives inside of nodes. We talked a little bit about the optimizer. The optimizer is what's gonna actually take your query that you bring in and optimize it for running. The parser is what actually parses the literal string that comes in. And then we have partitioning. This is native partitioning. So, Postgres now has something called native or declarative partitioning, right? We used to have an inheritance-based system. Now we have native partitioning. And this is where that code lives. po is just translations. So, if you're interested in doing translation work, that's where you would go to check out our translation system. Postgres is actually translated into multiple different major languages, which is pretty neat. port is again just backend platform-specific hacks. The postmaster process. So the postmaster is that process that lives all the time. It's constantly running, right? Every time Postgres starts up, the postmaster is the thing that actually listens for inbound connections. So that's where all the code for that is, right? It answers requests that come in for new connections. regex is Henry Spencer's regex library, which is also used by Tcl. And in fact, we've kind of become the de facto upstream for it. The Tcl folks look to us for fixing bugs in it and whatnot. And yes, there are bugs still to be fixed in things like the regex library. In particular, we had a very interesting case where somebody wrote a tool for basically fuzz testing SQL called SQLsmith. And one of the things it found was that there were ways to make our regex library go nuts, right? And just chew up lots of CPU. And so some of those things have now been fixed and patches released and whatnot. And some of that has gotten back into Tcl as well.
We then have replication, which basically handles the replication protocol and shipping WAL. We have the query rewrite engine. So the query rewrite engine is something that kind of happens early on, right? It actually runs before the optimizer. So I lied a little bit, I skipped a piece, right? So when things come into Postgres, the first thing that happens is it gets parsed, right? After it goes through the traffic cop, then it gets parsed, then it goes through the rewrite system, then it goes to the optimizer, then it goes to the executor. So there's a lot of different pieces here. But the rewrite engine is also what handles row-level security, right? And it's important that that happens before optimization, because you actually wanna have the constraints that are implemented by row-level security be able to be optimized, right? So that we can then more efficiently pull out the rows once we've gotten whatever additional constraints are required by row-level security put in place. snowball is stemming, just used for full text search. We actually pull code from elsewhere for that. We have an extended statistics system. We have a storage layer. The storage layer is basically what handles the actual direct I/O into the underlying system. So this is things like, if you wanna go read a page, that's the part of the system that actually goes and reads that page up into shared buffers. We then have the traffic cop. So that's what actually gets the queries in. And we have tsearch, which is our actual full text search engine. And then we have utils, which has got a bunch of different stuff in there, including our caching system and memory manager. Questions about the backend code directory stuff? I know, I'm going too fast. But that's okay, because we're gonna slow down a little bit and we're gonna start talking about: what do you wanna change when you wanna hack Postgres? This is what I always wonder about first, right? What do you wanna do?
You wanna add a new backend command, right? You wanna modify a table somehow, I don't know, right? What does it do? Or maybe you wanna implement something like MERGE, which is an actual SQL-standard command. Maybe you wanna add a new backslash command for psql, in which case you would go look at src/bin/psql. Or you wanna make improvements to pgbench, right? That's also in src/bin. If you wanna improve performance, come talk to us, because it's a lot of work. But it's good work, it's good work. If you wanna add a new authentication method, right? Then you're talking about changes to src/interfaces/libpq and src/backend/libpq. If you wanna add support for another TLS or SSL encryption library, a lot of that goes into, again, src/interfaces/libpq, at least if you're talking about it for the protocol. So we're gonna talk a little bit about changing an existing backend command. So where you start depends on what you wanna do, right? When I'm talking about adding a new backend command or modifying one, I wanna go hack the grammar, right? Because the grammar drives a lot, right? What is the grammar? The grammar is that thing that takes whatever you have written out in text and decides whether it's legit or not, whether it's acceptable. And if it is acceptable, what is gonna happen, right? And so some of the things you have to worry about when you're hacking the grammar are things like ambiguity, right? You don't wanna end up with two identical statements that could be interpreted in two different ways, right? Postgres needs to have a grammar that is not ambiguous. So where does the grammar live? It lives inside of the parser. One of the other things about the grammar is that it's one of those things that can be difficult to get agreement on. Not everybody agrees with what you want your grammar to look like. Should it be CREATE INDEX CONCURRENTLY or CREATE CONCURRENTLY INDEX, right? These are things that we argue about.
So let's talk a little bit about the parser versus the grammar. The parser actually consists of two pieces. We have a lexer, which takes all of the individual words inside of the statement that you've provided and tokenizes them. Then the grammar is what actually defines what tokens are allowed to be used with other tokens, right? And in what order? So while we're doing all of this parsing, we're collecting up bits of information about the command so that we know what you wanna do. And then once the full command is parsed, a function gets called from the grammar. So the parser lives in src/backend/parser. We talked about that. scan.l is our lexer. gram.y is our definition of the grammar. And we have a bunch of parse routines. We have analyze.c, which transforms the parse tree into an actual query statement, or a Query structure rather, and then we have support routines for it, right? That's all pretty straightforward. So, all right, we're gonna modify the grammar now, okay? So the grammar is a set of productions, right? Every production starts with a production name, a colon, and then a list, essentially. And that tells us what we can do at this point, right? So the very top point of all of this is this stmt production, right? And then we have a list of every possible statement that we accept, right? And they're separated by these pipes, which are just indicating OR, right? This, or that, or that, or that, or that, right? CopyStmt is what we're gonna look at here. And that's a Beamer mess-up, because that should be a pipe, I'm sorry. Okay, so at the top level for copy, we have this CopyStmt production, right? And then we have all of these other things in here, right? Things that are capitalized here, those are direct tokens, right? Those match directly into the lexer. Things that are lowercase are other productions, right? So here we have things like opt_binary, qualified_name, opt_column_list, et cetera, right?
These are actual other productions. So let's look at some of them, right? So here you can see there's a copy_from production that says whether the copy is FROM or TO, right? $$ says: assign this node this value, right? So this particular copy_from is getting assigned either true or false based on whether it's FROM or TO, right? And then we have opt_program, which is PROGRAM, so again, true or false, whether we have PROGRAM. So this is an optional additional keyword, right, to the copy statement, et cetera, et cetera. Then we have multi-value productions, right? So here we have cases where we have, say, this copy_generic_opt_list; we have copy_generic_opt_elem, right? And that is a production itself, but when we get to this bit of code, the value from that production is gonna be in this $1 variable, right? And then when we set $$ equal to this list_make1, we're gonna turn this $1, oops, we're gonna turn this $1 into a list, right? And then if you have multiple options here separated by a comma, we have a $1 and a $3. Note that we skip over $2. Why is that? Because these go in ordinal position, right? So the $2 is actually the comma itself. We don't need the comma. We're gonna go past the comma, right? So we have $1, $3, and we're gonna append to the list. This makes it recursive. So you can have however many of these you want, and we'll figure out how to make a list out of all those different options for you. Everybody kind of following along with this? This @1 provides a line number, right? Or a position, rather, I should say. So when you see Postgres say, you screwed up here, that's what that's about. Any other questions about this stuff here? Come on, come on, come on. It's all good. Thank you. All right, so I think we actually came through a lot of this, but this is kind of a summary of it, right?
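What those $1/$3 list-building actions are doing can be sketched in plain C. list_make1 and lappend are the real function names that gram.y's actions call, but the list implementation below is a toy cons list, not the real pg_list.h one:

```c
#include <stdlib.h>

/* Toy versions of Postgres's List routines, to show what the grammar
 * actions for a comma-separated list are doing:
 *
 *   copy_generic_opt_list:
 *       copy_generic_opt_elem                  { $$ = list_make1($1); }
 *     | copy_generic_opt_list ',' copy_generic_opt_elem
 *                                              { $$ = lappend($1, $3); }
 *
 * The first alternative starts a one-element list; the left-recursive
 * second alternative appends each further element, and the comma ($2)
 * is simply never referenced. */

typedef struct ListCell {
    int value;
    struct ListCell *next;
} ListCell;

typedef struct List {
    ListCell *head;
    ListCell *tail;
    int length;
} List;

/* like list_make1($1): build a fresh one-element list */
List *list_make1(int v)
{
    List *l = malloc(sizeof(List));
    ListCell *c = malloc(sizeof(ListCell));
    c->value = v;
    c->next = NULL;
    l->head = l->tail = c;
    l->length = 1;
    return l;
}

/* like lappend($1, $3): append and hand the same list back as $$ */
List *lappend(List *l, int v)
{
    ListCell *c = malloc(sizeof(ListCell));
    c->value = v;
    c->next = NULL;
    l->tail->next = c;
    l->tail = c;
    l->length++;
    return l;
}
```

So parsing `a, b, c` fires list_make1 once and lappend twice, which is exactly the recursion the speaker describes.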
We have C code embedded in the grammar. It's compiled as part of the overall parser into gram.c. $$ is this node, $1 is blah, blah, blah, we talked about this stuff, right? Everybody get it? No? All right. Copy options list: here we talked about the production of the different copy options. So this is basically making DefElem nodes. So we're saying format, oids, freeze. We're just passing those through. All right, so we're making up an options list here for all the different options. All right, so if we wanna add a new copy option, what do we do? Well, the first thing we're gonna do is add it into that list of options. So this option that I'm adding is gonna be called compressed, right? So we're gonna add it to this list so that now the grammar will understand it, right? The grammar will understand it and accept it, but other things won't. Come on, come on, come on. Right, the rest of the system isn't gonna have any clue about this yet, right? This is just the very beginning, but this is where we start. And you wanna make sure that you can get the grammar to build correctly, right? Because if you start seeing things erroring out from that, that usually means you have some kind of ambiguity. You've introduced a case of ambiguity into your parser. If you do that, we're not going to let it go in, right? That can be either a shift-reduce conflict or a reduce-reduce conflict. I'm not gonna go into what those are, but if you see those, those are bad. Go fix it, or talk to us about what it is. You do also wanna avoid adding reserved words if at all possible, right? When you do need to go add in a new token like this, you wanna go add it to the list of unreserved keywords if you can get away with it. If it has to be a reserved word, we're gonna have talks about it, because we've gotta have a discussion with people about any new reserved words coming in. We also need to update the actual list of tokens.
That's in kwlist.h; it's pretty straightforward. All right, so now that we've got this in here and in the code, right, what are we doing next? So the code for COPY is in src/backend/commands/copy.c. Copy's got a function to process all of the options that are given. That's pretty convenient. So what we're going to do is we're gonna go add in a new boolean into this code, right, to keep track of this. So this is inside of our structure. We're adding a new item to this struct that's gonna be, basically, is this a compressed file or not? When you are defining a structure inside of a .c file, we do ask that you put it near the top, because that's a pretty important thing. We don't have a hard and fast rule about only having them in .h files, although typically they're gonna go there if they're being used by other parts of the system. So now talking about the code itself, right? So this is pretty straightforward here, right? We have an #ifdef for HAVE_LIBZ, and then we say, okay, if we have it, then if this cstate->compressed, so this is the option that's come in and said it's compressed, then if we already have it set, we're gonna say conflicting or redundant options, because you've specified it twice. Otherwise, we're gonna actually set this compressed flag based on the boolean value that's been passed in. If we don't have libz on the system, we're gonna throw an error saying we haven't been compiled with libz, sorry. So is that all you have to do? No, there's a lot of stuff. This is the actual diffstat from when I wrote this feature, right? So this included things like having to track gzFile, right? So when you're working with gzip files, you have to track gzFile pointers instead of FILE pointers, right, that's a libz thing. And then you have to use gzread and gzwrite. So I had to go make changes to parts of the system to address that. And that's where that src/backend/storage/file/fd.c came into play.
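The option-checking pattern just described can be sketched standalone. Note the COMPRESSED option was a proposed patch that was never committed, so everything here is hypothetical: the real copy.c option loop is more involved, ereport() is replaced by a stub, and HAVE_LIBZ is defined by hand where normally configure would set it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Sketch of handling a hypothetical COPY ... (COMPRESSED) option.
 * Pattern: reject a repeated option, and reject the option entirely
 * at parse time if the server wasn't built with zlib. */

#define HAVE_LIBZ 1            /* normally decided by configure */

typedef struct CopyState {
    bool compressed;           /* new field tracking the new option */
} CopyState;

static int errors_seen = 0;    /* stand-in for ereport(ERROR, ...) */

static void report_error(const char *msg)
{
    errors_seen++;
    fprintf(stderr, "ERROR: %s\n", msg);
}

void process_compressed_option(CopyState *cstate, bool value)
{
#ifdef HAVE_LIBZ
    if (cstate->compressed)
        report_error("conflicting or redundant options");
    else
        cstate->compressed = value;
#else
    (void) value;
    report_error("COMPRESSED is not supported by this build");
#endif
}
```

In the real backend, report_error would be ereport(ERROR, ...), which aborts the command outright instead of returning.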
And then I had to make documentation updates over in doc/src/sgml/ref/copy.sgml. And then I added some regression tests, which were inside of src/test. So hopefully all these different pieces look familiar. We're gonna talk a little bit about what happened with this, because this never got committed. We'll talk about that in a minute. So all right, Postgres has a bunch of different subsystems, right? We have specific ways of handling memory management, error logging and cleanup, linked lists, catalog lookups, nodes, datums and tuples. Let's talk about that. When it comes to memory management, all memory allocated in the backend is done so inside of a context of some kind, right? If you wanna allocate memory inside of Postgres that is going to live throughout the lifetime of the backend process that you are currently operating inside of, you would allocate it inside of something called TopMemoryContext, right? If you allocate something inside of TopMemoryContext and you don't free it, then it's basically a memory leak, right? On the other hand, if you allocate inside of CurrentMemoryContext, which is typically the correct context for whatever you're doing right now, we will typically clean that up automatically for you, right, because CurrentMemoryContext is gonna be something like a per-query context or a per-tuple context, right? And what that means is that Postgres, when it's done with that query or it's done working on that tuple, it'll throw away the context and anything below it, right? So we clean all of that up for you. So our allocator for all of this is called palloc. So that's what you need to be using whenever you're allocating memory. This is also true for extensions. So if you're writing extensions, you should be using palloc, right? Because your code is running in the backend, right? Errors and asserts. So in the event that you have something that can't happen, right? These are situations that just shouldn't happen. You can use elog, right?
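The memory-context idea can be shown with a toy region allocator. The real implementation in src/backend/utils/mmgr is far more sophisticated, with context hierarchies, resets, and chunk headers, but the core contract is the same: allocate into a context, and deleting the context frees everything in it at once, which is why per-query palloc'd memory doesn't leak. All names below are made up for the sketch.

```c
#include <stdlib.h>

/* Toy region allocator illustrating memory contexts: every chunk is
 * chained off its context, so one context_delete() frees them all. */

typedef struct Chunk {
    struct Chunk *next;
    /* caller's memory follows this header */
} Chunk;

typedef struct MemoryContext {
    Chunk *chunks;
    int nallocs;
} MemoryContext;

MemoryContext *context_create(void)
{
    return calloc(1, sizeof(MemoryContext));
}

/* like palloc(): allocate within a given context */
void *cxt_alloc(MemoryContext *cxt, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);
    c->next = cxt->chunks;     /* chain onto the context */
    cxt->chunks = c;
    cxt->nallocs++;
    return (void *) (c + 1);   /* hand back the space after the header */
}

/* like MemoryContextDelete(): free everything in the context at once */
void context_delete(MemoryContext *cxt)
{
    Chunk *c = cxt->chunks;
    while (c) {
        Chunk *next = c->next;
        free(c);
        c = next;
    }
    free(cxt);
}
```

A per-query context would be created before executing a query and deleted after it, so any cxt_alloc done during the query is reclaimed without the code ever calling free itself.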
elog() always runs, and typically it should not be used where a user might see the results, but it can be useful for debugging. Assert() is your typical standard assert system: it only runs in assert-enabled builds, but do be aware that assert-enabled builds can therefore act differently from non-assert builds. It's useful to make sure that other hackers are using your function correctly, right? So when other people are writing code, they're typically doing it with asserts enabled, and it's good to have asserts in there to make sure that the arguments getting passed to your function make sense.

When you're logging from Postgres, you should be using ereport(), and you should be passing in an error code and an error message, right? We have a style guide for our error messages that you should go make sure you're familiar with. But the basic gist of it is that one of the important things about ereport() is that when you do an ereport() at ERROR or higher, Postgres will just take care of cleaning everything up for you, okay? This is really important, because running an ereport() with ERROR will actually roll back the transaction for you and free all those memory contexts. The way we do this is actually through a longjmp, right? The ereport() system will actually longjmp out of that code, back up higher in the stack, handle cleaning up everything, freeing all of your memory contexts and whatnot, and rolling back the transaction itself, okay? And once all of that's done, we're ready to accept new queries. So you want to make sure that you're using this appropriately. If you do an ereport() with WARNING, NOTICE, anything like that, those lower levels, the code will keep running. It'll just issue a warning or a notice back to the user, and the code flow will continue from that point. Does everybody understand how that works?

All right, syscache and scanning the catalogs. So we have a function called SearchSysCache().
So we talked about the system catalogs a little bit earlier. It's tricky when you want to modify or deal with the system catalogs, because there can be concurrent things happening. Other people can be making modifications, and we have this whole caching system to make it so that the system is performant, right? So whenever you want to go look things up inside of a system catalog, you should be using SearchSysCache(). Basically there's some key that you pass in, right? If you want to look up a table, you would use SearchSysCache() and you would pass in the OID of the relation you want to look up, the table's OID that you want to go find. And when you get that back, you'll get back all the information about that table in a cached form that you can then access and work with. When you're done with it, you have to call ReleaseSysCache() to let the system know that you're done with it and you're not going to work with it any further. We have some convenience routines wrapped around this as well. Just in general, always look for existing routines before implementing a new one, because we have lots of pieces in our system.

So I talked about nodes earlier. Expression trees are made up of nodes. Every node is a type plus a bunch of appropriate data. The type is stored as the first item inside of the node structure, so you can use IsA() on it to determine what kind of a node it is. Whenever you create a node, you should be using makeNode(), and then you need to make sure that you add it into nodes.h, and add all of the copy and equality function pieces inside of backend/nodes. So if you add a new node into the system, there's a bunch of stuff that you have to go add, a bunch of boilerplate, essentially: things for knowing if two nodes are equal, things for knowing how to copy that node. Could you cover the timer? Thank you. It says zero minutes left and I've got ten.
All right, Datums. So inside of Postgres we have this thing called a Datum, right? That's basically a structure for a single value. And you can convert these back and forth between C values and Postgres Datums: Int32GetDatum() or DatumGetInt32(), for example. Note that Datums can be out of line. Postgres will sometimes TOAST things: if a single value is large enough, we'll TOAST it, which means we'll compress it and store it out of line. So you have to be aware of that when you're working with Datums. If you try to go access some Datum, you need to first check: has it been TOASTed? Do I need to do anything to deal with the fact that it's been TOASTed? So be aware of that.

Tuples are then rows, right? Tuples are Datums that are all shoved together inside of a row. HeapTuple is defined in include/access/htup.h. We have HeapTupleData, which is an in-memory construct of a heap tuple, and that includes the length of the tuple and a pointer to the actual header. And we can use these in a lot of different places. In particular, we have what's called a minimal tuple structure. That's what we use for hashing, right? If you want to hash something, you typically don't necessarily want to hash the entire row, so instead we have something smaller that allows us to build up just what we need in order to put together the hash table.

All right, running a bit of a long time here, so I'm just going to kind of skip through this. This is how a tuple itself is actually defined: it has a number of attributes, some flags, and then the data follows the header. We have a number of other subsystems. I'm just going to kind of hit on these, but just realize that this is stuff that, like, don't go implement another one of these when you're hacking on Postgres. We have enough of them. For example, we have a linked list implementation. Then we have a doubly linked list, and another singly linked list implementation.
We have a binary heap implementation. We have something for solving the knapsack problem, right? We have red-black trees. We have string handling. We have all of these things already written inside of the backend code, so don't add another one; go see if you can't use what's there.

When you are hacking the Postgres way, you need to be working with the pgsql-hackers mailing list. This is the primary mailing list for discussion about Postgres development, right? You need to get an account at lists.postgresql.org in order to subscribe to it. And then you can discuss your ideas and thoughts about how you want to improve Postgres. Always watch out for other people working on similar things, because that actually happens pretty often. Try to think about general answers, not specific ones, and be supportive of other ideas and approaches.

So what happened with this copy-compressed feature? Well, somebody wrote a feature called COPY ... PROGRAM, which will actually send data to and receive data from a program, and that program could happen to be something like gzip, right? Or gunzip, or maybe bzip2 if you wanted to do that. So it's not quite identical, but there ends up being a lot of overlap between what copy-compressed could do and what COPY ... PROGRAM could do. In the end, we went with COPY ... PROGRAM because it was more generalized and allowed you more flexibility in what you wanted to do. It also meant that we didn't have to have direct zlib calls, like gzFile and things like gzread() and gzwrite(), going into the system, and it was nice to be able to avoid that.

Whenever you're working with Postgres, think about your code style. Try to make your code fit in. The Postgres style guide is in the developer FAQ. Be wary of copy and paste. We try to be C99; actually, in some ways we go back to C89, but the standard today is more or less C99 compliance, with some caveats. We only use C-style comments; we don't do C++-style comments.
Generally, you want to have comments on their own independent line. We're not big fans of having an inline comment on the same line as code. Always describe why, not how, right? We can see what the code is doing; we want to know why the code is doing this. That's what really ends up mattering to us. Use a larger comment block for larger code blocks, of course. And in particular, comment your functions and tell people why they should use a function and what the function does.

When it comes to the error message style, there are three main parts: a primary message, detail information, and then a hint, if appropriate. Don't make any assumptions about the formatting. Use double quotes when quoting, and quote file names, user identifiers, and other variables. Try to avoid using passive voice. Postgres does not consider itself to be a human, so use active voice.

When you want to submit your patch, you should be using context diff or git diff; ideally, just pick whichever one presents it better. We use multiple patches in one email; don't send multiple emails, we're not the Linux kernel. Include in your email to -hackers a description of the patch, regression tests, documentation updates, and pg_dump support; that last one, in particular, is something that's really key when you're hacking on Postgres. And then register your patch on this thing called commitfest.postgresql.org, right? This is where we track what patches have been submitted. If your patch is not on there, we probably aren't looking at it.

We have a number of upcoming events. These are Postgres community events, plus FOSSASIA, which I'll be at. So if you're interested, check these out. There are a number of them that are even relatively local. I'll be at SCaLE, I'll be at FOSSASIA, I'll be at PGConf APAC, and I'll be at PGCon. When it comes to the big annual Postgres hacker meeting, that's at PGCon, in Ottawa, Canada. All right, questions? No questions?
Fantastic, thank you all. You've been a fantastic audience. Thank you. So if you are leaving the room, please use this door. Exit this way.