 So y'all are here for postgresql hooks for fun and profit. I don't. You can actually make profit. I wouldn't try it, but you can. My name is David Fetter. I'm a husband and father for a long time postgresql contributor. And I've expanded SQL. If anybody's used the writable width clauses in postgres, that's the thing I spearheaded and drove to. Inclusion and now other people are starting to use them. I think there's a way to do this on Meetup, but I depend utterly on feedback from people so I can improve. So, why don't we get started? Have you ever updated or deleted every row by accident? Alrighty. Yeah, that's not super fun. I wanted to slow people down when they were trying to auth your database too quickly or maybe there was a bug and something was offing very fast or maybe there's an attacker. How about made a cool constraint that just happens to require that you be in serializable isolation? Quick aside, who thinks they understand isolation levels lower than serializable well enough to write code against them? You sure? Nope, no hands? Yeah, so that's one of my pet projects. I would like to make serializable the default and make it difficult to pull away from it because nobody really understands the lower levels. How about if anybody here wanted to replace the traveling salesman problem-based planner with one based on simulated annealing? Y'all are bright here, maybe somebody will want to do that. Excellent as it is. Yeah, well. Okay, so to solve this, you could try to lock out anybody who could make a mistake. Let me know when you find somebody like that that couldn't make a mistake. The person doesn't have access to my data. Yeah, right. So you could hope the network layer handles the offing too fast problem. But while hope is a strategy, it's usually not a great effective strategy. I mean, it's a strategy. And you also better hope that your network layer isn't Mallory. Is this familiar to this terminology familiar? Okay, so in security circles, you have a bunch of people with a bunch of different names, Alpha, Bravo. So there's Eve who's an eavesdropper and Mallory is a malicious person who can go futz with your packets. And Mallory is kind of what you don't want to have your network layer acting as. So let's see, you wanted to make that cool constraint that requires serializable isolation. You could add a check for that in all the triggers that you write. And you still wouldn't be 100% sure that you've gotten it all. And as far as replacing the TSP base planner, I understand that in other systems, it is possible to replace the planner, but you basically do a complete rewrite. You don't like, there's not a way you could do this. You just take the code and you try to make a new thing out of it. So hooks to the rescue. So I'm going to guess that not everybody here knows what these are in the Postgres context. But I show hands, who knows, we've seen them. Okay, so y'all bear with me, I'm just going to kind of... So just in general, a hook is a thing that at a certain junction in the code takes a look to see whether there's an executable that has been sort of pointed to somehow. And if it's there, it executes it and then it returns control to the underlying thing, whatever that junction was. And so you can think of this as going in kind of a few phases. There's snag where you attempt to intercept the attempt to do an action. So this is pretty broad in general and that's because it's broad in general in the Postgres code. So for example, trying to auth. The attempt to do that would be the thing that you catch. You haven't executed any of the code behind it yet because your hook comes in first. Then we're going to make some attempt to be polite about this. So we catch this state that it was in before. You could in theory write several different hooks for a given action. And when you do that, they are executed in alphabetical order. So if you need an ordering, you can kind of make them tweak it like that. Then you actually run the code that the hook is supposed to do at the point where it's supposed to do it. And then because we're being polite and we'd like a system that actually functions, we set the state back to where it was before and continue. Assuming it hasn't aborted your query or whatever it is that it was meant to do. You do need to write these in C right now. I don't know for sure if anybody is working on ways to write them in like Python or Rust or whatnot. But right now they're C. It's not too bad. Here's what a kind of a bare minimum piece of this looks like. I don't have a pointer here, do I? So basically the hook code will have this preamble and it'll have postgres.h. That's the one that it must have. Then any headers you need for the execution that you're going to do. So like if you're in the parser context, you would include headers that I had to do with that. If you're going to make a function. You decide what things you're going to need. You'll know pretty well whether you needed them by removing one randomly and seeing if it still works. Anyway, then there's a thing called PG module magic, which is just what it sounds like. It's a macro that handles all the stuff of being in module and postgres. You don't really have to worry too much about the inner guts of it. Other people have banged on that pretty hard so that you don't have to. But you can if you really want to. And then there's all the hooks have to have these two functions. Which is init and finny, which is to say the entering the hook code and then the exiting the hook code part. Alrighty, so this is sort of a little bit like an example, but this is not an actual hook. So when you grab the state of the execution or the thing, each hook will have its own data type. And so you have to kind of persist that in a holding variable. So it's like previous system hook is now set to the current system hook. And then you assign black hole hook to the current system hook. And you're off to the races. This bit could be very, very small or very, very large. A little later I'm going to show you a fairly small one that I think is still useful. But if you're, for example, replacing the optimizer, there might be a few more lines of code in there. And then in PG finny, you just restore the state that you had cached before. Just so you can move on and be a good citizen inside the code. It's hard to get a count. I'm working on this. I think there's 82 hooks in the Postgres code. I'm working on a patch to document them all and make them show up in man pages and whatnot. Because I got sick of trying to figure out how many there were. But I haven't got that there yet. They go all the way up and down the code. So like from the very first part of any Postgres execution, from off all the way through the planner and the executor and the way to get back tuples, it's like there's places where you can inject yourself. Is that how the expansions are being entered right now by the Postgres code? No, they're just, like in many, many, many cases, the hook is just there in case somebody wants to use it. And so it checks for the presence of hook code. And I'll show a little bit about how that works operationally anyway. And then it just keeps going on. One major exception to that is the PSQL client, which appears to be written entirely out of hooks. I don't know why, but that's what they decided to. So, now here's where I bring up some boring C code and I hope it's... Okay, is this big enough for people to see or ignore bigger font size? Oops, let's try bigger. Let's try bigger this time. Okay, so this is a hook that... Well, let's see what this code is doing. I'll just kind of step through it. I hope you don't get too bored. So, as previously mentioned, include Postgres.h, that just means a bunch of stuff comes in. In this case, there's a function that we're going to write, so the function manager header needs to come in. And parse analysis comes in because we are looking for things in the query tree. Now, what is this thing? And of course, we need to log stuff in case we send up an error message visible to the user. So, E-log, E-report, that kind of thing. So, what are we doing here? Well, in a few lines of code, we're just going to make sure that nobody does any unqualified deletes or updates anywhere in your database. And I've snagged myself enough times on this that I find it useful, and I would like to make it a part of Postgres, at least optionally. So, moving right along, so there's this module magic that makes it into a Postgres module and all that kind of fun stuff. And I guess I could have pulled these out into their own header file, but for two of these, it seemed a little... I mean, it's basically an aesthetics call. You decide. If you want to make a special header for this, you can go ahead and do it. And here's where we're getting to a piece of it that's starting to be part of what we're actually doing. There's a Post-Pars-Analyze hook. So, what that means is that once you've finished parse analysis, which can be kind of a complicated thing where rewrite rules happen and views get expanded into the tables that they're composed of and stuff like that. So once you're done with parse analysis and you're starting to hand off to the planner, this is the point where we want to step in and say, whoa there, if we need to. And complains where this is not present. So there's one function here, so that's what we needed the function manager to deal with. It takes a parse state and a query. So if it's an update or a delete, so skip everything else, skip the selects, inserts, utility commands, all that. If it's one of those and we continue, we want to make sure that there is something to look at in the join tree. So I just stuck that assert in there in case it should ever become, in case there should ever be a time when the join tree doesn't magically appear when you do one of these. What is the join tree? So that's the part that hints to the planner that a join is actually going to occur. But it happens that it gets instantiated for everything and we may need to refactor that at some point. But at this point, everything that's, I'm not sure if this is true of utility statements anymore. Anyway, anything that you normally think of as an SQL command, select insert, update, delete, has a join tree structure that comes out of it once parse analysis is done. So if we have the join tree crawls, and that's where that where clause would be, if it's null, then we have found our query that we don't want anybody ever to execute. So that basically says that there wasn't a where clause of any type. Again, this could change like if in parse analysis phase we start to throw away identity clauses, then I'll need to rethink this a little bit, how this works. Because right now if you put in a tautology in your where clause, it will still get passed down as a not null qual. It'll still be there even though it means nothing. So now we basically just decided which error to throw. Update without a where clause gets one error, delete without a where clause gets a different error, and we're done. Just for completeness, I mean here's what PG and NIT and PG Feeney look like. There's not a lot of there there. I just persist the state of the parse analysis hook into a variable that's in a memory context that lasts long enough for this to actually go. And then in the PG Feeney I just restore it. Yes. You said before that these functions above would execute alphabetically. And you're talking about this as a single function, right? This is simple. But you could have multiple functions, but you would want to do them in a way that was alphabetical to make sure that they happen sequentially. So the question was just for the mic. Do the functions in this part execute alphabetically? And I apologize for not being clear. It's the names of the hooks that execute alphabetically. How you arrange your code in here is up to you. So the hook names at any given juncture will execute in alphabetical order. But the functions inside this code are kind of opaque to that. Questions, comments? Yes. That's only hooks where there's a pie at a particular juncture. If one hook is called earlier in the code then another hook is alphabetical. Right. So the question was, is this a global ordering of hook names? And the answer is as you guessed is no. It's just that at any given hook, at any given place where you could put hooks, if you have multiple ones, then they execute in alphabetical order right there. And then the context is entirely forgotten and you move on to the next place where there might be a hook. And then it has its own ordering. So the ordering is maybe it's scoped at the individual hook level? Yes. Yes, it's scoped at the hook level. And yeah. So it seems like an awful lot of bookkeeping if you wanted to make a total ordering. But I guess there could be a reason. I just can't think of off the top of my head what that might be. If you were not a C programmer and you wanted to use this hook and you started assembling them, you're like, oh, I want to get this feature and I want to get this feature. Most people would assemble that into one like mega hook. And it would probably... Okay, so if you're not a C programmer and you want to use hooks, you're in kind of a rare category. Because generally if you want to use hooks directly, you're kind of in a rare category. But let's say you have three different hooks at the off stage, let's say. And they come from different vendors. You just want to make sure that they don't step on each other. And you might not even have access to the code. You might just get a DLL on Windows or a .SO on other platforms. And that would be your hook code unless you're disassembling it. Now I don't know of any proprietary hook vendors, but I know that there are proprietary hooks running all over Amazon RDS, for example. And that's part of how they manage to make the guarantees that they can make. They have hooks in there that kind of prevent you from really being super user and hitting the actual file system on their little nodes. Or their giant nodes, depending what you feel like paying for. Okay, so that was our boring C code. I hope that the cyclomatic complexity of that one wasn't too, too bad. Oh, here we go, live demo. This is where I show either foolishness or, well, probably foolishness. Okay, so let's see here. Okay, so this does kind of what you would expect it to do. It just lacking the hook, it just goes and deletes it. Here's a little magic you use for developing and debugging this thing. If you really wanted to insert this hook code for production, you would have it load on instance start. And you would do alter system or something along that line in order to make it go. But in this case, okay, so now let's try it again. Right, so now this was a low stakes experiment, right? Who has nothing in it yet? But you can imagine that if you are at a console and you're hitting control C really, really fast. Because you're hoping that you can undo the terrible thing that you just did. This keeps you from having to suffer that particular indignity. Now, this could go a lot further. Like, for example, it could look for tautological ware clauses that basically come out to wear true. And it could disallow those, too. But there's only so far you can really go with that because you're basically... Well, okay, so you have a piece of code and you're trying to figure out whether it's going to stop executing. Does anybody remember what that one is? Not that problem is called? It's the halting problem and it's known to be unsolvable by any system. So at some point your ability to analyze the random calls that you get is kind of limited in that way. Yes? Is there a way that you could hook into making implicit transactions? If after delete or update... If after delete the number of rows in the table is zero, you implicitly fail the transaction. The question is, is there a way to abort the transaction if the number of rows in the table at the end of it is zero? And the answer is, I don't know. I mean, you would have access to information of that type at the end of a statement. And you would probably have it before the transaction closed out. But that puts you in the nasty situation of having actually executed that query. And so you've made a whole bunch of new... Or well, in this case you've marked a bunch of rows dead. And then you would then have to... Oh no, they're not marked in... I should take this offline. Sorry. Yes, it would be possible. It would be expensive. Not as expensive as recovering from data loss? That's true. It would not be as expensive as recovering from data loss. Alrighty, so that didn't utterly fail. So I'd just like to talk a little bit about the serializable. I'm still working on this hook because I want to make it broadly general. What I'd actually like to do eventually is make a GUC setting, which is sort of Postgres' configuration variables. It stands for Grand Unified Configuration. I'd like to make a setting so that you can just say that this database or this instance is in serializable isolation. I think that would actually help. But right now you can make a hook and I have some preliminary code for it. Let's see. There's one called Off Delay that does exactly this. It basically just backs you up, sends it, does a sleep before it returns, right? Obviously it does it the second and further times. How does it know that? Well, it keeps them stayed around. It keeps them stayed around in that memory context. So because Postgres' architecture at the moment is one process per back end, you can have state that you keep around for the length of the process and that'll generally do what you wanted. And yeah, somebody went and did their master's thesis replacing the traveling salesman problem solver with a simulated kneeling solver, which at that time anyway was almost as good as the TSP based one. And I don't know if that says bad things about where the optimizer was at that point or whether it says that this guy was genius because he managed to do the entire thing in like three years. Any questions, comments? Brick bats? You have a question. So the hooks run in process with the server if you crash? You crash the server? The question was the hooks run in process with the server and if you crash, you crash the server. So not necessarily. I mean, there are ways that it could crash, like scribbling all over PGD. So there are ways to crash the server using them but as a rule they're attached to the back end that they run in so if your hook crashes in some not deliberately destructive way it just takes out the back end that it was running in and the rest of the back ends which have been forked are still running and you're still okay. So as long as it hasn't done a broadly destructive operation. And the server will spin up a new back end? Yeah, yeah. The example you showed earlier had low loading and extension in the current session. So presumably the hook initialization and destruction will go through that one process? Yes. In the normal case where you use auto-system, is that happening inside the master process before the fork? Or is initialization deferred until after connections? The question was when do pre-loaded things actually get loaded into the process space? And the answer is I need to get back to you on that and please remind me afterwards because I don't want to answer in territory I don't know. Yes. So if your hook does crash, I mean it may not take down the whole server but if it's in something, I mean, you know, if you have a bug that's crashing the same amount, there's no way that whatever you've hooked into is going to ever complete until you fix that, right? Or am I misunderstanding? The question was if the hook crashes, is there a way that the thing that it hooked could ever complete? And the answer is no. It's the hook's responsibility to hand back control and if it's crashed, it can't hand back control. I mean, you're kind of stuck there. Yes. So I mean your example is, if I have it correctly, you're inserting the leading. So I have kind of two questions. Out of those 82 hooks that you mentioned, what are some of the more popular locations in those processes and how does this system know where to insert the hook that you want it to do and what that does? So the question was basically what are popular hooks and how does the system know which place to hook the code? So that's where all those headers and the data types for the hook come in. Because I had a post-pars-analyze hook, that's how the system knows that a post-pars-analyze thing is going in there. If I had made it some other kind of hook, then it would have executed there assuming that I had all the right headers to make it actually function. So the answer is basically that each hook has its own type. You must specify that type and that's how you're communicating to the system that the hook is going right there. Can one library hook multiple hooks or they each have to be their own separate PG unit? Separate. Right. The question was, can one library have multiple hooks? And the answer is I don't know. The case that's common is that it's already pretty rare. Hooks are pretty far out in the space of exotic stuff that you do with Postgres. So the thing that's more common that I've seen anyway is that two different libraries would be attached to two different hooks and then you just want to make sure that they're executing in an order that makes sense. Ideally, you want them to commute because you don't want something quite that fiddly and delicate in your hook code. I guess at some point we might change how the order of hooks firing goes, but let's see, how would that work? If you... Yeah, you could make a... Yeah, each shared library would have to have just one PG in it and one PG Fini in it as far as I know. Yeah, I was merely thinking... I'm sorry, the actual hook itself is basically just throwing a function in a global variable, right? You're updating a global variable in it. So you could update multiple variables. As long as you have each hook within your library have the right code, you could update three different variables to update three different hooks. Yes, you could have multiple different hooks in your code, I think. I say I think because I'm not sure whether... Let's see, how would this work? I'm not sure of PG in it if in it could actually have some logic in there as to which hook it's looking at. I'm not sure how it would be able to tell any such a thing. I can't think of a new case for it. I was just thinking what if I had something I wanted to do and I needed to hook multiple places to do the thing I wanted to do. I think you probably just want multiple hooks. Frankly, it's a little easier to test and debug each hook separately and then when you decide to combine them, you can test the combinations as a thing that you're doing for integration testing. So your unit tests would be on single hooks and your integration and system tests would be on the multiple hook scenario, whatever it was that you were doing. Anyway, so... Lots has already been done here. Your imagination is really the main limit. And as always, when you're writing code, have some fun with it. Just like this week, somebody decided to write new hooks for session start and session end. Don't know about session end ones precisely, but at session start, you can picture something back in, preparing a bunch of queries that are getting ready to be executed so the client code doesn't have to worry about it and the pooler doesn't have to worry about state. I don't know if anybody uses prepared queries, but they're this rare thing that's both a security feature and a performance feature at the same time. So what they do is they let you make a parameterized query and do that parsing just the one time. And then because it produces a specific abstract syntax tree, there's not ways to inject more code in there because more code isn't... there isn't a place for it. Like, by the time you're done, it's got a space for this integer and that varchar and that point and that's all you can put into it. So it's also a performance thing because parsing takes time and if you just have the AST ready to go with just parameters to put in there, it takes less time. So it's both the security and the performance that you might want. Anyway, really appreciate you all coming out tonight and taking the time. Thank you.