On the master side, the WAL sender at the bottom is the process that reads the data going into the transaction log: all of the binary data that's basically saying modify this file, update these bits, put those bits there, at that kind of low level. It sends that over to the standby through what we call the WAL receiver, which takes that transaction information, writes it to the transaction log on the standby side, and then makes those updates to the database's data files. At that point, whenever statements come in from the application to the standby server, it gives back whatever is current in the heap for that file. Logical replication is built on top of that typical replication architecture that's already in place. It simply adds, on the WAL sender side, a component called logical decoding, which takes that binary information saying "edit this file, change these bits to these values" and turns it into "update this table, modify this row, change this column": something a little more human-readable, if you will. And with the additional APIs provided on the WAL sender (the piece missing on this slide is the output plugin), you can create data streams that you can send out to whoever wants to receive them. Then on the subscriber side, we have an apply process that consumes the data streams coming out of that output plugin via the WAL sender, and applies those changes. So if an update comes in, it looks like: this table is being updated, and these rows with ID 1 through 10 need this column updated. It actually performs those individual statements, if you will. And so the transaction log gets updated, the heap gets updated, the files get changed. From the application's point of view, because these are not a clustered sort of system, if you will (these are two separate instances, to stress that), your application could actually write to both of these nodes, and that data would have to be reconciled. It's that much of an abstraction, if I'm using the right word: rows are identified by the primary key rather than by any internal ID.

I have been in the computer business for a long, long time. I started off with punch cards and programming in IBM assembler language, doing communications on IBM mainframes, and I have continued to grow with the industry to the point where, today, we take older applications and make them function like newer applications: like something that users think, oh, that's a computer app, on a mobile phone or in a web browser, that kind of thing. So that's what we do. (Now another application has launched, which you can't see; let me get that minimized so I can advance the screen. And this is the wrong presentation; my computer is not cooperating. Pardon me while I get back to basics here.) This presentation is about using Postgres as the cornerstone, the foundation, of a process to take older legacy applications and make them modern, to move them fast-forward into the future. That's what this whole presentation is all about.
What I hope to accomplish, what I hope you take away with you, is the following. I have a view of what a modern business application should look like, and I think you'll find it's formed from a lot of current thinking, so it might be consistent with the view most of you have of what a modern application should look like. I want to share that with you in several ways: by demonstrating something that's really representative of a modern application, so that we have a common target we agree on, or at least so you understand what my goal is. Then you're going to see a before-and-after case study of something real that we're working on today that implements these ideas. Then we're going to drill down into how all of this is accomplished. Throughout all of that we're going to talk about Postgres and its role in the whole thing. And then I'm going to leave you with a bunch of things you can try on your own.

So: an introduction to modern architecture. I want to emphasize two things that are really important to how we go about doing this modernization. The first is that we need a database management system, and Postgres is a really good poster child for the kind of relational database management system we need; so we need to get the older application functioning with a modern database management system. The second thing we need is something called REST APIs. How many people are familiar with what a REST API is? It's becoming more and more well known. Those two tricks are really the magic we use to accomplish this. This is a generic definition of REST, which I won't bore you with, but the bottom line is that REST is just a variation on an old theme that gets repeated over and over again: remote procedure calls, essentially. The ability to have some object someplace in the world that exposes an interface, so that if some client someplace else in the world uses that interface according to the rules, they can talk to each other. And it doesn't matter whether you're talking Windows to IBM mainframe, or Windows to Linux, or whatever: the API is the whole determinant of how they talk to each other. That's what the REST API standard is. By the way, I actually do have this presentation on my website, although somehow I think I've got a slightly older version of it up; at the end I'll tell you where to get it. Essentially it's modernization.exformics.com, and then you navigate to resources and articles, and you'll have a copy of this plus some other notes I'm going to talk about later.

So: the next generation, with a REST API. If you look at this model right here, where you have a middle tier that exposes a REST API, you could replace that middle tier with Microsoft's idea of how to expose a REST API, or anybody else's; you could do it with Python, with anything. As long as you have a REST API and a JavaScript or AJAX client, and, as I mentioned, my model needs a relational database, in this case Postgres. Applications that have a modern look and feel tend to look like this: that looks like a form you can relate to, and it could be coming from a Windows machine or a Linux machine or whatever. If you point to the URL, you can see it on a tablet; same thing with a mobile phone. So that's what we're looking for in an application.
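Since everything that follows leans on REST, here is a minimal sketch of what such a call looks like from a Java client. The endpoint URL and the JSON shape are hypothetical, purely to show that the whole contract is a URL, an HTTP verb, and an agreed-upon representation:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal REST client sketch. The resource URL and the JSON it returns
// are hypothetical; any client on any platform that follows the same
// contract can talk to the same server.
public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/api/rooms/101"); // hypothetical resource
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // e.g. {"room":101,"status":"occupied"}
            }
        }
    }
}
```

The point is that nothing here cares what the server is written in; a mainframe, Linux, or anything else can sit behind that URL.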
So if we go here, if I can change the picture, we'll be able to follow that link: just go to www.axelor.com and do their demo. You'll be able to do this yourself. We're looking at a real ERP application, modern by today's standards, based on a relational database, using RESTful APIs, with adaptive clients. And if you just go at it with your mobile phone, you'll see that it all works that way too. So that's the kind of thing we're talking about. And then if we go here, we can describe the workflow and end up with generated code. (I can't get it to show on my screen; I did not know that. Let's see, how do I get this to not show anything? I need some technical support. Okay.) So, what the application looks like: that has, no doubt, a modern look and feel. This is an outfit in France that developed a toolkit that allows you to build applications with this modern look and feel and an adaptive user interface, and that's what we're using to accomplish the things we're doing. (Let's see if I can get it to show this right now; it went backwards on me.)

A before-and-after case study. What prompted this whole thing is a customer of ours, the Hotel Playa Mazatlan, a beach resort in Mazatlan, Mexico. They developed an application that is an ERP. Does anybody know the difference between an ERP and just a regular set of applications? No? Okay. If you walk into the Hotel Playa Mazatlan and say, give me the status of your whole corporation today, they say: well, let's take the users off our computer systems for five minutes. Five minutes later they can tell you everything. They can close their books in five minutes, at any arbitrary point in time, because all their systems are integrated. That means the accounting, the room registration, everything that business does is all on that system. So: the developer said he wanted to retire; he's 70 years old. There are all kinds of hotel registration systems, all kinds of hospitality systems available. They tried, and they couldn't find anything. They could find things that did aspects of what they were doing, but nothing that covered it all. So he came to me and asked: how could we take his application off this obsolete platform, which stopped being manufactured in 2006, and keep it working? Not with the kind of screens I showed you before; this is what the base application looked like. A different kind of presentation layer, but in the middle, the same basic application model as the one we were going to move it to. That is, they didn't care about the platform; they wanted to modernize the user interface. A 1980s application, 1980s technology; that's this technology curve down here, and what we're trying to get to is today, 2017, in terms of technology.

So, what my company does is provide lift and shift. We can take an application that's running on an obsolete computer platform, with no manual changes to the source code or to the data structures and all that, and we can have it running on Linux or Windows or some kind of Unix, and the users can't tell the difference. That's called lift and shift; that's a migration. But there are also some tricks in there. Instead of using the old database, the application still makes the old database calls, you know, like DBGET, DBPUT, but there's a wrapper that implements those calls on a relational database management system. So we fool the application into thinking it's still running on the old platform. And then we continue to do the other modernization things that you'll see, which depend on a modern database of some kind.
And in this case, we use Postgres. The prerequisites for going further and implementing the kinds of things we're talking about: we do need some kind of platform that supports a relational database management system, so Unix or Linux qualifies. In this case we're using CentOS; we could use Red Hat, which is about the same thing. We need a platform that supports the latest Java version. We need to be able to use Apache Tomcat, which is, you know, a Java-based application server. And we need some database migration tools to accomplish this. You're also going to see the role of something called the Axelor Development Kit, which allows us to take this thing into the modern world, extend it, and create what looks like the illusion of a modern application; except it won't be an illusion, it will be for real, and you'll see that. We had to make some modifications to the Axelor Development Kit to make it do its thing, and we'll talk about that a little later.

So here is our abstraction of the goal, what we really want applications to look like, but done with the legacy architecture. You always see the same model: a front end, a middle tier, and a database. And here we start employing our tricks. Our first move in the migration is that we've changed the database to Postgres, and we're putting a REST API on top of the old application. This is the architecture when it all comes together. If you look at the source code of the application running on Linux, the calls look identical to how they did on the HP 3000 MPE system, and the programmers who developed it can still maintain it. However, we put some adapters in there that allow it to have a REST API, and that enables us to put a web browser or a mobile phone or a tablet in front of it. So that's two tricks: the Postgres database with two different APIs. The TurboIMAGE API, the original one, and, as I'll show in a moment, SQL. In this case we're showing the second API, SQL: anything you can do with a relational database management system. It is a Postgres database, so any tool you can use with Postgres, you can use with this database; but at the same time, the old programs still work. So we merged the two architectures together.
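To make the two-API trick concrete: a sketch of the wrapper idea, where the legacy program still calls something that looks like the old database API while the call is actually served by Postgres over JDBC. The method name dbget, the dataset, and the key mapping here are hypothetical simplifications; the real adapter implements the full TurboIMAGE C API, not this toy:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical shim: legacy code keeps calling "dbget", but each call is
// translated into SQL against the migrated Postgres schema. Dataset and
// key names come from the fixed legacy schema mapping, not from user input.
public class LegacyDbShim {
    private final Connection conn; // JDBC connection to the migrated database

    public LegacyDbShim(Connection conn) {
        this.conn = conn;
    }

    // Stand-in for the old keyed-read call: fetch records by key.
    // The caller is responsible for closing the ResultSet.
    public ResultSet dbget(String dataset, String keyColumn, Object keyValue)
            throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM " + dataset + " WHERE " + keyColumn + " = ?");
        ps.setObject(1, keyValue);
        return ps.executeQuery();
    }
}
```

Because the backing store really is Postgres, the same table is simultaneously reachable through plain SQL by any new code or off-the-shelf tool.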
So, why Postgres? Would anybody ask: why bother to take this old database that a lot of you have never heard of and change the application architecture to PostgreSQL? There are a lot of benefits to relational databases, but the short version is that the features available in any relational database management system reduce the amount of code a developer has to write. Because the database itself provides things like type checking and native date types, if you look at the programs side by side you'll see it cuts down the amount of code necessary. That's one thing. Being able to access your data with other, outside tools is also very useful. And it goes on and on. So: Postgres versus a commercial database. We've worked a lot with Oracle, with IBM DB2, and with Microsoft SQL Server, and those are all good products. The problem is that I can't justify the cost, and, in Oracle's case, all the extra grief, because Oracle consumes you: it does a lot for you, but it costs you a lot of effort. Other open source databases, MySQL and the rest: yes, you can use those, but for our purposes we needed something more powerful, more robust, so we've tended to standardize on Postgres. It's just our own preference.

And once you get the data into the relational database management system, you can start adding additional indexes; you can index on partial keys. There are all kinds of ways you can go, and you don't impact the original application: it still works, but your new applications become easier and easier to compose. Database modernization: I'm not sure anybody really cares about the whole process, but what we do is convert the schema to something compatible with the relational database manager, convert the data, and then at runtime the old programs think they're working with the old schema while the new programs work with the new schema. That's just the migration. On the HP 3000, the first thing we have to do is a platform change, because the platform itself does not support a current relational database management system. On an IBM AS/400 or IBM iSeries, by contrast, there's a DB2 database that is really capable, and it's already there, and it's free; so we don't even have to do all this. We just change how some of the programs use the data and do a little modernization there. No platform change needed. But the point is, we need a relational database management system. We call that modernization in place, because we're not going to take the application away; we're just going to add another way of getting at it, some other indexes, some tweaking. IBM has done a lot of engineering there, to the point where I can show you how to do anything you can do on Linux with an AS/400, if you're already there. I'm not sure you could convince anybody who doesn't have one already, but given what you're ultimately going to see here, I can get an AS/400 customer there quicker than somebody coming from a platform like the HP 3000. So what we do is take the legacy application, migrate it like-for-like with the lift and shift, and then modernize it. That's the whole pattern; repeat. Again, looking underneath the covers: the programs remain logically unchanged, and there's a wrapper inserted between them and the database to make that work.

Okay, so now, modernization beyond the basics. (On whether CentOS is close enough to Red Hat for us: we're indifferent, in one sense. It's just that what we've found is that if we build on CentOS and then move it to Red Hat, we don't have to do anything. For your question, Mark, we might have to do a little bit more; I'm not aware of anything. Whatever we get as a standard distribution makes it real easy. We're kind of lazy: life is going to throw work at you no matter what you do, so we take the easiest path, and we support customers who have Red Hat Enterprise Linux using CentOS. It works.) But we have a customer, the one here, in fact, on an HP 3000, who was running their order processing system on it. And I asked: what do you want to target? Where do you want to go next?
They said, well, we want to go to Linux; we're a Fedora shop, all our other applications run on Fedora. I said, fine, if you insist; you're not going to get any Linux distribution bias from me. So: whatever makes my life easier. Okay, so we're getting close to the point we really want to reach, and that is an environment that's a dual-personality system, almost schizophrenic: on the one hand, if you're a COBOL developer, it still thinks it's the old system, while over on the left-hand side of the diagram it's like any other Linux system on that platform. And that's where we are today.

Well, they had been looking for two years to find a way to preserve this system, and I said, I can have you up and running in a month. He could actually keep trying screens, and they work; that's kind of a big deal, you know, the testing. It took about four months to get to the point where he could actually have gone live. There were business pressures: there are times of year when it would be foolish to go live, at their peak season, and they had to find time to test before they ultimately got around to it. But in February of 2017, they went live in that hotel. Today they're running on our lift and shift. For the next phase, we're waiting for them to be ready to turn on, essentially, the feature of being able to use their old screens with web browsers and tablets; we've already shown it to them and all that. So that's the next phase, and we'll talk about how we did that. And then the phase after that, which Lance, Lance Holtrum, the developer, would like to see: he'd like the ability to take somebody from a community college who has never heard of MPE or the HP 3000 or COBOL, and have them continue to develop his application. That's what the Axelor Development Kit is going to enable them to do. We don't have a timeline for that, because it will be based on their ability to absorb the whole thing, but that's where we are in the process today. Some of it they didn't have before but can do right now: he has started playing with SQL from COBOL. He hasn't actually embraced other modern tools yet. You can pick anything off the shelf, all the standard things like BIRT or Jasper reports, and just point them at the database and use them. He hasn't played with that yet. But if he did, I mean, it would be huge. It's huge.

All right. And then, one of our developers wanted me to insert this: there's an editor we use. Notice that he still has this old application, and he still has his old forms he needs to maintain, and so we created an editor to help him through that. So here is the ugly truth; you'll be able to see this when you download the presentation from our website. We have about five before-and-after screens: what the screen looked like, on the left, and then the first level of enhanced screen; and then we can start putting in widgets, drop-downs, all kinds of enhancements to the screen without impacting the original app.

So now we're going to talk about a thing called Axelor: www.axelor.com, a French company. They're mainly about business applications: open source ERP, CRM, all kinds of nice applications running in the cloud. And you can use it either with an AGPL license, in which case you have to contribute any code you create back to open source, or you can pay them a commercial fee and use it that way.
It's ready today, they have lots of customers, and they're mainly an application software company. Beautiful software. For literally the last ten years I have been looking for a Java framework we could use to do this kind of thing. I tried lots of different things, but finally I found the framework these guys produced, and we were able to adapt it to our needs. They were shocked when I contacted them and told them what we had been doing, because they don't think of the framework as a product in itself; it's just a means to an end for them to develop their very nice applications. What we found is that it converts everything to a RESTful API and creates clients that use it. Beautiful software. You can take it, play with it, and do whatever you want with it. So we adapted this Axelor Development Kit to our needs, and that's what we use.

It's based on all the nice, current libraries and frameworks, all this stuff. Studying modern stacks is like drinking from a fire hose: there are so many of them it's mind-boggling. Which ones are worthwhile, which ones will be here tomorrow, and all the good stuff. If you just want to do techie stuff and not do any work and not accomplish anything for a business, you can do that with this. But I have always been inclined to look for the bottom line and to want to get some real work done. With these guys (we actually help them with their development kit nowadays; we're partners with them), we let them worry about all the plumbing, and we develop applications very swiftly, and we're doing that. So that's the model: an application package solution, and if you want to supplement it, they have applications, like human resources, that you can add.

The first thing to do is just go to the Axelor site and play with their software. If your heart desires, take a copy, like I have on this machine, and play with it, modify it, and see how easy development is. Then we have some other exercises we're creating for you, where you can take an existing database, like we do (this is with our modified copy), import it into Axelor, and build an application. I'll give you a punchline. Another thing I'm putting out there is a COBOL program used for a simple time-entry application, just entering a time sheet, right? I don't remember how many lines of code it is, but it's at least 2,000 lines, and it wasn't written by a stupid person, so it's efficient code. Can anybody speculate on how much code it takes to do that with Axelor, with their method of doing things? Let's say 2,000 versus what? Anybody? It's XML: some XML files that define what your forms are going to look like. That's all. Would you believe less than 100 lines? I think it's probably more like 60. I'm going to put all of that on the site too. And you don't even have to do it at a low level; you can do it sort of graphically. And all of a sudden you have this application, and you can show it on your mobile phone, on a web page, whatever. And it's database-backed. It's pretty amazing stuff. To me, that puts a point on the difference between modern development and what people like me started with, and what some people are still doing today.
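To put a rough shape on "a few XML files" versus 2,000 lines of COBOL, here is a purely illustrative, hypothetical declarative form. This is not Axelor's actual view schema, just the flavor of declaring a time-entry screen and letting a framework like theirs generate the REST backend and the adaptive web/mobile client:

```xml
<!-- Hypothetical, illustrative only: a time-entry form described
     declaratively. Field names and attributes are invented. -->
<form name="time-entry-form" title="Time Entry" model="TimeEntry">
  <field name="employee"/>
  <field name="workDate"/>
  <field name="project"/>
  <field name="hours"/>
  <field name="notes" widget="text"/>
</form>
```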
And so that justifies the kind of thing we're doing over at the Hotel Playa Mazatlan; you don't need to go through all of this yourself. So, yeah: summary, conclusions. We talked about, and I'm giving you visual aids for, where you can go to the Axelor site using our little set of guides: just go to modernization.xformix.com, navigate to resources, then articles, and the top articles are this presentation plus the notes, the guide. So if you want to educate yourself, on your own, about what these modern applications look like, and get familiar with how to build an application with this toolkit, you can do it, and it won't cost you a dime. I also have a link for those who really want to get their hands dirty and see how it's done underneath the covers: there are literally hundreds of articles on how to do RESTful APIs and how to build the backend yourself. When you compare the amount of effort required to use Axelor versus doing it all yourself, I think you'll be impressed. So we've talked, briefly, about database modernization and its benefits, and about modernizing screens and application development. And I didn't emphasize this enough: with this kind of approach you're taking not only your applications but your developers forward at the same time. And we had a case study as our main example. This is my contact information, if you'd like to contact me to ask additional questions or whatever; that's fine. That's what we're up to. We're going to keep updating this, and we're going to start blogging on our site, showing some of how we're doing these things, because we're really excited. In fact, we got so excited that one of my developers and I flew to France to actually meet with Axelor and make a deal to partner with them, so you're going to hear a lot more about Axelor going forward. And that's pretty much what I had to talk about. Sorry about all the technical difficulties, but I think we survived them. Thank you. Are there any questions?

(On how we chose the database.) The way we select database management systems has to do with our ability to implement the old TurboIMAGE API I showed you. What we do, for Oracle, for DB2, for anything, is take the C API they publish, and there are a bunch of things we must accomplish to pretend it's a TurboIMAGE database. We hit a wall with MySQL: there were some things we could not do that we needed to, so that was the number one qualifier right there. We wanted to use MySQL, but we just couldn't. Then, as we got further into it, we found that Postgres is a lot more solid, a lot more scalable, more robust, and that it compares very favorably with IBM DB2, Microsoft SQL Server, and Oracle from a functional, practical point of view. So it is just our own bias: if we're going to do it, it's what we prefer to work with.

(On reusing old code.) There are two different ways of using that old code. If it has a screen interface on it, we actually talk to the screen interface: we pretend to be the screen, and we have some Java code that interacts with it, and that implements a RESTful API.
If it does not have a user interface, or we can strip the interface off, we give it a simpler interface and turn it into a service, and we use it that way. Say you have a module that does a whole bunch of really complicated business-rule processing and you don't want to rewrite it: you turn it into something where you can send queries in and get responses back, and then you can put a RESTful API on top of that.

(On what applications are covered.) The hotel, the restaurants, everything they do; in this case, that's what we're supporting for this customer. We have some other customers we're talking to, same thing. In one case, they took all of their applications except one and moved them to a canned ERP years ago. The one application left is very specialized: it has to do with feeding instructions to a metal-bending machine, sort of manufacturing automation, and they want to use this not to replace but to enhance that application. Another example, which we did just for kicks: we took a Microsoft Access application, converted it to Postgres, and implemented it using this. It wasn't very difficult; surprisingly easy. If you use automated tools, you can harvest the queries and the forms, change the queries into views or stored procedures, and then, yeah, there's a lot you can do. So it's not restricted to these really old applications. If you start looking at some of these state governments, or whatever, and the kinds of contractors they have: people have all this deadwood, all this clutter, and you can clean up a lot of that clutter with something like this in a lot less time than it takes to just enhance what you have. (On licensing.) That is completely legal; it's allowed by the people who own it, under two different rules. If you use it as open source, it won't cost you a dime, but then you have to contribute back; or you can pay them for a license and do what you want with it. You can build your own product. Anybody else? I think we're out of time. Thank you for grinning and bearing it, and for staying through the whole pause here; I've never had anything like that happen before. Thank you. (On Oracle APEX:) I like it, but it's got some issues. The main reason I went looking, what gave me enough motivation to find Axelor, was APEX: I went looking for an APEX replacement that I could use with Postgres.

My name is Dave Cramer. I work on the Greenplum project, not sure if you've ever heard of that: an MPP database based on Postgres that's been around for, I guess, ten years now. The other thing: I've been in the Postgres community for about 15, 16 years, primarily working on the JDBC driver, so this talk is on the JDBC driver. How many people have used Java or JDBC? Everybody in here. Cool. So, I was at another conference, eavesdropping on somebody's conversation, and I heard somebody talking about how poorly the driver performed, mainly because of how it was built. I was kind of curious. They said it was allegedly very CPU-intensive, and a lot has changed since. So I thought I'd gather some statistics, do some performance measurements, and see if it's actually true. That's what the talk sort of wrote itself around. There's a little bit of history of the driver, how it was previously laid out, and how people use it, or abuse it.
Some notes on how to make the most out of it. And then, finally, we just released a new version, and there's some new stuff in it, and I'll talk a little bit about that. The driver was originally written in 1997 by a guy named Peter Mount, who decided around 1999 that he didn't want to do it anymore, so I picked it up at that point. It's gone through a lot of iterations; you can see them, sort of 1.2, 2.1, all the way through 4.3 this year, though it's not certain that's going to happen. The important part is how the driver is designed: the way Java works, the JDBC spec just defines an interface, you have to write to it, and the interface is embedded in the JDK. So when you use a specific JDK, you have to provide a driver build that works with that JDK.

Eventually there were a couple of factors that convinced me it was okay to reorganize the driver. One of them was the switch to Git, which is nice because it allows you to have different branches: somebody rewrote the whole thing on a branch, and I was able to build the driver, test the driver, and everything worked. Then a lot of people tested it, and I got fairly convinced it was going to work and not break anything. The "why" was to make it easier to actually use, to include in your project. It's very easy to include the driver now: you just put in a Maven dependency, instead of having to go download the driver all the time. You'll notice the new driver version, 42.0; that's coming up in a slide later. We still have some of the same problems: as I mentioned, there are a couple of things we have to filter out, so now we filter using Maven filters. The text filter is kind of like an include statement, or an ifdef: there's a little program that runs and just filters it out, and we use those types inside Maven with that property. As reorganized in December 2015, all of those abstract classes were removed. And the good part is that I actually do have a lot more people working on the driver now, and a lot of people providing a lot more functionality, so that part works fine. And it is much simpler; this is what that connection code looks like.

So: time to figure out whether it's actually faster. I borrowed some code from a friend of ours at 8Kdata who wrote something called ToroDB, which is a connector from Mongo to Postgres. A plug for him: ToroDB, if you run Mongo for some strange reason, you can actually store your data in Postgres using the Mongo protocol.
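As mentioned above, including the driver is now just a Maven dependency; for example, for the 42.0.0 release discussed later:

```xml
<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>42.0.0</version>
</dependency>
```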
Anyway, he wrote a couple of performance tests for Java. Basically, this code creates a very large table, 10 million rows, with integer and text columns, and we have about six different tests: one selects the integer, one selects the string, one selects the integer and a string, etc., etc. I ran the test suite for a bunch of different driver versions, from 1204 through the latest, 42.0.0. This is an example of exactly what the test does: it uses JMH, which is a Java micro-benchmarking framework, and it just executes the query over and over and over again and measures it, all inside Java. This is the main class, which shows that we do 20 iterations over it. It got a little faster, but for the most part there's no order-of-magnitude performance increase, nothing really dramatic. So the rewrite wasn't really useful from a performance point of view, but the code is a lot easier to understand, we're getting a lot more people working on it, it's a lot easier to work with the project, and it's easier to push changes to the mainline.

Some things really did improve as a result of this, though, mostly as a result of other people working on the code. We found that by setting the fetch size properly we could actually speed up a lot of queries. We fixed some deadlocks. Another guy wrote a cool way to insert a lot of data, if you're inserting batches, using insert rewriting. And we fixed a few bugs that were very performance-related. Look at fetch size: this is one of those things people don't really pay much attention to, and they should. This particular piece of code just selects a field, and it does it for different fetch sizes. Now, the important part: fetch size doesn't work unless you're doing this in a transaction. If you don't turn off auto-commit, it doesn't work; you have to do this in a transaction. The interesting thing here is that if we're fetching a lot of data and the fetch size is something like 10, we see at least two orders of magnitude difference. Get the fetch size up to a thousand: a thousand is kind of a sweet spot, depending on the size of your row, but generally speaking somewhere between a hundred and a thousand is a better fetch size than not setting it at all, or setting it tiny.

Another thing: when people are trying to insert a lot of data, they tend to have a big loop that runs continually over the data, inserting, inserting, inserting into a batch, and then they try to commit. Internally, the driver parses the prepared statement and runs a bunch of bind/execute messages, which are actual Postgres protocol operations, and then deallocates. Unfortunately, that doesn't work all that well. What ends up happening is, as I said, the client is running inserts, so it's sending data to the server, and the server is acknowledging them; it sends data back to the client, but the client doesn't read that data. Everybody is sending data and nobody is reading it, so we end up filling the TCP/IP buffer. We ran into that a few times, and we ended up having to put some code in the driver to sync every so often: basically, we look at the size of the buffer being sent, and we calculated a number at which we say, okay, we've sent, say, 10,000 bytes, let's sync. I think that's currently inside the code. Any questions about that?
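Going back to fetch size, here is a minimal sketch of the pattern just described: auto-commit off so the driver can use a cursor behind the scenes, and a fetch size in that 100-1000 sweet spot. Connection details and the table are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchSizeSketch {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "test", "test");

        // Fetch size is ignored in auto-commit mode: cursor-based fetching
        // only works inside a transaction.
        con.setAutoCommit(false);

        PreparedStatement ps = con.prepareStatement("SELECT id, val FROM big_table");
        ps.setFetchSize(1000); // 100-1000 is usually the sweet spot, depending on row width

        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process rs.getInt(1), rs.getString(2), ...
            }
        }
        con.commit();
    }
}
```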
The other thing I mentioned earlier was the concept of inserting data using insert rewriting. We've got a bunch of ways to insert data. One very simple way, which tends to happen a lot, is that people just loop through, do an insert, and commit. Another way is insert batches: you insert into a batch, and then commit at the end. This is not well known, but you can also insert multiple rows manually, row one, row two, row three, separating the value lists with commas. And alternatively, COPY is a well-known way to speed up inserting data into the database. The benchmark table here is a varchar and an integer. There's the code; this code's a little different, it has a bit of a loop, and it does some interesting stuff. If the outer loop count equals one, then we just insert the rows one at a time. It gets more interesting when the outer loop count is greater than one. Basically, say we want to insert a thousand rows: we have two values here, c1nrows, which is the number of rows we're going to insert, and c2multi, which is how we're going to break that up. As you see here, if we have ten rows and c2multi is set to two, then the outer loop is executed five times, inserting two rows at a time. You can see what it ends up writing: an insert that has multiple rows in the same statement, so fewer inserts, but more rows at a time. And here's an interesting one: as a result of more people working on the code, somebody rewrote this so that it all happens inside the driver. The driver actually rewrites your inserts so they end up putting multiple rows in the one insert. This has been available since version 9.4.1209. COPY is basically the same kind of thing: you build up a string of tab-separated values with newlines, and then use the copy API to put it in.

For the results: you can see that inserting straight, a thousand individual inserts, is very, very slow; COPY and insert batch are relatively fast. Plain inserts are really slow even when the number of rows is quite small compared to batch inserts, and for a large amount of data they're very slow. The more interesting comparison is insert batch versus insert rewrite versus COPY: insert batch is now pretty much the same speed as insert rewrite. So that went through some of the interesting ways to, one, fetch a lot of data (we found that if you set the fetch size properly, you can speed up reading quite a bit) and, two, write a lot of data (if you use batch inserts, you can speed up the writing). There was another bug fix I did mention that ended up getting us ten times the throughput in one batch case. The reason I'm going through all this is that it's the result of reorganizing the code so other people could work on it. So that's most of the performance stuff.
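A sketch of the batch-insert pattern with the driver-side rewrite turned on through the reWriteBatchedInserts connection property. The table and sizes are hypothetical, and the periodic executeBatch() is the flush that keeps the TCP buffers from filling, as described earlier:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        // reWriteBatchedInserts lets the driver rewrite the batch into
        // multi-row INSERT ... VALUES (...), (...) statements.
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test?reWriteBatchedInserts=true",
                "test", "test");
        con.setAutoCommit(false);

        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO big_table (id, val) VALUES (?, ?)");
        for (int i = 0; i < 10_000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row-" + i);
            ps.addBatch();
            if (i % 1000 == 999) {
                ps.executeBatch(); // flush periodically so buffers don't fill up
            }
        }
        ps.executeBatch();
        con.commit();
    }
}
```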
As far as new releases: we decided we're going to divorce ourselves from the server release schedule. We'd already kind of done that, but there was always this confusion as to which version of the driver you should use with which version of Postgres, because the driver had this 9-dot-something and Postgres had this 9-dot-something. We heard that Postgres was going to change its version numbering, so we thought we'd get a jump on it. Why 42? More or less at random. We also introduced semantic versioning, so the latest version is 42.0.0; check it out. Notable changes: we dropped support for server versions before 8.2. Previously we had a hand-rolled logging mechanism inside the driver; that was all replaced with Java's standard logging. The more cool thing is the replication protocol API. Was anybody here for Mark Long's talk this morning on logical replication? Okay. You can set up the logging using the URL or a properties file: there's a logger called org.postgresql, and there are basically only a couple of log settings, debug and trace, which correspond to FINE and FINEST in the Java logging world. You can also set that in a properties file.

What I'm really excited about is logical replication. Does everybody know what logical replication is in Postgres? I'll go through it quickly. Previously we had physical replication, streaming replication, which replicates your entire database. It's pretty much under the covers: you don't have any access to it, because it's all physical, which means it's replicating the actual binary data from the master to the standby or slave, and you don't get to see any of it. It all just happens: there's no API and no access to the data unless you write your own way of decoding it. Postgres introduced logical decoding around 9.4. It allows you to decode the WAL as it's being written out, into a logical stream, meaning you now get to see things like: I just inserted into this table, this column, this value. It can come out in any format the plugin produces, but for argument's sake here, text format. What this gives you, if you can hook into this system, is change data capture: somebody writes something to your database, and you want to see it in your application. Previously you had to write a trigger function, in PL/Java or any of the procedural languages. Now you can hook right into the stream in your own code: if somebody writes something to a table, you see it directly. Any questions about logical replication? That was a really, really quick overview.

So, to use replication in the JDBC driver (I've been told psycopg2 just put this in as well, so there are a couple of drivers that have it, and I imagine the rest will get it soon): first you have to create a replication connection; then you create a logical replication slot; then you can read the changes; and you send confirmations for the changes you've read, because you have to tell the database, yes, I've actually read and consumed this, before you go read more changes. Let's go over creating a replication connection. This is basically how you do it; there's the code. The important part is that you have to set the replication property, replication=database: that instructs the WAL sender to associate the connection with the database in the URL, so you can have a replication connection for one database, or whatever you need. And because replication connections only use the simple query protocol (Postgres has an extended query mode and a simple mode), we have to set the query mode to simple. That's the important part.
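That replication connection looks roughly like this with the 42.0.0 driver. The URL and credentials are hypothetical, and the server side is assumed to be configured for logical replication (wal_level=logical, max_wal_senders and max_replication_slots set, and a replication entry in pg_hba.conf):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import org.postgresql.PGProperty;

public class ReplicationConnectionSketch {
    public static Connection open() throws Exception {
        Properties props = new Properties();
        PGProperty.USER.set(props, "postgres");
        PGProperty.PASSWORD.set(props, "postgres");
        PGProperty.ASSUME_MIN_SERVER_VERSION.set(props, "9.4");
        // Tells the WAL sender to associate this connection with the
        // database named in the URL, so logical changes can be streamed.
        PGProperty.REPLICATION.set(props, "database");
        // Replication connections only speak the simple query protocol.
        PGProperty.PREFER_QUERY_MODE.set(props, "simple");
        return DriverManager.getConnection("jdbc:postgresql://localhost/test", props);
    }
}
```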
However, I strongly recommend you go read the documentation: there are a bunch of other things I've left out here, like how you have to set this up on your server; this is just the client's perspective. Next, you have to create a replication slot. So what's a slot, to start with? As I said, logical replication takes the data out of the WAL and exports it, but to do that it has to have what's called a plugin: a piece of code on the server that knows how to read the output of logical decoding, decode it, and send it to you. A slot is identified by a name (the name can be anything, just unique on the database) plus the name of the plugin that will do the decoding. To create the replication slot we actually call this function, SELECT * FROM pg_create_logical_replication_slot(...), give it the slot name, and it opens up. So now we have the replication slot, and we actually want to read from it. The next thing to do is create the replication stream. The driver now has a replication API; you have to get that, and it's all done with a fluent API, which is kind of nice and easy to read. To create a stream: give it the connection that you created; create the replication stream (the PGReplicationStream can actually read physical replication as well, but never mind that); give it the slot name; give it a start position, which I'll get to in a little bit; and then there are a number of options, and the options here are specific to the output plugin you're using. The output plugin we use for the demo is in the contrib module, and it understands the option include-xids, but your output plugin could understand anything.

As far as the LSN: which LSN do you use, and what is an LSN? An LSN is a log sequence number, which is kind of a bad name; what it really means here is where we're starting in the WAL. You can give it an existing LSN, or you can give it an invalid LSN, which is basically 0; if you give it an invalid one, it will start from wherever this particular stream starts. So: we've got the connection, we've got the stream; now we want to read the data. How do we do that? Well, there's this big loop; we want to keep reading until we're done, which is forever. First we read the pending changes from the database into our application: stream.readPending() reads whatever is over there, and it's non-blocking, so then we check whether a message actually came out or not. The more important thing is that, once we've got a message, in this section here we do the arrayOffset() magic, because of the way the buffer works: we don't return all of the information coming from the output plugin, and we have to cut off part of the buffer we're using. And then we have to give feedback to say we've actually read it. One reason we do that: say the program crashed in the middle of this; we want to be able to tell logical replication in Postgres whether we've read the data. And the reason that matters (a little more about how Postgres does logical replication) is that Postgres won't get rid of the WAL until you've confirmed you've read it. It keeps it around until you say, yes, I've read it; otherwise you would lose information.
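Putting those steps together (create the slot, open the stream with the fluent API, read non-blockingly, and acknowledge), here is a condensed sketch. The slot name is hypothetical, test_decoding from contrib is the output plugin, and ReplicationConnectionSketch.open() is the connection sketch from above:

```java
import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.Statement;
import java.util.concurrent.TimeUnit;
import org.postgresql.PGConnection;
import org.postgresql.replication.PGReplicationStream;

public class ReplicationStreamSketch {
    public static void main(String[] args) throws Exception {
        Connection con = ReplicationConnectionSketch.open();

        // Create the logical slot, naming the plugin that decodes the WAL.
        try (Statement st = con.createStatement()) {
            st.execute("SELECT * FROM pg_create_logical_replication_slot("
                    + "'demo_slot', 'test_decoding')");
        }

        PGConnection replConn = con.unwrap(PGConnection.class);
        PGReplicationStream stream = replConn.getReplicationAPI()
                .replicationStream()
                .logical()
                .withSlotName("demo_slot")
                .withSlotOption("include-xids", false) // understood by test_decoding
                .start();

        while (true) {
            ByteBuffer msg = stream.readPending(); // non-blocking
            if (msg == null) {
                TimeUnit.MILLISECONDS.sleep(10);
                continue;
            }
            // The backing array can be larger than the message: honor the offset.
            int offset = msg.arrayOffset();
            byte[] source = msg.array();
            System.out.println(new String(source, offset, source.length - offset));

            // Feedback: tell the server this WAL has been consumed so it
            // can be recycled; without this it is retained indefinitely.
            stream.setAppliedLSN(stream.getLastReceiveLSN());
            stream.setFlushedLSN(stream.getLastReceiveLSN());
        }
    }
}
```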
That was a whole lot of stuff to go through in slides, but that's all the new stuff in the driver, the most notable one, and the most exciting when I found it, being the replication stream. Now, to go over how not to use JDBC, or what people actually do: they open a connection, prepare a statement, set the parameters, execute the query, and close the connection. They don't use a pool, a Java connection pool or whatever. Connection creation is fairly heavyweight: Postgres uses processes, so every time you open a connection, a process gets created. It also takes no advantage of caching. The better solution: open a connection, prepare a statement, and then re-execute that same statement over and over and over again, and, if possible, never, ever close the connection.

One of the other little things about the driver: Postgres has something called unnamed statements, and for the first few executions that's what we use, and everything comes back in text format, because we don't want to pay the cost of an extra round trip to the database to find out the types of the parameters. One of the things we would like, and everybody should ask for, is for the protocol to accept a message saying we want these particular types in binary; then we could avoid that. The driver would know the data was coming in binary ahead of time and just give it to you, and everything would be fine. Anyway, back to the five executions: after five executions, by default, we switch to a named, server-prepared statement (we make the round trip to get the type information back), and after that the data is sent in binary, which is considerably faster for things like timestamps. Why not do this right away? I guess the rationale is that after five executions, we figure you really want to keep the statement around. You can actually change this value. We also cache queries; we've cached queries since version 9.4.1203. Some interesting things about caching queries and generated queries (generated being things where you build up the column names and the values): make sure you use placeholders, and do not change the type of a parameter halfway through a statement's life. There's a classic example: the first time, you do setInt on the statement; the next time you do setNull, and you can see the type changed halfway through, because that's a setNull with Types.VARCHAR. I think it was Hibernate, or some other framework, that just decided that if you're going to setNull, you're setting null as a particular type.

As I said, there's a configuration parameter called prepareThreshold, with a default of five. And when I talked about setFetchSize: I used to get a lot of messages on the mailing list saying, I did a big select and I ran out of memory. That's because a big select has to fit in memory, the JVM can only take so much, and if you don't have a fetch size set, you just read forever. Prepare the statement once; this does work well with outside tools, and it does work well with Java tools. reWriteBatchedInserts rewrites your batches for you: if you insert multiple times, it rewrites them into a multi-row VALUES insert, which provides about a 3x speedup. We also avoided the Calendar object in date/time encoding, which is about a 4x speed increase.
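A sketch of the pattern versus the anti-pattern just described: one long-lived connection, one reused PreparedStatement, and stable parameter types. Connection details and the table are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReuseSketch {
    public static void main(String[] args) throws Exception {
        // Open once and keep it: each new connection forks a server process.
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "test", "test");

        // Prepare once, execute many times: after the fifth execution
        // (prepareThreshold's default) the driver promotes this to a named,
        // server-prepared statement and switches to binary transfer.
        PreparedStatement ps = con.prepareStatement(
                "SELECT val FROM big_table WHERE id = ?");
        for (int id = 0; id < 100; id++) {
            ps.setInt(1, id); // always the same parameter type
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) { /* use rs.getString(1) */ }
            }
        }
        // Anti-pattern: calling ps.setNull(1, java.sql.Types.VARCHAR) after
        // setInt changes the parameter's type mid-stream and defeats the
        // statement cache.
    }
}
```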
I give credit where credit is due; I didn't write most of this stuff. As I said, the real bonus of rewriting the driver and mavenizing it was that other people did: most of the optimization work was done by a guy from Russia called Vladimir Sitnikov; much, though not all, of the work to mavenize it was done by a guy named Stephen Melton, who's in the U.K.; rewriting batch statements was done by Jeremy Whiting, who was working with Red Hat at the time; and the replication support was contributed recently by Vladimir Gordiychuk. That's the end of my talk. Any questions?

(Question.) I'm not sure I understand the question. The driver has something called auto-commit mode, and it's normally on. What that means is that with auto-commit on, every statement is its own transaction. As soon as you turn auto-commit off, then when you've run your statements you have to actually do a commit yourself. Is that the question? The transaction is in the database, not in the driver. (On ORMs.) Some of the things I was talking about: I don't know for sure, but I suspect not. A lot of what I talked about is, one, inserting a lot of data, which is typically done by people who are building databases or translating information, and I don't think that's particularly well suited to an ORM. The other thing I talked about a lot was fetching a lot of data, and again I think that's outside the realm where ORMs live; their sweet spot is more the case where you open a connection, do something, and close it. I do think they're all pretty fast now; they don't open a connection every time, they do honor some of this. But they're not going to set the fetch size. So, implicit in what I talked about, which may or may not have been obvious, is that setting the fetch size depends on how wide your row is: if your row is really narrow, a large fetch size makes sense; if your row is quite wide, a fetch size like a thousand may not make sense. A thousand really wide rows. I think what you're getting at is something I've noticed: people who want really, really fast code do a lot of hand optimization, and, to be honest, they don't use ORMs. Anything else?

So, who is this talk for?
This talk is really focused on tool creators, in the context of Postgres databases. Sometimes I have people who want to see more about using Go for application development; I'll point out some resources for learning about that, and some of this stuff is very applicable, but this is more focused on creating tooling from, like, a DBA or database perspective. So: Go is an open source programming language built around the idea of making it easy to build simple, reliable, and efficient software. It was created by Google, and it was built for Google as well, and this is important, because in the language, in the standard library, and in the entire ecosystem you'll see a big focus on things like reliability, being good for teams, and being good for building the kinds of things they need to build. For example, the standard library has really awesome cryptography libraries, really great networking libraries, things like that. Another important thing: there's this article by Rob Pike, one of the three original creators, called "Less is exponentially more", where he makes the point that Go is purposely lacking certain features, and that that's a good thing. It's not that they don't understand why you would want generics or something like that; they made these distinct decisions deliberately, generics being one, but there are many others. And it's compiled and garbage-collected, and it looks a little something like this: Hello World.

Why Go? I think there are three main reasons. The first one is performance. Go is pretty fast. This is a little bit of a contrived example, but it's a good visualization: requests per second for JSON over HTTP, where Go does 21,000 requests per second, Node is at 14,000, and third is Vapor, which is Swift on the backend, at about 10,000 requests per second. The garbage collector is pretty solid: the last three or four Go releases have focused intensely on making the garbage collector significantly faster, removing things like stop-the-world pauses or making them very rare. And concurrency is a breeze. In this example, you've got this function, say, that loops five times, and in that loop it sleeps for 100 milliseconds and then prints. If you want to launch something in the background, in what's called a goroutine, you just put the keyword go in front of your function invocation, and that launches it in a goroutine. You can kind of think of a goroutine like a thread (it's not at all a thread, but it's very similar), and they're super, super lightweight: you can realistically run like 100,000 of them on a laptop. They're concurrent, and potentially parallel: Go will try to make use of all the system resources it can, if it has the ability to.

The number two reason is reliability. This is a strange example, but I think it's worth walking through: you've got this struct, EventProcessor (a struct is very similar to a C struct), and it has an event queue and a Start method, which is an infinite for loop that pulls events off the event queue and runs its process method in a goroutine. Of course, this is not a fully fledged background job system, but if you just want something like this in the tool you're building, just a little piece that does this, this is pretty much it; it's only missing the event declaration and the process method. It's very, very simple, and it's awesome: concurrency that's very hard to get right elsewhere, and in Go, I think, it's an awesome combination. You can do a lot, but it's still very readable and very simple.
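Reconstructing the slide just described as a runnable sketch (the type and field names are hypothetical): an event queue drained by an infinite loop, with each event handled in its own goroutine via the go keyword:

```go
package main

import (
	"fmt"
	"time"
)

type Event struct {
	Name string
}

// EventProcessor drains its queue forever, handling each event concurrently.
type EventProcessor struct {
	queue chan Event
}

func (p *EventProcessor) Start() {
	for e := range p.queue { // pull events off the queue
		go p.process(e) // "go" runs process in its own goroutine
	}
}

func (p *EventProcessor) process(e Event) {
	fmt.Println("processing", e.Name)
}

func main() {
	p := &EventProcessor{queue: make(chan Event, 16)}
	go p.Start()
	p.queue <- Event{Name: "demo"}
	time.Sleep(100 * time.Millisecond) // give the goroutine time to run
}
```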
The number two reason is reliability. This is a strange example, but I think it's worth walking through. You've got this struct, EventProcessor; a Go struct is very similar to a C struct, and this one has an event queue and a Start method, which is an infinite for loop that pulls events off of the event queue and runs its process method in a goroutine. Of course this is not a fully fledged background job system, but if the tool you're building just needs a little piece that does this, this is pretty much it; it's only missing the event declaration and the process method. Concurrency is usually very hard to get right, and in Go I think you get an awesome combination: you can do a lot, but it's still very readable and very simple.
Testing is simple and built in: the go test command will look for files ending in _test.go and run your tests, and that's all in the standard library. It has some really cool features, like the built-in race detector: if you add the -race flag, it runs a race detector that looks for things like concurrent writes, or a write and a read, to the same place in memory. It of course makes your tests significantly slower, but it will tell you things like: here's the line of code where it detected the race.
What you'll see a lot is that functions return an error as their last value, so a lot of times you'll see this check: if the error isn't nil, return it to whoever called you; else return the string and a nil error. Of course you can ignore that error, or just panic on it, but a lot of the time this lets you be very deliberate about how you handle each failure. One example of the convention is prepending the word Must to a function name, which means that if an error occurs inside, it panics.
The third reason is ease of use, and a big piece of that is gofmt, which rewrites your source into the one standard style. What's awesome about this is that even though you may not necessarily agree with the default style, and I don't completely agree with it, the consistency it creates within the community is awesome. For me: the first thing I did professionally was a lot of Python, and in Python you've got PEP 8 and there are code style documents, but anything written before PEP 8 doesn't have to follow it, and that fragments what the community is doing.
The tooling is great: go test for testing, go doc for docs, and go build for building your binary, which I'll talk more about in a second. Then go run basically builds your binary into a temp location and runs it, so it really feels like an interpreted language. Libraries: there are tons and tons of small, medium, and big organizations and companies, as well as individuals, building open source libraries. To fetch or even install them, you set this GOPATH environment variable, and once you do that you're able to just run go get; you'll notice the import paths look like URLs by default. Cross-compiling is easy too: at Procore, whether I build on my laptop or on a server, I've got a single binary that I can ship out, and that's it.
To take a step back for comparison: Procore is a Ruby on Rails shop, and we have tons of Ruby scripts. Coming from scripting languages, my experience is that I gain a lot in performance and reliability while retaining the ease of use. On the other side, if you're coming from C, C++, or Java, in the context of building tools you gain a lot in ease of use while retaining most of that performance.
A quick little side track on interacting with Postgres: in the standard library there's the database/sql package, which provides a core API for interacting with SQL databases; to use it you need a driver that implements its interfaces, and there are many open source drivers for Postgres. It has the query structure you'd expect: Open and Close for connections; Begin, Commit, and Rollback for transactions; Exec for executing a statement; Query for a query where you get multiple rows back; QueryRow where you get one row back; plus things like Ping, and a built-in connection pool. You use it a little bit like this: at the top, when you import the driver, you use this little blank-identifier underscore import, because you won't call the driver directly; it registers itself. Then, to create your db object, which is what you use to talk to the database, you call sql.Open, passing the name of the driver and then the connection parameters.
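A minimal sketch of that usage, assuming the lib/pq driver (any database/sql-compatible Postgres driver works the same way, and the connection string here is illustrative):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // blank import: the driver registers itself with database/sql
)

func main() {
	// sql.Open takes the driver name and the connection parameters.
	db, err := sql.Open("postgres", "host=localhost dbname=mydb sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	// QueryRow fetches a single row; Scan copies the column into version.
	if err := db.QueryRow("SELECT version()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println(version)
}
```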
And that's about all I'm going to talk about directly for interacting with Postgres from Go; there's a great website for learning more about it.
So, about a year ago now, my boss came to me with this graph. What it is is the network bytes in and out on our production database, and the blue line is bytes out. We're looking at this crazy rainbow graph and trying to figure out where that data is going: this is a graph of all of our background worker servers and the network traffic going into those servers. It's aggregate bytes in, spiking up to 70 megabytes a second, and it's spread out enough that it's not a single bad actor. So we're looking around trying to figure this out. We're looking at the slow query log, we turn on full query logging, and we're not really seeing anything weird. Okay, let's go take a network capture on one of those worker servers and see if something jumps out. We get that capture and we're not really seeing anything weird there either; it's just Postgres traffic. Okay, so what we really want is a network capture on the database server itself, to see if we can correlate the queries coming in with the data going out: which queries are resulting in the highest traffic? We're not seeing any slow query that would produce that much data. So we get this network capture, and I filtered for Postgres traffic with Scapy, which is a Python library for doing packet hacking and manipulation, and looped through it to see what we could find. It works, but it's super, super slow: each run is taking like 20, 30, 40 minutes. And the problem is that I'm not going to leave the tcpdump command running on our production database for long, so we're trying to get everything out of a reasonable sample, about two minutes' worth of capture.
To loop a little faster, we're going to rewrite this in Go. The first thing I needed was a library that could do something similar to what Scapy does, and I came across this awesome library called gopacket; it's basically full-on packet manipulation for Go, it was created by Google, and I love it. You do something like this: looping through the packets, much like the Python did, you look at the payload, and if it starts with a capital Q, that's a simple query packet in the Postgres wire protocol; a Parse packet is what you'll see a lot more of with prepared statements. And then I can tell that if the source port is the Postgres port, that's the result data for those queries going out. It's funny: when I first ran this, my boss was like, no, no, no, that can't be right, and I went back and worked on it, and there were some little bugs, but the biggest source was clear, and it was like a thousand of the same query.
So to take a step back: as I mentioned, Procore is a Rails shop. The ORM that we use doesn't store the schema information in code; instead, when you boot your process up, if it interacts with a model at any time, it will query the database and get that schema. These are super, super fast queries, and it only has to do it once, on start-up, so you'll see them but you won't think anything of them.
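A simplified sketch of what that Go version can look like, using gopacket against a capture file; the filename, the counters, and the assumption that Postgres is on port 5432 are illustrative, and this is not the actual production script:

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcap"
)

func main() {
	// Open a capture taken on the database server (filename is illustrative).
	handle, err := pcap.OpenOffline("db.pcap")
	if err != nil {
		log.Fatal(err)
	}
	defer handle.Close()

	queries, bytesOut := 0, 0
	source := gopacket.NewPacketSource(handle, handle.LinkType())
	for packet := range source.Packets() {
		tcpLayer := packet.Layer(layers.LayerTypeTCP)
		if tcpLayer == nil {
			continue
		}
		tcp := tcpLayer.(*layers.TCP)
		payload := tcp.Payload

		switch {
		// A simple query message in the Postgres wire protocol starts with 'Q';
		// prepared statements show up as 'P' (Parse) messages instead.
		case tcp.DstPort == 5432 && len(payload) > 0 && payload[0] == 'Q':
			queries++
		// Packets whose source port is Postgres's are result data going out.
		case tcp.SrcPort == 5432:
			bytesOut += len(payload)
		}
	}
	fmt.Printf("%d query packets seen, %d payload bytes out\n", queries, bytesOut)
}
```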
But, as I mentioned, those servers were background worker servers, and they're using something called Resque. Resque is a really common way of doing background processing in Rails applications, and the fun thing about Resque is that when it runs a job, it actually forks the process and runs the job in the child. In doing that, it basically busts that Active Record schema cache, so every time it ran a job, it had to query the database for all the schema information for every model that job touched. So, 10,000 jobs, 20 columns per table, for every model each job touches: that becomes a lot of data really, really fast, and during spikes we were seeing it basically hit the 1 gigabit limit on that production database.
So where did Go win here? Out of the box, generic performance: I didn't have to do any real performance optimization; a pretty much copy-paste port from Python to Go was so much faster. The community was a big piece of the ease of use as well: I needed a solid packet processing library, I didn't really want to mess around with libpcap directly, I wanted something a lot higher level, and I was able to get there with gopacket.
The second example is dbduk. dbduk is not open source, it's an internal tool, but it's an extremely good fit for Go, so if any of this stuff is interesting, the context is just a little different. Here's that context: we restore staging and QA testing databases frequently, a lot of them, and it's important that they restore successfully, because when a restore fails, productivity dies for the QAs. And when they did fail, the process was painfully manual: SSH into that server, become the Postgres user, crontab -l to print out what the cron job is, copy and paste that, run it, and watch it. None of this is hard to solve; the point is that if you want to do something like this, you don't need heroics, you need to do it the right way, and Go makes the right way easy.
So, some ideas here. First, running the restore itself: you've got your daemon, and it shells out and runs whatever is doing your backup and restore, and managing that child process from Go is extremely easy. Then, on to fault handling. The idea here is to treat your restores as a simple state machine: databases are in use, or ready, or restoring; a restore can drop into the failure state, failed to restore, and you want to bring it back around through restoring to ready for use. Again, this is not rocket science; it's the process of being mindful of where the state transitions are and the errors that can occur at each one, and deciding what you want to do in each of those cases. So what does that error handling look like? For some errors you just bail: if you can't connect to Postgres at all, quit and log it, and someone will look at the log in the morning. For others you log and continue: during a restore we drop schemas with a cascade, and if for whatever reason that errors, it's okay; we don't want to blow up the whole restore as a result, so we just log a little error and continue on. And then there's retrying with a timeout, in this case without back-off. I put the timeout in there because, without one, especially in this pool, I have hosed file systems and databases by having something retry every 5 seconds for 7 hours straight, which will just destroy the server. So definitely use timeouts. In this code: if the state is in use, we go into a for loop for up to 15 seconds, get the current state each time through, and if it's no longer in use, break out of the loop. Again, these ideas are not hard by any means; it's just that Go makes doing it what I'd consider the right way for this problem very easy, and that's kind of the case there.
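A minimal sketch of that wait-with-timeout loop; the state names and the getState stub are illustrative, not dbduk's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// getState would ask the daemon or the database what state a restore is in;
// it's hard-coded here so the example runs.
func getState(db string) string { return "ready" }

// waitUntilNotInUse polls every few seconds until the database leaves the
// "in use" state, and gives up at the deadline instead of retrying forever.
func waitUntilNotInUse(db string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if getState(db) != "in use" {
			return nil
		}
		time.Sleep(5 * time.Second)
	}
	return errors.New("timed out waiting for " + db + " to be free")
}

func main() {
	if err := waitUntilNotInUse("qa_db", 15*time.Second); err != nil {
		fmt.Println(err) // log it; a human can look in the morning
		return
	}
	fmt.Println("qa_db is free; starting restore")
}
```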
So where did Go win? Right: error handling was a huge piece of the reliability, and Go made doing that very easy; and the concurrency, which was partly about performance, for sure, but much more an ease-of-use thing: I want to do this thing where multiple things are happening at once, and it's very easy. So, in conclusion: in this context of building tools, Go provides a lot of benefits. To go back to that comparison: if you're coming from scripting languages, Python, Ruby, Perl, Bash, whatever, there's a lot to be gained in reliability and performance while retaining that ease of use; and if you're coming from C, C++, or Java, there's a lot to gain in ease of use and reliability while retaining a lot of that performance. And again, something I want to be clear on: I don't think Go is the one true way to write code. It's a great tool in the toolbox, but I really enjoy using it as a language, and I hope you guys check it out. To finish up: if you liked what I've talked about, and you want to build cool stuff with cool people, and you like these pictures, Procore is hiring; check this out. Yeah, thank you. Any questions?
Where is Procore? Carpinteria, which is in between Santa Barbara and Ventura. Procore does construction project management software: a big SaaS application specifically focused on really big construction jobs, skyscrapers, bridges, bigger projects like that, and a whole suite of tools for that.
Does concurrency help in the context of tool creation like what I'm talking about, or just generally? Okay, yeah. So in the context of dbduk, it's the idea of being able to do multiple jobs: you've got your core daemon process that you interact with in whatever way, and then it can do all of those things concurrently.
Okay, awesome. Well, thanks for your patience, everybody. Can everybody hear me okay, but not too loud? Okay, great. So I'd like to thank you for coming, and I guess I'll get started. Anybody recognize this? My name is David Fetter. I'm a husband and father; my children are all under 6, and that's why they laugh at my jokes. I've been a Postgres contributor for quite a while; I think I started out in 2000 or 2001, depending on how you count. I've done some expanding of SQL. Anybody use writable WITH clauses? Okay, it's a little abstruse, but it's really handy when you've got it. That was kind of something I got started.
And right now I work at GE Digital, up in San Ramon, up in the Bay Area. And I would like to ask your help: in the app that you guys may have downloaded, there is a rating system. I haven't talked in a while, so I really need this feedback, and it would be great for you to help me out with that.
So, let's get to the meat of this. Have you ever updated or deleted every row by accident? Anybody? The rest of you just have yet to experience this pleasure. It will happen. Have you ever wanted to slow people down when they auth too fast? If you haven't yet, you will soon, especially if you're put in a situation where you have to expose your database to the internet. It's not always avoidable. Have you ever made a really cool constraint that required the serializable isolation level? Come on, show of hands. Really? That's great; I was expecting zero. These things do happen. How about wanting to replace our traveling-salesman-problem planner with a simulated annealing one? Anybody? Hooks let you do some really deep and abstruse kinds of things, and they impose very few limits on how far you can go when you use them.
Without hooks, you could try to lock out everybody who could make a mistake. Anybody know somebody who couldn't make a mistake? Well, I found the first person who can't make a mistake. If you want to slow people down when they auth too fast, you could hope the network layer handles it. It doesn't, and there's Mallory. Anybody know Mallory? Mallory is one of the nice people in the crypto talks: she sits and listens to your packets, sometimes alters them, and is malicious, hence the name. If you made a cool constraint that required serializable, you could add a check for that in every single trigger. If you wanted to do this really well, I guess you could go write your own database management system and get back to me in three or four years, having gotten a crack team from Oracle or whatever. Or you could use hooks.
Is anybody familiar with hooks outside the context of Postgres? Just briefly, for those that haven't done this before: a hook is a place where you insert yourself. You snag something that's about to happen; you bag up the state, because you're being polite, so that when you move on to the next thing you can release the state back; you act on whatever it is you want to act on; and then you hand control back. I've left this very, very fuzzy because that's about all the hooks have in common, really, at least in Postgres: you get a place to insert yourself, hang on to the state, execute some code, and then go back.
In Postgres, right now, you do need to write some C. Is anybody fleeing the room yet? This is not super complicated C, and I'm going to show you the things you actually need. This is from a black-hole hook thing that I just put up on GitHub; you should be able to see it there. You have to start off with postgres.h, because that's what makes it a Postgres thing. You do PG_MODULE_MAGIC, which, fortunately, you don't have to know a lot about the operation of; you just call the macro in your code and you're good. The third thing you do is write these init and fini functions, and this is you being polite: you're initializing at the start, grabbing the state at the beginning, and releasing it at the end. There's a data type for each hook in Postgres, and you basically need to give your function a spot: save the previous system hook.
Then you set the current hook to your black-hole hook, or your save-the-world hook, whichever one you want to do. Then you have your actual hook function; what goes in there is somewhere between 6 and 60,000 lines of code, depending on what you're inclined to do and how tersely you code. When you're done, in fini, you set the system hook back to the hook you had earlier. Questions so far? Comments? Nope?
By my rough estimate, there are 82 different hooks in the Postgres source code. That this is just an estimate is a problem, and I'm going to see about getting this all documented, so that people know where the hooks are and have an easier entry into them, because I had to basically search the source tree with various kinds of tools, some of which might have steered me astray.
So I'd like to do an example of a hook. I've made a hook for requiring a WHERE clause. Let's take a look at the Makefile first. It's really not huge and complicated, but it tells you a little bit about what you're going to do here. Of the Makefile that you're writing, the hardest line is actually right here: it's the name. There are two hard problems in computer science: naming things, cache coherence, and off-by-one. Everything below that cursor right there is just boilerplate; it goes in every Makefile.
Here's the meat of the code, and let's see if we can comprehend it as we walk through it. fmgr.h: does anybody know what that is? I believe it's the function manager; I could be mistaken. Anyway, we're at a spot where we're going to do some parse analysis, and we're going to need to be able to log errors, so that's what's going on with these headers. What I'm doing in this hook is seeing to it that when you try to do that unqualified UPDATE or that unqualified DELETE, it doesn't actually work. So there are our init and fini. This time the hook is a post_parse_analyze_hook type; the type tells you where in the execution of your query you are, and the variable holding the previous hook starts out NULL. Down towards the bottom is the actual body of the init and fini; that's just being polite and cleaning up after yourself. The meat of the code is all right here. If we're doing an UPDATE or a DELETE, and it's not super complicated here, it almost reads like English, we look at the join tree. The join tree is the thing that gets handed off to the executor; it's the plan for the query, and it's called a join tree because the vast majority of queries have joins in them. In this case it doesn't, so the tree is very short: it's going to have basically a root and a leaf. If the quals in this join tree are null, which is to say no WHERE clause has been specified, then the fireworks start: we say that UPDATE requires a WHERE clause when the require_where extension is loaded, and if you really want to do this, you have to make your intentions plain. And then, similarly, here's another spot where hard work is done figuring out which error code to throw, because there's an awful lot of them, and you have to choose wisely so they make sense to people. Anybody want to guess what this part does? Really? Yeah. There's not that much there: it's the error message for DELETEs that are unqualified.
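Assembled, the pieces just walked through look roughly like this. This is a condensed sketch, not the actual require_where source, and it's written against the post_parse_analyze_hook signature of the Postgres versions of that era (newer versions have since added a JumbleState argument):

```c
#include "postgres.h"
#include "fmgr.h"
#include "parser/analyze.h"

PG_MODULE_MAGIC;

void _PG_init(void);
void _PG_fini(void);

/* The previously installed hook, if any; starts out NULL. */
static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void
require_where_check(ParseState *pstate, Query *query)
{
	if (query->commandType == CMD_UPDATE || query->commandType == CMD_DELETE)
	{
		/* The jointree's quals hold the WHERE clause, if one was given. */
		if (query->jointree == NULL || query->jointree->quals == NULL)
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("%s requires a WHERE clause when require_where is loaded",
							query->commandType == CMD_UPDATE ? "UPDATE" : "DELETE"),
					 errhint("To affect all rows, add an explicit WHERE true.")));
	}

	/* Be polite: chain to whatever hook was installed before us. */
	if (prev_post_parse_analyze_hook)
		prev_post_parse_analyze_hook(pstate, query);
}

void
_PG_init(void)
{
	prev_post_parse_analyze_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = require_where_check;
}

void
_PG_fini(void)
{
	post_parse_analyze_hook = prev_post_parse_analyze_hook;
}
```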
Okay, so let's take a look. Now we have a fresh build; I've already built the module, but I can show you what that looks like. I think I may have needed to build the whole source here, sorry. What I'm going to do here is just make this a permanent feature of the system, or at least until the system is changed again. Anybody used ALTER SYSTEM recently? No? Okay, two people. Do you find it more handy or less? Okay. Well, it's just a way to ensure that all your changes live in a place that's distinct from the baseline postgresql.conf, and for that reason I kind of like using it, because otherwise, if you've gone and edited the conf file, it's never 100% crystal clear which things you edited. So I'm going to restart. By the way, don't do a restart with immediate mode on a production system; it's probably going to be evil to things that you care about.
Okay, so I have a table. Nope. Okay, so there we go. That's everything that's in the table, and if you can imagine that's a table we actually care about, you don't want to be able to do this. Ta-da! Similarly, if you want to delete everything in the table, anybody want to guess what's going to happen here? It's going to be a slightly different error message. Very slightly different: like half of the body of that code was basically deciding whether to yelp about DELETEs or about UPDATEs. And per the hint, by the way, it's frequently good to read the hints when Postgres errors out on something; you can save yourself an enormous amount of time by at least providing what the hint said when you go ask somebody. With WHERE true, be careful you don't get SQL-injected, because WHERE true is a frequent thing that people try to inject into SQL. Anybody had that happen to them recently? How about less recently? Sorry? Yes, yes, that would do it. So this is the thing that makes you express your intention about what you're going to do. So that's our simple hook for simple errors: you saw our boring C code and our live demo, which didn't break. I'm not going to play the lottery anymore, because I've used up all my luck.
So here's the cool constraint that requires serializable. It's a fair question what sort of constraint it was that required serializable; I have one here. I'm going to make a simple hook: boring C code. I had already built this, but we can take a look at its source code; it's not going to be super complicated. What's different in the headers this time is that we need to know about the transaction state. This is one of the nice things about the Postgres source code: stuff is kind of where you would expect it to be, and it's named in ways that are usually not crazy, except for xlog. As I recall, we made a fair amount of money off somebody who thought that was just a log and deleted it. Anybody ever had that happen? Anyway, all the usual stuff happens here. What we're looking at is: is this not a utility command, which is to say, is it a SELECT, INSERT, UPDATE, or DELETE, and is the isolation level not serializable? That's our cue for making the fireworks go. And if it was a SELECT and it wasn't deferrable, we're also going to yell. Anybody know why you'd want DEFERRABLE when you're in serializable and selecting? It's a little abstruse, but basically serializable is really the only transaction isolation level that most developers will ever understand, because the anomalies you get from any lower level of isolation are basically incomprehensible to people who love to write Python and love to write HTML and stuff like that. They're not super comprehensible to any of the rest of us, either. That's the case for serializable, and the reason it's not on all the time is that I haven't made that argument successfully yet.
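In the same style as before, a condensed sketch of that check; the module shown in the talk may differ in details, and IsolationIsSerializable() and XactDeferrable here come from access/xact.h:

```c
#include "postgres.h"
#include "access/xact.h"		/* transaction state: isolation level, deferrable */
#include "fmgr.h"
#include "parser/analyze.h"

PG_MODULE_MAGIC;

void _PG_init(void);
void _PG_fini(void);

static post_parse_analyze_hook_type prev_hook = NULL;

static void
require_serializable_check(ParseState *pstate, Query *query)
{
	/* Only police real statements, not utility commands. */
	if (query->commandType != CMD_UTILITY)
	{
		if (!IsolationIsSerializable())
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
					 errmsg("this database requires SERIALIZABLE isolation")));

		/* Reads must also be DEFERRABLE, so they can never see an anomaly. */
		if (query->commandType == CMD_SELECT && !XactDeferrable)
			ereport(ERROR,
					(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
					 errmsg("SELECT must run in a DEFERRABLE transaction here")));
	}

	if (prev_hook)
		prev_hook(pstate, query);
}

void
_PG_init(void)
{
	prev_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = require_serializable_check;
}

void
_PG_fini(void)
{
	post_parse_analyze_hook = prev_hook;
}
```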
DEFERRABLE, when you're in serializable isolation, means the read you're trying to do, and it does have to be a read, can't happen until all the possible serialization anomalies are cleared. Your SELECT will never have, say, a write-skew anomaly in it; it will, however, wait until that condition is true. That's why it yells at you about DEFERRABLE on your reads and serializable on everything else.
There we go. Again, not too complicated a thing we're doing here: we're going to load up the require-serializable module. This, by the way, is a great way to test things: with that LOAD command you can make the .so's you've created appear and disappear. There we go. Did I do something bad here? This is the part where the live demo is not serving me well. There we go; let's see if that's any better. What's the echo commands thing? I forget, sorry. There we go; I think it's ECHO all, because the tab completion is being nice to me. By the way, that's a really awesome thing about the psql client: it actually helps you figure out what the heck you're doing. I don't know of any more helpful one.
We created a table. "NOTICE: trigger does not exist": that's not echoing this thing here. Let me take a look at bar. Sorry. What are we going to do here? Here's our idea: depending on the limit in the foo table, we're going to say that bar can only have up to that many rows in it, less than or equal to, in this case five by default, though we could change it. This is actually kind of handy when you have a resource, and then some other resources that it has access to: you don't want to jack it through the sky, because those other resources are limited. I'm working on a thing at GE Digital that had this in it, because the foos actually cost money and you only have so many of them to dole out; it's both a budgetary thing and a resource-limiting thing. So what do we do? We limit the foos in the bar. This basically says: if the count is greater than the limit, then we error out and say goodbye, and we create a trigger that actually does this. If we want a limit of three, we just set that limit and consider deleting some bars. That's what's supposed to go on. What happens here in READ COMMITTED is that it fails, because we've decided the transaction isolation has got to be serializable: when you've started a transaction in READ COMMITTED and try to do something after that, it just thumbs its nose at you and says no. Similarly, in serializable, remember we put three in the limit, this is going to fail because the count is wrong. And what happens when you try to lower the limit to less than what you've already got, fewer than the bars you actually have? That fails too, because you can't squeeze the room to be smaller than the number of occupants already in it. Anyway, that's what it was supposed to do. Awesome.
Here's how to slow people down when they auth too fast. You could hope the network layer handles it, or you could take a look at auth_delay in contrib. This is not one I wrote for this talk; this is just a generic one that ships with Postgres. You'll see there are a few more things up here: since we're delaying auth, we include the auth header to make sure the Port type is right, and we're going to make a new GUC. Anybody made a new GUC recently? Okay, somebody has. Postgres has way too many grand unified configuration variables; that's my understanding of what GUC means. This is going to make another one. Here we have the init. Where's the fini? I think somebody did something naughty here. There's an init, and it defines a custom variable that's your auth delay in milliseconds, which is to say how long you make everybody wait before they can try to auth again. Is there a better way to make people wait when they're failing auth? Exponential back-off. This is kind of a primitive thing, but it's at least worth looking at for how it does it. You basically define a custom variable, and most of the rest of that is boilerplate. You say when it can be set, and here that's at SIGHUP: if you send a HUP to your back ends, suddenly this thing can be changed. Then, once the hook's loaded, if there was an authentication attempt and its status isn't okay, you have them wait auth_delay milliseconds. It's kind of handy.
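From memory, the heart of contrib/auth_delay looks something like this condensed sketch; the real contrib source is the authoritative version:

```c
#include "postgres.h"
#include "libpq/auth.h"		/* ClientAuthentication_hook and the Port type */
#include "utils/guc.h"

PG_MODULE_MAGIC;

void _PG_init(void);

/* The new GUC: how long to stall after a failed authentication attempt. */
static int auth_delay_milliseconds = 0;

static ClientAuthentication_hook_type original_client_auth_hook = NULL;

static void
auth_delay_checks(Port *port, int status)
{
	/* Be polite: run any hook that was installed before us. */
	if (original_client_auth_hook)
		original_client_auth_hook(port, status);

	/* If authentication failed, make the client wait before hearing so. */
	if (status != STATUS_OK)
		pg_usleep(1000L * auth_delay_milliseconds);
}

void
_PG_init(void)
{
	DefineCustomIntVariable("auth_delay.milliseconds",
							"Milliseconds to delay before reporting authentication failure",
							NULL,
							&auth_delay_milliseconds,
							0,
							0, INT_MAX / 1000,
							PGC_SIGHUP,	/* changeable by sending the backends a HUP */
							GUC_UNIT_MS,
							NULL, NULL, NULL);

	original_client_auth_hook = ClientAuthentication_hook;
	ClientAuthentication_hook = auth_delay_checks;
}
```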
Here's a really fascinating one. Remember when I said the body of one of these things could be fairly large? Well, somebody went and did their master's thesis and replaced our planner with a simulated annealing one. As I recall, their work over the course of those three or four years was about equivalent to ours in the ten previous. It was very enlightening, let's say. Actually, I wanted to show a little bit of what that looks like. This is that effort, and in the source tree it's 2,313 lines, including some deep trace probes. It's actually pretty impressive, and I hope somebody will pick it back up again, and then we'll have a better planner.
Anyway, I'm at the end of my presentation. Questions, comments, brickbats? Yes. Well, that's one of the things I want to fix; my first contribution to the Postgres project was a documentation patch. Generally, there are other presentations on hooks that will tell you a little bit about what's there. You can start off in the contrib part of the source tree and look for _hook; that will sort of guide you to what people have used already. You could search for _hook on Google along with PostgreSQL; that would do it. And there's always git grep to see what the latest hooks are. But yeah, you're right: a big problem with this whole system is that almost nobody knows about it, and that's part of what I'm trying to address here.
Well, check out the source tree and have a look. There are a few resources on hacking Postgres that I can get you later: a couple of tutorials out there, and a couple of presentations on how to start hacking. The source tree is actually pretty approachable: if you look inside, you don't wonder where you are; it's divvied up into things that are fairly obvious. If you ls the source tree recursively, it should be relatively straightforward to find out where things are. There are also a lot of READMEs; those are the next place to go. And there are obviously comments in the source, some of which are better and some worse; there's nothing like "you are not meant to understand this" magic in there. They do try. Other questions? Okay.
TL;DR: Lots has already been done with hooks; you can find those in the presentations I mentioned. Oh, and PGXN, that's another place you might look. Anybody familiar with CPAN for Perl? Okay, so like CPAN, there's a thing called PGXN, the PostgreSQL Extension Network. My former company, PostgreSQL Experts, set it up, and now it's a community resource, so you can find things there. So: lots has already been done, and your imagination is really the main limit to what you can do. Most DBMSs don't let you rip out the planner and stuff in your own, but we have the places where you can do that.
And always have fun. Anyway, thanks so much for your time.