Who's actually still awake now? Here's what I think we'll do: we'll do the session, and then as you all fall asleep we'll time travel back to the beginning and do it over and over and over again, and get stuck in one of these little time loops. I think that sounds like fun. No? Okay, so let's do the actual session instead.

As Gabrielle said, my name is Magnus Hagander. In Postgres I work on the core team and I'm one of the committers. I also do a lot of work for PostgreSQL Europe; I see at least a couple of people in here who are over from that side of the pond. When I don't work on Postgres community things I work for a company called Redpill Linpro, which is an open source services company, and Postgres is one of our focus products; I'm the principal guy on all things database there, for some strange reason. Our company is entirely based in Scandinavia, which is why I'm betting many of you have never heard of it. I'm based out of Stockholm in Sweden myself, again doing sort of all things Postgres: consulting, support, training and things like that. This talk comes more or less directly out of a customer project that we did last year.

So the talk is all about, say, a TARDIS for your ORM. Who in here considers themselves an application developer? Okay, quite a few. Or a DBA? Okay. And anything else? Okay, there are at least a couple of you who don't count. So probably everybody knows what an ORM is, right? It's the object-relational mapper; it's how application developers today do a lot of their database access. And presumably — who knows what a TARDIS is? I'm almost disappointed. Every version of this is obviously the TARDIS, right? As we know, it's a time travel machine; it does a lot of interesting things other than just time traveling, but we're not going to focus on the TARDIS part. We're going to focus specifically on the time travel part, and in this case we're going to look at application-level time travel. I think there is actually another talk at this conference about temporal support in the database, which is deeper down in the database layer. This is implementing something similar at a higher level in the system, where we can do it with things that we have today. So it's application-level time travel, but it's still implemented in and on the database.

In order to get going I'm going to start with a step back and look at the requirements. What were the requirements when we got involved in this project? Initially a bunch of people from a different department in our company, who do a lot of Java development, got onto the project, read through the requirements a couple of times, and then came to me asking: can we do this in Postgres, so that we don't have to try to solve this problem ourselves? The first parts are fairly simple.
So, one of the requirements was that there is an existing data model, and they would prefer to change exactly nothing. Now, we do have to make some small changes, but there is a general, very strong requirement to minimize changes to the data model, because this is an existing application. This is not a greenfield application; it's an existing application that we want to add this functionality to, so we need to minimize the changes.

What's contained in this data model is basically both detailed and statistical data, so there is a lot of data collected over many years. The customer is a government agency, and the statistics contain very, very sensitive personal data, which leads to some of the more interesting requirements. Unfortunately, I can't tell you what type of data it is, but let's just leave it at the fact that it's really, really sensitive, and it contains data both at the very detailed level, about individual people, and then at step-by-step greater levels of aggregation, where in the end we have a lot of reporting and things like that.

Data gets into the system mostly by being loaded in batches, which is true of most systems of this kind, right? I think there's a four-hourly batch that loads one type of data and a nightly batch that loads another type. So we don't have a lot of individual inserts; larger chunks of data go into the system. But there are also people working on manual corrections all the time. These are entirely manual; they're done by the staff at this agency. It's not a huge amount of them, and in particular there's no concurrency in it: they have people working with the data who just notice that this data is wrong — let me go fix that. Or they pull out a report and it doesn't make sense and they have to go back in, because the origin of some of the data is very historical, so data entry wasn't necessarily correct, and some of it is manually entered even in the systems that then feed it into this one. So there's a lot of manual work attached to the data as it comes in; it's incorrect and we need to fix it.

So we have this one part of the system that's getting the data in — batch loads and manual corrections — and then of course we look at the data; we use it for something, otherwise it would be sort of pointless. There are larger aggregate reports, things that can run for a few hours and dump a report at the end, but there's also quite a bit of zoom-in work where you're looking at — I don't think they ever actually look at individual records, I don't think they're allowed to do that — but they look at smaller groups of people, or other statistical selections that lead to a small group of people. I do actually think they're not allowed to look at a single person; that's against the rules. Obviously, it's in the database, so if you have SQL-level access you can do it. So this all sounds like pretty standard requirements, right?
I'm sure you've all run into similar kinds of applications. But there are a few requirements here that made it a little bit harder when we looked at it purely from an application-level perspective. One of them is that there is a requirement to reproduce incorrect reports. So we're correcting the data, and we need to be able to reproduce a report as it was before the correction, and we also need to be able to run the report with the corrected data. At some point we may need to compare the two — surprisingly enough, that was not in the requirements, being able to compare what actually changed in a report because of a correction. The requirement was just to be able to go back and say: this report that I ran back in October last year, how did it look, from the source data? Which at the core means we can't ever actually delete any data, and trying to do that without changing the data model from the application side becomes really hard.

There are some other challenges. For example, we need to be able to go back and identify which reports contain data about a specific individual somewhere in the past. So: which reports that ran back in August last year returned any kind of data about Steven? Luckily, this is not a performance-sensitive thing; it's actually okay if answering that question takes two weeks. No, he's not — Steven is never okay. So these are the database-level challenges, right? We need to reproduce the incorrect data, which kind of feels wrong — there's a reason we corrected it — and we need to be able to run this identification of what actually ended up being returned. Luckily, we're not running tens of thousands of reports every day; it's a fairly small group of people running reports against a large amount of data.

The other big challenge is that we need to do this while maintaining application flexibility, without having it hand-coded at every point in the application. It's a fairly large Java application that has reporting from a whole lot of different places, and it contains something interesting: a manual query interface. These are statisticians, right? They can input queries. They don't actually write them in SQL, but they can partially write them in what I think is called HQL, for those of you who know Hibernate — that's exactly my reaction, a few people here went "I don't like HQL", but you know, they're Java people. They also have a simple UI where you can drag and drop fields and joins and things, and we need to deliver these requirements for those queries as well. It's reasonably simple-ish as long as you know exactly which queries run, but we don't: they can run basically arbitrary queries, and we still need to be able to do this. And then, as I mentioned, preferably we would do this with zero changes to the application. Obviously that's impossible, right? It's not going to be zero, but we need to minimize it as much as possible, also because we have multiple different teams working on different parts of the application, and we all know how well cooperation on those things can work, particularly in government organizations. I'm not biased at all. So the toolbox that we have here is basically JBoss and Hibernate.
That's the part that we can't touch, because the application already exists. Now, it helps us that they at least chose an open source framework to build on, and actually a fairly flexible one as far as ORMs go; Hibernate is not that bad. There are many examples of ORMs that are much worse than Hibernate, and Hibernate lets you do some interesting things. The solution is actually independent of the ORM — you can do this with most ORMs — but the one used here was JBoss with Hibernate, and we got the option to use Postgres. Originally the system wasn't running on Postgres, but when we got involved they were already thinking of moving it to Postgres, so we got to use that, which gives us the flexibility of the whole set of tools that we have in Postgres.

Now, if we look at the actual database schema of this existing application, it's a fairly simple schema, because it's generated by an ORM, and those schemas tend to be fairly simple. It's at least an ORM sophisticated enough to know about the concept of foreign keys, so we do have foreign keys in the system. It's ORM-generated, it has quite a few tables, but there's really nothing difficult in the schema itself, because Hibernate is designed to work on any database. So the original schema doesn't use anything Postgres-specific: it's integers, strings, timestamps, things like that, and some foreign keys — fairly simple. It does give us a couple of things, at least with Hibernate's defaults, that make this solution easier for us. All tables live in the public schema; they don't use multiple schemas, and they don't use fully qualified schema names in their queries. We're going to exploit that. And all tables have an ID column with a sequence attached to it. This is effectively enforced by Hibernate — actually, I think you can set up tables without that in Hibernate, but it does it by default — and that also makes our life much easier. And the way it's been set up, the application has very few schema changes over its lifetime. There are a lot of changes to the data, but new versions of the application itself generally don't change the schema much; they change the application layer. The schema is fairly fixed. There are changes to it, but that gave us the option to say that we can turn off, and will not be using, Hibernate's ability to automate schema migrations. Any schema migration will simply require a DBA to extract the script from Hibernate, run it, and potentially adjust it to work with our solution. So we automated the initial setup of our solution, but we didn't automate the incremental steps. We probably could have; it just wasn't a requirement, so we got off a little bit easy.

Based on that, how do we actually solve a problem like this? I'm sure most of you who have worked with databases for a while can figure out step one, right? Step one: we need to keep our old data around; we can't actually go and delete it. How do we typically do that? Well, we store it in a history table somewhere. But we need something you don't necessarily need in your typical auditing table, which is how we usually do this: an audit table stores when somebody changed something. Instead of storing that, we're going to keep track, for every individual row, of the time frame during which each version of the row existed.
So it's sort of like the multi-version concurrency control we have in Postgres, but at an application layer. We say this version of the row existed from this point in time until this point in time, and then if you update it you get a new version with a different validity period, and that way we can reconstruct the history of the row. So — who does not know how a history table works? Okay, good; you're at least database people. Who does not know what a range type is? Okay, that's more impressive, like one or two. This is clearly a Postgres crowd. Ask that at, say, a JBoss conference — who knows what a range type is — and you get maybe one person who knows.

So what we do is basically a traditional history table, and we put a validity period on it using a range type. Query-wise it looks like this: for a table called table1, which based on the previous requirements will always reside in the public schema, we just create history.table1, always with the same name, exactly the same as public.table1, but we add a column — we call it _valid_range, to make sure it doesn't conflict in case there happens to be a column named valid_range in some table — and it's a tstzrange, the timestamp-with-time-zone range. In fairness, I think there's also a NOT NULL on it that I forgot to get onto the slide, but that doesn't entirely matter in this case. Who knew about the construct where you can say CREATE TABLE ... (LIKE ...) and then also add more columns? Okay, that's quite a few. It's good to know; it saves you a hell of a lot of typing, and it makes automating this much easier, because I can just loop over all the tables and run this. If I didn't have that, I could do the same thing by inspecting pg_attribute, getting all the columns out and recreating them on the other side, but this is obviously much easier. What LIKE does is copy all the columns as they are defined right now. I've heard people who think it works differently, but if, after you've done this, you go back and add a column to public.table1, it does not appear in the LIKE'd table in any way. So it's a snapshot of the table as it looks now, plus one column. And that's the part where we have to go in and do things manually if we do a schema migration: if we add a column to the original table, we now need to add it to the history table as well, and in that case we also have to add it with any NOT NULL constraints removed, otherwise our historical rows just won't work. So we create these for all of the tables.
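Sketched out, the per-table setup described here looks roughly like this (the history schema name and the spelling of the _valid_range column follow the description; the NOT NULL is the one mentioned as missing from the slide):

```sql
-- History copy of public.table1: LIKE snapshots the columns as they are right now,
-- and we add the validity-period column on top.
CREATE SCHEMA history;

CREATE TABLE history.table1 (
    LIKE public.table1,
    _valid_range tstzrange NOT NULL
);
```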
Okay, for those of you who may not have used range types very much, this is just example output from one of my test databases, showing this _valid_range column. I took just the valid range column — obviously you'd need more than that to make it useful — but it shows two versions of, I believe, the same row. Yep. On the first row we can see that the validity starts on February 17th at 14:49:52 or whatever, and it's valid until 14:50:06, and then there is a second version of this row which is valid from 14:50:06 until infinity. The important thing to note is that the start of the range has a square bracket and the end of it has a regular round bracket. That defines whether the bound is inclusive or exclusive: is this exact timestamp — because it's exactly the same timestamp as we have down there — actually included in the range? In order to create our proper indexes and constraints, we need one of them to be inclusive, which is this one here, and one of them to be exclusive, which is this one here. If we didn't do that, and we looked at the way the data looked at this exact timestamp, we'd get both versions, and that's not good; that would give us another interesting type of error in our reports, getting two versions of everything. So basically, when we insert a row, the version that's the current one always has a validity ending at infinity, because we inserted it — or in this case updated it — at this timestamp, and if we read it right now and don't say anything else, that's the version we're going to see.

Then of course, to make sure we get this right, we need an exclusion constraint. This is where we can rely on the fact that there is an ID column in every table. For each table we go through and say: on this history table, add a constraint — called table1_exclusion in this case — EXCLUDE USING gist, which is our standard syntax for creating an exclusion constraint, with id WITH equals and _valid_range WITH the double ampersand, the overlaps operator. That says: for each ID value, so for each entity from the Java side, so to speak, there cannot be any overlapping validity ranges. That's what guarantees that whichever valid point in time we're looking at, we only get one version back. And if we had created the ranges with square brackets on both sides, this constraint would not have been created, because it would complain that we have an overlap between those two timestamps. So this guarantees correctness, and it also gives us an index with which we can do fast searches, as long as we include the ID field and query the valid range with a suitable operator. As it turns out, that's how Hibernate issues most of its queries: it looks things up by ID. So if we can just find a way of specifying the point in time for the valid range and setting the ID, it's going to work. Now, to make this part work, you need to install a contrib module called btree_gist, but it's part of the standard Postgres packaging; it just needs a CREATE EXTENSION. Steven, it would be nice to fix that — we should just include it in core, but we don't right now. So right now you have to say CREATE EXTENSION btree_gist; it's not very hard, and it became easier once we got extensions in 9.1.

Okay, so now we have a history table and a main table — or actually we have a couple of thousand history tables. Then of course we need a trigger. This is simple: we say CREATE TRIGGER, BEFORE INSERT OR UPDATE OR DELETE, because we need to trap all of these operations on the table, FOR EACH ROW EXECUTE PROCEDURE, and we just decided to put the procedure in the history schema as well: history.log_table_trigger(). That means this function runs for every row that ever gets inserted, updated or deleted.
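A sketch of the per-table constraint and trigger being described; the constraint name follows the slide, the trigger name is illustrative:

```sql
-- btree_gist lets the plain equality on id take part in a GiST exclusion constraint.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- At most one version of any given id may cover any given point in time.
ALTER TABLE history.table1
    ADD CONSTRAINT table1_exclusion
    EXCLUDE USING gist (id WITH =, _valid_range WITH &&);

-- Route every change to the main table through the history-maintenance function.
CREATE TRIGGER table1_history
    BEFORE INSERT OR UPDATE OR DELETE ON public.table1
    FOR EACH ROW EXECUTE PROCEDURE history.log_table_trigger();
```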
So what we basically have here is: the public schema contains all the current data at all times, and the history table contains all the historic data. So we need to track every operation on the current row, somehow create an entry in the history table to reflect the change, and tag it with the time we're at right now. I'm not going to show you the actual CREATE FUNCTION statement, because it's too long, but if we look through the actual code, this is a PL/pgSQL function, and it is this history trigger. The first part is pretty simple: the insert case. We say IF TG_OP equals 'INSERT' — so if this is an insert — then basically, as we insert the row into the main table, we're also going to insert it into the history table. We say INSERT INTO history. and we copy in the name of the relation, TG_RELNAME, and then it's SELECT $1.*, where $1 is going to come from NEW down here, NEW being the version of the row we just inserted. Then we just add a comma, because this gives us all the columns from the original table, which we copied into the history table with the LIKE syntax, and then we add the validity period column. There we just say tstzrange(...) — a function that takes three parameters and returns a range over timestamp with time zone. The first is the starting point: when does this row first become valid? Well, that's now(), right now. How long is it going to be valid for? Until infinity, because we just inserted it into the database; it's the only version that exists. But if we start looking at the database the way it looked an hour ago, it didn't exist, right? So: from right now until infinity. And then the final parameter is the inclusive/exclusive specification for the range: we're saying the left-hand side, the start, is inclusive — from and including now — and, well, I don't think it makes much sense at the infinity end to say whether infinity itself is included or excluded, because just before infinity is still infinity. We insert that using NEW, and then we return NEW. If we forget the RETURN completely we get an error; if we do RETURN NULL instead, the row will go only into the history table and not into the main table. We did actually look at that initially — storing the data entirely in the history table — but it turns out that, unsurprisingly, most reports, most things, still run on "what does the data look like right now?"
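The actual CREATE FUNCTION isn't shown in full here, but a minimal sketch of just the insert branch, as described, might look like this (one generic PL/pgSQL function serves every table, hence the dynamic SQL; the update and delete branches are filled in further down):

```sql
CREATE OR REPLACE FUNCTION history.log_table_trigger() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- Copy the new row into the history table, valid from right now until
        -- infinity, inclusive at the start.
        EXECUTE 'INSERT INTO history.' || quote_ident(TG_RELNAME)
             || ' SELECT $1.*, tstzrange(now(), ''infinity'', ''[)'')'
          USING NEW;
        RETURN NEW;  -- let the insert also happen on the main table
    END IF;
    RAISE EXCEPTION 'update/delete handling sketched further down';
END;
$$ LANGUAGE plpgsql;
```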
So we keep the data in the main table as well, basically as a performance optimization, to make sure we have fast access to the current version of the data. Now, it gets a little more interesting when we start doing updates. The first part is pretty simple: IF TG_OP is 'UPDATE'. What we do then is open a cursor selecting the valid range from the history table for the same ID. This is the second place where we rely on the fact that every row has a unique ID column; that makes life so much easier for these things. So: WHERE this ID, and we order by the valid range descending. Basically this gets us the latest version that's in the history table — if there are a hundred versions we get the latest one, and if we just inserted the row we get the only version that's there — and then we do a LIMIT 1. Steven — no, we cannot just check for where the range ends at infinity, because the row might have been deleted concurrently. We could almost do that. Anyway, we have the LIMIT 1 FOR UPDATE so that we also lock the row at this point, and again we use NEW.id, because we run this once for every row. We fetch that into a temporary variable, and then we create a tstzrange — a new range saying the new validity period runs from lower() of that fetched range, the previous starting point, until now(), with the same inclusive/exclusive reasoning as before.

If that range is empty, what that means is that somebody already updated this row at exactly the same time — more or less; within the restrictions we have, it means we already updated this row in the same transaction. If that happens, we're just going to overwrite it: we delete that history entry, because we're about to write a new history entry anyway. Whereas if the range is not empty, that means we actually have a new validity period for this row. So we go back to the current latest version of the row in the history, we update it, and we change the valid range to a new timestamp range starting at the lower bound of the previous one — meaning we don't change the starting point of the validity; if it was inserted an hour ago, it's still going to have been inserted an hour ago — but we change the end time to be now(). So basically we're saying: this version that's already in there used to be valid until infinity, and now it's valid until now, because we changed it. Not if it's a new — no, not if it's a new transaction. Yes, you could; this is where we get to the point that we don't have high concurrency on the updates, so no, we don't have to worry about that part. We do have a slightly more complicated version of this that also checks the transaction ID, so you can do that; I explicitly left that version out of the slides because it's longer, but we do have a secondary check in there based on txid. We initially thought this was fine, and it was actually fine in production, but it wasn't fine in testing — we broke it in the unit tests. Yes? You mean this row?
Yeah, I had to cut the procedure into two slides. So, once we have updated the old row and ended its validity at now, we also need to copy in the new version of the row with an insert, so we get a second row in the history table for the same entity, containing all the data and a new valid range running from now until infinity. Because yes, we do need both the old version and the new one, and again we need to specify the brackets so that the previous version ends with the regular, exclusive bracket and the new one starts with a square, inclusive bracket, so that we get no gaps. Where am I missing the single quote? Oh — yes, that is correct. There should be a single quote over there, because EXECUTE takes a string. Thank you. I will tell you that typo is only on the slides, because it wouldn't actually work otherwise; you wouldn't even get through the CREATE FUNCTION otherwise. Yes, we did. There were arguments for and against, in particular around being able to update the function, and around getting this done with people who are very uncomfortable using database interfaces, aka Java developers — no offense to any Java developers in the audience; if you're here, you're probably not one of the Java developers who don't like databases, but there are certainly those. And in the end we do RETURN NEW, so that we actually let this change happen on the master table as well: the update gets applied to the main table.

And of course there's one operation left. Delete is, of course, very similar to an update. What we do here, if the operation is DELETE, is again open a cursor — this is exactly the same query — so we get the latest version that's in there. Again, the interesting case is if that timestamp range, the same one as before, is empty: do we already have an update at this exact time? That means somebody first did an update and then the delete. For those of you who are not used to ORM applications, you'd be surprised how often that happens; I've seen cases where the ORM does like ten updates and then deletes the row, all in one transaction, which is kind of pointless, but it's what they do. So we do the same thing here: if it's already been updated but not deleted, we delete that updated history row, because transactionally it will never have existed — we commit everything at once. And then, for the actual delete, we take the last remaining history row and update it, setting the valid range so that the starting point is still the lower bound of the previous range, but it's now only valid until right now; it's no longer valid until infinity, because we deleted it. So basically: a newly inserted row has a validity from insert time until infinity; when we do an update, the new version gets a validity from now until infinity and we change the old version to end right now instead of at infinity; and when we delete a row, we just say it's not valid anymore — we don't like this guy anymore. And then we do RETURN OLD. That's one of those things that tends to confuse people: yes, in a delete trigger you have to return OLD, because there is no NEW version of the row — we deleted it. Again, we could return NULL, but then we'd block the whole operation, and we want the current version of the data to be materialized in the table itself. So now we have the data; now we're actually keeping our historical data.
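Pieced together from the description on these two slides, the whole trigger function might look roughly like this. It's a sketch only: the variable names are invented, the secondary txid-based check mentioned above is left out (as it was on the slides), and the same-timestamp corner cases are handled in the simplest way that matches the description.

```sql
CREATE OR REPLACE FUNCTION history.log_table_trigger() RETURNS trigger AS $$
DECLARE
    _hist   text := 'history.' || quote_ident(TG_RELNAME);
    _latest tstzrange;   -- validity of the newest version already in history
    _closed tstzrange;   -- what that validity becomes: previous start .. now()
BEGIN
    IF TG_OP = 'INSERT' THEN
        EXECUTE 'INSERT INTO ' || _hist
             || ' SELECT $1.*, tstzrange(now(), ''infinity'', ''[)'')' USING NEW;
        RETURN NEW;
    END IF;

    -- UPDATE and DELETE: find and lock the newest history row for this id
    -- (the id never changes on an update, so OLD.id works for both).
    EXECUTE 'SELECT _valid_range FROM ' || _hist
         || ' WHERE id = $1 ORDER BY _valid_range DESC LIMIT 1 FOR UPDATE'
      INTO _latest USING OLD.id;
    _closed := tstzrange(lower(_latest), now(), '[)');

    IF TG_OP = 'UPDATE' THEN
        IF isempty(_closed) THEN
            -- Already updated at the same timestamp in this transaction:
            -- drop that history entry, we are about to write a new one.
            EXECUTE 'DELETE FROM ' || _hist || ' WHERE id = $1 AND _valid_range = $2'
              USING OLD.id, _latest;
        ELSE
            -- Close the previous version: same starting point, but it ends now.
            EXECUTE 'UPDATE ' || _hist
                 || ' SET _valid_range = $2 WHERE id = $1 AND _valid_range = $3'
              USING OLD.id, _closed, _latest;
        END IF;
        -- Record the new version, valid from now until infinity.
        EXECUTE 'INSERT INTO ' || _hist
             || ' SELECT $1.*, tstzrange(now(), ''infinity'', ''[)'')' USING NEW;
        RETURN NEW;
    ELSE  -- DELETE
        IF isempty(_closed) THEN
            -- Updated and then deleted at the same timestamp: transactionally that
            -- version never existed, so just remove it; the version before it (if
            -- any) was already closed at now() by the update.
            EXECUTE 'DELETE FROM ' || _hist || ' WHERE id = $1 AND _valid_range = $2'
              USING OLD.id, _latest;
        ELSE
            -- The row is gone: its last version is valid only until right now.
            EXECUTE 'UPDATE ' || _hist
                 || ' SET _valid_range = $2 WHERE id = $1 AND _valid_range = $3'
              USING OLD.id, _closed, _latest;
        END IF;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;
```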
Yes, Jeff? Yes. So — yes, we could have had a version where only the latest version of a row exists in the current schema and all the previous versions live in the history schema alone. With this design we keep both the current and the historical versions in the history. The main reason is that it's much easier for the application to query, because you can input any timestamp, whether it falls within the current version or not; the application needs to know less if we do it that way, and our view creation can be simpler. You could, but they become a lot more complicated. No, well — not easily, because the history has multiple versions in it. I guess you could have implemented that on top of the history, like one partition for all the old versions and one for the new, but then you'd still need a view that hides the actual time validity and looks at both tables. Yeah, I don't know; we didn't look into that in detail, because we looked at how it would affect the application side, and that was enough to say we weren't going to do it. And I'm not sure — it might well be a real performance hog.

So, accessing this data is fairly simple; anyone who's worked with range types knows the syntax. If I want to look at the data at a certain time, I just specify the time: I select the ID — in this example I have columns a, b and c; I don't know why it highlighted that guy — from the history table where the ID equals 42 and the valid range, and then we have the @> operator, which says: give me the version where this timestamp with time zone is inside the valid range. And the exclusion constraint that we created promises that this can be zero or one rows, right? It can never be more than one row. It can be zero if we ended up after the row was deleted or before the row was inserted, but we'll always be able to do this.
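With the illustrative columns a, b and c from the example and an arbitrary timestamp, that lookup is simply:

```sql
-- Zero rows if the row did not exist at that time; never more than one,
-- thanks to the exclusion constraint.
SELECT id, a, b, c
  FROM history.table1
 WHERE id = 42
   AND _valid_range @> '2014-03-07 14:32:00+01'::timestamptz;
```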
Now, this gets us there: we could just put this in the application, right, and we'd basically be done. Unfortunately, that doesn't really fall under minimal modifications — modifying every single query that's done anywhere in the application is not very minimal. It is sort of what their developers had considered doing before we got involved and they started seeing what it really means, because in particular once you start doing joins on top of this, you have to qualify all these validity checks against each individual table. It does work, but the query just massively explodes, and if you're hand-writing that everywhere in your application, nobody's going to be happy.

So we're going to fix that at the database level, and we're just going to CREATE SCHEMA time_travel — because for some reason they didn't like us calling that schema "tardis" in the actual production release. So we have a public schema that has the current data, a history schema that has the whole history, and now a time_travel schema that contains nothing but views. Each view looks like this: CREATE VIEW time_travel.table1 — the same name as the table — AS SELECT, and we do actually expand the names of the columns here, because we don't want to include the valid range column; this view is going to look exactly like the table, not like the history table. But we select from the history table, and we just say WHERE _valid_range @> the timestamp, and for the timestamp we use a custom GUC: we invent a configuration parameter called history.timestamp, and current_setting() reads the current value of history.timestamp in our session, which we cast to a timestamp with time zone. If the parameter isn't set, that just gives us an error. So this view requires the application to set the value of history.timestamp, but once you've done that, this one setting controls the point in time at which you are looking at the database. You can join these views and you'll get the exploding query with many, many validity checks, but it all gets auto-generated.

What we then do is set the schema search path to control whether we're looking at public or at the time travel views. We initially ran everything through this time_travel layer at some point and realized we were paying quite a bit of performance overhead for the case where we're actually looking at the current version of the data; that's why we also kept the application talking to the main schema. When we just connect and run a query we get the current version, because it's in public, and that's what the ORM does. To enable time travel, what we really need to do is SET search_path TO time_travel, then SET history.timestamp to the value we want, and do a SELECT *, and that's the way this table looked at, you know, the 7th of March at 14:32. Continuing with the same query, if I just change it to 14:29, three minutes earlier — you can tell I was not running this on the production database when I was building the slides — we can see it's the same table, but the row looks different, because an update obviously happened in between. That's the core of it: the application needs to switch its search path and set the timestamp, and when it's done you do a RESET ALL and you're back to looking at things the way they look right now. And that RESET ALL — don't forget to configure your connection pool to actually do that. Interesting things can happen if you don't, because another session gets your connection back still time traveled to August last year, and, you know, that's an interesting result. So what we basically get is automatic time travel, right?
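A sketch of one such view plus the session dance around it, using the same illustrative columns (the exact timestamp is made up):

```sql
CREATE SCHEMA time_travel;

-- Looks exactly like public.table1, but shows the data as of the point in time
-- carried in the custom parameter history.timestamp (errors out if it is unset).
CREATE VIEW time_travel.table1 AS
    SELECT id, a, b, c
      FROM history.table1
     WHERE _valid_range @> current_setting('history.timestamp')::timestamptz;

-- To time travel: point the session at the views and pick a point in time.
SET search_path TO time_travel;
SET history.timestamp = '2014-03-07 14:32:00+01';
SELECT * FROM table1;

-- When done, go back to looking at the current data in public.
RESET ALL;
```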
We do need to inject these variables somehow, and exactly how we do that is of course going to depend on the framework. You can do it at the driver level; we tried a bunch of different things. I'm not going to go through those parts in detail, because I don't really know Hibernate that well, and I don't really want to know Hibernate that well. But we tried different approaches: you can inject it at the driver — at one point we basically created our own driver and injected it there — you can wrap individual queries, you can wrap it in a function call. Just as an example of how we did the driver injection part, for those of you who love Java: this one — well, that used to be the company name — postgres.Driver, and we just created our own driver that extends the original driver. We override connect() and call a little function called injectTimeTravel, which runs these SET commands on the connection. Yes, this is simplified; that's not quite enough code, but you don't need to do much. You just need to trap those two things and then use annotations or whatever to actually set the time. In some cases, instead of calling executeSQL or whatever it might be called, you call something like executeSQLTimeTravel and include a timestamp, so there needs to be an interface for it. But the advantage is that the main OLTP app, the one that does all the corrections, needed zero changes, and the reporting app basically just needed a field where you can input the time; we capture that and inject it, and then we can run exactly the same Java code we ran before, and it automatically gets injected for the Hibernate-language queries as well.

Now, there are a few things to consider about that. Don't forget to reset it — weird things happen. And query the public schema for current data: once you get a bunch of versions and a lot of data, that's a very effective optimization; don't pay the overhead of going into the history schema if you don't have to. That's when we decided we were actually going to materialize the current row in two different places, purely as a performance optimization, because it does make quite a bit of difference. And of course, the one thing this automated setup does not create for us is indexes. It only sets up the index on the ID and the validity range via the exclusion constraint; any searches on anything else by default have nothing else to use. For a number of reports that was fine, because we didn't have the same level of performance requirements on the reporting; for others you could just go in and manually create the index — well, not manually, it would go through version control and all that — but you do need it. We didn't try to auto-generate time travel indexes for every single index in the system; that just seemed unnecessarily complicated.

Of course, this covers everything except that one last requirement: identify which reports contained a person. This is the part where we don't have quite as nice a solution; I think it can best be described as brute force. To do it, we log all queries — all SELECT queries are logged — and then we rerun the reports to verify which data they returned. So if the question is which queries that ran through August actually returned data about Steven, we're just going to rerun every query that we ran in August and see if they include Steven. So this is slow.
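The talk doesn't say how that query log is captured; one straightforward way, assuming a Postgres recent enough to have ALTER SYSTEM, is plain statement logging:

```sql
-- Log every statement so the exact queries (with their parameters) can be
-- re-run later against the time travel views.
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
```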
There are some heuristics for some of the reports; we know how to short-circuit some things and make it more efficient. But that's basically it, and that's actually acceptable, because this is something that, frankly, they expect to never have to use — but if they do, having it take a couple of weeks is perfectly acceptable. Yes — so that's likely unrelated to the time travel part, but no, you don't — well, you probably do need to run it manually if you want optimization, because you might be able to say that this whole set of queries cannot contain the person. But as long as you rerun a query with the same parameters, it's going to return the same data. So yeah, the brute force method is: just rerun everything. Yes. Yeah — agreed, it depends on which level the aggregate is at, so yes, you're going to have to adjust some of the things, because the reports that actually return aggregates over a million people are never going to have this requirement; nobody cares whether Steven was included in that. The problem comes if Steven is included in a report that returns like ten rows, because he's probably identifiable within those ten rows. So yes, there is manual work involved when you get to this point; you need to do some thinking about what's actually represented. But the main thing is that we needed to keep all the required data, and really the only way to know who looked at something is to record all the queries. Well, no — the other way would be to record all the responses, and that would have been much more data, as in much, much more data; materializing every single row ever returned from the system just doesn't really scale.

One other word of warning around this: ORM-level caching. Hibernate does caching at two different levels, query level and entity level, and if you have that enabled, very interesting things can happen, because you can get back different rows from different times. Basically, you need to either turn it off, which is what we did, or make it aware of the time travel and key the cache on the timestamp as well. But you can't leave it on in default mode; that will give you really, really funky results.

So I think the first conclusion may be the best one here: anyone who has not used range types — you probably have use cases for them,
you just haven't figured them out yet. Of course, we could have done this without range types, with just regular start and end time columns, but it would have been so much more complicated. Yes, much of the code is auto-generated, but it would still have been so much more complicated. Range types are great. And we can actually trick ORMs: in a lot of cases they're so simplistic that they're easy to trick. If the ORM actually made it easier to spread your data out across multiple schemas, this would have been much harder. We knew everything was there; the ORM gave us the guarantee that there will be an ID column everywhere. Okay, so let's exploit that and then trick the ORM, because Hibernate itself can run exactly as it did before, having no idea that any of this happened, once you manage to inject these things. And I think it's a good example of the flexibility of Postgres: trying to do this at the application level — I mean, you could do it, but it would not have been minimally invasive; it would have been many, many thousands of lines of code. This is a fairly simple way to solve what's actually a fairly complicated problem. So if you haven't already, look into range types; that's good fun, and you know, it's fun to trick the application developers, especially the ones who aren't here. Okay. Well, thank you very much, and I think we have time for a couple of questions. Yes?

So, statistics on inserts and updates: there was some impact, but it was surprisingly low. In the initial benchmarks people didn't think it was working. The thing is, we're adding all of this, but we're not adding any more commits — the benchmarks were limited by the number of commits, and this doesn't add any more commits, it adds more data. So the insert and update speed were barely affected at all; it was less than 5%. Query speed when you go into time travel will drop, obviously, because we're looking at much more data with more complicated queries, but the impact on insert and update was surprisingly low; I would have expected more. I don't remember exactly what the requirement was, but it allowed for quite a bit of slowdown, 30-plus percent, and we were nowhere near that.

Yes — transparency to the application? Complete transparency, in that the views look exactly like the tables. That was one of our original requirements: making it simple, making the application devs more or less not have to care, and in particular those statisticians who actually input the queries should not need to know about it at all. That was the main reason.

Yes — no, actually, partitioning: given that we know we're not going to have huge amounts of updates going through, we're not currently in a position where we even had to start thinking about partitioning them. I guess you could probably partition based on validity time, but I'm not entirely sure how smart the optimizer is about sending the queries to the right place. It also depends on why you're partitioning: whether you're actually partitioning for the access, or whether you're okay scanning five partitions every time when it comes to old data and you just need partitions for, you know, vacuum or something like that. You could also partition based on ID, because almost all of these have a uniformly increasing ID where some rows are deleted — but in the history table they're never deleted;
they're always going to be there. Okay, anyone else? The slides are not available yet, but they will be available on that website. The code itself is not open source, unfortunately. The parts that are on the slides obviously are, and the rest is basically just a loop that runs the CREATE TABLE and CREATE TRIGGER statements for every table. The only part that would be useful and is not either on these slides or trivial is the injection code on the Java side, and that's unfortunately not our property at all, so that part we can't help you with. Yes — yes, if there were significant schema updates, there would definitely be a significant uptick in DBA work; if we were looking at a lot of schema updates you would have to script that side. It depends on what they are: adding a column is very simple, you just need to remember to add it to the other table and then alter the view. Something like changing the data type of a column would be really, really hard, because then the whole thing would sort of fall apart; you'd have to switch to a different type, so that would be very hard. But just adding columns is fairly simple. Okay, with that I am out of time, so thank you all very much.