I'm happy, I guess, to have some more material to throw at developers when you have a chat about this topic with them. It's a very important topic that we are going to cover, maybe with some surprises for you. So let's dive in. I've been using PostgreSQL for a long time, I began in the previous century, and I'm one of its contributors. I've worked at Citus Data, and I'm told by our marketing department that we have some nice socks that I can throw into the audience if you want them. Usually the rule is: if you ask a good question, I throw a sock. But if you just say this talk is awesome, you will get a sock anyway. So be sure to yell something and I'll throw socks. Okay, I see there is one guy who knows the rules. So, as mentioned before, I've been working on pgloader. The idea of pgloader is that you can migrate from another database technology to PostgreSQL in one command line. It only needs two parameters: your source database connection string and your PostgreSQL connection string. It will figure it all out by itself: the schema, the data types, the casting rules, the default values, foreign keys, primary keys, indexes, et cetera. And it's pretty good at it. The goal when I wrote this tool was that there is a lot of potentially interesting software that I would like to use, but it's using MySQL, and I will not trust my data to that technology. It's personal, okay. So no, you don't have an excuse anymore if you're working with MySQL: you just run one command line and you can use PostgreSQL instead. So no excuses anymore.
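In case you want to try it, a pgloader run really is just the two connection strings; the host names, user, and database names below are made up for the example:

```shell
# One command line: source connection string, then target.
# pgloader figures out the schema, data types, casts, keys, and indexes.
pgloader mysql://appuser:secret@mysql.example.com/app \
         postgresql://appuser:secret@pg.example.com/app
```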
Just use it. So okay, let's dive in with data modeling. Data modeling is a very important part of using PostgreSQL, because some database technologies around here are proud to be schemaless. We are not schemaless, and we need to see what that means. This quote is from Rob Pike, and from a book from the seventies, I believe. It says that if you have chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. Okay, so what it means for application developers who are writing code: if you're writing Python code, you know you need to think about how you organize the data in your code, in a class, in a structure, in a named tuple or something else, and then you pass data along. That's the most important thing. In Java it's about the same terminology; if you're using Go it's a struct; if you're using C, et cetera, et cetera. But it's always the same idea: if the data in your code is well organized, then the code is easy to read and you don't have so many bugs. If the data is a mess, you have a lot of bugs, because everything you want to do is going to be complex when the data doesn't lend itself to being processed a specific way. It's exactly the same with database modeling. If you're having a really hard time writing your SQL queries, most of the time it's for one of two reasons. The first reason usually is that you don't know SQL very well, so it's normal that you find it hard to do, because you're not there yet. The other reason is that the schema is not adapted to your business case, so of course every query you want to write is going to be hard to write. If you have a proper schema design, then SQL is really easy to write. So in this talk we're going to review the classic rules that allow you to design a schema properly. Before we do that,
I want to show some examples just to get your mind started, because the rules are a little complex to handle. So, data modeling usually revolves around data types and constraints. And when we say constraints in databases, well, most people don't want to have constraints in our lives, right? We want to be free to do whatever we fancy. Nobody, or almost nobody, likes having constraints. So most people, when we talk about database constraints, they're like: constraints, it's boring. I think the term should be changed. It's not really a constraint that you're getting: it's a guarantee, a very strong guarantee for your data in the database. We'll say more about that in the next slides. Constraints is the technical term that everybody uses, but the one we should have in mind is guarantees. How many guarantees, and how strong the guarantees, do you want to have in your database? If you're schemaless, you have zero guarantees; it's all up to you. That's why we are really happy with PostgreSQL to have data modeling, and the schema, et cetera, because it means we have strong guarantees. And for implementing constraints, usually we use primary keys, foreign keys, and checks. So, a couple of examples. This is a table where you would handle articles, like news articles. It's a really classic one, with a primary key. And this schema is wrong. Okay? Is it obvious for everyone why it is obviously wrong? Okay, we'll get back to that later in the talk; I'm just trying to tease you a little. This one is okay.
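That second, correct example is a partial unique index. Reconstructed as a sketch with assumed table and column names, it goes something like this:

```sql
-- Toggles: each (user_id, name) pair must be unique,
-- but only among the rows that are still enabled.
-- Disabled rows are kept as history and may repeat.
create table toggles
(
    user_id  bigint not null,
    name     text   not null,
    disabled timestamptz          -- null means the toggle is active
);

create unique index toggles_active_unique
    on toggles (user_id, name)
 where disabled is null;
```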
It's a partial unique index example that I copy-pasted from Stack Overflow or something. It's implementing toggles, like for a UI, for the users: they can toggle something on and off, so they can enable it or disable it. And when it's disabled, you want to remember that they did use it before and then disabled it. So if it currently has a value in the disabled column, then basically you're not taking care of it in the application anymore. So you can say that you have a unique index on the toggles, but it only needs to be unique if it's not been disabled. Okay, who knew you could do a partial unique index in PostgreSQL? Yeah, some of you. Okay, good. It's a really nice trick to have. Another kind of guarantee that we are going to see later is EXCLUDE USING gist. Who's done that before in PostgreSQL? Okay, for the other guys, we are going to dive into that later in the talk. The first goal of database modeling is to avoid anomalies. There are three possible kinds of anomalies: insert, update, and delete, basically. So, update anomaly: this schema is wrong, the model is not good, because you see, in the same table we have the employee id, their address, and a set of skills that are associated with the employee. But this employee apparently moved in between the time he learned one skill and the next, and now he's living somewhere else. So now we have two entries in the database that disagree with each other. Which is the right one? What is the current address of this guy? Well, we don't know. And why don't we know? Because the schema is wrong. We should have the address in only one place, not two. So how to make that happen?
We'll see that later. So that's the update anomaly: we have updated the address, and now we have two of them and we don't know which one is the right one. The insertion anomaly: let's say that you have a schema for a faculty, for the professors who are giving courses, and each professor might be giving one or several courses. This new one has been hired by the faculty, but he doesn't have a course yet, and with this model we cannot even enter the information, because he doesn't have a course yet. Okay, so it's obviously wrong, right? That's the insertion anomaly: you cannot insert the data with that model. And then of course the other side of it is the deletion anomaly. So now, if this guy is not teaching that course anymore, you want to delete the line, but if you do, you forget about him; you don't have him in the system anymore. So he cannot show up at the office, or however exactly their system works, just because he doesn't have a course anymore; he is not on the list anymore. So I think it's quite obvious for everybody why those schemas are wrong. Now we are going to try to explain the rules around how not to get that kind of schema, and some more. But for the thing that is central to database design, I like this other quote to explain it, so I will read it out loud. It's from Fred Brooks, and he said: show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious. Okay. The database design should be that obvious. When you look at a database design, if you don't understand the business from it, usually it means it's not so good. It means that you've been lazy; maybe you implemented the database design the same way you did your object-oriented architecture in your software. But your software is going to be tasked with implementing a particular user workflow.
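Returning to the professors and their courses for a moment, one sketch of a fix, with assumed table and column names, is to split the model so that a professor can exist without a course:

```sql
-- Professors exist independently of the courses they teach,
-- so a newly hired professor can be inserted with no course,
-- and deleting a course assignment never deletes the person.
create table professor
(
    name text primary key
);

create table course
(
    code      text primary key,
    professor text not null references professor(name)
);
```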
So you take one user in the system; if it's a web application, for example, you're going to drive the user and enable the user to do something, a particular workflow, and the code is all about that. But the database is about guaranteeing the data, the business model, the guarantees, the constraints, et cetera, for the whole system, all at one time. So it's not the same thing, serving one user and making sure that the system globally is consistent. The database is about the whole system being consistent, and your code is about allowing users to do what they have to do with your system. So there are few reasons why it should be the same model anyway. So, practically, for the tooling: some people would use graphical tools, et cetera. It's a very nice way, if you are more visual, to understand the problem set. But what I usually do myself is just write a SQL script, and I do BEGIN and ROLLBACK. I needed that thing to fit on the slide, so it's a very small one; usually I insert more things into the tables, and then I write some simple queries that I know I will have to use in the application. And again, if the queries are complex to write, I'm like: yeah, maybe that's not the right schema. If every query is easy to do, I'm like: okay, maybe I can go with that. And you see this trick here: ROLLBACK. I think, to this day, I'm not sure, but PostgreSQL is the only one, or maybe one of the very few systems, where you can actually have transactions and DDL. So you can CREATE SCHEMA, CREATE TABLE, and then ROLLBACK, and nothing is there anymore. Okay? In other systems, that I won't name, as soon as you do CREATE SCHEMA or CREATE TABLE, there is an implicit commit for the transaction. So when you do ROLLBACK there, the system is like: yeah, there is no transaction running. Which is the scariest warning ever. Like, commit or rollback, and: no, there is no transaction currently happening. What?
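A prototyping script in that BEGIN and ROLLBACK style might look like this; the table here is just an illustration, not the actual slide:

```sql
-- schema-prototype.sql: run it with psql, refine, repeat.
-- Thanks to transactional DDL, ROLLBACK leaves the database pristine.
begin;

create table category
(
    id   bigint generated always as identity unique,
    name text not null,
    primary key (name)
);

insert into category (name) values ('sports'), ('politics');

-- Try the queries the application will actually need.
select * from category where name = 'sports';

rollback;  -- change to "commit;" when happy with the design
```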
Okay, so with PostgreSQL you can actually do that, and so you can refine your schema as you go, because when you do ROLLBACK you're back to a pristine state again. So you can do it as many times as you want to, and you can change your mind maybe 50 times in a day, play around with it and see what happens. You just run the script, and at the end, nothing happened. You had some stats at the end of it, some queries that give more information about the schema, and when you're happy with it, then you change that into a COMMIT, you run it again, and now you have a schema to play around with. Some people are using tools called ORMs, and they say it's object-relational mapping. Almost every one of them that I've had a look at is doing it wrong, because the R in ORM stands for relation, right? A relation is the result of a SQL query. So, do you know that in the SQL standard we have the SELECT query, and we also have another query that is named TABLE? If you have a table, like in the previous example, category, and you do TABLE category, it will dump the whole content of the table category. Okay, so obviously a table is a kind of relation. If you have a table, you can use it as a relation, but it's only one specific case of a relation.
It's not the general idea. The general idea is that any and every SQL query that you are using defines a relation, and can be used as a relation. That's why, when you do a SELECT with a FROM clause, in the FROM clause you can have sub-selects: because any SELECT can be a relation, it can be used as a FROM source in a query. Okay, so an ORM should actually be mapping the relation that is the result of the SQL query into the objects in your code, and that would be a good ORM to have. So if you're doing Java, have a look at jOOQ; if you're doing PHP, have a look at Pomm. And there are others in other programming languages; those are the two that I have in mind, so if you're doing something else, find one. There must be a tool that is doing it the proper way. But if you have an ORM that is mapping the base tables to your objects in memory, well, you don't really have a use case for that, so it helps no one, and it's complex to use on top of it. People would say that it's really hard to map those two things together; that's why I have this diagram here. But it's actually because they usually do it the wrong way. If you try to solve the wrong problem, it's way harder than if you try to solve the problem you actually have. So the mapping should be done with the SQL result sets. That's all I want to say about ORMs; we're going to talk about database design and modeling and normalization, and I won't talk any more about ORMs. Okay. So that was the basis for database modeling, and the next step is to know whether your model is good enough or not. So we're going to introduce a set of rules that are really helpful for that. Before I do that, any questions yet? No? Maybe nobody wants any socks, I guess. Yeah, some people do. I'm not good at aiming, so please figure it out. There will be more later if you want more. So, normalization. What does it mean?
Basically it's following a set of principles that allow us to think about the schema in a theoretical and practical way. Those are a few of the rules from a book from the seventies or eighties about the basics of the Unix philosophy, with strong principles. I like to use that because, as I said, the goal is really to have a chat with developers, application developers here, and Unix principles apply well to building software. What I want people to think about is that the same design principles that you can use to write software, you can use them to design your database schema. It's the same thing. Clarity, simplicity, transparency, robustness: those are nice goals that you want to achieve when you do a database schema, because it's all about the schema. One thing is DRY: don't repeat yourself. Well, I'm also using that slide as a warning: this topic is a little dry, so it's not that fun to talk about, but it's very important, so let's do it anyway. So, who knows about normal forms? Yeah, so, you know, it's from the seventies. Everybody's like: we're in the 2000-somethings already, it's stuff for granddaddies. But actually, you know, nobody ever said they're wrong. They're still very much important to this day: remember them, figure them out, and use them. So when you design a database schema, remember those rules, and actually apply them. The first normal form: the target is that you don't have duplicated rows in the table, and no repeating groups or arrays; each cell is a single value. I've seen many schemas, I've been doing consulting before, many schemas where there is always this table with a comma-separated set of values, because, you know, creating a new table in production was burdensome, so they just added values into a single field. And as soon as you do that, SQL is going to be really hard, because SQL doesn't lend itself very well to that. So those are the first rules.
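That comma-separated-values anti-pattern, and its first-normal-form fix, can be sketched like this; the table and column names are assumptions for the example:

```sql
-- 1NF violation: a list of values crammed into one field.
create table employee_bad
(
    name   text primary key,
    skills text              -- e.g. 'sql,python,public speaking'
);

-- 1NF fix: one row per value, trivially queryable and indexable.
create table employee_skill
(
    name  text not null,
    skill text not null,
    primary key (name, skill)
);
```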
So, second normal form. The trick with the normal forms is that to reach the next level, you first need to reach the previous level. If you want to be in second normal form, you need to make sure you are compatible with the first normal form first. You need to do them one after the other, always. So a schema is in second normal form if it is already in first normal form and has no partial dependencies. Here is another way to say it: no non-key attribute is dependent on only a part of the composite key. It means that if you have many columns, all of the columns should have some kind of relationship, in this very definition, towards the primary key. If a column doesn't, it has nothing to do in this table. Remember the employee id and the address: the address is not really something specific to the employee. Maybe more than one person is living in the same house; it's not a property of the employee. So it's a non-key attribute that is not dependent on the key, et cetera, et cetera. As I told you, it's a dry topic, so I'm not going to dive very deep into it; if you like theoretical approaches, make sure you find some time to dive into the topic later. We're going to switch soon to a more practical way to handle it. After the second normal form you have the third one, and the BCNF one, Boyce-Codd normal form. Usually, when I design a schema, I try to target BCNF, because it's the point where it's still useful and not that hard to achieve. And there is more, of course: you can go to the fourth, the fifth, which is project-join normal form, and then you have DKNF, which is a step further. So I'll give you an example. If you are, for example, handling addresses in your database, you have an address field. How deep do you want to go in normalization?
Are you going to do first normal form, or go up to DKNF, for an address field? If you want to design it properly: you have cities; in a city you have streets, street names. But actually, the same street names are found in most other cities around, so you would have a catalogue of all the known street names, and for each street name you want a relationship saying which city is actually using that street name. And then, in this city, for this street name, what are the building numbers that are available in the street? Because sometimes numbers are skipped, and sometimes, in Paris there is a street where you have four number fours. Yeah. I had a meeting once, and I showed up at the wrong number four in the street, so I had to call them. So if your job is to design an address database model, because, for example, your job is to deliver things to people, or maybe you are installing fiber or electricity or phone lines in people's homes and you need the exact address, and maybe you need the exact location of the box that is owned by your company inside the home, then maybe you need to do it the proper way. And then maybe you need to do it in a way that lets you optimize the traffic and the routing when your people need to show up at the customers' houses, so now you need a GIS system on top of it, et cetera. Now, if your business reason for having the address of people is because you want to invoice them, like, you know, you generate a PDF that you send by email, but legally it's required to have the address of the company on the PDF, well, maybe a single field is everything you need for the address.
So it all depends: the level of normal forms that you want to reach in a database depends on what you're going to do with the data. If all you do is accept it and then print it back out, you don't care: it's a text field. If you're going to actually have to process it, and it's going to be important to your business, then you need to, you know, make the effort to make it actually possible to use it. So it's a set of rules, and you need to figure out for yourself how much of it applies to your use case. The trick is: if you're not sure, just do them all, and as you do them, there is a point where you will say: this one was easy to achieve. This one was a little more work, but I'm happy I did it, because I found some bugs in my schema doing it. This one was more complex, and I didn't find many bugs. This one was fine, but it begins to get boring, and I don't get anything out of this one; it's only adding new tables and new things, and I don't see the purpose of that. But at this point now you have an opinion that, in your case, your schema is fine in BCNF and maybe boring in 4NF. If you don't do the exercise, you will never know. Okay, so please do it, and figure out at which point it gets really boring for you, and stop there. But just do it first. Okay, you will be very happy you did, because you will actually find bugs in your schema each and every time. An example is coming. It's coming now, actually, because the way we implement those guarantees in a database is, as we said before, with database constraints. So, I told you this is wrong. Who wants to win a sock by guessing why it's wrong? Yeah, I'm going to throw them anyway, but please try to figure it out. At the end of the talk you will have some more if you want; apparently I'm not good enough at throwing. So anyway: a primary key means your schema is first normal form compatible, because there are no duplicates in the table, right?
That's what a primary key means. Have you heard about surrogate keys? Yes? Yes, he's a DBA, he knows about it. So, a surrogate key looks like a primary key, but actually it's not: it's a substitute for a natural key. The natural key is what allows preventing duplicate entries. So let's switch to this example. In the article table, the primary key is not listed here in the insert, because, look, it's a bigserial. The idea is that you don't need to give it to PostgreSQL; if you don't give it, it will be derived from a sequence. And by the way, sequences in SQL are the only SQL objects that are non-transactional, and that's on purpose. So everything in PostgreSQL is transactional except for sequences, and that's in the SQL standard, so it's not just PostgreSQL. And if you don't know why, maybe you don't understand transactions well enough, so you should have a look at that; I'm not going to explain it, not today at least. So we insert a category, a publication date, and a title, and do you realize that it's exactly the same values we're inserting twice? Look at that. Do we have a primary key, really? Because the fact that this entry is id 3 and this entry is id 4, no single user of your application cares, right? No one is ever going to be exposed to the fact that this article is id 3. The only difference you have between those two rows is the id. So basically you have a duplicate entry, and basically the schema is not first normal form compatible. Okay, so it's really bad; you don't do that. So how do we fix it? Well, we want a natural primary key, and then we need to understand the business model a little more. So what I did here is a primary key over the category, the publication date, and the title. Which means that in the whole history of the news articles you're going to publish, you will never be allowed to reuse a title from the past. Well, is that right? No, because you have the publication date, so you can reuse the title. But maybe you don't want to, so you need to figure it out. So I was too fast, sorry about that.
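As a sketch of what just happened on the slide, reconstructed with assumed values rather than the actual slide content:

```sql
-- The "wrong" design: a surrogate key masquerading as a primary key.
create table article
(
    id       bigserial primary key,
    category integer,
    pubdate  timestamptz,
    title    text
);

-- Exactly the same values inserted twice: both succeed, differing
-- only by the auto-generated id. That is a duplicate entry, so the
-- table is not really in first normal form.
insert into article (category, pubdate, title)
     values (2, '2018-05-01', 'some title');
insert into article (category, pubdate, title)
     values (2, '2018-05-01', 'some title');
```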
So let's see it again. Because there is a publication date in the primary key, you can use the same title for an article in the same category more than once, at different times. Like, maybe you want to react sarcastically to an article you published last year, or something like that. But maybe your business model is that you never ever reuse a title, because, I don't know the journalism rules, and maybe it's important for them. Because I'm not a journalist or a chief editor or something like that, I'm not sure what the primary key should be. But the point I'm trying to make is that the only way to know is to understand the business model before you do it. So, as a developer, if you don't understand the business model well enough to do that, then spend a day or two with the other guys, the product people, or if it's journalism, spend the day with the journalists, asking them how it should work, because you need to figure it out now. And once you did that, it means that any other table that has a foreign key relationship to your table now needs a foreign key relationship over the three columns. Which is not only boring to write, but it's also taking a lot of space on disk, it's making the indexes bigger, and maybe it will have an impact on the write capacity of your system. So really, we say it's a distributed cost, because it is going to be costly in more than one way. So it's the right way to do it, but it's costly. So is there a trade-off? Yeah, I'm happy you asked. You can use a surrogate key, which means that you have an integer here, the id, or maybe you want it to be a bigint, actually, generated always as identity.
It's the new way in the standard to spell a sequence, instead of serial or bigint with a sequence default. So please use that spelling; it's better than using the previous one, for some good reasons. There is a really good article online that explains it in detail; if you type that into Google, usually the first hit is this article, so it's easy to find. So we will use that as a unique column, but the primary key is still the same: that's the natural primary key, and we guarantee that the id is going to be unique, so that you can use it in a foreign key reference. I didn't write a slide for it, but now your foreign keys can reference only the id, because it's guaranteed to be unique by PostgreSQL, so you can use it that way. But it's only a surrogate key, and that's made clear in the schema, because the primary key is written here, right? Any questions about that? Yes? That's the usual question, yes, so I'm going to repeat it. We could do it the other way around and say this one is the primary key and the other one is unique. Mostly it's the same thing, but I like to declare the intention. Your intention is that the natural primary key of this database design is here, and this one is just a facility you're going to use in other parts of the software, because it's useful. So it makes it obvious that this is the case. Yes? Yes, so the question is about: what if I have an entry there, but I need to change the publication date, because it was wrong? Well, then the key is going to change, yes, but not the id. That's why the id is useful. But the id has nothing to do with guaranteeing that you reach first normal form; it's not helping there, but it's helping for other purposes. So that's why it's a good trade-off. Okay. Yes? NOT NULL would be good, yes. Thank you.
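Putting it together, including the NOT NULL fix just mentioned, the resulting design is roughly this; it's a sketch reconstructed from the talk, not the actual slide, and the referencing table is made up:

```sql
-- Natural primary key prevents true duplicates (first normal form);
-- the surrogate id is merely a unique column, compact and convenient
-- for foreign key references from other tables.
create table article
(
    id       bigint generated always as identity unique,
    category integer not null references category(id),
    pubdate  timestamptz not null,
    title    text not null,
    primary key (category, pubdate, title)
);

-- Other tables reference the compact surrogate key, not three columns:
create table comment
(
    article bigint not null references article(id),
    body    text not null
);
```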
So you spotted a bug in my slide; here is a sock for you. Yeah, anyone who spots a bug, you get a sock, okay? We have other constraints too in SQL, so I will just list some of them. We did talk a little about primary keys and foreign keys. NOT NULL, we just said it's important. Check constraints: you could even write a manual check that a column is not null, and there is a nice trick about check constraints. For example, and this one is from Vick over there: each time Vick has seen a customer use a start date and an end date in a table, he has found some data where the end date was before the start date. Each time. Unless you have a check constraint on that, the database is not going to be able to guarantee it for you, and you will have bugs, because we always have bugs, right? We just had one before, but it was in a slide, so it's okay; in your application code it's a little different. So mind your check constraints. If it's important for you that, for example, the start is before the end, well, maybe you should instruct the database that this is the case, so that it can guarantee that for you. And then in PostgreSQL we also have exclusion constraints. Basically, it's like a unique constraint.
It's spelled a little differently from unique. Here we say that if we meet data with the same currency and an overlapping validity period, a date range, which is not exactly the same thing as the standard's period, but we're using it that way here, well, in that case we want to reject the new row. So it's like a unique index: in a unique index, when the values are the same, one of them is going to be excluded, rejected. You get an error: you try to insert, and it says, no, sorry, I already have this id in the database. Here it's the same thing, but the rules for when to exclude the data are more complex than just an equality. There is this sign here that everybody knows, which you read "equals", and this other strange sign, which not everybody knows, that you just read "overlaps". Okay, you need to learn that once; after that you know it. But if you didn't know it yet: that is "overlaps". Okay. Questions so far? No? Okay. Denormalization is what you do when you have done a thorough job of normalizing your schema, and then you reach some difficulties in production, usually around performance, but maybe around the size of the data on disk, maybe something else, and you want to fix them. Okay, so denormalization is an optimization technique that you implement at the schema level, at the design level, and you only optimize after it's not working well enough. So let's remind ourselves of the rules of optimization. Okay, so the first rule is: don't do it. And the second rule is: for experts only. So, how do you know if you're an expert or not? I like this definition of being an expert: it means that you have made all the possible mistakes. So when I tell you I'm an expert in PostgreSQL, don't ask me how many mistakes I made before; many of them over the years, maybe not all of them, but I tried hard. So, optimization: you don't do it, and if you're an expert, you're really good at it because you've been doing it a lot in the past. Maybe don't do it yet.
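Stepping back to the two constraints from a moment ago, both can be sketched in SQL; the table and column names here are assumptions, not the actual slides:

```sql
-- A check constraint: the database, not the application,
-- guarantees that the end date comes after the start date.
create table booking
(
    room   text not null,
    starts date not null,
    ends   date not null,
    check (starts < ends)
);

-- An exclusion constraint: the same currency with an overlapping
-- validity range is rejected, like a duplicate in a unique index.
-- btree_gist provides = operator support for plain columns under gist.
create extension if not exists btree_gist;

create table rates
(
    currency text not null,
    validity daterange not null,
    rate     numeric not null,
    exclude using gist (currency with =, validity with &&)
);
```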
Okay, maybe you don't need to do it, actually. The really important thing here is that any optimization is going to implement a trade-off. If everything were easy, you wouldn't call it optimization. So maybe not doing it is going to make your life easy; that's the best way. Each time you try to optimize something, you're making your life harder. So, you know, is it worth it? Do you have to do it? How do we know if we have to do it or not? That's the classic quote from Knuth, right? Everybody has read, well, only this part usually: premature optimization is the root of all evil. But out of context it doesn't mean what it actually means, so maybe we should read it in full. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. So that's about code, but it's exactly the same thing with your database schema. If you make some trade-offs without thinking about them, you will have duplicates in production and you won't know why; you will have an end date that is before the start date and you won't know why; you will have many problems with your data in production and you won't know why. So please don't do it. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. So, how do you know it's the critical 3%?
It's because you've been wrong before, and you know it's going to be there. If it was never a problem in production before, it's not in the critical 3%, so be sure to keep it normalized. But when you know you have a problem, then it's in the 3%. If you're not an expert yet, it will burn in production and you will have to fix it, and after you have fixed it in production, now you're one stage up on the expertise scale, because you've done it wrong once. Now you're getting to be an expert soon. So you know it's in the 3% because you've done it before, and if you've never done it before, just don't do it yet. So let's see some optimization techniques around schema modeling in PostgreSQL. The classic optimization technique is caching. Basically, a cache is a trade-off between CPU and memory; it's always been that: either you recompute the value, or you have it already in memory, so you read it from memory and you're done. If you follow the normalization rules that we saw before, you will never have duplicate information anywhere in your database, which means that maybe you have to recompute many things over time. So let's see how to cache data. And as soon as you cache data, you know that you need to implement cache invalidation, right? You know the two most difficult things in computer science: naming, cache invalidation, and off-by-one errors. Yes. So cache invalidation is on the list. So how do you do that in PostgreSQL? There is an example from a Formula 1 data set that is available as open data. It's all the races from the whole history of Formula 1; you can easily have that in your database to play around with. It's an okay schema to work with. If you find it, you will see that the schema is not normalized at all, so it would be a very good exercise, if you want to train, to pick the schema and do the work of normalizing it. I didn't, for this talk.
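For reference, a query over that Formula 1 data set in the spirit of the talk might look like this; the table and column names are assumptions about the open data set, not the actual slide, and the points threshold is arbitrary:

```sql
-- Top drivers for one season, with a season total thrown in by
-- GROUPING SETS (the total row has a null driver).
-- Run from psql; :season is a psql variable.
\set season 2017

  select drivers.surname as driver,
         sum(results.points) as points
    from results
         join races using (raceid)
         join drivers using (driverid)
   where races.year = :season
group by grouping sets ((drivers.surname), ())
  having sum(results.points) > 150
order by points desc;
```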
So it's the schema as found, and this query, as you see, is grouping by... grouping sets. You know grouping sets? Who knows about grouping sets? The rest of you: you don't know SQL. Please get some training, or read the documentation, because developers come to me and say: "SQL? It's too poor a tool to do anything I really need, so I have to resort to Python, or maybe Java, or C, or whatever." If you say that and you don't know grouping sets, you're missing big. SQL is really nice.

Basically, we're going to get the top drivers per Formula One season. How many of them? I don't know, but we are going to select only those who have more than this number of points. It's arbitrary: I picked it because, when I did that for another presentation, the result set would fit on my slide. So why not? But maybe you have business rules that are more important than that.

So that's a query that you can run. And see the trick here: you can actually do that in psql, it's not just for the slide. You can have a .sql file and then interactively change the season and run the query again, and it will pick up the new value. You can actually use variables in psql, and in your SQL scripts. That's very easy to do, so if you didn't know that, maybe have a look at it.

Now let's say that query is on the dashboard people see when they log in. Each time someone logs in on their nice dashboard for Formula One races, they get a nicely updated value for the current season, or something like that. So you are recomputing this thing for every login. Maybe you don't want to do that; maybe you want to cache the information. So you cache it by creating a view. I do that in two steps; some people do it in one step, but I prefer it that way.
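The query itself isn't reproduced in the transcript. A sketch of what it could look like, assuming the table and column names of the open Formula One data set (results, races, drivers, constructors) and treating the points threshold as the arbitrary value mentioned:

```sql
-- Assumed schema: the Ergast-style open Formula One data set.
-- :season is a psql variable, so the same .sql file can be re-run
-- interactively for different seasons.
\set season 2017

  select drivers.surname     as driver,
         constructors.name   as constructor,
         sum(points)         as points
    from results
         join races using (raceid)
         left join drivers using (driverid)
         left join constructors using (constructorid)
   where races.year = :season
group by grouping sets (drivers.surname, constructors.name)
  having sum(points) > 150            -- arbitrary, sized to fit a slide
order by drivers.surname is not null, points desc;
```

With GROUPING SETS, a single scan produces both per-driver and per-constructor totals; the columns not part of the current grouping set come back as NULL.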
What I do is I create a view, and then I create a materialized view on top of it, because then the definition of the materialized one is just "select star" from the view. That's easy. And then you can still use the plain view, and you can compare the two easily. Was it beneficial to use the materialized view rather than the view, or the other way around? Well, I can easily compare, because I have both of them and I can use them.

Who knows about LEFT JOIN LATERAL? Yeah, okay. The rest of you: please try to learn about SQL, it's very important. This thing is very useful when you need it. When you don't know it exists, you're like, "I need to write some more code"; when you know it exists, that's maybe two thousand lines of Python code saved right here. So maybe that's a trade-off you're happy with, as soon as you know it exists. Sorry, I don't have time to explain it now.

Anyway, now we have a materialized view and we can use it. We can even index it, so it's very fast now. Displaying the dashboard is very fast; every user is happy. But sometimes there is a new race and you need to update the results, because the cache is invalid as soon as there is a new race. What do you do then? Well, it's a single command: you refresh the materialized view. That's it. So implementing a cache in PostgreSQL is really easy.
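In PostgreSQL syntax, the two-step caching just described could look like the following sketch (the view and schema names are illustrative, not from the talk's slides):

```sql
-- Step 1: a regular view holding the expensive dashboard query
-- (a trimmed-down stand-in here; imagine the full GROUPING SETS query).
create view season_points as
    select races.year as season, driverid, sum(points) as points
      from results
           join races using (raceid)
  group by races.year, driverid;

-- Step 2: a materialized view whose definition is just SELECT * FROM
-- the view, so the two stay trivially comparable.
create schema if not exists cache;

create materialized view cache.season_points as
  select * from season_points;

-- Unlike a plain view, the cache can be indexed:
create index on cache.season_points (season);

-- Cache invalidation, when a new race lands, is one command:
refresh materialized view cache.season_points;
```

Because both relations exist side by side, timing `select * from season_points` against `select * from cache.season_points` tells you directly what the cache buys you.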
The only part we are missing is integration into the app: in the application you would query the materialized view rather than the original one. But this is enough to integrate, right? You just need to think about it.

Other things you can do to denormalize a schema: some people want to have audit trails, which means you want the history of all the data that ever went into your database, even when the schema changes. Because there is no application where the schema doesn't change. Change is part of the world, right? The only things that don't change are dead. So if your application never has any schema updates, it means that basically it's dead. I'm sorry; some applications do get retired, and that's okay, but maybe it's not your intention. You will always have new business cases, new ways of doing things, new products to give your users. It's alive, so the schema will change. Which means that when the schema changes, maybe the old history is not compatible with the new schema, and you need to denormalize to be able to handle the history of the schema.

There is more than one way to do it. One way that I like is using JSON as the archive format: because the data is JSON, you can put any kind of schema into it. It doesn't care, because JSON is schemaless. The downside is that the information is repeated everywhere: every JSON entry will have the same set of keys with a different set of values. So it's really expensive on storage, it's hard to search, and so on. But some people are happy with MongoDB anyway.
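As an illustration of the JSON archive idea, one possible shape for such a table (the names here are my own, not from the talk):

```sql
-- Each archived row keeps its whole payload as jsonb, so later schema
-- changes in the live tables never invalidate the stored history.
create table archive (
    id          bigint generated always as identity primary key,
    archived_at timestamptz not null default now(),
    relation    text  not null,   -- which table the row came from
    payload     jsonb not null    -- the row itself; keys repeat per entry
);

-- Archiving rows is then just a matter of converting them to jsonb:
insert into archive (relation, payload)
     select 'drivers', to_jsonb(drivers)
       from drivers;
```

This is exactly the trade-off the talk describes: every payload repeats its keys, so storage is expensive and searching is harder, but any past schema fits.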
I don't understand why. But you can take the same kind of approach here: it's schemaless, so you have zero guarantees. But it also means that whatever happened in the past in your history is accepted. Let's say you hadn't yet implemented a new check constraint between a start date and an end date, and now you have, and some of the history is no longer compatible with your schema. JSON doesn't care; it will just accept it, fine, because it doesn't know what's in the JSON field anyway. I've found it useful to do that more than once, so maybe you want to do it too.

Another trick is validity periods, which you can implement with date ranges. Take, for example, exchange rates for converting money: the rates, you know, are constantly changing, but if you're doing any finance or accounting, you need the rate at a given date. What you do is accept different values of a rate for a single currency, but with non-overlapping validity periods. Which means you can then search the data like this: how much was the euro worth at this date?

This operator, @>, looks strange when you don't know how to read it, so I'm going to tell you, and then it doesn't look strange anymore: it reads "contains". The validity period contains this date. By the way, using the data type name just before a literal string is called a decorated literal in PostgreSQL. The other way to do it is colon colon date, and that is a cast. You get the same result, but it's a little different.

The thing is, because we said we have this very strong guarantee, we know that we will have only one record: given a single date, there is no way that we have more than one, because we refused overlapping entries. So for a single date we have the strong guarantee that there will be only one value in our database, right?
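A minimal sketch of that technique, with an exclusion constraint enforcing the non-overlap guarantee (table and column names are illustrative; the btree_gist extension is needed so the equality on currency can live in the same GiST constraint as the range overlap):

```sql
create extension if not exists btree_gist;

create table rates (
    currency text      not null,
    validity daterange not null,
    rate     numeric   not null,
    -- No two rows for the same currency may have overlapping validity:
    exclude using gist (currency with =, validity with &&)
);

-- "How much was the euro worth at this date?"
-- @> reads "contains": the validity period contains the date.
-- date '2017-05-01' is a decorated literal...
select rate
  from rates
 where currency = 'EUR'
   and validity @> date '2017-05-01';

-- ...and the cast form gives the same result:
select rate
  from rates
 where currency = 'EUR'
   and validity @> '2017-05-01'::date;
```

The exclusion constraint is what turns "we refuse overlapping entries" from a convention into a guarantee the database enforces.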
So now you can use that in your application, and you can even use it as a join condition. You can join between the rates table and, say, the history of purchases, or the history of your invoicing, and so on, and match the invoice with the rate at the date of the invoice. You can join on this. Okay, that's pretty nice to have.

So the denormalization tools in PostgreSQL: a lot of them are data types. Composite types, arrays, JSON. JSON in PostgreSQL is spelled a little strangely: it's spelled with a "b" at the end, jsonb. Don't worry about it, just remember to use that one. Domains, too. Other tricks are things like hstore, ltree, intarray, HLL. If you don't know about them, there was a talk this morning from Craig that went through most of them. If you were there, you know about them; if you weren't, there is material on the internet to help you through it, and now that you know the names, you can actually find it, so it's easy.

Another trick to denormalize your data is partitioning. You need to be really careful when you do partitioning, because, especially before PostgreSQL 11, the trade-off was that you lose indexing, you lose primary keys, you lose conflict handling, and you cannot update keys anymore.
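The join on a rate's validity period mentioned above could be sketched like this, with a hypothetical invoice table alongside the rates table:

```sql
-- Match each invoice with the exchange rate that was valid on the
-- invoice date. Thanks to the non-overlap guarantee on rates.validity,
-- this join matches at most one rate row per invoice.
select invoice.id,
       invoice.amount,
       rates.rate,
       invoice.amount * rates.rate as amount_converted
  from invoice
       join rates
         on rates.currency = invoice.currency
        and rates.validity @> invoice.invoice_date;
```

Using the containment operator directly in the join condition is what makes this pattern so convenient: no window functions, no correlated subquery to find "the rate in effect at that date".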
That was before. Now you can do almost everything; the only trick left is with foreign keys, which only work one way, but I'm told that maybe they will work both ways in the next release. Yes, two of them, because you are actually implementing the feature. Thank you! Sorry, I love that.

So partitioning used to be a very important trade-off to make in the schema design, and nowadays it's less and less so, because once we have full support for everything you need to implement a normalized schema, it's not so much a trade-off anymore. But to this day, still think of partitioning as a trade-off against the ability to implement a proper schema on top of the data. Long story short: try not to implement it until you need it. Except if you know you're in the 3% of cases where, if you don't implement it, you will be in trouble in production. But by now you know that, right? So maybe you need it; maybe you don't know you need it yet. If you don't know you need it yet, don't do it. And if you do need to do it, PostgreSQL is very good at it, so you can do it.

Some people also want to compare with "not only SQL" solutions, and my point is that PostgreSQL is SQL and also part of the "not only SQL" movement, as I'm going to show now. They have NoSQL; we have YeSQL. It's way better.

Another thing is that you can go schemaless with jsonb, as we said before. So I loaded the whole set of Magic: The Gathering, the card game; some of you might have played it before. You can have all of the cards, because the data set is available as JSON, so I just loaded it as it was. Usually what you do is load the data set as it is, and then transform and normalize it in SQL, because the best way to handle data and transform it into a new representation is the SQL query language. You might have heard about it; SQL is really good at that.
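Assuming the cards were loaded as-is into a table with a single jsonb column (I'm calling them magic.cards and data here; the key names follow the public MTGJSON data set and may differ between versions), querying that schemaless data looks like ordinary SQL:

```sql
-- jsonb containment: find every card whose document contains this
-- partial document, i.e. green enchantments.
select data ->> 'name'     as name,
       data ->> 'manaCost' as mana_cost
  from magic.cards
 where data @> '{"type": "Enchantment", "colors": ["Green"]}';
```

Note that it's the same @> "contains" operator as with ranges, just working on a different data type.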
First you load the data, and then you process it. Some people implement ETL: extract, transform, load. I usually do ELT: extract, load, and then transform, because once it's in PostgreSQL, it's way easier to handle.

Anyway, here you see the containment operator that we saw before; it's back, but now it's working with JSON, because PostgreSQL is an object-oriented RDBMS. The way I see it, that means this operator knows how to work with many different data types, like you would do in C++ or Java, and you can write pretty simple SQL that is fully optimized, in this case for JSON. It looks like a normal SQL query, and it actually implements some very smart ways to handle JSON.

NoSQL systems are also usually good at implementing durability trade-offs. In PostgreSQL, you can set synchronous_commit to different values. What people often don't realize is that synchronous_commit is a per-transaction setting. So you can have, in the same PostgreSQL system, in production, at runtime, some workload that is critical and uses remote_apply, which means we only tell the user it's okay once we know the commit has been applied to a standby replica; and you can have other transactions that you don't care so much about. Maybe it's the web session handling, and you store sessions in the database because it's the start of the project and you don't care about the infrastructure yet. If you lose the database, of course, you lose the web sessions; who cares, right? So for those, it's off.

You can even do that dynamically in your application. Here, it's a trigger; the trigger is written on the slide. It's a trigger that says: if the user is spending more than so much money, which is set in a GUC, a PostgreSQL configuration setting
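The per-transaction durability trade-off described above can be sketched as follows; SET LOCAL only affects the current transaction, so critical and throw-away workloads coexist in the same cluster:

```sql
-- Critical write: only report success once a synchronous standby
-- has applied the commit.
begin;
set local synchronous_commit to remote_apply;
-- ... money-moving statements here ...
commit;

-- Throw-away web-session write: don't even wait for the local
-- WAL flush before returning to the client.
begin;
set local synchronous_commit to off;
-- ... session-state statements here ...
commit;
```

Turning synchronous_commit off never risks data corruption; the trade-off is only that the very last transactions may be lost in a crash, which is exactly the kind of deal NoSQL systems offer, except here you choose it per transaction.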