Hi, everyone. It's time to start. My name is Ilya Kosmodemiansky, I work for Data Egret. Today I will present a talk about PostgreSQL worst practices. I think it's a good talk for right after lunch, because you'll recognize some of these practices and maybe have some fun with that. It should not be as boring as such talks usually can be.

So why worst practices? Most of all because best practices are just very, very boring. Never follow best practices — you can read books full of such stupid things — because worst practices can really help you screw things up. You can repeat them one time, two times, and every time the result will be the best. Besides that, we who help you with PostgreSQL are nice people: try to make us happy, and we will be happy and thankful.

So how does this work? Actually, I do not make this up. It's all things I experienced in my life as a support engineer and as a consultant working with databases. I just have lots of examples — to be honest, more than 100 now and counting; I keep adding things while working with PostgreSQL. Then right before the talk — actually between lunch and this talk today — I run a script which shuffles everything and prepares a talk of around 20 worst practices for me. To be completely honest, there are some worst practices I like a lot, so I just pin those into the presentation. Please forgive me; at least I honestly admit that I do that.

So let's see on an example how it works. The example is very simple: do not use indexes. That's a good example of a worst practice, and an easy one to test. Sequential scan versus index scan — for God's sake, why were indexes even invented? It's a really stupid idea. You can just insert ten rows in your non-production environment, on your laptop, and see that my thesis was correct: a sequential scan and an index scan are basically the same. Most likely you will need to hint the optimizer to do the index scan in that case, but who cares — the optimizer is a stupid thing as well. And besides, nobody deals with more than ten rows in production. Just do that.

The first one: run as many count(*) queries — or count(1) if you are smart — as you can, in a highly concurrent environment with thousands of application connections running simultaneously. That's okay. After each refresh of the website, your user will see some very meaningful figure, and it will change after the next refresh. Just do that, because it's a fast query: PostgreSQL never has to check the visibility of each exact row it counts, it just does it flawlessly and without any problem. Actually, you can cheat here and use a best practice: you can estimate the number of rows from the catalog. But never do that, because with lots of count queries, you can call in some PostgreSQL consultancy and people will improve the behavior of your application within seconds. You could have saved on consultancy that way, actually.
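Incidentally, a minimal sketch of that catalog "cheat" — estimating the row count instead of counting; 'my_table' is a placeholder name, and reltuples is an estimate maintained by VACUUM and ANALYZE, not an exact number:

    -- A rough row-count estimate from the catalog.
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'my_table'
      AND relkind = 'r';   -- ordinary tables only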
Another one: use object-relational mapping. All databases basically share the same syntax — you probably know that already. You must write database-independent code, because most likely you are moving from one database technology to another twice a day, or maybe even more often, I don't know. And there are no benefits in database-specific syntax, because the SQL standard is pretty much the same and inside, all databases are pretty much the same, so never use it. And besides that, all modern ORMs are quite complicated technologies that are good for you: take a look at Hibernate, for example. People have been developing it for more than ten years. It is really complicated. You can improve your skills — use an ORM.

Move your joins to your application. Just select everything from each table and pull it into your application. Use your favorite programming language: just think, okay, my favorite programming language is Python or Perl, I should do this inside the application. Then join the data at the application level. Practically, it's very convenient — you use your favorite programming language. Now you only need to implement a few things. One of those things is the join algorithms: nested loop, merge join, and so on. And then you need to implement an optimizer to choose the proper join algorithm. That saves you a lot of time, and it's fun anyway. So do that — and many people actually do. I would say it's not quite a worst practice: so many people do it that it's practically a best practice.

Be trendy — be schemaless. Because this boring schema design from back in the 70s is an annoying thing. You just don't need to design a schema; you only need one table. To be honest, who here has never done that? I can guess. The jsonb data type is pretty effective in Postgres, so basically you can search this quasi-table just like a normal table. You can stuff everything into that JSON, throw thousands of parallel TPS at it, and that's it — it works; JSON is very effective. We call that a very universal approach: basically, for each and every application you can have the same schema.

But, you know, at some point people try to figure out what could be better, and actually I have another worst practice which can help in such cases. Be agile, be universal: use entity-attribute-value. Maybe you need to improve the previous schema using only three tables: entity, attribute, and value. That's very common in enterprise and regulated industries, for example. At some point you need to add a fourth table for the attribute type, because you figure out that not all attributes are equal — there are varchars, ints, things like that. It's very universal: you basically don't need to change your model and application to work with it. But at some point it starts slowing down. At that point you should do a very important thing: you should call this thing "kernel" or "core" or "main system" or something like that, and then add more than 1,000 denormalization tables to make it finally work somehow. But that's not the end of the story, because after all those necessary steps are done, people usually try to add something more complicated, like value versioning — because it's much easier to add a version than to delete something from this agile schema, at least to delete it fast. So be agile, use entity-attribute-value. It's the best solution for achieving a high-performance database application.

Try to create as many indexes as you can. Some of my worst practices are controversial, but indexes consume no disk space — it's true, you know. Indexes do not consume any shared memory. Indexes do not have any kind of overhead. Just add 100 indexes to one table, point pgbench at it with some update workload, and see the results — you will see that I'm telling you the complete truth. And the optimizer will definitely choose your index. There are actually some controversial practices here, like querying the catalog to see how many times your indexes were actually used (see the sketch below), but that's not the topic of this talk, so I do not advise you to do that. Just keep calm and create more indexes.
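For reference, a minimal sketch of that controversial catalog query — per-index usage counters from the statistics views (pg_stat_user_indexes is a standard view; the ordering is just one way to spot dead weight):

    -- Indexes that are never (or rarely) scanned are candidates for dropping.
    SELECT schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;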
Always keep your old time-series data. If you're writing something like monitoring or log analysis, all that data should stay in your database forever, because, you know, "we want analytics on everything" is a common business requirement. That's a very typical thing. Basically, your aggregates will never change, and you could move all that historical data to some archive database, in case they really do change once in ten years or so. But never do that, because keeping everything has benefits. You will always know where to dig when you run out of disk space, because this is an easy way to run out of disk space. And you know, there is another, very strong benefit: you can always call it big data. Everyone enjoys big data; it's not smart today to have small data. Try going to a conference on highly available websites with a talk about small data — you will probably never be accepted. Actually, to make things even worse, you can "solve" this with partitioning technology. You just need to choose how to partition smartly: one partition per second, that's a good choice (a sketch follows below). You will have tens of thousands of partitions a day, the optimizer will be happy, your DBA will be happy — everyone will be happy. And besides, you know, there is no easy big data. Big data should be hard to handle; otherwise it is not big data. So do that.

Turn autovacuum off. You see, I know more than one way to achieve big data. It's quite easy to have several gigabytes of meaningful data in a table whose size is one terabyte or so — depending on how fast you run into wraparound. But do that: just switch autovacuum off. It's a really stupid auxiliary process, and actually a very annoying one, because it runs for days and interrupts many things in the database. Just switch it off. I don't know why it's on by default now; for years it was off by default, and for years there was no autovacuum at all. Just do that.

Reinvent Slony. Who knows what Slony is? Who likes to maintain Slony? I know one man here who can probably say that. But you know, PostgreSQL has had replication in core for quite a nice number of years now: since version 9.0 it's actually in core, and since 9.1 it's usable. Okay — so you need to reinvent Slony. Write your own script in Python which takes logs from one machine, puts them on another, and controls the apply. Or use a trigger-based one — write your own, for God's sake — because it will let you run into all the problems PostgreSQL had since the introduction of Slony. Replication should not be easy; replication should be very complicated. So roll your own, and that will be good.

Keep master and slave on different hardware. Because, you know, there is this stupid idea that you need a master and a slave just to make failover fast. You can take that in a very different way: if you have different hardware, you save on hardware. Your slave machine is much less expensive, for example — it doesn't have as much memory, it has cheaper disks, and so on. That's a really good cost reduction. And then you fail over, and everyone will understand that your DBA team is very important; everyone will remember your phone number. Always try to achieve more than one goal at once. To make things even worse, you can actually leave the config parameters on the slave untouched. Then you fail over back, and you become a hero of your company again, because everyone will again remember your phone number.
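Here is the partition-per-second sketch promised above, a minimal example using declarative partitioning (PostgreSQL 10+); the table, column, and timestamps are placeholders:

    -- Parent table, partitioned by time.
    CREATE TABLE events (
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    -- One partition per second: 86,400 of these per day.
    CREATE TABLE events_20190319_120000 PARTITION OF events
        FOR VALUES FROM ('2019-03-19 12:00:00') TO ('2019-03-19 12:00:01');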
Put a synchronous replica in a remote data center. People will explain to you that we do these things for maximum high availability. Indeed, if you have a synchronous replica in a different data center, then if something happens not only to the master but to the slave, your whole cluster is stuck — so basically that's a good idea. But as always, I have a good recipe for how to improve things: just put it on another continent. A synchronous replica overseas will always work as expected. Just do that; don't ask me why.

Never use foreign keys — roll your own checks instead. Consistency control at the application level works as expected every time, especially in our era of microservices. You will never, ever get your data inconsistent using constraint checks in the code. And another argument: a database like Postgres has been truly ACID for more than 20 years in production, which is a bulletproof way to maintain consistency — a good argument not to use it, on purpose at least. So never use foreign keys.

Always use the text type for all columns. It's always nice to implement validation — for IPs, or for dates and timestamps — using regular expressions or something like that. And it actually lets you find out what happens in your database by digging through the logs, trying to figure out who did it and from which application you got dates like these. I can guess that one was from the other application, yeah. Actually, there are lots of ways to screw things up if you're not using proper data types.

Always use an "improved" fork of Postgres, or make your own. Because, you know, Postgres is not a perfect database, and you're smart enough to make it smarter than all those stupid people who write the code. All this annoying MVCC behavior, 32-bit transaction IDs in the 21st century, all these stupid constraints, autovacuum, and so on — that's only because the community is stupid; the hackers are old and quite lazy, they just don't want to change things. So hack it, the hard way. Do not bother submitting any of your changes to the community — maintain them by yourself. It will actually be easy to merge them into the main tree afterwards if you want, because you will only need to deal with the changes: the statistics say about 2,000 lines of code change every day, so if you maintain your fork for half a year or a year, you can easily calculate how many lines you will need to merge after that. For some forks of Postgres, that takes forever — you know, there are several forks based on version 8 of Postgres still trying to merge back into the mainstream. So always use an "improved" Postgres. It's a good idea.

Postgres, like every multi-version concurrency control database, loves long transactions. So it's quite a good idea not just to use long transactions, but, for example, to call an external service from inside a transaction — sending email, or talking over a web service to some vendor, or something like that. You can probably say: okay, you are kidding; it's really good practice, because we need to transactionally send emails or exchange information with a JMS queue or something like that. I might even agree, but with a small comment: it would be arguable if most developers were familiar with the word "timeout". Most likely it works quite the other way: you can just start a transaction, go away for the weekend, and wait for wraparound. And besides, it gives you a clue that, okay, we need to do something about those 32-bit transaction IDs.
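As an antidote to the long-transaction advice above, a minimal sketch of how one might spot such transactions; these are standard pg_stat_activity columns, and the LIMIT is arbitrary:

    -- The longest-open transactions currently on the server.
    SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start
    LIMIT 10;

Since 9.6 there is also the idle_in_transaction_session_timeout setting, which terminates sessions that sit idle inside an open transaction.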
Okay: never read the code you just wrote. I'll just put it here — you write the code, then never read it again. You would never write anything stupid into your code anyway, and an even worse idea is to test it afterwards. And remember, this is not ORM-generated code; a real human wrote it. So just never read your code more than once.

Have a problem with your PostgreSQL installation? I have a good recipe for you: move those problems into a container. Because, you know, you'll be right on trend — it's always good to put something very persistent and stable into something very non-persistent and very unstable. It will definitely solve lots of problems. Besides that, every worst practice has some additional benefits, I would say: now your problems are both outside the container and inside the container. And besides, you will probably learn how to automate your deployment and so on — and you can learn that really well, because putting databases into containers most likely requires lots of automation skills.

Not only Slony can be reinvented. Need to convert a timestamp to Unix time? Write a stored procedure in C with a case statement which checks the timestamp and converts it. You probably think I'm kidding — I saw that in production more than once. For many people it's much faster to implement something than to read the documentation. That's a typical situation, and not only for timestamps.

Need a message queue? Write it. Postgres has lots of things which can shoot you in the leg in that direction quite effectively — for example, SELECT FOR UPDATE NOWAIT. Yes, it's possible to build a message queue on top of a table using that (there is a sketch below), and most likely it will even work. But how much time will you spend to achieve that? Or advisory locks, or something like that. And besides, you need to tune autovacuum carefully on that table, and so on. But just do it.

Don't want to use an ORM? Write your own. You probably figured out that ORMs are a bad idea, because I mentioned that on a worst-practice slide earlier. So write your own, using stored procedures: just generate SQL from stored procedures. The optimizer and the parser will be happy, and it will be very easy to figure out where the problem is. Just do that.

And never, never ever use exceptions. Because, you know, the documentation says they are slow — there is a box in the documentation that says that. That's really strange: as a DBA, I usually think that developers do not read the documentation, but for some reason everyone has read this particular box in the PostgreSQL documentation. I don't know why. Maybe because writing errors to the log instead is such a good idea: everyone constantly pays attention to the logs, so an error will never be lost in there. And anyway, who cares about errors? They are an annoying thing, and you would never want to expose them to customers and so on. So just don't use exceptions.

Application runs out of connections? That's one of my favorites, actually. Just set max_connections to 1,000, to 10,000, or to whatever number you can guess. You know, one of the reasons for this is that today hardware is cheap, and servers with 1,000 CPUs are not very expensive at all. And of course PostgreSQL will work effectively on an eight-core machine with 1,000 connections. Besides, it doesn't solve your application's problem — it keeps trying to connect too many times, not closing transactions, and so on and so on — but that's a really annoying thing you don't need to dig into. And who said that PostgreSQL backends have some overhead, that forking processes and semaphores cost something? That's rubbish.
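Here is the table-based queue sketch promised above: a minimal consumer using SELECT FOR UPDATE. Note that SKIP LOCKED (9.5+) is the friendlier cousin of the NOWAIT variant mentioned in the talk, which errors out instead of skipping; the jobs table and its columns are hypothetical:

    -- Grab one unprocessed job, skipping rows other workers hold locked.
    BEGIN;
    SELECT id, payload
    FROM jobs
    WHERE done = false
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1;
    -- ...process the job in the application, then mark it and commit:
    UPDATE jobs SET done = true WHERE id = 42;  -- id from the SELECT above
    COMMIT;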
And never, ever use PgBouncer, because that would probably be a best practice. Instead of PgBouncer, you should use pgpool. Pooling connections with pgpool is quite easy — who here has tried it? And who uses PgBouncer, and who uses 1,000 connections? Usually nobody admits it. Pooling connections with pgpool is easy; it's almost like writing code in the Emacs operating system. I know people who do that and enjoy it, but most likely that's because I'm a vi person. Simple config, really easy, no documentation needed, useful features: basically, it's good to have replication, connection pooling, and automatic failover all built into one tool. You know, there is a beautiful creature, the duck. A duck can fly, swim, and run — and, compared to more specialized species, it performs all three things badly. That's the case with pgpool, from my point of view. But I should never say that, because, you know, consultants will be happy. I'm really pleased when people use pgpool instead of PgBouncer, because you can always charge something more for that.

Always start tuning Postgres from, for example, the optimizer options in postgresql.conf. I've seen it many times: Postgres is misconfigured badly, and people try to find a big red button or a silver bullet to tune things, almost forgetting about shared_buffers first, the stupid checkpoint settings, the number of connections, and so on. You just need to start with something really "helpful": the genetic query optimizer is a good candidate for a silver bullet. If you change a couple of those parameters, everything will be good. Just forget about shared_buffers and checkpoints; try to change something in the configuration which looks complicated and "helps" a lot. At least you can spend lots of hours doing that.

And probably the last worst practice for today: have you heard about a new cool feature? Immediately use it in production. It can only be better if you also skip testing and staging and so on, and just put it straight into production. Don't know how to do that? I'll teach you — right today, you can do it in a couple of hours. But rush, everything should be very fast. Go today to "MVCC Unmasked" by Bruce Momjian — today, 3:15 p.m., Liberty One. Listen and learn about xmin and xmax. Go to Bruce and ask what xmin and xmax are designed for — and then don't listen to his answer; that's the important moment. Use them in your application logic immediately. You know, these stupid transaction IDs — maybe you remember I said they are 32 bits, so if you have a lot of transactions, they just keep rolling over and over. If you use them for some version control inside your application, surely you will never run into the wraparound to a new round of transaction numbers, and nothing bad will happen. So that's a good checklist for how to use a new feature immediately. When I first listened to Bruce's talk about this, back in 2011, I thought: okay, now I should wait for this to come from my customers. It took about two weeks before I got the first question about how to use it in practice.
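For the curious, a minimal sketch of what makes this so temptingly easy — xmin and xmax are ordinary system columns you can select (the table name is a placeholder); just remember they are 32-bit counters that wrap around:

    -- xmin: the transaction that created this row version;
    -- xmax: the transaction that deleted or locked it (0 if none).
    SELECT xmin, xmax, * FROM my_table LIMIT 5;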
But anyway, don't forget that this was a worst-practices talk, because I know people who just took a photo of my slides and put it on Twitter — and then somebody spills his coffee, because he decided it was best practices. Don't forget to send me your own favorites if you want; I will try to include them. And feel free to take this talk and present it somewhere at a user group or something like that. I know people from PGDay France localized it, added some very specific French bad practices, and presented it at PGDay France — so feel free to do the same. And if you have any questions or suggestions, we have plenty of time for that. Thank you.