And when you run the DHIS2 deploy-war command and get an error, something you can do regularly anyway is just do a git pull; that will bring down the latest version, then run the install scripts again and you'll be up to date. So Gerald, that's what you need to do, and then hopefully you can report back that your problem has been fixed. That was my mistake for trying to work on things while I was offline yesterday afternoon, without the chance to test it properly.

Okay. So what we are going to do now is look at the database. Everybody knows that underlying the DHIS2 web application is a Postgres database, and Postgres is a complicated piece of software; it's the topic of a workshop in itself. We did that once. I don't know if anybody here is a member of FOSSFA, the Free and Open Source Software Foundation for Africa, but we ran a workshop in Accra, probably about 12 years ago now, and we managed to get one of the Postgres lead developers to come and give a two or three day set of sessions just on Postgres. It was fascinating. We should see if we can do something similar, maybe a webinar with a real Postgres expert. I'm not a real Postgres expert; I know a bit, I've used it for a good number of years, and I'm going to give you the benefit of what I know of good practice. I'm always cautious about using the phrase "best practice", because I don't claim to have the best practice, just the best that I know.

Right, what are the considerations around Postgres? Fairly importantly, the database is probably the most significant part of your system from a performance perspective. If your database is slow, everything is going to be slow; if your database is fast, then hopefully everything will be fast. It's also where your data lives, so it needs to be carefully managed and looked after, particularly proper handling of backup files and the like. It's something you need to become more familiar with than probably any other part of your system: you need to be comfortable dealing with your database, executing queries on it, measuring its performance, backing it up, restoring it.

After you've installed Postgres there is, by default, a huge library of tunable parameters which you can then set about tuning. What I've done in my install is create a kind of template, which I'll show you a little later, with some fairly easy-to-follow tips on how to get reasonable Postgres settings without much effort. What you generally do after that is sit and monitor it: watch it under full load, watch it over a week, over a month, and make small adjustments as you go along.

But before you even start tuning Postgres, and I'd say before you even start trying to install DHIS2, it's really important to look at what kind of hardware you're putting it on, be it real or virtual. So, before you install: obviously disk, CPU and RAM are all important resources, but from the database perspective you mostly want to concern yourself with what kind of disk you're putting it on. A few thoughts around disks. In my experience it's the IO latency that is the most significant factor in getting good database performance.
Over the years, looking at the environments people are putting their Postgres on, I've found that by doing a very simple latency test I can get a reasonably good idea of how well that database is going to perform. Latency is different from throughput. You can get an environment which promises you many gigabytes per second of potential throughput, but usually when you're accessing your database you're not moving gigabytes of files from one place to another; throughput is really a measure of the best sustained transfer rate you can get. VMware environments will often tell you how many IOPS you're going to get, and IOPS, again, won't by itself tell you whether you have good latency. Latency is the time it takes before a read or write can even begin. If you think in terms of old mechanical disks, it's the time it takes the disk head to move from where it is to where it needs to be before it can start the read or write operation. That's of course one of the reasons SSDs are so much faster: they don't have a mechanical head that needs to move.

Testing your latency is a really easy thing to do, and if you've got Munin monitoring, it gives you a good latency graph, which is a good one to watch. This little test takes about half a second to run. (Why can't I see it? Because it's full screen. Sorry, this is distracting from the presentation. Anyway, here we go.) Here I am on my Linode. The command is dd — I actually can't remember what dd stands for, disk duplicate or something. All it does is read from an input file of /dev/zero, so a bundle of zeros, and write it to an output file using a very small block size, 512 bytes, and it does that 1000 times. So basically you're measuring how long it takes to do 1000 very small writes. Typically on a Linode you'll get a time well under a second; when the server gets a bit busy you might get closer to a second. That's how long this dd command takes to run.

I was looking at a server last week — I won't tell you where it is, but it was a VMware-provisioned machine — and this same test took 30 seconds. If you've got a disk which takes 30 seconds on a latency test like this, your database is going to perform really, really badly. So it's a good thing to do before you actually install anything, before you even start with LXD: once you've decided what environment you're installing on, particularly for your database — it might be a cloud service, a physical server, or a VMware virtual machine you've been provisioned — run dd on it. And if you're getting a figure that's too high, complain to whoever is provisioning you the disk, ask for your money back, whatever it might be.
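For reference, the latency test I'm describing is roughly this — the output filename is just an example, run it in a directory on the disk you actually want to test, and the oflag=dsync is what forces every little write to hit the disk rather than the cache:

  # time 1000 synchronous 512-byte writes; the elapsed time is dominated by write latency
  dd if=/dev/zero of=ddtest bs=512 count=1000 oflag=dsync
  rm ddtest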
Gerald, I know that in Sierra Leone you're using a particular cloud service provider, which I won't mention the name of, hosted in Dublin, and they've had a lot of problems over the last couple of years — failing disk controllers or something, I think, and people have been seeing very high latency figures. So even if you're getting your disk from a cloud provider, check it, see what kind of disk you're actually getting; you want to see a good low number there. Back to my presentation. A good SSD disk should give you something well under a second; as we saw on the Linode, I'm getting around 0.07 or 0.08. If you run the same test on your laptop, you'll probably get something in the same order if it has an SSD. If your server is giving you figures like one to three seconds, or over five seconds, then you need to stop and think: if your server's disk is slower than your laptop's, is that the right kind of disk to be putting a database on? The answer is no, it's not. A lot of money gets spent on servers and disks; you want to make sure your disk is performing properly before you think about putting Postgres on it.

If you're configuring a physical environment, that's a lot of fun. Unfortunately we don't get to do it so much anymore, and it's more difficult, I guess — a slightly broader range of skills is involved than simply getting a cloud service. If you've got a bundle of disks, put them in RAID 10 for a database; you'll get roughly twice the performance you'd get from RAID 5. It will cost you more, because you only get half the usable capacity in RAID 10, but you get what you pay for. And if you're putting a machine together, it's much better to buy a number of smaller disks, say 500 gigabytes each, than one big one-terabyte disk: more, smaller disks will perform much better than one bigger disk, and you also get some benefits in reliability and redundancy.

Another thing to bear in mind, if you're using LXC containers as I'm going to show you shortly, is that they don't all have to use the same disk. Fast SSDs are expensive, and you might have a combination of slower and faster disks; it's possible to configure your storage pools so that you can tell Postgres to use the fast disk rather than the slow one. If you're on a cloud service, that's probably not something you need to do.

So: run this command before you start, check the latency on your disk, and check it regularly — or have a monitoring tool like Munin that shows you what your latency figures are. Just because you started off getting good results doesn't mean you'll keep getting them. That can be because a disk controller is failing, or it can be that in your national data centre you started off having the whole machine to yourself, but over a period of months it's been over-provisioned with more and more virtual machines and suddenly you're getting much less direct access to the disk.

If you're really interested in Postgres performance, join the Postgres performance mailing list — you can Google it, I don't have the link. That's what I do; if I'm dealing with Postgres databases it makes sense to be on the mailing list. There's all kinds of interesting discussion there, and it's probably where you'll learn the most about becoming a better Postgres administrator. Okay, so that's before you install: make sure you've got a good disk. What other tips can I give you?
I've said that you might have fast disks and slow disks, and LXD is very flexible in that regard. This is a slightly naive but very simple way to make sure your Postgres lands on the quicker disk. I'll show you briefly how you'd do that — I'll describe it first and then actually do it. We work on the assumption that we've got a fast disk mounted at /mnt/fastdisk. We've already seen that there's a storage pool called default; we can create a new storage pool in LXD, call it fastdisk, tell it to use the dir driver, and point its source at that mount. Then, when we create a new Postgres container — say one called fastpg for running Postgres on — we can pass the -s switch at creation time to say: use this particular storage pool rather than the default. That will put your database on a different disk.

That's a very simple way to do it. You can be much more nuanced about it; the proper way, I guess, is to put the Postgres data directory, the write-ahead log and so on onto separate disks, so you get the maximum bandwidth out of each one. But this is a quick and easy way of getting yourself onto a different disk. And there are other reasons than speed for wanting a different disk: it might be bigger, and that's where you want to put your database; or it might be encrypted — we've not talked much about encrypted disks, but if you had an encrypted volume and you wanted Postgres to install itself onto that volume rather than the default, this is the way to do it, with -s.

Okay, let me demo that quickly. Let's have a look at our storage. We see we've just got our default storage pool there; every time I make a container it puts itself in there, and in fact if you go and look inside that directory you'll find the root filesystems of all your containers. We're working with the artificial scenario that we've got a fast disk mounted at /mnt/fastdisk. So I can say lxc storage create, using the dir driver again, with the source of the storage pool being that mount point. I get an error first — I forgot to give it a name — so let's call the pool fastdisk. Now I've made a new storage pool, as easy as that. There we can see I have two storage pools: the default, which as you see is being used by five containers and a couple of images, and my fastdisk, which currently isn't being used by anything. Let's make a new container quickly with lxc launch — I'm doing all of these demos with the Alpine image because it's just much quicker — and specify the storage: please make this container sit on my fastdisk pool. We'll call it fastpg, my fast database container. Everything is exactly the same as before; the only difference is that it uses a different storage pool. Now if I do a storage list, I can see fastdisk is being used by one container.
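To recap the commands from that demo, roughly — assuming the fast disk really is mounted at /mnt/fastdisk, and the Alpine image alias here is just the one I happen to use:

  lxc storage list
  lxc storage create fastdisk dir source=/mnt/fastdisk   # new pool backed by the fast disk
  lxc launch images:alpine/3.12 fastpg -s fastdisk       # container rootfs lands in that pool
  lxc storage list                                       # fastdisk now shows one container using it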
The reason for doing it this way: if you're running on a cloud server, which in most cases I think is becoming the most common way to deploy, you're less likely to have this much control over what kind of disk resources you have and how fast each one is. But certainly if you're working in a physical environment, or a VMware environment, you can make sure your VMware administrator provisions you with a fast disk, which you can test using the dd command. Once you're happy with it, make a separate storage pool for it and put your Postgres on it. It's a bit of a waste to use that expensive disk for things like your Tomcat containers, because they don't need a fast disk — primarily all they do is write logs. Though I have to say, DHIS2 writes far too many logs. We had a complaint recently where DHIS2 was writing about 14 or 15 gigabytes of log per day, which is pretty crazy. That's something we need to work on with the developer team, to cut the logging down to something more reasonable, particularly because people do have to pay for disk resources and these massive log files are wasteful.

Anyway, with the database it's all about the disk, and if you've got a good disk you want to be able to use it. Sometimes you want to move a container to a different disk, and this quite commonly happens with Postgres: you may have been running for a while, then you've made yourself a storage pool like fastdisk, or you've got some new disks, and you want to move a container off the old storage area and onto the new one. That's easy enough to do with the move command. You would stop the Postgres container and move it — you have to give it a different name, so move postgres to, say, postgres1, and specify the storage pool fastdisk. This command can take quite a long time if your database is big, because it is actually copying the data from one disk onto another. Then you just change the name back again, moving postgres1 back to postgres. After that, everything continues to operate as it did before, but your Postgres data is sitting somewhere else. Note that here we're only using the move command to move containers around within the same host; if you've got a number of hosts in some kind of cluster, it's pretty much the same command to move containers from one host to another — you just need to specify the remote host as part of the command.
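Sketching out that move, roughly — this assumes a container named postgres and the fastdisk pool from earlier, and that your LXD version supports the storage flag on move:

  lxc stop postgres
  lxc move postgres postgres1 -s fastdisk   # copies the container onto the new pool
  lxc move postgres1 postgres               # rename it back
  lxc start postgres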
I keep mentioning ZFS and then saying we're not really going to talk about it. Maybe, if there's interest, we'll organize a separate little seminar outside the framework of this academy for people who want to delve into ZFS. It has lots of advantages, particularly for databases. There's a nice presentation on running Postgres on ZFS which goes into a lot of detail on using snapshots and the like; it's a little old, from about 2012 — ZFS itself has been around much longer than that — and one of the things it notes at the end is that ZFS was not yet production-ready on Linux. That was true back in 2012, but it is very much production-ready now, particularly on Ubuntu 20.04. That's the main reason I moved from 18.04 to 20.04, so that we can do things like encryption at rest.

If you have simple enough requirements, stick with the dir driver; and if your disk is poor and you can get a better one somehow, it's easy enough to move your containers from one disk to another.

Okay, besides the disk there's also RAM and CPU, which are important. Postgres likes RAM. It will use whatever you give it, which is a good thing — it won't let it go to waste. And whatever Postgres doesn't use directly still helps, because the operating system will use it to cache the disk. So the more RAM you can give your Postgres, the happier it's going to be. Don't starve it; don't try to run your Postgres on one gigabyte of RAM or the like. It will still run, but it's probably not going to be able to do what you need it to do. And if you have a really slow disk, which is sometimes the case, you can compensate quite a bit by having a lot of RAM: the operating system will cache a lot of the disk activity and you can sometimes get reasonable performance out of it. I think this is the case in South Africa — they had a lot of Postgres servers running on some pretty good, high-quality spinning disks, and if you feed those servers enough RAM they actually perform quite well. Probably not as well as SSD, but a hell of a lot cheaper.

Because Postgres likes its RAM, it's also important to limit it. That seems a bit counter-intuitive, but it's true. The little cloud server I've been using for the demos has 16 gigabytes of RAM; more typically I might have a 32 gigabyte machine. If I put two Tomcat instances on it, and I know those two Tomcats are going to use about 16 gigabytes between them with their different heap settings, then it's quite important that I tell my Postgres not to use all the RAM, because we need to leave some behind for the Tomcats — otherwise they may fail to start because of insufficient RAM. So what I generally do after setting up Postgres on a server is: work out how much RAM you have, decide how much of it you're going to allow Postgres to use for itself, on the understanding that Postgres will be greedy and use whatever you give it, and once you've figured out what you can afford to give it, use the config command to make sure the Postgres container stays within those limits.

Besides RAM, Postgres also uses CPU, and a lot of DHIS2 processes in particular — index generation during analytics, even just handling lots of parallel requests — will use it. Complex queries use CPU for string comparisons, building hash sets, all kinds of things. So obviously the more CPU you have, the better it's going to scale. On a heavily loaded system you might need to limit it, or if you've got 48 CPUs you might decide not to give all 48 to Postgres. As I mentioned earlier this morning, there are two ways to limit CPU: you can hardwire it and tell the container to use particular CPU numbers, or, probably more flexibly, you can set an allowance. What I typically do is not jump into this right away, because it's better to watch the system over a month or two to find out what the CPU requirements of your different containers actually are.
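The container limits I'm talking about are just LXD config settings, along these lines — assuming a container called postgres, with the figures from this example:

  lxc config set postgres limits.memory 16GB           # cap the container's RAM
  lxc config set postgres limits.cpu.allowance 95%     # soft CPU allowance rather than pinning CPUs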
It's a good idea always to set an allowance of something like 95%, just to make sure that no single container can completely hog the system. Over time you might want to tighten that once you've understood better how the load is being shared out. The moral of this little slide, I guess: give Postgres as much RAM as you can afford, and then set a limit so it only uses the amount you've decided to give it. Think about limiting the CPU too, but do the CPU limits gradually, over a period of time, as you understand more about how your system is running.

Okay, tuning. As many of you will already have seen if you've installed DHIS2, there are quite a lot of key parameters you need to set for it to work nicely — things like shared_buffers, work_mem; I can't remember them all, I'll show them to you in a minute. One thing I do encourage you not to do is to go into your postgresql.conf and start hunting for all these parameters and changing them in there. The reason for not doing that is that it becomes really hard to know what you changed and what you didn't. A much neater way is to put all your tuning together in one file. I have this set up by default: under conf.d there's a file called dhispg.conf, and it gets loaded after the main postgresql.conf. The way these parameters work is that the ones set last are the ones that get applied, so whatever is in your postgresql.conf, this file effectively overrides it.

So let's look at that file. Here I am — a quick poke into our Postgres container. This postgresql.conf is the place people generally start poking around in. My advice is to leave that file alone. It's very nicely commented, which is useful — it's good to use it as a reference for what the different parameters mean — but when I'm making changes, I don't make them in there. We have set up, specially for your convenience, a file in conf.d for your DHIS2 Postgres configuration, and in it are the common parameters, the stuff that comes from the main DHIS2 documentation: the things you typically want to set, like max_connections, shared_buffers, work_mem and the various others further down, all of which are explained elsewhere, so I won't go into them in a lot of detail.

Max_connections — what should that be set to? It really depends primarily on how many DHIS2 instances are going to be using the same database server. When you fire up a DHIS2 instance it creates a pool of database connections, using something called C3P0, I believe. Our documentation says the pool size it creates is 40; I think we need to update the documentation, because I've realised somebody has changed it and the default pool size you get when you install DHIS2 is actually 80 connections. There are also some places within DHIS2 where we make sneaky little connections outside of the pool — we really shouldn't, but we do. So if I've got one DHIS2 server, I'd say I need 80 connections plus a little bit of overhead, and I would set max_connections to 100 in that case; 200 for two — basically 100 connections per DHIS2 instance, depending on whether you've changed the size of the pool in your DHIS2 setup.
If I remember I'll go back and show you that pool size, but we're going to look at it in another session anyway, when we look at Tomcat. Okay — if you set max_connections too high it's not going to break anything; max_connections doesn't do anything except set an upper limit. If you set it too low, you will get errors from DHIS2 failing to make connections. If you set it much too high, you're exposing your Postgres to the possibility of having more connections than it can handle, particularly in terms of RAM, and it might fall over.

I've tried to make this as easy as possible. On this particular Postgres I'm going to give the container 16 gigabytes of RAM, limited in the way I showed you earlier. So I know I've got 16 gigabytes to work with, and I'm going to give four of that to shared_buffers. Really, don't make this higher than 25% of RAM. I always used to try to make shared_buffers as big as possible — if I had a load of RAM, I'd use a load of shared buffers — but it turns out (I'm not sure exactly where the tipping point is) that when you've got large amounts of RAM, huge shared buffers become less useful than giving some of that RAM to the operating system to use for cache. So don't make this too big. These are ballpark figures; set it to something, watch it over time, and try tweaking it a little.

Work_mem: this is more or less from our manual — 20 megabytes is given as a reasonable size. The thing about work_mem is that it's used by each of your connections. So you need to think: if you've got 100 connections — in our case 200 — that's 200 connections multiplied by 20 megabytes of work_mem, which means this can potentially use up to four gigabytes of RAM. So we've already potentially allocated eight gig: four to shared_buffers and possibly up to four across the connections. Keep that in the back of your mind as we move along.

Maintenance_work_mem makes a really big difference for things like generating indexes, which happens a lot during analytics generation — think of the big aggregate systems. Make it as big as you can afford; usually you can't afford to make it very big. 512 megabytes is probably okay in this case — I've got 16 gig in total, so I would set it to that to start with. Bear in mind as well that these are settings we're putting together for a system that's meant to be running the DHIS2 application. There are times when you'll use the system for something else, one of which is restoring database backups; I'll talk about that in a subsequent slide.

Effective_cache_size is quite an important parameter for the Postgres query planner. When Postgres is given a query, before it actually executes it, it has to make a plan: how is it going to set about this, how is it going to get stuff off the disk, is it going to use an index? A lot of that plan depends on how much memory it thinks it has available, and with effective_cache_size you're giving it a hint about how much memory it should assume. Here's an approximate formula: your available RAM, 16 gig in our case, take away the maintenance_work_mem, that's half a gig, take away max_connections multiplied by work_mem, that's another four gig. So what am I left with? About eleven and a half.
So, on the understanding that all of those connections are probably not active at the same time, I think something like 12 gig is a good estimate. We give that to the planner to say: when you're making a query plan, assume you're working within about 12 gigabytes of RAM. If you set it wrong — if you say effective_cache_size is 128 gigabytes — it won't break the system, but it will cause the query planner to make really bad plans: it will start off assuming it has 128 gig of RAM, then discover it doesn't, and those plans will work out badly. Similarly, if you set it too small — if we said our effective cache size was only one gig — it's going to make very conservative, inefficient plans because it thinks it only has a little bit available. It's a hint to the planner.

Most of the rest of the file you can leave as it is; there's quite a lot of documentation on this elsewhere. Random_page_cost, as set here, works on the assumption that you've got an SSD. It's another hint to the query planner — how expensive is it to fetch a random page from your disk. If you've got a really slow disk, you want to give it a much higher random_page_cost, say 4 or even 10, so that you're telling the planner in advance that plans involving a lot of disk access are costly.

Okay, let me save my configuration for a moment and go back to the slides. So, yeah, tuning. This approach is, again, opinionated — it's my opinion — but keeping the tuning together in one place in the conf.d file, rather than rooting around in the main postgresql.conf, has some advantages: you can keep copies of different configurations, and you can easily see the set of customizations you have done. As I've said, I've given you a kind of template file which doesn't have much tuning set in it, but it's easy enough to customize to your setup.
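Pulling the figures from this 16 GB example together, the customized override file ends up looking something like this — ballpark values only, and the random_page_cost figure is my assumption of a typical SSD setting rather than something quoted above:

  # conf.d override for a Postgres container limited to 16 GB RAM, serving two DHIS2 instances
  max_connections = 200
  shared_buffers = 4GB            # roughly 25% of RAM
  work_mem = 20MB                 # per connection, so 200 x 20MB is ~4GB in the worst case
  maintenance_work_mem = 512MB    # helps index creation during analytics
  effective_cache_size = 12GB     # hint to the planner, not an allocation
  random_page_cost = 1.1          # assumes SSD; set it much higher for slow spinning disks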
I've briefly mentioned this before, but it's worth mentioning again that sometimes it's useful to have different versions of the file, for instance one optimized for database restoration. A lot of you, particularly those working with some of the really large databases, will know that restoring a database can take quite a lot of time — an hour or more is not unusual. There are things you can tweak in your Postgres settings to make the restore much, much faster, and I've given a few little hints here. Obviously when you're restoring a database you're not going to have a couple of hundred connections to it; typically you'll have one. So you don't need to worry so much about the per-connection arithmetic we did before: if only one connection is doing the restore, you can give yourself oodles of work_mem — a gigabyte, say, instead of 20 megabytes. The same goes for maintenance_work_mem, which is what gets used for creating indexes. We set it to half a gigabyte, 512 megabytes, for normal running; for a restore, increase that — usually I'd give it four gigabytes.

And turn fsync off. This is an interesting one. If you read any of the Postgres documentation there are huge warnings: do not ever turn fsync off, because fsync is what provides the guarantee that when Postgres thinks it has written something to the disk, it really has been written. It's really important not to turn fsync off while your Postgres is running with its live database, because it protects you against things like power failure: if the power fails halfway through the day, then as long as you have fsync on, the chance of your database getting corrupted is very low. When you're restoring a database, on the other hand, if the power fails halfway through the restore I'm just going to delete it and do it again — no big crisis — and turning fsync off will make the restore a lot, lot faster. You just need to remember to turn it back on again after you've finished. So that's a little bit of thinking around tuning; as I say, it sometimes makes sense to keep a couple of different tuning files, maybe a separate one that you pop in when you want to do a restore. Otherwise, the defaults in this override file, with the commented suggestions, will do you in most cases.

Okay. If you have a database, you need to be able to run queries on it. I almost always use Unix pipes for running queries — you don't have to, but there are some advantages. Typically, if I've got a query in a file, say my_query.sql, I don't need to paste it into psql on the database. It's generally a good idea, particularly with complicated queries, not to go to psql on the command line and start typing the query out — chances are you'll make mistakes anyway. So you put the query into a file and just throw the file at the database, and it returns you the results. You might want to capture the result, in which case you just redirect the output to a file, and it gets stored on the host.

Maybe I can do that quickly — let me remember what's running where. Okay, I should have an HMIS database in there; it will be fairly blank, not much in it, but let's make a query. To run it, I cat the query file and send it into a pipe: lxc exec on the postgres container, run psql against the hmis database, and you can see there's my query output — just the admin user, because remember we created this one blank. If I want to collect the output, I can redirect it to a file, and there are my results. So that's how you run a query. There are different ways to do it, but that's probably the most convenient in most cases. You can also do it remotely, using SSH with a pipe: if I wanted to run a query on the database and get the result onto my laptop, I'd take my query, send it over SSH to the server, on the server run lxc exec into the postgres container, and then collect the output back on my side. I won't demonstrate that now — we're going to talk about SSH in a different session.
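Written out, the two variants look roughly like this — the container, database and host names are just examples:

  # run a saved query against the hmis database inside the postgres container
  cat my_query.sql | lxc exec postgres -- psql hmis > results.txt

  # the same thing from a laptop, over SSH to the host
  cat my_query.sql | ssh admin@myserver "lxc exec postgres -- psql hmis" > results.txt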
One thing to be aware of: if you just run psql through lxc exec like that, it runs as the root user in the database. As part of the setup scripts we've actually created a database user called root, specifically to allow that to work. That has some implications, particularly when it comes to restoring databases: if you restore a database as that root user, then by default everything you create is going to be owned by root, so there's a little thing you need to do afterwards to reassign the ownership of the objects in the database. I'll show you what I'm talking about.

Okay — again, it's all about the disk. Here's an example of the wrong way to take a database dump and restore it. I give this as an illustrative example because it's the most common thing I see people do. If I set a little task — take the Sierra Leone database and restore it onto your database server — people will do this: they'll SSH into the server, download the Sierra Leone dump file with wget, unzip it (because as you can see it's compressed), and then, if they know the LXC commands, push the file into the Postgres container, exec a bash shell inside the container, and run the SQL file against the database. That's a horrible, horrible thing to do, all of it — primarily because it's really slow. It's going to take you a long time to restore a database like that, particularly if your database is very big. The reason it's horrible is, as I say, all about the disk. We write the compressed dump to the disk when we do the wget — if it's a file of many gigabytes, that takes time. Then we unzip it, writing to the disk again, this time an even bigger file, which also takes time and starts using up quite a lot of your disk space. Then we push the file into the Postgres container, which makes a second copy of the same big file. So now we've written the uncompressed file to the disk twice, and the compressed file once, and only at the end do we finally read the file and load it into the database.

There's a much neater way to do it. But there's also quite a big security concern about operating like this. We tend to go to quite a lot of effort to secure the database reasonably well, and one of the things that completely destroys any security you might gain from controlling access to your database is leaving database dump files lying around all over the place. It's a very common security-audit check that I do: go and look in people's home directories and see how many database dumps are sitting in there. I can almost guarantee — we've got 66 participants currently on the call — that if I went into the home directories of at least 50 of the 66, I'd find database dumps that aren't protected in any way. So yes, try to avoid leaving database dumps on the disk. What's a better way to do it? Well, we can do it all in one go.
And that way, you can ensure that the data never touches any of the disks at all until it gets written to its final destination. All it is, really, is taking the same commands and putting them into a single pipeline. Let's do it, just for fun. We need a DHIS2 database to restore into first. If we've made a DHIS2 instance we already have one, but we need to empty it again, so let's imagine we're restoring into our existing hmis database. The first thing I would do is stop the Tomcat before we pull the database out from under its feet — it won't like that. So: stop the Tomcat. Then we can go straight to our database server and drop the database. (It's a good idea to take a backup of it first, right? Make sure you have a backup. I'll show you about backups shortly.) So, drop the database, and now we need to create it again, because we want it to be blank. And this is important: we want it to be owned by the hmis user. That's just the way the system works — every instance has its own database and connects to it as its own user. A horrible thing that we see often, and I think our manual maybe even comes close to encouraging it, is that everybody uses a user called dhis. You might have 53 databases, all of which are owned by the user dhis. The problem with that is that if your dhis user gets compromised — if the dhis.conf in just one of your instances gets compromised — then the username and password for that database are also the username and password for your 53 other databases. It's a bad idea to use a common dhis user for everything; make a separate user for each database and each instance.

So, we've got a new blank database called hmis. Let's restore the Sierra Leone dump into it. I start with wget -O -, which just means wget takes what it downloads and writes it to the standard output stream rather than saving it to a file — which puts it in a form we can pipe. So we take the output of that command and push it into gunzip, which uncompresses it, and then we take the output of that and send it into the database — lxc exec into the postgres container, psql into the database, which is called hmis. This is relatively quick to do. The important thing about it is that if you look at the command carefully, you'll notice that nowhere does it write anything to the disk at all, except at the very last stage when the data is written into the database itself. That really matters if you're dealing with things like encryption at rest: if you've got a sensitive patient database and you've gone to the trouble of putting it on an encrypted partition, but you leave copies of the dump file lying around, you're basically going to destroy any benefit you might have gained. (What I should have done before running this is tune my Postgres a bit, as I was saying, to make it faster on the restore — that would probably bring it down to about a quarter of the time.) Okay, there we are, done. So that's the command; it's on the slide. You can use variations of it, but the main point is to use pipes — don't save things into files unless you really, really need to.
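Spelled out as a single pipeline, that restore is roughly this — the URL here is a placeholder, and the container and database names are from the demo:

  # stream the dump from the web straight into the database; nothing is written to disk on the way
  wget -O - https://example.org/dhis2-db-sierra-leone.sql.gz \
    | gunzip \
    | lxc exec postgres -- psql hmis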
One thing that is important: somebody might have noticed that while we were doing the restore, a whole lot of errors scrolled past saying user dhis does not exist. That's because of the way the database dump was created — it's a little bit careless on our part, really. With the Sierra Leone dump we've kind of assumed that you have a user called dhis. We don't have a user called dhis, so that's why we got those errors. Generally speaking, to make your database dumps portable, it's better not to save the information about who owns the objects, because you've got no guarantee that the same user exists on whatever system you restore it to. Let's quickly go and look at that database. I should have the whole database there now — I won't look at it in detail, but the important thing is this: you can see everything is owned by root. My problem now is that when I connect to this database from Tomcat using the hmis user, it's going to get all kinds of permission errors — unable to create objects, unable to do various things. So after restoring in this way you do need to run a very simple REASSIGN OWNED: reassign everything owned by root to hmis. Now it looks like that — much better. After this I can fire up my HMIS server again, and this time it'll come up running the Sierra Leone demo database. Okay. Windows people will probably find all of that a bit confusing; Unix people have been working with pipes all their lives. Piping is one of the most powerful things you can do, and certainly for handling database files it's very useful. Try to keep things in a pipe rather than putting them on a disk, in most cases.

Okay, this is another example — I'm not actually going to demo it, but it's a similar technique. It's quite common to have to move a database from one machine to another. There are lots of use cases: you might simply have a new machine and want to move things over, or you might have Postgres 12 running and you've now created a new machine with Postgres 13 on it, so you want to move the database from one to the other in order to get it upgraded. Again, you can do this with a pipe. Sitting on machine A, I can run lxc exec into the postgres container and do a pg_dump of the database called dhis, leaving out the analytics tables, with the -O flag — which dumps the database without any of the ownership information, and that's what stops you getting those errors on the other side if you don't have a similar user called dhis there. I take that output, gzip it, and pipe it into SSH to machine B; on machine B, I gunzip it again and lxc exec it into the database container there. So this moves the database, compressed and encrypted in transit, from the database server on one machine to the database server on the other, without touching the disk anywhere in between. It's something you can practice — another example of piping.

Right, I think we have gone over time. How far am I on my slides? I have four or five slides to go — do we continue or wrap it up? Maybe you can use the Zoom chat. We've talked here primarily about restoring rather than backing up, but I can say, from looking at DHIS2 instances over the last 10 or 12 years, that the number one security issue faced by many installations — I would say probably most installations — is the lack of a working and tested backup plan.
If you were to do a risk register — and we really need a security workshop to talk about risk registers — and write down all your risks and rank them in terms of how serious they are, the most serious ones, the ones always sitting bubbling at the top, are these. One is that you don't have a system administrator, or you only have one and he's thinking of leaving. The other is that you don't have a backup plan — or you do have one, but nobody is monitoring it and nobody is checking it. And as a result of this, quite regularly — in fact sadly much too regularly — we see data loss. Somebody might delete the virtual machine, somebody might wipe out the container, a machine can catch fire; anything can happen. If you don't have a backup plan, you're going to lose data. People do lose data regularly, and sometimes people have lost many years' worth of data, and the simple reason is that they don't have proper backups. Or they've accidentally deleted their backups, or they've never tested them — they think they have backups, but when they go to try to restore them, they discover they're corrupted for some reason or another. Or they don't have their backups stored off the machine, and preferably out of the data centre. I was joking a bit about Lamin and the Gambia, but they had a really terrible incident a few years back when there was a fire in the data centre. If something like that happens and you don't have backups, you can be in a lot of trouble.

The number one reason why there are no database backups, I think, is quite simply that it's not written into anybody's job description that this person is responsible for doing them. Part of having a security plan is working out whose responsibilities are what, and who gets fired because there's no database backup. And you don't want to discover this after the fact; you need to discover it by doing regular audits.

Anyway, people have asked about bringing back the automated backup process. With the old DHIS2 tools, people will remember, it used to take database backups onto the local disk by default. We left that out of this set of tools, partly because it's complicated to figure out how it should be customized in different environments — people have different kinds of disks, different places where they want to put backups, different frequencies for doing backups — and trying to make an automated process that meets everybody's requirements was tricky. What we've done instead, I guess, is provide a template: you just need to enable it, and you may then want to customize it. You'll find it under /usr/local/etc/dhis, and there's a cron job to go with it. You set some parameters in the file, enable the cron job, and you should get backups happening regularly. The fact that there's an automated job doesn't mean it shouldn't still be somebody's concern to check these things and make sure they really are happening. So there's a file where you specify where you want your backups to go — this is on the host machine, and it might vary; you might want to put them somewhere different, but this is where they go by default. Then you decide which databases you want backed up. In my case, on this machine, I'm happy enough with the default pgbackups destination for the moment, and I only have one database, called hmis, and I want that one backed up.
So you can have a list of all your databases in there. I'm just doing a plain backup here — I'll talk about encryption in a second. Backups get rotated, which is important, because if you have a naive backup job that just takes a backup every night, then within a month or two your disk is going to run out. These database backups are quite big, and DHIS2 doesn't lend itself very well to incremental backups because of the whole analytics-table thing. The way this works — and we've run it in lots of places, so it's fairly battle-tested by now — is that it keeps a rotating seven days of backups. It backs up every night, and once it gets to the seventh backup it starts deleting the older ones, so you always keep the last seven days. We can also specify that one day's backup — Sunday, day seven in this case — is always kept as the weekly backup. That way you keep your seven-day rolling backups, but you also have a weekly backup from every Sunday; you can decide to keep, say, three weeks of those, and it will also keep a monthly backup from the first day of each month. If you run it like this, your disk space doesn't get chewed up quite so quickly: at any given moment you've got seven daily backups and three weekly ones, so about ten, plus one for every month, and that way you can estimate your disk growth quite simply.

So this script will just run, and we can run it manually — we'll probably have to run it with sudo. The backup script reads that settings file, and it has discovered that the pgbackups directory under /var doesn't exist yet; if I remember correctly it will create it. There it goes, and it's busy making the backup — I can see it's working because it's taking some time; it's backing up that Sierra Leone demo database now. Right, that's it — you can find the backup sitting in there. Something I still need to do, I think, is work a little bit on the permissions of the backups, because we don't really want them to be world-readable. It's not too bad, because they're sitting on the host, but even so.

So all that's required to get your backups working is to go in and make sure those settings are the way you want them, and then to get it running regularly you go to the cron file. Because we tend to make loads of cron jobs for various things, it's better to give it its own file inside /etc/cron.d, and you can put all your DHIS2-related cron jobs in there. One of them is this: at 25 minutes past eight in the evening it runs the DHIS2 backup. You can set that to a different time; I tend to set it for early evening, because that's more or less the end of the working day — most of the data entry will have been done by then — and it's a good time to take a backup before you start doing heavy jobs on the database around midnight. If I set it like this, then tomorrow I should find a new backup alongside today's, and it will keep seven of them and then start rotating.
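The cron entry itself is just a one-liner, something along these lines — the script name and path here are my assumptions, so use whatever the install scripts actually put in place:

  # /etc/cron.d/dhis2 -- run the backup script at 20:25 every evening
  25 20 * * * root /usr/local/bin/dhis2-backup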
I don't really have time to say too much about encryption beyond what I've said — have a look at it in the file. It's possible to encrypt your backups, and sometimes it's a good idea; maybe it's always a good idea, I haven't decided. But there's a big danger with encrypting backups, particularly for archive purposes: you want to be able to access that backup that was made ten years ago, and if you don't have the key that was used to encrypt it, you won't be able to. That said, there are lots of advantages to encrypting backups. One is that you can make them easily accessible — downloadable from a website somewhere — because you know they're encrypted; you can store them off in cloud storage somewhere, S3 or whatever it might be. But be careful: if you encrypt them and lose the key, you can't access them, the same as anybody else can't. The way it works here is simply a matter of uncommenting that setting and making a list of the databases you want encrypted — we might have a patient database on here that we wanted encrypted, for example, and we'd list it like that. You also need a password file, which will be used for encrypting the backups. That file needs to be protected, and you need to make sure you have a very safe copy of it somewhere — preferably more than one copy, and preferably more than one person with access to it. In fact, you need a security policy around key management to deal with this properly, because, as I say, if you lose the key you're not going to be able to decrypt anything. But if you want to do it, this is the way: just uncomment those two lines and the databases you list will be encrypted as they're backed up. I think I've covered all of that in the file.

I haven't covered remote backups — that's the other thing you can do. I'll try to demo that next time, rather than just talk about it, because it's a little bit fiddly to set up. The way we've been handling remote backups up to now, most commonly, is that you allow your machine to SSH into another backup machine — archive machine, whatever you want to call it — with that SSH happening passwordless, using keys, and then all of your backups can be automatically synced to the off-site directory. Maybe we'll give that as an exercise today. Okay, and I've mentioned a little bit about encryption passwords: if you create a password file, don't lose it. A simple way to make a strong-ish password is to generate, say, forty hex characters, put them in the file, and set the permissions on it properly. One of the things we've been talking about of late — Morten in particular wants to do this soon, so that's good, you can help us — is to make use of S3 storage for storing the backups, partly because it's just much cheaper and it's quite convenient, and it works well, particularly with encrypted backups.

Okay, just a quick look at what's going on when your database is running. You have the Munin monitor installed by default, and it gives a couple of really useful graphs. Probably the most useful one is this, where you can see the number of connections to your database through the day. You can see this database has 80 connections, and during the day the number of connections on it rises towards 200. The yellow portion typically means you've got some problem on the Tomcat server, because it means transactions are sitting idle — probably some problem with CPU starvation on Tomcat, and you'd have to go and look at that. You can also see whether you have a lot of connections waiting for a lock — that's the blue portion.
A lot of connections waiting for locks can sometimes indicate a problem, often to do with Tracker, unfortunately. But it's a very useful graph just to get a quick overview of the health of the transactions hitting your database. This is another useful one: how big is your database? The common thing that happens is that your disk fills up. The other common thing is that people misjudge how much disk they need: if you looked at this example at this point in the day, you'd think you need 1.6 terabytes of disk. But sometime during the night we get this huge increase in disk usage, and if you're not monitoring it over the full 24 hours you won't see that spike. The spike is happening because of the generation of the analytics tables, and it's quite important to know how big it is, because that's what tells you how much disk you actually need. Munin is pretty ugly looking, but it has some quite useful graphs, particularly for the database.

If you want to know what's going on right now, instantaneously, this is a really useful query. I'm not going to run it for you now, but basically it lists out all the queries currently running on the database, sorted by how long the transaction has been running. I find it generally useful when I'm troubleshooting: something is not performing very well and I want to see what queries are currently running. It's so useful that I actually made a little script for it; you run it with the name of your Postgres container and it gives you a snapshot of which queries are actually running — all the ones which are not idle — and what state they're in. It can be useful to run this periodically too. If you're trying to track down a problem and you know there are some troublesome queries happening, but every time you look they're not happening, you don't know when they occur. Sometimes it's useful to sample this every five minutes or so: make a little cron job, collect all the results, and analyse them at the end of the day to see what queries were running. Things like pg_top give you a view of this as well.
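For reference, the kind of query I'm talking about is along these lines — a sketch only, the real script does a bit more:

  # snapshot of non-idle sessions, longest-running transactions first
  lxc exec postgres -- psql hmis -c \
    "SELECT pid, state, now() - xact_start AS running_for, query
       FROM pg_stat_activity
      WHERE state <> 'idle'
      ORDER BY xact_start;"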
We can look again with Glowroot at the slow traces, and we can see some of these have taken up to five minutes. That's really problematic. If you click on one of these traces, we can dig into it a little bit, and in this case it takes us straight to the problem. This is just part of what we see when we click on one of those. First of all, it shows you the queries that are being executed by this API call. One of the things I noticed looking at this is that the queries are not the problem: they're on the order of 100 milliseconds here and there, which is okay. The problem is that this thing has taken nearly five minutes, and the problem obviously is not the database. If we look in here, it becomes immediately clear: you can see the allocated memory, 26.4 gigabytes. That's what has caused the problem. This API call has resulted in a huge allocation of memory, and that generally results in a lot of CPU use: the garbage collector goes crazy trying to clean up the memory, and we end up with these very, very long request times and a lot of excessive CPU usage, just to process those particular API calls. Anyway, what I wanted to illustrate here is that it's a very useful tool that we use a lot to isolate queries which are causing trouble. The example I gave you is in fact not a very good one, because it actually illustrates that there's nothing wrong with the query, but that is, in itself, good information to have. We're going to talk more about troubleshooting, I think, in the last section. There are so many different things to look at, and sometimes it's confusing. Having a few useful tools is helpful. Being able to interpret what you're seeing with those tools is a little bit more difficult to teach; that's something that comes with experience. But yeah, that's me finished talking about this, I think. No, I had one more slide, then I will let you all go.

Okay. I didn't put aside a section for security in this academy, which is really unusual; we've always done a security section. Instead, I guess what I'm going to try to do is talk a little bit about security in almost every other session. So, a couple of thoughts around security on your database. These are just thoughts, but as you've seen, the database is running inside a container, and it's not that easy to connect to it from the outside world. Tomcat is allowed to connect to it, but we've made even that quite constrained in the way Tomcat can connect. It's tucked away like that for a reason. The last thing that you want is for your database somehow to be exposed on the internet. We've got experience of seeing some really horrible examples of that, where you were able to directly connect to a country database, in fact not even needing a username and password. I've seen that twice. So yeah, we tuck it away for a reason; don't go opening it up again unless you know exactly what you're doing. People like to try to connect to the database with pgAdmin; people like to connect to it using ODBC. In general, don't do it. If you have a need for doing extra database analysis, a pretty good approach is this: you know now that you're going to have good backups stored somewhere, so restore a backup and work on the copy. It's not a good idea to connect lots of third-party tools to the running DHIS2 production database. If you really do have to connect to it for some reason or another, then the proper way to do it is using an SSH tunnel, which we'll talk about next week.
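Just as a preview of that tunnel idea, a minimal sketch, where the hostnames, the container address and the database name are all made up for illustration:

    # Forward a local port to Postgres inside the server; 192.168.0.20 stands in
    # for the address of the Postgres container as seen from the host
    ssh -L 5433:192.168.0.20:5432 admin@dhis-server.example.org

    # A local client then connects to localhost:5433 over the encrypted tunnel
    psql -h localhost -p 5433 -U dhis dhis2

The point is that nothing gets opened up in the firewall or in pg_hba.conf: the only way in is over SSH, with keys, and the tunnel goes away when you log out.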
Note, I've shown you this already: each database is owned by its own user, not one dhis user for everything. The most common pattern, in I'd say probably 70% of all the DHIS2 installations in the world, is that people have got a dhis user, and if they have five databases, that dhis user connects to all five of them. We've talked about that before. Access controls are also managed through pg_hba.conf and UFW. The default settings, the way we have it set up with these automated scripts, are fairly strict, but that's good; I would keep it that way unless you've got some really good reasons to change it. We've also talked before about this: don't leave your backups lying around on the disk, and don't even store them on disk if you don't have to. An important point: things change if you put your database on a different machine. The way we're working currently, with the setup that you've seen up to now, your Tomcat is running effectively on the same host machine as the database server; they're just both running inside containers. So the traffic going from Tomcat to the database is not running through a real network, it's running through a virtual network, and it's running unencrypted when you use a connection URL like the one you've seen. If you need to run them on two different AWS instances, for example, then what you don't want is your traffic from Tomcat to the database travelling unencrypted through the cloud provider's network. The way you deal with that is to change the connection URL a little bit, like this, to make sure that the connection between the database and the Tomcat server is encrypted; I'll put a rough sketch of that at the end. Okay, we could have a weeks-long academy just around Postgres, but those are just a few helpful thoughts, I hope, on using Postgres reasonably securely and reasonably safely.
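For that encrypted connection, the usual approach with the PostgreSQL JDBC driver is to add SSL parameters to the connection URL in dhis.conf. A rough sketch, where the hostname and database name are placeholders, and where sslmode=verify-full with a proper certificate would be stricter than the minimal require shown here:

    # dhis.conf: JDBC URL pointing at Postgres on another machine, with TLS required
    connection.url = jdbc:postgresql://db.example.org:5432/dhis2?ssl=true&sslmode=require

The Postgres server on the other side also has to be built with SSL enabled and given a certificate, otherwise the driver will refuse to connect with sslmode=require.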