So, yes, I'm Grant McAllister, a senior principal engineer, so I actually do technical stuff once in a while. My guys will probably disagree with that. I work for AWS in our Relational Database Service group, and I spend a lot of time on Postgres. So today we're going to talk about what we've released over the last 12 months, with a focus on what we've done recently, and also the things customers have talked to us about, whether it's about Postgres or RDS, and some of the lessons we've learned. So let's dive right in. The big major announcement, which I don't know how many people have seen, is that we now support major version 9.5, yay, and specifically 9.5.2. The pickup on this release has been fantastic, even though, as you can see, we did new minor releases at the same time, 9.3.12 and 9.4.7, which is our new default. So if you go to the GUI and say "I want a database," you get 9.4.7 unless you ask for 9.5. Even then, almost half our new creates are 9.5 right now. So people are very interested in 9.5, and it's great to see that enthusiasm for a new release. So far it's going great. Detail-wise, every time we do a release, we add some new extensions. In the fall, we added ip4r to the 9.4 release at that time, and two really cool ones that I think were really necessary and that we were missing: pg_buffercache and pgstattuple. These are both very useful for figuring out what's going on in your database at a deep level: what's in your cache with pg_buffercache, and with pgstattuple, really understanding what's going on inside the data itself. For 9.5, nothing too exciting here from our perspective; we just did what Postgres did, which was to add a bunch of extensions, and we supported those extensions. These weren't necessarily ones customers had asked for; we just supported them as part of the new release. Here's a bit of history for extensions: we started with 32 originally. With 9.3, we've improved that to 35 today. We support 39 extensions on 9.4, and now 44 on 9.5. This is driven basically by the users, the customers, and their requests. How many will we have in the future? Well, it's unknown, because it's up to you. We actually started a mailing list, rds-postgres-extensions-request@amazon.com, so you can send a request and say, hey, I would like to see extension X, Y, or Z, and we'll try to figure out if we can do that. Now, obviously, unsafe languages and such are still a challenge for us, and one we're still trying to figure out. But if there are things that look like we can do them, we'll do the evaluation and see about getting them in, because that just allows more people to use the service.
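As a quick sketch of why those two extensions matter, here's the kind of thing you can do with them once they're enabled; the table name pgbench_accounts is just a placeholder for one of your own tables.

```sql
CREATE EXTENSION pg_buffercache;
CREATE EXTENSION pgstattuple;

-- pg_buffercache: which relations are occupying shared buffers right now?
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;

-- pgstattuple: live vs. dead tuples and free space, i.e. bloat, for one table
SELECT * FROM pgstattuple('pgbench_accounts');
```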
One of the nice changes that came in the database with 9.5 was essentially a change to the parameters around checkpointing. In the 9.3 and 9.4 releases we support, we had set checkpoint_segments to 16 and checkpoint_timeout to five minutes. What this means is, you start with 16 checkpoint segments, and as you fill them up, you get to the point where, if it hasn't been five minutes yet and you've written 16 segments' worth, i.e. 256 MB, guess what? You're going to checkpoint. That was a fine setting for small instances, but if you had a big instance doing a lot of writing, it wasn't very good. But we didn't want to set checkpoint_segments to some huge value either, because then you end up with a lot of WAL segments using a lot of space. If someone only bought a few gig of storage from us, they'd look and say, where did all my storage go? It's in WAL segments; that's not useful. So the nice thing about 9.5 is the change to min_wal_size and max_wal_size, which work very similarly. We set the defaults to 256 MB for the min and 2 GB for the max. You start off with the same 16 segments, but once those are used up, you start using more: the 17th, and when that one's used, so on and so forth, up to the point where you hit 2 GB. This assumes five minutes haven't gone by, but now your maximum is 2 GB. And remember, these are modifiable by you; it's not like we have hard settings, but as a default this is much better. So, how many people understand the relationship between the amount of WAL you're generating and your checkpointing? A few? Every time you do a checkpoint, the next time you touch a block, if you have full-page writes turned on in your WAL stream, which is the default, you're going to write out a full copy of that block. So let's say you do a vacuum, you do an insert, an update, you checkpoint, and then you touch that same block again: a full image of that block goes into the WAL segment. We've seen customers produce two to three times the amount of WAL they need to, based on checkpointing too frequently, because it's just dumping lots of full-page copies. Now, you can say, okay, that's just a little extra effort. But if you have replicas, now you're shoving all that extra data across the wire to the replica. So it's really important to tune your checkpoints, and this makes at least a good default starting point.
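If you want to sanity-check this on your own instance, here's a minimal sketch. On RDS you'd change these values through your DB parameter group rather than ALTER SYSTEM, but you can observe them from SQL:

```sql
-- The 9.5 WAL/checkpoint knobs (RDS defaults: 256MB min, 2GB max, 5min timeout)
SHOW min_wal_size;
SHOW max_wal_size;
SHOW checkpoint_timeout;
SHOW full_page_writes;

-- If checkpoints_req dominates checkpoints_timed, you're checkpointing on
-- WAL volume rather than on the timer, a hint that max_wal_size is too small.
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;
```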
One of the other things we did with 9.5 was a change to a parameter of ours called rds.superuser_reserved_connections. We introduced this in 9.4, and it allows you, as a user of RDS, to have reserved connections for your rds_superuser role, like you'd have with superuser_reserved_connections if you were running as a real superuser. So if your application has used up all your connections, you can still get in, kill them off, and work on your database. But because we introduced this partway through 9.4, we didn't want to set the default to anything but zero, because we didn't want to change connection handling in general. For 9.5 we set it to two, much like the superuser one is usually set, and again, you can modify it. Talking about max_connections: this was another one where we didn't have very good defaults. We changed from one very long, complex formula to an even longer, more complex formula. But the goal here was really to give you higher values of max_connections for smaller instances, and to cap it at a somewhat, still not reasonable, but not completely ridiculous number: 5,000. And I'll show you how that looks. The blue is the new numbers; the orangish yellow is the old. You see that on the t2.micro we used to get a lot of questions: people would say, why can't I even have 100 connections? And we'd say, well, that's just the default, but it confused folks as to what was possible. So this sets connections much higher on the small end. And when we talk about the big boxes, you can see on the far right that we used to have the setting come out at over 8,000 for an r3.8xlarge. We're also starting to talk about a new instance class that we announced as a general thing at re:Invent, called the X1, which is going to be many times bigger than the r3.8xlarge. Under the old formula we would probably have ended up with max_connections set to something like 30,000 or 50,000 on that box, which is just completely ridiculous. I mean, 5,000 is pretty much ridiculous anyway, but we wanted to cap it there. Maybe we'll make it lower in the future, but we didn't want to change too much too fast. Another parameter where we thought we didn't have a very good default was maintenance_work_mem, and this change we actually applied to all of our versions, because we thought it was so important. In 9.3 we started at 16 MB, which was just woefully too small. In 9.4 we increased that to 64 MB, which for a small instance is probably okay, but for pretty much everything else, again, not great. So now we've made it a parameter that scales with the amount of memory in the system, with a minimum of at least 64 MB on the small instances. The caveat is that we applied this change to everyone's default parameter groups and any future ones; if you have a custom parameter group, it won't apply, and you'll keep the setting the way you have it. So if you want to change it, you can. On small to medium instances, you can see the really tiny orange there, where it was kind of ridiculous, versus the blue, which was not great. Now, on an m4.2xlarge, you're getting at least 500 MB, and on the big boxes, like the r3.8xlarge, you'll get up to 4 GB. Maybe that's a little much for you, but at least it's in the ballpark of where you probably need to be for maintenance_work_mem. This is very effective: having maintenance_work_mem up there makes a huge difference to how efficient your vacuuming is. We were having trouble with things like people not getting vacuuming done because we didn't have this set as a default, so we think this will be a nice change as well. Sure. I haven't done too much testing around vacuuming myself; sorry, the question was, do you have any statistics around how much improvement it makes? I don't personally; I think Jim Nasby had made some comment about how much improvement he'd seen. Essentially, a lot of these operations get chunked into many passes if you don't have very much memory. And the question over here was what the total memory on the largest box is; it's 244 GB today. We didn't want to go too crazy, because we can't tell specifically what you're going to use it for, but compared to 64 MB, it's a great improvement for someone doing maintenance work on a box. Again, our defaults aren't trying to pick the exact right value for you, but we have a lot of customers who don't know nearly as much as most of you about Postgres, and we're just trying to get it in the ballpark so they can work from there. As I always tell people about all these parameters: the best thing you can do is test them. It's very easy to set up a test instance, especially in RDS. Run through your parameters, run a vacuum, and see what it takes with different settings. I've actually seen cases where setting some of these extra large won't get you any more performance, and sometimes performance decreases.
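Here's the kind of throwaway test I mean, as a minimal sketch in psql; pgbench_accounts is just a stand-in for one of your bigger tables, and the two values are arbitrary starting points:

```sql
\timing on

-- Run the same vacuum at two settings and compare the timings.
SET maintenance_work_mem = '64MB';
VACUUM VERBOSE pgbench_accounts;

SET maintenance_work_mem = '1GB';
VACUUM VERBOSE pgbench_accounts;
```

The VERBOSE output also tells you how many index-scan passes the vacuum needed; too little memory for the dead-tuple list is what forces multiple passes.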
So yeah, sorry? No, we didn't make any changes to work_mem, because we felt that one is very application specific, and if you make it very large and people have a lot of connections, they're going to have problems. So we'll keep moving, and I'll take more questions on this later if we have them. Of course, one of the things you need if you want to get from one major version to another is major version upgrade. We announced this at re:Invent last year. My slides had 9.3 to 9.4, but of course now they show 9.4 going to 9.5. So you start with your production instance. We take a backup for safety. We run pg_upgrade, which is essentially the same thing you would do if you were doing it at home, on premise. We take another backup, and I'll explain why. And you end up with a prod 9.5 instance at the end. The reason we do both backups is that there's no ability to do point-in-time restore across a pg_upgrade, because it's not transactional. It's really cool how it does it, but it's really different from most databases. So for safety, you want backups at both the beginning and the end, so that you can restore to either of those points. Now, this looks like the way you would go do this, but it isn't actually what we recommend. We recommend you take your prod database, create a test instance from it, and run through the pg_upgrade process there. Because what we've found, and I'm hopefully going to do a lightning talk about this during the conference, is that there are issues we've seen with pg_upgrade where it can fail because you have a dependency on a foreign data wrapper, you have an object that depends on it, and during the upgrade process it can hang on that. So there are things that can cause problems. The other thing you want to do is test your application against the new version, because maybe something else in Postgres has changed: behavior, plans, whatever. Once you make sure that's all good, then you go upgrade your production system. I mean, it seems like common sense, but the number of times we get people who say, yeah, my production instance is down because I was trying to pg_upgrade... It's not like one person has done that; there's been a number who have called us and said, help. So it seems like an obvious thing, but I like to reinforce it.
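Along those lines, before running pg_upgrade on your test copy, it's worth inventorying the objects that most often trip it up; this is just a sketch of the catalog queries I'd start with:

```sql
-- What extensions are installed, and at what versions?
SELECT extname, extversion FROM pg_extension;

-- Any foreign data wrappers or foreign servers that objects might depend on?
SELECT fdwname FROM pg_foreign_data_wrapper;
SELECT srvname, srvoptions FROM pg_foreign_server;
```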
So let's talk a little bit about security. Security is very important to us; it's one of our main tenets at AWS, so I like to show the things we're doing there. Typically you have your application host, your database instance, and your backups. When we first started, the main way to protect your database from a network perspective was to put a security group around it and say who could come in and talk to it. Then we added the ability to do SSL. Then we added virtual private clouds, which allow you to do all kinds of routing and ACLs; you can configure the heck out of stuff that I don't normally even look at. And this was okay, but people kept saying it was not the full picture. So we did encryption at rest. This is really nice: you can use a default key or your own key through our KMS system, and essentially your database, your logs, your backups, your snapshots are all encrypted. So this is all good, and when we got it done, we thought, great, we have the entire encryption story. And then someone pointed out: well, what if someone sets their sslmode to disable? Well, what happens is: no SSL. So as someone running a database, you didn't have the ability to force your users to be encrypting all the time. Because that's a pg_hba.conf setting, which is a little different for us, we added a new RDS parameter called rds.force_ssl. By default it's off, because we didn't want to break anybody, but you turn it on by setting it to one. That changes pg_hba.conf the way you would yourself, and forces SSL on. And now when someone connects with sslmode set to disable, instead of getting a connection without SSL, they don't get to connect at all. So now we have that end-to-end story of encryption on the wire and encryption at rest that you can actually use from a compliance perspective. If someone says, I need this, we now have the full story. That was just recently released.
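One nice side effect of being on 9.5: it ships the new pg_stat_ssl view, so you can check for yourself who's actually encrypted. With rds.force_ssl set to 1, something like this should show ssl = true for every client connection (a sketch, not RDS-specific):

```sql
-- Which sessions are using SSL, and with what protocol and cipher?
SELECT a.usename, a.client_addr, s.ssl, s.version, s.cipher
FROM pg_stat_activity a
JOIN pg_stat_ssl s ON s.pid = a.pid;
```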
One of the other things we released over the last year was the ability to share snapshots. If you have an instance, you can take a snapshot, or you can use one of the nightly ones. Let's say you have a test account you want to share with: you basically click share, put that account ID in, and now that account can create a snapshot off it or create an instance off it. It's very handy for things like setting up a test. You can also share to public, which is kind of interesting: if you maintain a public data set and you'd like to provide access, you can say, look, instead of giving people the raw data to load with a COPY command, just have it all ready to go. Make a copy. So this is all great. Yes, question? Well, we'll cover that. Of course, then people said, that's all nice, but now that you have encryption at rest, how do I share those snapshots? So we just added support for that as well, but it's a little different, because now you have encryption around this whole thing, and you have that key that protects your data. So when you go to share with an account, guess what? If you used the default key, we say no. The reason is that the default key can be used for literally thousands of objects in your account, and even though you just wanted to share that one snapshot, you'd actually be sharing the key for everything. So to make sure this remains secure, we require that you use a custom key. My recommendation is, if you use our encryption at rest feature, always use a custom key. It's much more flexible. It takes another couple of seconds to go into the IAM portion of our console, go into encryption keys, and hit create. That's all you've got to do; it's really simple. And once you've done that, it's much easier to share. Now, sharing an encrypted snapshot is kind of scary, because you encrypted it for a reason: you don't want that data just going anywhere. What's nice is you have to actually give permission on that key to the external account. So this allows separation of privilege: your security person can hold the credentials that decide which accounts the key can be shared with, your DBA can own which snapshots to share, and only the combination of the two together lets someone actually see the snapshot. This way you don't end up with some data leakage you didn't understand. Once that's done, it works exactly the same as the unencrypted case: you can create a snapshot, you can create a DB instance. And of course they get new keys: the people creating those instances specify new keys in their own account, so they're not using the same key, and the data gets re-encrypted; the shared key is only needed to read the snapshot. One of the questions I got when I first presented our encryption story was: what does that cost as overhead? And I said, I don't know, which isn't great when you're presenting. So for the next time I did this talk, I came back and ran some benchmarks. pgbench, read-only, in memory: well, if you're doing encryption at rest, memory isn't at rest, so you shouldn't see any difference, and that's what we see. There's a little jitter, so the tests differ slightly, but it's not meaningful. Now, when we go to the read-write test, and those of you who use pgbench will know it does a fair amount of writing compared to reading, so I'd consider it fairly write-heavy: what we see is that at small numbers of threads there's a small difference, and that difference gets up to about 5% to 10% on fairly heavy write workloads. My belief is you're going to see less than this in practice, and as the Intel CPUs keep getting better, we'll see this number come down further. So I don't think it's too bad an overhead. The other thing customers have been talking to us about really heavily for the last two years is data movement: they wanted a lot of help with moving data in and out, replication, migration. That actually spawned a whole other group inside our team, which is the Data Migration Service, or DMS. We talked about calling it a replication service, but we decided on a migration service. It can do both, though, so you can think of it that way. I'm hoping over time we'll just call it DMS and everyone will forget what the M is for, but I doubt it. You can see that we support a wide variety of engines: Postgres, Oracle, the whole MySQL family including Amazon Aurora, Redshift. So this allows you to move data between similar or dissimilar engines, and we use change data capture to do it, so it's log based. I'll go through the specifics. If you're using this for migration, and if you're familiar with GoldenGate or tools like that, this is very similar: you start with your application and your database, on premise or on EC2, and you hook up connectivity, either VPN or direct, to an EC2 instance or an RDS instance. The DMS service is like RDS in that you spin up a replication instance, and you can choose the size and a bunch of other things. Once you've done that, you get a GUI in the console where you say, what are my source and targets? Where am I going from and to? Then you can select table schemas, databases, whatever parts you want, and it does a full data load. This is kind of interesting, because it's essentially a full select, so you've got to be able to pull all that data. Once that's done, the cool stuff kicks in: the change data capture. It's pulling from the log stream, catching up. Once it's fully caught up, you stop your application, let it finish the last little bit, and then you switch over. So you can make this a very, very small-downtime migration. And again, this is for multiple engines, but specifically for Postgres, what does this mean?
We use the logical replication features built into Postgres in 9.4 and above. So guess what: you need to be on 9.4 or above to use this feature. We've had a bunch of customers say, I'm on 9.3, and I've said, well, you're going to have to upgrade before you can move and use the tool. Today the source is only EC2 or on-premise; RDS is not a source. That's one of the things we're working on and want to have shortly, but I don't have a specific time for it. As I said, it does a bulk copy, so you need to be able to run a consistent select while that happens; you've got to have the ability to do that without breaking it. And I've just put the link there if you want to look at it more, because we could have a whole presentation on this tool alone. Along with that, we had a lot of customers say, I really want to move from, say, Microsoft SQL Server or Oracle to something like Postgres. There are a lot of license costs involved, and they'd really like to reduce that. So we came up with a tool called SCT, the Schema Conversion Tool. It's a GUI tool that you download and run against your current database; you can see the sources there on the left and the destinations on the right. It's a tool for moving into either EC2 or RDS and converting from one engine to the other. And the cool thing is, even if you don't plan on using the tool for the migration, it's a great analysis tool, because it'll go through all the stuff you have inside your database and tell you which areas will be simple and easy, those are in green, they'll migrate with no problem, complete compatibility; the yellow ones you're going to have to look at a little bit; and the reds are ones where there's probably no compatible feature in the other engine, so you're going to have to go rewrite that piece of your application or code. You can actually walk through the specific detail. Here we're looking at something that was yellow, and, hopefully you can see it a little bit, on the left is an Oracle procedure, converted to Postgres on the right. It was just saying, double-check that this date syntax matches exactly what you want. But it does produce the code, and we've had a lot of people doing conversions, from Oracle to Postgres for example, quite successfully with this tool. That doesn't mean it can do everything today, but it does do a lot of different pieces. And like I said, it's free to use with EC2 or RDS, or anywhere in AWS, as a destination. The other thing I'd talk about, and my guys would tell me to always reinforce this, is that vacuum continues to be a bit of a problem for some of our customers, especially transaction ID wraparound. How many people have had that happen to them? Yeah, a few? It's not something you ever want to have happen to you. For customers it typically means, at best, probably minutes of downtime; at worst, probably days, while the database is in single-user mode cleaning up. So it's not something you want to have happen. You really want to look at your vacuum parameters and get the settings correct.
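One simple thing you can watch from SQL, on RDS or anywhere, is how old your oldest transaction IDs are getting; this is a standard query, not RDS-specific:

```sql
-- Transaction ID age per database. autovacuum_freeze_max_age defaults to
-- 200 million; if age ever approaches ~2 billion, you're headed for
-- single-user-mode cleanup.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```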
Well, that's easy to say, but one of the problems we had with RDS was that customers didn't have quite the same visibility into vacuum as they did in stock Postgres. So we've done some releases lately that try to help with that. The first is that we introduced rds.force_autovacuum_logging_level, which forces autovacuum's log output through at the level you choose, so you can actually see these messages. This example is one where autovacuum basically gave up and canceled because it couldn't get a lock on the table. That would help you diagnose why you're not getting vacuuming done. The second piece, which we just added to all the new minor releases, is in pg_stat_activity: before, if you looked, you would see "insufficient privilege" and you couldn't see what vacuum was doing. Now you can. Again, this is handy if vacuum has gotten stuck on something and is just spinning and you can't figure out what it's doing; you can look here really easily and figure it out. So we hope this helps with vacuum. We're also looking at some other stuff around statistics and other pieces to hopefully help avoid more transaction ID wraparounds, because I know Brian and Nathan and a few others would love to see that go away.
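So with the new minors, a plain query like this, run as the master user, now shows you what the autovacuum workers are up to (a sketch; the filter just matches the query text autovacuum workers report):

```sql
-- Watch running autovacuum workers instead of '<insufficient privilege>'
SELECT pid, now() - xact_start AS runtime, waiting, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';
```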
So let's talk about scale and availability quickly. One thing I wanted to highlight is that we release new instance classes on a regular basis. The newest is our M4 class; the M is sort of a medium mixture, a balance of RAM and CPU, versus the R, which is more memory-heavy. I picked the large size and ran a read-only pgbench just to show the difference. As we step up, you can see the green line is quite a bit higher than the blue, and we end up 37% faster than the M3. But the cool part is, look at the prices: 39 cents an hour for the old one, 36.5 cents for the new one. That works out to 46% better price performance. So basically what I'm telling you here is: come spend less money with me. Go change your instances; it's a scale-compute operation, one move, and you actually get more memory on the M4 for the same size. In general, a really great deal, and it's why I put them up here: from a value perspective, you always want to be looking at the new classes. The other thing we heard a lot in feedback from customers was: we love RDS, but I can't use it, because I don't have sar, I don't have iostat, I don't have top, I can't see the things I'm used to seeing. We couldn't give you OS access, so instead we gave you enhanced monitoring. All of these OS metrics are now available for you to turn on; it's a feature you enable, and you can set the granularity anywhere from one to 60 seconds. So you can get one-second granularity on your process list, your memory, IO, or any of the others. How does this look? Here's a process list; I know it's a little small, but there's a select I was running from pgbench, and you can see the CPU there, at 3% or whatever. Postgres is nice, because you can see there's a whole bunch of pgbench sessions sitting in commit, finishing up commits. So you can actually see what's going on with your instance. From a metrics perspective, this is what the graphs look like; you can add all of them, or whichever ones you like. Let's say you were having a performance problem right here: can everyone see that little spike? That's a CPU spike. If you were looking at our regular one-minute metrics and that thing only lasted 10 seconds, it would probably look like a tiny little bump, because it didn't last the whole minute. Now you can get down and see: that happened for 15 seconds, and exactly when. Then you can look at the other metrics and figure out, oh look, my number of running tasks went up dramatically; I wonder if someone started something up that beat up my database. So you can correlate all these things, and I think it's going to be really powerful as a way to work with RDS. The other question we get a lot is: what should I set my shared_buffers to? This, again, is the example of the r3.8xlarge, 244 GB of RAM, as we said. You need space for your processes. We set shared_buffers to a quarter of RAM by default, again adjustable by you, and basically the rest will be used by the page cache. When you select data in Postgres, it looks in shared buffers first; if it can't find the block there, it goes through the page cache, because it's basically going to disk. Now, say it finds it in the page cache: it still has to return it all the way back up through those layers, and that takes extra overhead. So we believe shared_buffers should be set to your working set size if you can fit it in, and I'll explain why. Now, there are some corner cases in Postgres where, if your working set can't fit into shared buffers, the page replacement algorithms can be kind of nasty in certain conditions. So you do need to test, but there is an advantage to doing what we're recommending. So: pgbench, write workload, r3.8xlarge, working set at 10% of memory, so 24 GB, that's how big my data is. TPS is on the vertical axis, the colors are levels of concurrency, and along the bottom is shared memory as a percentage of the overall box. To the left of this red line, my working set doesn't fit in shared buffers and spills over into the page cache; to the right, it fits. What we see is about a 5% to 6% improvement across the board in this case. But that isn't typical, because that's really not a very big working set. What about a 50% working set? Same test, same run, but now the left and right are a little more distinct: 20% was the top-end improvement we saw from fitting into shared buffers. So there is quite an overhead to having to go to the page cache. Again, you should test and look at it yourself, but we wanted to point out that this is something you do want to look at tuning.
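A rough way to gauge how often you're falling out of shared buffers is the standard hit-ratio query; the caveat is that blks_read counts reads that may still have been served from the OS page cache rather than disk:

```sql
-- Share of block requests satisfied from shared buffers for this database
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```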
I'll go quickly through this. We sometimes get questions about multi-AZ and how it compares to read replicas. It's similar in that we also use physical synchronous replication, which is available in Postgres today, but ours is block based. We do believe physical synchronous is the right approach here, because it lets you fail over without having to worry about losing data. And we use DNS to basically tell the application to fail over to the new instance. This is all automated; you don't have to do anything, and it works very well whether it's the whole AZ that's gone or just one box. We also repair everything in the end and make it all look good, so there's no work you have to do for multi-AZ. Now, read replicas, in our model, are asynchronous Postgres replicas, and we believe that by adding them you can increase your read availability: think about your application, and split the read portions that don't need write capability off to read from the replicas. Then, when there's a failover of the primary, you're still reading from your replicas, and when the primary comes back online, you can write again. The same goes if you're doing an upgrade or modifying different pieces: you still have more availability. So it's a nice option for getting higher availability for your application. The one thing that's a little different with us is that when you promote a read replica in our environment, it doesn't change any of the existing ones. They all stay the same; you just get a new instance. You can make that one multi-AZ, you can do whatever you want with it, but it doesn't really change anything else. That's a little different from what you might see with Postgres run in-house. We also get a lot of questions about replication parameters. Let's say you're running a long select on the replica. No one's ever done this without a WHERE clause, right? That would never happen on someone's reporting machine. Then someone updates a row on the source. Postgres is great: it creates a new row version and keeps the old one around, it's all good, and that gets replicated; it all works fine. Then a vacuum runs. That's fine too; everything keeps running, up until vacuum removes that old row version, because vacuum doesn't care about your replica's query. That removal gets replicated down, and as soon as it's applied, it's going to break that query. This comes as a surprise to people when they first hit it, because the errors don't seem very consistent and they don't know why they're happening. But it's usually because one of these parameters isn't really set correctly. There are four different parameters that can be used to control this. I'm not going to cover all of them; I'd say vacuum_defer_cleanup_age is probably the least useful, and the one we like most is hot_standby_feedback. Here's how it works: if you have a replica doing streaming replication, you configure this in the replica's parameters. It's off by default, so you need to turn it on. Once you do, there's a back channel to the primary carrying information, so when you run that select on your read replica, it looks to the primary as if it were running on the primary itself. So when autovacuum goes to do its work, it won't remove those rows and break the query. This is a really useful mechanism, and it's what most of our customers now use to run their replicas. The only note here is that if streaming replication breaks, the feedback loop is broken too, and we also do manual log apply, replaying archived WAL, to catch up; if we're doing that, it can still cancel the query. So we recommend you also look at the max_standby_archive_delay and max_standby_streaming_delay settings as a backup to hot_standby_feedback. And you can see your conflicts by looking at pg_stat_database_conflicts; here we have a snapshot conflict on this database.
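To see where you stand, here are two quick checks, both run on the replica; these are stock Postgres views, nothing RDS-specific:

```sql
-- On the replica: is the feedback channel to the primary enabled?
SHOW hot_standby_feedback;

-- How many queries have been cancelled, and by which kind of conflict?
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
```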
Now, the other area I've talked about before, and those of you who were here last year or at some other talks will have heard my excitement about these new burst modes: I think the cloud is just really getting started. The ability to get machines whenever you wanted was cool, but the ability to get machines that are cheaper, because we're using oversubscription and burst modes, I think is the next step, and I'm sure there'll be many steps beyond this. We currently have two models. On the compute side there's the T2 line, which essentially has a base performance level and a burst: if you're below the base, you build up credits, and then you can spend them to burst. And GP2 is basically the same idea, except it's for storage. It's SSD-based storage, and you get three IOPS per gig: if you buy 100 gig, you get 300 IOPS. If you're only using 150, you're banking 150 every second to use later, and you can burst to at least 3,000. With RDS we do some amount of striping, so you can get even more in some cases. The nice thing is that T2 has good metrics to show you when you're exhausted. Unfortunately GP2 does not yet; I keep beating on the EBS guys to get that metric, and hopefully it will happen soon. But you can monitor the T2 credits, and you can alarm on them, so you can actually know what's happening. The cool thing, and the reason I've kept this in my deck, is that we just introduced a new model in this line: the t2.large. It's much bigger, it's got 8 gig of RAM, and it's got more IO bandwidth than the older T2s. But the really cool thing is it supports encryption at rest; before, none of our T2 line did. So before, you'd have your production instance, all encrypted, and then you'd say, I'd like to make a copy to run tests on. Oh wait, I have to use a much bigger box, because there's no encryption support on the small ones. Now we have a burst-mode box that does have encryption. But the real story is the price performance you get from these. This is similar to what I showed last year, but updated with the new t2.large. We start with a very old model, the M1 medium, now three generations old, with 200 gig of our standard magnetic storage; again, not very useful these days, at $0.58 an hour. These prices might be slightly off, because we keep changing them; we keep lowering them. And you see we get a reasonable amount of TPS along the bottom. I ran this test for 24 hours to show the overall behavior. Then we added an m3.medium, just one generation back, but now using our provisioned IOPS storage, our top of the line: 200 gig, 2,000 IOPS. That costs a bit more, but the interesting thing is that because it's a newer class of machine, it's only $0.40 an hour total. And look how much better the performance is with the newer machine and better storage, because this is a storage-bound benchmark to some degree. So that worked pretty well. But what happens if we look at an m3.large? Same IOPS. What's kind of cool in this graph is that it turned out to be 25% more expensive and it got almost exactly 25% more TPS, so it would be a reasonable deal to buy the larger box, and that's partially because we get better caching. But now let's talk about the T2s. The t2.medium, which is the one I was using before, we start off with 200 gig of GP2 storage. This is a really inexpensive box: $0.10 an hour, one fifth the cost of that m3.large. Notice how it outperformed it for the first two hours.
So if you only needed that amount of TPS for two hours, you could essentially run at one fifth of the money. Now, that drop there is the point where we ran out of GP2 credits, and because this is a fairly IO-sensitive benchmark, the rest of the run was kind of low. But the cool thing with GP2 is you can always just buy a little more storage, because, remember, you get IOPS per gig. We only bought 200 gig, which gave us 600 IOPS; it's not really fair to compare that to the other box that had 2,000. Why don't we buy, say, 1,000 gigabytes, a terabyte? So that's what I did: hooked the same thing up with a terabyte of storage, and now I'm getting 3,000 IOPS, so I'm actually going to get more. And notice that price: $0.23 an hour. Still a very, very good deal. We're getting 6,000 TPS for the first two hours until we're throttled on GP2, and even down at the bottom we're getting almost exactly the same number as we got on that m3.medium at just slightly more than half the money. So this is really quite effective for reducing your cost. Now, someone will say, well, this is just a test. But look at what this box does: this is the new t2.large. You're talking about a box that can do 12,000 TPS on pgbench. This is not a small box: 8 gig of RAM and a terabyte of storage for 30 cents an hour. Compare that to some of the previous generations and you can see the kind of value you get. If you have a bursty workload, it's great, because it can go to a really high number; but look at what it ran at for 18 hours: about 5,000 TPS, still better than any of the other ones, and notice the cost difference: still cheaper. So this really shows you what these burst-mode, oversubscribed models, which still give you a guarantee, can do. We price them on the basis that not everyone is going to use all of their resources all the time, and that really does change the economics of running some of your systems; you can design different patterns around this model. So with that, I have a question for you before I open it up to questions for me. One thing we've heard from some customers is that they'd like earlier access to Postgres releases. Today, we don't release the .0 release for production; we don't believe it's necessarily quite ready at that point. In this case the timing worked out that we did 9.5.2; we had planned to do 9.5.1, but there was a quick turnaround on .2. So, how many people would be interested in testing if we had, let's say, betas in RDS? How many people in the room would be interested in using RDS with betas, possibly in a different environment? Release candidates and .0 releases? I assume that's everyone. Good. That's definitely something we're thinking about; we're trying to figure out how we would do it. Obviously we'd do it in a different sort of setup, because we wouldn't want you thinking you could run production on it. But thank you very much for that feedback; I'll pass it along. So with that, I'm done, and we can take any questions. I guess we could start with yours. So, the first question was: which compliance were you interested in? HIPAA. So the question is HIPAA compliance for Postgres. We don't have it at the moment. It is something we plan to do with all of our engines; we believe it's important for our customers, and we've had lots of requests for it.
I can't give you specific timing, except to say it is something that's important to us. Yes. Yeah, no, I think now that we've got more of the encryption story and these other pieces, that's become really obvious. And the force-SSL thing, no one noticed that until we had encryption at rest, because people weren't looking at it for particular use cases; now they are. Other questions? When will I be able to stop or pause my RDS instance? So this is similar to the model EC2 now supports. That's definitely something we've had a lot of customer feedback on, and something we'd like to support in RDS. Stay tuned is what I'd say; it's definitely something we want to have. We've had a lot of customer requests for it for test and development. Is that what your use case is? Just keep it there in case you need it, yeah. Sounds good. Question up here? Any chance of streaming replication into RDS? Well, I think you mean on a physical basis? Physical-based replication is going to be very challenging. It's not something we have teed up at the moment, because part of the problem is the security model and how we manage the database: physical replication has to be an exact copy. So mostly what we're concentrating on right now is logical replication, making both logical-in and logical-out much more friendly with RDS. It's something we'd like to tackle at some point, but the complexity level is quite high. Other questions? Sorry: at the moment, the migration tool and hooking up your own logical replication are the only options. Other questions? Was there someone? So the question is about failover for multi-AZ and how we let the application know. We don't really let the application know; what we do is terminate the old host, which should cause the TCP connection to die. But you do need to make sure your TCP timeouts and such are set reasonably. One nice thing you can do with RDS is test this by doing a reboot with failover, which does the exact same steps we do in production, so you can actually test your client stack to make sure it will work in those situations. Yeah, it's probably your TCP timeouts that are the issue there; we've seen that with other customers. The question is, with the CNAMEs, how long does propagation take? The propagation is around 30 seconds; the total failover time is right around a minute, from detection all the way through failover and recovery. Of course, it depends on how much time you've configured between checkpoints, which is another thing to think about, because that will change your recovery time to some degree. But those are the basic numbers. We do; we currently have it set to 0.9 as a default, and again, you're able to change it. We basically went through the literature and the recommendations out there and said, given what we think our customer base is, smoothing this out as much as possible is probably what we want. Is pg_notify supported in RDS? I do not think it is. What is that extension? Originally we didn't do any extensions that had outbound functionality; we've now changed that, and we have a bunch of them. So I'd say, drop us a line.
I mean, we've obviously heard it, but if other people have those kinds of requests, we'd love to hear them, and we can definitely take a look based on customer feedback. Other questions? No, no. So ours are in separate availability zones: separate data centers, in what most people would think of as data centers; we call them availability zones. They have separate IPs; they're actually in different subnets if you're in a VPC. There are two different physical IP addresses, and it's a CNAME that points to one of the two, depending on which one is the primary. I don't know specifically how far apart they are, but it's in the range of 30 to 50 miles; is that right, Kevin? Yeah, roughly in that range. Obviously there's a balance of risk: you want them far enough apart to minimize risk, but because we're doing synchronous replication, there's an overhead to the round trip. So if you have any caching, or you have TCP timeouts set very long, you can have staleness, and that's why we recommend people run the failover test. Probably time for one? No, not at the moment; all we do is a CNAME. Time for one more question? A question about support for custom logical decoding plugins. That's something we've started looking at. Any requests would be great, to help us understand; you could send them on the extensions email, or just tell us. It's a whole new area of extensions and functionality that we're now thinking about how to support, but it's definitely of interest, because in the same way that extensions are very useful in the Postgres world, those kinds of plugins will be useful as well. Cool, okay. I will be here; we have a booth, if you want to stop by, and if you see any of us in the black shirts, feel free to come ask us questions.