Thank you. So, welcome to the retitled Shootout at the PaaS Corral, or Shootout at the PaaS even, could be. Before we get started here, while we're filtering in: how many people right now are running PostgreSQL on a public cloud? That's most of our audience. Okay, out of those people, how many are running on AWS? How many are running it on a different public cloud? Okay. Well, I knew that AWS had industry dominance. So this should get retitled, because I was able to do more cloud testing before this talk, and we'll be covering a few additional cloud options for you. First, this is a work in progress. Some of our stuff is on GitHub; I still need to get the data loading and that sort of thing on there, but that's the project we're working through for this big cloud testing effort. I want to give thanks to my collaborator, who was not able to fly out from New Zealand to join me for presentations: Ruben Rubio Rey of manageacloud.com, who has been running a lot of the testing. I also want to thank Heroku and AWS for helping with some of the tests and giving us technical information when we were getting some weird test results. So, before we get started on some of the tests: first of all, what is the cloud? I find that, at least among our clients, a lot of people use various public clouds and don't have any really clear idea how they work. As far as they're concerned, they're, well, magic. But cloud hosting is not magic. Cloud hosting is just a bunch of servers with multiple layers of virtualization, shared storage, and an API. Right? You've got those things, you're a cloud. And those things are not fundamentally different from things that came before, although they do make deployment and management substantially different. So, the good things about putting your Postgres in a public cloud are fast deployment.
As most people know: fast deployment; easy scale-out by adding replicas or moving to larger instances; and, and this is the critical thing for a lot of the people in the industry I deal with, the ability to minimize the number of ops staff you have to have in order to run the same infrastructure, particularly eliminating a lot of the management of hardware that used to be a big part of running, say, a web business. The other big advantage is that when you're just getting started, or just starting a project, it's pretty cheap to host stuff in a public cloud. Now, there's a disadvantage there, because as you need to use large amounts of resources, hosting stuff in a public cloud starts to become a lot more expensive than even owning your own hardware. And that's how cloud hosting makes such a good business: the profitability on the high end is much better than the profitability on the low end. Now, there are some other drawbacks to having Postgres in a public cloud. Number one, system resources on cloud instances are not equal to even the same specified system resources on real hardware, and what's available is much narrower. Up until a couple of years ago, for example, the most cores you could get out of an Amazon cloud instance was seven or eight, one or the other, whereas it was hard to actually buy your own hardware with less than 32 cores. The other big issue is additional exposure of attack surfaces, in terms of your security vulnerability, because now attackers can come at you not only through the regular means but via the cloud framework, either by exploiting it, or by taking advantage of that same ease of management for ease of attack, as Code Spaces found out the hard way last year.
This was a startup that did not protect their Amazon login credentials adequately, got held hostage by an attacker, handled it badly, and got wiped out completely. They don't exist as a company anymore. Now, that's the bad. The ugly is that everything is shared on a public cloud. You are sharing your space in the public cloud with thousands to millions of other users, including network and IO and storage and CPU, which means that your performance is dependent on someone else's peak load. Or, to put it another way: sharing is not caring. The final thing to think about in terms of hosting Postgres in the cloud is that cloud instances are meant to be ephemeral, right? They burn up, they go away, things happen, you spin up another instance. And this means that a lot of things that in on-premises hosting you might put off need to be done up front on the cloud, in terms of having a serious DR plan with multiple kinds of redundancy. It was always the case, even with regular hardware, that you have to plan for things to fail, but you put in a good RAID and that's more of an eventual consideration. If you're hosting stuff in a public cloud, pretty much assume that whatever you're building could disappear tomorrow, or five minutes from now, and for that reason you should be prepared for that to happen. So, now from the other side of things, because I've done this speaking to a few people who are more on the cloud side of things than the database side of things: why should we be interested in Postgres on the cloud from a testing perspective? And the answer is that transactional databases, as far as I know, are pretty much unique among workloads in working out all of the different parts of a system, right? We use CPU, we use RAM, we do disk IO, we do network IO, and we do it all at the same time.
And one thing database geeks have been good at with database engines for a while is working out ways to 100% utilize all system resources, because there's never enough throughput. So, having set up the interest in the cloud and why we're starting on this benchmark route, I want to introduce you to the clouds that we're going to be talking about. In this presentation today, we're covering six public clouds: Rackspace, RDS, DigitalOcean, EC2, Google Compute Engine, and Heroku. Now people will say, wait a minute, there are seven guys up there. I will tell you who the seventh is near the end of the presentation. But let's get started on describing some of these. We're going to start with the three clouds that are actually all different versions of Amazon Web Services. We've got the straight-up EC2 option, our Gunslinger: roll your own on EC2. We've got RDS, who I've assigned to be the Rancher. And we've got the Dandy, which is Heroku. Now, there are actually other options on AWS. Mark cornered me and asked me, please don't forget that we have a cloud. I haven't actually done any testing on the EnterpriseDB offering, mostly because I didn't expect it to have different performance characteristics than the other options, but I might be doing that in the future. Now, all three of these AWS options share certain characteristics, in terms of advantages and disadvantages. One is that you have access to a comprehensive API that covers everything you would want to do; you're never actually required to log in to the web screen, etc. You can control everything about those clouds in code.
And that can be a little different from some of the other public clouds, where you discover certain functions and features, certain management operations, that require logging into the vendor's web interface and doing things in a point-and-clicky way, which is not really good for any kind of automated operation or deployment. The other thing is that, because you're relying on the AWS backbone, you have global distribution with 11 data centers around the world, something like that, making it available all over. And then the other big thing about going with one of the AWS-based clouds is that you have all of the other Amazon services available, if you want them, for part of building an application, which is a truly colossal number of services. None of the competing, completely separate backbone clouds have anywhere near this variety of stuff in terms of caching and routing and DNS controllers and special security things and deployment automation and all of these other widgets that can make it faster to build an application instead of writing your own software for that. So those are the advantages: regardless of which one of the AWS clouds you choose, you have all of that stuff available. So let's now go with the basic, and this was the only option for Postgres on a public cloud for quite a while, which is roll your own on AWS EC2. Let's go ahead and put together our own thing. And it's still, I would guess, probably the numerically most common option, because people know how to do it. It's really simple: all you do is create an EC2 instance, install PostgreSQL on it, configure PostgreSQL, and you can run. Now, I've tried to give a little profile of the different clouds in terms of what's available. So here's our profile for roll-your-own on EC2. We're on a platform-as-a-service cloud. Any administration is up to you; any HA is up to you.
There are tools available through the other Amazon stuff, but you have to put them together. But versions and extensions are anything you want to install. If you really have some reason to download Postgres 7.4 and run it in a public cloud, not that I would recommend that, because there are many known security holes in a version that old, you can do it. And any extension you want, regardless of what permissions it needs, you can do it. There are no additional features to this cloud beyond what AWS offers in general; there are no special cloud features for this. And your general cost is going to be pretty cheap. Now, AWS offers a bunch of different instance types. This is already out of date as of last week, I guess, because we've now got a bunch of the generation-four instance types available. But it's still the case, in terms of databases, that Amazon divides stuff into these different classes. General purpose is good for most kinds of Postgres databases. If you have a really CPU-intensive workload, you might want to do compute optimized. Most of the time I end up with the memory optimized, in order to maximize data caching. And then if you're going to do data warehousing, you might go with one of the I-series ones. The big thing, which I'll go over later, is that latency on shared storage is much higher than latency on local storage; this is true even if you run your own shared storage. And for that reason, being able to cache your whole database, or at least your working set, in memory on the cloud instance becomes more critical on a public cloud than it would be in on-prem stuff. And so if you're choosing where to spend your money, getting enough RAM to cache the entire database is a really good strategy on any public cloud, certainly on AWS. Now, AWS offers a number of storage options.
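As an aside on the cache-it-all rule of thumb just mentioned, the sizing decision can be sketched as a toy calculation. The instance menu and the 20% headroom figure below are illustrative assumptions, not a recommendation or current AWS specifics:

```python
# Toy illustration of the "buy enough RAM to cache the database" rule.
# Instance names/RAM are examples; the 20% headroom is an assumed figure.
INSTANCES = [
    ("m3.medium",   3.75),   # (name, RAM in GB), sorted smallest first
    ("r3.large",   15.25),
    ("r3.xlarge",  30.5),
    ("r3.2xlarge", 61.0),
]

def smallest_caching_instance(db_size_gb, menu=INSTANCES):
    """Return the smallest instance whose RAM can hold the whole database,
    leaving ~20% headroom for the OS and Postgres itself."""
    for name, ram_gb in menu:
        if ram_gb * 0.8 >= db_size_gb:
            return name
    return None  # too big to cache: shard, or pay for provisioned IOPS instead

print(smallest_caching_instance(10))   # -> r3.large
print(smallest_caching_instance(40))   # -> r3.2xlarge
```

The point is simply that RAM, not CPU, is usually the binding constraint for a cloud Postgres instance, so you size from the database upward.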
The one that we use for pretty much all production database servers is EBS plus provisioned IOPS. There's a new option in this area, the GP2 option, which was not available at the time I was doing the testing. Expect that the next time I give this, in April, that will be included in the test results and on my blog. The reason for EBS plus provisioned IOPS is that it gives you some guarantees of durability and guarantees of consistent throughput. Now, the general SSD option can be really good for bursty stuff. But at least with the first-generation GP stuff, you were a lot more influenced by shared peak usage from other people accessing the same storage, and so the throughput would be fairly variable. And also, if your database workload is not bursty, you'd see performance dropping off. There's the newer GP2 stuff, which Grant talked about yesterday, and you can get that from his talk slides even if you weren't in it. Now, one of the things you can do to get higher performance is to actually use instance storage. Some of the higher-end instances come with a significant amount of local SSD storage, which has much lower latency than block storage. The disadvantage to this is that it is fundamentally riskier in terms of data loss. There are a number of different things that can happen in AWS in terms of unplanned restarts, where you don't actually lose the instance, it just restarts unplanned, that can cause data corruption on local instance storage. And if you're using local instance storage, you don't have some of the snapshotting tools for doing things like binary backups, which can be really useful for data redundancy. So this is a lot more of a practical option for, say, a replica than it might be for your master node. Another important thing to keep in mind is that provisioned IOPS is not the same thing as throughput. Amazon is guaranteeing a certain number of IO operations per second.
And they actually have done a fantastic job of delivering exactly that, within like five or ten percent. But IOPS are limited to 8K blocks, which is actually convenient for Postgres. No? 32K, okay. But one of the things that I've noticed in some of these patterns, which has required us to really increase IOPS, is that certain kinds of operations the database engine will do, such as a straight-up index scan, result in an IOP for every row of the data that it's looking up, if the data is fairly sparsely distributed. And so for that particular operation, your PIOPS limit becomes your number of rows per second. So 10,000 IOPS, or 1000 IOPS, sounds reasonably fast; but 1000 rows per second, if you're trying to scan a million rows, is not very fast. So think about that in terms of configuration, and in terms of actually getting the throughput that you want. As I mentioned, you need to set up redundancy of several kinds. Anytime we deploy, we always set up both continuous backup and replicas. You need to monitor for instance failure. And you are on a shared network, so using SSL becomes not an option but a necessity. So, having said all that setup, we decided to pick two instance sizes to test performance on comparatively. One is our small instance size: the m3.medium, one core, almost four gigabytes of RAM, with 40 gigabytes of storage and 1000 provisioned IOPS. This was picked as the sort of instance we would deploy for somebody who has a PostgreSQL server that is an important part of their application, but is not particularly high performance, and they're looking to save money. And then we had a large performance instance, which is a bit higher performance. This is not as high-end as we would get, but we had a limited budget for doing the testing.
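To make the IOPS-as-rows-per-second point above concrete, here is a back-of-envelope sketch. The one-IOP-per-row assumption for a sparse index scan comes from the talk; the cached-fraction parameter is an added illustrative knob:

```python
# Back-of-envelope: if a sparse index scan costs ~one I/O operation per row,
# then your provisioned IOPS ceiling is also a rows-per-second ceiling.
def index_scan_seconds(rows, provisioned_iops, cached_fraction=0.0):
    """Estimated wall time for an index scan fetching `rows` rows,
    assuming one IOP per uncached row. `cached_fraction` is the share
    of rows already in RAM (an assumption, not a measured number)."""
    uncached = rows * (1.0 - cached_fraction)
    return uncached / provisioned_iops

# Scanning a million sparsely-distributed rows at 1000 provisioned IOPS:
print(index_scan_seconds(1_000_000, 1000))        # -> 1000.0 seconds
# The same scan with 90% of the working set cached in RAM:
print(index_scan_seconds(1_000_000, 1000, 0.9))   # roughly 100 seconds
```

This is the same arithmetic that makes the cache-everything-in-RAM strategy pay off: every cached row is an IOP you don't spend.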
So I needed to pick an instance size that was reasonably high-end, but that we could afford to deploy many of to do the testing. And so that was the r3.2xlarge: eight cores, 60 gigabytes of RAM, 200 gigabytes of storage with 4000 provisioned IOPS. That's a large instance. And the instance sizes that we picked on the other clouds were based on trying to match this particular profile. We also figured an important thing is figuring out cost-performance ratios. And so these are the cost calculations that we have. Unfortunately, these cost calculations are already out of date; Amazon has had pricing updates and that sort of thing. So I'll be redoing these cost calculations sometime soon, because they no longer apply to the current costs that are available, and this is the full-redundancy setup, before the update in terms of new storage options available. Another thing that you can do, if downtime is not that critical for you and you just want to avoid losing data, and you're really looking to save money, is run an instance that's just doing continuous backup to S3 and ignore the whole replication issue. And that would be your cheapest option. The larger instance is a little bit more expensive; again, look for updates. So, our second one: the relational database service, Amazon Postgres RDS, and we've got the whole crew here. This is for people who say: hey, I really like Amazon Web Services, but I don't really want to deal with all of this management-of-Postgres stuff, and I would like someone to do it for me. This is what's otherwise known as DBaaS, or database as a service. You don't deal with the OS, you don't deal with system configuration; you're effectively just getting port 5432 and some API-based configuration management interfaces. The advantage of this is something they call SEDBA, which is: Somebody Else is the DBA. So in theory you don't need DBA support, and in practice you need some.
But certainly in terms of basic things like uptime: backups are automated, some of the configuration is automated, updates are automated. And for small development shops where they have a couple of DevOps people who want to manage everything, that's going to be a really good option. The disadvantage of database as a service is that it limits the options that you have with Postgres. Only specific versions are available. Only a limited list of extensions is available. Some of the configuration parameters you're not allowed to change; particularly, security configuration tends to be fairly locked down in terms of what they support and what they don't support. Other configuration may or may not be available. And of course it costs more, because you're paying somebody else to be your DBA, at least part of the time, and that person has to be paid. So looking at the profile: this is our database-as-a-service cloud. Administration is mostly automatic. I say mostly automatic because, like I said, backups and some things are automated; other things, not so much. There are a couple of different HA options on RDS. One is that you can run replicas like you would on EC2. The other is that they have this thing called multi-availability zone, which I'll talk about in a second. Versions currently available are 9.3 and 9.4. There's a list of about two dozen extensions that they have available, that are pretty clear, and you can use those; you can't use anything else. There are no particular bonus features for the cloud itself. And the price point on this is moderate. So, I mentioned multi-availability zone. This is Amazon's automated-failover, uptime-guarantee sort of thing. It's synchronous replication to another node with automated failover and DNS redirection. It's a good way to guarantee uptime.
There is, as you'll see when I do the ratings, major performance overhead associated with this, because we are talking about a form of synchronous replication that operates, as I understand it, at the file system level. So, our third Amazon cloud is the Dandy, Heroku. This is for people who say: I just want to develop; the cloud should handle everything for me automatically. This is our extreme point of database-as-a-service management: I want to have port 5432, and I want it to be someone else's job to worry about everything else. So this is another database-as-a-service cloud. Administration is fully automated, to the point that you can't really do any administration of your own even if you wanted to. HA is a combination of replication and point-in-time recovery, with automation around it and a set of different options for the level of uptime guarantee that you want. The versions available are currently 9.3 and 9.4; Heroku also tends to put the Postgres betas up there for users to test. Again, there's a list of about two dozen extensions, that are pretty clear, that you can install. There are several extra cloud features, and the pricing is on the high side compared to other options. So I want to mention some of those extra cloud features. One of the big things that people get out of Heroku is git-based, git-and-Rails-based, instance management, including management of the database, which works really well with continuous-integration development shops, agile shops. They've also added a couple of unique features of their own. Dataclips are basically sort of web-sharable materialized views; they're really awesome if you have to share little snippets of data with your customers or other people on your team. And they've simplified their replication model through this follower thing, making it much, much easier, and conceptually easier, for web developers to understand having replicas for their Postgres stuff.
Now, the biggest piece of Dandy bling is this: when I started my testing and I launched the first two large Heroku instances, within 24 hours I got an email from a Heroku staff member asking if I needed any help with anything. This is not something I got on any other cloud, no matter how much of a bill I ran up. So when we go later on into pricing and you see Heroku's pricing, you can understand that this is part of what you're paying for. Heroku options are a lot more limited: five database sizes, three levels of HA uptime, and that's it, 15 possibilities. The two Heroku sizes that we chose for this are the standard-2, which is 3.5 gigabytes of RAM on shared hosting, and the standard-6, which is 60 gigabytes of RAM and gives you a dedicated instance. Those were our two options for testing; again, as you're seeing, we're trying to keep a comparable set of instances here. Now let's move on to some of the non-Amazon-based stuff. So, Rackspace. Rackspace is a hosting company; of course they have a cloud. And the main reason for people to be on Rackspace seems to be this: I have servers at Rackspace already and I want to expand to a cloud. We've got a bunch of clients who are basically using the main advantage of the Rackspace system, which is what they call hybrid hosting, where you have some racked servers and some cloud servers. Here again, we're back to platform as a service; it's not database as a service. Administration is mostly DIY, although they have a bunch of managed infrastructure options, which at least involve management at the OS level, even if they don't involve helping you manage the database. Any HA for the database is do-it-yourself. There are some Rackspace cloud tools that you can use as building blocks for this, in terms of DNS and availability tools, but you have to set it up yourself.
Since you're installing this yourself: any version, any extension. The extra is this hybrid cloud option. And again, the price is back to a moderate level. Now, I did say they have some support options. One of the things Rackspace has done, their big slogan is fanatical support, is that a support component is required with your cloud server purchase. It's added on a per-hour basis; it gets cheaper the more cloud instances you have, because they're pooling it together. And you can bump this up to higher levels of support from Rackspace. But they won't actually help you with Postgres, because they're not staffed for that. They'll help you with everything around Linux, et cetera, and their Linux people are top-notch, but helping you with Postgres is not covered. Now, one of the things about Rackspace storage is that, well, Rackspace has an external block store, but they really push you towards instance storage. And actually, on the smaller cloud instances, the block store is not even available as an option. I don't know if Rackspace instance storage carries the same risks that AWS instance storage does; I don't have enough information on that. But this is going to be important when you see the performance figures, because on the smaller instance we had to use instance storage, which changes the performance profile of the Rackspace servers. Sizing here: these are the instances we're using on Rackspace, General 1-4 and Memory 1-60 for the small and the large instances respectively. Now, I assigned Google Compute Engine to the Drifter, not because I feel that Google is particularly a ne'er-do-well, but because they're a bit of a latecomer to this party, and they seem to be entering the market with the intention of being the cheapest option. Not the cheapest option, actually, but the cheapest option among the big players. So, hence the Drifter.
And this seems to be the main reason to use GCE, right: you like the general concept of AWS and EC2, and you want to take advantage of Google's lowball pricing and lots and lots of free credits. Again, this is platform as a service, at least as far as Postgres is concerned. Administration and HA: do it yourself. Versions available: anything you want. The one cloud extra is that they have built in a fair amount of Docker integration. Anybody here do Docker stuff? They've built in a fair amount of Docker integration, which is something that's still coming soon on most of the other clouds. And the price is cheap. So, these are our two instances, an n1-standard-1 and an n1-highmem-8, again reasonably comparable to the other ones. Now, our final of the six that we're going to talk about here is the Kid, DigitalOcean. DigitalOcean is a kid both in size and in orientation and attitude, which you'd know if you've seen any of their advertisements. They're really catering to 25-year-old developers in terms of who they advertise to, and I think that's kind of appropriate. The main reason to use it seems to be: I want something that's cheap, simple, and fast, and I don't want any other stuff. So, this is platform as a service again, like the other ones. There are no extras; there are almost no meaningful cloud services of any kind available on DigitalOcean, in terms of anything that would help you build stuff out. But the pricing is ridiculously cheap. As in, I look at that pricing and I'm like: how long is it going to be before they burn through all their VC money? Because that's got to be measured in months at this point. The other problem with the Kid is that the kid tends to get himself shot up a lot. There's no block store available in DigitalOcean at all; everything is ephemeral instance storage. The nodes are basically completely non-durable from my perspective; I wouldn't trust anything there to stay up long.
There's no storage cloud, there's no S3 equivalent for backups either, and no available HA features from the cloud. So you're completely on your own on the DigitalOcean end of things. So this is the sizing. DigitalOcean was the cloud where we had the hardest time picking instances that matched the sizes on the other clouds, because their instance sizing is very different and they have a very limited menu of instance sizes available. So we had to go with an instance on the small end that had two rather than one core, and on the large end an instance that had more cores but slightly less RAM. This is not exactly matching up. There are other clouds which I will get to testing as I have time. That would include EDB Postgres Plus, which I mentioned; Red Hat OpenShift, which is a little different kind of cloud, but has some nice options for Postgres; and Azure. I've had people from Microsoft emailing me, so they want Azure included in this, and sure, why not. So, time for some performance figures. The experiment we did so far mainly involves pgbench. How many people here have used pgbench? People know pgbench, right? So there are some advantages to pgbench, and reasons why we started with it. It ships with Postgres; it's a micro-benchmark; and it has really fast setup and teardown compared to other benchmarks, which is really important when you are, on an automated basis, starting up a bunch of instances, running a benchmark, collecting stats, shutting them down, running the next one, and having to pay for all this. Some disadvantages, though: pgbench's access pattern is fairly narrow. You're basically doing random single-row lookups and updates, which doesn't match a lot of people's real workloads. It's very reliant on single-row write latency, and it's not very tunable. As a matter of fact, when we started doing the benchmarks, we actually had two different profiles on the non-DBaaS clouds: one with a configured Postgres and one with an unconfigured Postgres.
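To give a sense of what "configured" means in this context, a typical 9.3-era tuning profile for the large instance would look something like the fragment below. These values are illustrative for a 60 GB RAM machine, not the exact settings used in the tests:

```ini
# Illustrative postgresql.conf tuning for a ~60 GB RAM instance
# (example values only, not the actual test configuration)
shared_buffers = 8GB
effective_cache_size = 45GB
work_mem = 64MB
maintenance_work_mem = 1GB
checkpoint_segments = 64          # 9.3-era setting, superseded by max_wal_size in 9.5+
checkpoint_completion_target = 0.9
wal_buffers = 16MB
```

The "unconfigured" profile is simply the packaged defaults, which at the time meant a shared_buffers of 128MB or less.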
And on the small instances, that made no difference at all, because of pgbench's access pattern; as a matter of fact, sometimes the configured instance was slower. So you won't actually be seeing configured versus unconfigured in the initial results, for that reason. But we went in with it, and we said: okay, what we'll do with pgbench is actually three different sizes. One is the in-memory read-write test, where you have a database that's about half the size of RAM and you're doing write transactions on it. The second is, with the same database, a read-only workload of read-only queries. And the third is to generate a database which is larger than RAM, 150 to 200 percent the size of RAM, and do write transactions. We expected the different clouds to perform differently on these different workloads, and actually they did. You can look at the slides later; this is the actual setup that we used for these. Now, from these we're trying to get two different metrics. One is obviously pgbench's standard output, which is TPS, transactions per second. The other thing that we recorded was load time, because it is the one other thing we can get out of pgbench. This is the amount of time for generating the regular database, and for at least some of my clients, how long it takes to do a whole bunch of inserts into this database, straight-up inserts, is actually a significant characteristic, because we have some clients that are doing exactly that. You know, better to have two metrics than one. Some other conditions: this was all Postgres 9.3, because that was the version that was available across all of the various clouds that we were using. Unfortunately, OSes could not be made consistent.
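The three database sizes above can be sketched as a scale-factor calculation. The rule of thumb that a pgbench scale unit (100,000 accounts rows) is roughly 15 MB on disk is an assumption here; measure it on your own system before relying on it:

```python
# Sizing a pgbench database relative to instance RAM.
# Assumption: each pgbench -s scale unit is ~15 MB on disk (rule of thumb).
MB_PER_SCALE_UNIT = 15

def scale_for(ram_gb, target_fraction):
    """Scale factor giving a database ~target_fraction the size of RAM."""
    target_mb = ram_gb * 1024 * target_fraction
    return int(target_mb / MB_PER_SCALE_UNIT)

ram_gb = 60                               # the large 60 GB instance
print(scale_for(ram_gb, 0.5))    # in-memory tests: DB at half of RAM
print(scale_for(ram_gb, 1.75))   # on-disk test: DB at 150-200% of RAM
```

The same function covers the small instance by passing its RAM instead, which is how comparable workloads were kept across instance sizes.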
We initially were going to try to do it all on CentOS, and we couldn't do that on one of the clouds, for reasons, so it's actually a mix of CentOS 7 and Ubuntu 14.04, which is regrettable, but there wasn't a way around it without spending a whole lot of time. One of the other things we did, by the way, is that each of these tests ran as a pair: if we're doing the large instance, we have a large instance running pgbench and a large instance running the Postgres database server, because we didn't want to do it all on the same machine; network IO is important for real-world performance. And then, of course, we had to run all of these things a whole bunch of times. To show you the reason we had to run them a whole bunch of times, let me show you a box plot. Not that kind of box plot; this kind of box plot. A box plot is something you do in statistics, right? We're looking at the distribution of data, and this is actually the distribution of data from, I think, the read-only test on EC2 from our first set of runs. And you can see that there's a lot of difference between the slowest run and the fastest run, a multiple actually, right? Because if you take it from the median, the slowest run was a third the speed of the median, and the fastest run was four times the speed of the median. So this is one of the first important things we learned about the cloud. I always knew that instance performance was variable, but doing a whole set of benchmarks really shows you how variable it is. It's really variable. Given that level of variability, it was important for us to throw out outliers, except for noting where we had abnormal numbers of outliers in terms of performance. So in the case of load time, where a smaller load time is better, I'll be giving you statistics for the median and the 90% level of load time, and by 90% I mean 90% of databases will load in this time or less. And for
TPS, I'll be giving you the median and the 10th percentile — as in, 90% of runs were faster than this. So, here are our benchmarks. And again, I'm going to caution people multiple times: there are comparability problems between some of these. You'll see, when we talk about Rackspace and DigitalOcean, that we don't believe they have the same storage durability guarantees as the other clouds do, which changes performance. I'm providing an HA option with all of these, but it's not necessarily an automated HA in all cases; in a lot of cases I'm just saying this is what it looks like with a replica. Instance sizes are not identical, and instance OSes are not identical, so do some of your own comparisons. The most important caution is that this is a work in progress: we're doing ongoing testing, we learn new things from the performance testing, and we revise the tests. But we did discover some interesting things.

Let's talk about cost. That's a little hard to pin down, because public pricing changes all the time; I believe the Amazon costs are already out of date since I made the slide. But here's our cost per month for the small node on each cloud. For three of the clouds, these are reasonably close to comparable. Two of them are not: DigitalOcean and Google Compute Engine. You know — what the hell is going on here, why is this so cheap? I actually talked to an industry analyst about this, and we believe that both of these clouds are operating below cost in an effort to gain market share. So these are really cheap right now; don't necessarily expect them to be really cheap a year from now. For large nodes we actually see a lot more differentiation. In particular, Heroku charges a lot more for their large nodes, mostly because they expect to provide a lot more direct
support of large-node customers, and so they charge a lot more for that. And we get a bit more differentiation among the database-as-a-service clouds at the high end.

Now, the first performance metric: load time. This is our small in-memory database — the smallest database, about a gigabyte and a half — and we're doing straight-up inserts into memory. Shorter bars are better. You'll see a couple of things from this. One of them, right here, is the big effect of loading into instance storage versus loading into remote block storage: the latency is lower, and pgbench loads things via INSERTs, not via COPY, so latency per row is a significant metric. These end up being a lot faster, with a lower durability guarantee; if we use instance storage on EC2, we see a similar profile, as you'll see later on. The other thing we have here is slow loading on RDS, which is something I contacted Grant and his team about, and we figured out that what was actually going on was a difference in how the storage was configured. With my configuration on EC2, and on Heroku, we have archiving turned on, but the archives go to a separate volume. With RDS, the archiving gets counted against your general allocation of IOPS. So if you look at it another way, we actually had slightly more IOPS available on the others, and in a load test that makes a huge difference. We could configure RDS for higher IOPS, but that would also increase the cost, so you can make that trade. Yes — the other client was an EC2 instance of the appropriate size, and in the same region; in the same availability zone, when we could determine that. With Heroku we can't really determine what
availability zone things are in, so we just had to guess. Now, the next size database: about 7 gigabytes, for the small instance. We're getting a little larger, and we have the opportunity to fill up all of memory, and this changes things. One of the things we see change a lot is that Google Compute Engine actually gets a lot slower once you fill up memory. I don't know exactly what they're doing in their architecture, but we see that in a lot of the other tests as well. One of the other things we'll see — actually, I'll talk about that when we get to TPS, because the Heroku small instance is sort of interesting. Let's move up to the next size, the large in-memory database. This is loading — how big did we make this? — about a 20-gigabyte database on a much larger instance. The interesting thing we see here is that the archiving becomes much less of an impact on the RDS instances, relative to the size of the database. The other thing we're seeing here is what I mentioned: there's a heavy performance penalty for using the multi-availability-zone synchronous replication. That's why I have two sets of bars, RDS versus RDS multi-availability-zone — the performance profile is very different. And you see that Google may have optimized something else, but they don't seem to be optimized for just raw I/O.
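Given the estimated monthly costs from earlier, throughput numbers like these can be normalized into a cost-per-throughput figure. A minimal sketch of that arithmetic — the dollar amount, database size, and load time in the usage example are placeholders, not our measured results:

```python
def cost_per_mbps(monthly_cost_usd, db_size_mb, load_seconds):
    """Dollars per month per MB/s of sustained data-load throughput."""
    throughput_mbps = db_size_mb / load_seconds
    return monthly_cost_usd / throughput_mbps

# A hypothetical $400/month instance that loads a 20,000 MB database
# in 1,000 seconds sustains 20 MB/s, i.e. $20 per MB/s per month.
print(cost_per_mbps(400, 20_000, 1_000))  # -> 20.0
```

Dividing by throughput rather than raw price is what makes the slow multi-availability-zone loads show up as expensive even when the sticker price is moderate.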
Now, one of the interesting things to do here is a cost comparison: based on that estimated cost per month, let's compare how much it costs me to load one megabyte per second. You get a pretty dramatic profile. We have our newcomers, who are underpricing and look really cheap; we have Rackspace, which looks relatively cheap because of the instance storage issue; then we have RDS and Heroku, which are regular; and then, because of the penalty on loads for the multi-availability zone, RDS multi-availability-zone is relatively expensive. Large in-memory looks fairly similar, except that the Heroku pricing goes way up.

So let's get to some actual transaction processing. This is the small in-memory read-write test: 15 minutes of pgbench read-write, measured in transactions per second. We have an interesting distribution, and a couple of things going on here. DigitalOcean and Rackspace are again latency-dominated, with instance storage pulling ahead. A couple of other interesting things: we get really good performance on the Heroku small instance, and I dug into this and contacted some of the Heroku people. The Heroku small instances, unlike the large instances, are actually on shared Amazon instances, where they've distributed multiple Postgres instances across the same shared Amazon instance. That means that if I slam one of those, I have a great opportunity to be a bad neighbor and over-utilize other people's resources, so I can actually get a lot more performance than I'm paying for out of a Heroku small instance. The in-memory read-only test is a little different. I was actually expecting Heroku to do a lot better on this, given that configuration; what we actually encountered from Heroku was a lot more variability than we were expecting, and I think this has to do with the bad-
neighbor problem: despite repeated tests, I hit a significant number of occasions on which someone else was being my bad neighbor. EC2 and RDS are more or less equivalent here, and Google performs a lot better — as might be implied by the name Google Compute Engine, they've done a lot to optimize CPU and RAM usage, even though they seem to be ignoring I/O performance. Now — wait, where's the small on-disk read-write? Here: the small on-disk read-write test. When we get bigger than memory, what's our performance like? You can see the difference: Google Compute Engine drops, and Rackspace and DigitalOcean take off because of instance storage speeds, and so on.

I'm running a little behind, so let's look at the large instances, where we get a very different performance profile. One of the interesting things we saw here is that we're getting some kind of throttling on Google Compute Engine and Rackspace; other testing we did with those two indicated that some kind of throttling was present. It's not anything they expressly state in their docs or terms of service, but it's clearly happening, because of the ceiling we hit at 500 — I think it may be I/O throttling. For the rest of this, we got really good, and similar, performance off of RDS and Heroku on the large instances. The EC2 instances showed a higher degree of variability — you can see the difference between the 10th percentile and the median here — and DigitalOcean was kind of all over the place. Read-only was a lot more stable, at a much higher level; it was actually interesting how similar the large read-only results were across all the clouds. The reason Rackspace is ahead in this case is that their large instance has more cores. And then one other thing that we got
actually off of this — and I want to rerun some of these tests — is that on the multi-availability-zone instance, in the read-only test, we kept having instances stall, which was interesting. They would just stop responding for 10 to 30 seconds, and that really killed the throughput on enough runs to show up at the 10th-percentile level. Large on-disk: here's where the throttling comes in. We couldn't run the large on-disk test on Google Compute Engine or Rackspace — the data load never completed before we hit our four-hour timeout on test runs. So here we've got our transactions per second again, and we're seeing much more equivalent performance across clouds at this size, which is interesting. Again, I've got some price calculations in here which you can see in the slides, and I actually have a lot more graphs than this, which I'll be posting on my blog. But let's finish up, because I'm running out of time.

Now, I did say there was a seventh hombre here that I didn't mention, and that is what I call running-with-scissors mode, which I've talked about before. Running-with-scissors mode is where you disable all of Postgres's durability guarantees, and you disable the cloud durability guarantees by doing things like running on instance storage, etc., and then you run the tests. Adding scissors mode, we can actually get a higher level of performance than we had here. I was going over these initial results with Andres earlier in the hackers' lounge, and he pointed out some things I hadn't done in the configuration, so I think I can actually make this much higher than it is — which we'll be testing in the future. So, what's ahead for this set of benchmarks? Number one, I want to get a more complete benchmark working. This is probably going to be DVD Store, because Jignesh has packaged that up for Docker so that I can actually use and deploy it reasonably. I would kind of like something that's a little more webby, with a much
higher mixture of reads — say, 10 percent writes, 90 percent reads. I may need to write that myself, because there doesn't seem to be anything out there. I also really want a graph geek to collaborate with me on this, because my skills at generating graphs suck and I really don't want to spend a lot of time learning IPython just to get different graphs out of this. So if any of you are a graphing geek and you're really interested in this project, please jump in and help me generate some better visualizations. Here's all of my contact information, and I think we have a minute or two for questions.

The first question is whether we had data checksums turned on: not on EC2, no. Oh, okay — see, there we go. Why didn't you mention that earlier, when I was doing the data load tests? That would make a substantial difference. Yeah, the problem is that this just constantly leads to more tests I need to do. The next question is who paid for all of this. Well, Manageacloud footed the bill for a bunch of it, by the way. What they do is offer a tool that knows all of the cloud APIs, so that you can have a single deployment process across the different clouds, which was invaluable to me in doing this. I don't know — it was a few thousand dollars. Your employer paid for the Heroku tests? Oh, you're not — oh, okay, Heroku paid for the Heroku tests. But for the others, we paid for the Amazon time. What was our Amazon bill last month, in order of magnitude? Okay — that also wasn't bad. See, the nice thing about this is that even though we're running lots of instances — mostly small instances — the total elapsed testing time (and this is why I liked using pgbench) would be 30 minutes, and they're all billed by the hour. One of the reasons I hadn't done Linode, for example, is that Linode doesn't really like doing by-the-hour billing, and for that reason they'd be prohibitively expensive to test. Anyone else? Okay, one more, and then — whoever
raises a hand first — and then we're done. The question was whether people would want to run applications across multiple cloud providers. I kind of think that for most applications, network latency would kill you there; I can't imagine an application where that would make sense from a performance standpoint, because of the inter-datacenter network latency. Oh — and also because you'd be paying for the data transfer costs as well, which are substantial. One of the things I actually learned through this, by the way, is that the vendor with the highest data transfer costs is Rackspace, which I had not been expecting. They make it very easy to pay ten times as much in data transfer costs as you're paying for hosting, and that's something to consider for real applications. Anyway, I think we're done. I'm happy to answer more questions outside the room while the next speaker sets up. Thank you.