Thank you. So where was this talk born from? Whenever I come to Postgres conferences, I get a bunch of questions from smaller operations about how we're using Postgres, what our infrastructure looks like inside the company, and so on. And we are a fairly large website. We're an S&P 500, NASDAQ 100 company, and we're exclusively a Postgres shop. There's some back-office stuff in Hadoop, but if you're visiting the production website, you're visiting a site powered by Postgres. We build from source, and we keep all of our expertise in-house. We haven't been as active in the community as we would have wanted to be, mostly for lack of resourcing; we just haven't had the opportunity or energy. But we do operate at very large scale, and we have some lessons learned for people trying to run websites at these sorts of scales.

Interestingly (the colors aren't coming out that great here), up until about last year, when I came over to the operations side, we did not have a person 100% dedicated to Postgres. We're at 315 million unique visitors every month, and we were running at those scales without a dedicated resource. Matt had come over before, but he was split every which way. It scaled very well despite our lack of attention. Recently, though, we've been focusing on how to make it scale better, and we've found 10 to 100x throughput improvements in our read scaling by fixing a few of the things that were really burning us. And we're trying to get more involved in the community.

So how do we scale? What do we scale to? I have to quote the public numbers: 200 million reviews and opinions. Our site is serving 4.5 million locations, 29 million photos, 70 million members. I mention this because this is what's sitting in our production databases right now, in our data center, being queried live. Combined, it's about 5 to 10 terabytes of data that we're querying on a regular basis. And it's worth noting that's just the published data. We're a public site on the internet, and we do fraud detection; there's a large bulk of data that is never published but still sits in the database. So the tables are of reasonably large sizes. One thing that doesn't quite come through in those numbers: about 25% of our load, day or night, comes from search engines. We're highly SEO-optimized, so Googlebot is hitting us constantly. It takes them four or five weeks to get through the site, and then they start over from the beginning. That's to the tune of hundreds of thousands of requests per minute, always.

What does our current Postgres stack look like? We're mostly a 9.1 shop, with some 9.2 and a little bit of 9.3. Upgrading Postgres versions always takes time, but we're getting to the point where we're upgrading to 9.3 this summer. For HA within a data center, we're currently a mix of DRBD and streaming replication. We used to do DRBD exclusively; now we've worked through all the failure cases on streaming, and we're basically ready to get rid of DRBD. Between data centers, replication is done with our own homegrown trigger-based replication. Its closest known ancestor is a project called dbmirror; you can find it on GitHub. I wouldn't recommend it. It's close, but that was a few rewrites ago.
Mostly we've kept it around because it's the devil we know. We're hoping to get it to limp along long enough that logical replication comes around and solves the use cases we're currently using trigger-based replication for.

So what does the inside of one of our data centers look like? We run pretty lean and mean. The entire site enters through 160 Java web servers. It's 160, but that's what we need to handle next summer's peak, plus some headroom, while still being able to take a quarter of them out at a time to do releases; in order to release in any reasonable fashion, we need to be able to take them out. There are about four or five read-only databases, all identical, behind a load balancer that handles the read-only queries. And then most of our databases are broken out topically. This was the easiest way to shard back when we were doing this five or ten years ago; it was just straightforward. We've got reviews and we've got members, so why not split them that way? The service layer is broken out the same way.

A couple of notes about this. Like I said, HA is currently DRBD; we're actively switching that to physical streaming replication. We run real hardware in all of our data centers. Only things that aren't particularly important are virtualized, because we've seen 30 to 40% performance losses running on virtual hardware versus physical. For us (this is changing), RAM has been cheaper than SSDs. We put 768 GB of RAM into our reviews database; that was $40,000 or $50,000 when you buy it from one of the vendors, whereas SSDs are now around $2 a gig but were $5 to $10 a gig at the time. So we just threw RAM at the problem. We can do that because our workloads are extremely read-heavy, but we also use memcached and in-memory stores to insulate the databases. These patterns were built back when we were running 8.4 and Postgres was significantly less performant, and even back from when we were running the 7 series.

This next thing is both good and bad, and I don't know that I'd recommend the pattern, but a lot sits in our application, partly because it's too complicated to do anything other than a bulk load. For example, all of the web servers know about the structure of the world: they know where all the countries are, what hotels are in what city. They hold skeletal data and go back to the databases to fill in the details, but there's still a pretty serious amount of data they hold in memory. So we went and serialized the Java implementation of a result set: we have file-backed result sets. We will restart 50 or 60 servers at a time, and sending all of them back to the databases to answer relatively heavyweight queries isn't sane: a 30-second query, when you only have 32 cores and 60 servers asking. So we run the queries, save the results off on a staging server, and rsync them out to all the machines. The servers can run in either mode: they can start directly from the database, or they can start from the files. A sketch of that pattern is below, and then a quick aside about kernel cache performance, not so much related to us, but an experiment I ran.
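Roughly what that file-backed pattern looks like, as a minimal sketch. Our real version serializes JDBC ResultSets in Java; the table, path, and function names here are all hypothetical, and it assumes psycopg2 is available:

```python
# Sketch of the file-backed result set idea (illustrative only; our real
# version wraps JDBC ResultSets in Java, and this schema is made up).
import json
import os

import psycopg2  # assumes psycopg2 is installed

CACHE_PATH = "/var/cache/appdata/geo_skeleton.json"  # rsync'd out to the fleet

def load_geo_skeleton(from_files=True):
    """Load the heavyweight 'structure of the world' query either from a
    pre-serialized file or directly from the database."""
    if from_files and os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    # Fall back to the database (expensive: fine for one staging server,
    # not for 60 web servers restarting at once).
    conn = psycopg2.connect("dbname=geo")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, parent_id, name FROM locations")  # hypothetical table
            rows = [{"id": r[0], "parent_id": r[1], "name": r[2]} for r in cur.fetchall()]
    finally:
        conn.close()
    return rows

def refresh_cache_file():
    """Run on the staging server; the resulting file is rsync'd to all machines."""
    rows = load_geo_skeleton(from_files=False)
    tmp = CACHE_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(rows, f)
    os.rename(tmp, CACHE_PATH)  # atomic swap so readers never see a partial file
```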
The kernel buffer cache can be pretty slow sometimes. Find a machine with more than 20 gigs of RAM, open up htop, create a file that's as big as main memory, sync it, and wait at least 30 seconds for the kernel's background writer to come through. I'm not exactly sure of the technical reason why the wait matters; I'll just tell you experimentally that it does. Then call rm. rm will hang for about one to two seconds per 10 gigabytes it needs to clear out of the kernel cache, and you can sit there in htop watching the amount of memory allocated to cache go down in those increments. But if you repeat the same experiment and drop the kernel cache first (not that you'd ever really want to drop your entire kernel cache on a production system), rm returns immediately. The same goes for unmounting filesystems and those sorts of things. When removing large files, the actual time is spent in kernel CPU flushing the buffer cache. This matters for partitioned tables: if you can fadvise a partition's files out of the cache before you do the partition drop, you hold the lock for less time, because you don't have to flush the files while holding it. (I'll show a quick sketch of this in a moment.)

Disaster recovery. I showed you one data center, but the truth is we actually have two, and they're basically identical. If anything, the backup one is always better, because that's where we do hardware upgrades, but they're equally spec'd. I think we're in the alpha site right now and we'll switch in a few weeks to the beta site. They're equal; only one takes traffic at any given time. We're not currently multi-mastered, partly because of the Postgres side of it: the trigger-based replication, the extra storage, those sorts of things. But role reversals are pretty routine for us, and they give us a real environment for disaster recovery load testing. I think this is an important thought: it seems like we're buying twice as much hardware, since we don't skimp on the disaster recovery site, but we've been doing a bunch of load testing over the past year and we've stopped buying hardware, precisely because we've had a real environment to test in. We've been able to find all the performance issues deep in our application and basically stop having to buy things. In terms of our application, we're now over-provisioned; we can probably go another year or two on the same hardware in terms of projected growth, just because of the lessons we've extracted by putting the site under load.

Fail over, always. Our general motto is that the disaster recovery site is not the disaster recovery site until it's taken production traffic. We've had to do two data center moves of our disaster recovery site in the past 18 months, and they're kind of painful. But basically, from the time the first server entered the new data center to the time it was taking production traffic was about 36 to 48 hours.
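Going back to that kernel-cache aside for a second, here's roughly what the experiment looks like, plus the fadvise trick that helps with partition drops. This is a Python sketch; the path and sizes are illustrative, so run it on a scratch filesystem:

```python
# Rough version of the rm-hangs experiment above, plus the fadvise fix.
import os
import time

PATH = "/scratch/bigfile"  # illustrative; use a disposable filesystem
GIB = 1 << 30

def make_big_file(size_gib=20):
    # Write a file roughly the size of main memory, then sync and let the
    # kernel background writer settle, so the pages sit in the buffer cache.
    with open(PATH, "wb") as f:
        chunk = b"\0" * (8 << 20)  # 8 MiB chunks
        for _ in range(size_gib * GIB // len(chunk)):
            f.write(chunk)
    os.sync()
    time.sleep(30)

def timed_unlink(fadvise_first=False):
    if fadvise_first:
        # Tell the kernel to drop the cached pages before the unlink; this is
        # the same trick that shortens lock hold time when dropping partitions.
        fd = os.open(PATH, os.O_RDONLY)
        try:
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # len=0 means whole file
        finally:
            os.close(fd)
    t0 = time.monotonic()
    os.unlink(PATH)
    print("unlink took %.2fs" % (time.monotonic() - t0))
```

With `fadvise_first=False` you should see the unlink stall roughly in proportion to how much of the file was cached; with `fadvise_first=True` it should return almost immediately.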
Back to that data center move. Our attitude was: if you actually believe the data center is done, prove it. Fail over to it, and if something breaks, you have the functional one to go back to. That's exactly what we discovered when we did one of these moves: we were about to go into it and discovered that the config was totally broken. If we'd had an emergency on the other side, we would have gone down. Having the flexibility of "we threw some traffic there, we can bring it back and fix it immediately" was incredibly important for us.

So this is the way we run things, and because we do these load tests and swap data centers about every three months, we have little hesitation about swapping. Things happen. Sometimes you have Cisco switches with a file descriptor leak triggered by Nagios calls, and you make Nagios calls against both switches, and they both crash on Christmas Eve because they both ran out of file descriptors. Or sometimes MAE-West in LA is overwhelmed, and anyone with data centers on both sides of the country gets knocked out because there's too much DoS attacking happening in LA. Sometimes you just have to move. One case I go back to: at one point we had a rather serious production issue at about 9:30 in the morning, while our entire ops team was driving into work. Our VP of ops was able to call one guy as he was getting into his car; he went back into his house and failed the entire data center over. So if there's a big issue with a particular data center, we don't think twice about it. There's no "can we fix it?"; we know the other one is there. And truthfully, in a pinch, we've run out of both data centers in a read-only fashion when we were stuck.

Right now, DR is treated like production: paging, alerting, all of that goes on. The slight exception is that when we're actively doing load testing, we had to turn paging off, because it turned out we were paging all the time. We're obviously always pushing data over. We go into read-only mode when we fail over: the backup data center is in read-only mode, with replication pushing things through. We also discovered we couldn't actually run well without memcache, so on the most important memcache clusters there's something in the backup data center that pushes data up into memcache as it gets replicated in. We do have some concerns around getting all the data over, so when we have a real disaster we tend to stay in read-only mode until we've figured those things out. The important and critical thing for us is that the system of record for our advertising and clicks is log files, so we don't really have a revenue impact when we swap. We have a content collection impact, and don't get me wrong, we're working to spend as little time in read-only mode as possible, but it's not like the CEO comes back asking where all the money went. Does that address your question?
One thing we did find on the paging and alerting side is that while doing load tests we routinely break things, so turning paging off during load tests was important. Other than that, the DR site is treated as production, and we never do anything to the DR site that would keep it from being able to handle production traffic within five minutes. In an emergency, the time to get to the DR site is basically DNS propagation time, which is about five minutes; we're working on improving that. For a planned read-write failover, it's currently about 10 minutes of read-only time. Our goal is to bring that down, maybe by the end of this year or next year, to about five seconds of read-only time, so it's almost completely transparent to users as we move traffic across. We have to play with our load balancers and have them chat across to each other to forward traffic in order to do that quickly, but that's the intention.

What does this mean for Postgres at TripAdvisor right now? We're stuck in a mix of physical and trigger-based replication until logical replication is complete. Physical replication doesn't lend itself well to the constant remastering we do, because we remaster four to eight times a year, and one of our requirements is that once we've gone over, the backup is immediately ready to take traffic. I go back to the time we had a particular firewall issue: as soon as we got the backup functioning, the biggest priority was the firewalls. If the other firewalls had the same issue, how do we run without the firewalls? As soon as the site was stable, the very next question was how to get the data center we'd just left back to the point where it could take production traffic, even hacked up. So that's important for us. Trigger-based replication also puts us in the position where I can do my 9.3 upgrade in the backup data center, upgrade hardware, do all those things, and then flip back, with no downtime. We currently have that, and losing it would be kind of a big deal. So we're stuck in this mixed mode until the BDR project really gets there.

One of the other takeaways: we have a real performance lab for testing read-only workloads, but not as good a one for testing read-write workloads. Yes, the backup data center is always taking replication traffic, but it's in a forced read-only mode: the application user has its ability to do DML completely revoked, and the web servers don't show the pages that would let you submit content. So we have a good test environment for read-only, but not for read-write. Thankfully for us, the vast majority of our traffic is read-only. Not that we don't collect a lot of content; it's just a matter of volume.

Code releases. We do 10 code releases a week: currently a big one at 2 p.m. on Monday, and then 10 a.m. and 2 p.m. releases every other day. We also push new read-only data out every night with the nightly bounce of the site. So really, we redeploy our entire site three times a day, and in emergencies four times a day.
We also do half a dozen schema changes underneath the feet of our live site. This is pretty standard stuff: you can do the subset of changes that don't block live; the rest we do in the backup data center, fail over to the modified schema, and then fix the other side. We also have automated schema validation to make sure both sides match, so when they're in an unbalanced state it yells at us, and we can work that way.

We've also got a 15-year-old code base in a company whose motto is "speed wins", so things still often run like a scrappy little startup, and there's a lot of code going out. Going back and refactoring everything to best practices is not really practical for our organization. So about a third of my time, plus the work of our tech dev team, three or four people whose job it is to systematically identify the most egregious performance issues on both the application side and the database side, goes to finding and fixing those. That's basically how we've been able to stall hardware purchasing: by finding the actual causes of issues. I'll come back to that.

For our dev environments, we have an extra slave from which we dump all of our databases, onto obviously not nearly as good hardware, and we have a dev mini-site. People's personal development servers fit into the dev mini-site. We have a pre-release site, and a test lab that people can borrow, which also does simulated releases to deal with code backwards-compatibility issues. Those matter to us because, like I said, we're live-restarting the site several times a day, so you've got services and web servers running different versions of the code. The process is currently a little painful: a 36-hour job that runs every weekend to dump everything out and build new databases. Unfortunately it's not well suited to pg_basebackup, just because we're trying to save hardware. So it's a little messy, but it does mean our development environments very much mirror production: if you see a bug caused by particular data in production, you can normally reproduce it in dev, unless it's the review that was published five minutes ago.

And this slide is probably why half the people are here. Everything is obvious once you know the answer, and sometimes the cause of a performance issue is not at all obvious except in hindsight. One of the first things we discovered as we moved from CentOS 5 to CentOS 6 (we're a CentOS shop) is that newer kernels decided they were going to ignore the BIOS power settings and run in power-saving mode anyway, because they thought they knew better. And it turns out the only way around this is to write a number into a file the kernel exposes and keep that file open for as long as the server is up, because as soon as you close the file, the kernel reverts the setting.
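One kernel mechanism with exactly this keep-the-file-open behavior is the PM QoS interface at /dev/cpu_dma_latency. The talk doesn't name the exact file, so treat this as an illustration of the pattern rather than a transcript of our actual fix:

```python
# Sketch: pin the CPUs out of deep power-saving states via the kernel's
# PM QoS interface. The request only holds while the fd stays open;
# closing the fd reverts the setting. Needs root.
import os
import struct
import time

def hold_cpu_latency(max_latency_us=0):
    """Write a target wakeup latency (a 32-bit int, microseconds) and return
    the open fd. 0 asks the kernel to keep CPUs out of deep idle states."""
    fd = os.open("/dev/cpu_dma_latency", os.O_WRONLY)
    os.write(fd, struct.pack("i", max_latency_us))
    return fd  # keep this open for the life of the process; do NOT close it

if __name__ == "__main__":
    fd = hold_cpu_latency(0)
    try:
        while True:
            time.sleep(3600)  # a tiny daemon that just holds the fd open
    finally:
        os.close(fd)  # setting reverts here
```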
Needless to say, this ends up being a problem: if you're serving queries that are sub-millisecond and the kernel has decided to clock the CPU down to half speed, those queries take longer, and then it cycles the clock up after your request has already returned. This affected our databases, and it affected our application servers. It also makes understanding your performance hard: if you put a database under a certain workload and it's burning 50% of the CPU, and then you double the workload and it's still burning 50% of the CPU, it's pretty non-obvious what's going on. I'd advise using a utility called i7z (I don't know how this plays out on virtual servers, because we're mostly a physical shop). It has weird bugs, but it will show you, for each processor, the current clock speed, whether it's in a turbo multiplier, all those sorts of things. If you're not running in high-power mode and you're trying to run a website like ours, response times suffer, so it's an important utility: you will see that the CPU clock is not running at the speed the CPU is rated for.

Another thing we discovered, and I should say we discovered this on 9.1, so it might not apply to newer versions of Postgres: obviously, creating and destroying connections is expensive. What we weren't aware of is how expensive it is for the rest of the server. We had one particular set of servers creating extraneous connections to the database. The front ends were talking to the reviews database when they weren't supposed to be, but they were only creating connections, not executing queries, just because of a minimum connection pool size and some things like that. We found that this alone adversely affected the performance of the database. So the creation and destruction of connections isn't just expensive in the sense of "it took this long to create a connection"; it makes the rest of the server slow. I was talking with Robert Haas about this and looking at one of the mailing list threads, and I'll play with it; if I actually figure out why, I'll put it on the mailing list. But that's what we found.

Most of our database performance and stability problems were actually self-inflicted by our connection pool. We don't use PgBouncer (I'll mention that a little later) for the topical databases, because we already have a service layer, so there are only four machines talking to each one, and we were under the impression we had carved connection counts down that way. But our connection pool was very good at knocking the database over. It would do things like create extra connections instead of letting a request wait. Basically: revisit the connection pooling code inside your application if you haven't. We had other cases where it would decide that connections had lived too long, because we want to recycle connections periodically.
And then it would clean up all the connections at the same time, which means you destroy a bunch of connections and recreate them together. Since all the connections were allocated when the server started, they were all allocated at the same time and all destroyed at the same time. So we changed it: our connections now live for a random amount of time, plus or minus 40% of the base lifetime, and we keep them for about 30 minutes. One reason we still recycle connections every 30 minutes is resource leaks, but another is that we restart the site so often that we don't want to be in a place where steady state is fine but restarting causes an issue for the databases. We want enough background noise of connection churn that performance stays consistent, because we already have a bunch of swing there.

Dynamic SQL has been part of our flexibility as an organization, using the unnamed prepared statement. We've chatted about doing stored procedures, and it's a terrifying project to go back through our code base and do that, though we might in a couple of cases. Anyway, we discovered that parse and plan was dominating query execution time. Like I said, we mostly sit in memory, so for a query that takes a millisecond, two-thirds of the time would be parse and plan, and the execute phase would only be 300 microseconds or so. So, after the fact, inside our wrapper around JDBC, we're doing dynamic caching of prepared statements: even if you ask for a new prepared statement, if that connection has already seen a query with the same text modulo parameter substitution, we reuse the prepared statement. (I'll sketch both of these pool fixes in a moment.) Across the board, it's been a major performance saving. We knew there would be pathological cases, because we're running 9.1 and there have been pathological cases where the query planner is smarter when it knows the values of the parameters. (Along the way, you also discover that someone somewhere was using a LIKE on a string when they wanted equality.) Part of our motivation for upgrading sooner is that in our playing around with 9.2 and 9.3, they don't have that issue, because of the improvements made to the plan cache.

We also discovered that prepared statements offer a large network bandwidth improvement, even when using stored procedures; some poorly written queries led to this discovery. If you're only getting a single row back, say "I want the metadata about this location", the description of the result set is almost as large as the actual result set itself. So there are real savings if you don't have to resend that metadata every time.

We also played around with PgBouncer. In all of our tests it massively helped with stability and capacity, but it broke prepared statements for us, and the performance benefit we got from preparing statements and caching the plans ended up being bigger than the benefit of using Bouncer. If we were on stored procedures, we could do that refactoring, and that's probably the best of both worlds. We've also considered that you could, in theory, add protocol-level prepared statement support to Bouncer; the Bouncer code is a little complicated and hard to reason about, but you could do it.
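Going back to those two connection pool fixes for a second, here's a minimal sketch of both ideas together: jittered connection lifetimes, and a per-connection prepared statement cache keyed by SQL text. Our real code is a JDBC wrapper in Java using protocol-level prepares; this Python version uses SQL-level PREPARE/EXECUTE just to show the caching, and all the names are hypothetical:

```python
# Sketch of (1) +/-40% jitter on connection lifetime so recycling doesn't
# happen in lockstep, and (2) caching prepared statements per connection.
import random
import time

import psycopg2  # assumes psycopg2 is installed

BASE_LIFETIME = 30 * 60  # seconds; we keep connections ~30 minutes

class PooledConnection:
    def __init__(self, dsn):
        self.conn = psycopg2.connect(dsn)
        # Jitter: connections created together now expire at different times.
        self.expires_at = time.monotonic() + BASE_LIFETIME * random.uniform(0.6, 1.4)
        self._stmt_cache = {}  # sql text -> server-side statement name
        self._counter = 0

    def expired(self):
        return time.monotonic() > self.expires_at

    def execute(self, sql, params=()):
        """sql should use $1, $2, ... placeholders (PREPARE-style). If this
        connection has already seen this SQL text, reuse the prepared
        statement instead of paying parse/plan again."""
        name = self._stmt_cache.get(sql)
        cur = self.conn.cursor()
        if name is None:
            self._counter += 1
            name = "ps_%d" % self._counter
            cur.execute("PREPARE %s AS %s" % (name, sql))  # name is internal, not user input
            self._stmt_cache[sql] = name
        if params:
            placeholders = ", ".join(["%s"] * len(params))
            cur.execute("EXECUTE %s (%s)" % (name, placeholders), params)
        else:
            cur.execute("EXECUTE %s" % name)
        return cur
```

The pool itself would check `expired()` when a connection is returned and replace it; because of the jitter, those replacements are spread out instead of arriving at the database as one thundering herd.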
The Bouncer protocol-prepare idea is documented in the wiki, and it's probably four weeks of work, but we might come back to it at some point.

Anyway, we tend to throw hardware at the problem. Like I said, most of our databases have between 256 and 768 GB of RAM in production, and on a database of around 600 GB, even 9.1 can easily do 50,000 relatively sophisticated queries per second. Recently, on a 9.1 server with a database of three or four hundred gigs, we were able to run the whole site without any caching for the member layer: every request that needed members went directly back to the database, with no caching layer whatsoever. But we could only do that after fixing the things on the previous slide. We went from not being able to take out even 10% of the caching layer to being able to take out the entire caching layer. And we could probably go higher than 50,000, but somewhere around there you start maxing out the one-gigabit interfaces into the server. It's a relatively low-latency network, but you're still traveling over a network. Our plans for getting more throughput involve upgrading to 9.3 (we've all seen the performance improvements that came with 9.2 and 9.3) and putting most of our database servers on 10 to 40 gigabit network interfaces, because the servers are capable of serving more than a gigabit per second of traffic, so why not upgrade the network and actually use all the resources.

SSDs are finally feasible for us. For the longest time it was cheaper to just throw RAM at the problem; now we're revisiting that and going into the SSD space. Previous generations of enterprise SSDs also had durability issues. If you're familiar with SSDs, they have RAM on chip, and they lie when they say an fsync has completed, so they need enough capacitance to write back the data on an unexpected power loss, and some of them didn't do that reliably. But that's basically in the past at this point.

Running SSDs through RAID arrays, the RAID controller is basically the bottleneck. You can only put five or six drives into the array before the RAID controller's CPU becomes the limit (and unfortunately you don't get much visibility into the RAID controller). We did some synthetic tests where we mounted all the drives individually through the RAID controller as their own devices. In those cases we peaked at 40 gigabits per second of reads and 20 gigabits per second of writes to the drives, and the individual drives were fairly consistent, but behind a RAID array the write performance seems to jump around. These were Dell RAID arrays with LSI controllers; we've been working with Dell to figure out why we don't get consistent write performance there. It's still way better write performance than we get from spindles, orders of magnitude better, so we're taking it as it is. And running pg_test_fsync on these SSD-backed RAID arrays, we see 75,000 to 100,000 fsyncs per second with 8K writes.
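If you want a feel for that number without pg_test_fsync handy, a crude analogue is to time 8K write-plus-fsync cycles yourself. This sketch won't match pg_test_fsync exactly (no O_DIRECT variants, single file), but it shows the shape of the measurement:

```python
# Crude analogue of one pg_test_fsync case: how many 8K write+fsync
# cycles per second can the storage sustain? Path is illustrative.
import os
import time

def fsyncs_per_second(path="/pgdata/fsync_test", seconds=5):
    block = b"\0" * 8192  # 8K, a WAL-page-sized write
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    count = 0
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            os.pwrite(fd, block, 0)  # rewrite the same 8K block
            os.fsync(fd)             # force it to stable storage
            count += 1
    finally:
        os.close(fd)
        os.unlink(path)
    return count / seconds
```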
We tested a 9.4 server with some write-heavy workloads, because we hadn't done too much with those, and we were able to write 300 to 400 megabytes per second to the WAL. On that sort of hardware, the biggest thing was that after sustaining that rate for about 36 to 48 hours, autovacuum couldn't keep up, and the box locked me out because of transaction wraparound protection. We also want to experiment more with PCI Express SSDs; from what I've heard from people at this conference, they're absolutely phenomenal. They're also around $10 a gigabyte as opposed to $2 a gigabyte, so there's a high price penalty.

Also interesting, again from our SSD tests: we found that you actually want the write caches on the RAID controllers in write-through mode as opposed to write-back mode, which is the opposite of your intuition. Your intuition says: write to the RAM in the RAID controller. That's why the RAID controller has batteries; that's why you bought a RAID controller. It turns out that if you're writing too fast, the RAID controller can't manage its own cache and buffers. Somewhere around five individually mounted drives, it just can't do any more, whereas if you write through the RAID controller, telling it to send writes directly to the disk, it continues to scale with the number of disks. This was backed up by one or two other people I've talked to this weekend, who have had similar experiences with their RAID controllers and are running them in write-through mode. There's a slight penalty on fsync performance, something like 20% fewer fsyncs, but for us the trade-off was worth it to be able to write that volume of data.

Anyway, that's basically what I had, and I'll open the floor up to questions if anyone has any.

[Audience question.] I think logical replication is probably the biggest thing right now. I also think the failing over of nodes: everyone's rolling their own version of how you do high availability, what infrastructure you build around it, and it's fragmented. Those two things, in my mind, are the biggest. Any other thoughts on that, Matt? Robert did, yeah.

One thing I didn't end up talking about is that we actually monitor our site with a Postgres database that consumes all of the logs. Because we do 10 releases a week, we deploy to about 10% of our servers, wait about 30 minutes, collect the logs, and do anomaly detection. Into that server we were funneling something like 80,000 to 100,000 writes per second in terms of tuple count. For analytical-style workloads, someone I chat with at our company says he wants exactly two types of transactions: COPY in, and DELETE.

Yes: stuff dying. How do we handle stuff dying? Right now the production data center is the most sensitive thing; if the back-office stuff goes down, it's a little less sensitive. So we're doing DRBD there.
To be honest, we used to do automated failover, and the number of times automated failover brought us down, versus being the right decision, basically left us in a position where we've backed out of it. We've got paging, and we've got scripts that do the failover pretty nicely, but we do it manually. Looking back over the past five years, the number of cases where automated failover would have saved us any downtime is maybe one, whereas automated failover brought us down three or four times. That's partly because we're stuck in the DRBD world and haven't gotten streaming replication out everywhere, just because we haven't had the time. And when you rely on a database with 768 GB of RAM for performance and you do a DRBD failover, that filesystem isn't mounted on the other side, so you're failing over to a database with a completely cold cache. Before all the fadvise stuff, we had an old trick where we went into the Postgres data directory and recursively cat'ed everything to /dev/null, because that was faster than letting the cache warm on its own. It doesn't burn us often enough, but we're trying to figure out a better way to do this as we add more and more database servers, because product keeps requesting new categories and more data, so we need to get that moving more smoothly.

The other thing I'm really trying to get to, and we'll have to revisit, is doing more with read replicas: having the concept of an SLA for reads and an SLA for writes. Again, our commerce and our revenue are tied to reads, not writes. Being able to have a cluster of read slaves and say "maybe the master died, but I'm still serving traffic, I'm still showing prices, I'm still doing our core business, I'm just not collecting new content": that's the other angle we're trying to address. Yes?
[Audience question.] Because 9.4 only came out, what, in December? I think we're just a little skittish there. Matt and I were chatting about this; the scariest things to us are, A, a corrupt index, with some of the B-tree changes that went in, and B, hitting an edge case that no one else has hit and segfaulting, Postgres restarting. We're active-passive, but we're in a hot-cold model, not a hot-hot model, so we fear hitting a weird edge case. That's not to say I won't start playing with 9.4 for some back-office stuff. Like I said, I have those read databases: right now two are on 9.1, one on 9.2, and one on 9.3, and when I do the 9.3 upgrade I'll put one of them on 9.4 to put it under some stress. But for anything where I only have one, my biggest fear is really a segfault. Anything else I can work around; a segfault and restart is a clear outage, and then I'm trying to explain why I'm running the six-month-old version. So I can do that in the back office, and we will, but not on the production site; we try to be a little more conservative there.

[Audience question.] We haven't done much of that yet, mostly for lack of energy, I'd say. Also, because we're on spindles, and because of our replication, it's actually hard to do a Postgres upgrade: we'd have to have an extra server, and then we've got cache coherency problems, so we'd have to pause replication. Not to say we couldn't, or that we won't in the future. Honestly, I have a nine-to-twelve-month backlog of projects I want to get to, so prioritizing that is a little hard. Any other questions?

[Audience question.] Not in production. We have a team whose entire job is, well, obviously knowing about all the locations in the world is core to our business: you can't list reviews for a restaurant or a hotel if you don't know it exists. So we have a team that does that, and they have a Postgres database where they do all that sort of work. In terms of the production site, all of our web servers have k-d trees in them for doing some of the geospatial stuff inside the application. We have another project going out that's just going to use GiST and points, grab slightly too much data, and clean it up in the application. And we also push a lot of that off to Google Maps and things like that. So, I think we're all set? Thank you all very much.