All right, good morning, everyone. If you have a question, go ahead and shout it out, but maybe hold it until the end. I did a run-through yesterday and it's a little too much content; I was a little aggressive in the amount of material I wanted to cover, so we might not quite get through all of it, but that's OK. By way of introduction, my name is David Rader, and I'm with OpenSCG. We do a lot of work around Postgres migrations, as well as helping folks scale their databases and database platforms. A lot of this is based on experiences we've had with customers managing larger databases, especially data warehouse or analytic-type applications. So what am I talking about here? You might have seen questions like: does Postgres support partitions? How do you deal with that? Are there ways to compress your data and also get faster queries out of it? And what about stretching a larger table across servers, so you could go multi-server, start sharding, possibly stretch out into a cloud-based database? And if you're new to Postgres, what is a foreign data wrapper, what is an FDW? Think about running a database, especially a data warehouse or a system where you keep adding more and more data: log records, sensor records, order history records, anything where you just keep growing. It's not a single constant quantity of data. How do you deal with that? How do you deal with older records? You want to start purging, start moving things into an archive. If you just delete those records, vacuum has a problem later as it tries to catch up. And we're all running out of disk space on our servers. So how can you start dealing with this?
Well, there are a couple of things we can do about it. The first thing, if any of you are experienced DBAs with larger databases, is to start partitioning that table: break it up, create multiple sub-tables. How does Postgres deal with that? How do we use it? The second thing is to take advantage of some of the advanced Postgres features, like foreign data wrappers, which give us the capability both to go across multiple servers and, within a single server, to start using interesting data storage techniques. One of those is column storage, which is really good for analytic queries, really good in that data warehouse situation. Or we can start moving those archives to a remote server. Just a quick show of hands: are any of these terms, foreign data wrappers, extensions, column stores, familiar to you so far? Yeah, okay, good. All right, so with that big table, your queries might be slowing down, some of your operations suffer; what do you do? All right, partitioning. This might be a little basic, but I want to level-set everybody. Dealing with that large table, the simple thing to do with horizontal partitioning is just break the table up into multiple chunks. You might do it by date range, with weekly or monthly partitions. You might do it by IDs, so newer customer IDs in one partition, older IDs in a different partition. But the idea is to break up your large table into partitions and make it easier to manage. So how does Postgres do partitioning? Well, as of Postgres 9.6, the current production release, it uses table inheritance. If you've been using Postgres for a while, you may have heard that it's sort of a post-relational system, with object-oriented inheritance of tables, and that's actually how you implement partitioning today in Postgres 9.6.
As a best practice, the parent table would be empty; you wouldn't have any rows in it. Each individual partition would be a child table underneath it, and you would break up those partitions by date or by ID ranges so that you're able to spread the data out. Now, in the case where you're storing log entries with date-based partitions, you're always going to be adding new data into the newest partition, the one you've most recently created. If it's customer data and you're splitting it by ID, you may be inserting across different partitions and spreading the load out. When you're doing this, though, you've got to handle routing each insert into the right partition. Your SQL query is still going to be an insert into that top-level parent table, but you want to make sure the data actually gets routed to the right child table, and you do that with an insert trigger that routes the tuple down to the correct child table. The cool thing about doing partitioning this way is that the Postgres optimizer is able to use the check constraints on the child tables to exclude them from your query. So when you come in and query the entire set of log data, the optimizer doesn't have to scan every one of those child tables, doesn't have to scan every partition as part of the query plan; it can use the check constraints to exclude some of those partitions. Now, breaking it up, that's good. But maybe your disk is full, and nobody likes to take downtime to add more disks to a server. So how can you actually make the data smaller? Well, Postgres is a great relational database; we all know it's the most advanced open source database, right? But it's also got a lot of NoSQL capabilities, and one of those is the ability to put a column store underneath individual tables.
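To make the inheritance setup concrete, here's a minimal sketch using a hypothetical `logs` table partitioned by month; the table and column names are illustrative, not from the talk's dataset:

```sql
-- Empty parent table; children hold the actual rows.
CREATE TABLE logs (
    log_time  timestamptz NOT NULL,
    message   text
);

-- Each partition is a child table with a CHECK constraint
-- that the planner can use for constraint exclusion.
CREATE TABLE logs_2017_01 (
    CHECK (log_time >= '2017-01-01' AND log_time < '2017-02-01')
) INHERITS (logs);

CREATE TABLE logs_2017_02 (
    CHECK (log_time >= '2017-02-01' AND log_time < '2017-03-01')
) INHERITS (logs);
```

With constraint exclusion enabled (the default `constraint_exclusion = partition` covers this case), a query like `SELECT * FROM logs WHERE log_time >= '2017-02-01'` skips `logs_2017_01` entirely.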
This is done using a column store foreign data wrapper. You're able to take data and compress it on disk, and for many types of analytic queries, speed up their execution. If you're familiar with column stores (I'll go into more detail later), they're really optimized for batch-oriented operations. So they're really good for older partitions, if you're doing date-oriented partitioning, things that are not going to change anymore. You've added your log records, you've added in those older orders; they weren't old when you added them, but over time they don't change anymore. That's a really good situation to move them into a column store, to squeeze them down and speed them up. You don't want to do it for your current data, where you're adding new records or updating records right now; a column store doesn't really work well for that. But what if you're thinking about moving to the cloud? Amazon's a big sponsor here, and Google just gave a talk about their Postgres cloud services. Can you move half your database? Anybody here ever tried to move half a database to the cloud? You can. You can actually start sharding and moving some tables out to a remote server, essentially the same way we just talked about with compression: if you take individual partitions and turn them into foreign tables using the Postgres foreign data wrapper, you can move individual partitions out to a remote server. This way you can start sharding across multiple database servers and using more CPU power, and you can start elastically moving out to a cloud. But you don't have to move your most recent, most current data, where you're getting the most updates and the most activity, and where the application expects the best response time.
So if you're able to separate out older history records that you want to query from new, frequently used data, you can keep your users' and your application's performance expectations under control. Just a quick pause: does that make sense to everybody? Okay, a lot of heads nodding, good. So that was the overview. Now we're going to start making eyeballs bleed and look at code: how do you actually do this? The sample data set I'm going to work with is based on the airline on-time arrival data available from the US Department of Transportation's Bureau of Transportation Statistics. It's got about six million flight records per year, goes back 20 or so years, and is pretty easy to get; it's just a nice data set to play around with for samples. If you look at what the parent table would look like for this arrival on-time data, you've got the flight date, the carrier, the flight number, where it started from, where it went to, when it was supposed to leave, and 20 or 30 more fields. As we said before, when you're going to partition, you need to go in and create the child tables and all the check constraints that go on them; just a show of hands, who has created partitions and check constraints? Right. So this is the way you would do it by hand: you would create the individual child tables, inherit from the parent table, and create a check constraint on each child table. If you're doing it by date range, this was all set up for year-by-year partitions. What a check constraint does is tell the Postgres optimizer that the table cannot have data outside of that constraint. If you try to insert data that violates it, the insert will actually fail.
And when it's planning a select, the optimizer can exclude partitions based on those check constraints. If you run a query against the parent table, table inheritance works so that for a select, in this case just a simple select count(*) off the parent AOT table with a date range specified, the query planner is able to say: okay, I'm going to do the count aggregate, great; but first I'm going to scan the parent table, which is empty (you'll see no rows come out of it), and then just one partition. So even though we've got 20 year-by-year partitions, the query planner can use the check constraints, look at the where clause, and say there's only one partition that could possibly satisfy that where clause; all the other partitions drop out. This absolutely helps speed up queries against large tables with many, many partitions. For inserting new data, I mentioned earlier it's trigger-based routing. If you do an insert against the parent table, you need to capture it with an insert trigger and move the row to the correct partition table. One of the things you always have to think about is what to do if you happen to be missing a partition. Do you want to drop the row into the parent table by default? Do you want to reject the insert? This is definitely an application-specific question. If you put it in the parent table, you wind up with funny things later where you need to move data out of the parent table, so you have more maintenance effort. If you put it into a default partition, you don't have clean data in there and you can't exclude that default partition. So this is something you want to think about from your application standpoint and decide how best to handle.
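A routing trigger for that kind of setup might look like the following sketch, using the same hypothetical `logs` table and choosing to reject rows that match no partition rather than letting them fall into the parent:

```sql
CREATE OR REPLACE FUNCTION logs_insert_router() RETURNS trigger AS $$
BEGIN
    -- Newest partition checked first, since most inserts land there.
    IF NEW.log_time >= '2017-02-01' AND NEW.log_time < '2017-03-01' THEN
        INSERT INTO logs_2017_02 VALUES (NEW.*);
    ELSIF NEW.log_time >= '2017-01-01' AND NEW.log_time < '2017-02-01' THEN
        INSERT INTO logs_2017_01 VALUES (NEW.*);
    ELSE
        -- Application-specific choice: here we reject missing partitions.
        RAISE EXCEPTION 'no partition for log_time %', NEW.log_time;
    END IF;
    RETURN NULL;  -- suppress the insert into the empty parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logs_insert_trigger
    BEFORE INSERT ON logs
    FOR EACH ROW EXECUTE PROCEDURE logs_insert_router();
```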
The insert trigger mirrors those check constraints: as you go through the insert, you look at the key field and route accordingly. You want to optimize the insert trigger by putting the most frequently used partitions first. So if you're doing date-based partitioning, or serial-ID-based with new customers, put the newer partitions at the top so they get found first and the trigger short-circuits quickly. The trigger just reroutes: if it's an insert operation, check the date values and decide which child partition to move that insert into. Makes sense so far? Okay. That's a lot of code, so one important tip: if you're using Postgres 9.4, 9.5, or 9.6, don't write all that code. Use an extension called pg_partman, which does all of that for you automatically. Instead of all those create-table and create-constraint statements, you just create the extension and then create a parent relationship, and it can build the partitions for you. It's also got maintenance capabilities. We talked about missing partitions: obviously, if you're doing something date-based, as new months come along you need to add the new monthly partitions. You can plan ahead and create, say, three partitions into the future, but you're still going to have to add more as time goes on. Same thing if you're doing customer-ID-based or any type of ID partitioning: as your ID numbers grow, you're going to need new partitions for the new data. pg_partman has that maintenance scheduling built in, or you can schedule it through cron jobs. So it's important to understand the background and how it all works, but there are better ways in the Postgres community: take advantage of the reusable parts and use the extension. Similarly, if you are doing bulk loading of new data, just don't go through the parent table.
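As a rough sketch of the pg_partman approach, assuming the same hypothetical `logs` table; the exact `create_parent` arguments vary between pg_partman versions, so check the documentation for the version you install:

```sql
-- pg_partman conventionally lives in its own schema.
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

-- Build the parent/child structure and pre-create future monthly partitions.
SELECT partman.create_parent('public.logs', 'log_time', 'time-static', 'monthly');

-- Run periodically (e.g. from cron) to create upcoming partitions
-- and handle retention for old ones.
SELECT partman.run_maintenance();
```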
If you can possibly avoid it, do the direct insert into the child partitions. This requires planning ahead to split up your bulk-load data, but running every insert through that insert trigger is a lot slower. So if you can bulk load into the child partitions directly, you're going to get a lot of improvement there. Now, I said earlier that this is how partitioning works in Postgres 9.6. One thing to watch for is that the upcoming Postgres 10 has native partitioning built in. The optimizer knows more about the partitioning, and the tuple routing is built into the native Postgres executor. So an insert on a parent table that needs to be routed to individual partitions can be a lot faster, and select queries, and the optimization around excluding partitions from the query plan, are faster as well. Postgres 10 has the first round of native partitioning and should be coming out in six months or so, probably this fall. Definitely take a look as you think about upgrading to Postgres 10, but it is the first version of native partitioning, and there are still some limitations. It's not going to support everything, things like automatic indexing. Normally you need to create an index on every child table; with inheritance, as you're creating those child tables, you can have the indexes pushed down, cascaded, and you can use pg_partman to create indexes across child partitions. The first version of native partitioning in Postgres 10 isn't going to support automatic indexing, so you're still going to need to manually create indexes. And it's not going to be simple to convert.
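For comparison, the Postgres 10 declarative syntax looks roughly like this, using the same hypothetical `logs` table:

```sql
-- Declarative (native) partitioning, new in PostgreSQL 10.
CREATE TABLE logs (
    log_time  timestamptz NOT NULL,
    message   text
) PARTITION BY RANGE (log_time);

CREATE TABLE logs_2017_01 PARTITION OF logs
    FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');

-- Tuple routing is native: this lands in logs_2017_01 with no trigger.
INSERT INTO logs VALUES ('2017-01-15 12:00', 'sensor reading');

-- No automatic indexing in PG 10: indexes still go on each partition.
CREATE INDEX ON logs_2017_01 (log_time);
```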
So if you have an existing partitioned table in Postgres 9.5 or 9.6 and you upgrade to Postgres 10 and want to convert to native partitioning, that conversion is not going to be just an ALTER TABLE to switch it over. You're probably going to need to create a new natively partitioned table in Postgres 10 and copy the data over. So there are some limitations; partitioning isn't going to be implemented in every part of the optimizer in the first pass, but it's definitely something to pay attention to. Okay, so that was general background on partitioning in Postgres: child tables, check constraints. Next we're going to talk about foreign data wrappers and how we can start using those. Does everybody know what an FDW is? Yes, most folks, great, okay. So, simple, right? It allows your Postgres database to connect to other data sources, which can be all sorts of things. It was introduced in Postgres 9.1, originally read-only, and each version of the 9.x series added to it: write capability, pushing down where clauses, being able to join two tables remotely in the remote database. And again, a feature update in Postgres 10: the postgres_fdw will also be able to push down aggregates. So if you're doing analytic work with sums, counts, et cetera: in 9.6, if you try to do a sum aggregate on a remote table, Postgres will actually pull all the rows back and sum them up on the local server, which is a little painful. In 10.0, the postgres_fdw will be able to push that sum down to the remote server, and you'll get a big speed-up on that type of aggregation. That can be really helpful when you're doing sharding or moving some of your data to a cloud server. There are lots of foreign data wrappers available in the Postgres ecosystem. The postgres_fdw comes as part of the Postgres install and is easy to set up.
There are wrappers for other relational databases: Oracle, MySQL, SQL Server, SQLite. There's a JDBC FDW and an ODBC FDW; these are easy ways to connect to just about any relational database you might want. You can also connect to NoSQL databases, so if you wanted to join to a Cassandra data store or Hadoop or MongoDB and start integrating data from Postgres with lots of other types of data stores, you can. You can do files: CSV, XML, JSON. There's even one that exposes a pg_dump export file as a foreign table, so if you're trying to move data from one server to another, you don't necessarily need to load the entire dump; you could run a query and just pull individual records out of that dump file, which is pretty cool. And then folks have gone off and done things like pointing at S3 files or Twitter streams or Google spreadsheets, even using the /proc filesystem on your Linux system, so you can look at CPU stats and the like inside a SQL query. Those are all really cool and show some of the power of Postgres and how you can do interesting things. What I care about for this talk is the column store. There's cstore_fdw, which the folks from Citus Data, who are also here, originally created and still support. It's a column store data format: if you've done anything with Hadoop or big data systems, you may have heard of column stores, or of the Optimized Row Columnar (ORC) on-disk format. The big thing is that instead of storing data as rows, where Postgres writes each inserted row sequentially on a data page, one row after another, a column store organizes the data by column. Row storage is great if most of your queries are trying to pull most of each row out.
But if you're only looking for the third column in every row, you still have to read every whole row; there's no way to grab just the third column. What the column store format does is break things up and store data by column. So if a lot of your analytic queries are "give me the maximum value", "give me the sum", something where you only care about two or three columns out of a 30-column-wide table, this can give you a lot of performance benefit because you don't have to scan all the data. cstore_fdw also has built-in skip indexes, so that when you apply where clauses, it can skip whole data pages; it doesn't need to look at every record to filter them out. It does a lot of optimizations like that, and it compresses the data on disk, so you get several different benefits. The caveat is that any column-oriented data store is good for batch inserts but not really good for updates: if you tried to update a single row, it would need to touch the storage for each column separately, and if you were inserting one single row, it would need to add that row's value to each column's storage separately. It would be doing a lot of scattered I/O, so it really isn't efficient for single-row operations. But that's fine for moving a whole partition over. If you want to install cstore_fdw, you can get a binary install in the BigSQL distribution using the BigSQL pgc command line, or you can compile it using the PGXN client tool for building extensions. Once you've got it installed, add it to shared_preload_libraries in your postgresql.conf and then restart the server; you've always got to restart after changing shared_preload_libraries. After that, it's a standard extension: create the extension in your database, then create the server.
With any foreign data wrapper, you create a server mapped to the wrapper, then create the table definition. There are some options: you can choose the compression method, and you can tune things like how large the stripes are and how many records per stripe, so as you learn your data better, you can make it faster. One difference: with the postgres_fdw and other remote foreign data wrappers, you're pointing at a table that already exists; with cstore_fdw, you're defining the table as you create the foreign table definition. Loading the data is simple. If you have an existing partition, in this case the 2011 partition that we want to move from a native table into a column store version, it's just an insert into ... select. You could also use the copy command to load from files. Make sense? Why would you do it? Look at the size of those two partitions: in the native Postgres table, that one-year partition was about 725 MB; when we moved it over to the column store partition, it was reduced to 169 MB. That's a 76% space saving, which is pretty nice. If you are running out of disk space, you don't have to make your server bigger; you can compress your data. You also get speed benefits for some of those analytic queries. If I look for how many flights were canceled six years ago today, in 2011, there were 193 flights canceled in the US, and that count takes close to one second to execute against the native Postgres partition. The same query on the compressed column store partition runs in 52 milliseconds, almost 20 times faster. That's because it's an analytic query, and the column store is really optimized for that type of use. Okay.
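Putting the cstore_fdw steps together, a sketch for the on-time arrivals example might look like this; the column list is abridged and the names are illustrative:

```sql
-- Requires cstore_fdw in shared_preload_libraries and a server restart.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Unlike postgres_fdw, the foreign table definition creates the storage.
CREATE FOREIGN TABLE aot_2011_cstore (
    flight_date date,
    carrier     varchar(7),
    flight_num  int,
    origin      varchar(5),
    dest        varchar(5)
    -- (remaining columns omitted for brevity)
) SERVER cstore_server
  OPTIONS (compression 'pglz');

-- Copy the year over from the native partition, then refresh statistics.
INSERT INTO aot_2011_cstore SELECT * FROM aot_2011;
ANALYZE aot_2011_cstore;
```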
We've covered some partitioning, covered foreign data wrappers, talked about C-Store. Makes sense so far? All right. How do you make that a little more automated? That process of creating the C-Store partition and moving the data over is kind of manual; let's optimize it a bit. We want to move the older partitions over and convert them. What's interesting is that the Postgres system catalogs have all the information you need to do things like alter a child table's inheritance from the parent table, or remove a child partition from the parent. So the basic process, and can you guys see the text in the middle here? No. Okay, great. So it's a six-step process, pretty simple; we're going to skip that part. The first thing you do is check that the child partition you're trying to move is actually a child partition. If you say, "I want to move that 2011 partition," first query the Postgres catalog information and find out whether that table is actually a child of the parent table, because you don't want to move something that's not a child. Then look up the check constraints. Remember how important the check constraint is for the optimizer, for the insert triggers, et cetera. When you're making this an automated process, you don't want to be typing out the new check constraint; you want to take the check constraint from the old partition and copy it over to the C-Store partition you're going to create. So you read the check constraint out of the Postgres catalog; you should only have one check constraint per table here, and we'll make sure that's true; then get the full text definition of that check constraint. Then create the new partition.
In this particular case, you're building up dynamic SQL inside your PL/pgSQL function, generating the create foreign table statement that we showed earlier with all the columns and the options on the foreign server. Then you move the data over with that insert ... select, and run the analyze command on the foreign table. cstore_fdw in particular uses statistics: it pushes them back to the optimizer, which helps a lot with query planning. So if you are going to bulk load six million rows over, definitely run analyze to update those statistics. Then add the check constraint to the new partition: we queried it, we got the definition, we copied it over earlier, so now add it onto the new partition we just created, again generating dynamic SQL. Then drop the old partition and add the new one. Postgres is really interesting in that you can wrap transactions around this kind of DDL, so you can turn the whole thing into a transaction and roll it back if anything doesn't work, which makes it really nice from a database management standpoint. You just alter the new table you created and make it inherit from the parent you want to connect it to. Then when you run a query against that new partition with a where clause in the date range you just moved over, the optimizer is able to use the check constraint and do a single foreign scan against one partition; it drops all the other partitions, knowing this is the only one that can match. cstore_fdw gives you a little extra information in the EXPLAIN ANALYZE output: which file it was working on, how big the file was, how many rows came back. And again, you're down in that 20-millisecond range.
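The catalog lookups and the final swap can be sketched like this; the constraint name and table names are illustrative:

```sql
-- Step 1: confirm aot_2011 really is a child of the parent table aot.
SELECT c.relname
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_class p ON p.oid = i.inhparent
WHERE p.relname = 'aot' AND c.relname = 'aot_2011';

-- Step 2: fetch the old partition's CHECK constraint definition to reuse.
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'aot_2011'::regclass AND contype = 'c';

-- Final swap inside a transaction, so it rolls back if anything fails.
BEGIN;
ALTER TABLE aot_2011 NO INHERIT aot;
ALTER FOREIGN TABLE aot_2011_cstore
    ADD CONSTRAINT aot_2011_cstore_date_check
    CHECK (flight_date >= '2011-01-01' AND flight_date < '2012-01-01');
ALTER FOREIGN TABLE aot_2011_cstore INHERIT aot;
COMMIT;
```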
So you can set this up as an ongoing maintenance process that runs once a month if you're moving monthly partitions over, and you can script and automate it to convert all the partitions. If you compare that same query against the native table, it was 1.3 seconds instead of 50 milliseconds, so a big speed-up going against the C-Store. Partitioning, FDWs, C-Store, converting old partitions to C-Store: makes sense. So the next thing is, what if we want to move some of those partitions to remote servers, using the postgres_fdw to connect to the remote server? Similar scenario: you can take individual partitions from the large table. We may leave some of the partitions compressed; others we're going to convert into remote tables through the foreign data wrapper, move them over to the other server, and actually execute the queries remotely. To set up the postgres_fdw, like any FDW, you create the extension. It comes with Postgres, so if you've already got Postgres installed, you should have the postgres_fdw as part of your distribution. Then create the server. When we created the server for cstore_fdw, you didn't need any options, because it was just creating a local file on the same data server. When you're creating a remote connection with the postgres_fdw, you need to provide the connection options: which server are you connecting to, what's the port number, which database on that remote server. Then you create the user mapping. Whatever user you're running your local Postgres query as, that user needs a mapping that says: when this user connects and runs a query on the remote server, which user ID and password should it use there? You could configure it so that every server has the same user.
You could have different users on different servers, but you need to somehow tell your Postgres foreign data wrapper which remote login you're gonna use. I recommend usually creating a separate schema. So when you import the table definitions from the remote server or when you create your remote foreign table definitions, create them in a separate schema just so that you can manage them. Sometimes that doesn't give you complete transparency on the application and you may decide that you really want your remote tables to be in the same schema as your local tables. But from a DBA perspective, I always find it a lot easier when you're looking at your list of tables. If the remote tables are in a separate schema, it's very easy to tell that they're actually remote tables and you don't start doing crazy things to them. The Postgres FDW and most of the other FDWs actually allow you to import the foreign schema definition. So earlier we went in and said create a foreign table and we listed 20 fields and defined the table that way. You can do that table by table, but you can also just import the entire foreign schema. Saves you a lot of time, avoids problems of data type mismatches and any little issues that you may have there. And again, if you are gonna do that, it's really nice to put it in a separate schema just from the organization standpoint. You can filter it out, you can choose individual tables, you can give star list of tables, things like that, but it's a good way to pull the definitions over. When you go and try to move the data from your individual partitions over to the remote table, you're gonna do it in a pretty similar way. So the insert into select star from the local partition works, you can actually alter the remote table definition, add the check constraint to it. So you can apply check constraints on remote tables. One key is that Postgres is not actually validating that check constraint. 
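A postgres_fdw setup along those lines might look like this sketch; the host, database, schema, and credential values are placeholders:

```sql
CREATE EXTENSION postgres_fdw;

CREATE SERVER archive_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive.example.com', port '5432', dbname 'flightdata');

-- Map the local role to a login on the remote server.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER archive_server
    OPTIONS (user 'archive_user', password 'secret');

-- Keep remote tables in their own schema so they're easy to spot.
CREATE SCHEMA archive;
IMPORT FOREIGN SCHEMA public
    LIMIT TO (aot_2005, aot_2006)
    FROM SERVER archive_server INTO archive;
```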
So you need to make sure that your data really matches what you're telling it, but the optimizer will use the check constraint in the query plan. So make sure if you're connecting to a remote table and you're gonna put a constraint on, it really does match what you're telling the optimizer. Otherwise you'll get wrong results, right? You can add the remote table to your parent table. So create the inheritance relationship and tell Postgres that it should be part of that parent table hierarchy. And you can then remove the old partition by just doing an alter table, no inherit. And then if you want to, you can drop your local table, drop your local partition if you don't care, or whatever type of archiving you need to do. And then the query against the remote table. Again, the optimizer knows that it can use the where clause. It can apply the filter, use the check constraint. And it only needs to scan a single partition. So it looks at the parent table, it looks at the single partition. It happens to be a foreign scan. So it executes the full foreign scan. And in this particular case though, what you'll see is that the Postgres FDW is pulling back all of the rows that meet the condition and the aggregate is being done on the local server. So this is what I was talking about earlier where in Postgres 9.6, the Postgres FDW was not able to push those aggregates to the remote server. So if what you're doing is more analytic queries, if you're doing sums, count stars, et cetera, you're gonna have some delay there. So it's actually gonna pull all 500 rows, all 1,000 rows back to the local server and then try to apply the aggregate locally. If you're just retrieving one row at a time, if you're moving order history records, things like that, and you're just gonna go pull one order history record, that's great, works fast, it's easy. But the aggregates are gonna be slower than if you're doing them locally. 
So the summary on the remote server partitions: it works, it mostly works. Most of the Postgres sharding solutions are built using this type of approach and using the FDW. There are a lot of little gotchas, a lot of things to watch out for when you're doing this. Certainly the latency. So if you're talking about servers side by side, and you're sharding locally in your data center, the latency is not gonna be that bad from a network standpoint, but executing a query on a remote server is a lot slower than executing a query locally. If you're actually talking about a server in the cloud, you've got a lot more latency from a network standpoint and a remote execution standpoint, so watch out for that. Inserting through the foreign data wrapper is pretty slow. So if you're talking about doing that insert into select star, that's gonna take a while. It's a lot slower than just moving it into the C store FDW locally or moving it into a different table. You may wanna really look at doing exports of data, moving the data over to the remote server and doing a load on the remote server directly. Talked about how the Postgres FDW in 9.6 does not push the aggregates down, so watch out for that. That'll be better in 10, but for right now, you got to live with it. If you lose your remote connection, even if some of the partitions were available locally, the whole query will error out. So a lot of times what you start doing, especially in the case where you're moving history partitions over, is you'll create a view of just current data, or you'll leave the parent table as just current data, and you'll actually create a history table and then create a view that's current plus history. And in the application, if you only care about the last 180 days, something like that, you can just query the local data, just query the current data.
If the user says, no, no, I really want everything in history, you're gonna hit that view, you're gonna go back and actually do the remote query. But that way, if you lose that connection, you're not erroring out on most of your user queries. You're only erroring out when they're actually going after the remote data. And you can't execute DDL through the FDW. So if you're trying to move partitions over, all of the work of actually creating the new partitions can't be done through the FDW. You either need to do it through a separate libpq connection, which you could do in a PL/Python stored procedure or in your application outside, or you need to manually, as a DBA, pre-configure the partitions on the remote server. So something to watch out for as you try to automate it, as you try to make it into a repeatable process. All right, we talked about partitioning, the basics of how it works in 9.6, a little bit of a preview of how it's gonna work in 10.0: child tables, inheritance, et cetera. Talked about the foreign data wrappers, both local ones like the column store as well as remote Postgres and other databases. Column store on Postgres, some of the benefits there, especially for analytic queries, how you move partitions over, and how to use the Postgres FDW for the remote server. So we will pause there. Any questions? Sure, how to match those two concepts. So I wanna separate very strongly the Postgres FDW versus the column store FDW. So in 9.6, the FDW API allows pushing down aggregates, but the Postgres FDW does not push aggregates down to remote Postgres. The C store FDW does calculate the aggregate locally. Any other questions? Yeah, you can set up SSL connections on the FDW as well. At which point would you partition a Postgres table, basically? So it's absolutely gonna depend on how wide the data is and what the size of the records are. A table with a million records, typically in Postgres, no problem, just a single table, don't worry about partitions.
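The current-plus-history view pattern described above might be sketched as follows; the view, table, and column names are illustrative:

```sql
-- Sketch: keep the parent table limited to current data, and expose a
-- view that unions in the remote history for the rare "everything" query.
CREATE VIEW orders_all AS
  SELECT * FROM orders                       -- local, current partitions
  UNION ALL
  SELECT * FROM remote_archive.orders_hist;  -- foreign table on the archive

-- Day-to-day application queries hit only local data and keep working
-- even if the archive server is unreachable:
SELECT count(*)
  FROM orders
 WHERE order_date >= current_date - 180;

-- Only queries against the view touch the remote server, so a lost
-- connection breaks these queries alone, not the common path:
SELECT * FROM orders_all WHERE customer_id = 42;
```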
When you start talking about 50 million, 100 million records, one table could become a problem. From a strictly query-performance standpoint, it's actually not an issue usually. So with the proper indexing, even partial indexes on a single table, you can get good query performance on 100 million, 500 million rows fairly easily. Where you really start seeing benefits to the partitioning is around the data maintenance. So if you want to do any sort of archiving, if you wanna delete a month's worth of data, if it's in a single large table and you try to delete that month's worth of data, you've created a lot of vacuum debt. So the vacuum process needs to come through and clean up all those old records, and the delete is a long-running execution that's gonna be slow. If you have your table partitioned and you wanna delete a month's worth of data, you just drop one partition and you're done. Similarly, if you're trying to do exports and backups, you can basically export and back up those individual partition tables. Whereas if it's a single large table, you gotta deal with the entire table. So the query performance is not gonna be where you start seeing that threshold. It's gonna be more around the data maintenance. I'd be curious, anybody in the room who has done partitioning, when did you actually partition your table? [Audience response, largely inaudible.] Yeah. But it really is for large tables.
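The maintenance point above is easy to see side by side; the table and partition names are illustrative:

```sql
-- Unpartitioned table: a big DELETE runs for a long time and leaves
-- millions of dead rows behind for vacuum to clean up later.
DELETE FROM orders
 WHERE order_date >= DATE '2016-01-01'
   AND order_date <  DATE '2016-02-01';

-- Partitioned table: dropping the month's child table is near-instant
-- and creates no vacuum debt at all.
DROP TABLE orders_2016_01;
```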
Large is relative, but I would definitely keep it in one table until you really need it, unless you're building the second version of a system, or you know for sure that you're collecting sensor data or something, and you know you're gonna have 10,000 sensors and one record per minute, and you can do the math and say it's gonna be a billion rows pretty fast. Then go ahead and partition ahead of time, but otherwise wait until you need it. So the optimizer is not great at doing sub-partition joins completely. You can join the two parent tables and it will execute the plan. You'll get the correct results. It's not perfect and it's not gonna be fully optimized, and that's especially true in the 10.0 native partitioning: that sort of partition-level joining is not implemented in the first version. pg_partman doesn't do anything special for you, so it's the similar question of: if you know how the data is partitioned and you know how the query needs to work, you can write the join against the two partitions. You can absolutely just join the two parent tables together, but you won't necessarily get the same performance benefit you would expect. Now, if both tables are partitioned on the same key and you're joining on that key, then the query optimizer can prune partitions, and it will individually prune all the partitions. But if you're trying to join on a key that they're not partitioned on, there's not the magic there. Any other questions? You talked a bit about some of the numbers for the performance aspects using the column store. Do you have any sense for the performance penalty on the transactional side? On column store specifically? I don't have numbers, but it's pretty bad if you tried to do individual inserts. If that was your use case, to do individual inserts, don't use C store. Okay, any other questions? You're asking about the Postgres FDW inserting to a remote table as well?
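A minimal sketch of the shared-key join case above, assuming two hypothetical tables both partitioned monthly on the same date column:

```sql
-- orders and shipments are each partitioned by month on order_date /
-- ship_date. When the filter lands on the shared partition key, the
-- planner can prune the untouched child partitions on both sides of
-- the join; joining on a non-partition key gets no such help.
SELECT o.order_id, s.shipped_at
  FROM orders o
  JOIN shipments s
    ON s.order_id  = o.order_id
   AND s.ship_date = o.order_date      -- join includes the partition key
 WHERE o.order_date >= DATE '2017-03-01'
   AND o.order_date <  DATE '2017-04-01';
```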
So that is a great question. I have not looked at that code section to know why. Obviously it's actually executing queries remotely. It's not doing anything special; Postgres is treating that remote server just as any old remote server. So there's no deep optimization about how to move the data over, and it's not doing anything intelligent about saying, I know that's a Postgres server, therefore I can push the data in a special way. Obviously it's going over a network connection, those types of things, but it's a good question in terms of whether there's anything that could be done to improve those inserts. It's supported syntactically, but it's gonna process the file locally and then push the data through the FDW. You still can do it, though; try your use case. If you actually try to export, in the case where you have a local partition that you wanna move to a remote server, if you try to export the data and then copy the data over and then load it, you're still gonna have multiple steps there. So try it one time and sort of compare the two and see which one works for your use case. Yeah, so I mean you would actually export the table locally, copy the file over, and then execute a command on the remote server to import it. This is in the case where you're trying to move to a remote Postgres server, not going through the column store. Any other questions here at the end? Thank you.
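The export-and-load alternative just described might look like this; the paths and table names are illustrative:

```sql
-- Sketch of the manual path for bulk moves, often much faster than
-- INSERT ... SELECT through the FDW.

-- On the local server: dump the partition to a file.
COPY orders_2016 TO '/tmp/orders_2016.csv' WITH (FORMAT csv);

-- ...ship the file to the archive server (scp, rsync, etc.)...

-- On the remote server: bulk-load it directly.
COPY orders_2016 FROM '/tmp/orders_2016.csv' WITH (FORMAT csv);
```

Server-side `COPY` reads and writes files as the Postgres server user; from a client you'd use psql's `\copy` instead.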