Welcome back. We took the month of August off and we are officially back with our first Postgres Conference webinar, "PostgreSQL Parallelism: Do's and Don'ts." I'm joined by Sebastian Dressler, Solutions Architect at Swarm64, and Andy Ellicott, head of product and marketing at Swarm64. Today they're going to give a brief history and overview of parallelism in PostgreSQL, talk about how to increase parallelism to speed up queries, outline common mistakes that block parallelism in PostgreSQL, and also talk about parallelism best practices. My name is Lindsay Hooper, I'm one of the Postgres Conference organizers, and I'll be your moderator for this webinar. Before we get started, I'd like to give you a little background on your speaker, Sebastian, who has a wealth of experience in connecting hardware with software and building creative solutions. He's a Solutions Architect at Swarm64, helping customers fully understand how to fit Swarm64 technology into their environments. Sebastian holds an M.Sc. in computer science from TU Berlin, and NeoVim is his favorite editor. So with that, I'm going to hand it off to Andy. Take it away, enjoy.

Thank you, Lindsay. As Lindsay mentioned, our agenda today is that Sebastian will go through parallelism in Postgres and why it's important to query performance, cover a quick history of parallelism in Postgres, go through how to configure Postgres for parallelism, and then go into deeper detail on where to expect parallelism to make a big difference and which features in Postgres might actually block parallelism. We'll end with Q&A; if you have questions as we go along, just type them into the chat.

For those of you who are not familiar with Swarm64: we are mainly based in Berlin and Boston, and we are in the business of inventing new ways for Postgres to go faster, speeding up query performance through the Swarm64 DA extensions to Postgres. Those extensions include, among other things, improvements to query planning and execution in Postgres; column store indexing, which is columnar storage that reduces IO; greater parallelism, meaning we extend the amount of parallel processing that Postgres applies during query execution; and different algorithms to speed up join performance. All of this is just an extension to free, open source Postgres: you don't have to change your code or your data or migrate anything, it just speeds things up, and it works with both the community open source version of Postgres as well as EDB. Speeding up parallelism has given us a lot of insight into parallel processing within Postgres, and that's the basis for Sebastian's talk today: knowing where parallelism can help and where it might be hindered. So without further ado, let me turn it over to Sebastian for the details. Sebastian.

Thanks, Andy. Welcome everybody, and thanks for being here. As Andy said, we will mostly be talking about parallelism today; we'll start with a brief overview first and then dive in deep immediately. These are the topics we will be covering today, mostly do's and don'ts. Actually, I realized when compiling the slides that we focus more on the don'ts, on the blockers, because interestingly Postgres already does parallelism so well that sometimes it can be really difficult to not get a parallel plan. I had this during preparation once, where I was wondering why a query that should not have been parallel was parallel anyway. So there we go.
Just a short introduction: you'll be seeing a split screen view here. On the left side are the slides and notes, and the right hand side is split into top and bottom. On top you see Postgres output, so whenever I run a query you will see the output there, and the bottom part we're just using as some sort of scratch space to keep some of the results we'll be getting along the way.

Okay, I'll stop the introductory talk and we dive right in. First part: obviously the common question, why do you want parallel query execution? The direct and short answer is that it makes your queries go faster, and that is probably self-evident; throughout the talk we will see one or two examples where this does not actually happen, so there are also times when you don't want it. The other answer is that it improves your machine utilization. What we've been seeing at Swarm64 whenever we touch analytical queries with today's Postgres is that you typically don't use the full capability of your resources. The exception is maybe RAM, because as you know, Postgres can cache a lot and make really good use of RAM. But when it comes to utilizing CPU resources, this can be somewhat improved, let's put it like that. And that is, as Andy said, also something we provide with our product in the end. So this is basically the goal here.

Now, which queries benefit from parallelism? In a nutshell, keeping this really broad: whenever you have large amounts of data to scan through, or you have involved, lengthy queries with joins where you want to push down all these kinds of operations. On the opposite side, when does it not benefit? If you have more OLTP-like queries, really low latency queries where it's important to go sub-second and you designed your queries to exactly fulfill that, maybe updating a single row or doing a point lookup here and there. Or you just don't have enough data to crunch through, and we're talking about a couple of hundred up to maybe a hundred thousand rows, where Postgres does quite well without any additional parallelism. So that's the downside. However, when you use features that we or other vendors provide, you can easily add another layer on your Postgres, and for instance with the columnar index you can also put analytical queries on top of your OLTP data without hindering its performance much. So there are some benefits, and I personally think the trend is clearly going in the direction of hybrid, mixed workloads, and nowadays we can run them all in Postgres.

A short history of parallelism in Postgres. You can actually follow this quite well in the documentation, which is excellent on this point: if you click on the parallelism chapter, you immediately see that it has nothing before 9.6, so that is where it started. 9.6 introduced some early forms of parallelism: you could sequentially scan your data in parallel, you could have some parallel hash joins and nested loop joins, and aggregations could be done in two stages, a partial aggregation which runs in parallel and a final aggregation which combines the results within a single worker; that result could then be fed into the next part of the query.
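To make that two-stage aggregation concrete, the plan shape looks something like this; the table name follows the TPC-H schema we'll use later in the demo, and the exact shape depends on your data and settings:

    EXPLAIN ANALYZE
    SELECT count(*) FROM lineitem;

    -- Typical, abbreviated plan shape since 9.6:
    -- Finalize Aggregate
    --   ->  Gather  (Workers Planned: 4, Workers Launched: 4)
    --         ->  Partial Aggregate
    --               ->  Parallel Seq Scan on lineitem

The Partial Aggregate runs in every worker, and the Finalize Aggregate combines the partial results in the leader process.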
Then came PG10, and I remember this suddenly gave a huge, enormous boost to performance. In terms of parallelism, the added features were bitmap heap scans and index only scans, which could then eventually run in parallel, and also merge joins. The next big leap was Postgres 11: even more parallelism, suddenly parallel append, and from a user perspective this felt much more mature compared to 10 and, obviously, 9.6. Now we are at Postgres 12, and this is improving even more; as I said earlier, sometimes it's really hard to get it wrong and not run a parallel query.

The next slide brings us directly into the topic. Let's look at the configuration: how do I make my system ready for running parallel queries? The first question probably is, what kind of machine do I need? Do I need an especially beefy machine or not? This is probably down to taste, but what we typically recommend is that you go with, let's say, 12 virtual CPU cores; if you enable hyperthreading, that would be six physical CPU cores plus six additional threads. That is something you can start with. RAM depends on the workload, of course, but typically something along the lines of 128 gigabytes is good. And this part is quite important: you want good IO, or some sort of cache, or even more RAM. Especially if you're going to make heavy use of the page caching that Postgres does for you, you can't get enough RAM; or you have such a strong IO subsystem that you can query directly from it. That is quite easy these days, because we have all these nice NVMe drives which deliver a couple of gigabytes per second, and that is just extremely helpful. Plus, if we're talking about more index scans, you can put your indices into a separate tablespace on a very fast disk like an SSD, keep your data on spinning disks or even network attached storage, and still go quite fast.

Now, in terms of configuring Postgres itself, there are some semantics we have to clarify first, and these are the three settings max_worker_processes, max_parallel_workers, and max_parallel_workers_per_gather. Those three work in conjunction, and as we will see in the next examples, they take from each other, meaning if you configure the amounts you give to one or the other wrong, and don't pay attention to the relations between them, then this can easily go wrong. max_worker_processes is how many worker processes Postgres is allowed to spawn at all. The worst case is setting this to zero, which effectively means Postgres doesn't run at all; or you set it to one, and, well, I've actually never tried this, but probably it also doesn't run, so you very likely need a certain minimum here. The next parameter is max_parallel_workers, and these are taken from the worker processes, so you want to set it equal or rather less, because at the end of the day you need some worker processes left over for maintenance tasks like autovacuum, or your WAL shipping, or whatever else you have running in the background that hooks into Postgres. The third one is max_parallel_workers_per_gather, and this is, I find, the most difficult one, because of this notion of "per gather."
If you ever analyze a Postgres query plan, you will note that there is a node at the very end of the plan collecting all the data together and giving you back the result: this is a gather node, and you can actually have multiple of these in your plans. This parameter controls how many workers one of these nodes can spawn below it. Let's say you have a join and the join ends up under a gather node; then this parameter controls how many parallel workers this particular join can have. Now, if the relations between the settings are wrong, this can easily backfire. Straightforward example: if we have a machine with 24 virtual cores, we could for instance set max_worker_processes to 32. This is over-subscribing the machine, because you only have 24 virtual cores, but it's fine, because we assume every process stays within certain limits, plus you have an operating system which is very likely able to do time slicing on your processes so they can share the resources. max_parallel_workers we could easily set to 24, meaning I can run 24 parallel workers in total, and max_parallel_workers_per_gather we set to 12 here. From a straightforward perspective, what that gives you is that you can run two queries at the same time, and each query can use 12 workers. This obviously assumes that each query only has a single gather node; it could happen that you run one query with two gather nodes, those two gather nodes take up 12 workers each, and in the worst case they don't leave anything over for the other query that might want to run, so that query ends up running sequentially, or doesn't run at all because you're out of resources.

Those are the coarse-grained semantics, and what I found handy as a rule of thumb is that you always want to set one larger than the other, and ideally you select, for instance, max_parallel_workers much larger than max_parallel_workers_per_gather, which would bring us to something like this for our 24-core machine: a large number of worker processes, 140; of those 140, 120 can be parallel workers; and from those 120 we can give each gather node 12. That enables us to run 10 queries at a time utilizing 12 workers each, or, with the multiple gather node example again, if every query uses two gather nodes, then you can run five queries with 12 workers each at the same time. This avoids the pitfall of running out of workers in the middle of your query. That happens quite often, actually, and then you think about your query and you think, okay, the query is quite parallel, so why doesn't it run parallel until the end? Because I'm out of workers for my gather nodes, that's simply why. So let's increase that resource. However, you also want to watch out for contention or over-utilization of your system, so be a bit mindful of these settings.

Also noteworthy: Postgres itself limits the maximum parallelism it will apply. Even with max_parallel_workers_per_gather set to 24, those 24 workers are likely not going to be used in native Postgres, because it checks how big the table is that it wants to scan, and based on the size of the table it gives you a certain number of workers. The logic here is logarithmic, meaning the bigger your table, the more workers you get, but there is also an upper limit. Before we get to that limit, here is a quick sketch of the worker settings we just walked through.
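As a sketch, the rule of thumb would look something like this; the numbers are just the example values from this talk for a 24-core machine, not universal recommendations:

    -- Example values from the talk; tune for your own hardware
    ALTER SYSTEM SET max_worker_processes = 140;            -- all background processes Postgres may spawn
    ALTER SYSTEM SET max_parallel_workers = 120;            -- of those, how many may be parallel query workers
    ALTER SYSTEM SET max_parallel_workers_per_gather = 12;  -- per-gather-node limit
    SELECT pg_reload_conf();  -- note: max_worker_processes itself needs a server restart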
That upper limit is typically somewhere around 12 or 14, something like that; it really depends on the sizing. However, if you use the Swarm64 extension, you can set this specific parameter and override that limit, which can be very handy if you have a system with, for instance, only a single user, and that single user runs one query after another and wants to use the whole machine.

Okay, other settings that are not negligible: the cost model. This obviously controls your planner and whether your plan comes out parallel or not. The two main screws here are parallel_tuple_cost and parallel_setup_cost. When we configure a system, we typically turn them down somewhat; the tuple cost, for instance, we turn down to 0.1, and that just encourages the planner to use parallel plans more often instead of sequential ones. There is a caveat: you need to do this in relation to the other costs as well. The worst thing you can do is let your cpu_operator_cost accidentally become larger than the parallel tuple cost, because in the worst case that means you only get parallel plans, even where a sequential plan would be much better. This is a very typical OLTP case again: if you run TPC-C and have this configured wrong, your TPC-C queries are going to use parallel plans and run horribly slow, whereas if you turn it around and configure it correctly, your performance will be much improved.

Then there is min_parallel_table_scan_size, and this is a parameter people tend to forget. If you're trying to get parallel queries to run, and you've set the costing right and the workers right and everything is correct so far, but they still don't run parallel, go check this parameter. It typically defaults to eight megabytes; if you set it as a bare number, it is in units of pages, with each page being eight kilobytes in size. If a table doesn't reach that threshold, or some of your estimates or statistics are off, then the scan is not going to run in parallel by any means. So the trick is to simply start with zero, try it, see whether you get a parallel plan, and then slowly increase it; even slight changes, like increasing it from one to two, can disable your parallelism again, so you want to keep the setting rather low.

The last one is a bit debatable and highly depends on your workload: do you want nested loops on or off (enable_nestloop)? In our TPC-H benchmark we typically turn this off because it helps the benchmark: it helps the planner avoid certain situations where it would choose a nested loop, and if the planner doesn't have 100% of the information at hand, that might be absolutely the wrong choice and your query runs much slower than it would otherwise. But real world scenarios are different, so the recommendation is likely: keep it on, or know your workload very well. That was part one; a quick recap of these cost settings is sketched below, and then we dive right in and see what happens when we put certain situations into a parallel context, or when we try to.
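Here is that quick recap, as a per-session sketch; the tuple cost value is the one from the talk, the setup cost value is purely illustrative, and you should verify the defaults and the effects on your own server:

    SET parallel_tuple_cost = 0.1;           -- lower per-tuple cost encourages parallel plans
    SET parallel_setup_cost = 100;           -- illustrative value; lower it in relation to the other costs
    SET min_parallel_table_scan_size = 0;    -- start at 0 to test, then raise it slowly
    SET enable_nestloop = off;               -- only for well-understood, TPC-H-style workloads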
Let's start with my favorite example, actually: bitmap heap scans. I once had a webinar or training session where we tried out several plans, and we cranked up the parallelism like crazy, and it simply did not run faster, and I was like, hmm, sorry guys, I have no idea what's happening here. The culprit was bitmap heap scans. We prepared a small data set on the machine we're running on; this is a TPC-H data set with 100 gigabytes of raw data, so it's not that much, but it's also not too small, just right for the example. As you see, I need to trick Postgres a bit here into using a bitmap heap scan, which is why index-only scans are switched off, and we're starting with four parallel workers. I'm just going to run this; it won't take long. You see the result on the right side now, and the first thing in the plan is good: we asked for four workers, it gave us four workers. The execution time is roughly 600 milliseconds. I'll paste this into the scratch buffer as "four workers," and now let's see what happens if we crank this up to eight. Running it again: we check the plan, we have eight workers, that's good, just what we expected. The time is okay, not that much more, but a bit; it's borderline, and, sorry, I took the wrong one, so I'm not really sure whether this is a real effect we're seeing or just measurement noise. By the way, I have scanned this data set a lot already, so this should all be in cache; there is no disk access. Now we try 16, and the first surprise is that we see only 12 workers. That must be the logarithmic thing I talked about a few moments ago, and that's exactly what it is. And execution time has gone up: we are at 980 milliseconds, almost a second now. That's not quite doubled, but it's quite a lot. So, long story short: if you have bitmap heap scans, don't try to crank up the parallelism, because it's going to backfire. One solution is to disable bitmap scans to be on the safe side, or, again, know your queries and your workload very well and disable them per query instead of globally, which is typically not such a good idea. But yes, this is the first negative example of parallelism backfiring on you.

Let's look at the next one, also a great topic, actually: Postgres functions. I've written a small function here; as said, this is a TPC-H data set, and this function does nothing else than take the name of a nation and return the ID of that nation. That can come in handy for certain queries. I'm going to create this function here: you see we drop it first and recreate it, and this time I'm just going to run EXPLAIN. Now, what we would expect is a plan that runs completely in parallel. Let's see what happens. And we see it doesn't. If you remember, the earlier plans said something about Workers Planned and Workers Launched; there is nothing of that in here, zero. What does that mean? It's going to run sequentially. Why is this the case? Because Postgres likes to play things safe, and this is a very good thing, obviously, but it can backfire on you. In this case, Postgres just assumes: I don't know anything about this function, so I'm going to be on the safe side and run things sequentially wherever it's used.
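A minimal sketch of such a function, reconstructed from the description; the exact body in the demo may differ, but the names follow the TPC-H schema:

    -- Without a PARALLEL marking, Postgres assumes PARALLEL UNSAFE
    -- and keeps any plan that calls this function sequential.
    CREATE OR REPLACE FUNCTION nation_id(p_name text)
    RETURNS integer
    LANGUAGE sql
    AS $$
        SELECT n_nationkey FROM nation WHERE n_name = p_name;
    $$;

    EXPLAIN ANALYZE
    SELECT count(*) FROM customer WHERE c_nationkey = nation_id('GERMANY');
    -- No "Workers Planned / Workers Launched" in the output: it runs sequentially.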
Because if you have one sequential element in your query, in your entire plan, then the plan as a whole goes sequential too. There are a few exceptions, for instance when a gather node can be placed in between, but that is not the case here, so we end up with a sequential plan. Now, of course the Postgres developers thought about this, and they added markings for functions: we can mark this function as PARALLEL SAFE, which we just do here. I'm going to recreate the function, but this time it's parallel safe, and now we can run this again, and you will see that it's using workers now. The function returns my nation ID here, feeds it into the sequential scan, and that can then be worked on in parallel. So this is something to keep in mind: if you have functions, and these can be plain SQL functions or anything else that is CREATE FUNCTION in the end, consider making them parallel safe if you need to use them this way. There are certain occurrences where you don't want to do this; for instance, when the function writes to something, stay away from parallel safety. This is very well explained in the documentation, actually.

However, this setting is not related to whether the content of the function can be executed in parallel or not, and that's something I stumbled over; I actually had no idea about this until I properly tried it out. We have another function here, and this function just runs a query that we expect to run in parallel in the end: this SELECT here, with a join, joining two tables together and filtering on the name. We again pass in the name of the nation, and to make the join happen, it takes the name of the nation, gets the ID back, and then runs that query. Naively, one could assume this doesn't run in parallel at all, because, as you see, the PARALLEL SAFE marker is not there on the inner function; if I wanted to make it parallel safe, I would have to write that, but I'm not doing it. We keep it like this, create the function, and then run it. By the way, as you see, there is an EXPLAIN ANALYZE inside, so this function is not going to return the number of customers in a nation; it returns zero, but it will show us the plan it actually used. So we will get the plan on the output side to verify whether this ran in parallel. Before we do so, we force parallelism off by setting max_parallel_workers_per_gather to zero, in which case the plan should run sequentially. Let's run this, it doesn't take long, and again we search for Workers Launched, Workers Planned: nothing in there, so it must be a sequential plan, obviously. And now we allow eight parallel workers and run the same thing again. As you see, this time it worked; it decided to use only six of the eight, that's fine, but the point is that the content of the function executes in parallel. Now, sometimes you have functions with, for instance, loops, and there are very common PL/pgSQL constructs which are simply not going to execute in parallel. In that case, you tear your function apart, and whatever can go parallel, you put into a secondary function and call that function, and that part will then execute in parallel.
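A sketch of that split, with hypothetical names; the point is only that the heavy, read-only query lives in its own function so its statement can get a parallel plan:

    -- The parallelizable query goes into its own function.
    -- PARALLEL SAFE additionally allows it to be called inside parallel plans.
    CREATE OR REPLACE FUNCTION customers_in_nation(p_name text)
    RETURNS bigint
    LANGUAGE sql
    PARALLEL SAFE
    AS $$
        SELECT count(*)
        FROM customer c
        JOIN nation n ON n.n_nationkey = c.c_nationkey
        WHERE n.n_name = p_name;
    $$;

    -- The loop, a construct that will not parallelize, stays in the outer function.
    CREATE OR REPLACE FUNCTION report_customer_counts(p_names text[])
    RETURNS void
    LANGUAGE plpgsql
    AS $$
    DECLARE
        nm text;
    BEGIN
        FOREACH nm IN ARRAY p_names LOOP
            RAISE NOTICE '%: % customers', nm, customers_in_nation(nm);
        END LOOP;
    END;
    $$;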
A little divide and conquer like that works very well. Okay, that's it for functions. Let's look at the next example: cursors. I'm not going to say this is my favorite topic; it's my least favorite topic, actually. Cursors are still very, very common, especially if you connect via JDBC; a lot of older programs open a cursor, fire the query, fetch a certain number of rows, maybe crunch on them, then fetch the next batch, and so on and so forth. That can be quite useful, but we will see. In this example, we are just going to select a single row from the customer table, and to make it a bit more interesting, we order it. As you see, I set enable index... sorry, enable_indexscan to off, because normally the planner would just use the index, but we want to simulate what happens when, let's say, the cost of an index lookup of this type is too high, so that Postgres does a parallel scan instead. Let's execute this; it doesn't take too long. You see 12 workers launched here, and you see a really nice sort, and that makes it fast: this sort would probably run rather slowly if it were sequential, but since it isn't, it can sort in parallel, much faster.

Now, what happens if I use a cursor for this? For this purpose, to be on the safe side, we disable the index scan again, we wrap it into a transaction, and the first thing this does is declare the cursor; we run EXPLAIN ANALYZE over it, which gives us the plan back, and then normally you would do a FETCH to get that first row: we ordered the data set, and the cursor gives us back the first row. I'm just going to run this, and you'll be waiting a bit now, because the cursor makes the sort sequential, so it runs slower than before. You see it in the execution time already: this took roughly 10 seconds. So if I want to fetch my first row this way, I have to wait 10 seconds, whereas if I can throw 12 workers at it, I wait a second or even less, depending on the size of the table, obviously. But this is exactly the point: with a cursor it has to use a sequential plan, and this is a big drawback of opening cursors. By the way, if you have BI tools and they open a cursor because they need to get the data back in pieces, that is not going to run in parallel either, and that is going to slow you down. So the suggestion for this one is obviously: don't use cursors, better fetch all rows at once, or if you need to divide and conquer somehow, think about OFFSET and LIMIT, even if that's more work on your end; you don't want to limit the parallelism in this case.

Talking about cursors and BI tools, by the way, there's a pro tip, so to speak. If you're using one of the freely available BI tools, for instance Metabase, then Metabase connects to your Postgres via JDBC. Now, this does not run in parallel, and when you dig into the code of Metabase to find out why, it turns out Metabase is not using any cursors; what Metabase does is connect via JDBC in the so-called extended query mode, and that just means: no parallelism in that mode.
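A sketch of the fix with pgJDBC; host, port, and database name are placeholders:

    jdbc:postgresql://dbhost:5432/mydb?preferQueryMode=simple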
So that's the tip: if you're using JDBC, consider adding preferQueryMode=simple to your connection string to open the connection in simple query mode, and then you get parallelism for your JDBC BI tools.

The next one is window functions, and this is also an interesting one. I'm not going to run the full query here; we just look at the EXPLAIN this time. This window function does nothing else than take the part table, partition it by the brand, and assign row numbers. Maybe a bit of a dumb query, but it's here for a purpose, to show an effect. What we see here is what this window function has to do in the end, and it's purely logical: to partition by brand, it has to sort, and this sort is the part that is going to run slowly. If you have much more data in the system, it runs even slower, because you can't parallelize it, and you will wait forever for this query to return. What you can do is prepare yourself a table which is already sorted, and interestingly, the sort will be done again, but it is much faster because the data is already sorted. I don't know the internals here, but there seems to be a mechanism by which Postgres knows the data is already sorted. This is Postgres 12, and if you run this here, what we see in the plan is a parallel sequential scan, the sort done with 12 workers, then a gather merge, and then the window aggregation, which is the partition-by part. This is, by the way, also a nice demonstration of the Postgres 12 CTE inlining feature: this WITH statement here is very likely inlined. And to me it seems there is some quite involved logic inside Postgres to let it know internally, hey, this is already sorted, you can just do your partitioning. In other words: if you have window functions, be mindful of what you're doing with them, because they tend to make plans go sequential, and if it's a PARTITION BY like this, or you have an ORDER BY in your window function, consider taking that out and feeding your data in pre-sorted order; that will help you a lot.

Next up is the trinity of INSERT, UPDATE and DELETE. If you ever read the documentation about parallel queries in Postgres, you will note that whenever you write to the database, it is not going to be parallel, and obviously INSERT, UPDATE and DELETE write to the database, so they are not parallel. The problem is what happens when your UPDATE statement is so involved that there is quite some business logic behind what you want to update. In this case I've taken a query from TPC-H which calculates customers with freight issues, so to speak: they had certain problems getting their deliveries within an interval of three months, and that is what this query down here calculates. As you see, it is quite an involved query: it joins over four tables, we have a bit of a WHERE clause, we have a GROUP BY, and so on and so forth. But what I'm interested in is the customer key, to update my customers and set a comment saying, hey, this customer is likely having issues. If we run the EXPLAIN on this one, what you see is no surprise, because as I said: UPDATE writes to the database, therefore it can't be parallel, easy as pie.
There are no parallel workers in here. What you can do in this case is simply drag out the involved query and wrap it into a table. There is one case where a writing statement is at least partly parallel; the writing itself probably is not, but the query feeding it is, and that is CREATE TABLE AS. If you create your table directly from a query, that query is executed in parallel, and this is the simple, straightforward trick here. I'm going to run this again, and suddenly we see: hey, the plan is parallel, almost until the very end, because there is this GROUP BY, but basically this is exactly what I wanted. It made my query faster, and because I now have this intermediate table, my UPDATE statement can just take from it and do the update in a much faster manner, assuming this table is not the biggest one; either way, it would be slower if you first had to calculate what you need to update in a sequential manner.

Next one: recreating tables with INSERT. This is a pattern we actually see. If you have ETL jobs, what likely happens is that you truncate your existing aggregated data table and then call INSERT INTO that table with a SELECT statement over, for instance, your raw data, and this can be as complex as you want. This results in the very same thing again: INSERT INTO is not parallel, so your query is executed sequentially, and if it's a very involved query, it is going to be slow; I'm just repeating myself here. Now, I just showed you the CREATE TABLE AS trick, and this is a very nice thing to do: you just drop your aggregated data table and recreate it with CREATE TABLE AS, because that is going to be parallel. However, please be mindful, there are certain things to be aware of. The types are auto-deduced, which is convenient because you don't have to bother with the schema of the table you're creating, but those types can be different from what you expect. The classical example is that you get a numeric back where you expected double precision, or vice versa, and then one of your APIs or your BI tool goes crazy because it expected a different type. The way to circumvent this is an explicit type cast; just be mindful that it can create something you might not expect. The same goes for column names, by the way: if you have queries that do SELECT column1 AS a, column2 AS b, then those will be your column names in the end, so you want to pick proper column names as well. Okay, and another pro tip here: if you have indexes on these tables, make sure you create them after you created the table, because index creation can make use of parallel maintenance workers. That is yet another setting (max_parallel_maintenance_workers); I believe the default is two or so, but you may want to set it higher, and then your index creation, at least B-tree index creation, can typically make use of it, and your index will be created much faster. This also improves insert speed, by the way: if you do have to use INSERT, at least think about dropping the index before a truncate-and-insert, because otherwise the index has to be written along the way with a single worker, which slows you down again; you can just shift that work to the very end.
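Putting those caveats together, a minimal sketch of the pattern; the table and column names are illustrative, not from the demo:

    -- Rebuild the aggregate table with CREATE TABLE AS instead of TRUNCATE + INSERT ... SELECT
    DROP TABLE IF EXISTS agg_daily;
    CREATE TABLE agg_daily AS
    SELECT
        order_day                     AS day,     -- pick explicit column names
        sum(price)::double precision  AS revenue  -- explicit cast, so the deduced type is what you expect
    FROM raw_orders
    GROUP BY order_day;

    -- Create indexes afterwards, so B-tree builds can use parallel maintenance workers
    SET max_parallel_maintenance_workers = 4;     -- raise from the low default if you have the cores
    CREATE INDEX ON agg_daily (day);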
Okay, other constructs to watch out for. We already had CTEs, the WITH statement. This is typically no problem anymore, because Postgres 12 introduces this nice inlining. Postgres 11 does not do the inlining, but it still runs the WITH statements in parallel. Postgres 10 doesn't do it at all: the moment you have a WITH statement in Postgres 10 and want to run something in parallel, there is no way to do it. The trick there is that you can use an unlogged table; and even if, say, Postgres 12 uses your CTE multiple times, consider using an unlogged table.

The next one is lateral joins. They are cool, but they are not going to be parallel, because a lateral join uses a nested loop. So think about how you use lateral joins: avoid them if you can, or pre-filter your data so that your lateral join only gets the data it needs, reduced as much as possible. And the last one is temporary tables, because you can create them with this CREATE TABLE AS statement and have the creation be parallel, but the moment you use them, they are not going to be parallel anymore, because they reside only in local process memory. That is, by the way, why in the hint about WITH statements I clearly said you should use an unlogged table: if it's not unlogged, it's not going to be parallel.

Okay, the last part, and also a very important one: you should try to keep your queries parallel. Another example from TPC-H, this is query number 17, and it actually contains a correlated subquery. We're just going to look at the EXPLAIN plan again, not run the ANALYZE; you see the runtime below, 150 seconds on this data set. What makes this query really evil is the line on line 17 here, where you basically force it to look up into the other table. If we look into the plan and what happens here, you see there is some parallelism quite deep inside the plan, but we have a sequential scan on lineitem, and lineitem in this case is the biggest table you can get. And because we are also doing a join, you're going to scan this thing over and over again, and you have to scan it completely; there is no index, because you don't know what you're going to get, and this is just really bad. Because this is a correlated subquery, you can decorrelate it, and the decorrelated version looks something like this. As you see, it is quite verbose, because I'm repeating a couple of predicates: for instance, we filter twice on the brand and twice on the container. But essentially this part here is your decorrelation: you drag the lookup out of the WHERE clause into the FROM clause, and you say, this is another table I need from you, please execute it like this. If we look at this plan, what we see looks more parallel: here, not too deep, 15 workers, and we can actually do a parallel scan on lineitem, which is much faster than the sequential scan. On the second part there is another gather node, and this is, by the way, one of those examples with multiple gather nodes: we are going to use 30 workers in total here, and if I only have 20 available, one of them gets a penalty and executes with fewer workers. This is going to be much better: I measured it beforehand, and this is 19 seconds of runtime. The results are the same, but as you see, the query is much more involved.
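For reference, the two shapes look like this; this follows the standard TPC-H Q17 text, and the demo's decorrelated variant additionally repeats the brand and container filters inside the derived table to reduce it further:

    -- Correlated: the subquery forces repeated scans of lineitem
    SELECT sum(l_extendedprice) / 7.0 AS avg_yearly
    FROM lineitem, part
    WHERE p_partkey = l_partkey
      AND p_brand = 'Brand#23'
      AND p_container = 'MED BOX'
      AND l_quantity < (SELECT 0.2 * avg(l_quantity)
                        FROM lineitem
                        WHERE l_partkey = p_partkey);

    -- Decorrelated: the lookup moves from WHERE into FROM as an aggregated derived table
    SELECT sum(l.l_extendedprice) / 7.0 AS avg_yearly
    FROM lineitem l
    JOIN part p ON p.p_partkey = l.l_partkey
    JOIN (SELECT l_partkey, 0.2 * avg(l_quantity) AS qty_limit
          FROM lineitem
          GROUP BY l_partkey) lim ON lim.l_partkey = l.l_partkey
    WHERE p.p_brand = 'Brand#23'
      AND p.p_container = 'MED BOX'
      AND l.l_quantity < lim.qty_limit;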
Now, the last part of the presentation, coming back to Swarm64. What we do, basically, is give you all of that for free: you get the decorrelation, you get faster access for your queries, and some other interesting features like what we call a shuffle node, where your plan stays parallel. I just ran this whole bunch of SETs here to enable all our features, because I had disabled them before, and to avoid unwanted side effects, let me comment out the ANALYZE. Let's look at this plan. As you can see, the effect is that we reduce the whole plan to essentially a single gather, which makes use of the 52 workers we have on this machine at the moment, and this is a really straightforward, lean plan. In the end this doesn't execute quite as well as the hand-optimized version, but I measured 24 seconds, and I think that is still quite good. Okay, that's it; that's all from me.

Thank you, Sebastian. We got a little bit of a late start, but we have some time left for questions. A few have been asked already and I'll start with those, but if anybody else has questions, you can type them into the chat window. The first one, Sebastian, was on parallel safe functions: can you maybe remind people what the syntax was to mark a function parallel safe, and maybe also explain the difference between a safe and an unsafe function, like what kind of functions are unsafe?

Right, sure. So the key is this little statement at the very end; it doesn't matter much where you place it, but it needs to be there. Please consult the documentation for the full information on this one, but basically you add PARALLEL SAFE to your function definition, and this tells Postgres that when it uses this function, it is okay to use it in parallel, because it is, for instance, not writing to the database but just running a SELECT and giving you back a result. Maybe, for the sake of the example, let me run this again; it was only an EXPLAIN anyway. This is your plan without the marking, meaning I don't see parallel workers launched here, so it will run sequentially. But if I use the PARALLEL SAFE variant, what we see here is Workers Planned: 12, and if I had run this as EXPLAIN ANALYZE, it would also say Workers Launched: 12, meaning that is exactly the point where you get a parallel query. This little difference, parallel versus not parallel, is made by PARALLEL SAFE in the end, but it only matters when you use the function inside another query; if your function's query does something on its own and just gives you back a result which isn't used anywhere else, then that query is likely going to execute in parallel anyway. So be mindful when you use functions. This is important for PostGIS, for instance, because as far as I know most of the PostGIS functions are not marked parallel safe, so if you use them in conjunction with your queries, you are very likely going to get sequential plans.

Thank you. I think just one more question here, if I understood it correctly: do unlogged tables support parallel queries? Can you query an unlogged table in parallel? Yes, they do, because they're written to disk.
The contrast here would be temp tables, which are only alive in the current connection, or within a function; they reside in a single process, so they are process-local, and therefore you can't query from them in parallel, although you can create them in parallel. Thanks. And another late-breaking question: can materialized views, mat views, leverage parallel execution? Yes, materialized views can be created in parallel; that works very well. Okay, great. I think we'll wrap up. If anybody else has other questions, feel free to contact us through swarm64.com. I hope everybody enjoyed the webinar today, and we will see you again at our next Postgres Conf webinar on September 29th, on column store indexing: using column store indexes in Postgres. Lindsay? Thank you, Sebastian. Thank you, Andy. That was fantastic. Thank you to all of you who spent a little bit of your day with us, and I hope to see you at next week's webinars. Have a wonderful afternoon.