Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Optimizing Query Performance and Resource Pool Tuning." I'm Jeff Healy, I lead Vertica marketing, and I'll be your host for this breakout session. Joining me today are Rakesh Bancoula and Abhi Thakur, Vertica product technology engineers and key members of the Vertica customer success team.

Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait; just type your question or comment in the question box below the slides and click Submit. There will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time. Any questions we don't address, we'll do our best to answer offline. Alternatively, visit the Vertica forums at forum.vertica.com to post your questions there after the session; our engineering team is planning to join the forums to keep the conversation going. Also, a reminder that you can maximize your screen by clicking the double-arrow button in the lower-right corner of your slides. And yes, this virtual session is being recorded and will be available to view on demand this week. I'll send you a notification as soon as it's ready. Now let's get started. Over to you, Rakesh.

Thank you, Jeff. Hello, everyone. My name is Rakesh Bancoula. Along with me we have Abhi Thakur, and together we are going to cover the present session on optimizing query performance and resource pool tuning. In this session, we are going to discuss query optimization: how to review query plans, and how to get the best query plans with proper projection design. Then we will discuss resource allocation and how to find resource contention. Abhi will continue the discussion with some important use cases.

In general, to successfully complete any activity or project, the main thing it requires is a plan: a plan for what to do first, what to do next, and what things you can do in parallel. The next thing you need is the best people to work on that project as per the plan. So the first thing is the plan, and the next is the people, or resources. If you overload the same set of people or resources by involving them in multiple projects or activities, or if any person or resource is sick, it is going to impact the overall completion of that project.

We can apply the same analogy to query performance. For a query to perform well, it needs two main things: one is the best query plan, and the other is the best resources to execute the plan. Of course, in some cases resource contention, whether from the system side or within the database, may slow down the query even when we have the best query plan and the best resource allocations. We are going to discuss each of these three items in a little more depth.

Let us start with query plans. A user submits a query to the database, and the Vertica optimizer generates the query plan. In generating query plans, the optimizer uses the statistics information available on the tables, so statistics play a very important role in generating good query plans. As a best practice, always maintain up-to-date statistics. If you want to see what a query plan looks like, add the EXPLAIN keyword in front of your query and run it; it displays the query plan on the screen. The other option is the DC_EXPLAIN_PLANS table, which saves the explain plans of all queries run on the database. So, once you have the query plan, what do you check in it to make sure the plan is good?
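To make those two steps concrete, here is a minimal, hedged sketch assuming a hypothetical store_sales table; the Data Collector query uses columns from dc_explain_plans, whose exact layout may vary by Vertica version.

```sql
-- Keep statistics current so the optimizer can cost plans well.
SELECT ANALYZE_STATISTICS('public.store_sales');

-- Prefix any query with EXPLAIN to print its plan without running it.
EXPLAIN
SELECT store_key, COUNT(*)
FROM public.store_sales
GROUP BY store_key;

-- Plans for queries that already ran are kept in the Data Collector:
SELECT transaction_id, statement_id, path_id, path_line
FROM dc_explain_plans
ORDER BY "time" DESC
LIMIT 50;
```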
The first thing I would look for is "NO STATISTICS" or "PREDICATE OUT OF RANGE". If you see either of these, it means a table involved in the query does not have up-to-date statistics, and it is now time to update them. The next things to look for in the explain plan are broadcasts and resegments around the join operators, and global resegments around the group-by operators. These indicate that during the runtime of the query, data flows between the nodes over the network, which will slow down the query execution. As far as possible, prevent such operations; how to prevent them, we will discuss in the projection design topic.

Regarding the join order, check which tables are used on the inner side and the outer side, and how many rows each side is processing. Picking the table with the smaller number of rows for the inner side is good in the case of a hash join: as the hash table is built in memory, the smaller the number of rows, the faster it is to build the hash table, and it also helps in consuming less memory. Then check whether the plan is picking a query-specific projection or the default projections. If the optimizer is ignoring a query-specific projection and picking the default superprojection, we will show you how to use query-specific hints to force the plan to pick the query-specific projection, which helps in improving performance.

Here is one example query plan, for a query trying to find the number of products sold from a store in a given state. This query has joins between the store table and the product table, and a group-by operation to find the count. First, look for "NO STATISTICS", particularly around the storage access part. This plan is not reporting any "NO STATISTICS", which means the statistics are up to date and the plan is good so far. Then check which projections are used; this is also around the storage access part. For the join order check, we have a hash join in Path ID 4 with its inner in Path ID 6 processing 60,000 rows, and its outer in Path ID 7 processing 30 million rows. The inner side processing fewer records is good; this helps in building the hash table quicker while using less memory. Then check for any broadcast or resegment joins: Path ID 4 and also Path ID 3 both have an inner broadcast, so the inner side's 60,000 records are broadcast to all nodes in the cluster. This could impact the query performance negatively. These are some of the main things which we normally check in explain plans.

Till now, we have seen how to get good query plans: we need to maintain up-to-date statistics, and we discussed how to review query plans. Projection design is the next important thing in getting a good query plan, particularly in preventing broadcasts and resegments. A broadcast or resegment happens during a join operation when the existing segmentation clause of the projections involved in the join does not match the join columns in the query. These operations cause data to flow over the network and negatively impact query performance, particularly when they transfer millions or billions of rows. These operations also cause the query to acquire more memory, particularly in the network send and receive operations. One can avoid these broadcasts and resegments with proper projection segmentation. Say a join is between two fact tables T1 and T2 on column I: then segment the projections on these T1 and T2 tables on column I. These are also called identically segmented projections. In the other case, say a join is between a fact table and a dimension table: then replicating, that is, creating an unsegmented projection on, the dimension table will help avoid broadcasts and resegments during the join operation.
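As a hedged illustration of those two designs, assuming hypothetical tables t1, t2, and dim_table, the DDL might look like this; SEGMENTED BY HASH and UNSEGMENTED ALL NODES are standard Vertica projection clauses.

```sql
-- Identically segmented projections: both fact tables hashed on the
-- join column i, so the join needs no broadcast or resegment.
CREATE PROJECTION t1_seg_i AS
SELECT * FROM t1 ORDER BY i
SEGMENTED BY HASH(i) ALL NODES;

CREATE PROJECTION t2_seg_i AS
SELECT * FROM t2 ORDER BY i
SEGMENTED BY HASH(i) ALL NODES;

-- Small dimension table: replicate a full copy on every node instead.
CREATE PROJECTION dim_table_unseg AS
SELECT * FROM dim_table
UNSEGMENTED ALL NODES;

-- Populate the new projections.
SELECT REFRESH('t1, t2, dim_table');
```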
During a group-by operation, global resegment groups cause data flow over the network, which can also slow down query performance. To avoid these global resegment groups, create the segmentation clause of the projection to match the group-by columns in the query.

In the previous slides, we have seen the importance of the projection segmentation clause in preventing broadcasts and resegments during join operations. The ORDER BY clause of the projection design plays an important role in picking the join method. We have two important join methods: merge join and hash join. Merge join is faster and consumes less memory than hash join. The query plan uses a merge join when both projections involved in the join operation are segmented and ordered on the join keys; in all other cases, the hash join method will be used. In the case of group-by operations, too, we have two methods: group-by pipelined and group-by hash. Group-by pipelined is faster and consumes less memory compared to group-by hash. The requirement for group-by pipelined is that the projection must be segmented and ordered by the grouping columns; in all other cases, the group-by hash method will be used.

Up to now, we have seen the importance of stats and projection design in getting good query plans. As statistics are based on estimates over a sample of the data, it is possible, in very rare cases, that the default query plan may not be as good as you expected, even after maintaining up-to-date stats and a good projection design. To work around this, Vertica provides some query hints to force the optimizer to generate even better query plans. Here are some example join hints which help in picking the join method and how to distribute the data, that is, broadcast or resegment, on the inner or outer side, and also which group-by method to pick. The table-level hints help to force picking a query-specific projection, or to skip a particular projection, in a given query. All these hints are available in the Vertica documentation too. Here are a few general hints useful in controlling how to load data, WITH clause materialization, etc. We are going to discuss some examples of how to use these query hints.

Here is an example of how to force a query plan to pick a join method. The hint used here is JTYPE, which takes the argument 'H' for hash join or 'M' for merge join. You place this hint just after the JOIN keyword in the query, as shown in the example here. Another important join hint is JFMT, the join format type hint. This hint is useful when the join columns are large VARCHARs. By default, Vertica allocates memory based on the column data type definition, not by looking at the actual data length in those columns. Say, for example, a join column is defined as VARCHAR(1000), (5000) or more, but the actual length of the data in this column is less than 50 characters. Vertica is going to use more memory to process such columns in the join, which also slows down the join processing. The JFMT hint is useful in this particular case: the JFMT parameter 'V' uses the actual length of the join column, as shown in the example. Using the JFMT(V) hint helps reduce the memory requirement for this query, and it executes faster too.

The DISTRIB hint helps to force the inner or outer side of the join operator to be distributed using broadcast or resegment. This hint takes two parameters: the first is the outer side and the second is the inner side. As shown in the example, DISTRIB(A,R) after the JOIN keyword in the query forces a resegment of the inner side of the join, leaving the outer side to the optimizer to choose the distribution method.
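Here is a hedged sketch of how those three hints sit in a query, with made-up table and column names; the hint spellings (JTYPE, JFMT, DISTRIB) follow the Vertica documentation.

```sql
-- Force a merge join ('M'; use 'H' for hash) right after the JOIN keyword:
SELECT s.sale_id, p.product_name
FROM sales s
JOIN /*+JTYPE(M)*/ products p ON s.product_key = p.product_key;

-- Size join-key memory by actual data length, not the VARCHAR definition:
SELECT a.id
FROM t_a a
JOIN /*+JFMT(V)*/ t_b b ON a.long_varchar_key = b.long_varchar_key;

-- Distribution: first argument is the outer side, second the inner.
-- A = let the optimizer choose, R = resegment (B would broadcast):
SELECT f1.k
FROM fact1 f1
JOIN /*+DISTRIB(A,R)*/ fact2 f2 ON f1.k = f2.k;
```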
The group-by hint helps in forcing the query plan to pick group-by hash or group-by pipelined. As shown in the example, GBYTYPE(HASH) used just after the GROUP BY clause in the query forces this query to pick the group-by hash type.

Till now we discussed the first part of query performance, which is query plans. Now we are moving on to discuss the next part of query performance, which is resource allocation. The resource manager allocates resources to queries based on the settings of the resource pools. The main resources which resource pools control are memory, CPU, and query concurrency. The important resource pool parameters which we have to tune according to the workload are MEMORYSIZE, PLANNEDCONCURRENCY, MAXCONCURRENCY, and EXECUTIONPARALLELISM.

The query budget plays an important role in query performance. Based on the query budget, the query planner allocates worker threads to process the query request. If the budget is very low, the query gets a smaller number of threads, and if that query needs to process a huge amount of data, it takes a longer time to execute because of fewer threads, that is, less parallelism. In the other case, if the budget is very high and the query executed on the pool is a simple one, it results in a waste of resources: the query that acquires the resources holds them till it completes execution, and those resources are not available to other queries. Every resource pool has its own query budget. This query budget is calculated based on the MEMORYSIZE and PLANNEDCONCURRENCY settings of that pool. The RESOURCE_POOL_STATUS table has a column called QUERY_BUDGET_KB, which shows the budget value of a given resource pool. The general recommendation for the query budget is to be in the range of 1 GB to 8 GB.

We can do a few checks to validate whether the existing resource pool settings are good or not. The first thing we can check is whether queries are getting resource allocations quickly or waiting in the resource queues longer. You can check this in the RESOURCE_QUEUES table on a live system, multiple times, particularly during your peak workload hours. A large number of queries waiting in resource queues indicates that the existing resource pool settings do not match your workload requirements: maybe the memory allocated is not enough, or the MAXCONCURRENCY setting is not proper. If queries are not spending much time in resource queues, that indicates resources are allocated to meet your peak workload, but we are not yet sure whether we have over- or under-allocated the resources. For this, check the budget in the RESOURCE_POOL_STATUS table to find any pool with a budget way larger than 8 GB or much smaller than 1 GB; both over-allocation and under-allocation of the budget are bad for query performance. Also check the DC_RESOURCE_ACQUISITIONS table to find any transaction that acquired additional memory during query execution; this indicates that the originally given budget was not sufficient for that transaction. Having too many resource pools is also not good.

How do you create resource pools, or tune existing resource pools? The resource pool settings should match the present workload. You can categorize the workload into well-known workload and ad hoc workload. In the case of a well-known workload, you will be running the same queries regularly, like daily reports having the same set of queries processing similar sizes of data, or daily ETL jobs, etc. In this case the queries are fixed. Depending on the complexity of the queries, you can further divide them into low-, medium-, and high-resource-required pools, as sketched below.
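A hedged sketch of both pieces just mentioned, the GBYTYPE hint placement and a starting point for a "medium" pool, with illustrative names and sizes; recall the budget is roughly MEMORYSIZE divided by PLANNEDCONCURRENCY.

```sql
-- Group-by hint goes right after the GROUP BY keyword:
SELECT product_key, COUNT(*)
FROM sales
GROUP BY /*+GBYTYPE(HASH)*/ product_key;

-- A "medium" reporting pool: 32G / 8 planned => ~4 GB budget per query.
CREATE RESOURCE POOL report_medium
    MEMORYSIZE '32G'
    PLANNEDCONCURRENCY 8
    MAXCONCURRENCY 8;

-- Verify the computed budget lands in the 1-8 GB recommended range:
SELECT pool_name, query_budget_kb
FROM resource_pool_status
WHERE pool_name = 'report_medium';
```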
Then try setting the budget to 1 GB, 4 GB, and 8 GB on these pools by allocating the memory and setting the planned concurrency as per your requirements. Then run the queries and measure the execution time. Try a couple of iterations, increasing and then decreasing the budget, to find the best settings for your resource pools.

For the category of ad hoc workload, there is no control over the number of users going to run queries concurrently, or over the complexity of the queries users are going to submit. For this category we cannot estimate the optimum query budget in advance, so we have to use cascading resource pool settings, where a query starts on one pool and then, based on the RUNTIMECAP set, the query's resources move to a secondary pool. This helps prevent smaller queries from waiting a long time for resources while a big query is consuming all the resources and running for a long time.

Now, some important resource pool monitoring tables. On a live system you can query the RESOURCE_QUEUES table to find any transactions waiting for resources; you will also find on which resource pool a transaction is waiting, how long it has been waiting, and how many queries are waiting on that pool. RESOURCE_POOL_STATUS gives info on how many queries are executing on each resource pool, how much memory is in use, and additional info. For the resource consumption of a transaction which has already completed, you can query DC_RESOURCE_ACQUISITIONS to find how much memory a given transaction used per node. The DC_RESOURCE_POOL_MOVE table shows info on which transactions moved from a primary to a secondary pool in the case of cascading resource pools. DC_RESOURCE_REJECTIONS gives info on which node, and for which resource, a given transaction failed or was rejected. The QUERY_CONSUMPTION table gives info on how much CPU, disk, and network resources a given transaction utilized.

Till now we discussed query plans and how to allocate resources for better query performance. It is possible for queries to perform slower when there is any resource contention. This contention can be within the database or from the system side. Here are some important system tables and queries which help in finding resource contention. The DC_QUERY_EXECUTIONS table gives information at the transaction level on how much time each execution step took, like how much time it took for planning, resource allocation, actual execution, etc. If the time taken in planning is high, it is mostly due to catalog contention. You can query the DC_LOCK_ATTEMPTS and DC_LOCK_RELEASES tables, as shown here, to see how long transactions are waiting to acquire the global catalog lock (GCLX), and how long transactions are holding it. Normally, GCLX acquire and release should be done within a couple of milliseconds. If transactions are waiting a few seconds to acquire the GCLX, or holding the GCLX longer, it indicates some catalog contention, which may be due to too many concurrent queries, or due to long-running queries or system services holding the catalog mutex and causing other transactions to queue. The queries given here, particularly on these system tables, will help you further narrow down the contention. You can query the SESSIONS table to find any long-running user queries. You can query the SYSTEM_SERVICES table to find any service, like analyze row count, moveout, or mergeout operations, running for a long time. The DC_SLOW_EVENTS table gives info on what slow events are happening. You can also query the SYSTEM_RESOURCE_USAGE table to find any particular system resource, like CPU, memory, disk I/O, or network throughput, saturating on any node.
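To make the cascading setup and the monitoring just described concrete, here is a hedged sketch; the pool names and thresholds are illustrative, and the tables are the ones listed above.

```sql
-- Live: which transactions are queued, on which pool, since when:
SELECT pool_name, transaction_id, memory_requested_kb,
       queue_entry_timestamp
FROM resource_queues
ORDER BY queue_entry_timestamp;

-- Ad hoc workload: start on a capped pool, cascade long runners:
CREATE RESOURCE POOL adhoc_big MEMORYSIZE '64G' PLANNEDCONCURRENCY 8;
ALTER RESOURCE POOL adhoc RUNTIMECAP '1 minute' CASCADE TO adhoc_big;

-- System side: is any node saturating CPU or memory?
SELECT node_name,
       AVG(average_cpu_usage_percent)    AS avg_cpu_pct,
       AVG(average_memory_usage_percent) AS avg_mem_pct
FROM system_resource_usage
WHERE end_time > GETDATE() - INTERVAL '10 minutes'
GROUP BY node_name
ORDER BY avg_cpu_pct DESC;
```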
It is possible for one slow node in the cluster to negatively impact the overall performance of queries. To identify any slow node in the cluster, we use the queries SELECT 1 and the KV, key-value, query. The KV query just executes on the initiator node, and on a good node the KV query returns within 50 milliseconds. As shown here, you can use a script to run this KV query on all nodes in the cluster. You can repeat this step multiple times, say 5 to 10 times, then review the time taken by this query on all nodes across all test runs. If any one node is taking, say, a few seconds, compared to the other nodes taking just milliseconds, then something is wrong with that node. To find out what is going on with the node which took more time for the KV query, run perf top. Perf top gives a top-down list of the functions in which the system is spending most of its time. These functions can be kernel functions or Vertica functions, as shown here. Based on where the system is spending most of its time, we get some clue as to what is going on with that node.

Abhi will continue with the remaining part of this session. Over to you, Abhi.

Hey, thanks, Rakesh. My name is Abhi Thakur, and today I will cover some performance cases which we have addressed recently on our customers' clusters, applying the best practices just shared by Rakesh. Now, to fix a performance problem, it always helps to know where the problem is, and to understand that, as Rakesh just explained, the life of a query has different phases. The phases are pre-execution, which is the planning; execution; and post-execution, which is releasing all the acquired resources. This is very similar to a plane taking a flight path: it prepares itself, gets onto the runway, takes off, and lands back on the runway. So let's prepare our flight to take off.

This is a use case from a dashboard application, where the dashboard fails to refresh once in a while. There is a batch of queries sent by the dashboard to the Vertica database; let's see how we can debug where the failure, or the slowness, is. To review the dashboard application, as these are very short-lived queries, we need to look at the historical executions, and from the historical executions we try to find exactly where the time is spent, whether it is in the planning phase, the execution phase, or the post-execution, and whether they are pretty consistent over time, which would mean the plan has not changed. This will also help us determine the memory used, and whether the memory budget is ideal; as Rakesh just shared, the budget plays a very important role. DC_QUERY_EXECUTIONS is your one-stop place to go and find your timings, whether it is planning time or execution time, and whether it was an executed plan or an abandoned plan. Looking at the queries we received, and the times from the scrutinize, we find that the average execution is pretty consistent, and there is some extra time spent in the planning phase, which gives us a hint that there is resource contention. This is a very simple matrix which you can follow to find if you have issues: system resource contention, catalog contention, and resource contention all contribute, mostly because of concurrency. Let's see if we can drill down further to find the issue in this dashboard application query.
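As a hedged sketch of that drill-down, assuming a transaction ID pulled from the logs or from dc_requests_issued (the ID below is made up):

```sql
-- One row per execution step (plan, optimize, execute, ...) with its
-- completion time; gaps between steps show where the time went.
SELECT node_name, execution_step, completion_time
FROM dc_query_executions
WHERE transaction_id = 45035996273705982  -- hypothetical id
ORDER BY completion_time;
```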
So to gauge the concurrency, we pull out the number of queries issued, the max concurrency achieved, the number of threads, and the overall percentage of query duration. All this data is available in the vAdvisor report: as soon as you provide a scrutinize, we generate a vAdvisor report, which helps us get complete insight into this data. Based on this, we definitely see there is very high concurrency, and most of the queries finish in less than a second, which is good; there are queries which go beyond 10 seconds and over a minute, but the cluster definitely had high concurrency.

What is more interesting to find from this graph, and I'm sorry if it is not very readable, is that the topmost line you see is the SELECTs, and the bottom two or three lines are the CREATEs, DROPs, and ALTERs. So this cluster is having a lot of DDLs and DMLs issued, and what do they contribute? Large numbers of DDLs and DMLs cause catalog contention, so we need to make sure that the batch we are sending is not causing too much catalog contention in the cluster, which delays the complete plan phase while the system resources are busy. At the same time, we also noticed analyze statistics running every hour, which is very aggressive, I would say; it should be scheduled only as needed. So if a table has not changed drastically, let's not schedule analyze statistics for that table. A couple more settings, as shared by Rakesh, definitely play an important role in the mergeout and moveout operations.

Now let's look at the budget of the query. The budget of the resource pool is currently at about 2 GB, and at the 75th percentile, queries are definitely executing at that same budget, which is good and bad: these are dashboard queries, they don't need such a large amount of memory. The max memory shown here from the captured data is about 20 GB, which is pretty high. What we found is that there are some queries run by different users that are running in the same dashboard pool, which should not be happening, as the dashboard pool is something like a premium pool, a kind of private runway to run your own private jet. And why do I make that statement? As you see, resource pools are like runways: you have different resource pools, different runways, to cater to different types of planes, different types of flights. If you manage your resource pools well, your flights can take off and land easily.

From this we determined that the budget is something which could be worked on. Now, as we saw in the previous numbers, there were some resource waits, and like I said, resource pools are like your runway: if you have everything ready and your plane is waiting just to get onto the runway to take off, you definitely do not want to be in that situation. In this case, we found there are quite a number of queries which have been waiting in the pool, and they waited almost a second, which can be avoided by modifying the amount of resources allocated to the resource pool. So we increased the resource pool to provide more memory, which is 80 GB, and reduced the budget from 2 GB to 1 GB, also making sure that the planned concurrency is increased to match the memory budget; and we moved out the user who had been running in the dashboard query pool. A hedged sketch of those pool changes follows.
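This sketch uses an illustrative pool and user name; the arithmetic is the budget rule from earlier (80G / 80 planned concurrency is roughly 1 GB per query).

```sql
-- More total memory, smaller per-query budget, higher concurrency:
ALTER RESOURCE POOL dashboard_pool
    MEMORYSIZE '80G'
    PLANNEDCONCURRENCY 80
    MAXCONCURRENCY 80;

-- And route the stray user back to their own pool:
ALTER USER batch_user RESOURCE POOL general;
```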
Something else we found in the resource pool is the execution parallelism, how it affects things, and what changing the number does. Execution parallelism is something where the plan allocates the number of threads, network buffers, and all the data around it, just before the query even executes. In this case the pool had AUTO, which defaults to the core count. Dashboard queries are not too heavy on resources; they need to just get what they want. So we reduced the execution parallelism to 8, and this drastically brought down the number of threads which were needed, without changing the execution time.

This is all about how we could tune before the query takes off. Now let's see what path we followed; this is the exact path, and I hope this diagram helps. These are the things which we took care of: tune your resource pool; adjust your execution parallelism based on the type of queries the resource pool is catering to; and match your memory sizes, don't be too aggressive on your resource budgets. See if you can replace your staging tables with temporary tables, as they help a lot in reducing the DDLs and DMLs, reducing the catalog contention; and in the places where you cannot replace them, use truncate table. Reduce your analyze statistics frequency, and if possible, follow the best practices for a couple more operations.

Moving on, let's let our query take a flight, and see what best practices can be applied here. This is another, I would say, very classic example, where the query has been running fine and suddenly starts to fail, and the error, which I think most of you will have seen, is "inner join did not fit in memory". What does this mean? It basically means the inner table is trying to build a large hash table, and it needs a lot of memory to fit it. There are only two reasons why it could fail: one, your statistics are outdated; or two, your resource pool is not letting you grab all the memory needed. In this particular case, the resource pool is not allowing all the memory the query needs; as you see, the query acquired 180 GB of memory, and it failed. In most cases you should be able to figure out the issue by looking at the explain plan of the query, as shared by Rakesh earlier, but in this case the explain plan looks awesome: there is no operator like an inner broadcast or an outer resegment or anything like that, it's just a hash join. Looking further into the projections, the inner is an unsegmented projection and the outer is segmented. Excellent, this is what is needed.

So in this case, what we would recommend is to go and find out further what the cost is; the cost to scan these rows seems to be pretty high. There is a table, DC_EXECUTION_ENGINE_PROFILES, in Vertica, which helps you drill down to the smallest amounts of time and memory, and the number of rows used by individual operators, per path; a sketch of this drill-down appears below. While looking into the execution engine profile details for this query, we found the time is spent on the join operator, and it is the join inner hash table build time which is taking a huge amount of time; it's basically just waiting for the lower operators, scan and storage union, to pass the data. How can we avoid this? Clearly, we can avoid it by creating a segmented projection instead of an unsegmented projection on such a large table, with 1 billion rows, following the best practice for creating projections. So a new projection was created, segmented on a column which is part of the SELECT clause.
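Here is a hedged sketch of that profile drill-down; the counter names are the ones used by the execution engine profile tables, the transaction ID is made up, and column details may vary by version.

```sql
-- Which operator on which path is eating the time?
SELECT path_id, operator_name, counter_name,
       MAX(counter_value) AS max_value
FROM dc_execution_engine_profiles
WHERE transaction_id = 45035996273705982  -- hypothetical id
  AND counter_name IN ('execution time (us)', 'rows produced')
GROUP BY path_id, operator_name, counter_name
ORDER BY max_value DESC;
```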
Now the plan still looks nice and clean, and the query now executes in 22 minutes 15 seconds. Most importantly, look at the memory: it executes in just 15 GB of memory. Basically, the unsegmented projection, which acquires a lot of memory per node, is now replaced by one that does not take that much memory and executes faster, as the data has been divided across the nodes, with each node executing only a small share of the data.

But the customer was still not happy, as 22 minutes is still high, so let's see if we can tune it further to make the cost go down and the execution time go down. Looking at the explain plan again, like I said, most of the time you can look at the plan and say what's going on: in this case there is an inner resegment. How can we avoid the inner resegment? We can avoid the inner resegment, and most of the time all the resegments, just by creating projections which are identically segmented, which means your inner and outer both have the same segmentation clause. The same was done over here: as you see, it is now segmented on sales ID and also ordered by sales ID, which helps the query execution drop from 22 minutes to 8 minutes, and now the memory acquired is just equal to the pool budget, which is 8 GB. Most important, as you see, is that the hash join is converted into a merge join, the projections being ordered by and segmented on the join clause. What this gains us is that it has reduced the global data distribution, and by changing the projection design we have improved the query performance.

But there are times when you cannot change the projection design, and there is not much else which can be done. In all those cases, as even in the first case here, after the inner join failure, on the second retry Vertica replans the query with the spill-to-disk operator. Rather than letting the system degrade by acquiring 180 GB for however many minutes the query runs, you could simply use this hint to run the query that way in the very first go, instead of letting the query grab all the resources it wants. So use hints wherever possible, and spill to disk is definitely your option when there are no other options and you cannot change your projection design.

Now, there are times when you have gone through your query plan, you have gone through everything else, and there is not much you see anywhere, but you look at the query and you feel, yes, I think I can rewrite this query. What makes you decide that is when you look at the query and you see that the same table has been accessed several times in the query plan: how can I rewrite this query to access the table just once? In this particular use case, a very simple one, a table is scanned three times, with several different filters, and then there is a union. In Vertica, union is a kind of costly operator, I would say, because the union does not know what amount of data will be coming from the underlying queries, so we allocate a lot of resources to keep the union running. We can simply replace all these unions by a simple OR clause. A simple OR clause changes the complete plan of the query, the cost drops down drastically, and now the optimizer knows almost the exact number of rows it has to process. So look at your query plans, and see if you can make the execution engine, or the optimizer, do a better job just by doing some small rewrites, like the OR example sketched below.
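A hedged before-and-after sketch of that rewrite, with made-up names; note the IN form matches the union exactly only when the branches cannot overlap, as here with one equality filter per branch on the same column.

```sql
-- Before: the table is scanned three times, then unioned.
SELECT order_id, amount FROM orders WHERE region = 'East'
UNION ALL
SELECT order_id, amount FROM orders WHERE region = 'West'
UNION ALL
SELECT order_id, amount FROM orders WHERE region = 'North';

-- After: one scan, with the filters OR'd together.
SELECT order_id, amount
FROM orders
WHERE region IN ('East', 'West', 'North');
```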
There are more rewrites like this: if there are some tables frequently accessed, you could even use a WITH clause, which would do an early materialization and get you better performance, or the union-to-OR rewrite which I just shared. Replace your left joins with right joins, and use hints, like those shared earlier, for changing your hash table types. This is the exact path we have followed in this presentation. I hope this presentation was helpful in addressing, or at least finding, some of the performance issues in your queries or in your cluster. So thank you for listening to our presentation; now we are ready for Q&A.