Today is part three of our lecture on query optimization. Again, I'm here at my home office; it's just me and the terrier, who may or may not be asking questions as we go along.

So the types of optimizers we've talked about so far have been the classic query optimizers: the query shows up, we parse the SQL, then we run it through the optimizer and generate a plan before we even begin executing the query. This is how most query optimizers work. Obviously you can't run the query unless you have a query plan, so you have to put it through the optimizer. But the challenge is that what we think is the best plan during this optimization phase, before we even start running it, may actually be incorrect. Since we can't run the query without a plan, there are assumptions we have to make about what the database and our environment look like. But these things can change over time for various reasons. The physical design of the database can change, because the administrator or the application could add and drop indexes or change the partitioning scheme. The database itself could get modified: people could insert or delete tuples, and that could change the distribution of values in our columns. If we're invoking our queries as prepared statements, then the behavior of the query for one set of parameters might be different than for another set of parameters. And then of course every time you run ANALYZE, or whatever the command is in your database system to recollect the statistics that we use in our cost models, the decisions the optimizer makes could be entirely different.

So the focus today is to understand how we can potentially improve the efficacy of our optimizers, or the quality of the plans we're generating, by relaxing the requirement that we generate a plan once at the beginning and never revisit our assumptions after that. To do this, we want to understand a little bit about what a bad query plan looks like, or why query plans can end up with less than optimal performance. We'll cover this more on Wednesday when we discuss cost models, but at a high level, the biggest problem we're always going to have is that the join ordering isn't correct. Joins are almost always the most expensive thing we're going to execute in an analytical workload, so if we get the ordering wrong, that can lead to poor performance. And the reason we select an incorrect ordering is that we have inaccurate cardinality estimations, meaning we think our join is going to produce X number of tuples, but it really produces some much larger multiple of that. Again, we'll discuss why this occurs in the next lecture, but this issue is going to be a recurring theme that we have to overcome, and today's discussion is about techniques to do that.
But since we know things are going to go bad — we can just assume our cost model is going to be inaccurate and our optimizer is going to make bad decisions — then ideally, if we can detect how bad our query plan is once we start running it, we can make a decision to adapt the plan, modifying it to account for the differences between what we're seeing in the real data and what we assumed we were going to see, and then try to convert our plan into something closer to the optimal plan.

So what I mean by that is, say we have a simple query like this: a four-way join between four tables, A, B, C, and D, with a simple WHERE clause on the B and D tables. Let's say we run this query through our query optimizer and generate this plan. It's a bunch of hash joins and nothing but sequential scans. But let's say that when we generated this plan, we estimated the cardinality of this particular join operator to be 1,000 — an arbitrary number I'm using for this illustration. The cardinality is the number of tuples that this operator will emit. But when we actually start to run it, we see that we're generating 100,000 tuples. So our actual cardinality is two orders of magnitude greater than the estimated one. The question we're trying to deal with today is: if we knew this true cardinality while we're executing the query, could we change some aspect of the query plan to get closer to a more optimal plan? Could we change the join ordering? Would we want to choose a different join algorithm? Would we want to change the access methods below the join to use maybe an index or a different type of scan? That's what we're focusing on today: how to adapt this kind of plan when we know something about what the data actually looks like once we start running it.

The high-level idea, again, is that we want to be able to estimate the behavior of a plan to determine its quality relative to other plans. This is what the cost model is doing. But the tricky thing is that back here, before I started executing, I had to derive this cardinality from the statistics I maintain in my catalog about what the table looks like, or what these two tables look like when you join them together. These stats are based on histograms and possibly samples we collect from the data. We can also factor in what the hardware looks like, what cache sizes we have, what algorithm we might want to use, and what other queries are running at the same time. The cost model material we'll cover next class, but the main idea for today is that before we run the query, we only have an estimation of what the data looks like and how our query will perform. And if we get that wrong, we want to be able to correct ourselves.

So the technique we're talking about today is called adaptive query optimization. In the research literature it's sometimes called adaptive query processing; they're essentially synonymous. And it's everything I've said so far: the idea is that this technique allows the database system to modify the query plan for a query to better fit what the actual underlying data looks like.
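To make that feedback check concrete, here is a minimal sketch of the core idea: compare each operator's actual output cardinality against what the cost model predicted, and flag the plan when the error is too large. This is just my own illustration — the class names, the dictionary shape, and the 10x threshold are all made up and not from any particular system.

```python
from dataclasses import dataclass

@dataclass
class OperatorStats:
    name: str
    estimated_cardinality: int   # what the cost model predicted
    actual_cardinality: int      # what execution has observed so far

def plan_has_diverged(ops: list[OperatorStats], threshold: float = 10.0) -> bool:
    """Return True if any operator's actual output is off from its estimate
    by more than `threshold` in either direction."""
    for op in ops:
        est = max(op.estimated_cardinality, 1)          # avoid division by zero
        ratio = op.actual_cardinality / est
        if ratio > threshold or ratio < 1.0 / threshold:
            return True
    return False

# The join from the example: estimated 1,000 tuples, actually 100,000.
stats = [OperatorStats("HashJoin(A,B)", estimated_cardinality=1_000,
                       actual_cardinality=100_000)]
print(plan_has_diverged(stats))   # True -> candidate for adapting the plan
```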
One way we could modify the plan is to just generate an entirely new query plan — throw away the old one and start over. Or we could try to modify a subset of the query plan by introducing alternative subplans at points where we have to materialize tuples, where we could potentially switch from one plan strategy to another. In the first case you basically go back to the optimizer and start over; in the second, you have the optimizer only replan a portion of the plan, or provide these alternative strategies up front at the beginning.

The main takeaway of what we're doing here is that rather than relying only on our statistical models, which are estimations or approximations of what the data looks like, we try to use the data we collect while we actually execute the query to help us decide what the right plan should be for this particular query. And the data we collect could be used to help our current query, or we could merge it back into the statistics we've collected through our ANALYZE operation and have it be used for other queries. Again, we'll cover the various ways to do this, but think about what a query is actually doing versus what ANALYZE does: ANALYZE is doing a sequential scan to compute some statistical models about what the data looks like. So if we're doing a sequential scan on a table anyway, that's essentially the same thing as ANALYZE. Rather than just evaluating predicates and using the tuples as we scan them to generate the result for that particular query, we can piggyback off of that scan operator and maintain or update these statistical models with new information. And the question is whether we only update those models for ourselves, to make our own query go better, or whether we share them through the global catalog so that other queries can benefit from the data we collected.

So there are three broad categories of adaptive query optimization that I want to cover. One is that we can use it to benefit future invocations of our query. The second approach is to try to make the current invocation of the query better — this is the one where we potentially start over from scratch and run the query through the optimizer all over again. And the last one also helps the current query, but it works by adding locations in the query plan that let you switch from one strategy to another without having to go back to the optimizer. We'll go through each of these one by one.

The simplest form of adaptive query optimization is, as I said, where as we execute our query we also collect some information about what the data looks like, and then we use that information either to decide whether our query plan is wrong and we want to replan it, or to merge it back into the global catalog. Again, think about what the optimizer is actually doing: you have a bunch of histograms or statistical models about what your attributes look like. For a given predicate in your WHERE clause, you want to estimate the selectivity of that predicate, because that determines how many tuples your scan will emit, and you can use that to make decisions about join orderings and other things above it in the query plan. But as you execute the scan, you actually know the true selectivity, because you're applying the predicate to the tuples and you know exactly what number or percentage of tuples match.
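As a concrete illustration of that piggybacking idea, here is a small sketch: while a scan evaluates the predicate it was going to evaluate anyway, it counts how many tuples match and pushes the observed selectivity back into a statistics catalog. The function name, the dictionary-as-catalog, and the example numbers are all hypothetical; this is not any real engine's API.

```python
def scan_with_feedback(table, predicate, catalog, stats_key):
    """Yield matching tuples and record the true selectivity as a side effect."""
    seen = matched = 0
    for row in table:
        seen += 1
        if predicate(row):
            matched += 1
            yield row
    if seen > 0:
        # Merge the observed selectivity back into the shared statistics so the
        # current query (on a replan) or future queries can take advantage of it.
        catalog[stats_key] = matched / seen

catalog = {"B.val < 100": 0.01}               # the optimizer's old estimate: 1%
B = [{"val": v} for v in range(1_000)]
matches = list(scan_with_feedback(B, lambda t: t["val"] < 100, catalog, "B.val < 100"))
print(len(matches), catalog["B.val < 100"])   # 100 tuples, observed selectivity 0.1
```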
So if I then determine that the cost model estimated my selectivity at 1%, but when I run the real query and actually evaluate the predicate my selectivity is 99%, I want to use that information to decide whether to replan my query, or so that when future queries come along they can exploit the knowledge I've gained. Again, one approach is to try to fix my current query, and the other is to merge it back into the overall statistical model in the catalog so that it can help queries in the future.

The most basic approach to do this is called reversion-based plan correction, and the idea, as I said, is that every single time I invoke a query, I keep track of what query plan I generated for it and what cost estimations I had for it, and then I record the metrics of what actually happened when I ran it: how many tuples I selected, how much CPU or memory I used. And I maintain this history inside the database itself. You'll see this in the commercial systems — DB2, Oracle, SQL Server — which have a built-in repository of the history of every single query that ever got invoked, and they can use that information to help decide how to do query planning in the future.

So let's say we have a prepared statement, or a query that gets invoked all the time, and we have a cached query plan. Rather than running it through the optimizer every single time, we can just use the cached query plan we generated from previous invocations. Now, if there's a change in the statistics or in the physical design of the database, we recognize that for this particular query we keep invoking, we may want to run it back through the optimizer and see if we generate a new plan. But when we run that new plan for this query, if we see that the performance is worse than the old plan we had before, we just want to revert back. If there's a regression in performance, we switch back to the plan that we know actually performed better for us, despite the change in the physical design or the statistics.
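Here is a toy sketch of that control flow. The in-memory dictionary stands in for the execution-history repository the commercial systems keep internally; the names and cost numbers are made up purely for illustration.

```python
history = {}   # query text -> list of {"plan": ..., "actual_cost": ...} entries

def record_execution(query, plan, actual_cost):
    history.setdefault(query, []).append({"plan": plan, "actual_cost": actual_cost})

def plan_for_next_invocation(query, new_plan):
    """After trying `new_plan`, keep whichever plan in the history actually ran
    the fastest -- which may mean reverting to an older plan."""
    runs = history.get(query, [])
    if not runs:
        return new_plan
    return min(runs, key=lambda r: r["actual_cost"])["plan"]

# Suppose the original plan actually cost 1000, and after an index was added the
# optimizer's new plan was estimated to be cheaper but actually cost 1200.
record_execution("Q1", "hash-join plan", actual_cost=1000)
record_execution("Q1", "index-nested-loop plan", actual_cost=1200)
print(plan_for_next_invocation("Q1", "index-nested-loop plan"))  # -> "hash-join plan"
```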
So let's use that simple query example from before, the four-way join. Again, this is my original plan: nothing but sequential scans and hash joins. Say my estimated cost is 1,000, and my cost estimation is pretty good, so my actual cost matches up — these are just synthetic numbers. I'm going to store in my execution history that for this query I generated this plan, and when I ran it I had this cost. And this is just another table in my database system, right? You're sort of eating your own dog food: rather than having an auxiliary store, it's just another table where you record this information. All right, now let's say there's a change in our database design. The DBA comes along and adds two indexes, on the B table and the D table, on the columns we're using in our WHERE clause. Now when we invoke the same query again, we recognize that the design of the database has changed in a way that makes us want to reconsider the query plan for this particular query. This query touches B.val and D.val, and I just happened to create indexes on those columns, so I want to run it through my optimizer again and see what plan I get.

Let's say the new plan is completely different. Instead of hash joins, we're running index nested loop joins, and we're doing an index scan on B and on D, which we can now do because we have indexes that we didn't have before. We pick this plan for our query because the estimated cost is 800, which is less than the estimated cost we had for the old plan. But when we actually run it, for whatever reason that we don't care about at this point, the actual cost is 1,200. It could be that we incorrectly estimated that these nested loop joins would be cheaper than the hash joins, so we picked the nested loop joins. So just as before — excuse me, that's not corona — when we put this in our execution history, we recognize that the new plan performed worse than the old one. So the next time we invoke the query, we want to make sure we use the old plan; we revert back to the one we know performed better.

This approach is something Microsoft has had in SQL Server, and I think Oracle has had something similar since maybe 2012 or 2013, but it's pretty coarse-grained. It's a pretty brain-dead heuristic: it basically says, oh, this query plan is bad, let me just switch back to the other one. It's all or nothing. So the paper you guys were assigned from Microsoft is called plan stitching, and the high-level idea is the same: if we recognize that our query is running slower than query plans we've seen in the past, then rather than throwing away the entire new query plan and reverting back to the old one, maybe there are elements or subplans within the newer plan that we actually would want to retain, because they can lead us towards a better, more optimal plan. The other thing to note about plan stitching is that the subplans you borrow don't need to come from the same query. With simple reversion, I can only swap between plans generated for the exact same query. But with plan stitching, because I can excise subplans or portions of a query plan, as long as I know they're logically equivalent I can take bits and pieces from other queries. The other thing is that if a change in the physical design makes a query plan invalid — say the plan wanted to do an index scan, but I dropped that index — then rather than throwing away the query plan in its entirety, I can maybe pull out pieces of it.

The basic approach they use to generate these stitched plans is a bottom-up dynamic programming search, like we saw with System R: going from one level to the next, you pick which subplan is the best, and once you reach the root you find the cheapest path. This means it's not guaranteed to find a better plan than the best plan you have so far, and on its own it isn't guaranteed to always produce a valid plan, but there are some basic checks they use to make sure that it does. So going back to our example: say this is our new plan, and say it was working just fine — it was actually faster, so we always want to use it.
But now if I come along and drop one of the indexes it's using, this plan becomes invalid, and under coarse-grained reversion I can't reuse it. With plan stitching, though, I want to figure out which components of this query plan I may still want to use in the new plan. Even though it's invalid overall, there are still portions that are usable. So let's say this portion of the plan — the subplan for this part of the query — has an execution cost of 600, and we know this because we keep track of the actual runtime cost of all the operators in our queries. And this other subplan over here has a cost of 150. Now, if I combine these together into a stitched plan, the total cost in this case would be 750, whereas before, without stitching, it was 1,000. So again, the idea is that we want to be able to borrow bits and pieces of different query plans to help us produce a more optimal plan. And this is done separately from the regular optimizer: in the case of Microsoft SQL Server they're running Cascades, so they're doing a top-down search, but this is an auxiliary search that runs on the side, in the background, trying to find a plan it can stitch together.

So let's talk about how they actually do this. The first step is to identify which subplans in our queries are logically equivalent. We talked about this before with Cascades when we had groups of multi-expressions: we want to know that the output of a given subplan is the same as, or equivalent to, another subplan. And again, we have to rely on the rules of relational algebra to recognize which operations are commutative or associative. In this case, the output of this portion of one subplan is A join B join C, and the output of this portion of another subplan is C join B join A. Since these inner joins are commutative, we know they are logically equivalent. Now, one challenge is that determining whether two arbitrary logical expressions or subplans are equivalent has been shown to be undecidable: the question "are these two subplans logically equivalent?" has a yes-or-no answer, but no algorithm exists that is guaranteed to always give the correct answer. So in plan stitching they rely on additional heuristics to identify things like: these two subplans are accessing different tables, so they can't be logically equivalent. You can obviously do more complicated things. The optimizer itself in SQL Server also has these kinds of checks in place, and they rely on that as well: they have their own heuristics to prune things that can never be logically equivalent, and they rely on the SQL Server optimizer to identify when a subplan they're trying to mash together into the stitched plan is invalid. The heuristics give them a sweet spot balancing the difficulty of the implementation (it's a bunch of enforcer rules), the accuracy of determining whether subplans are equivalent, and the performance. It's not an exhaustive search or an exhaustive evaluation of all possible inputs to different subplans; it's just rules based on the relational algebra.
So once we've identified a bunch of equivalent subplans, we want to combine them into one giant encoded query plan, adding some additional operators that indicate points where we can branch down different paths. This is how they encode all the different combinations of subplans that could be stitched together. The way this works is that they introduce a new OR operator, which is not actually used for execution; it's just something for the search. The OR indicates that the subplans below it are logically equivalent, so we could choose either path.

Starting from the top, we have an OR at the very beginning, and then we have the two alternatives for this particular query: either the hash join or the nested loop join. Again, these are logically equivalent, because one is A join B join C join D and the other is C join B join A join D, and those joins are commutative. Then, going down this side like a depth-first search: for this one, same thing — we can do the hash join on A and B followed by C, or the nested loop join of C and B followed by A. Again, these are logically equivalent, so we have an OR and can choose either one. And we keep going down until we get to the leaf nodes, the scans. Here there isn't another option for this portion of the query plan, because the plan we're stitching from only had the hash join. But in this case, below the hash join, we can either do a sequential scan on B, as we saw in the first plan, or the index scan on B, which came from the second plan, and we have an OR operator to express that. Going back up, we can only do a sequential scan on C, so that's a straight path. Going back up to the nested loop join, the only thing we can do below it is another nested loop join, and below that we can do a sequential scan on A, and for B we can again do either the sequential scan or the index scan. Going back up to the hash join, that has a sequential scan on D feeding into it, and then we just complete the rest of the tree like this. This is a somewhat simplified version of what they show in the paper, but these are the possible options you can have. And if I remember the paper correctly, with this approach and the search over this encoded plan space, they were able to stitch roughly 75 to almost 100 percent of the plans for the workloads they looked at.

So now that we've encoded our search space, we actually want to do the search, and this is the same bottom-up dynamic programming idea we saw with System R: for every leaf node we start by figuring out the cost of going up to the next operator and pick whichever one is best, and once we've done the search for all the nodes at the current level, we go up to the next level and repeat the process.
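Here is a tiny sketch of that bottom-up selection over an OR-encoded plan space. It's a deliberate simplification of the paper's algorithm: I use made-up constant operator costs where the real system would plug in observed execution costs for subplans it has seen (and estimates for the rest), and the tuple-based plan encoding is just for illustration.

```python
# An ("OR", alt1, alt2, ...) node means the alternatives are logically
# equivalent; we keep whichever resolves to the cheapest concrete subplan.
OP_COST = {"SCAN": 100, "IDX_SCAN": 20, "HJ": 50, "NLJ": 30}

def cheapest(node):
    """Return (cost, concrete_plan) for an encoded plan node."""
    kind = node[0]
    if kind == "OR":                              # pick the best alternative
        return min((cheapest(alt) for alt in node[1:]), key=lambda x: x[0])
    if kind in ("SCAN", "IDX_SCAN"):              # leaf access method
        return OP_COST[kind], node
    # Join operator: cost of children plus the operator's own cost.
    child_costs, child_plans = zip(*(cheapest(c) for c in node[1:]))
    return sum(child_costs) + OP_COST[kind], (kind, *child_plans)

encoded = ("OR",
    ("HJ", ("OR", ("SCAN", "B"), ("IDX_SCAN", "B")), ("SCAN", "A")),
    ("NLJ", ("IDX_SCAN", "B"), ("SCAN", "A")))
print(cheapest(encoded))   # the nested loop join using the index scan wins here
```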
So let's say we start with the sequential scan on A. It only has one choice to make: its parent is either the hash join on A and B or the nested loop join over here; say the hash join is cheaper, so we pick that. Now we do the sequential scan on B. This has an OR operator above it, so it either feeds the hash join or the nested loop join; say the hash join is cheaper, so we pick that. Now we do the same for the index scan on B. Again there's an OR operator, we cost out the hash join and the nested loop join, and because we have an index the nested loop join would actually be cheaper here, so we pick that. We keep going down the line and do this for all our leaf nodes, and once we're done we go up to the next level; now we have a cost for all these paths leading up, we pick whichever one is the cheapest, and we construct the stitched plan that way, right?

So again, I think this is an interesting approach. I don't think Microsoft is actually running this in production — this was a paper published at SIGMOD — and I don't know of any other system doing something similar. From an engineering standpoint, the fact that you have to run this separately from the query optimizer and have separate infrastructure for it — Terrier, what are you doing? — right, so rather than having separate search infrastructure, if this were integrated into the query optimizer component itself, I think it would be a really interesting approach.

There's another system that does something similar to plan stitching, but it works at the codegen level rather than the physical query plan level. Amazon has their Redshift data warehouse service, which is based on ParAccel, and it uses a codegen engine: for a given physical plan, the database system generates C++ or C code, compiles it, and then invokes the shared object that comes out of the compiler — that's how they do query compilation. Obviously the most expensive part of a codegen engine is the compilation; in their case they're actually forking GCC, or whatever compiler they're using, to generate the machine code, and they want to avoid doing that for every single query. So say you want to compile the scan on B, where B.val equals some input parameter. They codegen that piece, run it through the compiler, which generates x86 code, and then they cache it. Any time you re-invoke this query, you can just reuse the compiled version of the scan on B. But similar to plan stitching, they can also recognize that if another query has the same kind of predicate — B.val equals some parameter — it will codegen the exact same thing. So rather than recompiling it, which is again the expensive part, they can identify that they have a cached plan fragment for this scan and reuse it. And they can actually do this across all of their customers: a scan with a predicate like "something equals something" on a varchar column is going to generate the same code from one table to the next, because you're just ripping through a column. So they can share these little fragments and stitch the compiled query plans together from all possible customers.
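A toy sketch of that fragment-level caching idea, to make the mechanism concrete: the cache key is the shape of the plan fragment (operator, column type, parameterized predicate) rather than the literal values, so different queries can reuse the same compiled artifact. This is my own illustration of the idea, not Redshift's actual implementation or API.

```python
import hashlib

compiled_cache = {}   # fragment signature -> "compiled" artifact

def fragment_signature(op, column_type, predicate_shape):
    key = f"{op}|{column_type}|{predicate_shape}"     # e.g. "SCAN|varchar|col = ?"
    return hashlib.sha1(key.encode()).hexdigest()

def get_compiled_fragment(op, column_type, predicate_shape):
    sig = fragment_signature(op, column_type, predicate_shape)
    if sig not in compiled_cache:
        # Stand-in for the expensive step: codegen plus forking the compiler.
        compiled_cache[sig] = f"<machine code for {op} {column_type} {predicate_shape}>"
    return compiled_cache[sig]

# Two different queries (possibly on different tables or from different
# customers) with the same fragment shape hit the same cached artifact:
a = get_compiled_fragment("SCAN", "varchar", "col = ?")
b = get_compiled_fragment("SCAN", "varchar", "col = ?")
print(a is b, len(compiled_cache))   # True 1
```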
So now, for a given query they've never seen before, if it has the same pattern of access methods and joins and other things as queries from another customer, they can just pull from the codegen cache and stitch things together. So that's kind of cool.

Alright, another interesting system to talk about is IBM's LEO, the learning optimizer. This is an example of using a feedback loop to improve the accuracy of the cost models in the system. The idea is, again, that I keep track of what my cost model estimates were when I generated the query plan, and then when I run it, if I recognize that those estimates are way off, I start recording information about what I'm seeing in the real data. When my query completes, I return the result to the user or application that requested it, but I also go update my cost model statistics with the new information I've collected. LEO actually shipped in production in DB2, so this is one of the examples of a commercial system applying one of these adaptive query processing techniques.

Alright, so the plan stitching stuff we talked about, and the reversion stuff, is about fixing future invocations of a query based on the results I see when I actually execute it. But now we want to talk about how to fix my query right now: if I invoke my SQL query and determine that I have a bad plan, what do I do? I don't want to wait for the next invocation; I want to fix the one I have right now. I'm calling this replanning the current invocation. The idea is that if I determine the observed behavior of the query plan, as I'm executing it, is way off from the estimated behavior the cost models produced, then I can decide either to stop the query, go back, and generate a brand-new plan, or to recognize that I've already done some work, keep the portion of the data I've already processed, and go back to the optimizer and ask it to generate just a subplan for the rest. If you start over from scratch, you've decided that continuing with the plan you have now is going to be worse than starting over — and obviously if you're at the last operator, that's a bad idea, just let it finish — so striking the right balance here is difficult. The other approach is saying, well, I'm doing 100 joins and I've already done one of them; I keep the one I've already computed, because it was expensive, and I go back to the optimizer and say, hey, generate me a new plan for the other 99.

So let me give an example of something sort of similar to this. This is from Apache Quickstep. Quickstep was — or is — an embedded analytical engine, sort of similar to DuckDB, though I don't remember whether it supported SQL. It came out of the University of Wisconsin and was then turned over to the Apache Foundation; I think it's been kicked out of the incubator program because I don't think they've updated it recently. I don't know what's going on with Jignesh and his team, but I haven't seen any updates in a while. Anyway, they had this really interesting approach called lookahead information passing, where I do some work at the beginning of my query and pass it along to other operators or other portions of my query plan, to hopefully make a better decision about the right ordering for things up ahead. For this example, say we have a simple database with three tables arranged in a star schema.
This approach only works for star schemas, where you have a fact table in the middle and dimension tables coming out of it; it's not for arbitrary schemas or arbitrary tree-based schemas. The way it works is, say my SQL query looks like this: I'm doing a three-way join between the fact table and the two dimension tables. Before I start scanning the fact table and computing the hash tables for my joins, I'm going to scan through the dimension tables and generate a Bloom filter for each one. We saw a similar technique being used when we talked about joins — this idea came from Vectorwise — where I pass the filter over to the other side of the query plan so that I can maybe avoid a hash table probe. The joins build their hash tables on the dimension tables, and the fact table just does probes, so I generate the Bloom filter and check it to see whether the key I'm looking for can even exist in the hash table, which is cheaper than doing the hash table probe. But what we do differently here is that for the fact table, we do some sampling to determine the selectivity of the different Bloom filters for these different dimension tables. If we determine that the second dimension table is actually more selective than the first one, then I want to reshuffle my joins so that I do the probe against that one first, because I'm going to end up throwing away more tuples earlier. And we can do this before the probe phase actually starts, because we've already built the hash tables and generated the Bloom filters, so we can make the decision before we start scanning the fact table and doing the probes. I think this is a really interesting idea; as far as I know, Quickstep is the only system that does this, and I don't know whether it actually made it into the open source version.
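Here is a toy sketch of that lookahead idea. It's my own simplification, not Quickstep's implementation: plain Python sets stand in for Bloom filters, the tables are lists of dicts, and the sample size is arbitrary. The adaptive part is the last step — sample the fact table to measure how selective each dimension's filter actually is, then apply the most selective filter first during the probe phase.

```python
import random

def build_filter(dim_rows, key):
    return {row[key] for row in dim_rows}          # stand-in for a Bloom filter

def order_filters_by_sampled_selectivity(fact_rows, filters, sample_size=100):
    sample = random.sample(fact_rows, min(sample_size, len(fact_rows)))
    def pass_rate(item):
        fk, f = item
        return sum(1 for row in sample if row[fk] in f) / len(sample)
    # Lowest pass rate = most selective filter = apply it first.
    return sorted(filters.items(), key=pass_rate)

# Dimension D1 keeps almost everything; D2 filters out most of the fact table.
D1 = [{"id": i} for i in range(90)]
D2 = [{"id": i} for i in range(10)]
FACT = [{"d1_id": random.randrange(100), "d2_id": random.randrange(100)}
        for _ in range(10_000)]

filters = {"d1_id": build_filter(D1, "id"), "d2_id": build_filter(D2, "id")}
probe_order = order_filters_by_sampled_selectivity(FACT, filters)
print([fk for fk, _ in probe_order])   # expect d2_id first: it prunes more tuples
```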
Alright, the last category of adaptive query optimization techniques I want to talk about is what I call plan pivot points. The idea is that we introduce additional subplans into our query, along with a sort of special synthetic operator in the query plan that allows us to pivot or switch which path of the plan we go down. We can put conditions in this switch operator — this choose-plan operator — so that if we determine the data looks one way, we go down one path, and if it looks another way, we go down the other; it doesn't have to be two paths, it can be multiple. The two most famous techniques for doing this are parametric optimization and proactive re-optimization. At a high level they work the same way; it's just that the sophistication of the techniques differs. Parametric optimization was developed in the late 1980s — in 1989 — and it actually came out of the Volcano project, the same project that gave us the Volcano query optimizer and the Volcano iterator model. They also did early work on adaptive query optimization, which is again why I say that work is so influential. As I said, the idea is that for each pipeline in our query where we think there are different alternatives that could make a big performance difference, we generate different subplans, and then in our query we have this choose-plan operator that basically has an if clause: if the cardinality of the operator below me is of a certain size, I choose the first plan; if it looks another way, I choose the other plan. In this case, if I know my data is really small, then maybe I want to do a nested loop join, because that's going to be cheaper than having to build a hash table and probe it; but if my data is really big, then I probably want to do the hash join. Again, I think this is an interesting idea. The tricky thing, of course, is determining what this condition should be — it's sort of trial and error as you develop the system, and it's also very dependent on the hardware. But the nice thing is that we don't have to go back to the optimizer and replan everything, and we don't throw away any of the work we've already done: we do this hash join below, and then we just decide whether to go down one path versus the other above it.

A more recent, more sophisticated approach to this is called proactive re-optimization, which combines the ability to go back to the optimizer and generate a new plan with the ability to pivot in the same way we saw in the previous example — so it can do both. At a high level it works like this: the query shows up, we run it through the optimizer, and we generate different switchable plans just like before, but we also generate bounding boxes that let us determine whether the assumptions behind our decision to go down one path versus another actually match up with reality — basically trying to put a bound on the uncertainty we're seeing in the data as we run. Then we start executing the query, and just as in LEO and the other techniques, we collect statistics about the data we're seeing for our particular query. We can switch the query plan, as we saw before, if we determine that one path is better than another; but if we also determine, based on exceeding the thresholds of our bounding boxes, that our estimations are way out of whack, then we go back and re-optimize. And then you can decide whether to pin the portions of the query plan you've already executed, because you know they were expensive, or just throw everything away and start over.

So this is again a crash course on adaptive query optimization. I really like these techniques, for obvious reasons: you don't have to rely on getting it right at the very beginning; you can correct yourself as you're actually running the query. We'll see next class, when we talk about cost models, how bad things can actually get. But the way you implement this is super important: you don't want to implement your optimizer and your execution engine completely separately from each other. It's a symbiotic relationship, where you have to know what kinds of strategies the execution engine can employ — in terms of switching paths and throwing away intermediate results or not — and then you build your optimizer around what your execution engine can actually do. For this reason, I think applying these techniques with an optimizer-as-a-service like Orca or Calcite could actually be tricky, because there are different approaches for how you can support adaptive query execution in the system itself.
So, in addition to having more robust or more sophisticated query optimizers than the open source systems, all the major database vendors now support this, mostly within the last three or four years. DB2 had the LEO stuff in the early 2000s, but really in the last few years Oracle, SQL Server, and now Teradata have also added the ability to do adaptive query optimization. To my knowledge, Postgres and MySQL simply can't do this, and neither can the newer open source systems that have come around in the last decade. So again, this was just to show you that you don't have to build the optimizer the way we first described it, where you plan once and run it: there are techniques to actually modify the query while it's running, taking feedback from execution and putting it back into the system. What the f*** are you doing? Alright, sorry. So next class we'll start discussing how cost models work, and we'll see why they're actually so bad. Okay, alright guys, bank it in the side pocket.