All right, so today is the second lecture on query optimizer implementation. A quick announcement: immediately after today's lecture, we're going up to the eighth floor. We have the co-founder of a stream-processing platform who has come in from California to give a talk here at CMU, so if you have time, please come check it out. And as I said in my message on Piazza, he's still looking for several interns, so you can talk to him about that if you're interested. Okay?

All right, so today's agenda: we're going to focus more on Cascades. I realize I had you read the MemSQL paper on their optimizer, which is an okay paper, but it's not a Cascades implementation. The reason I had you read it is that I'm still looking for the right paper for this second lecture on optimization. In my opinion it's interesting because they describe the same techniques we're talking about here, just in different terms. There's also a whole bunch of extra material on how to do distributed query optimization, which I think is correct, but it's not germane to what we're doing here, so that's fine. We'll spend most of our time talking about Cascades, and in particular I'm going to describe Cascades essentially the way it's described in the Columbia paper, the Columbia work. Then we'll talk about Orca, which is a modern implementation of Cascades, and we'll finish with the extra credit assignment, which goes out today. Okay?

As a refresher from last class: we walked through the history of query optimization and discussed the various ways you can implement a query optimizer. Going back to the very beginning, in the 1970s with Ingres and early Oracle, you had a heuristic-based approach where, in the source code of your system, you have hard-coded rules that apply various transformations to try to put the query plan into an optimized form. As we said, this is sort of a hacky way to do it, because it doesn't consider costs in any of its calculations. It just has hard and fast static rules: always do predicate pushdown, or always put the largest table as the outer table in a join.

Then we talked about doing something more sophisticated, what was done in IBM System R and the early versions of DB2, where you still apply the same heuristics to do static transformations that the first systems did, but you also add an additional step with a cost-based search to look for an optimal query plan, using a cost model to estimate whether one plan is better than another. In the original System R example, and what most people care about, is figuring out the proper join ordering for complex queries. For two tables it's pretty easy, but when you go beyond that it's NP-hard, so they needed to come up with a way to make this easier to do.

Then we talked about the randomized search algorithms. This was the simulated annealing approach, or the genetic algorithm approach in Postgres. As I said, Postgres is the only system I know of that actually does this, and they only kick in the genetic algorithm when your query has 13 or more tables.
But we spent most of our time at the end talking about stratified search and unified search. These are ways to build what are called optimizer generators, where you define, in a declarative language, the rules for the transformations you want to apply to a query, and then a rules engine applies those transformations and does some kind of search to find an optimal plan. The Starburst approach out of IBM was the first one to do stratified search, where you sort of have two stages; it's now in use in DB2, and as far as I can tell it's also used in Oracle. The unified search model is the Volcano or Cascades approach, which is now used in SQL Server, Greenplum, Calcite, our system, and a bunch of others. Again, we'll spend most of our time on this, but it's important to see how these two approaches can be distinguished from each other.

I should have spent a little more time last class talking about what Postgres does, in a bit more detail. As I said, there's a genetic algorithm that only gets fired or enabled when your query has 13 or more tables; in general, they fall back to the heuristics plus cost-model search. What's really interesting about this, and why it's a good example of why, in my opinion, a Cascades model or a unified search model is better, is the software engineering standpoint: it makes the query optimizer easier to maintain and extend. Postgres has what I'll call a rigid workflow in the query optimizer, where you go through these different stages, each stage has different responsibilities, and each makes assumptions about what the query plan produced by the previous stage is going to look like.

In the first stage, they basically do what we said Ingres and the first versions of Oracle did: static rules that do rewriting or transformations based on heuristics; predicate pushdown is the classic one. Then you go into a cost-based search, basically the same thing they did in System R, where you're trying to find the optimal join ordering. At that point you have a basic physical plan, but it may be missing extra things your query needs, like aggregations or HAVING clauses. So after the cost-based search, they go into the next stage and add back all the stuff your query needs. They strip it all out, run the cost-based search to find the right join ordering, and then add everything back in to actually complete the physical plan. And if you have subqueries, you basically do this recursively: you go into the inner query, do all the planning you need there, maybe try some rewriting to pull the subquery up, but essentially you run all these stages for each of them.

One of the lead developers of Postgres lives in Pittsburgh. He's actually a CMU alum; he got his PhD here in the 1990s, not in databases but in software engineering, I think. We had lunch with him a few years ago when we were trying to get started on the Peloton project, because we started with Postgres, thinking we could take the query optimizer and the SQL parser and use them in our system. So we were asking him basic questions about how this works and how that works, to get a sense of how easy it would be for us to modify it.
And he kind of shook his head and said this is one of the trickiest parts of Postgres, and it's actually kind of brittle, because all these stages again have these assumptions about what the plans need to look like. He admitted that not very many people actually understand the Postgres query optimizer. So again, it's amazing what they've done with it, the things they're able to support, but we decided to ditch it entirely and build our own, because we didn't want to have to spend all that time trying to figure out what this thing was actually doing.

So this is a good example of how things were before optimizer generators: the programmer writes the transformations, writes the code, directly inside the database system, and that code gets executed as written to make changes and try to optimize the plan. The optimizer generator movement that came out of the late 1980s and early 1990s was the idea that, rather than baking all your transformations into the code as static transformations, you could write your rules in a declarative language, which in theory would be easier for people to extend and maintain (depending on what language it's written in, that may or may not be true). The idea was that you write your declarative rules, run them through a rules engine, which generates the C or C++ code that you then compile and link into your system, and any time you need to modify or extend your query optimizer, you go modify the declarative rules and the rules engine makes sure everything is correct.

The advantage of this approach is that you separate how you actually perform the search over query plans to find the best choice from the actual data model. This means that whether you're writing a document database, a JSON database like Mongo, a relational database, or a graph database, you can still use the same optimizer generator and apply the same transformations you need to optimize your query, without it being specific to one particular data model. Likewise, you can write all the transformation rules for logical operators and physical operators completely separately from each other, without worrying about how they're intermingled.

Yes? His statement is: if you need to do a transformation from a logical plan to a physical plan, wouldn't your transformations need to know something about the database — the data model? Think a layer above that. The transformations aren't things like "read this bit at this offset"; they're standard things like predicate pushdown, and predicate pushdown makes sense whether it's a document database or a relational database. You're not hard-coding to any one query language or any one data model; you don't need to know those things. All the logic about the physical layout of the data, or whether there's one physical operator you'd want to use versus another, ends up in the cost model, which I'll talk about next time.
The question is what the DSL looks like — is it relational calculus? No, which is not pretty anyway. In our case it's C++. Again, think of these as toolkits: rather than what we're doing now, where we implement our own Cascades implementation, the idea was that you could download one of these toolkits, write your rules, and it would spit out code that you could link in. A rule is going to be something like predicate pushdown. It looks sort of like Datalog; in the case of Starburst it's some SGML-like high-level language that you use to define these transformations. For our purposes we don't care exactly what it looks like.

And again, another big thing is that the implementation of how you actually perform the search, whether it's top-down or bottom-up, can be independent of how you define all these rules. That's left to the rules engine that applies these changes and does the search. As I said, Starburst is the first example. Then Goetz Graefe, who invented Cascades, had basically three iterations of this; at every step he learned from the mistakes and made the next one better than the last, and the last one he did was Cascades. OPT++ was an optimizer generator toolkit out of Wisconsin for the Paradise project; we'll talk about it a little in a second, but the basic idea was that they tried to do everything: they wanted to do top-down, they wanted to do bottom-up, they support all of them, and they tried to figure out which one would actually be best. For these papers, they did the most comprehensive bake-off of the different models to see which one works, but it was back in the 90s, so it's somewhat dated.

All right, so again, the two major approaches we care about are stratified search and unified search. In stratified search, we write transformation rules that make logical changes to our query plan, but when we apply those changes we never consult the cost model; it's like the static rules we had in the heuristic-based approach, just defined in a nice DSL to make them easier to maintain. Then we do a cost-based search — whether bottom-up or top-down depends on what you want in your implementation — and this is where we actually do the conversion from the logical plan to the physical plan and use a cost model to decide whether one plan is better than another.

In unified search, we don't make a distinction between different stages for logical-to-logical and logical-to-physical transformations; it's all one single search model. At every step we can consult the cost model to help us decide whether we're doing the right thing. Now, for some logical-to-logical transformations you can't actually get a cost until you have a physical operator, but in other cases you can take an upper bound, an estimate of what you think your operator is going to do, and use that to help guide the search process a little better.
So the major downside of the unified search model is that you end up doing way more transformations throughout the entire search, because, as we'll see in the examples, we can do swaps: for A join B we can swap that to B join A, and then another logical transformation could put us back to A join B. To avoid these redundant computations, and to avoid getting caught in an infinite loop, we're going to take advantage of memoization to reduce the amount of redundant work we're doing: to reduce the number of redundant or unnecessary transformations, things we've already done, and also to avoid having to go back to the cost model for an estimate when we've already computed the cost for a particular change.

One important thing to note is that the unified model and the stratified model are separate from, and orthogonal to, whether you use a top-down or a bottom-up search approach. In the case of OPT++, it was always a unified model, but it could do either one. It just so happened that the first stratified search implementation, Starburst, was bottom-up, and Volcano and Cascades, the most well-known unified models, are top-down. But they could be completely separate; you could have either search direction with either approach.

The big thing to understand about top-down optimization is that we start with the final outcome we want for our query plan, and then we work down the tree, doing transformations — either logical-to-logical or logical-to-physical — to generate a query plan, working toward the bottom where we get to the final access methods. Contrast this with bottom-up, where you essentially start with nothing: you start with the access methods and then you build up your query plan by adding in the pieces that you need. At a high level these are essentially the same; the OPT++ paper shows you can generate roughly equivalent query plans with either one. I think it really comes down, in many ways, to software engineering and maintainability. I actually think top-down is sort of easier to reason about, but we'll see a comment at the end where the creator of Cascades suggests that bottom-up might actually be better than Cascades — a side comment he made last year that I haven't really thought through yet.

Okay, so the Cascades query optimizer, as I said, is an object-oriented implementation of the Volcano query optimizer that he built. There's an original Cascades paper from 1995; I didn't have you read it because I actually don't think it's very good. The Columbia master's thesis that I did have you read — those 30 pages are, in my opinion, the best explanation of what Cascades actually is. Columbia is not an exact implementation of Cascades, there are some optimizations that they add, but for our purposes, when I say Cascades here, in some ways I really mean Columbia; at a high level they're essentially the same. The key thing about Cascades is that they claim it's object-oriented — this was the 90s, that was in vogue, so they make a big deal about it — but it allows simple rewriting of expressions by using a mapping function to do these transformations, rather than having to do an exhaustive search.
Another big thing is that they do transformations on the fly rather than having to generate them all at once. In Volcano, as you do the top-down search, every time you landed at a group you would apply all the transformations immediately and explode your search space, whereas in Cascades, as we'll see in a second, you only do transformations as you need them.

So the four major things that are unique about Cascades are these. First, the optimization tasks are just data structures that get loaded into a queue, and we apply them one by one. Second, we can embed in our groups the physical properties or requirements we have about our data, like whether things need to be sorted a certain way, and then our rules can consider them and know not to apply certain transformations if those transformations would violate these requirements. Contrast this again with what we saw in System R, where I said the search model had no way to know that data needed to be sorted a certain way; you had to embed that logic inside the cost model, so you had this weird thing where the logic about whether something is sorted the way you need it lived in the cost model and not in the search itself. Third, they support ordering the moves by "promise," which essentially means you can define priorities for your transformations and reshuffle things as you go down into the search tree. And fourth, predicates are treated as logical and physical operators, which then allows us to write transformations that do predicate pushdown and all the other things we want to do, just as if we were making changes to operators in our query plan. The first point — optimization tasks as plain data structures on a queue — is sketched below.
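To make that first point concrete, here is a minimal sketch of what "tasks as data structures" can look like. The task types, field names, and the `Execute` dispatch are hypothetical, not the actual Cascades or Columbia task set; the point is only that the search is driven by popping work items off a stack, so a transformation happens only when a task for it is actually scheduled, never eagerly all at once.

```cpp
#include <stack>
#include <variant>

// Hypothetical task kinds; a real optimizer has a richer set
// (explore group, optimize inputs, apply rule, ...).
struct OptimizeGroup      { int group_id; };
struct OptimizeExpression { int group_id; int mexpr_index; };
struct ApplyRule          { int group_id; int mexpr_index; int rule_id; };

using Task = std::variant<OptimizeGroup, OptimizeExpression, ApplyRule>;

struct SearchDriver {
    std::stack<Task> tasks;

    // Dispatch on the task type; executing one task typically pushes
    // follow-up tasks (a group schedules its expressions, an expression
    // schedules the rules that match it, and so on).
    void Execute(const Task& t);

    void Run() {
        // Pop and run one task at a time until there is no work left.
        while (!tasks.empty()) {
            Task t = tasks.top();
            tasks.pop();
            Execute(t);
        }
    }
};
```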
So I'm going to go through a bunch of the definitions Cascades uses for expressions, groups, and multi-expressions, and then we'll do an example to see at a high level how Cascades actually works.

The first thing is that they define expressions. An expression here isn't a predicate — sometimes we use that term for a WHERE clause expression — it really refers to one of the high-level operators we have in our system. It can be any operator with zero or more input expressions. So a logical expression could be A join B join C, and a physical expression that is equivalent to this logical expression could be: do a file scan on A (the subscript F means file scan), hash join it (subscript HJ means hash join) with a file scan on B, and then do a nested-loop join with C, retrieved by a file scan. These are logically equivalent even though one is built from physical operators and the other from logical operators, meaning they both produce exactly the same result; and because we know this, we can determine and optimize certain things later on.

What we can do now is define what are called groups, which are sets of logically equivalent logical and physical expressions that all produce the same output. In my previous example, the logical expression was logically equivalent to the physical expression because they both produced the output of A join B join C, and so all the expressions in a single group are defined by what their output is. So in this case I'm defining the output of this group as A join B join C, and at this high level I'm not defining the ordering of anything; this is a logical output. I'm going to have all my logical expressions, which are again just all possible permutations of join orderings that produce this output, and then the physical expressions are all possible physical implementations of every possible logical expression that produces the same output, where I define how I'm accessing the data and how I'm doing the joins. If I needed to know that this was sorted, I could have required properties that say this output has to be sorted, and that would all be encapsulated in the physical expressions here.

The first thing to see is that even for a simple thing like A join B join C, I'm going to have a ton of different logical expressions, and for each individual logical expression I'm going to have a ton of different physical expressions equivalent to it. So I need to do something to reduce my search space over all these equivalent expressions, and also to reduce the storage space, because I need to keep track of all these different expressions, jump to them in the search tree, compute costs on them, and store information about them.

To cut down on the storage overhead, groups are organized in terms of what are called multi-expressions. The idea is that instead of having explicit instantiations of every single logical and physical operator, I can define them in terms of multi-expressions that are, again, logically equivalent to what I showed on the previous slide. So my output is A join B join C; I could have a multi-expression that says A join B join C where A and B are grouped together, and the same for B and C, and so forth going down. On the physical side, I only need to define what join algorithm I want to use. Previously I was saying here's how I'm going to scan it, here's how I'm joining A and B, and here's how I'm joining that with C; in the multi-expression I say I'm going to join [A B] — I don't know how, you'll figure that out later — just represent A and B in this multi-expression as a sort of high-level concept, and then to join it with C, again not defining how we actually access it, I say here's my join algorithm. The idea is that as you do the search, when you need to know how to convert this into physical operators, you jump to the group that has the equivalent multi-expression and do the planning on that. It'll become clearer in the example.

Question? Yes — the question is about attributes expressed as a single multi-expression: say there are A, B, C, D — could you have both [A B] and [C D], with [A B] as one multi-expression and [C D] as another, or [A] and [B C D], or do you just have all of them? You have to have all of them. In here, the "..." means it keeps going for all possible combinations. And as we'll see in the example, you can define a priority on these transformations, so that maybe you only expand the first one, because you think that's the one that's going to provide the most benefit, or if you have an enforcer rule you can say, well, this one produces my output in the sort order that I need, so all these other ones I won't instantiate. We'll see an example of how you go through them one by one, apply the transformation, and then figure out what the cost is. But you have to have everything; I'm just showing a subset.
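As a rough illustration of how groups and multi-expressions keep this bookkeeping compact, here is a minimal sketch of the data structures involved, assuming hypothetical names and fields (this is not Columbia's actual layout). The key idea is that a multi-expression's children are group IDs rather than whole sub-plans, so each logically equivalent alternative is stored once no matter how many plans use it.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

using GroupId = int32_t;

enum class OpType { Get, LogicalJoin, HashJoin, SortMergeJoin,
                    NestedLoopJoin, FileScan, IndexScan };

// A multi-expression: an operator whose children are *groups*, not concrete
// sub-plans. [A join B] is represented once, regardless of how many physical
// plans exist for producing A or for producing B.
struct MultiExpr {
    OpType op;
    std::vector<GroupId> children;   // references into the memo, not plan trees
    bool is_physical = false;
};

// A group: every logically equivalent expression that produces the same
// output (e.g. all orderings and implementations of A join B join C).
struct Group {
    std::vector<MultiExpr> logical_exprs;
    std::vector<MultiExpr> physical_exprs;
    // Best (lowest-cost) physical expression found so far for this group.
    double best_cost = std::numeric_limits<double>::infinity();
    int best_physical = -1;          // index into physical_exprs, -1 if none yet
};

// The memo is just the collection of groups; a hash of (op, child group ids)
// is typically kept alongside so a transformation that re-creates an existing
// multi-expression can be detected and discarded.
struct Memo {
    std::vector<Group> groups;
};
```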
This is actually a good example of the distinction between the top-down and the bottom-up approaches. In the bottom-up case, you start with the individual elements, the individual tables you want to access: you first do your planning on how to access A, then how to access B, then how to access C. Then at the next level up in the search you consider how to join A and B, how to join B and C, how to join C and A. And after that you go up to the next level and figure out the right way to join A, B, and C all together. In the top-down model, you can take advantage of the fact that you already know something about how you need to join A, B, and C, and you can use that information to make decisions about whether to go down looking at A-B first, or B-C first, or A-C first, because you know all of this ahead of time. Again, when I show the example this should be clearer.

All right, so now we can define the rules that do these transformations, and essentially there are two classes of rules: logical-to-logical and logical-to-physical. Under Cascades terminology, a transformation rule goes from a logical operator to a logical operator, and an implementation rule goes from a logical operator to a physical operator. The way these are represented in the optimizer is that every rule is defined by the pattern that triggers it — what the sub-structure of the query plan has to look like for you to apply this rule — and then the action you take, which defines how you modify the matched target in your query plan.

So my pattern could be something like this: say I'm doing a join across three tables. I leave the inputs as groups because I don't actually care what comes below here — it could be a table access, it could be another join — I just know I want to deal with two equi-join operators. That's the pattern I'm looking for. Say my query looks like this at this point in my search; these boxes represent logical operators, because I haven't defined how I'm doing the joins or what access methods I'm using. My query plan here matches the pattern, so if I have a transformation rule — again, a logical-to-logical transformation — I can have a rule that says rotate left to right: instead of having the join of the two tables on the left side of the tree, I rotate it so it's on the other side.

Yes? Does the group stand for all logical expressions, or all logical and physical expressions? The question is what a group stands for here. It's the thing I showed before: it includes all the logical and physical expressions. Correct. And again, these are instantiated as needed, so I could have just one logical expression and a bunch of physical ones, but in reality there could be a ton of them — I just haven't applied the transformations yet. You only do them as you need them.
So again, I don't care at this point what the hell is going on below me; I just check whether I see this pattern and whether it matches. I can do my rotate left-to-right, and this is logically equivalent to our original query plan, so that's fine. And then I can have an implementation rule, where I now do a transformation from logical to physical: I swap out all of the logical join operators for physical join operators, explicitly specifying that I'm doing a sort-merge join for these joins.

Yes, a couple of questions. First: is a rule always cost-oblivious, or can I do something like say "you look like a plan that's going to cost something like this"? The question is whether applying a rule is cost-oblivious. The pattern matching is cost-oblivious. You can do the transformation, cost the new query plan at that moment, and then decide that it's actually higher than the best query plan you've seen so far, so you halt. As far as I know you always apply the action, always apply the rule, but after doing the transformation you can decide not to proceed any further.

The second question: is there a programmatic way to derive these equivalence relations from the semantics of relational calculus itself, as opposed to working them out by hand? So the question is whether you can automatically derive the rules by just taking the grammar or semantics of relational calculus. Not really, because relational calculus doesn't define the optimizations you care about. Relational calculus is a declarative, mathematical definition of what a query computes; it does not say how to execute it efficiently. There's nothing in relational calculus that says predicate pushdown is the way to go. You could generate a bunch of transformation rules automatically, and only a subset of them would actually be helpful. Essentially, what you're defining in these rules are the rules of relational algebra — associativity, commutativity — you're just also defining how to make the change. You could tell the system "you can do associativity, you can do commutativity," but then it would apply these things over and over again and blow everything up; constraining that is essentially what people do when they write the rules. So yes, you can think of the rules as an implementation of the transformations that are allowed by relational algebra.
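To tie the pattern/action split to code, here is a minimal sketch of what a rule interface might look like, with join commutativity as a transformation rule and logical-join-to-sort-merge as an implementation rule. The type names and fields are hypothetical (reusing the shapes from the earlier memo sketch); this is not Cascades' or Columbia's actual class hierarchy.

```cpp
#include <utility>
#include <vector>

// Same hypothetical shapes as the memo sketch above.
using GroupId = int;
enum class OpType { Get, LogicalJoin, SortMergeJoin };
struct MultiExpr {
    OpType op;
    std::vector<GroupId> children;
    bool is_physical = false;
};

// Every rule is a pattern to match plus an action that rewrites the match.
struct Rule {
    virtual ~Rule() = default;
    virtual bool Matches(const MultiExpr& e) const = 0;                     // the pattern
    virtual std::vector<MultiExpr> Transform(const MultiExpr& e) const = 0; // the action
    virtual double Promise() const { return 1.0; }  // priority used to order moves
};

// Transformation rule (logical -> logical): A join B  =>  B join A.
struct JoinCommutativity final : Rule {
    bool Matches(const MultiExpr& e) const override {
        return e.op == OpType::LogicalJoin && e.children.size() == 2;
    }
    std::vector<MultiExpr> Transform(const MultiExpr& e) const override {
        MultiExpr out = e;
        std::swap(out.children[0], out.children[1]);  // children are group ids, so this is cheap
        return {out};
    }
};

// Implementation rule (logical -> physical): logical join -> sort-merge join.
struct JoinToSortMerge final : Rule {
    bool Matches(const MultiExpr& e) const override {
        return e.op == OpType::LogicalJoin;
    }
    std::vector<MultiExpr> Transform(const MultiExpr& e) const override {
        MultiExpr out = e;
        out.op = OpType::SortMergeJoin;
        out.is_physical = true;
        return {out};
    }
};
```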
So, I mentioned this earlier: what's one obvious problem we could have with these logical-to-logical transformations? Again, we can go logical-to-logical or logical-to-physical, but we can never go physical-to-logical — MemSQL sort of breaks that, but we'll get there later. What's the obvious problem here? Our transformation is rotate left to right; there's another transformation, rotate right to left, and that puts me back exactly where I was, which would be an infinite loop. What's that? Yeah, memoization is going to solve this. The memo table allows us to store, in a compact structure — I say graph structure, but it could be a table, a hash table; in the original Cascades paper they talk about it being a graph — all the different transformations we've seen before, so we can maintain some information about them and avoid getting stuck in an infinite loop changing things back and forth unnecessarily. To make this work, you have to store a bunch of information about the trees, the query plans you're generating for each group, so that if you do a transformation you can say, aha, I've actually done this before and I have a cost, so I don't want to do it again. They talk about how to do this efficiently; in the 1990s they used some hash function called "lookup," which I'd never heard of before, but the basic idea is that you have a way to hash the group you're trying to generate and use the hash code to look up in a hash table whether you've seen this before or not.

The key thing that makes this work is something called the principle of optimality. This is not specific to Cascades or the top-down search model; it's a high-level concept that allows us to rely on memoization and avoid unnecessary searches down different paths. The basic idea is that every sub-plan of an optimal plan is itself optimal. It's sort of a no-brainer, but think about it: if I have one group here and one group down there, and I know I have the overall optimal plan for my entire query, there can't possibly be a more optimal query plan for the second group at the bottom, because if there were, then this wouldn't be the optimal query plan. That's basically what the principle of optimality says for us, and it allows us to know that once we see a partial query plan with a higher cost than the best one we've seen so far, there's no need to proceed further down into the query tree. This is essentially a branch-and-bound search: we can cut things off when we know we're not going to get any better no matter what we do.
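Here is a minimal sketch of that cutoff, with made-up bookkeeping: each optimization task carries a cost limit derived from the best complete plan seen so far, and by the principle of optimality a partial plan that already meets or exceeds that limit can be abandoned immediately.

```cpp
// Branch-and-bound pruning check (illustrative names, not from any real system).
// A partial plan whose cost already reaches the limit cannot be completed into
// anything cheaper than the incumbent best plan, so stop exploring it.
bool ShouldPrune(double partial_plan_cost, double cost_limit) {
    return partial_plan_cost >= cost_limit;
}

// Usage inside the search: before descending into a child group, hand the child
// whatever budget is left after accounting for the cost accumulated so far.
double ChildCostLimit(double cost_limit, double cost_so_far) {
    return cost_limit - cost_so_far;
}
```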
All right, so let's look at a high-level example of Cascades. This is that three-way join between A, B, and C. What I'm showing here is essentially what you would start with when you initialize your search in Cascades. We have our two groups for the joins, [A B C] and [A B]; for now we're ignoring joining B and C, or joining A and C, and just dealing with joining A and B like this. Then along the leaf nodes we have the groups that correspond to accessing A, B, and C, and for these you always have the logical operator — Get(A), Get(B) — again not defining physically how I do that. This is pretty much the only logical operator you can have for these.

Yes? The question is about the principle of optimality: is it always applicable — is it guaranteed that if every sub-plan of a plan is optimal, then the plan as a whole is optimal? Or, the other way around, could the total plan be optimal while one of its sub-plans is not? It can't be, because then it wouldn't be the optimal plan: if there's a better sub-plan, then the plan using that better sub-plan would be the optimal one, not the one you have. It'll make more sense with the access methods. And the follow-up point: the sub-plan has to be considered in the context of the overall plan, because what about cases where you need to sort your output? You might want to choose a sort-merge join, which would produce a better overall cost, even though that join may not be the cheapest plan for that sub-group on its own. I get your point; yes, I think that's correct. We'll come back to this. For simplicity I didn't include any physical properties in here, but we can talk about that afterwards.

All right, so again, this is what you see when you first initialize a search with Cascades. You always have the root; this is the output we want: A join B join C. I'm only showing one of the groups that can be extracted out of this; we'll see later that you would expand this out, but you have to start somewhere, so this is the first thing you would pick. At the very beginning we pick one logical expression to generate from our expected output. This is A joined with B together in some way — we haven't defined how yet at this point in our multi-expression — and then an access on C. So we traverse down into the tree and land at this group, and we haven't defined any logical expressions yet, so we generate the first one: A join B. Again, this is a logical expression, so I'm not defining how the join is performed, and I'm also not defining how I'm accessing A and B, because that isn't defined here; it has to be defined down below.

So say now I traverse down to this next group here, and I already have my logical expression Get(A). I can transform that into physical multi-expressions: do a file scan on A, essentially a sequential scan, or an index scan. I'm ignoring how you define which index it is, but assume that for every possible index you could use, you would have one physical expression for it, and you could do obvious pruning: if the query doesn't access the attributes that define an index, you wouldn't include that one anyway. So say I evaluate both of these and run them through my cost model to see which one is actually best. To make this work, you actually have to send along more information about the context of the query plan; for all these other parts we don't have a physical plan yet, so the cost is based on just what it will cost to execute this one file scan.
So say this one is the best; it has cost 10 — again, an internal metric that means nothing to the outside world, but for our purposes we just say it's 10. Then in our memoization table we record that, for the group with output A, the file scan had the lowest cost, and we record information about how we got there — what our logical transformation was — and what the cost was as well. Then we go back up to the next group and do the same thing: go down the other side, apply our transformation rule from the logical expression to the physical expressions, and for our purposes again say the file scan has the lowest cost, and we update our memo table.

So now at this point we can make a decision about what the next transformation we want to do is. We could go from logical to physical and try to figure out what join algorithm we want to use, or we could do another logical-to-logical transformation and try another combination of the join ordering. Good. So this is where the priorities you can define come in; for this example, we'll just say let's do a logical-to-logical transformation. And now you see how the memoization table helps: when I want to know the cost of accessing B or A (and these arrows should be switched), I've already computed these costs. This is the optimal selection for this part of the tree, so I don't need to go to the cost model, or redo any of these transformations, to ask what accessing these things costs; I have it in my memo table and can just reuse it.

Yes? The question is whether the cost is predefined. No — there's a component called the cost model: you give it a query plan and it says "I expect this to execute with cost X," and what X actually means is internal to the system. We'll see this next class; it's often based on the cardinality of the operator, how much data you're reading, how much data you're outputting. The follow-up is: if you're memoizing, doesn't the cost model need to know that these costs shouldn't be recalculated? Well, the cost model is a completely separate component: I tell it "A join B" and it gives me a number, and then if I ask it about "B join A" it doesn't necessarily know to reuse the previous result. The cost model is dumb; it doesn't know anything. All the memoization tables are stored in the optimizer, so the optimizer has to know how to combine two cost model results into a new cost.

This part is actually not clear to me. It seems as if you'd want to be able to take the whole thing and compute its cost, but the problem is you haven't defined how the hell you're actually doing these other parts. You could set those to infinity, because you don't know, and that's your upper bound, but that doesn't help you, because then you can't do any pruning at the very top. From what I can tell, in Cascades and in Columbia they do a summation — add these things together — and for the parts where they don't have a physical plan, they rely on a gross approximation based on cardinality. To me that seems kind of hacky, and it seems like you're bleeding logic from the cost model into the optimizer. I actually don't know what we do in our system; we can ask Bowie.
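Here is a minimal sketch of the optimizer-side arithmetic just described — combining a single operator's cost-model estimate with the memoized winners of its child groups — using made-up structures. A real system also has to handle children that aren't fully optimized yet, which is exactly the "set it to infinity or approximate from cardinality" problem mentioned above.

```cpp
#include <limits>
#include <vector>

// Hypothetical group record: only the memoized best cost matters here.
struct Group {
    double best_cost = std::numeric_limits<double>::infinity();
};

// The cost model only prices the single operator (e.g. the sort-merge join
// itself); the optimizer adds the best cost already recorded for each child
// group, so Get(A)/Get(B) are never re-costed when we explore B join A.
double CostMultiExpr(double local_operator_cost,
                     const std::vector<int>& child_group_ids,
                     const std::vector<Group>& memo) {
    double total = local_operator_cost;
    for (int gid : child_group_ids) {
        total += memo[gid].best_cost;   // reuse the memoized winner for this child
    }
    return total;
}
```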
All right, so there are only two possible logical transformations for this output: A join B and B join A. Now we've exhausted all the logical transformations we can do, so we can apply all the physical ones. For our purposes we just generate all of them: A join B with a nested-loop join, A join B with a sort-merge join — I guess I'm missing a few — and B join A with a nested-loop join as well. Again, we do the same thing: we use the cost model to estimate the cost of each of these, take the summation of the costs produced by our children groups, add that to the cost of computing this operator, and then update our memoization table to say that the sort-merge join on A and B is the most optimal plan so far for this group. We go back up the tree, go down the other side, do the same thing on C, and now we have another cost. At this point we again have the same choice: we can apply more logical transformations to expand out the root of the tree, or we can do physical transformations and expand those out. This really ends up depending on the priorities for these transformations, but what will happen is that, because we now have different join orderings for the root group, this will create new groups that again have all the same information about how to do transformations for the data they contain.

So is this clear at a high level? In the System R case you start at the bottom, you have nothing — none of your tables are sorted — so you start with the access methods for the individual tables, then you go up and figure out how to join two tables together, and then you go up even further and figure out how to join three tables together. Whereas here we start logically at the top and work down.

So what's one big question we have about this? I'm at a terminal, I write my query, I hit enter; I don't want it to take minutes. If I have a lot of tables this could be an exhaustive search, so we need a way to know when we should stop. This is called search termination: how do we figure out when is actually the right time to halt the search and use whatever the best plan we've seen so far is? It may not be the optimal plan, but we knew that going into this, because we knew we weren't going to do an exhaustive search.

The most obvious approach is to just use wall-clock time, and that's what pretty much everyone uses: either you've exhausted the search and you stop right away, or when you hit the timer you stop and spit out the best query plan you have. You can also define a cost threshold: you say, if I have some current cost I want to compare against, and I find a plan that's 10 percent better than that, that's good enough and I'll stop right there. This is not something you typically do in a real system, because again the cost model spits out internal values that don't have any real meaning to the outside world, but it can be used for things like prepared statements, where you say, I have the query plan I generated an hour ago for this prepared statement; let me opportunistically run the optimizer, and if I find a plan that's 10 percent better I'll stop right away, otherwise I'll terminate after a thousand milliseconds or something. And then the last one, as I said before, is when there are no more transformations you can do on your plan, you just stop right there. This is typically done at the per-group level, so once you've exhausted all the groups, you can stop entirely.
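Here is a minimal sketch of those three stopping conditions as a check at the top of the search loop; the struct, field names, and the one-second budget are all made up for illustration.

```cpp
#include <chrono>
#include <limits>

// Hypothetical search-wide state used only for termination decisions.
struct SearchState {
    std::chrono::steady_clock::time_point start;
    std::chrono::milliseconds budget{1000};                       // wall-clock limit
    double best_cost = std::numeric_limits<double>::infinity();   // best complete plan so far
    double good_enough_cost = -1.0;  // e.g. 10% better than a cached prepared-statement plan; <0 disables
    bool tasks_remaining = true;     // false once every group has been fully explored
};

bool ShouldStop(const SearchState& s) {
    if (!s.tasks_remaining) return true;                                         // exhausted all transformations
    if (std::chrono::steady_clock::now() - s.start >= s.budget) return true;     // wall-clock timeout
    if (s.good_enough_cost >= 0 && s.best_cost <= s.good_enough_cost) return true; // cost threshold reached
    return false;
}
```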
So, as I said before, the Cascades model came out of this optimizer-generator movement in the late 1980s and early 1990s, and it's been implemented in a couple of different toolkits from back then. I already mentioned Wisconsin's OPT++, and then at Portland State University they had this thing called Columbia, which, again, I think is the best — not the best open-source implementation, but the best description of how this thing actually works. Since then, the Cascades model has shown up in other, more modern toolkits. Pivotal's Orca is a Cascades implementation in C++ that runs as a standalone service; we'll talk about it in a second, but basically Pivotal designed it to be the query optimizer for all of their big data systems. They had Greenplum, which is a distributed, parallel version of Postgres that came out around the same time Vertica did, and they had this other thing called HAWQ, which is SQL on top of Hadoop. Rather than writing a query optimizer for each system, you could run Orca as a standalone service, send queries over as XML, and it had all the transformation rules to do your query optimization and would send you back an optimized query plan. We looked at this when we first started building our system; we abandoned it because it was way too much XML, and there was essentially no documentation when we looked at it. I think they've since fixed that up, but we'd already moved on.

Then in the Apache Foundation there's Calcite, which is written in Java and is designed to be a Java-based implementation of Cascades. I don't know how good it is versus Orca or versus what we've done; I've heard good things and I've heard bad things, but we haven't looked into it.

In terms of Cascades implementations that are actually integrated into systems, SQL Server is probably the most famous. SQL Server's query optimizer is really, really good, and the cost model is really, really good. We'll see this in the paper you need to read for next week — actually, what is today, Monday? Wednesday? I woke up at 3 a.m. and couldn't fall back asleep. All right, in the paper you'll read for Wednesday, their query optimizer is the best; they do a benchmark showing which one produces the best plans. I'm not saying it's because of Cascades; I think they also have an awesome cost model, statistics, and sampling, but this is probably the best-known implementation of Cascades. Tandem's NonStop SQL — Tandem was a company that got bought by HP; it's still around, and people still pay for it, but it's not really being updated anymore. Clustrix has a small blip in their documentation saying they have a query optimizer called Sierra that's based on Cascades.
And then in our system, Peloton, we've been spending the last year and a half actually trying to build out our own Cascades implementation. The OPT++ one is interesting because, as I said, it's meant to do both top-down and bottom-up search in a single framework, and you, as the database system implementer, can decide which one you want to use.

All right, so going through this quickly: predicate expressions are a big deal that we have to handle. In the case of Cascades, we can represent them just as more operators, and so the transformations we can do for things like joins are applied essentially the same way for predicate expressions, and we rely on the fact that we know the rules of Boolean logic to make sure we generate and apply predicates in a way that still produces the correct result.

The most common optimization you want to do is predicate pushdown, and in Cascades there are essentially three ways to do it. The first is to just embed it as a rule or transformation in the search model and apply it like anything else. The advantage of this is that you can use your cost model to help you decide whether pushing a predicate down a certain way will actually produce a better result. The alternative is to perform the pushdown either before or after the regular Cascades search process, outside of it. This is sort of what Postgres does: they have a rewrite phase where, before you start converting logical operators into physical operators, you do the pushdown there. The literature says this makes it tricky to handle complex predicates, but Postgres is really good, so I don't know whether that's true. And then the last alternative, which is actually a bad idea but we can mention it for completeness, is to do all the query optimization you would normally do, ignoring predicate pushdown, and then once you've generated an "optimal" plan, in quotes, you do your predicate pushdown. The reason this is a terrible idea is that if all of your scans are complete sequential scans, because you haven't pushed down the predicates, then the selectivity estimate for every single table lookup will be one, and therefore you'll probably generate the worst possible plan. Nobody does this; I probably should just remove it from the slide. Don't do that.

Another interesting thing we have to deal with is the right order for applying predicates. In most of the literature you see the WHERE clause — say, a = 1 AND b = 2 — and you just assume, all right, I'll do a = 1 and then b = 2. But it may be the case that not only do predicates have different selectivities, they also have different costs to actually evaluate. The standard way everyone does this is to reorder based purely on selectivity, but there may be cases where some predicates are super expensive to evaluate, so maybe you don't want to apply them at the bottom of your query plan, and you may want to migrate them up to a higher part of it. Here's an example.
Say I have SELECT * FROM foo WHERE foo.id = 1234, and then I also compute a SHA-512 hash over some string column and check whether it matches a value I'm looking for. It may be the case that the hash predicate has reasonable selectivity, but computing a 512-bit hash on the value is really expensive, so maybe I want to do that much later, when I'm producing the final output, or if I had a join, maybe I want to do it after the join.

Yes? This isn't a query optimization question, it's a physical design question. The question is: in my example I'm computing this on the fly; if I'm executing this query all the time, does it make sense to pre-compute the SHA-512 for this column, store it internally, and then, when I execute my query, have that value pre-computed for me instead of computing it on the fly? That's essentially called a materialized view. You can also do it as an index: you can build an index on a function, which essentially acts like a materialized view. This is a physical design question that most database systems do not handle for you; the administrator has to do it. We're trying to do it automatically — this is actually one of the things we want to build in the self-driving system. There was work done by Microsoft in the late 1990s and 2000s where they could suggest materialized views to solve exactly this problem.

Yes? How would you even be able to estimate the selectivity for the SHA-512 predicate? The question is how you could estimate the selectivity of this, because it's a black box. Correct — this is actually one of the big problems with functions, and especially user-defined functions: from the optimizer's standpoint, from the cost model's standpoint, it's a black box, and I have no idea what the selectivity is. You can be a little clever and say, well, I'll watch what happens a couple of times and compute the selectivity based on that. Microsoft has a paper where they convert UDFs into relational algebra, put that into the query plan itself, and run it through the regular cost model; that came out maybe two months ago, and it's actually super cool. No other system can handle this; UDFs are always treated as black boxes and you just take a worst-case estimate.

Yes? The question is about short-circuiting: if a tuple fails the foo.id = 1234 check, you never evaluate the hash predicate at all, so wouldn't the whole ordering question go away? Well, no, not necessarily, because this doesn't have to be a primary key. I could have a billion tuples that match 1234, and then half of those match the hash predicate. The first filter could let everything through, or let some subset through, and then I do further refining with the second one. If it was a primary key, sure — as soon as you match, you're done, and you only ever evaluate one tuple. But in this case, assume it's not unique.
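Here is a minimal sketch of one classic heuristic for ordering conjunctive predicates by both selectivity and evaluation cost. The rank formula and the per-predicate numbers are illustrative, and real systems would pull the estimates from statistics, with a pessimistic default for opaque functions like the hash above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Per-predicate estimates; in practice these come from the statistics
// subsystem (or a worst-case guess for black-box functions).
struct Predicate {
    std::string description;
    double selectivity;   // fraction of tuples expected to pass (0..1)
    double eval_cost;     // per-tuple cost to evaluate
};

// Classic ordering: apply the predicate with the lowest "rank" first, where
// rank = (selectivity - 1) / cost trades off how many tuples a predicate
// eliminates against how expensive it is to run. A cheap, highly selective
// check like `id = 1234` sorts first; an expensive hash comparison gets
// deferred even if it is moderately selective.
void OrderPredicates(std::vector<Predicate>& preds) {
    std::sort(preds.begin(), preds.end(),
              [](const Predicate& a, const Predicate& b) {
                  return (a.selectivity - 1.0) / a.eval_cost <
                         (b.selectivity - 1.0) / b.eval_cost;
              });
}
```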
Let's talk a little bit about Orca. As I said, it's a standalone Cascades implementation, and the way it works is that you run it on a separate machine and invoke its API to send over catalog information, statistics, and logical plans, and it already has the transformation rules written for you to do query optimization with a Cascades-style search. What's really interesting about it is that it actually supports multi-threaded search. Up until now, my example showed what a single thread does when it searches, and in most database systems the search is always single-threaded, because you generally want your threads executing queries, not crunching on the plan search. But Orca is designed to run on a standalone machine, so you can blast it with all the cores and it's not going to slow down your system. Of course, now you have to deal with shipping your catalog and stats over all the time, keeping them fresh, and making sure you get correct estimates, but it's certainly an interesting approach. There has also been some academic work on using GPUs to do the plan search, but as far as I know that hasn't made it into any real system.

What I like about Orca, especially in this paper, is that they talk about how to build a real query optimizer for the real world, and there are two issues they had to deal with and explain how to solve. The first is that they want to be able to debug things that crash remotely. Normally, when you run your program and hit a segfault or some problem, it core dumps, and you see the stack and what's in your heap, but you don't see all the information about the path you took to get to that particular problem. So they have the ability to maintain the trace history of how the search went through the tree, so that when you crash, all this extra state information gets dumped out and can be sent back to their offices to help them reproduce the exact error you were hitting. The cloud guys don't really have this issue, because if the system crashes in your cloud setup you can just log into the machine and see what's going on.

The other issue is that they wanted to make sure their optimizer was actually generating accurate query plans in terms of the cost model. The big issue is this: if your query optimizer generates two query plans, one it thinks is optimal and one it thinks will be slower, you want to know that when you actually run them, the one it thought was faster truly is faster. The cost model doesn't need to be accurate in the sense that one plan is exactly 20 times faster than the other; you just want to know that if the cost model thinks this plan is better than that one, it actually turns out that way when you run them. So they have a testing framework — I forget the name, and it's actually not open source, I haven't been able to find it — where they set this up like a fuzz tester: try out different query plans, run them, and check that the relative order predicted by the cost model matches the real order when they execute.

To finish up real quickly with MemSQL: again, the reason I had you read this paper is that it's interesting to see them describe things differently than the Columbia paper or the Cascades paper do, when at a high level they're more or less doing the same thing.
To finish up real quickly with MemSQL: again, the reason I had you guys read this paper is that it's interesting to see they describe things differently than the Columbia paper or the Cascades paper, but at a high level they're more or less doing the same thing. So it's not a Cascades implementation, but it has bits and pieces from everything. They have basically three components: a rewriter, an enumerator, and a planner. The rewriter does logical-to-logical transformations, but it can actually consult the cost model, in a way the old heuristic approaches couldn't, to check whether it is actually doing a beneficial transformation. The enumerator is mostly focused on join ordering, but this is where they do the logical-to-physical transformations. And then the planner takes the physical plans generated by the enumerator, converts them back to SQL, and sends them to the other execution nodes.

Again, this class is all about single-node systems, and this paper is about distributed query execution and distributed query planning. They make a big deal about how their system keeps track of data distribution in a way that other systems don't, and they are correct, that's the right way to do it, but I didn't feel that was the major contribution. The last piece is actually the most interesting, because as far as I know nobody else does this: everyone else with a distributed database system ships around the physical plan, because every node knows how to execute those. But they actually ship around SQL. So if you have your SQL generated on one node and you ship it to another node, that node basically has to do all this planning over again to put it back into a physical plan it can execute. What's interesting about this is that because the planning happens at the local node, the sub-plans you generate there are optimal based on what you know about the data locally, whereas in a global approach the stats may not be up to date, and whichever node did the planning may generate a suboptimal plan because the data has changed since the last stats update.

So at a high level it looks like this: you have a parser, like everyone else, you get an abstract syntax tree, you bind the object IDs to what's in the catalog, and then you feed it to the rewriter. I don't know whether it's actually true that you can go back from the enumerator to the rewriter, but essentially the rewriter does all the logical transformations, the enumerator does the logical-to-physical transformations, and then they take the output physical plan and run it through what they call the planner, which just emits SQL. The confusing thing about the paper is that in some places they say the rewriter does SQL-to-SQL transformations, or SQL-to-SQL rewriting, but in other places they say it does logical-to-logical transformations. I think it's the latter; I don't think anybody would actually rewrite SQL into SQL.
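A minimal sketch of the cost-based rewriting idea mentioned above (the rewriter consulting the cost model before committing to a logical-to-logical transformation). The plan and rule representations here are invented for illustration, the cost estimates are fabricated, and this is not MemSQL's actual interface; in a real optimizer the estimate would come from calling into the cost model rather than being stored on the plan.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LogicalPlan:
    """Stand-in for a logical operator tree: just a label plus a cost estimate."""
    name: str
    est_cost: float   # fabricated; in reality produced by the cost model

@dataclass
class RewriteRule:
    """A logical-to-logical transformation: a guard predicate plus a transform."""
    name: str
    matches: Callable[[LogicalPlan], bool]
    apply: Callable[[LogicalPlan], LogicalPlan]

def cost_based_rewrite(plan: LogicalPlan, rules: List[RewriteRule]) -> LogicalPlan:
    # Unlike a purely heuristic rewriter, keep a rule's output only if the
    # cost estimate actually goes down.
    for rule in rules:
        if rule.matches(plan):
            candidate = rule.apply(plan)
            if candidate.est_cost < plan.est_cost:
                plan = candidate
    return plan

# Hypothetical rule: flattening a correlated subquery into a join, kept here
# only because the (fabricated) estimate says it is cheaper.
flatten = RewriteRule(
    name="flatten_subquery",
    matches=lambda p: "subquery" in p.name,
    apply=lambda p: LogicalPlan("join_over_flattened_subquery", est_cost=40.0),
)
query = LogicalPlan("filter_over_correlated_subquery", est_cost=100.0)
print(cost_based_rewrite(query, [flatten]).name)
```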
All right, I just want to finish up real quickly. As I said before, there's this quote from David DeWitt (it may not actually be attributable to him, but it is in slides he gave at Microsoft a few years ago) where he says query optimization is not rocket science; when you flunk out of query optimization, we make you go build rockets. Because, again, I've covered it in two lectures, but people have spent decades looking into this. And the research literature originally suggested that there was no difference in the quality of the plans you can generate using the bottom-up versus the top-down search strategies; this comes from the OPT++ paper that David DeWitt's group wrote in the 1990s. And as we'll see next week, none of this matters at all if your cost model is generating crappy estimations. In order to have good estimations you need statistics, sampling, and other things to make this actually work. Any questions? Yes?

So this question is: what are the trade-offs of the standalone implementations versus the integrated implementations? Well, there is a latency issue, because you have to send the request over to the standalone thing. On the other hand, you don't have to build one, because you can just use theirs. There are pros and cons to these things. I'd say the major issue is not having direct access to the data or the database, and not having perfectly up-to-date stats, but you have this problem in distributed databases as well. And think of OLAP: my query is going to run for an hour; do I care if it takes a few milliseconds to send the request over to do planning on another machine and get a response back? Probably not.

All right, so I said here that the research suggests there is no difference in quality between the bottom-up and top-down search strategies. So Goetz Graefe was awarded, I think, the test-of-time or the innovations award at SIGMOD last year, and he gave a talk that was basically a one-hour crash course through the various things we've talked about that he's worked on throughout his entire career. At the very end he started talking about Cascades, and he mentioned, oh, you know, it's in Orca, it's in Calcite, it's in the CMU system. But then he made this offhand comment at the very end, just like, oh, well, actually maybe Cascades isn't the best. And this was captured by Joe Hellerstein, who is a database professor at Berkeley: surprising comments from Graefe on query optimization. He says use dynamic programming for joins, the bottom-up approach, and use Cascades for extensibility, for operators, so being able to support all sorts of different complex queries. And then he expanded on that, saying there's a paper from Thomas Neumann, the HyPer guy, showing that the dynamic programming approach is more efficient than Cascades for join enumeration. So the guy that invented Cascades is maybe saying Cascades isn't the right way to go. At this point we're going with Cascades, what can I say, right? At some point I'll talk to him about what he said.
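Since the comment above is about dynamic programming for join enumeration, here is a minimal bottom-up, Selinger-style sketch of the idea. The table cardinalities and the cost function are fabricated for illustration, and it ignores join predicates, interesting orders, and everything else a real enumerator has to track.

```python
from itertools import combinations

# Fabricated base-table cardinalities; a real optimizer reads these from stats.
tables = {"A": 1_000, "B": 10_000, "C": 100, "D": 5_000}

def join_cost(left_rows, right_rows):
    """Toy cost function: pay for both inputs plus a (fabricated) output size."""
    out_rows = max(left_rows, right_rows)   # pretend joins never explode
    return left_rows + right_rows + out_rows, out_rows

# best[set of tables] = (total cost, output rows, plan string), built bottom-up.
best = {frozenset([t]): (rows, rows, t) for t, rows in tables.items()}

for size in range(2, len(tables) + 1):
    for subset in map(frozenset, combinations(tables, size)):
        for k in range(1, size):
            for left in map(frozenset, combinations(subset, k)):
                right = subset - left
                lcost, lrows, lplan = best[left]
                rcost, rrows, rplan = best[right]
                step_cost, out_rows = join_cost(lrows, rrows)
                total = lcost + rcost + step_cost
                if subset not in best or total < best[subset][0]:
                    best[subset] = (total, out_rows, f"({lplan} JOIN {rplan})")

print(best[frozenset(tables)])   # cheapest plan covering all four tables
```

The key property is that each subset of tables is solved once and reused, which is where the efficiency argument for bottom-up join enumeration comes from.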
All right, extra credit real quick. So the extra credit assignment, I think it's 10 percent, is that you can write an encyclopedia-style article about your favorite database management system. We're actually building a new website; we're very, very close, as in a day or two from actually putting it up. We had to outsource it, I'll explain later. The basic idea is that we're trying to build the database of databases. So you write a Wikipedia-style article about a database system, but rather than being free-form text like it is on Wikipedia, there's actually a taxonomy defined with pre-selected options for the different components of the system. Because if you go to Wikipedia and read the article on Postgres and the article on MySQL, sometimes they refer to, say, MVCC in different ways, whereas there is a standard way to define what MVCC actually is. So the idea is that for all these different features of the system, you select from the predefined options, then you write a little paragraph saying here's how it's actually implemented, and then you provide citations.

So the website looks something like this; here's an example of what an article looks like. There are different subsections that say here's the concurrency control model, here's how they do checkpoints, here's the data model, essentially all the things we're talking about in this class. It's hard to see here, but there's also a form where you can edit things. My holy grail is to then be able to do a search over the encyclopedia and say: show me all the database systems that do MVCC, show me all the systems that do two-phase locking with deadlock prevention. Having this taxonomy, this semi-structured data, is what allows us to do that kind of search. So the website is dbdb.io, the database of databases. I thought about using db2.io, but that's asking for a lawsuit.

What I'll do is put a sign-up sheet on a Google spreadsheet; go pick whatever system you want to write about. Some systems were already written about in previous years, so I'll show which ones you can't use, and it's first come, first served. What I'll say is, if you pick a widely known one, for example Oracle, there's a ton of information about it, so I expect a comprehensive, well-written, and well-cited article. If you choose something more obscure (PumpkinDB, HamsterDB, whatever) then there may not be a lot of documentation, and maybe I can help you find something, or we can get in touch with the developers to help you fill in what's missing. I've shown this list before: I have a spreadsheet of all the possible databases I know about, it's around 500 or so, a combination of academic, commercial, and open-source ones, so there should be enough for everyone to pick. Okay? Yes?

Okay, so again, I've warned you guys before: please don't steal whatever's already out there. Make sure everything is cited, write everything in your own words, and don't just cut and paste from their documentation into the website. Okay? Again, I'll post the website later today, and then the sign-up sheet.

All right, so next class: cost models, and then I'll spend a little time talking about some general tips for working on a large code base. All right, and I need to hand back everyone's exams and the solutions.