We've been presenting this paper called Generating Compiler Optimizations from Proofs. There are two ways I can present this: one involves math, and the other doesn't, but it's really short and doesn't really explain things. I'll try that one first and see what happens after that. So let's talk about the main problem we want to solve. An optimization, as compilers do it, takes a given program and transforms it into another program which hopefully does the same thing as the first one but presumably runs faster. The way optimizations are implemented in compilers is that people hand-code the optimizations and just throw them in. This paper has quite an interesting idea: given an original program to be optimized, the final optimized program, and a proof that both are equivalent, you can derive the most general optimization from these things, in an automatic way. Without the math, though, it won't look very automatic. Let me clarify what a proof of equivalence should look like. Take a toy programming language with plus, minus, and numbers, and the program a + (a - a). We can note the axiom x - x = 0, apply it here, and get a + 0. After that we can apply the identity for addition, which I'll state as: if y = 0, then x + y = x. It's a bit weird how I wrote it, and a bit hard to explain why it's like that; just bear with me for a while. So we have our original program on one side and the optimized version, which does the same thing, on the other, and we would like to derive the most general optimization based on this proof of equivalence. A naive way to do this is to just replace equals by equals; in that manner you get x + (x - x) goes to x as your optimization.
But this isn't really the most general optimization, because there's nothing forcing this x and that x to be the same thing. The most general optimization is in fact x + (y - y) goes to x. The intuition for why this is possible is that a proof, whether written by a human or generated in some way, usually encompasses what is important about the transformation: it never used the fact that this a was the same as that one. So the proof itself contains enough information to derive the generalized optimization. The key idea is to start from the end and apply the proof rules backwards, and while applying them backwards, you massage things so that everything lines up. Say you start from the end: a generalized form of a would be some variable p. Then you want to go backwards through the last rule, so you want this side to look like a + 0. For that you need some variable q, and you replace p with q + 0; so we have one substitution, p goes to q + 0. (Is it basically some kind of better matching?) It looks like unification, but it's a bit more complicated than that. Better matching is along the right idea, I think. (Like an unfolding?) Maybe. I'm not very sure about that, sorry. So following this axiom you get q + r, and you now have two things to show: the term is q + r, and r = 0. For that, you apply the x - x = 0 axiom backwards, which lets you substitute s - s for r, and you finally get q + (s - s). Then everything works out in the end. So that's basically it. So I'm done.
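To make the difference concrete, here is a toy sketch (not the paper's algorithm) of matching the naive rule versus the generalized rule against the instance a + (b - b); the term encoding and the "?"-variable convention are made up for illustration:

```python
def match(pattern, term, subst=None):
    """Try to match term against pattern; return the extending
    substitution, or None if matching fails."""
    subst = dict(subst or {})
    # Pattern variables are strings starting with "?".
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst:  # nonlinear occurrence must agree
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term) and pattern[0] == term[0]):
        for p, t in zip(pattern[1:], term[1:]):
            result = match(p, t, subst)
            if result is None:
                return None
            subst = result
        return subst
    return subst if pattern == term else None

naive   = ("+", "?x", ("-", "?x", "?x"))   # x + (x - x) -> x: too specific
general = ("+", "?x", ("-", "?y", "?y"))   # x + (y - y) -> x: most general

prog = ("+", "a", ("-", "b", "b"))         # the instance a + (b - b)

print(match(naive, prog))                  # None: naive rule fails to apply
print(match(general, prog))                # {'?x': 'a', '?y': 'b'}
```

The naive rule forces all three variables to coincide, so it rejects a + (b - b); the generalized rule applies.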
So let me talk about some possible applications of this idea. The most obvious one is to give an instance of a program and its optimized version, then give the proof that the two are equal, for example by hand. Then you could teach compilers optimizations without actually writing the code for the optimizations. Another application they came up with: suppose you have a superoptimizer, which does a lot of very time-consuming analyses to optimize the program as much as possible. You give your superoptimizer the original program, tell it to optimize it to something simpler, and after that you apply the generalization and get a general rule. Now, instead of redoing whatever analysis it did, you can just pattern match on your program; it becomes a single rewrite from one form to the other, which is a lot faster. Also, because this algorithm was designed in very abstract terms, you can apply these ideas to areas such as database lookup optimizations, where you generalize relations between columns in some way. Or you could improve type checking errors in languages like Haskell. A type checking error says that some expression should be of some type; you could take the final statement, "expression e should have type x", and ask why expression e should have type x. When you generalize, you remove everything that's unnecessary, so this simplifies the error message to only the parts which force the thing to be of that type. That's actually all I have for the no-math part, so I'm not sure if I should go on. (Now on to the math part? Let's do it.) OK, though I might bore a lot of people.
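The superoptimizer-caching idea above, where one expensive analysis run is replaced by cheap pattern matching once the rule has been learned, might be sketched like this; the regex encoding of the rule (with a backreference enforcing the nonlinear constraint) is purely illustrative, not how the paper represents rules:

```python
import re

# Learned rule "x + (y - y) -> x", encoded as a regex; the backreference
# \2 enforces that both occurrences of y are the same variable.
RULE = (re.compile(r"(\w+) \+ \((\w+) - \2\)"), r"\1")

def apply_learned_rule(expr: str) -> str:
    """Apply the cached rewrite by plain pattern matching: no analysis."""
    pattern, replacement = RULE
    return pattern.sub(replacement, expr)

print(apply_learned_rule("a + (b - b)"))   # "a": the rule fires
print(apply_learned_rule("c + (d - e)"))   # unchanged: d and e differ
```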
(Wouldn't this generate a lot of rules if the problem is large?) The idea is you only do one optimization at a time: you provide a short program, and then you generalize that. It generates one generalized version of the optimization; it's not like you generate many different programs. (So you feed in the one on the left? Do you have to feed in the axioms?) Yes, you also need some set of axioms and the proof itself. (And then it returns the optimized one on the right?) No. What you give is the unoptimized instance, the optimized instance, and the proof of equivalence, and then it generates the general rule. (You have to feed it intermediate states?) You may have to fill in the intermediate states, yes, because you need all the steps of the proof. (So that means you have to hand the whole derivation to the optimization generator?) There are different ways of doing it. You cannot escape the fact that you can't just prove everything automatically, but you can automate some parts of it. I believe they did try to automatically generate proofs of equivalence. But in the basic application I talked about, you start with one optimization instance, do it by hand, prove the equivalence by hand, and then it learns the optimization without you coding it. (One question and one comment. It seems to me that the majority of the work here is done when you pattern match against the nonlinear pattern. You have a + (a - a), and you choose the two a's on the right side to both be s, but the left-hand-side a is just q when you abstract it. That's what makes this work. But how do you know? Why not write it as s + (s - s)?) You need to do some math for that, which I haven't done yet.
Yes, to explain that properly, you need to go into more detail on how you actually execute proof steps backwards, so I can do that after this; it will take some time, and it's confusing. The other thing I wanted to add is that there are arithmetic theories which are decidable. If you only use Presburger arithmetic, then of course you should be able to come up with all of this automatically, which is nice. But I don't think you will use only Presburger arithmetic for your programs. So there's still manual effort involved, but it's a nice idea. Any more questions? Then let me try to do some math. I'm not sure I can do all of this, but I'll try. The original formulation of the algorithm was using category theory, if anybody knows what that is, so I'll go through a super fast overview. A category is a collection of objects and arrows between them. The easiest way to think of the objects is as sets, with the arrows between sets being functions, say functions f and g. The arrows just need to obey a few rules: every object has an identity arrow, you can compose any two compatible arrows to form a third arrow, and composition is an associative operation. The idea they came up with was to encode the axioms of your proof system as arrows in a certain category. Not in the usual sense, but let's see. Let's talk about the category of binary relations. A binary relation r is a set of pairs, which we can write like {(x, y), (y, z)}: x is related to y, and y is related to z. Say we want to encode transitivity of this relation. The transitivity axiom could be an arrow from {(x, y), (y, z)} to {(x, z)}. So what does that say?
Arrows in this category of relations are substitutions of the variables such that every related pair is preserved: if x r y, then sigma(x) is also related to sigma(y). The axiom arrow here would just be the inclusion from the one relation into the other. Now say I have this other relation, {(A, B), (B, C), (A, C), (A, A)}. We can say that this relation is transitive because there are arrows f and g, from the left-hand side of the axiom into it and from the right-hand side into it, such that the whole diagram commutes, in the sense that applying g after the transitivity arrow is equal to applying f directly. Next, we would like to encode using an axiom to derive information about a relation which we didn't have before. In this case: if you started with {(A, B), (B, C), (A, A)}, and you know the relation is transitive, then you would like to use transitivity to derive that A is related to C. The way it works is that you take a pushout of these two arrows; they call this axiom application. Just to show how this works: one arrow is the inclusion of the axiom's left-hand side into the starting relation, with the substitution x goes to A, y goes to B, z goes to C. So I need to explain what a pushout is. Given objects A, B, and C and arrows f: A -> B and g: A -> C, a pushout is an object D, with arrows from B and C into it making the square commute, such that for any other object E with such arrows, there is a unique arrow from D into E. The way to think about this is to read an arrow from A to B as saying that B has all the information that is encoded in A. That's one way to think about it.
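As a concrete illustration (a toy encoding, not the paper's), the substitution arrows and the commuting condition in this relation category can be checked mechanically:

```python
# Toy check: an arrow in the category of binary relations is a substitution
# that maps every related pair of the source to a related pair of the target.
def is_arrow(subst, source, target):
    return all((subst.get(a, a), subst.get(b, b)) in target
               for (a, b) in source)

premise    = {("x", "y"), ("y", "z")}    # left-hand side of transitivity
conclusion = {("x", "z")}                # right-hand side of transitivity

concrete = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "A")}

sigma = {"x": "A", "y": "B", "z": "C"}
print(is_arrow(sigma, premise, concrete))     # True: the premise maps in
print(is_arrow(sigma, conclusion, concrete))  # True: the diagram commutes
```

Both checks succeeding for the same sigma is exactly the commuting-square condition for this axiom instance.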
In other words, an arrow from A to B says that B has the structure of A, because these are usually structure-preserving arrows. In this sense, if B has the structure of A and C has the structure of A, then the pushout is the way to glue B and C together along A that adds the least amount of extra information. In our case of axiom application, we want to find the smallest possible relation that is transitive and also has the structure of the starting relation. That is a single axiom application; a whole proof is a series of axiom applications, so we get a series of pushouts chained together in this manner, connected along the shared parts, and at the bottom we have the full concrete proof that we want to generalize. Let me do one more construction. We just did pushouts; now pullbacks, which are the dual notion. Given two arrows into a common object, we find an object with arrows into the two sources, such that for any other object with such arrows, there is a unique arrow into ours. So pushouts correspond to gluing B and C together along the common part A, and pullbacks correspond to identifying the common part A of B and C. Now let's look at generalizing a single proof step. We would like to generalize it by finding objects that sit in between: the axiom instantiates to our concrete proof, but we want to find something more general that still instantiates into the concrete proof, and is itself instantiated into by the axiom. One thing to note: if we are talking about just a single proof step, the most general generalization is the axiom itself. The reason this doesn't work when you have multiple proof steps is that the pushouts are then not connected in that simple way; the steps attach at different places.
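For finite sets, the gluing that a pushout performs can be sketched directly; this is a toy computation over origin-tagged elements, and the helper names are made up:

```python
# Pushout of B <- A -> C for finite sets: disjoint union of B and C,
# with f(a) and g(a) glued into one equivalence class for every a in A.
def pushout(A, B, C, f, g):
    classes = {("B", b): {("B", b)} for b in B}
    classes.update({("C", c): {("C", c)} for c in C})
    for a in A:                          # glue f(a) ~ g(a)
        merged = classes[("B", f[a])] | classes[("C", g[a])]
        for e in merged:                 # keep every member's class current
            classes[e] = merged
    return {frozenset(s) for s in classes.values()}

A = {"a"}
B = {"a", "w"}                           # B adds a new element w
C = {"a", "d"}                           # C adds a new element d
f = {"a": "a"}                           # inclusion A -> B
g = {"a": "a"}                           # inclusion A -> C
result = pushout(A, B, C, f, g)
print(len(result))                       # 3 classes: the glued a, w, and d
```

The common part a appears once in the result, while the extra information from each side (w and d) is kept, matching the "least extra information" reading.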
In simpler terms: when you're doing a proof, the axioms don't always apply at the same part of the term; they can apply to different parts. So simply choosing the axiom isn't enough. Now think about an example of optimization. When we're optimizing something, we want a proof of equivalence, so we would like to point at something in the final object that we want to generalize; for optimization, that will be an equivalence. Let me digress a little bit and look at our proof again, that a + (a - a) = a. If we draw it as an AST-like structure, we initially have the tree for a + (a - a). Then we use the fact that a - a = 0. The way they represent this is by adding an equality edge, which says that this subterm, a - a, is equal to 0. Then you apply one more axiom, the fact that if y = 0, then x + y = x, and you get an equality edge saying that the whole tree is equal to the subterm a. So in the context of this optimization, a single property is an equality edge in this diagram, and we would like to generalize the final equality we derived, the one saying the whole tree equals a. So let's see; this part is also a little bit mysterious to me sometimes, but the way it's constructed is: first you take the pullback of C and P, which identifies the common part of the property we want to generalize and the outcome of the last axiom. This gives you some object O. Then you glue C and P together along the common part you identified, which gives you some G prime, and by the pushout property, this has an arrow into E. Then we would like to find some object that is the equivalent of this thing, but for the original left-hand side of the axiom.
The way they do this is to define a new categorical construction, and then say: if this construction exists, you can do the generalization using their method. They call it pushout completion, which is a bit weird. Given arrows f and g, the pushout completion is an object D such that the whole square becomes a pushout, and furthermore, for any other pushout that factors through g, there is an arrow from D into it. What this is trying to say is that this object is the contents of C, minus B, plus A. Let me illustrate with sets. Say you had a set A = {x, y, z}, with an arrow into B = {x, y, z, w}, and the concrete instance was E' = {x, y, z, w, d}. Then the pushout completion would be D = {x, y, z, d}. Going from A to B, applying the rule added the w; in our instantiation, a d was added as well. So we undid the application that added the extra w, but we kept the d. And having the arrow from D into E' ensures that this is the smallest possible object with these properties. So in a way, this construction encapsulates what it means to apply an axiom backwards, to undo its effect. They apply it to the last step to get the generalization, and once you have that, you can chain multiple of these together and derive a generalized proof of the whole thing. So let's actually work through our example. Our concrete optimization had the following steps. We started with a + (a - a). We applied the axiom x - x = 0; I'll label this axiom (1) for clarity. Then we applied the second axiom, which I'll label (2).
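Before continuing the worked example: under the simplifying assumption that all arrows are plain set inclusions, that set-level intuition for pushout completion ("C minus B plus A") can be sketched as:

```python
# Toy pushout completion for inclusions of finite sets: undo the axiom's
# effect by dropping what the axiom step added (B - A), keeping the rest.
def pushout_completion(A, B, C):
    return C - (B - A)

A = {"x", "y", "z"}                  # before the axiom
B = {"x", "y", "z", "w"}             # the axiom application added w
E_prime = {"x", "y", "z", "w", "d"}  # the concrete instance also added d

D = pushout_completion(A, B, E_prime)
print(sorted(D))                     # ['d', 'x', 'y', 'z']: w undone, d kept
```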
That is, x + y, with an equality edge saying y = 0, and the conclusion that the whole thing is equal to x; call this axiom (2). They call these kinds of expressions extended program expression graphs, or something like that, and they showed that these things form a category with all the necessary constructions: the axioms allow you to take pushouts, pullbacks, and so on. So now we identify the edge we want to generalize, the final one, saying the whole thing is equal to a. A single edge can be pointed out by a little AST with some variable p equal to a variable q; it's just saying p = q, and it points out this edge in the whole thing, where p is the a and q is the rest of it. Taking the pullback means finding the common part between the axiom's conclusion and this pointer, which happens to be pretty much the same thing; it's again p = q, but I'll rename the variables. When we take the pushout, we join the two along the common part, which basically gives you the same thing as the axiom. As I said, when you only have one axiom step, the most general generalization is the axiom itself, so it shouldn't be surprising that we just get the axiom back here: A + B, with the edge B = 0. To be clear, arrows in this category are substitutions together with making one thing a subtree of the other: here we substitute x with A and y with B, and there we substitute p with A and q with A + B, where B = 0. Likewise, taking the pushout completion undoes the effect of the application, and with this representation, undoing an axiom's effect is usually quite simple: you just remove the things that were created by that axiom step. So this simply becomes A + B. Now we repeat the process one more time, and this one is slightly more interesting. We have these two objects.
We glue them together over the common part, which is the equality edge we are now generalizing. When we take the pullback, we aren't gluing; we are identifying the common part, the equality edge, which will be some r = s. When we take the pushout, we glue the two things together along this edge, matching this edge with that edge, and we see that you have to substitute B with some C - C so that the subterms line up and the trees can be glued along this edge. So you get something like A + (C - C), with the edge saying C - C = 0. Finally, the pushout completion just removes this equality edge, and you get A + (C - C). So this is a generalized form of the original instance. To obtain an actual proof, this isn't quite enough, because there's no arrow from one generalized object directly into the next; you need to substitute all the correct things in for the variables. For example, going from here to here we substituted C - C for B, so you have to substitute that back into the earlier object, and then you repeat that: you cascade all the substitutions, and you get a generalized proof. So that's how it works. I think it's kind of nice. The interesting thing to me was that category theory is normally this very abstract thing that nobody really cares about, so it was quite surprising that they actually conceived of this algorithm using such an abstract foundation. According to them, thinking in this manner clarified a lot of the hand-wavy things you would do when thinking about specific instances of proof systems.
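The cascading of substitutions at the end can be sketched with a small helper; the nested-tuple term encoding is again made up for illustration:

```python
# Cascade a later step's substitution back into an earlier generalized term:
# substituting C - C for B in A + B yields A + (C - C).
def substitute(term, subst):
    if isinstance(term, str):
        return subst.get(term, term)
    return tuple(substitute(t, subst) for t in term)

step1   = ("+", "A", "B")               # generalized term after step 1
cascade = {"B": ("-", "C", "C")}        # substitution recovered in step 2

print(substitute(step1, cascade))       # ('+', 'A', ('-', 'C', 'C'))
```

Repeating this for every step threads the per-step substitutions back through the chain, yielding the generalized proof.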
Yeah, OK, so that's all. Any questions? (It could be applied to supercompilation, but it doesn't give you any proof, right? It just gives you the result.) The idea is that they were able to automatically generate some proofs of equivalence, so they could use those results for this. (You mentioned this is only available for categories and arrows where you can construct the pushout completion, is that right? Does that limit it?) Technically it does. Your category needs a way of instantiating and encoding axioms, you need to be able to take pushouts with the axioms, you must have pullbacks, and you must have this pushout completion. So technically this limits the kinds of axioms you will be able to work with, but according to them, all of the common axioms they tried, distributivity and so on and so forth, they were able to encode. (I was wondering about the programming language itself. Does that limit the features of the language? Does it only work for, say, referentially transparent programming languages?) Maybe, but they seem to be working with C, or at least a fragment of C, so I would say it works with at least most of the common cases. (Do you have any experience with abstract interpretation? This kind of reminds me of finding something in the middle, between the concrete and the abstract.) I know some things because I've been reading up, but I can't say I've actually done stuff with it, so I can't really say whether there's a connection here. Sorry.
(In a classic optimization problem, you don't want to prove that the optimized program is equivalent; you want to find the optimized program. How does this work in this setting? Everything we've seen is about proving that a transformation is valid, but does this give any tool for finding the transformation?) No, this doesn't give you a way to do that. The upshot is mainly in the ability for the programmer to say: this thing should be faster than that one, and the two are equivalent, so can the compiler please figure out the optimization? It's not about finding the optimized version; that wasn't the goal of the paper. And more generally, I think this idea of finding the most general generalization is interesting in itself, because it shows you a way to strip away all the unessential parts and leave only the most important things. (If the programmer starts with the optimized program, why not just write the optimized program?) Because you want your compiler to optimize programs for you: if you can teach it the optimization, it can apply it to every program you write. Writing optimized programs directly is annoying sometimes, so you want the compiler to do it for you. (So just to clarify, it takes the thing in the middle?) Yes, this is the generalized version. (And it's equivalent? I mean, does the proof still apply to it?) Yes, they prove that if your proof system is sound and has all these constructions, then this middle thing is also a valid transformation. One thing to note: nothing is being said about whether the result runs faster than the original. We kind of assume that you will only teach it optimizations that actually are faster.
Also, another thing: when I say most general generalization, it's relative to the axioms you give it. For example, maybe not in this case, but in some cases, if you use associativity of plus, then instead of plus, the rule could perhaps be generalized to any operation that's associative. So it also depends on how general your axioms are. But given the axioms, this is the most general generalization. (To your point about assuming you only teach it actual optimizations: if you fed it the optimized program first and the original program afterwards, could it theoretically learn how to de-optimize your programs?) Sure, that would work, definitely. Which is kind of funny. (Is the instance of the programs actually necessary, given that you have the proof already? Could you generate the instance from the proof? Do you actually need the thing at the bottom as input?) If you had the proof, then I suppose you could. But the idea is that you are given the instances and then you have to find a proof for them. Maybe I'm missing something, because that does sound a bit strange. (What's the practical reason why it's not more commonly used?) I don't know. This is kind of an old paper; it's 2010, I think. Presumably something makes it not very practical; I assume encoding all these axioms is very tedious, and finding a proof is difficult. That's the problem I would imagine. OK, any last questions? Okay.