This talk is about the new Yul optimizer, and I want to start, in good tradition, with a definition. What is an optimizer? An optimizer is a piece of software that takes another piece of software, a program, as input and transforms it into a new program that hopefully requires fewer resources, or at least not more, and is semantically equivalent, meaning it does exactly the same thing. Both components of this definition allow for some slack, both the resources and the semantic equivalence, but we will touch on that later. Let's see an example. This is Solidity code that computes the square of the input number using the exponentiation operation, which is quite expensive on the EVM. The same thing can be achieved with a single multiplication of x by itself, so such a transformation on Solidity would be a valid optimizer transformation: it does the same thing, and it is cheaper. But we do not want to optimize Solidity directly; we want to optimize the low-level language we will see later.

This definition has two problems. The first touches on the resources part: what are the resources we actually want to optimize? On the EVM, this might seem trivial: of course we want to optimize for gas. But if you take a closer look, there is no single thing you can call the gas usage of a smart contract, because there are at least two: the gas required to deploy the smart contract, and the gas required later when you call individual functions. This is a trade-off that actually matters, because if you transform a routine into something more compact that does the same thing, that reduces deploy-time costs, but most of the time it increases runtime costs. The current optimizer also has a flag where you can set exactly this trade-off.
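The squaring example from the slide might look like this in Solidity (function names and visibility here are my own, as the slide itself is not reproduced):

```solidity
// Before: exponentiation compiles down to the relatively expensive EXP opcode.
function squareExp(uint256 x) internal pure returns (uint256) {
    return x ** 2;
}

// After: a single multiplication, semantically equivalent and cheaper.
function squareMul(uint256 x) internal pure returns (uint256) {
    return x * x;
}
```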
This flag is called the runs parameter. It is often misinterpreted as the amount of effort you want the optimizer to spend, but it actually controls exactly this trade-off. The trade-off gets even more complicated when you have loops, because code inside a loop is executed more often than code outside of it, so there it is even more important that the runtime costs are low, and you might accept slightly higher deploy-time costs in such cases. These are tough decisions for the optimizer, and it will not always get them right.

The second problem is that even if you have a clearly, mathematically defined metric for resource consumption, it is theoretically impossible to create a perfect optimizer. A perfect optimizer takes a program as input and outputs the best possible program that does the same thing with the least resource usage. The reason it is impossible is not that finding this optimal program is too difficult because the search space is too large or something like that; it is the semantic equivalence. More specifically, you have probably heard about the halting problem, which gets cited all the time for Turing-complete smart contract execution environments, and via the halting problem you can show that a perfect optimizer is impossible. The halting problem is a result from theoretical computer science which says that there is no program that decides, for a given input program, whether it halts on all inputs or not. Because we are on the EVM here, we replace "halts" by "reverts", and the result still holds. So we assume that we have a perfect optimizer, use it to solve the halting problem, and conclude that a perfect optimizer is impossible.
The way we do it: if a program reverts on all inputs and is the shortest such program, it has to look like this empty contract here, because that is the shortest program that reverts on all inputs. So to decide the problem, we take the perfect optimizer, run it on the input program, and if it outputs this empty smart contract, the input reverts on all inputs, and otherwise not. That is a nice theoretical result, but completely useless in practice. It is good to know the lower bounds and where we cannot go, but if we relax this optimality requirement, we can get quite far.

The next question you might ask is: why do we want an optimizer at all? The obvious answer is that we want cheaper smart contracts. But on closer inspection, that is not the main reason. An optimizer allows you to write your code in a more modular and more understandable way. If you do not use an optimizer and you care about resource consumption, you constantly have to ask yourself: is what I am writing here cheap enough? Can I change it so that it is a little cheaper and still does the same thing? If you know the optimizer will do that for you, you can write the code so that it is readable and auditable, you can be sure that it works, and you do not have to think about resource consumption all the time. An example of that is this smart contract here, a simple fragment of a voting contract. We have a vote function which takes an outcome we want to vote for and checks that the user has not voted yet. If the user has not voted yet, it adds the weight of the user to the votes. The weight comes from another function, which just returns 10 for the owner and 1 for everyone else.
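A plausible reconstruction of the voting fragment from the slide (all names and types here are assumptions):

```solidity
contract Voting {
    address owner;
    mapping(address => bool) hasVoted;
    mapping(uint256 => uint256) votes;

    // Vote for an outcome; each address may vote only once.
    function vote(uint256 outcome) public {
        require(!hasVoted[msg.sender]);
        hasVoted[msg.sender] = true;
        votes[outcome] += weightOf(msg.sender);
    }

    // The owner's vote counts ten times, everyone else's once.
    function weightOf(address voter) internal view returns (uint256) {
        if (voter == owner) return 10;
        return 1;
    }
}
```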
Without an optimizer, this performs a function call, which is costly, so a cheaper way would be to take the statement inside the function and put it directly at the point of the call. But that reduces readability: written that way, the code no longer tells you what this odd expression involving 10 and 1 actually means, whereas now we can see it is the weight of the voter. Also, with the weight as a function, we can call it from other places and later modify the weights without having to change them everywhere in the code.

In the rest of the talk, I will quickly describe how the current Solidity optimizer works and then explain what we plan to do on Yul. The current optimizer operates entirely on opcode streams, so it is extremely low level. It has several stages; I won't go into detail for all of them. The most extensive stage is the last one, the common subexpression eliminator, which does much more than the name suggests. Let's dive into that a little. First it chops the stream of opcodes into blocks: blocks that contain no jumps, no external calls, and satisfy some other restrictions. These blocks are then fed opcode by opcode to the component, which analyzes the stack usage and builds symbolic expression trees out of it. These expression trees are simplified using, I think, 40 or 50 simple transformation rules, like "constant plus variable plus constant" becomes "variable plus the sum of the constants", and so on. These rules will be reused by the Yul optimizer, so that is not something we have to rewrite. After the expression trees are simplified, the component records all changes to memory and storage in an abstract way: both the value that is written and the location it is written to are such abstract expressions.
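A rule of that kind, written as Yul-style expressions (this particular instance is illustrative, not quoted from the compiler's rule list):

```yul
// "constant + variable + constant" folds to "variable + sum of constants":
// before:  add(add(3, x), 5)
// after:   add(x, 8)
let a := add(add(3, x), 5)   // what the code generator might emit
let b := add(x, 8)           // what the simplification rules produce
```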
The problem with this component is what its internal state looks like: on the left you have the stream of opcodes, and on the right some kind of explanation of what the component has to store internally. There is no real way to output this internal symbolic representation, and even if there were, it would be very hard to read. So the takeaway from the old common subexpression eliminator is that it builds a gigantic internal data structure, and after it has built that data structure, it regenerates the code from scratch. It starts from the bottom up, looks at what the desired stack elements at the end of the chopped-up block are, recreates those stack elements, and also recreates the changes to storage and memory in a more efficient way: it eliminates multiple stores to the same storage location and multiple stores to the same memory location, and if two expressions in the code compute the same thing, it is only computed once. As I said, the main drawback of this component is that it is very opaque. Also, it performs only very local optimizations, only inside these blocks. It has no notion of functions, so it cannot perform inlining, and it cannot do any loop optimizations either. There are some stages in the old optimizer that look beyond these blocks, but they do not do inlining either.

Now let's take a look at Yul and what the new optimizer can do with it. There was a talk yesterday by Alex about Yul here at Devcon 4, and another one at Devcon 3 last year. We are already using Yul in the new ABI coder, and the plan is to use Yul for everything else starting next week: the plan is to rewrite the code generator of Solidity using Yul so that it can target both the EVM and WebAssembly, and so that we can use the optimizer on all the code Solidity generates, not only the ABI coder.
Yul has a simple syntax, has structured control flow, and I think it is quite intuitive to read. But I am already spending too much time on that, so let's look at the optimizer itself. Instead of building a component that looks at the code and assembles tons of information, we decided to build components that perform many tiny local transformations on the code. Every single step of every single component of the optimizer does only one small transformation on the Yul code, and the output of each step is always, again, Yul code. It is always readable, it is still text, there are no big internal data structures, and at any point you can look at it and check whether a transformation was correct. The optimizer also keeps the structure of the code: it keeps functions and loops, and it does not introduce gotos. This helps us with the translation to WebAssembly, because WebAssembly does not have gotos; it only has functions, loops, and conditionals.

The tricky part of building the optimizer is not designing the components, but coming up with a good strategy for when to call each component. The good news is that even if this strategy turns out to be suboptimal, it will never result in invalid code: as long as we check that every one of these small, tiny transformations does its job correctly, any combination of the steps will also produce correct code. It might be less efficient, but it will always be correct.

Now let's look at an example. This is Yul code that computes the sum of an array. The first function contains a for loop over the array elements and calls the second function, which retrieves a single array element: x is the array and i is the position in the array.
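The example might look roughly like this (reconstructed from the talk; function and variable names are assumptions, and I assume the usual Solidity memory layout where the array length sits at x and the elements follow at x + 0x20):

```yul
// Sums the elements of memory array x.
function sum(x) -> s {
    let length := mload(x)
    for { let i := 0 } lt(i, length) { i := add(i, 1) } {
        s := add(s, arrayLoad(x, i))
    }
}

// Loads element i of array x, with a bounds check.
function arrayLoad(x, i) -> v {
    let len := mload(x)
    if iszero(lt(i, len)) { revert(0, 0) }
    let data := add(x, 0x20)
    v := mload(add(data, mul(i, 0x20)))
}
```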
The interesting thing here is that arrayLoad performs bounds checking: it checks whether i is less than the length of the array. In this form it is inefficient, because the check is done in every single loop iteration, but it is very safe for the same reason. The cool thing is that the optimizer is able to remove these bounds checks using equivalence-preserving transformations. The first thing that happens is that we split this large expression apart with an intermediate assignment, so the function call is now isolated. Since the function is only called once, we can inline it: we replace the function call by the body of the function. That is a more drastic change to the code, but still simple enough. Next, we remove the now useless additional nested block, and we also rename one of the variables. Now it is already a little clearer. Then we change the formatting a little.

That was already one of the two tricky parts. What happens next is that we look at all the statements inside the loop body and check whether some of them do not actually depend on the iteration at all. The only variables reassigned inside the loop are sum and i, so everything that does not depend on sum or i, directly or indirectly, can be pulled out of the loop and executed only once before the loop starts instead of in every iteration. That applies to data and len. Now we notice that length and len are assigned the same expression. Since it is a memory load, that does not automatically mean they hold the same value, but if memory does not change between the two load operations, they must, and that is the case here because we do not modify memory. So len and length are actually the same thing, which means we can remove len and replace every occurrence of it by length.
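After inlining, flattening, loop-invariant code motion, and the removal of len, the code might look like this (again a hedged reconstruction, not the literal slide):

```yul
function sum(x) -> s {
    let length := mload(x)
    // data (and len, before it was merged into length) did not depend on
    // sum or i, so it was hoisted out of the loop.
    let data := add(x, 0x20)
    for { let i := 0 } lt(i, length) { i := add(i, 1) } {
        if iszero(lt(i, length)) { revert(0, 0) }   // the inlined bounds check
        s := add(s, mload(add(data, mul(i, 0x20))))
    }
}
```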
Now look inside this if statement inside the loop; this is the point where I could use a laser pointer. There is an lt(i, length) there: if iszero(lt(i, length)), revert. That is the bounds check we had in the function. And we notice that lt(i, length) is also the loop condition: the for loop runs as long as i is less than length. This means that inside the body of the for loop, lt(i, length) is always true, otherwise the body would not execute. So inside this bounds check we can replace lt(i, length) by true, that is, by 1. Now we have a constant there with an operation applied to it: iszero is EVM speak for logical negation, so iszero(1) is "not true", which is false, which is 0. So we can replace "if iszero(1)" by "if 0". You see these are really tiny modifications; we could have removed the whole if long ago, but we want to keep every step as small as possible. So now we have "if 0", a condition that is always false, which means the whole if statement can be removed. Did you see the magic happen? This is where the bounds check was removed.

Now we can do some more things. We again split this complex expression into multiple assignments to new variables. And this is, I think, the trickiest transformation. If you look at _2, it is i multiplied by hex 20, and i is the loop iteration counter. This means _2 is always 0x20 times the number of iterations. We know that multiplication is more expensive than addition, but as the code stands, we multiply in every loop iteration anyway, so we would like to replace the multiplication by an addition.
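After the bounds check is gone and the remaining expression is split into fresh variables, the function might read as follows (a reconstruction; the names _1, _2, _3 follow the talk, their exact order is my assumption):

```yul
function sum(x) -> s {
    let length := mload(x)
    let data := add(x, 0x20)
    for { let i := 0 } lt(i, length) { i := add(i, 1) } {
        let _2 := mul(i, 0x20)   // always 0x20 times the iteration count
        let _3 := add(_2, data)
        let _1 := mload(_3)
        s := add(s, _1)
    }
}
```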
Since we know _2 is 0x20 times the number of iterations, we can compute it in a different way, just adding up in every iteration. It is a bit tricky to find the correct starting value: we pull the declaration of _2 out of the loop, and we have to start with minus 0x20. Then we can replace the multiplication by an addition: _2 becomes _2 plus 0x20. If you look at _3, it is just _2 plus data, so it is a similar kind of variable. In other words, these are all additions, and additions are associative, which means we can pull the data term out to the very beginning: data is now added to _2 once at the start and no longer inside the loop. And that is almost it already. If you look at the definition of _2 now and insert the definition of data into it, you see it is x plus 0x20 minus 0x20, which is just x. We also see that x is unused in the rest of the program, so instead of introducing _2, we can simply reuse x. In the next step we see that data is now completely unused, so we can remove it altogether. Finally, _1 is only used once, so we can inline it back into the expression. I would claim that this is the optimal program. The only remaining idea, as already mentioned, would be to replace the loop iteration counter in the same way and add 0x20 instead of 1 in each iteration, but then we would have to compare against length times 0x20, and I think that can create problems with overflow.
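The final program, as far as I can reconstruct it, with x reused as the running data pointer:

```yul
function sum(x) -> s {
    let length := mload(x)
    for { let i := 0 } lt(i, length) { i := add(i, 1) } {
        x := add(x, 0x20)        // strength-reduced: addition instead of mul
        s := add(s, mload(x))    // x now walks over the elements directly
    }
}
```

Note that x initially points at the length word, so advancing it by 0x20 before the first load lands exactly on the first element.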
That was an example of how the optimizer operates. You can see that the resulting program is really short and all the intermediate steps were quite small. Another thing we plan to do is memory optimizations. Currently, Solidity does not have any memory management, because EVM memory is usually short-lived, so it does not make sense to add the huge overhead of full memory management. This currently results in some wasted memory: for example, if you allocate a memory array inside a function and do not return it, so you do not really use it afterwards, it would make sense to free that memory again at the end of the function. This is something we believe the Yul optimizer can do once we introduce memory objects as first-class citizens; there is a version of Yul that has types, and this would fit nicely into the typed Yul dialect.

As a summary, I hope that the new optimizer will be safer, more transparent, and more powerful. The main challenge is finding good heuristics, but as I explained, that will not impact correctness. Code size is always an important measure when deciding whether to apply a transformation, but sometimes it makes sense to create larger code in intermediate steps that can be reduced to even shorter code later. Some quick words on the roadmap: we have currently implemented most of the steps we have seen here, apart from the two or three loop transformations. The next step is to implement those loop transformations too and to check that all the transformations are correct. Then we will apply the optimizer to the ABI coder, take the ABI coder out of experimental, and then do the rewrite of the Solidity code generator; at that point, other steps might make sense that do not make sense for the ABI coder alone, so we will continually improve the optimizer there. Thanks for your attention.