I'll have to go quickly through some of these slides. So, the official talk title is "Optimization techniques for EVM implementation". As you can see, the title is too long - it didn't fit into the box there. So I think a better one would be "How to build a fast EVM". But to be precise, I have to mention what this talk is not: it is not a talk about doing fancy stuff with just-in-time compilers, ahead-of-time compilers or anything like that. Everything I'm going to talk about here relates to pretty much portable code, and none of it uses advanced techniques from the generic interpreter-optimization space.

My name is Paweł Bylica and I have been working on EVMs for many years. Let me go briefly through my history of working with the EVM. I started quite early, even before Ethereum launched. The first project I was working on, with my colleague, was EVM JIT. It was a project that was supposed to prove that the EVM can be executed fast. It worked by translating EVM bytecode to LLVM IR, then using the LLVM tooling to compile that to machine code, load it into memory and execute it as native code. The project died around 2017 - people were mostly not interested in working on it and using it. Although it proved a long time ago that this is doable, there is a big controversy about how to actually use a compilation pipeline on consensus-critical platforms.

What actually came out of the EVM JIT project was EVMC, an attempt to define a low-level API for EVMs - and actually for both sides: for EVMs and for the clients that host them. If both sides implement it, a client can use different implementations of EVMs - it's more like a plugin mechanism. And the third project here is cpp-ethereum.
cpp-ethereum has a big C++ codebase. It was one of the three implementations that started as the original Ethereum implementations. There we also used the EVMC API and packed the EVM implementation from cpp-ethereum behind this API, so it's available as a plugin nowadays. And the last project is evmone, which I started last year. Most of the things I'm going to present here come from discoveries in this project. evmone is an interpreter implementing the EVM. The key differences are: it has a pretty good 256-bit integer implementation, and it has an experimental, efficient, different way of counting gas and checking stack requirements. It also has many other micro-optimizations, related either to C++ or not, but I'm going to talk about the first two points here. Both of these were actually released in version 0.1 some time ago.

So let's start with the 256-bit integer implementation. We can say it's critical from the EVM point of view: it's the arithmetic the EVM uses. Actually, I was not so sure this is really the case - maybe it's important, but maybe not that much; that was my expectation. But by coincidence we recently had the opportunity to replace the general integer implementation in the Aleth interpreter, which was previously using Boost Multiprecision. Boost is commonly known as a C++ library, a kind of standard-library extension, and it's used in many C++ projects, so there is nothing to be worried about in terms of its quality and performance. But we had the option to replace that implementation with intx, which is the implementation I did for evmone. At this point big thanks to an internet contributor called Vistaq, who actually contributed all these changes. And the results look more or less like that.
This is a zoomed-in view of a small set of benchmarks I did - we will come back to it at the end. Simply by replacing the integer implementation with intx we get around a three times speed improvement here. This is a summary of some benchmarks I will show later: the geometric mean of a small set of cases, just to have some idea how average usage of the EVM performs.

Going back to the 256-bit integer implementation: you probably know the very overused 80/20 rule. If you want to implement such an integer type, it works mostly like that: 80% of your struggle is around division, which is genuinely complex to do, and everything else is pretty much straightforward. By division here I also mean the remainder and modulo operations and so on, because their results come from the same algorithm.

Considering the short time here, let me give some quick hints on how I think a performant 256-bit fixed-precision integer implementation should look. These are my guidelines. First of all, you can forget about any fancy algorithms that show up in this space - Karatsuba-style multiplication algorithms that save you some word multiplications, or anything like that. Any other names? Okay, good, because they are not needed. They pay off when you have really big numbers to multiply, and 256 bits is quite a small precision compared to big numbers in general. So none of them are needed - I'm pretty sure that's correct as stated. What I also did in intx is ignore all the easy-to-handle special cases, which many implementations handle first. Even if you have the option to detect that the numbers you are multiplying are small - say they would fit in 64 bits - I think it's not worth handling this case separately.
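At 256 bits, plain schoolbook multiplication over 64-bit words is enough. Here is a minimal sketch of a truncating 256x256 -> 256-bit multiply - my own illustration of the point, not intx's actual code, and `unsigned __int128` is a GCC/Clang extension:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A 256-bit value as four 64-bit words, least-significant first.
// Plain std::array: exactly 32 bytes, no heap allocation, no stored size.
using u256 = std::array<uint64_t, 4>;

// Truncating 256x256 -> 256-bit schoolbook multiplication.
// At this small precision the O(n^2) word loop is all you need;
// Karatsuba-style tricks only pay off for much larger operands.
inline u256 mul(const u256& a, const u256& b)
{
    u256 r{};
    for (size_t j = 0; j < 4; ++j)
    {
        uint64_t carry = 0;
        for (size_t i = 0; i + j < 4; ++i)
        {
            // 64x64 -> 128-bit product, plus the word accumulated so far
            // and the carry from the previous column.
            const auto t =
                static_cast<unsigned __int128>(a[i]) * b[j] + r[i + j] + carry;
            r[i + j] = static_cast<uint64_t>(t);
            carry = static_cast<uint64_t>(t >> 64);
        }
    }
    return r;
}
```

Note there is no branch for small operands anywhere: the loop always does the full 4x4 word schedule.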
Handling such cases separately distorts the whole algorithm and adds conditions and branches to check; it works great if you just focus on 256 bits only. The first optimization you can do is to avoid dynamic memory allocation. Any generic big-number library that handles arbitrary precision, not fixed at compile time, will definitely use dynamic memory allocation, because the size is unknown during compilation. So if there is an option to use fixed precision, that's perfect, because it saves you the memory allocation, and some libraries offer that. But I think we should go further here - I'm not sure I can explain this well without a visualization. Many big-number implementations keep the number of words the representation currently uses as additional information stored in the type, which makes the type bigger by this additional item - more or less 25% bigger. From the memory-access-pattern point of view it's important not to have it: if the type has the minimal possible size, which is 32 bytes, it nicely fits into cache lines, and accessing stack items of this size performs much faster.

The next suggestion is to be aware that the CPU has some special instructions, let's say - not obviously available from high-level languages - that can speed up addition and other operations: add-with-carry, full-precision multiplication, instructions that give you the high part of a multiplication and so on. When you build code in a high-level language it's good to know whether the compiler can give you access to these by using one of the patterns it recognizes.
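For example, with GCC or Clang (these are compiler extensions, not standard C++), the `unsigned __int128` type and the overflow builtins are patterns the optimizer recognizes and typically lowers to single MUL/MULX and ADC instructions:

```cpp
#include <cstdint>

// Full 64x64 -> 128-bit multiplication. Computing the product through
// unsigned __int128 is a pattern GCC/Clang recognize and typically lower
// to one MUL/MULX instruction, so the high half comes out for free.
inline uint64_t mul_hi(uint64_t a, uint64_t b)
{
    return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
}

// Addition with carry-in and carry-out via the overflow builtins,
// which optimizing compilers typically turn into an ADC chain.
inline uint64_t add_with_carry(uint64_t a, uint64_t b, bool& carry)
{
    uint64_t s;
    const bool c1 = __builtin_add_overflow(a, b, &s);
    const bool c2 = __builtin_add_overflow(s, static_cast<uint64_t>(carry), &s);
    carry = c1 || c2;  // the two additions cannot both overflow
    return s;
}
```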
Or you might need to use some intrinsics to, for example, get access to the high part of a full-precision multiplication. And one last thing, about division. I was struggling with division for a long time, because I had to learn most of this, but this paper really helped me. It's a paper coming from the GMP library, the well-known big-number implementation, and it gives you the algorithm almost straight away. Maybe not entirely straight away, because it is described in mathematical form, so you still have to translate it into code, but it gives you an almost full recipe for how to do it. And I did it. GMP is still faster here, but not by a big margin - they definitely use assembly directly as a way of speeding things up, so maybe the compiler simply cannot optimize the portable form the same way. It was very hard to do, and I still feel a bit lost in it. So my suggestion is: if you have other options, try to port this code to your own codebase.

Someone asked: what's the license on it? It's Apache 2. Awesome. By porting I don't mean using it directly - that would mean accessing the C code from GMP, and I don't think that would work very well, though you can try. By porting I mean copying it to another language, making it build, and using the same algorithm, if you don't know where to start from.

So that was mostly the integer stuff. Any questions now? I can handle them; otherwise we will continue. Someone asked: how big is the library? I'm not sure exactly - I think between 1000 and 2000 lines, something like that. But the design uses generics - how to explain it - it's a recursive design, so you build the 256-bit operations out of 128-bit operations. A lot of C++ machinery around the place makes it look difficult.
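To illustrate the recursive design just mentioned - building 256-bit operations out of 128-bit ones - here is a minimal sketch of addition defined this way. This is my own illustration, not intx's actual code:

```cpp
#include <cstdint>

// A double-width integer as a pair of halves, so a 256-bit value is two
// 128-bit values, each of which is two 64-bit words.
template <typename Half>
struct wide
{
    Half lo{};
    Half hi{};
};

template <typename T>
struct add_result
{
    T sum;
    bool carry;
};

// Base case: 64-bit addition with carry-in and carry-out.
inline add_result<uint64_t> add_c(uint64_t a, uint64_t b, bool carry_in)
{
    const uint64_t s = a + b + carry_in;
    const bool carry = carry_in ? (s <= a) : (s < a);
    return {s, carry};
}

// Recursive case: add the low halves, then the high halves with the
// carry propagated between them. The same add_c works at any width.
template <typename Half>
add_result<wide<Half>> add_c(const wide<Half>& a, const wide<Half>& b, bool carry_in)
{
    const auto lo = add_c(a.lo, b.lo, carry_in);
    const auto hi = add_c(a.hi, b.hi, lo.carry);
    return {{lo.sum, hi.sum}, hi.carry};
}

using u128 = wide<uint64_t>;
using u256 = wide<u128>;
```

Each doubling of the width reuses the narrower operation, which is why a single generic definition covers 128 and 256 bits, at the cost of the template machinery that can make the code look harder than it is.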
It increases my work, I mean, but it's mostly not complicated - except for division. So let's go to the second part. This is the small innovation here: an experimental way of counting gas and doing related checks. I actually wanted to have it in written form, and there is a link to a draft article about it - maybe I will finish it later - but I wanted to leave the pointer to it.

Consider how EVM execution works: it needs to be done in two phases. In the first one, the implementation needs to check where the valid jump destinations are. In evmone we put a bit more work into this preprocessing phase before the execution starts. I will try to explain that by example. We have EVM bytecode assembly here; you don't actually need to understand what it does. I tried to make it do something, but there is one mistake in it, so it's actually pretty stupid now. Anyway, we will do what the preprocessing step does in evmone: we want to identify basic blocks. A basic block - a notion coming from compiler construction - is a sequence of instructions that are for sure executed in order, uninterruptedly, and the basic blocks are linked by jumps between them.

Someone asked: so number five should be the start of a basic block, right? It starts with JUMPDEST. Yeah, it's a good question - it will be seen in a moment. You asked where the basic blocks start, and you're right that this is also the start of a basic block. We're looking here for the terminator instructions that end the basic blocks: either a jump, or something that terminates the EVM execution entirely - it doesn't matter which in this context. And the second group of instructions is the other boundary: JUMPDEST, which starts a basic block.
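The two boundary rules - a block ends after a terminator instruction and a new one starts at every JUMPDEST - can be sketched as a single scan over the bytecode. This is a simplified illustration, not evmone's actual analysis code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Records the position where each basic block starts. PUSH immediate
// bytes must be skipped so data is not misread as opcodes.
std::vector<size_t> find_block_starts(const std::vector<uint8_t>& code)
{
    std::vector<size_t> starts;
    bool block_open = false;
    for (size_t pc = 0; pc < code.size(); ++pc)
    {
        const uint8_t op = code[pc];
        if (op == 0x5b /* JUMPDEST */ || !block_open)
        {
            starts.push_back(pc);
            block_open = true;
        }
        switch (op)
        {
        case 0x00:  // STOP
        case 0x56:  // JUMP
        case 0x57:  // JUMPI
        case 0xf3:  // RETURN
        case 0xfd:  // REVERT
        case 0xff:  // SELFDESTRUCT
            block_open = false;  // the next instruction opens a new block
            break;
        default:
            if (op >= 0x60 && op <= 0x7f)  // PUSH1..PUSH32
                pc += op - 0x5f;           // skip the immediate data bytes
        }
    }
    return starts;
}
```

For example, `PUSH1 0x03, JUMP, JUMPDEST, STOP` yields two blocks, one starting at offset 0 and one at the JUMPDEST at offset 3.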
Using these two rules we can split the code into basic blocks. This is a visualization of that, with some gaps. There are two splits here: this one, because the instruction comes right after a terminator instruction, and here both conditions apply in the same place. We can also find a JUMPDEST that is not preceded by any terminator - but it still splits the block.

Okay, so can I have one more minute? Yeah? Okay. Having the basic blocks, we can pre-compute the base gas cost of the instructions in each block, and we can also pre-compute the stack requirements for it. Having all these three values, during execution we check these conditions once, on entry to the basic block, and they don't have to be checked anymore during execution. This saves a lot of time during interpretation, because we pre-compute this information and don't have to check it for basic instructions such as addition, multiplication and so on. All three checks - the base gas cost, stack underflow and stack overflow - can be done only at the beginning of the basic block, and then we can safely assume they have been checked and happily execute the following instructions. This is mostly based on the fact that EVM exceptions do not distinguish between the types of errors: whatever the cause, they revert all the changes and consume all the gas. Thank you.
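As a coda, the per-block check described above can be sketched like this. It is a simplified illustration under my own naming (`BlockInfo`, `enter_block`), not evmone's actual data structures:

```cpp
#include <cstdint>

// Per-block metadata pre-computed during the analysis phase.
struct BlockInfo
{
    int64_t gas_cost = 0;      // sum of the base gas costs of all instructions
    int stack_required = 0;    // minimum stack height needed on entry
    int stack_max_growth = 0;  // peak stack growth relative to entry height
};

enum class Status
{
    ok,
    out_of_gas,
    stack_underflow,
    stack_overflow
};

// The single check performed on entry to a basic block. Inside the block
// the instructions then run with no per-instruction gas or stack checks.
Status enter_block(const BlockInfo& block, int64_t& gas_left, int stack_height)
{
    gas_left -= block.gas_cost;
    if (gas_left < 0)
        return Status::out_of_gas;
    if (stack_height < block.stack_required)
        return Status::stack_underflow;
    if (stack_height + block.stack_max_growth > 1024)  // EVM stack limit
        return Status::stack_overflow;
    return Status::ok;
}
```

Collapsing the checks this way is safe precisely because an exceptional halt does not need to report which instruction inside the block would have failed: the whole execution reverts and the gas is consumed regardless.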