three parts. I'm going to have some kind of an intro here, and then we have a really interesting talk by Casey about benchmarking. I'm sure you guys are here to hear about the benchmarking, what numbers we came up with and what things we realized. And I'm going to explain briefly what Ewasm 1.0 and Ewasm 2.0 are, because this naming is going to come up a few times. Then we're going to conclude with a bunch of questions regarding wasm on blockchain. And there will be plenty of time to ask questions during these talks.

So: eWASM, EWASM, Ewasm. We have a bunch of different ways to write the project name, and the current form is the last one, so please use that. I still see people using the first one; it's not the recommended form anymore. I actually started with a really long version of this talk, with a lot of stories, but I realized you guys may not be interested in stories; you want to hear facts. So I'll focus on facts. If you're interested in the stories, you can go to that link. It's long, I think 45 minutes, so this one is going to be much quicker.

Okay. A recap of the motivation for Ewasm, or for VMs in general. We want really fast execution, but at the same time we want an extensible architecture. EVM kind of fails on the extensibility side, though we're going to see it may not fail on speed. And lastly, we want a lot of languages supported, with great tools. Those were the three motivations, and eventually, after looking at a lot of different architectures and VMs, wasm was selected.

So here's a recap of the original prototypes, back in 2016. I think many of you have been following Ewasm since 2016 and are kind of confused: okay, what's going on, it's 2019 today. In 2016, we started off with the premise that wasm should coexist with EVM on Ethereum, and we wanted to use the really fast wasm engines, the ones in the browsers. So all of these prototypes were written in JavaScript. I mention JavaScript because it's going to be important. And we did achieve a couple of key things back then. We had a metering injection prototype working; this was a proof that you can meter other bytecode, not just EVM. It was an important milestone. We also managed to write a version of an EVM-to-wasm compiler. This is important because we wanted to have EVM and wasm on the same chain, and eventually to phase out EVM, so you could use just a wasm engine while still keeping compatibility with EVM bytecode. We had some kind of a C toolchain, but it was so early for wasm that we actually had to write a lot of stuff by hand in the wasm text format. It was fun, but it wasn't really efficient.

This was the point where we had the first benchmarks, and we're going to talk a lot about benchmarks today. That was Devcon 2: Ewasm outperforms EVM. We were super excited; we achieved what we wanted. But there's an asterisk there: it was outperforming on V8, a really fast compiler engine for wasm, and we were comparing that against a slow interpreter of EVM. Fun fact: the EVM interpreter in aleth is actually not so slow compared to go-ethereum, but it was still not on par with V8. All of this happened in 2016, and then we had a full year of hiatus in 2017. The team only consisted of two people and nothing really happened; we were occupied with other things. So that's why you haven't really seen anything from Ewasm in 2017.
And we only restarted work in 2018, after Devcon 3. The main goal was to launch Ewasm as it was on Eth 1. However, this coincided with Devcon 3, which was already focusing quite a bit on Serenity. Some of you may ask, what is Serenity? Serenity is the code name of Eth 2.0. So Eth 2.0 was already on the table at that point. We did make some progress on Ewasm: we polished the specs, and we also had to update everything to Byzantium, the hard fork at the time. We made a bunch of test cases; of course it wasn't 100% coverage, but we did a lot of test cases. And we wrote a brand new Ewasm engine in C++ called Hera. The previous prototype was in JavaScript, but at this point we had an engine in C++, using a wasm interpreter called wabt. It wasn't the fast V8, it wasn't the browsers' really fast engines; it was an interpreter for wasm. We then went on to add more interpreters, and also a compiler engine, to Hera.

At this point in time, which was two years after the initial work on wasm started, we kind of expected that all these fast engines from the browsers would be usable standalone. We were wrong. That wasn't the case, and it's still not the case: you cannot use any of these fast engines standalone, without the browser.

Our work on benchmarking really began at this point. Because we had these different engines, we were able to benchmark them with different inputs, and we came to two major realizations. The first is about compilation time on the fast engines. The way these fast engines work is that you take the wasm bytecode, translate it to machine code, and then run the machine code. And this compilation phase, compiling that code to machine code, turned out to take a lot of time, way more than was spent actually executing the code. In many cases the total time was actually more than running the code on an interpreter. The main point here is that a lot of the code we run on Ethereum runs for quite a short time. We don't do a lot of work in blocks; we only want to spend less than a second, and in the interpreter actually way less than a second. Compilation may take longer than a second, and compared to that, interpreters may actually take less. But we did find that interpreters, at least wabt, were extremely slow; at least, so we thought. wabt was really slow for 256-bit stuff, which Ethereum does, but it was kind of okay for 64-bit stuff.

So we were already kind of unhappy at this point, when a bomb was dropped. A lot of people were asking: would JIT bombs be an issue? Now you ask what JIT bombs are. Anybody know? Okay, so basically, a JIT bomb is a short input which can blow up this compilation phase. We took this question really seriously and spent some time validating the claim. And it came out as true: we were able to find JIT bombs. One example, for V8, was a JIT bomb of, I mean, 20 seconds; anyway, it was taking a lot of time. And that made us really worried. Okay: interpreters are slow, we cannot use these fast engines standalone, and they're susceptible to these JIT bombs. We were in a really bad position at this point. So we turned briefly to trying to defend ourselves against JIT bombs. We were still kind of committed to using these fast engines, because we wanted the speed, but we didn't want to be susceptible to JIT bombs. And we came up with some really cool ideas for how to do that.
But they turned out to be too complicated. So what was the next step? We tried to reduce Ewasm to a safe subset first, one which could be introduced quickly. And that safe subset would be the precompiles. You guys know what precompiles are? The reason they're kind of safe is that they're introduced with a hard fork, so there's consensus on what the code is. There should be a lot of scrutiny of that code; there should be an audit. So whenever a precompile gets in, it should be really well-reviewed code. That may not always be the case, but we expected it would be. With that, we could introduce a safe subset of Ewasm, and then improve things in the background. We also had, I'll just briefly mention it, another ambitious idea called Ewasm Blueprints. I'm not going to go into it here, but it's really interesting; the talk I mentioned in the beginning has a slide on it, and I think you can also Google it. It was really ambitious. It's cool, but kind of too complicated. So as you see, this was a terrible point in time for us. Nothing really started to work out. We needed a breakthrough. And the breakthrough is going to be presented by Casey now. Then I'm going to come back with some more stuff.

Is there one question? You can have one question.

Question: you said that you can't use the engines standalone. I thought there were services, like Fastly and things like that, these edge services, where they run lots of stuff. Aren't those standalone?

So the question was: there are some services today which seem to offer standalone wasm engines; you can find stuff in the cloud. The clarification: that assessment was from about 16 months ago, and we only looked at the really big engines, the ones in the browsers, like in Firefox and Chrome. For those it's still the case. There's some work trying to pull them out; there's an effort to have a standardized API in C, so you could pull them out of the browser and use them through that standardized API. But since then, there are actually at least five other projects writing really fast engines, some of them better than others. And one thing which surfaced, I think in June: Fastly has a new engine. That's the one I think you were referring to. So today there are a couple more options.

Is it working, Casey? No? Okay. I'll take more questions later. If you have any questions about the methodology, or if you can't see something on the screen, please don't hesitate to interrupt.

So: an accidental discovery. I'll explain what a precompile is and how this was motivated, one of the main use cases for Ewasm, and recap where we were. Then the discovery that flipped the world upside down, the breakthrough.

A precompile is EVM code that's treated as a special case, so you charge less for this special code. In theory, all precompiles should be paid the same cost as any contract that's deployed, but because EVM is thought to be slow, these special cases are added and the gas cost is subsidized. Most of these precompiles are around cryptography. ECRECOVER is signature verification; there are a couple of hash functions here; and then the ones added in October 2017, in Byzantium, are also cryptography, which will be important later.
Most of the precompiles that people want seem to be related to cryptography, with the exception of a traditional hash function. And some people think precompiles are just fine, and that it's easier to add precompiles than it is to make the EVM go faster. So what's wrong with just adding more precompiles? Well, it expands the trusted computing base. In the Ethereum context, the trusted computing base is the consensus-critical code that is in each client: if there's a bug in that code, you get a consensus bug. And into this consensus-critical code, precompiles bring custom gas rules that are very complicated. There was a consensus bug found in MODEXP after the testnet rollout; it was found before the mainnet hard fork, before activation on mainnet, but on the testnet. There was a consensus bug found in the pairing check. And in the EIP, I don't know if it still says "test cases to be written", because everybody wants to propose new precompiles but nobody wants to write test cases. When I last looked, maybe it still said "to be written". This one's kind of near and dear to me, because I actually found the consensus bug in that one, and that was after it was activated on mainnet. Also, there's the social governance burden: it takes up time on core dev calls, it takes up developer time. And most important, I think, is the philosophy. Precompiles actually go against the Ethereum philosophy, in my opinion, that all code should be treated equally, with no special treatment just because you're connected, you're buddies with Vitalik or whatever. Tough: you pay the same gas cost as everybody else.

So here's where we were last Devcon, which Alex pretty much went over. We had some interpreters; they were too slow. Compilers: Alex talked a lot about the optimizing compilers. The solution for JIT bombs was a single-pass compiler, but there really weren't many of those. There's Lightbeam, which is being worked on by the Parity guys as a single-pass compiler, but it didn't support memory yet; maybe it does now, I don't know. In brief, nothing really suitable. And the problem with interpreters: these down here are hash functions, and the times for the hash functions are okay. The issue is the precompiles that people really want to run, which is the elliptic curve cryptography stuff, the snark verification. That's 12 seconds, 17 seconds, 25 seconds, over 30 seconds. So when you look at these benchmarks on an interpreter, you think it's hopeless: even with a 10x or 100x speedup, you're still too slow. So this is the mindset that we were in.

In the Eth 1.x discussions between Devcon last year and the January meeting, we faced this problem: the compiler engines aren't ready, but we want to introduce wasm on Ethereum, so how do we do that? And the thinking was: okay, we could start with an interpreter just to get the door open. Even if it is too slow for the really useful stuff that people want to do, at least you would have wasm on Ethereum, and that would incentivize teams to work on compiler engines, to bring compiler engines up to production readiness, integrated into an Ethereum client. So that was the plan. And then we noticed something, almost by accident.
So again, I can't overemphasize the dogma that had to be overcome: that interpreters are just too slow. I asked a question in discussion at the meeting in January, and this conversation sticks in my mind. I asked: what was the worst premature optimization in Ethereum's design, from before the genesis block was launched back in 2015? I suggested two candidates. Mine was the Merkle Patricia Tree, because as a consequence of its large branching factor, proof sizes for stateless clients are way too big. And Vitalik actually responded that he thought maybe it was the EVM having a 256-bit word size, because computer architectures are actually 64-bit, so that's going to make EVM slow.

Really awesome credit here to Greg Colvin, the wise, long-bearded young dog of Ethereum, for bringing this benchmark and arguing that on 256-bit workloads, EVM is going to have an advantage. We'll come back to this. These two are wasm interpreters: this is wabt, which is the fastest wasm interpreter, at 1.8 seconds. And then geth's EVM at 162 milliseconds, Parity's EVM at 116, and then evmone, a C++ engine that Pawel decided to write over his Christmas break, which is over 10 times faster than the one in Parity. So when you're faced with this kind of number, where the wasm interpreter is over 100 times slower, you have to ask yourself: what is going on here? The short story is the overhead: to do a 256-bit multiplication using 64-bit instructions requires something like 25 instructions, so you get a 25x slowdown right there. But even when you see this kind of number, you think: okay, on 256-bit workloads EVM excels, it beats wasm. But EVM is still 10x slower than native; everyone knows that.

The other lucky circumstance that led to this discovery was a contract called Weierstrudel. Weierstrudel implements elliptic curve scalar point multiplication on BN128, so it does the same exact thing as the ECMUL precompile that is on Ethereum already. It was written by Zachary Williamson, of PLONK fame, and it's highly optimized for minimal gas cost; it actually beats the gas cost of the native precompile. Now, when I first heard this, my initial thought was: wow, the native precompile must be priced just terribly. My initial thought wasn't: wow, EVM is really fast. It was: okay, something's up with this native precompile pricing; let's just confirm that and compare it with the native runtime.

And this was really the shocking result. It was the combination of having this Weierstrudel example and having the optimized interpreter, evmone. Because if we didn't have evmone, we would have said: okay, it's pretty fast, it's like two and a half milliseconds, but that's still ten times slower than native, so interpreters are still slow, even on 256-bit code. But when you run the optimized EVM bytecode in an optimized interpreter, you get a number that is pretty close to native. In this one, it's 580 microseconds, and the native Rust is 300.
The native geth one is almost 100, but that one is even further optimized; compare it instead with the average native. And the thing is, it could be even faster, because Weierstrudel was optimized for gas cost, so it uses these MULMODs, which are actually slower in EVM. If it were implemented to be fast in EVM, it would be 2x faster, so it would really be right on par with the native Parity speed.

Once this sunk in, we basically realized that EVM interpreters can be fast, and that there's perhaps no speed benefit from introducing wasm on Eth 1. At the same time, progress on execution on Eth 2 was picking up; we'll hear more about that later. And this motivated us to propose EIP 2045, particle gas costs.

So we have ECMUL performing fast in an interpreter. But you can still say: okay, so anyone can do scalar multiplication in an interpreter, but can you do pairings? Because pairings are the most intensive precompile. Unfortunately, there was no implementation of pairings in EVM, so we weren't able to benchmark that until websnark was released, from Jordi Baylina here. websnark was released a while ago, but pairings were just added. This gave us an opportunity to benchmark pairings in an interpreter by using host functions for the 256-bit math, which gives the same effect as EVM's 256-bit word size. And when we do that: here, in a wasm interpreter, it's four and a half seconds on wabt. V8's interpreter is slower, 13 seconds. In the single-pass compiler it's 200 milliseconds. And in the interpreter with bignum host functions, it's 100 milliseconds, very close to the optimizing wasm compiler, V8, at 100 milliseconds. That's an interpreter getting basically the same speed as an optimizing wasm compiler. Now, these are signature verifications, which is scalar point multiplication; we already knew we could do that. Pairings: can you do the same thing with pairings? It turns out, yes. These numbers are very fresh; we just got to this point this week. Here's 230 milliseconds in the regular interpreter, without bignum host functions. And we'll zoom in: in the single-pass compiler, 12 milliseconds. Optimizing compiler, 7 and a half milliseconds. Interpreter with bignum host functions, 5 milliseconds. Native Rust is 4.2 milliseconds. This is an interpreter doing pairings at pretty darn close to native speed.

We're running short on time, so I think I'm going to rush through here. One thing about comparing to native speed: the goal of wasm isn't to simply run code at native speed, it's to safely run untrusted code at near-native speed, and that's a much harder problem. So when we say wasm is not as fast as native: yeah, that's because with native you're running trusted code, while with wasm you're running untrusted code. Native is a nice ideal to aspire to, but it's a different problem, so there's a performance gap. That performance gap is a topic of research, and I'm sure if you ask Paul, he'll have more to say about it. So, as to the new reality: the old dogma was that interpreters are too slow and you need compiler engines. Well, these benchmarks show that we can actually do a lot of the things people want to do with interpreters.
Some caveats. The bignum API: we talked a lot about these bignum host functions, but there's no specific proposal yet; that's something we want to come up with. Also, 64-bit workloads: there it's not as fast as native, so that's still a challenge, and we'll see if it becomes a bottleneck. That's why we've been prototyping things and benchmarking. But interpreters are moving: even Firefox is introducing a new fast JavaScript interpreter. Not a wasm interpreter, but it applies the same ideas, the same local optimizations, that we would like to apply to a wasm interpreter.

Yeah, and those numbers we saw are not even the best we can do; there are more interpreter speedups to be had. This one I did in a couple of weeks, and I got almost a 30% speedup on wabt. It required writing some C++ code, and I'm not even a C++ programmer. So if I was able to achieve this kind of speedup on wabt, then somebody who actually knows C++ and is an actual expert on optimized interpreters would be able to do a lot more than I have so far. What I did, in brief, was this thing called superinstructions. If you research optimized interpreters, it's pretty standard stuff: instead of taking three dispatches to execute this little block, it executes as one superinstruction in the interpreter loop. And there are more optimizations on the table, waiting to be done for even better numbers.
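To make the superinstruction idea concrete, here is a toy dispatch loop (a sketch, not wabt's actual code): a three-opcode sequence that always occurs together is fused into one opcode, so the switch dispatch is paid once instead of three times.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum Op : uint8_t {
    LOCAL_GET_0,    // push local 0
    I64_CONST_1,    // push the constant 1
    I64_ADD,        // pop two values, push their sum
    LOCAL0_PLUS_1,  // fused superinstruction: local.get 0; i64.const 1; i64.add
    END
};

int64_t run(const std::vector<uint8_t>& code, int64_t local0) {
    std::vector<int64_t> stack;
    for (size_t pc = 0; code[pc] != END; ++pc) {
        switch (code[pc]) {
        case LOCAL_GET_0: stack.push_back(local0); break;
        case I64_CONST_1: stack.push_back(1); break;
        case I64_ADD: {
            int64_t b = stack.back(); stack.pop_back();
            stack.back() += b;
            break;
        }
        case LOCAL0_PLUS_1:          // one dispatch instead of three
            stack.push_back(local0 + 1);
            break;
        }
    }
    return stack.back();
}
```

A load-time pass would pattern-match the three-opcode sequence and rewrite it to the fused opcode; the interpreter loop itself only gains the extra case.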
There's a clarification to make on interpreters versus compilers, because now that we talk about running everything on interpreters, and about being really good at doing stuff quickly on interpreters, people ask: does that mean we're never going to have compiler engines? No, it doesn't mean that. Compilers are still an option, but in our opinion they're not ready for launch right now. So if we're going to launch on January 3rd, then I would recommend going with an interpreter engine; we can do a lot of stuff with it.

Next steps: propose a specific bignum API, and more interpreter speedups. Metering: we've got good progress on the injection algorithm for metering, but the gas cost table to use is still an open question. It's easy to throw out any old gas cost table, but it's difficult to thoroughly analyze it; even EVM's is still the subject of much analysis and research, and we'll need to do the same for wasm. And that's it, so I'm going to pass it back. Do you guys have any questions on the benchmarking before we move on? Because there's a lot more. Go ahead, please.

Question: I remember there were two items regarding why EVM is slow. One is the use of 256-bit integers, and the other item I forgot. Is there any more benchmarking around that? Because it also applies to EVM in general, to blockchains. It was the bit about premature optimizations. Yeah, this one. Right: the first one is not related to the VM at all; that's just related to Eth 1, because that's the tree structure that Ethereum 1.0 uses.

A lot of questions. Question: EOS is using WebAssembly. Are there any lessons we can learn from it, in what we choose and what we benchmark? So, EOS uses it in a way that's not deterministic: they have a time cut-off, and it's up to one of the 21 block producers to make whatever block and execute whatever transactions. So they don't face the same challenges as when you're trying to execute deterministically. But the conclusion that 256-bit operations are slow on WebAssembly, maybe that also applies to EOS, right? Yeah.

Question: you said one instruction on the EVM corresponds to 25 i64, 64-bit multiplications on WebAssembly? Right. At the processor level, the processor would be executing 25 multiplications either way. But the loop inside the interpreter sees only one MUL256 operation, which then executes native CPU instructions for those 64-bit multiplications. Why is it different from the EVM? It's because the EVM was defined with a 256-bit word size, so the MUL opcode on EVM already works this way. But the CPU still does the same thing? Yeah, it's all the same: if you're running on a 64-bit CPU, then under the hood you're still doing 64-bit multiplications. You just don't have the interpreter overhead of a new dispatch for every single 64-bit multiplication.

Question: when it comes to compilers versus interpreters, are there any implications regarding, say, constant-time requirements in cryptographic primitives? You're trying to avoid side-channel attacks, those kinds of things. Does compiler versus interpreter have an advantage there? No. In our context, the side-channel attacks and the constant-time stuff mostly matter when you're dealing with private keys. These are contracts executed on the blockchain, so everything is transparent; there are no private keys, and you don't have to worry about side-channels.

Maybe one more. Question: on your benchmark slide you have different sizes of integers, but the biggest wasm integer type is 64 bits. So I'm trying to understand: the bigger types are just in the wasm memory? You're not extending the instruction set? Right, that's why we're using the host functions, rather than adding new opcodes to wasm. That way, wasm is just wasm; we're not modifying wasm at all. So the goal is not to extend wasm? Yeah, the goal is to stay compatible with wasm.

All right, cool. Okay, just a quick recap of Casey's talk, which was really interesting. It coincidentally happened this year that these two things came together: evmone was released, and Weierstrudel was released, and they showed that EVM interpreters can be extremely fast, or at least very close to native. And we hypothesized that maybe wasm interpreters can be the same, and we managed to kind of prove that. That was a real breakthrough for Ewasm itself. So, recapping what Casey said, the main thing we think we have to introduce is a bignum library.
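As a rough illustration of what a bignum host function buys (the function name and limb layout here are invented, not the proposed API): the host does the whole 256-bit multiply natively, so the interpreter pays one dispatch instead of roughly 25.

```cpp
#include <cstdint>

// GCC/Clang extension, used for the 64x64 -> 128-bit partial products.
using uint128 = unsigned __int128;

// out = (a * b) mod 2^256, with 256-bit values stored as four
// little-endian 64-bit limbs, a natural layout in wasm linear memory.
void mul256(const uint64_t a[4], const uint64_t b[4], uint64_t out[4]) {
    uint64_t r[4] = {0, 0, 0, 0};
    for (int i = 0; i < 4; ++i) {
        uint64_t carry = 0;
        for (int j = 0; i + j < 4; ++j) {
            // schoolbook step: partial product + accumulated limb + carry
            uint128 t = (uint128)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint64_t)t;
            carry = (uint64_t)(t >> 64);
        }
    }
    for (int i = 0; i < 4; ++i) out[i] = r[i];
}
```

Registered as a host function, a wasm contract would call this once where pure wasm would grind through dozens of interpreted i64 multiply, add, and carry steps.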
This bignum library doesn't change the instruction set; it's just a library of host functions, and with it we can get very close to native in many cases. At least the primitives used by dapps are covered, and if you guys need some other primitives, we can probably have a look at those in the coming months.

Basically this leads to two points. Ewasm 1.0 is kind of ready to be released with interpreters, because we now have fast wasm interpreters. But at the same time we also have this really fast EVM interpreter, so why don't we just optimize the EVM interpreters in the other clients? We could do that, and we actually took the stance that that should be the next step for Eth 1.0: the client developers should have a look at the optimizations in evmone, they should take those optimizations, and we should rebalance the prices on EVM, because it seems EVM can be really fast. That's EIP 2045; it almost made it into Istanbul, but maybe the next hard fork. And of course, if these optimizations to EVM on Eth 1 don't really happen, we can look at introducing Ewasm on Eth 1; but if EVM can be optimized, then the benefit of wasm on Eth 1 isn't that big. So that's basically the key realization we ended up with after all this benchmarking work.

Now, a swift change to Eth 2.0, since we've mentioned it quite a few times. I think we were fairly close to the Eth 2.0 teams for a long while, but we really sat down to get coding and to interact more with them in May of this year. Eth 2.0 itself needs a really fast execution engine, and based on these findings we can start with a really fast interpreter; then, as compiler engines become more mature and better, we can seamlessly introduce those into Eth 2.0. So we can start with a really fast interpreter, but we're not missing out on compiler engines if they do turn out to be mature. Our focus, mostly together with the Quilt team (those guys on that side), is on prototyping different execution designs for Eth 2.0. We're also creating some testing tools (there's one called Scout), and we're writing a couple of execution environments and optimizing them. There's going to be a really long session tomorrow, at 2:15, in the B3 room over there. Don't be afraid: it's a two-hour slot, but it's not going to be long talks; we're going to have a couple of 10-to-15-minute lightning talks explaining all these different execution environments. We basically start with a long introduction to what Eth 2.0 execution is, and then go into details of the testing tool and all these different execution environments. So we really invite you guys to come tomorrow and learn more about Eth 2.0. But I'll give you a tiny primer here, just to understand the basics.

Okay, one really quick question. Question: in terms of the work that the Quilt team is doing on Scout: I've not really seen Scout being focused on the execution environments, and I know the Quilt team is doing work on top of Lighthouse. Is the intention that they'll converge at some point? Yeah, definitely. It actually started with Scout, and the goal was to get Scout into Lighthouse. That's kind of what's happening, but we're actually converging on moving to a separate tool entirely, because things are going to be developing much more rapidly, so we may not want to be attached to Lighthouse. But there's going to be more on that tomorrow.
Okay, a quick primer on Eth 2.0, because you guys need to be prepared for this in the coming months and years. Contracts are not called contracts anymore, because that would just be confusing; instead we have this even more confusing term, execution environments, or EEs for short. You're going to hear "EEs" a lot; there are a lot of Eth 2.0 talks, even today. This is specific to one proposal, and there are more proposals for Eth 2.0, but in the proposal I'm referring to, EEs are executed in shards. And these shards only store a really tiny amount of data for each of these environments or contracts: they only store a state root, which is a 32-byte value. To be precise, I think you can actually store up to 96 bytes. (Is that bits or bytes? Bytes, yeah; 32 bytes is 256 bits. It's kind of confusing.) So there's space to store a bit more, but it's definitely not somewhere you can store all your data; it has to be just a short summary of what you have. And what's going on in these shards is basically just proving: you supply a proof, and you verify that the proof matches the state root which is stored in the shard. Next to that proof, you also supply the new transactions that you want to execute. So first you verify that you actually supplied the right things; then you make the changes, calculate the new root, and store the new root. That's all that's happening on Eth 2, at least regarding execution.
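A minimal sketch of that flow, with every name hypothetical (the real EE interface was still being prototyped in Scout at the time):

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Root = std::array<uint8_t, 32>;
struct Witness {};      // Merkle branches proving the touched pre-state
struct Transaction {};  // the state changes to apply

// Assumed helpers, stubbed out here.
bool verifyWitness(const Root&, const Witness&) { return true; }
Root applyTransactions(const Root& pre, const Witness&,
                       const std::vector<Transaction>&) { return pre; }

// The shard persists nothing for the EE except the 32-byte state root.
Root processShardBlock(const Root& storedRoot, const Witness& witness,
                       const std::vector<Transaction>& txs) {
    // 1. Verify the supplied proof against the stored root.
    if (!verifyWitness(storedRoot, witness))
        throw std::runtime_error("bad pre-state witness");
    // 2. Apply the transactions to the witnessed state; 3. the returned
    // root replaces the stored one, and nothing else is kept.
    return applyTransactions(storedRoot, witness, txs);
}
```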
Is that clear? Any questions on that? Great. So we ended up actually having two Ewasms. We have Ewasm 1.0, which is the stateful one; it's EVM-compatible, but it also comes with all the legacy of the EVM. And Ewasm 2.0, which is stateless and is really just about this pure execution. And these are the final points we have on that: we can launch Ewasm 1.0 on Eth 1.0, but you can also use Ewasm 1.0 contracts within these execution environments. I think everybody understands stateful contracts way better than they understand the stateless model, so you may want to keep using a stateful model, in which case you may want to use Ewasm 1.0, which is stateful and can be encapsulated inside these Ewasm 2.0 shards. Does that make any sense to you? Really? I think that was a really bad explanation, but we'll explain it better tomorrow. So that's all we have for now. I think there's time for some questions, and then we can jump into some really hardcore stuff on Ewasm. Regarding Ewasm 1.0 there are a lot of open questions, but do you guys have any questions right now? Go ahead.

Question: I'm curious, because when you execute on Eth 1, it's the state that causes the most trouble in terms of performance, because you load it from disk. So how much impact do EVM performance improvements have on the total picture? Yeah, that's one of the reasons we didn't really consider doing too much on Ewasm 1.0: execution doesn't seem to be the big issue, it's everything around it. And the stateless model doesn't have disk access; you have to supply the data. Does that answer your question?

I think there was one more. Question: I don't know if this will make sense, but on behalf of the Nimbus team: in the last few months we were trying to get Nim contracts to compile down to Ewasm and then run on Ewasm 1.0, on go-ethereum with Hera and EVMC. And we kept trying to use the latest and greatest and to sort things out. It was really difficult, so we kept going back to milestone 1, which dates back to November 2018. I wondered if you guys are going to publish a milestone 2, so the Ewasm 1.0 stuff is a little more caught up and ready to go. Yeah, we've been debating this for quite a while. We do have a tiny bit of updates there; I think we could also discuss this offline. There could probably be some updates made if there's enough demand. But we're really heads-down on finding a solution to the speed questions and on the 2.0 stuff. We have had, I think, time to polish 1.0 a bit more, but we haven't focused too much on it. If there's enough demand, I think we can find the time.

Paul, do you want to set up your stuff? (Why me?) Is there one more question?

Question: we got a grant to do a symbolic execution engine. What we're doing right now is writing a virtual machine for wasm, not an interpreter but a symbolic one, which means that instead of concrete values we parameterize the whole input space, and then we analyze each and every branch. Right now we're doing it only on wasm, but in the next few months or so we're going to support, for example, the Ewasm library, so we can actually analyze smart contracts. Hopefully, our goal is to take already-existing smart contracts, translate the EVM bytecode into wasm, and analyze that; that way we can analyze all those already-existing smart contracts, using your spec, the Ewasm 1.0 spec. But then there's the 2.0 spec, which requires a hard fork and additional stuff such as the system smart contracts. From the presentation it's not exactly clear what you actually have to support in each case, where to split, for example. So, the main difference: both of them are wasm, and the metering is kind of the same, at least today. The difference is the APIs. In both cases you have a way to retrieve call data, and you have a way to store and read stuff, but this is where it's really different. In Ewasm 1.0 you have primitives like SSTORE and SLOAD, and you can store an infinite amount of data, as much as you have money for. In Ewasm 2.0 you only have a fixed size of data. And the other big difference is that in Ewasm 1.0 you are in an actual state, and you can interact with other addresses, with other contracts. You cannot do anything like that in Ewasm 2.0; you have to supply everything you want to interact with. So, pretty much, from our standpoint, it's like a library we can target. Yeah. And can you make it public? It's not public yet, but we will be releasing some of it; let's sort it out.

Thank you for the questions. Now we have some really hardcore stuff on Ewasm.
I think that's going to be interesting. My name is Paul Dworzanski; I guess I was introduced already. Now we have some hardcore stuff on Ewasm. I don't know if it's hardcore; you can decide for yourself. The question is: is wasm suitable for blockchain? So, this is a bunch of questions I'll ask, and this is how the talk is structured. First, I'll talk about consensus. Then everyone's favorite topic, bombs: I'll give some examples of bombs that Alex already gave, and maybe some others. Metering is a big topic for us that we talk about a lot, so it's important that we have good metering. There's some metering slowdown; that's important, so there are interactions between bombs and metering. Then I'll talk about execution speed, which is extremely important, maybe one of the two most important things, and there are some subtle interactions between metering and execution speed. I'll talk about bytecode size, and there are interactions between bytecode size and execution speed. So there are a lot of interactions between these topics. Then I might not have time for other topics.

Consensus. I think the best thing about wasm is that it advertises some sort of mathematical definition. They have proofs for something called type soundness. They claim that there's no undefined behavior. They have guarantees, and that's very important for blockchain, for the verification people and such. And it's very constrained in what you can do, the control flow and things like this. These are the three bullet points; I won't read them. But there are things we have to do. There isn't determinism in wasm out of the box: we have to get rid of floating point, the host functions might do something strange, and there are some system resource limits where we might have to fix some number bounds on things. So for consensus, we think wasm is great.

Bombs: everyone's favorite topic, my favorite topic, because it's fun. So let me define a bomb: it's an input to a wasm engine which exhausts some time or size limit. So when we say something is a bomb, we have to say it's a bomb with respect to, say, Ethereum's size and time limits; a bomb with respect to something. An example that Alex mentioned earlier is a JIT bomb. We did some fuzz testing and generated a 20-kilobyte wasm bytecode which took a compiler from V8 2.5 seconds to compile. This is a bomb, because in Ethereum we want to allow bytecode up to 24 kilobytes, and it should take a short amount of time, under 200 milliseconds, to compile. So this started a whole JIT bomb discussion.

The reason I'm showing this wasm code is that it's sort of pathological. This is loop, nested loop, nested, nested, nested: there are a lot of nested loops. So maybe the compiler has some trouble. Actually, it does have trouble, because this TurboFan compiler has some control-flow algorithm that creates a graph, and then there's a computational blow-up; I think it's quadratic in the number of loops. And I think we can make it even worse. This fuzz-tested one was 2.5 seconds, but who's to say: with enough nested loops it might be 200 seconds or even more. But this was an important example.
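As a toy illustration of the shape (this is not the actual fuzz-found 20 kB bomb), note that the source of the bomb is tiny and cheap to run; it's the compiler's control-flow analysis that blows up:

```cpp
#include <iostream>
#include <string>

// Emit a wasm text module whose single function is `depth` nested loops.
std::string nestedLoops(int depth) {
    std::string body;
    for (int i = 0; i < depth; ++i) body += "(loop ";
    body += "(nop)";
    body.append(depth, ')');
    return "(module (func " + body + "))";
}

int main() {
    // Execution is trivial (a loop with no branch falls straight through),
    // but compile time grows roughly quadratically in the nesting depth.
    std::cout << nestedLoops(10000) << "\n";
}
```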
There are other bombs. This is a subtle one: the binary format. There are two formats: a text format for WebAssembly and a binary format. In the text format, we have to say i64, i64 for every single local variable, so if we want to declare, whatever, a billion locals, we have to write i64 a billion times. But the binary format of WebAssembly has a shorthand notation for the number of local variables. So here is a WebAssembly module, shown in the text format; you can instantiate it, you can execute it. It's just a module with one function in it, with one billion 64-bit locals. We have to somehow be aware of this, because if this module gets uploaded to Ethereum and starts running, and we enter this function, then we try to instantiate 8 gigabytes or whatever. And it could be even worse: it could recursively call itself, 8 gigabytes plus 8 gigabytes each time. So we have to be very careful with things like this, and these things are subtle in the binary format. I think I was the one that found this one.
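As an illustration of that shorthand in raw bytes (reconstructed here; 0x7E is the i64 value type and counts are unsigned LEB128), the locals declaration for such a function fits in seven bytes:

```cpp
#include <cstdint>

// Locals declaration inside a wasm function body: "one (count, type)
// entry: 1,000,000,000 locals of type i64". Seven bytes stand in for
// what the text format would spell out a billion times.
const uint8_t localDecls[] = {
    0x01,                          // one (count, type) entry
    0x80, 0x94, 0xEB, 0xDC, 0x03,  // 1,000,000,000 as unsigned LEB128
    0x7E                           // value type i64
};
// An engine that naively zero-initializes these on function entry would
// try to allocate 8 GB, which is exactly the bomb described above.
```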
Another bomb; this one is a bomb with respect to pretty much any limit you would set, the kind that's going to crash your computer. There's a recent paper that executes an EVM bomb: 8 million gas in 72 seconds, using aleth. Certain people have said it'll be easy to fix, that there are just some opcodes that are not optimized as well as they could be. But this is a bomb with respect to Ethereum, because 72 seconds is unacceptable; we want around one second for an EVM block to execute. So this is something recent. We have to somehow meter; this is a bomb that's related to metering. We have to meter such that there's no chance of something like 72 seconds.

So this is a segue to my next topic: metering. I think most people know this cost-and-sum model: we assign a cost to each opcode, and as we execute we subtract from some gas allowance, and once we reach zero we revert everything. I think people are familiar with Ethereum gas. The idea is to do something similar, using this cost-and-sum model, for WebAssembly as well. We have one optimization in this algorithm: we inject calls. A lot of people already know this, but for those who don't: we inject these useGas calls into basic blocks. A basic block is guaranteed to execute all the way through, so we inject, above it, the cost for the whole basic block, followed by a call to useGas.

But this might have a slowdown. There is a metering slowdown, and people will ask: sure, you have the best, fastest engines and so on, but you're still making a call at every basic block. So there is a slowdown; there's a concern here; it costs some execution cycles. On the metering slowdown: we have the precompiles for Ethereum 1 written in WebAssembly, and we found that they have between 1.05x and 2.4x slowdown when running with metering injected. So in the worst case we have a 2.4x slowdown. But before, it was a 500x slowdown; I think Casey caught that it was this pathological nested example. There are nested blocks, and how many nested blocks? 300 nested blocks. I put an ellipsis here, but you can imagine: there are just so many nested blocks, with a useGas call before each one. In this WebAssembly code (forgive me if it doesn't look familiar), there are just so many of these useGas calls that the run time is dominated by them. So there's a 500x slowdown in the whole execution because of this useGas overhead, which is unacceptable. So as a programmer, a contract writer, an EE writer, you have to be aware of this. If you're generating wasm code that is dominated by useGas, it might be a good idea to write long basic blocks and not take so many branches, if possible.

So Casey's idea was to lift these; he calls it superblocks, I call it lifting. He lifted all of these up. It's kind of subtle, because if there were a loop in here, or some branch within one of these, that wouldn't be allowed; but here it was just nested block after nested block, so he can pattern-match at metering time, which is deploy time, and lift everything up, so there's just one useGas call instead of 387 of them. This is a huge optimization: from the 500x slowdown, which was dominated by useGas, down to the 2.4x slowdown. But the metering slowdown is real, and we have some other ideas to improve things. I mean, it's a big deal for your bottlenecks, if you're paying in a loop each time; but for what I guess we call business logic, it's no big deal, because it's just "add this token balance, track this token balance". For the crypto stuff, the expensive parts of your code, you should be aware of this metering slowdown.

There's another idea: loops. In the base case, inside each loop we have this useGas call. But if we can do some static analysis at deploy time, say there's a loop variable that we increment, and once it equals 5 we stop branching, then if we notice that pattern, this useGas can get lifted out; it's a similar idea for metering. This is real: we have working examples of this for Keccak. Keccak operates in a loop, I think it's 24 iterations, and we lift the charge outside, so we save, for each block, 24 useGas calls, and there are many blocks. But the problem is, I don't know if we want to do this pattern matching: what if there are two different patterns that collide with each other? That might cause problems. But in certain cases that are highly used, we might consider doing this. And it has a cost, you know; this isn't free: we're pattern-matching, comparing bytes and things like this. But there are examples, and we're working towards removing this overhead. Again, as a contract writer, be aware of it.

There's another optimization we did. This call to useGas could either be a host function, a call that goes outside, or it could be wasm code inside the module that does the accounting. Depending on the engine, we found that doing it one way or the other is faster. So that helps the metering slowdown as well, but that's on the implementation side.
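Pulling the injection idea together, here is a toy version of the pass (opcodes and the cost table are invented for illustration; the real tool rewrites wasm bytecode at deploy time):

```cpp
#include <cstdint>
#include <vector>

enum Op : uint8_t { I64_CONST, I64_ADD, I64_MUL, LOCAL_GET, CALL_USE_GAS };
struct Instr { Op op; int64_t imm = 0; };

// Hypothetical static cost table.
int64_t costOf(Op op) { return op == I64_MUL ? 5 : 1; }

// `block` is one basic block: straight-line code that always runs to its
// end, which is what makes a single up-front charge sound.
std::vector<Instr> injectMetering(const std::vector<Instr>& block) {
    int64_t sum = 0;
    for (const Instr& in : block) sum += costOf(in.op);
    std::vector<Instr> out;
    out.push_back({I64_CONST, sum});  // i64.const <cost of whole block>
    out.push_back({CALL_USE_GAS});    // call $useGas
    out.insert(out.end(), block.begin(), block.end());
    return out;
}
```

The superblock lifting described above would then be a pattern-matching pass over the result: where nested blocks provably all execute, their per-block charges are folded into a single useGas call.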
Question (and I realize I'm mixing these up): are all of these optimizations being done by the developer before deploying the code? Some of them are. Sorry, to finish: I was just wanting to clarify. The ones that are all being done by the developer, there's not a huge worry about them, because the developer can try out 20 different optimizations. Yes. So I'm talking about both: the question was whether these are developer optimizations or client optimizations, that is, client-developer or contract-developer optimizations. These all have to be implemented by the client developer, but the contract developer can experiment and play with these things. This one, for example, is an optimization in the wasm engine itself. But these useGas calls are injected by the Ethereum client at deploy time, not by the contract developer. Sure, it's not the contract developer; I'm just clearing that up.

Question: is there any thought put into using already-existing optimization frameworks? LLVM has a lot of these kinds of hoisting primitives and constant-computation passes. Is that something you're thinking of building on top of, or is that too much to ask? So, is this related to speed, or to... These are optimizations, and they look very similar to optimizations that compilers already perform; are you going to build on top of a framework that already exists? Does such a framework exist? I don't think any of them are aware of metering. LLVM will definitely do stuff like this, so sure; I'm not aware of those specific optimization passes, but yes, thank you for the tip.

Question, related to that: if the optimizations are being done in a client, then they have to be part of the spec, the same optimizations everywhere? Exactly; there's consensus. If we have some pattern matching, there must be consensus on exactly which pattern matching there is. If we're thinking of doing that, it's one option; we're not committed to it, but it's a good option.

Other questions. Question: do you include the cost of the metering itself? Yes. The idea is that the metering injection, and the validation and things like that, will themselves be done in wasm, and that will cost gas as well.

Question, sorry to overload you: earlier you mentioned learning things from wasm and applying them back to EVM, maybe getting a faster EVM. Any thoughts about doing some of this superblock metering, or metering hoisting, there? Yes, exactly that. Tomorrow night we present something on this kind of thing for EVM.

One more question: I'm no expert on this, but I've always been under the impression that you pay as you go: once you actually execute one opcode, you pay the gas for it. But with this kind of lifting optimization, you get to the point where the gas is charged before the code runs. So the pattern matching (there are some ellipses here) has to come with guarantees that nothing strange happens in between, that we can't halt partway or do something odd. So yes, it might be expensive and complicated pattern matching.

I have one more question: in case you're using an interpreter, you have a switch case for every opcode. Why can't you just add the gas there, in a table, and that's it? You wouldn't need to inject anything, and the bytecode would be even smaller. Yes, that's the first version, and we can do that as well, and everything would be correct. But then someone will say: if we have a useGas call, we have a function call, and inside this function we have a stack frame (nothing is free), and inside this function we have a branch, and there are branch conditions, so there's a pipeline stall. So there's a slowdown for doing it that way. That's why, as an optimization, we charge once for the whole basic block instead of for each individual opcode. But I think some EVM engines do it this way, per opcode, and I think Pawel did this optimization per basic block; he'll talk tomorrow. But it's even possible without wrapping: you have a switch in C++ for every opcode you execute, so you can just have a cost table there, and that's it. Yes, for an implementation, for just hacking together a prototype, it's perfect; I agree with you.
But I don't think it's faster; I think it'll be a slowdown. I think there you can avoid the whole stack frame for calling a function; you don't need it. You're right: if, in the interpreter loop, you have a jump, a goto, and before each opcode handler you have that actual line of code, the part that gets the next opcode and things like this, you can just go there directly. That's correct: you don't have a function, or it's sort of an inline function, you can think of it that way. But you're still doing the branch, the conditional branch, and that's still not free. Yes. And there's one wasm interpreter, I think by the Wasmer people, where they started prototyping metering in the interpreter itself, just like it's done in EVM. So there's a wasm interpreter which does the metering in the interpreter loop, just like an EVM engine, certainly, and I think that's an option. What we want to do here is just set some minimum rules, which can be implemented in an interpreter if you want to do it that way, or combined with optimizations like what Pawel is doing, or you can just rely on this metering injection tool and use it with wasm engines which are not prepared for metering. One note: if useGas has a charge itself, if we have to pay for each basic block, and we have consensus on this charge, then that has to be taken into account too.

Alex mentioned this blueprint metering earlier, and we have an example of it with Keccak: we took keccak256 in wasm. The independent axis is input length in bytes: we created inputs for Keccak that are 0 bytes long, 1 byte, 2 bytes, 3 bytes, all the way up to 6000 bytes, for this Keccak function to hash. The dependent axis is the number of wasm opcodes executed. This is experimental data, and you can see it's stepped. As you know, hash functions operate on blocks: you do update-block, update-block, update-block, taking the next 136 bytes, bit-twiddling on it, taking the next 136 bytes, block, block, block, and then you finalize. So you can see there's a step in the function at every 136 further input bytes, and this has a nice structure. And for this we actually had a blueprint: a Python program of about 30 lines of code that matched this experimental data exactly.
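That Python blueprint isn't reproduced here, but in the same spirit, a sketch with invented constants: keccak256 absorbs 136-byte blocks, and the pad10*1 padding always adds at least one byte, so an n-byte input costs floor(n/136) + 1 permutations, and an affine formula in that block count reproduces the measured step function.

```cpp
#include <cstdint>

// Toy gas blueprint for keccak256: fixed overhead plus a per-permutation
// cost. The two constants would be fitted to the measured opcode counts.
uint64_t keccak256GasBlueprint(uint64_t inputLen) {
    const uint64_t baseCost = 500;         // hypothetical fixed overhead
    const uint64_t perPermutation = 3000;  // hypothetical cost per block
    uint64_t blocks = inputLen / 136 + 1;  // padding forces one extra block
    return baseCost + perPermutation * blocks;
}
```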
But we can only do this for certain programs. Why can we only do this for certain programs? (Undecidability!) Yes, undecidability, very good. I think this is maybe the most interesting thing: we can't decide. If I give you a wasm blob, you can't tell me, I know you can't tell me, whether it will even halt to begin with. And if you can't even tell me whether it will halt, you can't tell me how much gas it will cost. So in the generic case we have no chance with this blueprint stuff for arbitrary code that's given to us. But for certain examples with nice structure, we can have formulas. And I think Grigore Rosu has spoken about this as well: automating the process of getting this gas formula. But sometimes this automatable process might blow the formula up; the gas formula might be so huge that computing it is more expensive than just doing the injection. With some pathologies, the formula can grow exponentially fast in the number of different branches and things. So, funnily enough, we're limited by the laws of the universe on this one. (This is like symbolic execution, figuring out how much gas it could use?) It's related to symbolic execution, but that has limitations and can blow up too.

So that's the cost-and-sum metering model. But there are other metering models which we're using right now. "Here's my precompile, here's the gas I propose", and then we say: okay, that gas seems reasonable, and everyone hard-codes it. That's one metering model that we use; it's the one on the bottom, consensus metering is what I call it. I think Parity wants to do this live, on deployed code; there might be some consensus involved, but I think they have a finite number of these deployed pieces of code, and they have some consensus on the metering. The slowdown of having consensus to begin with is that you need a human being to propose it, and you need human beings to cooperate with each other, and there might be some problems with that.

Another metering model; this is fresh, a new idea, called counting cycles. Someone might think the cost-and-sum model matches how computers work: take one opcode, finish processing it, take the next opcode, finish processing it. That isn't really how computers work; there's tons of hardware acceleration: registers, pipelines, branch prediction, multi-issue out-of-order execution, caches, memory tricks and more. So why not compute costs by counting cycles, executing the code on an open-source instruction set architecture? People implement instruction sets, for example x86: there's some definition of it in a hardware description language, maybe Verilog. How many people have taken a computer architecture course, so this is obvious? Okay, good. So we'd actually have either physical hardware or a Verilog or VHDL file, whatever, and we'd execute the code and count the number of cycles. Then the contract writer would take into account: okay, this multiplication is going to take four cycles, so I'll hide its latency by doing some other things on the ALU in the meantime, and those would be free. The current metering model doesn't even take these things into account. So it's a more fine-grained metering model, it's fresh, and it's just being prototyped now.

The next topic is speed. The first thing we have to know is: what are the bottlenecks? The current bottlenecks are IO, at least for Eth 1, the disk IO and maybe some network IO, and expensive crypto. Casey just talked for a while about expensive crypto, so our focus is fast crypto, this bignum stuff. These are all the proposals that are possible, all at one time. I kind of want to jump around with them, but maybe I should just go in order, for sanity.

Compilers: single-pass. Firefox has one; Chrome has one, in which we found some bugs, which were fixed (found by Guillaume on our team, by the way; maybe there are more bugs). Lightbeam is in development by Parity. The next task is to work on metering: how can we meter compilers? It's subtle, because this cost-and-sum model works well on interpreters, with less variance, because in an interpreter there's a loop that gets the next opcode, a goto or a switch or whatever, dispatches to some instruction, executes it, and does all these things. But with compilers,
The next topic is execution speed, and the first thing we have to know is: what are the bottlenecks? The current bottlenecks are IO, at least for eth1, the disk IO and maybe some network IO, and expensive crypto. Casey just talked for a while about expensive crypto, so our focus is fast crypto, these bignums. These are all the proposals, all at once. I kind of want to jump around between them, but I'll just go in order for sanity.

Compilers: single-pass compilers. Firefox has one; Chrome has one, in which we found some bugs that were then fixed (one fix by Guillaume on our team, by the way; maybe there are more bugs); and Lightbeam is in development by Parity. The next task is to work on metering: how can we meter compiled code? It's subtle, because the per-opcode cost model works well on interpreters, with less variance, since an interpreter fetches the next opcode, goes through a goto or a switch or whatever to some instruction, executes it, and so on. But with compilers, all of that is optimized away, there's hardware acceleration, latency is hidden, and there's a question of how well we can meter it. We're hoping we can still meter it with the per-opcode cost model; there might be some variance, and that might hurt metering. So there's a subtle interaction between the last section, metering, and execution speed. That's what we're starting to work on now, and we're hoping the compilers will be ready in time and the metering will work well.

There are some other proposals. A fast subset of wasm (maybe I should stop for questions along the way, but): there might be a fast subset of wasm. I just read about Microwasm. They don't like something about wasm, that blocks are able to leave values on the stack; a few parts like this are just awkward, and if you exclude them, you can do things much faster, so they wrote a whole compiler around an intermediate form without them. It allows easier metering, easier compilation, and some better optimizations. So we have a fast subset of wasm.

And then maybe a fast superset of wasm: annotations, or compiler hints. I don't know if you can read the quotes yourself, but fully automatic "take this code and compile it" has its limitations, and the idea is compiler hints, maybe even interacting with the compiler as it works, so it can make better optimizations. So there are proposals to put annotations in the wasm code to give hints to the compilers.

Bignum host functions: I'm repeating myself, and we'll say it a few more times, but this is among the execution speed proposals. The bignum is the big bottleneck by far, the multiplication, along with the crypto, the hash functions. Even with compilers coming, compilers are too slow for some things like this, so it would be better to hand-write them.

There's a fresh idea here, and I have to say I'm biased because I'm the champion of it, so I might sound very excited; other people might not be as excited as me, so please don't take my excitement as everyone's. It's called universal assembly, motivated by qhasm, Go assembly, the Berkeley Packet Filter, and the annotations proposal above. The proposal is for all the precompiles, all these bignum host functions, to be written in a universal assembly. There are the x86, ARM, RISC-V, and MIPS instruction sets, and there are similarities between them; qhasm and Go assembly already noticed this, we're not doing anything new. So the proposal is to hand-write all these bignum host functions in a sort of universal assembly language that maps onto all of these instruction sets, and then, here's the interaction with the metering topic, we'll meter them by counting cycles.

There's an example with a benchmark: MUL-256, and this is just for one limb of the whole computation. There's the wasm syntax of this universal assembly, there's the x86 syntax, and there's handwritten ARM and RISC-V. The benchmark: in this universal assembly, 60 milliseconds to do 64,000 iterations of MUL-256; but native, we had 15 milliseconds. Why the 4x speedup? Native had 64-bit multiplication and add-with-carry, which the universal assembly may not be able to support. But Pavel gave me a good tip: all of these instruction sets, not wasm but all the other ones, do support a 64-bit times 64-bit to 128-bit multiply, so with that I'm hoping the numbers are going to be much closer, and the universal assembly might yet be competitive.
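To show why that 64-by-64-to-128-bit multiply matters for MUL-256, here is a schoolbook 256-bit multiply over 64-bit limbs, sketched in Python; Python's unbounded integers stand in for the hardware widening multiply that x86, ARM, and RISC-V expose but plain wasm does not, and the little-endian limb layout is just an assumption for illustration.

```python
# Schoolbook MUL-256 over four 64-bit limbs (little-endian). Each inner
# step is one 64x64 -> 128 widening multiply plus carry propagation:
# the "one limb of the whole computation" that the benchmark measures.

MASK64 = (1 << 64) - 1

def mul256(a, b):
    """Multiply two 4-limb 256-bit numbers, returning the low 256 bits."""
    acc = [0] * 8
    for i in range(4):
        carry = 0
        for j in range(4):
            t = acc[i + j] + a[i] * b[j] + carry   # widening mul + add
            acc[i + j] = t & MASK64                # low 64 bits stay
            carry = t >> 64                        # high 64 bits carry
        acc[i + 4] = carry
    return acc[:4]  # truncate to 256 bits, as EVM's MUL does

a = [0xFFFFFFFFFFFFFFFF] * 4            # 2**256 - 1
b = [2, 0, 0, 0]
assert mul256(a, b) == [0xFFFFFFFFFFFFFFFE] + [0xFFFFFFFFFFFFFFFF] * 3
```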
And the beauty of the universal assembly is that there's a wasm syntax for it too. It's awkward, because wasm is a stack machine, so instead of a MUL you have a local.get of this register, a local.get of that register, then the MUL, and a set of some other register; but you're doing the same thing as a MUL in ARM, multiplying two registers and putting the result in another register. This is the first time we can have a lot of people hand-writing such code, and the proposal is to do these bignum functions in this syntax, and to make the whole precompile system fully automated: if you want to do a precompile, give it to us in this wasm syntax or one of the other syntaxes, we'll meter it automatically, and it will simplify the precompile proposal process.

If you're a programmer, this is maybe more on the implementation side: wasm compilers are still getting better and better, but you have to be very careful with resources. I think the best-placed people here are the embedded people; are there any people interested in embedded systems? Yes. So when you're writing smart contracts, think like an embedded systems person: keep your finite resources in mind, every cycle is precious, because with compilers, if you pile on abstraction, it slows things down.

There's a concurrency proposal. Sharding itself is concurrency, but beyond that, I gave a talk last year on parallelizing transactions which are independent; a sketch of the idea follows below. If you have guarantees, and in the stateless model we do have guarantees, because we're passing all the addresses, we can see that two transactions are independent of each other, that they don't touch the same things, so we can execute them concurrently; I don't see why we can't. Inside a transaction, there's the new threads proposal in wasm. This is dangerous, there could be race conditions and deadlocks, but there might be some concurrency we can use, so we could allow it. There's still a cost, though: spawning a thread means a system call, and everything has to be metered; and then there are context switches and the system scheduler, and strange things can happen. For example, if there's a barrier where everything waits for a bunch of threads to arrive and one of them gets context-switched out, we're waiting for the slowest of the threads; that's a serial part, so what was supposed to be an 8x speedup isn't. And at the instruction level, with this universal assembly, we can hide latency: a multiplication and an addition, and we can have one of them for free. That's one more type of concurrency.

And then there's an optimization for the client developer: startup costs. You can leave the module instantiated, with no need to revalidate; you just validate it once.
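Going back to the transaction-level idea: here is a rough sketch of scheduling transactions whose declared address sets are disjoint. The transaction representation and the execute callback are hypothetical stand-ins, not any actual client API.

```python
# Greedy batching of transactions by declared access lists: a tx joins
# the newest batch if independent of all its members, otherwise it opens
# a new batch, so conflicting txs keep their relative order. Batches run
# one after another; txs inside a batch run in parallel.

from concurrent.futures import ThreadPoolExecutor

def independent(tx_a, tx_b):
    # Disjoint declared address sets: neither tx can observe the other.
    return not (tx_a["touches"] & tx_b["touches"])

def schedule(txs, execute):
    batches = [[]]
    for tx in txs:
        if all(independent(tx, other) for other in batches[-1]):
            batches[-1].append(tx)
        else:
            batches.append([tx])
    with ThreadPoolExecutor() as pool:
        for batch in batches:
            list(pool.map(execute, batch))

# Hypothetical usage: txs declare up front which addresses they touch.
txs = [
    {"id": 1, "touches": {"0xaa", "0xbb"}},
    {"id": 2, "touches": {"0xcc"}},          # independent of tx 1
    {"id": 3, "touches": {"0xbb", "0xdd"}},  # conflicts with tx 1
]
schedule(txs, lambda tx: print("executing tx", tx["id"]))
```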
All right, bytecode size. This is the last sequence of topics and I'm going to rush through it, just telling a story. Bytecode size is very important because now we're on stateless Ethereum, where under one of the proposals everyone stores everything. So: Blake2b. I took the reference implementation in C and compiled it, and the number of bytes on the left, 7,100 bytes, is the compiled Blake2b reference implementation. Then I looked at it and said: wait, it's inlining aggressively and it's unrolling loops aggressively, so the bytecode size blew up. We went from 7,100 bytes down to 2,143 bytes, a huge improvement, and this is huge because if we're going to store this on chain, smaller matters. But then I said I can do better with compression, and we get it down another 2x with the best compression.

And then there's another idea, split-stream compression, which splits the opcodes from the operands, the immediates. The best compression is dictionary-based, but the problem is that the immediates attached to the opcodes are just arbitrary values, they can be whatever; so if we split the opcode stream itself from the immediates, we can compress the opcode stream much better (see the toy sketch at the very end). It's called split-stream compression, and it's a proposal for WebAssembly in general, more general than ewasm, but I don't think anyone is working on it right now.

Another proposal is on the client: we might be able to recover the wasm binary from an instance. Right now you may have to store both the binary and the instantiated module; but if we restrict the binary format in a few small ways, then we can jump back and forth, recovering the binary from the instance and instantiating from the binary, so we only store one of them instead of both.

Other topics; we're running out of time anyway. There are some future proposals. Multi-memory would be great for us; multi-value returns would be great for us, and that's going to come soon, I think we found a champion; multi-memory with global arrays, I don't think that's going to happen. The memory page size is 64 kilobytes, which is too big; if it were much smaller, it would be better for us. Contracts might want to share code: you don't want to re-upload Blake if it's already on the chain, so how do we share code so people don't have to re-upload things? And how do contracts call each other? You can import a function; we have the table section, which is related to function pointers; or we can go through host functions, so there could be a host function, call it call, for calling another contract. Going beyond how contracts reach each other, how do they communicate? Arguments and returns, shared memory, call data and return data: we could go through the host again, or we could have some shared buffers. All these things are being discussed; I'm just giving you a sketch.

The team is doing a lot of good work; there are all these different concerns and all these projects going on, and we want to build the best system possible. And to stick to the last point: we're trying to standardize with other blockchains. We're talking to Parity, we're talking to a bunch of other people, and we want everyone to share infrastructure, so you can compile your stuff and then, whenever the next blockchain comes out, you can deploy there too. That's it.
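Finally, as promised, a toy sketch of the split-stream compression idea; the opcode-to-immediate table here is invented for illustration and is not the real wasm binary format.

```python
# Split-stream compression: separate the opcode bytes from their
# immediates so the repetitive opcode stream compresses well on its own,
# instead of being diluted by arbitrary immediate values.

import os
import zlib

IMMEDIATE_LEN = {0x01: 0, 0x02: 4}   # toy opcode -> immediate size

def split_streams(code):
    """Split a byte stream of (opcode, immediates...) into two streams."""
    opcodes, immediates = bytearray(), bytearray()
    i = 0
    while i < len(code):
        op = code[i]
        n = IMMEDIATE_LEN[op]
        opcodes.append(op)
        immediates += code[i + 1 : i + 1 + n]
        i += 1 + n
    return bytes(opcodes), bytes(immediates)

# Build toy code: a regular opcode pattern carrying arbitrary immediates.
code = bytearray()
for _ in range(500):
    code.append(0x02)
    code += os.urandom(4)   # immediates are arbitrary values
    code.append(0x01)

ops, imms = split_streams(bytes(code))
combined = len(zlib.compress(bytes(code)))
split = len(zlib.compress(ops)) + len(zlib.compress(imms))
print(combined, split)   # the regular opcode stream compresses well alone
```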