They're subjecting me to this again, and the title was supposed to be a placeholder, but George liked it so much he made me keep it. Louder, louder. I was supposed to have a headset too, so I didn't have to do this, but here we are.

My job when I got here was to take this interpreter and make it go faster. I wanted to make it scream, like this. This was exactly the image I had in mind. There was a kid in junior high art who just drew pictures like this over and over. I loved them. The art teacher didn't. And there's a little video you can play here. This is the JIT that I'm competing against. There's no sound. It's screaming. Oh well, you don't get to enjoy the sound.

And what did I have against that JIT? This is a Plymouth Belvedere. It's a comfortable family sedan. Compared to the JIT, it looked like this. It was pathetic. It was really pathetic. And what was I going to do?

Okay, to make it faster, the first problem was that I didn't really know how the code worked. I couldn't really understand the Yellow Paper. Some people tell me it's wonderful, but I'm a machinist, not a mathematician. So the idea was a small series of correctness-preserving transformations, assuming the code was correct to begin with. I make a small change and I run a test. And Dmitri here had a wonderful suite of tests. I'm not sure how many tests there are, Dmitri — I know at one point I broke thousands of them. And I wasn't looking to make any structural changes to the machine. Just get out of the way anything that was hurting the performance.

A lot of the instructions were using infinite-precision arithmetic. Almost none of them needed to; I was able to get away with 512 bits. The gas calculations were being done with 256 bits. The Go machine had already switched to 64 bits. Two to the 256 is approximately the number of atoms in the universe, which I thought was probably more gas than anybody was ever going to need.
Also, there was one routine which was doing all of the gas and memory computations, and it was doing it for every single instruction that ran. That routine was generally more expensive than the computation itself. So that was a matter of breaking it into little pieces and running only those pieces that were relevant to the instruction I was trying to run.

There it goes. And so when you're done, you've stripped out everything from that Belvedere that was in the way of making it go faster, and you have the Road Runner, which was Plymouth's attempt to compete with the Dodge Charger. And it did pretty well. And this is the interpreter which is shipping in today's code. The loop test is hard to make much faster — it just goes in a circle, and the JIT sees that circle and reduces it to almost nothing. The RNG test is a little random number generator, an old-style generator that just does a lot of computation. That's respectably faster, but not where I wanted it. The RC5 test is the old RC5 cipher. It's not much used anymore, but it's a good example of an actual program.

To get further from there, the approach was to start making actual changes to the structure of the interpreter. I wanted some new, faster opcodes. I wanted a constant pool: currently, if there's a push instruction, the constant to be pushed is laid out right in the bytecode, most significant byte first, and it has to be loaded onto the stack one byte at a time. So I create a pool of preloaded constants. I can index them with one byte and move them onto the stack with a quick assignment. That's specifically a new PUSHC instruction that just has that one byte, so I can just index in.

The other thing I did, which Jeffrey has also done in the Go interpreter: all of the jumps in the EVM are computed go-tos. How many FORTRAN programmers are left in this room? Wow. Three. I make four. That's scary.
It's the only language I recall, besides assembly, with computed go-tos. The trouble with a computed go-to is that at runtime you have to check whether it's going somewhere valid or just landing in the middle of nowhere, and that takes time. So what we do is, at load time, we look to see: is the place it's going to constant, and is that place correct? If so, I replace the jump with a JUMPC or a JUMPCI, and I can just do the jump at runtime. Both the Go and the C++ interpreters have found that to be pretty advantageous.

And so, tomorrow's interpreter — which is running on my machine right now and not anywhere else — is starting to look pretty good. The loop's a little better, still not wonderful. The random number generator is starting to get pretty competitive with the JIT, and RC5 is really doing pretty well. Paul's got some work to do to catch up with me.

And here we have a race. The JIT is the Charger on the left, and we have the interpreter on the right. Make it scream — roll it. Where'd Soup go? There's a video to play here. Somebody knows how to play it. This guy, he's the video guy. That's a quarter mile. Yeah, the heck with Teslas — they don't burn enough alcohol. So that went backwards.

So where to go from here? Well, to get beyond this you have to start doing something completely different. This is a blown Hemi that burns alcohol instead of gasoline. And these are the ideas on the table that we're going to work with.

First, just some fine tuning. Right now we're using the Boost library for the 256-bit arithmetic. It's really designed for much bigger arithmetic, so right now it's using five 64-bit words to do just four 64-bit words' worth of arithmetic. So I'm going to look at the GMP library, which is written in assembly language instead of C, and we'll see if we can do better with that. If nothing else, those four words will line up on cache-line boundaries, and that's always good. And 64-bit arithmetic.
We've got to get some opcodes for 64-bit arithmetic, because for counters especially, nobody's going to count up to the number of atoms in the universe. The interpreter can only make a little bit of good use of this, but the JIT can really, really be helped by knowing that it's only got 64 bits to work with.

SIMD. An awful lot of silicon on chips right now is SIMD operations, and a lot of crypto code can take really good advantage of SIMD registers. So we're looking at that.

And more structural changes. Right now the EVM is a stack machine. If you want to add two numbers, you push a one on the stack, you push a two on the stack, then you say ADD. It pops two of them off the stack, pushes another one on the stack, and off you go. This was really great for hand calculators — a very good way to build hand calculators. Real chips these days are register machines. For a JIT, that translation is just part of the job, but JITs tend to run in the background; they can run slowly. So I'm looking for a very fast O(n) algorithm to get me from stack code to register code. I've done it before for a different VM, a Java VM, so I've got to find a way to make it work for the EVM. And I'd probably go with just 256 registers — I think that'll be enough for most purposes. Again, I can index them with one byte, so I don't blow up the code too much, and the code becomes basically three-address code: one byte for the operator, then three bytes for one source, the other source, and the destination.

And that goes backwards, but gee, I like that. I want one of those. And we'll see what happens. And that's that. Who's next?