I can't see any of you, which is a bit of a relief today, because there are people in this audience who are going to know what a pile of rubbish I'm about to talk. It's so rare I'm ever anywhere where anybody actually builds VMs. I can just wander around at conferences and say: you do a bit of this, a bit of that; they taught it to you at uni. Today isn't really going to be that talk, because it's kind of mad. Hands up anybody who's ever thought it would be a good idea to write a virtual machine in Ruby. Okay, there's not a lot of us. Hands up anybody who's ever seen any random talk I've given on this subject in the past. That's good. I'm trying to be more arty as I get older and actually put more human-friendly bits in slides. I do promise the wall-of-code effect will kick in soon. We're going to build machines for the simple reason that we talk to machines and machines talk to us. We spend an awful lot of time worrying about process and testing and forcing the damn machine to do what we've told it to, but we spend very little time asking machines what they think. That's kind of a strange point of view, because if you think about it, you can model everything as a machine. I've definitely got a whole pile of parts that are separate machines all bound together in an overarching one. Machines obviously have opinions, because I've got a hell of a lot of opinions. So we like to think we're philosophers and wizards, and we wander around and we talk all of this lovely language around thinking stuff, and I think what our machines actually need are some good engineers and some good analysts, and I do mean psychoanalysts in a lot of cases. A lot of machines are born buggy, and they need as much love and comfort and empathy from us as they can get in their little silicon diodes. So this is sort of a talk about machine philosophy, but I promise this is the only part of this that's going to be excruciatingly human.
So I'm going to frame it all, though, in a language that we will understand: Ruby. And we're going to learn to love our Turing machines today, I hope. I'd say let a thousand flowers bloom, but I can't help but think of nuclear bombs going off when everybody says that. So I don't really want this audience to run away and build a thousand digital nuclear bombs and unleash a thousand awful languages on the world. Except, please do. I'm going to do that by making some machines of our own that are images of machines. We're going to build metamachines today. Who plays with system virtualisation on anything? I could filter you out into Parallels or VMware or VirtualBox users, but most people think of that when they think of virtualisation, or maybe they think of containers and Docker, which is sort of a lower-level abstraction. We all at various times play with hardware emulation, and this is the point where I now do what you're not supposed to do and break the fourth wall. There is an entire world of online game communities obsessed with playing games from the 80s, the whole retro gaming culture. They rely on emulators to do that, and that means somebody has to implement old machines in software so that they actually work the way old machines worked. I still run a Psion Series 5 environment on one of my palmtops, modern PC palmtops, because it's got one of the nicest basic interfaces for just typing and getting random stuff out of my head when I'm travelling. Plus it's really nice to program in old languages, and their version of BASIC that they never called BASIC was so dumbheaded that you end up having to do your own memory management to do anything, which is BASIC where memory management is so much fun. This down here is Doom running, it thinks, on RISC OS, being emulated in Linux, running in Firefox, written in JavaScript. It has a full networking stack and it works.
You can actually play, you can do LAN parties, playing Doom together from your web browser, which is kind of neat. Over here, I don't think any talk that involves emulation is complete without at least mentioning Einstein, the Newton MessagePad emulator. Now, this flattens modern phones; they're not powerful enough to emulate a Newton. And this is really bizarre, because Newtons are StrongARM processors, so they're actually ARM chips, but modern ARM chips at 1.6 gigahertz with quad cores have great difficulty with a 166 megahertz single-core StrongARM chip, which I think is just so phenomenally weird. You'd think it would be much easier. The kind of VMs we're interested in today, really, though, are more about program execution. We're interested in what a machine is, and that's not what somebody else's machine is; that's basically figuring out for ourselves random machines we can build. So I have four basic virtual machines up here. This over here nobody even thinks of as a VM. This is the .NET environment, the Common Language Runtime. You never program that as a VM, because it's all taken care of by compiler tools, but it's a virtual machine that describes how .NET actually hangs things together. We all know the JVM because it's almost impossible to get away from it. I mean, my inbox is so full of Java job ads, and I haven't done Java commercially since 1998, but there's so much need for Java programmers that somebody who hasn't done it since 1998 looks like a fair bet for a job. I don't think it's even the same language in many very basic structural ways. These down here are two of the simplest virtual machines we ever deal with. One's the Forth VM. Any Forth programmers? First audience in years I haven't had a Forth programmer in. I must be moving in some very strange circles. This is the C virtual machine, or at least a simplification of it. C has a virtual machine.
When you write a C program you're programming for a virtual machine. You just don't know it. This over here, who loves functional languages? This is very famous. This is the SECD machine: stack, environment, code, and I can never remember what the D is. But basically most functional languages are built using this particular virtual machine structure inside, because it's the easiest way to actually do functional guarantees whilst living in an environment that's full of mutable stuff. Under here, inside your functional language, is a mutable runtime. You're running on Erlang, you think you're safe? The BEAM VM has mutability in it. You need mutability at a certain level. This is my really bad and awful hand-drawn diagram, which we will see bits of throughout this talk, explaining approximately what a computer is. These timings are all very approximate and some of them aren't at all really true; they just look true. At heart, when we're programming a computer we've got a series of instructions we want to get to happen. We have some state that makes them happen in the right sort of order. We've got this lovely CPU here. My MacBook Air can execute a single opcode in around a quarter to a third of a nanosecond. Thanks to yesterday I'm going to get my measurement right for once: that's about a light-nanosecond. My MacBook is accessing a register or doing an add operation in about the time it takes light to travel that distance. That's really small. We don't normally think about doing stuff in the frame of light hasn't gone very far yet. It's quite important to know that, though, because all of these are in orders of magnitude, so my registers are as fast as my CPU, really, for all effective purposes. I've got some kind of stack, that actually probably lives in those registers really, that allows me to do calls and returns and things like that.
I've got a cache, a level one cache, a level two cache, to talk to main memory, because main memory is slow and there's this narrow pipe to get there. I've got a processor that's fast and it wants to be doing as much as it possibly can, so it's going to slurp stuff into that cache and try and live in just that cache as much as possible. Already, by the time you're getting out to main memory in the heap: well, if that's a register access, a hundred of those in a row is a heap access. I mean, you can actually start to visualise these things, why some things are a lot slower than others, in a way that you can't when you just see numbers on a board. We've got a whole pile of buffers that control our IO so that we can talk to the outside world. We've got this lovely hard drive over here. Who still uses a hard drive in anything that they actually run on? Who's moved over to SSD? Oh, you would. Who's using SSDs then? What do you think is actually the speed gain that you get out of using an SSD? Why do you think an SSD is so much faster? Is it because it's 500 megabytes a second of transfer speed? It's the latency. This over here, by the time you get here and you're talking to a hard drive, not only are you getting something every few milliseconds or tenths of milliseconds, but every time you want to talk to it you're waiting for milliseconds at best for the drive to move its head to another place to start reading. At this point that's so phenomenally slow compared to a computer, it's kind of like we could send a space probe and in principle get to Alpha Centauri before this is finished, and figure out what the space probe found. That's the difference in order of magnitude. Then by the time you get to the network, you're now so swamped by the cost of light moving and electrons moving that really this is completely decoupled from the idea of computing.
Every now and then, over long glacial ages, data turns up from a remote place, and you can do something with it, and then you can send it back and wait a very long time. I spend far too much time poking around in these things. Normally when I give this talk I'm giving it to a Go audience, and so the first thing I dig into is playing around with memory, because, yeah, it's a horrible problem to solve in Go: being able to use a piece of memory for two different types at the same time. It just doesn't like it and you have to hit it with bricks. Whereas in Ruby, if I start talking about memory first, I'm going to start talking about Fiddle pointers, because there's this whole thing in the standard library called Fiddle that's a foreign function interface, and how many people here really want to think about malloc right now? So we do dispatch loops instead; they're more fun. This is where machines are actually being machines; this is where machines are kind of having to deal with the fact they're stateful and they live over time. So a CPU effectively consists of a dispatch loop that fetches an instruction, decodes it to figure out what to do with it, and then executes it. The execution is quite interesting, because the execution is not part of the decoding; it's a separate thing. Whereas a lot of people think, oh, I've got an instruction, so it just happens. Under the hood there are millions of transistors going off in your average modern piece of hardware to figure out what the correct decoding actually is, because you're not normally running on a machine that's got the same instruction set natively as the one you think you're programming for. They all do dynamic recompilation, effectively, in the core, from what looks like 8086 assembler or something into what looks like some bizarre selection of ports.
It's something called transport triggering, where effectively you write two addresses and then something happens because you wrote to the right magic number, which is not how programmers think at all, except if they're doing web apps. So we're going to start off with a C switch. I'm not really going to talk in great detail about the code. There's loads of code in here because there are people who download my slides; I don't blog, so this is the closest to publication I ever get. There are a few simple structural features, though, that are easy to point out. Firstly, I'm taking it for granted we have some kind of stack abstraction. We can push things on, we can pop things off; that's all we really need to know about it. We don't care what kind of stack. We might have time later in the talk for me to introduce you to the two main kinds of stack. But then again we might just run out of time. There's always hoping. First thing is we need some opcodes. We need some instructions that our program can run. So I've defined four very simple instructions: we can do a push, we can do an add, we can do a print, and we can exit a program. And those are literally just numbers. They're just labels that I need to be able to use later to do something. Down here we have a nice example of a program, and I've been really lazy. These are all integers. There's no attempt at any clever typing to differentiate instructions from numbers. And in fact, inside the average processor there's no clever attempt to differentiate numbers and instructions. That's why you have to be careful with buffers and not let people overrun them, because all kinds of weird things can happen. As an aside, many of the best games that were ever written for the Zilog Z80 used opcodes that Zilog didn't know existed, because they were never part of the specification. They were bugs.
There were accidental overlaps of particular sets of gate paths that 13 and 14 year olds in their bedrooms discovered by just trying to write every single possible number as an instruction to see what happened. And then you suddenly find all kinds of things. So you've actually got emergent behaviour even in a processor, simple as that, that the designers didn't even know was there. I think it's really amazing. So we've got a stack that's storing some state. We've got a program that's going to run. And we have a very simple interpreter. I put a couple of registers in because, for me to make sure I do add correctly, it's nice to know the left and the right operands that are going to give me the answer. I'm basically quite dumb at this stuff and I like to make it as explicit as possible. And we've got this little loop that will just go on forever. We can just leave that running and we can just inject numbers into it, and it will try and do something within the frame of what it's allowed to do, what it knows how to do. It's a very dumb machine, this one. All of the ones I'm going to show you today are. But as you can imagine, push, add, print, those could all be much more complicated operations. Anything that you can imagine as a discrete step in some kind of process can be an operation. And then eventually we just interpret the program. Now, we can do this sort of stuff in Ruby as well. People don't do this sort of stuff in Ruby that way; we'll get to some ways later that are much more elegant, I think. But it's very simple, and yet again the code is almost identical in structure. We can use an array as a stack, because obviously arrays have stack semantics, which is nice. And we've got a program counter just so that we know where we are in our program. We can step through. And you'll notice sometimes our operations themselves read the program, which is nice. This is a proper Turing machine.
It can read its own tape and decide to do things with the tape based on what it reads. So we've got everything there you need to go off and build moon landers, nuclear bomb timing controllers, or probably not that, because we haven't got any concept of timing here yet. And as you can see, we have a very elegant way of expressing our program, because we've got symbols. Now, symbols under the hood are basically just incredibly big numbers that aren't going to collide very often. But being able to read push 13, et cetera, compared to having to do a cast of PUSH, is just a little bit nicer. I think I prefer it. I prefer Ruby to C anyway. This is going to be my favourite Ruby comfort moment, because I put in a proposal about six years ago, real-time Ruby, and it got turned down for being ludicrous. And it's not ludicrous anymore, which really cheers me up. So we can do slightly more complicated things. What we were doing previously was just having a great big switch statement. But switch statements are quite slow. C's got a whole pile of boilerplate it puts down that slows down the dispatch in a switch statement. So what we can actually do is optimise that a little bit by dealing not in a switch, but in a set of pointers that will go off and do something. So with direct call, effectively, when I read a program, it tells me exactly where I should go off and run a piece of code. I don't have to interpret it; I've already got it. Effectively, it's compiled down to where it lives. And it's a direct call. So it's nice and easy to understand when you're actually writing these things. The code is shorter. In fact, most of it's just disappeared. I mean, basically, we just start interpreting by reading an opcode and calling an opcode, and then we just go round and round and round. And we can sort of do something similar in Ruby.
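As a rough sketch of the kind of switch-style interpreter being described (the class name, opcode names, and program here are mine, not the actual slide code), a case-statement dispatch loop in Ruby might look like this:

```ruby
# A minimal switch-style dispatch loop: an array as the stack,
# a program counter, and symbols as opcodes.
class TinyVM
  def initialize(program)
    @program = program
    @stack   = []
    @pc      = 0
  end

  # Fetch the next cell of the tape and advance the program counter.
  def read
    value = @program[@pc]
    @pc += 1
    value
  end

  def interpret
    loop do
      case read
      when :push  then @stack.push(read)   # the operand comes off the tape itself
      when :add   then @stack.push(@stack.pop + @stack.pop)
      when :print then puts @stack.last
      when :halt  then break
      else raise "unknown opcode"
      end
    end
  end
end

TinyVM.new([:push, 13, :push, 29, :add, :print, :halt]).interpret
# prints 42
```

Note that the :push handler reads its operand off the tape, which is exactly the "operations themselves read the program" trick mentioned above.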
So I've put a few bits of additional handy boilerplate in here, and I've also put in a problem that I'm going to explain and then show you how to solve. So firstly: if we want to have anything that's looping around permanently, and that we also want to have a nice clean exit from, we can use catch and throw for that, because that's quite cost-effective for unwinding the stack, and that will get us out of our infinite loop. Up at the top, I'm doing something a little bit strange. I'm taking in a program, which is basically an array of data I want to pass through to my dispatch loop. But I'm checking to see whether or not any of those things that are being passed in are methods that I actually understand. I'm actually compiling directly down to the method calls, which, I was expecting this to be quite difficult to port from C to Ruby, but it turns out it's actually trivially easy. So now when I'm actually running my program, my VM object is just going off and asking itself: ah, okay, I've done that, where should I call now? Where should I call now? And it's just calling itself internally. I'm sure there's probably some kind of stack nightmare going on, but I'm not going to worry about that. I've got this thing over here, dangerous method. This is a method I don't want run. If this program down here is actually run, it crashes, because obviously it raises an exception. It never actually completes running the program. So I'm going to then solve that, because you should never, ever allow data into your program without properly sanitising it to make sure it's stuff you should run. I mean, effectively, back there we have the VM equivalent of an SQL injection attack. Here we don't, because we've got a blacklist that says: I'm not going to allow any program that runs to run my load, my compile or my interpret methods, and I'm not going to allow it to run any method that's defined on Object either.
It's only allowed to run things that I, as the VM, have decided to allow to be available. Obviously, it goes without saying that the code from the previous slide would be used in a subclass, so I won't say that later. Also, because we're compiling, it's kind of nice when you're compiling to give some kind of useful errors back to people, to explain why you're refusing to compile the program you're compiling. So down here, we allow people to actually pass in actual methods, not names of methods. I thought it would be quite nice to just allow people to pass in random methods. Who knows? Somebody might have written a clever method that happens to be able to close over my current object and do something useful. But it would be kind of nice to make sure that that's not on my blacklist. So first I'm going to check that it's not called something that I've said I don't want to deal with. I can't do this version of this talk in Go at all. There is no way to do most of this stuff in Go, which is so damn frustrating. Seven years of abusing its reflection library and I cannot make it do this, which is really, bearing in mind that's what everybody remembers me for: oh, she does Go. To have one thing that you love that you can't actually do in the language everybody wants you to talk about is really frustrating. But also, we can then go and check: is that method actually known? I mean, there's no point compiling it in if it's not known. You could say that's a little bit naughty in Ruby, because my object might actually acquire that method in the future. I'm sort of cutting it off from being able to do lazy compilation, effectively, which is a bit of a weird concept to start with. In this case, I've got a method. I have to unbind it from whatever object it originally came from and bind it to the current object, which is yet again a lovely thing you can do in Ruby. I'm sure you shouldn't do it in production code.
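A hedged sketch of that direct-call, blacklist-checked compilation idea. The class, method, and error names are my own inventions, and rather than unbinding and rebinding arbitrary methods (which Ruby only permits between compatible classes), this version simply calls a passed-in Method object directly:

```ruby
# Direct-call dispatch, sketched: opcode symbols are resolved to Method
# objects once, at load time, instead of being looked up on every step.
class SafeVM
  # Internal machinery no program is allowed to invoke.
  BLACKLIST = %i[load run dispatch read].freeze

  def initialize
    @stack = []
  end

  # "Compile": turn each symbol in the program into a bound Method (the
  # Ruby analogue of a function pointer), refusing blacklisted names and
  # anything inherited from Object. Methods passed in directly are
  # checked against the blacklist by name.
  def load(program)
    @program = program.map do |op|
      case op
      when Symbol
        unless self.class.instance_methods(false).include?(op) &&
               !BLACKLIST.include?(op)
          raise ArgumentError, "refusing to compile opcode #{op.inspect}"
        end
        method(op)
      when Method
        raise ArgumentError, "blacklisted method #{op.name}" if BLACKLIST.include?(op.name)
        op
      else
        op                      # plain operand data: leave it on the tape
      end
    end
    self
  end

  def run
    @pc = 0
    catch(:halt) { loop { dispatch } }   # catch/throw gives a cheap clean exit
    @stack
  end

  def dispatch
    op = @program[@pc]
    @pc += 1
    op.call if op.is_a?(Method)
  end

  def read
    v = @program[@pc]
    @pc += 1
    v
  end

  def push; @stack.push(read); end
  def add;  @stack.push(@stack.pop + @stack.pop); end
  def halt; throw(:halt); end
end

p SafeVM.new.load([:push, 2, :push, 3, :add, :halt]).run   # => [5]

begin
  SafeVM.new.load([:interpret])          # not a known opcode
rescue ArgumentError => e
  puts e.message
end
```

The interesting design point is that the refusal happens at compile time, with a useful error, rather than at dispatch time; a rejected program never starts running at all.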
But as a result, I can now basically just inject a method at runtime and say, please handle this dispatch, and it works perfectly well. I can't take that to Go; there's no good way of doing that in Go. But at the same time, also, I've got to do the normal boilerplate to make sure that people aren't calling things on the blacklist. So, this is already, effectively, a sanitised, sandboxed VM. It's already able to say: sorry, those instructions aren't allowed, they're dangerous. There aren't any talks I'm aware of that show you how to do that in C, because it would be a phenomenal amount of code to do it. So I'm really pleased with this. It took a day to knock together, and I was just so wow. Our interpreter is exactly the same as it was, I mean the actual interpret function. Here's a little bit of program code that will go off, and some of it's going to pass and some of it's going to fail. It's a very small amount of code to actually have a decent VM structure. Now we get into the bits that always make my head hurt. So, basically, direct calls and switches are available in, well, switches are available in every programming language. You can always fake them with if statements if you have to. Direct calls are available in any programming language that will either give you a function pointer or allow you to dynamically call a function. Once we get into indirect threading, we start moving into territory that used to be specific to GCC; Clang has finally acquired this ability. The ability to have a label in a program and actually treat that address as a useful bit of data with which to figure out where to do a jump to. This is a great way of speeding up the insides of dispatch loops. Because now, instead of having to individually work out the next one and jump to it, or do a function call to do it, which means creating stack frames and all the nonsense that goes with that, and then rolling them back.
Now, I can literally say: well, that's like 40 bytes away, please do a local jump back 40 bytes, which is like one opcode, and it's going to be in the cache; the code is going to be in the program cache, all the data is going to be in the data cache. It's going to be very inexpensive. We're basically running with this now in our 10 to the minus nine, rather than running in our 10 to the minus seven, potentially, with function pointers that might involve having to call things from the heap. That's basically a hundredfold theoretical speedup, which is lovely. I don't like the notation that you have to use in C. Double ampersands look entirely wrong, but this bit of code I think is possibly relatively easy to follow. We've got clear gotos that are telling us where to go to, and things like that. Indirect threading isn't as fast as direct threading. Direct threading is effectively JIT compilation of indirect threading, for one of a whole string of garbage terms that we use in the industry. What we're basically going to do is write a routine that knows how to compile a whole set of the opcodes to the offsets that they'd actually have in the function that's the dispatch loop. When I first wrote this program, of course, I forgot about the compilation step and wondered why I kept getting segfaults. Because, well, you don't always reliably get segfaults. This is quite strange. This is the point where I always need a whiteboard. I have a call stack, and so I decide to go off and I call a function. I've got a whole pile of stuff on that call stack that now makes sense. I've got addresses, relative to the start of that function's code, to be able to do jumps to, and then I return from that. But say I keep all the numbers that I had in there: I take back the numbers that told me the locations, and I carry them back up with me.
That seems quite reasonable if you've just spent seven years doing Go, with escape analysis, so that automatically everything in that state has just been shoved on the heap by the compiler to make sure it won't go out of scope. But C doesn't have escape analysis. The moment I dump the stack frame by going up the call stack, all of that's gone. I'm no longer accessing memory that's actually pageable by my process. Sometimes it will be accessible, sometimes it won't. It's down to whether or not the operating system has had time to clear out the stack frame and do various bits of patching, and whether there are dirty pages on the hardware, and all of that. So sometimes I could effectively pass up a version of the program from the dispatch loop function when it's been called once, and it might run perfectly well the next time I ask it to, and then other times it will fall over. It's the only way to really do direct threading and be able to understand how the code works, so I thought, well, we might as well fix that. So what you basically do is you have a compiler: when you go into your interpret function, which is actually going to be your dispatch loop, you pass into it a dispatch table that tells another function that lives under it roughly how to patch up the program. And then it gives you back this thing that you just run, and you can leave it running, or you can pass it around useful data later, because you've abstracted away the two sections. That probably makes no sense in English, but I can assure you that the C code works for doing it. I'm not sure. Yes. So apparently I am going to show some code, which means firstly I have to remember which piece of random software on here it is. Is it three for dispatch loops? Yes. Oops. I don't still have the version that does the segfault, though, unfortunately. So we're in direct threading, aren't we? I should mention I'm a Luddite and I don't get on very well with technology.
Nobody ever believes that because of the kind of talks I normally give, but it's really true. I'm not the person that any of this stuff was invented for. So, effectively, run. We get a very unexciting output, because all we're doing is adding two numbers. But we're adding two numbers where we've just JIT-compiled down to the most efficient form of dispatch loop possible, in around about 20 lines of C total for running the dispatch loop and doing the compilation, which is kind of neat, and makes me think I might yet be able to become a C programmer one day if I can get my head around that one. I forgot my mouse. So, because it's actually not that easy to deal with writing large dispatch loops with lots and lots of different labels in, I thought it would be quite nice to do C's equivalent of meta-programming. So: hello, macros. Now, the previous is certainly much less code for doing this kind of problem as a toy. Oops, wrong direction. Wrong direction. But now I've got this nice little macro language that says: I'm writing an interpreter, I'm defining some primitives to do particular things. And here, by the way, is the actual compilation table. This is a compilation table. These are what the compiler has to know about to be able to do dispatch compilation. And then this is the, oh, I've done it in the wrong order. That's actually the interpreter side. That is the dispatch table. It is too early in the day for this. The compiler follows an almost identical structure to the interpreter. Now, that's not surprising. The place where this technique is most often used is in the insides of Forth interpreters. So Forth uses an immediate mode that will actually interpret a program, and it uses a compilation mode where it will actually create this dictionary that consists of very optimised versions of its understanding of the program.
And the two look almost identical, except that one of them is producing actual addresses that will run directly, and the other is effectively producing labels that can be patched to addresses later. So, right. I believe direct threading can be done in Ruby, but I didn't really have time to write it. I thought I'd have time to write it, but I spent far too much time playing around with Fiddle::Pointer, which we'll get to, possibly, in a while, because you can malloc in Ruby, and build really efficient doubly linked lists. Sorry. Unfortunately my monkey mind sometimes gets taken over when I'm doing this sort of stuff. Direct threading is probably not a good idea, because effectively we would be looking at something even more in need of compilation than the last bit of Ruby code, but it's something I plan to play with. What would actually happen, in principle, is we'd have a one-way execution flow where we would keep calling another method and calling another method and calling another method until we ran out of stack. But hopefully, if we have a return somewhere down that loop, we'd work our way back up the stack and not actually blow our computer's stack to hell with a Ruby program that is effectively just an infinitely deep stack. So that's why I haven't got a version: because I know it's going to be a pain to actually debug, and it's going to be a couple of days, probably best done with alcohol. Now, a dispatch loop is half of what's in a CPU; the other half of what's in a CPU is registers, and, depending on how we're doing for time, when are we actually supposed to stop? 12? Ten past? 12:05? We probably won't get on to playing with memory today, then. Which is odd, because that's, that's, yeah, okay. So when you build a machine you get to pick a level of abstraction that your machine works at. To be able to do anything useful, we're going to have a standard harness that we're going to use for the next few examples.
This is a virtual machine as far as I'm concerned it's all the cool bits out of the last bit of ruby code all put into one VM class that we will subclass. I'm not sure that subclassing is really the way I'd want to do it in practice. I fell out of love with with class hierarchies some years ago. They always seem kind of fragile in all the wrong ways as if there's fragile in the right ways but I mean they don't but they do so we've got a very simple structure for running any kind of interpreter well virtual machine we want we've got the ability to load the program we can run the interpret we can read the program and then we can do the compile step to get it down to the efficient direct calling of methods and the program we're actually going to run is an adder we've just invented an adder we're one third of the way to building an ALU and our arithmetic logic unit we can add two numbers what I've done that slightly radical is we can add negative numbers so there's I'm going to hop back up there in a minute and show this actually running but we're going to print out a bit of state while it's running it's just going to tell us what's on the stack we've got the ability to push some data on to the stack and when we do an ad we're actually doing we're actually consuming the top two items on the on the stack and putting the answer back in place of them outside of that is also jump if not zero because up until now we haven't thought about branching or about being able to move around in the program ourselves we've only thought in terms of this linear flow there's an infinite number of kinds of moving around in a program that you can choose to make hot codes interestingly our program is quite simple we're going to add 13 we're going to add minus one to it repeatedly and down here if we don't hit until we hit zero we're going to keep jumping straight back up to the the push where we push the minus one whoops rather than push me as you can see this is not the nicest of formats 
for writing programs with loops, because obviously I have to count the opcodes to know where I'm going to jump to. There's stuff a compiler would do to make that go away, but we're not interested in that kind of high-level arty-farty highfalutin nonsense; we're just interested in feeding more opcodes into a machine and making it work. We're doing Jacquard looms here, not anything more modern. So, let's see if we can run that. Wrong terminal; actually, I don't need to do that, because I've got a hashbang on it. Right: we have sat through the requisite number of loop iterations counting down from 13. We don't see the initial 13, but we do see the zero, because we're doing the print-state too late, but anyway, as you can see, we have a working loop. It's trivial to implement. The first time I started getting into this stuff I was 14 or 15, and I had a home micro running BASIC, really horrible early-80s BASIC. It was exciting because it was called Locomotive BASIC, so it was obviously fast. One of my favorite magazines at the time, dedicated to my particular micro (because obviously all the different versions of BASIC were subtly different in annoying ways), had a two-part article that was, basically, how to learn to program in Forth. The guy who wrote it wrote this program in spaghetti BASIC, the worst spaghetti BASIC with line numbers and colons in the middle of lines that you have ever seen, but it ran basic Forth programs. I kind of fell in love with it in all the wrong ways, and it was a great antidote to playing Elite, which is what I'd spent most of my time on a computer doing before that. So these kinds of things give me a teenage nostalgia buzz; I apologize if nobody else gets one, but I think it's kind of neat that you can do it. Stack machines have the limitation that they're based on a stack. Now, that's really nice from the point of view that
you don't have to worry about decoding operands; you don't have to take things out of program memory. But on the other hand, stack machines just aren't as compact as things that have actual memory addresses to access (you can have a more compact encoding), and also they're not necessarily very fast. You can do all kinds of optimizations that speed up stack programs, and I have known people who, when writing low-level maths routines, insisted on doing them in Forth rather than assembler because they got faster results. There are odd things you can do in the design of the actual stack machine, but still: they're simple, but they're kind of horrible. So we're going to move on to an accumulator machine. Now, I'm not sure if anybody ever really built one of these, but effectively this is where you have one register, and that register is where everything happens. If anybody did ever build one of these, it will have been because that register was actually sitting on the same lump of silicon, not over there on a different breadboard's piece of silicon with a whole pile of cabling in between; this would have been a huge optimization in certain kinds of old machine architectures. There's not a lot we actually have to change to be able to cope with having a register. I've still got the push and the pop in there, because I have no way of referencing registers or memory; I've just got this one register that will carry a result, and it slightly changes how some of the operations occur. So now I'm not having to index into an array to figure out where to check a value to do a jump; I'm going straight to it, which, if you were doing it in C, would be a great speed-up. But as you can see, our program's kind of grown a bit out of control to do the same thing. Accumulator machines are really inefficient in their bytecode format, but it does basically the same thing. I'm not going to bother running that one; it's not a very interesting one. Register
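machines, though, are where we go next. Before that, here's the accumulator design above as a runnable sketch; this is my own reconstruction, the opcode names are invented, and the talk's actual version keeps push and pop around and is wordier:

```ruby
# The accumulator variant: one register (acc) where everything
# happens. The jump test reads acc directly, no stack indexing.
class AccumulatorVM
  def initialize(program)
    @program = program
    @acc = 0
  end

  def run
    pc = 0
    while pc < @program.length
      op, arg = @program[pc]
      case op
      when :load then @acc = arg;  pc += 1  # acc = literal
      when :add  then @acc += arg; pc += 1  # acc += literal
      when :jnz  then pc = @acc.zero? ? pc + 1 : arg
      when :halt then break
      end
    end
    @acc
  end
end

# The same countdown from 13.
AccumulatorVM.new([
  [:load, 13],  # 0
  [:add, -1],   # 1
  [:jnz, 1],    # 2: acc not zero? back to the add
  [:halt]       # 3
]).run  # => 0
```

So: register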
machines are what most of us are actually using; it's what our hardware is. Once you've got a register machine with more than one register, you start moving into territory where compilers can do clever tricks. You start moving into the territory where, instead of saying I've got an array of two registers, you can say I have a register file, please treat it as if I have an infinite number of registers, and then you get SSA, static single assignment (I never remember what it's called), which is where you get a single use of a register for a particular purpose, and then things map in the compiler and you get all kinds of optimizations. So this isn't just useful from the point of view that it's easier for me to think about loading the value 13 into register 0, then loading minus 1 into register 1, and then doing a whole pile of stuff based on registers 0 and 1; it's also really nice from the point of view of writing tools on top of it, of bringing languages to it. The thing is, once you've done a register machine, you realize that these registers back here, which I'm effectively treating as if they can just hold Fixnums, can hold all kinds of things. The most obvious speed-up that most of us have experienced over the last 25 to 30 years of using computers is SIMD, single instruction multiple data: effectively, vectorized operations. There's no reason why a register can't be a vector, so now you can do vector math with a single dispatch. Before we get to the city of London, which is hypercubes: a matrix machine is where you say I've got a matrix rather than just a vector, and now you can do a whole pile of stuff with matrix transforms, so you can build processors that can naturally do 3D movement. Or if you're doing high-performance stuff in physics, you often wish that machines had matrix registers, and they don't. But a hypercube, and this is basically what the city of London's financial trading systems run on, means we're no longer just looking
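at a single matrix.

Before the hypercube, the register-can-be-a-vector idea is easy to show concretely. A toy sketch, with invented names; this is the shape of SIMD, not real SIMD:

```ruby
# A toy register machine where a register can hold a vector, so one
# ADD dispatch does element-wise work: the SIMD idea in miniature.
class VectorVM
  def initialize
    @regs = Array.new(4, 0)  # a tiny register file: r0..r3
  end

  def load(r, value)
    @regs[r] = value
  end

  def reg(r)
    @regs[r]
  end

  # One dispatch, many additions: vectors add element-wise,
  # plain numbers add as normal.
  def add(dst, a, b)
    x = @regs[a]
    y = @regs[b]
    @regs[dst] = x.is_a?(Array) ? x.zip(y).map { |p, q| p + q } : x + y
  end
end

vm = VectorVM.new
vm.load(0, [1, 2, 3])
vm.load(1, [10, 20, 30])
vm.add(2, 0, 1)  # one instruction, three additions
vm.reg(2)        # => [11, 22, 33]
```

One `add` dispatch, three additions. A hypercube register means we're no longer looking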
at a matrix; oh no, we're saying we can have an arbitrary number of dimensions in each of these registers, but we're still doing it off a single dispatch. I mean, these are the kinds of tricks that are the reason why NumPy is so fast, and we could have that in Ruby quite easily, if anybody could be bothered. Hands up anybody who can be bothered? Please, we're two decades in; there must be somebody who wants to do it. Graph processors, now, this is one I've been thinking about a lot over the last couple of years. My actual professional job is designing anonymous identity systems, so I spend a lot of my time making data flow in one direction and trying to figure out how I find the right branch of a graph at any particular point. Once you can have a register that's a graph, and another register that's a graph, there's a whole pile of stuff you can do that you would never have thought of doing by writing code for it, because it would be too complicated; but now you're just dealing with the two things, and you can talk in terms of higher concepts. So we've moved, effectively, from the start of this talk, worrying about having some numbers that happen to be labels in switches to do some very simple things, to taking any problem we want and turning it into a machine that's reliably scriptable, where our program can be written just in the terms we want. Effectively, the more complex we make those registers, the more we've just turned assembly language into a DSL, which is, I mean, it's like I've been saying for years: we can do Ruby assembler. We're going to skip past the memory stuff, because we haven't really got time, even though it's really fun, and we're going to come up to the bits that I really shouldn't talk about. Once we've got a dispatch loop, we want to reach out to the rest of the world. Well, we can be polite about it and use Ruby C extensions, or maybe we can be polite about it and use Ruby Inline, which is
still a very polite way of wrapping bits of C and calling them from Ruby. Or we could use Fiddle's C bindings to get at C functions that way. Who remembers Wilson? It's always good having somebody in the audience who you know is actually as mad as you are. So, Wilson is an x86-64 assembler as a DSL in Ruby. In Fiddle, which is for FFI, we can malloc, so we can grab chunks of memory and write stuff into those chunks of memory. And if we happen to have a stream of x86-64 machine code that we've produced in pure Ruby, and we inject it into that, and then we decide to turn it into a function pointer and call it... I think we just optimized a Ruby program with a direct JIT to assembler. Now, I'm not saying that's a good idea, because for one thing this was written as a joke, and I make no guarantees it's robust. You can probably guess who's behind it; I'm not blaming them, but of course we all know who's behind this one. I would have liked to have done the memory bit, because one of the things about playing with pointers, direct C pointers, in Ruby is that you get a lot of segfaults while you're figuring out how it all works, and those segfaults aren't always bad things. A segfault is sometimes all right. If I've written a really complicated C++ program and I get a segfault, I've done something stupid and I'm never going to figure it out; I should throw away a whole chunk of code and start again. But when I'm playing around with pointers in Ruby, what I'm actually saying is: oh, I accidentally wrote outside the space I had available, I wonder why. And I can actually understand the code I've written (I claim, occasionally) and work back through it. With never-say-die I can capture the segfault and ask myself: did it really matter? Was I going to use that memory yet? No, I wasn't. Fine, continue. I have actually found a use case for never-say-die, so I'm expecting Aaron to buy me drinks for that. I found this talk yesterday; a friend of mine in the Go community posted it around.
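The Fiddle piece of that is standard-library Ruby, and the tame half of it is easy to show: malloc a raw buffer and poke bytes into it. (Actually executing injected machine code needs executable pages, which on most modern systems means mmap with PROT_EXEC; plain malloc'd memory won't normally run.)

```ruby
require 'fiddle'

# Grab 16 bytes of raw heap memory from Ruby, write into it, and
# read it back. Real code would free this, or pass Fiddle::RUBY_FREE
# to malloc so the GC does it.
buf = Fiddle::Pointer.malloc(16)
buf[0, 5] = "hello"  # poke 5 bytes in at offset 0
buf[0, 5]            # => "hello", peeked straight back out
```

Get the offsets wrong here and you're writing outside the space you have available, which is exactly the segfault-as-learning-tool game described above. Anyway, about that talk.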
This is well worth going and reading. He works at Valve, and he's mostly concerned with low-level optimization of code so as not to blow his cache lines, and it's just a brilliant talk. He's got 2D and 3D animations, done by proper professional animators at Valve, of cars not being able to get down the very narrow bus connection from main memory to the processor, and it really explains a lot about caches and memory latency. And then, finally, there are four books that are really interesting to read if you want to learn more about this. The one up in the top corner is an Elsevier publication, which means it's probably about $120; I think I finally managed to get a legal copy of it as a PDF when they had a massive sale on, for like $40. I've got a print copy, so I didn't feel bad about having a bootleg before that. There are only about four chapters in it that are genuinely useful, because most of it's about the other kinds of virtual machines, not process ones. If you can track down a copy of the Java virtual machine book, it's hilariously good fun, because it comes with a three-and-a-half-inch floppy disk with the Jasmin assembler on it, so you can write your own low-level Java stuff in Java assembler. Threaded Interpretive Languages is a book from the 70s published by Byte, and it's all about the great and exciting world of stringing together the insides of Forth without using the word Forth anywhere, for copyright and trademark reasons. You know all of those diagrams in computer science books of how lists work, with the little boxes and the arrows? Basically, about a third of the book is solid, dense diagrams like that; it's very difficult to get into. And this is my all-time favorite programming book; I have name-checked it in so many talks over the years. The chap who wrote it, unfortunately, is no longer with us. It's 10 chapters long; eight of them are about implementing different programming
languages, and two of them are about how to do certain kinds of optimizations to them. All the code is in Pascal. He implements S-expressions in the first chapter, so you can actually understand why S-expressions work the way they do, and then he implements a whole pile of other languages that aren't Lisp, plus Lisp with S-expressions. There's CLU, which is sort of the language that gave us data encapsulation before anybody figured out that these objects could have methods on them. There's, I think, ML; there's Smalltalk; there's APL, so the whole of that chapter is basically about how you do vector operations in Pascal. It's brilliant. And then, of the two round-up chapters, one is on machine translation, how you actually turn your internal format into assembler, and the other is about garbage collection. It's a really simple chapter on garbage collection; there are massive volumes you can get on garbage collection, and they all disagree most of the time about what's good and bad, but this is a nice, simple, I've-written-a-really-boring-list-of-when-I-need-to-garbage-collect-stuff chapter. So I highly recommend it. Unfortunately it's not in print, but all the code is online in C, because somebody ported it a few years back, and it's well worth hunting down. Thank you very much.