I think we're gonna get started. So everybody smile. Because I'm a millennial, and therefore I feed off of avocado toast and Facebook likes, feel free to like that on Twitter. It'll be up later. Hi, my name's Kevin Deisz. This is what I go by on the internet: it's kddeisz. If you can't remember, my last name is in alphabetical order. I only recently discovered that, 27 years later. I work at a company called CultureHQ. We build better workplace communities. If your workplace could use a better community, come and talk to me. We have a lot of fun. So we're gonna talk about Ruby. We're gonna talk about what happens when Ruby sees this program. If my math is right, that's gonna put out 10. We go to our command line, we get 10, looks like I can do math. So we're gonna talk about everything that happens between the time when you type ruby example.rb and that 10 gets printed out. What we're talking about is the Ruby execution process. There are a lot of different parts to this, and it starts with code getting required. After code gets required, the source is read. It's then tokenized. You build an abstract syntax tree. You interpret the abstract syntax tree. That was the case with Ruby 1.8. That's actually a little old, but we're still gonna talk about it. Nowadays we build instruction sequences and we go and execute those instruction sequences. So this is the flow that your code goes through when Ruby gets executed. So let's talk about tokenizing. Tokenization is a topic that's talked about a lot in computer science. It's pretty well discussed, pretty well documented, and you can go read about it online. It's also called lexical analysis. What it's gonna do is run through your program and find all of the various tokens within it. As it goes, it's gonna build a nice little list, and so you have all your tokens. And you notice something interesting about this: puts is an ident, and a is an ident.
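A minimal sketch of the tokenizing step, using Ripper from Ruby's standard library. The two-line program is an assumption (the talk only says the example prints 10), and Ripper's event names such as on_ident are its own labels, not necessarily the ones on the slide.

```ruby
require "ripper"

# Assumed example program; the talk only says it prints 10.
source = "a = 5\nputs 5 + a\n"

# Ripper.lex returns one entry per token: [[line, column], event, text, ...].
tokens = Ripper.lex(source).map { |token| [token[1], token[2]] }

# Drop whitespace tokens so the list is easier to read.
tokens.reject! { |event, _text| event == :on_sp }

# Print each token's kind next to its text. Note that both `puts` and `a`
# come out as :on_ident: the lexer cannot tell a method from a variable.
tokens.each { |event, text| puts "#{event.to_s.ljust(10)} #{text.inspect}" }
```

The point to take from the output is the one made in the talk: at this stage there is no semantic difference between the method name and the variable name.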
Now we know from Ruby that those are two different semantic things. One is a method call, one is a variable. But in lexical analysis we don't have that semantic meaning yet. So we have our list, and that's tokenization. From there we're gonna go build an abstract syntax tree. We're gonna look at this list again and build a tree from it. We start from the beginning and build a root node. What the parser is gonna do is match the first pattern that it finds, the longest pattern that it finds. So it sees: okay, I understand ident, op equals, integer, newline. I understand that pattern. That pattern is a local variable assignment. So I have my local variable assignment. I see the next pattern: an expression that is a method call, and then a sub-expression that is an addition. So we can add that. Now we have our abstract syntax tree, and what we've done is inject semantic meaning into what was previously just a list of tokens. Okay. So we have our abstract syntax tree. In Ruby 1.8 and before, this was immediately interpreted. When you have your nodes and you're gonna interpret them, you need two things. You need one, which is a state, and apparently a truck. Okay. You need a state, which is a global state that is going to run around with your program as it's being interpreted. And you have a stack. This is a stack-based virtual machine: you're interpreting with a stack. There are other kinds of virtual machines, but this is not one of them. So we type ruby example.rb and we run down our tree and we say, okay, this could be a better tree. Anyway, we start at the leftmost node. We see our integer and we push the integer onto the stack. We see our local variable assignment. We push a set a onto the stack.
And then because it's a local variable assignment, we just pop that off and it goes into the local table. Then we go down to the bottom of the next pattern and we see integer 5. Okay, we put that onto the stack. We see local variable a. We're going to pull that out of the local table and replace it with a 5. We're going to send plus to 5, right? Addition in Ruby is sending the plus operator to whatever object plus is being called on. So then we get 10, and that gets pushed onto the stack. We get puts, we put the receiver, which is the implicit receiver self. We get that, we get 10, and now we've successfully interpreted our abstract syntax tree. So this was Ruby 1.8. Ruby 1.9 introduced YARV. I believe it stands for Yet Another Ruby VM. YARV is a virtual machine. Instead of taking an abstract syntax tree and immediately interpreting it, it takes the abstract syntax tree and builds instruction sequences, much the same way you would build assembly from C, right? This is a compilation process. And then it goes and executes those. So let's look at building instruction sequences from an abstract syntax tree. It's much the same process as the interpretation was previously. The trace instruction, by the way, just hooks into TracePoint; it's not important for this. But we can see putobject. That would have previously put an object onto the stack. Now we just have an instruction that says putobject, and an argument to that. We have a local variable assignment, which is setlocal with the argument 3. We're gonna send puts, we're gonna pull the local variable, and so on and so forth. Now we have our instruction sequence. From there, we can execute it, and we can execute it in a more intelligent way. We can do all kinds of optimizations. We can do what's called a threaded VM approach, where each instruction points to the next instruction, and it goes much more quickly.
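To see the instruction sequence for a program like this on CRuby, RubyVM::InstructionSequence can compile a string and print a disassembly. This is a small sketch: the two-line program is assumed, and the exact listing (operand numbers, specialized names such as setlocal_WC_0, whether trace instructions appear) varies between Ruby versions.

```ruby
# Assumed example program; the talk only says it prints 10.
source = "a = 5\nputs 5 + a\n"

# Compile the source to a YARV instruction sequence (CRuby only).
iseq = RubyVM::InstructionSequence.compile(source)

# Human-readable listing: expect putobject, a setlocal variant,
# a getlocal variant, opt_plus, and a send for puts.
disassembly = iseq.disasm
puts disassembly
```

Reading the listing top to bottom mirrors the stack walk described above: push 5, store it in the local table, push self and 5, load the local, add, then send puts.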
And you can run through this process in the same way that we were previously interpreting it. It looks very similar, this whole process. But you'll notice it's just running straight down, and it turns out this is much more efficient. This can be done much more quickly. There are all kinds of great optimizations. I don't pretend to be an expert on virtual machines, but I know when I see a huge speed improvement that something went right. So thank you, Ruby core team, for making that happen. In Ruby 2.3, we got to go even further with this. We got the ability to take those instruction sequences that we compiled and, before we execute them, write them out to a file and read them back in. Koichi introduced this about a year and a half ago. If you scroll down on the Ruby bug report, you see this. It's pretty excellent. This is what the rest of this talk is based on. Koichi was in the room when I presented this at RubyKaigi. That was really entertaining, and not at all nerve-wracking. Anyway, the funny part about this is that MJIT is coming, but we're gonna talk about that later. So let's look at this example. We can take this file, and there's this API called compile_file on RubyVM::InstructionSequence. This compile_file method will return an InstructionSequence instance. As of Ruby 2.3, you can call to_binary on it. That returns a big thing that you can't read. We can write that out to a file. And then you can look at that file, and you might notice that it's quite a bit larger than the source file. It contains all the information that is necessary, all the semantic information implied by the source code, that is needed to execute it in another place. Now, there are a couple of caveats to this. This is machine-specific, right? This is architecture-specific. It also has information inside of it like __FILE__, which is gonna be different on different machines.
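A minimal sketch of dumping an instruction sequence to a file and reading it back, using compile_file, to_binary and load_from_binary on CRuby 2.3 or later. The file names and the two-line program are assumptions made for illustration, and the dump should only be loaded by the same Ruby build that wrote it.

```ruby
require "tmpdir"

dir = Dir.mktmpdir
source_path = File.join(dir, "example.rb")   # hypothetical file names
dump_path = File.join(dir, "example.yarb")

# Assumed program: evaluates to 10 instead of printing it.
File.write(source_path, "a = 5\n5 + a\n")

# Compile the file and serialize the instruction sequence.
iseq = RubyVM::InstructionSequence.compile_file(source_path)
binary = iseq.to_binary
File.binwrite(dump_path, binary)

# Later (same Ruby build, same machine): read it back and run it.
loaded = RubyVM::InstructionSequence.load_from_binary(File.binread(dump_path))
result = loaded.eval

puts "source: #{File.size(source_path)} bytes, dump: #{binary.bytesize} bytes"
puts "magic:  #{binary[0, 4]}"
puts "result: #{result}"
```

The size comparison shows the dump being noticeably larger than the source, and the first four bytes are the dump's magic marker.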
So you can't take this file and go throw it somewhere else and run it. Actually you might be able to, but, you know, probably not. On the other side of that, you can go and read that file in. You can load that file with load_from_binary, you can evaluate it, and you get your same 10. That works; that's Ruby 2.3. So one thing that we can do is take this and programmatically load those instruction sequences. This also came in Ruby 2.3. What that means is we're gonna take this flow that we had and split it into two paths. There's still the normal path: the normal require, read, tokenize, build the AST, build instruction sequences. Except instead of immediately executing the instruction sequences, we can write them out to a file and then execute them. And that means the next time we go to require that file, we can follow the right-hand path: read those instruction sequences out of the written binary file and execute them. So this is internal to Ruby. Doesn't matter if you don't know C. You can see there's a call to rb_iseq_load_iseq. Basically, if rb_iseq_load_iseq returns a value for a given file name, then it's going to skip all the rest of that code. It's not gonna build a new Ruby parser. It's not going to go and build everything. It's just going to take that and evaluate it. And this is the internals of that. It's checking whether or not a method exists: whether or not load_iseq exists on the RubyVM::InstructionSequence singleton class. If it does, it's just going to call it and return the result. So here's an example. We're gonna walk through this code real quick. This is an example of adding that load_iseq method. This load_iseq method is going to get every single file that is required. Every single time a file is required, it's gonna run through this method. And source_path is going to be the path to the file that you are trying to compile.
And this happens, right? If you go and type this into your Ruby app, every require is gonna run through this method. So you define this method, and for the first part you can pretty much squint at it and just say, okay, that's just building a path to a dumped binary file that we're gonna use. Then we check whether or not this file is fresh. If the iseq file has been updated more recently than the source code has, that means we have already compiled this file, the compiled file is the most recent version, and we can just use that. So we return RubyVM::InstructionSequence.load_from_binary; this is the right-hand path of that flowchart. If it's not fresh, then we compile and go through the normal process. You can rescue SyntaxError and RuntimeError and return nil. This is a nice thing, because if this method returns nil, Ruby goes through the normal process and we don't have to worry about it. So there are a couple of examples out there in the wild. This one is from Bootsnap. Bootsnap is a gem that came out of Shopify. It's pretty great. It does a whole mess of stuff, but one of the things that it does, if you tell it to, is compile your files down to instruction sequences and load them appropriately, and this module gets mixed into RubyVM::InstructionSequence. The other one is Yomikomu. This is also a gem, and it also ships with your Ruby install if you look inside of it. It's in the Ruby source, it is an example loader, and you can go and check that out as well. So now for the fun part. I saw all this stuff happening, and for reasons that we will get into after I show you the good parts, I really wanted to mess around with this and see what we could do and stretch the limits of this compilation process. So let's go back and look at load_iseq, and we see a couple of things here.
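The load_iseq walkthrough above can be sketched roughly as follows. This is not the code from the slide, Bootsnap, or Yomikomu: the IseqCache module, its cache directory, and the .yarb suffix are hypothetical names, and the method is kept on its own module so it can be tried without changing how require behaves.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper; names and cache layout are assumptions.
module IseqCache
  CACHE_DIR = File.join(Dir.tmpdir, "iseq-cache-sketch")

  # Build the path of the dumped binary for a given source file.
  def self.iseq_path_for(source_path)
    File.join(CACHE_DIR, "#{Digest::SHA1.hexdigest(source_path)}.yarb")
  end

  def self.load_iseq(source_path)
    iseq_path = iseq_path_for(source_path)

    # Fresh dump: it was written more recently than the source changed.
    if File.exist?(iseq_path) && File.mtime(iseq_path) > File.mtime(source_path)
      return RubyVM::InstructionSequence.load_from_binary(File.binread(iseq_path))
    end

    # Otherwise compile normally and write the dump for next time.
    iseq = RubyVM::InstructionSequence.compile_file(source_path)
    FileUtils.mkdir_p(CACHE_DIR)
    File.binwrite(iseq_path, iseq.to_binary)
    iseq
  rescue SyntaxError, RuntimeError
    # Returning nil tells Ruby to fall back to its normal load path.
    nil
  end
end

# To hook it into require on CRuby 2.3+, one option is:
#   def (RubyVM::InstructionSequence).load_iseq(path) = IseqCache.load_iseq(path)
```

Keying the cache on a hash of the source path is just one choice; any scheme works as long as a given source file always maps to the same dump file.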
iseq = RubyVM::InstructionSequence... is this gonna work? What, nice. So this little part right here, the RubyVM::InstructionSequence compile part: we're gonna split that out. So now we read the content on one line and we compile it on the next line. This is the fun part. Right here, we have a variable called content. Content has the Ruby source in a string. The next line, RubyVM::InstructionSequence.compile, goes and compiles that, but we can do a whole mess of stuff in between, like arbitrarily gsub-ing and eval-ing. Because who doesn't love to do that? For the, you know, like five people in this room that can't read regex off the top of their head (that should be everyone): you can take this source, and you can do this. Why would you ever do this? Why? Well, here's the reason. Ruby is incredibly flexible. You can override the multiplication operator in Ruby. Why? I don't know, you shouldn't do that. Don't do that. Why do people do things? But you can; it gives you that flexibility. And because it gives you that flexibility, Ruby's virtual machine cannot optimize away 24 times 60 times 60. It can't do it. Well, it could; there are proposals, there's like optimistic... there's a whole bunch of stuff. But in general, it can't do it. And the reason it can't do it is because 24 is an integer, and Integer times could have been overridden at this point. You don't know. So what this actually looks like in the instruction sequence is putobject, and then opt_mult, and then putobject, and then opt_mult, and then putobject, and you've got all these different instructions that are creating bloat in memory, in time, and all this stuff. Any other language... well, I shouldn't say that. A lot of other languages would be able to go through a process called instruction elimination.
They would see this code, and they would replace it with just the value that is returned by that multiplication, such that when the code goes to execute, it would only be multiplying days times that value. Now you may say, Kevin, why don't you just write the value of the multiplication yourself? And that's true, you could, you absolutely could. But I like this. I don't remember how many seconds are in a day, ever. I do know that there are 60 seconds in a minute, that there are 60 minutes in an hour, and there are 24 hours in a day. So I want that visibility. I want to see that in my code, but I don't wanna pay for it. So I'm gonna replace it with this gsub. And it turns out that this works. You can just replace this. What I've done with this gsub is allow myself to put more semantic meaning into the code without sacrificing anything in terms of performance. So you can do it again. We can do more weird things. Say you have some date literal in your code, and you're using Date.parse. Date.parse is notoriously slow, because Date.parse can handle pretty much anything you throw at it. What you should be using is Date.strptime. Date.strptime takes a format, expects a certain format for a date, and then when you hand it a date, it will parse it out much more quickly. But I don't want to deal with the strptime format. I can never, ever remember what the strftime format takes. Is it a percent y? Is it a percent m? I never know. So what I wanna do is write this with ~d, and what this is going to do is, during the compilation process, parse that date and replace it with the Date.strptime call. Now this is a date literal. Obviously this can be pulled out into a separate variable. You could do this in many other ways that are probably just as efficient, but I like this. I like being able to see this in my code. I like the simplicity of it.
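A rough sketch of the two source rewrites just described, applied to the source string before it is handed to the compiler. The ~n(...) and ~d(...) spellings follow the talk, but the regular expressions and the expand_sigils helper are assumptions for illustration, not the Vernacular gem's actual implementation.

```ruby
require "date"

# Hypothetical patterns: ~n(...) holds plain arithmetic, ~d(...) holds a date.
NUMBER_SIGIL = /~n\(([\d\s.*+\/-]+)\)/
DATE_SIGIL = /~d\(([^)]+)\)/

def expand_sigils(source)
  # Evaluate the arithmetic once, at compile time, and inline the result.
  source = source.gsub(NUMBER_SIGIL) { eval(Regexp.last_match(1)).to_s }

  # Parse the date loosely now; emit the fast, fixed-format call instead.
  source.gsub(DATE_SIGIL) do
    date = Date.parse(Regexp.last_match(1))
    "Date.strptime('#{date.iso8601}', '%Y-%m-%d')"
  end
end

content = "require 'date'\n[~n(24 * 60 * 60), ~d(January 1, 2017)]\n"
expanded = expand_sigils(content)
puts expanded

# The expanded text is ordinary Ruby, so it compiles as usual.
value = RubyVM::InstructionSequence.compile(expanded).eval
p value
```

Inside a load_iseq hook, the same call would sit between reading the file and compiling it. The number pattern is deliberately narrow (digits, whitespace and arithmetic operators only) so that the eval cannot run arbitrary code.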
If you're seeing a pattern with the tilde, it's because I completely stole that from Elixir, so you might look at sigils in Elixir. So this works. You can do this. I wrote a gem. It's called Vernacular. You can check it out. You can write your own arbitrary gsubs, and it will hook into the compilation process. You can run it in production if you like, if you're feeling feisty. It looks like this. You basically define a regex modifier, and it does stuff for you. So what we've done is extend this flowchart. We've added a step: we modify the source after we read it, and then we tokenize it. But let's go further. I wanna do more weird things to your code. I want to be able to do more than lexical analysis. What we were doing with gsub-ing is lexical analysis. We didn't really have enough semantic information to understand what the code was doing. All we could do was arbitrarily gsub stuff. That's effectively compiler macros, right? In any other language, that's just compiler macros. We're literally replacing source with other source. But I wanna do more things. So there's a gem called parser. The parser gem has an interesting part that does rewriting automatically. That's the fun part. And if you look at this, you can see I'm blatantly abusing the parser gem. I love this. Just send, because what is method privacy? So I'm abusing this gem, but there are certain things that you can do. You can arbitrarily build rewriters with this gem. So I can take this code, inject it into the compilation process, and do really weird things. Like take this: we have this add method. Now I don't know why you would ever write this method, but you could, it's possible. And I want type checks on this. Ruby doesn't have type checks. A lot of people want to add type checks. I don't really wanna add type checks to Ruby. But you could do this, and I wanted it to look like that, because I like to pretend I know Scala. And I want this.
What I'm really representing is this. That's verbose. I don't want that. I want just this. I just want a succinct way of saying, hey, throw an error if that's not an integer. And you can do this. You can do this by building your own version of the AST with the parser gem, rewriting the code, putting it back in, and letting Ruby compile it as valid Ruby source, and it goes on and executes. Here's another example. I want to check the output of this. I want the return value to be an integer. So we're gonna wrap the body of this method in a begin and end, and we're gonna throw an error if the result is not the type we expect. And all I wanna do is write this. I only wanna represent it with this. So it works. You can check it out in the gem. It's called an AST modifier. This does horrible things to the parser gem. You also need to have pretty intimate knowledge of the way that the parser gem is parsing your stuff. You have to extend the source code. So what it does is it goes in, it sees your stuff, it extends the parser's .y file, it rebuilds the parser, and then injects it into the parser gem. It does bad things. But it does work. So we've taken this already horrendously large flowchart and added another step. And that is to build the AST, modify the AST, go back, read the source, tokenize, build the AST, and go through the whole process. But I wanna go even further. I wanna do really weird stuff. So really weird stuff looks like this. Let's go back to when we compiled that file to binary, and let's hexdump it. What even is that? I had a moment of panic when I saw that for the first time. I'm not a low-level programmer. I like Ruby to handle this stuff for me. But it turns out you can go into the C source. That's what open source is all about. You can find out exactly what is in that dump. I love the comment, by the way, at the top of this struct, that is YARB. I was like, what the hell is that? What does that mean? And then I looked.
It's the first four characters of every dump. It's amazing. You also notice in here, as you go through, hey, that's my platform. So I started playing around with this and I realized, hey, we can get all kinds of information out of here. I just have to learn C. Small barrier to entry. So you can find the header, you can find the Ruby version, you can find the offset within that binary dump of each of the compiled instruction sequences. You can find the list of IDs, which are symbols and other things. You can find the objects. You can go and find all of the information in here that you actually need. You can fork Ruby. You can throw debugging information into it. You can get all this stuff. So what does this mean? This means load_iseq allows us to provide our own bytecode. We can inject debugging information into Ruby to learn the binary structure of a YARV bytecode dump. We already showed that we can take source, hand it to RubyVM::InstructionSequence, and return the result. So we can write our own code that compiles down to YARV bytecode. What load_iseq has opened is a window into the Ruby virtual machine as a compilation target. It is not a well-supported compilation target. It is not a desired compilation target. But it is a compilation target nonetheless. So what we can do is go back into this code and add a little conditional. You can say, if the file ends with .tb.rb, we're gonna call TB.compile. TB is my little thing. It's a project. Stands for Tiny Ruby. It does nothing. But it does show something interesting. It shows that what load_iseq is really doing is establishing a contract: something takes a string and returns an object that responds to to_binary. What to_binary should do is return a string that represents your bytecode. And RubyVM::InstructionSequence.compile, that is what that does. It returns an instance of the instruction sequence.
It responds to to_binary, which returns you a string. TB.compile returns a compiled TB instruction sequence that responds to to_binary. You can write it out as a string. We've added even more to this obnoxious flowchart that is now haunting me. We've added this step where we go require, we read source, we now split on the end of the file name, we go all the way down to read TB, and we write the instruction sequences. Because we're writing YARV bytecode, we maintain interoperability with Ruby. This is kind of key; this is an interesting point. We're just writing code that compiles to the Ruby bytecode. We're not writing a new version of Ruby. We're not writing an alternate implementation of the Ruby spec, right? This is not JRuby, this is not TruffleRuby, this is not any of the other ones. We're still writing YARV bytecode, so we can still call back and forth. We can get the benefits of a compiler, instruction elimination, anything else you wanna implement on the compiler side, without having to write a native extension. I don't like writing native extensions. C and Rust are great, I like C and Rust. I don't like writing native extensions. They're a pain. I don't like worrying about memory. I just don't. I choose Ruby because it makes me happy, not because everything is a segfault. So what this looks like is this little thing. I actually do like Rust, but this is a function. We can look at this code, and you don't have to understand the individual semantics of this little language, but something you can see is this little 4 over 3 times 3.1415, right? We already talked about how this couldn't be eliminated in Ruby because of the flexibility of Ruby. TB is not flexible. It does nothing, as I said. So it can eliminate that, because it only handles mathematical functions. And you can call into that from Ruby and get the value. So it doesn't really work. It does, it does, it works sometimes.
No, really, don't use it. But you can check it out on GitHub. I'm making progress slowly but surely. Now, caveat: if MJIT becomes a thing, as Chad said yesterday, I will euthanize this very gently. It will become a read-only repo on GitHub. And you might look at this flowchart and just say, why? Why would you ever do this? Because in the end, this is what I care about. This is all of that flowchart that I actually care about. I want to require my code, I want to read the compiled code, and I want to execute the compiled code. This is what you get with a compiled language. You might complain about a compiled language. You might say, oh, that's such a pain, I have to go through this whole process of compilation. But look at this flowchart. There's nothing in it. And the benefits of this are pretty huge when you don't have to worry about, I don't know, things breaking, I guess. But you can do all kinds of weird stuff and still end up with this flowchart. So this is the obvious question. Why would you ever do this? I get this question a lot when people ask me why I spend hours and hours on this thing. There are a couple of thoughts that I want to leave you with. The first is that programming is using human language to represent computer lexicon. That goes along with: code gets read many more times than it gets written. Thank you, Sandi, for that. And finally, you need to optimize for reading code. So let's take a second and just think about this. The very first thing I said to this room when I got up on stage was, how y'all doing. If you go to an English grammar book, you will not find how y'all doing. And yet every single person in this room understood what I said. That's slang; that is what slang is. Slang only matters for being incorrect if you cannot extract semantic meaning from what I'm saying to you. If I use slang and you understand that communication, then what does it matter?
It's not important that I used slang. It's important that you got the meaning that I was trying to express. In much the same way, it does not matter which way you write this in code. The contract is that the computer can understand what you are trying to say. That is the only contract. Code is therefore not better just because it conforms to one particular style. It is not better code. You might think that it is more effective for communicating that code across multiple teams. If you enforce a style guide, maybe more developers can use that code, and it's better for refactoring, and all kinds of benefits. I definitely see the value of using a linter. But what I'm saying is that as long as you get the message across, it's still effective. So you can use this Vernacular gem. You can inject a certain tribal knowledge into your application of what ~n means. You can get the performance benefits, and you can use it in production. So what's next? There are a couple of things. The Vernacular gem will work provided compiling to instruction sequences keeps working. As for TB: if MJIT becomes a thing, or if literally anything changes about the binary format of Ruby's dumps, which is not guaranteed in any way, then again, I will euthanize TB in a very gracious way. But there are a lot of different things that can be done with this kind of compilation process. For those of you who saw Justin Searls's talk yesterday, he was talking about the inherent value of doing something even if the code itself is not valuable. I think this project has value no matter what. I think that the thought that you can take Ruby and inject some step into the compilation process, thinking about these kinds of things, can push the Ruby virtual machine forward. You know, people like to rag on Java a lot. But the JVM, in my estimation, is still one of the greatest programming achievements we've ever had. It does a lot of things, and in a lot of ways it's a standard.
And you look at all the different languages that compile to the JVM, and it's amazing. I mean, it's a whole world of different programming tools and toolkits and everything. So I want that for the Ruby virtual machine. I want that a lot, because I want more Ruby jobs. I want more people to be employed writing Ruby. I want more people to be programming Ruby. I really like Ruby and I want it to keep going. So I want us to think about this as a community. Okay, that's all I got. Thank you. So we have a little bit of time for Q&A. I'm going to preempt two questions. One: yes, I'm using this in production. Get over it. The second one, I guess I already did cover, which is if MJIT comes in, yeah, TB dies. It dies, it's sad, but anyway. Any other questions? Oh yeah, because I have no time. But yeah, no, I mean, it doesn't necessarily die. I'm sorry, yes. The question is, why would TB die if the opcode format changes? And he's right: if people want to help me, it doesn't die. Yeah, I just bought a house. It's pretty exciting. I have no time. Yeah, good questions. The question is, what's the most compelling use of Vernacular in production? I love it for test macros. I have one that is pretty great. It's all-caps DEBUG, and it expands to require "irb"; binding.irb. You could do that by putting it into a separate method and then calling that, but then you have to go up one binding level and it's weird. But you can do all kinds of weird things with test macros, right? There's a certain approach to testing that is much more functional and less object-oriented. And when you think about it that way, you can, I don't know, I love it. So, testing. I have the ~n and the ~d. I don't use them a lot, but I love ~n, because if it's any kind of formula, I really like having it all expressed out as opposed to just writing the optimized value. Yes, yes it will.
The reason I rescue that... sure, yes. In the code, it rescues syntax errors and runtime errors in the compilation process and then returns nil. And the question is, isn't that gonna fail if you have any of these extra sigils or anything in the source code? And yes, it will fail. The reason I'm rescuing that is in case you have a syntax error in your normal Ruby code. It still needs to fall back, and I don't actually know how Ruby handles all that. It throws a syntax error somewhere, but I'm just letting Ruby handle it entirely. Yeah, so the question was, when I require parser, it's gonna recursively require stuff, because it's gonna have to parse the parser gem and do a whole bunch of weird stuff. Yeah, that took me like a couple of days. I realized it halfway through what I was doing. I was like, oh my god, this is dying. So what I did was add a big if statement that says, if it's from the parser gem, just ignore it. Just, like, don't compile it. Conditionals are great. The question is, have I tried hacking up the C interpreter so that it just loads the binary compiled files? No, I have not. Just bought a house. Any other questions? Great. Well, if you have any more, feel free to come talk to me. As I said, I live off of avocado toast and Twitter likes, so check out my Twitter profile. Thanks for coming.