Today I'll talk about a just-in-time compiler for MRI, or CRuby. I'm Takashi Kokubun, so let me introduce myself first. At last year's RubyConf there were several similar-looking people from Japan, and attendees were confused about who was Matz, Koichi, or me, since we were all wearing glasses or black hoodies. So this year I'm wearing a very convenient t-shirt with my icon on it. Nice to meet you; this is my icon, so please remember me by it. Even though I attended last year and met some people, some of you said "nice to meet you" to me this year, but we had actually met before.

One of my projects is Hamlit, which was originally an eight-times-faster implementation of the Haml template language, so I'm interested in template engine optimization. I then joined the Haml organization, and this year I made Haml itself four times faster. I also joined the Ruby organization as the ERB maintainer and made ERB two times faster; that will ship in Ruby 2.5. So I have a history of making things faster, and I love the Ruby language. So why not optimize Ruby itself? It's time to introduce a JIT compiler, and I did: the result on the right is from the JIT compiler I made this year. This talk focuses on how I achieved it.

Before that, since probably many of you have no experience developing a JIT compiler, let me explain what one is. JIT is an abbreviation of "just in time": a JIT compiler optimizes a program by compiling it to native code at runtime. Many of you have probably never compiled anything to native code either, so I'll show the difference between native code and bytecode. The example method just computes 3 * 3. Ruby parses it into a tree like this and compiles it to the bytecode on the right. The right side will appear frequently, so please remember it: it means "put 3", "put 3 again", then a multiply operation, and then leave, which means return. Very simple.
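You can inspect this bytecode from Ruby itself with CRuby's built-in disassembler; the exact listing varies between Ruby versions, but you should see the four instructions described above.

```ruby
# Disassemble the bytecode for `3 * 3` with CRuby's built-in tool.
# Expect instructions like: putobject 3 / putobject 3 / opt_mult / leave
# (operand details and layout differ between Ruby versions).
iseq = RubyVM::InstructionSequence.compile("3 * 3")
puts iseq.disasm
```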
The Ruby VM understands that right-hand side as bytecode. Bytecode is not native code; it is the VM's own language, and Ruby's VM (YARV) is a stack-based VM. When the VM interprets the bytecode, it reads the instructions one by one: 3 is pushed onto the stack when the put instruction is read, then 3 is pushed again, and when it reaches the multiply operation it calculates 9. Then leave returns, and the VM's evaluation method returns 9.

How is the JIT compiler's version different? With a JIT compiler there is normally another thread dedicated to compilation. At first the VM interprets bytecode the same way as before, but as a method's call count increases, the JIT compiler detects it as a hotspot and compiles that hotspot to native code. Then the VM switches over and just calls the native code instead of dispatching instructions itself. So the bytecode is the actual representation of the Ruby code, and after the JIT generates native code from it, the VM starts calling the native code. That's the basic idea of a JIT compiler: if the native code is faster than evaluating bytecode one instruction at a time, the program becomes faster.

But why should such a complex thing be introduced to Ruby? First of all, I want to make things faster. One of the reasons is money: fast means fewer computing resources, so the number of servers you need can decrease if Ruby is optimized. There's also user experience: if Ruby becomes two times faster, your application probably becomes faster too, and a faster application improves the user experience. And I want to always be able to use Ruby. This year I developed a distributed queue middleware in Java because of its parallelism and performance, but ideally I would have used Ruby, because I just love Ruby.
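To make the stack machine and the hotspot detection concrete, here is a toy Ruby evaluator for the four instructions above, plus a call counter like the one a method JIT uses to find hotspots. The instruction names mirror YARV's, but everything else (the threshold, the data layout) is illustrative, not MRI's real implementation.

```ruby
# Toy stack-based evaluator for: putobject 3; putobject 3; opt_mult; leave
def evaluate(iseq)
  stack = []
  iseq.each do |insn, operand|
    case insn
    when :putobject then stack.push(operand)  # push the literal
    when :opt_mult                            # pop two values, push product
      b = stack.pop
      a = stack.pop
      stack.push(a * b)
    when :leave then return stack.pop         # return top of stack
    end
  end
end

ISEQ = [[:putobject, 3], [:putobject, 3], [:opt_mult], [:leave]]
CALL_THRESHOLD = 5  # made-up number; MRI's real threshold differs

calls = 0
result = nil
10.times do
  calls += 1
  result = evaluate(ISEQ)
  # A method JIT would queue this iseq for native compilation here:
  puts "hotspot detected" if calls == CALL_THRESHOLD
end
p result
```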
Another common question is just-in-time versus ahead-of-time compilation. The characteristic of a JIT compiler is that it compiles the program during execution. You may point out that bytecode is already compiled before execution, but a JIT does not compile the program to native code before execution, while an AOT compiler does. So why not AOT for this purpose? One reason is to keep boot time short: optimizing a method to native code is very time-consuming, and if compilation is asynchronous, the compilation time can be hidden, so boot time and one-shot tasks stay fast. Another reason is that some optimizations require runtime information: Ruby is very dynamic, so many things can't be known until the code is executed, and we can use runtime information for those optimizations. Yet another reason is that working JIT implementations already exist and have achieved better performance.

So there are some promising JIT compiler candidates. On the right is MJIT, Vladimir's JIT compiler, and on the left is YARV-MJIT, the one I created. Both are much faster than Ruby 2.5, which is nearly the current trunk, and both of these JIT compilers double the performance, so they may be worth merging into trunk. I want to explain their mechanisms as much as possible. There are three implementations I want to introduce here, and two of them are made by me; in half a year I developed two different JIT compilers. The first one is LLRB. The original title of this talk was about the LLRB JIT compiler, but after RubyKaigi finished I developed another JIT compiler, YARV-MJIT, so the talk title changed to "method JIT compiler". The middle one, MJIT, was developed by Vladimir and introduced by Matz in the first keynote. So I'll introduce all three of them.
LLRB was motivated by the RubyKaigi 2015 keynote presented by Evan Phoenix. It was very impressive and I liked the talk a lot. Its essence was compiling Ruby's C core to LLVM bitcode with Clang, generating LLVM IR for Ruby methods as well, and then compiling them all together. If you do that, you can inline Ruby's core methods into Ruby methods and optimize very aggressively. I made that possible with LLRB, so let me introduce how it works.

When bytecode is interpreted frequently, LLRB detects the hotspot with a stack-prof-like profiler, and once a method is detected as a hotspot, it JIT-compiles the method to LLVM IR. One weak point is that this does not happen on a separate thread; as an experimental project, LLRB compiles on the same thread. There is also pre-compiled LLVM bitcode built from Ruby's core methods, so LLRB combines both kinds of LLVM IR, optimizes them well together, and compiles the result to native code. Then it switches from evaluating bytecode to calling native code.

So how was its performance? On a benchmark well suited to this implementation, it was about five times faster. Why is it fast? The trick of LLVM optimization is the LLVM pass. An LLVM pass is a kind of framework for analyzing and optimizing LLVM IR, and many of the optimizations in the Clang compiler are implemented as LLVM passes. They are like Rack middlewares: a pass takes LLVM IR and outputs LLVM IR, so you can chain as many passes as you want. You probably can't read them on this slide, but these are the LLVM passes used in LLRB that affected its performance. In fact, I could reproduce the same five-times-faster result with only two passes. So let's see how they contribute to performance; this is the secret of LLRB's optimization. The summary: merely compiling code to native code does not make it faster, because that only removes the dispatch of instructions.
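The "Rack middleware" analogy can be sketched in plain Ruby: each pass is a function from IR to IR, composed in order. The "IR" here is just an array of strings, purely for illustration; real LLVM passes operate on structured IR, and these two toy passes only stand in for the inlining and constant-folding passes mentioned above.

```ruby
# Toy pass pipeline: inlining followed by constant folding.
# Each pass takes "IR" (an array of instruction strings) and returns IR.
inline_pass    = ->(ir) { ir.map { |op| op == "call square(3)" ? "mul 3, 3" : op } }
constfold_pass = ->(ir) { ir.map { |op| op == "mul 3, 3" ? "const 9" : op } }

passes = [inline_pass, constfold_pass]
ir = ["call square(3)", "ret"]
optimized = passes.reduce(ir) { |current, pass| pass.call(current) }
p optimized  # => ["const 9", "ret"]
```

Note that order matters: constant folding alone cannot fold across the call, but inlining first exposes the multiplication, which matches the observation that the inlining pass is what unlocks the other optimizations.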
Bytecode dispatch is just a goto: load a label pointer from an array and jump to an address within a function. Removing that alone is not much of an optimization. But if we add the function inlining pass to the pipeline, the code is optimized well, and the LICM pass, which does loop-invariant code motion, also has a major impact. As you can see, function inlining is the most important thing for optimizing native code; I learned that from the LLRB project.

But another learning from LLRB is that LLVM is hard to develop against. LLRB works with LLVM 4.0 but not with LLVM 5.0; the API makes very breaking changes. Also, debugging is very important for maintaining a program over a long time, but LLVM IR is of course not C, so when we debug the native code with GDB we cannot see C code, because the native code came from LLVM IR. That makes it harder to debug. So I wanted to build another version with C. Another important learning is that the main optimization target of LLRB was Ruby's core methods, so directly generating LLVM IR was not important and didn't have a major impact on optimization; using Clang on MRI's C code would be enough. So I changed the approach.

Before I introduce the next one, I want to introduce MJIT, the one from Matz's keynote. MJIT is developed by Vladimir Makarov, a GCC maintainer. The MJIT project consists of two components; it is not just a JIT compiler. It has a VM instruction replacement (RTL instructions) and the MJIT method JIT compiler itself. The VM instructions, things like "put object" or "leave", were all replaced in this project: as you saw, YARV is currently a stack-based VM like the JVM, but this project replaces its instructions with register-based RTL instructions. Another point I want to note is dynamic bytecode specialization: it does static bytecode specialization, but dynamic specialization as well.
When a method is executed in the VM, MJIT rewrites its bytecode into other, more specific instructions. This example is indexed assignment: there is an array-specialized version and a version for when the argument is an integer. Many such specialized instructions exist in MJIT, and they contribute to JIT performance, because the specialized code is smaller and an integer-specialized version can of course be optimized much better. So it's good for the JIT compiler.

A big difference from LLRB is that MJIT compiles in a background thread using pthreads. That's currently not portable to Windows, but the important thing is that compilation is asynchronous, which is good for performance. It also has deoptimization: you may not know this, but even in the intermediate states of a JIT-ed call, it can fall back to bytecode by restoring the original VM state.

Let's see how it works. Here is the current version of MRI, whose bytecode looks like this. The instructions on the right are MJIT's RTL instructions, and this VM has an unlimited number of virtual registers. The RTL interpreter reads the instructions and assigns values to registers instead of pushing them onto a stack. During execution it specializes instructions by the types of the receiver or arguments, which is also good for inlining definitions in the JIT compiler. Then a register-based instruction returns the value from a register, which differs from the current behavior. While interpreting bytecode, MJIT compiles the bytecode to C code on another thread; this is the real invention in MJIT. It writes the C code directly to disk and lets a C compiler binary, GCC or Clang, do the work: it forks and execs the compiler, which builds the code into a .so file. Calling dlopen then loads the native code as a C function pointer, so the VM can call that C function instead of evaluating the bytecode.
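The bytecode-to-C translation can be sketched by simulating the stack at compile time. This is a simplified illustration, not MJIT's real code generator: the emitted function shape is made up, although `vm_opt_mult` is the name of a real multiply helper in MRI's vm_insnhelper.c.

```ruby
# Simplified sketch of MJIT's idea: translate the bytecode for `3 * 3`
# into a C function by simulating the VM stack during compilation.
def compile_to_c(iseq, func_name)
  stack = []   # compile-time stack of C expressions, not runtime values
  body  = +""
  iseq.each do |insn, operand|
    case insn
    when :putobject then stack.push(operand.to_s)
    when :opt_mult
      b = stack.pop
      a = stack.pop
      # vm_opt_mult's definition lives in the preprocessed VM header,
      # so the C compiler can inline it and fold the constant operands.
      stack.push("vm_opt_mult(#{a}, #{b})")
    when :leave then body << "    return #{stack.pop};\n"
    end
  end
  "VALUE #{func_name}(void) {\n#{body}}\n"
end

src = compile_to_c([[:putobject, 3], [:putobject, 3], [:opt_mult], [:leave]], "_mjit0")
puts src
# MJIT then writes such a file to disk, fork/execs gcc or clang to build
# a .so, and dlopens it to obtain a callable C function pointer.
```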
So how is MJIT? It achieved great performance, really great. It claims to be three times faster than Ruby 2.4; that wasn't quite true on my machine, but it is much faster. However, it also replaces the VM with a different instruction set, and replacing all of the VM instructions might be very risky. So I wanted to address that, and I focused on this part: the method JIT portion of the MJIT project, which is what I'm currently working on as YARV-MJIT. Its major characteristic is that it achieves JIT compilation without changing the VM instructions at all: it simulates the stack during compilation and does not require register-based instructions.

So let's see how YARV-MJIT optimizes this method. Before looking at the JIT compiler, how would you optimize this method at the VM level? You might think we could just push 9 from the start, but consider this: Ruby allows a definition under which 3 multiplied by 3 somehow equals 3. So that optimization is not impossible at the VM level, but it's hard, and currently YARV does not optimize this way. In YARV-MJIT, during interpretation, the JIT compiler checks the call count and generates C code like this. It's probably hard to see, but it just calls the multiply helper, vm_opt_mult, with 3 and 3 as arguments. Another important thing is that the generated source includes the definition of vm_opt_mult, for inlining later; because I learned in LLRB that inlining is important, I made sure it is inlined in this version. The C compiler compiles this source code, and since the definition of vm_opt_mult is included, the C compiler inlines it. And since 3 and 3 are literals in this source, 3 multiplied by 3 is calculated at compile time, before execution. Also notable is that the generated code properly checks the method definition.
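The "three times three equals three" trick above is real Ruby; this is why neither YARV nor the JIT can fold `3 * 3` without guarding on the method definition. A minimal demonstration (the redefinition is restored immediately afterwards):

```ruby
# Redefining Integer#* at runtime. This is why a constant-folded `3 * 3`
# must be guarded by a method-redefinition check in the generated code.
results = []

class Integer
  alias_method :__original_mul, :*
  def *(other)
    self  # pathological: multiplication now returns the receiver
  end
end
results << (3 * 3)   # 3, not 9

class Integer
  alias_method :*, :__original_mul   # restore normal multiplication
  remove_method :__original_mul
end
results << (3 * 3)   # 9 again

p results  # => [3, 9]
```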
So it really works, and we can return the optimized result. It's optimized by the C compiler; I want to emphasize that this is not our own optimization work, but the C compiler developers' efforts. Optimizing code is a very complex task, but we can delegate that hard work to the C compiler, which is very good at it. The code is then loaded as a .so file, the VM switches to the native code and calls it, and we can return the precomputed 9 without performing the multiplication. Of course, such an optimization wouldn't be needed if it were just one multiplication, but much more complex optimizations come for free, because we don't implement each specific optimization ourselves. So the basic ideas of YARV-MJIT's optimization are: inline functions, skip unnecessary work, and decrease compilation time.

Now let me introduce the secrets of YARV-MJIT's optimizations. Many of the following slides are text-only, so you may get sleepy, but please ask me questions later. The first is inlining functions by having the VM's source code available as a header. MJIT originally does this by processing vm.c beforehand; not fully compiling it, but preprocessing it, resolving header includes, defines, and so on. By having those VM functions in a header, we can inline many of the functions frequently used by VM instructions. The next is transforming functions in that header to static. Since the header built from vm.c can be processed by our own script, we can modify it with a Ruby script: YARV-MJIT adds static to the non-static functions, and when a function is static and unused, the C compiler knows it does not need to be compiled, so compilation is skipped for those functions. Another technique is skipping setjmp. You may not know setjmp, but it's used for implementing exceptions; exceptions are implemented with setjmp and longjmp. It needs to store some of the VM's state, so it's very slow, and YARV-MJIT skips it if the method does not seem to raise an exception.
Of course we need to verify that the method cannot raise an exception; currently YARV-MJIT does not check this properly, so I'll fix it later. Next, limiting which functions get inlined is there purely to decrease compilation time. When I first developed YARV-MJIT, compilation was very slow, around 500 milliseconds per method; at 500 milliseconds per method, only a very limited number of methods can be compiled during a benchmark, so the Optcarrot score was very small in that state. For example, vm_search_method is the function that searches for a method. It's very complex, because Ruby's method lookup is very complex, so it's slow to compile; YARV-MJIT therefore also implements an optimization to skip that work.

A more complicated one is the base pointer optimization. I can't explain it precisely here, but since JIT-ed code is executed frequently (the JIT only compiles hotspots, so everything it compiles is hot), even very small savings have an impact, and we can skip maintaining the base pointer, which is only needed to restore the VM state from before the JIT run. Please ask me about this later, sorry.

The last two are about inlining external functions. Some of the functions used in insns.def are not static, and are linked as external functions. Since the JIT header is generated from vm.c, functions that are not in vm.c are unavailable to the JIT compiler, so I put some of those function definitions into the header with some manual effort. MJIT instead gets this via dynamic bytecode specialization: it replaces instructions with specialized ones that carry the definitions, so MJIT can inline those too, which is faster. The final one is inlining method setup code, using runtime information. Ruby's VM, YARV, has call caches in its bytecode: the call cache is used to implement the inline method cache, and each inline cache serves exactly one call site.
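The per-call-site cache idea can be sketched in plain Ruby. This `CallSite` class is hypothetical, written only to mimic what a YARV call cache does: resolve the method once, and reuse the result while the receiver's class is unchanged (MRI also invalidates the cache on method redefinition, which is omitted here).

```ruby
# Hypothetical per-call-site method cache, mimicking a YARV call cache.
class CallSite
  def initialize(name)
    @name = name
    @cached_class  = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # cache miss: perform the slow method search
      @cached_method = klass.instance_method(@name)
      @cached_class  = klass
    end
    # cache hit: skip the search entirely
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
p site.call("jit")   # slow path, fills the cache
p site.call("ruby")  # fast path, reuses cached String#upcase
```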
Sorry, it's hard to explain, but the call cache holds the definition of the method, and the cache is filled during execution. So we can use the call cache to detect which kind of method is being called. There are many kinds of Ruby methods: implemented in Ruby, implemented as a C function, an alias, an attribute reader, and so on. If we detect the kind, we can inline the setup code for that kind of method, and this optimization had a large impact. With those techniques, YARV-MJIT achieved this very good performance compared to the current trunk.

Future work on YARV-MJIT optimization includes porting more techniques from MJIT, since it's a fork of that work. Removing unnecessary functions from the precompiled header is not implemented yet because it depends on other unmerged code. I also think I can inline core methods defined as C functions, because we can detect the method kind from the call cache and can probably determine which C function implements the method. And we can probably inline methods defined in bytecode as well, by detecting them via the call cache. Well, that requires saving previously compiled JIT code: if we store the previously JIT-ed C functions, we can use them to reference the method, so I think methods defined in bytecode can be inlined too.

Now, the main part of this talk: how can we actually get a JIT into MRI? Because this is a Ruby conference. Many people have failed to introduce a JIT, and there have been several talks about JIT compilers at RubyKaigi. yarv2llvm used a similar technique, compiling YARV bytecode with LLVM; it's a very old project, and such attempts have been failing since that era. There was also a promising tracing JIT compiler, RuJIT, but it hasn't been merged into Ruby either. And, as you probably know, Rubinius uses LLVM and had a JIT compiler.
At least I believed so, and I read its code before implementing LLRB, but it actually no longer had a JIT compiler as of Rubinius 3.0, because, they said, the JIT compiler was hard to maintain and had many bugs. So introducing a JIT is very difficult, but we should solve these problems. Why did they fail? For yarv2llvm, the problem was how to improve performance against MRI's runtime; it's very old, and the VM was different then. For RuJIT, the problem was memory consumption, which is why it wasn't merged; MJIT solves that part well. And for Rubinius, the problem was how to fix bugs in the JIT.

Let's look at these problems one by one. Can we improve performance? MJIT achieved good performance, and so did I. Also, some very helpful people, Endo-san and Noah Gibbs, made very good benchmarks, thank you; they are very useful for improving performance. Memory consumption: MJIT solved this problem too. How to fix bugs is the big remaining problem. MJIT does not use LLVM directly; Clang can be used as the C compiler, but LLVM's API is not used, so there is no need to keep up with LLVM's breaking changes. And we can debug the generated C code using GDB; it's much easier to debug C code, and since the VM itself is written in C, we can debug everything just by reading C.

But MJIT needs many changes to the interpreter, and as a rule, the less you change, the less likely you are to introduce bugs. So I want to reduce the changes. What I want to say in this talk is: make the initial release as low-risk as possible. We are using Ruby in production, and the applications at my company really need to be stable, so I don't want to introduce very breaking changes to Ruby, and I don't want to solve many difficult problems at the same time. MJIT replaces all of the VM instructions and introduces the JIT compiler at once, and the JIT part is not yet fully portable, so it has to solve two very difficult problems simultaneously.
I want to make it optional at first, but with MJIT we can't turn off the replaced VM instructions, because bytecode is compiled to them beforehand; unlike the method JIT itself, that part can't be turned off. If only the method JIT is introduced, we can disable the JIT compiler by simply not passing the option, so even if we find a problem in the JIT compiler, we can turn it off. That's very safe. I also want to change things gradually: a big-bang release is very risky, and it's also hard to develop, because a huge release faces many conflicts. I want to develop gradually.

This is the current status of those JIT projects. MJIT is not slower than Ruby 2.0, but it is actually slower than the current Ruby trunk; YARV-MJIT does not modify the VM, so it's not slower. Another point: because YARV-MJIT does not modify the VM instructions, it passes all the tests in Ruby core when the JIT is off, while MJIT replaces instructions and currently fails make test-all and Ruby Spec. So I think YARV-MJIT is much safer.

We need your help. Please try them and report bugs to these repositories; reporting bugs is very helpful for development, and one of the Ruby committers, wanabe-san, has been reporting bugs to YARV-MJIT, which helps a lot. I've written up how to use YARV-MJIT, so please read it; I will publish it. In conclusion: a method JIT compiler for MRI can gain good performance by inlining, and YARV-MJIT is a safe migration path toward realizing a JIT. I want to make the JIT compiler real, and bug reports from people trying it by hand are very helpful for that. That's all, thank you.