Okay, today I'm here to talk about a project that one of my co-workers and I started a few months ago. It's only had a couple of weeks of work, so bear with me as we go through some of the details. My name's Charlie Gracie. I'm an employee at IBM, and I'm also the project lead, or co-lead, for the Eclipse OMR project. I'll give you more details about that as we get going in the talk, but since I work for a large company, I have to put this slide up saying you can't trust me or believe anything I say. So in this talk I'm going to quickly introduce the Eclipse OMR project, then go into a bit of detail on one of the components of the project, called JitBuilder. Then I'm going to talk about some of the experiments we've been doing using JitBuilder in the Lua 5.3 VM to actually add a JIT to the Lua VM, cover some future work, and open it up for questions. So, the mission of the OMR project is basically to create a reusable runtime toolkit that any runtime, existing or new, could use to get different components. If you have a runtime that you've spent a lot of time working on, but you don't have anybody with deep compiler knowledge, you can go pick up the OMR project, plug it into your runtime, and hopefully get a JIT. JitBuilder, which I'll talk about later, makes that easier than having to understand the 800,000 lines of JIT code that are there. We want to accelerate this for a lot of the current runtimes, or that's our hope. If you look at some of the other communities, they'll end up doing another VM or something, and there's the LuaJIT VM as well. So we're hoping that with the Eclipse project you can use these components and keep your community the way it is.
You can just plug it into your runtime, and hopefully there are no really big differences or problems using that runtime, but you get the benefits of a scalable GC if you want it, some RAS tooling, or a JIT. The OMR project itself has no language semantics. I'm the GC architect, and as far as I'm concerned, a GC is a GC: you have objects, and as long as you can tell me what shape they are, I can go collect your objects and manage the memory for you. And we believe that most of the components in all the VMs are pretty similar. The first drop of the code was last March. It's on GitHub, dual-licensed under Apache and Eclipse, and we're always looking for contributors. So, this is a quick overview of the components that happen to be there; if you have a runtime that wanted something, you might be able to consume the one from OMR. The first two are for platform support: a port library and a thread library. If you use them, it works on Windows, z/OS, Linux, and OS X, so you just get the threading and porting of those handled. Then there's a GC, a JIT, and some more diagnostic pieces like you'd see in runtimes like Java, which you could hook up to monitor at runtime how your application is doing, the object allocation and all of those things. So, now I'm going to move into JitBuilder fairly quickly. I apologize: there's going to be some code showing up on the screen; hopefully it doesn't put you to sleep. What is JitBuilder? It's an interface to our compiler technology that hopefully makes it easier for people to get up and going with a JIT in their runtime. We don't believe that using this front-end API into the JIT would ever give you the peak performance you can get by actually going in and generating IL, intermediate language, yourself.
That would mean really digging into the deep internals of the compiler itself. But we believe you can get a significant performance improvement, and it's very straightforward and easy to use, through this API. The code was only contributed in September, so it's still a work in progress. We're still ironing out the API, changing it, modifying it, and adding new features as we plug it into different runtimes. Actually, of the last month we've spent working on the Lua JIT, probably only about a week has gone into putting the JIT into Lua; we've been sidetracked adding new features and fixing things in JitBuilder, where "fixing" means changing how things work so it can be more useful to other languages. So, it's very straightforward if you're using it: there's a simple initialize and a shutdown API. Put those in your runtime's initialization and shutdown, and you're ready to go. But then you actually have to start using it to compile your methods, or functions, or whatever your runtime has, and that's the slightly complicated part. To do this we've created a method builder. If anyone is familiar with LLVM, some of the terminology is pretty similar, I've noticed recently; you're doing the same types of things, so it kind of makes sense. This MethodBuilder is basically what you use to define what a method is in your runtime: any parameters it has, their types, and what return type you want back. All of the code being generated actually uses system linkage, so basically it's like making a C call when you want to call one of these JITted functions. There are trickier things when JITted code calls JITted code later on, but for the most part it's like a C call.
The two main parts of the MethodBuilder are the constructor and the buildIL method, and I'm going to go through an example to describe those quickly before I move on to the Lua versions. So, what can you do with JitBuilder? It basically has these types of operations. You have all your primitive data types. Of course you want to be able to do arithmetic: add, subtract, divide, all of those exist. Then there are conditionals, and constructs for doing loops and those types of things. And there's a generic call, so you can call any arbitrary C function: just tell it the function name, give it the address where it will be, and it passes the right parameters. Most of the operations, I should say, are typeless, or mostly typeless; a few of them actually need to know the type. Once you're working with JitBuilder, any value you're passing around and doing stuff with has a type, so most of the time you don't have to say "I want to do a 32-bit add". If you're adding two things together, you just say Add on them: if they're both ints it does the proper add, and if one is a larger int it'll do the right extension, or tell you that you need to convert it yourself. But there are a few things, like LoadAt and StoreAt, which are for indexing things out of array-type data in your memory, that do need a type. Now, a simple MethodBuilder example. Right away, one of the first things you need to do is give it a name, because these are basically C functions: they can be called by name, and we're setting up a proper C stack for everything. Naming them also allows JitBuilder functions to call one another. Parameters: this one is simply going to do an increment, so it takes an Int32, and the return type is also an Int32. The TypeDictionary is a list of types, so you can actually define your own types.
So, when we get to the Lua one later, if anyone's familiar with the Lua VM, there's the lua_State, there's CallInfo, all of these things. We actually have a mapping that defines all of those structures, so that when we get to the Lua version, the main parameter will actually be the lua_State; that's what's passed into the function. To get everything off of it, we can do a bunch of indirect load instructions and just pass the name of the field we want out of the particular struct. So the TypeDictionary lets you use the generic types like Int32 or Double, but also define your own complicated ones. The Lua ones have actually driven a lot of change, because the structures there are full of unions: a value is a union of all these different types. We didn't actually support that before, so most of our work has been making sure we can alias the types properly on the JIT side. Then the other part: that was the construction; the body of the function, the buildIL, would just add the value passed in as a parameter to the constant one and return the result. In the end, that generates something very simple: take what was passed in a register on x86, add one to it, and put it in the proper register to return. That's the value we were talking about. One other thing about the builder is that it can express control flow, different code paths. If you look at a method in the Lua interpreter, something like add: if your type is an int, you do one kind of add; if it's a float, or just a number, you do a different kind. We can handle these control flows. We have this set up so that you can create a couple of builder objects, one for each path. Then you would do an IfThenElse.
So, in this case, the then and else paths are the builders that get executed depending on the condition. The condition here is less-than: if a is less than b, you go down the then path, otherwise you go down the else path. And this just goes through quickly what the value would be in the end: either one or zero. Again, the TypeDictionary, to cover this quickly: when you want to define a struct, and now you can define unions too, you basically give it a name for the struct, and then everywhere inside you use that same name plus the name of the field. Usually you make these match up to the C struct or the C++ class. We're looking at trying to find some way to do this automatically for you: give it the header file the struct is defined in and say "generate me one for this struct". But that's just something we're looking into for the future. One last thing: most of JitBuilder is very generic, so you could use it for anything; you could go write your own little program to do whatever you want. But to make it easier to build JITs for interpreters, which is the focus of what we're looking at, we've gone a step further and created this BytecodeBuilder, or opcode builder, depending on your runtime. It handles a lot more of the control flow for you, because if you look at some bytecodes, like jumps and gotos, they can sometimes fall through or go somewhere else. So, to keep the IL we're generating a lot cleaner, and make it a lot easier for the JIT to optimize, we've added this so that we can at least give hints about how the control flow is going to happen. There's a great talk on this by one of my co-workers; I have the link here.
So, Lua Vermelha. My co-worker created it; the name, I don't actually know where he got it from, that's just what's on there. We started working on this in about January. It's on GitHub, under his account right now; we're not sure if we'd ever move it anywhere else, if anybody cares. That's the link for it. The main goal is basically to integrate a JIT into the Lua VM with as minimal changes as possible. At this point we're at only about 30 lines of changes, I believe (it'll be on a slide later), to actually be able to use the JIT. The JIT design we decided on for Lua does all of the compilation synchronously. We could have another thread do some of it asynchronously, which is probably future work, but for now we just do it synchronously after so many executions of a function. I don't remember the number offhand, but I believe it's 10 right now: after you've executed something 10 times, we go and JIT it, and from then on you execute the JITted version. The major change to the interpreter was in the do-call path: basically, if you get to the point where you're going to call a function, once you get the prototype, if it has compiled code you call that directly, or you just let the interpreter continue. We keep the lua_State and CallInfo up to date everywhere. This allows us to fall back to the interpreter at any point we want. It causes a bit of a loss here and there, because we're updating things needlessly at times, but right now the whole idea is to keep it completely up to date, and at any point where we see something that goes against a decision we made in the past, we just fall back to the interpreter. We let the interpreter handle all the complicated cases right now. So the Lua function builder method looks pretty similar to the basic one I showed before, but really the main thing we take as a parameter is a
prototype. In the Lua VM, any time you have a function you're going to call, there's a prototype, this Proto struct, and we use that to generate all of our code, because it has the pointer to your array of opcodes and all of those things. For its buildIL, the first thing to say is that this code is not complete; it wouldn't fit on one page, it would be multiple pages, so I just have some quick examples of what the code looks like. We create the BytecodeBuilders so that we can go and generate all of the code. The first part looks very similar to the top of the bytecode loop in the interpreter: you fetch the CallInfo off the lua_State, you fetch base, which is the base of where all the registers are that are going to be used, and then you loop over all of the instructions in the code, switch on the opcode, and handle the appropriate one. So this looks a lot like the interpreter itself. If we really wanted to be crazy, we could actually macro up the interpreter itself so that the code could be shared, but that really complicates the code on the interpreter side for no reason; having this loop over here as well doesn't hurt anything, so we've opted for this. So, that was do-move on the screen. This is basically the code from the interpreter for move: it copies the value from register RB into RA, very straightforward. In the JitBuilder version of do-move, the first thing I do is get RA, so I go load that, and then I get B. We know that basically every opcode uses the RA register, so we load that all the time in the main loop, and it's always up to date; but here we have another convenience helper just for fetching RB for us, so we call that. Then it's basically the same thing: we have a helper for set-object, which is down below here, where we just fetch the value and type out of the source object and store
them into the destination object. So that generates complete IL for that function. But to start off, and actually the very first thing we did, we didn't implement IL for any of the opcodes. We basically took the interpreter and created little C functions for every piece of code that was in there, and that actually kept us very honest about keeping the state and everything up to date all the time, because we were relying on the helpers for everything. This is an example of one where we haven't gone back and actually generated the IL: the bitwise-and one. All we basically have is a function that we call for this one, so you have a bunch of the code happening inline, and then you can see a call out to a C function, and then execution continues in the JIT; it's not falling back to the interpreter. This is a helper that we've created, the VM band helper. Whenever we make a call, because we have base stored in a local, we always make our helpers return base, in case the stack happened to grow or something. Again, that's pessimistic at this point; we could be more clever, but we just always do it for now. The expectation is that we won't call these helpers very often, because we'll finish implementing the IL for the rest of them, the CallInfo handling and so on, but the code is basically identical, and this is just for us to be able to write helpers quickly, make progress, and be able to show some performance improvements. So this next one is basically the meat of the entire change to the interpreter. This is exactly what we changed, other than adding a few fields to the prototype structure itself and setting those up. Once you've decided you're actually going to call a Lua function, we check to see if it's already been JITted, or blacklisted to say we can't JIT it. One thing that I don't have on the slide is that we've actually created our own
little extension for Lua, so that you can completely direct it: if you want a function compiled, as long as you import the right library, you can just say "compile this function". That also meant we created black and white lists, so that for our testing you could say something should never be compiled. So: that's the quick check. If it's not blacklisted, and the counter is at zero, and it's never been compiled, go compile it, and then it's marked so it won't be compiled again. Then, if it is compiled, just go execute that compiled code; if not, decrement the counter. The good part of the way the Lua VM actually works here is that you would have just set up the state for the call, so when we come back we don't have to do anything fancy: we'll have set up for the next call, and the interpreter loop just jumps back to the beginning, loads all of the right things, and keeps going. So, performance. We added a JIT; you would expect some performance. If you're familiar with LuaJIT, these performance numbers are not going to look anywhere near as good yet, but very quickly, for some things like fib and Mandelbrot, we saw a 2 to 2.
something-x performance improvement. So basically, with very minimal changes to the interpreter, very quickly you can see over 2x performance for a lot of these little simple workloads. The add test is something I created myself: it's a loop basically adding a bunch of values together, all of which end up being registers. I really wanted to test that path, because it's mostly math that we end up generating, with no calls or anything; that one is 5x faster right away. And factorial, everyone knows what factorial is. The current state is that there are a few opcodes we just haven't even created the helper for, because we've never run into any code that actually generates them. I think it's LOADKX or some other one that I've never seen; it's a constant load for when you have more than 32,000 constants or something, and we haven't bothered. If you did hit one, that function would fail to compile and we just wouldn't JIT it, but we'll do it eventually. So in total we're at, I guess, less than 50 lines of changes, including the makefile changes and everything, and we wrote less than 2,000 lines of code on the JIT side to be able to do this, our Lua function builder, where a ton of that is actually the loop and the switch statement; those consume a lot of the lines, so it really isn't very much code. Some future work that we're planning, and actually have a lot of started right now: we want to get rid of 100% of our VM helpers for all the common paths. We believe we'll still end up with some, because we want to let the interpreter handle the hard cases. If you're trying to add a string to something else, I'm probably not going to do it to begin with: I don't want to convert the string to a number to do the math; fall back to the interpreter for that one whenever possible. One of the things we've noticed in our JIT right now is that I was expecting to
see some way better perf, especially on the math-only things like Mandelbrot; I was expecting way more than 2x. The problem is that every time we go to do a math operation, I have to check: is B an int or a double, is C an int or a double, what are they? So it creates all of these diamonds in the code, and the JIT has a hard job optimizing down through them. But very recently, like when I was on the plane on the way over, I was doing some work on keeping track, in the interpreter, of what the types were when functions were called. If I do that, then I can make assumptions based on the instructions taking place: an add of two ints is always an int, so I can always know what the type of the result will be. If I start doing things like that, I see some very significant improvements, like Mandelbrot being 10 to 15x faster, which is in the ballpark I really wanted to be in for a bunch of these things. That work is sort of on my laptop right now and not in any state to show anybody, so I don't want to talk about those numbers, but that's something we want to keep going with. And the good part is, if we specialize based on the input parameters, at any point where they're actually not the right type, we just fall back to the interpreter and keep going. You would have a big perf loss if you called them with opposite types all the time, like int, then float, then int; but we also quickly hacked in that if you ever fall back more than n times, we go back and re-JIT without the type information, so you do the checks in the JIT, do more work, but still get the 2-to-3x perf. So, to quickly wrap up: our mission for OMR is to make this toolkit available for these other runtimes. We really don't want to create a new community; we don't want to do anything
to harm any communities. We're actually just trying to make these components available so that they can work with the communities that are already there. A lot of these languages have very large communities, so we're hoping we can work with them. And I'm not saying our tech is the best; actually, the more people who use it, the more improvements we may get ourselves, by having some of the other smart people working on these things contribute back to us. Check out our Lua VM from the link if you want: give it a compile, use it. It should be fairly stable; we test a lot before we commit anything. And that's just my quick contact information, and the link again to the Lua VM. That's it. Any questions? I'll go left to right, starting back here.

Q: What's the on-disk footprint?
A: Right now we've increased it by, I want to say, a couple of megabytes; I can get the actual number. The original Lua, I believe, is about 150.

Q: Do you depend on specific flags in the Lua config file, for instance if you change the number type?
A: It still works. When we create the structures that we base all of this on, we actually do it based on whatever the size of lua_Integer and all of those things is. For the while we've tested it, it all seems to be working that way.

Q: Could you say you want no coercions between strings and numbers, so you'd never have to jump back?
A: Yep, exactly. We're starting to put some of those checks in as well as we keep going, but for now we just bail on everything we don't recognize.

Q: (inaudible)
A: Sorry. So the first question was: are we planning on using this for any other languages? And the second question was: do we believe we can get anywhere close to LuaJIT's performance? I believe you said Mike Pall is the author's name. So the first one: yes. I've got another small language that I implemented this for that is not really using
it; it's a Smalltalk derivative, and again, that was just our first prototype. We are doing this for Ruby MRI as well, not using JitBuilder but the JIT itself directly; we've had people present at the last two RubyKaigis on that. That's making slow progress. Ruby is actually very difficult, because for MRI all of the class libraries are written in C, so it's very hard for us to see through them; writing a bunch of that code in Ruby instead would make it easier. And it's used in IBM's Java VM: this is actually 100% consumed back into our builds every day for Java. We're looking at a few other things at this point, but nothing concrete enough to bother talking about. And the performance? I don't know, I have to do more comparisons. I quickly saw the chart of LuaJIT's perf for a bunch of things: lots of things were in the 10-to-40x range, and that's completely doable depending on what we can do here, but maybe I'm missing some; maybe he's got 300x on some things I didn't see. For a lot of these things I'm 2 to 5x faster than the interpreter by itself, but I haven't done any direct comparisons against LuaJIT itself.

Q: Which interpreter?
A: The Lua interpreter.