You lucky people, we have of course saved the best till last. So if you could all sit down and open your ears for Charles Nutter and Tom Enebo.

All right, so we're going to get right into this. We've planned for a little bit longer than 25 minutes, so we'll rail through it. I'm Tom Enebo, this is Charles Nutter, and we've been working on JRuby for 15 years now. Fifteen years of peace. Yeah, crazy.

For people who don't know, JRuby is just another implementation of the Ruby programming language and runtime. Why did we make it? We made it to exploit some features that CRuby doesn't have, like the ability to execute concurrently on native threads. We can access Java libraries from Ruby syntax and pretty much get all the good stuff Java has to offer. We run fast.

But we have a problem: startup time is not where we'd like it. It's slower than CRuby, and Rubyists are constantly typing commands on the command line; the whole development process is command-line driven. For as long as we've been doing this, we've been trying to solve it. (Your keyboard sucks.)

So here we're executing the simplest program: we're going to evaluate the Fixnum 1, `-e 1`. By the time this finishes we've loaded 415 Ruby classes and modules. Over 300 of those come from Ruby source that we have to parse and then interpret. If we actually go under the covers, we're loading over 6,000 Java types, and 5,000 of those are from JRuby: our internals, method handles, generated instructions, all of that.

Ruby is also very dynamic, so we can't look at a source file and go, oh, `require 'normal'`, we want to load a file called normal. Someone might have overridden that and given you something else; it could be a different path. We've tried speculatively loading these things in the background, but we don't fully know. It's also a path-based language, so when we do `require 'normal'` we're going to do a lot of stats. Basically, it's as if Maven libraries were all distributed loose on the file system, each with its own file path, and you had to search through all three thousand of those paths to find the one file you're looking for. That's how it works in Ruby, and it's a problem. There are efforts to cache some of this information, but it's still early days for that.

And of course, during startup almost everything we're loading is only going to be called or executed once, and in Java that's going to be stuck in the bytecode interpreter; not many things are actually going to make it to C2.

A good way of illustrating this is with a graph. This is `-e 1` again. CRuby takes almost no time when we run it. This is an older slide, but we were about 10 times slower when it was made. As an experiment we wrapped invoking a Ruby runtime in a loop and timed it: by the 10th iteration we're starting to catch up to CRuby, and by the 50th, well, we're pretty much done and we're beating it.
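Here's a minimal sketch of that experiment using JRuby's embedding API (assuming jruby-complete is on the classpath): stand up a fresh runtime each time, run the same trivial script, and watch the per-iteration time fall as the JVM warms up.

```java
import org.jruby.embed.ScriptingContainer;

public class WarmupLoop {
    public static void main(String[] args) {
        for (int i = 1; i <= 50; i++) {
            long start = System.nanoTime();
            // Stand up a fresh Ruby runtime and run the same "-e 1" workload.
            // JVM-level JIT state persists across iterations, so later
            // runtimes boot much faster than the first.
            ScriptingContainer container = new ScriptingContainer();
            container.runScriptlet("1");
            container.terminate();
            System.out.printf("iteration %d: %d ms%n",
                    i, (System.nanoTime() - start) / 1_000_000);
        }
    }
}
```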
If we could get this performance right away, we wouldn't have to give this talk.

Again, this is kind of a duplicate slide doing the same thing with `gem list`. That's an important command for us because RubyGems is the packaging system for Ruby. It shows that as we do a little more work, CRuby's time goes up, but the ratio of how slow we are compared to them doesn't change a whole lot.

A little more background on other implementations. We're talking about CRuby, or MRI, here: the standard C implementation. Its peak performance is the lowest of the available Ruby runtimes, but everything starts out hot. The parser starts out hot, they have a very lightweight bytecode compiler and interpreter, so they get up and going fast. They're loading all of those same Ruby files and defining that same number of Ruby classes, but they're doing it in about 0.1 seconds.

The other one that's interesting right now is TruffleRuby, which is a Ruby implemented on top of Truffle and GraalVM. It's a very interesting project, and they show very good peak performance, but they have this same sort of startup issue, even more so, just because of the way they're designed: they do a lot more interpretation at boot time. They're solving it in some similar ways; we'll talk a bit more about them later.

All right, and we have a few more minutes, so we're going to talk about some stuff we've done in the past. Some of it stayed and some of it went. Toward the beginning of the 2000s we had a simple AST interpreter: we parsed the source into a stream of lexical tokens, built an abstract syntax tree, and the interpreter bounced around it. Startup was okay. Back then Ruby was a much simpler environment; it wasn't loading as much stuff, and we might still have been two or three times slower, but two or three times slower when the baseline is something like 1.6 seconds is not really a big deal.

The first startup optimization ever made, before either of us started on this project, was to save the lexer tokens to a file and then reload them, back in the Java 1.4 days. That was actually a pretty big optimization. But by the time we supported Java 5 and up we had optimized the lexer and parser, the gain got down to about three percent, and we thought: why are we supporting this weird serialization format? It's just not worth it.
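As a rough illustration of what that old token cache amounted to, here's the pattern with plain Java serialization; the `Token` shape here is hypothetical, not JRuby's actual lexer type.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class TokenCache {
    // Hypothetical stand-in for whatever the real lexer produced.
    public record Token(int type, String value, int line) implements Serializable {}

    // Dump the lexed token stream beside the source so the next run can
    // skip the lexer entirely.
    public static void save(List<Token> tokens, File cache) throws IOException {
        try (var out = new ObjectOutputStream(new FileOutputStream(cache))) {
            out.writeObject(new ArrayList<>(tokens));
        }
    }

    @SuppressWarnings("unchecked")
    public static List<Token> load(File cache) throws IOException, ClassNotFoundException {
        try (var in = new ObjectInputStream(new FileInputStream(cache))) {
            return (List<Token>) in.readObject();
        }
    }
}
```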
A bit more time passed, and then the first JIT compiler came. This was about the same time that Ruby was also improving its performance, so we were in a performance arms race, or staying ahead of it anyway. But this has no effect on short-lived processes, because by the time we actually JIT something the process is already done. To be clear, this is a JIT from our internal instructions to JVM bytecode, which the JVM would then eventually JIT itself, so we're way off the tail of getting any sort of optimization at startup. There was one benefit: if you loaded a huge application, things started to warm up, you got the payoff, and the startup time of a really long-running process got faster.

Well, if you can compile everything to Java bytecode, why not compile everything ahead of time? That experiment didn't pan out well. With verification on it was something like ten times slower, really really slow, and even with verification disabled it was still slower than just parsing the Ruby. Charlie's going to talk about that more later.

In our race for performance we created our own internal representation (IR) with its own virtual instruction set, so we can do things like inline a method with a closure and inline that back into the call site. But because we're doing this extra building, we actually made our startup time worse, which sucks. Again, for a really long-running process we continue to get a little bit faster.

Then we realized that the AST in memory was quite a bit smaller than our IR, so we decided to get lazy: until a method is called for the first time, we don't actually build its IR. It was mostly a memory optimization, but it did improve startup a little bit (there's a sketch of the idea below).

Oh, and we're back to serialization again: a Google Summer of Code project where someone spiked serializing the IR. We were really hoping it was going to be magic; turns out it's not. We'll talk about serialization more in this talk, but the parser and the compiler end up getting so hot that it really doesn't make much difference compared to serialization; we still create the same number of IR objects in the end.

Our most effective startup optimization ever is to disable everything. With `--dev` we only use our startup interpreter and we disable C2, and to this day that's really difficult to beat; C1 on everything is always the fastest. But now we have to worry about people passing `--dev` into their production environments, and sending us benchmark results that aren't what they expected. It's client/server all over again. You can see, doing `gem list` again, that it's about 33 percent faster, so it was a pretty big win.

In the past we also played with native compilers. Excelsior JET is the one I had some experience with, and it did a bit better than `--dev`. But again, we didn't really want to go that extra step, it's something else to support, and it was also a company, may they rest in peace. We'll talk about native compilers later, too.
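The lazy-IR trick mentioned above boils down to a pattern like this (all names illustrative, none of this is JRuby's real code): keep the cheap parsed form around, and pay the IR-build cost only on the first call.

```java
class LazyMethod {
    private final Object astNode;     // cheap parsed form, built eagerly
    private volatile Runnable irBody; // heavier IR, built on demand

    LazyMethod(Object astNode) { this.astNode = astNode; }

    void call() {
        Runnable body = irBody;
        if (body == null) {
            synchronized (this) {
                if (irBody == null) {
                    // First call pays the IR-build cost; in a short-lived
                    // command, most loaded methods never get here at all.
                    irBody = buildIRFrom(astNode);
                }
                body = irBody;
            }
        }
        body.run();
    }

    private Runnable buildIRFrom(Object ast) {
        return () -> { /* interpret the built IR */ };
    }
}
```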
Do you want to sound twice as good? Okay, I'll use this one. That's fine.

So now we're going to go over some of the current experiments we've been working on, things that are still active projects. We'll jump right into those.

Pre-booting is similar to what the Scala compiler or a Gradle build does: spin up a background process and then throw more work at it. We have some Rails-specific tools, and people do use those, but we've also tried some general-purpose options. The first was Nailgun, where you start a background daemon JVM and continually throw new operations at it. They run in their own class loaders, isolated as well as class loaders can isolate things. But it really didn't work well with the way Rubyists write these applications: they would spin up threads and expect them to go away when they're done, or allocate resources and walk away because it's going to be a short-lived process, stuff like that. A lot of resource issues; it never really panned out.

Drip is a little bit better; not a lot of folks know about Drip. Drip basically pre-starts the next JVM. You run a command, and if there's nothing available it starts the JVM you're targeting plus a second one to get ready for the next command. You can have a stack of these, five or ten or whatever, so as you throw more commands at it there's already a JVM up and going. But it has to do a lot of TTY juggling to hook you up to that process correctly, and the instances in the background will eventually hold stale code, so you have to wipe all of them out and then you're back to slow startup again. Kind of more problematic than it was worth.

We're now interested in the checkpoint-and-restore (CRIU) stuff on Linux. Some folks at Red Hat are experimenting with this, but I think it's still early days, and we haven't had a chance to play with it much yet. So that's where we stand on pre-booting.

We also revisited serialization recently. One thing we always wanted to do was lazily load the instructions, just like we were being lazy with IR build versus the AST, but there was a weird artifact in our implementation that we only solved a week or two ago. So now that works; let's see how it's turning out. If we look at old serialization versus new serialization, we got a good bump by being lazy: it went from being a little bit slower to a little bit faster. This is only 20 gems, so it's doing virtually no work; it's kind of the worst case of being something useful. If we go to 2,000 gems, which happens to be my personal work dev environment, it still improves and still holds true, but it's getting a little less interesting. So is this worthwhile? At this point enough stuff starts to hit the JVM that we don't get as much of a gain from it, we suspect. And on this last one, going into the Rails console, which does multiple invocations of Ruby, you can see that serialization really isn't playing any role at all; there's a different startup issue there. So this really makes us unsure whether we want to continue with it.

But when we came over for this conference, I noticed there was a constant pool index in our format that we weren't using, so I thought: let's add some constant pooling. I'm saving symbols to a pool. That prevents having to decode a bunch of bytes for the symbol name and its encoding, and we don't have to look it up in our global symbol table again.
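A minimal sketch of that kind of symbol pooling, assuming nothing about JRuby's actual wire format: the first mention writes the name and assigns an index, and every later mention serializes as just the index, so decoding and symbol-table lookup happen once per name.

```java
import java.util.*;

class SymbolPool {
    private final Map<String, Integer> indices = new HashMap<>();
    private final List<String> pool = new ArrayList<>();

    // Writer side: returns the pool index for a symbol, adding it on first use.
    int intern(String symbol) {
        return indices.computeIfAbsent(symbol, s -> {
            pool.add(s);
            return pool.size() - 1;
        });
    }

    // Reader side: a reference in the stream is just an index into the pool.
    String lookup(int index) {
        return pool.get(index);
    }
}
```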
So basically there's a new box on the chart here: it just got a little bit faster. That's encouraging, because there's a lot of other stuff we could put in that pool. I'll just pop through these quick; you see the same approximate ratio (I'm always one click off on that). As you'll see later, there are other things that might be more exciting than this. And there's more to do here too: this is still one constant pool per scope. A constant pool for an entire file would make some sense, to share those symbols as much as possible. We're going to keep chipping away, and it might become worth it, because it works really well for really short commands.

Okay, so, returning to the JVM bytecode compiler. Because of how little it helped us in the older JRuby, when we went to our 9.x series we thought we wouldn't even bother with it. We wrote a new .class compiler that didn't actually compile bytecode at all: it took the serialized IR format, stuffed it into a bunch of constants in the constant pool, and when you boot the class up it just deserializes the IR and starts running it. A clever way to get a .class without actually emitting any bytecode.

But we wanted to revisit actually emitting JVM bytecode, doing the compile ahead of time. Normally the bytecode compiler is used as a JIT. We've used the same call threshold for years and it's mostly served us well: if a method or a block gets called 50 times, we turn it into JVM bytecode, and eventually the JVM will continue to optimize it from there. But it can certainly also compile an entire script. One of the things we learned is that a lot of people love to benchmark stuff at the root of the main file they're running, so if we don't compile the entire file you get a loop that never optimizes; we don't have on-stack replacement or anything like that. So we always compile the target script completely. Expanding that to the rest of the files that get loaded was not a big leap.
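The call threshold amounts to something like this (a simplified sketch, not JRuby's actual JIT plumbing): interpret until the count hits the threshold, then hand the body over to the bytecode compiler.

```java
class JittableMethod {
    private static final int THRESHOLD = 50; // JRuby's long-standing default
    private int calls;
    private Runnable compiled; // JVM-bytecode-backed body once jitted

    void call() {
        if (compiled != null) {
            compiled.run(); // hot path: already jitted to JVM bytecode
            return;
        }
        if (++calls >= THRESHOLD) {
            compiled = compileToBytecode(); // 50th call triggers the JIT
            compiled.run();
            return;
        }
        interpret(); // cold path: IR interpreter
    }

    private Runnable compileToBytecode() { return () -> {}; } // stand-in
    private void interpret() {}
}
```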
Goals for this: obviously JRuby and JVM initialization stay about the same, we're not going to be able to do much to reduce that cost, but hopefully we can get rid of reading the file, parsing it, compiling the IR, optimizing the IR, and interpreting it, and launch straight into bytecode execution. Maybe reduce the number of JVM classes, and probably reduce the amount of heap used, since we don't have to stand up all of our IR; we'd mostly get to the bytecode eventually anyway, so it's kind of wasted space. Unfortunately, it hasn't worked out as well as we expected from what we'd seen before.

Here's some output. In the normal JIT mode, running JRuby, you'll see only the target script, the main script, compile ahead of time; everything else compiles at runtime once it's been hit enough. At the bottom, changing the flag to `-X+C` forces every script to compile completely before executing, and you can see lots of scripts come through. If we combine this with the new AOT mode, the compile-cache-classes setting turned on at the top along with some logging, we get the whole list of scripts being pre-compiled and dumped into a cache directory, and the next time they can be loaded directly from the class files.

So does it work? Well, it turns out we're dealing with a tremendous number of class files here, as you'd expect. Just running a piece of a Rails app, generating the typical blog post that people demo, produces over 1,200 class files; it's loading somewhere between 1,200 and 1,400 Ruby sources. Those all get dumped into giant class files: 80 megabytes of classes as the result of this one Rails command. And then almost none of it JITs, so we're loading all of this bytecode into the system, running through it once, and that's it; for a short command we never run it again.

The first thing I did to explore this was to trace the actual bytecodes being executed. I don't know if anybody's played with this; it's a lot of fun. If you run a debug build with the trace-bytecodes option, you can see every bytecode as it's executed by the JVM's bytecode interpreter. Once methods JIT they no longer show up in the trace (you can force everything to stay interpreted if you want to see it all), so it's a good way to see what cold bytecode is executing, and how much. I'm using Claes's bytestacks tool here, which takes that trace output and turns it into a flame graph of your application, so you can see where most of the cold bytecode is executing.

So here are the cold bytecodes for `-e 1`, split up into: JVM initialization, the Java modules part that runs first to get things going; base JRuby, which is mostly defining our core classes; loading JRuby libraries, which is RubyGems and any additional plugins; and then "other", which is additional JRuby stuff like setting up our native access layer, not really bytecode-intensive, mostly callouts to native code. And that's just `-e 1`, hello world. You can see that with the cached bytecode we're actually running more cold code at this point. Rather than getting the gain we hoped for, getting into hot code faster, what's hot for us is still cold for the JVM, so it doesn't give us much. Similarly on `gem list`: the Ruby libraries portion and the actual command being executed now have more cold code, so they actually slow things down overall.

So I went back and looked at this, and realized we were still using a lot of invokedynamic in a whole bunch of bytecode that only ran once. That ends up being a waste of time, because all of the bytecode to bootstrap those call sites, and all the lambda forms inside the method handles, execute once and are never run again, so they never JIT.
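For a sense of what that bootstrap machinery is, here's a minimal invokedynamic bootstrap method (illustrative, not JRuby's): everything here, plus the lambda forms behind the handles it builds, runs interpreted, and for a call site that only ever executes once it's pure cold-path overhead.

```java
import java.lang.invoke.*;

public class IndyBootstrap {
    // Every invokedynamic instruction runs a bootstrap like this once, the
    // first time the call site executes, to build its MethodHandle chain.
    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name,
                                     MethodType type) throws Exception {
        MethodHandle target = lookup.findStatic(IndyBootstrap.class, "target",
                MethodType.methodType(Object.class, Object.class));
        // asType adapts the handle to the call site's exact signature;
        // real dispatch logic (guards, caches) would be layered on here.
        return new ConstantCallSite(target.asType(type));
    }

    static Object target(Object receiver) {
        return receiver;
    }
}
```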
We really saw this with `-Xint` turned on: it executed a tremendous amount more bytecode, because all those lambda forms just keep churning and churning and never turn into native code. So I made a modification for the AOT mode that basically uses no invokedynamic at all. It's a great feature for us, we love it for peak performance, but for cold execution it's really not that great; there are a lot of issues trying to get that stuff to run well cold. I know there's work on getting lambda forms and method handles and whatnot to compile in with jlink and such; we haven't started playing with any of the work being done there. The results I have here still use invokedynamic for constant lookup, but almost everything else is just the equivalent plain bytecode: a little less dynamic, a little less optimizable, but less cold bytecode to run.

We did get a good reduction here. This is only a slight reduction from the original cached version, but more interestingly, in the case of a larger command we now actually run fewer cold bytecodes and get a little bit of a boost from the pre-caching. So we can shrink that bytecode, maybe emit simpler, less efficient bytecode for the class bodies and the script body, but then use invokedynamic for the methods and blocks that are called a lot. I think we can find a happy medium between the two.

Okay, so, all of these things combined: we've been playing with interpreter modes and JIT and AOT, and then there are all these JVM flags and other options coming up, so we wanted to assemble a bunch of them together just to see the best startup we could get with all of our current tricks.

Here, this is `gem list` with just 20 gems, a fairly small Ruby command. Here's our original `--dev` result. Here's throwing AppCDS at it; that helps with all those JVM classes we're loading and manages to trim some time off. This is with the lazy serialization, the serialized IR: a similar gain, but we lost a little time. The 1.71 here is using the bytecode cache; again, AOT to bytecode is just not a startup win in any way we've found so far. And even AppCDS plus our pre-cached code is still too much cold bytecode executing, so it's not going to get us the startup we're looking for, though we can do other things with it later.

Here's a larger example, `gem list` with 2,000 gems, where it has to read through all of them. There's `--dev`; AppCDS gets us down further; lazy serialization is slightly better than that; and now we start to see that the cached bytecode has a larger effect on a longer-running command. Once we actually give it enough time to JIT, things start to improve. On this one we also threw OpenJ9 at it, since it has a feature similar to AppCDS that can also save some JIT code. We got some gains, but again, it's saving code that's been jitted, and most of our bytecode is cold. We've been talking to some of the OpenJ9 folks; we're going to tell it to try to pre-compile those script bodies as well, and then maybe we can launch into almost all native code for all of our scripts.
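All of these numbers are just wall-clock timings of fresh processes. A trivial harness along these lines is enough to reproduce their shape; the harness itself is hypothetical, though `jruby --dev -S gem list` is the real form of the command being measured, and the JVM flags under test (AppCDS archives, OpenJ9 shared classes, and so on) just get added to the command line.

```java
import java.util.List;

public class StartupTimer {
    public static void main(String[] args) throws Exception {
        List<String> cmd = List.of("jruby", "--dev", "-S", "gem", "list");
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            // Each run is a cold process, so the JVM's JIT never gets to help.
            new ProcessBuilder(cmd).inheritIO().start().waitFor();
            best = Math.min(best, (System.nanoTime() - start) / 1_000_000);
        }
        System.out.println("best run: " + best + " ms");
    }
}
```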
That'll be the next experiment to try. And the last one is the Rails console again. Most of the Rails commands are kind of a worst case, because not only do they run very short, but they usually launch JRuby twice: once to determine a set of dependency paths, and then it relaunches with just those paths to isolate it from other libraries and whatnot. So here's `--dev`; a little bit of a gain from AppCDS; more so with the serialization, which is weird. The class cache is again slower than normal; class cache with AppCDS is the best of those. But oddly enough, in this case, if we only cache the classes used by the parent process, the launcher process that digs out all those dependencies, that ends up being the fastest of all. There's enough in that top process that warms up and gets going that we trim off quite a bit of time. It kind of makes me want to profile which things are actually hot; that'll be the next thing, and then only class-cache those. It's very difficult for us to profile what's hot in the Ruby code, because we see either the methods we've jitted or the JRuby interpreter itself, and that doesn't really tell us which part of the Ruby is executing in the interpreter.

Okay, so, on to some futures; we'll be able to wrap up pretty quickly here. So now, of course, native compilation is cool again. I feel like maybe we need to get GCJ out of mothballs and everyone will be thrilled about it. We are experimenting with the interesting uses of native compilation in GraalVM at this point. This is early days, still future work, but we've got some proofs of concept.

First of all, I mentioned TruffleRuby before. TruffleRuby usually runs out of a native image: they've compiled all of Truffle and all of TruffleRuby's implementation logic down, along with some of their internal class logic, so they actually do pretty well on base startup. Here's us with our best flag, `--dev`; there's TruffleRuby on the JVM, without native compilation; and here it is natively compiled. But there's more to this than meets the eye. They're also pre-loading all of RubyGems and saving that in the native image, so it's already booted; they're not actually loading all of the code that we load, so it's not quite apples to apples. They're essentially doing a CRIU-like restore of where they were at time zero, so they can launch right into it.

If we make them do a little bit more work, though: here we have two different commands, `gem version`, which should be very lightweight but forces all of RubyGems to load, and then a `gem list` of a large number of gems. Here's JRuby versus TruffleRuby on the JVM and native. They're two to three times slower than us for a basic gem command, usually, and it continues to get worse the more work they have to do, because they're still essentially running cold: all of that Ruby code goes through the same sort of process it does on JRuby.
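For context on that pre-loading trick: GraalVM native-image lets classes be initialized at image build time, so state built in a static initializer is snapshotted into the image heap and is simply there at startup. A hypothetical sketch of the pattern (this is our illustration, not TruffleRuby's code; assume it's built with `--initialize-at-build-time=BootSnapshot`):

```java
import java.util.HashMap;
import java.util.Map;

class BootSnapshot {
    // With build-time initialization, this runs during the image build, not
    // at startup; the resulting map is baked into the image heap, so the
    // "loading" work has already happened when the binary launches.
    static final Map<String, String> GEM_INDEX = loadGemIndex();

    static Map<String, String> loadGemIndex() {
        Map<String, String> index = new HashMap<>();
        index.put("rake", "13.0.1"); // stand-in for real gem scanning
        return index;
    }
}
```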
So what we want to do: we've already been able to compile all of JRuby itself to native and get it to boot and run, but there are a lot of limitations. We have to turn off invokedynamic, we can't do any runtime JIT, and we can't dynamically load any classes or libraries. The ultimate goal, then, is to compile JRuby and all of those ahead-of-time-compiled Ruby scripts down to native. That would be the first real, fully functional native compile for a Ruby application that leaves nothing but native code behind. So that's coming up. What we have right now is JRuby compiled down to a native executable. Here's CRuby's startup for the basic `-e 1`; JRuby on JDK 8; here's after we've let it JIT; and there's where we get if we compile JRuby down to native. So this at least gets us to time zero as fast as TruffleRuby, without being Truffle and without doing all the other tricks. But we want to do this with all the Ruby code loaded up too; that's the next thing.

Futures for this: like I said, we needed the bytecode AOT to be working, and now it is, so we'll be expanding that native-compile proof of concept to the entire application, probably starting with smaller services and command-line tools at first.

We also have some ideas for static optimization. We can get some profile information from the Ruby code, and for monomorphic call sites, compile into our ahead-of-time bytecode a guess that it's probably the same method it was the last time you ran it. Ideally the native-image compile could see through some of that and get a lot of the inlining we want out of it. Most calls in Ruby applications are monomorphic anyway.

All right, the last item. In working with the IR serialization, which we hadn't touched in quite a long time, we started thinking about ways to actually speed things up. One big problem we have is our Ruby parser: it's an LALR grammar with something like 170,000 states, so it takes a long time to warm up. But what if we could do something a bit simpler, something that could fit into a single method? Looking at typical Ruby files in libraries, there are usually a couple of requires, which are just function calls, and some modules and classes being defined, and they're reasonably simple. So what if we limit a little Ruby interpreter to only those things? If the Ruby has more complicated constructs, like `if` conditionals inside a module body, we just use the current serialization; but if it doesn't, we use this new interpreter, which is basically a single static method with a limited number of instructions, and we go with that. And I realized that all the stuff I'm talking about here only executes once, so we have no need to have IR at all; that was something I realized after making that slide.
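As a napkin sketch of that limited interpreter (the op set and every name here are hypothetical, not JRuby code), it could be little more than a switch over a handful of declarative ops, replayed in one static method:

```java
import java.util.List;

class SimpleScriptInterpreter {
    enum Op { REQUIRE, OPEN_MODULE, OPEN_CLASS, DEFINE_METHOD, SET_VISIBILITY, END_BODY }

    record Insn(Op op, String arg) {}

    // Replays a file that contains only simple declarative structure:
    // no IR, no per-body prologue/epilogue, one flat pass.
    static void run(List<Insn> insns) {
        for (Insn insn : insns) {
            switch (insn.op()) {
                case REQUIRE        -> require(insn.arg());
                case OPEN_MODULE    -> openModule(insn.arg());
                case OPEN_CLASS     -> openClass(insn.arg());
                case DEFINE_METHOD  -> defineMethod(insn.arg());
                case SET_VISIBILITY -> setVisibility(insn.arg());
                case END_BODY       -> popScope();
            }
        }
    }

    // Stand-ins for the real runtime calls:
    static void require(String f) {}
    static void openModule(String n) {}
    static void openClass(String n) {}
    static void defineMethod(String n) {}
    static void setVisibility(String v) {}
    static void popScope() {}
}
```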
Yeah. So this is the typical structure of a Ruby source file, and on the right is just a little napkin sketch: it'll probably be five or seven case switches with inline bodies for standing up a new module or a new class. It also has another benefit: when we normally interpret this stuff as IR, we go through a prologue and epilogue for executing the script body, then executing the class body, and so forth, but in this case it all happens in the same context. And we can at least see when there are no unrecognized calls; we've got `require`, we've got some visibility changes, but we can basically store the structure and just run through it quickly, without the entire Ruby context we would normally need to execute in.

So, wrapping up here. Precompiling to bytecode works, but right now it generally hurts startup. It's clearly a prerequisite for doing a native compile, though, so that'll be the next step to try. The class-sharing features are looking very good these days: running AppCDS on JDK 13 is probably the fastest startup for JRuby at the moment. We're going to continue playing with mixing these different options together, but CDS looks good, and the shared-classes quickstart stuff on OpenJ9 is also fairly competitive. We're looking to try out, say, ReadyNow, and I think the Alibaba folks are working on saving JIT data in the background on OpenJDK as well, so we're interested in that possibility too.

Lazier and lighter IR: I mean, really, the best thing we can do right now is just do less work at boot and get you to a running application quicker. It's really about that first response. If the whole run takes 20 seconds, just getting you the first output of it makes people feel so much better; it's sitting there staring at a blank command line, with nothing happening while we boot up, that really bothers people. Some people have suggested we should have a splash screen every time we run a command, so they know we're going. Trust me, we are going. We still occasionally get that with Java 9+ module warnings. Yeah, exactly. Maybe a progress bar: "starting up." But if we can do less and get people at least some output right away, that would be better.

And the native-compile stuff is really cool; it shows a lot of promise, but it's really super limited right now. It's not Java in the way we know it. I think we can get small Ruby applications to compile completely down to a native binary; hopefully that binary is smaller than one gigabyte once we get there. We're going to continue to play with it and see how much we can actually squeeze out of all of these different options. And that's what we've got. Thank you. Who told you it'd be good?

How long does it take to compile JRuby to native?

To compile JRuby native: it wasn't terribly long, on the order of a few minutes, I guess; our code base isn't that large. I also had to strip out things like our bytecode compiler, anything that referenced stuff the native image doesn't like; I basically just removed it. So it cut JRuby down to essentially the parser, compiler, interpreter, and some of our core classes. So it's, you know, five minutes, maybe.
Yeah, it wasn't too bad. It's not something we would say is now part of your dev cycle, like, you've updated your libraries, so do a native compile. It would be more like we'd pre-compile JRuby plus some key libraries, and compile our interpreter in, so you'd continue to use the interpreter but get fast startup for the basics. So, a partial solution there. Or, if you're going to production, compile your whole microservice down, or something.

Hello, so my question is: a few years ago we talked about this on Twitter. At the time, JRuby still defaulted to compile.invokedynamic set to false, so by default, even without `--dev`, JRuby would avoid invokedynamic. Do you think some of these things could be an enabler to have that on as the default, without as big a penalty as in the past?

Right. So, trying to get closer to having invokedynamic on all the time: we've moved that bar a little over time. Things that are literal values are now sort of like a constant-dynamic sort of thing: they boot up and then cache a constant using invokedynamic. That's always on, and that's what I had to remove from the compiler for these experiments. The real problem with using invokedynamic all the time is those long chains of totally dynamically constructed lambda forms. It's not something in a constant pool, not something we can easily represent as an expression, so those would always still be slow. It's possible that with constant dynamic or something like it we might be able to say: here is the shape of a Ruby method call chain, cache these, then stick this direct method handle on the end and save some of that effort. But we do so much programmatic building of large method handle chains; that's where most of the problem is.

There are two other things. One is that Charlie just recently changed our bytecode generator so that you can decouple whether indy is on for everything or not; it's a sliding scale now, so we might be able to emit less indy in places where it probably doesn't matter. The second is that we've been trying to add a timing metric into our JIT heuristics so that we're actually jitting less code. If we can truly get to a place where we know something's hot, we can use tons of invokedynamic in it, and if it doesn't hit that threshold, maybe we do an indy-free compilation. We're kind of building a tiered VM on top of another tiered VM. We've even talked about using indy but with really simple call sites that basically do a virtual dispatch to some function object, which would be very quick to bootstrap, then add a counter into the method handle chain, and once it gets really hot, go back and make it an optimized invokedynamic call site. There are lots of different things we can try with this, but there are only so many hours in the day.
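That last idea, sketched loosely with the real java.lang.invoke types (the threshold and all the names here are made up): start every call site with a cheap generic target plus a counter, and only install the expensive optimized handle chain once the site proves hot.

```java
import java.lang.invoke.*;

class TieredCallSite extends MutableCallSite {
    private static final int PROMOTE_AT = 50;
    private int count;

    TieredCallSite(MethodType type, MethodHandle slowPath) {
        super(type);
        // Cheap-to-bootstrap generic dispatch; slowPath must match `type`
        // and is expected to call back into recordCall below.
        setTarget(slowPath);
    }

    // Invoked from the slow path; once hot, swap in the optimized chain.
    void recordCall(MethodHandle optimizedChain) {
        if (++count == PROMOTE_AT) {
            setTarget(optimizedChain);
            // Publish the new target to any code that has inlined this site.
            MutableCallSite.syncAll(new MutableCallSite[] { this });
        }
    }
}
```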
When you tried all the AppCDS things and so on, did you also try the OpenJDK AOT thing? jaotc?

Yeah, we have played with it, and it was similar to what we saw out of AppCDS and other approaches: it basically got our zero-flag startup pretty close to the `--dev`, C1-only performance, which is about what we'd expect. We hoped for a little bit more, because we wanted those bodies of code to actually compile as well, but it turns out to be such a tremendous amount of code that we load into the system that I think we lose it there: it creates a gigantic executable for all the stuff we'd actually need to run at startup. So we need to figure out how to make that startup path as lightweight as possible, and only do the real hard work for methods and blocks.

There's a question from me: clearly you know, in some detail, which methods are going to be hot. If you could drive the HotSpot JIT compiler from the script, to say "compile these methods now, inline these methods into them," would you be able to get better performance that way?

Well, I would say almost none of this affects our peak performance. In general we're already getting good inlining through invokedynamic call sites; it just takes a long time. What it would do, together with the pre-built bytecode, is let us hint and flatten the warmup curve a lot, because we're confusing the hell out of the JVM: we run this code for a long time in the interpreter, and then, oh no, it's not the interpreter anymore, now it's this version of the bytecode, and now we're re-optimizing that bytecode and changing it again. If we could give some hints to the VM, it would certainly help us shorten that curve.

There is a sort of mechanism that will do that, in the shape of the CI replay data that engineers use to debug the JIT. It's a serialized form that tells you exactly which methods to JIT, exactly where to inline everything, and exactly what all the branch probabilities are. You can feed it all of that if you want.

Right, and we actually have a profiling inliner for the IR already, so it's possible we'd be able to take our interpreter's profile data, feed it into a tool like that, and say: hey, we already know this stuff, don't start from zero again. Both our parser and our IR interpreter are just Java; if we could just force those to C2 immediately...

You can use the WhiteBox API to put stuff in the compile queue ahead of time. Yeah, it works. Yeah, we'll give it a shot then.

Anything else, if I want to get dinner on top of that? The compile queue is not FIFO; there's an algorithm behind the order in which methods get compiled. So in your case, since you care so much about the warmup curve, it's really worth revisiting how, and in which order, methods get compiled; that might impact your warmup curve. The way to do it effectively is the WhiteBox API, and the replay file can help you there, by tweaking some indices, things like that. Sure. All right, thank you, we'll definitely do that.

Thank you. Thank you very much, that was fascinating. I think we should wrap up at that point. All right. Thank you.