All right, I was told I could start anytime I'm ready, and this being a talk on startup, let's get going. So: JVM startup, and why it matters to the new world order. I work for a company with lawyers, so don't trust the things that I say.

Who am I? I'm Dan Heidinga. I work for IBM, and have for about ten years now, all that time doing virtual machine development. I've been involved in a bunch of different JSRs, and the big thing for me right now is that I'm pleased to say I'm one of the project leads for Eclipse OpenJ9. OpenJ9 was created in September of 2017, but of course the code goes back much further than that, being IBM's production JVM. So if you're interested, while we're talking (I know the Wi-Fi is not great here), check out the website, check out the GitHub, and take a look at the code. One of the nice things about this particular codebase is its license: it's EPL v2 with the secondary licenses clause, which means it's compatible with OpenJDK and with Apache 2 as well. And it's an Eclipse project.
We're really open, and we're interested in anybody who wants to join in and contribute.

Now I'm going to talk a little bit about the old world order, then about the new world order and some of the changes we've seen, and how that relates to JVM startup.

In the old world, you typically wrote your code, tested it, checked it into your version control, and then it sat there for a while. Eventually you deployed it, maybe to a nice server farm that looked like this, maybe more like that. But those deployments didn't happen very often. Deployments used to be a scary thing for a lot of people, so they were commonly done maybe every six months, maybe a couple of times a year. And they were scary because they were done infrequently: there were a lot of code changes, a lot of development done between deployments, so people were scared of them, which meant they did them infrequently, which meant they stayed scary to do.

But it also meant that startup was a very, very small fraction of your actual run time. If your application runs for six months before you start it again, who cares how long startup takes?
The goal at that time was often peak throughput. Even if it took a little while to get your application warmed up, who cares? It was such a small fraction of the total lifetime of your application.

But the world has changed, and it's changed because now we're starting to look at more metrics than just peak throughput. Almost everybody is doing continuous integration today in some form: you check out your code, you create a pull request, and your code gets tested before it gets merged back in. That build is going to run a bunch of your tests, and often that means a lot of JVM startups inside your CI pipeline. You're starting the JVM to run your testing; maybe you can batch your tests together, maybe you can't, depending on what they're doing in the VM. So suddenly JVM startup and application startup start to become a measurable part of your build process. It starts to affect how quickly you can turn the crank and innovate. And once you've done the magic thing of getting CI running, you want to get into continuous deployment, right?
You've checked out, you've compiled, you've tested, and now your system can automatically deploy for you, which means you go from deploying maybe a couple of times a year to possibly several times a day. Even if this isn't automated, we're still seeing people deploy much more frequently, so that startup time suddenly becomes a much bigger factor in your runtime.

And what talk is complete without mentioning the cloud? With cloud you get a lot of horizontal scaling: instead of buying a bigger machine, you run more instances of your application, and the more instances you run, the more JVM startups you have. The length of time it takes for the JVM to start, and for your application to start and be ready, affects how early you have to scale; the earlier you have to scale, the more money it costs you. This is cloud economics: you're paying for your memory, and the more instances you're running, the more memory you're paying for.

Then, to continue being buzzword compliant, we have microservices, which take your scaling problem and multiply it across multiple services. Instead of one monolith that you start multiple times when you scale, you've now got dozens or hundreds of microservices, each one paying startup costs. And the next major change we're starting to see is serverless computing.
I've pulled this picture from Apache OpenWhisk. The model here, for those who aren't familiar with serverless, is that you have some set of events that occur, and you have rules that listen for those events; when they happen, they fire some action. Typically the way those actions get fired is in a Docker container: your event comes in, a Docker container spins up, it services your action, and then the container goes away. That means your startup now is not just the startup of your application; it's the startup of the Docker container, plus the startup of the JVM, plus the startup of your application. There's a long latency there. If you look into serverless computing, people often complain about cold starts. The problem is so bad that the serverless providers actually cheat a little here: they claim your container is conceptually there for one request and then shut down, but they actually keep a small number of containers around so that second requests don't always pay that same cold-start cost. Ideally, though, the JVM should be able to start fast enough that none of this is needed, and the serverless providers can go back to the very simple model of starting a container on every request.

The other interesting piece of this is that we start to see a loss of the JVM's ability to learn from previous runs. In a serverless model where you're starting a new container for every request, everything is treated as a first run; the JVM can't easily cache things away to use in later runs.

So what does our new world look like? Deployments are frequent, often multiple times a day. Startup, even if it takes exactly the same length of time as it used to in the old world, has become a much larger fraction of your uptime. And we've lost the ability to learn from previous runs, because now we're seeing lots and lots of first runs and never really seeing what would be considered a second run.

Even in the old world, though, there were times when the JVM still cared about metrics other than pure throughput. Historically the JVM didn't have to be a good neighbor: it was able to say, I'm running on this machine, I'm going to grab all the resources, I'm going to assume I'm the only one here. That gave you really good throughput, but there were still times when that wasn't good enough. You know the classic xkcd about developers slacking off while their code is compiling? That also applies to the length of time it takes to deploy your code. If the JVM takes a long time to start up and get your application running, you can only do so many startups in a given day, so the JVM needed to come up with ways to make that better. Debugging is even worse, because there's often a higher performance hit in debug mode compared to regular mode.

So the JVMs gave us options to work around some of this. There's -Xquickstart for OpenJ9; I think the equivalent would have been -client for HotSpot. What this does is trade your peak throughput for a faster startup. You might wonder why there's a trade here, and the answer is the interpreter. One of the really good sources of profiling data for a JVM is the interpreter, because it's okay for the interpreter to take a little more time doing things in order to gather that profiling data for you. So the sooner you compile, the less of that really rich profiling data you get, and you make this trade-off between your peak throughput, which relies on some of that interpreter profiling data, and your faster startup: you get into JIT code faster, but you don't have as much data.

The other place people cared about other metrics, even in the old world, was footprint, and that drove us to create something we call the shared classes cache.
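That throughput-for-startup trade is exposed directly on the command line; a minimal sketch (the jar name here is illustrative):

```shell
# OpenJ9: favor fast startup over peak throughput. The JIT compiles
# earlier, with less interpreter profiling data behind it.
java -Xquickstart -jar app.jar

# Rough historical equivalent on older HotSpot releases was the client VM:
# java -client -jar app.jar
```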
The shared classes cache is for when I'm running multiple JVMs on the same system and I really don't want to pay the footprint cost for all of that data, all the time. I've given other talks about why the class file format is a horrible format for the JVM to interpret. What J9 does is take that class file and create a new piece of metadata it calls the J9 ROM class. This is all the symbolic information from the class file, but put in a better format for the interpreter to walk, so it can find the data it needs at runtime, and it's typically much smaller than a class file because we throw away the bits you don't need. In addition to that, we also have a J9 RAM class, which acts as a cache for any of the live pointers. When you've resolved a String, when you're doing a load constant (ldc) for example, you need to put that String pointer somewhere, and it ends up in the J9 RAM class.

So if I'm running three JVMs on the same system and I've loaded basically the same classes, I've got ROM classes and RAM classes for the same things in each of them. Now, if we look at the world of something like C or C++, if I'm running the same application three times, the code there is often in shared libraries, so I'm only paying for one copy of that code in memory; each process is able to share that executable code, and I get a footprint savings. Well, our ROM classes are true ROM: once they're created, nobody writes to them. And they're position independent, so I can load them into memory anywhere I want. So really what I want to do is take those three copies and make them one, and so we created a shared memory area that all JVMs on the machine with access to it can share, and you only pay for one copy of the ROM class.
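Enabling the cache is a single flag; a hedged sketch of the typical lifecycle (the cache name, directory, and jar are illustrative):

```shell
# Create, or attach to, a named shared classes cache (OpenJ9).
java -Xshareclasses:name=demo,cacheDir=/tmp/j9cache -jar app.jar

# Inspect what is stored: ROM classes, AOT code, hit statistics.
java -Xshareclasses:name=demo,cacheDir=/tmp/j9cache,printStats

# Destroy the cache when you are done experimenting.
java -Xshareclasses:name=demo,cacheDir=/tmp/j9cache,destroy
```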
We've taken the equivalent of the executable code, Java's bytecode, the constant pool, that kind of data, and we've said, look, you only really need one copy of this in memory; all the JVMs currently running can share it. The major thing this gives you is a footprint win, usually about 20%. But it also gives you a startup win, because you don't have to take those class files, parse them, and convert them anymore; that's already done, and you can just load the existing one out of the shared classes cache. I've shown three JVMs running at the same time, but you get the same kind of benefit running JVMs one after another. You don't get the footprint win in that case, because you're still paying for one copy, but you do get the startup win, because the second JVM can reuse the ROM classes that have already been created.

So we've been talking about startup; what actually happens at startup? I'm going to give a really high-level overview of some of the things that occur, so we can see the areas the JVM needs to change in. Here's my command line: java -Xshareclasses, maybe with a bunch of other options. If we're being truly honest, most people's command lines are pages' worth of options, usually copied from Stack Overflow. But what happens when you hit enter? The Invocation API comes along. This is defined in a specification; you can go look up exactly what it is, but it's basically a call that creates the Java VM for you. The Java launcher is just a piece of C code; anybody can write their own launcher using the Invocation API, and it'll go off and create you a Java VM. What happens in this process is a lot of string parsing for your command-line options, some allocation to create your data structures, and out of this you get back a JavaVM structure that you can use. And you usually get a bunch of threads: there are GC helper threads that have to be created, there are JIT compilation threads, there are other random threads that the runtime needs. So there's a lot of allocation happening at startup; this isn't different from what you would see in any other managed runtime.

Then the GC has to allocate the heap, and the reason I call out the heap is that there's often a hidden startup cost there, due to the fact that you've reserved a large chunk of memory and now you have to page it in. We often see this in GC logs: when you've been running a scavenge, the first time you touch the evacuate space is actually a slower scavenge than any other. So there's a little bit of a cost there as well.

Then there are the classes that have to be loaded at startup. We have a bunch of class files that have to be loaded, so you've got disk access; they have to be read into memory, they have to be parsed, they have to be verified, and then you create your ROM and RAM classes out of them. Running Hello World on Java 8, you're talking about 400 classes getting loaded, and a larger application pulls in more and more classes. All the while this is happening, your interpreter is running and profiling the data, and your JIT compiler is running in the background, JITting, so there's this fight between getting your application up and running and getting it up and running fast.

If we look at that process, there are a lot of things in there we could do that could really change the JVM, so we need to look at how to do that, and with OpenJ9 we've been doing a lot of investigation into these areas. Of the things I'm going to talk about, some are areas we're interested in working in, and some are areas we've already started on. The first one: how many people are aware of jlink in Java 9? What if I told you it was good for more than footprint?
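For those who haven't used it, a jlink run looks roughly like this; the module name and paths are made up for illustration:

```shell
# Build a trimmed runtime image containing only the modules the app needs.
jlink --module-path "$JAVA_HOME/jmods:build/modules" \
      --add-modules com.example.app \
      --launcher app=com.example.app/com.example.Main \
      --output build/custom-runtime

# The image carries its own stripped-down runtime and launcher.
build/custom-runtime/bin/app
```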
jlink is usually pushed for footprint: we say, modularize your application, run jlink, and you'll get a smaller custom runtime you can deploy. All of that's true. The really cool opportunity, though, is that it gets people used to running a tool before they deploy. Once you're running a tool, the JVM can do all kinds of other things. These are things we could have done in the past, but getting people to run those tools has been difficult. Now, if jlink gets a lot of buy-in, there's a lot of opportunity here: we could pre-create shared classes caches, we could pre-do the ROM class creation, maybe we could create ahead-of-time compiled code for you. There are all kinds of opportunities for optimization once people get used to running an extra tool.

I've mentioned AOT, and J9 has had AOT for a long time now. I can't remember exactly when we introduced it, but back in the days of WebSphere and WebSphere Real Time, we put a lot of effort into making sure we had AOT capabilities. We looked at our shared classes cache and said, this is great for storing ROM classes, but the other thing I want to store there is the other kind of executable code: I want to store my JITted code in there. So on startup the JVM generates AOT code and stores it away in the shared classes cache, along with some metadata to make sure it's valid to reuse that code, and it stores some interpreter profiling data. We've seen startup improvements between 10 and 30 percent when you use the shared classes cache. That won't be the first run; that'll be the second run. This is what we call dynamic AOT: you don't run a tool and have it generate your ahead-of-time code, you just run your Java application. The first run creates your AOT code and saves it away, and later runs can reload and reuse it. As we look at the new world, we have to look at ways of increasing the ability to use this, especially in the face of that first-run problem I mentioned earlier.

One of the other areas I've talked about is interpreter profiling and getting rich data out of it. In OpenJ9 we've introduced what we're calling JProfiling, which is a way of putting counters into compiled code while being very careful about how you place them, so that there are minimal counters on hot paths. The obvious algorithm when you're working on something like this is to put a counter on every basic block; then you can walk whatever path you want and see which one's the hottest. But then you're paying for a lot of counter updates on your hot paths, which you don't want to do. There's a document in the OpenJ9 repo that describes some of the details of how this works, but it's an algorithm that very carefully places those counters and then patches them in and out. The cool thing about this is that you don't have to stay in the interpreter as long. You can go to JITted code earlier, because you can still get that high-fidelity profiling data out of the JITted code: you just turn your counters on or off and strobe them to whatever level gets you the data you need. Part of this is in the project already, and more development work is going on.

The next one is an area we've been looking at. We talked about microservices earlier. What if you wanted to deploy hundreds of copies of the same application in containers? Each container has a copy of your application, a copy of the VM, and a copy of the JIT. We've already got the shared classes cache, which helps you save on startup time, but what if we were able to make it so you could save on all your compilations? Instead of having each of these applications compile String and Object and whatever else, over and over and over again, what if we made this into some kind of service?
This is the next evolution of our shared classes cache technology: taking the JIT out of the application. Right off the bat, you're going to get a fairly significant footprint savings, probably about a hundred megabytes for the JIT's scratch space and maybe eight megabytes for the JIT's DLL. But you're also probably going to get startup wins out of this. If you're running the same application over and over and over again, when you request a compile, there's no reason that service can't give you back something it's already compiled, so you don't have to wait for the compile. You might not even have to wait for the lower-tier compiles; you could jump straight into the higher tiers. This is an area we're actively investigating at the project.

And just to throw this up, because we're pretty proud of it: I know I said that in the old world we looked mostly at throughput, but that's not the whole story. With OpenJ9 we've always worked on trading off throughput, startup, and footprint, and I think we've done a pretty good job. If you check out the Eclipse OpenJ9 builds and run with -Xshareclasses, we have some descriptions of running DayTrader which showed that we started 35 percent quicker than a vanilla OpenJDK and used about half the memory, even after a lot of load. So if you're interested, check out the link there that describes this, or go to the AdoptOpenJDK project and download it yourself.

Has anybody heard of the AdoptOpenJDK project? Okay, some people. This is sort of the best place to get your OpenJDK builds. They build vanilla OpenJDK, and they build OpenJDK with Eclipse OpenJ9. They're building the same code you get out of OpenJDK, and they're doing their best to run the JCKs and other testing on it, to make sure you're getting a valid build that's not mystery meat. So check this out; it's a great place to get builds.

And of course, working on OpenJ9: check out OpenJ9, download the builds from Adopt, run your application, and let us know. We want to hear about your successes; we also want to hear about your failures, and your amazing results. We're going to keep hacking on it to improve it. So join us: meet the community on GitHub, or join our Slack. Gauging from the room earlier this morning, Slack is not well loved, but...

At this point I think I've finished with about a minute to spare, so maybe one question.

Yep. So, we're still in the early days on the JIT-as-a-service; we're still prototyping that. We've seen, from our experience with our shared classes cache technology, that storing the AOT code plus the profiling data and some hints to the JIT has been pretty good, and I expect that experience to continue to hold across most applications. As with anything, there's probably a balancing point; some applications might be very sensitive to the data they're running, but most are going to benefit from this kind of approach. Thanks, everybody.