Well, welcome to Open Source Summit day two. My name is Tom Watson, and today I'm going to be talking about lightning-fast Java application startup, in particular using a technology called checkpoint and restore, with the CRIU project along with Eclipse OpenJ9 for my JVM. Just a little bit about myself: I live and work here in Austin, Texas at IBM. I've been involved with open source for nearly 20 years, starting with the original Eclipse project before the Eclipse Foundation was even started, and I was involved with getting the platform moved over to an OSGi-based runtime for modularity; that's where the Equinox project started. I'm also involved with the Apache Felix project, which implements various OSGi specifications. My primary focus nowadays is a project called Open Liberty, an IBM open source project that we host on GitHub; it has our implementations of Jakarta EE and MicroProfile, as well as the previous versions of Java EE. I'm also involved in a few open specifications at Eclipse through the OSGi working group. I have a long history with OSGi, having been involved with the OSGi Alliance before it moved over into Eclipse for specification work, and now I'm also getting involved in the Jakarta working group. First I thought I'd go into a little bit of motivation on why we're looking into fast startup for Java applications, in particular for MicroProfile and Jakarta EE applications. Today when you're deploying things to the cloud, oftentimes you're using containers; you're containerizing your application. This basically provides the environment for running your programs while giving you isolation between processes from one container to the next, and oftentimes these containers have a short lifespan.
They're not like the typical Jakarta EE or Java EE platform of the past; those were long-running servers that you deployed your applications to. With containerized applications you bundle everything you need together within the container, and then you have to start it all up to serve your application. When we're talking about something like a web service implemented in Java, you'll still want to use some of those Java EE technologies such as Servlet or CDI, and many of these implementations have been designed from the ground up as long-running processes; the servers themselves are designed to receive and handle many requests over long periods of time. So there's a conflict of interest with this kind of serverless environment, where you want to come up quickly, run your application, and then perhaps shut it down after you're done using it. When we started developing Open Liberty we had this in mind, designing it right from the start to run efficiently in the cloud. It's our platform for running Jakarta EE and MicroProfile applications, but the runtime itself is highly componentized. It's built upon some of those technologies
I mentioned I'm involved with, Equinox and OSGi, and this allows us to componentize the runtime so that it can provide a fit-for-purpose runtime. What I mean by that is you can configure Liberty to enable only the features your particular application needs. If you're a simple REST application that maybe needs a little bit of dependency injection with CDI, you can enable just those two features of Liberty, and we'll only load up that part of the runtime. This lets us do a couple of things. One, we can shrink your container, because we no longer need to pull in the full platform, just the bits your particular application needs. It also gives us a faster startup time, because we don't need to load as much of the runtime to get your application up and running. For a pretty simple application, just a REST endpoint with a little bit of CDI, we can see comparably fast startup times: sub-second startup to get a simple application up and ready to take incoming requests. But as I'll go into in more detail later, this may not actually be fast enough for some scenarios. In particular I'm talking about scale to zero, where you want to be able to take your instances all the way down to zero whenever there's no traffic to your particular microservice. This diagram here is an overview of something called Knative. Knative is a technology built on top of Kubernetes that does auto-scaling, and part of its functionality is the ability to detect when there's no traffic coming into your service and take the instances down to zero. Then when a request comes in, it detects,
Oh, there's no instances running it goes ahead and scales that up And similarly if you get a burst of incoming requests it can continually add more pods or instances of your application as needed But the advantage of this Ability to go down to zero instances is whenever you don't have any traffic coming to your service You no longer need to incur the cost of having at least, you know one instance there that can always be ready to serve But when you start seeing your start This is where it starts becoming vitally important that you can start up your applications very quickly Things that start approaching a second to start up Will start causing high latency for your users because they'll interact with it and all of a sudden They'll see this big pause and it won't be a good user experience until you get enough instances up and running to serve them efficiently So this kind of forces your deployments to have this set a minimum number of instances that they want to have available at all times even when there's no traffic coming through just so that they can be able to Service the request quickly when they come in later And so this is a just additional cost that you have to pay over time just to keep something up It's not really doing any work So one approach that's been taken by a few projects like corkis and microknot and so on is to take Your job application and do a native compilation So they use the grawl VM and it has this thing called the substrate Compiler and it basically takes your application in all your libraries and everything that you use and it can produce a standalone executable And this actually can be run without any Java installation whatsoever. 
It's completely self-contained, and it achieves this by using a closed-world assumption. When it's compiling your application, it needs to be able to detect absolutely everything that the application is going to need: all of its code paths, all the parts it's going to do reflection on. All of these things have to be known ahead of time so they can be compiled into the native executable. You get some obvious gains from this approach, because you start seeing much, much faster startup times. For example, Quarkus with that simple application I mentioned, just a REST endpoint with maybe a little CDI, can start up in about 50 milliseconds; nearly instantaneous, it just comes up and is ready to serve. The other thing is an overall smaller code size, because everything the application doesn't need can be stripped away, including the parts of the JVM it isn't using; if you pull in third-party libraries, only the parts you actually use get compiled in. But there are some cons. For applications that run for longer periods of time, you're going to see somewhat lower peak performance than you would on a long-running JVM, because the JIT compiler can make your execution faster and faster over time; you get better throughput as more and more requests come through and exercise the code and the compiler. There's also somewhat more costly memory management, and that's really just attributable to the fact that the Java garbage collectors have been around for more than 20 years and are very good at what they do. There is a native garbage collector as well, but it has a little catching up to do; I expect that will improve over time. And the closed-world assumption may not actually apply to all of your applications. If you're just looking to take an off-the-shelf application that you've been developing for years, you're likely
going to need to make some adjustments to make it fit native compilation. One of those things is that all reflection must be known ahead of time; things like dependency injection frameworks oftentimes use a lot of reflection, so you need to get those things configured and known at compilation time so they can be compiled in. The compilation times themselves can also be quite long. And finally, there's a difference between what you're deploying, the native executable, and the environment you're developing in. You're often pulling in third-party libraries, and those are more than likely developed on a full JVM, so you have to retest to make sure all of that works properly once you natively compile it. So we're looking at an alternative approach called checkpoint and restore. It uses a project called CRIU, which is a Linux technology, so this is a Linux-only solution right now. Basically, CRIU can take a running process, freeze it at a specific point in time, and save off its state; that state can be persisted and then read at a later time, perhaps even on another machine, to restore the process right from where it was checkpointed. That restoration process is very quick, definitely much less than a hundred milliseconds, and it loads the process up and instantly gets it running again. The idea we have is that you can select a specific point in your server's startup, while it's processing your applications and getting ready to serve, and freeze it there, so that you can then restore from that point into many different instances. The restore time is going to be vastly faster than all the work you did up to the checkpoint. First I thought I'd go into a quick demo. This is just running on my local system.
I have a build of Open Liberty that supports checkpoint, and by the way, this is also running on a build of OpenJ9 that supports checkpoint, and I have the other dependencies you need, such as CRIU, installed. In Liberty we have this server script that you run, and I'm going to run something called instantOn. This is a server configuration that just has that very simple REST application; it's really just a hello-world application. This first run isn't doing any checkpoint, but you'll see it was just the normal startup process, and it came up in about 799 milliseconds, which is actually quite fast; usually I see it take about a second. It's just a really simple hello-world endpoint using JAX-RS out of Java EE. But what we can do is use a new command we added called checkpoint, and there's another parameter where you tell it where you want to checkpoint at. We have a few different points defined where we can do the checkpoint; for this purpose I'm going to use one called applications, which is basically the latest point we allow you to checkpoint at. What that does is bring up the server and go through its startup process, but as soon as we've detected that all the configured applications have started, and before we open up any ports and accept any incoming requests, we ask the JVM to checkpoint the process and save us off. The JVM in turn has CRIU go ahead and do the checkpoint, which saves off the process information, and then CRIU ends up killing the actual process. So it pauses the process, saves it off, and then kills it.
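As a sketch, the command sequence from that demo looks roughly like this (the `instantOn` server name is from the demo; the exact spelling of the checkpoint-phase option is an assumption based on my description and may differ in released builds):

```
# Normal cold start of the server (about 800 ms for this simple app):
./bin/server run instantOn

# Checkpoint at the 'applications' phase: the server starts, all
# configured applications are started, then before any ports open the
# JVM asks CRIU to save off the process image, and the process is killed:
./bin/server checkpoint instantOn --at=applications

# Running the server again restores from the saved image
# (roughly 70-100 ms in the demo) instead of starting from scratch:
./bin/server run instantOn
```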
So it's no longer actually running. Now if I run this same server, what Liberty does is detect that checkpointed process image and resume directly from it instead of starting the server from scratch. You'll notice it came up in 106 milliseconds, about seven times faster, and the majority of that time was actually spent by CRIU doing the restoration phase; it's only the last little bit that we spend opening up the ports and getting ready to serve. If you bring the server down and run it again, it restores directly from that point again; as you see, this time it came up in 71 milliseconds. And that's just a very simple demo of a very simple application. Let's go into a little more detail on what I mean by where to checkpoint. In that particular demo I checkpointed at applications, which, like I said, is the last point we allow you to do the checkpoint. We have earlier points. The one in the middle is deployment: that's where the server has come up and is running, and it's processed all of your application metadata, all of its annotations; for example, if it's a WAR, it may have a web.xml that needs processing. So it's processed all that and knows everything about the application, but it hasn't actually run any application code. Your application code can have early startup pieces; in Java EE you can have something called a servlet context listener, which gets informed whenever the servlet context for your application has been initialized by the container. Those things happen after deployment. Then we have an even earlier one called features, which is before we even process your application metadata; it's literally just getting all the Liberty features loaded and started, ready to start processing your application. The later you do your
checkpoint, the faster the restore time is going to be, because there's less code you have to run before you can start serving requests. But on the other hand, the later you do the checkpoint, the more things you need to consider when you restore the process. The approach we're taking with OpenJ9 and Open Liberty is to add hooks. These hooks get called to prepare for a checkpoint, and then they get called again on the restore side, when we're restoring, before we continue on and start serving your application. As an example of that preparation: when we're doing the checkpoint, the JVM will only let you do certain things with the security stack, because we don't want some things to get initialized too early; then on the restore side it opens those up and allows them to continue. Similarly, Liberty has a number of hooks that prepare for the checkpoint, as well as hooks on the restore side that let things go again, and I'll give a little list of some of the concerns these hooks need to deal with when we're running like this. As of now we're not actually letting application code take part in the checkpoint prepare and restore, but I do anticipate that in the future, if this becomes more widely used, we'll start seeing demand for applications to do something to prepare for a checkpoint and then do something on the restore side. Here are some examples of things we need to prepare for. One that's maybe easy to understand is timers. In Java you can establish a timer that says, I want to run some action every five minutes, and one property of the timer is that it keeps track of whether it missed any events.
If it missed a couple of events and didn't notify you, and it's now ten minutes later, it's going to immediately call you twice just to get you caught up. So think about establishing a timer like that before the checkpoint and then restoring, say, an hour later: what may happen is that the JVM comes up, the timer wakes up and says, an hour has passed, and it calls the timer all those times just to catch up. So we need help from the JVM to fix the timer implementations in the JVM. We also need to do certain things in the runtime; for example, the EJB container has timers as well, and the way we handle those is we simply don't let the timers become active until the restore side. Another one that really needs to be thought about is any type of connection. You really don't want to be connecting to any data sources and that kind of thing before the checkpoint, because the goal is to be able to restore many instances of the process from that same checkpoint. You're not going to be able to restore the same connection into ten parallel instances, and even if you could, you likely would not want to, because such connections often require some kind of authorization or authentication, or at the very minimum some configuration of the endpoint they're connecting to, and you're not necessarily going to know that information at build time; you want to configure that kind of thing at deployment time. Speaking of configuration, some of the things that get configured in the cloud often come from environment variables or secrets, and those aren't known and configured until you're deploying to Kubernetes, to your pod. We need those kinds of configurations to come through and take effect on the restore side; you're not going to be able to do that in your build pipeline and have it make any sense. But
once we can get through some of these difficulties and establish the process getting restored very quickly, we do see a lot of advantages. We think we'll be able to retain all the benefits of running the full JVM on the restore side: you get full dynamic loading, the advanced JIT, the garbage collectors, and you're no longer dealing with the closed-world assumption of native compilation. We believe it will enable a much wider set of applications to participate in this checkpointing and have very fast startup time, or in this case restore time, whenever they're trying to bring their applications up quickly. But there are some additional challenges. We definitely want to make sure our applications running in containers can run rootless, without being the root user, but CRIU itself needs a lot of elevated privileges in order to take that snapshot of the process and to restore it, and we definitely don't want to require running as root. So what we're taking advantage of is Linux capabilities. These can grant CRIU the specific privileges it needs in order to take a process and do the actual checkpoint and restore. There was a capability added to the 5.9 kernel called CAP_CHECKPOINT_RESTORE, and this was an attempt to reduce the overall set of capabilities you need to assign to CRIU. Prior to this you had to assign it the CAP_SYS_ADMIN capability, which grants a very wide range of privileges that you often want to avoid, because you don't want to give that much power away. So CAP_CHECKPOINT_RESTORE was added specifically to help out CRIU. It was backported to the 4.x kernels, at least if you're running on Red Hat, but otherwise you need to be on kernel 5.9 or above to use it. Once we're able to do that, we can think about how we're going to do this in the
container, and what it takes to actually restore the container with that process, restoring the process right in the container. So I'm going to go over how we build an application image in Liberty. This is a typical Dockerfile that you would see if you went out to openliberty.io and went through some of the guides that talk about containerizing applications; this is the simplest form of a Dockerfile. In this particular case I have it saying open-liberty:beta-checkpoint, which is not actually an image that exists yet; the beta one does, so the only difference is you'd use beta. If you do the build command, it's going to produce this demo application again. This won't actually do anything with checkpoint yet; it's just the standard containerization of your application with Open Liberty, and if you launched it, it would start the server with your application and your configuration and be up and running. What we're looking to do is take that application image and then run it so that we can do a checkpoint while we're bringing the server up. Here I'm passing in an environment variable, WLP_CHECKPOINT, specifying applications, and that's going to tell the runtime, just like I showed when I was on the host, to call OpenJ9 to checkpoint the JVM. That produces the process image, which is now contained in the running container. Once the checkpoint is done, CRIU of course kills the process, and the container stops with that saved process image. But now you need to get that stopped container committed down into an actual image that you can run over and over. You can do this with the commit command, which takes the checkpoint container and commits it into an image, here just called demo. One thing I neglected to mention is that when we do this run, I specified the privileged option.
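Pieced together from those steps, the build, checkpoint, and commit flow might look like this; the image and container names here are illustrative, and `--privileged` is used only for brevity, as noted:

```
# Build the standard application image from the Dockerfile:
podman build -t demo-app .

# Run it once with the checkpoint trigger. Liberty starts the
# applications, asks the JVM to checkpoint, CRIU saves the process
# image and kills the process, and the container exits:
podman run --privileged --env WLP_CHECKPOINT=applications \
    --name demo-checkpoint demo-app

# Commit the stopped container, which now holds the saved process
# image, as the image you actually deploy:
podman commit demo-checkpoint demo
```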
So that was just to avoid, for brevity, having to deal with all the more scaled-down permissions while doing the checkpoint; when you do this, you have to grant the container the amount of privilege necessary for CRIU to work. Then when you're restoring, the restore looks like this, and again it has to specify three different Linux capabilities to let CRIU do its work. CAP_CHECKPOINT_RESTORE we've discussed. CAP_NET_ADMIN is actually needed because Liberty has a local port that it uses for server commands. And finally there's the CAP_SYS_PTRACE capability, which CRIU needs in addition to CAP_CHECKPOINT_RESTORE to do all the work it needs on the process. The other thing is this security option: it's a file that contains an additional set of system calls that CRIU needs above the default. You could instead specify the option unconfined, which basically enables all system calls, and then you wouldn't have to pass in this file that allows just the subset CRIU needs. Then finally, we need to be able to mount /proc/sys/kernel/ns_last_pid, because this is what CRIU needs to be able to restore your process into the exact PID it was running under. Actually, in order to do this we needed a change to runc, which is what Podman uses to actually start the container; it didn't allow mounting anything under /proc. We had a pull request for that, number 3451, which has already been merged and is in a release of runc, 1.1.3. If you don't have that, you basically have to run in a privileged container, because runc just won't let you do this mount unless you have that change. So let's go ahead and go into a container demo.
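A sketch of that scaled-down restore invocation, granting only the capabilities just described (the seccomp profile filename is illustrative, and the exact form of the ns_last_pid mount can vary with your podman and runc versions):

```
# CAP_CHECKPOINT_RESTORE plus CAP_SYS_PTRACE let CRIU restore the
# process; CAP_NET_ADMIN is for Liberty's local command port. The
# seccomp profile allows the extra system calls CRIU needs beyond the
# default set, and the ns_last_pid mount lets CRIU restore the process
# under its original PID (unnecessary once clone3 is available):
podman run --rm -p 9080:9080 \
    --cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=NET_ADMIN \
    --security-opt seccomp=criu-extra-syscalls.json \
    -v /proc/sys/kernel/ns_last_pid \
    demo
```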
It basically does what I showed you earlier: it builds with the Dockerfile, then runs the container, does the checkpoint, and commits it. So I'll go ahead and run that. You'll notice up here, before this went off the screen, it was bringing the server up again, and that was when we were running the server configuration along with the application in the container; that's the point where it detected, oh, I want to do a checkpoint at applications. So it did the checkpoint, stopped the server, the container exited, and then it committed that container to the actual image. Now to do a restore. You probably can't see it, but it's what I had in the presentation: it's just passing in the capabilities and the security option with that file containing the extra system calls, along with the mounting of ns_last_pid. But I still have my server running over here, so I can't bind on port 9080; let me shut that guy down. Now it brought up the container, restored the process, and it came up in 70 milliseconds once Podman loaded the actual image and started the container. Just to show a difference: there was a demo application image that got created in that process, the image from before I did the checkpoint. I can start that one as well, and in this case it's actually starting the whole server from scratch, bringing it up.
And as you can see, it was 946 milliseconds versus the 70 milliseconds. If we go back, I have one last thing I want to show, and that is deploying into Kubernetes now that you've built this checkpoint image containing that paused process. In order to deploy it to Kubernetes, you have to do some of the same things to grant the capabilities and the mounting of ns_last_pid. This is just a section of the YAML used to deploy the application to a Kubernetes cluster: there's the security context with the three capabilities you need to add, and then the mounting of ns_last_pid. Again, I didn't mention this before, I skipped over it, but there's a new system call called clone3, and CRIU will use it if it's available, which eliminates the need to do this mounting of ns_last_pid. So you won't actually need that once the systems we're deploying onto have clone3 readily available; I think the earlier chart said that came in kernel version 5.3. So let's go ahead and do a demo of that. I have a local Kubernetes cluster running on my machine, I also have that patched version of runc here, and I have that YAML, so I'm just going to go ahead and apply it to my cluster to load up that container image along with the paused process. That went ahead and deployed it, so there's my pod running. If I get the logs from it, you'll see they now report that the server is ready to serve the application in 69 milliseconds. I have that particular cluster working on port 31000, so that's this one; it's up and running, and if I get the logs again, it shows those requests that came in. The final part is that I can scale this up to three replicas.
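The securityContext section of the deployment YAML described a moment ago might be sketched like this; it's a fragment of a pod template, the names are illustrative, and the exact mechanics of the ns_last_pid mount can vary:

```yaml
# Fragment of a Deployment pod template for the checkpointed image.
spec:
  containers:
  - name: demo
    image: demo
    securityContext:
      capabilities:
        add:
        - CHECKPOINT_RESTORE   # let CRIU restore the saved process
        - SYS_PTRACE           # CRIU process inspection
        - NET_ADMIN            # Liberty's local command port
    volumeMounts:
    - name: ns-last-pid
      mountPath: /proc/sys/kernel/ns_last_pid
  volumes:
  - name: ns-last-pid
    emptyDir: {}
```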
So I'm just going to add two additional replicas. It's been scaled up; if I get the pods, it shows they're running. This one here has been running for six seconds, so it's one of the ones that got added, and yeah, that one came back up in 73 milliseconds. So that is the end of my main presentation. I do have a number of links here. The first one is a Git repository; it basically has the stuff I was showing, and it lets you actually build the support for checkpoint and restore with the latest Open Liberty nightly build along with OpenJ9. I'll also point out the IBM Runtimes link there; that's where the Semeru early-access builds are, and that's where we're going to be publishing the builds that support checkpoint and restore from a Java perspective. And finally, that last one is the Open Liberty nightly builds; those contain the support for checkpoint and restore. It hasn't officially gone into beta, so you have to get the all zip that contains all the support. At a future time we do anticipate we'll beta this, so it gets into the beta images for Liberty. With that I can take any questions, in addition to letting you know that we think OpenSSF is quite important; we highly encourage everybody to go take a look at it and understand SBOMs and so on. I'm sorry, I can't hear the question. Oh, the three capabilities: CAP_CHECKPOINT_RESTORE was specifically added to try to really reduce the privileges down. Was it the NET_ADMIN and the ptrace one? I don't have a definitive answer on how those could take down the whole box or leave it basically open to vulnerabilities or attacks. [Audience: Yeah, that's what I was trying to ask.] That's a good point.
It's something that definitely needs to be looked into and understood, how much of a risk that is. I know it's discouraged to deploy fully privileged containers for exactly that concern, so we're definitely trying to reduce any of the privileges that are needed to try to reduce those kinds of concerns. The way CRIU works, I don't think we're going to be able to get it down to zero elevated capabilities, but we definitely need an understanding of what may be at risk if you're elevating those privileges. [Audience: Sorry if I misunderstood something, but it seems you can run a legacy Java application with just OpenJ9 and do a checkpoint and restore?] Yes, OpenJ9 is adding APIs to checkpoint the JVM from the perspective of any Java application. I was just showing you one application, Liberty, utilizing those APIs and calling them at a specific point in Liberty's startup process. [Audience: Did you try that with other legacy Java applications?] I haven't tried it with Jetty, but I have tried it elsewhere. I'm a long-time Eclipse IDE developer, so I thought it'd be cool to try it with the Eclipse IDE. It fell on its face because of native SWT. So I'll tell you, if you have a lot of JNI or native code in your application, it's probably going to be problematic; we don't in Liberty. I'm primarily on the Open Liberty team, but the JVM team, the Java team, they're obviously trying to make this work generally with other applications, not just Open Liberty. [Audience: First off, a great presentation, thank you for presenting. My question is, is there a way to name or version the checkpoints? If you are running multiple different Java apps on the same host, like in Kubernetes, either different types of Java apps or different versions of the same service.] The way we're viewing this is that it's part of your build pipeline when you're building your container. You would version it and name it at the point where you're committing the container with the saved process into the final image.
So that's the point where you'd say, okay, this is application X version Five and it's been check pointed. Here's my final image And then that checkpoint is I guess stored on the host with that It's not stored on the host. It's actually stored in the container In the image that you're going to deploy So that was what I was trying to show with With that's what this picture is trying to show that Blue circley thing. That's the process image. So that's the stopped container with the process image And then when you do the commit you named it the final name with demo and that's the That's the image with the stopped process stored in it And when when you start that Liberty will detect that image and restore it right from there Very cool. Thank you I guess that's it. I'll be around if anybody needs has additional questions, but I guess that's the end of my time