All right. Hello, everybody. I'm Christine Flood. Everybody's talked about how long they've worked on Java. I started in 1998 working for Sun Microsystems. I worked on the first production-quality JVM, called ExactVM, which was back before there was even a HotSpot. So there are my credentials. I'm here to talk to you about fast startup for Java applications. I work for Red Hat and I work in the Linux ecosystem, and people are saying, well, Go starts up really fast and Java is too slow. So I started thinking about what we could do to make Java start up faster, and checkpoint restore was a clear solution in my mind. So what do I mean by checkpoint restore? You have your Java VM. You've loaded all your classes. You've warmed up your JIT. You've run some sample things. You've got everything the way you want it to run quickly, and you've run a couple of GCs, so you've got your heap as tight as possible. What you do then is you run the checkpoint, and that writes out the state of your JVM to the file system. Now I want to be clear here. This is not object serialization. This is more of a bit blit, right? It's a bulk copy of the memory out to disk so it can be quickly copied and quickly restored. So another thing that's cool is once you have those JVM image files, you're not limited to restoring it once. So let's say you have an application that has, you know, a large startup time. You can build it up, write out an image, and then later on when you're running it you can say, hey, I'm going to run it here, here, here and here. So this is another way to deploy your Java applications. It's not quite as pretty as write once, run everywhere was, but it gives us the ability to come up very quickly with a hot JVM that has all of the capabilities that you expect from OpenJDK, that has the dynamic class loading. It has everything that you could want and it's all right there and available. So another use case, I had to bring this up, right?
I'm not the first person to talk about doing this for Java. IBM had a paper on it. Inria had a couple of papers on it. I'm just doing it a quick and dirty way, which I'll get to in a minute. But let's say you're running some sort of long-running number-crunching application on Java. Maybe you don't want that to crash and all go away. So you can checkpoint every iteration, and then if you do have an unfortunate crash, you can just restore from the last working iteration and keep going. Or, and that's another application, you can replay the crash. If you have a crash that happens every iteration, you can save the world right before the crash happens. And then rather than having to wait 10 hours for your machine to get back into the crash state, you can bring it up immediately at the bug state. All right, so let me be clear here. I'm not proposing anything heroic. When IBM and Inria did it, they started from the JVM and they wanted to walk all of the JVM data structures. I want to do something quick and dirty. There's already a utility out there in the Linux world called CRIU, and that does the heavy lifting for us. Today, without me doing anything, I can checkpoint and restore... Well, they can checkpoint and restore a Linux process. I can checkpoint and restore most Java processes without any additions to the JVM. So CRIU, Checkpoint/Restore In Userspace, is a Linux utility. I'll go into a little bit more detail about how it works later. It handles a lot of the things you wouldn't expect it to, like files and sockets. Pretty much all of the hairy stuff it can do. And if you want to read more about it, you can read it there. And there are people here that actually work on CRIU. I don't know if they're in the room right now, but there are people here at FOSDEM. So this is used mostly for process migration: the DevOps folks who want to move a process somewhere else. Quick process spin-up, which is why I got interested. Container migration.
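As a rough illustration of checkpointing a Java process from the outside, without any additions to the JVM, here is a minimal sketch that only builds the CRIU command lines. The `criu dump` and `criu restore` subcommands and the `-t`, `-D`, `--shell-job` and `--leave-running` options are part of the CRIU command-line tool; the class name `CriuCommands`, the image directory and the placeholder PID are assumptions made for this example, not something shown in the talk.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Builds CRIU command lines for checkpointing a JVM from outside the process. */
final class CriuCommands {
    private CriuCommands() {}

    /** Command that dumps the process tree rooted at pid into imagesDir. */
    static List<String> dump(long pid, Path imagesDir, boolean leaveRunning) {
        List<String> cmd = new ArrayList<>(List.of(
                "criu", "dump",
                "-t", Long.toString(pid),      // root of the process tree to checkpoint
                "-D", imagesDir.toString(),    // directory that receives the image files
                "--shell-job"));               // the process was started from a terminal
        if (leaveRunning) {
            cmd.add("--leave-running");        // keep the JVM alive after the dump
        }
        return List.copyOf(cmd);
    }

    /** Command that restores a process tree from the images in imagesDir. */
    static List<String> restore(Path imagesDir) {
        return List.of("criu", "restore", "-D", imagesDir.toString(), "--shell-job");
    }

    public static void main(String[] args) {
        Path images = Path.of("/tmp/jvm-images"); // hypothetical image directory
        long pid = 12345L;                        // placeholder PID of the target JVM
        System.out.println(String.join(" ", dump(pid, images, false)));
        System.out.println(String.join(" ", restore(images)));
        // To actually run a command (CRIU normally needs root):
        // new ProcessBuilder(dump(pid, images, false)).inheritIO().start().waitFor();
    }
}
```

The same image directory can be passed to `restore` more than once, which is what makes the "run it here, here, here and here" deployment idea possible.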
So the other reason I got interested is because I'm old. I used to work on Lisp machines a long time ago, and I really miss being able to save the world. Imagine you have a development loop and you've done some stuff and you've got your state where you think you want it. You can save the state of that world. And then you can do some more changes. And if they don't work, you can come back to where you were. They also had the ability to do incremental builds, right? So if you save the world once, do some stuff, and save the world again, you don't have to copy the whole memory state out. You only have to copy the stuff that's changed since the last time you did it. And that is extremely powerful. So, okay, most of this is already there in CRIU. Why are we talking about having a Java API? That's because there are some things we as JVM guys can see as potential opportunities for this, if we had a Java API. One example is, let's say you wanted to save the world. Well, you could do a full compaction of your heap and get it all the way down. You know, if it was 64 gigabytes, but you did two full GCs, you got it all copied out and you got it down to two gigabytes. That way, we can give memory back to the OS. And then when we write it out, we don't have to worry about all that space. It'll make it faster to copy it out to disk and make it faster if you want to transfer it across the network. And hopefully, it'll make it faster to restore as well. Okay, so I'm a GC weenie, right? I've worked on a lot of garbage collectors. So the first thing I see is a GC optimization that we can do. So, you have your Java process. You're building up all of your data structures. You're running Parallel GC and you're getting rid of stuff. But then it's time to checkpoint it. And hey, wouldn't it be cool if we went in and we got rid of Parallel GC and we put in Epsilon GC, the do-nothing garbage collector. We could get rid of your card tables.
We could get rid of the card marks that just maintain your card tables. And so, when you went and you actually restored this thing and ran it, if you're doing something quick and dirty, like, you know, function as a service or something like that, you now use less space. You have faster run time because you don't have the card marks. And life is good. And that is totally doable. If you have any questions or if you doubt me, please speak up. There was a really fascinating paper at PLDI a few years ago on REMIX. They paid attention to the hardware counters to see when they got, you know, cache thrashing, and they were able to pad their data structures. This is the kind of thing that, you know, I was a little skeptical of. I mean, they showed good results on some benchmarks, but I can believe that there are times where this would slow you down. And the good thing about checkpoint and restore is you can do this kind of work while you're doing the buildup phase, because that's the time that's not time critical. So if you want to monitor performance counters and do things like this, you can. And then later on, you know, at restore time, you'll have something faster. All right. So there are also some JVM-specific parameters that we're going to need to restore if we want this to work right, right? Things get cached in the JVM. And some of them are cached in clear places, like the number of available processors, things like that. You're going to want to fix those, right? If you save it on an eight-processor machine and you restore it on a two-processor machine, you're probably going to want to do a little bit of cleanup. Also, you know, if you have eight garbage collection threads in the first one and you're restoring it over here, you might want to think about tuning it to the machine you're actually running on. So this is another reason why we want to have control in the Java world of some of the things when we do the restore.
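To make the cached-processor-count problem concrete, here is a minimal sketch at the application level. It assumes a hypothetical `afterRestore()` hook that something would call once the process has been restored; no such hook is described in the talk, and the class name `RestoreAwarePool` is also invented for this example. The thread pool is sized from the CPU count seen at startup and resized when the hook runs on a different machine.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** A worker pool whose size is re-derived from the CPU count after a restore. */
final class RestoreAwarePool {
    private final IntSupplier cpuCount;       // injected so the example can be tested
    private final ThreadPoolExecutor pool;

    RestoreAwarePool(IntSupplier cpuCount) {
        this.cpuCount = cpuCount;
        int n = Math.max(1, cpuCount.getAsInt());
        this.pool = new ThreadPoolExecutor(
                n, n, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    /** Hypothetical hook: call once after the process image has been restored. */
    void afterRestore() {
        int n = Math.max(1, cpuCount.getAsInt());
        if (n > pool.getMaximumPoolSize()) {
            // Growing: raise the maximum first so core never exceeds maximum.
            pool.setMaximumPoolSize(n);
            pool.setCorePoolSize(n);
        } else {
            // Shrinking (or unchanged): lower the core size first.
            pool.setCorePoolSize(n);
            pool.setMaximumPoolSize(n);
        }
    }

    int threads() {
        return pool.getCorePoolSize();
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        RestoreAwarePool p = new RestoreAwarePool(Runtime.getRuntime()::availableProcessors);
        System.out.println("threads at startup: " + p.threads());
        p.afterRestore(); // on the same machine this leaves the size unchanged
        System.out.println("threads after restore hook: " + p.threads());
        p.shutdown();
    }
}
```

This only covers application-level state. The JVM's own cached values, such as the number of GC threads, would need support inside the JVM, which is the argument for controlling the restore from the Java side.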
Securing network connections and handshakes. I don't know for sure that CRIU handles that all properly. I'm hoping to talk to the CRIU guy later today. But I suspect there's some stuff that we're going to want to do when we restore the JVM to make sure they're right. Maybe we want to bring your work-stealing queues to quiescence, right, if we want to get everything to a sane state before we checkpoint it. Maybe we want to get rid of your work-stealing queues and recreate them on startup. And maybe you want to specify when checkpointing occurs, right? You can do this from Linux, from outside of the JVM, but maybe then you're getting the JVM in a state where it's got a lot of garbage and it's inconsistent. If we control it from inside of Java, so you've got a loop, it creates a lot of garbage, you get to the end of the loop and all of that garbage can go away, it's all temporary. Maybe that's where you want to be doing your save points. And having an API in Java allows the Java programmer to have control over when that happens. So have I given you motivation that we need to be able to do this from inside of Java? Is anybody buying what I'm selling? Okay, so I have a proposed API for Java. And this is not set in stone. This is from my head, what I would like to see. So I'm open for people to come up and talk to me about where I'm wrong or what else we need that we don't have. But we want to be able to check the world, that's obvious. We want to know that we're running on an appropriate Linux kernel that can do this. I have no plans as of yet to do it for Solaris or any other operating system. I don't know what tools are there and what aren't. But that's the thing that you have to do before you do it, to make sure it's there. Save the world will save the world exactly the way you're currently running. So we're not going to do anything clever, we're just going to save it out so you can get back to that bug that you had.
Save the world incremental gives you the ability to do a faster save world because you're only saving the bits that changed. Save the world with Epsilon GC. Whether that belongs there or not, I don't know. But that was one of my use cases, so I put it up there. Optimize the world, so we can run back-to-back full GCs. We can optimize the memory layout. I don't know what else can go in there, right? It's this big open question of what could we do. You can restore the world, obviously, and maybe even migrate the world. Like I said, this isn't set in stone. This is just what I'm thinking about right now. I welcome any and all feedback, except from Alexei. Okay, so what's the current status? I have a prototype that uses JNI just so I could play with it. I don't have everything. I can check the world. I can save the world. And I can restore the world from the command line. And I'll get into a little bit later why it's hard to restore the world from Java right now. But it shows that CRIU works and it does what we want it to do. TestRandom: one of the other things I like is random numbers. So this generates a whole bunch of random numbers and bins them into different bins. And so we do the setup where we do all the random number generation and the binning. And then we save the world. And then when we restore the world, it just gives us the output of what the binning was. It tells us the information about the data. This is sort of the use case that I'm envisioning for this, is that you have some big setup phase that generates a lot of garbage and then you have some small fast thing that runs afterwards. And this just shows that we ran our big program, we had Java running, we dumped it out, we have no Java running, and we restored it and it all works. So this is just to show that this does work for Java, which was one of my first concerns. And yes, I have restored the same stored JVM multiple times with no problems.
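The proposed operations and the random-number binning demo could look roughly like the sketch below. The method names follow the spoken names in the talk, but the signatures, the `World` interface, and the `RecordingWorld` stand-in (which only records calls and does not checkpoint anything) are assumptions made for illustration. The talk's prototype uses JNI and CRIU, and its actual signatures are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical shape of part of the proposed API; signatures are guesses. */
interface World {
    boolean checkTheWorld();                        // can this kernel checkpoint at all?
    void optimizeTheWorld();                        // e.g. back-to-back full GCs
    void saveTheWorld(String imageDir);             // save exactly as currently running
    void saveTheWorldIncremental(String imageDir);  // save only what changed
}

/** Stand-in implementation that records calls instead of checkpointing. */
final class RecordingWorld implements World {
    final List<String> calls = new ArrayList<>();

    @Override public boolean checkTheWorld() { calls.add("check"); return true; }
    @Override public void optimizeTheWorld() { calls.add("optimize"); }
    @Override public void saveTheWorld(String imageDir) { calls.add("save " + imageDir); }
    @Override public void saveTheWorldIncremental(String imageDir) {
        calls.add("save-incremental " + imageDir);
    }
}

/** Big setup phase, then a save point, then a small fast phase. */
final class BinningDemo {
    private BinningDemo() {}

    /** The expensive, garbage-heavy setup: bin pseudo-random numbers. */
    static int[] fillBins(long seed, int samples, int binCount) {
        Random rng = new Random(seed);
        int[] bins = new int[binCount];
        for (int i = 0; i < samples; i++) {
            bins[rng.nextInt(binCount)]++;
        }
        return bins;
    }

    static int[] setupThenSave(World world, String imageDir) {
        int[] bins = fillBins(42L, 100_000, 10);
        if (world.checkTheWorld()) {
            world.optimizeTheWorld();     // shrink the heap before writing it out
            world.saveTheWorld(imageDir); // a restored process would resume after this call
        }
        return bins;
    }

    public static void main(String[] args) {
        int[] bins = setupThenSave(new RecordingWorld(), "/tmp/world");
        for (int i = 0; i < bins.length; i++) {
            System.out.println("bin " + i + ": " + bins[i]);
        }
    }
}
```

Putting the save call at the end of the setup phase is the point made earlier about choosing the checkpoint from inside Java: the temporary garbage from the loop is already dead when the image is written.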
I ran a little prototype that called it from Java. There are some things here that surprised me, and they shouldn't have. I'm feeling kind of stupid about that, but I'll point it out here. So I wanted to time the first step and time the second step, and I did the first step on Monday and the second step on Tuesday, and it said that it took 25 hours. So that's one of the things: if you cache variables like the current time in milliseconds, it's the current time from before you checkpointed it. Checkpoint restore, this is the actually interesting thing, which is where you call save the world. And you can sort of think of that like a fork. You have an option of what you do next. You can keep running your Java program or you can stop it, either way. And so that just gives you an idea of what checkpoint restore does: check the world, save the world, saving the world twice. That stuff all works now. All right, so how does CRIU work under the covers? Because if you're skeptical like I am, you want to know. At some point in time, this is what your Java process looked like. It's changed since then, I think. But you had some GC threads, some compiler threads, some Java threads. I'm not going to talk about all those. All I'm going to say is that they are all mirrored in the /proc hierarchy. So anything I can do for the Java PID, I can do for all of the subpids. And that's the last time I'm going to talk about the subpids. All right, so the CRIU process is really cool. It's built on top of ptrace, and it does a seize that seizes the Java process and stops it. And of course this is recursive and it seizes all of the threads under the Java process. It then inserts some parasite code into the Java process. And this code now runs with all of the permissions that the Java process had. So it can do anything the Java process could do. So one thing it can do is copy all the virtual memory. All right, and this is from smaps.
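For readers who have not looked at smaps before, here is a small sketch of what that listing contains. Each mapping in `/proc/<pid>/smaps` starts with a header line of the form `start-end perms offset dev inode pathname`, followed by detail lines such as `Size:` and `Rss:`. The parser below handles only the header lines and is purely illustrative. The class name `SmapsRegion` is made up, and this is not CRIU's code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** One mapping header line from /proc/<pid>/smaps. */
final class SmapsRegion {
    // start-end perms offset dev inode [pathname]
    private static final Pattern HEADER = Pattern.compile(
            "([0-9a-f]+)-([0-9a-f]+)\\s+([rwxsp-]{4})\\s+([0-9a-f]+)\\s+(\\S+)\\s+(\\d+)\\s*(.*)");

    final long start;
    final long end;
    final String perms;
    final String path;

    private SmapsRegion(long start, long end, String perms, String path) {
        this.start = start;
        this.end = end;
        this.perms = perms;
        this.path = path;
    }

    /** Returns the parsed region, or null if the line is not a mapping header. */
    static SmapsRegion parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return null; // detail lines such as "Size: 552 kB" end up here
        }
        // Addresses are unsigned 64-bit values, so parse them as unsigned hex.
        long start = Long.parseUnsignedLong(m.group(1), 16);
        long end = Long.parseUnsignedLong(m.group(2), 16);
        return new SmapsRegion(start, end, m.group(3), m.group(7));
    }

    long sizeBytes() {
        return end - start;
    }

    public static void main(String[] args) throws Exception {
        // Linux only: sum the virtual size of every mapping in this JVM.
        try (Stream<String> lines = Files.lines(Path.of("/proc/self/smaps"))) {
            long total = lines.map(SmapsRegion::parse)
                    .filter(Objects::nonNull)
                    .mapToLong(SmapsRegion::sizeBytes)
                    .sum();
            System.out.println("mapped bytes: " + total);
        }
    }
}
```

Every one of these regions, whether heap, code cache, thread stacks or mapped files, is a candidate for the bulk copy to the file system.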
So it can copy all of the contents of all your auxiliary data structures. This was my heap. All of these things get copied out to the file system, and it's bit-blitted. It's very fast. And you have mapped files, and those all get copied over there too, because they're somewhere else in the /proc structure. And so on. Basically the file descriptor numbers, the core parameters, everything, it can get access to everything. It can get access to the registers and the stack using ptrace PEEKUSER. I don't know if anybody out here ever programmed a TRS-80. I did. And PEEK and POKE were really cool way back then. So having PEEKUSER is very nostalgic for me. The parasite code can read, like I said, it operates with all of the permissions of the Java process. Then they can use ptrace to clean up the parasite code, restore the original code, and your Java process can just keep going. And it detaches and we go on. The restore is pretty much the inverse, except that the CRIU process morphs itself into the Java process, which I thought was pretty cool. If you have any questions or if you have any ideas, I'm open to questions now, but if you want to contact me later, it's chf at redhat.com and I'd love to hear from you. Yes. I don't have that timing in front of me. The question was how long does it take to get a snapshot? I haven't worried about that. The restore process is very fast, but I don't have the numbers in front of me. We are trying to optimize the restore process. Is there a reason why you would want to optimize the checkpointing process? That is an awesome idea. What's your name? Sonny. Sonny, I will give you credit for that idea. I think that's an excellent use case and I will look into it. What was it that you didn't hear about? The question was how long does it take to take a checkpoint in your Java program? I haven't worried about that because I assumed that it didn't matter, but he wants a heap dump and that takes too long.
If I can do a checkpoint faster than he can do a heap dump, he's going to be happy with me. How do you handle file descriptors during restore time when the files are not there anymore on another machine? That's an excellent question and I defer to CRIU to do that. I know that they have done it, and in the worst case scenario, if they don't do it, then you have to run your Java in a container, but I don't know. I think it's... Oh, thank you. If the files on the destination system are not there, CRIU will abort the restore. It will just not work. The files need to be there and need to be at least the same size, because CRIU wants to read at the position the process used to read, and it makes sense to have the same files there because otherwise you get something else back or you write to the wrong location. Are there any other questions? Yes, back there. I suppose I really had to wait for the mic. So when we re-spin the process back up again, certain VM structures like the runtime thread pools are going to be dependent on things like the number of CPUs in the machine. So if you go from a larger machine to a smaller one, you risk crunching the destination machine because it's now got a very large VM thread pool. That's what I said. That's why I think we need a Java restore process. The Java restore process is going to have to be smart about some things: the number of processors on the machine, the size of the heap, things like that. The restore process is going to have to be smart about them, or there's got to be an interface to let the Java developer tell the restore process, "I want these particular parameters." So have you given thought to how to handle random numbers and other stuff that's required for security, or certificates, or some secure stuff generation, because it might get checkpointed and then it becomes predictable at the next startup? That's a great question. I will add that to my list. What's your name? Okay, thank you.
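The predictability concern raised in that question can be illustrated with a small sketch. Two objects built from the same seed stand in for two processes restored from the same image: both continue from identical generator state and so produce identical numbers. The `afterRestore()` hook that swaps in a freshly seeded `SecureRandom` is a hypothetical mitigation invented for this example. The talk leaves the question open.

```java
import java.security.SecureRandom;
import java.util.Random;

/** A generator whose state would be captured verbatim by a checkpoint. */
final class CheckpointedRng {
    private Random rng;

    CheckpointedRng(long seed) {
        // Same seed stands in for "same memory image" in this simulation.
        this.rng = new Random(seed);
    }

    long next() {
        return rng.nextLong();
    }

    /** Hypothetical post-restore hook: discard the checkpointed state. */
    void afterRestore() {
        // SecureRandom seeds itself from the operating system's entropy source.
        this.rng = new SecureRandom();
    }

    public static void main(String[] args) {
        // Two "restored copies" of the same image.
        CheckpointedRng a = new CheckpointedRng(2019L);
        CheckpointedRng b = new CheckpointedRng(2019L);
        System.out.println("same value without a hook: " + (a.next() == b.next()));

        a.afterRestore();
        b.afterRestore();
        System.out.println("same value after the hook: " + (a.next() == b.next()));
    }
}
```

A real solution would also have to cover generator state held inside the JVM and in native libraries, which application code like this cannot reach.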
Are there any other questions or any other things I missed? Software development by committee, I love this. It does not need root access because it goes in as the, it has the same permissions. Well, the CRIU process needs root access, but that's already in there. It goes in and has the permissions of the Java VM. Yes? If I can condense that: Smalltalk did everything first and they did it better. They did. No, no. I didn't want to tell them that way. You also credited the things that Smalltalk stole, which was the snapshot from the Lisp machine, which I had next to me running while I was developing these snapshots, the first Smalltalk that we worked on. So Lisp did it first. No, Smalltalk is taking credit here. Just to be clear, Java did it last, however. I don't know. We're not done yet. Are there any other questions or comments? Yes? Well, if you go past in time, it is. So monotonic time, CLOCK_MONOTONIC time, is indeed a problem with process migration and container migration. And there are currently discussions about a new Linux kernel namespace for time, for CLOCK_MONOTONIC, but there is nothing there yet. It's still under discussion, but it's a known problem with process migration. Yes? Thank you. I don't know how much time we have. Are we worried? Where are we? There's a question way in the back, so you can run. It's great that I get to tell him what to do for once. So as I understand, your primary use case is to restore the original process on the original machine from a snapshot. But I can also imagine that you want to, like, warm up the JVM, warm up the process, and then scale it to a number of different systems. But then the file handles, as you mentioned before, become a problem, and also network sockets. So are there any plans to go in that direction? I'm sure that once this spins up right now. OK, I used to work in Sun Labs and I'm not a very practical person. I want to prove that it's pragmatic and that we can do it.
But there are details like that that are going to need to be worked out in the long term if people want it to just work for Java. I acknowledge that. But I think this is cool and I think it's worth talking about and getting spun up. And I hope you guys agree. Are we anywhere else? Are we done? I think we're done. Thank you. Thank you, Christine.