All right. Hi everyone. I'm Erik. Thanks a lot for coming today. I want to talk about free ideas for the G1 GC, so the garbage collection theme will continue for another half an hour. It's great to see more garbage collection talks here. I work at Oracle as a member of the HotSpot GC team. So first, the mandatory Oracle slide. I have to show you this, and I will leave it hanging up there for a few awkward seconds, and then we can move on.

I'll go rather quickly through this, and if you have questions, please ask. I'd rather have this be a sort of interactive session than me just blasting through the slides with no one following along. We'll have a quick intro to G1 in 2017, then we'll look at the free ideas: throughput write barriers, rebuilding remembered sets concurrently, and parallel full collection. And we'll finish off with how you can contribute and get involved.

So first, who am I? I did my master's thesis on load balancing during garbage collection in OpenJDK. I have worked in the Oracle GC team for four years, primarily on G1 for the last two years. And I'm also a Reviewer in OpenJDK.

So let's start off with a quick intro to G1 in 2017. G1 is actually rather old by now. The paper was published in 2004. It got experimental support in JDK 6 update 14, official support in JDK 7 update 4, it has all the latest pieces in JDK 8u121, and it is about to become the default in JDK 9. So please be wary if you read old blog posts, old documentation, old explanations. It has changed quite a lot over the years, and it continues to change, as we will see.

So G1 divides the Java heap into regions. We've seen other examples of this from Shenandoah and from the Balanced GC from IBM. There are multiple regions, and objects start out allocated in the Eden regions. When the Eden regions are full, a young collection will occur, and the live objects will be moved to a survivor region.
This frees up the Eden regions, so objects can be allocated there again. As more and more survivor regions fill up, those will also have to be evacuated, and the objects will then move on to the orange old regions. This is called a mixed collection. Oh, sorry, this is still a young collection.

After a while, the heap will start to fill up with more and more old regions, so if G1 doesn't do anything, the heap will eventually become full. Fortunately, G1 has a concurrent mark phase: it will look through all the live objects on the heap and mark them as live. When the concurrent mark is finished, G1 can collect old regions as well as survivor and Eden regions and free up even more space. And after a couple of mixed collections, the heap will consist of a few old regions and quite a lot of free space.

So these are the three sort of phases that G1 moves between in its little state diagram. It starts out with young collections. Then you'll see an initial-mark collection, which is a young collection that also starts a concurrent marking cycle that keeps running concurrently with your program. Once the concurrent marking cycle is finished, you get mixed collections, not just one but a couple of them. And once enough heap has been freed up, it returns to doing young collections again.

So with that, let's dive deeper into barriers and some pretty hardcore GC code. G1 is divided into regions, and there can definitely be pointers between objects in different regions. In this example, we have region A and region B, and there are objects referring to each other across regions. Now, if G1 needs to evacuate an object into another region, like this little one over here, which is live and wants to be evacuated over there, there's a slight issue, because this object over here is still pointing to the old object.
If we don't update this object to point to the recently evacuated object, the program will crash when it reaches this memory. So we need to solve this somehow, and the way this is done in G1 is via remembered sets. For region B, we keep track of incoming pointers to that region. So we will say that for this region, there is one pointer incoming from region A, and you will find it if you follow the first entry in this list that points to that object. When we then evacuate the object, we first move the live object over to region C. Then we check whether there are pointers to this object, follow the pointer recorded in the remembered set, and update it so it now points correctly to the recently moved object.

Now, this is all good, but this abstraction or feature comes with a cost, as everything in GC does. Say you have a simple write in your Java code: o.x = y, where o is an object, x is a field of that object, and y is another object. It could look like this, for example: this is the object o, here's the x field, and here's the object y. Now something interesting happens. This write creates an edge from the object o to the object y, because we are writing the address of the object y into the field x in the object o. That means there is now an incoming pointer to region B. So if we ever move this object, then the field o.x over here must be updated as well. This again is solved using the remembered set, which keeps a pointer to the incoming reference. Then we can safely move that object if we want to.

But the magic, of course, is: how does that work? There is more going on than meets the eye; what you see is not what you get when you do a write in Java. You think of it as a simple write to a field in an object, but there's a lot more going on behind the scenes.
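As a rough sketch of the remembered-set mechanics described above, here is a toy model in Java. All class and method names are made up for illustration; this is not HotSpot's implementation, just the idea of recording cross-region referrers so that an evacuation only fixes up the recorded slots instead of scanning the whole heap:

```java
import java.util.*;

// Toy model of per-region remembered sets (hypothetical names, not
// HotSpot's data structures). Each region records which objects in
// *other* regions point into it, so evacuating the region only needs
// to fix up those recorded referrers.
public class RememberedSetSketch {
    static class Obj {
        Obj x;          // one reference field, like o.x in the talk
        int region;     // the region this object lives in
        Obj(int region) { this.region = region; }
    }

    // region id -> objects whose x field points into that region
    static final Map<Integer, Set<Obj>> rememberedSets = new HashMap<>();

    // The write o.x = y, plus a sketch of the bookkeeping: record
    // cross-region pointers in the target region's remembered set.
    static void write(Obj o, Obj y) {
        o.x = y;
        if (y != null && o.region != y.region) {
            rememberedSets.computeIfAbsent(y.region, k -> new HashSet<>()).add(o);
        }
    }

    // Evacuate obj to another region: copy it, then walk the source
    // region's remembered set and redirect every recorded referrer.
    static Obj evacuate(Obj obj, int toRegion) {
        Obj copy = new Obj(toRegion);
        copy.x = obj.x;
        Set<Obj> incoming = rememberedSets.remove(obj.region);
        if (incoming != null) {
            for (Obj referrer : incoming) {
                if (referrer.x == obj) {
                    write(referrer, copy);  // fix up the stale pointer
                }
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Obj a = new Obj(0);           // region A
        Obj b = new Obj(1);           // region B
        write(a, b);                  // incoming pointer to region B recorded
        Obj moved = evacuate(b, 2);   // move b to region C
        System.out.println(a.x == moved);  // true: referrer was fixed up
    }
}
```

Note how the evacuation never scans region A; it only follows the entries recorded for region B, which is the whole point of the data structure.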
So for G1 (and this is different per collector; usually each collector has its own mechanism for whatever it wants to do when a field gets written to), there is something called the pre-write barrier, which has to do with marking, and then there is the post-write barrier, which has to do with keeping the remembered sets in place. We will not focus on the pre-write barrier today; that's another interesting talk and discussion. But we will look a bit at the post-write barrier.

Just looking at this, we quickly realize that this is quite a lot. Not just a little going on behind the scenes, but quite a bit. So we want to make this smaller. Now, why would we want to do this? Well, if you're running G1, as Alex showed a few minutes ago, this will impact your throughput. These are quite a few instructions. The store itself is just one instruction, but then you have plenty more instructions to execute, and that hurts your throughput quite a bit.

So one idea that has been proposed on the mailing list is to replace this. We can shrink it down to a post-write barrier of this size, which is much, much smaller. Or we can just move to this and keep it at that. Now, if it were this simple, that we could just change the barrier, then why not just do it? Well, everything in GC has a price; every feature you will pay for in some way, somehow. The problem is that even though your throughput will greatly improve, as we have seen in some prototypes, you will have to do more work during the pauses. And this is a trade-off: either you want shorter pauses and maybe slightly lower throughput, or you want higher throughput and potentially longer pauses. Personally, I do think this is the way to go. I think the throughput improvement will outweigh a slight increase in pause time, and I'm not even sure that the pause time will increase that much. So this idea is out there.
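As a rough mental model of why the current post-write barrier costs so many instructions, its filtering steps can be paraphrased in Java like this. The real barrier is compiled code emitted by the JIT, and the region size, card size, and card values below are illustrative assumptions, not HotSpot's actual constants:

```java
// A Java paraphrase of the filtering in a G1-style post-write barrier
// after a store o.f = y. Illustrative constants, not HotSpot's.
public class PostWriteBarrierSketch {
    static final int LOG_REGION_SIZE = 20;  // pretend regions are 1 MiB
    static final int CARD_SHIFT = 9;        // pretend cards cover 512 bytes
    static final byte CLEAN = 0, DIRTY = 1;

    // Returns true if the store needs remembered-set processing,
    // i.e. the card was newly dirtied and must be enqueued.
    static boolean postWriteBarrier(long oAddr, long yAddr, byte[] cardTable) {
        // Filter 1: pointer stays within one region, no remembered-set work.
        if (((oAddr ^ yAddr) >>> LOG_REGION_SIZE) == 0) return false;
        // Filter 2: storing null creates no edge.
        if (yAddr == 0) return false;
        // Filter 3: the card for o is already dirty, already queued.
        int card = (int) (oAddr >>> CARD_SHIFT);
        if (cardTable[card] == DIRTY) return false;
        cardTable[card] = DIRTY;  // dirty the card and hand it to refinement
        return true;
    }

    public static void main(String[] args) {
        byte[] cards = new byte[1 << 16];
        long o = 2L << LOG_REGION_SIZE;   // object in region 2
        long y = 3L << LOG_REGION_SIZE;   // target in region 3
        System.out.println(postWriteBarrier(o, o + 8, cards)); // false: same region
        System.out.println(postWriteBarrier(o, y, cards));     // true: new cross-region pointer
        System.out.println(postWriteBarrier(o, y, cards));     // false: card already dirty
    }
}
```

Even in this simplified form you can see several branches and a memory access per store, which is what the "smaller barrier" idea wants to cut down, at the price of more work in the pause.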
People are starting to contribute to this idea and to experiment with it and look into it. And we will now move on to the next idea, which is about rebuilding remembered sets concurrently.

So we talked a bit about this data structure, the remembered set, which keeps track of incoming pointers to a region. Now, the problem with a remembered set is that if you have a lot of incoming pointers to a region, the remembered set will start to grow and grow and grow. This takes up memory and increases the footprint of your process. And if you think about this some more: if there are a lot of incoming pointers to a region like this, the objects in this region seem to be very popular. There are pointers to them from everywhere.

[At this point there was a brief A/V interruption: the projector was not capturing the slides, and an HDMI-to-VGA converter had to be fetched.]

So the region we're looking at seems to be pretty popular, because there are a lot of incoming pointers from the rest of the heap. What that probably means is that most of the objects in it are live; they have probably been kept alive by other objects on the heap. Now, looking at this region, remember that we need the remembered set if we are to evacuate objects in this region. But if most of these objects are live, we will not gain much by evacuating it. See, the only free space we have is here, which isn't that much. So evacuating this region will be costly. We're just moving live objects around; there's no point in that.
When we evacuate a region, we want to free up memory so we can allocate in it, and we don't really free up a lot by evacuating this region. So what we want to do instead is just drop the remembered set. We only need it if we're going to move the objects in this region; there's no point in keeping it around otherwise.

So then a concurrent marking cycle will run, and time will pass, objects will change, pointers will change, and suddenly this region looks kind of suitable for evacuation. But now there's a problem again: we just dropped the remembered set, so we can't move the objects. Only a few of them are still live, but unfortunately we can't move them, because there is no remembered set any more. We can solve that by doing another concurrent marking cycle and just waiting a bit longer before we start to move the objects in this region. During that concurrent marking cycle, we will traverse all the live objects in the heap, which means we will traverse all the pointers going into this region, which means we have exactly the information needed to rebuild a remembered set. The concurrent marking cycle will follow all these pointers and can add them as entries to the remembered set. And once that concurrent marking cycle has completed, we once again have a region we can evacuate, with a remembered set.

So this is the second idea that is out on the mailing list. No one has picked it up yet, so if you're eager to do some GC hacking, feel free to get in touch. I think this can have quite a big impact on the footprint of G1.

Then we'll look quickly at a slightly more straightforward idea, which is the last one: parallel full collection. G1 can fall back to a full GC when the heuristics fail. In G1, a full GC is considered a failure mode. We don't want a full GC to happen.
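Returning to the remembered-set rebuilding idea for a moment: the way marking can piggyback remembered-set construction can be sketched like this. The names are illustrative, not HotSpot code; the point is that the marking traversal already visits every live pointer, so recording the cross-region ones into candidate regions is nearly free:

```java
import java.util.*;

// Sketch of rebuilding a dropped remembered set during concurrent
// marking: while tracing live objects we follow every pointer anyway,
// so pointers into "candidate" regions can be recorded as we go.
public class RebuildRememberedSetSketch {
    static class Obj {
        final int region;
        final List<Obj> refs = new ArrayList<>();
        Obj(int region) { this.region = region; }
    }

    static Map<Integer, Set<Obj>> markAndRebuild(List<Obj> roots, Set<Integer> candidates) {
        Map<Integer, Set<Obj>> rsets = new HashMap<>();
        Set<Obj> marked = new HashSet<>();          // the mark bitmap, morally
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (!marked.add(o)) continue;           // already marked, skip
            for (Obj ref : o.refs) {
                // Marking follows this pointer anyway; piggyback the
                // remembered-set entry for candidate target regions.
                if (ref.region != o.region && candidates.contains(ref.region)) {
                    rsets.computeIfAbsent(ref.region, k -> new HashSet<>()).add(o);
                }
                stack.push(ref);
            }
        }
        return rsets;
    }

    public static void main(String[] args) {
        Obj root = new Obj(0);
        Obj popular = new Obj(5);   // lives in the region we want to evacuate
        root.refs.add(popular);
        Map<Integer, Set<Obj>> rsets =
            markAndRebuild(Arrays.asList(root), new HashSet<>(Arrays.asList(5)));
        System.out.println(rsets.get(5).contains(root));  // true
    }
}
```

After such a cycle completes, the candidate region has a remembered set again and becomes evacuatable, which is exactly the trade described in the talk: wait one more marking cycle in exchange for not keeping huge remembered sets alive the whole time.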
It shouldn't have to happen if the heuristics are working correctly, but it can happen, and it is currently single-threaded. That is, of course, really bad for performance. So we want to rewrite it to run in parallel and have it be faster when the heuristics do fail. Work is underway on this by Stefan Johansson in the Oracle GC team, but if you want to help out and contribute, please get in touch with either me or Stefan.

So, how do you contribute to GC development in general and get involved? Well, working on the garbage collectors inside the VM, in HotSpot, is quite daunting if you haven't done virtual machine development before. It takes a bit of time to ramp up on all the thousands of lines of code, how they all interact, and how it all works. But if you have this kind of experience, or if you want to learn, feel free to get the code from OpenJDK, start hanging out on the hotspot-gc-dev mailing list, and you can definitely help out with one of these ideas.

Maybe a more realistic way to help out is to test your applications with the early access builds. We put a lot of improvements into JDK 9, so you can give us feedback on how the heuristics or the policies behave for you. If you tell us it's working great in your application, well, that's great, thank you; we love the positive feedback as well. But more commonly, you will come to the mailing list and say: no, it doesn't work, the heuristics failed. We want to know that too, because then we can change it and make sure it's good to go when we make a general release.

You can also help out by helping other users on hotspot-gc-use at openjdk.java.net. I know there are a lot of experienced GC tuning people in the community. I know some of you run really large heaps. You have been down all the different pitfalls and dangers before.
So please help out other newcomers to the community if they ask for performance advice, if they need help with how to run a particular GC, what the options mean, et cetera. And finally, if you feel like writing some Java code, you can definitely contribute tests or benchmarks. These are usually written in Java. We have a little tool called jtreg, which helps you write your tests. And if you have a small application that exercises a particular behavior of the GC, then please consider contributing it as a benchmark, if it can show some strange behavior or a corner case or something. And with that, thank you very much.

So, questions. Please ask questions if you have any.

Hello. Hi. My name is Volkan. Thanks so much for the presentation. While the Shenandoah folks were talking about their GC algorithm, they clearly had a target audience in mind. They mentioned the existing solutions and said the reason they are taking a new approach is that they are targeting a particular use case. Do you also have something specific in mind, or do you want to be as general as possible?

So for G1, there is a pause-time goal, which you can set, and it will try to adjust as well as possible to meet that pause-time goal. There are no guarantees; it's not a hard real-time collector, but it will try. We have put it through some pretty big tests. It has been running on one-terabyte heaps, and it has been run on some rather small heaps, around four megabytes, as well. Obviously, there will be trade-offs, as with any collector, but we do try to make it as automatic as possible, so you can just specify the pause time and it will run with as good throughput as possible. That might not be true for your application; you might need some tuning, or we might need to update our heuristics. But that is the goal.
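For reference, the pause-time goal mentioned in this answer is set with the -XX:MaxGCPauseMillis flag. A typical G1 invocation might look like this (the jar name, heap size, and pause value are just placeholder examples):

```shell
# Enable G1 (the default collector from JDK 9 onward) and ask for GC
# pauses of at most 200 ms. G1 treats this as a goal, not a guarantee.
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xmx4g -jar myapp.jar
```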
So it tries to be pretty general and to get maximum throughput while keeping the pause time as good as possible.

All right, more questions? Feel free to ask about the other collectors as well, if you want to, or anything else in OpenJDK.

I will ask exactly the same question I wanted to ask the Shenandoah guys. What's the story now with the API for pluggable GCs? I think there is a proposal from Roman and somebody else.

Yeah, you mean the API in terms of the VM? Right now, GC is pretty hardwired into the rest of the VM, and if you wanted to plug in a GC, you pretty much need to glue it into many parts. I think the goal was to make a clean API to make it pluggable. Well, you are from Oracle, and you are the authors of the JEP; maybe you guys can comment on what the story is there.

Yeah, we have talked about it on the mailing list as well. I think Roman made a great start in trying to separate things out. And this is all internal to the VM: if you work on a garbage collection algorithm inside the JVM, in the C++ code, how do you maintain it? As Christine showed, we have quite a few collectors, and it's quite a maintenance burden to take care of all those lines of code. So we want to start to tear them apart a bit more, so they don't get intermixed. Roman made a great proposal as a start. We have a small API today, the CollectedHeap API, but that doesn't capture all of it. So this is rather internal to VM development. But yeah, I think Roman started great. We have Erik on the team doing his template magic right now, and he's almost there with templating up lots of the API. So it will hopefully become a lot easier to maintain multiple GCs in the JVM.

Yeah, there's not much I can add. I filed this JEP, and I'm basically waiting now for JDK 10 to fully open up so that I can start working on it.
I do have a prototype that covers about half of the stuff that I want to abstract into an API. Yeah, that's coming when JDK 10 opens up.

Okay, well, I will be around for the rest of the day and also tomorrow, so come up and talk to me if you have any more questions. Thank you.