So until then, I just want to know how many of you have ever heard about the garbage collector, and about G1, which is what we are talking about. Now, the good part of this presentation is that there is no prerequisite. If you have passed college and you are from an engineering background, you can understand the garbage collector. You don't even need to know Java. Okay, good to start.

So I am Vaibhav, and I am going to talk about the Garbage-First collector, which is not such a new collector, because we have been working on it since 2006. And now we are saying, okay, we are ready. So it took us 10 years to give this presentation. I am from the Java Platform team and I have been involved with the garbage collector for a long time.

First I will tell a short story about garbage collectors for those people who don't know them. Then I will tell you about those greedy customers who are not happy with anything. Then I will explain how G1 works, then a little bit of the technicality of G1, and then, very important, the logging mechanism and when to use G1. And it's going to be an interactive session, because we are about 40 to 45 people, so we can discuss things; ask questions whenever they come to mind.

So the garbage collector provides automated memory management for Java, which was not there in C/C++. That is very simple. And it is not about finding the dead objects. Rather, it is about finding the live objects, and objects which are not live are called dead. So never think of G1, never think of GC, in terms of marking the dead objects, because that does not work for a cyclic linked list and for a lot of other things.

Before going into the details of GC, I will show you a very evergreen picture of the Java heap structure. Why that heap structure came into the picture is the last point. We have something called the weak generational hypothesis. And as I mentioned, this is a hypothesis, and hypotheses are those things which are not theory. It says two things.
It says most objects die young, and our stats say almost 80 to 85% of objects die young. And there are very few references from old objects to new objects. Okay. Can you think of an example where you have a reference from an old object to a new object? Yes. Yeah. So, to put what you are saying in more technical terms, it is a map. A map can be old, but you are putting a new item into the map. Okay. So those things qualify for the second part.

Taking care of these two things, we decided that we should divide the heap into a young generation and an old generation. Because most objects die young, it is good to keep collecting objects from the young region. And that is what this is. So this is the young region, and then we have an old region. GC runs very frequently in the young region and not so frequently in the old region. And then there is a hidden story of the perm generation, which we will not discuss because that will bring up a lot of new questions. You can ask questions at any time.

One more thing you need to remember is that there is something called parallel collection and something called concurrent collection. Parallel collection means the GC threads can run in parallel with each other, but the JVM does one thing at a time: either it runs the application or it runs GC. Concurrent collection, like the second line which I have shown, means your GC threads can run alongside the application. All good.

These are some very common definitions which we will use as we move forward. Mark is finding the accessible objects. Sweep is cleaning up the inaccessible objects. Copying is moving objects from one place to another. Compaction is moving the live objects together into one contiguous space. We will talk about compaction and how it costs a lot. It's fine if you don't get everything right now. These are the collectors which are available to us so far.
The serial collector is an ancient collector where things can be done in only one thread. So the GC will run single-threaded. You can use -XX:+UseSerialGC. In Java we use the plus to enable things, if someone doesn't know. Okay. This is stop-the-world. Stop-the-world means that when your GC runs, it stops the application. It is a very common terminology.

The parallel collector is again stop-the-world, but the GC runs in parallel, so it can do the collection work faster. By default the number of GC threads is equal to the number of processors you have, but you can define this.

The CMS collector is the default collector till now, before JDK 9. In JDK 9, G1 is the default. It is again stop-the-world, and it is concurrent in nature. So in CMS there are a lot of parts where the GC threads and the application threads run at the same time. You can use -XX:+UseConcMarkSweepGC. This is the default. You must have seen it in your application.

And then we have the G1 collector, which is again stop-the-world, but it is more predictable than any of these algorithms.

Things are a little technical from here. If you do not understand, please stop me at that point. This is how CMS works. Your application threads are running, and you have to run your GC because you don't have any space left. So what will you do? You will do something called initial marking. Initial marking is finding the live objects, and it is a stop-the-world event. That is what I have written. Whatever you do, you cannot mark your memory and run your application at the same time. People ask me: why stop the world? The application should run; GC should not stop the application. But no algorithm can mark the accessibility of the objects while letting new objects get created. Both are not possible at the same time. Okay. Then we do concurrent marking. Initial marking first. Okay.
How GC works is that you have something called root objects, and you have to find out the accessibility of all the objects from the root objects. That is what we start in initial marking. Then we do the concurrent marking, where from whatever accessible objects we have found, we find out what is reachable from there. Then we do a remark, and finally we sweep. The sweep is again concurrent. So stop-the-world events happen far less with this algorithm. This is CMS. This slide may be a little complicated, but we will understand it as we go forward.

I will jump directly to G1, because this was my dilemma before the presentation: should I go more into GC or should I explain G1? Both cannot be explained in 45 minutes. If you want G1, you use -XX:+UseG1GC, which you don't have to pass in JDK 9; it's the default now. It is the long-term replacement for CMS, and we are saying that from JDK 9 onwards G1 will be the default garbage collector. If you don't like it, you can move back to CMS.

A typical command line will look something like this: you define -Xmx, which is the total Java heap space, you say -XX:+UseG1GC, and then you say what your pause time is, with -XX:MaxGCPauseMillis. And that's it. We recommend that people not pass any other parameters. Stop writing your long Java commands: -Xmx, -Xmn, minus this, minus that, minus heap space, minus threshold. Just leave that to Java.

What we are going to do with this command is try to keep the pauses under 200 milliseconds as much as possible. But we cannot guarantee it. That is what I have written: it's a soft goal, G1 will try its best, but it cannot guarantee it. If you want a guarantee, you have to use a real-time Java system. And in 9 we are coming with something called det GC, that is, deterministic GC.
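To make the command line described above concrete, here is a minimal sketch. It assumes an invocation such as `java -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 WhichCollector.java`; the 4 GB heap size and the class name `WhichCollector` are illustrative assumptions, not values from the talk. The program only reports what the running JVM says about itself through the standard `java.lang.management` API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: prints the JVM flags, the max heap, and the active collectors.
class WhichCollector {

    /** Names of the collectors the running JVM reports. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** JVM options given on the command line, e.g. -Xmx4g or -XX:+UseG1GC. */
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM flags:     " + jvmFlags());
        System.out.println("Max heap (MB): " + maxHeapMb);
        System.out.println("Collectors:    " + collectorNames());
    }
}
```

With G1 active, the collector names typically mention "G1 Young Generation" and "G1 Old Generation"; the exact strings depend on the JVM version, so treat them as informational only.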
We are trying to give more of a guarantee, but still, even det GC cannot guarantee it. This is one command-line option which is very important. G1 triggers when 45% of the heap is full, that is, of the complete heap. This is a default value; you can change it. So it actually triggers the collection much earlier. And that's why this algorithm is called garbage first.

So this structure, just forget it. It is of no use now. This is the new structure. Before going into it, I will tell you how G1 works. What G1 does is that, rather than treating the heap as one complete space, it creates blocks of memory. These are all equal-sized blocks, which can be 1 MB or 2 MB. Initially it used to be 1 MB; now we have made it 2 MB. These are the young generation, and the blue ones are the old generation.

What it does is that, according to your time goal, it tries to predict how much garbage collection needs to be done, and then it stops. If you look at the old CMS, you cannot stop in between. If you have started scanning your heap, you have to finish it. You cannot stop somewhere in between and say, okay, I got enough memory, so I will not scan any more. It's a for loop which you have to complete. Here it is more predictable, because I will say: do GC in this block, do GC in this one, do GC in this one, and I will watch my time. Okay, I am done with my 200 milliseconds, so I will say we will not run the GC threads any further. So you can see how we reach more predictability.

This is what it does when it runs in the young generation. As I told you, it runs in both the young and the old generation. These are the young regions, green in color. It will move all the live things to a new region: to the old regions and to the survivor region. I will not go too much into the detail of the survivor region; it is a region within the young generation. Okay. I think it will look like this.
You will have a recent copy from the young and a recent copy from the old. And you are done with the young generation. I hope there are no questions. This is a little complicated, but we will understand more. So now this will keep running frequently.

So wait, let me show you how the GC runs. I use something called jvisualvm, and you even have a plugin in Eclipse, but sorry, Eclipse guys, JDK 9 is broken. Why 121? I am a JDK developer. People are telling me to be on 141, which is not even released yet. And then this is our old-fashioned demo, which is the Java 2D demo. Have you guys ever seen this demo? This is a very interesting demo. See how many things it can do. For someone who cannot read it, this is the Java 2D demo. So I am telling it to load this. It has a lot of transforms and all those things. It is actually a perfect demo for running GC. And why is this not loading? It should not take this much time. So it seems everything is broken, not only Eclipse. Okay, let it load. I will leave it in the background.

So I will go to the old-gen GC now. Before going to the old gen, this is very important to understand. This is a complicated data structure which even an engineering student can understand. It is called the tricolor data structure. But I have not colored anything; that's why I have not given the name. Your first step will be the root scan. When a GC gets triggered in the old generation, it will scan all the roots. You know what the roots generally are? Roots are generally things like the statics, which will not die, right? Or something pointing from the old to the young generation. So you scan the roots. Now you see which objects are accessible from the roots, and you mark them with some color, say black. So these two objects are accessible from the roots, right?
Then you look at further accessibility and mark those as black too. And finally you reach the end. Finally you color this as red and this as blue. So all the objects which nothing is pointing to will be like this, and this, and this. These are available for garbage collection.

And this is a concurrent algorithm. So can you think of any problem that can happen here with concurrency? Something can go wrong while this is happening. Yes, something can go wrong. Okay, I will explain. While this whole process is going on, it is possible that this pointer changes from this place to this place, because it is concurrent and your application is also running. So there is a very high chance that a pointer will change. What do you do in that case? What G1 actually does is called snapshot-at-the-beginning. It takes a snapshot of this graph at the beginning and works according to that, even if the concurrently running application changes the pointers. So I am not bothered that my pointer goes from here to here. I will still say that this object is live.

And we have one more concept. Again, there are a lot of names coming, and that's why I think people get confused. There is something called the card table in GC. The card table follows from the second part of the weak generational hypothesis. If you have any reference from the old generation to the young generation, you put that into the card table, because you don't want to scan your complete old generation. To find such references you would have to scan the complete old generation, and that is a very bad thing. So you record in the card table the places in the old generation that still refer to the young generation. The same thing we will do here. So this will explain it much better. We kick off the GC when the heap is 45% full.
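The marking walk and the snapshot-at-the-beginning idea described above can be sketched as a toy model. This is not HotSpot's implementation: the class `SatbMarkingSketch`, the `Node` type, and the `writeRef` barrier are hypothetical names, and the colors are the textbook white/grey/black rather than the ones on the slide. The point it shows is that when the application overwrites a reference during marking, the old target is recorded, so an object that was reachable at the start of marking is never missed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy tricolor marker with a snapshot-at-the-beginning style write barrier (illustrative only).
class SatbMarkingSketch {
    enum Color { WHITE, GREY, BLACK } // not visited yet, queued for scanning, fully scanned

    static final class Node {
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;
    }

    private final Deque<Node> greyQueue = new ArrayDeque<>();
    private boolean marking = false;

    // Turns a white object grey and queues it for scanning.
    private void shade(Node n) {
        if (n != null && n.color == Color.WHITE) {
            n.color = Color.GREY;
            greyQueue.add(n);
        }
    }

    // "Initial mark": shade everything directly reachable from the roots.
    void start(List<Node> roots) {
        marking = true;
        for (Node r : roots) shade(r);
    }

    // One unit of concurrent marking: scan a single grey object.
    boolean step() {
        Node n = greyQueue.poll();
        if (n == null) return false;
        for (Node child : n.refs) shade(child);
        n.color = Color.BLACK;
        return true;
    }

    // Drain the remaining work and end the marking cycle.
    void finish() {
        while (step()) { }
        marking = false;
    }

    // Write barrier used by the "application": remember the OLD target before overwriting.
    void writeRef(Node holder, int slot, Node newTarget) {
        Node old = holder.refs.get(slot);
        if (marking) shade(old);
        holder.refs.set(slot, newTarget);
    }

    static String demo() {
        Node root = new Node(), a = new Node(), b = new Node(), garbage = new Node();
        root.refs.add(a);
        root.refs.add(null);
        a.refs.add(b);

        SatbMarkingSketch gc = new SatbMarkingSketch();
        gc.start(Arrays.asList(root));
        gc.step();                // root is now BLACK, a is GREY, b is still WHITE

        // The application moves the pointer to b from a to the already-scanned root.
        gc.writeRef(root, 1, b);
        gc.writeRef(a, 0, null);  // barrier shades b before the reference disappears

        gc.finish();
        return "b=" + b.color + " garbage=" + garbage.color;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "b=BLACK garbage=WHITE"
    }
}
```

Without the `shade(old)` call in `writeRef`, `b` would stay white and be treated as garbage even though `root` still points to it. The price of the snapshot approach is that an object which really did become unreachable during marking can survive until the next cycle.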
Okay. And you can check this; it can be changed from 45 to something else. It will stop the world, which means it will stop the application. It will find the roots, that is the initial marking, and it gives this job to the young-generation collection. It tells the young collection: find out the roots for me, the old generation. That is why it piggybacks on the young collection. After doing that work, it resumes the application. Then it goes for the remark phase. And then it takes the SATB, the snapshot-at-the-beginning, and does the reference processing, from JNI and from the other references, wherever they are. And then the marking proceeds, and then it automatically cleans the old generation.

One very awesome thing I want to show you is that while it traverses this graph, it keeps track, for each complete block, of how many objects are living in that block. So there can be some blocks, like this block, where at the time of GC I see, okay, nothing is there. So I can just say: clean this. In this one I will see that two or three references are there. In this one maybe a hundred references are left. I store this information somewhere, and the next time I trigger the GC, I know that I should reclaim this block and I should not reclaim that block, because it has more live objects. Did you get the point? So for every block I have the information on how many objects are alive in that region, and according to that I make the decision next time. I can clean this block, and I can also clean that block. Which one do I pick? You are giving me a time limit. You are telling me 20 milliseconds, and I have to finish the work. So I will see where the fewest references are live, and I will go and clean that. And this information we gather while we are doing this root scanning work. Not clear? It's okay.
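The per-block bookkeeping described above can be sketched as a small selection routine. This is a simplified illustration, not G1's actual heuristic: the `Region` type, the cost constants, and the method names are hypothetical, and the cost model (a fixed overhead per region plus a cost per live kilobyte copied) is an assumption made only for the example. It picks the regions with the least live data first and stops as soon as the next region would exceed the pause budget.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy "garbage first" region selection under a pause-time budget (illustrative only).
class CollectionSetSketch {
    static final class Region {
        final int id;
        final long liveBytes; // gathered during the previous marking cycle
        Region(int id, long liveBytes) { this.id = id; this.liveBytes = liveBytes; }
    }

    // Assumed cost model, not measured values.
    static final double FIXED_MS_PER_REGION = 0.5;
    static final double MS_PER_LIVE_KB = 0.01;

    static double estimatedPauseMs(Region r) {
        return FIXED_MS_PER_REGION + (r.liveBytes / 1024.0) * MS_PER_LIVE_KB;
    }

    /** Returns the ids of the regions to collect, emptiest first, within the budget. */
    static List<Integer> choose(List<Region> candidates, double budgetMs) {
        List<Region> byLiveness = new ArrayList<>(candidates);
        byLiveness.sort(Comparator.comparingLong((Region r) -> r.liveBytes));

        List<Integer> chosen = new ArrayList<>();
        double spentMs = 0.0;
        for (Region r : byLiveness) {
            double cost = estimatedPauseMs(r);
            if (spentMs + cost > budgetMs) break; // budget used up: stop here
            spentMs += cost;
            chosen.add(r.id);
        }
        return chosen;
    }

    static List<Integer> demo() {
        List<Region> regions = Arrays.asList(
                new Region(0, 0),             // completely empty: cheapest to reclaim
                new Region(1, 100 * 1024),    // 100 KB live
                new Region(2, 1024 * 1024),   // 1 MB live: expensive to evacuate
                new Region(3, 300 * 1024));   // 300 KB live
        return choose(regions, 6.0);          // 6 ms budget, chosen for the example
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0, 1, 3]
    }
}
```

Region 2 is skipped because copying its live data would blow the budget; it stays a candidate for a later cycle.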
It's complicated. Yes. And that's why we say that we can clean up an old region immediately, the complete region itself. If there are very few references left, we can even copy them into another region. I have not written it anywhere, but G1 also does compaction, which CMS doesn't do. That is the biggest disadvantage of CMS: it does not compact until you are almost about to reach, what is that lovely word, out of memory, right? Until you are about to run out of memory, CMS does not compact. And when it does compact, it runs a full GC, moves all the memory to a new location, and it is a stop-the-world event, and your pause can go to two seconds, three seconds, five seconds. I had a talk with a Flipkart guy; they told me it paused for 30 seconds too.

Yeah, this is it. If there is any takeaway as such, I will say: just understand that there is a new algorithm in the market, which is good and which is the default in JDK 9. We introduced it in JDK 6, in the 2006 to 2007 time frame. There were a lot of bugs. We have fixed those bugs over 8 to 10 years, and it is working quite well now.

By the way, where are we here? For those who have seen this, this is a distorted line. Okay, no, these are two lines. Anyway, this is how it looks. This is metaspace; just forget what it is. This is the old generation. This is the young generation, and it has two survivor spaces. You can see the graph will go up, then GC will run and it will suddenly go down. And a lot of things will happen. Even in the old generation you can see the graph moving up. And if you trigger something here, maybe move some buttons, you will see a lot of activity there. And it says 17 collections, meaning it has done 17 old-gen collections. 15? That does not look like it to me. 35, that is 35. This 35 is the young-generation collections. If you are finding an issue, you have to log with these options.
Those are PrintGC and PrintGCDetails. For G1 you will see things like root scanning and survivor space. You can go for more detailed logging by setting the G1 log level to finest. And you can determine the time taken by each of the actions with PrintGCTimeStamps. This is one of the most important things, and I have seen enterprises using it. If GC is taking too much time, you should look at the time that the events are taking and work out whether that time is right or not. For example, running too many full GCs is not a good thing. It depends on your application; I cannot make a general comment on that.

When to go for G1: this is again at your own risk. But it is a good replacement for CMS or for parallel old GC, which is again a GC for the old region. When your GC pauses are too long and too frequent, you should probably move to G1. When the rate of object allocation is very high, you should probably think about G1. Or when you are getting compaction pauses, because the compaction pause is very large in CMS. So most of our engineers will come in on Saturday or Sunday and just increase the -Xmx so that the compaction pause does not kick in. And then one weekend they will shut down the system so that everything is fine again. But if you are running an application which has been running for 10 or 12 years, you will probably see very heavy compaction if the application is not written properly.

I even know a few people who are running Java like that. One guy called me last time; he has been running his application since 1995, and he has not shut down his machine. He had some minor issue for which he called me. I told him, I am happy to know that you have been running since 1995; don't ask me any further questions. But we have people who have been running applications continuously for 20 and 30 years, and they are not shutting down the application. They are not shutting down the server.
But for that you have to write good code, which you should do with Jigsaw. This is a very important takeaway: if you are happy with your current GC performance, stay happy. Don't try to go to G1 unnecessarily. If your application is a small or mid-size application and CMS is doing all the work, don't go for G1; many people have big, huge servers and CMS is good enough to handle them. Yep, I am done. Must be about time. Any questions? Anyone with any question on Java? Yes, please.

It is highly dependent on the application you run. Unfortunately, I always see a degradation in performance, because only those people come to me who have seen a degradation. That is my bad luck. We have done some common benchmarking in-house, and we are seeing a reduction in the pause time by almost two seconds. We are now getting down to 50 milliseconds. Again, it depends upon the application. That is why people who do performance tuning are so highly paid: they hide so many details that you cannot do it on your own. What is your second question? And the pause time.

What do you mean by expanding the heap? You are saying that you will not specify the -Xmx? Yes, it will grow the heap. No, it will grow the heap first. GC runs in the worst case. That is the whole point of GC; we do not exactly enjoy running GC. It will first expand the memory. It will reach the complete -Xmx, and then it will start running the young GC or the old GC. Any other questions? One simple question you can ask me. Not that I have done. If your application is not performing, you can write a mail to me. Thank you.
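The collection counts shown in the monitoring tool and the heap growth discussed in the questions can also be read from inside a running application. The following is a minimal sketch using the standard `java.lang.management` API; the class name `GcStatsSketch` and the allocation loop in `main` are illustrative assumptions, and the numbers it prints depend entirely on the JVM and its settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: summarizes collection counts, accumulated GC time, and heap sizes.
class GcStatsSketch {

    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // totalMemory() is the heap currently committed; maxMemory() reflects the -Xmx limit.
        sb.append("heap committed=").append(rt.totalMemory() / mb)
          .append(" MB, max=").append(rt.maxMemory() / mb).append(" MB\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
        // Create some short-lived garbage so that young collections are likely to run.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            checksum += junk.length;
        }
        System.out.println("allocated bytes: " + checksum);
        System.out.print(summary());
    }
}
```

Comparing the two summaries gives a rough view of how many collections ran and how much time they took in between, which is the same kind of information the GC log options provide in more detail.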