Welcome everybody. I'll talk a little bit about benchmarking and capturing memory metrics. I'm Jens. I've been doing e-business and e-commerce applications in Java since 1999. Sometimes I like to dig a little deeper into things and solve them at the right place. I'm also a performance fanatic, and that brings me to the topic of caching: I write a caching library called cache2k, so this is my major topic of interest, and because of that I do a lot of benchmarking.

That's the outline of the talk: a little bit of motivation, then how we can gather different memory metrics from the JVM, how we can gather them while running a JMH benchmark, and then I present some results, and then there is some happiness and maybe confusion. If there is any confusion or a question comes up, please ask straight away.

So why am I doing this? There are a lot of caching libraries out there and everybody says "we are high performance". I'm German, so I'm always a little more accurate and a little more detailed about things: before I say I am high performance, I'd better check that. I use JMH for benchmarking, for a couple of reasons I'm not going into detail about; that is actually a whole other topic. When you do JMH throughput benchmarking, the primary metric you get is operations per second. This is quite nice, and there are JMH profilers that let you collect a lot of other metrics, but right now there is nothing that lets you collect the memory usage. When I benchmark a caching library, how much memory is used is quite essential, because a caching library is a tool to tune the time/space trade-off: with caching you can always trade in memory for more speed.

So what are the different options for getting memory information? One thing people do is object graph traversal.
There are libraries for this from Ehcache, and one called Java Agent for Memory Measurements. You give these libraries a root object and they do a depth-first scan of the object hierarchy via the references. Then some magic happens, sometimes involving sun.misc.Unsafe, to get the size of each object; it is all summed up and you have a result. This is quite nice, because you can simply fill your data structures and then ask the library how much memory that is going to cost. You can integrate it into continuous integration and get results out of it. But it only covers a partial set of the memory, and the other problem is how much is actually traversed: how do I keep the thing from traversing the whole heap? And of course it covers the heap only, and it is only a static result; it does not cover what happens when the program is actually running.

The other thing you can do is take a heap dump or a heap histogram, via jmap for example, count all the objects, and get the total memory size from that. It has the same limitations as before, but it's a bit costly, especially the heap dump, and again it is only a static result.

So what happens when I run a Java program? This is the output of VisualVM when you click on Monitor and Heap. You see the space allocated by the objects in blue; that's the heap, which moves up and down, dropping each time the garbage collector does its work. The orange area is the total memory that is occupied.

How does this relate to my JMH benchmark? JMH spins up a JVM and runs its benchmarks in there in iterations. You can specify how many iterations you want for warming up your workload and how many you want for the actual measurement. In this case I do two iterations for warm-up and three iterations for measurement.
So when I want to know how much heap I use, I can take a look, and I get interesting values. It could be down here or up there. So, no idea. One idea you might have: let's force a garbage collection after each iteration. I fill the data structures, then force the garbage collector, then look at what's left on the heap, and I have a consistent metric for the heap space. But when you compare two runs of the benchmark, one without forced garbage collection and one with it, you see that the shape is actually quite different. You are interfering a lot with what the garbage collector is doing and with how the JVM would behave naturally if you just let it do its thing.

One short thing: who here is actually working with JMH? Okay, a couple of people, that's great. Does anybody who runs JMH have a remark about this picture? Anything interesting? The thing is, these three iterations are my measurement iterations, and we see that the heap is still expanding. So in iterations one, two, three we are not yet running at a steady pace; the garbage collector has not yet reached a stable memory size. What might happen is that the results differ quite a bit between the iterations, because the heap is still expanding. The other interesting thing is that we have about two garbage collections happening within each iteration, and this may be a problem as well: maybe two happen, maybe three, maybe just one. So it interferes with our throughput measurement result a lot.

As a general rule, I would say there are two kinds of microbenchmarks here.
There are the real micro-microbenchmarks, where a garbage collection cycle might happen only occasionally. In that case you probably want to get the garbage collector out of the equation of your measurement; you can do this by forcing a garbage collection before each iteration, or maybe even by using the new Epsilon no-op GC. However, when you have a not-so-micro benchmark and garbage collection cycles happen a lot during your iterations, then you actually want them to happen so often that they don't interfere too much with your measurement results, and you had better know what your garbage collector is doing. Or, the other way around: by monitoring what the garbage collector is doing and how much memory your JVM is using, you get to know your garbage collector a lot better. So if you cannot avoid garbage collection, you actually want to make it run steadily during the iterations.

So what kind of metrics can I extract here? What I implemented is, first, to use what the operating system gives us via /proc/self/status. There are two values of interest: the resident set size, and the so-called high watermark, which is the highest level the resident set size reached during the run of the process. These are two nice metrics.

You can also use garbage collector notifications: you get a notification from the garbage collector for each collection cycle, with information about the used memory before and after the collection. I take the maximum of that and add it to the JMH results.

Another good metric, which you should keep looking at, is the allocation rate, the rate at which objects are allocated. This one is already built into JMH with the GC profiler.
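The /proc/self/status approach can be sketched like this. This is a minimal illustration, Linux only; the class and method names are mine, not from the talk's code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Reads memory values of the current process from /proc/self/status (Linux only). */
public class ProcStatusReader {

    /** Returns the value in kB for a key like "VmRSS" or "VmHWM", or -1 if absent. */
    static long readKb(String key) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
            // Line format: "VmRSS:     123456 kB"
            if (line.startsWith(key + ":")) {
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("VmRSS = " + readKb("VmRSS") + " kB");
        System.out.println("VmHWM = " + readKb("VmHWM") + " kB");
    }
}
```

In a benchmark the high watermark (VmHWM) would be read once at the end, while VmRSS can be sampled at any point.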
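The GC-notification approach can be sketched as follows, using the com.sun.management notification API. The class is my own illustration; the part that feeds the value into the JMH results is omitted:

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

/** Tracks the maximum used memory reported after each GC cycle. */
public class GcNotificationDemo {

    static volatile long maxUsedAfterGc;

    /** Subscribes to the notification every collector emits per GC cycle. */
    static void install() {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            ((NotificationEmitter) bean).addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info =
                    GarbageCollectionNotificationInfo.from(
                        (CompositeData) notification.getUserData());
                // Sum the post-GC usage over all memory pools, keep the maximum seen.
                long usedAfter = info.getGcInfo().getMemoryUsageAfterGc().values().stream()
                    .mapToLong(MemoryUsage::getUsed).sum();
                maxUsedAfterGc = Math.max(maxUsedAfterGc, usedAfter);
            }, null, null);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        System.gc(); // trigger at least one collection cycle
        Thread.sleep(500); // notifications are delivered asynchronously
        System.out.println("max used after GC: " + maxUsedAfterGc + " bytes");
    }
}
```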
And finally I decided: when we are finished with the iterations, it is safe to do a forced garbage collection, use jmap to get a heap histogram, and use the management beans to read the value of the used heap.

Here is the execution scheme I use for the upcoming results. JMH has a control process, and from that control process it forks the measurement JVMs; you can configure how many forks JMH should do. In each measurement JVM there are warm-up iterations and measurement iterations, and the forked JVMs run one after the other. So altogether I have nine measurement iterations here: for the primary metric I get nine results, one after each iteration, but for the memory metrics gathered with the forced garbage collection I only get three, because that runs once at the end of each fork.

Here are some results, but first I need to explain the benchmark I'm running. Let me walk you through it. For each thread I use a fast Zipfian sequence generator. That is a skewed random access pattern which, with this configuration, yields about a 90% hit rate in the cache. Oh, I see there is a mistake here: I actually run it with one million entries. And here is the benchmark operation: it's a cache get, and whenever the entry is not in the cache a loading function is called. This loading function uses a JMH feature called Blackhole.consumeCPU, so there is a heavy penalty on a cache miss. This means a lot happens in a single operation: the cache get, work on the cache eviction data structures, garbage being produced, and also some auto-boxing.

That's the benchmark environment: a machine with four physical cores. I limit the core usage and disable hyper-threading via CPU hot-plugging. The benchmark runs with four threads.
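The end-of-run heap measurement can be sketched like this. The class name is mine; in the real setup something like this runs once after the last iteration of each fork:

```java
import java.lang.management.ManagementFactory;

/** End-of-run heap measurement: force a collection, then read the used heap. */
public class UsedHeapAfterGc {

    static long usedHeapAfterGc() {
        // Forcing is safe here: the measurement iterations are already finished,
        // so this no longer distorts the throughput result.
        System.gc();
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long[][] data = new long[1024][1024]; // roughly 8 MB standing in for a filled cache
        long used = usedHeapAfterGc();
        System.out.println("used heap after forced GC: " + used + " bytes");
        System.out.println("rows kept reachable: " + data.length);
    }
}
```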
I tested with Oracle JDK 11, with the parallel GC and G1, and for reference these are the versions of the libraries. These are the JMH parameters, which I already talked about: three measurement iterations and three forks give nine measurement iterations altogether, with 60 seconds iteration time. The graphs show the confidence interval in the error bars; JMH uses a 99.9% confidence level, which is pretty tough.

These are the throughput results. The cache2k implementation comes in at seven million operations per second, but that is actually not what I want to talk about. Let's look at the memory usage for the parallel garbage collector. Here is the heap usage in the different metrics: the first one is via the jmap heap histogram, and the second and third are taken after I run the garbage collector; the second is the used heap, and the third is the total used memory, so not only the heap but also the non-heap memory. And here is something astonishing: the histogram actually reports more used memory than the management beans. You can also see from the error bars that there is a lot of variance in these results.

Now let's look at the resident set size, so what the operating system gives us. From the right: this one is the high watermark metric, the highest resident set size; next is the resident set size at the end, after the forced garbage collection; then the highest amount of committed memory the garbage collector reported after a collection; and this is the total committed memory the JVM is reporting.
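The GC profiler mentioned earlier reports the allocation rate both per second and normalized per operation; the normalized value is simply the per-second rate divided by the throughput. A worked example with hypothetical numbers, not figures from these measurements:

```java
/** Relates the two allocation-rate metrics reported by the JMH GC profiler. */
public class AllocRateNorm {

    /** Bytes allocated per benchmark operation (the "normed" rate). */
    static double bytesPerOp(double allocBytesPerSec, double opsPerSec) {
        return allocBytesPerSec / opsPerSec;
    }

    public static void main(String[] args) {
        double allocBytesPerSec = 1_400_000_000.0; // 1.4 GB/s, hypothetical
        double opsPerSec = 7_000_000.0;            // 7 M ops/s, hypothetical
        System.out.println(bytesPerOp(allocBytesPerSec, opsPerSec) + " bytes/op"); // 200.0
    }
}
```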
We see that sometimes there is a lot of variance, but these values are actually quite close together; and when I go back here, we see those values are also quite close. So as long as we are not debugging the garbage collector, we are fine just looking at one of these values.

Now let's take a look at the G1 garbage collector. Here the result of the histogram is more consistent with what the JVM reports via the management beans; it's pretty much on the same level. What is interesting now is that the memory actually used at the operating-system level differs a lot between the caching implementations, and also between what the operating system reports as the high watermark and the resident set size at the end of the run, after the forced garbage collection. So here we get a lot of different results. The reason we get a lower resident set size is that G1 aggressively gives memory back to the operating system, so the high watermark differs. And the reason there are so many different results for G1 is that we are running a throughput benchmark here, and G1 is not intended for highest throughput but for low pauses; what you see here is the toll we pay for that. What happens is that the allocation rate differs quite a bit between the implementations, and there is a correlation between a higher allocation rate and higher used memory. This is the allocation rate per second. Oh, this one is wrong. So this is the allocation rate per second, and this is the allocation rate normalized per operation, in bytes per operation.

So let's wrap it up. When garbage collection is happening, it's good to keep an eye on your memory usage. There are various metrics you can record, with varying degrees of accuracy and meaning. And when garbage collection is happening and you don't have a real microbenchmark, then maybe you need to run your benchmarks
longer; and the more heap you have, the longer you need to run your benchmark iterations to give the garbage collector time to settle in. You can actually use JMH to construct benchmarks that evaluate your memory usage. Be aware of the different garbage collector implementations and their behaviors, or use this to explore the different garbage collector behaviors. The plan and idea is to get these metrics included in the JMH code base. The code I wrote is available on GitHub in the cache2k benchmark project; those are the two classes. And if you'd like a fast cache, take a look at cache2k. Thanks a lot, enjoy life. Any questions?

[Audience question] Can you repeat the question, please? You mean whether there is an integration into continuous integration? I'm actually doing this, but not in CI, because it's a little bit tricky: you need to have the hardware exclusively if you want accurate results, and I don't even trust a Jenkins process or anything else running on the machine or generating network traffic. So it's a little tricky to do it right, but it's a good idea of course.