Good afternoon everyone, welcome to FOSSASIA. This is Howie; he used to be at Dropbox, now he works on Fluent, and he is going to tell us how to beat your JVM into submission.

Okay, hello. So this talk is going to be about taming the Java virtual machine. About myself: previously I was at Dropbox engineering in San Francisco, and currently I'm working on a startup, Fluent Technologies. We build Fluent, the world's fastest online code browser, so if any of you are working on a project on GitHub and are sick of clicking through the GitHub UI because of how slow it is, you should check out Fluentcode.com and see if it's of interest to you. I've done a lot of work in various programming languages; currently Fluent runs on the JVM, like many other systems do, and so this talk is going to be about how to understand what's going on inside the Java virtual machine.

So when most people learn how to program in Java — who has used Java before? So most of you have used Java to some extent. You start off writing some code like this: it runs, you define some variables, some objects; in the end they look kind of the same and you can do things to them. But this code, although it does what you'd expect, leaves open a lot of questions. For example, there's the memory usage of the things you're defining: nothing is free, and each of these things costs you something. What happens if you run out? There's a garbage collector, but what does that mean for you concretely? And here we System.out.println(s + 123), but what's actually happening there is not just trivially adding a string and an integer; there's a bunch of code underneath that makes it all work. And where does JIT compilation come in? A lot of this is traditionally brushed off when you're learning Java as "implementation defined", but in practice, everyone here is probably using the same implementation.
Most people use Oracle or OpenJDK, and not that many people use alternative JVMs like IBM J9. So although it's implementation defined, if you're working with the implementation, you want to understand what your JVM is actually doing: what it's doing with regard to memory use, what it's doing with regard to JIT compilation, what it's doing when it actually runs that System.out.println. So in this talk we'll cover a few things. One is the memory usage of things within the JVM. One is garbage collection. And the last is how JIT compilation really works.

To start off: memory layout. Having garbage-collected memory is great when you have enough of it, in that you can just run your code, write whatever you want, and the stuff you don't need gets cleaned up in the background automatically. And it's terrible when you don't have enough of it, because you write your code and things blow up: it says OutOfMemoryError, blah, blah, blah, program cannot run, and you're not sure why it's giving you out of memory. It's not so easy to say, "oh, I allocated this much memory here, that much memory there, and I didn't free it", because when you're writing Java you often do not keep track of where your memory is being allocated, or how much. Again, although technically implementation defined, everyone's using the same implementation, so we can dig into that.

So over here — let me make this bigger so people can see it — I have a small demo, a tiny Java program that you can use to see how much memory things use. The key is Runtime.getRuntime().totalMemory() and Runtime.getRuntime().freeMemory().
These let you see how big the total memory of your entire JVM is and how much of it is free, and therefore, by subtracting them, you can find how much of it is used. And by taking that difference each time we call this method, we can see the change in memory usage after we perform some operation. So for example, over here I'm defining an array of int arrays with length count — I'll set count to 10 million for now, just so it's big and the effect is easy to see — and I'm going to fill it with a bunch of length-two integer arrays and then see how much memory is being used. So if I go back to my shell, I'll run this from the command line so I can easily play with the flags. I compile it, I run it, and in the end it prints out 28. And what are the units of 28? 28 means that the increase in memory divided by the total number of items is 28 bytes: there are count items, I'm dividing by count, so each of these size-two int arrays takes up 28 bytes of memory. And it's interesting to see how that changes as we change the number. For example, if we make them size-zero int arrays, each takes up 20 bytes; size-one int arrays take 28 bytes; and size three takes 36 bytes. So something interesting is going on here, because ints, as we know, should be four bytes large — negative 2 billion to positive 2 billion — but we can see that the size of the array is actually rounded to the nearest 8 bytes, which on a 64-bit computer like this Mac is a machine word. We can repeat this operation for many other things: long arrays, double arrays, arrays of objects of various sizes. And what you'd find is that the primitives more or less behave as you'd expect.
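The measurement loop just described can be sketched roughly as follows. This is my own reconstruction of the demo, not the talk's actual source; the class and method names are made up, and the numbers it prints are only approximate, since the JVM may garbage collect or grow the heap mid-measurement.

```java
// Sketch of the demo: estimate the per-item heap cost of small int arrays
// by diffing JVM memory use before and after allocating many of them.
// Names are my own; results are approximate by nature.
public class MemorySizer {
    // Bytes currently in use: total heap minus the free portion.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Estimated bytes per element for int arrays of the given length.
    static long bytesPerItem(int count, int arrayLength) {
        int[][] items = new int[count][];
        System.gc();                      // try to start from a stable baseline
        long before = usedMemory();
        for (int i = 0; i < count; i++) {
            items[i] = new int[arrayLength];
        }
        long after = usedMemory();
        // Touch `items` so the allocations stay reachable until here.
        if (items[count - 1] == null) throw new AssertionError();
        return (after - before) / count;
    }

    public static void main(String[] args) {
        // On the 64-bit JVM in the talk, length-2 arrays came out to 28 bytes.
        System.out.println(bytesPerItem(10_000_000, 2));
    }
}
```

Running it with different array lengths reproduces the rounding behavior described above: the per-item cost steps up in 8-byte increments rather than 4.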
Booleans are, surprisingly, one byte large. Maybe not surprisingly, but not obviously: they're not a single bit. So if you want to store many booleans compactly, you need to explicitly use a BitSet rather than just an array of booleans. Bytes are one byte, shorts are two, ints and floats are four, longs and doubles are eight — that's to be expected. If you repeat this with Java's boxed primitive types, it gets a bit more interesting. Booleans and bytes — java.lang.Boolean and java.lang.Byte — are both four bytes large. That's because each of those is a wrapped object, and by default object pointers are four bytes large, even on a 64-bit JVM on a 64-bit operating system. For the rest of them, you get four bytes plus the data that's actually being used to store the value. So for example, for shorts it's four bytes for the pointer to the java.lang.Short object plus 16 bytes for the actual object, because the object has an object header — which holds the ID of the object's class — and then the data of the short value, rounded up to the nearest eight bytes. So it turns out that java.lang.Short and java.lang.Integer, even though shorts are smaller than ints, both take up the same number of bytes once they're boxed, and longs and doubles correspondingly take up more than that. And if you start looking at arbitrary objects, you'll see that arrays take up four bytes for the pointer to the array, followed by a 16-byte header, and then whatever data the array actually holds; and each object is a four-byte pointer plus a 12-byte header plus whatever data it holds, rounded to the nearest eight.
This may seem a bit obscure, but it actually matters, because if you compare, for example, an array of ints to a collection of java.lang.Integer, you're looking at about a five-times increase in memory usage if you use, say, a java.util.List or java.util.Vector instead of a primitive array. And a five-times difference in memory usage is the difference between a $10-a-month Amazon EC2 instance and a $50-a-month one. So concretely, there are many things you can do to reduce memory usage. In any application there are high-level changes: put things on disk, put things in a database, put things in memcache, whatever. But even within your code, very simple things help, like using arrays of primitives instead of boxed java.util collections. And if you need something a bit fancier than an array — for example, a map with tons of primitives inside, or a hash set that only takes ints — there are several libraries that will give you this. One that I'm using right now is Koloboke collections, which gives you hash sets, hash maps, and all sorts of other things that are so-called unboxed, so they do not pay the extra 16 bytes of object per integer. One question from the audience: a lot of people would like to store data off-heap as well, not just unboxed — are these collection types off-heap or on-heap? These are unboxed but on-heap; you need to find a different library if you want off-heap collections.
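The five-times figure can be checked with back-of-the-envelope arithmetic using the sizing rules just laid out. This is a sketch with the talk's numbers for a 64-bit JVM with compressed (4-byte) object pointers; exact values vary by JVM and settings.

```java
// Estimate the heap cost of n ints as a primitive int[] versus a list of
// boxed java.lang.Integer, using the talk's sizing rules: 16-byte array
// header, 12-byte object header, 4-byte references, 8-byte alignment.
public class FootprintEstimate {
    static long roundUp8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    static long primitiveArrayBytes(long n) {
        // 16-byte array header + 4 bytes per int, aligned to 8 bytes.
        return roundUp8(16 + 4 * n);
    }

    static long boxedListBytes(long n) {
        // A backing Object[] of 4-byte references, plus one 16-byte
        // Integer object (12-byte header + 4-byte value) per element.
        long backingArray = roundUp8(16 + 4 * n);
        long boxes = n * 16;
        return backingArray + boxes;
    }

    public static void main(String[] args) {
        long n = 10_000_000;
        System.out.println("int[]         : " + primitiveArrayBytes(n) + " bytes");
        System.out.println("List<Integer> : " + boxedListBytes(n) + " bytes");
        // The ratio works out to about 5x, matching the talk's claim.
        System.out.println("ratio         : "
                + (double) boxedListBytes(n) / primitiveArrayBytes(n));
    }
}
```

For 10 million ints that is roughly 40 MB versus roughly 200 MB, which is where the EC2 pricing comparison comes from.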
Even just for on-heap collections, if you look at the ratio of sizes, that's still about a four-to-five-times difference in memory, which can save a lot of money if this is EC2 instances, or save performance if you're finding yourself paging to disk, or even save time, because if your data takes up that much less memory, you don't need to bother with memcache or a database — you just hold it all in one big data structure in your process and it doesn't take up too much space. In this case, the top one is a java.util.HashMap and the bottom one is the Koloboke specialized unboxed hash map, which can only map ints to ints in this case, but takes up a lot less space.

After memory usage, the next thing that people often ignore is garbage collection. Garbage collection is easy to take for granted. Everyone says, oh, Java's a garbage-collected language, don't worry about where your objects are stored, the garbage collector will take care of it — in theory it's invisible. But in practice, it can make your application behave very weirdly, because it starts running at inopportune times and makes your users wait while it's collecting your garbage. There are many garbage collectors you can pick in Java, which is something many people don't know and don't learn early on, but it actually makes a huge difference. For example, the default garbage collector is the parallel garbage collector. It is supposedly the fastest garbage collector, because it stops the world, collects everything in parallel, and lets everything start running again — but as a result, it has to stop the whole world relatively frequently, and that can be a problem: if someone clicks a button and suddenly your garbage collector starts running, that person will be waiting for it to finish garbage collecting before they get a response.
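To make "unboxed" concrete, here is a toy open-addressing int-to-int hash map backed by two plain int arrays, so no Integer boxes are ever allocated. This is a teaching sketch of the idea, not the real Koloboke implementation (which handles resizing, removal, and a reserved "free" key properly); here Integer.MIN_VALUE is used as an empty-slot sentinel and cannot be stored as a key, and there is no resizing, so you must stay under capacity.

```java
// Toy unboxed int -> int map with linear probing. Illustrative only:
// fixed capacity, no removal, Integer.MIN_VALUE reserved as "empty".
public class IntIntMap {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel key
    private final int[] keys;
    private final int[] values;

    public IntIntMap(int capacity) {
        keys = new int[capacity];
        values = new int[capacity];
        java.util.Arrays.fill(keys, EMPTY);
    }

    // Linear probing: walk forward until we find the key or a free slot.
    private int slot(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    public void put(int key, int value) {
        int i = slot(key);
        keys[i] = key;
        values[i] = value;
    }

    // Returns the mapped value, or def if the key is absent.
    public int getOrDefault(int key, int def) {
        int i = slot(key);
        return keys[i] == key ? values[i] : def;
    }
}
```

The point is the storage layout: n entries cost two int arrays, about 8 bytes per entry, instead of a HashMap's Entry objects plus boxed keys and values.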
A comment from the audience: if you check on a multi-core system, it's quite usual that any stop-the-world parallel GC will not scale beyond a certain number of cores, so "parallel" is a bit misleading. For two or four cores it's probably fine; if you have 32 or 64 cores, it's a totally different story — with a lot of cores you will see that at some point only one or two threads are used instead of all cores. Yeah, and it still stops everything else.

So what's interesting about this stop-the-world garbage collector is that, at least as a simplification, it is a copying garbage collector. It basically lets your program fill up the memory space with garbage, and once it's full, it takes the parts of memory that you actually care about, copies only those parts to a new memory space, and just ignores all the garbage. This has a few interesting characteristics. How long it takes to copy the live stuff over and resume running using the new memory space does not depend on how much garbage there is, because it only copies over the stuff that is currently being actively used — the live set of data within your program, the stuff that hasn't turned to garbage yet. The other interesting characteristic is that how long this garbage collection takes does depend on how big your live data is. So if you have a huge amount of garbage and very little live data, this process is really fast, and if you have not much garbage but a lot of live data that you're keeping in memory, this process becomes very slow, and you can see that here.
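The copying strategy just described can be modeled in a few lines. This is a deliberately simplified illustration, not how a real collector works (real collectors operate on raw memory and trace pointers from GC roots); the point it demonstrates is that the work done is proportional to the live set, while garbage is simply abandoned for free.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a copying collector: the "heap" is a list of objects, and a
// collection copies only the live ones into a fresh to-space. Cost scales
// with live data, not with the amount of garbage left behind.
public class CopyingGcModel {
    static final class Obj {
        final boolean live;              // stand-in for "reachable from a root"
        Obj(boolean live) { this.live = live; }
    }

    static List<Obj> collect(List<Obj> fromSpace) {
        List<Obj> toSpace = new ArrayList<>();
        for (Obj o : fromSpace) {
            if (o.live) toSpace.add(o);  // work done scales with live objects
        }
        return toSpace;                  // the garbage is never touched
    }

    public static void main(String[] args) {
        List<Obj> heap = new ArrayList<>();
        for (int i = 0; i < 1000; i++) heap.add(new Obj(i % 10 == 0));
        System.out.println(collect(heap).size()); // only the 100 live survive
    }
}
```

Whether the from-space held 1,000 objects or 1,000,000, the copy loop only pays for the live ones — which is exactly why lots of garbage is cheap and a big live set is expensive.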
There are many small garbage collections, and these typically don't take very long, but then there's the real stop-the-world garbage collection: the point where all of this was garbage, and the valid set of data — the live set, which is not garbage and can still be used by your program — was copied over to a new memory space (not shown here), and all the garbage disappears.

We can actually see this ourselves. Over here I have a small garbage collection benchmark. It's not a very good garbage collection benchmark, but it is a garbage collection benchmark. What it does is this: we have a base array of random object arrays that we use as the live set of data. We fill up this base array with a bunch of object arrays of varying sizes; exactly what we fill it with is not that important, at least as a first approximation, as long as we fill it. Then over here I have a while loop that runs forever. I need to do it this way so the compiler doesn't complain that the code below is unreachable, and I need to put the code below so the compiler doesn't remove everything as dead code. Inside this infinite while loop I just create a bunch of garbage: every iteration of the outer loop, I loop garbageRate times and stuff a bunch of newly created, completely empty object arrays into the base array. In doing so, whatever was previously in those slots of the base array becomes garbage, and I just keep putting in new stuff, and new stuff, and new stuff, so the old stuff has to get garbage collected — because that's the job of the garbage collector. We can tweak the maximum object size, and we can pass in the working set size and the garbage rate from the command line. Let's use the default garbage collector.
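A bounded version of the benchmark just described might look like the following. This is my own reconstruction, not the talk's exact source: it keeps a fixed live set of object arrays, overwrites random slots with fresh arrays to generate garbage, and times each batch, so slow batches are the ones that happened to hit a GC pause. (It runs a fixed number of batches instead of looping forever, so it can finish on its own.)

```java
import java.util.Random;

// Bounded GC benchmark sketch: a live set of object arrays, overwritten at
// a configurable rate to generate garbage, with per-batch timings that
// expose GC pauses as outliers.
public class GcBench {
    public static long[] run(int workingSet, int garbagePerBatch, int batches) {
        Random rnd = new Random(42);
        Object[][] base = new Object[workingSet][];
        for (int i = 0; i < workingSet; i++) {
            base[i] = new Object[rnd.nextInt(200)];       // the live set
        }
        long[] batchMillis = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int g = 0; g < garbagePerBatch; g++) {
                // Overwriting a slot turns the old array into garbage.
                base[rnd.nextInt(workingSet)] = new Object[rnd.nextInt(200)];
            }
            batchMillis[b] = (System.nanoTime() - start) / 1_000_000;
        }
        return batchMillis;
    }

    public static void main(String[] args) {
        for (long ms : run(500_000, 10_000, 20)) System.out.println(ms + " ms");
    }
}
```

Running it under different -Xmx values and -XX GC flags, as the talk goes on to do, is what makes the pause-time behavior of each collector visible.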
So I javac GC.java, then java with the max heap set to 1 gigabyte, 500,000 items in the working set, and 200 items of garbage per millisecond. We can see that some of the iterations are going to be really slow — those hit a big garbage collection — and the other iterations are moderately fast, either because there's no garbage collection or only a small one. And if you don't want to look at numbers, you can hook up a profiler. For example, I like using JProfiler, but you can use JVisualVM or YourKit or some other monitoring tool. You attach it to your JVM and you can see the memory going up and down, which is quite nice: small collection, small collection, then it collects everything — and that's a step that takes 500 milliseconds — then it crawls up again, collects everything. Another interesting property of the default garbage collector is that it will use all the memory you give it, no matter how much it actually needs. In this case the working set is about 200 megabytes, but I gave it 1 gigabyte of heap when I passed in -Xmx1g on the command line, so it's going to use 1 gigabyte — and the reason is that the more heap you give it, the less frequently it has to perform these garbage collections.
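The trade-off just stated — same pause length, different pause frequency — can be put into simple arithmetic. This is a rough model with made-up but representative numbers, not a measurement.

```java
// Rough model of the parallel collector's pause frequency: the time between
// collections is the free heap (heap minus live set) divided by the rate at
// which garbage is created. Pause *length* is unaffected by heap size.
public class GcFrequencyModel {
    static double secondsBetweenGcs(double heapMb, double liveSetMb,
                                    double garbageMbPerSec) {
        return (heapMb - liveSetMb) / garbageMbPerSec;
    }

    public static void main(String[] args) {
        double liveSet = 200, garbageRate = 100;   // MB and MB/s (illustrative)
        // Halving the heap roughly halves the time between pauses, while
        // each pause itself stays the same length.
        System.out.println(secondsBetweenGcs(1024, liveSet, garbageRate));
        System.out.println(secondsBetweenGcs(512, liveSet, garbageRate));
    }
}
```

With these numbers, a 1 GB heap pauses roughly every 8 seconds and a 512 MB heap roughly every 3 seconds — but in both cases the pause itself is the same, which is the behavior demonstrated next.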
As I mentioned before, how long this step takes does not depend on how much garbage there is, but how frequently it occurs depends on how big the heap is. If the heap were half as big, it would have to perform this garbage collection at this point, and that point, and that point; each time it would still take the same amount of time, but it would happen much more frequently. Similarly, if the heap were much larger, it could go a lot further before performing a garbage collection, but each one would still take the same amount of time. So to optimize runtime, the Java virtual machine just waits until you fill up all the memory you gave it, performs a garbage collection, and does that over and over. We can see that here: it's taking about 400 milliseconds per garbage collection once in a while. If I kill this and give it 500 megabytes of heap instead — let's let that run, and zoom in a bit — now it's still taking more or less 400 milliseconds every time. And if I attach a profiler to see what's happening, you can see it approaching the limit and garbage collecting each time it approaches the limit, but it still has the same 400-millisecond pause time. Conversely, if I give it the full 1 gigabyte of heap but reduce the working set from 500,000 items to 300,000 items, we should see the pause times go down. With the profiler attached, you can see it still needs to perform a big garbage collection once in a while, but a big garbage collection now takes about 250 to 300 milliseconds. And conversely, if I increase the size of the working set, the pause times go up. So this is kind of unintuitive behavior: most people imagine that if you create less garbage, your garbage collector will pause for shorter amounts of time, which is actually not accurate. If you create less garbage,
your garbage collector will pause less often, but each pause will still be just as long. So if you have a really big heap, even if you create small amounts of garbage, maybe it'll only pause once every two or three minutes — but after two or three minutes it's going to have a half-second or one-second pause, and your users may notice. Here I gave it a 700,000-item working set and now it's pausing for 500, almost 600 milliseconds each time; you can see it scales more or less linearly. If we go back to the slides — I've plotted this out earlier, so you can see it without having to wait so long — on the left side you have the number of items in the working set; each item is an array from zero to 200 entries long, full of nulls, so there isn't actually much complexity in the heap, just a bunch of arrays sitting there doing nothing. And you have the garbage rate: how much garbage I'm creating every iteration of the loop. You'll see that it doesn't really matter how much garbage you create — your max pause time depends only on how big your live set is. It is the live set that determines how long the garbage collector pauses: it's how much data you're actually using that matters, not how much garbage you're creating. So for the parallel garbage collector, remember: pause time is proportional to the working set — the live data structures in your program — not to your garbage. On my computer, and I think on the VMs I measured, it's typically about one millisecond per megabyte of working set, plus or minus a factor of two or three. So if you have one gigabyte of live memory in your heap, you are probably going to be waiting one second on each garbage collection — and one second is quite noticeable. If you're using IntelliJ with a large project and the beach ball starts spinning, that
is when it's garbage collecting its large IDE heap. An audience comment: that depends on the heap objects — things that do not contain pointers don't have to be searched for pointers, so they're not a problem. Yes, but you do need a gigabyte of live context — IntelliJ uses a gigabyte sometimes, and it's all live, and Eclipse does too.

Next: frequency is proportional to garbage load and inversely proportional to total memory. So more memory will not make your garbage collections shorter, and creating less garbage will not make them shorter either; both just make them happen less often, and when they happen, they'll be the same length. So if the length is a problem, more memory or less garbage will not help. And it will use up as much heap as it can, which is somewhat unusual: normally you imagine a program allocating memory dynamically but only allocating what it needs. In the case of the parallel garbage collector, because of these two properties, the more memory you give it, the more memory it uses and the faster it runs, since it runs garbage collection less often. If you give it one gigabyte, it will use one gigabyte, even if it only needs 20 or 50 megabytes of heap — it'll use the whole thing just to be faster. If you don't say how much you're giving it, by default it uses one quarter of the memory on your system; so if you don't pass in -Xmx, on my computer even a tiny program will by default use four gigabytes of RAM, just to optimize garbage collection. So Java has a reputation for having a slow garbage collector, and while in many ways it's fast, it results in long pauses, at least by default — so the reputation is not entirely unjustified. If you just run Java programs without any tweaking, you will find yourself with relatively large garbage collections
that do take up noticeable amounts of time. But Java has multiple garbage collectors, and swapping in the others is actually very easy and worth trying if these long pauses are problematic. Okay, this slide illustrates that less garbage doesn't affect pause times: you can create more and more garbage — the graph goes from this, to this, to this — but in the end each garbage collection still takes the same amount of time; it just happens more often.

The next garbage collector we'll touch on is the concurrent mark-and-sweep garbage collector. Its memory profile looks very different. The first thing you may notice — you can't really see it here — is that this one only goes up to 500 megabytes and hangs around 300 megabytes. The concurrent mark-and-sweep garbage collector works differently from the parallel one: rather than taking the live objects, copying them to a new space, and just ignoring the rest, it goes and — as a first approximation, simplified — marks all the objects within your heap as live or not live, collects the dead ones, and allows you to allocate new objects in the space the dead ones previously occupied. One property of this is that how long garbage collection takes can now depend on how much garbage you have in your heap, not just on your live set, and waiting until you have more garbage does not speed that process up. For the parallel garbage collector, more garbage takes the same amount of time to collect, so it's fine to wait and fill up as much as possible; for concurrent mark-and-sweep, more garbage takes longer to collect, so it just collects as it goes along, because there's nothing to gain by waiting. So this one does not take up as much heap as possible. It keeps a bit of a buffer, but if you give it one gigabyte and it only needs a few hundred megabytes, it will stick with a few hundred megabytes until the
needs grow. In terms of garbage collection pause times, you can see that for a lot of the smaller live sets the pauses are much shorter, in that concurrent mark-and-sweep is concurrent: it doesn't wait until the garbage builds up to collect it all, so as long as you're not generating more garbage than it can collect in the background, it generally will not pause for very long. It stays on the order of tens of milliseconds per pause, which is actually reasonable: if you're making a game, you'll notice the frame rate suffering a bit, but the whole thing won't freeze for half a second while it collects its trash. The other thing to notice is that once you have more garbage collection pressure — a bigger heap, a bigger live set — the amount of garbage you create actually does matter. If you look at the case with 800,000 items in the live set, it matters whether we're creating a small, a medium, or a large amount of garbage, because the concurrent mark-and-sweep garbage collector collects concurrently, as the name suggests: if you're creating garbage slowly enough that it can collect it on its background thread, that's fine, it doesn't take up much time; but if you're creating more garbage than it can collect on its background thread, at some point you're going to run out of memory, because it just can't collect fast enough — and that's the case where it stops the world, stops you from running and creating more garbage, and collects everything back down to the baseline. That's why you see over here that when you create more garbage, it starts taking more time. And we can see this when we run our benchmark. If we go to the code, to run with the mark-and-sweep garbage collector there's a flag, let's see:
-XX:+UseConcMarkSweepGC. So let's go back to the original 500,000 items, attach JProfiler to it, and see what it looks like. You can see that the memory is relatively stable at a few hundred megabytes — it keeps a buffer, more than it needs, but doesn't grow unbounded like the parallel one does — and the garbage collections are all relatively small; if you look at the timings, they're on the order of tens of milliseconds, relatively fast. And if we increase the garbage collection load to the case with 800,000 items in the live set and 6,400 items of garbage per iteration, and connect to that, we can see that the garbage collections have started taking longer than they did previously, and it's also started taking up more memory, after remaining basically flat the whole earlier time. That's because it's now actually having to stop the main process in order to garbage collect, and once you get into large heap sizes, it has to stop the main process very, very often and everything takes forever — and that's unavoidable. So the concurrent mark-and-sweep garbage collector is not the default, but it's an alternative to the parallel GC that you can turn on with a simple flag. A question from the audience: is throughput better with the parallel GC? Yes — if you have a huge batch job, you don't care about pauses, you just want to finish as fast as possible, then the parallel GC, the default garbage collector, runs faster. But the concurrent mark-and-sweep pause times are much lower, so if you're running something that you don't want pausing for half a second or a second on a big heap, it's actually quite useful. And it does not grow the heap unnecessarily, which is useful if you're running some kind of command-line application in Java and you don't know how much memory it will take, because it depends on what the user does with
it — like whether it's a small file or a huge file that determines how much memory it will take. If you use the parallel GC, you'll always take up all the memory, whereas with concurrent mark-and-sweep it'll use a bit more memory than it needs, but a proportional amount, and the maximum pause times are generally relatively low.

The last garbage collector we'll look at is G1, the garbage-first garbage collector. This is a relatively new GC — relatively new meaning less than 20 years old — and it's probably going to be the default in the next version of Java, Java 9. It's basically somewhat of a hybrid between the concurrent mark-and-sweep and the parallel garbage collector, and in terms of performance characteristics it is kind of similar to both: it performs faster than concurrent mark-and-sweep for small working sets, and performs about the same as the parallel one for large heaps. There's not that much to say about it, except that it's generally a pretty nice garbage collector — it has, more or less, the nice parts of both the concurrent mark-and-sweep and the parallel. For example, for a relatively small program, having it pause for 30 milliseconds to garbage collect is much more reasonable than having it pause for 300 milliseconds, as in the case of the parallel garbage collector. So just to compare: the parallel one is really fast if your working set is really small, even if you're creating lots of garbage; the concurrent mark-and-sweep takes about the same amount of time whether your working set is small or large; and the G1 is really fast when the working set is small, and also doesn't scale up as steeply as your working set gets larger. So while the parallel GC may take a third of a second to pause your
program to garbage collect, in this case the new G1 GC takes 40 milliseconds, which is much less.

Okay, so that's the high level of what these garbage collectors are like. Another interesting point of note is that most garbage collectors nowadays are generational. The generational hypothesis, as they call it, says that most objects in your program are either very long-lived or very short-lived. For example, the things you allocate within a method tend to get thrown away before the method returns, whereas the things you allocate in big global data structures tend to live forever, and there's no point repeatedly checking whether they need to be collected. Whether or not that's strictly true, people think it's true, and garbage collectors optimize for those cases to make them really fast. So if your code has only really long-lived or really short-lived objects, and no medium-lived objects, the garbage collector gets much happier. For example, let me go back to the garbage collection benchmark we had just now and run it with the default garbage collector — no -XX flags — but instead of writing my new objects all over the array of live objects, I'm going to write them only into the first 200 elements of that array. So if I have 500,000 elements in my set of random objects, the first 200 will be overwritten over and over and get thrown away quickly, while the remaining almost 500,000 will just sit there and live forever. This is a profile that matches the so-called generational hypothesis, where things live either really short or really long. And if we run this code with the same 500,000 and 200 that we originally used, and attach a profiler so you can see what's happening, you can see that the garbage collections now take a really short time. This is the same number of objects in the live set, the same number of objects created per iteration of our loop, so still
Still 200 objects per iteration and 500,000 objects in our live set, but because our workload is generational, and most of the objects in the live set live forever while the few that get thrown away get thrown away really quickly, the garbage collector is much, much happier, and in this case you can see it takes almost no time, despite the same amount of garbage and the same live set. (Trying to connect... the Wi-Fi is down. Okay, whatever.) So concretely, if you want to learn more about this: the garbage collector, at least the default one, has multiple spaces, and it collects the first one the most, the second one less, and the third one hardly at all; that is also the order of increasing time these collections take. So if you allocate an object and it gets collected in the next garbage collection, then it's gone; if it survives, it gets moved from the Eden space into a survivor space; and if it survives again, it gets moved into tenured. If all your garbage dies in Eden, your garbage collections get much shorter and much faster. You can read more if you want; there's a lot of material about this online. And I've already shown the demo: a generational workload and a non-generational workload have really different memory profiles. Okay, so, takeaways: the default garbage collector will take up all the memory you give it, and it will cause pauses if you have a large working set, no matter how much memory you give it or how little garbage you create. But if you want to, you can swap out the garbage collector and generally get much shorter pauses and much better not-taking-up-all-my-memory behavior. So now we're done with garbage collection; the next thing I'll cover is compilation. We probably all know that Java compiles to bytecode and that the JVM will just-in-time compile it to assembly at runtime, but what does that actually mean?
So if you look at a method like this, for example, from our garbage collection demo earlier: this is an init method, static void, which calls a few methods, does a subtraction, and assigns the result. It gets compiled into bytecode that looks like this: you invoke the static method getRuntime, you invoke totalMemory, you invoke getRuntime again, you invoke freeMemory. Java bytecode runs on a stack, so after you invoke totalMemory, its result is just the last item on the stack, and then the freeMemory result is put on top of it; when you call sub, it subtracts the top two items on the stack and puts the result in their place. That's why you don't see any variable assignments in the bytecode: these instructions just put values on the stack, this one subtracts them, and this one takes the result off and puts it in a field on the heap, and then it returns from the function. We can see this ourselves: if I kill my garbage collection process and run javap, so javap -c GC. javap prints out metadata about Java class files; well, let's run it without -c first. It says: I have this class GC with these methods. And if I pass -c, you get all the bytecode of what is being executed when you run this class. So for example, if I javap -c my Memory class, you can see that over here is the init method with the compiled bytecode inside. So why is this useful? Why should you care about bytecode when you are writing Java? It's useful because you can see what is, so to speak, actually being run when you run your code. For example, let's say input is an integer (for simplicity I didn't put in the type declaration): these two snippets of code in theory do the same thing. And in my example here, I have a random integer; one copy is concatenated, one is passed to valueOf, and I print out both of them.
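The init method walked through above, reconstructed as a minimal sketch (the class and field names are guesses at the demo code, not the speaker's exact source):

```java
public class Gc {
    static long used; // written by the putstatic at the end of init()

    // Two getRuntime() calls, totalMemory(), freeMemory(), one lsub and
    // one putstatic: exactly the stack traffic described in the
    // javap -c walkthrough, with no local-variable slots involved.
    static void init() {
        used = Runtime.getRuntime().totalMemory()
             - Runtime.getRuntime().freeMemory();
    }
}
```

Compiling this and running `javap -c Gc` shows the invokestatic/invokevirtual pairs followed by the subtraction and the field store.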
If I javac Stringify.java and java Stringify, you'll see that they both print out the same thing, the stringified version of the integer. Not that interesting, I suppose. But if I javap -c it, so I can see the bytecode, you can see that they actually behave pretty differently. In the bottom case it's a single invokestatic method call: it calls java.lang.String.valueOf, which takes the integer, returns a string, and that's it. The "" + input case gets compiled into this whole thing: it creates a StringBuilder, calls init on the StringBuilder, calls append to append the empty quotes to the StringBuilder, then calls append(int) to append the integer, and then calls toString to give you your string. So this is why it's sometimes useful to understand what the bytecode is like: although these two snippets of code look about the same and behave about the same in most cases, maybe sometimes you're wondering why so much garbage is being created when you do this, or why the performance of this is slower than that, and when you look into the bytecode you'll see that they actually have pretty different implementations under the covers. Similarly, look at switches. An int switch looks like this in the bytecode: there's a lookupswitch instruction that says, if it's zero, jump to 40; if it's one, jump to 48; otherwise jump to 53. So it comes here and jumps either here, here, or here, and that's very fast. A string switch, on the other hand, looks very different from an integer switch, even though in the code it looks about the same and semantically behaves the same. In a string switch you can see whether the string is "zero" or the string is "one", but unlike the integer switch, there's no simple jump-on-string bytecode.
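The two stringify variants compared above, as a self-contained sketch. Note this reflects the javac behavior the talk describes (JDK 8 era); newer javac versions compile string concatenation with invokedynamic rather than an explicit StringBuilder, but the point that the two forms produce different bytecode still holds.

```java
public class Stringify {
    // On the toolchain described in the talk, this compiles to
    // new StringBuilder / append("") / append(input) / toString().
    static String viaConcat(int input) {
        return "" + input;
    }

    // This compiles to a single invokestatic of String.valueOf(int).
    static String viaValueOf(int input) {
        return String.valueOf(input);
    }

    public static void main(String[] args) {
        int input = new java.util.Random().nextInt();
        System.out.println(viaConcat(input));  // same printed output...
        System.out.println(viaValueOf(input)); // ...different bytecode
    }
}
```

Compile with `javac Stringify.java` and compare the two methods in `javap -c Stringify`.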
There's no simple jump-on-string bytecode, so what happens is this (let's go to the next slide where it's bigger): first it calls hashCode on the string, then it does a switch on the hash code of the string, so it goes to 104, which is here, or 119, which is down here. After it jumps there, it loads the string and calls .equals on the two strings, and if they're equal it goes and does something; similarly in the other case, if they're equal it goes and does something. So this is something you can't really see in the code, but it becomes obvious when you look at the bytecode: string switches are basically just if-else statements with a hash code check in front, and they have the same property as if-else statements in that they take linear time depending on how long they are, instead of constant time to jump to the correct offset. Okay, so why read bytecode? It helps you understand what your Java code compiles to, if you care about memory usage or performance characteristics, if you really need to speed something up and you're not sure how. It helps if you are debugging frameworks that muck with bytecode: if you're using AspectJ or Javassist or one of the many other frameworks that add magic things to your bytecode to make your methods asynchronous or turn your bean into a database accessor object, then knowing what the bytecode looks like helps you when things go wrong. And if you're working with non-Java JVM languages, Scala, Clojure, Groovy, JRuby, whatever: the lowest common denominator when all of these interop with Java is the Java bytecode. None of these languages generate Java source code; they all generate Java bytecode, and that's how they actually talk to your program. Okay, the last thing I'm going to touch on is looking at the assembly. We've talked about compilation from Java to bytecode; there's also the JIT compilation from bytecode to assembly, which you often don't look at, but you can, and it's not as much of a black box as people tend to think.
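What the compiler does to a string switch, hand-desugared as a sketch. This collapses javac's actual two-stage lowering into one step, but it shows the same hashCode-then-equals shape described above:

```java
public class SwitchDemo {
    // A plain string switch, as you would write it.
    static int classify(String s) {
        switch (s) {
            case "zero": return 0;
            case "one":  return 1;
            default:     return -1;
        }
    }

    // Roughly what it compiles to: branch on the hash code first, then
    // an equals() check to guard against hash collisions before acting.
    static int classifyDesugared(String s) {
        int h = s.hashCode();
        if (h == "zero".hashCode() && s.equals("zero")) return 0;
        if (h == "one".hashCode() && s.equals("one")) return 1;
        return -1;
    }
}
```

Both methods agree on every input; the second one just makes the hidden hashCode/equals work visible.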
You often don't look at it, but you can, and it's not as much of a black box as people tend to think. This takes a bit of setup; at least on OS X you need to install a DLL, or shared object file as the case may be, for it to work. But once you've done so, you can see things like this: this Java source code ends up corresponding to these three assembly instructions, a move, a decrement, and a jump-not-equal that goes back to here if it's not done, looping quickly in that order. So when I said bytecode is what's actually being run, that's not quite true; it's what's being run one level down, but below the bytecode is what's actually, actually being run, which is the assembly. That's the thing that's fed into your Intel or AMD CPU and then gets executed. So let's take a look at this demo. Here is a small class that does total = 1, then keeps multiplying total by the current loop variable, and prints out total at the end. So: javac Jit.java and java Jit. This prints out zero... because i starts from zero, so total gets multiplied by zero. Let me see what's happening here... i equals one, try again, one second... total = total times i, a hundred iterations. Okay, so it works; the product just overflows and ends up being zero at some point. So we run this, and maybe this is what we expected, but if we want to look at the assembly, we can. Let me pull up the assembly instructions... no internet, okay. Yep, I have noticed. Let's see what I have in my command-line history... come on... okay, I guess I have to pull it up on my phone. What, five, four minutes left? Okay. PrintAssembly... if not, I guess I'll just skip this. Okay, so I probably won't have time to show this demo in detail.
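The demo class just described, with the overflow behavior made explicit (my reconstruction of the speaker's Jit class): int multiplication wraps modulo 2^32, and the running product accumulates enough factors of two that it becomes exactly zero, which is why the program prints 0 rather than a garbage value.

```java
public class Jit {
    // Multiplies total by every loop index from 1 to n using wrapping
    // 32-bit int arithmetic, i.e. the true product modulo 2^32.
    static int product(int n) {
        int total = 1;
        for (int i = 1; i <= n; i++) {
            total *= i;
        }
        return total;
    }

    public static void main(String[] args) {
        // 100! has 97 factors of two, so modulo 2^32 the result is 0.
        System.out.println(product(100)); // prints 0
    }
}
```

This tight multiply loop is also a convenient hot method to inspect once PrintAssembly is set up.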
But one interesting example of looking through the assembly is if you have polymorphic code like this. So for example, I have an interface Hello with implementations HelloOne and HelloTwo, and I have a loop that does this. Then one question we may ask is: does it matter how many implementations of Hello are in there? If you measure it before you look at the bytecode, because that's often easier, you'll see that yes, it does matter how many implementations of Hello are in this array: with one subclass it takes 1.6 seconds, with two subclasses it takes 2.2 seconds for the same number of iterations, and with three or more subclasses it takes about 4.5 seconds. So you might wonder why that is. If we look at the bytecode, we'll see that no matter how many subclasses there are, this hello.get is the same invokeinterface call; the bytecode doesn't change depending on how many subclasses there are, so that can't be the cause. But when you look at the assembly generated from this bytecode, you can see that when there are two subclasses, this invokeinterface of the get method on the Hello interface ends up being inlined, in that it checks whether the object is a HelloOne and jumps somewhere to run some code, or whether it's a HelloTwo and jumps somewhere else to run some other code. That assembly gets generated when there are two subclasses. When there are three subclasses (I don't know where that slide went)... when there are three subclasses, this does not get inlined, and all you end up seeing is a callq assembly instruction. So let's see if I can pull that up.
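A sketch of the benchmark being described (the names are my reconstruction, not the speaker's exact code). The behavior it probes is real HotSpot behavior: a call site that has observed one or two receiver types can be inlined behind type-check guards, while three or more fall back to a true virtual dispatch.

```java
public class Polymorphism {
    interface Hello { int get(); }

    static class HelloOne   implements Hello { public int get() { return 1; } }
    static class HelloTwo   implements Hello { public int get() { return 2; } }
    static class HelloThree implements Hello { public int get() { return 3; } }

    // The hot loop: the h.get() call site sees however many concrete
    // types are present in the array, which determines whether the JIT
    // can inline it (monomorphic/bimorphic) or must emit a real call.
    static long sum(Hello[] arr, int iterations) {
        long total = 0;
        for (int it = 0; it < iterations; it++) {
            for (Hello h : arr) {
                total += h.get();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Three receiver types: the megamorphic (callq) case from the talk.
        Hello[] arr = { new HelloOne(), new HelloTwo(), new HelloThree() };
        System.out.println(sum(arr, 1_000_000));
    }
}
```

Dropping HelloThree from the array (or the whole program) is what moves the call site back into the inlinable one-or-two-types regime.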
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Polymorphism... invalid flag... okay, that's not it, I can't spell... UnlockDiagnosticVMOptions... there we go. Ah, I see the problem: this is a java invocation, not javac. Okay, so here we can see all the assembly being spat out; I'm going to redirect stdout to a file, output.txt, and now if we look at output.txt... where is my output.txt... here it is. I'm going to look for line 49. It generates thousands of lines of assembly, so this is not something you want to do too often. You'll see that in the case of having three implementations, HelloOne, HelloTwo, and HelloThree, in the array, you end up with a single callq instruction, which is a virtual method call in Intel x86 assembly. Let me zoom in just so I can see it... you end up with that. So that's the difference between having two subclasses, where it inlines the code for you, and having three or more subclasses, where it doesn't inline the code and it's up to the Intel processor to do the dispatch, and that's why you end up with this behavior where it's much faster if you have fewer than three subclasses at your polymorphic call site. Okay, so compiler takeaways: dumb code runs faster, even if it looks the same in Java or even compiles to the same bytecode. If your code looks more like C, it runs faster than code that does not look like C, even though you're writing Java. Switching on integers is faster than switching on strings; not using much polymorphism is faster than using lots of polymorphism, even if it's the exact same bytecode being run. So if you want to eke out that last bit of performance, there you go. So that's this presentation, Taming the JVM. We've covered memory layout, garbage collection, and compilation, all stuff that you don't normally think
about when you're writing Java code at the Java level, but that can actually make quite a big impact when you're running your program. If your program is stopping for half a second every five seconds, these are the things you want to tweak to make it behave better. So that's all for the presentation, and to plug my product again: go to fluentcode.com if you want a nice online code browser. Or, well, you can't go now because the internet's down, but yeah. Okay, that's cool. Okay, sure. No questions? Yeah, you can ask me after; I'll be around.