Our next speaker is Ladislav Thon from the Quality Engineering team. Please welcome Ladislav. Should this be better? Yes. Okay, so hello. I'm Ladislav Thon. I'm a senior quality engineer at Red Hat, working in the middleware quality engineering team. Today I want to talk about value types in Java. The subtitle is "Why Reference Locality Matters", and as it turns out, I have more slides on the second topic than on the first one. So, things happen. Maybe it's not that bad, because it will be more useful for you. So, value types. If you search the web and go through the history, you will find that in 1997, James Gosling wrote a document called something like "The Evolution of Numerical Computing in Java". Apparently, even in 1997, when Java was fairly young, some people were already interested in numerical computing in Java. One of the things he wrote there was that one of the barriers to the natural implementation of, for example, complex numbers as classes is that class objects are less efficient than primitive types like double and int. The entire document goes through a lot of the pains that scientific computing people were having with Java and proposes some solutions. As far as I know, none of those solutions happened. That is probably going to change with Valhalla, which is what I'm going to speak about in a short while. But first: class objects are less efficient than primitive types. Why? Why is that so? We've got a couple of people here, and I expect that most of you are familiar with Java. So why do you think that classes, or objects, are slower than primitives? Right, that's actually the short version of the correct answer; I'll go through the long version. The answer from the audience was that value types can be allocated on the stack, while objects have to be allocated on the heap. And that has a lot of consequences that I'll try to show.
First of all, if I say something is slower, I should back it up with numbers. This is the naive implementation of a complex number as a Java class, with a constructor and so on, but you get the idea. And I implemented a short benchmark with a couple of complex number representations. First of all, how many of you are familiar with JMH, the Java Microbenchmark Harness? Okay, so here's a short course on JMH, which is probably the most relevant or practical thing you will get today. So let me open this one first. If you go to the JMH homepage, you will find Maven archetypes that set up a whole lot of infrastructure, build a fat jar with JMH, and so on. Because I'm lazy, I went for the absolute essentials. So what do you have to do to start working with JMH? You include the jmh-core library, which is the API that you use, and you include an annotation processor, which is a compile-time thing that takes the benchmarks you write and generates code that wraps them with all the infrastructure that needs to be set up. So why JMH? Why don't I write the benchmark by hand? Because JMH does a whole lot of things for you. There's a whole lot of ways to get a benchmark, and especially a microbenchmark, wrong. JMH is of course not a cure-all; it gives you a lot of tools that you can use, but you have to use them correctly. So if you ever find it reasonable to write a microbenchmark, go through the JMH examples. They are probably the only documentation JMH has ever had; they are quite good, well-commented, and definitely recommended. So what does a benchmark in JMH look like? It's a class with a bunch of annotations that each have a special purpose, so let me quickly run through them. @State annotates a class that contains the state of the benchmark: if the benchmark needs some state, it belongs in a class annotated with @State.
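The naive complex-number class mentioned above might look like this. This is a minimal sketch, since the slide's exact code isn't reproduced in the transcript:

```java
// A minimal sketch of the naive Complex class discussed above
// (the exact slide code is not in the transcript).
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Every arithmetic operation allocates a fresh object on the heap,
    // which is exactly the cost Gosling was pointing at.
    Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }
}
```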
It has a scope, which can be per-thread or shared across the entire benchmark. It doesn't really matter here, because I'm not writing a multithreaded benchmark; if you do, go and understand what the @State scopes mean. Second, @Warmup: it runs five iterations, each taking one second, before the benchmark is measured. Then there's @Measurement: again five iterations, each taking two seconds. This is the measured part, where performance will be recorded. @Fork(1) means there will be only one JVM running this code. @BenchmarkMode(AverageTime) means we will report the average time of a benchmark operation, and the output time unit is nanoseconds. The benchmark is parameterized by the size of an array; you do that with the @Param annotation. The @Setup method prepares the state of the benchmark; here it generates a bunch of numbers. Actually, I should have started with this one; it's the same, but it progresses better. So the setup method generates a bunch of complex numbers, and the benchmark method just goes through the entire list of those complex numbers, computing a running sum that is captured in the variables re and im. And why do I return re + im? That's not a complex number; it's a way to force the compiler not to get rid of this method completely. With JMH, when I return a value from the benchmark method, JMH makes sure that this value is not optimized out. If I didn't do that, the JVM could be smart enough to optimize this method down to nothing. So this is a simple way to make sure that both variables re and im are used in the benchmark, so the JVM doesn't optimize them away. JMH is designed to be run as a command-line tool, so in this case I create a main method with an options builder. This is something I didn't want to show just yet; this very simple example just says include this benchmark and go. And if I run that, which I can, just to make sure that you see something.
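Stripped of the JMH annotations so it compiles standalone, the core of the benchmark described above might look like this sketch (in the real code, the class would carry @State, setup() would carry @Setup, sumList() would carry @Benchmark, and size would be a @Param; the names here are illustrative, not the speaker's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// The shape of the benchmark, with JMH annotations stripped so it
// compiles standalone. In the real benchmark: @State on the class,
// @Param on size, @Setup on setup(), @Benchmark on sumList().
class ComplexSumBenchmark {
    static final class Complex {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
    }

    int size = 1000;        // would be @Param({"10", "100", "1000", ...})
    List<Complex> numbers;

    void setup() {
        Random random = new Random(42);   // fixed seed, just for reproducibility here
        numbers = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            numbers.add(new Complex(random.nextDouble(), random.nextDouble()));
        }
    }

    // Returning re + im forces the JIT to treat both sums as used,
    // so it cannot optimize the whole loop away (JMH consumes the
    // return value of a @Benchmark method for exactly this reason).
    double sumList() {
        double re = 0, im = 0;
        for (Complex c : numbers) {
            re += c.re;
            im += c.im;
        }
        return re + im;
    }
}
```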
It's building and running a benchmark. Let me make it a bit bigger. Okay, so this was the first iteration of the benchmark with a size parameter of 10, and here you can see it running a second iteration with a size of 100. Let me kill that, because it's of no interest right now. So I wrote a bunch of benchmarks. First, I created a list of complex numbers, an ArrayList; everyone is familiar with that. Second, I used an array instead. And for something even better, I used an array of doubles where the real part comes first and then the imaginary part, so that if you have 10 complex numbers, the array has 20 elements: the first two elements represent the first complex number, and so on. And all of these are summed. Okay, any questions? [Audience: there should be a bar in the first chart too.] Oh, it definitely should be there. Wow, this sucks. Let me show you this class; I'm fairly sure this one is right. Okay, that was silly, but thanks. So I ran this benchmark with 10 complex numbers, 100, 1,000, 10,000. It already shows that there's a huge difference, and it goes on like this: 100,000, a million, 10 million. You can see that the array of Complex and the ArrayList of Complex behave almost the same; there's a small difference. But the array of doubles, that's a huge difference. Why? This is how the data structures look in memory. In the case of an ArrayList of Complex, you have a reference to the list, the list holds a reference to an array, and each element of that array is a reference to a Complex object. Java being what it is, everything is allocated on the heap, basically in a random order, at least after a few iterations of the GC. This means you have to chase pointers through the entire memory. With the Complex array it looks slightly better, because there's one reference fewer. But with the array of doubles, it's way, way better.
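The interleaved double[] representation described above can be sketched as follows. This is an illustrative helper, not the speaker's actual benchmark code:

```java
// Sketch of the flat double[] representation: complex number i is
// stored as {re at index 2*i, im at index 2*i + 1}, so n numbers
// occupy one contiguous array of 2*n doubles with no per-number object.
class FlatComplexArray {
    final double[] data;

    FlatComplexArray(int n) { data = new double[2 * n]; }

    void set(int i, double re, double im) {
        data[2 * i] = re;
        data[2 * i + 1] = im;
    }

    // The benchmark loop: one linear pass over consecutive memory,
    // which is what makes this version so much faster than chasing
    // references to heap-allocated Complex objects.
    double sum() {
        double re = 0, im = 0;
        for (int i = 0; i < data.length; i += 2) {
            re += data[i];
            im += data[i + 1];
        }
        return re + im;
    }
}
```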
Because it's all in a single array that's laid out consecutively in memory. In C, I would have done an array of structs, representing a complex number as a struct of two doubles. So what's the problem? Where is the difference in performance coming from? I claim that one of the reasons is locality of reference, and I'm going to explain what that is. Of course, it's entirely possible that there's a flaw in my benchmarks, because I'm no benchmark expert. The benchmarks are online, on my GitHub, so feel free to tear them apart. It's entirely possible that they are bad, but I'm fairly confident they are not that bad. So, reference locality. If you look up the Wikipedia page, you will read that there's temporal locality, spatial locality, and other kinds of reference locality. It boils down to a simple principle: if you access a memory location, it's fairly probable that in the near future you will need either the same location or nearby locations. And CPUs and computers, as they are currently built, expect that programs will exhibit locality of reference. If a program doesn't, its performance goes through the floor. What the principle of locality boils down to in practice is caching and prefetching. There's a nice article explaining a lot of cache effects using C#, which is fairly similar to Java. And there's a nice way to show how caches work: a simple benchmark which creates an array of ints, meaning each element of that array takes four bytes. I go with one element, two, four, eight, 16 elements, up to something like 64 million elements, which means the maximum size of the array is something like 256 megabytes. I'm fairly sure that doesn't fit into my cache. What this benchmark does is perform 64 million operations, where each operation increments a number in that array.
And I go like this: first increment the first element, then the 16th, then the 32nd, and so on. The indexing uses a bitmask to avoid a modulo operation, because modulo has a high cost that would overshadow all the effects my benchmark is after. So how does it look? One would expect it always takes the same time: every run, no matter how big the array is, does 64 million increments of some elements of the array. So why on earth does it look this weird? There are one, two, three, four, five segments where the benchmark behaves maybe unexpectedly. Let's go through them. Surely it looks like the bigger the array is, the slower the benchmark runs. But why is this part, where the array is only 4, 8, 16, 32, or 64 bytes, slower than this one? The X axis is the size of the array in memory: divide it by four and you get the number of elements. Okay? This first segment doesn't really belong to this talk, so I'm not going to go deep into it, but is there someone who knows why this is slower than that? As a speaker, I have these coffee things, which I will give you for free if you explain why this is slower than this one. And there's a nice scarf you can get. So this is a good opportunity for you, if you can explain what this means. As I said, the code here is not the subject of this talk, so let me explain very quickly. [An audience member answers.] Yes, exactly. The array is so small that all iterations of the loop write to the same element of the array, which means there is a data dependency between all iterations of the loop. The first iteration writes to memory, and modern CPUs are good at putting that write into a store buffer and moving on, so it hasn't reached memory yet.
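The stride benchmark described above can be sketched like this. It is a plain-Java reconstruction under stated assumptions (fixed budget of increments, stride of 16 ints, power-of-two array length), not the speaker's exact code:

```java
// Sketch of the cache benchmark: a fixed budget of increments,
// striding 16 ints (64 bytes, one typical cache line) through arrays
// of varying size. The array length must be a power of two so the
// wrap-around can be a cheap bitmask instead of a costly modulo.
class CacheStrideBenchmark {
    static int[] run(int arrayLength, long totalOps) {
        int[] array = new int[arrayLength];
        int mask = arrayLength - 1;   // works because arrayLength is a power of two
        int index = 0;
        for (long op = 0; op < totalOps; op++) {
            array[index]++;
            index = (index + 16) & mask;   // wrap around without '%'
        }
        return array;
    }

    public static void main(String[] args) {
        // Same work every time; only the memory footprint changes.
        // With a 1-element array, every iteration hits the same slot,
        // which is the data-dependency case from the first segment.
        for (int len = 1; len <= 1 << 24; len <<= 4) {
            long t0 = System.nanoTime();
            run(len, 64_000_000L);
            System.out.printf("%10d ints: %d ms%n",
                    len, (System.nanoTime() - t0) / 1_000_000);
        }
    }
}
```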
So the second iteration has to wait for the first iteration's write to finish. That's why this segment is slower than the next ones. Okay, this is where it starts to get relevant to this talk. So why is this part fast and this one slower? If you look at the numbers on the X axis, you will see it goes from 64 or 128 bytes to something like 32 kilobytes, so this clearly fits into my L1 cache. The next part fits into my second- and third-level caches, the L3 being something like four megabytes, and beyond that it goes to main memory. There's a nice program called lstopo that will print the topology of your caches. Here it shows that my L1 is indeed 32 kilobytes, L2 is 256 kilobytes, L3 is four megabytes, and then it goes directly to main memory. So I went to the next level. Instead of going through the array in a predictable fashion, in this benchmark I go through the array in a completely random fashion, and it looks like this. This is, of course, the effect of the prefetcher. The first benchmark was completely prefetched: even as I went from eight megabytes to 256 megabytes, the prefetcher was still ahead of me, so it still looks nice. Here, with random access, it goes down; it gets really bad once I get out of the cache. And there's a nice program called perf; there was a talk on perf yesterday, which I didn't see, but I expect it was something like this. So I ran the benchmark with a perf profiler attached, because JMH allows that. Let me quickly show that what I'm doing here is adding the Linux perf-norm profiler, and I'm interested in the L1 cache loads and cache misses. What these graphs show is cache misses in the L1 cache, normalized per benchmark operation: this is the number of L1 cache misses in the array-of-doubles case, and this is the number in the array and ArrayList cases.
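The random-access variant described above might look like this sketch (again a plain-Java reconstruction, with a seeded Random for reproducibility, not the speaker's exact code):

```java
import java.util.Random;

// Sketch of the random-access variant: same increment budget, but the
// next index is pseudo-random instead of a fixed stride, so the
// hardware prefetcher cannot stay ahead of the loop.
class RandomAccessBenchmark {
    static int[] run(int arrayLength, long totalOps, long seed) {
        int[] array = new int[arrayLength];
        int mask = arrayLength - 1;   // arrayLength must be a power of two
        Random random = new Random(seed);
        for (long op = 0; op < totalOps; op++) {
            array[random.nextInt() & mask]++;   // unpredictable location each time
        }
        return array;
    }
}
```

With JMH, the perf measurement the speaker mentions is attached via the `perfnorm` profiler (`-prof perfnorm` on the command line, or `addProfiler` in the options builder), which reports hardware counters such as L1 cache misses normalized per benchmark operation.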
Again, it gets really bad very quickly with the array of Complex and the ArrayList of Complex. And there's another problem. This is Java Object Layout (JOL) showing the layout of the Complex class: there are two double fields, taking 16 bytes together, and an object header which takes another 16 bytes. So half of the object is useful and half of it is useless for our purposes. With the array of doubles, there's an overhead of 16 bytes, again an object header, but only once, for the entire array. So with the array of doubles, I have half the size of the corresponding ArrayList or array of Complex objects. And this is the solution James Gosling proposed in 1998: we've got Project Valhalla, which might materialize in Java 10 or 11, no one really knows. First, before I get to Valhalla: Java 8 introduced value-based classes, which are final and immutable and generally value-like; I'm not going into the details, but the important thing is that you have to ignore their identity. So you can't compare them by reference, you can't take an identity hash code, you can't synchronize on them; or actually you can, but you shouldn't. And there's a possibility that these classes, like java.util.Optional or LocalDate and others, will be converted to value types sometime later. There are two JEPs, JDK Enhancement Proposals, for this: Value Objects, for adding efficient by-value computation with non-primitive types, and Generics over Primitive Types, which will provide specialization of generic classes to work over primitive types, including value types. This is how you build OpenJDK 8, and this one is really interesting: this is how you build Valhalla, the prototype. And the prototype by no means has final syntax, so don't comment on that. The idea of value types is: codes like a class, works like an int.
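For a flavor of what "codes like a class, works like an int" looked like in the early Valhalla prototype: the sketch below uses the experimental double-underscore keywords as I recall them from the prototype. This syntax is emphatically not final, and it does not compile on any shipping JDK.

```
// Early Valhalla prototype syntax (experimental, not final;
// does not compile on any shipping JDK):
__ByValue final class Complex {
    final double re;
    final double im;
    // ...
}

// Instead of 'new Complex(...)', a factory-style "make":
// the result is a flat value -- no object header, no pointer,
// eligible to live on the stack or in registers.
Complex c = makeComplex(1.0, 2.0);
```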
There's a nice article, actually a summary of the efforts, by John Rose at this URL. Currently it looks like this: you add a modifier saying the class is by-value, it has to be final, all the fields have to be final, and instead of new, you call a make function, makeComplex. And this is a value type: it's supposed to live on the stack, and it will not have the object header overhead and all that. Then we have generic specialization, which looks something like this. Generics will stay the same as they are for objects, for references, but when you try to instantiate a generic type for a primitive type, you will get a specialization: the JVM will dynamically, at runtime, create another class specialized for that primitive type. So if I create a Box of int, it will dynamically create a class specialized for int. I have a nice demo for that, but sorry, I'm out of time; I took a lot of time explaining the interesting bits. You also get a List of int, finally. This prototype gives you lists and array lists and other collections of primitive types, and in the future also of value types. Here's a reminder that the performance of the prototypes is going to be pathologically awful for quite a while; this is only a couple of days old, so sorry, I didn't benchmark it. This part I'm going to skip, and here's a couple of links; the first one is Brian Goetz speaking about value types at the last JVM Language Summit. So sorry for taking too much time on the first part. I wanted to do a few more demos in the later part, but it didn't happen; I ran way faster than I expected. Things happen. Before you ask any questions, and I'm sure there are many, please be sure to visit this URL and say that this talk was absolutely awesome. This is mandatory; I will find you.
And in case you have any questions that you find later, please email me at this address. So, questions? [Audience] Can you hear me? So, to set the background, I'm not a Java programmer, I usually code in C++, and I was wondering: why was there a warm-up in the benchmarking? [Speaker] To give the JVM time to optimize the code. That's one of the problems of JVM benchmarking that JMH solves for you: you have to give the JVM some time to optimize things. If you benchmarked the unoptimized code, you would be benchmarking something that would never run that way in production, because you typically expect Java applications to run for a long time, giving the JVM the chance to optimize heavily. That's why the warm-up. [Audience] So does it make sense to have a warm-up exactly as long as the measurement, or should it usually be a lot less? [Speaker] I didn't show that, but if you looked at the numbers that were scrolling by really quickly, they showed that five iterations of warm-up were enough to stabilize. And I think I had the warm-up iterations one second long and the measurement iterations two seconds long, so they weren't exactly the same. But if you look at the JMH examples, they use exactly the same time for warm-up and for measurement. [Audience] Okay, thanks. [Speaker] No problem. Okay, right, the microphone. [Audience] Thank you. My question is about converting value types into object references, because originally I thought the implementation would be similar to .NET, where there's boxing and unboxing. Do we have a similar thing in Java, or is it really going to be a separate instance for every value type, or a separate class for the generic type? [Speaker] Right, I think the model they are working on is quite similar to C#, where I think they are also doing specialization. But to be honest, not being a .NET programmer, I'm not really sure.
But the idea is that for each value type, there will be a specialized class, and when a value is used in a place that requires conversion, the conversion will be done. As far as I know, they do plan on automatically boxing and unboxing value types to reference types. I'm not sure it will be in the final version, but currently it's like that. Thanks for the question. [Audience] Yeah, I was just wondering, doesn't this break bytecode backwards compatibility? [Speaker] They are actually planning to add new bytecodes for this, but it doesn't break bytecode compatibility; they are going through a lot of hoops to keep bytecode compatible. For example — I'm not going to show this — if there's the Box class, it will be compiled to a class file as it would be today, and there will also be another class file that is actually an interface. All the Box instantiations, either the reference ones or the value ones, will implement that interface, so all boxes, whether reference or value types, will still share a single supertype. But the reference ones will keep working the way they do today. Does that answer the question? [Audience] Yeah, thank you. [Speaker] Okay. Any other question? [Host] I think we are exactly out of time. No, we still have a few minutes. [Speaker] Wow, two minutes. So I can show one tiny example; let me open the Valhalla value types demo. What I have here is this list class. The font is probably a bit too big, and resizing it will render badly, so I'm not going to do that; it got moved to my other display. Anyway, I have a list of int, as in primitive ints, and a list of Integers, and code that prints the class hierarchy of these classes. So let me just run this. I have a run script that compiles using the Valhalla javac and runs using the Valhalla JVM. And there's actually a special option telling javac to compile values as references.
Because currently, the Valhalla javac is able to emit the new bytecode for values, but the JVM isn't able to consume it yet. So this option makes sure it still generates compatible bytecode, even though the values will actually be references. At the beginning of the output, you see that it's specializing classes. For example, I declare a list of primitive ints; it will create a class with this name mangling, which means that the first type parameter will be int. And it specializes quite a lot of other classes. Here's the List of int, and here's the list of superclasses, and for each superclass, the list of interfaces it implements. So there's an ArrayList of primitive ints, which implements List, as it should, but that List is also specialized for int. And that List implements Collection, specialized for int, and Iterable, and so on. At the same time, it also implements List of int, and it implements List of any, which is how all lists, whether lists of primitive ints or lists of Integer objects, still share the same supertype, ArrayList of any. Then you have AbstractList, and it goes on like this. I should warn you: this is a prototype, and it might end up somewhere really different. It won't be a superclass, but it will be a supertype. [Audience] So does this mean that any is a supertype of all objects and all value types, including primitives? [Speaker] The answer is yes. It will be a supertype; it's not a class, but it is a supertype. And if you imagine List<?>, which is fairly typical, that actually means List<? extends Object>. So List<?> will still be a list of objects, and then there will be List<any>, which is a supertype of all of that: a supertype of lists of objects and, at the same time, a supertype of lists of values and primitives. So this shows what they have to do to achieve compatibility.
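One concrete compatibility headache with specialized generics is visible in today's Java already: List.remove is overloaded on int (remove by index) and Object (remove by value), and the two overloads would collide outright once the element type itself is int. The sketch below shows today's behavior with List<Integer>:

```java
import java.util.ArrayList;
import java.util.List;

// In today's Java, List.remove is overloaded: remove(int) removes by
// index, remove(Object) removes by value. With List<Integer> the two
// are already easy to confuse; with a specialized List<int> they
// would be ambiguous.
class RemoveOverloads {
    static List<Integer> demo() {
        List<Integer> list = new ArrayList<>(List.of(10, 20, 30));
        list.remove(1);                    // remove(int): drops index 1, i.e. the value 20
        list.remove(Integer.valueOf(10));  // remove(Object): drops the value 10
        return list;                       // only 30 remains
    }
}
```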
Like what the .NET guys did: I think they broke compatibility when they introduced generics. The Java guys are trying really hard to keep compatibility, so they have to go through things like this. [Audience] If I do code completion on a List<any>, how does the IDE know what kinds of types those can be? [Speaker] They will handle it the same way they handle it today, except that in addition to objects, of which there are many, they will have to include value types as well. There's actually an interesting problem with this. For example, the List interface prescribes an overloaded method, remove, which takes either an int, an index, or the value of the element type to remove. This is of course problematic when you have a list of primitive ints: which overload should you choose? And this is where these things come to the rescue: the interface will be changed to include both remove overloads for reference types, and to not include them for primitives or value types. Currently the prototype uses a syntax like this; I'm sure there will be a different syntax, or maybe they won't use this approach at all, but currently this is what they do. Did I answer the question? [Audience] What happens if I type "any." in my IDE and invoke code completion? [Speaker] There's nothing like "any-dot". [Audience] So any is not a real type then? [Speaker] No, it's not a real type. Did I answer the question? [Audience] Somewhat. Okay, thanks. [Speaker] Any other questions? Okay. [Host] Thank you for the presentation. [Speaker] And please go to that URL and say that it was awesome. Thank you.