Okay, the room is full and the doors are closed, so let's start a little bit earlier. Welcome, everybody, to this presentation: "Java (11, we can say 12), Kotlin, code coverage and their best friend, bytecode". My name is Evgeny Mandrikov. I work at the great company SonarSource. You might know such a product as SonarQube. By the way, who knows it, who uses it? I hope you do, and I hope you like it. But this company can only dream that they own my opinion. That's my usual disclaimer, and we are not going to talk about this company. We are going to talk about something different.
We are going to talk about Java code coverage and such a great library as JaCoCo. By the way, who knows it, who uses it? Okay, some. And about EclEmma, which is an Eclipse plug-in based on JaCoCo. I'm a co-lead on those two projects. I have worked on them in my spare time for about eight years already, a little bit more. I also work on some other open-source projects, but that's not the point.
Before we start, by the way, another check. Are there guys here who work on the Java compiler, like from OpenJDK? No? Okay, so I can speak freely. Are there guys from JetBrains who work on Kotlin, on IntelliJ? No? Okay, I can also speak freely. Let's go ahead.
During the work on these projects we are testing seven and a half JDKs, now already something like eight and a half. The half is because we also test early-access builds. Why we do this, we are going to see a little bit later. Out of all this testing we of course sometimes find bugs, but I would say that we find the bugs not in our product, but in the JDK. This happens. And of course, okay, sometimes we develop new features. For example, the code coverage report of JaCoCo version 0.8.1 for some Kotlin classes was looking like this. After we implemented some new features, without any changes in any test, it started looking like this. Why this is happening and what those features are is exactly what this talk is going to be about.
Before we start, it's important to understand that JaCoCo works only on class files. It retrieves all the information from class files. Source files, okay, they're shown to you at the very end, but in the process they do not play any role, and we're going to see this. So this heavily simplifies things. It also complicates, but it simplifies.
The Java agent gives super simple integration. You don't need to rebuild your application to get code coverage working, you just attach the Java agent. This opens the door for us to cover some other languages on the JVM, because, well, the bytecode is the same. So this should work for Kotlin, theoretically. This should theoretically work for Scala, et cetera, et cetera. Why theoretically? Because we're going to see which problems we face in practice with such languages.
Again, the absence of source files is not an issue, so you can actually measure code coverage if you do not have source files at all. That's pretty cool. You get someone else's code, you can run some tests on it and see the code coverage out of this. And of course it helps us to cover new syntactic sugar, the kind that doesn't change class files, the kind that doesn't change bytecode. For example, switch expressions: it should be a no-brainer for us to support them, because it's just syntactic sugar over already existing bytecode. Okay, raw string literals were dropped from JDK 12, so there is a little mistake on this slide, but they should also be a no-brainer for us. And so on, and so on.
And before we really start, here is a real disclaimer. Sorry for that: there will be a little bit of blood out of the JVM. Sorry for that: there will be a naked Java compiler and a naked Kotlin compiler. Sorry for this.
We're going to use strong language. We're going to speak primarily bytecode, so forget English, forget French, et cetera, et cetera. And sorry for this: I will try to do intense violence to your brain. If you're not sure that you're ready, you can still try to open the door and exit.
Let's also check, to orient me a little bit: who knows Kotlin? Who wants to hear more about Kotlin, and who about Java? Okay, half and half. Okay, we will try to do both. So, ready? Let's go.
Java. As a starter, to warm you up, let's take a pretty simple example: an empty Java class, with just a name. Who agrees that this class is empty? It's empty, there is nothing inside. Who agrees? I agree. Okay, let's check. serialVersionUID? Empty? Some other opinions? Let's not guess, let's check. Thank you very much.
So how are we going to check? We compile it, then we decompile it. For the decompilation we're going to use javap, the standard tool out of the JDK: -v for verbose, -p for "show me privates", et cetera, et cetera, everything, on the resulting class file. What we're going to see inside is, okay, some metadata about this class file, and information about which bytecode version we have here. A reminder: major version 52 in bytecode is Java 8.
We're going to see the constant pool. This is the first and the last time we're going to see the constant pool, because it's quite big; I'm going to strip it out from the other slides. The constant pool is basically all the constants used in your class file: strings, method names, names of attributes, signatures of methods, et cetera. Why is it useful? Because if you have a lot of calls of the same method, you can simply reference the same constant pool entry.
And here we see the actual content of the class. And, well, a blessed place cannot remain empty. There is indeed a constructor, the default constructor in Java. We all know that it exists. Here it is. What's inside of this constructor?
It's also not empty. It actually contains an invocation of the super constructor. We know that to construct any object in Java we need to call the constructor of its parent. In this case we also know that the parent of every class is java.lang.Object, so here we call the constructor of java.lang.Object. And execution cannot fall through the end of a method, so we need to return from the method: here is a return.
And here is exactly the little connection with the source, here is exactly why JaCoCo is capable of showing you something on the source. We have a so-called line number table. The line number table says: starting from bytecode instruction zero, everything that follows was compiled from line number one. That's exactly how JaCoCo can connect what was analyzed in the bytecode with the final report on the source code, and that's it. Well, we also have the SourceFile attribute, which says: okay, the name of the file from which this was compiled is this one. This is the only piece of information which connects us back to the source code.
Let's take a look at another example, again pretty simple, also for warm-up. An enum. This time, to be sure that it is empty, we called the enum Empty, and there are no constants in it. This compiles. So do we agree that it's empty? No, because it should again contain a constructor, the default constructor. Let's check. We compile it, we decompile it. What do we see inside?
Not a constructor at all. The first thing we see inside is some magical field, $VALUES. We did not expect that, but it is there. It is private, it's static, it's final; all those keywords we know, fine. It is synthetic. Remember this word, we are soon going to see what synthetic means. It also magically contains some methods which we didn't write. It contains the method values, which returns an array of Empty. We all know this method, actually: if we write our enum name dot values, we are going to get all the values of the enumeration.
So we didn't write this method; the compiler generated this method for us. There is also the method valueOf, again generated. We all know this method: given the string representation of a constant, it returns the actual constant. And yes, here it is, here is the constructor that we expected. And, well, there is also a static initializer. This static initializer exactly initializes the field $VALUES, to put all the values of the enum into this field so that the method values can return them.
Why is all this important for the tools that work with bytecode, that analyze this bytecode? Because the user didn't write this code, the compiler generated this code. So if you analyze the user's bytecode and start reporting something to the user, they're going to see something strange. They're going to see something they do not expect. For example, an old version of JaCoCo was producing such a report, saying: well, you didn't test the method valueOf, you didn't test the method values. Why should we test them? It's unclear. They are generated. Let's trust the compiler; we can, in some cases.
So we need to filter them out. We implement the filters, and the report starts to look a little bit better. Unfortunately, in the first iteration we didn't realize that some people indeed do not define constants in enums. They use enums as containers for static methods. In such enums the constructor is never called, so you also need to filter out the constructor. We filter it out.
Remember the Synthetic attribute. What this beast is, we can read in our Bible, the Java Virtual Machine Specification: there is a Synthetic attribute. This Bible simply says that the Synthetic attribute marks everything that does not appear in the source code but appears in the bytecode. Except, of course, as in any language, there are some exceptions: except the methods Enum.values and Enum.valueOf, and the default constructor. Exactly the constructs that we've seen. Why was this attribute introduced?
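The generated enum members described above can be observed with plain reflection. What follows is only a minimal sketch: the class name EnumMembersDemo and its helper methods are made up for illustration, and the exact set of generated members (for instance the name of the synthetic field) is a javac implementation detail that may differ between compilers and versions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names here are illustrative only.
class EnumMembersDemo {

    // An enum without constants, like the one in the talk.
    enum Empty {
    }

    // Names of declared methods that are NOT flagged as synthetic.
    static List<String> nonSyntheticMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    // Names of declared fields that ARE flagged as synthetic.
    static List<String> syntheticFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // values() and valueOf() are generated but not marked synthetic,
        // while the backing array field is marked synthetic.
        System.out.println("non-synthetic methods: " + nonSyntheticMethods(Empty.class));
        System.out.println("synthetic fields: " + syntheticFields(Empty.class));
    }
}
```

A coverage tool cannot rely on the synthetic flag alone for enums: values and valueOf have to be recognized by name and signature, which is what the filters mentioned above do.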
Well, a paragraph a little bit later explains this: the Synthetic attribute was introduced to support nested classes and interfaces, back in JDK 1.1. Let's check why.
So, a little bit more complex example: nested classes. What do we have here? We have an Inner class, it's nested into the Outer. The Inner class has a private constructor, and we call this private constructor from the Outer class. First question to you: will this compile? Of course it will, there is no problem. The Java language allows all this.
So we compile it, we decompile it. What do we see inside? Again, to remind you, it's Java 8, major version 52. Here is our example, here is the constructor invocation. If you look a little bit more closely at this invocation, you will notice that the invocation actually contains some magical argument with the value null. We didn't write this. Our constructor has no arguments, right? Why so? Let's check.
To check, we need to have a look at the Inner this time. What we will see in Inner is our private constructor, and indeed it has no arguments at all. What we will also see is that the Java compiler generated another constructor on our behalf, and this constructor accepts some argument. What we can notice is that this time it is indeed synthetic: the generated constructor is marked as synthetic. What we can also notice is that our constructor was private, on the top left you see the private. This new constructor is not private, it's package-local.
Why so? Well, while the Java Language Specification allows you to write such code, where nested classes can access private constructors of each other, the Java Virtual Machine doesn't allow this. For the Java Virtual Machine these are two completely different classes. They are not at all nested. They lie next to each other, and you cannot call the privates of one from the other. So to overcome this difficulty, to overcome this property, it was decided:
Okay, let's invent the Synthetic attribute. Let's generate such a constructor, which is package-local and so can be called from the other class. Yes: package-local, synthetic, and it invokes our private constructor. Simple delegation, that's it.
If we look at this generated class, which is used as the type of the null argument, we will see that this time it's indeed empty. This is an example of an empty class. Why is this class generated as well? Because another problem is that you can't have two constructors with the same signature, so you need something to distinguish this second, package-local constructor. And again, this class is synthetic.
Thank God, all this changed in Java 11. In Java 11 we got nestmates, we got nest-based access control. So all this crap, which was introduced a long time ago, could be reduced. If we have a look at exactly the same source code, but this time compiled with Java 11, for bytecode version 55, we will see that there is a loss in weight. There is no more generated constructor. We have only our private constructor, and that's basically it. We also see the new attribute, which was introduced exactly to support nest-based access control, and which says: well, I am nested in this nest host, which is the class Outer. Which is practical.
Another example. It looks like this: we have an Outer class, it has a private field, and we have an Inner class, and in the Inner class we do an increment of the Outer class field. Can you guess what is going to happen? Something should happen, because again it's private. We're not supposed to access private members. Again, we're not going to guess, we're going to decompile all this and see.
Again, back to Java 8. In Java 8, what happens? We are looking exactly at the increment, but we do not see an increment. We see a magical method invocation: access$008. Not 007, for some reason 008. What is this? We look at the Outer, and in the Outer there is indeed a new generated method.
It is synthetic, package-local, again to overcome this problem of access to the privates. And here is the bytecode which does the increment. Again, for exactly the same reason: to access the private field we need to invent such tricks in Java.
Again, all this changes with Java 11. With Java 11 we lost weight. We do not need to generate those accessors. We can directly increment the counter, because there is another new attribute, NestMembers, which basically says: well, I am an outer for this list of classes, so as I am an outer, they can access my privates, et cetera, et cetera. And yes, if you look at the Inner, you will see that there is NestHost again, which says: yes, I confirm that I'm an inner of this Outer.
Another interesting example, also related to synthetics: so-called bridge methods in Java. What do we have here? We have two classes, A and B, and B extends A. We have two methods, one is in A, another is in B, and the method in B overrides the method in A. So far, so good, I hope. The only problem is that the method in B decides to change the signature. It decides that it will be returning not an Object, but a String. This is perfectly valid, this is allowed. However, in the places where callers expect that they will be calling a method that returns Object, we are supposed to have a method in B which returns an Object. Yes, different signatures. And to do so, again, javac decides to do a trick. Here is the original method, and here is the method that callers would expect, the method that returns an Object. Again, it's synthetic. It's also marked with an attribute, bridge, a new one, and it simply delegates to our original method. This time not to align the private access, but to align the expected signature.
Out of all this, logically, what we could conclude is: well, any bytecode analysis tool should simply ignore synthetic methods, because they do not have any relation to the source code, right? The user shouldn't see them.
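The bridge method described above can be seen through reflection, without a decompiler. This is a small sketch under assumptions: the class names A and B follow the talk, while BridgeDemo, the method name get and the helper methods are invented for illustration.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; A and B follow the talk, the rest is illustrative.
class BridgeDemo {

    static class A {
        Object get() {
            return "a";
        }
    }

    static class B extends A {
        // Covariant return type: String instead of Object.
        @Override
        String get() {
            return "b";
        }
    }

    // Counts declared methods that the compiler flagged as bridge methods.
    static int countBridges(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    // Renders one method together with its bridge/synthetic flags.
    static String describe(Method m) {
        return m.getReturnType().getSimpleName() + " " + m.getName() + "()"
                + (m.isBridge() ? " [bridge]" : "")
                + (m.isSynthetic() ? " [synthetic]" : "");
    }

    public static void main(String[] args) {
        // B declares two methods named get: the written one returning String
        // and the generated bridge returning Object.
        for (Method m : B.class.getDeclaredMethods()) {
            System.out.println(describe(m));
        }
    }
}
```

Only one get method was written in B, yet getDeclaredMethods reports two, which is exactly the kind of surprise a coverage report has to hide from the user.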
It's compiler-generated code. Maybe, maybe not. Let's see. Let's have a look at an example with lambdas, which appeared in which version? Sure? Seven or eight? Okay, you'll figure this out at home.
So what do we have here? Runnable.run: we simply execute the method run of the Runnable interface, and we pass a lambda to the method that does this execution. So here is the lambda, the lambda is an argument. Let's notice that the lambda is on line 9 and that the lambda contains Jacker, right? Simple, right? Right or not right?
We compile, we decompile, we look at what happens. Method exec: nothing special, we do an invocation of a method, nothing changed here. Then the invocation where, as an argument, we pass a lambda: here the magic happens. Where is Jacker? There is no Jacker, right? There is no constant Jacker. Where is line 9? There is line 8, there is line 11, no line 9. The 9 here is the offset of the bytecode instruction return, not a line number. All of this happened to move to a new method, private static void lambda$fun$0, which is synthetic, which is generated. Here is Jacker, and here is line number 9, by the way.
So this is the mechanics behind lambdas, this is the mechanics behind the invokedynamic instruction: lambdas appear as synthetic methods in the class where the lambdas were declared. So if our bytecode analysis tools, whatever they are, FindBugs, SpotBugs, or JaCoCo in this case, were ignoring synthetic methods, they would miss lambdas. For example, this is exactly what was happening in JaCoCo version 0.7.1: we are missing the lambda. We don't know anything about lambdas, exactly because we decided to ignore synthetic methods. We need to fix it.
So we decide, for our bytecode analysis, out of nowhere. Out of nowhere, because it is not specified that a lambda is implemented like this; it's an implementation detail of lambdas. So, out of nowhere, we had to decide.
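The move of a lambda body into a synthetic method can be checked with reflection. This is a minimal sketch: the method names exec and fun follow the talk, while the class LambdaDemo and the helper syntheticMethodNames are made up. The lambda$ naming scheme is an unspecified javac implementation detail, so the sketch assumes javac.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; exec and fun follow the talk, the rest is illustrative.
class LambdaDemo {

    static void exec(Runnable r) {
        r.run();
    }

    static void fun() {
        // The body of this lambda is compiled into a separate generated method.
        exec(() -> System.out.println("inside lambda"));
    }

    // Names of all declared methods that are flagged as synthetic.
    static List<String> syntheticMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        fun();
        // With javac the list is expected to contain a name starting with "lambda$".
        System.out.println(syntheticMethodNames(LambdaDemo.class));
    }
}
```

A tool that skips every method for which isSynthetic returns true would therefore never report coverage for the lambda body.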
Well, we need to ignore synthetics, except if the method name starts with lambda$. That works, that works perfectly.
Another example: switch on an enum. So we have an enum, and this time it's not empty. It has two constants, and we have a switch on this enum. So: two constants, a switch. We compile, we decompile. What can we see? We can see that there is a switch, right, no problem, three cases, inside of our method. And now we see something strange happening here. We see that just before the switch we perform an access to some magical class that we didn't write, and to the field $SwitchMap$ in this class. What is this business?
Let's have a look. Again, we can decompile this class. We see that this is a generated synthetic class. We can see that it indeed contains a synthetic field, and this synthetic field is initialized in the static initializer. How is it initialized? Well, we take a particular constant of our enum, we use the ordinal of this constant as an index into this array, and we assign the value 1 to that element of the array. And we do the same with the second constant. Again, a little connection with the source: we can notice that this generated class has a connection with the source file. It maps back, let's scroll a little bit, it maps back to the line of the switch.
So for each switch on an enum the Java compiler generates a synthetic class. It does so because the enum might change. The enum might be provided in a third-party library, maybe it's not your enum at all. The switch should always work, no matter how the enum was rearranged: if you changed the order of the constants so that the ordinals changed, or if you added a new constant, the switch should continue to work. To do so, you need a mapping between the switch in the bytecode and the original state of the enum, and this generated class provides exactly such a mapping. Again, a simple conclusion.
We should ignore synthetic classes, this time without exceptions, because otherwise we will be showing to the users that, unfortunately, they have a line which is not covered.
Filters. At this moment in time you might start wondering: do I have something non-synthetic? Unfortunately, or for good, I don't. But inside of the Java compiler there is a lot. Let's have a look.
Assertions. A simple example: we have an assertion on a boolean parameter. Simple, right? What should happen if we start this and pass true? Everything should be okay. If you pass false, what should happen? Okay, "we shouldn't use them at all". Good point, but okay, let's have a look at what happens inside.
Inside, again, just because we are using assertions, we got something synthetic. You might say: again synthetic? No. Besides the synthetic, we got far more generated code which is not marked as synthetic at all, because there are no attributes for generated bytecode instructions. We got something strange, which is an invocation of Class.desiredAssertionStatus, and we are passing our class to it.
You all might know that in Java you can enable and disable assertions, right? Moreover, you can enable or disable assertions not only for the whole JVM; you can do this on a per-class basis. For certain classes you enable them, for certain classes you disable them. So this would mean that each time we execute an assertion we should go to the runtime and ask the runtime: could you please tell me what the current status is for me? Should I execute the assertion or not? This is not so performant.
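Both pieces mentioned here, the per-class assertion status and the generated static field that javac adds to a class using assert, can be looked at from Java itself. The following is only a sketch: AssertDemo and generatedFlag are made-up names, and the existence and the name of the generated field are javac implementation details rather than anything guaranteed by a specification.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical demo class; the names are illustrative only.
class AssertDemo {

    static void check(boolean b) {
        // This single statement makes the compiler add a hidden static field
        // and extra branches to this class.
        assert b;
    }

    // Returns the compiler-generated static boolean field, or null if the
    // compiler in use does not generate one.
    static Field generatedFlag(Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()
                    && f.getType() == boolean.class
                    && Modifier.isStatic(f.getModifiers())) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Field flag = generatedFlag(AssertDemo.class);
        System.out.println("generated field: " + (flag == null ? "none" : flag.getName()));
        // The runtime answer for this class; it changes with -ea / -da.
        System.out.println("assertions enabled: " + AssertDemo.class.desiredAssertionStatus());
    }
}
```

Running the same class once with -ea and once without shows the second line changing while the source and the class file stay the same.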
This is actually an expensive operation, so Java again does a trick in order to not go to the runtime every time. We simply ask the runtime once, during initialization. We cache the value that the JVM knows, and then each time an assertion happens we simply check the cached value. If assertions are disabled, we shouldn't execute the assertion. If assertions are enabled, then, well, we do the check of the assertion, and either we throw an exception or not.
Again, why is this a problem? Because the user didn't write all those branches, the user didn't write if statements. So users do not understand why an assert statement actually contains four branches: to decide whether assertions are enabled or not, and to decide whether to throw an exception or not. And in the case of JaCoCo it's a pretty tricky case, because you can't actually test one of those two branches: for that you need to restart your JVM with assertions disabled or with assertions enabled. It's a very tricky case. We still have an open issue. We do not know how to solve this properly from an end-user point of view, but we will have a look at this later.
Another interesting example: switch on a string. We compile it, we decompile it. What can we see inside? Inside we can see, for some reason, two switches. We had one, right? You're not sleeping, we had just one switch, agree? What happens inside is a pretty interesting thing. Apparently, in order to implement a string inside of a switch, it's very performant to actually reuse the string hash code, because with a high probability, if you just switch by hash code, you will already know which string it is: it's likely that all the strings have different hash codes. That's the first switch. But, sorry, no: we also sometimes get a hash collision, so we also need a little bit of logic to compare with the original string. And this is exactly what happened here: two strings, and they have exactly the same hash code.
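The two-switch shape being described can be written out by hand. This is a sketch, not the exact javac output: the strings "Aa" and "BB" are chosen here only because they are a well-known pair with the same hash code (the strings on the speaker's slide are not in the transcript), and StringSwitchDemo, viaSwitch and viaHash are made-up names.

```java
// Hypothetical demo class; the strings and all names are illustrative only.
class StringSwitchDemo {

    // What the programmer writes: one switch, three cases.
    static int viaSwitch(String s) {
        switch (s) {
            case "Aa":
                return 1;
            case "BB":
                return 2;
            default:
                return 0;
        }
    }

    // A hand-written approximation of what the compiler generates:
    // a switch on the hash code, equals() calls to resolve collisions,
    // and a second switch on the resolved case index.
    static int viaHash(String s) {
        int index = -1;
        switch (s.hashCode()) {
            case 2112: // both "Aa" and "BB" hash to 2112
                if (s.equals("Aa")) {
                    index = 0;
                } else if (s.equals("BB")) {
                    index = 1;
                }
                break;
            default:
                break;
        }
        switch (index) {
            case 0:
                return 1;
            case 1:
                return 2;
            default:
                return 0;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Aa", "BB", "other"}) {
            System.out.println(s + " -> " + viaSwitch(s) + " / " + viaHash(s));
        }
    }
}
```

The hand-written form has two switches and two ifs where the source has a single switch, which is the branch count a coverage tool sees unless it filters the pattern.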
So below we have a couple of String.equals calls to decide exactly which one of the strings we got after String.hashCode, and only then do we know precisely which switch case was taken, either the first one or the second one.
Again, this is a little bit of magic that bytecode analysis tools like code coverage, or FindBugs, or SpotBugs need to deal with, because the user wrote just a switch, just three cases. He didn't write, how many, two ifs and two switches. So again, we need to write some ad hoc filter which tries to map the bytecode that the compiler generates back to the source code, to show to the user: yes, everything is okay, there are three cases.
So let's have a look at another example: finally. We have a try-finally and we have a couple of method calls inside. Super simple, right? Let's have a look after compilation. We again decompile. What do we see inside? We see three invocations of the method nop, but we had only two. You're not sleeping, right? Let's check. Yes, let's check: two or three? I see three nops, but the first one is the method declaration, it doesn't count.
So, to simplify our life, let's introduce a marker. Let's have the strings "try" and "finally". We again compile and decompile. What changed? What we can see is that indeed here is one "try" inside, and two "finally". We wrote one, right? Why does this happen? This happens because in the JVM, in order to implement finally, we can only use the so-called exception table, and the exception table doesn't say "execute this instruction always, no matter what happened". The exception table says: if an exception happened, then execute the exception handler. And in order to implement finally you need more. We all know finally happens both when there was an exception and when there was no exception, right? Right. So in the case of an exception we're going to catch it, we're going to execute the exception handler, and this is exactly the second "finally". It's the finally inside of the exception handler. Here is the exception handler; and then there is the normal path, without an exception.
We also should execute finally there. So in order to use the exception table to implement finally, we actually need to duplicate the finally block two times. That's the trick. And again, this surprises the users, because the users wrote finally only once. So we need to do the deduplication, we need to realize that those two blocks of code are actually the same thing. So again, we write a filter, we merge everything back, and the user is happy, believing that he has no duplication in his bytecode.
A little bit of archaeology. This was a little bit different in Java 1.4. In Java 1.4 it was not a problem to implement finally without duplication, because there were the JSR and RET instructions: jump to subroutine and return from subroutine. It looks like I have a missing slide, but okay. Apparently those instructions were deprecated, because with them it's much harder to verify the bytecode, and verification is important for security. So in favor of security it was decided to do a little bit of duplication.
Let's have a look at another example and, given the amount of time, unfortunately I think we may not get to see Kotlin. Try-with-resources. So we have a resource, it's Closeable, and we have a try-with-resources. I am not going to ask when it was introduced, because I know that you do not know in which versions what was introduced. So, try-with-resources: if we decompile it, we see a lot of bytecode. You don't even see anything, because it's too much. We can zoom a little bit. We can see close four times: close, close, close, close. Why? Because we know that try-with-resources should close the resource, so in order to close the resource for sure, let's call close four times? No, of course not.
If you look into the other Bible, the Java Language Specification, it says: the meaning of a basic try-with-resources statement, blah blah blah, is such a beast. If you look deeper into this beast,
We will look a little bit later. At this moment in time you could logically ask a question: could you please prove that this is equivalent to the try-with-resources? I can prove it.
Let's have a look at Fun1. I have the try-with-resources that you've just seen. I also have Fun2, and Fun2 is exactly what you've also just seen, the big beast. Now we compile the file Fun1, decompile it, and save the result of the decompilation in the file 1.txt. We do exactly the same with Fun2, but save it in 2.txt. And we are going to, oops, and we are going to do a diff between the two. Here is what we're going to see: something changed, because we did two compilations, but only line numbers changed. No bytecode was changed. Let's check that there is some bytecode. Indeed, there is some bytecode. There is a lot of bytecode. So this is a pure equivalent of try-with-resources.
So what happens here? There is a try-catch. There is also a finally, and in the finally we do all the closing. The resource might not have been opened, we need to check that. There might have been no exception, we need to check that. So some ifs appear. Then we need to close if an exception happened, and then we need to attach one exception to the other. All of this happens in the finally, and we remember that finally is duplicated. That's exactly why all of this is duplicated multiple times.
Fun fact: in the following case we always initialize the resource, so there is no point in checking for null. So there is dead code. Another piece of dead code: in the case of an exception we know that the exception happened, so there is no need to check it for null, et cetera, et cetera. This was improved a little bit later, in javac 11. They reduced the duplication, et cetera, et cetera. All this means that, again, we need to write the filters, and we need to adapt the filters for the particular compiler version, et cetera, et cetera. Unfortunately, time is up.
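The equivalence discussed above, between a try-with-resources statement and its hand-written expansion, can also be checked at the level of behavior. This is a sketch under assumptions: the method names fun1 and fun2 mirror the talk's two files, the expansion is written from memory in the shape the Java Language Specification gives for a basic try-with-resources statement, and Resource, body and outcome are made-up helpers. It only compares observable behavior, not the generated bytecode as the speaker's diff did.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names are illustrative only.
class TryWithResourcesDemo {

    static final List<String> log = new ArrayList<>();

    // A resource whose close() always fails, to make suppression visible.
    static class Resource implements AutoCloseable {
        @Override
        public void close() {
            log.add("close");
            throw new IllegalStateException("close failed");
        }
    }

    static void body(boolean fail) {
        log.add("body");
        if (fail) {
            throw new IllegalArgumentException("body failed");
        }
    }

    // What the programmer writes.
    static void fun1(boolean fail) {
        try (Resource r = new Resource()) {
            body(fail);
        }
    }

    // Hand-written expansion: close in finally, and attach a failure of
    // close() to the primary exception as a suppressed exception.
    static void fun2(boolean fail) {
        final Resource r = new Resource();
        Throwable primary = null;
        try {
            body(fail);
        } catch (Throwable t) {
            primary = t;
            throw t;
        } finally {
            if (r != null) {
                if (primary != null) {
                    try {
                        r.close();
                    } catch (Throwable suppressed) {
                        primary.addSuppressed(suppressed);
                    }
                } else {
                    r.close();
                }
            }
        }
    }

    // Summarizes what a call did: exception type and number of suppressed exceptions.
    static String outcome(Runnable r) {
        try {
            r.run();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName() + "/" + t.getSuppressed().length;
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(() -> fun1(true)) + " vs " + outcome(() -> fun2(true)));
        System.out.println(outcome(() -> fun1(false)) + " vs " + outcome(() -> fun2(false)));
    }
}
```

The null checks in fun2 are the dead code mentioned in the talk: r was just assigned from a constructor call, so it can never be null here.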
So I skip a lot of fun stuff about Kotlin. You can find this presentation on the JaCoCo site, so don't worry: you go home, you watch the recording on YouTube about Kotlin, and everything will be okay.
Conclusion. Unfortunately, source code is not bytecode. This is very important. This also means that what you wrote in the source code is not at all what is going to be executed, because what is going to be executed is not even bytecode. Unfortunately, what is going to be executed is JIT-compiled bytecode. Even this is not true, because there is branch prediction, et cetera, et cetera. So be careful: what you write is not what is actually executed.
Unfortunately, this also means that bytecode-based tools can only guess. All of this is implementation details. There is no magical way to map the bytecode back to the source code. It's an unsolvable task, as we've seen, because the same bytecode, as we've seen with this try-with-resources, can easily be the result of compiling two different source codes. How do we decide? We can only guess. In JaCoCo in particular we assume that no one writes such crazy source code, because it's big. So we assume that it's always try-with-resources. If someone really writes try-with-resources the way the Java Language Specification spells it out, we will filter this out in the code coverage report, assuming that this is try-with-resources. So please bear with us.
We all, we as developers of the tools, you as developers using bytecode analysis tools, have to live with this. We have to write filters. Now that you know all the internals of JaCoCo filtering, please bear with us, and don't scream when the code coverage report shows you something that you do not expect. Please just decompile and see what is actually shown to you. Thank you very much, that's it. If you have any questions, let's just go out and you can ask all the questions.