Hello and welcome. Johannes Bechberger will tell us about open source Java profilers. He likes baking, cooking, riding his bike, and open source development. I wish you much enjoyment. It's the wrong title, I just found out; it's a single thing I missed. Yes, I'm talking about unleashing the power of systems that break. So, without further ado, please turn down the lights. Originally I wanted to talk to you about this topic: Nightmares of a profiler developer — Johannes Bechberger's nightmarish tales on Java profiling APIs. But it turned out I didn't want to be that gloomy, so we start with an introductory example. You all probably know the situation when you're in a students' union or somewhere else and they ask you: aren't you a Java developer? And you're like, yeah, I can do some Java. And can you also do some web development? And I'm like, yeah. That's how it happened to me at the Night of Sciences this year, when they asked me: hey, couldn't we have a web app where people can just answer questions? And I was like, cool, I could do this. It's so people can answer questions like "how old is the universe" during the Night of Sciences, which is a little bit like the Gulaschprogrammiernacht but fairly smaller, where professors tell people about their topics in short talks. And we wanted to be GDPR compliant, so we didn't want to use any large platform like Google Forms, because yeah, that's a nightmare. So I just wrote a simple application. It has a client, it has an admin interface, and it has a server, and I thought, hey, the server could just store all the information in a JSON file. How it works is that the client regularly asks the server: hey, do you have a question? If yes, cool, I present it to the user; if not, then no. And I thought, yeah, using a database is total overkill, because we've all been told premature optimization is the root of all evil.
The problem was: the event came, and any guesses what happened to this tiny little application with 90 users coming to the website? Any guesses? Yes, it fell over, of course. It failed miserably, and you probably know the feeling when you're on your laptop at an event, hacking away, restarting the server, trying to hot-fix things, and it doesn't work, it breaks, and you're like: no, what have I done the last 10 years? What have I learned in my studies? Because I had missed the whole quote of Don Knuth, and many other people do too. He wrote: we should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. And he further told us: yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning; he will be wise to look carefully at the critical code, but only after that code has been identified. So how do we do this? Profilers to the rescue. When people on the street ask me: what are you doing at work? I say, I develop profilers, and they're like: profilers? People think of Sherlock Holmes, who looks into crime scenes — or of Detective Heinrich, for the people that grew up in Karlsruhe and remember this figure from the university film club who looks into the kinds of mysterious things we're all surrounded by. Essentially, they look at crime scenes and work out how to solve them. And that's what we're doing here too, with profilers. So consider on top a pseudocode version of my application, and below a flame graph that builds up. It helps you to see how many times a method is called, or how much time it takes compared to its caller methods. So for example, here the main method takes all the time, of course, because it's the main method. With no surprise it has a server loop, because it's a question server.
And the server loop calls a method handleQuestionRequest, because it works by just looping over all the requests that come in and handling them, and as client requests are much more likely than admin requests, most of the time is dominated by this method. But of course there are also other methods, and they take some time. And how does the question request work? It just asks: is the current question enabled? If yes, emit the current question; if not, emit something else — et voilà. Any guesses what both these methods spend most of their time computing? Any guesses here? Yes? What? Yeah, essentially they parse JSON, and that was the great revelation. To quote Mario Fusco: I love flame graphs. When you do something really stupid, it punches you in the face and you cannot not see it. So I replaced all the JSON persistence with a proper database, in this case SQLite — because using something like MariaDB would really be overengineering, but SQLite is nice, it's small, you can use it easily — and it worked. And the funny thing is, I developed this application during Corona, for a Corona-era event, and now Corona is over, so I can never use this application again. But maybe some other pandemic comes around. So what do you see here? Profiling is a good part of your toolbox of methodologies. It's like debugging, it's like testing and so on. You should just put it in, and then when you have a performance problem you can say: hey, I know some profiling, let's look at that. So, who am I? I'm Johannes Bechberger. I sometimes talk about profilers, for example in London. I work in the SapMachine team — we have stickers for later if you want one. It's a team at SAP with around 15 people where we work on OpenJDK, and I work specifically on profilers: on profiler frontends, on profilers themselves, and also on profiling APIs.
And that's one of the reasons why I'm standing in front of you, and why I initially wanted to name this talk Johannes Bechberger's tales of a profiling API developer: because I know a lot of profiling APIs; I worked on them. So what is a profile, actually? I think the question was asked in the beginning. To quote the New Hacker's Dictionary: a profile, so the thing that we obtain with a profiler, is essentially a list of methods with their runtimes, so we can tune away the hotspots. And that's quite cool. We have, of course, different profilers, and they have advantages and disadvantages — and that's the first lesson of why you shouldn't trust a profiler. Take, for example, an instrumenting profiler. How an instrumenting profiler essentially works is that it inserts instructions into your code automatically at runtime. For example, here we have our server loop. It works by inserting at the beginning: hey, we're entering the serverLoop method — and at the end: we're exiting this method. That's quite nice; it works properly and it's easy to understand. The main problem is: JITs don't like it, and inlining doesn't like it. Essentially, you're not measuring the system that you actually want to measure, but a different system. So people came up with a different idea: sampling profilers. The idea here is that we only sample what the application is doing. For example, we regularly ask the JVM: hey, what is the application doing right now? And that's pretty cool. It's approximate, but that's usually fine, because we only care about the hotspots in the application. And we don't disturb the application, because we're not actually modifying any code here. And of course, before I can tell you the bad parts of profiling, I want to tell you what a profiler is. And for this, it's most important to really know how to write a profiler, because in the beginning I also thought profilers are rocket science — these large, complicated systems.
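The instrumentation idea above can be sketched in plain Java (a minimal sketch: the Profiler bookkeeping class and the hand-instrumented serverLoop are my own illustrations — a real instrumenting profiler injects these enter/exit calls into the bytecode at runtime instead of you writing them):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical bookkeeping that an instrumenting profiler would inject calls into.
class Profiler {
    static final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    static void enter(String method) {
        calls.computeIfAbsent(method, m -> new LongAdder()).increment();
    }

    static void exit(String method) {
        // a real profiler would record the elapsed time here
    }
}

public class Instrumented {
    // What serverLoop looks like after the instrumentation has been inserted.
    static void serverLoop() {
        Profiler.enter("serverLoop");
        try {
            // ... handle requests ...
        } finally {
            Profiler.exit("serverLoop");
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) serverLoop();
        System.out.println(Profiler.calls.get("serverLoop").sum()); // 3
    }
}
```

The extra calls and the try/finally wrapper are exactly what disturbs the JIT and inlining: the measured method is no longer the method you shipped.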
And then, with some preconditions, I wrote a blog post about it. It got on Hacker News and people liked it, so now I give talks on this. So essentially, how my profiler works: it's a Java agent. You attach it to the application at runtime. It consists of a main class, of course, and this starts the profiler. And this profiler, as a sampling profiler, just asks the JVM: hey, what are you doing? — gets the stack traces, stores them in some kind of store class, sleeps, and does this in a loop. So to start with the main class: what we essentially do is attach it at the start of the application run via -javaagent. And that's quite cool, and we can also pass it some arguments. The question is now: how do we create such an entry method? Because we have different main methods. We have the normal main method for normal applications — Java is a bit weird, because shorter forms are also valid Java, and in the upcoming JDK 21 you can write even less — but for an agent this really doesn't cut it. What we have here are two different methods, because we can attach a profiler at the beginning, as you saw above, but we can also attach it later. So we can profile an application only after, say, 10 minutes, because we think all the setup code doesn't matter anyway. When we attach it later, the method agentmain is called; when we attach it at the beginning, the method premain is called. And so the main class is pretty simple: it just creates a new profiler, creates a new thread for this profiler, and starts it. What's really important: please give your threads names. And please give everything that can be given a name — like class loaders — a name. It makes debugging so much easier. And yeah, that's the profiler. It's pretty simple.
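The two agent entry points can be sketched like this (a minimal sketch; the class name, the thread name, and the placeholder sampling loop are my own — a real agent jar additionally needs Premain-Class/Agent-Class entries in its manifest):

```java
import java.lang.instrument.Instrumentation;

public class AgentMain {
    static Thread sampler; // kept around so we can inspect or stop it

    // Called when attached at startup: java -javaagent:profiler.jar=<args> ...
    public static void premain(String args, Instrumentation inst) {
        start(args);
    }

    // Called when attached to an already running JVM via the attach API.
    public static void agentmain(String args, Instrumentation inst) {
        start(args);
    }

    private static void start(String args) {
        sampler = new Thread(AgentMain::sampleLoop);
        sampler.setName("profiler-sampler"); // always name your threads!
        sampler.setDaemon(true);             // don't keep the JVM alive
        sampler.start();
    }

    private static void sampleLoop() {
        while (true) {
            // take stack traces here, store them, then wait for the next tick
            try { Thread.sleep(20); } catch (InterruptedException e) { return; }
        }
    }
}
```

Making the thread a named daemon means it shows up readably in thread dumps and never blocks application shutdown.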
The only thing that you have to care about is that we want to print the profile at the end, using a shutdown hook. And then the profiling loop is just this: we ask the JVM for all stack traces, we store them, and yay, that's a profiler. And what is contained in these stack traces? Essentially there's a bunch of information in each frame — a frame is essentially a method call. It tells us where in the method we currently are, like the line number; in which file the class that the method is defined in lives; and also the method name and the class name. And so we store it all in a store, and that's also not too hard: we have a store with some methods. But essentially the most important part is: how do we get from all these stack traces that we collected — for example these, overlaid over the flame graph as we collected them — to the flame graph? Because a flame graph is a pretty simple visualization. So we just pick one out. We know that at some point in the application there was a call from main, then serverLoop, then handleQuestionRequest, then currentQuestion, and parseJson — this is what the JVM was executing at that moment. So we take this and we think: hey, we can just make a tree data structure, because computer scientists love trees. So we start with a main node at the bottom and say: oh, we found this main node once, that's cool. We also know that the main method called the server loop, and that it called handleQuestionRequest, and so on. That's the first sample. Then we get another one — because, again, we sample — and we hit the same path again. We add it in and record: hey, I saw the main method again, now it's counted twice. And I saw that serverLoop was called again, and so on. So we mark it down. And now we get a different one that's missing parseJson on top.
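The sampling loop plus shutdown hook can be sketched like this (a minimal sketch; the simple list acting as the store class is my own stand-in for whatever storage you use):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Sampler {
    // Stand-in store: just keep every sampled trace.
    static final List<StackTraceElement[]> store = new ArrayList<>();

    static synchronized void sample() {
        // Ask the JVM what every thread is doing right now.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getValue().length > 0) {
                store.add(e.getValue());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Print the collected profile when the JVM exits.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> System.out.println("collected " + store.size() + " traces"),
                       "profiler-printer"));
        for (int i = 0; i < 5; i++) { // sample 5 times, 20 ms apart
            sample();
            Thread.sleep(20);
        }
        // Each StackTraceElement carries class name, method name, file name and line number.
        StackTraceElement top = store.get(0)[0];
        System.out.println(top.getClassName() + "#" + top.getMethodName()
                           + " (" + top.getFileName() + ":" + top.getLineNumber() + ")");
    }
}
```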
So essentially we do the same, but now parseJson doesn't get incremented to three, because it wasn't called. More interesting: here we have a new branch, and this just means we add a new branch in the tree, too. And the cool thing is, we can turn this quite easily into a flame graph. We know we have at most four time slots: the main method takes four time units, serverLoop and handleQuestionRequest and currentQuestion take three, isQuestionEnabled takes just one, and parseJson sits above. That's essentially what flame graphs are. And then with some magic we turn this tree into the flame graph. And that's the cool thing: profilers are pretty simple, you can write your own. In Milan, after I gave this talk, someone even told me: hey, I want to use this in production — in our system we can't attach a profiler, but I can essentially copy your code into my application and run it. I don't know; it's a prototype, but you can play around with it. But that's of course not the whole reality. In reality, we have different sampling profilers. Yes, we have instrumentation-based profilers, but they are rubbish, as I told you before: they're not profiling the thing that you want to profile, your application. So essentially we have external sampling profilers and built-in ones. Built-in ones are directly built into your JVM. On the external side we have VisualVM and NetBeans, which are the older ones. And then Sun decided to create the Forte Analyzer in 1992, which became a profiling tool for many languages in the Sun ecosystem. And then they added a library function called AsyncGetCallTrace in 2002 — and removed it from the exported API three months later. And of course, this is the API that everyone depends on, especially async-profiler. I'll tell you later why this is a bad idea.
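The tree building described above can be sketched like this (a minimal sketch; the output is the "collapsed stack" format that flame graph tools such as Brendan Gregg's flamegraph.pl consume, and the sample data mirrors the talk's four time slots):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CallTree {
    final String method;
    long samples;                                   // how often this node was seen
    final Map<String, CallTree> children = new LinkedHashMap<>();

    CallTree(String method) { this.method = method; }

    // Insert one sampled trace, ordered bottom (caller of root's children) to top (leaf).
    void add(List<String> trace) {
        CallTree node = this;
        node.samples++;
        for (String frame : trace) {
            node = node.children.computeIfAbsent(frame, CallTree::new);
            node.samples++;
        }
    }

    // Emit "main;serverLoop;...;leaf count" lines for samples spent directly in a method.
    void collapse(String prefix, StringBuilder out) {
        String path = prefix.isEmpty() ? method : prefix + ";" + method;
        long childSamples = children.values().stream().mapToLong(c -> c.samples).sum();
        if (samples > childSamples) {
            out.append(path).append(' ').append(samples - childSamples).append('\n');
        }
        for (CallTree child : children.values()) child.collapse(path, out);
    }

    public static void main(String[] args) {
        CallTree root = new CallTree("main");
        root.add(List.of("serverLoop", "handleQuestionRequest", "currentQuestion", "parseJson"));
        root.add(List.of("serverLoop", "handleQuestionRequest", "currentQuestion", "parseJson"));
        root.add(List.of("serverLoop", "handleQuestionRequest", "currentQuestion"));
        root.add(List.of("serverLoop", "handleQuestionRequest", "isQuestionEnabled"));
        StringBuilder out = new StringBuilder();
        root.collapse("", out);
        System.out.print(out); // feed these lines to flamegraph.pl to render the graph
    }
}
```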
And there's, of course, the built-in one called JDK Flight Recorder, which works similarly-ish to async-profiler; they share some internal ways of working. And essentially all the modern profilers that you can buy are a mixture of JFR and async-profiler. So how do you obtain a profile? Bear with me, I'm coming to the juicy bits soon — this is just so you know how to use them, so you come away from the talk with at least something. async-profiler you can just attach: it's a native agent, so you just tell the JVM -agentpath:..., please start it, and please also print a flame graph. With JDK Flight Recorder it's similar, but it's built-in, so you can just tell the JVM: hey, please do it. But should you trust them? And that's probably the reason why you came here. I think: no. It's a nightmare. I stared down into the abyss, and I'm here, one of the survivors, to tell you: it's really horrible. Don't trust your profilers — they are just tools. For one thing, we have safepoints. Modern profilers don't suffer from this, but some of them do, like VisualVM. The problem is that some profilers only profile at safepoints. A safepoint means that at certain points of a program's execution, a thread asks: hey, should I go into a safepoint? If yes, it stops, and all threads stop. That's cool, because then we are in a defined state, we have less concurrency, the JVM can do things like garbage collection more properly — it's really safe and everything. The main problem is that this only happens at some points in the program, and there aren't that many of them; especially if you have large inlined loops and such, it can take quite a while to get there. So that's not great. As a visualization, I take here the SapMachine logo, and every time it gets bold and dark, it checks for a safepoint. So essentially what you're doing here is using safepoints to profile.
We're telling the JVM: please, please, please stop when you're ready, and then we wait, and at some point in the future it comes to a safepoint. But when we go asynchronously, which is what we actually want, we just ask the JVM to stop, and it stops, and that's quite fine. It's even more important when we have multiple threads — and some of the applications that we typically write have multiple threads. Because when we only want the stack trace of thread one, we'd have to wait till all threads are at a safepoint, which is quite nasty. When we're fully asynchronous, we can just tell thread one to stop, and the other threads don't care. Now there are different APIs, but one of the most used APIs that doesn't have the safepoint bias is AsyncGetCallTrace. In a nutshell: you have your stack in your operating system, it has some Java methods on it and some C++ methods on it, and AsyncGetCallTrace essentially walks your stack and returns the values. But any guesses how many tests there are in the OpenJDK for this API? Any guesses, any numbers? It's well used — it's used by large companies, and yes, it's not really officially in there, you can only access it with dlsym, but it's well used. Any guesses, any numbers? No, that's too pessimistic. That's too optimistic — who would test their application that much? Yes, one. That's the right answer. And is it a good one? No, it isn't a good one. Thanks — that's the best-prepared audience I've had today. It's my only talk today, though. So essentially what this test does, in a nutshell: in the main method it asks, if not check(), then please throw — and then it fails. So essentially it boils down to the check method. And of course, what did the check method test for? Only that the check method is called — not that all the methods above it are there too.
So take profiles with a grain of salt. For example, this example I showed you is not really complete, because the test also uses the jtreg framework, which uses some reflection. So the actual test that we were running before looks like this: the main method gets access to a test method via reflection and then calls this test method. The test method essentially calls the javaLoop method, which just loops. And what we wanted to see in the flame graph is: at the bottom the main method, then some reflection stuff, then the test method, and then the javaLoop method. Any guesses what we actually got before I fixed the bug? No, that's not right. Essentially what we got was just the first three frames, and that's horrible — but our test in the OpenJDK, before I fixed it, only tested for the topmost frame up there, the javaLoop method. And this is horrible, because this bug was in there for three years. It came in because there were some changes in another part of the JVM that caused this change in behavior. That's not great, and the problem is: the applications that you write daily aren't tested as well as you want them to be, and that's the same with profilers. But surely the people at IBM are doing it better — so, any guesses how many tests OpenJ9 has for its implementation? Zero, of course, because it couldn't be better, could it? IBM people implemented it, but they're currently contemplating just removing it, because there are other nightmarish things about it. And then there's the synchronous version, the one at safepoints — and there the answer is... yes, that's correct. So you see, profiling APIs are really, really well tested. And of course, as I told you before, safepoint bias is bad. The problem is: profilers are really just software. Don't treat them differently. Don't treat any of the tools you use differently. Don't treat compilers differently, don't treat IDEs differently. They are just software.
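The fix boils down to checking the whole expected chain of frames, not just the topmost one. A minimal sketch of that idea in plain Java, using the in-JVM getStackTrace (all names are my own; the real jtreg test checks the native AsyncGetCallTrace output):

```java
import java.util.Arrays;
import java.util.List;

public class FullTraceCheck {
    static StackTraceElement[] captured;

    static void javaLoop() {
        // Capture what the JVM reports for the current thread.
        captured = Thread.currentThread().getStackTrace();
    }

    static void test() { javaLoop(); }

    // Check that all expected methods appear in the trace, in order,
    // instead of only looking at the topmost frame.
    static boolean containsInOrder(StackTraceElement[] trace, List<String> expected) {
        int next = 0;
        for (StackTraceElement frame : trace) {
            if (next < expected.size() && frame.getMethodName().equals(expected.get(next))) {
                next++;
            }
        }
        return next == expected.size();
    }

    public static void main(String[] args) {
        test();
        // Top of the trace first: javaLoop, called by test, called by main.
        List<String> expected = Arrays.asList("javaLoop", "test", "main");
        if (!containsInOrder(captured, expected)) {
            throw new AssertionError("incomplete stack trace: " + Arrays.toString(captured));
        }
        System.out.println("all expected frames found");
    }
}
```

A test shaped like this would have caught the truncated-trace bug the moment the behavior changed, instead of three years later.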
So what should be better? Tests should be better. Performance and accuracy could be better, because there are some problems in this area. They could be safer: they shouldn't cause segmentation faults, and there are currently some problems there, especially because what AsyncGetCallTrace currently returns are jmethodIDs, and they are pretty bad: there is no way to see whether a jmethodID is still valid. Classes can be unloaded in Java, and when a class isn't there anymore and you access its jmethodID, you can crash — and that's not that great. The API just wasn't designed with safety in mind. When I started working last year in the OpenJDK team at SAP, I wanted to wrap the whole AsyncGetCallTrace API in the JVM's thread crash protection, so it protects us against any crashes. I wrote about it in a post, and then I got an answer from Markus Grönlund, who is quite big in this community: I should add that the crash protection mechanism was mainly put in place as a result of having to deliver JFR from JRockit into HotSpot under a deadline, upholding feature parity. The stack-walking code was in really bad shape back then. And I would add: the bad shape didn't change through the ages. And David Holmes, another person active in the community, also wrote: AsyncGetCallTrace is a legacy mechanism that was created for one single unsupported purpose. And it's used by everyone. That's cool, isn't it? So, you know the term test-driven development? Ta-da!
I've even written it down so you can remember it if you haven't heard about it. But unlike test-driven development, this is RDD: rage-driven development. I'm so enraged by all this bad quality — yes, there are some constraints, but I'm enraged by the missing tests — that I want to squash all these bugs, all these moths, and really, really crush them. So it's rage-driven development: I'm currently working on a new profiling API called AsyncGetStackTrace that has tests, a lot of them, and that is properly specified and supported, because I really dislike the current state of the ecosystem. And I want to come from this place, where I'm just looking down into the abyss and thinking "oh, that's high", to this, where I can look at all the nice architecture of this building. That's actually a tower that I visited last weekend: a 40-meter-high tower built only out of wood and some metal, near the airport. If you have the chance to go there — it's a really, really cool tower, and it's cool to use it here in the presentation. So how can we test? Of course we could write some unit tests, but automated tests are better, because with hand-written unit tests you're always missing things. So what I introduced was the concept of a layered oracle. Essentially, at the bottom we have to have some ground truth, and our ground truth is instrumentation: we create an instrumentation-based profile, because this is ground truth — the Java application really executed these lines, and we know it because we inserted the instrumentation, as shown before. Then we can use this to verify that GetStackTrace is correct; then we can use GetStackTrace to ask whether AsyncGetStackTrace is correct at a safepoint — because GetStackTrace is only valid at a safepoint — and then, on top, we can check whether AsyncGetCallTrace is correct. I implemented this and found some interesting cases. It turns out GetStackTrace and AsyncGetStackTrace are pretty safe and pretty
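The bottom layer of that oracle can be sketched in plain Java (a minimal sketch under my own naming: hand-inserted "instrumentation" records the ground-truth call path, which is then compared against what getStackTrace reports; the real setup does this comparison across the native APIs):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class LayeredOracle {
    // Ground truth: the call path recorded by inserted instrumentation.
    static final Deque<String> instrumented = new ArrayDeque<>();

    static void a() { instrumented.push("a"); b(); instrumented.pop(); }
    static void b() { instrumented.push("b"); check(); instrumented.pop(); }

    // Compare the instrumentation ground truth with the JVM's getStackTrace view.
    static void check() {
        List<String> truth = List.copyOf(instrumented);       // top first: [b, a]
        StackTraceElement[] reported = Thread.currentThread().getStackTrace();
        int next = 0;
        for (StackTraceElement frame : reported) {            // skip unrelated frames
            if (next < truth.size() && frame.getMethodName().equals(truth.get(next))) {
                next++;
            }
        }
        if (next != truth.size()) {
            throw new AssertionError("getStackTrace disagrees with ground truth " + truth);
        }
    }

    public static void main(String[] args) {
        a();
        System.out.println("getStackTrace matches the instrumented ground truth");
    }
}
```

Once this layer holds, the verified API becomes the oracle for the next layer up, and so on to AsyncGetCallTrace.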
accurate — just that any change in the OpenJDK that we make might change this and might even break the API, so my API might not work the same on your system, and that's pretty terrible. Then there's also the approach with which I found a few bugs: fuzzing. Considering the memory space, I just pick random addresses in the application's memory and call AsyncGetStackTrace with them. The idea is that I'm just throwing random data at it and hoping that it doesn't crash. And it works — I ran it for a long time and it didn't crash, and that's cool. So I'm really testing that it doesn't crash, so people can use it in an application. And to deal with these kinds of issues, I use an experimentation technique that I sneakily stole from the life sciences. Essentially, what we're doing is a profiling loop. In the beginning we have our mental model — for example, for my application, in the beginning I have the mental model: okay, this is the application, I know some information about it, I know roughly how it works. Then I make a hypothesis; in my case: the application is slow because I'm parsing a lot of JSON — that might be the case. Then I adapt the model until I can really express my hypothesis, and when I have a working hypothesis I go on to the evaluation: I check whether it's really the JSON parsing — for example, I use a profiler — and see whether that's what it is. And from what I learn I can go back to the hypothesis and refine it, or even go back to the model if I learned something new about it. So in the end, I want you to know: profilers are great, people are doing lots of good work in this area, but just don't trust them too much. Use them, but accept that they might sometimes have bugs, and if you find one, please write a bug report to your fellow OpenJDK developers. It's just open source — many of these tools are, and the underlying APIs essentially are too. So I'm parttimenerd on Twitter, and on
GitHub, where you can find me. You can read a blog post from me every two weeks on these topics, where I go down rabbit holes. And my team is at SapMachine. So that was my talk — thanks for being here. Thank you. Do you have any questions? Yes: in terms of the OpenJDK, are there problems with other languages? For example, are the stack traces or the flame graphs still as readable as when just profiling Java, as compared to Kotlin or Scala or whatever? With Kotlin, yes, it's as readable, because Kotlin is essentially a thin layer over Java. With Scala, of course, Scala has to go through a lot of hoops to get to the language that it is, so the bytecode is more complicated. But profiler developers who work on the profiling UIs can mitigate this: they can usually extract all the information if they properly support Scala. It's far harder, of course, because the mapping from bytecode to source code is far harder in Scala than in Java, since Scala introduces lots of new functions and lots of new frames. But I think it's still feasible — though in the end, profiling Scala code isn't really the point; I thought Scala was only used where performance doesn't matter, because otherwise you'd use low-level Java code. Other questions? None? Thank you really much. If someone wants some stickers: stickers, stickers! So have a nice day and a nice week. Please, another round of applause.