Hi, and welcome to this talk on diagnosing issues in Java applications using Thermostat and Byteman. I'm Sivillian Gibov. I'm a senior software engineer with Red Hat. I mostly work on serviceability tools and OpenJDK itself. Here's what we're going to cover today. First, we're going to introduce a demo application, because this will be a case study of finding an issue in that application. We'll look at this demo application in Thermostat, then give a brief intro to Byteman, and then we'll look at the source code of that demo application to understand the problem better when we examine it with Thermostat and Byteman together. Finally, we conclude with a demo. So this is the demo application. You can download it at that URL. It's a simple Swing application. You can schedule fast and slow tasks, and we're going to figure out why there's a difference between those two kinds of tasks. Sorry that this is cut off. So the problem here is that the task execution times vary greatly. We know there are fast and slow tasks, but as we will see, when we do profiling runs we only get aggregate results: we see that the computation is stuck in this black-box method, but we don't really know where the calls come from. This is where we thought Byteman would help, giving us a better understanding and really letting us figure out why there are slow and fast tasks. This example is based on a real customer case. It's of course simplified, but we did this analysis within Red Hat, so this isn't something totally artificial. So what if we look at the demo application with Thermostat? Well, what's Thermostat? Maybe not everybody knows what Thermostat is. It's a serviceability tool for OpenJDK and HotSpot JVMs. You can extract metrics from the JVM itself and visualize them in various ways. You can run Thermostat locally and remotely. So what if we look at the demo application in Thermostat?
So here we can see there's a view that lets you see the host information and the JVM information. If we look at the CPU graphs with the demo application already running, we can see that we've scheduled two fast tasks and one slow task. There's CPU consumption going on in the JVM, and you see that propagating through to the host. What about the GC cycles happening when we run that application? Here we can see that when the tasks run, again two fast tasks and one slow task, there are a few GC cycles. When the slow task is running there are a few more, but overall they don't seem to promote those objects into the old generation; they seem to die young. What if we look at threads? This demo application, as we'll see later, uses a class Task, and a Task is a Thread. So each time we create a new task, it creates a new thread. We see that nicely visualized in the thread view of Thermostat. We already knew that there are slow and fast tasks, but here we get some evidence that this is really the case. You see nicely that a thread is created, and once the task finished, it got destroyed. So let's try to find out if we can really see where the problem is. The first attempt is using Thermostat as a profiler. We do two profiling sessions. On the left, we see a profiling session of the fast task. On the right, we see a profiling session of the slow task. But it really only shows that time is spent in those two methods; we don't know why there's a difference between the fast and the slow task. The only difference is what we already knew: the absolute times differ. So to conclude, Thermostat alone can help with understanding the problem better, getting some evidence, and narrowing down what might actually be causing the problem.
Here we seem to have a CPU-bound problem, and when we do the profiling, we see there's no real difference between the fast and the slow task. What now? We thought we might need a tool where we can drill down on a specific problem, in this case drilling down on certain method calls. We already know time is spent in this and that class and this and that method, but we need more detail. So we were thinking: can Byteman help? What's Byteman, actually? Byteman is an instrumentation tool. It does bytecode transformations and is implemented as a Java agent. The rule files are specified in a domain-specific language; they tell the agent which classes and which methods it should transform. So you can be fairly selective, via those rule files, about which classes should be changed and how. It's really nice that you can dynamically load and unload rules, so when you want to inspect or dive into a specific problem, you can inject a rule while your application is running. You don't need to recompile your program or anything. You just have to attach the agent and load a rule file, do your analysis, and later unload the rule, and the bytecode gets re-transformed back to the original version. That's quite useful. So what does actual usage of Byteman look like? You'd have to know about the -javaagent switch and figure out the Byteman jar and script, and probably you have to specify a bunch of properties. In this case, the transform.all property instructs Byteman to also transform JDK classes. And if you want to run it on an already-running JVM, you have to know the PID, use the install script, use two invocations, and that sort of thing. It's nice, but we were thinking: could we do better? Because Thermostat already knows the PIDs of the JVMs, it can figure out the PID and load rules for Byteman, so that it's actually easier to use. So the thinking was: let's combine the two.
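For reference, the manual steps just described might look roughly like this. The jar path, script name, main class, and PID are illustrative placeholders, not taken from the talk:

```sh
# Load Byteman at JVM startup (paths and names are placeholders):
java -javaagent:byteman.jar=script:rules.btm,boot:byteman.jar \
     -Dorg.jboss.byteman.transform.all MainClass

# Or attach to an already-running JVM by PID, then submit rules:
bminstall.sh -b -Dorg.jboss.byteman.transform.all 12345
bmsubmit.sh rules.btm      # load the rules
bmsubmit.sh -u rules.btm   # unload; bytecode reverts to the original
```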
We were thinking that in specific cases, as we've seen before, the profiling of that demo application was inconclusive. You need more information to really figure out what's causing the slowness of the slow task and why the fast tasks run faster. Thermostat can help there to drive Byteman: you can just select the JVM, as we will see in the demo later, and then load the rules into that JVM. You don't have to know the PIDs and so on. And what's more, you can do your analysis, extract some metrics ad hoc using Byteman rule files, and then visualize them using Thermostat's means. We've implemented that as a Thermostat plugin, and it's extensible. So, on to the demo. Well, almost. We need to look at the classes first, because the rule files are tightly coupled to the source code of your application. We've already determined with the profiler where most of the time is being spent: in this compute-intensive method. Let's look at the class Demo first. It has these getFastTask and getSlowTask methods, and the only difference is the input values: for the fast task the spread is one, and for the slow task the spread is five, and they both average at 40. If you look at the class Task, it has this black-box computeIntensive method, and doWork basically delegates to that. But our suspicion is: we're calling computeIntensive from this doWork method, so is it actually that call that's causing the performance problem, or is it some call to computeIntensive somewhere else? So we decided to instrument Task's doWork and ioWait methods. Here, with the CLASS specifier, we specify the class, and then the method doWork, and we do some things when that method is entered. We extract a couple of variables, we generate an ID, and we capture the input argument using the $1 syntax.
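Reconstructed from the description above, the demo classes might look roughly like this. The body of computeIntensive is an assumption; the talk never shows it, so a naive Fibonacci recursion stands in for whatever the real black box does:

```java
import java.util.Random;

// Sketch of the demo's classes as described in the talk; details are guesses.
class Task extends Thread {              // each scheduled task is its own thread
    private final int input;

    Task(int input) { this.input = input; }

    @Override
    public void run() { doWork(input); }

    long doWork(int n) {
        ioWait();                        // constant-time I/O simulation
        return computeIntensive(n);      // the suspected culprit
    }

    static void ioWait() {               // roughly 500 ms per call in the demo
        try { Thread.sleep(500); } catch (InterruptedException ignored) { }
    }

    // Assumed body: any naive doubly recursive function (exponential in n)
    // reproduces the behaviour seen in the charts.
    static long computeIntensive(int n) {
        return n <= 1 ? n : computeIntensive(n - 1) + computeIntensive(n - 2);
    }
}

class Demo {
    private static final Random rnd = new Random();

    // Fast tasks: inputs average 40 with a spread of 1 (39..41).
    static Task getFastTask() { return new Task(39 + rnd.nextInt(3)); }

    // Slow tasks: inputs average 40 with a spread of 5 (35..45).
    static Task getSlowTask() { return new Task(35 + rnd.nextInt(11)); }
}
```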
And then the actual heavy lifting happens here, where we send data to Thermostat. That send method is implemented by the Thermostat helper: you can extend Byteman with helpers, and we've written this helper that lets you send data to Thermostat. Here we're sending a metric called work. It has a symbolic value, transition, which we set to call; we set the variable input to the actual captured input; and we give it an ID in order to know which call was actually made, because as we've seen before there are three calls each to doWork and ioWait. The next rule is similar, but at exit: we do some things when doWork exits. We extract a counter, something built into Byteman itself, and here we can get the elapsed time from a timer we've set up in a different rule. Again, we're sending a metric back to Thermostat using this mechanism of an object array; it's really a hash map underneath. So we send a transition of return, because we're returning, along with the elapsed time. At entry we reset the timer for this invocation, and at return we gather the time that elapsed. So let's move on to the demo. I've got it pre-recorded due to time constraints. OK, sorry about that. So I start out here. Well, you can't really read it, can you? I'm really sorry; I have this screencast available online. Even though you can't really read it, it should be OK for the chart demo. So I've loaded the rules here; they're all available online. I've selected the demo JVM up there, and I inject the rule into it by clicking a button. At first I don't see any metrics, but once I execute the demo, they come into Thermostat, and you see those metrics as we've seen before: there are wait metrics and work metrics, in that column. I run the fast task once and the slow task once so that I get metrics for both of them.
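Put together, the two rules just described might look roughly like this in Byteman's rule language. The helper class name and the exact send() signature are assumptions (they come from the Thermostat plugin, which isn't shown in full in the talk); resetTimer, getElapsedTimeFromTimer, incrementCounter, and readCounter are standard Byteman built-ins, available as long as the custom helper extends Byteman's default Helper:

```
# Sketch; helper class name and send() signature are assumed.
RULE doWork entry
CLASS Task
METHOD doWork
HELPER thermostat.byteman.ThermostatHelper
AT ENTRY
BIND id = incrementCounter("doWork"),
     input = $1
IF true
DO resetTimer("doWork-" + id);
   send("work", new Object[] { "transition", "call",
                               "input", input, "id", id })
ENDRULE

RULE doWork exit
CLASS Task
METHOD doWork
HELPER thermostat.byteman.ThermostatHelper
AT EXIT
BIND id = readCounter("doWork"),
     elapsed = getElapsedTimeFromTimer("doWork-" + id)
IF true
DO send("work", new Object[] { "transition", "return",
                               "elapsed", elapsed, "id", id })
ENDRULE
```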
And once that finishes, I can select the metrics I'm interested in; maybe just the ones that have an input field or an ID field. But what I'm really interested in is: what does the runtime look like for the fast task and the slow task? If I switch to the graphs tab, I can visualize the metrics in a state-transition graph, where you can see the transitions from call to return over time. Here I select the timestamp as the x-axis and the transition value as the y-axis, and filter by metrics that have the marker work. Then I see those nice transitions. For the fast task it looks almost linear, but for the slow task, the first call returns pretty much immediately, the second call takes longer, and the third call takes by far the most time. We see that overall the third call seems to be dominant, and it indeed comes from this call in doWork. If we contrast that with the same chart for ioWait, because we want to make sure it's not the ioWait calls that are influencing the runtime, we see that it's pretty much constant: three invocations, each taking a constant amount of time, whether for the slow task or the fast task. There is no difference there. So right here we have the evidence we wanted: it's not ioWait, we can rule that out, and it's really the input values that influence the runtime. If we visualize them, for the slow task on the left and the fast task on the right, we see input values from 45 down to 35 and from 41 down to 39. And if we contrast that with the elapsed time we've captured with the rule, we see an interesting curve: the algorithm of computeIntensive seems to be exponential here. It's not as apparent for the fast task, which looks almost linear, but for the slow task we see that the third call takes the most time by far. Contrasting that, again, with the ioWait calls, we see all calls are pretty much at 500 milliseconds.
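The exponential shape also explains why the spread, not the average, is what matters. Still assuming computeIntensive behaves like a naive Fibonacci recursion (a stand-in; the real body isn't shown in the talk), we can compare the cost of inputs at the edges of each spread without running the slow recursion, since a naive fib(n) makes exactly 2*fib(n+1) - 1 calls:

```java
// Compare the recursion cost of inputs at the edges of each input spread.
public class SpreadCost {
    // Fast iterative Fibonacci; fib(46) still fits comfortably in a long.
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) { long t = a + b; a = b; b = t; }
        return a;
    }

    // Number of calls a naive doubly recursive fib(n) would make.
    static long recursiveCalls(int n) { return 2 * fib(n + 1) - 1; }

    public static void main(String[] args) {
        // Fast task spread: 39..41 around the average of 40.
        System.out.println("calls at 39: " + recursiveCalls(39));
        System.out.println("calls at 41: " + recursiveCalls(41));
        // Slow task spread: 35..45 around the same average.
        System.out.println("calls at 35: " + recursiveCalls(35));
        System.out.println("calls at 45: " + recursiveCalls(45));
        // With a spread of 5, input 45 costs over 100x the work of input 35,
        // so the single largest input dominates the whole task's runtime,
        // while a spread of 1 keeps all inputs within a small factor.
        System.out.println("45 vs 35 ratio: "
                + recursiveCalls(45) / recursiveCalls(35));
    }
}
```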
I'm really sorry about the resolution there. That's what I wanted to show. Any questions? I have a couple of questions. So Byteman, can it work without Thermostat? Can you actually generate all those reports on the command line at the moment? No, you can't. OK. When you send those objects out to Thermostat, can it go over the network, or does it have to be the same machine? In our scenario, the Thermostat agent runs locally, wherever the JVM you're inspecting runs. So you can do the remote injection, but the extraction happens locally. So you can't inspect a JVM running remotely? You can. With Thermostat and Byteman together, you can do that, yes. OK. And when you send the data, is it done by serialization or some other format? Yeah, we're using JSON to send the metrics to the Thermostat agent, and that carries the metrics off to the database. You can do this after the fact as well. You can run your Byteman script, then unload it so that the JVM that has a performance problem goes back to its original state, but you can do the analysis in Thermostat even after that happened. That's kind of nice. OK, thank you. Yeah, one quick question. A couple of years ago, you had something like BTrace in the JVM, but I think that project isn't alive anymore. Could I compare? There's SystemTap. Sorry? Is it SystemTap you're referring to? Yeah, it was like a language you could use in the JVM to instrument it. Could Byteman fill that gap, perhaps? Byteman is purely Java bytecode, and it's largely tied to Java code. So yeah, it's bytecode. And if I understand you correctly, you mean SystemTap probes in the JVM itself; that's more at a native level. Yeah, I'm not talking about SystemTap, but BTrace. OK, I'm not familiar with it. It's similar to the one we're talking about.
There are a few differences that make Byteman easier to use for this kind of thing, especially for large projects where you want to instrument many classes. So Byteman was a slightly better choice. But it's kind of similar. There are differences, obviously, in the scripts and so forth, but the overall idea is similar. Maybe the Byteman site has some good documentation; it may even show up in the FAQ, if you're interested. Any more questions? Thanks, everyone.