Hello everybody. I'm Tobias Lindaaker and I work for a company called Neo4j. We build a graph database management system, and we build it in Java. In a prior life I worked on implementing a dynamic language for the JVM called Jython, so Python for the JVM. Over the years I have come across, and still come across, a few interesting things that you can do with the JVM, ways to bend it that are fun and interesting. I've given talks about this a few times before, and I thought I'd do one again now and share some of those things. So this is a talk about hacking the JVM, and doing it mostly from pure Java code.
Most of you are probably thinking about this guy, sun.misc.Unsafe. You probably know about it already, so it's boring, and I'm not going to talk much about it. In fact, most of the behavior here has been made completely safe and available through other APIs these days, so it's not very interesting anymore. The only remaining thing, of course, is a good API for native memory access, but hopefully that will be coming in some near future. So I'm going to talk about other things instead.
Let's start with something that is public and not secret in any way. The ThreadMXBean has a CPU time method that, given a thread ID, tells you the amount of time that thread has spent actively executing on the CPU. This is really useful for measuring how much your application is actually executing versus being swapped out or idle, which lets you do performance measurements on a different level than what you can do with System.nanoTime. System.nanoTime is very good for measuring how much wall time is being spent, but you might also be interested in how much time the code spent actively executing and not just being swapped out.
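As a rough illustration of the difference between wall time and CPU time, here is a minimal sketch that measures both around a task running on the current thread. The class name CpuTimeSketch and its Timing holder are made up for this example; the only calls taken from the JDK are ManagementFactory.getThreadMXBean, ThreadMXBean.getThreadCpuTime and System.nanoTime.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not from the talk: compares wall time and CPU time.
final class CpuTimeSketch {
    /** Wall-clock and CPU nanoseconds for one measured task. */
    static final class Timing {
        final long wallNanos;
        final long cpuNanos;

        Timing(long wallNanos, long cpuNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }
    }

    static Timing measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long wallStart = System.nanoTime();
        // Returns -1 if CPU time measurement is disabled in this JVM.
        long cpuStart = threads.getThreadCpuTime(id);
        task.run();
        long cpuEnd = threads.getThreadCpuTime(id);
        long wallEnd = System.nanoTime();
        return new Timing(wallEnd - wallStart, cpuEnd - cpuStart);
    }

    public static void main(String[] args) {
        // A sleeping thread accumulates wall time but almost no CPU time.
        Timing t = measure(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("wall ns: " + t.wallNanos + ", cpu ns: " + t.cpuNanos);
    }
}
```

With a task that mostly sleeps or waits, the CPU figure should come out far below the wall-clock figure, which is exactly the gap this method lets you see.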
This ThreadMXBean has a sibling, or child, or hidden bastard friend, that is also called ThreadMXBean but lives in a different package, com.sun.management. It is another interface that, on HotSpot, the very same instance also implements. Here you get another interesting method, for getting the bytes allocated by a thread. This is super useful for measuring how much memory has been allocated by a particular piece of code. You take a snapshot before your code starts executing, take another snapshot afterwards, and compute the difference. That tells you how much data was allocated by the code that was executed. Sure, it doesn't tell you how many objects were allocated, but the number of bytes is a really interesting measure, because it tells you essentially how much pressure the allocator and the garbage collector will be under.
Unfortunately, this method has a problem. First of all, it's not a standard Java API, so it's not officially supported everywhere, and I'd love it if it could be. And in the process of making it officially supported, I'd love it if it also didn't have to acquire a global lock in the implementation when getting the allocated bytes for the current thread, which technically it shouldn't need to. But the use cases haven't been there, so it always grabs that lock. We do use this in Neo4j: you can turn on allocation tracing for your queries, and when you run queries against the database you get information saying that this much data was allocated while your query ran. But unfortunately it leads to contention, so it's turned off by default.
There's another interesting API that's available in every JVM. This one is a public API, java.lang.instrument. It's the API used by Java agents.
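Before going further into the instrumentation API, the snapshot-and-difference technique for allocated bytes described above can be sketched roughly as follows. It assumes a HotSpot-based JVM where the platform ThreadMXBean can be cast to com.sun.management.ThreadMXBean; the class name AllocationSketch and the sink field are made up for this example.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper, not from the talk: bytes allocated by a task on this thread.
final class AllocationSketch {
    // Publishing the allocation to a field keeps the JIT from optimizing it away.
    static volatile Object sink;

    static long allocatedBy(Runnable task) {
        // Assumption: on HotSpot the same bean instance implements both interfaces.
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        long after = threads.getThreadAllocatedBytes(id);
        return after - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBy(() -> sink = new byte[1 << 20]);
        System.out.println("allocated bytes: " + bytes);
    }
}
```

The result is a byte count, not an object count, and it includes anything else the thread allocated between the two snapshots.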
It's typically used for things like registering bytecode transformers, and I'm sure people have interesting use cases for that. I've never really had a good use case for transforming bytecode in a running system. I prefer to do my transformation ahead of time, and one reason for that is that transforming at load time makes code loading much, much slower. But the API also provides a handy method for getting object size. Give it an object and it will tell you how big it is. It's the sizeof operator that is otherwise missing in Java.
The main downside of using the instrumentation API, if all you want is access to the object size method, is that it's a bit tricky to get to. The only things that get access to the instrumentation API are Java agents, so you have to write an agent in order to look at object size. That's annoying. Luckily, you can deploy an agent from a JVM into that same JVM while it's running, and then, via a static field somewhere, give yourself access to the instrumentation instance.
You simply write an agent class like this. I've compressed it heavily to fit it on two slides. It has an agentmain method that takes the instrumentation instance as a parameter. That's the method that will be invoked when this jar file is loaded as an agent, and it puts the instance into a field that we can read. The instrumentation method looks at that field. If it's not assigned, it loads this class as an agent, which assigns the field, as we see in the agentmain method, and then returns it.
So how does this loading work? It uses a JDK-specific API, the VirtualMachine attach API that ships with HotSpot, which allows you to attach to a VM and tell that VM to load an agent. Here we get the PID of the current process, attach to that VM, and load the current jar as an agent in it. That's all we need to do. The agentmain method then does the rest.
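The slides are not part of this transcript, so the following is only a sketch of what such a self-attaching agent could look like, not the code from the talk. The class name SelfAttachAgent and the agentJarPath parameter are assumptions for illustration. It further assumes that the jar's manifest names this class in its Agent-Class attribute, that the JDK's com.sun.tools.attach package is available, and that on JDK 9 and later the JVM was started with -Djdk.attach.allowAttachSelf=true.

```java
import com.sun.tools.attach.VirtualMachine;
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; the jar manifest must contain "Agent-Class: SelfAttachAgent".
final class SelfAttachAgent {
    private static volatile Instrumentation instrumentation;

    // Invoked by the JVM when the jar is loaded as an agent into a running VM.
    public static void agentmain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    // agentJarPath is an assumed parameter: the path of the jar containing this class.
    static Instrumentation instrumentation(String agentJarPath) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            try {
                // Attach to our own process and ask it to load the agent jar.
                String pid = Long.toString(ProcessHandle.current().pid());
                VirtualMachine vm = VirtualMachine.attach(pid);
                try {
                    vm.loadAgent(agentJarPath); // runs agentmain in this same JVM
                } finally {
                    vm.detach();
                }
            } catch (Exception e) {
                throw new IllegalStateException("could not load agent", e);
            }
            inst = instrumentation;
        }
        return inst;
    }
}
```

Once the field is assigned, a call such as SelfAttachAgent.instrumentation(path).getObjectSize(someObject) gives the shallow size of an object in bytes.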
But if we were to think about how to make the JVM better, how to make this easier, why do we really need to require an agent to get the instrumentation instance? Sure, it has lots of capabilities that should be declared in the agent jar. But maybe some of those capabilities, like getting object size, don't need to be reserved for agents. How about having a getter for instrumentation, or a getter for some smaller API that provides get object size?
If we want even more powerful tools, we can write some C code, or some other native code, using the JVM Tool Interface, JVMTI. This is a native API for writing native JVM agents. It's another type of agent, not a Java agent, but I think these are just called agents. The interesting thing is that even though this API is meant to be used from an agent, it's perfectly possible to write JNI code that calls it. So we can do that. You might still need an agent in order to enable certain capabilities, because not all JVMTI capabilities are enabled by default, just like not all instrumentation capabilities are enabled by default. Some of them cannot be enabled once the JVM has started running. They have to be enabled before that, and at that point they will most likely disable a few of the optimizations that the JVM performs. But if you want to play with them, you need an agent to enable those capabilities. I don't actually remember which capabilities those are, because I wrote this code a couple of years ago.
Once we've done that, we can play with a lot of interesting JVM introspection capabilities from pure Java code, just calling through JNI to this native code. That's actually quite simple and straightforward. We can do things like reflecting on the call stack: getting the stack of the running Java code and getting local variables out of the calling method, interesting things like that. There's a function in the JVMTI API called GetStackTrace.
We can use that to get the stack trace starting from a certain depth and up to a particular number of frames. So we can say from depth zero, and guesstimate that the stack will be no bigger than 10 million or something like that, some big number. We'll get the frame information for all those frames and a count telling us exactly how deep the stack was. Or we can say that at a particular depth I want just one stack frame, because that's the frame I'm interested in. This gives us a little structure of information containing the method ID, which is a JNI reference to a method, and the current location, the offset in the bytecode of that method where execution currently is.
There's also a function for getting the frame count, so we don't actually have to guess the depth of the stack. We can call GetFrameCount to get the actual count. We can also use the total frame count to convert the depth at which we inspected the frame into a height from the bottom of the stack. That is useful if we want to reflect the frame, that is, create a Java representation of the frame as an object and have that be a sort of live mirror of the actual underlying frame. We have to have some way of detaching it, and in order to do that we need a fixed number for where the frame is, because it's not always going to be at the same depth, but it's always going to be at the same height.
Once we have this frame info, we can use the method ID to get the local variable table of that method. That gives us another little array of structs that contain information about the range of bytecode offsets in the method in which each variable is valid, what the name and signature of the variable are, and which slot it's stored in. The returned table pointer needs to be deallocated, so we need to call Deallocate once we're done with it.
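The depth-to-height conversion mentioned above is simple arithmetic, and it can be demonstrated without any native code. The sketch below swaps in the standard StackWalker API (Java 9 and later) for the JVMTI frame count, purely to show that a frame's depth changes as calls are made on top of it while its height stays fixed. The class and method names are made up for this example.

```java
// Hypothetical illustration; uses StackWalker instead of the JVMTI frame count.
final class FrameHeightSketch {
    /** Number of frames on the caller's stack, not counting this helper. */
    static int frameCount() {
        long frames = StackWalker.getInstance().walk(stream -> stream.count());
        return (int) (frames - 1); // the walk starts at this method, so drop it
    }

    /** Depth 0 is the top frame; height 0 is the bottom frame. */
    static int heightOf(int depth, int frameCount) {
        return frameCount - 1 - depth;
    }

    /** Returns the height of this method's frame, computed at two different depths. */
    static int[] demo() {
        int fromSelf = heightOf(0, frameCount()); // demo() is the top frame here
        int fromCallee = callee();                // demo() is one below the top there
        return new int[] {fromSelf, fromCallee};
    }

    private static int callee() {
        return heightOf(1, frameCount());
    }

    public static void main(String[] args) {
        int[] heights = demo();
        System.out.println(heights[0] == heights[1]); // same frame, same height
    }
}
```

A Java-side frame mirror can therefore store the height once and recompute the current depth from the current frame count whenever it needs to talk to the native API.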
So now we've found our stack frame, gotten the information about it, found the method that is executing there, gotten the local variable information for that method, and found which slot contains the particular local variable, by name, that we're interested in. We can then call GetLocalObject, GetLocalFloat or GetLocalLong to get the actual value stored in that local variable. We can even set it, with SetLocalObject.
As I mentioned, if we want to reflect this as a Java object that has a sort of live coupling to the actual frame, we need to be able to detach it, and we can use NotifyFramePop to do that detaching. We call NotifyFramePop and say that when the frame that is currently at the given depth is popped from the stack, so when that method returns, call my frame pop callback. There's one global frame pop callback that you register for your agent. When that frame is popped, you get a callback that tells you which thread, which method, and whether it was popped by an exception or by a normal return. So by calling this through JNI, we can inspect the call frames of any Java code from regular Java code, which is pretty neat.
Another useful thing that JVMTI can provide is the ability to walk the heap. It's pretty logical that this is possible, because the JVM has to do it all the time. This is what garbage collection does: it walks the heap, and typically marks objects for collection, or for moving because they're going to be kept. Even this marking ability is exposed through JVMTI, as the ability to tag objects. So we have an API that allows us to walk the heap and tag the objects in it that we think are interesting.
So we can, for example, start from a particular object and follow all the references to the other objects that it references, or we can linearly scan the entire heap, either scanning all objects or scanning only live objects. There are also APIs for iterating through all instances of a given class. We can use that to create a Java API that says: given a class, give me all instances of that class. We would use the IterateOverInstancesOfClass function in JVMTI for achieving this, telling it to iterate over all instances of this particular class and call this callback. What we do in the callback is simply tag all the objects that we see, so the implementation just assigns the tag pointer to the tag that was specified. Then we use GetObjectsWithTags to get an array containing all the objects that were tagged, which we can then turn into a Java array and return to our Java code.
We can do even more powerful things, similar to what get object size did in the instrumentation API, by getting the retained size of an object, so the size of an object and all of the objects that it references. We do that using IterateOverObjectsReachableFromObject, which does what it says on the tin. It iterates over all objects reachable from a given object and gives you a callback for each of them. It doesn't give you a callback for the object that you start from, so we also have to use GetObjectSize on that object, which is also available as a JVMTI function. In the callback function we get the size handed to us, so we don't actually need to invoke GetObjectSize for each visited object. Instead we just take that size and sum it up into the user data pointer. And that was it.
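The retained-size computation described above lives in native code, which is not shown in this transcript. Purely as an illustration of the same accumulation logic, here is a pure-Java approximation that walks the object graph with reflection instead of JVMTI. Everything in it is an assumption for this example: the class name, the Node test type, and the sizeOf parameter, which stands in for whatever per-object size source is available (for instance Instrumentation.getObjectSize). Fields that reflection is not allowed to open are silently skipped, so this is not equivalent to the JVMTI walk.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.ToLongFunction;

// Hypothetical reflective stand-in for the JVMTI reachability walk.
final class RetainedSizeSketch {
    /** Tiny linked structure used only to exercise the walk. */
    static final class Node {
        Node next;
    }

    static long retainedSize(Object root, ToLongFunction<Object> sizeOf) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> pending = new ArrayDeque<>();
        seen.add(root);
        pending.push(root);
        // The walk only reports objects reachable from the root, so count the root here.
        long total = sizeOf.applyAsLong(root);
        while (!pending.isEmpty()) {
            for (Object next : references(pending.pop())) {
                if (next != null && seen.add(next)) {
                    total += sizeOf.applyAsLong(next); // plays the role of the callback
                    pending.push(next);
                }
            }
        }
        return total;
    }

    /** Collects the values of all reference fields, or all elements of a reference array. */
    private static List<Object> references(Object o) {
        List<Object> out = new ArrayList<>();
        Class<?> type = o.getClass();
        if (type.isArray()) {
            if (!type.getComponentType().isPrimitive()) {
                Collections.addAll(out, (Object[]) o);
            }
            return out;
        }
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                    continue;
                }
                if (!f.trySetAccessible()) {
                    continue; // field not open to reflection; skipped in this sketch
                }
                try {
                    out.add(f.get(o));
                } catch (IllegalAccessException e) {
                    // Should not happen after trySetAccessible succeeded; skip the field.
                }
            }
        }
        return out;
    }
}
```

Passing a function that returns 1 for every object turns the same walk into a count of reachable objects, which is a convenient way to check that cycles are visited only once.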
That was a short overview of things that I wanted to share that you can do with hidden APIs, APIs that are not very frequently used but that are available in pretty much any JVM.