Well, my name is Nikola Grcevski. I'm here with my co-presenter, Mario Macías. We're here to talk about instrumenting applications that are written in various dynamically managed runtimes, rather than statically compiled binaries, and some of the challenges we've experienced working with these programming languages and applications. I'll do a little bit of the introduction first, then Mario will continue, and then I'll come back after. This is going to be a very low-level talk, maybe too deep for some of the audience here; apologies for that in advance, but it's hard to make a talk that goes both high level and low level. We'll talk about managed runtimes and the stuff they do: garbage collection (and we saw a lot of this mentioned in a previous talk), threading models and how they impact us, and two different technologies. So, the stuff that we've looked into with Go, how we actually instrument Go applications and some of the challenges there, and then we'll switch over to what I call the death match, I guess a gaming reference, into Java. Mario, take it away.

OK, thank you, Nikola. To set the context of this talk: we want to auto-instrument applications using eBPF. For a given web application, we want to get metrics and traces of the different service communications with zero touch. How do we do it? We have developed and released a product named Grafana Beyla, which is deployed on the same host as your instrumented application. Using eBPF, it is able to hook into the application runtime and libraries, as well as the Linux kernel, in order to get events and extract information from those events, and send those metrics and traces to Grafana, of course. But if not, we provide standard Prometheus and OpenTelemetry interfaces to send them to any other collector of your preference.
As for the ways to instrument binaries, or the tools we have: we can instrument kernel code with kprobes, but to instrument user-space binaries we use uprobes and uretprobes, which are mechanisms provided by the Linux kernel, or user statically-defined tracepoints (USDT). With uprobes and uretprobes, we know when a function starts and when it ends, and then we can get its parameter information. In Beyla, currently, we only use uprobes and uretprobes, as user statically-defined tracepoints are typically uncommon; OpenJDK has them, but they are not built in by default. So this is an example of a uprobe: the code we need to attach, in this example to a library, in order to know when a function starts. This is only the skeleton; it doesn't do anything else but get the information we need. Since we need to know when a function starts and when it ends, we need to attach both a uprobe and a uretprobe. The uprobe will give us some information, like, for example, the buffer. This is libssl, the SSL_read function: this way, we can know when SSL data is being read, and we can get the parameters, like the buffer or the size. And in the uretprobe, we can know the return code. Those events are executed independently from the program's side, so those eBPF programs are stateless: we cannot keep data in main memory from the invocation to the end. It needs to be stored in BPF maps, which are special data structures that live in the kernel. Also, we need to take into account that even though a uretprobe is executed after a uprobe, that doesn't mean a uretprobe always comes immediately after its uprobe, because this SSL_read function might be invoked multiple times in parallel from different threads. So when we store the data in the uprobe and we later want to retrieve that information in the uretprobe, we need to relate each start with its corresponding end.
For that, we have a helper function that gets the current PID and thread group ID. We can use this as a map key to store the arguments when the uprobe fires and to retrieve them when the uretprobe fires. That way, we can relate each invocation with each return. We are also making some assumptions. This is a relatively simple example because libssl is written in C: an unmanaged language, an unmanaged runtime. So we can make some assumptions; for example, that the address of the buffer we are reading, the buffer with the SSL contents, doesn't change. The buffer contents might change, but the address of that buffer is the same when the function starts and when the function ends. This is not always the case in other languages with managed runtimes. We still need to take some caution, because we are assuming here that this library will always have the same structure. But libraries have versions; APIs might change, especially private APIs. That means that if a user updates the library and its API, that can introduce some changes in the code. On the kernel side, there is a mechanism named BTF, and CO-RE (Compile Once, Run Everywhere), which lets code written for the kernel keep track of, for example, the different offsets and arguments for the different versions of the kernel API. But that doesn't exist for uprobes. For example, imagine we have some library, version 1.3.1, and it has a struct, and we want to get information from this struct. From the eBPF side, we cannot see this struct as a type name and a field name; we need to see it as a set of offsets. If we want to access, for example, the bytes, we know that for this library version it will be at the address of the flow metrics variable plus the offset of the flow bytes field. And that's the way we need to access the information in the structs.
But what happens if, for our deployed code, the user updates to a new version, and this new version adds, for example, a new field in between? If we were accessing the error number field, errno, and they add a flags field in that place, we will instead access part of the flags field, and errno will be in another place. So we need to take care of that. We can use the debug information: if your executable has debug information, you can get those offsets. But many people strip the debug information just to reduce the size of their binaries. So in that case, we need to maintain a local database with all the structs and all the fields we access, for all the different versions of the libraries we access. This relative fragility increases when we go to managed runtimes, like Go and Java, because those managed runtimes provide garbage collection, which might mean that an object changes its location during the life of the program. There are also managed stacks, as we will see later, and even managed threads, like virtual threads, which don't run at the operating-system level but at the application level, managed by the runtime. Our eBPF code needs to take all of that into account. There are also other aspects, like linking or calling conventions, that can change not only from one language to another, but from one compiler to another, or from one compilation mode to another. For garbage collection in Go, fortunately, we don't have many problems at the moment (although the garbage collection algorithm could change in the future), because the Go garbage collector doesn't compact memory. That means that if your data is alive after a garbage collection, it will be in the same position, so we can safely know that our pointers will still be valid. That helps us in Go.
However, when we were trying to instrument Go programs, we had the issue that programs often crashed. That's because Go uses goroutines, which abstract the thread model to a higher level, and goroutines have very small stacks. When such a stack grows too much, Go just grows the stack and might relocate it. That means the stack frame and return address could completely change location: you saw the stack at one address at the beginning of a function, and you are trying to retrieve it from a wrong location at the end of the function. So what we do, instead of using uretprobes, is use uprobes always. To know when a function ends, we parse the binary code of the program, look for return instructions, and then attach a uprobe on each return instruction. That way, we can know when a function ends without crashing the program, which is what was happening before. Also, regarding managed threads: goroutines run on top of real operating-system threads, but multiple goroutines run on a smaller set of threads. So, to correlate a function start and a function end, we cannot relate them by thread ID. Fortunately, we can know the current goroutine, which in Go is always kept in a well-known register. By accessing that register, we can use the goroutine pointer instead of the thread ID. This is valid for Go 1.17 onwards; it's something that might change in the future, the same way it changed from Go 1.16 to 1.17. We also need to be careful with linkage and calling conventions. For example, from Go 1.16 to 1.17, the function calling convention changed, and all the code broke, so all the auto-instrumenters at the time needed to change and support both. Also, Go's register calling convention is not the same as the System V ABI, so it will differ in Go from other languages.
eBPF probes are also sensitive to linker options, which you need to detect at runtime and support. This is an example of how we do the skeleton of uprobes in Go. Instead of getting the thread ID, you need to get the goroutine address; this goroutine pointer is a macro we use to get that address and use it as the key of the map. Also, instead of the arguments being passed as arguments of the uprobe, you need to know internally what the calling convention is, in order to read them from well-known registers. And instead of a uretprobe, for example for this server handler stream function, you need to set another uprobe; this is done by dynamically inspecting the binary code. There are some other corner cases. As we said, heap memory references don't move, but the stacks might move, so make sure your pointers are always valid. Pointers to the heap, we know, are valid; but pointers to the stack, for example, might not be valid during the whole life cycle of your functions. You need to take that into consideration, also because Go performs escape analysis: even if you think you are allocating something on the heap, the Go compiler internally might prove that it can be allocated safely on the stack, and then it will allocate it on the stack. So you cannot make many assumptions; you need to inspect your instrumentation targets in order to make sure your values are safe. And now Nikola will take over: Go was the relatively simple case; now the death match is Java.

Yeah, so compared to the previous implementation for Go, I want to show a contrast of what it looks like for Java, and how Java is different if we wanted to go there. Some of this applies to .NET and other similar technologies, but we'll stick to Java.
The first thing about Java is that there are different implementations, and some of those implementations are complete rewrites that don't use the same technologies. So, for the purpose of this talk, we'll stick with OpenJDK, which is the most commonly used Java VM. Now, the important thing here is garbage collection. We said that in Go, references don't move. In Java, they always move, because all the garbage collectors Java has move references around; they copy, and so on. Which means the thing we did before, where we remembered in a map a certain reference to a pointer that we're going to use later, we can't do that. To demonstrate, in the simple picture we had before, even the simplest garbage collector in Java will do an additional step at the end; not always, but occasionally. It will compact the memory to reduce the fragmentation of the heap and speed up heap allocations. Go has chosen not to do that because, according to their talks, they want to keep stop-the-world latency tiny. Now, managed stacks typically don't exist in Java: you get a maximum stack size that you can change via the command line. But with the introduction of virtual threads, those do the exact same thing as goroutines. So if virtual threads become a thing in the future and people use them more often, then we have the same problem as with goroutines, which means uretprobes will not work. Now, fortunately, there's a dedicated thread register in the code, similar to Go, that we can use to make keys; but then the question is what we can safely store under those keys. Also similar to Go, the linkage convention doesn't match the System V ABI. Now, here is what we thought about as a solution, so we could use some level of instrumentation for Java. Well, we can use uprobes only. And we can't remember references, because we have to assume that at some point everything will move.
If we need to read data from the heap, that needs to happen always on entry to the function; or, if you're lucky, something gets returned at the end of the function, and then you can safely read that return value. And since the Java runtime will recompile methods, we may need to instrument more than one method. And of course, just like in Go, we can adapt to the linkage. So I mentioned uprobes here, but what can uprobes actually be used for? We can only instrument the JIT-compiled methods. The JVM starts by interpreting the code; then later it compiles it with one compiler, in OpenJDK's case called C1; then later it compiles the same method with another compiler, C2, and C2 can recompile it multiple times. That is difficult to deal with, because the methods are generated on the fly: there's no binary to inspect, and this binary actually changes over time. And that's not the only problem. The other problem is that inlining is very much unstable. If you run the program once, you will get one set of actual binary compiled methods; run it another time, and you may get a slightly different or a completely different set. Why? Because the JVM uses a lot of runtime profiling information to guide the inliner to produce more optimal code. Because of that, you are not guaranteed to see the same methods you saw last time. Which means, if you're targeting instrumentation of a specific piece of functionality of the application, you may need to instrument more than one method, because inlining may differ from the stable state you expect. The other thing that's sort of nasty about Java is that the JVM itself does a lot of code patching. It will make assumptions about the method it's compiling.
Let's say you have multiple classes overriding the same virtual method, and at the time the JIT compiler compiled a caller, there was only one implementation of it. But later on, when you run the program, some new class gets loaded, and this assumption gets invalidated. So the JVM runtime will patch all the places where this sort of assumption was being made. Which means the approach of disassembling and looking for return instructions may not always work. Why? Because the patched code may not disassemble well: the disassembler temporarily loses context, may see gibberish or garbage, and may not actually find the right return instruction. So attaching at the return of a function can be dicey; it's not always compatible. So the question on this slide is: how do we find these Java symbols? I'm going to skip to the second point before I go to the first one: in the case of GraalVM Native Image, which is also popular in some spaces, this is pretty simple, because there is a binary, it's compiled one way, and it's always that same binary. So those are good. Now, how do we catch the compiled methods? Well, there are two problems. First, you can catch the compiled methods as they come. One way to do it is to attach a uprobe on register nmethod, which is a function in OpenJDK; it's called every time the compiler wants to commit a new method to the code cache. In this case, you can attach a uprobe, send that event over to the user-space side of the eBPF program, drop the previous uprobe on that method, attach a new one at the new address, and keep doing that as the methods get recompiled all the time. There's one problem with that: you don't see all the methods. It depends on when you first started running your eBPF program. What about the stuff that was compiled before you started monitoring the calls to register nmethod?
One possibility is that maybe you can dump the list of methods that existed before you attached, and there are programs for that: the JDK provides jcmd. Fortunately, when they made the decision to disable dynamic agent loading, they didn't disable some of these interfaces, so this is still possible to do with jcmd. You just can't load agents anymore by default, but this method-list and garbage-collection stuff you can still do. And if you can control the Java command line, then things actually become a lot simpler. If your application is running in a mode where you also deploy the eBPF program and run the Java program, then we stand a better chance. There are options on the JVM command line that let you, for one thing, get the list of compiled methods that are already out there without attaching to the JVM; but you can also maybe even control the inlining, to say: stabilize around this one method, or the couple of methods, that I'm really interested in instrumenting. So that's about it for Java. Now, a further step that we thought about: there are some languages and runtimes that are impossible. Interpreted ones just run bytecodes inside some interpretation engine that is written in another programming language, or maybe the same programming language, but nonetheless there are no symbols that you can attach to; for example, Python running in interpreted mode. There are also partially compiled methods. Java does this, for example: if you have a method that you called once and it contains an endless loop, the JVM will start interpreting it; then, later on, code will detect that this is a really hot method, and Java will do on-stack replacement, replacing the remainder of the method, from the point where it was stuck in that endless loop, with the binary-compiled method.
Now you've lost your chance to instrument the beginning of that particular function; there's nothing we can do there. There are also trace compilers. Trace compilers don't compile based on method or function boundaries; they just compile whatever the interpreter tells them is a hot path. They see it as a stream of some intermediate-language bytecode, or whatever it is, and turn it into native code. Fortunately for us, depending on the runtime we're instrumenting, there might be ways to instrument some parts of those managed runtimes. If they use a lot of native calls, like Python does, or Node.js, or Ruby, then maybe you can instrument those parts, but not the interpreted side. So, what have we implemented so far in our project? We've done the Go support; that's released. We care mostly about OpenTelemetry observability, so we've sort of stayed away from Java and .NET, because we think the OpenTelemetry agents for those are pretty good: they do a lot of stuff that customers make use of. Our primary interest in Java and .NET has been around native binaries, where you cannot dynamically load agents and do auto-instrumentation; and fortunately for us, those actually do come with symbols that don't change or recompile on the fly. As future work, we'll probably add more user-level instrumentation for other languages and managed runtimes. Overall, I think one thing that people sort of miss is that eBPF is also great for getting information from the runtime itself. By attaching probes to the JVM runtime, you can get a lot of information, like calculating GC times; or in the Go case, maybe even counting the number of goroutines; or in Node.js, how long events are stuck on the event loop. Those parts of the managed runtimes do have symbols, so it's actually easy to apply eBPF to get some of them. So, to summarize: I think there's some level of eBPF instrumentation that we can do for managed runtimes.
Some managed runtimes are much easier to deal with than others, and some managed runtimes are almost impossible to do. I think the typical approaches for eBPF instrumentation of statically compiled languages can be adapted to some extent and successfully applied to managed runtimes, but not always; everything has limitations in this world. But just like in statically compiled languages, if a particular function that you want to instrument is inlined in every place you care about, well, tough luck, right? So limitations exist in instrumenting static binaries as well. Thank you. That's it. You can find us there, so please reach out.