Hello everyone, it's good to see you all here, and I'm very excited to be here. As Bill said, I'm going to present a slightly different topic today. It's actually outside the comfort zone of eBPF.

A small introduction about myself. I'm Val from Tel Aviv, Israel, a staff engineer at Datadog. Before that, I was the CTO and a co-founder of my own startup called Secret. And before that, I was doing security research, so when it comes to anything low level, I'm enthusiastic about it. At Datadog, I'm working on a product called USM, which stands for Universal Service Monitoring. USM provides visibility into your service health metrics universally across your entire stack.

But this talk is not about the product; it's about the underlying technology, eBPF. In the next 25 minutes I will try to show you some blind spots of eBPF. Unfortunately, I won't be able to delve into all the details of every subject, because each one of them could be a separate presentation on its own. Instead, I'll try to show you the full picture and, at the end, the solution that I ended up implementing to solve the problem of capturing encrypted TLS traffic of Java applications. And please don't throw rotten tomatoes at me for exposing the blind spots of eBPF.

So let's begin from a 30,000-foot view and talk very briefly about the general scope of cloud application monitoring. The goal here is to track the performance, health, and availability of applications running in cloud environments. One popular and widely adopted approach is called the RED method, invented by Tom Wilkie while working at Google. RED is an abbreviation: R for requests, E for errors, and D for duration. So basically we are measuring the number of requests that your service is handling. The portion of those requests that are errors lets you know how your service is functioning and whether it's within your SLO.
And then finally, the duration of time it takes for each request to be handled by your service gives you insight into the overall user experience of your application. Here we have a short diagram taken from the Datadog platform that illustrates the RED metrics.

Now let's begin gradually zooming in to understand the actual problem before we talk about any solutions. This time we will talk a bit about eBPF's positioning in cloud application monitoring. Heads up: I'm not going to give any basic introduction to eBPF; I'm assuming the audience here is familiar with it. eBPF has rapidly become a cornerstone in the ecosystem of cloud application monitoring products, and it's marked by its versatility, minimal invasiveness, and negligible performance impact. This positions eBPF as a comprehensive solution for modern cloud application monitoring: a unified approach for diverse monitoring needs, deployment that integrates seamlessly without disrupting existing operations, which I call zero touch, and only a marginal effect on application performance. At Datadog we are utilizing eBPF in multiple products. One of them is USM, which I mentioned before. That's the one I'm working on.

Now we are narrowing the scope even further, so let's have a crash course on Java and the JVM. Java is a high-level language known for "write once, run anywhere", or WORA. That allows a developer to code without worrying about any specific operating system. The magic happens because the Java code is compiled into intermediate bytecode, which is a universal format, and then the Java virtual machine, the JVM, which is present on most systems, translates this bytecode into the native code that is executed in your environment. The JVM offers a few key features. One of them is just-in-time compilation, to speed up and optimize the performance of the application.
Finally, the JVM also performs automatic memory management, also known as garbage collection, to free up unused memory, making development smoother and less prone to nasty memory corruption bugs.

Let's focus for a second on a few important aspects that, as we will see in the following slides, are crucial and affect the straightforward eBPF approach to handling Java applications. The highlighted properties of the JVM will become a major blocker for the straightforward solution. First of all, there is the optimization mechanism used by the JIT and the fact that the final native code is dynamically created and modified during the lifetime of the application. On top of that, there is the built-in memory management of the JVM, which controls the memory layout of the target application and adds a dynamic property to it.

Okay, now we have enough background, and we can start assembling the pieces of the puzzle to understand that the problem exists. Let's start with a much simpler use case than the problem we will be talking about soon: plain traffic coming from a Java application. This is the area where eBPF actually excels, because we can utilize kprobes, which are agnostic to the programming language of the target application. We hook the traffic on the kernel side, and we are able to capture those payloads. A small disclaimer: in USM we are using socket filters, but for the sake of this presentation it doesn't really matter whether it's kprobes or socket filters. In reality, this approach works pretty well in customer environments, because a lot of the time the TLS offloading happens at the load balancer or gateway level, and the communication between the microservices inside the cluster is actually plain traffic.

Now let's look at a slightly different use case. Here we are talking about TLS communication, but from a Python application. The inclusion of Python is not by mistake.
It's actually there to emphasize the real problem with Java and the uniqueness of that use case. There are a lot of arrows, so I'll try to break down this diagram. Let's start with the previous approach. If you try to use kprobes, they will not work, because the TLS traffic is already encrypted by the time it reaches the kernel side. Fortunately for us, Python uses a native library, OpenSSL, to encrypt and decrypt the traffic. And eBPF deals with it relatively easily using the uprobes mechanism, which is basically user-mode hooks on exported symbols. I say "relatively" because it also deviates a bit from the "one solution to rule them all" approach, since we need to support each one of those user-mode libraries independently: for example OpenSSL, GnuTLS, BoringSSL, et cetera. But again, this is not in scope for this presentation.

Now we are moving to the actual problem: TLS traffic originating from a Java application. Here eBPF is actually powerless when we take the straightforward approach. Let's understand why. As in the Python case, kprobes won't help us here, because the traffic is encrypted. Unfortunately, we cannot set up uprobes either, because of the aspects of the JVM I mentioned: the bytecode is dynamically translated into native code, which prevents us from having exported symbols and fixed offsets at which to set up the uprobes in the first place.

So now we have the problem. Just to clarify it: we are trying to get the RED metrics that I mentioned before from Java applications that use TLS-encrypted traffic for communication. On top of that problem, let's add a few more requirements that we want to meet when developing a solution. Similar to the general requirements of the USM product, we want to be non-disruptive and have a zero-touch deployment.
Basically, we don't want to ask our customers to make any changes to their applications or to redeploy them. Obviously, we want to support different flavors of HTTP libraries written in Java, and also different Java versions. And lastly, we want to be able to run in containerized and non-containerized environments, and hopefully also have minimal resource usage overhead.

To begin with, I'll present a slide that shows pseudocode for the full solution, and then I'll dive into each of the steps in the following slides. The full solution contains four steps. The first step is detection: we want to detect the target Java application whose TLS traffic we need to capture. Then we have the injection phase, in which we inject the Java agent, which we will talk about in a few minutes, that will allow us to capture the traffic. How are we going to do that? By using instrumentation, a mechanism that is built into Java, to instrument the target classes and capture the plain payload just before the encryption of the egress traffic and just after the decryption of the ingress traffic. And finally, once we have this payload in our hands inside the target Java application, we want to extract it somehow into our existing USM pipeline for further processing. All parts of the solution are fully open source. In the next slides I left some links to the code on our GitHub so you can take a look.

Let's start with the first step, which is detection. I will not dive too much into the details. Basically, we have an existing component in USM called the process watcher that allows us to detect processes based on some attributes. We are reusing this component for our solution here, but we are also using it for other use cases, for example to detect Go applications and deploy uprobes for Go TLS. Once the target application is detected, we want to do the injection phase.
The injection phase is actually unique to the Java TLS problem. For other languages, or for plain traffic, we could reuse kprobes or uprobes, as I mentioned before. But here we need to implement an injector, and that injector is part of our product. Once we have the injector, we also need some payload to inject, so naturally we need to embed a Java agent, a JAR file, inside our Datadog agent. For the payload, and I will explain this in a bit more detail later, we are going to reuse our own Datadog Java tracer, which is also open source, and we are going to extend it a little with the functionality that we need to capture TLS traffic.

So let's talk a little bit about injection. Java has a built-in mechanism that allows injecting an agent into a running Java application. This is called dynamic attach, and among other options it allows injecting a packaged JAR into a running Java application. Java agents can be attached dynamically, or attached statically upon target application startup via a special command-line flag. How does dynamic attach work? There is a listener thread in the target JVM, and we trigger this thread by creating a special file with a specific name, .attach_pid followed by the PID of the target process. Then we send the SIGQUIT signal to the target application. On the JVM side there is a handler that catches this signal and then establishes a communication channel; on Linux, it happens over a Unix socket. Over that socket we have a proprietary protocol to perform different operations from our injector on the target application. There's an open source utility called jattach that implements a simple injector, which also supports containerized environments by doing namespace switching. It's written in C, so what we did in our solution is port it to Go, since our user-space agent is written in Go.
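The handshake described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the injector from the talk (which is written in Go). The socket name `.java_pid<pid>` and the request layout (a protocol version, the `load` command, and three NUL-terminated arguments) follow my understanding of what the open source jattach utility sends to HotSpot JVMs and may differ between JVM versions. The class and method names are hypothetical, and the sketch deliberately stops short of sending the signal or opening the socket.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; it only builds the names and bytes used by the attach handshake.
class AttachSketch {
    // Trigger file the injector creates so the JVM's listener thread notices the request.
    static Path triggerFile(String dir, long pid) {
        return Paths.get(dir, ".attach_pid" + pid);
    }

    // Unix socket the JVM is assumed to open after it receives the signal.
    static Path socketPath(String tmpDir, long pid) {
        return Paths.get(tmpDir, ".java_pid" + pid);
    }

    // Encodes a "load this agent JAR" request: every field is followed by a NUL byte.
    static byte[] loadAgentRequest(String jarPath, String agentArgs) {
        String[] fields = {"1", "load", "instrument", "false", jarPath + "=" + agentArgs};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write(0); // NUL terminator
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long pid = 1234; // hypothetical target PID
        System.out.println(triggerFile("/tmp", pid));
        System.out.println(socketPath("/tmp", pid));
        System.out.println(loadAgentRequest("/opt/agent.jar", "").length + " bytes");
    }
}
```

A real injector would additionally create the trigger file, signal the target, connect to the socket, write these bytes, and read back the JVM's reply, switching namespaces first when the target runs in a container.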
Fortunately, this functionality of dynamic injection is allowed by default for Java applications, with a small asterisk: that's true until Java version 21. From Java version 21 the flag was flipped, and it's now disabled by default.

Once our agent is injected, we can start instrumenting the target application's classes and modifying their behavior at runtime. In the next two slides we will go over the basics of Java instrumentation and dive a bit into the particular targets that we want to instrument for our use case. Reminder: we're trying to capture TLS traffic here.

So let's do a crash course in Java instrumentation. Any Java agent has two special entry points, premain and agentmain. premain is used for static injection at startup, and agentmain is used for dynamic injection, which is our use case here. In our case we want to capture plain buffers: as I said, just before the requests are encrypted and just after the responses are received and decrypted. To perform this instrumentation there are a few options. There's a built-in option in the JDK, and there's a library called ASM, which operates at the bytecode level and allows us to instrument Java classes that are already compiled into bytecode. This is too low level for us. So instead we are using another open source framework called Byte Buddy, which is basically a high-level wrapper that allows us to operate at the Java source code level, and under the hood it uses ASM.

Byte Buddy, very briefly, contains three major components. There are matchers, which are meant to find the target Java classes that we want to instrument. There are advices, which are the snippets that we write in Java; this is the code that we want to execute, once the target classes are instrumented, to run our custom behavior. And there are transformers, which are in charge of transforming the target class to include our advices.
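The two entry points can be illustrated with nothing but the JDK's built-in `java.lang.instrument` API. The sketch below is not the agent from the talk. The class names are hypothetical, the transformer only records class names instead of rewriting bytecode, and matching on the suffix `SSLSocketImpl` is an assumption made for the example.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent skeleton. In a real agent JAR this class would be public and
// named in the manifest through the Premain-Class and Agent-Class attributes.
class AgentSketch {
    // Records classes of interest and leaves their bytecode untouched.
    static class RecordingTransformer implements ClassFileTransformer {
        final List<String> seen = new CopyOnWriteArrayList<>();

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain domain, byte[] classfileBuffer) {
            // Class names arrive in internal form, for example "sun/security/ssl/SSLSocketImpl".
            if (className != null && className.endsWith("SSLSocketImpl")) {
                seen.add(className);
            }
            return null; // null tells the JVM that the class is unchanged
        }
    }

    // Entry point for static injection (the JVM was started with the agent).
    public static void premain(String agentArgs, Instrumentation inst) {
        install(inst);
    }

    // Entry point for dynamic injection into an already running JVM.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        install(inst);
    }

    private static void install(Instrumentation inst) {
        // true = this transformer may also be applied when classes are retransformed
        inst.addTransformer(new RecordingTransformer(), true);
    }

    public static void main(String[] args) {
        // Exercise the transformer directly, without a running agent.
        RecordingTransformer t = new RecordingTransformer();
        t.transform(null, "sun/security/ssl/SSLSocketImpl", null, null, new byte[0]);
        System.out.println(t.seen);
    }
}
```

A framework such as Byte Buddy sits on top of this same hook: its matchers decide which classes a transformer touches, and its advices supply the code that gets woven in.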
Byte Buddy also provides rich annotation capabilities that allow us to execute these advices at different times during the target application's execution, for example on method enter or on method exit. It also gives us direct access to different variables and arguments of the target function, such as the return value or the this reference, which is the current object in Java, and more.

Now let's talk a little bit about what we actually want to instrument. On the Java side there are a lot of different implementations of HTTP, web, and network frameworks. This list, taken from our own documentation, is just a subset of the frameworks that our tracer supports today. But as I mentioned before, one of our goals is to minimize the solution and have something like one hook to rule them all. So instead of targeting the application layer, we want to go a bit down the network stack and target the session and presentation layers. Here I have a small diagram to illustrate the solution. There are different frameworks, like Apache HttpClient, OkHttp, and a few others, but they are all using the same libraries under the hood. They basically fall into two groups: synchronous communication and asynchronous communication. For simplicity, in this demonstration I will be talking about synchronous communication, and in this case all those frameworks use SSLSocket, which is part of the JDK's java.base module. The asynchronous case can be handled in a similar manner, with just a few extra caveats of asynchronous programming.

Now let's show an example of the advice: what is the instrumentation, and what does it look like? As we saw in the previous slide, we want to target SSLSocket, but SSLSocket is actually an abstract class, and there are different implementations of it. So our matcher will look something like this: we are targeting SSLSocket and its derived classes, which are the concrete classes.
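The matching rule just described, "any concrete class derived from SSLSocket", can be expressed with plain reflection. This is a stand-in for illustration only: the slide uses Byte Buddy's matcher API, which is not reproduced here, and the class and method names below are hypothetical.

```java
import java.lang.reflect.Modifier;
import javax.net.ssl.SSLSocket;

// Hypothetical reflection-based equivalent of the matcher idea.
class MatcherSketch {
    // True when candidate is a non-abstract class that extends or implements base.
    static boolean isConcreteSubtype(Class<?> base, Class<?> candidate) {
        return base.isAssignableFrom(candidate)
                && !candidate.isInterface()
                && !Modifier.isAbstract(candidate.getModifiers());
    }

    // The rule from the talk: concrete classes derived from SSLSocket.
    static boolean matchesSslSocket(Class<?> candidate) {
        return isConcreteSubtype(SSLSocket.class, candidate);
    }

    public static void main(String[] args) {
        System.out.println(matchesSslSocket(SSLSocket.class));       // false: the base class is abstract
        System.out.println(matchesSslSocket(java.net.Socket.class)); // false: a parent, not a subclass
    }
}
```

Matching on the abstract base type rather than on one implementation class is what lets a single hook cover whichever TLS provider the application happens to load.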
And then our transformer will look like this: we want to target specific methods of the target class. In this case, SSLSocket exposes stream interfaces for reading from and writing to the socket, and there's also a close method to close the socket and terminate the connection. We want to instrument those in order to install our proxy and capture the traffic while it's still plain.

Lastly, this is what our advice will look like. For example, here we are instrumenting the getOutputStream function of the SSLSocket class, and we are replacing the return value. Again, take a look at the annotations provided by Byte Buddy. We are replacing it with our own implementation, basically installing a proxy in the middle.

And finally, once we have managed to capture the traffic, we want to extract it into our pipeline, which is in a separate process or a separate container, depending on the environment. To do that we need to implement some sort of inter-process communication protocol. For that we have our own implementation, which we call eRPC: inter-process communication over eBPF. The solution is actually very simple, but that's also what makes it very elegant. Basically, on our eBPF side we set up a kprobe on the ioctl syscall, and we filter for only the ioctls with a specific code. Then on the target application side, in our case the target Java application, we just fire this ioctl with the payloads. That works in any environment, whether containerized or not, since our agent is running on that host in a privileged container.

So now it's live demo time. Let's start by running an agent. This is our USM product. Let me show the agent here, and we're going to execute a Java client that will just send an HTTPS GET request to httpbin.org/uuid every second. And for some reason that doesn't work. Let's try it again.
So for this case I have a pre-recorded demo. I'm going to do the same thing, but now I'm just talking and not typing anything. We are running an agent here with a config flag enabled for Java TLS monitoring, as I highlight here. Then we are going to do a curl to a debug Unix socket to see the payloads that we capture. For now we don't see anything. Now we're going to run the client, which will send a request to httpbin.org/uuid every second. And now we are going to curl that socket again, and we will see the payloads. I think that will be it. So, any questions?

So we have time for one question, if anybody has one.

Do you have a plan for after Java 21, I think it was?

Yes and no. We do have some plans that we want to explore, but we don't have a solution that will work for sure. Our main plan is actually to integrate our work with another Datadog product called APM, whose team is now also working on a sort of dynamic injection support for Java applications. We have a slightly different method, but it's more for the long-term roadmap. Thanks.

Great, thank you, Val. Thank you.