Hi, and welcome to Instrument and Find Out: Writing Parasitic Tracers for High-Level Languages. I'm Jeff, I'm with NCC Group. I like to hack on stuff and do various things; for the purposes of this talk, that means programs, languages, runtimes, memory, and bytes. But first up, a notice: by viewing this presentation you agree to indemnify and hold harmless the presenter in the event you decide to take any of his advice and find yourself unable to sleep at four in the morning due to language demons. So just as an outline of the structure of this talk, I'm gonna talk about the background of what led me to this work, what parasitic tracers are, how to design them for tracing high-level language runtimes, looking at Ruby as a sort of case study, and then some concluding thoughts. So first, about me. I've done a fair amount of work with dynamic instrumentation and tracing, from Java bytecode to various stuff in Android, both in userland and in the kernel through BPF. Generally I do a lot of this stuff mostly for reversing and learning, and also to script up existing things to do other things. So for dynamic instrumentation, just as a quick refresher, this generally means function hooking or instruction instrumentation. The latter mostly means that you modify bytecode or assembly to be different bytecode or assembly that does something different, whereas with function hooking you are generally doing something that hijacks control flow directly to go somewhere else. Dynamic tracing can refer to dynamically enabling or disabling existing logging functionality, but for our purposes it mostly means adding enhanced logging functionality that wasn't there before. I've also recently been doing tracing with Frida for Ruby, which is what this talk is about. So some background on Ruby and myself.
A little while back I had to do some Ruby bytecode transformation stuff and convert more modern bytecode to an older format, that is, translate newer opcodes into equivalent older ones so that a decompiler that only knew the older format would work. And that worked quite nicely for me at the time. More recently, a colleague and I were looking at Ruby's DRuby protocol. We were writing a scanner for it in Ruby, of all things. We gave a talk on this at NorthSec earlier this year. There were some weird issues that came up, and I spent a lot of time debugging this and going through the Ruby internals' C source code to find out that basically you don't wanna call IO#read on a socket object. Instead you wanna just call recv. This led me to start writing this parasitic low-level Ruby tracer. So what are parasitic tracers? Well, what are tracers? A tracer is an enhanced logger that basically dumps everything you might want about program state, running code, et cetera. And a parasite is a highly specialized, unwanted organism that lives symbiotically off of, but inside of, another organism that it is completely adapted to. So a parasitic tracer is a combination of these two. It's basically a tracer that's specially adapted to the target process that it hooks onto and injects itself into, and it makes use of internal functionality that wasn't really intended to be accessible. So the tracing part of this is just a goal: I want to write a tracer for Ruby to better understand it. The parasitic part is more of an implementation detail. Chances are you've done this if you've ever used LD_PRELOAD to inject code into something. So why would you write these things? Well, to get a better understanding of where the higher-level abstractions meet the lower-level implementations in, say, runtimes and things.
So that's for reversing or debugging or just plain performance analysis. You could also be writing one of these things mostly to avoid having to maintain a fork of the actual code base: if you want to maintain a tracer out of tree, you can just do it on the process itself and not have to recompile it against the whole code base. So some examples of these parasitic tracers would be Frida's Java bridge API, which is actually arguably two of them, one for Android and one for the JVM itself. They basically provide an API for hooking into higher-level Java operations, but in ways that weren't really intended to be allowed by the platform. So on Android, it's totally hooking the runtime, and for the JVM, it's using some of the JVM instrumentation APIs, but it's definitely also doing stuff that doesn't involve those, in weird ways. And so whereas a normal vanilla Java agent that uses those things wouldn't really qualify as a parasitic tracer, because it's using public APIs specifically intended for this purpose, the way that Frida does it is a little bit more invasive. But let's just say that if you're crawling around the memory of a process or intercepting its syscalls, chances are you just have a tracer, like strace. But if you are hooking functions inside of the process itself, or calling functions from inside the process, then you're doing some parasitic stuff. So let's talk about designing these things for high-level language runtimes. So first, some prereqs. You're gonna need some means to actually hook the code or instrument it, ideally one that allows you to remove those hooks or re-edit them at runtime. You could do this with a debugger and breakpoints, especially a scripted debugger. You could do this with an instrumentation toolkit like Frida, which is what I generally do these days. You will also need a way to invoke existing functionality if it's in that code, in that process.
So generally speaking, you do that with a debugger or with Frida; with a debugger it would be something like the expression syntax for calling functions. But the thing is, you need to know what you're gonna call. So the hierarchy of what you wanna prefer is, ideally, public APIs that aren't going to change all that often. Then internal APIs with symbols. Then internal APIs that don't have symbols but that you can get handles on fairly easily: say, if the pointers to them are passed into other functions, you can just catch them there. After that, you're probably just gonna wanna opt for re-implementing stuff locally yourself. And then finally, all the way at the end, if you need to reuse existing code that's inside the process and you can't find a good way to locate it, you might need to just search for byte sequences and match on them. But moving on, the first step to owning a target is recon, and that is also the first step to designing a parasitic tracer. You're going to need to do some reverse engineering to really understand the internals of what it is you're gonna be mucking around with, and as you do so, you'll learn more. You may actually have source, like I did when looking into Ruby, CRuby. But you still need to know what's actually going on at the native level, especially given the way that your instrumentation itself works, because at that point the C doesn't really matter anymore. Optimizations can inline functions away or lead to weird situations where garbage values are passed to functions that don't operate on them anyway, and you need to be careful when handling those inputs, stuff like that. And then additionally, all of these runtimes rely heavily on, like, super implementation-defined behavior, so you need to be really careful about how you're interacting with their code from your code.
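The preference order described above (public APIs, then internal symbols, then pointers captured from other hooks) can be sketched as an ordered list of lookup strategies where the first hit wins. This is only an illustration in Ruby: the tables, the addresses, and the names `RESOLVERS` and `resolve_function` are hypothetical stand-ins for whatever a real tracer would query from the target process, and the last two tiers (local re-implementation and byte-pattern scanning) are left out.

```ruby
# Hypothetical lookup tables; a real tracer would fill these from the
# target process (exports, debug symbols, pointers seen by other hooks).
# The addresses are made up for illustration.
PUBLIC_EXPORTS    = { "rb_funcallv"   => 0x1000 }.freeze
INTERNAL_SYMBOLS  = { "vm_exec_core"  => 0x2000 }.freeze
CAPTURED_POINTERS = { "some_callback" => 0x3000 }.freeze

# Strategies in order of preference: most stable first.
RESOLVERS = [
  [:public_api,       ->(name) { PUBLIC_EXPORTS[name] }],
  [:internal_symbol,  ->(name) { INTERNAL_SYMBOLS[name] }],
  [:captured_pointer, ->(name) { CAPTURED_POINTERS[name] }],
].freeze

# Returns the first strategy that yields an address, or nil when the
# caller has to fall back to re-implementing or pattern matching.
def resolve_function(name)
  RESOLVERS.each do |strategy, finder|
    address = finder.call(name)
    return { strategy: strategy, address: address } if address
  end
  nil
end
```

Recording which strategy produced an address makes it easier to see, per target version, how fragile each hook is.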
After that, you're gonna wanna identify all of the things that you're gonna hook or call into to build up whatever it is you're going to get out of the runtime or the language. And then next is actually doing all of the hooking and calling of those things. So you're gonna hook all that functionality, you're gonna extract all the relevant state that you can get, you're gonna start invoking functions that are in the target to get other pieces of data out of it, et cetera. And then after that, you're gonna bring it all together and orchestrate all that in what I like to call puppeteering: bringing together all your hooks, having them coordinate with one another, possibly managed by some sort of injected thread or whatnot. At this point, you're mostly building up from there to have better interop between your own hooks and better interop with the actual platform you are messing around with. So in this case, for me it was Frida, which is JavaScript, so basically a JavaScript-to-Ruby bridge, more or less. So ideally, you start small and build big. You compose a larger set of hooks from a smaller set of modular pieces. You are in a good position to do this because you're hooking onto a full program that already exists and runs on its own. So mostly you just need to make sure that you don't break it with what you're doing and injecting into it. Other than that, the thing will continue to run on its own just fine. The next thing about this layering stuff is that you can take advantage of layering abstract calls on top of version-specific behaviors. So for example, if you have a pointer to a struct, but between two different versions of the binary the field you want inside of that struct is at a different offset, you need some functionality to handle that, even though the pointer to the start of the struct is still the same.
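As a minimal sketch of that layering, the version-specific part can be isolated in a table while callers only ever ask for a field by name. The offsets, the field name `:iseq_body`, and the helpers `field_offset` and `field_address` below are invented for illustration and are not real CRuby struct layouts.

```ruby
# Hypothetical per-version offsets; NOT real CRuby layouts.
FIELD_OFFSETS = {
  "2.6" => { iseq_body: 16 },
  "2.7" => { iseq_body: 16 },
  "3.0" => { iseq_body: 24 },
}.freeze

# Version-specific layer: map "major.minor.patch" to the right table.
def field_offset(version, field)
  key = version.split(".").first(2).join(".")
  table = FIELD_OFFSETS.fetch(key) do
    raise ArgumentError, "unsupported version #{version}"
  end
  table.fetch(field)
end

# Abstract layer: callers pass the same base pointer on every version.
def field_address(base_pointer, version, field)
  base_pointer + field_offset(version, field)
end
```

With this shape, supporting a new release would mostly mean adding one table entry, and an unknown version fails loudly instead of reading the wrong memory.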
So you could do this with per-version implementations, or version-based switches kind of like #ifdefs, or both. But let's talk about Ruby. So Ruby is a scripting language, that's right. The most interesting thing about Ruby is that it's super object-oriented: every time you try to access something on an object, that's actually a method call, and all the method calls are basically handled by sending messages. Ruby is super featureful, but it doesn't really have any good low-level introspection or tracing capabilities. It does have this thing called TracePoint, which is an API for various events that occur as Ruby executes, but it can't really intercept method arguments or native function parameters. It can't really provide information on bytecode execution, and it doesn't really provide all that useful information for any time you're switching back and forth between Ruby and native code. This is mostly an artifact of the fact that Ruby is a language and CRuby is an implementation, so the bytecode and all this lower-level stuff are implementation details, and this API theoretically needs to work across multiple different Ruby implementations. But really, the CRuby implementation should have better tracing stuff, given that it basically functions similarly to Java, and Java has a very well-defined and extensive API for instrumentation. So I wrote this thing called Ruby Trace, which is a Frida-based CLI tool for instrumenting Ruby and dumping everything that goes on as it executes. So it hooks all the opcodes. The interesting thing about that is that the opcode handlers are all a bunch of labeled goto targets in a giant state machine. They're not really their own functions, so they don't have your standard calling-convention preludes.
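For reference, here is a small sketch of the stock TracePoint API mentioned above, recording call and return events for one method while tracing is enabled. The method names `greet` and `trace_greet` are made up for this example. It shows the event-level view TracePoint gives you, which contains no opcodes.

```ruby
# A throwaway method to trace.
def greet(name)
  "hello #{name}"
end

# Collects [event, method] pairs seen while the tracer is enabled.
def trace_greet
  events = []
  tracer = TracePoint.new(:call, :return) do |tp|
    events << [tp.event, tp.method_id]
  end
  # With a block, tracing is active only for the duration of the block.
  tracer.enable { greet("traced") }
  greet("untraced") # not recorded: the tracer is disabled again
  events
end
```

The block form of `enable` is the kind of scoping the tool later reuses to decide when its own, much more detailed tracing is active.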
And then separately, Ruby has a bunch of C functions to call Ruby methods and do a whole bunch of stuff around handling methods and tying them to objects, both for native-code-to-Ruby calls and for Ruby-back-to-native-code calls. So I hook all that stuff, I hook the transitions between Ruby and native code, and then I hook those native functions, et cetera. And then separately, it supports hooking into Ruby's internal exception handling mechanisms. What it pulls out of that is basically all the arguments of all kinds, even the special internal ones for the opcodes, and then it basically Ruby-inspects everything, which is a stringification, kind of like repr in Python. One problem with that is that many times values aren't fully initialized, or the Ruby VM itself isn't fully initialized, and you need to be very careful about how you try to call things on things that aren't fully initialized. So it handles a lot of that, trying to be very careful about when it's safe to actually send the inspect method over, and using alternative fallback approaches when it can't. It dumps out the bytecode whenever it sees something like a method or a block being defined. It dumps all the return values for the opcodes and the native functions it hooks, it gets all sorts of other metadata, and it formats all sorts of things to make them human-readable. It supports Ruby 2.6 to 3.0, and I assume once 3.1 comes out, it won't be too much effort to get it working on 3.1. I have a sort of generic implementation with a couple of version-specific behaviors and switches, and then, at a separate lower level, anytime I need to deal with Ruby structs from C, I just have a version-specific set of structs to pull fields from, essentially, using Frida's CModule API. So another cool thing that it does is that it actually makes use of the TracePoint API, but not in a way you'd expect.
It's just that the TracePoint API has a very good way of controlling whether or not it's enabled based on various conditions, and so whenever TracePoint tracing is enabled, that turns Ruby Trace on, and whenever it's turned off, it turns Ruby Trace off. It gives you fine-grained control to very minutely trace certain pieces of execution. I have a bunch of test cases for various bytecode sequences that seem to cover a greater span of edge cases, in more detail, than Ruby's own internal opcode test suite, although not necessarily some of the other ones. I also implement support for dead Ruby opcodes that shouldn't even exist anymore, for some reason. But basically, Ruby Trace is kind of like its own CRuby bytecode interpreter because of how it works. So as a demo, let's switch to this view. So I have some Ruby code here that defines a TracePoint tracer, and then in the middle of this big block is actually a stringified block of Ruby code with this foo method and then some calls into it. Then it redefines Symbol's triple-equals operator, then calls a lot of those same things over again. And then it compiles that code from the string and evaluates it under the tracing. So in this case, the tracer that's being used doesn't really do anything, so when you run this code, it just kind of spits things out. The more interesting thing is that the "not found wat" on the left side gets replaced with a symbol on the right side, because once triple-equals is redefined, when it hits the comparison for the symbol, it instantly matches stuff. So the "wat" string will, in the last case, hit the symbol check against the symbol foo, and that will just pass. So now let's run this under Ruby Trace, and basically Ruby Trace dumps out a whole bunch of stuff. One of the first things you can see is the instruction sequence from that compilation, and then you see the call to the eval on it, and then we're inside of that.
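The demo script itself is not reproduced in this transcript, so the following is only a hypothetical reconstruction of its core idea: a case statement over simple literals whose result for an unmatched string changes once Symbol's triple-equals is redefined to always return true. The names `classify` and `demo_symbol_eqq_redefinition`, the branch strings, and the restore step in `ensure` are assumptions added for this sketch, and it leaves out the TracePoint tracer and the compile-from-string step.

```ruby
# A case over simple literals, similar in spirit to the talk's foo method.
def classify(value)
  case value
  when "hello" then "string hello"
  when 1       then "integer one"
  when :foo    then "symbol foo"
  else "not found: #{value}"
  end
end

# Runs classify("wat") before and after redefining Symbol#===.
def demo_symbol_eqq_redefinition
  before = classify("wat")
  Symbol.class_eval do
    alias_method :__original_eqq, :===
    define_method(:===) { |_other| true } # every symbol now "matches"
  end
  after = classify("wat")
  [before, after]
ensure
  # Restore the original method so the rest of the process is unaffected.
  Symbol.class_eval do
    alias_method :===, :__original_eqq
    remove_method :__original_eqq
  end
end
```

On CRuby, calling `demo_symbol_eqq_redefinition` should show the unmatched string landing in the else branch first and in the symbol branch afterwards, which is the left-versus-right difference described in the demo.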
The first thing that happens is that the foo method is defined, and so it dumps out all of the bytecode of that foo method. We can see a bunch of values from it. Next, it adds that to the class it's in. Then we see the first call into foo with the "hello" string, and then we run through that check operation. So the first thing we see is a call into this opt_case_dispatch, which is a special-case opcode generated for case statements that only have simple types in them, no special types. Basically, it optimizes things so that all of the cases get added into a single Ruby hash, and then it just checks whether the value is a key of the hash, but it first checks a bunch of things about the object coming in to make sure it's a simple type for which the comparison would work in the first place. So in this case "hello" is a string, it's a simple string, it's in there, it takes the hello path, and we move on. The next thing is we have 1, it takes the 1 path, and so forth. But then we eventually see this BigDecimal 3.0 value, which you'll see represented variously as 0.3e1, and that thing will get passed in to foo. The problem is that because it is not a simple type, it will fall through. The way that this works is that the optimization is just a quick check first, and it is kind of a guard on top of what the rest of the case implementation would be, which is a series of subsequent if/else checks. That's just how Ruby does it.
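To see this guard instruction without any tracer, CRuby can disassemble a case statement directly. The sketch below is CRuby-specific, since `RubyVM::InstructionSequence` is an implementation detail; the helper name `case_bytecode_listing` and the source snippet are made up for illustration.

```ruby
# Compiles a small case statement over simple literals and returns the
# human-readable bytecode listing. Only compiles; nothing is executed,
# so the undefined `value` is harmless.
def case_bytecode_listing
  source = <<~RUBY
    case value
    when "hello" then 1
    when 1 then 2
    when :foo then 3
    else 4
    end
  RUBY
  RubyVM::InstructionSequence.compile(source).disasm
end

puts case_bytecode_listing
```

In the listing, the opt_case_dispatch instruction appears once near the top, followed by the one-at-a-time comparisons that serve as the fallthrough path.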
And so it falls through and then starts doing all of the if/elses. It doesn't match the string, it doesn't match a whole bunch of stuff, and then eventually we see a bunch of operations where it's trying to compare against the float value, and the BigDecimal has to do a bunch of math conversions to get the stuff out for the comparison. And so then it eventually does the comparison and sees that 0.3e1 is equal to the float 3.0, and so the checkmatch passes, that's the comparison for the branch, and then it jumps to the code for that segment of the branch. We continue doing all of this for the rest of the values. We see in this case that the string "wat" doesn't match anything, so it ends up in the else path, but it was a simple type, so it actually goes to the else path directly. And then we see the code that redefines Symbol's triple-equals method, and from this point on things are gonna get a little bit weird. So we start seeing that all of these opt_case_dispatches end up falling through because triple-equals has been redefined. Basically, there's a short circuit in the implementation where Ruby says, well, if any of these core equality methods has been redefined on any of the core types, such as Symbol, it just can't be bothered to do the fast comparison anymore, one way or the other, and it's faster to just give up and go through all of the checks one at a time. And so we run through these all one after another, and then we get to the end with all the values, and they get percolated up from all the functionality. So, future work: I have to implement support for Ractors, Ruby's new multi-VM in-process concurrency model. Right now I'm just relying on the one global Ruby VM internal to the process. And then just generally keeping up with the Ruby versions.
The code will be available here at our GitHub repo shortly after this presentation airs. But in conclusion, it's been really fun working on this, although it's been pretty tiring because of all the craziness that goes on with Ruby and the various random things that can fail when you're messing around in its insides. But I think that all of these techniques pretty much apply to other high-level languages and runtimes; some good examples are Python, Node, Golang, and Haskell, and I really think people should be trying to build some of these things. So to paraphrase Arlo Guthrie: you know, if one person, just one person, does it, they may think he's really sick. And three people do it, three, can you imagine, three people writing parasitic tracers? They may think it's an organization. And can you imagine 50 people? I said 50 people writing these tracers. Friends, they may think it's a movement, and that's what it is. I'd like to thank Addison, my partner in crime on the DRuby stuff that led me down this rabbit hole of doing this work. And a wise man once said you can't hide secrets from the future using math. I believe that is true, and I also believe it is true that you simply can't hide from the future. I would take questions, but this is a recording, so there are no questions to be had. So instead I will answer a question about why on my intro page I used an image from Pokémon Crystal and not Pokémon Ruby. Well, to answer that question: I do not like Ruby lang. I do not like its yuppie gang. I do not like its symbol keys. I do not like its optional parentheses. I do not like its method send. I do not like its begin and end. I do not like its magic verbs. I do not like its method of keywords. I do not like its IO.read. I do not like its lackluster speed. I do not like its dlopen JIT. I do not like strong params permit. I do not like its if unless. I do not like its dependency mess. I do not like its case and when. I do not like its require middleman.
I do not like its polymorphism. I do not like its object fanaticism. Its object nil gives me pain, that Ruby lang is profane. Thank you.