to cover, and really what we're going to be focusing on is the internals of symbolic debugging: how debuggers work and when they don't. A lot of you have probably used one, whether it's GDB, LLDB, or Delve if you're working with Go, and they all break in mysterious ways. So we're going to work with an example that reflects typical real-world workloads. We have a program here which counts the number of 'w' characters in a file, terribly, and central to this program is this function count. What count does is loop through a buffer and increment sum whenever the current character in the buffer we're iterating over is 'w'. This is C, so we'll compile it with a C compiler, in this case GCC. We run it on /etc/passwd and it finds 16 characters. This gets compiled into an executable format that the kernel understands, so the kernel can actually load and execute it. The format typically used on modern Unix is ELF, the Executable and Linkable Format, and it includes a whole bunch of information: how memory should be laid out, which regions of the file should be mapped in, and whether you need to do funkier things; if your program is dynamically linked, you may need to load other files as well. The actual C code is compiled down to assembly. You no longer have types at the assembly level when you're interacting with memory, and, at least on the x86 processors we're all running, you have a very limited set of registers, which are much faster than memory but scarce. A symbolic debugger is able to map this machine-level state back to types, variables, et cetera. So how does that work? Assuming we're inspecting a live process, that's typically done via ptrace. ptrace will attach to the process and lets you do all sorts of cool things; one of those is grabbing the full register state, with respect to the context of the process.
And equivalent mechanisms are available on the other operating systems that matter. A very important register there is rip, and rip tells you the currently executing instruction. With the currently executing instruction, you can determine which executable file that code lives in; on Linux, you have /proc/PID/maps. So in this case, we're in a system call, or rather its user-space entry point, so we're in libc, and we determine that libc contains the executable code. That file also contains a whole bunch of debug information, either within the file itself or in some auxiliary file. ELF also has a notion of sections. You have .debug_line which, well, actually I'll go through all of these, so you'll see very shortly what they do. .debug_line essentially contains a sequence of operations which the debugger executes in a state machine, and that expands (there was a cool animation there; oh, there it is, I love that) into a matrix which maps addresses to source files and lines. You have .debug_frame, which tells you how to unwind: what was the caller of a given function, what was the register state? It's very similar to .debug_line: you execute a massive state machine and it expands into a matrix. And then .debug_info, which is very important, essentially gives you all the details about how your program is structured. That's type information, variable information, functions, even things like lexical scope, et cetera. All kinds of crazy stuff, and it's represented as a tree, where every one of those boxes is referred to as a DIE, a debugging information entry. So now, for our toy program, this is what a subset of that debug info would look like.
You have a compilation unit, wc.c, with a whole bunch of information, and within that you have a subprogram, the function count, which has an argument buffer and a variable sum, and the type of the sum variable is the size_t typedef, which is implemented with a base type of an 8-byte unsigned long. The challenge with DWARF is that it was designed to support all kinds of aggressive compiler optimizations; it's actually Turing complete, and you'll see some of the craziness that goes on. So over here we have our toy program with this function count, which expands into the following assembly, and the DWARF information (I can't really see any of the text on my screen) shown here is the DWARF information for count. We have an i variable with its type information, the subprogram, and in this case the function was actually inlined. So the bottommost DIE is a DW_TAG_variable whose additional details are specified in the originating DW_TAG_variable; abstract_origin essentially refers to that. And then the location, the value of the variable, is actually specified in this location list below. What this says is: okay, if the instruction pointer is within this range, then push the literal zero onto the DWARF stack machine, and that value is the current value of i. And over here it says: okay, if you're within this range (at this point we're within the loop body, where all sorts of interesting optimizations occurred), you need to do a lot more work. In this case it says, all right, let's push the value of RDI; over here, let's assume that i equals one in this loop.
So push the value of RDI, which at this point is buffer plus one, then push the value of RDX, then subtract the two values, which gives you negative 4,095, and then add the constant 4,096; the value of i is now at the top of the stack, and it evaluates to one. Similarly for this last bit of information. So in many cases you'll see variables reported as optimized out even though they weren't actually optimized out. We were walking through the loop body, and the loop condition, the check that we're not yet at the end of the buffer, that branch is actually not covered by this DWARF output. So even though the value of i is retrievable at that point in time, the debugger claims it isn't. Over here we're using GDB: I set a breakpoint right at that instruction, and we see that i is "optimized out", while if you go one instruction prior, it's suddenly available, even though the value has not changed. So some compilers will emit crappy DWARF, and sometimes you have to dig into the assembly. A very easy way to do this is to look at the DWARF information and just check whether the registers involved have been mutated in any way; if they haven't, the value is still retrievable. So that is that. And what's worse is that a lot of us today, when choosing compilers, focus on things like performance and compilation times, but a very important factor is also debuggability: what's the quality of the DWARF output, so that if you do get a crash in production and have to debug the program, you can actually retrieve its state? So Clang is an awesome compiler, but it does emit terrible DWARF at the moment. In this case we're using Clang 3.6.2, and we pause the program at the end of the count function, and you'll see that Clang's DWARF information actually reports sum and i as being constants of zero.
So unfortunately you cannot always trust your compiler. This is a comparison, on a toy application including all sorts of interesting and weird C99 types, between GCC and Clang, and you can see a huge gap in the quality of the debug info. The last column on the right is Node.js at the O2 optimization level, and you also see that large gap there. So just make sure to consider that when choosing between compilers. Some languages are still very early in their support for DWARF. This is a Go application; I don't like Go, but we have to support Go. What happens here is that when depth equals zero we say, all right, let's generate a snapshot: run a symbolic debugger and just grab the state of all the variables in the application. And when you do that (in this case I'm using Delve), you'll see that a, and essentially all the variables which are initialized after this point in time, have bogus values. This is a case of bad DWARF being emitted: in DWARF you can actually specify, hey, these variables are only in scope within this range of instructions, and Go doesn't happen to do that. So again, you cannot always trust your compiler to generate the right information. There's all sorts of other weirdness as well. We have a toy application; I corrupt the call stack by overflowing a buffer and trashing memory, and when I run a debugger on this process, according to the debugger we only have one thread. A lot of debuggers actually use libthread_db, a very crappy library which abstracts away a lot of the internals of the threading implementation so you can do things like iterate over the threads in a program. And in this case a thread has completely disappeared due to this corruption, while this is what the process actually looks like, which also has interesting applications for malware.
So different operating systems have different mechanisms to handle things like threading, and if you're dealing with something like Go or Haskell, where you have a notion of user-space threads, there's a whole bunch of other machinery that has to go on as well. Now, obviously there are cases where things do get optimized out. In this case, using a different set of compilation flags to compile count, if I run a debugger we'll see that i cannot be retrieved. Why is that the case? In this code, in the loop, we're essentially looping through an array and indexing into buffer off of i, and what the compiler does is determine: hey, we actually don't need i, let's just use pointer arithmetic instead, which ends up reducing the number of registers it has to use. So in this case i is in fact completely optimized out. Unfortunate, but this is life. And here's another example, and this is the typical reason why things are optimized out. The count function is actually called by main, and we'll see that argc and argv, the first two arguments that are passed to main, are completely optimized out. One thing the compiler will try to do is make the best use of the limited set of registers and avoid touching memory if it can. In this case the compiler knows that argc is never used after the call into count, so it doesn't bother spilling that value from a register into memory, and it's just completely unretrievable. These are situations where things like reversible debuggers are very useful, and ultimately the platform ABI determines which registers have to be saved or not across function call boundaries, et cetera. And then you have a lot of other weird things.
So tail call optimization is fairly common. In this case we have some toy assembly: main calling through function one, which calls onward into three, et cetera. The call instruction on x86 will actually save some information about the call stack, so you can unwind and also return to the caller using ret. In this case the compiler determines that these can be folded into jumps, since we don't actually require additional stack space, and it will just change those calls to jump instructions, at which point you really don't have any call stack information to unwind from. So if you pop this open in a debugger without optimization, on the left, you'll see everything looks sane. On the right you'll see that everything except for the innermost tail call is essentially gone. At this point the debugger actually has to do all sorts of weird heuristics to try to disambiguate what the origin of a call site could be. And I might be over my time, I don't know, but DWARF and debuggers are great and depressing; it's cool stuff. But yeah, follow me and I'll be posting a couple of write-ups on a lot more interesting stuff: different ways we could abuse DWARF, how you marry things like reversible debugging and tracing with DWARF, et cetera. Thank you.