So, for the next talk of today, we're going to talk about a new open-source DBI framework, which is called QBDI. DBI means dynamic binary instrumentation; you know what a binary is, and instrumentation means to observe, monitor and modify some parts of a program or a library. And we have two speakers, Charles and Cédric. They are both security researchers. Charles likes attacking white-box crypto, and Club-Mate with rum. And Cédric instead is focused on reverse engineering, and he likes whiskey. There you go, the stage is yours. Give them a hand, please.

So, I think our host did a good job of introducing us. We both work at Quarkslab, which is a French security consulting company. During our work we use DBI a lot, and for the past two and a half years we've been researching DBI frameworks together, and also working on our own DBI framework. What we want today is to try to demystify what a DBI framework is and how it works, to show you how we went about implementing our own DBI, and hopefully to inspire you about the usage of DBI frameworks, and maybe even to help us with our own.

So I'll start easy, with an introduction to what instrumentation is, exactly. The basic idea is the transformation of a program into its own measurement tool: you're making an instrument. This instrument can observe any state of the program, at any time, during the runtime. And then the tool automates the data collection and the processing of that data. That's quite abstract, so what could you do with that kind of instrumentation? Well, I'm pretty sure most of you, if you program in C or C++, have already used Valgrind, and maybe the Memcheck tool. Valgrind is an instrumentation framework, and Memcheck is an instrumentation tool. What it does is quite simple: it tracks memory allocation and deallocation during the runtime, and also tracks memory accesses.
And with that information, it is able to detect use-after-free, memory leaks, and double-free, but also out-of-bounds memory accesses. Then it can tell you: well, you are accessing this array two bytes outside the boundary of the array you allocated at that place in the program. So it's quite useful. Another popular use case is fuzzing. In fuzzing, you're generating random inputs, sending them to a program, and trying to make it crash. But one of the things you want to do is generate interesting random inputs. So you want to know, between two random inputs, whether the new one you're generating explores more parts of the code, generates more state transitions, and things like that. You want to measure the execution, and one of the main criteria you can measure is code coverage. A DBI framework is a very good way to measure code coverage, and if you use WinAFL, basically, it's AFL using a DBI framework to gather code coverage information. And if you go deeper, you can even use the information you have about which instructions are executed to try to build a symbolic representation of the program, et cetera, and then you get the DARPA Cyber Grand Challenge that was held a year and a half ago. The last thing you can do is simply record execution traces, which means recording everything that happens in a program. Once you have that, you can replay those execution traces, and you get something that looks like a timeless debugger, where you can go forward and backward and try to track everything that's happening in the program without relaunching it. Also, something I've been working on before was software side-channel attacks against cryptography: using the execution trace to generate side-channel information and trying to recover keys from obfuscated cryptography. So by now, one of the questions you might be asking is: why not use debuggers? Because debuggers can do all of that.
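To make the code-coverage use case concrete, here is a toy sketch in Python. It uses `sys.settrace` to record which lines of a function actually ran; nothing QBDI-specific, just an illustration of the coverage signal a coverage-guided fuzzer consumes when ranking inputs. The `target` function is a made-up example.

```python
import sys

def coverage_of(func, *args):
    """Run func and record which source lines executed (a toy coverage metric)."""
    covered = set()
    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return tracer
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return covered

def target(x):
    # Nested branches: deeper inputs cover more lines.
    if x > 10:
        if x > 100:
            return "huge"
        return "big"
    return "small"

# An input that takes the x > 10 branch covers more lines than one that
# does not; that difference is exactly what a fuzzer uses to keep "interesting"
# inputs in its corpus.
small_cov = coverage_of(target, 1)
big_cov = coverage_of(target, 42)
```

A real DBI does the same thing at the machine-instruction level, on code it has no source for.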
The problem is that debuggers, although they are awesome, are meant for humans, not for machines. And they are very slow. Imagine a debugger attached to a target, the target is paused, and you press continue. This is what happens on your system: the debugger sends a resume call to the kernel, the kernel decides to reschedule the execution of the target, so you have the whole process scheduling that happens. Then the target hits a breakpoint, which is like a trap interrupt, which jumps back into the kernel. And then the kernel decides that the interrupt should be caught by the debugger, basically, so it's going to send a signal to the debugger and reschedule the execution of the debugger. For one continue and one breakpoint hit, you're doing four boundary crossings between userland and kernel land, plus two process reschedulings. So it's quite slow. How slow? Maybe you've seen the kind of attack you can do in CTFs, where you have to reverse engineer a binary that checks a password, and what you do is measure the execution time while trying different passwords. Or, not really the execution time, but something more interesting: the instruction count, the number of instructions executed. You can try to brute force the password character by character this way. What you see on the top is GDB, and what you see on the bottom is Intel PIN. On the bottom, you see that it can actually check a lot of cases, and you see the number of instructions executed, while GDB is extremely slow, even on runs where the instruction count is much lower. So yeah, you don't want to use GDB. You don't want to use a debugger. You want to use a dynamic binary instrumentation framework. So the solution, well, is to get rid of the kernel. And how?
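As an aside, that instruction-counting password attack can be reconstructed in miniature in Python: we count traced lines as a stand-in for the instruction count a DBI would report, against a toy early-exit password check. The secret, the `check` function, and the alphabet are all made up for the illustration.

```python
import sys

SECRET = "s3cret"

def check(password):
    # Early-exit comparison: the work done grows with the correct prefix length.
    for a, b in zip(password, SECRET):
        if a != b:
            return False
    return len(password) == len(SECRET)

def count_steps(func, *args):
    """Count executed lines, a toy stand-in for a DBI's instruction count."""
    steps = 0
    def tracer(frame, event, arg):
        nonlocal steps
        if event == "line":
            steps += 1
        return tracer
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return steps

def brute_force(length):
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    guess = ""
    for _ in range(length):
        # The candidate that executes the most steps extends the correct prefix.
        guess += max(alphabet, key=lambda c: count_steps(
            check, guess + c + "x" * (length - len(guess) - 1)))
    return guess
```

With a real binary, the counting is done by the instrumentation engine instead of `sys.settrace`, but the attack logic is the same.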
Well, the only solution, except if you really want a bare-metal system, is to run the instrumentation inside the target. So, instrumentation techniques: there are a few different ways to do that. The first one is from source code, and I'm pretty sure everybody has done that: just adding print statements to your code to get information. That's basically instrumentation. But you could be smarter and do it automatically at compile time, with a compiler plugin. The other option is to work from the binary, and when you work from a binary, there are basically two approaches. The first one is to take your binary, statically patch it, and add hooks into additional code you insert into the binary. And the last one is dynamic binary instrumentation. Working from source code is really boring; there's nothing very complex or interesting to say about it. Static binary patching and hooking, well, it's crude and barbaric, because you have to know in advance where you want to insert your hooks, which is not really possible. And so this talk is about dynamic binary instrumentation. There are a few existing frameworks. The first one is Valgrind, as I mentioned, which has existed since 2000 and is open source. Valgrind only supports UNIX platforms, and it's very complex to use; not the command-line tool, but writing your own instrumentation tool. Another one is DynamoRIO, which has existed since 2002 and is open source. It's cross-platform and cross-architecture. However, it's very hard to use too, because you basically have to manipulate the instructions in assembly yourself, which is quite error-prone. And the last one is Intel PIN, which is the most popular one, because it's very user-friendly. However, it only supports Intel platforms, and it's closed source. So why did we make our own? In 2015, what we wanted was a cross-platform and cross-architecture DBI framework, and all of those frameworks were developed quite a long time ago.
So there were new things you could try. And we wanted to focus on mobile and embedded targets, because that's basically what we work on in our daily job. We also wanted something simpler and more modular, meaning we could use it with other reverse engineering tools easily. And also, because of our past in cracking DRM, basically, we wanted to focus on heavy instrumentation, which means instrumenting a lot of things and generating lots of data. Now, to follow the rest of the talk, you need to understand how a DBI exactly works, so I'm going to try to explain that, if I get it correctly. The simple idea is that dynamic binary instrumentation is about dynamically inserting the instrumentation, the code you wrote, inside the binary at runtime. It looks like this: you disassemble the code that's going to be executed, then you pass the instructions to your instrumentation tool, which analyzes them and generates the appropriate instrumentation. Then the instrumentation gets inserted into the execution flow, and it's executed. And then you have Batman, for scale. However, this looks simple, but there are a few problems with this graphic (Batman being another problem). The first one is disassembly. You don't know, when you get a binary, what part of the binary is code, because code is data, data is code, and there's a lot of unpredictable branching and jumps everywhere. So you cannot just disassemble the whole binary in advance; you need to discover what the code of the binary is going to be as you go, during the execution. Basically, what this means is that you will execute a short piece of code, a block of code: a series of instructions you can predict will be executed, ending with a control flow statement, like a conditional jump or a call, that you cannot resolve. And you execute just this piece of code.
And then you look where it wants to flow next, what the next instruction to execute is. This way you discover the next block of code to execute, and you simply execute that next block. So this basically forms a short execution cycle. Now, the other problem you have is with the instrumented code you're generating. The instrumented code is much larger than the original code, because it's the sum of the original code plus the instrumentation you're adding. And compilers are not dumb: they try to tightly fit all the code into the binary to not waste space. So there's not much space left in the code segment of your binary, which means you cannot instrument it in place. You cannot just add code into the binary; there's not enough space. So you need to write it somewhere else, and this introduces the problem of relocating the original code of the binary. The problem with relocating is that the code contains relative references to memory addresses. You might have a jump to the instruction 20 bytes forward, or you might have a memory access that says: now I need to access that piece of data, which is 1037 bytes backwards from my position. And if you move everything somewhere else, well, those references become invalid. So this means we need to actually rewrite the original code to fix all those references. In our engine, and in what we will present during this talk, this is what we call patching. So if you summarize everything you need to do, this is the life cycle of a DBI engine. It starts by disassembling the first piece of code it wants to execute. Then it patches it to make it relocatable. Then it adds instrumentation to the patched piece of code. It assembles it somewhere else, in executable memory. And then it can execute it. And once the execution finishes, it looks at what the next piece of code to execute is, and starts again.
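That life cycle can be sketched as a toy loop over a tiny register machine: discover a basic block (a run of instructions up to the first terminator), cache its "compiled" form, add instrumentation (here, a per-block execution counter), run it, then resolve the next address. This is purely illustrative; real engines like QBDI, PIN or DynamoRIO do all of this on machine code, and the instruction set below is invented.

```python
def run_instrumented(program, regs):
    cache = {}       # block start address -> list of instructions in the block
    counters = {}    # instrumentation: how many times each block was executed
    pc = 0
    while pc is not None:
        if pc not in cache:                      # 1. discover + "patch" the block
            block, addr = [], pc
            while True:
                ins = program[addr]
                block.append(ins)
                addr += 1
                if ins[0] in ("jnz", "halt"):    # a terminator ends the block
                    break
            cache[pc] = block
        counters[pc] = counters.get(pc, 0) + 1   # 2. the inserted instrumentation
        nxt = pc + len(cache[pc])
        for op, *args in cache[pc]:              # 3. execute the block
            if op == "set":
                regs[args[0]] = args[1]
            elif op == "dec":
                regs[args[0]] -= 1
            elif op == "jnz":
                nxt = args[1] if regs[args[0]] != 0 else nxt
            elif op == "halt":
                nxt = None
        pc = nxt                                 # 4. resolve next block, start again
    return counters

program = [
    ("set", "r0", 3),       # 0: block at 0
    ("dec", "r0"),          # 1: block at 1 (loop body)
    ("jnz", "r0", 1),       # 2: loop while r0 != 0
    ("halt",),              # 3: block at 3
]
regs = {}
counts = run_instrumented(program, regs)
```

Note how the jump target (address 1) lands in the middle of the first discovered block; the engine simply discovers a second, overlapping block starting there, exactly the kind of thing that makes up-front disassembly impossible.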
So now I'll let Cédric explain some of the low-level abstractions we had to deal with for all of that. As Charles said, we have seen the basic principle of the DBI, but now we want to design our own DBI. For this, we need some very low-level things before building up on them. As Charles also said, basically, a DBI starts from an entry point and does automatic discovery of the code. And this works on what we call basic blocks: a series of instructions that are linked together and compose what is called the control flow. At the end of a basic block, you have a terminator instruction, which can be a jump, for example, that lets the execution go to another block. And with this kind of structure, the control flow just goes on: you start a function, it just executes every basic block, and you don't have any way to interact with it. So what we want is some kind of control over that control flow. By this, I mean we want, after the first block, to take the control back. We want to be able to skip the jump, or force it, or modify it to jump inside our own code, where we will execute the logic of the engine and then be able to go to the next basic block, and so on. Here, in fact, everything is about keeping control of the execution. And this is much more difficult than you might expect, because what we want requires modifying the original instructions of the binary, but we don't want to modify the original behavior of the software. We want to be able to execute it as if it were untouched. This is quite difficult and requires lots of low-level tools. So what do we need, in fact? Very, very simple things: we need a multi-architecture disassembler, and we also need a multi-architecture assembler. And if possible, they should also be cross-platform, please. And we need some kind of intermediate representation to work on, and this representation should be a link between the disassembler and the assembler. So these requirements were quite strong.
And also, what we don't want: well, basically, we don't have unlimited resources. So if possible, we don't want to implement a multi-architecture disassembler and assembler ourselves. And about the intermediate representation, we don't really want to parse every single manual for every CPU of every single architecture we want to support, because, yeah, I like developer manuals, but they're not that fun, and we don't have 10 years to spend on parsing them. But things have changed in the last few years: there is a new player, and this player is called LLVM. What is LLVM? LLVM was basically created to focus on compiler technology, especially just-in-time engines. And LLVM is the core foundation, the core framework, of a very well-known piece of software today, which is the Clang compiler. I'm sure everybody in this room knows the Clang compiler. But it's more than this: it's a whole toolkit that provides tons of things to play with binaries. So in fact, for us, LLVM already has everything. It supports lots of architectures, all the major ones: ARM, x86, x86-64. It provides both a disassembler and an assembler, but more than this, it also provides an intermediate representation that links the disassembler and the assembler. Some of you have already heard about the LLVM intermediate representation, but here, this is not that one. This is not the intermediate representation used for compiler passes; it's another, more low-level one, which is called LLVM Machine Code, or LLVM MC. And this low-level representation is the glue between the disassembler and the assembler. So what is LLVM MC? Let's start with a very basic, simple instruction. For the DBI, in the process of exploring the code, an instruction is just a series of bytes. So basically, we feed LLVM this list of bytes, and LLVM just gives us this. And this, in fact, is already an intermediate representation. It's a very simple one.
You have two types: there is the MCInst, which is the instruction, and the instruction is composed of a list of operands, the MCOperands. Basically, even if it's simple, you have all the interesting information that you need, and it's already something we can work on. So LLVM MC is, as you've seen, very minimalist, with only a few structures. And it's also totally generic: all the architectures use the same representation, which is quite good, because we can do things in a totally generic way. And it still encodes lots of things about an instruction, at least everything we need to work on it. But the downside is that it's very raw. The structures are simple, but using them to do complex things feels a bit complicated, and we will see why. Also, they wanted it to be generic, so they made some compromises. For example, they don't encode everything about an instruction; instead there's a lot of glue code a bit everywhere, and that makes using this layer a bit tricky. For example, we want to create instructions; that's one thing we will need to do for a DBI. And even if every instruction uses the same representation, each one uses it in a slightly different way. For example, here we have a MOV. And this MOV, as we have seen, is a series of operands grouped inside an instruction. But the thing is, this representation is not documented, it's not really standard. Every instruction possibly uses its own encoding, and to know it, you need to look at the glue code. It has not been created to be an intermediate representation that you can work on, basically. And we also want to patch instructions, so we need to modify them. If you take a look at the jump, it's very simple: only one instruction, one operand, which is an immediate. But if we want to modify it, we will need to go from this to this.
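As a rough picture of those two structures, here is a Python sketch: an MCOperand is a tagged value (register or immediate), and an MCInst is an opcode plus a list of operands. The names mimic LLVM's C++ classes, but this is not LLVM's actual API, just a model of its shape.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class MCOperand:
    kind: str                  # "reg" or "imm"
    value: Union[str, int]

@dataclass
class MCInst:
    opcode: str
    operands: List[MCOperand] = field(default_factory=list)

# A relative jump as the disassembler might hand it back: one immediate operand.
jmp = MCInst("JMP", [MCOperand("imm", 0x30)])

# "Patching" at this level means mutating or rebuilding these structures,
# for example retargeting the jump into the engine's own code:
jmp.operands[0].value = 0x1000
```

The pain point described in the talk is that which operands an opcode expects, and in what order, is only pinned down by per-instruction glue code, not by the structures themselves.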
And you can see it's totally different. We will be forced to make some, like, hardcore modifications to them. We also want to create patches, that is, series of transformations. Here is a very simple example of an instruction that we will need to patch: an instruction which references memory using the program counter. Basically, as Charles said, at some moment we will need to move this instruction elsewhere in memory. But if you move the instruction elsewhere, the reference will be broken, because it was based on the current address of this instruction, which is here. So let's have a look: we are moving it. What we are forced to do is this: we have moved the instruction, it's located at another address in memory, so we need to replace the program counter register with another register, and this register will be loaded with the original address value. This way, when we execute the instruction, the reference will be the same. But as you can see, we have broken things here. We are modifying the behavior of the program, because we are erasing this register. So basically, what you need is also to back up this register, and after executing the modified instruction, you also need to restore that register. So as you can see: creation, creation, creation, modification. Lots of operations on an intermediate representation which is kind of hard to work with, and the encoding is a bit painful. And this was a very, very simple patch, yet it's already quite complex, with a lot of transformations and a lot of steps. We also had a feeling: you've seen that we were forced to back up a register, and you can feel that this will not be the only patch where you need to back up a register. So you can feel that some generic steps will be needed a bit everywhere.
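The sequence just described (back up a scratch register, load the instruction's original address into it, substitute it for PC, restore it afterwards) looks roughly like this on a toy instruction representation. The dict-based "IR" and the field names are invented for the illustration; they are not QBDI's real data structures.

```python
def expand_pc_relative(ins, scratch):
    """Rewrite one PC-relative instruction into a relocatable 4-step sequence."""
    patched = dict(ins)
    patched["regs"] = [scratch if r == "PC" else r for r in ins["regs"]]
    return [
        {"op": "save", "reg": scratch},                          # back up the scratch register
        {"op": "load_imm", "reg": scratch, "imm": ins["addr"]},  # scratch <- original address
        patched,                                                 # original op, PC substituted
        {"op": "restore", "reg": scratch},                       # undo the side effect
    ]

# Something like "ldr r0, [pc, #8]", originally located at 0x4000:
ldr = {"op": "ldr", "addr": 0x4000, "regs": ["R0", "PC"], "offset": 8}
seq = expand_pc_relative(ldr, "R5")
```

One instruction has become four, which is exactly why the generated code cannot fit back into the original binary and has to live somewhere else.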
So basically, we need abstractions. And the idea here is to have a magical engine, which we really call the patch engine (that's its real name, in fact). This engine takes one original instruction as input, applies our transformations on it using abstractions, and then outputs a list of one or more instructions. For this, basically, we had a vision, some twisted vision. We said: maybe we can identify the steps that are required to apply the transformations. Maybe we can say that a patch is in fact just a series of transformations, some of them totally generic, so maybe we can regroup them and try to integrate them into a sort of language, a specific language specialized in patching binaries. And another part of the idea was this: even if the representation is generic, every instruction we get will be architecture-specific, so the patches will be architecture-specific. But possibly the language itself can be generic and express the modifications for more than one architecture. And so, yeah, after some headaches, we have this schematic. I will not explain everything here, but what I can explain is that, basically, the issue is that you have two worlds: the world of the original program and the world of the DBI. So one guest, one host, and you have interactions between them. The idea of the abstraction was to make things a bit more organized, to be able to see the precise relations between them, and by identifying these relations, to create abstractions that allow us to work more easily with the binary and to express complex things in a simple way. I will show you what this really means with an example. This is something which is part of the language: a temporary register. You have seen a temporary register before, in this example.
So this is the same example as before, and this was our temporary register. With our language, if you want to create something like this, this series of instructions, you can just say: OK, I want a temporary register, so I want a Temp. The register will be chosen automatically by the engine from the free registers, and it will be identified by an ID, so you can work with it later. And by working on it, I mean the backup that was needed for modifying the original instruction. To do this modification, what we are doing here is basically replacing a register in the original instruction: we are doing a substitution on the original register, which is PC, using a reference to a temporary one. So you can see it's kind of a language, with keywords and kind of variables and things like this. And another idea was that patching a binary is basically just applying a series of rules. Because if you want to modify something, it's because there is a need to do it: there is, in fact, a condition, one or more conditions, that for an input instruction will trigger a series of actions. And these actions are basically the transformations that we want to apply. So let's have a look at what we call a rule. This is extracted from the actual code of the DBI, and this is our patch, the one that replaces the program counter with a temporary register. You can see there is one condition here; so this is only one rule, with one condition: the instruction must use the register PC. And then the actions: you can see there are several of them. I will not go into the details, but you can find here the substitute-with-temp action, the temporary register, and so on. And a nice thing about this rule is that it's generic. I mean, it's exactly the same rule that we are using on ARM and x86-64. And another example here is pure ARM: here, we basically need to replace the jump, as you have seen.
So that's exactly what we are doing here. The condition is a bit more complex, because we have a condition which applies to several instructions, but the idea is always the same: you have keywords for your language, variables, and things like this. So what did we learn? First, LLVM is really a magic piece of software. It's very robust, it provides tons of things, and basically it just saved us here. But the problem we had was that the intermediate representation was so simple, in fact, that it became very complex to play with. And you can try to do it by hand; you can really create a patch by hand, like with a giant switch case, but it doesn't work, it just breaks your head. So you really need to focus on abstractions. And we were quite surprised by how difficult it was to create these abstractions; honestly, it's still a work in progress. But it allows us to express quite complex transformations in something that is very easy to read. You just have your list of operations, and it's very easy to understand what the DBI is doing to your binary. So I will give the remote to Charles for the next part. The next part I want to talk about is how you need to think about cross-architecture support. Some DBI frameworks, like Intel PIN, have had trouble supporting architectures like ARM: they actually tried, and then they decided they would not support it anymore. One of the reasons is that if you don't think about cross-architecture support from the start, it can become very complex. I'll show you how we handled this issue in our DBI engine. If you think about what's going on in the process, you can divide the space into two entities. The first one is the host: the host contains the DBI engine and the instrumentation tool you've written. And the second part is the guest: the guest contains the original binary and the instrumented code generated by the DBI engine. You see that this host-guest terminology is taken from the VM world, because it kind of makes sense.
But this is not a VM. So the problem is that these two contexts share the same memory and the same CPU context, because they are just one process. And this means that we're going to need to switch between the two contexts, host and guest, at every cycle during the execution. However, we do not get any help from the kernel or the CPU, because this is not a VM: we're not going to use virtualization extensions and things like that. So this basically means saving and restoring the CPU context of the guest and the host, and vice versa, every time you switch between the two. However, you need to avoid any side effect on the guest. The host is aware that it's a DBI and does its DBI thing, but the guest is not aware, and should not be aware, that you're doing context switches with, basically, another process inside the same process. This means you cannot modify its stack, and you cannot erase any of its registers, or else the program will just crash. And the only way to make something like this work is that the guest needs to be able to access host memory with relative addressing. You cannot just compute a memory address in a register, because you cannot erase a register; and you cannot save the register you would want to erase on the stack, because you cannot modify the stack. So the only way is to be able to make a direct reference to a relative memory address from the guest. However, relative addressing is extremely constrained by the CPU architecture. On x86, you can do 32-bit relative memory addressing. But on ARM, you only have 12 bits, and if you look at the encoding, this means you're limited to roughly plus or minus 4,096 bytes. So the conclusion is that if you want a context switch that works nicely across architectures, you need a situation where you have host memory really close to the guest code, the instrumented code you generated.
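The arithmetic behind that conclusion can be sketched in a few lines, using the displacement ranges quoted above (32 bits of relative addressing on x86, 12 bits on ARM). The numbers and the helper are illustrative, not taken from any engine's source.

```python
# Maximum relative displacement reachable from an instruction, per the talk:
LIMITS = {
    "x86": 2**31 - 1,   # 32-bit relative addressing: about +/- 2 GiB
    "arm": 4095,        # 12-bit offset encoding: about +/- 4 KiB
}

def reachable(arch, code_addr, data_addr):
    """Can instrumented code at code_addr reference host data at data_addr relatively?"""
    return abs(data_addr - code_addr) <= LIMITS[arch]

# Host data placed in the very next 4 KiB page is reachable everywhere...
near = reachable("arm", 0x10000, 0x10FFF)
# ...but data a mere 64 KiB away is already out of reach on ARM, while x86
# would still be fine. Hence: host memory must sit right next to guest code.
far_arm = reachable("arm", 0x10000, 0x20000)
far_x86 = reachable("x86", 0x10000, 0x20000)
```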
The other problem is that we want to play nice with data execution prevention. We could put data next to the guest code; it's simple, because we're generating that code. However, you cannot have a memory page that is read-write-execute, because on some operating systems this is not allowed. So what we're doing, basically, is allocating two contiguous memory pages: one of them is read-execute, the other one is read-write. This way, we can satisfy all those conditions. So it looks like this. This first page we call the code block, and this second page we call the data block. This one is read-execute; this one is read-write. The first piece of code in the code block is the prologue, which is basically in charge of doing the context switch. And to do that context switch, it can simply store the host context there and load the guest context there, using relative addressing. Then comes the instrumented code, the one we generate with the engine, and then the epilogue, which does the inverse switch. The idea behind the exec block, basically, is to bind instrumented code with instrumentation data, so the data is guaranteed to be directly addressable. However, when I said one memory page: one memory page is four kilobytes on most operating systems, and that's a lot of space. If you used one 4K page per basic block, you're never going to, well, you could, but you would need a lot of RAM. So the solution is to be able to put multiple instrumented basic blocks into the code block. And you also have a lot of data space left, so you can try to do things with that data space. This is basically what we got in the end. Here we have a very special thing, which is the selector. The selector is a jump to some place inside the code block: basically, a programmable jump. But we cannot modify the code block, because it's read-execute.
So what it does is jump to an address that's actually contained in the data block. This way, we can really select which basic block we want to execute, because here we're going to store multiple basic blocks. And each basic block jumps to the epilogue at the end. So if you just program the selector and then start the execution from the top, you will execute the basic block you want. For the data space, we are exploiting the remaining space for things we call constants and shadows, basically. The instrumentation constants: well, you've seen in the patch example that we wanted to replace the program counter and we wanted to load a constant. This works very well on x86, where you have a MOV that can load a 64-bit immediate into a register, but that's because it's a CISC instruction set. On ARM, you're very limited in the size of the immediate you can load into a register. So basically, we use this constant space the same way as ARM literal pools: if you've ever reverse engineered ARM code, you know that there are pieces of code and, next to them, pieces of data and constants that are directly referenced by the code. This works the same way. So we can load any kind of data we want into our instrumentation without wasting code space. The other, more interesting concept is what we call instruction shadows. This is not entirely new; it's inspired by Valgrind. The way Valgrind tracks memory allocation and deallocation is by creating what it calls memory shadows: for each page of memory you allocate in your program, it creates a small buffer in which bits represent the state of each byte of the memory page. This is shadowing the memory, basically: some kind of variable that's bound to the memory. And we wanted to do the same thing, but for instructions.
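Putting the exec block pieces together, here is a toy model: an immutable "code page" holding the basic blocks, next to a writable "data page" holding the selector slot and the shadow slots. Selecting which block runs means writing one slot of data; the executable page is never touched. This is a pure Python simulation of the layout, not QBDI's actual pages, and the block contents are invented.

```python
class ExecBlock:
    def __init__(self):
        self.code = {}                                  # label -> block ("read-execute" page)
        self.data = {"selector": None, "shadows": []}   # "read-write" page

    def add_block(self, label, block):
        self.code[label] = block

    def select(self, label):
        # Program the jump through the data page, never the code page.
        self.data["selector"] = label

    def run(self, ctx):
        # In the real engine: prologue (context switch in), selected basic
        # block, epilogue (context switch out), all inside the same page pair.
        self.code[self.data["selector"]](ctx, self.data["shadows"])

def bb0(ctx, shadows):
    # An instrumented block: do the work, then record the "memory access"
    # inline into a shadow slot, with no context switch back to the tool.
    ctx["r0"] += 1
    shadows.append(("write", 0x1000, ctx["r0"]))

def bb1(ctx, shadows):
    ctx["r0"] *= 2
    shadows.append(("write", 0x1000, ctx["r0"]))

xb = ExecBlock()
xb.add_block("bb0", bb0)
xb.add_block("bb1", bb1)

ctx = {"r0": 20}
xb.select("bb0"); xb.run(ctx)   # r0: 20 -> 21
xb.select("bb1"); xb.run(ctx)   # r0: 21 -> 42

# Afterwards, the tool queries all the recorded accesses in one go:
trace = xb.data["shadows"]
```

The shadow list here previews the inline memory-access recording explained next: the instrumented code writes into slots bound to the block, and the tool reads them after execution.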
So it's basically a means for us to abstract the idea of having variables inside your instrumentation code, to make inline instrumentation very easy. One of the main use cases right now is recording memory accesses. Memory accesses are a bit of a problem if you want to record everything that's going on in your program, because to record read accesses you need to instrument before the instruction, and to record write accesses you need to instrument after the instruction. So you're adding a lot of instrumentation. And if, for example, you want to call your own code in a callback, then you need to switch context again; so you're making two context switches per instruction, and your instrumentation becomes very slow. The solution is to do what we call inline instrumentation, where you do the recording of the memory access directly in assembly, without calling into the instrumentation tool itself. And these variables, the shadows, are basically used for that: for each instruction that does a memory access, we create shadows that simply store the value and the address accessed. So you can execute a whole basic block, and at the end you can just query the shadows of that basic block to know which addresses were accessed and what data was transferred. Realizing all of that is not easy in a multi-architecture context, because we need a cross-platform memory management abstraction, an MMU abstraction basically, since we want to allocate memory pages and change those pages' permissions. And we also need a cross-architecture assembler that works in memory: we don't want to create a binary on the disk. And it's not that simple; a lot of assemblers simply assume that they are going to create sections, a binary object, and things like that, but this is not our case. But guess what? LLVM saves us again, because when the LLVM project was started, LLVM meant Low Level Virtual Machine.
So they had that bytecode that was nearly cross-architecture, and one of the things they did was build a just-in-time engine to execute that bytecode. Being a just-in-time engine, it's very close to the design of a DBI, so they have everything we would need for that. We cannot use the just-in-time engine directly, however, because although it's very well designed, it does not really fit our use case and the way we work. But inside LLVM, you have all the functions you need if you want to create a just-in-time engine: all the cross-architecture memory management abstractions, and also this powerful in-memory assembler, which is LLVM MC. So what we learned from that is that if you want to create a just-in-time engine, LLVM is really perfect for the job. But also, designing a just-in-time engine for a DBI while taking the cross-architecture problem into account is really difficult, because you can easily be locked into one CPU architecture if you start assuming, for example, that you can simply access memory with a 32-bit offset. So you need to think about portability from the start if you want to design that kind of project. Thank you. So all of this, in fact, consists of small parts of our project, which is called QBDI. QBDI stands for Quarkslab Dynamic Binary Instrumentation — we are very imaginative. It's a cross-platform, cross-architecture DBI framework. By cross-platform, we mean that today it runs on Linux, macOS, Windows, Android, and iOS. And we really focused in the last few months on having something user-friendly, which is kind of hard: you have a big engine, a very complex machinery, but we really wanted to have something easy. So basically, we focused on clean APIs and extensive documentation, and we also provide binary packages for the major operating systems and Linux distributions. And it has a modular design. By this, what we mean is basically that the core engine of the DBI should only do what is essential for a DBI.
So no anti-anti-VM stuff, nothing related to anti-debug — everything which is not part of the DBI is out. And the other idea behind keeping things simple is that you don't force users to do things your way. We don't have one injection method that you need to use, that forces you to do things in a certain way and limits you. What we get at the end is basically easy integration, because our DBI is just a library — a static or dynamic library, your choice. On top of this library we have created Python bindings, to allow very fast experimentation. And we also have a full-featured integration with Frida. Frida — I'm sure lots of you already know what it is — is a very, very nice framework for instrumenting binaries in a different way. They are really a perfect fit used together: if you combine the power of the DBI, QBDI, and Frida, it's something really impressive. We will see that with a demo. Current road map: basically, we are a bit late on the ARM support, but as you have seen, it's basically a matter of adding patching rules. The engine itself is already here, already working and running on ARM; we just need to finish the rules, and maybe also focus on ARM64. We need to improve the memory access recording, because currently we don't handle SIMD memory accesses, which is a bit of a problem too. And we also want to work on multithreading and exceptions, but not in the same way most engines do, because we really want to keep the core very simple. So it will probably be integrated as something like a helper library, something a bit aside from the core project, but we don't know exactly right now. So, demo time. The first demo will be on the engine itself and its Python bindings. So we just need to — I hope we will not break everything by doing this. Yeah. Maybe we should just drag and drop this. Yeah, I think it's OK. So it's a very simple demo.
I just want to show you how easy and simple we tried to make the API. This is a simple binary that checks a password on input. And the password — well, you cannot find it easily if you just look at the software. So that's where I'm very glad that my colleague uses a French AZERTY keyboard — yeah, AZERTY works. So what we're going to try to do is just display every memory access made by the program. And because there are a lot of read memory accesses, usually we just focus on writes. So we use the PyQBDI bindings. And the main thing is this callback, which is called when the DBI wants to start. And we simply do two things: we add the memory access callback, and we run the engine from start to stop. The memory callback is right here. It just gets an analysis of the instruction and information about the memory access made, and simply prints "you wrote blah blah blah". Oh, yeah, great. So — no. No, no, it's a bad idea. Is your password in there somewhere? No. But we want to run it, maybe. Yeah. OK, we want to run it. It's not in the history because it was too easy. So yeah, we'll clear and do this. OK, so there we go. Yeah, thank you. So we can run it, and you can see there are a lot of memory accesses made by this thing. Right now it's not really readable; there's a lot of memory pointer stuff. So maybe, because we know it's a password, we want to filter for memory accesses that are one byte long. So there, we do that: we check if the size is one, and we only print if the size is one. So let's run that tool. Here it's more interesting: we can see we have byte values on the left, and we have a lot of instructions that print those. And there are a few xors in there. And when you see xor, you really think crypto, or some weird image manipulation algorithm. So what we can do now is simply do the same thing, but also filter for xor instructions, to see what we get.
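The filtering step from this demo can be sketched without the engine. This is a toy stand-in: `MemoryAccess` and its field names are hypothetical simplifications of the record PyQBDI hands to a memory-access callback, and the address/value data is made up:

```python
# Sketch of the demo's filter: given a stream of recorded memory
# accesses, keep only the one-byte writes -- the ones likely to be
# the password bytes being decrypted in memory.

from collections import namedtuple

# hypothetical shape of one recorded access
MemoryAccess = namedtuple("MemoryAccess", "address value size is_write")

accesses = [
    MemoryAccess(0x7ffc0010, 0xdeadbeef, 4, True),   # 4-byte write: skipped
    MemoryAccess(0x7ffc0020, ord("i"), 1, True),     # 1-byte write: kept
    MemoryAccess(0x7ffc0021, ord("m"), 1, True),     # 1-byte write: kept
    MemoryAccess(0x7ffc0030, 0x90, 1, False),        # 1-byte read: skipped
]

def one_byte_writes(stream):
    # the "if the size is one" check from the demo, plus writes only
    return [a for a in stream if a.is_write and a.size == 1]

for a in one_byte_writes(accesses):
    print(hex(a.address), chr(a.value))
```

In the real demo the same condition sits inside the PyQBDI callback, so the filtering happens live while the target runs.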
Yeah, and so now there are not that many xor instructions left. And while it looks random at the beginning, the end is interesting, because it's always in the same kind of memory range, and it looks like ASCII. So the last idea is simply to aggregate those bytes into a buffer. You see that there is a data array there; it's passed as a parameter to the callback. And so on each of those instructions we simply append the data to our array, and we print the array at the end. And if we do that, we can see we have some garbage, and then at the end some text that looks like "import triton". So if we try it — well, it's the correct password. It was simply decrypted in memory, and we can look at it at the end. So this shows you how simple this kind of binding can be, and that it's really not hard to use a DBI. OK, we have time for another demo. So I will just — maybe, oh, I don't know. Yeah, OK, I will try to do it on the remote screen; it's tiny, but — at least you don't have to use another AZERTY keyboard. Yeah, I'm French. So we have a demo binary. If we launch it — it's small, but if we increase the size, we can see — OK, it does nothing. It takes — oh, I have an input string, which is "hello", and it does things with "hello", and — boop, OK. And the idea is: we want to reverse engineer it and understand what it does. So we will do this using the Frida framework. And — OK, so I rebooted my laptop, in fact. So first, we will load Frida. Then we will load the binary, which is located, I think, in our local share directory. OK, Frida framework. And we say: Frida, OK, I want to load the demo binary. So yeah, we are launching Frida. I don't see anything from there, so OK, I take control. So here we are in Frida — it's the Frida environment; most of you already know it if you have used Frida. And what we can do with Frida, basically, is things like: OK, I have reversed the binary a bit.
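The aggregation trick that recovered the password can be shown end to end in a few lines. A hedged sketch: the single-byte xor key and the plaintext used for the ciphertext here are made up for illustration, not taken from the demo binary:

```python
# Sketch of the final step of the demo: the bytes written by the xor
# instructions, appended in order into a buffer, spell out the
# password that only ever exists decrypted in memory.

KEY = 0x42  # made-up single-byte key
ciphertext = bytes(c ^ KEY for c in b"import triton")  # what's on disk

recovered = bytearray()
for byte in ciphertext:
    plain = byte ^ KEY          # the value the xor instruction writes
    recovered.append(plain)     # append it, as the callback does

print(recovered.decode())  # -> import triton
```

The point is that the instrumentation never has to understand the decryption routine: it just harvests the routine's own output, one write at a time.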
And I've seen that there is a function which is called something like "secret". So I will just do a call — it's a Frida API — OK, I want the address of "secret". So Frida just returns the address of secret. I say, OK, I will need an input for this function, and basically I will use a string for this input. Frida allows us to do a remote allocation of memory and inject the string inside the remote process memory. And using this, you can say: OK, I want a NativeFunction. This is a JavaScript object which allows you to call the function address that you resolved. And by calling this native function — so there is completion — if you call it with the input, Frida executes your remote function with the input that I've forwarded, like "hello". OK. So what we can do with the DBI and the integration is simply create a virtual CPU. So we allocate our DBI. By doing this, we get a VM object. With it, we create a CPU state — using the completion, very easy. And then we create a stack — using history, not completion, because we don't have lots of time. And here we go: we have, basically, a virtual CPU; we have initialized its state and we have initialized a virtual stack for it. And what we need to do now is to say: OK, we want to instrument the demo binary. This is to avoid instrumenting the calls to the external libraries. This is a feature of our DBI: we can really choose which parts of the code we want to instrument, and let everything else execute by itself. So here we are ready. You can just do a call — calling using the native function pointer from Frida, and then a list of arguments, which is the input. And we get exactly the same output and the same return value. So you say, OK, but you have exactly the same thing — what more can your DBI do? We are getting to that right now.
What we can do is add instrumentation, because so far we have just executed the original code — inside the DBI, but unmodified. And to add the instrumentation, you just need to use the history and create something like an instruction callback. OK? So what I have done here, basically, is create a JavaScript function that will be called at runtime for every instruction. And this callback will just dump the general purpose registers, and also ask for a full analysis of the instruction, in order to get the disassembly and the original address. This kind of analysis, by the way, is managed by the DBI, and there is a cache — there are a lot of things behind it. But it's very simple here: we don't know what happened, but it's magic — you have the analysis of the instruction. And after this, you only need to add your callback. So I cheat again: what I ask here is to add a code callback before every instruction, which will call my JavaScript function. So now the callback is added, and we can just go back and call our function again on the input. And here we go: our JavaScript function has been executed, and as we can see, for every instruction we have the address, we have the disassembly, and we have the full GPR context — the general purpose registers. Well, there are lots of them. We can also see that in between we have the call to the external library: here we are jumping outside the DBI, and it's basically the standard library, the libc, which is doing stuff. And at the end — oh, not surprising — we have a ret instruction and it's returning zero. So everything works as before. So QBDI is an open source project which we've just released a few days ago. Don't hesitate to give it a try. It has been released under a permissive license, so feel free to send any suggestions or pull requests or anything. We have a channel on Freenode, so just join us if you are interested in this project.
And I would like to end by giving a really big thanks to our colleagues at Quarkslab, for all the beta testing and for supporting us every day. And especially to Paul and Joe, because they have made major contributions to this release — it would not be the same without them. And also a big thanks to Quarkslab, our company, for allowing us to release this software under a permissive license and to let it grow on its own. So yeah, thank you very much. So if you want to ask questions — it's like speed dating; we only have one minute ten seconds left, so I'll try, hopefully, to give a quick answer. Can you contrast this with DTrace and its userland support? DTrace is really cool, but it's not really cross-platform and not cross-architecture. The main point — to be very, very fast because we don't have time — is that it's all about granularity. Here the granularity is instructions, and more than that, you can hook before or after every instruction. And this, only a DBI can provide you — or a debugger, if you have tons of time. But yeah, this is really the main difference; they are complementary tools, basically. But you can catch them later. And the next talk, in 15 minutes, is "growing up software development". Thanks. Thank you very much. Thank you.