It is the last day of EuroBSDCon 2014. Our next speaker is Masao, who will be telling us a few things about porting some useful tools to the BSDs. Hello, thanks for attending. My name is Masao Uebayashi, I come from Japan, and I'm self-employed. I have my own company doing BSD, hopefully BSD, development. Now I'm working for a customer who is developing a networking product based on OpenBSD. Their business is really good; they are developing a good product and extending OpenBSD, and they have some local extensions, including networking-related daemons. It works, but they have also experienced some problems, especially memory leaks, because networking daemons run for a long time. If a daemon is coded badly, it slowly leaks memory and ends up with severe problems. They wanted to fix that, but finding those dynamic memory leaks is very difficult, and there are actually not many options for finding memory leaks and memory defects. So they decided to port Valgrind, which has already proven its usefulness and found many problems on Linux. FreeBSD people have already ported Valgrind. It's not really widely used, but it has been ported and it is said to be working. Still, there was a big uncertainty about whether it would really work on OpenBSD, because OpenBSD has many local features to make it more secure. But my customer is very good and generous, and they allowed me to work on this uncertain project. My porting project is not really finished, but I have reached a point where I can finish this project. So first I just want to show you that it's really working and it's real. Has anyone tried Valgrind on OpenBSD? No, okay. I think it worked 30 minutes ago, so just give me one minute. Zero. One. You can tell it's the password. Anyway, I think I can show you. It's very close, but it's a little difficult.
Anyway, I suppose all of you have already heard the name Valgrind and already know vaguely how to use it. It's mostly known as a memory profiler, a memory debugger. It actually has more features internally, but for now I'm only working on porting the memory profiler; porting all of them is a little too much work. It's basically for developers, because it helps development. It can actually run unmodified programs, so ordinary users can use it too, but it does not help to accelerate watching movies or anything like that. It just slows things down, so it's not really helpful for them. Our Valgrind work is not finished and it still has bugs. So if you want to use it now, you have to debug Valgrind. That is not fun, but I still need some help from you to stabilize it and detect problems. I don't want to involve you deeply in Valgrind development, but I still need some help from advanced users. So I set three goals for this presentation. One is to involve more people, and another is to explain its internals. I want to focus on three things in particular: system calls, execve, and signals. When Valgrind executes a debug target program, it internally does what the kernel does for execve, so Valgrind has to simulate what the kernel does in execve. It also has to simulate our signal features. And it's very fun. Ideally, Valgrind would work without any changes to OpenBSD itself. But unfortunately, I have found some changes that need to be made, so I have to convince the community to accept those changes. You can see this explanation on the official site of Valgrind. There is also a paper that says Valgrind is a framework for heavyweight dynamic binary instrumentation, which is a little too difficult for me to understand. You can also look at Wikipedia, which says it's essentially a virtual machine. This is much easier to understand, and in my opinion it describes things more correctly. So it's basically a virtual machine, an emulator.
Once it starts running, what it does internally is basically a loop, which interprets the client's code, the debug target, disassembles it, and executes those instructions one by one. That's basically what it does. Instead of the actual registers, it has virtual registers in memory. So if it sees a move instruction, which moves a value from one register to another, what it does is just copy that value within the virtual CPU. This is the basic idea. This is another view. Valgrind is special in that it executes the debug target within the same address space. It's one process. Valgrind runs somewhere in that one process address space, and the target client, the target program, is mapped into the same process. This is very special to Valgrind, I think. The actual stack, the one given by the kernel, is used by Valgrind itself, and the client stack is allocated somewhere else. For the client, it's basically the same. It does not know whether it is executed under Valgrind or not, although there are some differences. You can find a lot of documentation and papers at valgrind.org, which is good. My project is based on the previous FreeBSD port hosted at Bitbucket, so I could just fork that branch. This is a static view of how the code is structured. In a sense, Valgrind is an emulator, and it has emulator code, which is very portable. Actually, I had to change no code in the central core emulator. It's called VEX. I had to change nothing in it except some basic type definitions in headers; it is written in a portable way. But Valgrind also has to interact with the kernel to do many realistic things, like system calls and signals. To port Valgrind, I had to modify those parts, which I'm going to explain. Actually, the resources the Valgrind project provides, the papers, are all about the emulator and the theoretical parts. For the realistic parts, there is no documentation. That is the problem. Here are some numbers: many machine-dependent files and #ifdefs. They have some definitions for the #ifdefs.
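Going back to the interpreter loop with virtual registers described above, here is a minimal sketch of the idea. The instruction names (`movi`, `mov`, `add`, `halt`), the `GuestState` class, and the `run` function are hypothetical and chosen only for illustration. This is a toy that follows the simplified picture given in the talk, not Valgrind's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class GuestState:
    """Virtual CPU: the client's registers live in ordinary memory."""
    regs: dict = field(default_factory=lambda: {"rax": 0, "rbx": 0, "rip": 0})


def run(program, state, max_steps=1000):
    """Fetch, decode and execute toy instructions one by one."""
    for _ in range(max_steps):
        pc = state.regs["rip"]
        if pc >= len(program):
            break
        op, *args = program[pc]
        state.regs["rip"] = pc + 1
        if op == "movi":            # load an immediate value into a register
            dst, imm = args
            state.regs[dst] = imm
        elif op == "mov":           # a register move is just a copy in memory
            dst, src = args
            state.regs[dst] = state.regs[src]
        elif op == "add":           # 64-bit wrap-around addition
            dst, src = args
            state.regs[dst] = (state.regs[dst] + state.regs[src]) & 0xFFFFFFFFFFFFFFFF
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return state
```

The point of the sketch is that the real CPU registers are never touched on behalf of the client; every guest register access is a read or write of the in-memory `regs` table.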
You cannot expect code quality like the BSDs'. This is a typical example, in signals.c. You see an #ifdef with VGP; P means platform. There is x86 Linux specific code, like this, and AMD64 Linux, like this, and so on. This is not beautiful, but it works. And you see this one: if you forget to add your platform definition, you get a compilation error. So this is not beautiful, but it helps development. If you port Valgrind, just get used to it. Valgrind was originally developed for Linux, but it has been ported to Darwin and to Android, which is almost Linux. The Darwin code is already merged into their official branches. As I said, there is a FreeBSD port, but unfortunately it is not merged into the official branch. I don't know the reason, but their page declares that the FreeBSD port is maintained out of tree, which means they don't intend to merge the FreeBSD code. I initially wanted to merge my work and the FreeBSD port into the main tree, but I have changed my mind, and I think we can maintain our changes as an external port, as long as we don't want to extend the basic functionality and just maintain the port. And now we have good version control systems like Mercurial, so it is doable. So I'm happy with maintaining the port in the Bitbucket project. My work is somewhere between alpha quality and beta quality. At first, of course, it didn't work at all; I'm calling that the initial development stage. After that, some limited, very simple programs started to work. But at that stage, you usually have no idea whether the test code has a bug or Valgrind has a bug, so you have to debug both. Valgrind is basically a debugger, and if the debugger has a bug, it's very confusing and not really fun. So I'm now trying hard to move the situation from here to here, where a wider range of developers who don't know Valgrind can test and help. Yes. So I want to explain system calls.
Actually, Valgrind has to simulate system calls, so I have to understand system calls to simulate them. A system call, from the user application's point of view, is just a C function. But internally... okay. This one example is an OpenBSD system call. If you call the system call from your code, it's just a C function, and it jumps here. Internally there are a few instructions, and the real system call instruction is here. This is another one. It has more arguments, so a little more code. Basically, from the kernel's or the CPU's point of view, the system call instruction is just one kind of trap or exception. When application code is running and it gets a hardware interrupt, the CPU and the kernel save all the context, the registers, into an area called the trap frame. After the kernel finishes its tasks, it just restores that information back into the CPU and returns to the application, and the application does not notice that it was interrupted. System calls are very similar from the kernel's point of view: it saves the state into the trap frame, just as for interrupts. But this time the application voluntarily raises the exception, the trap, because it wants some information or wants the kernel to do something. When it comes back from the kernel, it sees that something has changed: some memory areas were written or read by the kernel, and some registers have changed to hold return values. So it's basically a trap. After entering the kernel, the kernel does things for the application, modifies the trap frame struct, and just returns to the application. So the kernel has to change the trap frame, and it also saves some state in other structs too. Valgrind has to simulate this. It uses system calls both for the emulated application, the debug target, and for itself. And Valgrind does not rely on the host libc, for several reasons, including avoiding symbol conflicts.
I don't know all of the reasons, but it does not use libc code. Instead it has its own system call wrappers and uses them, for example, just to open the debug target file, read its ELF headers, and so on. Basically, Valgrind simulates what the kernel does for a system call. The difference is that, as I said, Valgrind interprets the client text, which is mapped in the same process; it reads the instructions and executes them one by one. And the registers are also allocated in the same process address space. Basically, most system calls are just passed to the real kernel. For example, if the client wants to read some file, Valgrind of course has to read the data for that file descriptor from the kernel; otherwise it cannot pass that data back to the client. So basically it just relays the system call. When Valgrind meets a system call instruction, it checks the machine state struct, whose contents hold the arguments to the system call, so it knows what the debug target wants to pass to the kernel as arguments. And of course it has to return the system call's return values via registers, so Valgrind chooses the right registers in the client machine state and fills them in. Internally it actually makes the system call against the kernel, and Valgrind has its own wrappers to make real system calls. When Valgrind sees the client's system call instruction, it is running Valgrind code, in Valgrind's context. But when Valgrind executes the real system call, it's for the client, the debug target. So it has to swap contexts carefully. When it actually makes the real system call, it carefully reads the register values from the client machine state structure, moves them into the real registers, and just makes the call. After that, it carefully stores the register values back into the machine state. So you have to implement these assembly wrappers carefully. I'm sorry for just pasting the code here, but basically... you see there is a sigmask. It has to block all signals while making the system call.
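The relaying described above can be sketched as follows: read the arguments out of the guest register state, block signals, make the real system call, write the result back into the guest state, and restore the signal mask. The function name `forward_read`, the register names used as dictionary keys, and the convention of returning a negative errno in `rax` are assumptions made for this illustration; the real wrappers are written in assembly and the BSD error-return convention differs.

```python
import os
import signal


def forward_read(guest_regs, guest_mem):
    """Emulate a guest read(fd, buf, len) by asking the real kernel."""
    # Arguments come from the guest machine state, not from real registers.
    fd = guest_regs["rdi"]
    buf = guest_regs["rsi"]
    length = max(0, min(guest_regs["rdx"], len(guest_mem) - buf))

    # Block all signals for the duration of the real system call.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    try:
        try:
            data = os.read(fd, length)          # the real system call
            guest_mem[buf:buf + len(data)] = data
            guest_regs["rax"] = len(data)       # return value goes to the guest
        except OSError as exc:
            guest_regs["rax"] = -exc.errno      # toy error convention (assumption)
    finally:
        # Restore the signal mask that was in effect before the call.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return guest_regs["rax"]
```

The `try`/`finally` mirrors the ordering the speaker stresses: the signal mask is restored only after the result has been stored back into the client's machine state.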
Signals have many problems, which I will explain later, but basically Valgrind polls for signals. It does not accept signals asynchronously. It wants to control all signals explicitly, so it basically blocks all signals during a system call. Then, these instructions just look into the structure and assign the values to the real registers to pass them as arguments. You have to set the system call number too, of course. And this is the real system call. Later, you have to put the registers back into the structure and restore the signal mask again with sigprocmask, like that. Valgrind can have hooks for all system calls. Valgrind tracks memory reads and writes. That's easy for user programs, but it has no knowledge of what memory is written or read by the kernel. So without these hooks, if some memory is written by the kernel in a system call and the user application then tries to read that memory, there will be mistaken reports that you are touching uninitialized memory. So we have to teach Valgrind all this information for all system calls. This is a little messy. And if the kernel reads or writes conditionally in a system call, it's much, much more complicated, and it's impossible to maintain perfectly. Okay. Next, execve. As I said, when Valgrind executes the debug target program, what it does is actually exec: it has to create an initial process image and execute it. Valgrind calls this operation initimg, which is just "initial image". The initial process image is just the process memory contents, the initial registers, and the per-process resources, which are stored in the kernel, in struct process in OpenBSD, and struct proc. Depending on the operating system, there are many per-process resources. The initial memory image has text, of course, and data and BSS areas, which can be mapped by parsing the ELF program headers. They have load entries. So...
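Returning to the system call hooks described above, the following toy model shows why a memory-tracking tool needs to be told what the kernel wrote. `ShadowMemory`, `fake_read_syscall`, and the `post_hook` flag are hypothetical names for this sketch; it keeps one "defined" flag per byte, which is a simplification and not a description of Valgrind's real data structures.

```python
class ShadowMemory:
    """Client memory plus one 'defined' flag per byte (toy model)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.defined = [False] * size

    def client_store(self, addr, payload):
        """A store executed by the client is seen by the tool directly."""
        self.data[addr:addr + len(payload)] = payload
        self.mark_defined(addr, len(payload))

    def mark_defined(self, addr, length):
        for i in range(addr, addr + length):
            self.defined[i] = True

    def undefined_bytes(self, addr, length):
        """Addresses in the range that would be reported as uninitialized."""
        return [i for i in range(addr, addr + length) if not self.defined[i]]


def fake_read_syscall(mem, buf_addr, payload, post_hook=True):
    """Stand-in for read(2): the 'kernel' fills the client's buffer."""
    # The kernel writes behind the tool's back; no client store is observed.
    mem.data[buf_addr:buf_addr + len(payload)] = payload
    if post_hook:
        # The per-system-call hook tells the tool which bytes are now valid.
        mem.mark_defined(buf_addr, len(payload))
    return len(payload)
```

Without the hook, a later client read of the buffer would be flagged even though the data is valid, which is the false report the speaker describes. A system call that writes only under some conditions would need the same conditions repeated in its hook.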
The kernel has an ELF header parser; it parses the headers, looks for all those load entries, and internally maps those vnode areas into the process address space. The initial process image also has a stack with the arguments, the environment, and auxiliary information for the dynamic linker. These are metadata for the process, used to initialize the program and set things up for the main function. That is done by the start code in userland. So there is a hidden ABI, a hidden promise, between the kernel's exec and the start code in userland. Of course, there are also registers which are used to pass parameters from the kernel to the start code, like this. So, execve. During this development I had to understand the execve function in the kernel to simulate it in Valgrind, and I had to figure out which parts I need and which parts I don't. But when I looked at the execve function in NetBSD at the time, it was one long function, and I had no idea which parts were needed; the dependencies were impossible to understand. So I could not help splitting that function into small functions. It was... it was fun. Another problem is that all the BSDs have slightly different ABIs for this. For applications, execve is of course a standard function, but as I said, it has a hidden ABI, a hidden promise between the kernel and the start code: what contents to put on the stack and what values to put in the registers. These are not documented and are slightly different among the BSDs. I could not see any benefit of one over another; they are just different for no reason, so it's strange.
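As an illustration of the kind of hidden contract discussed above, the sketch below builds one plausible initial stack image: `argc`, the `argv` pointers, a NULL, the environment pointers, a NULL, the auxiliary vector ending in a zero pair, and the strings at the top. The function name `build_initial_stack`, the 64-bit little-endian word size, and the 16-byte alignment are assumptions for this example. As the speaker notes, the exact layout and alignment differ among the BSDs, so this is not the layout of any particular system.

```python
import struct

WORD = 8  # assumption: 64-bit little-endian target


def build_initial_stack(argv, envp, auxv, stack_top):
    """Return (sp, image); image holds the bytes to place at address sp."""
    # 1. Strings live at the very top of the stack.
    encoded = [s.encode() + b"\0" for s in argv + envp]
    strings = b"".join(encoded)
    str_base = stack_top - len(strings)

    # 2. Compute the address of each string inside that area.
    addrs, addr = [], str_base
    for item in encoded:
        addrs.append(addr)
        addr += len(item)
    argv_ptrs, envp_ptrs = addrs[:len(argv)], addrs[len(argv):]

    # 3. Word vector: argc, argv[], NULL, envp[], NULL, auxv pairs, (0, 0).
    words = [len(argv)] + argv_ptrs + [0] + envp_ptrs + [0]
    for key, value in auxv:
        words += [key, value]
    words += [0, 0]
    vector = struct.pack("<%dQ" % len(words), *words)

    # 4. The initial stack pointer points at argc, aligned down to 16 bytes.
    sp = (str_base - len(vector)) & ~0xF
    padding = str_base - (sp + len(vector))
    return sp, vector + b"\0" * padding + strings
```

For example, `build_initial_stack(["prog", "-v"], ["HOME=/"], [(6, 4096)], 0x7FFF0000)` yields an image whose first word is 2 (the argument count) and whose second word is the address of the string `prog`. The pair `(6, 4096)` stands for an `AT_PAGESZ` entry in the auxiliary vector.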
So to initialize a process you have to fill in these things. Typically, a traditional BSD has, on the stack, argc, argv, the environment, and the ELF aux vector, mainly for the dynamic linker, plus those strings, and the initial stack pointer has to point to the top of those exec arguments. But as I said, there are huge differences among the BSDs in where to put the stack, the alignment, and how to pass arguments. For example, NetBSD has ps_strings, which is just a small duplicate of the pointers to argv and envp, argc, and the count of env strings in one struct, and it tries to pass the address of that via a register. I'm not sure it's helpful or useful. OpenBSD doesn't have that. So I had to adjust for all those differences and implement this carefully; otherwise things don't work. Valgrind has to simulate this execve precisely. Again, Valgrind runs the target debug program within the same process address space, and it keeps the client machine state, the register contents, in its memory space. So basically it sets the contents of the machine state structure, and it has to fill the client stack exactly as the target process expects. 15 minutes. Signals. Signals are an interesting thing, and I think all of you like signals. For me, I had little knowledge about signals and I hadn't read any signal code in the kernel, but I had to understand it, and now I like it. A signal is basically the same as a system call: it's basically a trap. For the application it looks like a function. You ask the kernel, with the sigaction function, to register your signal handler, and the kernel just calls this function when the signal is triggered. But from the kernel's point of view it's a kind of trap, and, yes, sorry, it's special in some ways. Basically, the kernel just restores the application's userland state when it was entered by a hardware interrupt. One exception was system calls, and another exception was exec. These are the few exceptions where the kernel transfers to userland without just restoring the original context.
This is about signal handlers in user space. When an application asks the kernel to call a signal handler, instead of just resuming the process, the kernel has to set up a context temporarily to call the client's signal handler function. This is done by executing a small piece of code called the signal trampoline. It's just a few instructions here that call the signal handler and then return back to the kernel. The signal trampoline expects some arguments from the kernel, which are put on the stack. From the application's point of view, the signal handler function is just called, and when it exits, the original context is just restored. But during that, the kernel has to save the original context, execute the handler, and restore the original context again. The original context is typically saved onto the stack; otherwise the kernel has no space to save that information. And of course there are hidden ABI promises between the signal handler and the kernel, and they are slightly different among the BSDs. This is the trampoline code for FreeBSD, FreeBSD amd64. It lives in the kernel, some lines of assembly code, but it is copied into user space when executing a signal handler. You see that it expects the actual signal handler address to be stored on top of the stack. This is the ABI. OpenBSD also has the signal code in the kernel and copies that code onto the client's user-space stack, and this one expects the signal handler address in the RAX register, which is different from FreeBSD. NetBSD is slightly different and special in that it implements the signal trampoline in userland for some reason. So this fragment of code is called after the signal handler exits; it is carefully set up as the return address of the signal handler. Slightly different. And Valgrind has a unique strategy for handling signals for clients. Basically it blocks all signals and polls for signals when it wants, using a call that waits for a signal. But you cannot block synchronous signals like segmentation faults, and how that works internally is very difficult. I will
explain it. As for the trampoline, Valgrind has no notion of copying executable code at runtime. It may be possible, but the FreeBSD people decided to avoid that and provide the trampoline from userland. This is like how NetBSD does it, and it turns out it works. When returning, instead of really returning back to the kernel, it just calls a fake sigreturn system call, which is only handled by Valgrind. So this is a bit simplified; basically it has to simulate what the kernel does. OK, this is for synchronous signals like segmentation faults, and this is very, very difficult to handle. As I said, Valgrind basically blocks all signals if possible, and it receives the other signals, like the synchronous ones, and those conditions are static. When the application wants to block some synchronous signals, Valgrind manages it: Valgrind remembers those signal masks internally but still configures the kernel in a static manner. So basically Valgrind keeps that information internally. When Valgrind receives a synchronous signal during execution, you can read about this in this paper, which is unpublished, but basically it handles the signal, longjmps out of that context, tries to resume executing the scheduler from there, and actually executes the client's signal handler from within the scheduler. So, I have just one minute. I had to change the base operating system to make Valgrind work. One needs a sysctl... okay, sorry. One minute. To summarize, I could manage to make Valgrind work, almost work, but I still need a lot of work to support all system calls. Valgrind is good for users, as far as it works, but internally, for developers, it needs a lot of work to maintain.
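To illustrate the block-and-poll strategy for asynchronous signals described in the signal part of the talk, here is a minimal sketch. The signal is blocked so the kernel never interrupts the running code, and the program asks for pending signals only at a point of its own choosing. The helper names `poll_signal` and `demo` are hypothetical, and using `sigtimedwait` with a zero timeout is an assumption for this example (the talk does not name the call); it is available on Linux and the BSDs but not on every platform.

```python
import signal
import threading


def poll_signal(signums, timeout=0):
    """Return the number of one pending signal from signums, or None."""
    info = signal.sigtimedwait(signums, timeout)  # timeout 0 means poll
    return None if info is None else info.si_signo


def demo():
    """Block SIGUSR1, send it to ourselves, then pick it up by polling."""
    watched = {signal.SIGUSR1}
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, watched)
    try:
        before = poll_signal(watched)              # nothing pending yet
        signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
        after = poll_signal(watched)               # delivered on our terms
        return before, after
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

No signal handler ever runs in this sketch; the signal stays pending until the poll consumes it. That is what lets an emulator decide when, and in which context, the client's handler is invoked. Synchronous signals such as segmentation faults cannot be deferred this way, which is why the speaker treats them separately.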