Hey everyone, we are about to start. Please turn off the sound on your phone; in each session I can hear phones ringing. Please do it. And don't forget to vote in the mobile app or on the website for your favorite talk, presentation, and workshop. And please welcome Dmitry Levin. Can you hear me? Okay. I'm Dmitry Levin, the maintainer of strace, and today we have a little strace session. In the first half hour I'll be talking about strace fault injection, and in the second half my colleagues will talk about the structure of strace. So... does anybody know what strace is? Please raise your hand if you do. Okay. It's going to be fun. So, as some of you know, strace is a traditional diagnostic, debugging, and instructional user-space utility for Linux. How traditional? Well, about 25 years old, so quite traditional. It is used to monitor interactions between processes and the Linux kernel, which includes system calls, the most well-known part of it, but also signal deliveries and changes of process state. strace has a traditional command-line interface, so it's easy to use, right? And it also has various filtering capabilities that make it quite a powerful tracing tool. And last year, and this is the most interesting part, strace was extended to tamper with tracees using what is called fault injection. So today I am not going to focus on traditional strace features; I'd rather talk about this new feature, which is quite untraditional for strace. But you will see some traditional output too. So what is fault injection?
It's a software testing technique used to improve the coverage of error-handling code paths that might otherwise rarely be followed, by means of introducing faults. Thanks to Wikipedia for this nice definition; I wouldn't be able to produce something like this. So where do we place strace among other fault injection tools? It's obviously a software run-time tool that works by means of syscall interposition. It's user space, not kernel space. It's unprivileged, and it has a traditional command-line interface. So now I'll show you a series of examples so you get an idea of what the interface is and what you can do with it. I'll start with a simple program, cat from coreutils, which is dynamically linked with libc. In the top block you see a traditional strace output with a simple filter that shows all openat syscalls. And in the bottom block you see the simplest fault injection syntax, where you instruct strace to fail all openat syscalls. From this you can see that the first openat syscall is made by the dynamic linker, by the way: it tries to open its cache, which obviously fails, and then it tries to find libc in four predefined locations on this architecture, fails, and then gives up. Let's try to change the default error, because the default error is ENOSYS, that is, "Function not implemented". Let's try something different; let's pretend that all these files are not found. So we tell it that the error is ENOENT, "No such file or directory", and you can see that it makes a difference for the dynamic linker: when it sees this error code, it tries to find libc in four more locations. But it has no chance. Now let's limit the fault injection to just the first openat syscall, the one where the dynamic linker is trying to open its cache.
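The slides themselves aren't reproduced in this transcript, so here is roughly what those invocations look like. This is a sketch assuming a strace recent enough to support the `-e fault=` qualifier (4.15 or later); the traced command and its behavior are as the speaker describes them.

```shell
# Traditional filtered trace: show only the openat syscalls cat makes.
strace -e trace=openat cat /dev/null

# Simplest fault injection: fail every openat with the default error, ENOSYS.
strace -e fault=openat cat /dev/null

# Same, but pretend the files are not found instead.
strace -e fault=openat:error=ENOENT cat /dev/null

# Fail only the first openat -- the dynamic linker opening its cache;
# the linker falls back to its predefined locations, so cat still works.
strace -e fault=openat:error=ENOENT:when=1 cat /dev/null
```

The `when=1` form is what limits the injection to the first matching syscall; without it, every syscall in the set is failed.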
It looks like the previous slides, where you've seen that when the dynamic linker can't open its cache, it just iterates over predefined locations. It finally succeeds and everything is fine. So now let's do this: let's fail the second and all subsequent openat syscalls. So the dynamic linker successfully opens its cache and reads the location where libc is supposed to be; you see it tries to open it, fails, and then falls back to the same procedure you've seen before: it iterates over its predefined locations and expectedly fails. So you see it tries the same location as the first one, but it doesn't know about that, because these are just different parts of the code. It's just a coincidence that this location and this location are the same. Now let's go on and try to mess with the openat syscalls made by cat itself. It looks like you would expect it to be: when the openat syscall fails, cat just reports an error and exits with an error code. And this is the way you do it. The next example is a bit artificial, but it shows how you can inject faults not just into some syscall and all subsequent ones, but with a step. This instructs strace to inject faults starting with the third openat syscall and then into every second one after it; 3 plus 2, you see. This way you can do some funny stuff like failing the third openat and then the fifth, and you can see that cat handles this situation properly: it opens whatever it can and prints it, reports about the files it can't open, and its exit status indicates an error. Now let's combine fault injection with a path tracing filter. In the top block you see a traditional path tracing filter: you see all syscalls that have something to do with the file specified here. Let's now fail one syscall after another and see how cat handles this. Starting with openat... well, we've just seen what happens with openat, so let's go to the next one. The next one here is fstat. Let's inject some plausible error.
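A sketch of the step syntax and the path tracing filter described above, with made-up file names; note that the `when=` counter covers every openat in the process, the dynamic linker's included, which is why the indices on the slides don't map one-to-one onto cat's own opens.

```shell
# Five hypothetical files for illustration.
touch a.txt b.txt c.txt d.txt e.txt

# Fail the 3rd openat and then every 2nd one after it (3rd, 5th, 7th, ...):
strace -e fault=openat:error=EACCES:when=3+2 cat a.txt b.txt c.txt d.txt e.txt

# Path tracing filter: show only the syscalls that touch the given file.
strace -P a.txt cat a.txt
```

The `-P` filter is what the speaker uses next to walk through openat, fstat, and the rest, one failure at a time.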
fstat is not likely to fail, but what's going to happen if it does? cat treats it as a fatal error. I don't know why: as you can see from this traditional output, it doesn't actually need the result of fstat much in this case, but cat treats it as a fatal error. The next syscall, however, is optional, because as you can see it's an optimization: it instructs the kernel to expect sequential reading. So when it fails, as if it's not available, and it could be, cat just goes on. And then a read error happens, a hard read error. cat also treats it as a fatal error, which is quite logical. But what do you think will happen if, say, a temporary error occurs on read? Would cat retry this syscall or not? It would. So if you inject EINTR, "Interrupted system call", it retries. cat is a good program. And that's the funniest thing about cat: after all the reading is done and the processing of this file is over, it tries to close it. It's a read-only descriptor, and if cat fails to close this read-only descriptor, it treats that as a fatal error too. I don't know why; it doesn't sound logical, the processing is already over. Oh, nice. So, some more examples of fault injection syntax. You can specify several fault conditions at once, as in the first example. And in the second example, you can see that strace can follow descriptors and actually see the files behind them. So you can use strace as a trivial access-control filter of sorts, which is a bit funny. Okay, now to real bugs that were found with this tool. The first one is a bug in Python 3.5. Every invocation of Python requires some random data, and when it fails to open or read data from /dev/urandom, it generates a fatal error; but instead of exiting gracefully, for some reason it prefers to raise a SIGSEGV at hexadecimal address 50. Why do I think Python does this? It's just a method call on an object that's not allocated: the object is at address zero, and hexadecimal 50 is just the offset of the method. That's ridiculous.
Okay, and another bug was found in the libc dynamic linker. For some reason, it doesn't check the return code of the first mprotect call, but it does for all the rest. And an mprotect call is allowed to fail, for example because of memory fragmentation, so it's a clear omission on the dynamic linker's side. Why does it happen? Just because these are different parts of the code, and this was the only place in libc where the return code of mprotect was not checked properly. So what's going on under the hood? As you probably know, strace works using the kernel API called ptrace. It's the same API used by GDB, for example. So when the tracee invokes a syscall, what happens? First, the kernel puts the tracee into the so-called syscall-enter-stop state. This awakens strace. It fetches the syscall number and arguments from the registers and memory of the tracee. It applies its filters; you've seen a lot of them already. It may decide to print something. And then it tells the kernel to let the tracee go on. The kernel is likely to execute the syscall, if it's a valid syscall. And anyway, it then puts the tracee into the so-called syscall-exit-stop state, which also awakens strace. strace then fetches the syscall return code, and the arguments if it thinks they could have changed. If the syscall is not filtered out, strace will print something. And then it tells the kernel to let the tracee go on, and everything repeats until something stops it. So if we put these two parts together, you can see that in this strace workflow there are two places where strace can tamper with the syscall numbers, arguments, return codes, and whatever else of the syscalls being invoked. And this is actually how syscall fault injection is implemented. strace changes the syscall number to an invalid one, which is minus one. The kernel sees that it's an invalid syscall, so it just returns an error code. The default one on most architectures is ENOSYS, "Function not implemented", but on some it could be a different code, so you shouldn't rely on anything specific here.
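You can observe this mechanism directly in the trace: when no `error=` is specified, the injected failure comes back as the kernel's invalid-syscall error, and strace flags the tampered call in its output. A small sketch, assuming a glibc system where the dynamic linker opens its cache with openat:

```shell
# Fail the first openat with the default error and watch the trace:
# the return line of the failed call carries ENOSYS ("Function not
# implemented") and an (INJECTED) marker showing strace tampered with it.
strace -e trace=openat -e fault=openat:when=1 cat /dev/null
```

All the later openat calls in the same trace are genuine and unmarked, which makes it easy to tell injected failures from real ones.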
And then on exiting the syscall, strace just replaces this error code with the code that was specified, if something was specified. So it's quite simple, but quite powerful at the same time. As you can see from this slide, strace can do something more than classic fault injection, which is replacing the error code. It can do some fancy stuff. For example, it can inject a signal. I don't have any ideas for real uses of this, but some people do. So this is an example of how you would inject a signal right at the exit of a syscall. You can just dump core and then examine the core, or whatever, like attaching gdb to it. Or you can skip syscalls and pretend that they completed successfully, like injecting or substituting the error code with a successful one or whatever you like, and pretend, as in this example, that the unlink syscall succeeded and the file was removed, I mean unlinked, while in reality it's still there. But the tracee has no idea. So this is mostly all I would like to tell you about this fault injection stuff. strace is a free software project; you can find it in many places. And now, your questions, if you have some. So the question was whether the fault injection happens on entering or on exiting. It actually happens in both places: on entering, the syscall number is replaced, and on exiting, the return code is replaced. So it's in both places, right? The next question was what's going to happen with resources. When fault injection happens, no actual syscall is invoked. So it's just like a natural failure of the syscall; it's an emulation of a syscall failure by natural means. If open fails, it's the application that should handle this. That's why this whole thing is called fault injection; that's what it's all about. You inject a real fault, not an imaginary one, and then the application should handle it properly.
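In current strace releases this fancier tampering is done with the `-e inject=` qualifier, which extends `-e fault=` with `retval=` and `signal=`. A sketch with a made-up file name:

```shell
touch victim.txt

# Pretend the unlink(at) succeeded without executing it: rm reports
# success, but the file is still there -- the tracee has no idea.
strace -e inject=unlink,unlinkat:retval=0 rm victim.txt
ls victim.txt

# Or deliver a signal on syscall exit instead, e.g. to take a core dump
# right there (this kills the traced cat, by design):
strace -e inject=openat:signal=SIGSEGV:when=1 cat /dev/null
```

Both `unlink` and `unlinkat` are listed because coreutils rm may use either syscall depending on the libc and version.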
Success injection is something different, and all these questions would be applicable to it, but it is a rather modern thing and it's not stable yet, unlike fault injection, which is part of an already released strace. In theory you could do this, but currently we have no interface for it. The question was whether we could change syscall arguments with fault injection. The answer is that technically it would be possible if we could think of some convenient interface for it. Currently we have no such interface, but this is likely to change in the future. Is the functionality available as a library as well, so that it is possible to build other things around it? strace is traditionally an executable, so you probably could write a library wrapper around it, but by itself strace is an executable. But you're the second one to ask this question, and I'm starting to think that it might be useful. I know some people are hacking strace for this, but we don't have such an interface. The question was whether we could inject delays. The answer is that it's theoretically possible, but we don't have any interface for it, and a delay raises questions: what kind of delay? For example, when strace traces several processes, should it delay everything, or should it delay just this one process? It needs some thought to come up with a reasonable interface. So it's not a question of implementation; it's a question of a good interface. Any more questions? The question is whether we could specify some randomness. Originally it was proposed to specify some randomness in this interface you've seen, but currently it's not implemented, and to be honest I'm not sure it's a good idea, because if you're doing some fuzzing, it's probably up to the driver of this fuzzing to specify the randomness. So probably strace is not really the right place for it. Okay, I don't see any more questions. The last question, yeah? It's well-known stuff.
So, how strace affects performance was the question, and it's a very well-known thing. strace is a synchronous program, so everything is synchronous. As you can see, or probably have seen on the workflow slide, strace wakes up several times per syscall, a lot of context switches, so it's quite a big overhead. But it's the only synchronous interface, so nothing is lost. Everything is very slow, but nothing is lost. All modern kernel interfaces are asynchronous, so you can lose some events, but they are much faster, orders of magnitude faster. So the strace interface is just slow but synchronous. So the last question, really the last question now: the question was why we didn't drop strace in favor of perf, which is asynchronous. Because for some tasks we need a synchronous tool: we just need it synchronous, and we cannot afford to lose any event. For different tasks we need different tools; that's why. The main difference is that strace is synchronous, and when you need the tool to be synchronous, you can't use perf. When you don't need that, you can. That's life, I think. Thank you.