Hello everybody, thank you for coming. My name is Dmitry Levin, and I'm the chief software architect at BaseALT, where we make a GNU/Linux operating system. But I'm also the maintainer of strace, for slightly more than the last ten years now. So today I'll be talking about postmodern strace. What is postmodern strace? I talked about modern strace last year, so I understood that I can't name it modern strace any longer if I'm talking about very recent features. So where does traditional strace end and modern strace begin, and when does modern strace end? Modern strace never ends; when it turns into postmodern is kind of subjective. My definition is very simple: the strace that existed before I started maintaining it is traditional, and all the rest is modern. So here it goes: postmodern is all the new features since the last talk. I'll be covering mostly what has changed over the last two years, but I'll remind you briefly about the traditional features, just to refresh them in your memory. strace is a, mostly Linux, system call tracer. Since several years ago it can not just trace but also tamper with system calls. It has a lot of options to control its behavior in different ways: whether it prints instruction pointers, whether it prints timestamps or not, how it prints strings, which system calls are printed and in which way, what's abbreviated and what's not. There are also options to control which signals are printed. It can dump the data that goes through descriptors. It can print its output in different ways: you can, for example, redirect it into a pipe, or collect the output for each process separately. It can also print statistics on system call invocations. It can attach to already existing processes. And it can follow forks, or not follow forks, depending on whether you specify the option. Well, that was traditional strace.
Quite a few options were also added over the last ten years. You can print a lot of details about descriptors, like which paths are associated with them, or which socket information is behind them when the descriptors are sockets. It can print the stack of user-space function calls. You can filter system calls by path names. We finally got support for regular expressions for filtering system calls, so you can specify which system calls are printed using regular expressions, and so on. There are more ways to control how statistics are printed and how things are traced: you can, for example, attach to many processes, you can run strace as a detached process, and so on and so on. And there's also this big feature which changed strace; I mean, it changed not just strace but the way people look at it: system call tampering. You can not just trace system calls but also inject various things, starting with the return code; you can also inject signals and delays. But all this was more or less covered in the previous talk. In the last two years we got PTRACE_GET_SYSCALL_INFO support; it went both into the kernel and into strace. We got system call return status filtering. We have a seccomp-assisted system call filter nowadays. There are also a lot of new system calls in the kernel that are supported, and we have more and more elaborate system call parsers. We also finally have long options; we had no choice, and we will soon see why. And finally, a bit more than a year ago, we changed our BSD-style license to a couple of copyleft licenses. So let's start with the first feature. The story itself started very, very long ago. I think it was 2001, when this new architecture, x86-64, appeared. The way it was added to the Linux kernel, it obviously had to support both 64-bit and 32-bit processes, for various reasons, because the main feature of this architecture compared to its competitor was that it could run legacy code.
In the early years there was a lot of legacy code and very little native 64-bit code. But the way it was implemented in the kernel allowed not just mixing instructions, but also mixing system call invocations. So from native code you could actually invoke both native 64-bit system calls and legacy 32-bit system calls. It was very poorly documented, if at all. It was very surprising to many people, and it wasn't really exposed in the kernel API. So what could user-space tracers and debuggers do? They could fetch the system call number, they could fetch the register that describes the bitness of the process, and then they would just make a wild guess and say: well, if the process is 64-bit, then probably the system call is also 64-bit, it's mostly the case; and if it's a 32-bit process, then the system call is definitely 32-bit. And all the logic depended on this wild guess. It mostly works, because in most cases that's exactly what happens. But sometimes it's not the case. Back in 2008, there was a bug report against strace in the Debian bug tracker. Here is a very simple example that, as you can see, is very similar to it; it's somewhat simplified compared to the one reported in the Debian bug report. The program does a very simple thing: it prints a line of output, then it invokes a 32-bit system call, and then it prints another line of output. But this 32-bit system call is actually a fork. So what happens is that there are two processes, and each of them prints the line. If you compile, link, and run this program, you'll see output similar to this; maybe the numbers will change, but all the rest will be just as simple. But if you run this very simple program under strace, you will see something very strange. You will see this line being printed, and then suddenly a process attaches, and then you'll see this ridiculous open system call with very odd, I would say impossible, arguments.
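The program from the bug report can be sketched like this. This is my own reconstruction, not the exact reported code, and the helper names are mine; the interesting part only works on x86-64 Linux kernels with 32-bit emulation (CONFIG_IA32_EMULATION) enabled:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Invoke a *32-bit* system call from native 64-bit code via the legacy
 * int $0x80 entry point.  In the i386 syscall table, number 2 is fork.
 * On non-x86-64 builds this sketch falls back to a plain fork(). */
static long legacy_fork(void)
{
    long res;
#if defined(__x86_64__)
    __asm__ volatile ("int $0x80" : "=a" (res) : "a" (2L) : "memory");
#else
    res = fork();
#endif
    return res;
}

/* Print a line, fork through the legacy entry, print another line:
 * one "before" line, then an "after" line from each of the two processes. */
static void kaleidoscope(void)
{
    printf("pid %d: before\n", getpid());
    fflush(stdout);   /* don't duplicate buffered output in the child */
    legacy_fork();
    printf("pid %d: after\n", getpid());
    fflush(stdout);
}
```

Run directly, this just prints one line and then two; run under an old strace, the legacy fork is what shows up as the ridiculous open with garbage arguments.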
All you can say about this is: what? And all the rest looks very usual and regular, making the whole picture completely ridiculous: this absurd open among all the nice, expected system calls. If you run this program several times, you will see that all these odd open flags are different; you will never, or probably never, see the same combination of flags. That's because nowadays, thanks to kernel address space layout randomization, all these registers contain garbage that changes. And this reminds me of a toy I had in my childhood, a kaleidoscope: you turn it slightly and you see a different nice picture. So you can use this simple program as a kaleidoscope, if you like. This problem was approached several times, but until 2018 there was no progress. Finally, thanks to the two people who contributed this API to the kernel, the two authors, plus twenty more people who reviewed it, myself included, we got it in. It took us almost nine months to get this into the kernel, and I don't remember how many iterations, but it was a two-digit number. So finally we have it in the kernel, for all architectures that support tracehooks, which is all supported architectures, or almost all, I would say, and some that are not supported but get it for free. The API looks this way: there is a structure you can request from the kernel. It contains this crucial architecture field, and otherwise it looks similar to seccomp_data. So you can obtain in one go the architecture, the syscall number, the syscall arguments, and also the instruction pointer and stack pointer. This makes tracers that use this API reliable in this respect. Back to the original problem: if the Linux kernel is fresh enough and strace is fresh enough, the same program now looks as expected. The process attaches, you see the proper fork call and not this ridiculous open, and all looks good.
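As a sketch of how a tracer uses this API (my own illustration, not strace's code): the request number and structure layout below are copied from the kernel UAPI and declared locally, so the example does not depend on glibc or kernel header versions; it needs Linux 5.3 or later, and the function name is made up:

```c
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

/* Request number and layout from the kernel UAPI (Linux >= 5.3). */
#ifndef PTRACE_GET_SYSCALL_INFO
# define PTRACE_GET_SYSCALL_INFO 0x420e
#endif
enum { INFO_NONE, INFO_ENTRY, INFO_EXIT, INFO_SECCOMP };

struct syscall_info {
    uint8_t  op;                    /* entry, exit, or seccomp stop */
    uint8_t  pad[3];
    uint32_t arch;                  /* AUDIT_ARCH_*: the crucial field */
    uint64_t instruction_pointer;
    uint64_t stack_pointer;
    union {
        struct { uint64_t nr; uint64_t args[6]; } entry;
        struct { int64_t rval; uint8_t is_error; } exit;
        struct { uint64_t nr; uint64_t args[6]; uint32_t ret_data; } seccomp;
    };
};

/* Trace a child up to its first syscall-entry stop, print what the new
 * API reports, and return the syscall number (or -1 on failure). */
static long first_syscall_nr(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        raise(SIGSTOP);             /* let the parent set options */
        getpid();                   /* some syscalls for the parent to see */
        _exit(0);
    }

    int status;
    long nr = -1;
    waitpid(pid, &status, 0);       /* SIGSTOP delivery stop */
    ptrace(PTRACE_SETOPTIONS, pid, 0, (void *) PTRACE_O_TRACESYSGOOD);
    while (nr < 0) {
        if (ptrace(PTRACE_SYSCALL, pid, 0, 0) < 0)
            break;
        waitpid(pid, &status, 0);
        if (!WIFSTOPPED(status) || WSTOPSIG(status) != (SIGTRAP | 0x80))
            break;                  /* not a syscall stop */
        struct syscall_info si;
        if (ptrace(PTRACE_GET_SYSCALL_INFO, pid,
                   (void *) sizeof si, &si) <= 0)
            break;                  /* old kernel: API not available */
        if (si.op == INFO_ENTRY) {  /* no guessing: arch arrives with nr */
            printf("arch=%#x nr=%llu ip=%#llx\n", si.arch,
                   (unsigned long long) si.entry.nr,
                   (unsigned long long) si.instruction_pointer);
            nr = (long) si.entry.nr;
        }
    }
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return nr;
}
```

The point is the one-shot fetch: one ptrace call yields the architecture, the number, the arguments, and the pointers together, instead of register peeking plus a wild guess about bitness.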
So I think other tracers and debuggers that have something to do with system calls should switch to this API. By the way, it also allows you to find out what kind of ptrace stop the current stop is. Until now the kernel provided no way to find out, so tracers used to assume the stops alternate: first you enter a syscall, then you exit a syscall. But it's not always the case. So you can actually use this nice API to find out which ptrace stop you're dealing with. Okay, so that was a very major feature for strace, and as I said, other tracers are welcome to use it, of course. Let's speak about system call filtering. There is a new option to filter system calls by return status. It had a very unusual history in strace: it was actually introduced in 2002, but it was broken from the beginning, and it was never announced. You couldn't find out it existed unless you accidentally typed it or looked into the source code. What it did was print the beginning of a system call, and when the call failed, it just didn't print the ending. It wasn't useful. But now you can filter system calls by return status, so you can print only those system calls that succeeded, or only those that failed. In this very simple example, you can see the difference. If you run a very simple program like cat with a modified LD_LIBRARY_PATH, it makes the dynamic linker look into different places. I wonder whether you expected the dynamic linker to look into so many different places. Well, you can see the difference. As a very useful side effect of this option, you get aggregation for free. For example, if you trace several processes that are running asynchronously, you will see a lot of this unfinished and resumed stuff, and sometimes it's not very convenient. We used to implement special aggregators to collect this data so it would look like this. But now you can use this option to aggregate as well.
The only caveat, I would say, is that it might change the order of invocations. In this example, it looks as if the nanosleep calls were invoked sequentially, which is definitely not the case: they were invoked simultaneously. But because these system calls were printed at the moment they finished, it doesn't look the way you are used to. Then again, when you're aggregating, it doesn't really matter in which order they are printed. There is also another option, and there's a funny story connected with it. When I tried to come up with something useful as an example, I started invoking all the programs I had in my small chroot, and I found a few programs that were not printing their error messages correctly when they couldn't find a file; I just invoked programs with a non-existent file. All these programs turned out to be from elfutils, and I fixed them. But you get the idea of when this could be useful: for example, when a program doesn't print what's going on, you can trace it and have a look. When you're filtering system calls, and you don't want to print all the rest, you probably want those system calls you are not printing to execute faster. And now we have a very nice feature, which we had planned for several years, but couldn't get done until, well, we had two Google Summer of Code students. The year before last, a student made a prototype, and last year we had a student who is going to talk about this feature very soon, I hope. So he will describe how it works. From a user perspective, it means strace is no longer delaying everything by two orders of magnitude on those system calls that are not traced. This is a famous example, because it's a modification of the example BPF people use to describe how slow strace is. And now we use the BPF stuff to show how fast strace is.
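The student's talk will have the details, but the underlying mechanism can be sketched in a few lines. This is my own illustration, not strace's generated program, and the function name is made up: a classic-BPF filter attached with seccomp decides per syscall number whether the kernel stops the tracee for the tracer, or lets the syscall run at native speed with no ptrace stop at all:

```c
#include <assert.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Attach a classic-BPF program that tells the kernel, per syscall:
 * stop the tracee in a seccomp ptrace stop (SECCOMP_RET_TRACE), or
 * just run it with no stop at all (SECCOMP_RET_ALLOW).  A real filter
 * must also check seccomp_data.arch before trusting the number; that
 * check is omitted here for brevity. */
static int install_trace_filter(unsigned int traced_nr)
{
    struct sock_filter filter[] = {
        /* A := syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* traced syscall?  ->  seccomp ptrace stop */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, traced_nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
        /* everything else: no stop, full speed */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof filter / sizeof filter[0],
        .filter = filter,
    };
    /* required so an unprivileged process may install a filter */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

The untraced syscalls never leave the kernel's fast path, which is where the two-orders-of-magnitude saving comes from.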
You can see that seccomp-bpf itself slows things down by about 10%, which is nothing compared to what all these ptrace stops do to the speed of running programs. You can see this is a long option, and it was actually the first option for which we couldn't find a good short analog. We had quite a few options, not as many as the ls program has, but quite a few, and some of them are not obvious. I think we had -n in our prototype, but we couldn't find an explanation for why it should be called -n. So we decided it was time to introduce long options, and now we are starting to add long analogs for the not-so-obvious names. So --seccomp-bpf was the first one. Another option which should probably have a long option analog is the one named -k; I don't know why it's called -k. It prints the stack of user-space calls at the time of the system call invocation. It's a very useful thing, because you can see the logic behind the program. If you don't know what's going on, you can just apply this: it produces a lot of output, but it makes strace somewhat more of a debugger than a tracer. In this example, you can see why, for example, cat closes stdout. From the names of these functions, you can see that it does some kind of exit handling, and it closes stdout to ensure that everything is written; otherwise it should return a non-zero exit code. With another option, you can run strace as a detached process. Because strace affects tracees in different ways, being traced is not always desirable; for example, if the tracees interact with their parents and want to know their PID numbers, you can run strace this way and be more transparent. There is also a relatively new option that controls how all these symbolic constants are printed. You can print them as usual, translating the numbers into symbolic names, or you can print both symbolic names and raw numbers, or just raw numbers.
It has various useful applications. You can debug programs that you suspect pass arguments to system calls in a wrong way, which would not be very surprising, because different architectures have different system calls, different APIs, and a different number and order of system call arguments. This can also be used elsewhere; I think it's used in the syzkaller project. So yeah, we added support for all the new system calls that were added to the Linux kernel, and nowadays they have started adding system calls again. There is a bunch of new system calls that work with mount points. Well, I can't describe them all, there are too many; you should look into the man pages, probably, but we have support for them. We also have a lot of very sophisticated system call parsers, and I'll show you an example which looks very monstrous, but it will give you an idea of how sophisticated system call parsers can be. We support decoding of the netlink protocol. You can see this output: this is what goes on behind listing a very simple routing table. You can see that the netlink protocol is very structured: it has structures, substructures, sub-substructures, and everything is printed. The coloring is mine; all the rest is made by strace. And last but not least, in December 2018 we changed the license. strace had been released since the very beginning under a Berkeley-style license, by request of Paul Kranenburg. I don't know this man; it was before my time. When we added support for the PTRACE_GET_SYSCALL_INFO API, it was kind of a crucial point: most contributors to strace didn't want to contribute under a permissive license any longer. So we decided to change to copyleft. The test suite is released under GNU GPL v2+, and all the rest under a license that allows us to release strace as a library someday in the future, if we ever manage to make a library out of strace. So this is more or less what I wanted to talk about.
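To get a concrete feel for the nesting that strace is decoding on that slide, here is a small self-contained sketch (my own, with a made-up function name, not strace code) of the same kind of request a routing-table listing makes: it sends one RTM_GETROUTE dump over a NETLINK_ROUTE socket and walks the nested reply messages:

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Dump the IPv4 routing table over NETLINK_ROUTE and count the
 * RTM_NEWROUTE replies.  Each reply is nested: an nlmsghdr, then a
 * family header (struct rtmsg), then a chain of rtattr attributes;
 * that nesting is exactly what strace prints as sub-substructures. */
static int count_routes(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct {
        struct nlmsghdr nlh;
        struct rtmsg rtm;
    } req;
    memset(&req, 0, sizeof req);
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof req.rtm);
    req.nlh.nlmsg_type = RTM_GETROUTE;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.rtm.rtm_family = AF_INET;

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    if (sendto(fd, &req, req.nlh.nlmsg_len, 0,
               (struct sockaddr *) &kernel, sizeof kernel) < 0) {
        close(fd);
        return -1;
    }

    /* union keeps the receive buffer aligned for nlmsghdr access */
    union { char raw[16384]; struct nlmsghdr align; } buf;
    int routes = 0, done = 0;
    while (!done) {
        ssize_t len = recv(fd, buf.raw, sizeof buf.raw, 0);
        if (len <= 0)
            break;
        struct nlmsghdr *nlh = &buf.align;
        for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type == NLMSG_DONE ||
                nlh->nlmsg_type == NLMSG_ERROR) {
                done = 1;
                break;
            }
            if (nlh->nlmsg_type == RTM_NEWROUTE)
                routes++;
        }
    }
    close(fd);
    return routes;
}
```

Running a program that does this under a recent strace is a quick way to see the decoder in action.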
And if you have some questions or ideas or something related to strace to discuss, you have some time. Should I repeat the question? So please be concise. Should I pick from the back to the front? So the question was why this system call filtering is not on by default, and why you have to type this very long option to use this feature. First of all, you can abbreviate long options, so you don't have to type that much; two letters are usually enough, and if not, then type three letters. Some shells also allow completion of program arguments. So I don't think this is a problem. But there are two important points about this way of filtering. First, it generates and attaches a BPF program. I'm not going to dive into details, but strace makes this program and attaches it, and you can't get rid of it unless you're privileged. This implies that you have to follow forks: you have to follow all processes that are forked by the process you are tracing. And this kind of changes behavior. One of the important points of strace is that strace is backwards compatible, so we can't enable follow-forks by default, because people are not used to that. And if you specify this option and do not specify follow-forks, strace says: hey, I am enabling follow-forks. So this is one point. Another point is that unless you are privileged, or strace is used as a privileged program, you can't attach a BPF program to another process. You can attach to a process using PTRACE_SEIZE or PTRACE_ATTACH, but you can't attach a BPF program to another process; you can only attach a BPF program to yourself. So one of the important features of strace, tracing an already existing process, wouldn't work with this unless you're privileged. But if you're privileged, you can use a lot of in-kernel tracing nowadays; it's not really a big deal, although those tools don't have such elaborate parsers. Yeah. Yeah, please, another question. Yes? So you mentioned on the last slide that the coloring was your own.
Would you consider adding color output to strace? So the question was that on this slide, the coloring was my own, and would I consider coloring done by strace itself. This is a difficult question, because we actually had a plan to generate structured output from strace. If you generate, for example, some JSON output, you can apply already existing software to do all this fancy stuff like coloring. So we decided we would make structured output first, and then other people can do whatever coloring they like. But as you can see, there is no structured output yet, and I have to do all the coloring myself. Yes, please. Sorry? Me? I think I can. So the question was whether I can pretty-print this. I think this is pretty enough. So what was your question? So, whether I can print this in blocks so it'll be easier to read. Yeah, that's getting closer and closer to our idea of structured output. So you can see why we decided to go the simple way, though it was not so simple. Is it over? Yes. Okay, thank you for coming.