Hello everybody, thank you for coming to this talk. My name is Dmitry, and today I'll be talking about modern strace. strace is a Linux system call tracer, the user-space one with the longest history.

So what is modern strace? Well, it depends. For the purpose of this talk, modern strace is all the features accumulated since the previous talk I gave at DevConf, which was also called "Modern strace". So if you're interested in the things that used to be modern in those days, you can have a look at that talk. But now we'll be talking about the very new features, those added since 2019. There are quite a lot of them for these several years.

There are several groups of features. There are those that affect the tracing process itself. Most features are about the tracing output: what would you like to see? A few features are about filtering: what would you not like to see? And a few features are about tampering: how would you like to change what you see? Also, there is one summary option and one funny option that makes strace show you some tips.

So let's start with the feature which is probably the most impressive of all. The example you see is an infamous one, used to demonstrate how slow strace could be, how slow the programs it traces could be. And eventually we have a feature that turns all of this upside down. That is, strace no longer slows things down that way. With a seccomp-BPF program installed, the traced program runs almost as fast as an untraced one. As you can see from these statistics, it's exactly the same example people used to show when demonstrating how slow things are. You see maybe a 10% difference when a process with a heavy I/O load runs under strace with seccomp-BPF enabled, compared to something like 40 times slower without it.

It looks too good to be true, right? It really is that good, but there are some limitations. Maybe these limitations are not that important for you, but you should know about them to choose the right tool.
Because of the nature of seccomp programs, which you can install but can't remove, you are required to use follow-forks mode, because seccomp programs are inherited across forks. So unless you specify follow forks, the seccomp-BPF option is a no-op. Also, it is not compatible with any option that detaches strace, for essentially the same reason: you can't remove this program. And what happens if the program is still there but the tracer is no longer there? The seccomp trace stops, which are used to communicate with the tracer, turn into seccomp errno returns, which effectively disable those syscalls. And this is probably not what you want the program to do.

Take the following example. It's kind of an artificial example, but from it you can see how badly this could go. We're just tracing the exit_group syscall. When there is no seccomp, we just detach strace and nothing bad happens. But with seccomp instrumentation, there is no way for the tracee to work normally after its tracer has detached. At the moment of exit_group, it can't perform this system call. It then runs into a halt or trap instruction, or something like that depending on the architecture, and actually segfaults. So it affects the behavior of the process.

So while seccomp instrumentation in strace is really fast, there are cases where you can't use it. You should just be aware of this. This is the reason why we can't, for example, enable it by default.

There is a quite old feature that enables strace to daemonize itself. By default, when you run a program under strace, strace forks the tracee, which runs as its child process. But in some cases you don't want strace to be visible. So things are turned upside down: the parent process is the tracee itself, and strace runs as a child, or actually as a grandchild, because it forks twice so as not to be a direct child of the tracee. So it wouldn't be visible that way.
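
The "forks twice" idea can be illustrated with a minimal Python sketch. This is a generic double fork on Linux, not strace's actual code, and the function name `grandchild_parent_pid` is made up for illustration. The intermediate child exits immediately, so the grandchild is re-parented and is no longer a descendant that the original process can wait for.

```python
import os
import time


def grandchild_parent_pid() -> int:
    """Fork twice and return the parent PID the grandchild ends up with."""
    read_end, write_end = os.pipe()
    child = os.fork()
    if child == 0:
        # Intermediate child: fork once more and exit right away.
        os.close(read_end)
        intermediate = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the intermediate process is gone and
            # the kernel has re-parented us (to init or a subreaper).
            while os.getppid() == intermediate:
                time.sleep(0.01)
            os.write(write_end, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)
    # Original process: reap the intermediate child, then read the report.
    os.close(write_end)
    os.waitpid(child, 0)
    with os.fdopen(read_end, "rb") as pipe:
        report = pipe.read()
    return int(report)


if __name__ == "__main__":
    print("my pid:", os.getpid())
    print("grandchild's parent:", grandchild_parent_pid())
```

In the daemonized strace case the roles are as described in the talk: the original process becomes the tracee and the tracer continues in the grandchild. The sketch only shows why the second fork breaks the direct parent-child link.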
It's quite an old option; it has existed since 2011, I think. For example, if you run something under timeout, there is a clear difference, because timeout sends a signal to this process, and when that process is strace, it detaches too early, as you can see in the output.

So why am I talking about a feature which is more than 10 years old? Because it's not enough. Simple daemonizing is not enough, as you can see in this example. When, for example, you are sending SIGKILL, it kills strace anyway, because the signal is sent to the process group. So we added an option to also move the daemonized strace to a separate process group. Actually, you can specify -DDD to move strace to a separate session if you need this. But for this timeout example, it's enough to move it to a separate process group. This way, strace is not affected by signals sent to its tracee.

There are some ideas to maybe enable this daemonizing mode by default, I mean daemonizing with moving to a separate process group. But the previous behavior has existed for too long, so maybe somebody relies on the traditional behavior, and we are quite picky about backwards compatibility, so we decided we would rather add this option.

Another option that controls strace behavior, and a very recent one, is the ability to stop tracing after a specified number of system calls, counting only those that are not filtered out. So if you are tracing just a few syscalls, only those you are tracing are taken into account. People suggested this could be useful for some automated testing scenarios: attach to some running process, capture the number of system calls they are interested in, and detach. In this quite artificial example, I demonstrate how I use this sometimes, when I want to attach to a process that generates a lot of system calls but I want just a few of them. I just attach, grab these few syscalls, and detach. For that kind of purpose, it's very handy.
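
To make the process-group point from the timeout example concrete, here is a minimal Python sketch. It does not involve strace at all: three `sleep` processes stand in for the traced command, a tracer that stayed in the same process group, and a tracer that moved to its own group. The function name `signal_group_demo` and the use of SIGTERM are illustrative assumptions.

```python
import os
import signal
import subprocess


def signal_group_demo() -> dict:
    """Signal one process group and report which children were hit."""
    # Leader of a fresh process group (stands in for the traced command).
    leader = subprocess.Popen(["sleep", "30"], preexec_fn=os.setpgrp)
    # Joins the leader's group (like a tracer that stayed in the group).
    member = subprocess.Popen(
        ["sleep", "30"], preexec_fn=lambda: os.setpgid(0, leader.pid)
    )
    # Moves to its own group (like a tracer in a separate process group).
    outsider = subprocess.Popen(["sleep", "30"], preexec_fn=os.setpgrp)
    try:
        # Deliver the signal to every process in the leader's group.
        os.killpg(leader.pid, signal.SIGTERM)
        leader.wait(timeout=5)
        member.wait(timeout=5)
        return {
            "leader": leader.returncode,
            "member": member.returncode,
            "outsider": outsider.poll(),  # None means still running
        }
    finally:
        # Clean up whatever is still alive.
        for proc in (leader, member, outsider):
            if proc.poll() is None:
                proc.kill()
            proc.wait()


if __name__ == "__main__":
    print(signal_group_demo())
```

Both processes in the signalled group terminate, while the one in its own group keeps running, which is the effect the separate process group is meant to achieve for the tracer.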
This is a very simple feature. Suppose you are developing some multiplexer program like kmod or BusyBox, which is a program that has several names and whose behavior depends on the name. This is a very easy way to test such a program or affect its behavior without installing it. When it's installed you can use its regular alias name, but when it's not, this is a very simple way to do it.

So, moving to features that control various aspects of strace output. These new features allow you to see what's behind process IDs. For example, you can see the program name behind the process IDs you see in the output. The option is called --decode-pids=comm; comm is /proc/PID/comm, that's where the name came from. It also has a short alias, -Y, because the decode file descriptors option has the alias -y, so it's kind of analogous.

When you are tracing programs that create PID namespaces, sometimes you want to see not just the process ID that's visible to the traced processes, but also how that process ID looks from strace's own namespace. Why could it be useful? Because otherwise you wouldn't easily see which process is which. For example, as you can see here, the processes with PID 2 and 3 are actually the processes you see later in the left column. So you can actually see which process is which.

And you can combine both of these, displaying both comm names and the PID namespace translation, and this way it's even more visible. You can see the program name that the process comm contains, the PID in the target namespace, and the PID in strace's namespace, and it's clearly visible. In this example I use the option --decode-pids=all because it's more handy. Maybe someday we will have an alias -YY; I don't really know why we don't have it yet. It reminds me of -yy, which corresponds to decoding file descriptor information, which is exactly what I'll be talking about now. We have one more feature to decode file descriptor information.
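
The two pieces of per-process information mentioned above, the comm name and the PID as seen in nested PID namespaces, can be read from procfs directly. The following Python sketch is only an illustration of where that data lives on Linux; the helper names `read_comm` and `read_ns_pids` are made up, and this is not meant to describe how strace itself obtains it.

```python
import os
from typing import List


def read_comm(pid: int) -> str:
    """Return the command name the kernel keeps in /proc/PID/comm."""
    with open(f"/proc/{pid}/comm") as f:
        return f.read().rstrip("\n")


def read_ns_pids(pid: int) -> List[int]:
    """Return the process ID as seen from each nested PID namespace.

    Parses the NSpid line of /proc/PID/status and returns an empty list
    if the kernel does not provide that line.
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("NSpid:"):
                return [int(field) for field in line.split()[1:]]
    return []


if __name__ == "__main__":
    me = os.getpid()
    print(me, read_comm(me), read_ns_pids(me))
```

For a process that is not inside a nested PID namespace the NSpid list has a single entry; for a process inside one, it has one entry per namespace level, which is the kind of translation shown on the slide.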
It's the information associated with signalfd file descriptors. It's quite handy: when you see a system call accepting some signalfd descriptor, you can see right away the signal mask associated with it. Looks nice.

Also, you can see the SELinux contexts associated with process IDs, with file descriptors, and with file names. Here is exactly the same example with and without this information; as you can see, it might be quite handy if you use SELinux. This is the short form and this is the full form. It's so lengthy. By the way, all these strange-looking arrows you see are not produced by strace. So far strace doesn't produce funny-looking arrows. Unfortunately, strace output with full SELinux context is very lengthy, so you can see something very long.

Another feature, which is probably really important for those who are debugging some strange SELinux-related errors, is showing SELinux context mismatches. In this very artificial example, assume there is a file whose SELinux context doesn't match the database. SELinux keeps this information in a database; the database says it is unconfined, and actually it is system. So strace in this --secontext=full,mismatch mode would show the difference. You can see how long these lines could be, but if you're using SELinux you wouldn't be surprised anyway.

And you can also show syscall numbers, which is kind of strange: why would you need syscall numbers? You need system calls, not system call numbers. But here's the example. There is one dying architecture called x86, which used to have, and still has, a few multiplexing system calls like socketcall. And there are still libc libraries that use this system call for backwards compatibility. So sometimes, for some reason, you want to know exactly which way a socket call was made: via a direct system call or via socketcall.
You can use this.

Okay, let's talk about filtering now. We have a feature that was announced many years ago as the -z option. It was announced but never worked. Only in 2019, with proper return status filtering, did it actually start to work. So you can filter system calls by their exit status. You can show only successful syscalls, or only failed ones, or some combination. By default, of course, it shows everything, but you can see how this could be useful from this example. If you want to see only successful syscalls, you would probably use the short option, because it's really short. But if you want to see something less common, like those system calls that don't finish, you would use the long option.

One less obvious consequence of using this status filtering is aggregation. The reason is obvious: strace doesn't know whether it should print or skip a particular syscall until it has finished, so it prints the syscall at the moment it decides whether it should be printed or not. So it no longer prints all this popular "unfinished" and "resumed" stuff, which could clutter the output. As you can see in this example, this is without aggregation, and this is the same thing with aggregation. But be careful, it could confuse you. From this output you could think that these nanosleep system calls were issued in this particular order, but you remember from the previous slide that they were invoked almost simultaneously. So the consequence of this aggregation is a kind of reordering of the output. But there is no other way if you want to see whole lines and you don't know whether they will be shown or not. It's probably the only way, or you aggregate afterwards. We actually have an aggregating program, but it's not modern, it's from the previous talk, sorry.

You can also filter system calls by file descriptor numbers.
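
The reordering caused by aggregation, described a moment ago for the nanosleep example, can be shown with a toy model in Python. This is a simplified sketch, not strace's implementation: the `Call` record, the timings in `EXAMPLE`, and the output format are all made up for illustration. Printing at entry and exit keeps the start order visible, while printing whole lines only at exit orders them by completion time.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    pid: int
    name: str
    start: float  # time the syscall was entered
    end: float    # time the syscall returned


# Three sleeps started almost simultaneously, with different durations.
EXAMPLE = [
    Call(101, "nanosleep", 0.00, 3.0),
    Call(102, "nanosleep", 0.01, 1.0),
    Call(103, "nanosleep", 0.02, 2.0),
]


def print_at_entry(calls: List[Call]) -> List[str]:
    """One line at entry and one at exit, ordered by when each happened."""
    events = []
    for c in calls:
        events.append((c.start, f"[pid {c.pid}] {c.name}(...) <unfinished ...>"))
        events.append((c.end, f"[pid {c.pid}] <... {c.name} resumed>) = 0"))
    return [line for _, line in sorted(events)]


def print_at_exit(calls: List[Call]) -> List[str]:
    """One whole line per call, emitted only once its result is known."""
    done = sorted(calls, key=lambda c: c.end)
    return [f"[pid {c.pid}] {c.name}(...) = 0" for c in done]


if __name__ == "__main__":
    print("\n".join(print_at_entry(EXAMPLE)))
    print("---")
    print("\n".join(print_at_exit(EXAMPLE)))
```

In the second listing the call from pid 102 comes first even though pid 101 entered its syscall earlier, which is exactly the kind of confusion the talk warns about.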
With this filter, strace would show you only those system calls that operate on the specified set of file descriptors, like in this small program, just a regular cat program. The idea is that you can filter by path, by the path to a file, but sometimes there is no path at all. If it's, I don't know, a signalfd file descriptor, it doesn't have any path. So you can use this.

Maybe it's on another slide. Every now and then we add more system call filtering classes, because people cannot and shouldn't remember which particular system call names exist on this or that architecture. So there are groups; for example, we added two groups for filtering system calls related to credentials and to system clocks.

Oh, okay. Poke injection is nice. We have had various kinds of system call injection for quite some time, but so far we didn't inject anything into memory. This new feature allows you to inject not just into the exit status of a system call but right into the memory referenced by system call arguments. In this somewhat artificial example, I substitute the second argument of the openat system call by changing the string itself, from /etc/shadow to some different name, so the system call succeeds. Unfortunately, it's not that easy to tell what the file name is, because it's a hex string and you probably can't read this hex string that easily. It's a pity, but this is the interface we use.

You can also inject into memory on exit from system calls. So when injecting the return value is not enough, you can inject the actual data that the system call would have returned if it had been called. In this example, not just the system call return value is injected, but the actual data. And in this case you can see what the hex is about: it's the string read by the reading program. But maybe it's a good idea to add some interface to strace's poke injection that accepts actual strings, so it would be a bit more readable. Maybe, I don't know.
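
Until such a string interface exists, the hex strings can be produced and read back with a couple of lines of Python. This is a small helper sketch, not part of strace; the function names `to_poke_hex` and `from_poke_hex` are made up, and appending a terminating NUL byte is an assumption that fits C strings such as path names.

```python
def to_poke_hex(text: str) -> str:
    """Encode a string as hex bytes followed by a terminating NUL byte."""
    return (text.encode() + b"\0").hex()


def from_poke_hex(hex_string: str) -> str:
    """Decode such a hex string, dropping one trailing NUL if present."""
    raw = bytes.fromhex(hex_string)
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return raw.decode()


if __name__ == "__main__":
    encoded = to_poke_hex("/etc/passwd")
    print(encoded)
    print(from_poke_hex(encoded))
```

The encoded value would then go where the slide shows the hex data, something along the lines of `--inject=openat:poke_enter=@arg2=<hex>`; the exact syntax should be checked against the strace man page for the version in use.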
We have one more option to control the statistics output, because over the last few years we added a few features for gathering statistics. We can gather more kinds of information about system calls. This is what we show by default. But if you look into the man page, you will see that there are more of them. And, for example, you can specify this or some other parameters you are really interested in.

Okay, the last but not least feature is called --tips. It makes strace show you various tips, tricks, and tweaks. It was made initially as an April Fools' joke, but it was too good to be kept just as a joke, so it became an integral part of strace. In the beginning, you had to see some actual strace output before seeing the tip. But now, in the latest release, you can just see the tip without tracing anything. You can specify which particular tips you would like to see, but by default it will show you some random tip, which is kind of nice.

So let's have a look at what one of the funniest tips looks like. Okay, tip number 31 says: medicinal effects of strace can be achieved by invoking it with the following set of options. Medicinal effects of strace. What? Actually, this phrase was coined by somebody who really used it to request a feature: how to use strace to make programs that don't work actually work. Because sometimes some buggy programs don't work in a regular way, but for some reason, because they are too buggy, they work just under strace. So that person wanted this to be documented. And now he or she, I don't know, should be happy: this is documented.

There is something in this idea, because what strace does in this mode is that it doesn't do any printing. All it does is make the tracees stop twice on every system call. This affects the order in which programs execute their system calls. So fewer races, slower execution, and some bugs don't manifest themselves.
Okay, so maybe the last thing I wanted to say is that tomorrow you can attend a few more talks in this kind of unofficial mini-conference about strace. Tomorrow Eugene will be talking about the current state of netlink decoding, and Renaud will be talking about using strace to troubleshoot issues. This is going to be in room G202. I don't know where this room is, but please find it. So thank you, and I'm ready to answer your questions.

It's not that easy to tell because... yeah, okay, great. Thank you so much. So the question was how many tips strace currently has. The answer is that it's not specified and it's subject to grow. And you can't easily tell, because if you specify a number greater than the number of tips, it wraps around to the first one. Well, it just wraps around. So you can't easily tell. If you watch the tips one by one, you will notice that you've already seen this one, and this probably means that you've seen all the tips. But maybe the strace version is not fresh enough; maybe the next version will add some more. So these numbers are kind of stable, but we don't promise this. In every new version, you should check all these tips once more.

Any more questions? Yes, please. There is a format "none"; what happens when you ask for a tip with the format none? So the question was what would happen if --tips were called with the parameter none. It's the way to turn tips off, I think. Print no tip. Very simple. This means that you can specify --tips, and then --tips=none, and then --tips again, and it will honor only the last one you specify. This is the way. Eugene wanted to add that we follow this behavior with most of the options, or maybe all options, for the reason that if you use some aliases or a wrapper that uses strace, you can always specify something on top and it will override.

We have a few more minutes for any questions, I think. Yeah, the question was: can strace filter for a specific errno? I don't think so, no. No, I don't think you can.
But I don't know why. It makes sense. Maybe nobody has come up with a patch yet. That could be the reason.

Sorry, I didn't get the question. What do you want to replace with what? Okay, so it doesn't matter. Yeah, I think the question was whether poke injection or any other injection could change system call arguments. And we don't have this yet, for some reason, maybe for the same reason: nobody has contributed anything yet. Eugene says he believes it's not possible on every architecture, but we don't have to support every feature on every architecture. For example, seccomp-BPF programs are not supported on architectures that don't support seccomp-BPF. So this shouldn't be a big problem. But we would have to come up with an interface, and somebody would have to implement this, I think. Unless the feature implements itself, I don't know.

Okay, any more questions? No? If there are no more questions, then we can probably say thank you.