Thank you for attending this presentation about strace. Before we start, a few words about myself. I have been working as a software maintenance engineer at Red Hat for four years, remotely from Grenoble in the French Alps. My area of expertise is everything dealing with user space, which is something like 1,000 packages distributed in Red Hat Enterprise Linux. I use strace daily to troubleshoot customers' problems, in particular when they deal with slowness, for example connecting using SSH taking 10 seconds or more; hangs of commands, for example df or ps just hanging for no obvious reason; or unexpected inactivity of services, for example rsyslog not reading logs from the journal or from files. strace is my favorite tool for this kind of issue.

Let's go to the agenda. First, I will briefly introduce strace and the options I like to use, then go through some very limited technical details, which are really necessary to understand. I assume here that you have some knowledge of system programming in C, or of Python programming. I will then tell you when strace may be of some help and when not, depending on the problem to troubleshoot. I will go through some strace output analysis and, finally, I will present six examples of strace usage.

strace is a tool that prints the system calls executed by the process being monitored. It prints both the system call arguments and the return value. There are several options available, including options to inject faults, but that is not the subject of this talk. The ones I use by default are shown below in red: -f to follow children, -ttt to print nice human-readable timestamps of when the syscall is called, -v to show structures unabbreviated (a kind of verbose mode), and -s with a value, usually 1024, to record up to 1024 characters of strings. strace is all about syscalls. In brief, syscalls are the way for a user-space process to interact with the kernel.
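Put together as one command line, the options just described look like this. This is a sketch, not the speaker's literal slide: the traced command and the output path are placeholders.

```shell
# Sketch: the default options described above, applied to a sample command.
# -f follow children, -ttt microsecond timestamps, -v unabbreviated structs,
# -s 1024 record up to 1024 characters of strings, -o write the trace to a file.
STRACE_OPTS='-f -ttt -v -s 1024'
full="strace $STRACE_OPTS -o /tmp/trace.log some_command"
echo "$full"
```

The command is printed rather than executed here, so you can paste and adapt it to the program you want to trace.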
Every time a user process needs to access a resource, for example a file or a socket, it needs to use a syscall. However, the application developer usually doesn't call the syscall directly, but goes through the corresponding glibc function, which usually expects very similar arguments. For example, the application programmer will use open(), which maps to the open syscall. The difference is that the glibc function does more checking and is platform independent. Most glibc functions map to syscalls in a one-to-one manner, except a few, such as fork() and vfork(), which map to clone on RHEL 7 and later. Hence, getting information on a syscall is usually as simple as checking the corresponding glibc function's man page in section 2. To get the list of syscalls for your system, check the syscalls man page in section 2.

Now let's have a brief overview of how strace works. strace uses the ptrace syscall interface, which basically sets breakpoints on the syscalls of the tracee; see the ptrace man page in section 2 for details. Once in place, every time a syscall is entered, the process stops and strace is notified. strace collects the information about the syscall, then tells the process to continue. Upon the syscall returning, the process stops again and strace is notified. strace collects the return value of the syscall, computes the time spent, and tells the process to continue. Finally, strace prints the collected data to standard error or, more usually, to a file when using the -o option. The consequences of using ptrace are multiple. Firstly, the process being monitored is serialized: every time ptrace stops the process, all its threads stop as well. Secondly, it slows down the monitored process execution, by a factor of 10 or more. Thirdly, you cannot have multiple programs monitoring the same process; for example, you cannot have gdb and strace attached to the same process.
And finally, as we'll see later, the output of strace can be quite difficult to read. Now let's see when strace will help. strace will help whenever syscalls are involved: when a command is hanging or executing slowly, or when a process seems to hang waiting for something on the network. Additionally, you can do much more. It's very helpful when you want to understand what a program does: for example, you can see which files are being opened or written, which libraries are being loaded, et cetera. You can also observe the communication between processes: for example, you can monitor systemctl and systemd and see how they talk to each other, to do some kind of reverse engineering. This is something I frequently do when having to handle an issue with a program I don't know anything about; it's usually easier than checking the source code directly. Finally, with the help of the source code, you can usually determine more easily which error condition happened and made the program fail. This isn't an exhaustive list.

There are also cases where strace won't help you, though sometimes it can still give you hints about the root cause. It won't help you much if a program is hanging in a syscall and the syscall never returns from the kernel; here you will have to use kernel tools such as trace-cmd or SystemTap. Note, however, that strace still gave you a hint that the issue is somehow within the kernel. When a program doesn't call syscalls and does computation or spins on the CPU, strace is definitely of no help; using pstack or gcore is more appropriate. Another scenario where it won't help much is when the issue is due to a race condition, for example when the startups of services race together and lead to errors. Still, if you strace both services and see that the issue goes away, you can then guess that the root cause is likely a race condition which doesn't show up because strace slows down the processes too much.
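For the "which files are being opened" use case, recent strace versions provide syscall filtering with -e, including a %file class covering file-related syscalls. A hedged sketch, printed rather than executed (the traced command is a placeholder):

```shell
# Sketch: restrict the trace to file-related syscalls only, which keeps the
# output small when you only care about which files a program touches.
# The %file syscall class exists in recent strace releases.
cmd="strace -f -e trace=%file -o /tmp/files.strace some_command"
echo "$cmd"
```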
When a program crashes, you need to use gdb. Finally, when a program exits for no obvious reason, for example with no error message at all, strace won't help much; still, checking the failing syscalls can give you some hints from time to time. It really depends on what the program does. To conclude this part, here are the recommended options for troubleshooting. To strace a program and its children, use strace -f -ttt -v -yy and a decent string buffer: -s 1024 should be sufficient. You can reduce this to 200 bytes if strace will be long-running and the program being monitored has a lot of syscall activity, for example reading or writing a lot on the network or to files. When attaching to existing processes using -p, don't forget that existing children won't be monitored: you need to attach to the existing children as well. Finally, when monitoring a command that becomes root, for example "sudo some_command", you need to run strace as root; I will show you later a convenient way to monitor sudo. This is required, or else, once the process becomes root, you will lose the permissions to monitor it.

In this part of the presentation, I will now have a quick look at what strace produces: first when monitoring a single process, then when the monitored process has multiple threads or multiple processes are monitored simultaneously. I will also talk about some syscall errors and give you some more information about signals. Below is how the strace output looks when using my standard options. First, you have the PID, which isn't the PID itself but the thread ID; for the main thread, it's the same. You then have a timestamp in human-readable form, typically hours, minutes, seconds and microseconds. Then you have the syscall name, open for example, and the syscall parameters. Upon the syscall returning, strace will print an equals sign and the result, with additional details in case of error. Finally, strace prints the time spent in the syscall.
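Since -p does not pick up pre-existing children, you first need to list them before attaching. A minimal sketch, assuming a Linux /proc and child process names without spaces; note the probing awk process itself may briefly appear in the list too:

```shell
# List the live children of a given PID by scanning /proc/<pid>/stat, whose
# 4th field is the parent PID. Each child found would get its own -p flag
# on the strace command line.
parent=$$
sleep 30 &              # spawn a child so there is something to find
demo_child=$!
kids=$(awk -v p="$parent" '$4 == p {print $1}' /proc/[0-9]*/stat 2>/dev/null)
echo "children of $parent: $kids"
kill "$demo_child"
```

You would then run something like `strace -f -p <pid> -p <child1> -p <child2>` as root to cover the whole family.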
If you are using Vim, you can enable colorization using ":set filetype=strace". Let's see the example below. This is an strace of rsyslogd, the main thread being 1086. Here we see that rsyslogd is calling the poll syscall on an array of file descriptors. The array contains only one file descriptor, number 3, and it is an anonymous inode, so it's not a file on the file system. poll is used to wait for data on file descriptors; here rsyslogd is interested in incoming data, POLLIN. It returns after 900 milliseconds with return value zero, which means timeout. Later, we see rsyslogd do a read on that same file descriptor; the read failed and errno was set to EAGAIN, which happens when read would block and the file descriptor is configured to not block. This means that rsyslogd needs to retry the read later, when an incoming event arrives. On the last line, we see it read on the file descriptor again, and this time it was a success: there were 32 bytes returned by read, and these bytes are shown in the buffer passed to read, here the second argument. Basically, it read some data from the journal. The last parameter to read is the size of the buffer, and we see it was set to 272. As you can see, strace shows us a lot of potentially useful data.

This was the easy case, when only one process is monitored and there is only one thread. When that's not the case, we'll have a lot of "unfinished"/"resumed" lines, as shown below. This happens because strace prints timestamps in increasing order. Let's see what we have here. We have PID 1088, which is a thread of rsyslogd, entering the futex syscall; futex is a fast lock from user space. While the futex was being grabbed, the main thread 1086 did a poll. On the next line, while the main thread is doing the poll, we see the futex call return, shown as "futex resumed". Later again, the poll syscall returned, and apparently there was nothing on file descriptor 3 from the second line here.
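One way to untangle interleaved output like this is to keep a single thread ID. A minimal sketch, using a synthetic sample log in the same format as above (the addresses and timestamps are made up):

```shell
# Synthetic sample in strace -f -ttt format: two TIDs interleaved.
cat > /tmp/sample.strace <<'EOF'
1088  10:21:23.438012 futex(0x7f1c2c0008c8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
1086  10:21:23.438120 poll([{fd=3, events=POLLIN}], 1, 900 <unfinished ...>
1088  10:21:23.438188 <... futex resumed>) = 0 <0.000176>
1086  10:21:24.338395 <... poll resumed>) = 0 (Timeout) <0.900275>
EOF
# Keep only the main thread (TID 1086):
grep '^1086 ' /tmp/sample.strace
```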
This can be hard to read. To help you, usually you will filter on a specific PID using a grep command and stick to that PID. For example, with rsyslogd, if you are interested in the main thread only, you can just filter out all the rest. As you could see on the previous slide, we had read returning an error, EAGAIN. Is having a syscall return an error really an issue? Well, it depends; you need to understand the program internals. There are well-known benign errors: for example, EAGAIN, EINTR and ERESTARTSYS are supposed to happen normally. You can check the man page of the syscall failing with this kind of error; read, for example, will return EAGAIN if it's in non-blocking mode. It's then up to the application developer to handle this kind of error and likely retry the syscall. When you have ENOENT, which means "No such file or directory", it's usually not an issue, typically when it's the dynamic linker that produces it. The dynamic linker will search, for example, for /etc/ld.so.preload, which doesn't exist by default. When executing a program foo, bash will search for the program in various locations in your PATH: /usr/local/bin, /usr/sbin, then /usr/bin. So it's expected to get ENOENT when stat'ing a location where the file doesn't exist. Similarly, the linker will try to open libraries in various locations, and you will see ENOENT if the library doesn't exist in that location. When you get EACCES, which is "Permission denied", there is likely an issue, but again it depends on how the program effectively works.

Finally, strace can show you information about signals. Whenever a signal is received, strace records it in a line starting with "---". There is one exception: SIGKILL, which cannot be caught. In that case you will instead have a "+++" line and see that the process got killed by the signal, as shown below.
If the process received a signal from another process, for example SIGTERM below, you will see who sent the signal; here, PID 1. In this example, I told rsyslog to stop by executing "systemctl stop rsyslog", which resulted in systemd, PID 1, sending a SIGTERM signal to rsyslog. Sometimes the signal is sent by the kernel; usually this will make the process die. In that case, you will see the process exit with "+++ killed by ...". In this example here, we see the kernel send a SIGSEGV, the reason being an issue accessing memory. This typically happens when the process tries to dereference an invalid pointer; here it was the NULL pointer.

Let's move to the last part of the presentation: examples. I collected six examples of real use cases to show you various troubleshooting techniques. In the first example, we'll be dealing with a command slowness. The customer was complaining that executing df would take 20 seconds or more, whereas usually the execution is immediate. I hence requested an strace of the slow command. Checking the strace, we can see that the first line is execve, which means: execute the command given in argument. Checking the execve function prototype, we can see three arguments: first the binary name; then the arguments passed, including as first argument the program name, which is used when the program wants to appear as another program; and finally, as last argument, the environment. In this example we see that the environment contains LD_LIBRARY_PATH, which is used to load libraries from non-standard paths first, prior to using the standard locations, /usr/lib64 for example. After the execve, we see the dynamic linker try to load standard libraries such as libc from various locations, first the ones specified in LD_LIBRARY_PATH. The library is not found there, which is expected; that's not an issue. The issue here is more with the time spent reading the custom location.
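As a quick aside on signal deaths: a process terminated by a signal reports 128 plus the signal number as its shell exit status, which is another way to spot a SIGSEGV (signal 11 gives status 139) even without a trace at hand. A small self-contained demonstration:

```shell
# A child shell sends SIGSEGV to itself; the parent observes status 128 + 11.
sh -c 'kill -s SEGV $$'
st=$?
echo "exit status: $st"
```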
440 milliseconds, which is definitely not expected. Later, more libraries are loaded and similar slowness happens, explaining why df was so slow executing in the end. From the analysis, we know that there is something wrong reading this custom path; this is all we can see with strace here. So the next step was to check what was going on in the kernel, and particularly in this file system. To make a long story short, the root cause was some invalid mount options for that path, hosted on a cluster file system, which were creating this slowness.

In the second example, we'll be dealing with SSH slowness: it was taking 25 seconds to connect to the remote system. When it comes to SSH, stracing ssh itself is usually useless, because most of the work is done on the sshd side; ssh just encrypts what the user types and decrypts what is sent back by the backend, typically standard output and standard error. So here you need to strace sshd. There are two ways to do that: either attach to the main sshd daemon listening on port 22, or spawn a dedicated sshd daemon on some alternate port, say 8022, which will be used for testing the connection to the system. The second method is preferred, but not always possible, typically because it requires letting the firewall accept connections on the custom port. If running sshd on the alternate port is not possible, just attach to the main sshd as shown below. Then you connect using ssh, you experience the 25-second slowness before getting the prompt, you exit, and you Ctrl-C to stop strace. If running sshd on the alternate port, spawn it with -ddd to let it handle only one single connection. Then you connect using ssh on the alternate port and exit once you get the prompt, and sshd will also exit. Now we have the strace to analyze. First check for the accept syscall, which is used to accept an incoming connection. Here we see that the connection was made by 192.168.1.221 from port 60740.
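The two ways of stracing sshd described above can be sketched as follows. The port number, file names and the pgrep selection are illustrative choices, and both commands are printed rather than executed (they would need root):

```shell
# Method 1: attach to the main sshd. pgrep -o -x picks the oldest process
# whose name is exactly "sshd", i.e. the listening daemon.
cmd1="strace -f -ttt -v -s 1024 -o /tmp/sshd.strace -p \$(pgrep -o -x sshd)"
echo "$cmd1"
# Method 2 (preferred): a one-shot sshd on an alternate port; -ddd makes it
# handle a single connection in debug mode and exit afterwards.
cmd2="strace -f -ttt -v -s 1024 -o /tmp/sshd.strace /usr/sbin/sshd -ddd -p 8022"
echo "$cmd2"
```

With method 2 you would then connect with `ssh -p 8022 <host>` and log out once you get the prompt.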
Note that when stracing the regular sshd daemon, other users may be connecting to the system at the time you make the test, hence you need to find out which connection you are interested in; there may be multiple accept calls. You then need to check for a clone syscall, which is used to fork the sshd child that will handle your connection. Here we have PID 23918 that will handle the connection; we need to filter on that PID. The child will spawn other children, but these are usually not interesting. To filter on this PID, use the grep command shown. Note that when using the recommended way, sshd on the alternate port, there is no clone, since sshd directly handles the incoming connection, so there is no need to filter anything; it's easier. Let's see the output of the filtered strace. At some point in time, you will see a sendmsg on a UNIX socket, which is here a D-Bus request to systemd-logind to create the session. Later, we see a recvmsg on the socket, which fails with EAGAIN because the socket is non-blocking. Finally, we see a poll on the UNIX socket, which ends up in a timeout after 25 seconds; then the process continues and we get a prompt. From the above, we see that the issue is with the D-Bus socket: sshd waited up to 25 seconds to have the session created, but this never happened. The code is part of pam_systemd, not sshd itself. This means that systemd-logind, responsible for creating the user session, is not functional somehow. To make a long story short, the root cause was the systemd-logind process being stuck in a dead loop.

In this third example, the customer was complaining that executing "sudo <some command>" was slow. Because the command will execute as root, you need to use strace as root; hence, you cannot just run "strace sudo ...". Instead, you will strace the terminal running the sudo command. First, you get the PID of the bash that will run the command using "echo $$".
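Picking the child out of the clone line can be scripted. A sketch on a synthetic log; the PIDs, address and ports are made up to match the story above:

```shell
# Synthetic main-sshd trace: an accepted connection, then the clone that
# forks the child handling the session. The clone return value is its PID.
cat > /tmp/sshd-main.strace <<'EOF'
1234  09:00:01.000001 accept(3, {sa_family=AF_INET, sin_port=htons(60740), sin_addr=inet_addr("192.168.1.221")}, [16]) = 5 <0.000042>
1234  09:00:01.000500 clone(child_stack=NULL, flags=SIGCHLD) = 23918 <0.000210>
EOF
child=$(grep ' clone(' /tmp/sshd-main.strace | sed 's/.*= \([0-9]*\) .*/\1/' | head -n 1)
echo "filter the trace with: grep '^$child ' /tmp/sshd-main.strace"
```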
In that example, the PID was 1234. Then, from another terminal running as root, execute strace and attach to 1234, and of course follow children using the -f flag. Finally, execute from the user terminal the "sudo <some command>" that is slow, and you then get a nice strace with everything needed to dig into it.

In this fourth example, the customer was complaining that executing a script in the terminal was working fine, but not through cron. The easy way to debug this is to strace the cron daemon. So you just need to attach to the existing crond daemon, whose PID is usually recorded in /run/crond.pid. Then wait for the script to execute through cron, stop strace, then filter on the PID of the script that executed and dig into the strace. To make a long story short, the root cause was an invalid SELinux context on a file that the script was attempting to read, which resulted in "Permission denied". This happened because when executing the script as a user, the SELinux context was unconfined, whereas under cron, the SELinux context was limited.

In this fifth example, the customer was complaining that executing a daemon manually was working, but running it as a systemd service was not. Similar to SSH, stracing the systemctl command is usually useless, because systemctl just sends an order to systemd, which will do the job itself. Hence, it's wise to strace systemd and its children instead. So just attach to PID 1, then start the failing service, rsyslog in this example, and finally stop strace once the service has failed. Again, you need to filter on what you are interested in. Similarly to sshd, systemd will first accept the connection from systemctl; this is done through the accept4 syscall. Then you will have some clones: when executing a service, systemd will spawn one child per ExecStartPre or ExecStart command. In the case of rsyslog, there is no ExecStartPre command, only an ExecStart command.
So you just need to find which PID executes rsyslogd itself; here we have 24480. Because rsyslogd has threads, you then need to collect all the children spawned by the main thread 24480, recursively. This can be achieved using the grep | grep command below, which will return the PIDs of the children. Finally, filter on all the PIDs you got, using egrep here, to reconstruct your rsyslogd strace. You can then dig into the filtered strace easily.

The final example is how to strace the boot itself. It's experimental and may break your system. Sometimes issues happen during the boot, for example when there are race conditions between services, or a service implicitly requires a device that didn't show up yet and is not mentioned in the service unit file. Again, we need here to strace systemd as soon as possible. This can be achieved using the following strace-systemd service unit, which is also available in my public repository for convenience. Due to some deadlock issues, you cannot attach before systemd-udevd has executed. Once the service unit has been created, reload systemd and enable the service. Then boot your system with the additional kernel command line flags: strace, but also enforcing=0. Otherwise, strace wouldn't be able to monitor the children of systemd, because it would not execute in an unconfined context but in the context of systemd, which is init_t and restricted. Note that your boot will slow down a lot, because basically strace will monitor almost every single service being executed, and their children. Once you reach the prompt and see your failing service, just stop the strace service and dig into the produced strace in /run.

I have a final slide that I would like to share. It's about SELinux integration in upstream strace. I'm convinced that showing the SELinux context the monitored processes are running as may be very useful when issues are related to SELinux. I have hence proposed a pull request to upstream, with the reference below.
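Going back to the boot example: here is a sketch of what such a strace-systemd unit could look like. The unit name, paths, options and ordering are my assumptions, not the speaker's exact file; it attaches strace to PID 1 only after systemd-udevd has run, to avoid the deadlock mentioned above. The heredoc writes it to a scratch path so the sketch is self-contained.

```shell
# Hypothetical strace-systemd.service (would live in /etc/systemd/system/).
cat > /tmp/strace-systemd.service <<'EOF'
[Unit]
Description=strace systemd (PID 1) during boot
DefaultDependencies=no
After=systemd-udevd.service

[Service]
Type=simple
ExecStart=/usr/bin/strace -f -ttt -v -s 1024 -o /run/systemd.strace -p 1

[Install]
WantedBy=sysinit.target
EOF
```

After copying it under /etc/systemd/system/, you would run `systemctl daemon-reload` and `systemctl enable strace-systemd`, then reboot with enforcing=0 on the kernel command line, as described above.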
Unfortunately, my pull request is stuck waiting for final reviews from the strace upstream developers. I haven't managed to get traction on this for several months, and in vain I keep pinging the mailing list regularly. Anyway, this pull request introduces a new --secontext option. With this option, you will see which SELinux context the process is running as, as shown in the example below. This example is systemd spawning rsyslogd as a service. Initially, the child of systemd, 126235, executes as init_t, the context of systemd. Then, when execve executes, because the rsyslogd program is labelled with syslogd_exec_t, as shown in the strace here, we see the process change context to syslogd_t: same PID, different context. That's super convenient and likely very useful. For now, this patched strace is available in my public repository; there is one build per RHEL or Fedora release. So thank you for having listened to my presentation. If you have questions, don't hesitate to contact me.