Hello, I'm Kamil Dudka from Red Hat. I work on static analysis of Red Hat Enterprise Linux, and in this talk I will present an innovative approach to dynamic analysis of RPM packages. The presented solution was developed as part of the aufover project, whose name stands for automation of formal verification. Let's start with a quick introduction to code analysis. Both static and dynamic analyzers are tools that help us find programming mistakes, which we also call bugs. Static analyzers do not execute the code. They just look at the source code and tell us: hey, this is not going to work. Static analyzers can now be embedded into compilers. You probably know the static analyzer embedded in Clang, or the recently introduced -fanalyzer in GCC. Then we have standalone static analyzers like Cppcheck or ShellCheck, which are open source, or Coverity, which is proprietary. Unlike static analyzers, dynamic analyzers need to execute the code. They run it in a modified runtime environment to extract the properties we are interested in. They can also be embedded in compilers, for example AddressSanitizer, ThreadSanitizer, and so on. These analyzers are usually easier to automate. Then we have many standalone dynamic analyzers, but in this talk we will focus on Valgrind and strace. Valgrind is a tool for finding memory management bugs, while strace is a tool that records the system calls issued by a running process. So now we know static and dynamic analyzers. Let's have a look at analysis of RPM packages. Static analysis of RPM packages is easy. We have a tool named csmock, which takes a source RPM package and returns a list of potential programming mistakes detected by static analyzers. csmock provides an interface for plugins. The small blue boxes in this picture are csmock plugins. Each of them is responsible for a single static analyzer. The plugin interface makes it easy to plug in additional static analyzers as needed. But in this talk we are focusing on dynamic analysis.
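As a quick taste of the dynamic side, strace can be pointed at any command. This is a minimal sketch, guarded so it is a no-op on systems where strace is not installed:

```shell
# Record the system calls of a trivial command, including its child
# processes (-f), into trace.log, then peek at the beginning of the log.
if command -v strace >/dev/null 2>&1; then
  strace -f -o trace.log true
  head -n 3 trace.log
fi
```

Static analyzers like Cppcheck or ShellCheck are invoked the same way on source files, but without ever executing them.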
So can we implement dynamic analysis plugins for csmock? The answer is yes, but it is not as easy as one would expect. Building a package is not sufficient to get results from dynamic analyzers. In order to get the results, we need to run the binaries that we have built. Luckily, modern RPM packages run tests during the build, so we would like to run these tests through dynamic analyzers. Here is an example with logrotate, which is a package that I maintain in Fedora. You can use the commands above to explore the spec file. There is a %build section, which specifies how the package should be built when we create binary RPM packages. What is even more important for dynamic analyzers is the %check section, which specifies how the binaries should be tested after they are built. If we run the commands below, rpmbuild will automatically build the binaries and run the tests as instructed in the spec file. So we have an interface to execute the %check section of RPM packages. Dynamic analyzers usually support tracing of child processes, so we could simply combine the two and dynamically analyze the whole rpmbuild process tree top-down. If everything goes well, the binaries we are interested in will also be dynamically analyzed at some point. But is this a good solution to our problem? No, because this way we are also dynamically analyzing rpmbuild, bash, GNU make, and all the system-provided binaries that get in our way. Consequently, it takes ages to complete and we get a lot of reports unrelated to the RPM package being analyzed. So we need to do something better. And here comes the innovation: we can make rpmbuild produce binaries that automatically launch a dynamic analyzer for themselves each time the binaries are executed. Thanks to the compiler wrappers we originally developed for static analyzers, it is really easy, as you can see in this example. The first line inserts a compiler wrapper into the search path in front of the real compiler.
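The three lines from the slide look roughly like the sketch below. The wrapper directory and the csexec-loader path are assumptions based on how the cswrap and csexec packages are laid out on Fedora; treat the exact paths and variable names as illustrative:

```shell
# 1. Put the compiler wrapper in front of the real compiler in $PATH.
export PATH="/usr/libexec/cswrap:$PATH"
# 2. Make the wrapper append a linker flag that sets a custom ELF interpreter.
export CSWRAP_ADD_CFLAGS='-Wl,--dynamic-linker,/usr/bin/csexec-loader'
# 3. Tell csexec which dynamic analyzer to launch at run time.
export CSEXEC_WRAP_CMD='valgrind'
```

With these exported before rpmbuild runs, every binary linked in %build carries the custom interpreter and is wrapped by the analyzer in %check.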
The second line makes the wrapper append a custom compiler flag, which is in fact a linker flag. Finally, the third line causes all the binaries produced in the %build section to run through Valgrind when they are executed in the %check section. This is exactly what we needed. Does it work? The core idea in my example was using a custom program interpreter. Let's quickly explain what a program interpreter actually is. I guess you know shebangs from shell or Python scripts. If we look at the first line of /usr/bin/yum, we can see a hash and an exclamation mark followed by /usr/bin/python3. This line tells the operating system which interpreter should be used to interpret the script. So when we run /usr/bin/yum, the system reads the shebang and executes /usr/bin/python3, passing it /usr/bin/yum as the first argument. I believe this is well known. What could be more surprising is that binary executables on Linux also have a program interpreter specified in their header. When we build an RPM package natively, it produces binaries where the ELF interpreter is set to the system dynamic linker. However, we can also set a custom ELF interpreter when linking the binary. And that is exactly what I did in my example: I set a custom interpreter that takes care of launching a dynamic analyzer each time the binary is executed. Our custom program interpreter is in fact a wrapper of the system dynamic linker. The wrapper is called csexec. We can use the CSEXEC_WRAP_CMD environment variable at run time to specify which dynamic analyzer to use. If the variable is unset, the binaries are executed natively. In any case, csexec runs the system dynamic linker explicitly; otherwise csexec would end up in an infinite loop, recursively invoking itself. So if we execute our logrotate binary with the custom ELF interpreter, csexec executes the system dynamic linker through Valgrind, giving it the path of the original executable as the first argument.
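The shebang mechanism described above can be reproduced in a few lines; the ELF counterpart is shown only as a comment, since binutils may not be installed:

```shell
# A tiny script with a shebang: the kernel reads the first line and
# executes /bin/sh, passing the script path as the first argument.
cat > hello.sh <<'EOF'
#!/bin/sh
echo "hello from $0"
EOF
chmod +x hello.sh
head -n 1 hello.sh    # prints: #!/bin/sh
./hello.sh            # prints: hello from ./hello.sh
# The analogous query for a compiled binary would be something like:
#   readelf -l /bin/ls | grep interpreter
# which typically reports the system dynamic linker, e.g.
# /lib64/ld-linux-x86-64.so.2 on 64-bit Intel.
```

Relinking with a different PT_INTERP entry is what swaps the system dynamic linker for csexec.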
This implementation detail is usually not observable by the test programs that we dynamically analyze. However, some test programs are sensitive to the value of argv[0], so we needed to extend the system dynamic linker to take the --argv0 option in order to make this work properly. Similarly, some programs are sensitive to the target of the /proc/self/exe symbolic link, which unexpectedly points to the system dynamic linker in our case. So we needed to emulate calls of readlink() to pretend the original target, which is the binary being analyzed. I think I have said enough about implementation details. Let's have a look at the results of our experiments. The main goal has been accomplished: we have no unrelated bug reports in the output. Of course, if the RPM package uses a buggy library that leaks memory, it can still be reported by Valgrind. On the other hand, Valgrind will not complain about memory leaks of bash or GNU make just because we use them to run the test programs. There is no observable performance impact of csexec itself. Of course, if we run test programs through Valgrind, they run significantly slower in some cases, but there is not much we can do about that. csexec works well with commonly used testing frameworks, which means it does not make them fail just because we execute the test programs through csexec. We were able to successfully run the whole upstream test suite of GNU Coreutils with csexec alone. If we enable Valgrind, it causes some tests to fail, because they are simply not designed to work with Valgrind. For example, there is a test that verifies the count of open file descriptors. The problem is that Valgrind uses some extra file descriptors for itself, which breaks the test. Another example is a test that intentionally sets a non-existing path as TMPDIR to verify the error handling in the program. Unfortunately, this also prevents Valgrind from starting, which in turn results in an artificial failure of the test. Now back to csmock.
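The argv[0] sensitivity mentioned above is easy to demonstrate: the operand after the `sh -c` script becomes its $0, which is exactly the kind of value csexec has to preserve through ld.so's --argv0 option. The logrotate path below is just an illustrative name:

```shell
# A program can observe its own argv[0]; here we set it explicitly,
# the same way csexec has to restore it for the analyzed binary.
sh -c 'echo "I was invoked as: $0"' /usr/sbin/logrotate
# The /proc/self/exe symlink is the second tell-tale: it names the
# executable of the process that reads it, e.g.:
#   readlink /proc/self/exe
```

Without the --argv0 fix, the analyzed program would instead see the dynamic linker's own path in $0.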
The tool was originally developed for static analysis of RPM packages. We have implemented experimental plugins for Valgrind and strace. These plugins are now available in Fedora and EPEL. Thanks to these plugins, dynamic analysis of RPM packages is now available through a really simple user interface. You can just pick a source RPM package that runs tests during the build and throw it at csmock, together with a build root where the package can be built. Optionally, you can specify some extra arguments for the dynamic analyzers. This is the time for a quick demo. Hopefully, you can see my terminal. I think I will get a source RPM of the latest build of logrotate. I can use the Koji client for these tasks, specifically the koji download-build command. So I have the source RPM package and give it to csmock: enable the Valgrind plugin and specify the build root. And that's all! csmock will now run fully automatically and give us the results from Valgrind on the run of the upstream test suite. You probably know the mock tool for building RPM packages. csmock is very similar. The only difference is that csmock does not produce any binary packages; it just gives us the list of potential bugs in the package. It seems to take slightly more time than expected, so I will pause the recording for a moment. Sorry for the gap. My home office connection is too weak to download all the dependencies of logrotate in a reasonable time, but now it seems to progress. We can see that Valgrind is being downloaded and installed into the build root. csmock then applies some workarounds for broken packages, which we sometimes need to deal with. Now it installs csdiff into the build root. It is a tool that can recognize the output of various static analyzers and, for example, compare a pair of old and new builds and give us a diff. Now csmock is copying some files into the build root which are needed to drive the analysis. Now the build of logrotate is finally starting.
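The demo boils down to two commands. The NVR and the build-root name below are made-up examples, and both calls are guarded so the sketch is a no-op where the Koji client and csmock are not installed:

```shell
# Illustrative NVR; in practice you would pick the latest build.
nvr=logrotate-3.18.1-2.fc34
# Fetch the source RPM of that build from Koji.
if command -v koji >/dev/null 2>&1; then
  koji download-build --arch=src "$nvr"
fi
# Run the Valgrind plugin on it in a Fedora build root.
if command -v csmock >/dev/null 2>&1; then
  csmock -t valgrind -r fedora-34-x86_64 "$nvr.src.rpm"
fi
```

Extra arguments for the analyzer can be passed through csmock's tool-specific options if needed.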
This is the %build section. We can see the Autoconf checks running. As you can see, it runs as fast as it would run natively. So the build is complete, and now the run of the %check section is finally starting. At first glance the output looks as usual; it is just a bit slower, because all invocations of logrotate are run through Valgrind, which is not visible in this output. If you are wondering why the test numbers are mixed up, it's because the tests run in parallel, which means that Valgrind also runs in parallel. So if we use a multi-core CPU, we get the results faster. As you can see, some test cases take slightly longer to complete when they run through Valgrind, so we can use this time to have a look at what happens behind the scenes. Let me increase the font. So this is the process tree of csmock, where we can see the build as if it ran natively; the only difference is Valgrind here at the leaf processes. We can see that only one instance of Valgrind is running, so this seems to be the most expensive test case of logrotate. It has already consumed a lot of machine time, so let's wait slightly longer for it to complete. We can see that we ask Valgrind to produce XML output, which is better than the default output when the results need to be machine-readable. And I think it has finished, so we can switch back. Now we can see that csmock picks the results out of the build root and runs some final cleanup to save disk space. So let's give it a second to complete, and we have the results. We can unpack the results archive, and we can see some files captured from Valgrind, one file per process, annotated by its PID. So we can have a look at an arbitrary one; let's take this one. We can see the captured XML file, and if we look here, we can see that the report is really about the logrotate binary. So we can have a look at what happened: some memory was leaked by logrotate, and now we can analyze the backtrace provided by Valgrind.
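The per-PID XML files seen in the demo come from Valgrind's XML output mode. A minimal sketch of the relevant options, guarded since Valgrind may not be installed here (%p in the file name expands to the PID of each traced process):

```shell
# Ask Valgrind for machine-readable XML output, one file per process.
if command -v valgrind >/dev/null 2>&1; then
  valgrind --xml=yes --xml-file=vg-%p.xml true
  ls vg-*.xml   # one vg-<pid>.xml file per traced process
fi
```

Combined with child tracing, this is how a parallel test suite ends up producing one annotated report per process.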
Let me switch back to my presentation. There is a more detailed demo available at the URL below if you are interested. Although the presented solution works fine, there is still some room for improvement. There is a work-in-progress patch for csdiff to parse Valgrind's XML output. The goal is to present Valgrind's results in a similar way as we present the results of static analyzers: developers are used to reports pinned to source code, rather than PID-oriented reports about processes that no longer exist. We could also port csexec to more architectures; for now, csexec runs on 64-bit Intel only. It is not yet supported to use multiple dynamic analyzers in a single run of csmock. This is going to be resolved by extending csmock to be able to run the %check section without a full rebuild of the package. The original motivation for this effort was formal verification of RPM packages, which we experiment with in the aufover project. aufover stands for automation of formal verification; it is a project supported by the Technology Agency of the Czech Republic. We are going to develop csmock plugins for formal verification tools, namely for Symbiotic, DIVINE, and CBMC. Based on our experiments, the formal verification tools will be improved to handle more RPM packages, but this is a task for the universities that develop the verification tools. That's all from me. Thank you for listening; I'm ready to take questions.