Hello, everyone. Welcome to my talk about static analysis tools for embedded systems and how to use them. I hope you have enjoyed ELCE 2020 online so far. A quick introduction: my name is Jan-Simon Möller. I'm the release manager for Automotive Grade Linux, and I'm also a board member of the Yocto Project, representing AGL there. You can reach me on IRC or through email if you have any follow-up questions.

In this talk, I want to introduce static analysis and show how it can be used, what it produces, and how to embed it later on in your Yocto-based project. I'll start with some introduction and motivation. I'll discuss some differences between kernel and user space and what tooling exists for each. I'll give a quick intro on how to use the tools locally during development, so you can go right ahead and try that hands-on. The main part is then about two projects: one is called meta-sca, a collection of tools that includes static analysis, and the other is meta-codechecker, which integrates the CodeChecker project into the Yocto-based build process. In the end, we'll do a Q&A.

So, the motivation behind all of this: what is static analysis? Static analysis means we analyze a program without actually executing it. In general, most of the time that means static analysis runs at build time. Often it runs through your compiler, because the compiler does some sort of analysis anyway, so why not use that right away? Or there are tools that will scan your source code in different ways. In contrast, dynamic analysis is usually done by executing the program in question, which of course catches a different set of issues, mainly things like multithreading problems that you really only see at runtime. Static analysis becomes an important topic once you need to deal with functional safety aspects and need to prove that your code meets certain criteria.
This can be done with static analysis tools, and you can basically create the necessary documentation out of that. That is the case in areas like automation or automotive. In the case of AGL or the ELISA project, we have to fulfill and document requirements on code quality for our own code, and we need to do certain checks on open source code that is reused. There can be different levels of requirements based on the function you need to provide, but in general that is the need. The goal of this talk is to show you ways to do this using open source tooling, and to introduce and evangelize the tools that are available. So first of all, I will cover some basics and introduce some of the tools that exist, and then go into how to embed this into your Yocto-based project.

First, a few words on kernel versus user space. The kernel is a very big project with a special code base. Don't quote me on the exact number, but there are millions of lines of code in the kernel, which makes it very demanding on the tooling used for such an analysis, just by the sheer amount of code. There are also specialized tools for the kernel. If you work with the kernel, one quite common tool is the checkpatch script, which is used by a lot of maintainers to check incoming patches for basic style problems and the usual issues that you see in code. It is based on string matching, and it is a very good idea to run checkpatch on a patch before you upload or mail it. The compilers, especially Clang, meanwhile have a long history of dealing with analyzing the Linux kernel and can be used with their built-in methods; I'll introduce these later. There are three more quite common tools. One is sparse, which you can call with the command line shown here. There is smatch, which is a generic framework: you can actually write your own rule sets, and here it is invoked with -p=kernel to select the kernel rules.
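As a hedged sketch, the kernel-side invocations mentioned above look roughly like this when run from the top of a kernel source tree (the smatch path is an assumption, since smatch is usually built out of tree):

```shell
# checkpatch: sanity-check a patch before mailing it
./scripts/checkpatch.pl --strict 0001-my-change.patch

# sparse: C=1 checks only files being recompiled, C=2 checks all of them
make C=1

# smatch: plug it in as the checker; -p=kernel selects the kernel rule set
# (the ~/smatch path is an assumption about where you built it)
make C=1 CHECK="~/smatch/smatch -p=kernel"
```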
I think there is also a rule set for Wine; different rule sets are already available. So smatch is a framework for writing your own analysis plugins if you have special requirements. Finally, there is Coccinelle, a very powerful tool to match and replace code patterns. It is used in the kernel to do large replacement operations, but it can also just warn on certain patterns. There is a pattern database, invoked with the command line shown here. Of course, there are also proprietary tools, but I won't cover those in this talk.

For user space, there is a large number of tools available. Some are old, some are new; I will cover a selection here and show how to use them. The first three, GCC, Clang, and Cppcheck, are quite common tools for C/C++-based projects. GCC and Clang, as compilers, do the analysis internally anyway, so this is merely a matter of enabling capabilities within the compilers, which makes them quite easy to use, frankly. You just have to enable the appropriate flags. Cppcheck is a standalone tool which allows you to scan your source code upfront, without building it. The same goes for Flawfinder, RATS, and Splint; Flawfinder keeps a little more of an eye on security issues.

So locally, during your development, it is very easy to enable static analysis functionality. I'll show three tools in a little more depth: GCC, Clang, and Cppcheck. GCC has, since GCC 10, a new flag called -fanalyzer, which enables additional warnings about potential issues. Clang has, since very early times, the scan-build tool set, so it is easy to just take scan-build and run your make command to compile the code. And Cppcheck will scan your source tree standalone. This is how you can use the new -fanalyzer. In this case, we also enable -Werror, so any warning will be treated as an error and the build will abort if an issue is found. And as you can see, meanwhile there has been a lot of progress on making the output very intuitive and useful to the developer.
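To try this yourself, here is a minimal sketch: a deliberately leaky C file (the file name and contents are made up for illustration) compiled with -fanalyzer and -Werror, so the analyzer finding aborts the build:

```shell
# A tiny example with an obvious memory leak (illustrative only).
cat > leak.c <<'EOF'
#include <stdlib.h>

int main(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return 1;
    *p = 42;
    return 0;   /* p is never freed: -fanalyzer reports a leak here */
}
EOF

# Requires GCC 10 or newer; with -Werror the warning becomes a hard error,
# so this command exits non-zero and prints the analyzer's path diagnostic.
gcc -fanalyzer -Werror -c leak.c -o leak.o
```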
So this is a very neat feature that GCC has developed meanwhile. Clang has quite a few tools that allow us to do static analysis. One is clang-tidy, which will scan your source code and report any issues found on the command line. There are other Clang-based tools; one is scan-build, shown here. scan-build is used on a complete project: it calls the Makefile, which does all the compilation steps. The Makefile needs to use the compiler variable $(CC) so that it can be overridden by scan-build, so don't hard-code any compiler name in your Makefiles. Then you can use scan-build as shown here. scan-build does two things: it outputs the issues found, much like clang-tidy, which is the console-only command line tool; and it takes this one step further by generating a browsable report out of the findings, which can be useful for larger code bases. You can then browse this in a browser window, and it will show you the individual issues found in a graphical UI. This is quite handy for larger projects. Finally, Cppcheck. Cppcheck is a standalone tool: it will not compile the code, but it will warn on issues found in the code base. This is useful in a CI setup when you don't want to build the code just to scan it; you can run it and get the report right away.

After this quick introduction, let's go over to meta-sca. meta-sca is a collection of tools which do static analysis, but also linting and other things, that you can use in your OpenEmbedded or Yocto-based project. As I said, it's an OpenEmbedded/Yocto Project compatible layer and a collection of multiple tools around source code analysis. The project has zero impact, which means all operations are done at build time and none of the tooling reaches the target file system. This is quite important once you go to production: you want to scan a production code base, not modify it.
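Before we go deeper into meta-sca, here is a hedged command-line sketch of the three local workflows just shown (file names, report directories, and option choices are assumptions):

```shell
# clang-tidy: scan one file; everything after '--' are normal compile flags
clang-tidy foo.c -- -I./include

# scan-build: wrap the normal build; works only if the Makefile uses $(CC)
scan-build -o ./scan-report make
# then open the generated HTML report in a browser:
scan-view ./scan-report/*

# Cppcheck: scan the tree without compiling; non-zero exit on findings
cppcheck --enable=warning,style --error-exitcode=1 src/
```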
It provides a consistent way to configure all the tools included, and it also has a unified output format. There are parsers available for the command line and a simple static pre-rendered web UI. It is developed and maintained by Konrad Weihmann, and you can find it on GitHub; this is the repo. It supports the usual branches of Yocto: master, Dunfell (the LTS branch), and so on. So it is very up to date. I already mentioned there is a web UI that can be used; it is statically rendered from your generated data.

As you can see here, there are lots of scanners and tools integrated. We already mentioned Cppcheck, cpplint, and Clang, but there are also scanners for bash, for scripts, for Ansible, and for BitBake itself. So there is a big number of tools integrated here, and you can basically pick and choose. There are tools around static analysis, tools around CVEs, and tools for languages other than C and C++: Python, Go, JavaScript. It is a very, very extensive collection. Besides programming languages, there are also tools for kernel config hardening, tools for license checking, and coding metrics. So if you are looking for a tool for a particular purpose, chances are it is already in there. And the good point is that meta-sca integrates all of this and lets you configure it through BitBake variables in your local.conf. In total, there are about 87 to 90 options, which is quite a lot.

To summarize, we have a large number of languages supported, ranging from C, C++, Python, Perl, PHP, and JavaScript to Go and Lua; everything is in there. We have spelling checkers, we have metrics, and we have multiple scopes covered: you will find security checkers, functional checkers, and style checkers in there. Now, I also want you to get your hands dirty and try this out. So here's a quick step-by-step guide, which will do a test run on the master branch.
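A sketch of those setup steps, assuming the master branches and a fresh working directory (the clone URLs are the projects' public repos):

```shell
# Pull down Poky plus the two layers (master branches)
git clone git://git.yoctoproject.org/poky
git clone https://github.com/kraj/meta-clang
git clone https://github.com/priv-kweihmann/meta-sca

# Create a build project and register the layers
source poky/oe-init-build-env build
bitbake-layers add-layer ../meta-clang ../meta-sca
```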
But you can also just jump on the Dunfell branches of all the repos if you want the stable release of everything. So, TL;DR: we need meta-sca, and we also need meta-clang, which is optional but provides the Clang-based checkers. And we will clone Poky; alternatively, you can also pull down OpenEmbedded. We create a project, we add the layers meta-clang and meta-sca, and next we edit our conf/local.conf. Just by adding the layer, meta-sca will not enable anything by default, which is a good thing, because these days you tend to have multiple layers in a stack, and a testing layer should not modify any behavior by itself.

First of all, we have to inherit a class: we inherit sca. That alone is not enough; we also have to throw a switch, and SCA enable is that switch. We can do that either at the global level or at the recipe level; that is up to us. There is also an option to skip certain layers: once we enable it globally, we can exempt certain layers by name. There are options to make the scan automatically enabled on an image or on a recipe. We can accept certain licenses, if necessary, in our scan procedures. And finally, we define which modules are available. For example, we would enable the RATS scanner, Clang static analysis, and CVE check. Those modules are then available, which means the tooling will be built. Then we say that for recipes, we run these three tools: RATS, Clang, CVE check. There is another variable, enabled modules by image, which lets us do image-based checks as well. There are also ways to exempt packages per tool. For example, in this case we would skip the libc headers, GCC, and libc: they are quite large, and for this test I will exempt those big packages. Finally, we run bitbake core-image-minimal, which kicks the whole process off. Once everything is done, we will have a result folder deployed in tmp/deploy/images/qemux86-64/sca.
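Putting those switches together, a minimal conf/local.conf sketch might look like this; the variable names follow the talk's description, so double-check them against the meta-sca README before use:

```conf
INHERIT += "sca"

# global on/off switch
SCA_ENABLED = "1"

# build the tooling for these modules ...
SCA_AVAILABLE_MODULES = "rats clang cvecheck"
# ... and actually run them per recipe
SCA_ENABLED_MODULES_RECIPE = "rats clang cvecheck"
# image-wide modules (none enabled in this example)
SCA_ENABLED_MODULES_IMAGE = ""
```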
Each package will have its own result file in there. There is a tool included in meta-sca which will parse the result files and make them readable on the command line. The result files are in JSON format, and this makes them readable on the terminal. For example, we pick one package, base-passwd, and you will see the format is tool, then package name, and then the method. There is also a script to generate a static web page, which will show the issues found in a more graphical way. So let's summarize: meta-sca can be used to easily instrument your whole project build. It can be used for linting and format checks in your CI, there are a lot of pre-integrated tools to choose from, and it has a unified report format. On the flip side, you will have to post-process the reports in some way that fits your needs, but that's up to you: either the command line report, the web report, or some custom post-processing.

Next, let me introduce a project called meta-codechecker, which I wrote. It is the BitBake integration for a tool called CodeChecker, which you can find on the right side. So what is meta-codechecker? It integrates the CodeChecker tool that Ericsson develops on GitHub. CodeChecker is basically the successor of the scan-build tooling, a scan-build on steroids. It's a collection of tools, and it does two things. First, it intercepts and logs the compiler calls. Then, with that captured log, we can analyze the data by running clang-tidy and Clang SA, the Clang Static Analyzer. Which means we do, for example, a normal compile run and log which compiler calls were executed with which options. In a second pass, we then rerun the compilation calls, but with Clang. In the end, we get a report, which is either static HTML, or we can upload the data to a database with a web frontend. The project is on GitHub and it's very actively developed. The documentation is on Read the Docs, so that's very easy to come by.
What's not that easy is compiling the tooling. The scanner is fine; the web tooling is a little bit complicated, but there are Docker containers provided. This is an example of the web UI; it's pretty slick. You see the packages scanned here, you see the number of reports, and you can review those. You have different products, you have different categories. So that helps you manage the amount of data that comes in.

Now, how can you use that on your local projects? Once you have CodeChecker built and in your path (mainly these are Python helpers, so it is rather easy to set up), you run CodeChecker in your project. First, there is a log pass, which writes out its findings to a file. This works in a way that we preload a library, a logger which stores the compiler commands. For example, we run gcc -I something -f whatnot fubar.c. The logger will detect that we are executing gcc (we can even influence that with an environment variable) and record that we called the compiler with -I and -f something and fubar.c. So it records all the options that were used, the file names, and so on. That is written to compilation.json. With those extracted commands, we can then replay the compilation using Clang and its tools and do a full analysis pass. The output will be in a folder called reports. From there, we can either parse the findings on the command line, produce an HTML report, or use the store command to upload them to the web UI, which can be the Docker container on your local machine. And from there, you can inspect the findings. For example, here is one finding which is 32 levels deep; I'm not sure I would catch something like this by traveling a couple of times up and down the code base. Now, I need to run this against a whole project, not against a single source tree. I actually have multiple source trees which are built with BitBake.
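The local workflow just described maps to a handful of CodeChecker subcommands; this is a sketch based on the CodeChecker documentation (the run name, output paths, and server URL are assumptions):

```shell
# 1) Log pass: record the compiler calls of a normal build via the
#    preloaded logger library
CodeChecker log --build "make" --output ./compilation.json

# 2) Analysis pass: replay the recorded calls with clang-tidy / Clang SA
CodeChecker analyze ./compilation.json --output ./reports

# 3a) Findings on the command line ...
CodeChecker parse ./reports
# 3b) ... or as a static HTML report ...
CodeChecker parse --export html --output ./reports_html ./reports
# 3c) ... or uploaded to the web UI (e.g. the local Docker container)
CodeChecker store ./reports --name my-run --url http://localhost:8001/Default
```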
So I created an integration for BitBake. There are some integration hints mentioned on the website; they basically preload the library before BitBake runs, so every compiler call gets logged. But if you run BitBake, you not only build target binaries, you also build helpers for the host, so you would have all of these mixed into the reports, not split by package. The integration shown here does split by package, and it is done during the build. We can write out the HTML reports, we can upload to the database, and we come with as many batteries included as we need: we build all necessary tools on the fly, which means the layer needs meta-clang, meta-oe, and meta-python to be present.

So here's a step-by-step guide on how to do that locally with BitBake. First of all, we need meta-clang, we need meta-openembedded pulled down, and we need Poky. We create a new project and we add the layers: meta-clang first, then meta-oe, then meta-python, and then meta-codechecker. Here's how you enable CodeChecker. First, you need to inherit the class; next, you have to enable it specifically. It's very similar to how meta-sca works: inherit the class, then enable it. Now, in my case I do not care about the packages that run on the host, so I do not need it to run on the native builds; that would be class-native. Instead, I only care about packages that end up on the target, so I enable it for class-target. I also exempt Clang: it's a very large code base and mainly used for development, so I exempt that. And I enable the HTML report. There is also a flag for uploading; you'll find that in the README. So in this example, we create local HTML reports that you can browse later on. You kick off the build with BitBake, and then all your results will be in tmp/deploy/codechecker. We enabled the HTML reports, so you will have subfolders per package containing the HTML reports. Let's summarize.
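As a rough local.conf sketch of those switches (the variable names here are hypothetical placeholders modeled on the talk; take the real names from the meta-codechecker README):

```conf
INHERIT += "codechecker"

# hypothetical variable names, for illustration only:
CODECHECKER_ENABLED = "1"              # global enable
CODECHECKER_ENABLED_CLASS = "target"   # only class-target, skip -native
CODECHECKER_SKIP = "clang"             # exempt the very large clang recipe
CODECHECKER_REPORT_HTML = "1"          # write per-package HTML reports
```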
CodeChecker can be used by developers and in CI as well. The complexity is hidden by the preloaded logger, which is even configurable. The workflow is straightforward, and there are parsers available for multiple output formats. A web UI is also available to store your results for later review. BitBake integration is available through the meta-codechecker layer. The documentation is good, though here and there a few pages have issues; I'm sure patches are welcome on the upstream GitHub site.

So in summary: use static analysis, it will help you improve your code base. With GCC 10, it's very easy to use locally during development; just enable -fanalyzer, and you should also enable -Werror. And you can integrate these tools easily using the OpenEmbedded/Yocto layers shown in this talk. As an outlook, I want to promote the use of these tools in the various projects and raise the bar in open source in general, and I will also enhance and update meta-codechecker further. If you have any questions or suggestions, please try it; I'm happy about any feedback on the meta-codechecker layer. Thank you for joining my talk, and I hope you have a nice rest of the conference.