A few minutes after the official start, so welcome everyone, and thank you for coming. My name is Victor and I'm a software engineer at Red Hat, and today I'll present DiffKemp, a project I'm working on. I'd also like to mention that besides being an engineer at Red Hat, I'm a PhD student at the Brno University of Technology, working in our Brno office in the Czech Republic. One of my goals is to bring methods from research and science into industry, into Red Hat, and that is exactly what this project implements: it brings research to production code. The project is about what its name says: automatically analyzing differences in kernel parameters. Let's dig a bit into what that means. What is our goal? Our goal is to check that a part of the kernel (we are talking about the Linux kernel here) behaves the same between two different versions of the kernel. How to do that? The first obvious choice would be the traditional one: write tests. Write a bunch of tests that check whether the behavior of the part you want to check is the same in the new version. Unfortunately, this won't work well, because you can never write enough tests to cover all possible behaviors of even a single function. Writing enough tests to have 100% behavioral coverage is nearly impossible, which means we can still miss some behaviors using testing alone. So instead, in this project we take a different approach and use so-called static analysis: we look directly at the source code of the two kernel versions and compare it, and by running this static analysis we determine whether there may be a difference in the behavior of, for example, kernel functions. What is the motivation? Why would we want to do this analysis?
Well, there are two main reasons: stability and compatibility. For the first part: suppose you have some setting in the kernel, say a sysctl option that you have set to some value. You want to upgrade the kernel to a new version, and you want to know whether you can preserve the same setting, and if you do, whether it does exactly what it did before. In that case you want to know that the part of the kernel that the setting may influence behaves the same as it did before. A second example: Red Hat guarantees the stability of something called the kernel application binary interface, kABI, which is essentially a list of symbols, mostly functions, that Red Hat guarantees to be stable across minor versions of Red Hat Enterprise Linux. So we guarantee stability, and when releasing a new version we want to know that these functions really are stable. Let me set up what we are comparing here. Generally, I will be talking about comparing functions: we have a function in the Linux kernel, and we want to know whether the implementation of that function in the new version of the kernel behaves the same as it did before. As I said on the previous slide, we can also compare settings, and settings in the kernel are usually stored as global variables: if you set some option, it is usually a global variable that gets set to a certain value. In that case, when comparing the effect of a global variable across versions, we basically take all functions that can use that global variable and compare them pairwise. Here we have a slight advantage: we don't have to compare whole functions, only the parts of those functions that the setting, the global variable, can influence.
We don't have to look at the parts of the functions that the value of the global variable cannot influence, in the control- or data-flow sense. But basically, everything reduces to comparing functions, so in this talk I will talk about comparison of functions. The first obvious method for comparing source code would be: why don't we just use diff? Take the functions, diff them, and see if there is any difference. Well, I have put two functions here. These are actually functions from the kABI list: the same function in versions from RHEL 7.5 and RHEL 7.6. The question is: are these the same? Do these functions have the same effect? If we diff them, the diff would be empty, so it would say that yes, they are. By looking at them, they really seem to be equal; lexicographically, they are equal, as the diff would show. However, this function, bio_add_page, calls another function, blk_max_size_offset, which is not necessarily the same, and in this case it really is not: the expression returned from that function changed between the two versions. In the new version, one more function call is applied to the original result, and we choose either the original result or some maximum number of sectors. So in this case the problem requires at least checking the called function. It's not that easy; we cannot just do a plain diff, we have to understand the code at least to some extent. We can go even further. Again, the same question: we have a single function, tcpprobe_init, in two different versions, Linux 3.10 and 4.11, and the question is whether these functions are the same. If we use diff, or just look at them, they are not, because the new function implements a check at the beginning: it calls a macro that raises a bug if some unexpected situation happens.
However, if you remember, I was talking about comparing kernel options, which are essentially global variables. What if we wanted to compare these functions not as wholes, but only with respect to the value of the global variable bufsize? Say the kernel user has the possibility to set the variable bufsize to some value, and we want to know whether the value he has set has the same effect on this particular function in both versions. If we compare the functions only with respect to the value of bufsize, we find out that they are exactly the same, because the only parts of the code that the value of bufsize may affect are the ones marked in red, and those are exactly the same in both versions. So again, we see that we have to understand the code to some extent to be able to decide equality or non-equality. Actually, we could go even further. This example is not from the kernel: these are two different implementations of the C library function strpbrk, from dietlibc and OpenBSD. The question is again the same: are these two functions equal? Do they have the same effect? If we pass the same arguments to both, will the result always be the same? This is not so easy to see by just looking at the functions. We could certainly write some tests, but as I said, that would probably not cover all possible behaviors. But actually, they are equal, and there are tools that can prove this. This is exactly where science comes into play: there are tools based on quite advanced formal methods that can prove that these two functions have exactly the same behavior for every possible combination of arguments. So in this case the problem requires quite a deep understanding of the code. If we want to build a tool that does this automatically, we need the tool to understand the code.
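To make the strpbrk idea concrete, here is a small sketch of two deliberately different-looking implementations. These are illustrative only, not the actual dietlibc or OpenBSD code, but they show the same situation: syntactically very different functions that return the same result for every input.

```c
#include <stddef.h>
#include <string.h>

/* Version 1: explicit nested loops over both strings. */
char *my_strpbrk_v1(const char *s, const char *accept)
{
    for (; *s; s++)
        for (const char *a = accept; *a; a++)
            if (*s == *a)
                return (char *)s;
    return NULL;
}

/* Version 2: different structure, delegating the inner search to strchr. */
char *my_strpbrk_v2(const char *s, const char *accept)
{
    while (*s) {
        if (strchr(accept, *s))
            return (char *)s;
        s++;
    }
    return NULL;
}
```

A plain textual diff of these two bodies is maximal, yet a semantics-aware tool can prove they agree on all inputs; testing can only sample that claim.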
So how do we do it? As we already said, using diff is not an option; it's not sufficient. We could analyze the C code directly, and that's essentially what we're doing, but analyzing C code directly would be quite impractical, because we would have to write a parser, essentially a compiler, into some representation that we would then compare, and C is quite a complex language. At the other end, we could use a compiler and analyze the produced assembly code. We could do that, but there's another problem: it's too unstructured. Way too much information is lost when compiling into assembly, and it would be quite difficult to compare assembly directly. So the solution is somewhere in the middle: we use a compiler, but we use the compiler's internal representation. Every compiler, GCC, Clang, etc., traditionally translates the C code into an intermediate representation on which it does a bunch of transformations, analyses, and so on, and then compiles that into assembly for the target processor. What we do is take this internal representation and analyze and compare that. What are the advantages? We get a source code parser for free, because we use an off-the-shelf compiler, and the internal representation is still quite structured: it contains things such as types, so we have much more information, and we also have debugging symbols. In our case, we use Clang and LLVM. GCC is the default choice for compiling the kernel, but we use LLVM instead. Why? There are multiple benefits of LLVM over GCC. The first is that it has a well-structured, human-readable internal representation. When we talk about this internal representation, we usually call it LLVM IR, so when I say IR, I mean the internal representation of LLVM. Here we have an example: a function that computes the absolute value of x, and the same code transformed into LLVM IR.
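You can reproduce the kind of example on the slide with any recent Clang: compile a small absolute-value function with `clang -S -emit-llvm` and inspect the `.ll` file. The IR in the comment below is a simplified sketch of what Clang typically emits around -O1; the exact output varies by Clang version and optimization level.

```c
#include <assert.h>

/* Compute the absolute value of x. */
int my_abs(int x)
{
    return x < 0 ? -x : x;
}

/*
 * Roughly what `clang -O1 -S -emit-llvm` produces for my_abs
 * (simplified; exact IR differs across Clang versions):
 *
 *   define i32 @my_abs(i32 %x) {
 *     %isneg = icmp slt i32 %x, 0
 *     %neg   = sub nsw i32 0, %x
 *     %res   = select i1 %isneg, i32 %neg, i32 %x
 *     ret i32 %res
 *   }
 */
```

Note that the IR keeps the structure and types of the computation but drops surface details such as local variable names, which is exactly what makes it a good comparison target.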
There are also tools that let you visualize the IR quite nicely, which is very practical when debugging and tracking down problems. LLVM also has a nice infrastructure containing many useful analyses and code transformations that are already built in, so we can use those; we don't have to write our own transformations, because a lot of them ship with LLVM. It also has quite a nice API. And last but not least, there is already a number of static analyzers built on top of LLVM, both commercial ones running on production code and research tools that implement advanced formal methods. So this is another advantage: we can build on others' work and don't have to reinvent the wheel. Now let's get into how DiffKemp does the analysis. We do it in two phases. The first phase, the so-called generate phase, takes the kernel source and a list of parameters, or kABI symbols, or whatever we want to compare, and generates a so-called LLVM snapshot. An LLVM snapshot is basically a set of files in LLVM internal representation that contain the definitions of the functions we want to compare. Afterwards, in the second phase, we take the snapshot and run the actual comparison. The result is, for each function in the snapshot, either that the semantics of the function is the same, or that it is not. If it is not, we also provide additional information to the user so that he knows where the actual difference is. Right, so the generate phase takes source code and parameters and yields an LLVM snapshot. How do we do that? It is basically composed of two smaller phases. First of all, we need to find in which C source files the functions we are interested in are defined.
For this, we can use the quite widely used cscope tool, which for a given function can tell you the source file where the implementation of the function is. Afterwards, once we have the source file, we compile it into LLVM IR, using the Clang compiler, of course. We do that in such a way that we find the command that would be run by Kbuild, the kernel's internal build system, to build that file, and we replace GCC with Clang. This way, we get the same optimizations, the same include paths, and so on. Once we have the LLVM snapshot, we can start comparing. This is the phase that does most of the work. Before the comparison itself, we run a number of simplifications and code transformations that remove all the code that is not relevant for the analysis and simplify the code so that it is much easier to compare later, because the comparison itself is quite a difficult task. It takes time, so the less code we need to compare, the faster it will be. Afterwards, we run the actual diff. In case the semantic diff says that the functions are equal, we are done. In case it says they are not, we run the difference localization: we try to find where the difference occurs and supply the user with as much information as possible so that he knows where it occurred. Code slicing and simplification: this is the phase that, as I said, simplifies the code as much as possible, and one of the main techniques we are using is so-called code slicing. Code slicing is a technique for removing all the code that is not relevant for the analysis. Let me get back to the example I showed at the beginning: we have a function, and we know that we want to compare and analyze it with respect to the value of the variable bufsize.
In this case, we can slice the function so that we only keep those instructions that the value of the variable bufsize can affect. We can remove the first two lines, basically, because whatever the value of bufsize is, they will always be executed the same way. So we slice out those first two lines and obtain only this part. All the remaining statements are affected by the value of bufsize, because the variable either appears in the statement directly or, for example, here we are testing a field of a structure which is set by calling a function that depends on bufsize, so there is a transitive dependence. After slicing, we also run a bunch of code simplifications. For example, the kernel contains function calls that take as an argument a string containing the absolute path of the source file. Of course, that will differ between the two kernels, because the two kernels you are comparing can be stored in different directories, so we remove or normalize these. We run dead code elimination, which removes all code that is not reachable during execution, because if there is a difference in unreachable code, we don't care: it will never be executed, so the difference will never manifest. Et cetera, et cetera. This shows the advantage of LLVM, because it already has a dead code elimination pass, a constant propagation pass, and so on. Afterwards, we run the diff itself. The diff again leverages the LLVM infrastructure: we use an LLVM component called FunctionComparator, which goes instruction by instruction and compares instructions for equality. So this basically does a syntactic diff. Thanks to using LLVM IR instead of C, we can already handle things like variable renaming.
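The slicing step can be sketched on a tiny hypothetical example. Everything here is made up for illustration (the global is named bufsize only to echo the module parameter discussed above): the slice with respect to bufsize keeps only the statements whose result the variable can influence, and for those statements the full function and the slice agree.

```c
#include <assert.h>

static int bufsize = 4096;  /* hypothetical global option, like the bufsize parameter */
static int calls;           /* unrelated bookkeeping, independent of bufsize */

/* Full function: the first two statements cannot be influenced by bufsize. */
int probe_init_full(int flags)
{
    calls++;                  /* sliced away: independent of bufsize */
    int mode = flags & 0xff;  /* sliced away: independent of bufsize */
    (void)mode;
    int size = bufsize;       /* kept: reads bufsize */
    if (size < 64)            /* kept: transitively depends on bufsize */
        size = 64;
    return size;
}

/* The slice with respect to bufsize keeps only the dependent statements. */
int probe_init_slice(void)
{
    int size = bufsize;
    if (size < 64)
        size = 64;
    return size;
}
```

Comparing the slices instead of the full functions means a change in the bufsize-independent prefix (logging, flag parsing, an added sanity check) no longer produces a reported difference.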
LLVM IR does not use the original variable names; IR in LLVM is a control flow graph, a graph of instructions and control flow edges, so the comparator just checks whether the graphs have the same structure and doesn't care about the names of the variables. We extend this comparator by identifying patterns that we know are syntactically different but have the same semantics, such as changes in structure layout that preserve semantics, or moving code into functions: you have a function containing some code, and in the new version of the function you take that code, put it into another function, and call it. This is syntactically different, but if you analyze it correctly, you find out that it does exactly the same thing, so we can handle these kinds of situations. But the last step is, I would say, even more important than the previous one, because it's nice to find out that the functions are equal or not, but if you just tell the user, hey, these two functions are not equal, he won't be very happy, because how does he know where the difference occurs? So this is one of the most important components: it provides the user with information about the found difference. Currently, we find the symbol in which the difference occurs. This is not necessarily the symbol we are originally comparing: we are comparing two functions, but the difference can be in some called function. Moreover, it is not necessarily in a function; it can be in a macro or in inline assembly code, and we can identify all of these. Next, we determine where the symbol definition is, let's say physically, located: we get the name of the source file and the line on which the differing symbol is defined.
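The "moving code into functions" pattern mentioned above can be illustrated with a small hypothetical refactoring (not kernel code, just a sketch): the old version does the work inline, the new version calls a helper, and the two are semantically identical even though a textual diff reports the whole body as changed.

```c
#include <assert.h>

/* Old version: clamping done inline. */
int clamp_old(int v)
{
    if (v < 0)
        v = 0;
    if (v > 100)
        v = 100;
    return v;
}

/* New version: the same logic moved into a helper and called. */
static int clamp_helper(int v)
{
    if (v < 0)
        return 0;
    if (v > 100)
        return 100;
    return v;
}

int clamp_new(int v)
{
    return clamp_helper(v);
}
```

A comparator that follows the call and matches the helper's body against the inlined code can report these as equal instead of flagging a spurious difference.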
Then we get a program path from the analyzed symbol to the differing symbol: a plain call stack where each line contains a function call accompanied by a file name and a line number. I will show this in a demo in a while. And finally, we get the actual difference: we take the differing functions, run them through a diff tool, and return the result. Just to mention, all this information is retrieved partly from debugging symbols. We also do an additional analysis of the C code, because, for example, the usage of macros is lost during compilation, so we analyze the C code directly, again using debugging symbols so that we know which parts to analyze. This way we can identify, for example, differences in macros. Before showing the demo, I would like to show one more thing, and the question is: can we do more? If you remember, at the beginning I showed two implementations of a C library function that were completely syntactically different but semantically the same. Can we do this here? Do we support such a thing? Yes, we do. We have an optional, experimental step called advanced semantic diff that can really do this. It is based on using tools that use formal methods to prove that two programs are semantically equal. We use one such tool called LLRêve, developed by researchers at a university in Karlsruhe, Germany. We found this tool and we use it because it operates over LLVM, so it is quite simple for us to integrate: we just take the LLVM IR files that we compiled and feed them to the tool, and it gives us the result. The tool is not written by us, but this is the science part of the talk, and I wanted to show you that there really are methods to find out that two programs are semantically equal. How is this done?
Well, specifically in this tool, the compared programs are translated into a logical formula which expresses the effect of each program in terms of logical variables. Afterwards, we use a tool for solving logical formulae; these tools are called SMT solvers, and you may know, for example, Z3, one of the best-known ones. We ask it a question: is there an input i such that executing the first program with this input yields a different result than executing the second program with the same input? If the solver tells us that yes, there is such an input, then we know the programs are not equal. Moreover, it gives us a so-called counterexample, meaning it gives us exactly that input, so we could potentially parse it, analyze it, and provide the user with more information. If it says no, there is no such input, then we are sure the programs are equal, because there is no input for which the effects of the functions would differ. Here, by input we mean not only the arguments: it can also be the state of the stack or the heap. Similarly, the result is not necessarily only the return value of the function; it can again be, for example, the state of the heap. But this way we can really, soundly prove that two functions are semantically equivalent. Okay, let's get into the demo. I will show you a short demo of how this actually works. We will compare the function we have seen at the beginning. First of all, we need to create a list of functions that we want to compare. In this case, we are comparing a single function, so I will create a file containing the name of the function, bio_add_page, and store it in some file, say "demo". Right. As the next step, we will generate an LLVM snapshot that will contain the LLVM files with the definition of this function.
So let's run DiffKemp with the generate command. First of all, we specify which kernel we want to compare; I have this in kernel/linux, which is the kernel version for RHEL 7.5. Next, we specify where we want to store the snapshot. (A bit larger, is it better? Can you see? Okay, cool.) And we say which file contains the list of functions we are comparing. DiffKemp says that it has found one function in the file and that the definition of the function is in fs/bio.c, which was compiled into fs/bio.ll. Now we do the same for the other kernel, because we always compare two versions. The first one was RHEL 7.5; this will be the following release, RHEL 7.6, so let's store it into snapshot/76. We take the same list of functions, and again the same file gets compiled. Now let's run the comparison: DiffKemp in compare mode, specifying the snapshots, the first one for RHEL 7.5, the second for RHEL 7.6, and we tell it that we want to show the diff if any is found. This is the result. There is an output, which means there is a difference, and it tells us everything we found out about it. First of all, the symbol we are comparing, bio_add_page, is different. Where does the difference occur? In a function called blk_max_size_offset; at the beginning I showed you that this was exactly the function that contained the difference. Here are the call stacks showing how to get from the compared symbol to the differing one in both kernels: in the first kernel, this function is called at fs/bio.c, line 816.
In the second kernel, the same function is called in the same file at line 818, and then follows the actual difference, which, as you can see, is exactly what we saw at the beginning: the returned expression changed; here the min with the maximum number of sectors is introduced. Another thing, besides comparing plain functions, is comparing sysctl options, the kernel options that you can set on a running kernel. Again, we create a file with a list of options that we want to compare; in this case it will be just one option, kernel.sched_latency_ns, and we save it in a file sysctl-demo. And we generate a snapshot. Now we say that we are working with sysctl parameters; the default behavior is comparing functions. We take the kernel for RHEL 7.6, store it into rhel76-sysctl, and specify the file. Good. Now the output is a bit more verbose, because we are now comparing an option, a kernel option that a user can set. This option can affect six functions in total. The first one is the proc handler function, which is the function triggered when the value of the setting changes. The remaining five are functions that use the data variable, sysctl_sched_latency, which is set to the value that the user configures. So this is the list of functions this setting can affect in the RHEL 7.6 kernel. We do the same for RHEL 7.7, the newest RHEL 7 kernel, and store it in the snapshot rhel77-sysctl with the same file; it gives us the same list of functions. Now when we run compare, it compares these functions pairwise: it compares the two proc handler functions, then these two functions, these two, et cetera. Let's run the compare quickly, comparing the 7.6 and 7.7 snapshots.
And the result is empty, which means that all of these functions are semantically equal. At the end, I would like to show you some of our experiments: we actually ran this on whole kernels at Red Hat, and these are the results. We compared kernel versions from RHEL 7.4 up to RHEL 7.7, always comparing two successive versions, and we also compared 8.0 with 8.1 Beta, the new RHEL 8 kernels. What can you see here? Let's take the example of 8.0 versus 8.1 Beta. These two kernels have 471 kABI symbols in common; there is a list of kABI symbols that we guarantee to be stable, and 471 of them are shared. Out of these, 80%, so around 380, are proven to be equal. There are 67 that are different, and there is some number that we still cannot decide. The problem we are solving, comparing functions for semantic equality, is in general undecidable from the computation theory point of view. However, a lot of cases can be decided, as you can see, but there is still a number of unknown results, which we are working on reducing. The 67 kABI symbols that are not equal contain 80 unique differences. A unique difference means that a single kABI symbol can contain multiple differences, since it can use multiple functions that actually differ, and one difference can also affect multiple kABI symbols. Out of the 80 unique differences, 73 occur in a function and 7 occur in a macro. The last column, potential false positives, gives the number of cases where we reported a difference but did not find any related syntactic difference anywhere, so we think it is probably a false positive, an error of our tool, and we are working on identifying and removing those. So, as you can see, it is still work in progress.
We still have some areas where we would like to improve, but there is already quite a lot that we can prove and quite a lot of differences that we can find. That's everything from my side; thank you for listening. If you want to try the tool, feel free to fork or clone our repo, and if you have any feedback, you can open issue reports, or even fix them and send PRs; we'll be more than happy. For easier usage, we are preparing an RPM package, which should hopefully be ready in the following weeks. Okay, that's it. Thank you for your attention, and if you have any questions, feel free to ask. Oh yes, I can repeat the question: the question is whether this can be used for projects other than the kernel. The answer is: technically yes, why not? Currently we only support the kernel, because that is what we want to use it for, so you would have to at least implement the part that translates your source code into LLVM IR, and then you can use our infrastructure. Technically, you can compare anything that is compilable into LLVM IR, which is not only C: it is also C++, and Rust, which is natively compiled through LLVM, and we just found out that Go can be compiled too. Since LLVM is expanding a lot, there are works that compile almost every procedural language into LLVM, so technically you could do that; we just support the kernel right now. However, if you had a frontend to compile your project into LLVM IR, it could be used. Yes? I guess we have to use the mic. Yeah, so this tool from Karlsruhe, whose name I already forget, is it fully automatic? Sorry, I didn't hear. The formal methods part that actually analyzes the semantics of the methods, is that fully automatic? Yes, this is all fully automatic, as you have seen. Even if we plug in the formal methods part, it is fully automatic.
There is no requirement for any hand-written proofs or anything like that. Does the optimization level mess with this at all? Is that something you have experimented with: what level of optimization you use with LLVM, and whether the code it spits out causes any problems with the diffing? Do you understand what I'm saying? Yeah. I guess what I'm asking is, does that come in at the IR phase? I'm not quite sure at what phase of the compile process you're coming in. So we come in right after the parser: we get the LLVM IR, the internal representation, before the compiler runs its optimizations, and that's where we step in. Okay, so there's no optimization phase that happens just on the IR? No. We control all our optimizations by hand: we explicitly say which passes we want to run, we control how the IR is transformed, and then we analyze it. So you could turn that on if you wanted? Yes. Okay, cool, thank you. Yeah, currently we run no optimizations, only those that we choose to run ourselves, such as dead code elimination. What are the next steps or future plans? Okay, the first one is written here: the RPM package. We would like the kernel developers to start using and experimenting with the tool, and we will wait for their feedback. So the next step, I would say, is to see what the most common problems users run into are and try to fix those. Besides that, we are working on a few advanced things, such as simplifying the LLVM IR further so that we can run the formal methods tools more easily, because formal methods tools do not scale; that is the problem of research tools in general, they do not scale to large projects. So we want to simplify the code as much as possible so that we can maybe use these tools on larger code bases. That's one of the things we are thinking about right now. And there is another question.
Are there any plans to integrate this into a continuous integration pipeline? Yeah, thanks for the question, a good one. Yes, I'm discussing this with the people who maintain the kABI list, and one of the possible usages is to run this in the kernel CI and get the output back to the maintainers of individual kABI symbols in case something breaks, or there is a difference, between two versions of the kernel. So yes, there are plans to do that. I guess that's it. Thank you for coming, and thank you for listening.