Hi, welcome everyone. Today I will be presenting how to revive the old Ken Thompson backdoor in modern OpenBSD make. I'm Samuel Obertin, known on the interwebs as cons. I work for IBM Security in France as a consultant. I'm a network and system engineer and a PhD dropout in cybersecurity, in France as well. I've been using OpenBSD at home for nine years now, and for my research as well; it's been a super nice tool to use, for multiple reasons that I won't explore further. I like movies, and I would like to speak to you about an old film from Stanley Kubrick, 2001: A Space Odyssey. In this flick there is a machine called HAL 9000, and HAL doesn't really behave the way it should. Why is that? What if HAL 9000 actually got backdoored? Maybe someone influenced the way HAL 9000 behaves as a machine. So the question is: how can we trust HAL 9000? First, we need to think about physical security: maybe someone entered the machine, or the perimeter of the machine, and modified it. Same for the actual hardware of the machine that has been shipped, maybe by a manufacturer or a vendor. The firmware is pretty interesting as well. It's the code that runs on the hardware, but you don't necessarily have access to it in modern computers. Your CPU runs firmware that we don't have the code for, and every peripheral has some firmware as well to make it run on your computer. It could be the kernel or the userland: maybe the operating system of HAL 9000 got modified to misbehave, or the actual HAL 9000 program itself. It could also be operations: the way all of these components are actually used by the end users or the administrators. But there is an interesting fact here: three of these components are code, which is compiled at some point. And this code has maybe been audited. But what about the actual compiler that compiles this firmware, kernel, or userland application?
And that's the whole subject of Ken Thompson's 1984 Turing Award lecture, Reflections on Trusting Trust. Thompson's conclusion is quite straightforward: we cannot actually trust any code that hasn't been written by ourselves, compilers included. That's quite harsh as a conclusion. And it's an interesting paper, which is quite old now, but I think it's still really relevant in some ways. The Thompson attack relies on two features. The first one, needed for his backdoor to work, is self-replication. Maybe you know quines. Quines are little games that you can play as a developer: the idea is to write a program that, once executed, will print itself, reproduce itself. There is a little C example which will print exactly the same source it was compiled from. And there are plenty of different quines that you can find in the wild. Mainly this is for games: there are golf games where you try to write the shortest quine ever, in different languages. You can visit a website called Rosetta Code, where there are hundreds of quines in different languages. Some of them are not real quines; some of them cheat a bit, like reading their own source file, whose whole content is then printed back out. But this one is a real one, where we use printf with 10 and 34, the ASCII codes for a newline and a double quote, to reproduce the double quotes and the newline used between lines 1 and 2. The second feature is what Thompson calls learning. We could also call it programming, actually. Compilers carry knowledge obtained from their own sources into the next hereditary binaries. For example, you can tell your naive compiler that backslash-n is a newline, and once you've compiled this compiler, whenever it encounters a backslash-n, it will emit a newline character.
So you've taught, or programmed, your compiler to do something. And combining these two features, Thompson creates his attack. He says: OK, I'm going to teach my compiler, program it, so that if it compiles itself, it will act as a quine; it will self-reproduce. And the second thing the Thompson backdoor learns is: if you compile login on the Unix system, please make it misbehave. Maybe if I type a specific username, my password will be accepted every time and I can enter the system. And the backdoor is not in the login.c program; it's in the actual compiler that is used to compile login into a binary. So if we wrap things up: we have a compiler source that we will call cs, which is compiled by x, and we obtain a compiler c. Then we take a second source that we modify, programmed to misbehave, and we use the newest compiler, c, to compile it. We obtain a new compiler, called bc, and this compiler is the Thompson backdoor. The Thompson backdoor is interesting because when you take again the first compiler source, which, say, has been audited by security people or developers, and you compile it with bc, you obtain the self-replicating backdoored compiler, the actual, real Thompson backdoor that can propagate across generations thanks to the self-replication mechanism we've programmed into it. So if we take the compiler source cs again and recompile it with this new version, we obtain a second generation of the same self-replicating backdoored compiler. And with this latest compiler, we may compile login, and login will misbehave. And the real question we should ask ourselves is: OK, that's nice, but where does x come from? Maybe x is itself a Thompson backdoor. I never asked myself, "I'm going to compile my compiler with x, but where does x come from?" And that's what I really like about this paper and the presentation Ken Thompson gave at the ACM.
The origins of this behavior are quite interesting. The original Thompson paper cites an obscure Air Force document. After some research, it turns out it's now available on the internet. It's a very old paper, 48 years old now, called Multics Security Evaluation: Vulnerability Analysis. In this paper there is an interesting quote about some trapdoors, backdoors as we call them now, in the programming language PL/I, which was used to compile the Multics kernel. And in this paper they explain the essence of the vulnerability: even if we have a misbehaving program, we are not really able to detect it, because even if we recompile a trusted source, we are never able to assess the origins of our x compiler. And there is an interesting thing about that. I was reading this paper and I thought: so Thompson used cc, the legacy C compiler, to compile cc from cc.c. There is some kind of recursion in this command. And recently I came across an interesting command I had to type for my work: I had to build Docker from source, and to build Docker from source, you actually need Docker. And I thought: hmm, that's interesting, there is something recursive in this command as well. In fact these compilers, at least the C compiler used by Thompson, are by definition self-hosted: the compiler is able to compile itself. But other components that are not compilers can behave this way as well, like Docker. And more recently, on OpenBSD, I typed this command. Do you see some similarities between these different commands? OK, but what if x is not a compiler, but make? Let's do that. So, demonstration time. But first, I just want to show you the program that we will attack. If I'm in the right directory, yes, I am. This is just a little HAL 9000 userland program I put together. We'll compile it with the system make, which is sane, I promise.
And I'm going to execute my HAL 9000 program, which asks me for a password to open the pod bay doors. I type something, and HAL tells me: I'm sorry, Dave, I'm afraid I can't do that. If we look at the HAL 9000 source, the password is stored as a hash, and the input of the user, which is read somewhere here, is compared against that hash. If the hashes are the same, so if Dave knows the password, then the doors may open. And that's the program we're going to attack with the make command I just typed there, but using a slightly modified make, of course. So in this directory I have the HAL source, which is the same; we can check it again, but it's really the same one I showed you. Then we've got a Makefile to actually build the backdoor. All this code will be available at the end of the presentation; there is a link at the end, with the different steps to get our Thompson make backdoor. And I've got engine.c, which is a modified part of make, of the actual source code of make. And in engine.c, somewhere, oh yeah, I think it's here. Yes, here, with better highlighting; I had prepared that, but I forgot. I have my two features encoded into engine.c, and the Makefile will make a patch out of it, then do some specific steps and create a new version of make for me. The first part is the self-replication, occurring when make compiles make. The second part is the simple backdoor that just runs a sed on the HAL 9000 source right before compilation, and afterwards restores the modified source code to its original form. So it's very simple, and the whole trick here is pretty straightforward: I'm just looking for engine.o, which is a target for make; at some point, when you run make, you may compile engine.c into engine.o. And the same goes for HAL 9000, so for the demo, which is, of course, driven by a Makefile. So I've compiled make.legit, to be able to compare it with other binaries later.
make.legit comes from the original OpenBSD source, and HAL 9000 as well. Then I build the source backdoor, the first generation, the orange one. With this source backdoor I compile make from the legit source, and we can see the self-replication happening. So now we have the binary backdoor of make, and we compile HAL 9000 again with this binary backdoor. Then we execute two instances of HAL. The first is the legit one, with a random password, and as I showed you earlier, HAL behaves correctly: it says that's not the right password, sorry, I'm not going to open the doors for you. And using the exact same password with the backdoored build, I'm actually able to open the doors. I can do it again and it will always misbehave; I can add some characters as well, and it will always open the doors for me. And even if I audit the source code, there is no trace of this backdoor in the source: it's in the make binary that was used to build it. Now, the implementation details. As I showed you in the source, we are in the command-running function of engine.c in the make source. We just need a few variables that are available to us: the name of the object that will be compiled, engine.o, and the command that make will execute to build this object, which is cmd. So in the source, yes, it's readable, OK, nice, I do something a bit odd to obtain self-replication: my backdoored engine.c contains an encoded diff, which I decode with base64. The fact is, I need to encode my backdoor for it to survive multiple generations of recompilation. And in the Makefile I showed you, the one used to build this demonstration, there is this bit of code: I diff my evil version of engine.c against the legit engine.c source, and I obtain a patch. Then I do multiple rounds of encoding on this patch.
And then, with this third-generation patch, I'm actually patching the real engine.c, the target engine of the make source build. So I have different levels. The first is the template of the backdoor: in my code I just have the diff, which is not misbehaving code in itself; it's not even base64-decodable yet. So I have my template of the backdoor; then I encode it once, then a second time to make it self-replicating, and then a third time so that I never hit the raw template again, because I don't want, a few generations of my code down the line, to stumble upon the bare diff, which is of no use to me. For the targeting of HAL 9000, I use the same approach: I look up the job name, and then I mess with the actual command that make will run to compile HAL 9000. What I do is simply add, to the left and right of that command, some other shell commands. The first one is sed, which modifies the source code. There could be many different ways to backdoor HAL 9000; I chose this one because it's pretty straightforward, and this talk is more about the idea behind it than the specific way you implement it. And I think it's quite simple: the whole backdoor, the two features, is 29 lines of C. I think the Docker backdoor I made a few months ago is six lines of Go. So it's quite cheap to develop: the Docker backdoor took me about one or two days, and this one took me a couple of hours of development. Because once you wrap your head around the principle of self-replicating code and self-hosted components, it's really simple to implement in any component that has these properties. So what I do is: I change the code, then execute the command, then restore the original state of the code using mv. The -i.orig option of sed lets me save a backup copy of the file before modifying it. The rest is just uninteresting string concatenation.
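The sed/compile/mv wrapping of the command can be sketched like this. It is a hedged sketch of the HAL-targeting hook, following the talk's description; the function name, the sed expression, and the exact file names are illustrative, not the real patch to engine.c.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative version of the hook: when make is about to run the
 * compile command for the hal9000 object, wrap the command with a
 * sed that patches the source and an mv that restores the backup. */
void maybe_subvert(const char *target, char *cmd, size_t cmdlen) {
    if (strcmp(target, "hal9000.o") != 0)
        return;  /* every other target is built honestly */

    char wrapped[4096];
    snprintf(wrapped, sizeof wrapped,
             /* sed -i.orig keeps a backup copy before editing */
             "sed -i.orig 's/hash_matches/1 || hash_matches/' hal9000.c; "
             "%s; "                         /* the original compile command */
             "mv hal9000.c.orig hal9000.c", /* restore the clean source */
             cmd);
    snprintf(cmd, cmdlen, "%s", wrapped);
}
```

Because the clean source is moved back immediately after compilation, any later audit of hal9000.c sees only the original file; the patched version exists on disk only for the duration of the compile.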
But then, maybe we can actually detect this kind of behavior. Maybe we could do some static analysis of the compiler of HAL 9000. That's a difficult problem, because we don't really know what such a backdoor would look like. We could do differential analysis, say, against a legit make that we trust; I don't know how we came to trust it, but we trust it. Then compare them using the Levenshtein distance, or just do binary diffing between our programs. There are a few tools for that: BinDiff, which is commercial and proprietary, and radiff2, which is part of the radare2 toolkit and open source. Maybe we can go further and do some decompilation, using tools like Ghidra or radare2. Maybe I can show you; I have time to see what's happening. So let's get my binaries. I'm going to load the backdoored make binary into radare2. OK, there are fortunes; I was lucky with this one, radare2 emits weird fortunes. I'm going to run a static analysis of the functions so I can have the symbol table. Let's list the functions: OK, there are plenty of them. Some of them have symbols because I'm lucky: I compiled make with debug symbols, which are quite useful. And I'm going to grep for our object; this symbol contains our backdoor. Let's inspect it, let's disassemble it. OK, that's a lot of code; you didn't see anything. There is the function and the symbol entry, and plenty of things happening. And at some point we stumble upon a large string, which is the third-generation patch. This large string should appear somewhere; oh, here it is. So we can do binary decompilation, or at least scan the symbols in the binary, to find such backdoors. But maybe I could still find a way to hide it.
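The Levenshtein distance mentioned for differential analysis is classic dynamic programming. Here is a minimal version over C strings, the kind of crude similarity measure you might apply to strings or byte runs extracted from two binaries; real binary-diffing tools like BinDiff and radiff2 are far more sophisticated than this.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal Levenshtein edit distance between two strings. */
int levenshtein(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    int *prev = malloc((lb + 1) * sizeof *prev);
    int *cur  = malloc((lb + 1) * sizeof *cur);
    for (size_t j = 0; j <= lb; j++)
        prev[j] = (int)j;                 /* distance from the empty prefix */
    for (size_t i = 1; i <= la; i++) {
        cur[0] = (int)i;
        for (size_t j = 1; j <= lb; j++) {
            int del = prev[j] + 1;
            int ins = cur[j - 1] + 1;
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int m = del < ins ? del : ins;
            cur[j] = m < sub ? m : sub;   /* cheapest of the three edits */
        }
        memcpy(prev, cur, (lb + 1) * sizeof *prev);
    }
    int d = prev[lb];
    free(prev);
    free(cur);
    return d;
}
```

A distance of zero between a trusted make and a suspect make means identical bytes; a small nonzero distance between two builds of the same source is exactly the anomaly differential analysis looks for.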
I could hide it in, I don't know, maybe I could build it out of gadgets or symbols already present in the binary, return-oriented style, or some other trick. So we can do this kind of analysis, but it's quite manual: we would need to inspect make, maybe the compiler, maybe HAL 9000. Maybe we can do some runtime analysis as well: on OpenBSD we could use ktrace, or run the program under a debugger and inspect what's happening; radare2 also supports a debugger mode where you can run your program interactively. But these tasks are quite tedious. There is a guy, David A. Wheeler, who, 20 years ago, was reading the Thompson paper, and he made an interesting contribution about trusting trust. His contribution is the diverse double-compiling method, which is a way to detect some of the backdoors introduced by the Thompson hack. The idea is this: we spoke about the compiler source cs before. You take a compiler that you don't really trust, x maybe, and you compile cs with it, and you obtain a binary compiler, X1. Then you take a second compiler, y, that you don't trust any more than x, but which is different; maybe you take GCC for x and Clang/LLVM for y. So you have two binaries, X1 and Y1, which differ at the binary level: they come from different compilers, so they are not the same bytes. But they share a property: they are both compilers of the same source, and functionally they should emit the same binary code. So let's do another round of compilation: with X1, compile cs again and obtain X2; with Y1, compile cs again and obtain Y2. Now X2 and Y2 should be binary equivalent, because they are builds of the same implementation, produced by functionally equivalent compilers. And if they are not, then you have detected that x or y was backdoored. You cannot tell which one, because you don't trust either of them. And maybe there is a slight chance they share the same backdoor; that's possible.
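The two-stage comparison can be modeled as a toy in a few lines. This is a sketch under strong assumptions: a "binary" is just a string, a backdoored compiler smuggles a payload marker into any build of the compiler source, and honest compilation is deterministic. All names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* "Compile" the source `src` with the compiler binary `cc_bin`.
 * A compiler binary is backdoored iff it carries the payload marker,
 * and a backdoored compiler re-infects every build of the compiler
 * source "cs" (the self-replication feature). */
static void run_compiler(const char *cc_bin, const char *src,
                         char *out, size_t n) {
    int evil = strstr(cc_bin, "+payload") != NULL;
    if (evil && strcmp(src, "cs") == 0)
        snprintf(out, n, "bin(%s)+payload", src);
    else
        snprintf(out, n, "bin(%s)", src);  /* honest, deterministic */
}

/* returns 1 if diverse double-compiling detects a discrepancy */
int ddc_detects(const char *x, const char *y) {
    char x1[64], y1[64], x2[64], y2[64];
    run_compiler(x, "cs", x1, sizeof x1);   /* stage 1: build cs with both */
    run_compiler(y, "cs", y1, sizeof y1);
    run_compiler(x1, "cs", x2, sizeof x2);  /* stage 2: rebuild cs with    */
    run_compiler(y1, "cs", y2, sizeof y2);  /* the stage-1 outputs         */
    return strcmp(x2, y2) != 0;             /* honest world: x2 == y2 */
}
```

Note that the model also reproduces the caveat from the talk: if x and y carry the same backdoor, X2 and Y2 match again and nothing is detected.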
But that's one way to do it. There are a few related works, which appeared in PoC||GTFO, which is a very nice read. The first one is about deniable backdoors using compiler bugs. This paper shows a way to backdoor sudo by patching GCC with a little patch that looks like a typo fix; the change actually made it into GCC before being removed. With that specific version of GCC and a specific version of sudo, you obtain a backdoored sudo. The interesting part of this paper is that the backdoor is deniable, because it's just a bug: oh, sorry, I wanted to fix a typo, and then I've got a backdoor, I'm so sorry about that. That's an interesting property. And from one of the authors of this paper there is a nice blog post as well, which enumerates a few ways that we, as distributions or maybe compiler developers, should behave in the face of this problem, which is honestly quite old, not new at all. So the actual conclusion of this talk: computer science is quite a recent thing in human history, but we have had this problem for half a century now, and it's still unresolved. Maybe we should think about new approaches; I don't know, I have no answer to this very real problem, I just wanted to share it with you. Some people might think about reproducible builds, which we have seen in recent years, notably in Debian, I think. But reproducible builds don't actually fix the self-replication problem in the compiler you bootstrap from. So maybe next time you choose a self-hosted component for your system, you should really think about the origins of x. It could be make, it could be Docker, it could be GCC; there are plenty of components which are self-hosted. And next time you stumble upon a command that looks like this, think about it, and tell yourself: OK, this could be an issue. Maybe it's not; I'm not overly paranoid.
And I'm not saying that every compiler, or every component, or every Docker runtime you use is actually backdoored. I'm just saying it's doable, and it's quite cheap. Maybe this could even be a game for you. Anyway, let's discuss it if you fancy such a discussion. You can find the whole code and the slides at this git repository. I don't recommend browsing it with a web browser, because it will just kick you out, but it's clonable using git. And I think that's it; we have about 10 minutes to discuss. I'm super eager to hear your feedback. Thank you, everyone. So, do you have any questions or comments, or things you didn't really understand in the presentation? Throw them at me. I just wanted to add two quite important observations. The first is that when people talk about trusting trust, they mostly consider it an academic problem. But we had this issue a while ago where the Delphi compiler, the object-oriented Pascal thing that was very popular in the 90s on Windows, was actually shipping malware in the compiled binaries for a long time. So everyone who was using Delphi was creating malware and distributing it. We've also seen a couple of other supply-chain attacks that look pretty much just like this. The second observation: if you look at Wheeler's PhD thesis on trusting trust, you'll notice that the approach he takes doesn't work for some of our supposedly very secure, modern programming languages, because, for example, Rust doesn't have an independent second implementation. Everyone using Rust nowadays typically gets some Rust compiler binary from someone and trusts it to work; it's insanely complicated and slow to actually bootstrap Rust. It's the very same situation with Java, with Go, and the list continues; Haskell is one of the traditional examples. And if you can't provide a short bootstrap cycle for your programming language, maybe it's not as secure as you think it is. Yeah, good point.
Thank you. Does anyone else want to respond to the comment? Those are good points; just one correction: Go does have a reasonably good bootstrap story, because you can build Go 1.4, which only requires a C compiler, and you can build any modern Go version starting from Go 1.4. So in our source tree we actually build Go from scratch, without the need for a bootstrap binary. So you're able to actually self-host Go from scratch on your machine? Nice. From C. Nice. So you just need to trust cc. No, I'm joking. No, that's the best we can do. No, I'm really joking. And I think we shouldn't be paranoid about this issue. It's a real one, but at the same time, we are humans; we trust each other. I would also like to mention that there is an implicit piece of context there, which is the operating system. You trust it to provide the right file to the compiler, and you could make the OS itself your attack vector: when it's compiling itself, it could also create a hostile operating system. Of course, operating systems are self-hosted components as well. It's quite difficult, at least for the BSDs; maybe in NetBSD there are some cross-compiling options, I'm not really aware of them, but on OpenBSD you definitely need OpenBSD to build OpenBSD. We could talk for days about this. I have a friend who does CPU synthesis, and the funny thing is he is able to emulate a non-existing CPU on another platform: soft cores, where you take a hardware definition and synthesize it on an FPGA, and you obtain a synthetic processor to test, maybe. And the funny thing is he was running the hardware definition of an FPGA on an FPGA. And maybe the actual hardware FPGA was lying about something. So it's a problem with no end. But yeah, thank you; operating systems, for sure, are vulnerable as well. As the maintainer of OpenBSD make, I will deny everything. Ha ha ha.
Now, the real point is: as humans, we create bonds and we trust each other. There is no need to be overly paranoid about such things, especially in the open-source community. But too much confidence is not good either. We trust one another, and that's a human thing; I think it's OK to trust someone, and being confident in ourselves matters as well. And that's important. I think I'm done. Do you have any more questions or suggestions? OK, thank you very much.