So, it's actually my involvement in the Tor Project that made me familiar with the thing I'm going to talk about today. What is the issue we face, all of us, but particularly in the Tor Project? When we talk about software, there are two sides to it. The source is what humans can read, and there's also the binary, which only the smallest fraction of humanity can read; it's mostly intended to be consumed by computers. The binary form is what computers need to run the software. Transforming the source, which humans can produce and read, into a binary is called compiling, or building.

When we talk about free software, there are at least two great things about it: you have the freedom to study the software, and you have the freedom to run the software. The source is what we, as humans, can read, and so it's what we can verify to ensure that the software does what it's intended to do, that it doesn't contain security bugs or malware or bad things. But the binary is what we actually use. The problem is that, at the moment, if I want to know how a binary I get has been built, how it's been compiled, I basically have to trust the software authors. I have no good way to know how they made it. It would be so much better if we could get proof that a given source was actually used to produce the binary we get.

Why does this matter? Some people will say that you have to trust the software authors anyway, because you can't check all the sources every time you use something, so there's always some level of trust; so trusting a binary from software authors you trust is already good enough, security-wise. Well, Seth Schoen from the EFF and Mike Perry from the Tor Project did a talk last December at 31C3.
They explained at great length that developers can be targeted without realizing that their build environment has been compromised. For example, during the talk, Seth showed a proof-of-concept kernel exploit: when the compiler read some C source file, without anything being modified on disk, the C code actually handed to the compiler had malware inserted into it. So even if we trust the developers, we might not actually get what we want. Watch that talk if you want to know more about the reasons why this is important, but I'm going to give you just one example.

A couple of months after their talk at 31C3, it became clear we're not discussing a hypothetical attack here, because The Intercept released another document from the Snowden leaks. This one was about an internal CIA conference in 2012, and this is an excerpt from the program of that conference: a presentation about a project called Strawhorse. They are describing an attack on the development environment for Mac OS X and iOS, which is called Xcode. They said they have a modified version of Xcode that can create binaries that are watermarked, or that leak data, or that contain pure and simple trojans. So this is a real attack. We are talking about developers in totally good faith, producing software and the binaries they give you; and even if they are in good faith, we could be totally owned if we trust them, because they would not have realized that their environment was compromised.

So we have a solution for that. The idea is to get reasonable confidence that a given binary was indeed produced by the corresponding source. We want to enable anyone to reproduce identical binaries from a given source.
If we have this, if enough people redo the build on different networks, at different times, on different machines, then either everybody is owned, or, with more luck, we can assume that it's okay and no bad stuff has been inserted behind our backs. This solution, we call it reproducible builds. The budget of the camp is a bit low, so you'll have to imagine the trumpets and the orchestra going all John Williams here.

Bad thing for the CIA and good thing for us: it's kind of trendy. I became familiar with the concept myself because of the work that Mike Perry did on the Tor Browser, and he himself was inspired by the Bitcoin people. It's now been two years since we started to work on this in Debian. Some people have started to work on this in FreeBSD; NetBSD has an MKREPRO build variable. Coreboot has now fixed all its reproducibility issues when there's no payload. OpenWrt has started to accept patches. And it's not limited to this list, there are so many projects; I heard that the Mono compiler now has options for this. There are people fixing their projects, or fixing the tools that we use, so they can produce identical binaries. And wow, I'm really happy, because for me this should simply become the norm. We have to do that. Some say that the only software that can be secure is free software, because that's the only software we can properly audit: we can modify it, and there's a team of people who look at the code. But if we look at the code and the binary gets owned because some system somewhere has been compromised and we don't know about it, then we're doomed. So yeah, let's make this the default for all software we produce.

While working on this for the last two years in Debian, and more or less becoming a reference on the topic, which was not really the plan initially, we identified that there are actually multiple aspects to getting reproducible builds.
First, you need to get the build to output the same bytes for a given version. But it's not only that: others must be able to set up a close enough build environment, with software similar enough, so they can actually perform the build from the source you gave them. And for them to set up the environment you decided on, it needs to be distributed somehow. I'm not going to talk about that aspect in this talk, because for me it's mostly about documentation, and it's going to vary quite a lot from one project to the next. As for checking the results, I don't want to discuss that much either, because I am a strong proponent of the idea that if we want real reproducible builds, the checking operation should be just comparing bytes, not some specific software that tries to ignore certain differences. Just the same bytes. Boom.

So let's get started: how do we get a build system to always build the same thing? In a nutshell, you want stable inputs: always the same source. You want stable outputs: always the same result for the same source. And you want the build to capture as little as possible from the environment. This may sound like common sense, but we discovered that it doesn't hold that well in the real world. With the work we've been doing in Debian, looking at 22,000 source packages, we've seen that these assumptions do not hold for most software we rebuild. The number one issue preventing the output from always being the same is timestamps. I will get back to that later, but there are timestamps everywhere: the date and time at which the build is made gets recorded all over the place. Other common problems: variations in file ordering on disk, details of the CPU, use of randomness, the path where the build is done getting recorded in the build, time zone issues, lots of things.
Before giving solutions to all these problems, though, I want to raise another issue. If we want to build a piece of software, we first need to get our hands on it. why the lucky stiff was an amazing member of the Ruby community. If you meet someone who doesn't like Ruby, that's because they never had a chance to read why's (poignant) guide to Ruby, which is a programming manual as a comic book full of jokes, an amazing piece of art and tech. Anyway, I'm talking about why because one day, why disappeared. He actively disappeared: he took all his writing, all his source code with him, and boom, no more why. So, inputs from the network, even when they don't seem volatile, they are, and that's why we have mirrors. If you want to make sure, it's better that your build system doesn't rely on remote data. Or if it does, two things: use checksums to make sure the content has not been modified, and keep backups. A good example of how it can be done right is how the FreeBSD ports work. They record a master site, and then they have another file with the size and the cryptographic checksums of the source. They also provide mirrors for these files, so even if the master site is down, you can still download the source from the mirrors.

Okay, once you've tackled your inputs and you know you can get your hands on them, let's look at a couple of diffs. This is a simple diff between two tar archives. They have exactly the same content, but not in the same order. If you use a command like the one at the bottom, you will get a varying order depending on how the files have been written to the file system, and from one file system to another there's no guarantee at all. Directory listings are not stable, so this is a bad construction.
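As a sketch of the checksum half of that advice (the file names and checksum file here are made up for illustration; assumes GNU coreutils `sha256sum`):

```shell
# Record a checksum for the source tarball once, ship it with the build
# system, and verify every download against it before building.
echo "source bytes" > foo-1.0.tar            # stand-in for the fetched source
sha256sum foo-1.0.tar > foo-1.0.tar.sha256   # done once by the maintainer
sha256sum -c foo-1.0.tar.sha256              # fails loudly if a mirror served other bytes
```

Combined with a backup of the tarball itself, this gives you both integrity and availability, which is exactly what the FreeBSD ports `distinfo` scheme provides.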
What you want to do instead is something like this: listing the inputs explicitly. This is usually how it's done in Makefiles. Another way is to use sorting. This is a somewhat complicated way to get it done right now: you use find, then sort, then tar with a list of files. It's a bit of black magic, and there's one catch: if you don't specify the locale, you might get the inputs in a different order, because in some locales the sorting is case-sensitive and in others it's case-insensitive. Some people argue that this is bad design, but well. If you specify the locale, then you're safe.

Another issue, this one from Coreboot; that's a real example. It's the kind of thing that you really, really don't want to have to track down. I know that Mike Perry and Georg Koppen had to face such an issue with the Windows build of the Tor Browser. The difference we're seeing here is only a couple of bytes, and at every build you get different values. No common pattern, nothing. That's because it's actually the content of some random part of memory at the time the build was done: the structure that gets written directly to the file was not properly initialized. This was the code from Coreboot, and here's the fix; it's trivial. Initialize the structure to zero and you will be fine. But you have to make sure it's done, otherwise it's hours of work.

Another example: here we have a build number, which is a date, embedded in the German dictionary for ispell, and the number is different from one build to the next. Don't do that. You don't want a different version number on each build; that will not work for reproducible builds. What you want is to extract meaningful version information from the source itself.
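A minimal sketch of that find-sort-tar construction with the locale pinned (assumes GNU tar and coreutils; the file names are invented):

```shell
mkdir -p demo
printf 'a\n' > demo/a.txt
printf 'B\n' > demo/B.txt
# Pin the locale so the sort order is byte-wise and identical everywhere.
find demo -type f | LC_ALL=C sort > filelist
# --no-recursion: archive exactly the listed files, in exactly that order.
tar --no-recursion -cf demo.tar --files-from=filelist
tar -tf demo.tar
```

Without `LC_ALL=C`, a locale with case-insensitive collation could put `a.txt` before `B.txt` and silently change the archive bytes.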
It can be a version control system revision number, it can be a hash of the source code, and if you use Git, git describe does just that, yay. Or it can be a changelog entry. The example here is what we do for the NSIS Debian package: we extract the version from the changelog and pass it to the build system.

Another, closely related issue: this is a dump of NASM, and maybe it's small, but on the left it says July 29th and on the right it says July 30th. It's the date when the build was made, and it gets encoded in the binary. Don't do that. We want to be able to build the same piece of code at different times and get the same result, otherwise it's not reproducible. Also, if you think about it, it's not a useful piece of information anyway. If I take your software from 10 years ago and build it today, why would I get today's date? That's not really good information. And if you think about it a little more: if the date and time of the build is meant to be an indication of the environment in which the software was built, well, to get reproducible builds we will have even better ways to specify the build environment, so it's not useful in the context of reproducible builds at all. So try to avoid build dates. But if you need a date, you can use the date of the latest changelog entry, or of the latest commit in your VCS. Either works: you get a date that accurately represents the state of your source code. One catch, again: don't forget the time zone. The easiest way is to simply use UTC all the time; but if you don't, then make sure you also record the time zone, otherwise someone in Berlin and someone in San Francisco building the same software will get different results, because their computers are in different time zones.
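The changelog approach can be sketched in a few lines (the changelog format and file name here are invented for illustration, loosely modeled on a Debian changelog first line):

```shell
# The first changelog line carries the version in parentheses.
printf 'hello (1.2-3) unstable; urgency=low\n' > changelog
# Extract the version from the source itself and hand it to the build,
# instead of generating a fresh build number every time.
VERSION=$(sed -n '1s/.*(\(.*\)).*/\1/p' changelog)
echo "building version $VERSION"
```

The same idea works with `git describe` or a content hash: the version string is a pure function of the source, so two builds of the same source agree on it.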
So watch out. One trick that's been used to get around date and time issues is faketime. faketime is an option, but it has some serious drawbacks. It's a library that gets loaded via LD_PRELOAD, so whenever the software asks the system for the current date and time, it returns a fixed date and time that you specified. Seems like a quick way to fix everything that uses timestamps. The problem is that it also introduces creepy behavior. Take a tool that is often used to build software, make: make works by only rebuilding what has changed since the previous build. When you use faketime, make is not able to properly detect whether a source file has changed, because everything has the same date. The bug I'm pointing at here, which I think still affects the Tor Browser builds, is actually a reproducibility issue introduced by the use of faketime: the same file gets built multiple times, because it's doing parallel builds, which means in the end something gets rewritten twice, and the order is not deterministic because it's parallel, and boom, unreproducible. So I would not recommend faketime except for very limited use, like calling a single program under it; that can be a way to fix things. But we actually have a better way.

It's called SOURCE_DATE_EPOCH. What's that? It's an environment variable. It's a new "standard", quote unquote, that was initially driven by Ximin Luo and Daniel Kahn Gillmor, and that we are trying to push as part of the Debian reproducible builds effort. It's an environment variable that can be set to a reference time. The value is in epoch format: the number of seconds since January 1, 1970, midnight UTC.
The main idea is that any software that would otherwise use the current time — think of Doxygen, which writes "documentation generated on" some date — will, when SOURCE_DATE_EPOCH is set, use the value from the environment variable instead. We are working on patches, and it's already implemented upstream in help2man, Doxygen, and others. We have patches ready for GCC, gettext, and more, many of them written by akira and Dhole, the two GSoC students we have this year, who are doing awesome work. If you're watching the stream: thank you. So the solution: instead of using faketime, try setting SOURCE_DATE_EPOCH in your build system and see if you get reproducible output. If you don't, patch the tools and submit the patches, and we can all enjoy reproducible builds in the future. If you want more details, look at the actual proposal on the Debian wiki. It might be a moving target, but since we're starting to push patches, SOURCE_DATE_EPOCH is probably going to stay as it is.

I'm not done with timestamps, I'm sorry. They really are everywhere. Here you can see timestamps in two places: in the gzip header, and also in the tar metadata. You really don't want to record those either. Most archive formats will keep the file modification date in their metadata. Some, like gzip, have an option: with gzip -n, it will not record that metadata, so that's fine. But not all of these tools have such an option. So, several solutions. One is to specify the modification time; that works for tar, where you can use --mtime. Another solution is to preprocess the files you're going to put in your archive with touch: simple, easy, and it works with almost all archive formats. And another way is to do post-processing: at the end of the build, you strip the non-deterministic data from your build products.
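Those pieces fit together roughly like this (a sketch assuming GNU tar, gzip, and date; the epoch value and file names are arbitrary examples):

```shell
# A reference time taken from the source (e.g. the last changelog entry),
# not from the wall clock.
export SOURCE_DATE_EPOCH=1439596800          # 2015-08-15 00:00:00 UTC
BUILD_DATE=$(date -u -d "@$SOURCE_DATE_EPOCH" '+%Y-%m-%d')
echo "Documentation generated on $BUILD_DATE"

mkdir -p out && echo data > out/file
# Clamp every member's mtime while packing...
tar --mtime="@$SOURCE_DATE_EPOCH" -cf out.tar out
# ...and keep gzip from storing a timestamp in its own header.
gzip -n -c out.tar > out.tar.gz
```

Running this twice, on two machines, at two different times, yields bit-identical `out.tar.gz` files, which is the whole point.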
I'm going to talk about strip-nondeterminism later. Another example. Can you see what it is? It's a dump of an executable. It has three functions, and sadly, depending on the build, they're not in the same order. That's because it's generated code, and it's generated by iterating over the keys of a hash table. Perl, Ruby, Python and other languages have a mechanism to prevent someone from attacking the hash function and starving the machine of CPU or memory, so the hash function is randomized. If you just traverse the dictionary, you get a different order every time, because the hash function is different every time. The solution is pretty simple here: just sort. One keyword usually does the trick; sort your output, and you're fine.

Another issue: avoid true randomness. Unless you implement your randomness like Randall Munroe, you will get non-deterministic values. The example here is link-time optimization in GCC: if you give it that, you get unreproducible builds, because, well, random, different every time, hopefully. The good news is that computers are actually very bad at randomness, but you know that. What we use are pseudo-random number generators, and they take a seed: an initial value used to derive a long sequence of numbers. So one way is to use a fixed seed for every one of your builds, and then you will get the same output. Or you can extract a value from the source code, like a file name or a content hash; that works as well.

There are other environment variables that might affect the output: LC_TIME, which will change time strings; LC_CTYPE, which will change the text encoding; TZ, which will affect times, because it's the time zone. If you notice any software that is affected by changing these values, just set them to a canonical value in your build system.
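Both fixes can be sketched in a few lines (the file names are made up; `cksum` stands in for any way of deriving a seed from the content):

```shell
# (a) Output that came out in randomized hash order...
printf 'init_b\ninit_c\ninit_a\n' > funcs.unsorted
# ...becomes stable once piped through a locale-pinned sort.
LC_ALL=C sort funcs.unsorted > funcs.txt
# (b) Derive the PRNG seed from the content instead of /dev/urandom,
# so every rebuild of the same source uses the same seed.
SEED=$(cksum funcs.txt | cut -d' ' -f1)
echo "seed=$SEED"
```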
One plea, though: please don't force a language on the people building the software. I believe people should be able to use computers in the language they prefer, and some people might prefer to get compiler errors in their own language. So please don't force everybody to use English, or German, or Japanese, which would be more complicated.

Also, don't stuff things into your version string. Don't say "built on system foo by user bar on CPU Pentium 4", blah blah blah. If you want reproducible builds, you want different kinds of machines to be able to produce the same result. If you do want to record these values, if you want to know about the build in detail, like timing, how long a build took, well, the build log is a perfectly fine place to put them, and users can save the log; but your build products, make them identical. Don't record such information in them, and then we can all compare the results without doing complicated things. Another reason recording such information is unneeded is that we're going to have a defined, reproducible build environment anyway: if we want people to reproduce the build, they need a toolchain close enough to the one you used that they will get the same result.

So what's in the build environment? At the very least, the build tools and their specific versions. If you use a compiler, there's a good chance that between one version and another you will get different output. Pretty simple: compilers get improved all the time, new optimizations. That's a good thing, we get faster software for free, but it means that from version A to version B the output is not the same. So you want at least the build tools recorded with their versions. Then, depending on your own setting, you can record the build architecture.
That might be a sane assumption: to say, okay, we don't do cross-compiling, so if you want to reproduce the binary for amd64, you need to build on amd64. Or that this piece of software must be built on FreeBSD to produce a binary. That's for you to decide. You might want to record the build path. That's something we do right now in Debian, because as soon as you produce debug symbols with GCC, or anything generating DWARF symbols, the build path gets recorded, and there are no really good post-processing tools right now, nor a way to trick the compiler. So we say: okay, always build in the same directory and you will get the same result. It's easy enough, just an mkdir, so we assume it's a fine thing to ask of users. And if you use things like faketime or SOURCE_DATE_EPOCH, you might want to record the initial build time. Maybe, maybe not; up to you.

You need users to have a way to reproduce the tools you used to perform the build. One easy way is to ask them to build the tools from source every time. That's the approach used by Coreboot, or OpenWrt, and partially by Tor Browser. You might also rely on a full operating system distribution, usually GNU/Linux; a stable distribution like Debian or CentOS will also do the job. They tend to be very stable and not have major updates during the lifetime of a single release. You might want to record the exact package versions so you can reinstall them later. Another way is to use VMs. VMs are great because they save some trouble: you can always have the same user in a VM, the same hostname, the same network configuration, possibly the same CPU. The problem with VMs is that they introduce new things that you might have a hard time trusting, so it's a bit of a bootstrapping problem, but they can make the whole process quite a bit easier.
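Recording the toolchain can start as simply as this sketch, where tar and gzip stand in for the real build tools (the file name is made up):

```shell
# Write the exact versions of the build tools next to the build products,
# so others can later set up a close-enough environment.
{
  tar --version | head -n1
  gzip --version | head -n1
} > build-environment.txt
cat build-environment.txt
```

The point is that this record lives beside the build log, not inside the binaries themselves.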
Those of you doing software for OS X or Windows might ask how we deal with proprietary operating systems. Trusting proprietary operating systems... I'm all for just not doing that, and cross-compiling instead. For Windows we have good old tools, all in Debian for example: MinGW and MinGW-w64 can build Windows binaries on a Unix system. We have NSIS, which can create installers, also from Debian; that's actually how we create the bootstrap for the Debian installer that you can run on Windows, to install Debian from Windows without any CD or USB stick or whatever. For Mac OS X it used to be quite complicated; you can look at the Bitcoin documentation. Now it has become pretty straightforward, thanks to the work done by Ray Donnelly. You will need to use a non-redistributable part, though; that's not so great, but it's provided by Apple after a free registration. You need something from Xcode to feed your cross-toolchain. You can also create DMGs, which are the usual format to distribute Mac OS X software. It's a bit weird, you need three different tools and all, but if more people start cross-compiling like that, things will improve, I'm confident.

Okay, great: we now have a defined build environment that we can tell people to use. How do we tell them? How do we distribute that environment? One way is to go all Makefile. This is how it's done for Coreboot: you type "make crossgcc" at the beginning of the build, and it will build binutils and GCC and everything for the right architecture for the firmware you're trying to build. It does that with just a Makefile, which downloads the tools and archives, compares reference checksums to check the files are right, and then builds the thing and sets it up. The problem is that these are volatile inputs, and I told you: no volatile inputs. So a better way is to just check everything into your version control system.
This is the approach used by FreeBSD: when you type "make world", it starts by building the compiler first, and then rebuilds all the base software using the compiler it just built. That's also the approach used internally by Google. And if you want to make absolutely sure that absolutely every tool is checked in, you can use a sandbox mechanism so that nothing from the system gets exposed except what has been explicitly brought in. This is what Bazel, the tool Google recently open-sourced, does. The problem is: this works for FreeBSD because it's a monolithic system, and it works for Google because they have internal processes, but you can't really ask everyone to download and build GCC 70 times in their daily workflow. So it might not be the best approach.

A middle ground is how OpenWrt does it: they make the toolchain a build product as well, and they distribute it, so you can download it from the same place you download the router image, and then rebuild a package, because you have exactly the right compiler for your system. It means the toolchain becomes a build product too, so it has to be reproducible itself, which would be great; and for users it becomes much easier to rebuild a single package.

Another tool that can be used is Gitian, which is used by Bitcoin and Tor Browser. It drives an LXC container or a Linux KVM virtual machine. Its inputs are called descriptors, and they specify, basically, a base distribution, some packages that have to be installed, Git remotes that need to be fetched and at which tags, other input files, and then a build script. It will start the VM, fetch everything, cut the network, and run the script.
The problem is that setting up LXC or KVM might be complicated. Some people say they provide an easy way to set up containers, and that's Docker. It's a way to build a system image, and a way to run applications inside it, specified using Dockerfiles. The example I'm showing here is from a tool used by Docker in many Dockerfiles; it's called gosu, and it's built like this. The test I made is actually reproducible, because it always fetches the same reference image from the Go project, which always has the same version of the compiler, 1.4 here; then it copies the source and runs the build, so you can just repeat it and it will always be the same. Docker has an interesting feature, though: instead of the "golang:1.4-cross" here, you can specify the hash of the Docker image as an address, and if you do that, you have the guarantee of always getting the same toolchain, because it will always be the same version of the system image. The problem is how you trust the Docker image. I read in the Bazel documentation that they actually know how to build Docker images reproducibly; I haven't tested it, but it could be investigated.

Another tool, a bit like Docker but more cross-platform, is Vagrant. It drives VirtualBox, so you can also build in a controlled environment, script it with Ruby, and set the whole thing up.
For Debian we actually went another way. We decided to record the environment in which the initial build was made, in a new control file that is going to be called .buildinfo, probably. It ties together, in the same file, the sources that were used, the generated binaries, and the packages that were used to perform the build. We can then take that list of packages with their versions and reinstall exactly the same environment in a chroot or a VM or whatever. We can do that because Debian provides a service called snapshot, which archives every single version of every binary package that has ever made it into the Debian archive. It's a huge repository; it's amazing. Here's an example of a .buildinfo: you can see the build architecture, the path, then checksums for the source, that's the .dsc, checksums of the binaries, those are the .debs, and the list of packages. It's not in the Debian archive yet; hopefully it will be soon. And then we'll have a simple script, srebuild: you give it the .buildinfo, it does the magic and checks whether the .debs actually come out the same.

So, it's been two years that we've been working on this in Debian now, and I have a few more tips on how you actually implement this.
You don't want users, when they try to rebuild your software, to detect changes that are not related to something bad but just to something from the environment being recorded. If users are the ones finding these problems, you're going to have a lot of false alarms. So you want to test beforehand that it builds reproducibly, in many different environments. The basic idea we use in Debian: we build a first time, keep the result, change various things in the environment, build a second time, and then compare the results. We set up a continuous test system driven by Jenkins. This is the work of Holger Levsen, now helped by Mattia Rizzolo. Huge thanks to ProfitBricks, because this runs on a crazy machine that can test 1,300 Debian packages a day, and that's actually building every package twice and then comparing the results, so serious hardware. The results are put in a database and on a website where you can see them, and we can notify maintainers. It's beautiful. And it's lately been extended to new projects, like Coreboot, and that's how Coreboot fixed all its issues, because they could actually track that things were working right. OpenWrt too, and we're starting to experiment with FreeBSD and NetBSD, and maybe your project: if you want to, you can find Holger at the camp and ask him; it would probably help. He's there.

Just so you have an idea, these are all the variations we apply to the system between the first and the second build: the hostname, the domain name, the time zone. The two time zones are more than 24 hours apart, so we detect date-related changes that way. The language, the locale, the username, the UID, the GID; the namespace is different, the kernel version is different, the umask is not the same.
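Stripped to its core, the double-build test loop looks like this (build.sh is a made-up, deliberately deterministic stand-in for a real build):

```shell
# A fake "build" that writes its output to the path given as $1.
cat > build.sh <<'EOF'
#!/bin/sh
printf 'payload\n' > "$1"
EOF
chmod +x build.sh

# Build once, then again with a deliberately varied environment.
TZ=UTC LANG=C                          ./build.sh first.bin
TZ=Pacific/Kiritimati LANG=fr_FR.UTF-8 ./build.sh second.bin

# Reproducible means: the bytes match, nothing fuzzier.
cmp -s first.bin second.bin && echo reproducible
```

The real system varies many more dimensions (hostname, user, kernel, umask, and so on), but the verdict is always this same byte-for-byte comparison.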
For now, the CPU, the date, and the time are actually mostly the same between the two builds. But we recently became able to add more hosts, so we can finally add these very last variations, and I hope we will soon vary everything. Time will tell. I don't want to make the talk too much about Debian, but just so you see that it's actually working: we are now at 83% of packages testing as reproducible. And to give you a more accurate view of the progress: every single day we produce new patches, and every single day some of these patches get integrated by the Debian maintainers. So we have more and more software getting fixed, which is amazing.

I believe that's also because we enable maintainers, and ourselves, to spot the differences between the two builds and understand them. We wrote a piece of software for this called diffoscope. The idea is that it examines differences in depth, and it outputs HTML or plain text. To do that, it recursively unpacks archives, because you don't really want to compare two different tar.gz files directly; that's meaningless, since they're compressed: you'll get different output if any of the files inside the archives differ. You want to get to the bottom of this, so it recursively unpacks every archive it finds inside every archive. It also seeks human readability: using tools like pdftotext, or sng, or msgunfmt, it tries to turn binary formats into a human-readable form and compares those. It's easy to extend to new file formats; it's been designed for that. And if there's anything it can't figure out a human-readable version for, it falls back to binary comparison. That's the HTML output; you've seen some of it already. And that's the text output, shown as a tree, because it does things recursively.
Another tool that we came up with in Debian is called strip-nondeterminism, which normalizes various file formats: ar archives, like static libraries; gzip; Jar; Javadoc; and even pom.properties, PNG, zip. That's the example I gave earlier. It's written in the same language as the Debian packaging tools, so we don't add a new dependency for builds. Thanks, Andrew Ayer, for leading that project. A couple of resources before I'm over. We're writing a how-to, and this talk is mostly what I would like to see in that how-to, maybe extended. We want to document how all of this works. So your feedback, your experience, whatever should be in that document somehow: please contribute. That would be awesome. It's at a very early stage right now, but it hopefully has some future. We have the Debian ReproducibleBuilds wiki, which is a bit more... well, it's a wiki, so it's a bit chaotic, and also more targeted at Debian developers, but you might still find interesting information there. We also keep track of a lot of what's happening on the reproducibility front everywhere: a lot of press, other projects, links, references. One last thing I wanted to talk about is David A. Wheeler's work, which is called Diverse Double-Compilation. Because every time I talk about reproducible builds, someone comes up and asks: how can you be sure that the compiler has not been backdoored, the compiler that you use to build all these things? Because once the compiler you use to build the next compiler carries the backdoor, the backdoor propagates and you can't detect it. That's called the trusting trust attack, described by Ken Thompson, and it was also mentioned in the Snowden documents. So David A. Wheeler refined, and also gave a formal proof of, a way to answer the question of whether a compiler is backdoored or not. It's called Diverse Double-Compilation. To sum it up very quickly: you need two compilers, one that you trust and one that you want to actually test. Then you build the compiler under test with each of the two compilers.
Then, with each of the two resulting compilers, you rebuild the same compiler again, and you check the outputs. If they are different, something has gone wrong; if they are the same, you have a reasonable assurance that the compiler you were testing is not backdoored. And to do that, you need to be able to say whether the build products of this compiler match or not; and for that, you need reproducible builds. So we are actually very complementary projects, and we're fixing a problem David faced. Also, once we have enough things reproducible, we can start doing Diverse Double-Compilation on everything, and then we can verify that the world is actually not backdoored at all. So I'm done. I certainly hope that this short lecture makes you want to provide reproducible builds in your own project or in other projects. As I said, I really think this should become the norm. Really. Thank you for listening. I'll be happy to take questions if there are some. One last thing: thank you, everyone involved in the Debian reproducible builds team, you are so awesome. You have been making every day of my life for the past six months a wonder. Thank you. Thank you, Lunar. We have a little bit of time for questions and answers. There are two microphones in the middle of the aisles. Please stay seated until the end of the questions and answers, if there are questions. And at least I enjoyed the talk. So if you have questions, line up at the microphones. Until everybody stands there, again, please, a nice applause for Lunar. Any questions? Is there somebody who has a question? If you want to also work on these things, you can come find me, Holger. I'm in the phone book at the camp. I would love to help you make your project reproducible. Yeah. So then one other thing to say... Oh, there's a question then. We'll take this question first. Yeah. So I'm glad to see this. We're getting more mainstream.
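The two-stage scheme just described can be modeled in a few lines. In this toy sketch (purely illustrative, not Wheeler's formalization), a compiler "binary" is just a string, `run` models executing a compiler on some source, and the trusting-trust backdoor re-injects itself whenever the compiler compiles its own source:

```python
COMPILER_SOURCE = "source code of the compiler"

def run(binary: str, source: str) -> str:
    """Model running a compiler 'binary' on some source (deterministic)."""
    out = "bin(" + source + ")"
    if "backdoor" in binary and source == COMPILER_SOURCE:
        out += "+backdoor"  # an infected compiler re-infects its own build
    return out

clean = "bin(" + COMPILER_SOURCE + ")"  # the compiler we trust
infected = clean + "+backdoor"          # the compiler under suspicion

def ddc(trusted: str, under_test: str, source: str) -> bool:
    # Stage 1: build the compiler from its source with both compilers.
    stage1_t = run(trusted, source)
    stage1_u = run(under_test, source)
    # Stage 2: rebuild the compiler with each stage-1 result.
    stage2_t = run(stage1_t, source)
    stage2_u = run(stage1_u, source)
    # Comparing the outputs only makes sense if builds are reproducible.
    return stage2_t == stage2_u

print(ddc(clean, clean, COMPILER_SOURCE))     # True: outputs match
print(ddc(clean, infected, COMPILER_SOURCE))  # False: the backdoor shows up
```

Note the check on the last line of `ddc`: that byte-for-byte comparison is exactly the step that requires reproducible builds, which is why the two projects are complementary.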
I also think that reproducible builds should be everywhere and be the norm. I'm wondering if you know about the Nix project, NixOS? I know about Nix; I've been trying to get those people to talk for the past month, and I hope we can meet at the camp. Okay. So in particular, when you say we should keep the version numbers of the tools that we use to build: Nix, in a way, will only use known builds for these tools and keeps a hash of them, so you get the full chain of all the tools, which I guess improves the trust a bit. I think Nix and Guix are exactly the right infrastructure to do reproducible builds, but right now, as far as I know, they are not reproducible, because they are not doing the work that we are doing of fixing all the timestamps and all the other issues that have crept into software. But we should welcome that, yeah. Thank you. Thank you. So there was another question on the other side. Yeah. First, thank you for giving a very important talk. I was wondering if there is any work on apt to make it verify multiple checksums, so that you can make sure the binaries were built the way they claim. So we have a proposal in the wiki for how to do that in the Debian archive. We have no comment from the Debian FTP masters yet, so we don't know what they think about it. Thankfully, next week is DebConf; actually, it's already starting right now. So next week we will hopefully know a lot more about how they feel about it. But what I really want, my main idea for Debian, is this: right now in Debian, when you upload a package, you upload the source, but you also upload a binary that you've built on your own machine. What I would love to see is that this binary is not uploaded anymore, just its checksum, and the binary is then built by one of the Debian build daemons. Only if these two checksums are the same does the thing get into the archive. That's what I would love to see. We're not there yet, but maybe in a year, maybe in two years, we'll see.
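The archive policy sketched in that answer (upload only the source plus a checksum, let a build daemon rebuild, accept only on a match) is hypothetical and not how the Debian archive works today, but the gate itself is simple to sketch. All function names here are invented for illustration:

```python
import hashlib

def buildd_build(source: bytes) -> bytes:
    """Stand-in for a reproducible build on a Debian build daemon."""
    return b"deb-package-for:" + source  # deterministic by construction

def accept_upload(source: bytes, maintainer_sha256: str) -> bool:
    """Accept the package only if the archive's own rebuild matches."""
    rebuilt = buildd_build(source)
    return hashlib.sha256(rebuilt).hexdigest() == maintainer_sha256

source = b"hello-1.0.tar"
maintainer_binary = buildd_build(source)  # the maintainer's local build
checksum = hashlib.sha256(maintainer_binary).hexdigest()

print(accept_upload(source, checksum))  # True: checksums match, accepted
print(accept_upload(source, "0" * 64))  # False: mismatch, rejected
```

The whole scheme only works if `buildd_build` really is reproducible; otherwise honest uploads would be rejected as mismatches, which is the false-alarm problem mentioned at the start of the section.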
Then there is the next question over there. You're doing work mainly on software. I'm just wondering to what extent reproducible builds have been looked at in the context of firmware, for FPGAs and things of that nature. The problem being that a lot of the FPGA tools, because they use techniques like simulated annealing, tend not to be reproducible even if you do two builds right after each other on the same machine: you get a different bitstream. It's actually a problem, and I'm glad you are raising it. I don't know anybody who has worked on it, so please pick the task up and think about it. Unfortunately, all of the tools are closed source. That makes it difficult, I guess, yeah. Then I think we have two more questions, and if I see this correctly, at least your question now. Hello. Thanks for your effort. I tried to set up my own little build environment with a dedicated machine about two months ago, and I got stuck at the point where it said you need to install Jenkins. Jenkins installed, but after that I didn't find any how-to on moving on from there, so that I could get my small hello-world program, or something slightly bigger, produced as a Debian package that is technically built in a reproducible way. So for Debian right now you need our experimental toolchain. There are patches to dpkg, for example, that are not yet merged into the main dpkg. As for the Jenkins installation we have, everything is in Git, so you can actually look at that source code, but I don't think there is a proper set-up-your-own how-to yet. Okay, that's missing. Please work on it. Thank you. Yes, I think those were our questions, and that's great, because we are just at the end of our time. I think it was so great that you can give another applause. Thank you.