Good morning everyone, thanks for coming here so early. We're going to have a look at .buildinfo files and what we can try to do with them. This relates to the Reproducible Builds project, which created the specification for .buildinfo files. So we, in the Reproducible Builds project, have been looking for ways to reproduce Debian's packages from source code to binaries. The work in stretch has focused on removing potential causes of non-reproducibility between two successive builds. And that project went very far: it managed to fix problems in 94% of Debian source packages. That is, until we start considering additional potential causes of non-reproducibility, like the build path issue. But so far, the testing infrastructure hasn't focused on reproducing the actual binary packages that Debian publishes to its users. So that's kind of what I've been looking into now.
The .buildinfo file describes the environment in which a .deb binary package was built, or in which many .debs were built from a single Debian source package. This serves as a useful aid in figuring out why a binary didn't match what you expected. You may have seen, if you run sbuild, pbuilder, or indeed dpkg-buildpackage, that not only a .changes file but also a .buildinfo file is now generated. That also gets uploaded to the Debian archive if you use dput or whatever else people use. This .buildinfo file describes any potential reasons that your binary package might end up different from when someone else builds it. We are archiving .buildinfo files, not just the ones that are created on developers' machines: each of the official Debian buildds, the build servers, creates a .buildinfo file when it finishes building. So we can compare the developer's build environment to the official build machine's build environment and try to understand whether there are differences in the binary output and why that might be.
That's a good aid to debugging reproducibility issues, because once we know what the differences are, we can try to avoid those having an effect on the built binary package. Ideally, we want a package build to always produce the same output, so that we can verify that it is trustworthy and has not been tampered with.
Some reasons packages are not reproducible: you've probably heard, if you've been to Holger's or Chris's talks, about all kinds of strange details specific to the build environment, like the hostname of the machine which was used to build the package. These are quite often embedded into a package, which causes it to be different when someone else builds it or when it is built on a different machine. There are things like time, which is always changing. If that is embedded in the form of timestamps, you get different binary output on each build. Randomness can creep into the build. Some file systems order files on disk randomly, and if you read the directory contents back you might get files in a different order, and that could lead to things being built in a different order or linked in a different order and so forth.
There are other causes of non-reproducibility, which I categorize a little bit differently. Like the build path, the current working directory when you build the source of the package. That's something that could potentially be fixed by building with the same path that the official Debian packages were built in. There are other variations, like these environment variables, DEB_BUILD_OPTIONS and DEB_BUILD_PROFILES. If you change things in there, it is quite often expected that you get a different binary output. This is why we also want to record the values of these environment variables when we build packages. Of course, if you change the installed build dependencies, like the compiler version used, it's quite often expected that you would get a different binary output. All of these things are included in the .buildinfo.
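
To make the file-ordering point concrete, here is a minimal sketch in Python. It is not taken from any real build system: the function `build_artifact` and the two listings are hypothetical, and the "build" is just a hash over the input names in the order they were read.

```python
import hashlib

def build_artifact(filenames, sort_inputs):
    """Hash the input names in processing order, standing in for a link step."""
    ordered = sorted(filenames) if sort_inputs else list(filenames)
    return hashlib.sha256("\n".join(ordered).encode()).hexdigest()

# Two hypothetical directory listings of the same files, as two different
# file systems might return them.
listing_a = ["b.o", "a.o", "c.o"]
listing_b = ["c.o", "a.o", "b.o"]

# Without sorting, the result depends on the order the file system returned.
unsorted_differs = build_artifact(listing_a, False) != build_artifact(listing_b, False)
# With sorting, both machines produce the same result.
sorted_matches = build_artifact(listing_a, True) == build_artifact(listing_b, True)
print(unsorted_differs, sorted_matches)
```

Sorting the inputs before use is one common way to remove this class of variation; it does nothing for timestamps, hostnames or build paths, which need their own fixes.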
What that leads to, then: if we want to reproduce a binary, we simply take the source package, and hopefully all the necessary information is included in the .buildinfo. You take the source code, you follow the specification given in the .buildinfo file, and that's a recipe for producing an identical package every time. In theory.
The .buildinfo looks something like this. I have only included the most relevant fields on this slide; there are many more. So, as I said, the build path is there, the installed build dependencies and their versions, and some details of the environment. And, of course, we also include the hashes of the output binary packages within this file. The whole thing can then be signed, and then the signature is an attestation that someone was able to build this package and that these are the hashes of the binaries that were produced.
So, to reproduce official Debian packages, we need access to the .buildinfo file that was produced by the official build servers. The .buildinfo file contains the build path that was used. So, if the user is able to create the same path in their own build environment, then it will be the same across both builds and not lead to any variation. If you try to reproduce the package in a clean, up-to-date sid chroot, it's quite possible that your installed build dependencies will be different from the ones on the official build server, because these are changing continuously in Debian unstable. So, it would be necessary to install additional build dependencies, or even downgrade some to an older version, in order to be quite sure that you will get the same output. So, my project this week has been writing an automated tool for this second point, to install old versions of build dependencies. I've been using the archive snapshot.debian.org, which has all the old versions of Debian packages that you would need.
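
As an illustration of the fields just described, here is a minimal Python sketch that reads the build path and the pinned build dependencies out of a .buildinfo. It is not the tool from the talk, which was written in Perl. The sample text uses made-up values, the helper names `parse_fields` and `installed_build_depends` are hypothetical, and the parser assumes a single unsigned control paragraph without the PGP wrapper.

```python
import re

# Made-up example content; real files carry more fields and real versions.
SAMPLE_BUILDINFO = """\
Format: 1.0
Source: dash
Version: 0.5.8-2.4
Build-Path: /build/dash-0.5.8
Installed-Build-Depends:
 autoconf (= 2.69-10),
 gcc-6 (= 6.3.0-18),
 libc6-dev (= 2.24-11)
"""

def parse_fields(text):
    """Parse one control paragraph into a dict of field name -> raw value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith((" ", "\t")) and current:
            # Continuation line: append to the field being read.
            fields[current] += "\n" + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)
            current = name
            fields[current] = value.strip()
    return fields

def installed_build_depends(fields):
    """Return (package, version) pairs from Installed-Build-Depends."""
    deps = []
    for entry in fields.get("Installed-Build-Depends", "").split(","):
        match = re.match(r"\s*(\S+)\s*\(=\s*(\S+)\)\s*$", entry)
        if match:
            deps.append((match.group(1), match.group(2)))
    return deps

fields = parse_fields(SAMPLE_BUILDINFO)
print(fields["Build-Path"])
print(installed_build_depends(fields))
```

A real tool would more likely use an existing control-file parser than a hand-written one; the point here is only that the file is plain text with one exact version per installed build dependency.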
So, I will demonstrate the tool that I wrote. It's written in Perl, because Perl already had modules available for parsing the .buildinfo file format. After parsing out the build dependencies, I use the WWW::Mechanize Perl module to connect to the snapshot.debian.org web service and search for the specific versions of build dependencies that we need. So it's running now; it's making many queries to snapshot.debian.org. That's okay: I could use a caching web proxy, so that if I have run this once before it would remember the result.
From the result of that, it constructs an APT sources list with about 120 entries, one for each old version of a package that we might need within the build environment. This is not the most efficient way to do it, but I would say optimize your code last. Updating your package caches with 120 entries in the list takes some time. Also, each snapshot is a snapshot of the whole archive, so it contains an index of all the packages in the archive, about 10 megabytes each, and after downloading, APT has to process all of those package indexes. There will be many duplicates because, for example, if a package exists in an old snapshot and has not been updated, it will still exist in the new snapshot. But my tool is performing a brute-force search of every possible APT source that it may need, to be able to find all the versions of these packages.
This is running as a hook within sbuild. It's running apt-get like this, installing all the build dependencies that would be needed, and specifying the exact version with this equals-version syntax.
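
The two pieces described here, a sources list entry per snapshot and an apt-get call with exact versions, can be sketched roughly as follows in Python. This is an assumption-laden illustration, not the Perl tool from the talk: the function names, the timestamp, the `unstable main` suite and component, and the package versions are all hypothetical. The `check-valid-until=no` source option is included on the assumption that old snapshot indexes have expired Release files.

```python
import shlex

def snapshot_sources_line(timestamp, suite="unstable", component="main"):
    """Build one sources.list entry pointing at a snapshot.debian.org snapshot."""
    return (
        "deb [check-valid-until=no] "
        f"http://snapshot.debian.org/archive/debian/{timestamp}/ {suite} {component}"
    )

def apt_install_command(pinned):
    """Build an apt-get argv that installs each package at an exact version."""
    cmd = ["apt-get", "install", "--yes", "--allow-downgrades"]
    cmd += [f"{name}={version}" for name, version in pinned]
    return cmd

# Hypothetical inputs: one snapshot timestamp and two pinned dependencies.
sources = [snapshot_sources_line("20170801T000000Z")]
pinned = [("gcc-6", "6.3.0-18"), ("libc6-dev", "2.24-11")]

print("\n".join(sources))
print(shlex.join(apt_install_command(pinned)))
```

The sketch only builds strings; actually running the command would need root inside the chroot and an `apt-get update` after the sources list has been written.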
It also passes --allow-downgrades on the command line, because we are actually downgrading gcc to an older version if that's what was used to build the official package in the archive. Looking in more detail at this APT sources list, it is maybe slightly alarming that some of the snapshots are from 2013. That means that a binary package needed to build new packages has not been rebuilt since 2013, sometimes 2011 or even something older. These are packages that were uploaded and never rebuilt in five-plus years, and they are still being used to build new packages, so that could be a problem.
So this is still installing build dependencies. Now it's running the actual build. At this point the sbuild chroot should be a pretty good replica of the state of the chroot on the build server when this binary package (this is dash) was originally built. So if there are no other causes of non-reproducibility in the package, then we should be fairly confident of getting the same binary output at the end of the process.
It's finished building. As I mentioned, sbuild will automatically produce a .buildinfo file for you now, so I can compare the build that I just completed to the original .buildinfo file. These are all archived now on an internal Debian server, so currently only Debian developers have access to them, but we should be able to make this public soon; we're still figuring out the best way to do that. So, comparing these two .buildinfo files on this slide: we have checksums for the output .deb package, and we can see that it was identical from the official build to my own build that I just executed. Within the .buildinfo file the only difference is here: the actual build date, when the .deb was built, is recorded here, but that's not recorded inside the .deb, so the .debs were identical.
As a side point, I actually carried out that build with nocheck so it would be faster, and that didn't have any effect on the binary output. That may be a way to speed up some of the reproducibility testing: we just don't run the test suite, and if that does lead to differences we could try running the build again with the test suite to see if that causes it to be reproducible again.
As a result of writing that tool, and because it takes so long, I have only run about 10 packages through this process so far, but 9 out of 10 were reproducible. So I managed to reproduce official .debs from the official archive: I got the same result when I built them on my own machine. So I agree that Debian is probably at least 90% reproducible.
Actually, we have some old binaries still in the archive. These were not necessarily built on the official Debian machines, but quite possibly on the developer's own machine, which is problematic for trustworthiness: if a developer did not have a clean build environment, or their machine was compromised, the .deb packages produced are still in the archive right now and being used as part of the build process of new packages. So I think we want to push the idea of rebuilding some older packages, especially if it makes them become reproducible.
Source-only uploads have been possible for some time. If you don't know what those are, or didn't know it was possible, you should look this up on the Debian wiki: how to make a source-only upload, where you upload source to the Debian archive without uploading .debs that you have built yourself. That ensures that it will be rebuilt in a fresh environment by the official build machines. If you build .debs on your own machine and upload those, the .buildinfo will show this, and it has a PGP signature on it, so we will know who you are.
There was one package uploaded, I noticed, where for some reason GCC 4.9 was installed in the build environment. This was a package uploaded this year, and GCC 4.9 is long gone. I think that person may have a configuration problem with their chroot; it's maybe not removing old packages. So maybe I should contact that person and see what happened there.
I found one package that could not be reproduced even with the same build dependencies installed. That should be looked into. I think it was only a minor difference in an info documentation file, but we should look into why that happened, especially since the reproducible builds test infrastructure did not detect a reproducibility problem with that package. I also came across some .buildinfo files that were incomplete: they missed pre-dependencies of python. But these were quite old .buildinfo files; maybe the problem has been fixed already.
So, looking ahead, we would really like to be continually rebuilding the official Debian packages and, when those builds are carried out, produce a .buildinfo file and sign it. It could already be submitted to the buildinfo.debian.net tool; we can collect .buildinfo files there and analyze them. For example, it would be really great if NGOs or state organizations such as NIST, or the BND in Germany, were reproducing Debian packages, signing the output and contributing that to an archive. Then if you find that the package has a signature from Debian and from multiple third parties, we can be quite sure that the Debian build process itself wasn't tampered with. By looking at these signed .buildinfo files we can see the software supply chain, meaning how the software gets built and distributed to the user.
This probably ties in well with the next talk, where I got some of the ideas for this. Basically, if APT has some kind of vulnerability when it's fetching the .deb, there is a possibility, at the last moment before installing the .deb, that we can use .buildinfo signatures as a second way to verify: what is this .deb, where did it come from, did someone actually build it, are there signatures from multiple parties to let us know we can trust this .deb before installing it? So if APT is compromised, or if the archive key is compromised, we have this independent way to authenticate that the package was actually built by a trusted machine or a trusted third party before we go ahead and install it. Given the time, I will skip some of these other concepts.
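
The idea of trusting a .deb only when several independent parties attest to the same hash can be sketched as below. This is a hypothetical illustration in Python, not an existing APT feature: the `Attestation` type, the signer names and the threshold of two are all assumptions, and signature verification is taken to have happened before these objects are created.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Attestation:
    signer: str      # identity from an already verified .buildinfo signature
    checksums: dict  # output file name -> SHA-256 hex digest

def agreeing_signers(deb_name, deb_sha256, attestations):
    """Return the set of signers whose .buildinfo lists this exact hash."""
    return {a.signer for a in attestations
            if a.checksums.get(deb_name) == deb_sha256}

def is_trusted(deb_name, deb_sha256, attestations, threshold=2):
    """Accept the .deb only if enough distinct signers reproduced it."""
    return len(agreeing_signers(deb_name, deb_sha256, attestations)) >= threshold

# Hypothetical data: the downloaded .deb and three attestations.
deb_name = "dash_0.5.8-2.4_amd64.deb"
good = hashlib.sha256(b"contents of the downloaded .deb").hexdigest()
bad = hashlib.sha256(b"something else").hexdigest()

attestations = [
    Attestation("debian-buildd", {deb_name: good}),
    Attestation("third-party-rebuilder", {deb_name: good}),
    Attestation("another-rebuilder", {deb_name: bad}),
]
print(is_trusted(deb_name, good, attestations))
```

Counting distinct signers rather than attestations means one party submitting the same result many times does not raise the count.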
Do we have time now for any questions? One question, yeah, at the back.
Thanks for this talk. We have had some binary rebuilds, but they were all arch:any packages, right?
Yes.
We should really extend that.
Yeah, that's true. The tool that I've written, for example, does not handle arch:all packages. I considered that out of scope for the research I was doing, but it is still a very important topic. Arch:all packages are built once and then reused by many builds in future, across different architectures even. But that also means we have more interesting ways to QA these packages. We can try to cross-build them from different architectures. An arch:all package is not binary code tied to a specific machine architecture, so it should be possible to cross-build it. We could try to build this from an amd64 machine, from an arm64 machine, even something more exotic like a kFreeBSD machine, and we should in theory get the same output for an arch:all package.
Yeah, and in fact we don't really know if these packages are [unclear].
Yes, that's also a problem. Sometimes the arch:all package only supports being built on one specific architecture.
No, not the architecture, the version. So if there is a binary package in the archive that was built in 2013 against whatever, squeeze maybe, we don't really... Have you tried actually taking one of these and rebuilding it against a recent sid?
Yes, the official test infrastructure is doing this and finding a lot of problems with building those packages, so maybe that's already being done.
Great. I'll have to stop you there; we are almost at the beginning of the next session.
OK, let's do that.