Hello everybody, this is the Reproducible Builds status update. We will be 7 people here. I guess you know most of them, but some of you don't: this is Valerie, Chris, Maria, Ximin, Vagrant, Steven, and I'm Holger. And we are way more people who worked on this. I guess many people are missing on these slides, so if you know a name that is missing, please tell us; we want to fix the slides afterwards and put credit where credit is due.
Okay, so I'll give a brief introduction to reproducible builds. Probably almost all of you know what it is, but I just want to make sure everyone has a nice starting point. So what is the goal of reproducible builds? The motivating goal is really to prove that binaries came from the source code that we think they came from. And why do we want to prove this? Well, I think we all like free and open source software because we can trust that the programs we're running have been produced by a community that we trust; we can inspect the code if we want to, and we can change the code if it's doing something that we don't like. But we don't generally download the source code when we want a new program, and we especially don't download the source code when we're a user who is not also a programmer. Instead, we download the binaries. And the binaries could have been compiled not by a friendly Debian developer but by a malicious actor who compiled them not from the source code that we think they came from, but from some modified source code. Or they could have been compiled by a Debian developer whom we do love and trust, but whose compiler is compromised and added some malicious code to the program anyway. And if you don't believe me that this happens, you can watch the beginning of any other reproducible builds talk, which starts off with lots of exciting anecdotes and scary things that can end up on your computer.
So, how do we achieve this goal of reproducible builds?
How do we prove that some binary that we've downloaded came from the source code that we think it came from? Really the only way to do that is to take the source code, recompile it, and see if it matches the binary you just downloaded. But in general you can't do that, for a lot of reasons. You can recompile the source code but get a completely different binary. That was the norm of the software world for many years, as I'm sure you're all aware.
So, in Debian, achieving this goal of reproducible builds generally involves two branches of work, or I like to think of it as two branches of work. The first is that the compilation of the binary program should be deterministic: it should not be random. There should be no randomness every time we compile. It also should not depend on things that change constantly, like the time of the build or the host name of the machine that you're building on. So that's one area of work. The second is that the build environment for any binary program or any binary package should be reproducible as well. If you have a binary, you need to know some things about the build environment to reconstruct it exactly, and in general we pass around binaries but we don't know a lot of important information about the build environment. So this is solved by the .buildinfo file, which you've probably heard of, and other work in that way. So, yeah?
Yeah, so the first rebuild of Debian was done in 2013, with a rebuild comparing whether packages were reproducible. There we were at 24%, and four years later we're at 94%. That's pretty nice. While this looks very nice, first, we are at 88% instead if we're varying the build path. And the other part is that the tools to verify this in practice, and not just on the random test side, are mostly non-existent yet. The workflows do not exist, and the .buildinfo files are only accessible to Debian developers. So we are not at 95% but rather at 50% or something.
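The check described above, recompiling the source and seeing whether the result matches the downloaded binary, comes down to a byte-for-byte comparison. A minimal sketch of that comparison follows. The function names `sha256_of` and `is_bit_identical` are made up for illustration and are not part of any Debian tool; in practice the rebuild would also have to recreate the environment recorded in the .buildinfo file before a comparison like this is meaningful.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(downloaded_path, rebuilt_path):
    """True if the downloaded artifact and the local rebuild have equal digests.

    Hypothetical helper: equal SHA-256 digests are treated here as
    evidence that the two files are bit-by-bit identical.
    """
    return sha256_of(downloaded_path) == sha256_of(rebuilt_path)
```

A single differing byte, such as an embedded build timestamp, is enough to make this return False, which is why the determinism work described above matters.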
If you want to check the progress, there's tests.reproducible-builds.org, which was previously known as reproducible.debian.net and is still available under that name. There you find all these numbers, the tests, the statistics. If you haven't looked at it, there you'll find everything.
So you don't necessarily need a tin foil hat to care about reproducible builds. Say you can, or you feel like you can, trust your toolchain perfectly, perhaps internally to a company. There are still some benefits to adopting reproducible builds in practice. I'll just demonstrate this through a few bugs that we've found. Here's one security bug. During the build, what it would do is generate a fairly random string and dump it under /usr/share as the OpenID secret. That's the random ID there. Every installation of a given build therefore shares the same secret, even though different builds will have different secrets. So Debian was shipping the same private key to a lot of users, and I'm not a security expert, but that's probably not very secure. That's in G-browns.
Then you get to other, just random, weird behavior. So what we were seeing is this manual page here. Sometimes it said one thing, and then sometimes when we compiled it, a character in it would turn into an O. And this happened in a lot of man pages. What it had to do with was docbook-to-man. Chris West (Faux) found the code in question using memcpy. I see people are already spoiling the joke. The memcpy man page rather explicitly says the memory areas may not overlap. And the reason it turned into an O is something you can check out. Yes, you just change memcpy to memmove and it goes away. Although I needed to adopt that package, docbook-to-man, and upload it.
You also get some other silly things. Here's a package that fails to build from source 0.46% of the time. It's because of the test suite. It's a Python test suite, of the kind that generates its own test cases.
And this is like a random number: it's exercising the random number generator, which is dodgy for a deterministic test. So 0.46% of the time, it will generate a list that doesn't contain all of the elements, because it will generate a, a, a, a, a, b or something like that in its test. Details don't matter. And you could do the nice combinatorics. And yeah, that's Python parsed. It's very fun. And we managed to find that too, just by making it reproducible. Cool, thank you.
Okay, so now we'll talk about some updates from the last year for reproducible builds. We had a Reproducible Builds summit in Berlin in December of 2016. And who attended? Not just Debian developers: all of these various projects had people who attended, and I probably missed some. For some reason we never made a complete list, but this is everyone that I remember. So we worked together for three days and collaborated on various things, like: what do reproducible builds mean for all of these various projects? And what can they get out of it? How can we work together? So one thing that we did that we're all proud of, I guess, is that we made a definition: what does it mean for a build to be reproducible? So, well, I'll read it. A build is reproducible if, given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts. And if you go to reproducible-builds.org/docs/definition, all of these italic words are defined in more detail later on in that document. So that's cool. So now we all agreed, all those projects. You can't disagree now. And other things that we did at the Reproducible Builds summit include talking about .buildinfo files for RPM, and we did a lot of cross-distro collaboration on things like upstream package fixes and stuff like that. That's great.
I guess this would be my part. So I'm maintaining, apparently, up to 29 ARM boards.
And then we also added an entire new architecture, arm64. Thanks to Codethink for donating some Moonshot systems. Unfortunately, they're not running Debian, but in a lot of ways it's good to test on another distribution, to see if there are variations introduced by running the host on Ubuntu. So that's great. So we've got 29 boards whirring away. And some of the newer ARM boards tend to be arm64 hardware. And so we're experimenting with running a few builds with an armhf chroot on arm64 hardware. And some of you may know that it has some quirky issues. But we'll be working on sorting those out. And I think DSA is working on similar problems, and I'd love to chat more about that.
We also got some more resources from ProfitBricks for amd64 and i386 testing. So it's a lot faster than last year. And we're working on more collaboration with more projects. So one example: Bernhard Wiedemann from openSUSE is very active now. He sends lots of patches upstream and has also set up a test environment to build openSUSE twice. And we will share the results and present them on reproducible-builds.org. The same is coming from Guix, F-Droid and LEDE. But there are also other new projects like in-toto, and Tails has become reproducible. So there are a lot of things which happened due to the summit in Berlin and Hamburg.
One thing we also started in Berlin was the build path testing. So the 94% figure that Holger mentioned before is for testing while fixing the build path. If we run builds under different build paths, then the number drops down to about 79-80%. So the build path accounts for about 10-12%, maybe a bit more, of packages that may or may not reproduce.
So there are a few reasons why you might want to build under different paths, or to strip out part of the prefix of your build path, including privacy, running builds in parallel, making it compatible across different distros, as they don't have the same sort of stable build path setup, and making it easy for users to build under their home directories. So there are lots of sources for this. By far the biggest source of this is debug info generated by GCC, because GCC is used as the main C compiler. It does have an option to strip out the prefix of the build path, so all you're left with is the relative path under the source root. So we've been experimenting with this for a couple of years. There have been various hurdles. One thing was DW_AT_producer, which is a debugging info field that records the compiler flags passed into the compiler. And if we give -fdebug-prefix-map to strip out the unreproducible prefix of the path, this unreproducible value will end up in DW_AT_producer in the final output. So some of it will be made reproducible, but then another part of it will be made unreproducible. So we sent a patch to GCC to strip out this flag, and they accepted this patch. But now we've seen that other build systems will unconditionally record CFLAGS and so on into parts of the build output. So this is interesting, you know, things going around and around.
Golang and rustc have made progress recently; they have similar flags. We're still starting off exploring this area. It remains to be seen whether this chasing of the flag around and around will crop up in those build systems as well. To stop this from happening, we introduced a new environment variable called BUILD_PATH_PREFIX_MAP. So it's very similar to the SOURCE_DATE_EPOCH environment variable that we introduced to lots of different compilers, build systems and so on.
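To make the prefix-stripping idea above concrete, here is a minimal sketch of what a tool does when it rewrites recorded paths with an old-prefix to new-prefix mapping, in the spirit of GCC's -fdebug-prefix-map=old=new. The function name `remap_path`, the first-match-wins rule and the example paths are assumptions made for illustration; this does not implement the BUILD_PATH_PREFIX_MAP encoding or GCC's actual lookup order.

```python
def remap_path(path, prefix_map):
    """Rewrite a recorded path using (old_prefix, new_prefix) pairs.

    Hypothetical helper. Assumption: the first matching pair wins.
    A prefix only matches on whole path components, so "/build/foo"
    does not match "/build/foo-extra/x".
    """
    for old, new in prefix_map:
        old = old.rstrip("/")
        if path == old:
            return new
        if path.startswith(old + "/"):
            return new + path[len(old):]
    return path  # no mapping applies: leave the path untouched


# Two builders using different build directories record the same path.
alice = remap_path("/home/alice/build/foo-1.0/src/main.c",
                   [("/home/alice/build/foo-1.0", ".")])
bob = remap_path("/tmp/bob/foo-1.0/src/main.c",
                 [("/tmp/bob/foo-1.0", ".")])
print(alice, bob)
```

Once both builders map their own source root to the same value, the path embedded in the output no longer depends on where the build ran.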
So we wrote a spec for BUILD_PATH_PREFIX_MAP, and we have written a patch for GCC to support it. Based on our tests, we have managed to make 1,800 packages out of about 26,000 in Debian reproducible under different build paths. So that's about 6 to maybe 7% of the total. Because it's an environment variable, a lot of people are a little bit uneasy about this, because usually, you know, you shouldn't use environment variables too much. It's usually best, if you're controlling a program directly, to give command-line flags directly to the program. So at the moment we're getting some resistance from GCC developers about this. The discussion is still ongoing, but I think this particular environment variable is acceptable, because it's not about giving extra information to GCC. It's about taking away existing unreproducible information that it already has from the environment. So we'll see how that goes in the next few weeks. Golang and rustc, with these options, are probably going to be, in the next couple of months, the first reproducible compilers that we'll see ever. I was hoping, because I'm the maintainer of rustc as well, that rustc would be first, but Golang kind of jumped in and did quite a bit in the last few weeks.
So now we'll talk a little bit about reproducibility tools. So we have a tool called reprotest. It was written by a GSoC student last year. The idea is you can give it an arbitrary build command. It'll run it while varying the time, the username and things like that, sometimes even the kernel personality, and then get the output and run diffoscope on it. So you can use this to test your build processes. You can also run it under a virtual container, like a schroot or something. At the moment it's very Debian-specific; it relies on autopkgtest code.
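The idea behind reprotest, as described above, can be illustrated with a small sketch: run the same command under two deliberately different environments and compare what comes out. This is not how reprotest itself is implemented. The names `run_variation` and `looks_reproducible` are hypothetical, only a few environment variables are varied, and the command's standard output stands in for the build artifacts.

```python
import hashlib
import os
import subprocess
import sys


def run_variation(cmd, extra_env):
    """Run cmd with some environment variables overridden; hash its stdout."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(cmd, env=env, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def looks_reproducible(cmd):
    """Compare the output of cmd under two different environments.

    Assumption for this sketch: varying the time zone, locale and user
    name is enough to expose the leak; a real tool varies much more.
    """
    first = run_variation(cmd, {"TZ": "UTC", "LC_ALL": "C", "USER": "alice"})
    second = run_variation(
        cmd, {"TZ": "Pacific/Kiritimati", "LC_ALL": "C.UTF-8", "USER": "bob"}
    )
    return first == second


# A command that ignores its environment, and one that leaks the user name.
stable = [sys.executable, "-c", "print('hello')"]
leaky = [sys.executable, "-c", "import os; print(os.environ['USER'])"]
print(looks_reproducible(stable), looks_reproducible(leaky))
```

When the two runs differ, the next step in the workflow described in the talk is to hand both outputs to diffoscope to find out which variation leaked into the result.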
We've been working on reducing the diff against autopkgtest. At the moment there's duplicated code for various historical reasons, and then we're hopefully going to deduplicate it and eventually make it distro-independent. So we have someone working on getting it to work on Arch Linux.
So next up is diffoscope. diffoscope is used when we have tested whether some things build reproducibly and found out they do not. So they're different, and we want to find out what exactly makes them different. And diffoscope is good for this because it unpacks archives and converts various formats to something that is easy to read, and it generates a nice output that you can read to find out what the cause of the unreproducibility was. So there has been quite a lot of work on diffoscope throughout this year. One very important aspect is that it now works much, much better on huge files with huge differences, like GCC-sized big files, and you can now control how detailed the output is, so it won't eat up all your memory just displaying it. You can now save the by-product of diffoscope as a JSON file, then tweak how the output looks and use it again.
And there have been quite a lot of speed optimizations, thanks to Lamby, so there have been very good speed improvements, like from three hours to eight minutes. There has also been a progress bar added to diffoscope, so you won't get bored while you wait for it to finish. There are also ways to control how diffoscope behaves on different files: which files are excluded, and what the maximum container depth is, so you now have more control over how it operates. We have added some debugging and logging utilities, so you can better see what's going on. And we have added quite a lot of new formats that it understands, so it now converts them all to something readable, something displayable, and it shows you nice output in the format you find most usable. We also added output formats, so you can use whatever you like. For HTML output there is visual comparison of images that are part of packages, so you can see what exactly is different if there are differences between them. So I haven't mentioned a lot of things, but the development of diffoscope goes pretty fast; there have been a lot of small improvements in it.
We have all of these tools for analyzing or testing for reproducibility, so where we need to go from here is, in Debian at least, to actually prove that the packages we publish are reproducible, and who has been able to reproduce them. We are producing .buildinfo files now, describing the build environment of all binary debs being uploaded to Debian, so it's possible to see now specifically which build machine or which developer built the package and the environment in which the package was built. What that means, and this was the topic of my talk yesterday, is that using these .buildinfo files it is possible now to attempt to rebuild the official Debian packages byte for byte. I expect this should work 94% of the time. The remaining issues we've already observed in the Jenkins test setup. Of those packages, many of them do
have patches available to make them reproducible. Where those have not already been applied by the maintainers, I believe Ximin has started to make NMUs against these packages, starting with the most essential, security-critical or necessary packages as part of the build process, and getting all of those reproducible. This is something that individual maintainers can also help with. Be extra careful now to make sure that you're building Debian packages in a clean virtual build environment; that's relevant for all packages if you build them on your own machine. Wherever possible, consider source-only uploads, and this will help ensure the archive stays clean.
Make binary uploads illegal?
Yes, we should. And that leads into upcoming policy changes. Yeah, we have this bug, that packages should be reproducible, and I've spoken here with the policy team quite a lot about what needs to be in policy to make this real. So we will say that packages should be reproducible. For that we need to define reproducibility, and as Valerie has shown, we have this definition as a base already, so we need to put this in there. We need to explain that it's a very controlled environment: we don't expect that you can rebuild packages reproducibly everywhere, but in a clean chroot with the right dependencies installed. We need to define that there's more: that the build path also needs to be the same, and the build options need to be the same. And we also need to mention that for reproducing you will need .buildinfo files, which need to be accessible, and there will be new packages. And, well, I hope it is doable to write this into a coherent text tomorrow. Even assuming that we all agree, that we as Debian agree that we want this in policy: is Debian really ready for this now? It should be a normal bug: if your package is unreproducible, it's a bug. But this is where we want to go. There are 1,300 packages which are failing at the moment, but we had the /usr/share/doc transition in policy when there were several hundred packages still having their documentation in /usr/doc. So
I would like to ask you to raise your hands if you think we should have this in policy now. Is somebody really against that, and says we should only have it in policy when there are 200 packages left or something? You can also disagree by writing to the bug. Yeah, we want to do it at the beginning of the cycle, and it's short, and Buster will not be reproducible. I still think that there will be essential packages which are not reproducible, or maybe it's just build-essential, so it will still be a buggy distribution, but it will be a way less buggy distribution.
Yeah, and the other thing we need to work on is really user interfaces. There's a bug for apt, with a patch which you can try out today and add to your apt; then apt will warn you if you install a package which is unreproducible. We also need to fix sbuild and pbuilder so that you can rebuild packages in that way, and we need to distribute .buildinfo files. This is all work that still needs to be done.
So, how you can help. First thing, I mean, join our lovely team. We will smiley-face and do this. And as previously mentioned, our team is much bigger than this, across distributions etc., so even if you aren't necessarily a Debian person, please join our team; it's very friendly, etc. As a package maintainer, you can check out the packages overview page, and there's a repro column, which is next to the autopkgtest CI column, which will tell you very quickly which of your packages are reproducible or not, and then you can find out more details on tests.reproducible-builds.org. If you have any outstanding patches etc. against your packages, please apply them and upload them before we NMU them, and then push them upstream. It's probably a bit easier for the maintainer to push patches upstream: there's an already established relationship there, you know, so they're more likely to be accepted, etc. If you have some spare time, please have a look at our outstanding toolchain issues. So obviously fixing one package is very virtuous, but to be divine you
can attack the toolchain issues, which will affect the reproducibility status of many packages at once. In particular, if you have Java experience, as in on the Java compiler itself, there are some interesting Javadoc issues; there are many around there. There are quite a few in TeX and dvips, and if you want to delve into some interesting source, there's an interesting bug that will take you a few weeks. We have two IRC channels, #debian-reproducible and #reproducible-builds, both on OFTC, plus the website and the mailing list. Please come with any questions you have, or any help.
And finally, we would like to thank our sponsors. So the Core Infrastructure Initiative sponsors the work of several of us, including Chris, Mattia and myself, and our funding is only running to the end of the year. We need more funding next year, so that's one part to do. And thanks to ProfitBricks and Codethink for the hardware, plus Debian; there's more arm64 hardware coming. And thank you all for your contributions. We have time for questions.
First, a shameless plug: tomorrow, in my presentation, reproducible builds are very important for embedded development as well, especially in updates, but I'll expand on that later. But the question is: developers both in Debian and outside of Debian, where can we go to find out how to make our software reproducible? What are the best practices, like the best GCC compiler flags to use, and that kind of stuff?
reproducible-builds.org/docs.
So thanks for your talk. Are there any plans for source-only uploads to NEW?
No, but this is FTP master territory.
Because that's sort of the only part that's missing for doing source-only uploads.
You can do source-only uploads today.
Not to NEW.
Not to NEW, that's right. That would be good. It would be great to somehow relax that or work it out, so then we can just go completely source-only.
Yeah, that's definitely on the radar.
Thanks to you all for this effort. I have a specific question, because you said in
diffoscope you are using some visual comparison of images. How is this done?
ImageMagick. It can do some kind of comparison: it just outlines the pixels that are different, and makes a GIF animation that also shows the difference.
Maybe I need to read the docs.
In your bug filing experience so far, have you received pushback against reproducible build patches, or reproducibility in general, and if so, of what type?
So the most general response is cobwebs and tumbleweeds. So that's 80% of the time: the bugs somewhat languish, as many of you may have seen. Only a few maintainers have pushed back in a big way, often to do with severity and things like that. But in general, no; it's mostly apathy at the moment. And so, I mean, hopefully the policy change will help with that, and NMUs typically motivate developers, if I put it that way.
Could we maybe have a room show of hands: how acceptable would it be for you as a maintainer if, at the time of filing the bug with a patch, they would immediately upload to DELAYED/7?
Yes, good idea, yes.
So at the time they file the bug with a patch for reproducibility issues, they would immediately upload to DELAYED/7. Is that a good thing or a bad thing?
Good thing.
We have already announced we will be doing this with DELAYED/15. We could reduce that. And if you don't want us to NMU, please reply to the bug and say no.
So you mentioned that there are 1,200 packages that are not reproducible, and you sort of teased that there are some interesting things with graphics and other things. Are there other qualities that are common among these 1,200 packages that are not reproducible, or are there any juicy stories you can tell us about them?
I think the common thing is that they are harder problems. We looked at them and didn't find obvious things. One thing we have done is categorize the areas of unreproducibility, so we will go through the diffoscope reports and say, oh yeah, this is caused by a build path, or this is caused by graphics,
which means we can then rank these issues by sort of severity of impact: so if one fixed this toolchain issue versus that toolchain issue, you will make 33 packages reproducible. So all this categorization is done already. The number of packages that are uncategorized is actually quite small.
So, about diffoscope: can you please clarify what the current status is? Because I seem to remember a previous release cycle when it was kept out of testing because it wasn't going to be security supported, or was too fast a moving target. Is that still the case, or are there any stable interfaces that we can rely on in stable and so on?
I believe it was kicked out of testing because there was a security issue with libarchive or something like that. It wasn't to do with...
Oh, so that was the difference?
No, it was just that there was an RC bug in some dependency, so it got dropped.
Okay, thank you.
This is a little bit less of a question than it is a comment, but I want to just mention that this reproducible builds effort is a huge effort across all of Debian. It has touched the toolchain, numerous packages, and the entire infrastructure, and it's just been one of the best-run global changes to Debian that I've ever seen in the time I've worked on the project. I mean, I was a little slow to come around to the merits of reproducible builds by itself, but even before I came around to that, just how well the team has worked with the rest of the project made me want to help you, even if I wasn't on board entirely with the goal. So thank you very much for running an amazing project across Debian.
Thank you so much for that. It seems like that's it. Thank you once again for listening to us, and I hope you'll hear from us soon. Thank you.