Hello everybody, and welcome to our talk, Reproducible Bullseye in practice, with our friend Holger Levsen. Let me introduce him quickly. Holger has been a Debian user since 1995 and a Debian developer since 2007. He works in many teams and activities inside Debian, like Debian Edu, Reproducible Builds, LTS and Debian QA, as well as being a founder of the DebConf video team. So enjoy his talk.

Hello, my name is Holger. Thanks for the introduction, Jathan. I will speak about Reproducible Builds in practice. This is actually a live talk, with a live audience of two people. Hello. The goal of the talk is to share and widen the understanding of the status of reproducible bullseye, which I'll go into in detail. Jathan has already covered who I am. He didn't say that I'm located in Hamburg, but that's also the least important part. I miss you lovely Debian people, and I also very much miss being at a DebConf, because this is not fine. This is really not fine. That's all wrong. And "it is what it is", some people say. The pandemic is just a small crisis; I'm way more concerned about the climate apocalypse. But I don't want to go into this and will just focus on the nice world of Reproducible Builds.

So, Reproducible Builds. A small introduction. The problem is really, really simple; I won't tell you anything new. There is source code for free software available, and most people install pre-compiled binaries, and we have no idea whether those binaries really come from that source. That is the problem Reproducible Builds addresses. And that is all the introduction I will give, because we have given many talks: if you go to reproducible-builds.org, there's a video section, and you will find many good introductions to why Reproducible Builds are useful, what the benefits are, and other things. I'll just concentrate now on bullseye, on how to distribute and verify, because so far we have covered the part of how to do reproducible builds, but we have not really distributed and verified them. And usually we had a DebConf.
We always had a section on what had happened since the last year, and usually we prepared this together during DebCamp, and I just didn't have the time for it; it's nicer to do that part with other people. So this part is missing from this talk. Again, if you need an introduction, reproducible-builds.org is our page, and there are many videos linked, papers linked, howtos, lots of stuff there.

So, my goals and wishes for today: to talk about the problem of CI versus rebuilds; the issues with .buildinfo files in Debian, or rather with distributing .buildinfo files; then, the several thousand packages still without .buildinfo files in bullseye; and I will show you what's wrong with debrebuild, our tool to rebuild packages, from devscripts, and some other issues. And all this is not even yet covering the user interfaces and the experience of using reproducible builds as a user. We need to talk about that too, but first we need to solve the other problems. So that's still on the horizon, for later.

First I'd like to share my frustration. Besides the state of the world, my other frustration is that I think I have been warning for some years now that the next Debian release will not be reproducible. So again: bullseye will not be reproducible in practice. And bullseye will be released in a year or whenever, a bit less than a year, I think; frozen in February or something. Unless we, you and me, act now. But I'll show why this is difficult. This slide is actually a slide from DebConf19 in Brazil, where I said stretch was reproducible in theory, but not in practice. That was because at that time the .buildinfo support had only just landed in dpkg, and nothing, or not much, was rebuilt with it, so most packages were not reproducible. Buster: more or less the same. And now we are at bullseye, and it doesn't look too good in my opinion. But the release is still far away and we haven't frozen yet, so there's still a chance to fix this.
This is a thing from Hans-Christoph, a slide or a logo he once made for FOSDEM. I have lots of bugs in this talk, and it's not meant as finger-pointing these bugs at the teams, because in the end it's all of us who need to fix them. I'm pointing this out because people think we, the reproducibility team, can solve them, and we do work on them and can supply patches, but some of these bugs live in teams where we cannot do that much. So I need to list them.

Again, the list of topics, and I start with the first one: CI versus rebuilds. Many people think that Debian is 93% reproducible, and that is certainly a lie, because those 93% are our CI results, where we build packages twice and then compare them and say: yes, you can rebuild this package, it's possible. But we don't compare against what Debian actually ships, because we don't have any Debian infrastructure, or reproducible-builds infrastructure, where we rebuild the official packages and compare them. Yes, they build, but we don't rebuild and compare against the real thing. New York University has a proof of concept, but they are actually rebuilding the Jenkins builds. I made a prototype on jenkins.debian.net using debrebuild, but I got stuck there because of all the bugs in debrebuild. And Arch Linux has rebuilderd, which can rebuild Arch Linux, but because of the issues I will list, rebuilderd does not support rebuilding Debian. They also want to do it, but see issue number four of its GitLab project; it's the same issues I describe here.
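The CI approach described above, building the same source twice and comparing, can be sketched roughly like this. This is a toy illustration under stated assumptions: the `build` function is a made-up stand-in for a real sbuild run, and in the real setup diffoscope, not `cmp`, explains any differences.

```shell
# Toy sketch of the CI approach: "build" the same source twice and
# compare the results byte for byte. In the real jenkins.debian.net
# setup each build runs under sbuild with a varied environment
# (time, locale, build path) and diffoscope analyses any difference.
build() {
    # deterministic stand-in for a package build
    printf 'binary built from %s\n' "$1" > "$2"
}

build hello_2.10 first.deb
build hello_2.10 second.deb

if cmp -s first.deb second.deb; then
    echo "reproducible in CI"
else
    echo "differs: run diffoscope first.deb second.deb"
fi
```

Note that passing this check only shows the package *can* be rebuilt identically under variation; it says nothing about the binaries Debian actually distributes, which is exactly the CI-versus-rebuilds gap.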
Also, rebuilderd is written in Rust, which is a good idea because Rust is a good language, but on the other hand all the Debian infrastructure is written in Python and Perl. So if you want to verify Debian, you need to rely on Python and Perl anyway, and at the moment we don't need to rely on Rust; putting Rust in as a core piece of the rebuilder means we would need to rely on and trust Rust as well. So I think it's probably better to do it in Python, but that's a detail. We have also discussed, or thought, that it would be worthwhile to have the rebuilders as part of the official buildd network, but that is so far just an idea. It's an idea I wanted to share with the buildd people: we want to work with you, and I would like to hear your opinion; it's an idea we need to talk about, to see how it could be possible. I'll get to all these issues with rebuilders in a bit, but first the issues with .buildinfo files.

The idea is: you have the source, you take a .buildinfo file, which lists the exact dependencies that were used in the original build, you do a rebuild, you hopefully get the same hash for the package, and then you're happy. So, there are .buildinfo files, and at first we had buildinfo.debian.net, and now we also have buildinfos.debian.net, and we should clean this up. To explain the difference: buildinfo.debian.net, the singular version, allows submissions from everyone. Jenkins submits there, NYU submits there, and the buildd results, the official builds, are fed into it. So if you query it for a .buildinfo file, you get lots of .buildinfo files, and it's hard to get the right one.
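The rebuild-and-verify idea just described can be sketched as follows. The .buildinfo content here is a fabricated minimal example (real files are generated by dpkg and also record the full build environment), and reusing the same file stands in for a bit-identical rebuild.

```shell
# Sketch of rebuilder verification: a .buildinfo records the checksums
# of the binaries the original build produced; a rebuilder rebuilds from
# source in the recorded environment and compares checksums.
# We fabricate a tiny "package" and a matching minimal .buildinfo.
printf 'pretend this is a .deb\n' > hello_2.10-2_amd64.deb
sum=$(sha256sum hello_2.10-2_amd64.deb | cut -d' ' -f1)
size=$(wc -c < hello_2.10-2_amd64.deb)

cat > hello_2.10-2_amd64.buildinfo <<EOF
Format: 1.0
Source: hello
Version: 2.10-2
Checksums-Sha256:
 $sum $size hello_2.10-2_amd64.deb
EOF

# A rebuilder would now rebuild from source; here the same file
# simulates a bit-identical rebuild. Compare recorded vs. actual.
expected=$(awk '/^Checksums-Sha256:/ {f=1; next} f && /^ / {print $1; exit}' \
           hello_2.10-2_amd64.buildinfo)
actual=$(sha256sum hello_2.10-2_amd64.deb | cut -d' ' -f1)

if [ "$expected" = "$actual" ]; then
    echo "REPRODUCIBLE: checksums match"
else
    echo "NOT REPRODUCIBLE: checksums differ"
fi
```

The hard parts the talk goes on to list are exactly the steps this sketch skips: finding the right .buildinfo file, recreating the recorded environment, and actually performing the rebuild.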
So I set up buildinfos.debian.net, which just copies the .buildinfo files from ftp-master. On ftp-master they are stored in a build-date structure: there's 2020, then 08 for August, then 27 for today, and in there are all the .buildinfo files for today. So if you look for some .buildinfo file, you need to know when the package was built, which is a bit cumbersome. So there's also a traditional pool structure: there's pool, then 'd', then 'dash' in there, and there you get the .buildinfo files for dash. That is nice, but it's an unofficial service; in my opinion there should be a debian.org machine serving these .buildinfo files. This started last year, and in 2019 there were almost a million files, so by now there are probably over a million. And it's not really big, 12 gigabytes of files, 4 gigabytes of links; the issue is the number of files, so you need inodes, but it's still not a big amount of data. Then there's this bug, which I worked around with a cron job of my own, but we really would like to have the .buildinfo files in the archive, or a buildinfo.debian.org server, some tool with debian.org officially distributing .buildinfo files. Currently debian.org doesn't do that; they're just on ftp-master, and only Debian developers can log into ftp-master and get them out, so they are not public.

The next issue is security.debian.org, because that runs on a restricted host, because of, what is the term again, embargoed issues, so not everybody can log in there, and so the build logs and the .buildinfo files are not copied to ftp-master. They are copied at point releases, but not during the time when you want to check the new security update. Ideally you want to verify a security update before you install it, and you cannot, because they are only published at every point release. And the same is now true for LTS .buildinfo files, because they are also on the security host, so they don't get out. And stretch LTS now has a dpkg which does support .buildinfo files, so it would be worthwhile. So: some .buildinfo files are not available, and the rest are in a vague structure.

Then at the last DebConf, in Brazil, David Bremner came up with builtin-pho, yet another naming variation, which is a nice Postgres database. With it you can actually do queries: say you want the dash package in a specific version for amd64, and you can query which .buildinfo files match. It's still not that easy, because a .buildinfo file can be for several architectures, it can cover amd64 binaries and arch:all binaries, but it's okay, you can query them. And this buildinfo database from David needs an ftp-master-like directory structure, so you can only use it if you have all this data on your host anyway. That is all, hmm.

So I had the idea of including the .buildinfo files as part of the binary packages, and that is something Arch Linux has implemented; it's their way of distributing this information. In Debian, if you build a source package, you get the binary .debs, you get a .changes file, and you get a .buildinfo file, and to rebuild, you take the sources and the .buildinfo file and you recreate the binaries. If the .buildinfo file were part of the binary package, it could not contain the checksum of that binary package, because including it modifies the package. So you would need to do the checksumming yourself: you would download the binary package, extract the .buildinfo file, rebuild, and then compare that the rebuild produced the exact same thing. It would solve all these problems about distributing .buildinfo files: we wouldn't need a separate machine, and it would be quite straightforward. Our initial design was to have the .buildinfo files separate, and Arch Linux thought about it and decided, no, it's better to do it the other way around. That was two or three years ago, and I was a bit against it, because it went against our design principle, but over time I haven't seen real disadvantages, and I see the huge advantage that the .buildinfo file is distributed nicely with the packages. It's on all the mirrors, with no extra impact on the mirrors; the binary packages only become a bit bigger in size, and it gets compressed; and they are easily mirrored because they are mirrored with all the packages. If there's a security upload, the .buildinfo file would be immediately available with the upload. So I think that would be good. Right before this talk I sent a mail to Guillem, the dpkg maintainer, and asked him, and he said it is difficult from the implementation side in dpkg, and maybe it needs another rework if we go down this path. I have not discussed this with the buildd team, but I think it's worth considering whether we want to switch, as we need to solve all these other problems anyway. I don't know; I would really like feedback on this, and I'm happy to discuss and explain this thought more. And yes, it's difficult with the current dpkg design.

And then we have thousands of packages without .buildinfo files in bullseye, mostly arch:all packages. Ivo from the release team thankfully scheduled a lot of binNMUs: right after the release of buster, in the first week of bullseye, he scheduled something like 3,000 binNMUs, which was really nice. But binNMUs for arch:all packages are not possible. So the question really is: shall we do mass NMUs for these, I don't know, two or three thousand packages? I could write the for loop now and upload the stuff, but I'm not sure whether this is acceptable, socially or technically. Socially, I would like to hear your opinion. Or what else shall we do? It's around 3,000 packages, and 3,000 proper NMUs is really a lot of work. One could also say they get kicked out of bullseye; I don't know. And again, there is a bug for this. So this is also a question I would like to hear your opinion on.

Then we have debrebuild, which was written by josch; it's part of the devscripts package, it's written in Perl, and it has some bugs. There's no man page and no help option. It can only deal with unsigned .buildinfo files, but most .buildinfo files nowadays are signed, so you always need to remove the signature first; not a big deal, but it should accept signed .buildinfo files. Then, debrebuild takes the .buildinfo file, looks at the versions of the base packages, decides it needs, say, a stretch or buster base, and then installs the packages; but for some reason it sometimes downgrades packages, and downgrading is not supported and normally not done, so I think it should really just not do that. And there are more bugs. For binNMUs: a binNMU in Debian takes a source package and modifies debian/changelog, and then this modified changelog is thrown away; it's not part of the source package. Well, it's not entirely thrown away: the modification of debian/changelog is put into the .buildinfo file. So if you want to rebuild a binNMU, you need to take the sources, take the .buildinfo file, cut out the changelog part, put it into the sources, and then rebuild. And debrebuild should do this: if I give it a .buildinfo file for a binNMU, it should see there's a debian/changelog entry, add it to the existing changelog in the sources, and then build. There's also a small related bug: debrebuild only creates a command line, it doesn't actually do the rebuild, it creates a command line for sbuild, and this command line is wrong for binNMUs, which is an easy fix, but still, a fix is needed. debrebuild also fails to download some packages from snapshot.debian.org; I'm not sure why, because the packages are there.

More bugs in debrebuild, wishlist bugs now. debrebuild creates lots of output, and if you want to feed the output into sbuild, you only want the sbuild-relevant output; that's one bug. Then, the command it constructs runs lintian with sbuild, and that is useless, because we don't want to run lintian, we know we want to rebuild this package, so it should not use lintian. And it doesn't actually explain how to use snapshot.debian.org; I've done a prototype in the Jenkins scripts, so I figured this all out, but it would be nice if debrebuild from devscripts explained this to you. More wishlist bugs: debrebuild expects you to have sbuild set up, and it would be nice if it had a standalone, or one-shot, mode where it would set up sbuild for you and then remove it again. That is possible with sbuild, but debrebuild doesn't do it, and for just reproducing one package that would be nice. Also, sbuild doesn't download the sources, so you need to download the sources yourself, and that is sometimes difficult; the .buildinfo file names the source, so debrebuild should handle this as well. And because at the moment we still have variations depending on the build path, the approach for now is to just rebuild in the same path; sbuild supports this, and the .buildinfo file records the path, but debrebuild doesn't set this option. So if you know Perl, please have a look at debrebuild, please. And this is just from me looking at it a bit; I'm sure there are more bugs in it. Besides all that, it works nicely, except when it doesn't: at the moment, with all these bugs, I could reproduce something like 5% of the packages, and that is not good.

Then there are more issues, but these are the minor ones. This one is annoying: if you do a source-only upload, the resulting file is named _amd64.buildinfo, which causes a problem when the real amd64 build comes along, and the security team is not happy about it. Then, the build daemons use a tainted build environment: they have a file in /usr/local, and dpkg now complains and records in the .buildinfo file that the build environment is tainted, and it just looks bad if the official builds have a tainted build environment; actually there's good progress in that bug, and I'm quite hopeful about this one. And in my opinion binNMUs are still a very fragile concept; even if we solve this with modifying the changelog, I think there's a better way to do it, but that's further down the line. Equally further down the line, there's a bug against apt saying it should warn when packages are not reproducible, but at the moment we could only do that based on CI results and not on rebuilder results, so this bug is far away.

And now the good things. We are very happy that the release team changed testing migration so that it is now blocked for binary uploads: you need to do a new source-only upload, or it doesn't migrate to testing. That was a nice move from the release team. Some people are unhappy about the new requirement, which I think is just a case of needing improved tooling, but in general we are very happy: in theory, bullseye should not have any builds made on developer machines, which means we are guaranteed to have .buildinfo files and a known build environment. That is good. We also like the idea of accelerating testing migration for reproducible packages, which is something the release team talked about; we talked about it last year in Brazil, and I wanted to remind people of the idea: not to punish people for unreproducible packages, but to give people a carrot for reproducible ones. Similarly, "packages must be reproducible" is too early for Debian policy, but maybe it's time for "packages must not regress": if they were reproducible, they must not become unreproducible. We will need exceptions for that, because we want security updates even if they are unreproducible or introduce unreproducibility. But anyway, we need rebuilders and all that infrastructure before this "must not regress" can be enforced.

So, to sum this all up: fixing debrebuild should be rather straightforward if you know Perl, if you have time, and if you know the Debian archive, and there should be some such people listening. I only know very basic Perl.
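One of the debrebuild bugs mentioned above, that it only accepts unsigned .buildinfo files, can at least be worked around by hand: strip the OpenPGP clearsign wrapper before feeding the file in. The signed file below is a fabricated minimal example; in practice gpg can unwrap a clearsigned file properly, and this awk sketch only illustrates the file's structure.

```shell
# Strip an OpenPGP clearsign wrapper from a signed .buildinfo (sketch).
# The signed file here is made up; real ones are signed by the buildds.
cat > signed.buildinfo <<'EOF'
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Format: 1.0
Source: hello
Version: 2.10-2
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEE...
-----END PGP SIGNATURE-----
EOF

# Keep only the body between the armor header block and the signature.
awk '/^-----BEGIN PGP SIGNED MESSAGE-----$/ { inhdr=1; next }
     inhdr && /^Hash:/ { next }
     inhdr && /^$/     { inhdr=0; body=1; next }
     /^-----BEGIN PGP SIGNATURE-----$/ { exit }
     body { print }' signed.buildinfo > plain.buildinfo

cat plain.buildinfo
```

Of course the signature should be verified, not just discarded; a real workaround would check it with gpg first and strip it afterwards, and ideally debrebuild would do both itself.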
Distributing .buildinfo files is really hard on the one hand, and on the other hand it's also crucial, because without .buildinfo files we cannot have reproducible builds, and if we don't have .buildinfo files for security updates, yeah. And then we also need to discuss how to do rebuilders, which rebuilders to trust, and how many rebuilders to have; but first of all I want one rebuilder, with Debian rebuilding its own packages. We're not doing this yet, because of the bugs in debrebuild and the missing .buildinfo files, and this is why I'm not looking forward so positively to this. Maybe we can still fix this.

For now, I would like to thank you for listening, and for all your contributions. Do you think reproducible builds should happen? If so, please take one of these bugs and help fix it. If each of you fixes one bug, and there are only about 20 in this talk, so if you fix one, and you fix one, and maybe I fix one, we could make it. This is the only other URL: wiki.debian.org/ReproducibleBuilds, where all these bugs in Debian are listed. If you generally want information about reproducible builds, go to our project web page, reproducible-builds.org, which has lots of info; if you just want the Debian bugs, they are on that wiki page. Thank you.

Thanks a lot, Holger. So, time for questions and answers; let's start with the first one. It seems to me that a bunch of the blockers for Debian being reproducible are hard bugs, GCC bugs and so on. How many of us Debian developers work on that?

It's hard to say how many Debian developers work on reproducible builds, because many people, whatever, 50 or 70, have worked on their own packages or on their toolchain, and there's a bunch of people who constantly work on it; I would say it's between 5 and 10 people.

Okay. What could Debian do to have infrastructure to rebuild Debian packages?

Well, what could Debian do to have the infrastructure... I think at first it's not a question for Debian as such: it doesn't matter whether it's a debian.net or debian.org machine, and we have hardware. But these bugs about distributing .buildinfo files are mostly the FTP team's domain, as this runs on ftp-master, so only that group of people can do it. Or, if we change the implementation and put the .buildinfo file into the binary packages, that's something the dpkg maintainer needs to do. So we are stuck here. Yeah, we're stuck here.

Okay, the next question: if we were to not ship packages that are not reproducible, would we still get a working distro?

No. There are some core packages which are not reproducible. I don't know which ones offhand, but we have 3,000 key packages, which are not even a complete distro, and out of these key packages something like 150 are not reproducible. What we need to do is fix both: we need to fix the packages so that they become reproducible, but we also need to fix the infrastructure, like I said, because when there's a security update, the .buildinfo file is not published until there's a point release. So we need to fix both the infrastructure and the packages.

Right, the next one: should we remove packages from testing that come from old binary uploads?

What was the question again?

Should we remove packages from testing that come from old binary uploads?

No; as long as they were uploaded with a .buildinfo file, that's totally fine. The problem with these arch:all packages is that we have arch:all packages which were uploaded before the year 2016, when there was no .buildinfo support in dpkg, and these packages from before 2016 are the problem, and there are 3,000 of them, 3,000 source packages. These cannot be binNMUed; there need to be source uploads for them. So what I could do is create a list of these 3,000 packages, do a for loop, download the sources, add an automatic changelog entry, "rebuild for reproducible builds" or whatever, and do 3,000 uploads. But that would rely on GPG, on me downloading the right sources and so on, and I'm not sure if I should do that. If people tell me I should, I'm happy to create this for loop; that's very simple.

Okay, our next question. Well, it's kind of a comment at the beginning: yay for source-only uploads, and thank you for bringing up this point in the talk, by the way. Is there any plan to allow source-only uploads for the initial upload of a new package, or are there still too many technical issues to overcome, for example routing to buildd.debian.org and uploading the results to the NEW queue?

About this requirement that there need to be binary uploads for the NEW queue: I have no idea why this is the case. I don't think it is needed, but this is something which is not really related to reproducible builds, and I'm not involved in running ftp-master; I don't know.

The next question: how can newcomers help with the Reproducible Builds project in Debian?

For one, we have an IRC channel and a mailing list, so you can join there. We have a thousand bugs with patches in the BTS, so you could also do an NMU campaign and do lots of uploads; there are more than a thousand patches for packages which need uploads to Debian. And then we also have lots of categorized issues where we know what kind of problem it is but we don't have the solution, so you could try to find these solutions.

Okay, this is a comment from Andreas Tille: "I don't work on it actively, but I'll try to upload a package with a patch concerning reproducible builds in the next 24 hours." Another comment of support, and a question: should we consider checking reproducibility on a per-binary-package basis rather than per source package, to give a more fine-grained view of reproducibility?

Can you repeat the last part of the question again, please?

Yes, of course: should we consider checking reproducibility on a per-binary-package basis rather than per source package, to give a more fine-grained view of reproducibility?

I still didn't get it, sorry; I lost the pad. If you could just point me to the pad again on IRC, I will look at the pad myself and see the question there. Can you put the pad URL on IRC?

Yeah, done.

Thank you. Which question is it, and from whom?

It's the last one: whether we should check reproducibility on a binary-package basis instead of on a source-package basis.

Well, we need to do the rebuild on a binary-package basis anyway, because for a rebuild you need to look at the binary package and the .buildinfo file it came with, and one source package can be built on several architectures, because it can produce binary-any and arch:all packages. So, for example, the i386 binary packages are built on i386, but the arch:all packages are built on amd64, so you need to download both .buildinfo files and then do both builds. So in a way we will be checking the binary packages, not on a source basis. And this also makes it difficult: we have a source-package view and a binary-package view, but for rebuilding we need to use yet another view, and it's more or less arbitrary where things get built, so we need to look into the .buildinfo file to see which binary packages were built where, and it can be different per package, per upload, per version.

Let me see if we have more questions... I think these are all the comments and questions. Would you like to add something else?

Well, I'll put the slides on the talk page, and I'm happy to discuss the slides or the content of the talk via mail or IRC. I think I slightly prefer mail, to the reproducible-builds list, at the moment, because it's the rest of summer and I might be outside a bit more in the next days, so mail might be better. But I'm very curious about your feedback, especially from the FTP team, the release team, the dpkg maintainers, people who know. I would love to see patches for debrebuild, and I'm glad I had the opportunity to share this with you, and I hope we'll get a better reproducible bullseye than I fear. Thank you.

Thank you very much for all your efforts and work on Reproducible Builds, Holger, and for this wonderful talk. Thanks a lot.