Hello! This will be the talk about reproducible builds, presented by Steven, Chris, Jartan and myself, Reuter. I'll just leave the stage now for Jartan.

Hi everybody, welcome to our talk. We will present "Reproducible Buster". As you know, the purpose of reproducible builds is to enable anyone to reproduce bit-for-bit identical binary packages from a given source. Our project goals are to enable anyone to verify that builds give identical results, and also to change the meaning of free software: it is only free software if it is reproducible.

During the last months we have given talks at BornHack, All Systems Go, All Things Open, AllSep, HackMeet, FreeNode Live, CubaConf, the Open Compliance Summit, linux.conf.au, FOSDEM, SCALE, NLUUG, LibrePlanet, Easterhegg, MiniDebConf Curitiba, foss-north and FLOSS UK, and reproducible builds were mentioned in several talks at 34C3.

What's new since DebConf17 in Montreal? We have done the migration to Salsa. We also held the third Reproducible Builds summit, in Berlin, and after some discussion and a vote we have a new logo for our team, though not yet with the final typeface and colours. And we will soon have t-shirts, which we have wanted for more than a year.

A few other things have happened since DebConf17 in Montreal. In upstream GCC, a patch was merged that adds an option called -fmacro-prefix-map. In our use case, this is about build paths being embedded in binaries. In this small example, see the source file's use of the __FILE__ macro: if the compilation process calls the compiler with the full path to main.c, then the current working directory ends up embedded in the binary. So depending on who builds the package, they will always get a different result from someone else. In the case of Debian builds, the build path is random every time. This opens up a whole class of reproducibility issues in packages.
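To make the effect concrete, here is a toy sketch of the problem (hypothetical paths, and a fake "build" function standing in for a real compiler): because the source path leaks into the output, the binary's checksum depends on the build directory.

```python
import hashlib

# Toy model (not real compiler code): the "compiler" embeds the source
# file's path, as __FILE__ would, into the output, so the artifact's
# checksum depends on the (randomized) build directory.
def toy_build(build_dir: str) -> str:
    binary = b"machine code... " + f"{build_dir}/main.c".encode()
    return hashlib.sha256(binary).hexdigest()

# Two honest rebuilds of identical source, in two randomized build
# directories, yield binaries with different checksums:
assert toy_build("/build/pkg-K3v9") != toy_build("/build/pkg-Q7x2")

# The same build directory, by contrast, reproduces the same checksum:
assert toy_build("/build/pkg-K3v9") == toy_build("/build/pkg-K3v9")
```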
But this new option in upstream GCC allows a kind of search-and-replace on the path wherever it is used in macros, changing it to something that is identical between different builds of the source file.

(We're four people giving this talk, so we keep taking over from each other.) The problem with this build path issue is that for Buster we just say: to reproduce a package, use the same build path it was originally built with, because otherwise we wouldn't have the 93% we're at now, we'd be somewhere in the 80s, since this bug is not solved, not in GCC and not in many other compilers. So for Buster we say: just use the same build path to reproduce as the original build. But that's bullshit. That's why we don't want to keep doing this, and that's why people are dealing with these GCC build path issues. It's a long-term goal though, maybe not even bullseye, I guess. It will take some time, because other compilers have the same issue. But we want to fix this properly, so we keep discussing it and chipping away at it.

Then there was a bug report late last year where, thanks to a package being reproducible, it was possible to test: if I drop a particular build dependency at build time and rebuild the package, does it make any difference to the binary output? Because if it doesn't, then it doesn't need to be there as a build dependency. This is one possible application of reproducible builds that wasn't really intended or much thought about: some kind of automated QA would be possible there, to find out whether a build dependency is really necessary. If it's not, that could mean it's wrongly listed in the build dependencies, or it could mean that something has compiled wrongly, not used a build dependency, and not enabled a feature that it should have enabled.

Reproducible builds extend way beyond Debian itself. For example, the Reproducible Builds summit brought together people from many open source projects; it's not at all limited to Debian.
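What the new option does can be sketched roughly like this: a Python emulation for illustration only, not GCC's actual code, with simplified flag semantics and invented paths.

```python
def expand_file_macro(path, prefix_maps=()):
    """Emulate __FILE__ expansion under -fmacro-prefix-map=OLD=NEW:
    a matching path prefix is rewritten before the macro is expanded."""
    for old, new in prefix_maps:
        if path.startswith(old):
            return new + path[len(old):]
    return path  # no mapping: the raw, per-build path leaks into the binary

# Without the option, two builds in randomized directories embed
# different strings, so the binaries differ:
assert expand_file_macro("/build/pkg-K3v9/main.c") != \
       expand_file_macro("/build/pkg-Q7x2/main.c")

# With each build mapping its own build directory to ".", both builds
# embed the identical string and become reproducible:
a = expand_file_macro("/build/pkg-K3v9/main.c", [("/build/pkg-K3v9", ".")])
b = expand_file_macro("/build/pkg-Q7x2/main.c", [("/build/pkg-Q7x2", ".")])
assert a == b == "./main.c"
```

The idea is that each build passes a mapping from its own (random) build directory to the same fixed string, so the path that ends up in the binary no longer depends on where the build happened.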
There's been a lot of interest and activity in other projects. Arch Linux announced that, with a modified version of their package manager, they were able to reproduce at least 80% of their packages; openSUSE 93%; NetBSD and FreeBSD potentially 100%, with the right configuration options for the build. Tails have gone even further: with their 3.3 release, and again in 3.6.1, they made the whole installation binary media bit-for-bit reproducible, which is sort of the ultimate goal. Similarly, OpenWrt can do this for at least some of the images they produce. And there's plenty more interest from other projects that are looking to promote the fact that they support or enable or help with reproducible builds in some way.

Shortly after DebConf17 in Montreal, it was accepted into Debian policy that packages should be built reproducibly, but that's not a hard requirement, of course.

So what's still missing in Debian? I'm the one telling the bad news. The d-i (debian-installer) images are not reproducible, and nobody is working on this. So if you want to get involved in reproducible builds and have some spare cycles, look at why the d-i images are not reproducible. That's a rather easy task: build d-i twice, run diffoscope on the results, and fix the problems.

The other thing: the Debian CD images could also be made reproducible, just like the Tails ISO. They will still contain unreproducible packages, but they could be reproducibly assembled into the same images every time.

Another thing we are not doing: we don't compare against packages from the archive. We only compare packages we built with packages we built, not with the real ones in Debian. That should be fixed, and it would help us detect when maintainers build in unclean environments.

And then, funding has an impact, or rather the lack of funding does: some of us used to get paid by the Core Infrastructure Initiative, and that funding ran out at the end of last year.
So since then we have had less progress on new developments: things like the GCC work, things like comparing packages against the real archive. Lots of progress is slowing down, and some things are even going backwards. It's not only Jenkins: there are 54 hosts running these tests, and there are some issues on i386, some issues here and there, and no time to fix them.

There's an impact on collaboration and community too. The summit in Berlin needs to be prepared soon, in two or three months, and somebody needs to do it. I did it the last three times, and I'm not sure if I'll have time, or whether I'll go work somewhere and make money. And we keep up the weekly blog, the blog about our successes every week, so people think, oh, this is going nicely, nicely, nicely, and don't realise that it's going slower and slower and not so well. So we don't have funding; it would be interesting to get funding to keep this work going, because it's still a lot of work.

And then Debian is doing other things wrongly. This 93% is a lie. We need infrastructure, we need processes and policies. Not just the policy we have at the moment, that packages should be reproducible; there should be a policy that they must be reproducible, and the security processes have to be in place, these things. What we have at the moment is testing, but it's only a QA thing, and we have this weak goal that packages should be reproducible; it's not a must, so who cares.

There's an upcoming list of bugs, and with it we don't want to point fingers at teams, but rather make clear what's missing, because only if it's clear that some things are not there can we fix them; if we think everything is fine, we cannot fix it. And most of these things we cannot do alone. Even though we have an FTP team member on board, he cannot do everything either, and it must come from Debian, as I think you will see.
So one of the major blockers (and it's unclear whether it's an sbuild or a dpkg issue) is that when you do a source-only upload, an amd64 .buildinfo file is usually produced, because you built on amd64, and that gets uploaded to the archive with the .changes file. Then the binary build happens, another amd64 .buildinfo file is created, and that's uploaded too, and dak cannot handle that. There could be several ways to solve it: dput could just not upload these files, or dpkg could name them differently, or something else. Something needs to happen there, and there needs to be consensus on this bug first.

Then we have this wonderful problem with binNMUs, mtimes and rsync. When a binNMU is made, a new debian/changelog entry is created, but it's not put into the source, and then the package is built with this changelog entry, with the same SOURCE_DATE_EPOCH as before, producing different files. The files are not the same, but the mtime is the same, because of the same SOURCE_DATE_EPOCH, so everything thinks they are the same file, which for example causes problems for backups. This probably needs a redesign of how binNMUs are done. That bug is, I think, at the moment also assigned to ftp.debian.org; there was almost consensus on the bug, and then the discussion restarted, in the middle of two years ago, I would say. Maybe read this bug and reply, especially if you're involved in the buildd infrastructure. I think this also causes problems for multi-arch, so it's not only backups and reproducibility: multi-arch is also broken because of this.

Then we have a bunch of problems with .buildinfo files. One bug is about putting the .buildinfo files on buildinfo.debian.net, so that we have them outside of FTP master. They also have to be put on coccia, I think, so that DDs can get at the .buildinfo files, but the general public cannot. I spoke with Ganneff about this yesterday, because I knew he would leave today, and in general Ganneff said: send patches. They merged 15 patches last week, or this weekend, so they are happy to merge patches. And there's the tooling that sends information to the BTS, which could probably be adapted to do the same for the .buildinfo files.

We also want to include the .buildinfo files in the archive itself; having them on buildinfo.debian.net is just a workaround, but an easier one. So we have these two bugs, and you see that one bug is way older. Then we have the problem with security updates: security is an embargoed host, so things are slightly different there, and the bug also needs to be separated into the buildinfo.debian.net part and the archive part. But we need this to have reproducible security updates, which we could have had as a feature since stretch, because the toolchain supports it, but the infrastructure doesn't. So this is another interesting bug.

And yeah: stretch was the release that was reproducible in theory but not in practice; we had the patches in, but we didn't rebuild the packages. Buster is "should be reproducible", as policy says, but we are not reproducible. And bullseye is then maybe the release where we still haven't made it; I would be very sad if that becomes real. So I'd rather say: Buster is still not released, it's not even frozen, there's at least half a year to work on these things, so maybe we can make Buster a lot better.

I think we have plenty of time for questions. We have two microphones. Do we have questions? Do you think reproducible builds are useful? Do you think we can do better for Buster than the picture I just painted black? No comments? Nobody wants a microphone?

Maybe more a comment than a question: from your talk I understood it's first of all a question of manpower and some support, and also a question of community agreement that we should do this, that everybody does their own part, but also that we have some common goals, like what we do with .buildinfo files, how we change binNMUs, and things like this. Is that correct, or did I misunderstand something?
I think that's correct, yeah. That sounds right: it's actually much more about policy, or about complex structural problems in the infrastructure, than about individually working through individual packages and fixing them.

That said, there's still a lot of work on individual packages as well. We still have 500 non-applied patches; there are 500 NMUs to be done. So if you want to NMU packages, at 5 a day you can do so for the next 100 days; if you just do one per week you're helping, so please consider doing one per month.

Question: I was talking to you in Montreal, as some may know. I maintain the Thunderbird package, and it's a huge package, and I'm not able to get it reproducible, for lack of time and lack of knowledge. There must be more packages like that. How can we improve the situation on such things?

I don't have a really good idea to solve this. I certainly get what you mean: there are quite a few packages in that category, and quite a few toolchains in Debian in that category. Where do you even start with something like Firefox, Thunderbird, etc.? Have you spoken to upstream at all? Because I think if you got buy-in there, they could perhaps coordinate some of the effort between other distributions.

Part of the problem I see: we have had luck with Mike, working for Mozilla, and with Sylvestre also, but my communication with both of them has been going slightly badly; there is no communication currently. I've asked both of them about the new ESR versions and got pretty much zero feedback; it's quite difficult to get in contact with them. I think Mozilla, and especially the Thunderbird team, is really a team you can probably get in contact with easily, and they are friendly about answering my questions, but mostly these problems need to be solved by the Firefox core team, because Thunderbird uses, I think, 90% Firefox code. Even they were barely able to get the EFAIL bug fixed just in time; for the rest, there is pretty much no personnel on that side.

That highlights another problem: Firefox, or rather the Tor Browser, was made reproducible in 2012 by the Tor Browser people, and we had Firefox reproducible in our test setup, and then it was not reproducible anymore, because the Firefox toolchain was not reproducible. So once we have achieved reproducibility, we still need to keep testing and make sure it stays continuously reproducible. And for these big packages, upstream is the thing: I don't think Debian maintainers can solve all upstream problems, whether reproducibility-related or otherwise. Sometimes, if you cannot do it, you just wait for somebody else to do it.

What we can do as a project is keep this list of important packages: the base set, the required set, the CD set. Then we can see: okay, we have 25,000 packages, and 93% means there are about 2,000 unreproducible packages, but the basic install has only 2,000 packages, and out of those only 50 are unreproducible, so let's tackle the first 20 of those 50 first, and get there step by step.

I think what we can do is perhaps help you convince upstream to make it a higher priority. Sure, they're going to work on EFAIL first, and then perhaps new features, but presumably they're not against reproducible builds; no one is against the idea. Who's against it? Yeah, sure, I'd vote for heaven too. But if we can help move it up their priority list and make it a bit more of a concern for them, I think that's something we can help with, because that's fairly generic information that would work for Firefox, Thunderbird, LaTeX, whatever.

Yeah, I fully agree. Maybe we can build a place where, like Holger mentioned, we can collect such packages. It can be problematic, but as I said, I think there must be more packages like Thunderbird which need interaction with upstream; even if I have fixes, we need to go upstream. Of course, we have such a
place where we collect these lists already: there are package sets on our tests web pages, sets like GNOME, the first CD, packages installed on debian.org machines, packages here and there. I think perhaps the notes git repository might be the other place that's more suitable for particular ones.

Question: Hey, I don't know if anyone has thought about or talked about this before, but could we make a CI/CD system where a package only goes in if it's reproducible? Like continuous integration and delivery systems: you upload a package, it is tested for reproducibility, and if it's not reproducible, it doesn't go in. You know what I mean?

Yes, you want to prevent packages from entering unstable when they are unreproducible. But then we wouldn't have Firefox, and probably not the Linux kernel either.

Maybe not for unstable, but for some packages? If it's Debian policy, we can enforce it right now.

Well, at the moment policy says "should", so it's just a normal bug. I think we must get policy to say that packages must be reproducible, but even then the release team will override these bugs and say: ah, we want to release with Firefox, let's ignore this RC bug. And I'm thankful the release team would do that, because I want Firefox, I need Firefox, or whatever.

I think the bigger question is: we could do that, it's technically feasible, but does it really help, given that we've already got buy-in on the problems? Magically making it impossible to upload to sid if your package isn't reproducible doesn't really solve these infrastructure problems, and those are our real blockers at the moment, if you see what I mean. No one is against having reproducible packages, so I'm not really sure what it would help.

Apart from having an archive without unreproducible packages: in the sense that if we upload only packages that are reproducible, maybe the packages in the archive will be reproducible by default, you know what I mean? We wouldn't have to test against the archive.

I don't quite get it, sorry. If you're proposing that we test against the archive instead of testing only against our own builds: yes, we should do that, but this is the part where, as I said, with our funding gone, our progress on new things has slowed down. It has been on the to-do list for more than a year. Since December 2016 the archive produces reproducible packages, so we could do it, but we never got around to implementing it. That would still not prevent unreproducible packages from getting in, but it would give real results, which would definitely be better; we should do that. It could be something of an opt-in, so that a package maintainer could say: if, when I upload, my package is no longer reproducible, then don't accept it.

Question: From the maintainer's point of view, what is the status of the build tools with regard to reproducibility? If I build a package, do I need to do anything to have the builder tell me whether it's reproducible or not? Do I need to jump through hoops or take additional steps to get that information? Should we work on providing that to maintainers? I'm a maintainer; maybe I don't have much idea about reproducibility, and I don't even test it, because it doesn't happen by default.

Even if it's not yet required (it's a "should", not a "must"): testing for reproducibility increases the build time by at least 50%, and that's one of the reasons those tools don't do it by default, just like pbuilder also doesn't run piuparts by default. What we do have is reprotest in Debian, which is also available outside Debian, I think. reprotest will build a package twice, and if it's unreproducible, it will reduce the number of variations and build it again; it will keep reducing variations until it finds a reproducible build, hopefully, and then it is able to tell you: your package is unreproducible under timezone variation, or whatever the variation causing it is. So it helps you to
find the problem. But sometimes it's not able to find something that helps you, and sometimes, because reprotest uses faketime, packages are more likely to fail to build under reprotest because of faketime being used, not because of the tool itself. Also, in our testing infrastructure we have more or less maximised the variations, but there might be other variations in the wild. That is one of the core, central problems with reproducibility: you can only prove the opposite. You can only prove that something is not reproducible; you can only assume it is reproducible under a variety of environments, until you find an environment where it's not reproducible anymore. So testing for reproducibility is not so easy. Because of this, I also don't think that maintainers must, or even should, test their packages for reproducibility before uploading. I think maintainers should test whether the package builds and whether the package works; reproducibility is usually out of scope for the normal upload process.

Particularly since you can check the status on tests.reproducible-builds.org as well, which takes some of the urgency out of it.

I see a problem there, because if you fix reproducibility issues, you want to confirm the fix, and then you do an upload just so this infrastructure tests it, which is kind of suboptimal. Particularly with the 1,500, or I think it's more like 1,000, unapplied patches: as a maintainer you get a patch and think, well, I'm just going to apply it, I want to see what difference it makes, and without uploading it, it's quite difficult to tell whether it will actually make your package reproducible, because you don't necessarily have the tooling locally to validate that. So it's like: well, great, I don't really want to just cowboy-apply patches. With diffoscope you could check it, if you had the image that gave rise to the stats on reproducible builds. So would there be a way of saying: please preserve the image next time you build this, because I'm just doing a build; and then you could push your built package to the reproducible builds infrastructure for another run through the tests?

You probably wouldn't even need the image, right? Because if I send you a patch for your package, I say: please apply this; once you apply it, you should get a hash of X. If you get the same hash, you've won; otherwise it's probably your build environment.

Yeah, okay, I will do that.

One of the problems, or one example of the problem, is that on our test infrastructure we build with German, French, English and Italian locales to test locale variations, but some packages fail to build with Hebrew or Arabic or Chinese locales, or produce different results; or those languages are fine, but with Finnish it fails. You can only fix what you catch, and we won't do all the builds with all locale variations; locales especially are so confusing and complex that I'm sure there are some bugs we don't catch yet. That's a principal problem.

Question: You said you had some funding issues, so what kind of sponsors are you looking for? Say people are watching you and you want to tell sponsors that this is important work.

I think we didn't prepare much for this question, because we prepared for the Debian audience and what we can fix in terms of technical problems. Nothing really to add. I mean, for sponsors we would probably have to speak to their needs, because they are probably coming from a particular angle that they want to solve for their business, whether that's a compliance angle or a security angle. You probably need different messages: if you talk to 10 people, you're going to want 10 different messages there, because they just have different demands. Just saying "reproducible builds" at them is not really that effective.

Well, for Google it already made build times quite a bit lower.

That's just another example, a third example: you might want to speak to Google and say, oh, we can make your build times smaller, for the, you know, hilarious XKCD reference. But a big manufacturer of equipment, or a manufacturer of medical equipment, will need a different message, because they have a different use case for reproducible builds: they don't care about build time, but it's easier for them to see what changed between two versions of their compiled code.

Yeah, I'm not disagreeing with the use cases. I'm just saying that each potential sponsor will have their own different needs, and so you can't just have a list of a hundred things that it's good for and let them do all the work of figuring out what fits them.

I think what was very good about the funding we had was that it was research funding, and to a large degree this is a research project; many other projects benefit from our research and use it. It's hard to get funding to continue the research. It would be easy to get funding to, say, write a patch for GCC, because we need it; then we could see the goal and fix that one thing. But in my experience this broad funding we had was way better: we found and fixed way more issues with it, and had a wider impact on the community too. So research funding, I think, would be our preferred model.

Is that all for questions? Okay, good, and thank you all for watching!