Hello. My name is Joshua Watt. I'm a software engineer at Garmin, where I've been working for the past 11 years, and I've been using the Yocto Project and OpenEmbedded since 2016. Today, I'm going to talk to you about hash equivalence and reproducible builds.

So, what is hash equivalence? Hash equivalence is an attempt to speed up builds by detecting whether differences to the inputs of a task actually change its output. If multiple task inputs result in the same output, that output can be cached and reused any time any one of those inputs is seen, instead of rebuilding from scratch. Hash equivalence was enabled by default for the Poky reference distribution in the Yocto Project 3.0 (codename Zeus) release, and you can try it out for yourself by adding the fragment shown below to your local.conf file.

To try and understand how hash equivalence accelerates builds, I'm going to give a very simplified example of some tasks in the BitBake run queue. First, we will examine how the run queue works when hash equivalence is not used. In this example, there are two recipes, A and B, and each recipe has two tasks, do_configure and do_populate_sysroot. Dependencies are shown with blue arrows. So, for example, B's do_populate_sysroot depends on B's do_configure, and B's do_configure depends on A's do_populate_sysroot. Finally, simplified task hashes are shown for each task. The task hash is a hash that includes all the metadata the task requires, such as BitBake variables and task code, as well as the task hash values of each dependent task. So, for example, the task hash value 222 for A's do_populate_sysroot depends on the task hash value 111 from A's do_configure.

When given a configuration like this, BitBake will execute the tasks sequentially in dependency order; in this case: A do_configure, A do_populate_sysroot, B do_configure, B do_populate_sysroot. Certain tasks in the run queue have the property of being sstate tasks. When these tasks execute, they save off their resulting output into an sstate tarball, which is then put into the sstate cache. This sstate tarball encodes the name of the recipe, the name of the task, and the task hash that generated it. In our case, do_populate_sysroot is an sstate task, so when these two tasks execute, they generate the corresponding sstate tarballs.

Once sstate has been populated, BitBake will use it in place of actually executing the tasks to accelerate the build. For example, if our run queue is executed again in a clean build environment that has access to the previous sstate, the two do_populate_sysroot tasks will be restored by extracting the contents of the tarballs in lieu of actually running the tasks. Any tasks that are completely covered by sstate tasks are skipped; in our case, that's the do_configure tasks. Note that the sstate tasks are still executed in dependency order, so BitBake will restore A's do_populate_sysroot from sstate before B's do_populate_sysroot.

Now we will examine what happens when the metadata for an upstream task changes. In our example, the metadata for A's do_configure is changed, and this in turn changes the task hash for that task from 111 to AAA. This has a trickle-down effect: all tasks downstream of A's do_configure get new task hashes, because they depend on the values of the upstream task hashes. In our case, that means that the entire run queue will be rerun. And notably, the do_populate_sysroot tasks can't be restored from sstate, because their task hashes do not match the task hashes of the sstate objects.
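For reference, the fragment is essentially these two lines. They are what Poky sets by default as of the 3.0 release, so double-check the exact values against the documentation for your release:

    BB_SIGNATURE_HANDLER = "OEEquivHash"
    BB_HASHSERVE = "auto"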
If the changes to A's do_configure were trivial, this is obviously not ideal, since it is possible that the output from the downstream tasks will be the same even after BitBake re-executes them. The goal of hash equivalence is to try and detect these types of trivial changes and allow BitBake to start restoring downstream tasks from sstate instead of having to re-execute them.

With that in mind, let's take a look at what the run queue looks like when hash equivalence is enabled. In this diagram, you can see a few new elements have been introduced. First, each task now has an additional hash called the unihash. Unless BitBake is told otherwise, a task's unihash is the same as its task hash. The unihash is important because it breaks the dependency between a task hash and the task hashes of its upstream tasks. Now, instead of a task hash depending on the value of the upstream task hash, it depends on the value of the upstream unihash. So, in our example, the task hash of 222 for A's do_populate_sysroot is calculated based on the value 111 of A's do_configure's unihash. Note that since the unihash defaults to the task hash, there is no functional change in the run queue without outside influence to change the unihash values.

In addition to the new hashes, the hash equivalence server is also shown. Execution with the hash equivalence server enabled is, in this case, pretty much the same as the traditional run queue case, with one exception: when sstate tasks such as do_populate_sysroot execute, they calculate a new hash called the output hash, based on their output, and then report the mapping between their output hash and their task hash to the hash equivalence server. The hash equivalence server stores this mapping in a database for later reference. Note that B's do_populate_sysroot also records an output hash and task hash in the database, but it is omitted from this diagram for clarity.

Now we can take a look at what happens when we make our trivial change to A's do_configure. Initially, this has the same trickle-down effect as before, where all downstream task hash values change. However, things start to get more interesting when A's do_populate_sysroot re-executes. In this case, the change to A's do_configure was trivial, so A's do_populate_sysroot generates the same output hash as before and reports that to the hash equivalence server along with its new task hash. The hash equivalence server now takes action: it sees that this output hash is equivalent to a previously recorded output hash, and reports back to BitBake that it should change the value of A's do_populate_sysroot's unihash from BBB to 222. Since these two hashes are equivalent, BitBake makes this change, which then triggers all the downstream tasks to re-hash. Since the downstream tasks depend on the unihashes, the tasks in recipe B now hash to their previous values. Importantly, this means that B's do_populate_sysroot can be restored from sstate, skipping the re-execution of B's do_configure and allowing any subsequent downstream tasks to also be restored from sstate.

In its most basic form, this is how hash equivalence works. It is important to keep in mind that only sstate tasks will have their unihashes directly changed by the hash equivalence server, since they are the only ones that have persistent sstate objects. Also, in actuality, there are a lot more complicated edge cases that hash equivalence has to deal with that I'm not covering here. Now, let's look at some of the types of recipe changes that hash equivalence helps accelerate.
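To make the unihash mechanics concrete, here is a toy sketch in Python. This is not BitBake's actual implementation (the real server also tracks recipe and task names, among other things); it just shows how an unchanged output hash collapses a new task hash back onto the old unihash, so downstream hashes come out the same as before:

    import hashlib

    def taskhash(metadata, upstream_unihashes):
        # A task hash covers the task's own metadata plus the *unihash*
        # of every task it depends on.
        h = hashlib.sha256(metadata.encode())
        for uni in sorted(upstream_unihashes):
            h.update(uni.encode())
        return h.hexdigest()[:8]

    # Toy hash equivalence "server": output hash -> first unihash seen for it.
    equivalence_db = {}

    def report(task_hash, output_hash):
        # Returns the unihash BitBake should use for this task.
        return equivalence_db.setdefault(output_hash, task_hash)

    # Build 1: unihashes default to the task hashes.
    a_conf = taskhash("A:do_configure v1", [])
    a_sysroot = taskhash("A:do_populate_sysroot", [a_conf])
    a_sysroot_uni = report(a_sysroot, output_hash="out-A")
    b_conf = taskhash("B:do_configure", [a_sysroot_uni])

    # Build 2: a trivial tweak to A:do_configure changes its task hash...
    a_conf2 = taskhash("A:do_configure v2", [])
    a_sysroot2 = taskhash("A:do_populate_sysroot", [a_conf2])
    # ...but the output is identical, so the server hands back the old unihash,
    assert report(a_sysroot2, output_hash="out-A") == a_sysroot_uni
    # ...and B's tasks re-hash to their previous values.
    assert taskhash("B:do_configure", [a_sysroot_uni]) == b_conf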
One of the most obvious cases where hash equivalence accelerates builds is when trivial changes are made to the recipe metadata. These changes could be anything from whitespace changes in task functions and variables, to code changes in unused code paths, to the ordering of items in order-agnostic variables. In these cases, the recipe rebuilds and gets marked as equivalent by the hash equivalence server, which means that all downstream dependent tasks can be restored from sstate instead of having to actually rebuild.

When dealing with non-trivial changes that do cause recipes to rebuild and change, hash equivalence can also help to prevent some types of changes from propagating through the entire run queue unnecessarily. One such type of change is when updates are made to a shared library, such as bug fixes, CVE fixes, or even minor API additions. When these types of changes occur, the affected recipes must rebuild and will be different than they were before. Additionally, all downstream dependent recipes must rebuild. However, these downstream rebuilds will get marked as equivalent to the build with the previous version of the library, which in turn means that any tasks downstream of them can be restored from previous sstate without also having to rebuild. Similarly, changes to native tools might not change the output of dependent tasks, which cuts off the rebuilding after the first level of downstream dependencies.

The effectiveness of the hash equivalence server hinges on the ability of BitBake to calculate output hashes. The current mechanism that BitBake uses to calculate the hashes is a stable checksum of all the files and metadata, such as permissions and owners, that go into a task's sstate output; I'll show a toy illustration of this below. If you are curious what goes into the output hash calculations, BitBake dumps the contents into a depsig file found in the recipe's temp directory alongside the task logs.

BitBake also includes a reference implementation of the hash equivalence server, which is the one that it will start and stop automatically if BB_HASHSERVE is set to "auto". By default, BitBake will talk to this server over a local Unix domain socket. If you would like to share a hash equivalence server between multiple build clients, BitBake can also be instructed to connect to a server over TCP using the host:port syntax shown below. This allows multiple build clients to report equivalent hashes, which can lead to even better build acceleration, because once a single builder reports a task hash as equivalent, the other clients will be able to restore the same task hash from sstate without having to rebuild it. It is important to note that it only makes sense to share a hash equivalence database in cases where you are also sharing sstate, since hash equivalence is closely tied to the contents of the sstate cache. Importantly, you do not want clients to report task hashes to the server unless they are also publishing their sstate for others to consume; otherwise, you can get into a situation where the hash equivalence server is telling BitBake to use task hashes that are not in the sstate cache.

If you find yourself needing to debug what BitBake is doing with hash equivalence, or are just curious about what it's doing, you can enable targeted logging using BitBake's structured logging configuration.
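Coming back to the output hash for a moment: conceptually it is something like the following toy Python, a stable walk over the task's sstate output feeding paths, permissions, ownership, and contents into one checksum. This is not the real algorithm (that lives in OE-Core's sstate signature code); it's just to show the idea:

    import hashlib, os, stat

    def toy_output_hash(topdir):
        h = hashlib.sha256()
        for root, dirs, files in os.walk(topdir):
            dirs.sort()  # keep the traversal order stable
            for name in sorted(files):
                path = os.path.join(root, name)
                st = os.lstat(path)
                h.update(os.path.relpath(path, topdir).encode())
                h.update(stat.filemode(st.st_mode).encode())
                h.update(f"{st.st_uid}:{st.st_gid}".encode())
                if stat.S_ISLNK(st.st_mode):
                    h.update(os.readlink(path).encode())
                else:
                    with open(path, "rb") as f:
                        h.update(f.read())
        return h.hexdigest()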
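The host:port form is just a different value for the same variable in local.conf. The hostname and port here are placeholders for your own server (the reference server also ships with BitBake as a standalone bitbake-hashserv script if you want to run it centrally):

    BB_SIGNATURE_HANDLER = "OEEquivHash"
    BB_HASHSERVE = "hashserv.example.com:8686"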
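As for that logging configuration, the fragment looks roughly like this. The logger names follow the hash equivalence logging example in the BitBake manual as best I recall it, so check the manual's logging section for the exact names in your release:

    {
        "version": 1,
        "loggers": {
            "BitBake.SigGen.HashEquiv": {
                "level": "VERBOSE",
                "handlers": ["BitBake.verbconsole"]
            },
            "BitBake.RunQueue.HashEquiv": {
                "level": "VERBOSE",
                "handlers": ["BitBake.verbconsole"]
            }
        }
    }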
If you write a JSON fragment like the one just shown to a file and then reference that file with the BB_LOGCONFIG variable in local.conf, you'll see lots of interesting and useful log messages related to hash equivalence when you do a build.

That pretty well covers the current features of hash equivalence. There are a few outstanding features that would be nice to have and would make hash equivalence even more useful. The first major feature is the ability to have a read-only or a read-mostly hash equivalence server. This is primarily for users who have a CI-centered workflow, where their CI jobs populate their sstate caches and their developers consume them without sharing their own local sstate caches back. In this case, you do not want developers to be reporting equivalent hashes, since those might refer to local sstate objects that are not present in the shared cache, but you still want them to be able to take advantage of the hash equivalence database that the CI jobs generated. A read-only option for the hash equivalence client would be useful in this case. Another option is a read-mostly implementation, where clients report hashes but the server only records them if they happen to be equivalent to something the server already knows about. Finally, some additional tooling for introspecting the hash equivalence database would also be quite useful; the database can grow quite large, and trying to extract any specific data out of it can be difficult.

Moving on from hash equivalence, now we are going to talk about reproducible builds. So why are reproducible builds important? Well, there are a few reasons. The first, most superficial, and the main reason I started looking at them in the first place, is that they improve hash equivalence. Since hash equivalence uses the output hashes to determine if tasks are equivalent, it is important that recipes build as reproducibly as possible, so that they have the best chance of having the same output hash when built. The second reason is related to code archival. In many production environments, we do a build for a customer, then take the source code for that build and save it off to an archive somewhere, in case we need to pull it out and reproduce that build again. When we do this, we want guarantees that when we pull that code back out, it will build the same thing we built when we put it in. So we want our builds to be reproducible. The final, and probably most important, reason for reproducible builds is validating your software supply chain. We need to have the ability to verify that the toolchains we use are not inserting backdoors or security vulnerabilities into our software. The only real way we can verify this is if our software builds in a reproducible manner, so that we can tell if something is amiss.

As it turns out, the Yocto Project and OpenEmbedded actually have a pretty strong story for validating the software supply chain. As a function of the way the system is designed, the project builds almost all the native tools it needs to build the target software internally, from source. What this means practically is that you can start with a relatively small set of trusted host tools, which are used to compile the native tools from source, and then these native tools can be used to compile your target deliverables from source.
You can now fairly easily trace the software supply chain from your end target deliverables all the way back through your native tools to the host tools, and have high confidence in the integrity and traceability of your toolchains. The part that's missing from all of this is that you need a way to verify that your starting host tools are not introducing vulnerabilities into your end target deliverables, which means you need your end target deliverables to build reproducibly. Once you have this in place, verification of the complete software supply chain is possible by comparing the end target deliverables built from an untrusted set of starting host tools with the end target deliverables built from a trusted set of host tools.

So, for all of these reasons, we want reproducible builds in the Yocto Project. To this end, the project has started regularly running a quality assurance test to validate that recipes are written in such a way that they build reproducibly. This QA test was first implemented in the 3.0 (codename Zeus) release and ensures that at least a subset of the packages produced from OE-Core are binary reproducible. This test is run fairly regularly on the Yocto Project autobuilder as part of new patch validation and release testing, to ensure that no regressions are introduced in the set of currently tested recipes.

The QA test works by doing two builds, creatively called the A build and the B build, and then comparing them. The A build is allowed to build using sstate if possible, but the B build always builds from scratch, although it will use a download cache if you have one. Currently, the test builds the image recipes core-image-minimal, core-image-full-cmdline, and core-image-sato, which gives good coverage of many commonly used recipes in OE-Core. The test will build the deb and ipk packages for all the recipes that go into these images, then compare the packages between the two builds to see if they differ.

There are many things that can cause reproducibility problems, but one of the main ones the QA test looks for is build paths being encoded in the build output. It can easily detect when this occurs because the A and B builds are done with different build paths, so if any build path is encoded, it will cause the binary output to change, which will be flagged by the test. Another common source of reproducibility issues is timestamps. If a recipe encodes the current time in its output while building, that can cause it to not reproduce when it is rebuilt at a later date. The QA test partially tests for this by doing the A and B builds sequentially, meaning the timestamps will differ by at least a few minutes between the builds. However, it is unlikely to detect cases where only the day, month, or year is encoded in the output, since those are unlikely to differ between the two builds. One of the last major sources of reproducibility issues that can occur in recipes is reliance on a specific build host. This can manifest in all sorts of insidious ways, and although the QA test doesn't have an explicit way to test for it, the Yocto Project autobuilder does tend to discover these problems, because it builds across a wide variety of hosts and allows the A build to pull from sstate. If reproducibility is important at your site, I would highly recommend building in a known container or virtual machine to help eliminate this class of problems.
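Conceptually, the comparison step at the heart of the test boils down to something like this toy Python; the real test also accounts for packages that exist in only one of the two builds, and renders the differences with diffoscope rather than just listing them:

    import hashlib, pathlib

    def checksums(pkgdir):
        # Map each package file's relative path to a hash of its contents.
        base = pathlib.Path(pkgdir)
        return {p.relative_to(base): hashlib.sha256(p.read_bytes()).hexdigest()
                for p in base.rglob("*") if p.is_file()}

    def non_reproducible(build_a, build_b):
        # Packages present in both builds whose contents differ.
        a, b = checksums(build_a), checksums(build_b)
        return sorted(str(p) for p in a.keys() & b.keys() if a[p] != b[p])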
Finally, when reproducibility problems are found, the test generates easily browsable HTML pages using the excellent diffoscope tool, which helps in resolving issues quickly. I will now show you a demo of what the HTML output looks like when the test finds a package that is not reproducible.

Here I have example diffoscope output from a test build I did, to show what reproducibility differences look like in the HTML output. As you can see here at the top, I've got the reproducible A and reproducible B builds that I'm comparing. If you drill down in here, you'll see directories for both the deb and ipk packages, and then the differing package files here. If you open these up, you can see that diffoscope actually understands how to parse the ELF files inside the package, and so in this snack.so library, if we scroll down a little further here, you can see that this string right here, this DW_AT_comp_dir, differs between the reproducible A and reproducible B builds because it's encoding the build path. If you keep going down here, you can see it shows you all the differences in this ELF file, even down into the strings here; there's the actual string that's different. So diffoscope is very powerful. It's touted as being able to diff anything, and it is very good at showing useful diffs. If you go down here into these other packages, you can see this one has the same problem, where the build path has been encoded into the library. And the same is true for the ipks also; they have the same problem.

The QA test is designed to be easily extensible so that you can test your own images. In fact, the entire test suite is encapsulated in a single class, which you can derive from in a simple Python file, overriding a few key variables to customize it; I'll show a sketch of that in a moment. You can then run the test using the oe-selftest command.

While the reproducibility tests that we have today work very well for our purposes, there are many more features we would like to add to make them even more useful. The most important thing we can do is increase the number of recipes that the test covers. The next logical target in this endeavor is to build core-image-sato-sdk, which is core-image-sato plus all of the tools required to build on the target system, like GCC and binutils. We are actually pretty close to being able to do this, with the only holdout being the perf recipe, which has turned out to be quite resistant to building reproducibly. After the SDK images, we would like to move on to testing world, which would cover recipes outside of OE-Core. You may have noticed earlier that RPM packages are not included in the set of tested package formats. This is primarily due to trouble we had getting the package format itself to be reproducible; we would like to fix this and properly test all packages as RPMs too. We would also like to test the final root filesystem images that are produced, not just the package feeds that created them; this way, we know that the final root filesystem that gets flashed onto your device is also reproducible. And we would like to be able to test deployed items other than packages and filesystem images, since those two things do not necessarily cover all of the deliverables that may need to be reproducible; this might include things like the kernel and bootloader.
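Here is the kind of derived class I mean. Treat it as a sketch: the attribute names follow the ReproducibleTests class in OE-Core's meta/lib/oeqa/selftest/cases/reproducible.py as I remember it, and my-custom-image is a hypothetical image recipe, so verify the details against your release:

    # lib/oeqa/selftest/cases/my_reproducible.py in your own layer
    from oeqa.selftest.cases.reproducible import ReproducibleTests

    class MyImageReproducibleTests(ReproducibleTests):
        # Override which images get built and which package formats
        # get compared between the A and B builds.
        images = ['my-custom-image']
        package_classes = ['deb', 'ipk']

You would then run it with something like "oe-selftest -r my_reproducible.MyImageReproducibleTests".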
Testing the native tools that are built, for reproducibility, would also be useful, as it would further strengthen confidence in the software supply chain when building. Currently, the Yocto Project autobuilder only tests the x86-64 architecture for reproducibility. It would be really nice to at least test AArch64 as another architecture, since that one is also quite commonly used. Finally, there are some tools that the reproducible-builds.org project recommends to further tease out reproducibility issues. The first is disorderfs, which is a FUSE filesystem that changes the order in which files are returned from the kernel's file listing syscalls. This is useful for discovering where a build might depend on the order in which the kernel reports files in a directory. The second tool is libfaketime, which can be used to fake the system time and get proper timestamp testing. Integrating this into the QA test so that the B build has a faked system time would be really useful, but getting it to interact correctly with the pseudo library that BitBake uses to fake root permissions might take some work.

In conclusion, I've shown you how hash equivalence works and how it helps to reduce build times. I've also shown the reasons why we want reproducible builds and where the project currently sits with regard to them. And I've shown a number of ways that you can get involved if either of these two projects interests you. If you are interested in being active in the Yocto Project and OpenEmbedded community, there are many opportunities to be involved and provide input. You can contact the community in several ways. There are two primary IRC channels where discussion happens, and lots of knowledgeable people are online to answer questions. The Yocto Project also hosts a monthly technical meeting you can call into, where a variety of technical topics are discussed. Finally, there is a weekly bug triage meeting where members of the community review the latest bug reports.

With that, I would like to thank you for your time. If you would like to contact me after the presentation, my email addresses are listed here, or you can chat live with me on IRC or the conference chat. Thank you. Are there any questions?

Hi, hopefully you can all hear me and see me all right. Okay, so I have a few questions outstanding to answer. Are there any system settings that seem to interfere with reproducibility, like address space randomization and things like that? Not that I'm aware of specifically, but like I said, the Yocto autobuilder builds across a very wide variety of build hosts, and so it does tend to tease these things out. I don't know of anything specific like address space randomization, but we have seen other things.

And then: does hash equivalence work when two recipes are providing the same shared library to another recipe? No. Well, theoretically it could, but no, because it's based on the task hashes, and I highly doubt they'd have the same task hash. Also, in the database that the hash equivalence server keeps, it tracks the recipe name, because I wasn't quite confident that we could do the matching on the task hash alone. You should theoretically be able to do it just on the task hash, but I included the recipe name, and it also makes it easier to debug when you're looking through the database. So it matches on the recipe name also, and I don't think it would work in that case.
I don't think you'd have one recipe generate something and then another recipe pull the previous one's sstate, or be marked as equivalent to the other one's sstate. So I don't think that would work.

Do you have any tips regarding increasing reproducibility using MSVC? Is Clang CL a better replacement? I don't know. Yeah, I don't know; I don't use MSVC all that much, or Clang.

Are there any other questions? I know it takes a while sometimes for the questions to show up, so I'll sit here for a little bit and wait.

If a developer wants to reproduce a build that has been reported by QA as a failure, how can the developer pull down the sources matching the reported build into his or her local workspace? Yeah. I don't know if you're talking specifically about the Yocto autobuilder or your own CI. If it's your own CI, I don't know how you're doing your CI, so presumably you could pull down the same hash, or just access the sources. But this is actually one of the major reasons that we use diffoscope on the autobuilder: it can be kind of hard to get back to the original state that the autobuilder used, to reproduce that reproducibility problem. So diffoscope helps a lot with that, because you just get the HTML report, and diffoscope is really good at showing differences. It's an excellent project; that's why we implemented it on the autobuilder. It's really useful.

Will two CI servers come up with the same hash for the same task? They should, given the same inputs. Yeah, if you're building from the same hash and everything, they should generate the same configuration, and you should be able to get the same hash. It would be the same as building in two separate directories on your local PC: you should get the same task hash from the same input state.

Can you comment on universal and uninative with respect to sstate? I'm not quite sure what you mean by that. If you could clarify, that would be helpful.

Do you know of any efforts to use the build reproducibility tools for other projects? Is there any intent or desire to make these exportable? A lot of the reproducibility work is actually driven by reproducible-builds.org; we're just kind of using the things that they do. We have slightly different requirements for what we consider reproducible. For example, reproducible-builds.org doesn't require different build paths to build reproducibly, because they assume that you can build in the same build path all the time, and we definitely can't do that. So in some ways, we're a little stricter than they are. They maintain diffoscope and disorderfs, and some other libraries for faking things; I'm not sure whether they maintain libfaketime, but I know they use it. You should go to their website, because they have a great interface where you can actually see the status of all the Debian packages that are reproducible on all the architectures. For a while there, even Fedora was reporting back to them what packages they had that were reproducible. It's a great project and you should go check it out.

Can you comment on universal and uninative with respect to sstate? What does it change when you inherit uninative? Why do we need separate sstate for universal? I don't know off the top of my head what the universal one is. Uninative should help with reproducibility. Wait, we're talking about hash equivalence reproducibility. It should help, because it gives you the...
Either way, it should help, because it gives you the same glibc across all your build platforms; that's basically what uninative does. It provides a consistent set of native libraries that your native tools are linked against, so that when you move to a different platform, they're still using the same library. You can kind of think of it a little bit like a container, lite. I think Richard's probably going to be mad at me for saying that, but it's kind of like a really light container that only does the "I want to use this specific library" part and not any of the other namespacing things. Yeah, that's kind of what uninative does.

Can the lib/oeqa selftest cases reside in a custom layer? Yes. This is actually a really little-known feature of the OEQA tests, but you can just drop a file at that path in your layer, and the OEQA framework will find it, because it searches those paths in all the layers that are present. So all you need to do is drop that file at that path in your layer, and you can do that with any QA test. You can write your own QA tests just by dropping them in those paths in your layer, and that makes it really easy to extend the OEQA tests. So that's really cool; you should totally do that to extend the QA tests to test your own things. That's also why we chose to structure the reproducibility test class the way we did, so that you can just write one file, inherit from the base class, change a few variables, and be off testing your own images.

Can you get the same hash when building on different architectures? I'm not quite sure; I don't know if you're asking whether you can get the same hash building on different architectures. You totally can, if the architecture doesn't affect the sstate output; presumably most allarch recipes definitely will. And that's one of the great things about hash equivalence: it's entirely based on the output. We're not trying to enumerate, oh, these things affect the output. Instead we're saying, is the output the same? And then you can work it backwards and try to say, oh, here are the things that actually don't matter. That would be a really interesting thing to do at some point in the future: to trace it back and say, you know what, these variables don't actually affect the output of anything. I think that would be a really interesting thing to look at, figuring out which variables actually affect the output of things. But yeah, you definitely can get the same hash, and I would expect you do on allarch recipes.

Let's see, okay. "Surely the B build times are different anyway?" I'm not sure whether that's asking about disorderfs. Yeah, the B builds are slow, because they don't pull from sstate and they build from scratch in a clean directory. And there's a whole other level of slow when you introduce a FUSE filesystem. I actually did get disorderfs working this week, because I was trying to track down a different issue, and it's slow. I have not yet tried a full build with it to see if it finds anything.

Does the calculation of the hash depend on the architecture on which the hash is calculated? Like the native architecture, whether it's ARM or x86? For target recipes, I think the answer is definitely not. I don't know about native recipes; I would assume it has to, because you can't restore sstate from an x86 machine on an ARM host. Yeah, an ARM host. I don't exactly know.
Someone was asking about bit ordering in, I think it was, the output hash. Sorry, just a second, I need to find the original question here. Bit identity of the output, okay. So there's a question about whether reproducibility includes the bit identity of the output executables. I'm not sure. I suspect it would, because it's just taking a SHA-1 or SHA-256, I forget which I used, of the executable. So I believe that it would take that into account, but I'm not sure. I guess what I could say is, those calculations are done on your build host, right? So if your bit ordering is different on your target than on your build host, I would expect that to show up, but I'm not positive. I just don't know. Yeah, that's a good question.

"On Arch, there are no debug symbols provided for the packages, but it's fairly easy to rebuild them with debug symbols. Have you ever had a chance to add debug symbols after collecting a dump?" I don't know if I understand the question, I'm sorry. I will say that for reproducible builds, we do check the debug packages also, and that actually ends up being one of the places where we have a lot of reproducibility problems, because the debug symbols will often encode file names and things that aren't necessarily encoded in the base package binaries, right? Just in the ELF symbols or DWARF symbols and things like that. In Yocto, I think most if not all recipes are compiled with -g already, so you should always be getting debug symbols. They're going to be in a separate package, and you can just install those later. I think you can turn that off and so on, but yeah.

Oh yeah, what's up with perf? That was a really long rabbit hole that I went down. I ended up having to first make it so that bison could generate reproducible output, so that that reproducible output would then go into perf properly. And then there are just a lot of assumptions perf makes that you would want to run it out of the directory in which it was built, and because of those assumptions, it's got a lot of build paths and things encoded in it that need to be removed if you want to make it properly reproducible across different build paths. That's really most of the issue with it: there are quite a few of them, and I'm not sure of the best way to go about getting rid of them. I suspect it's not terribly hard; I don't really know, we may need some configure switches or something to say, don't look for Python in your build path, and all that stuff. So if you're interested in perf, let me know. Yeah.

Okay, I don't see any more questions; I think they've kind of trickled off. So you can talk to me on Slack or IRC. I'd love to talk to you, and I hope you all have a good rest of the conference. All right, bye.