Hello everyone. I'm going to talk about peer-to-peer OS and Flatpak updates, which is something we've been working on at Endless over the last year and a bit.

I'll define some terminology first, because these terms are a bit vague sometimes. Flatpak you've probably all heard of: it's a Linux application sandboxing and distribution framework. For the purposes of this talk, we just need to know that it's built on OSTree, which provides the things users care about; we don't need any more detail about how Flatpak works. The OS in this case is Endless OS, which is a Debian-based operating system. Again, you don't need to know any more about it apart from the fact that it's based on OSTree. OSTree is, very briefly, git for operating system file trees, but I'll go into more detail about what that means in a moment. By peer-to-peer updates, we mean updating from other computers on the local network, and from a USB stick that's been pre-prepared by someone else with updates for you. The overall goal is to be able to distribute system updates, and new versions and new and different applications, to other people without them having to download them over the internet.

OSTree, which is the core of how this is all done, and the core of how Endless OS is implemented, and the core of how Flatpak is implemented, is like git. It's a content-addressed file system: you dump files in, they are hashed by their content and stored at a path based on that, so each of them has a checksum. Each object can be a file, a directory tree that contains a hierarchy of files, or a commit object which ties together some combination of files and directory trees. It's got refs, which are basically human-readable names which point to a commit, just like a git branch does, and they can change over time, so you can move them from an old commit to a new commit to a newer commit.
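To make the git analogy concrete, here is a minimal sketch of that content-addressed model in Python. This is illustrative only: real OSTree distinguishes file, dirtree and commit object types, and stores objects on disk under `objects/<two chars>/<rest>`; the class and method names here are made up.

```python
import hashlib

class TinyOSTree:
    """Toy content-addressed store: objects keyed by checksum, refs by name."""

    def __init__(self):
        self.objects = {}  # checksum -> content bytes
        self.refs = {}     # ref name -> commit checksum

    def write_object(self, content: bytes) -> str:
        # The content hash is both the object's identity and its storage path,
        # so identical content is automatically stored only once.
        checksum = hashlib.sha256(content).hexdigest()
        self.objects[checksum] = content
        return checksum

    def update_ref(self, ref: str, commit_checksum: str) -> None:
        # A ref is just a mutable, human-readable pointer to a commit,
        # like a git branch: it can move from an old commit to a new one.
        self.refs[ref] = commit_checksum
```

Usage would look like `c = repo.write_object(data); repo.update_ref("os/eos/amd64/master", c)`, after which the ref can be moved to a newer commit at any time.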
It's got remotes, just like git remotes: a little bit of configuration saying, here's some repository on the internet somewhere which you can download updated refs from. Each remote has a name which you choose locally; by convention it's always the same, but it doesn't have to be. And, as I said before, Flatpak is based on OSTree, so in the Flatpak world you've got apps. Each app is a ref, in OSTree terminology, and when you deploy an app, that is a commit from OSTree with a load of files that contain whatever the app needs: its binary, its icon, whatever.

So how do we add peer-to-peer support? What we have to begin with in OSTree are these refs, and these refs are unique per repository. It's just like a git branch: you've got a master branch for each git repository you care about, but every repository has its own master branch, so to update your master branch you need to know which repository you're talking about. It's the same in OSTree: if you have a ref and you want to update it, you need to know which ref it actually is. If I have an app that I've produced locally called gedit, and there's also a gedit produced by the gedit developers and published separately, they're probably going to have the same ref name, but they probably refer to different content. So what you need is a global namespace which disambiguates those, and says that the gedit over here that I've produced locally is mine, and the gedit over there is theirs, and those should be considered separate sets of things. We need a global namespace for refs. How do we do that?
We added something called collection IDs. These are basically a globally unique version of the remote name, so you can uniquely identify each repository, and clones of it mirrored around the world, with a collection ID. If gedit were published by the gedit developers in their repository, they would set a collection ID for that repository; if that were mirrored by GNOME, the mirror would carry the same collection ID, so the refs from either are considered equal. And then if I were to publish my own gedit, I would choose a different collection ID, because what I'm publishing is not necessarily the same as what they're doing. So if you take a collection ID and a ref and consider those as a tuple, that becomes globally unique, and you can use those to look up and query for updates for the refs you want, wherever you are.

You have a question? I can repeat it. [Audience: Is there a convention, or some enforcement, to stop people colliding with the names of their collection IDs?] There's no enforcement, although if someone were to choose a collection ID that had already been chosen, and it started to collide with things, then errors would appear everywhere. There is a convention to use reverse domain names, the same as most other things, and this is documented. Any other questions so far? Cool.

So, summarised: collection IDs are like a name for a remote, configured globally rather than locally. [Audience: Does this mean that you could prepare a malicious version of an app, put it on a USB stick, walk over to somebody else's Endless OS computer, and essentially update the app with a malicious version?]
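The (collection ID, ref) tuple can be sketched as a simple keyed lookup. The collection IDs below are made-up examples following the reverse-domain-name convention mentioned above; the real lookup lives inside libostree, not in a Python dict.

```python
# Two publishers each ship a ref named "app/gedit/x86_64/stable".
# Without collection IDs the ref names collide; pairing each ref with
# its repository's collection ID makes the key globally unique.

refs = {}

def publish(collection_id: str, ref: str, commit: str) -> None:
    # The (collection ID, ref) tuple is the globally unique key.
    refs[(collection_id, ref)] = commit

# Hypothetical upstream publisher and a local rebuild of the same app:
publish("org.example.GeditUpstream", "app/gedit/x86_64/stable", "commit-aaa")
publish("com.example.MyLocalBuilds", "app/gedit/x86_64/stable", "commit-bbb")

# The same ref name now resolves to different content depending on
# which collection it belongs to, so the two never get confused.
upstream = refs[("org.example.GeditUpstream", "app/gedit/x86_64/stable")]
local = refs[("com.example.MyLocalBuilds", "app/gedit/x86_64/stable")]
```

A mirror of the upstream repository would reuse `"org.example.GeditUpstream"` unchanged, which is exactly what makes its refs interchangeable with the original's.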
[Audience: You would have to have the GPG key from the original.] Right, so there is a measure of enforcement; I'll come on to that in a second, but you're very perceptive.

So yes, summary files are the other half of the problem. Each repository has a summary file, and that contains, amongst other things, a map of refs to commit checksums, so it gives a complete list of what the repository contains and the commits for each of those refs. It also contains some other metadata, like the repository description and its name and localised name. The summary file is signed by the same key as the rest of the repository, so you know the summary is authentic for that repository, and a man in the middle can't inject a summary containing incorrect or malicious data as you download it over HTTP, which OSTree lets you do. So it's traditionally signed as one big blob of stuff, and it lists the refs.

Now, if I am on a local area network, and I've got some refs from this repository over here, and some refs from that repository over there, and maybe some from my operating system vendor, and I've installed all of those, and I want to expose my local OSTree repository onto the network, I've basically got things from three different summary files and I need to combine those into one. I can't do that if I've got one signature for the entire file, because (a) I can't reproduce the signatures of the upstream vendors, and (b) I can't somehow sign different bits of the content with three different keys. So the solution there is, yeah, that's the problem; the solution is to drop the signatures. That reduces security, you say, so we reintroduce security by implementing it a different way: splitting the ref mapping up and flipping around how the security is done. Instead of having a summary file which binds a ref (the name) to a commit checksum, you have a ref binding, inside the commit, which binds the ref name to that commit. So it's kind of backwards rather than forwards, and you can do this because
each commit in OSTree has some metadata, such as its date, the author, and maybe a commit message, just like git commits do, and this metadata is always signed by the person who built the repository. So if you put some extra metadata in there that says "this commit should be on this ref, and maybe also this ref and this ref", and then you sign the whole lot, you can always check whether a commit that you've downloaded is actually meant to be on the ref that you thought you downloaded it from. Which means that when you get rid of the signature on the summary file, although the summary file isn't trusted, you can still verify everything that you pull back via it.

So now the groundwork is in place; those were the two big problems in the way of doing peer-to-peer updates, and now we can do them. With USB updates, we essentially take an OSTree repository and put it in a well-known location on a USB stick. With LAN updates, we essentially just expose an OSTree repository on the local network with a web server.

With the LAN stuff, how do you actually find the updates on your local network, without going to every machine and asking "what refs do you have? can I have all of them? are they all up to date?", which would result in a lot of unnecessary traffic? On a LAN of, say, 30 machines, in a small business or a school or whatever, each of them might have 100 refs: a couple for your operating system, plus the apps that you've installed, and many of them will be at different versions. How do you actually find which refs you want, which are up to date, and which machine has the latest version that everyone else can pull from? The solution is to take a Bloom filter of the refs on each OSTree repository and put it in a DNS-SD record, published with Avahi. Then, when you're updating from a peer, you check whether the ref you want is in their Bloom filter. If it isn't, you don't care about that peer any more. If it potentially is, because Bloom filters can give false positives, you
download the summary file from that peer and check whether it does actually contain the ref you want, or whether it was a false match. If it does contain the ref you want, you then download that commit, check the GPG signatures are all correct and match the ref, download the rest of the commit, and then update from them.

The code for all of this has been done completely upstream, in libostree and in Flatpak, and there are some components in our updater for Endless OS, which is also free software. It's already supported in various upstream repositories, where collection IDs have been added to their configuration. Flathub, for example, has a collection ID set, so you can already use Flathub apps with peer-to-peer updates if the tooling you're using supports it. I mean, that's still being shipped out to distributions and probably hasn't been enabled in many places yet, but the pieces are in place. The components that we have in the Endless OS updater are the bits that, if you wanted to implement this for yourself, for your own distribution or platform, you'd have to replicate or adapt; they're not shipped by default with Flatpak. We've got a web server and a DNS-SD record generator for the LAN sharing, which basically takes your local OSTree repository, exposes it over the network, and also updates an Avahi list of DNS-SD records, generating the Bloom filter from the refs that you have; plus various bits of plumbing to integrate that with systemd and do socket activation.

This has been worked on by quite a few people. At Endless we've got Matthew, Rob, Dan, Krešimir and me, with lots of help, review and feedback from Colin and Alex upstream in OSTree and Flatpak, and a lot of review and merge testing done by the RH Atomic bot. And that's it. We're hiring, so if anyone wants to talk to me about that, please do; we're looking for desktop engineers and tooling engineers. The code is all there. Has anybody got any questions? I
feel that was like a lightning approach to it, so I can expand in detail on anything.

[Audience: How does this DNS-SD record compare to distributed hash tables?] To what, sorry? [Distributed hash tables.] I think distributed hash tables took up a lot more space, and they weren't as easy to implement, but I can't remember the details now. There were various restrictions on being supported by different routers as they forward DNS packets: some of the larger DNS packets just get dropped, so we wanted the Bloom filters to be really small, and distributed hash tables, I think, come out a bit bigger. But I don't know; it's been a while since I actually looked at that. Bloom filters certainly do what we want: they allow you to have more than enough refs in the local repository that you're advertising, several thousand if I remember correctly, before the probability of collisions becomes too high to make it worthwhile. And they do mean that you can massively cut down the number of requests you have to make, like unicast from you to the computer you think has a ref, by using the Bloom filters to cull the peers that definitely don't have the refs you want. Does that answer it in a bit more detail? Yeah? Thanks, Alex.

[Audience: I was promised a colour emoji in the slides.] Mmm, you were promised an emoji; you got two emojis. [But they were black; it's a colour issue.] A colour emoji would have detracted from the colour scheme of the slides, I think. Sorry.

[Audience: What about LAN updates: why restrict yourself to LAN updates and not updates over the entire internet? I guess by using DNS-SD you sort of restrict yourself technically, but is there a reason for not thinking about WAN updates?] The original use case we had for supporting LAN updates is that Endless OS is something we want to run on computers which have restricted internet access, and the particular use case we were caring about was schools, where the teacher's machine will have an internet connection and then there will be loads of student
machines that don't, and they can only connect to the teacher's machine. I can't immediately see there being an advantage in doing updates over a wider network, because the cost of communication between all the peers would start to get very complicated, and I also didn't want to get into writing an entire distributed hash table file system; that's not my thing. This does the job for the use case that we cared about, and there's nothing stopping people from writing one in future. The underlying APIs in OSTree that allow my computer to say "I want this ref and this ref and this ref, find updates for them" will look in parallel on the internet, on the local network, and on any USB sticks that are plugged in. So you could always write an extra provider which would, I don't know, look on BitTorrent or something, or some other custom protocol; it just has to implement a couple of virtual functions, basically. It can do all of these things in parallel, and it will take whichever updates appear first and look most likely, for some moderately well-defined metric of what "likely" means. So yeah, it's a possibility in future if you want it.

Any other questions? No? So, thank you Philip, and give him a wonderful round of applause. Thanks.
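The Bloom filter advertisement discussed in the talk can be sketched as follows. This is a toy illustration: the filter size, number of hashes, and hash construction here are my own assumptions, not the parameters or wire encoding libostree actually puts in its DNS-SD records.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter over ref names: no false negatives, but
    occasional false positives, so a match still has to be confirmed
    against the peer's summary file."""

    def __init__(self, size_bits: int = 1024, n_hashes: int = 3):
        self.size = size_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(size_bits // 8)  # compact: 128 bytes here

    def _positions(self, ref: str):
        # Derive n_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{ref}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, ref: str) -> None:
        # Advertising peer sets the bits for each ref it carries.
        for pos in self._positions(ref):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_contains(self, ref: str) -> bool:
        # False means "definitely not on this peer": skip it entirely.
        # True means "possibly on this peer": download its summary to check.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(ref))
```

The fixed, small size is the property the talk highlights: however many refs a peer advertises, the filter stays compact enough to fit in a DNS-SD TXT record that routers will forward, and a negative answer lets you avoid contacting that peer at all.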