All right. So unlike Torsten, I wasn't hacking slides until five minutes ago, but the reason for that is not that I was so much more prepared. It's rather that there is so little to talk about that the slide deck is quite small. So this is me: I'm doing LibreOffice on Ubuntu, but I'm also working a bit on LibreOffice itself, and one of my most insane ideas, some years ago, was bibisect.

Who here knows what bisecting is? Torsten talked about that. Okay. Who knows what bibisecting is? Okay, roughly half of the audience, which is great, so I can spend some time explaining bisecting.

So imagine LibreOffice 5.2.0 is working great for you: your document is rendering fine and everything is beautiful. Then in LibreOffice 5.2.1, suddenly your text is red, although it shouldn't be. So imagine this end here is 5.2.0, and the other end over there is 5.2.1, with 128 commits in between. If you ask the developers "which one of you guys broke it?", what's the answer? Right, everyone says "not me" — and in this case I did break it, but of course I don't know that. This is the commit where I broke it, somewhere here, but I'm not aware of that.

So you're sitting there, not knowing where the stuff broke. What can you do? You know it's broken here, and it's okay there. So you go to the middle, and you test your document: is the text red or is the text black? Which color is it? It's broken, right, so now I know one of the first 64 commits broke it. I go here: red or black, broken or not? No, it's working. Okay, so I've got 32 commits to my left, and I go into the middle of those; now it's 16 commits — working or not? It's not working, okay, it's in there. Now I can actually just look at those 16 commits, and probably: ah, that was me. Speaking of breaking things.
So this is roughly how bisection works, except that for LibreOffice it's a bit different, because usually with each step you'd have to compile the whole product, which is a pain with a product the size of LibreOffice. So we built all these binaries and put them into a Git repository, so you can bisect on the binaries. But that leads to pretty big downloads: if someone wants to find out where something broke, they have to download a few gigabytes of repository just to find that out. So it would be great to make that lighter.

I looked into that, but I have to tell you, I have no idea what I'm talking about — I did mostly application development, and I don't know that much about development closer to the system. But one of the basics is that in Unix everything is a file, and there is one path, and at that location you will find one specific thing. That was true in Unix for a long time, but actually it's not that true anymore, for quite a few reasons. There are containers and solutions that make different trees available at the same place, and you can use that, for example, to create a specific view on the file system. You've already seen this in things like Snappy, for example, and other solutions. It would be great to really be able to switch between versions without some intermediate half-state, and it would also be great to be able to roll back. Now, most of this is done by Snappy, or also by Flatpak and solutions like that, but we want a solution for having many, many, many different installations without having to install them all at the same time, which is a bit of a different problem.

But if you look at how Git is set up, it's actually a content-addressable file system, and that means that once you have a specific file, and you have a different commit that also contains this file with the same content, it doesn't store two files — it stores just one object.
So it does deduplication, in a way, which is great for reducing the amount of data you need. And it stores all this stuff in very simple objects: blobs, which are just file contents; trees, which are essentially directories; and commits, which point to a tree containing the state of the file system.

So like I said, in 2011 I did bibisect and started off with this — and OSTree was something similar, I know — but that was the starting point for LibreOffice, and I could put 755 images of LibreOffice into just one repository of 12 gigs. That was a good starting point, because it means 16 megs for just one installation. But you still had to download the whole thing to really get started with this. Actually, that's not current data, it's half a year old, so bear with that.

So I wondered if we can solve this problem, and of course there's an xkcd for it, because there's an xkcd for everything, and this xkcd essentially says: why don't we download stuff on the fly, when we need it? And the summary below it is "I felt pretty clever until I realized that I reinvented web pages". But actually, these days there's not much difference between a web page and a desktop application, because this happened by now: an average web page these days is bigger than whole desktop applications were back in the day. Our networks got so good and so fast, and have so much bandwidth, that we don't need to think in the same terms as back when a desktop application was just one big thing.

So my idea was to have a Git repository somewhere, and if you want to bibisect on this Git repository, or you want to use content from it, you create a FUSE mount — a dynamically generated mount — and whenever you actually want to look at a file, in that moment you download the file from the Git repository.
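Git's content addressing, which all of this leans on, is easy to demonstrate: an object's id is a hash of a small header plus its content, so two commits containing an identical file share one stored object. A minimal sketch — the header shown is Git's actual blob encoding:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git names a blob by the SHA-1 of "blob <size>\0" plus the raw bytes.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Identical content always yields the identical id, so a file that is
# unchanged between two installation images is stored exactly once.
assert git_blob_id(b"same bytes") == git_blob_id(b"same bytes")

# The empty blob has a well-known id in every Git repository:
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

That shared storage is why 755 images compress down to 12 gigs: from one LibreOffice build to the next, most files don't change, so most blobs are stored only once.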
So that was my idea, and I did a proof of concept with an ugly Python script in 2013. Well, it was slow, because it was Python, and the Python script used forked shell commands, and those forked shell commands then downloaded stuff — that is obviously not quick. But I could have a bibisect repository somewhere and start LibreOffice out of this repository, remotely essentially, downloading on the fly on first start. It was slow, but it worked.

And what I could see, for example, is this: starting LibreOffice for the first time, just to get to the start center — the whole installation in a bibisect repository is roughly 250 megs, but up to the start center only 75 megs of that are actually read. So if you just want to test one specific case, like you do with bisecting, you don't need 250 megs of data, you just need those 75 megs. That is quite a difference already.

So yeah, this was just a proof of concept, and I did process isolation back then just as a hack: I looked up which process ID was looking for a file, and then gave that process the LibreOffice that was there, and then you could switch the file system and actually have a different LibreOffice there and start a second one. I thought that was a great idea, until I realized that the dynamic linker gets very confused if a file at the same location suddenly has different contents for different processes. But this kind of stuff is much better solved by solutions like namespaces, Docker, Snappy, Flatpak, whatever. So that was just experimentation, but the basic concept — dynamically downloading stuff on the fly — worked, because I could start LibreOffice from it, which is probably one of the most complex binaries you can start on something like that.
But yeah, it was hackish, with Python calling git commands, and it navigated the paths remotely — whenever you navigated through the directories it always had to check back with the remote location — so it was not really something you could use in production, or even for bibisection.

So this year I tried to do it a bit better, using an actual proper language for something like that, which was Rust, with libgit2 and again FUSE, and with good tests on it. Essentially I got the same thing, with a few things changed. For example, as I said, Git has not only the blobs as objects but also the trees, which are the directories, so I also download the trees, and with those I can navigate the file system locally, so the performance is better. Other than that it does mostly the same: if you open a file, it checks if the file is there; if the file is not there, it goes to the Git repository, downloads the object, creates an uncompressed version in the local file system, and then just maps every file access to the local file — all of that transparently, so you never see any download. Opening a file the first time is obviously slow, because you're downloading it in the background, but it's not that slow these days anymore if you have a good connection.

As I said, on the first implementation I tried starting LibreOffice; I didn't try that on this one yet. But the hard part is again the last bit, because you have the local file cache and the FUSE mount, and it works against a local Git repository — which of course is not what you want; you want the Git repository to be remote. So you need again another FUSE mount or something to get this remote, and you end up with this Frankenstein combination of two FUSE mounts on top of each other. But I don't think it will hurt performance much, because most accesses will go directly to the first one, and then you can use HTTP or SSH to mount the second one.
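The read path described above — check the local cache, fetch and uncompress on a miss, serve every later access locally — might look roughly like this. A simplified Python sketch, where `fetch_blob` is a hypothetical stand-in for the libgit2 object lookup the Rust implementation does:

```python
import os

class LazyBlobCache:
    """Serve Git blobs from a local cache, fetching each one only once."""

    def __init__(self, cache_dir, fetch_blob):
        self.cache_dir = cache_dir
        self.fetch_blob = fetch_blob   # hypothetical: oid -> bytes from the repo
        os.makedirs(cache_dir, exist_ok=True)

    def open(self, oid):
        path = os.path.join(self.cache_dir, oid)
        if not os.path.exists(path):
            # Cache miss: download the blob and store it uncompressed,
            # so every subsequent access is a plain local file read.
            with open(path, "wb") as f:
                f.write(self.fetch_blob(oid))
        return open(path, "rb")
```

Opening the same object a second time never touches the network — which is why only the roughly 75 megs actually read at first start ever need to be transferred.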
There are a few challenges there. For example, stat-ing files is a bit tricky. For most purposes it's a read-only file system, so I just create fake stat data, like access bits and such — but, for example, the size of a file in Git is not stored in the tree entry, so to find out how big a file is you actually have to download it right now. So you need some modification there: local tree-ish objects that also carry the file size in the tree entry, so that stat-ing a file doesn't make you download it on the fly.

The second thing: if you want to use insecure protocols — you just want to have this over plain HTTP, say — then it would be great to have a Git tag which carries a signature, and have all the objects in between covered by hashes of the contents, so that once you have checked the signature of the tag, you're sure that nothing was tampered with somewhere in the middle.

And once you have all that, you can think about garbage collection — you have two checkouts, you're running one and keeping the other one around, and at some point you just want to delete it because no one is using it anymore — or quotas, or how big your local cache may grow, and all that fun. But that's for later.

So this is roughly the state this is in. I would have loved to say that I could start, or did start, LibreOffice remotely on the new implementation already, but I didn't get around to that. I hope I can do it soon, and then you could actually do a bibisect against the remote repository without having to download 10 or 20 gigs: for the first step you download 75 megs, for the second step you maybe download another 30 megs, and the next time you will likely hit these first steps again, because the starting point of a bisect is mostly always the same. So you can download the repository on the fly, as you need it, and not in one step. So again, I have no clue what
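The signed-tag idea works because Git objects form a hash chain: every object names its children by content hash, so one verified root id transitively authenticates every object fetched over an insecure channel. A toy sketch of that property — flat id lists instead of real Git tree encoding, and file names chosen just for illustration:

```python
import hashlib

def oid(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def make_store(files):
    # Toy content-addressed store: some blobs, plus one root "tree"
    # that simply lists its children's ids.
    store = {oid(c): c for c in files}
    tree = " ".join(sorted(oid(c) for c in files)).encode()
    store[oid(tree)] = tree
    return store, oid(tree)

def verify(store, root_id):
    # Checking the (signed) root id is enough: tampering with a blob
    # breaks its id, and tampering with the tree breaks root_id itself.
    tree = store[root_id]
    if oid(tree) != root_id:
        return False
    return all(oid(store[b]) == b for b in tree.decode().split())

store, root = make_store([b"soffice.bin contents", b"library contents"])
assert verify(store, root)
store[oid(b"soffice.bin contents")] = b"tampered in transit"
assert not verify(store, root)
```

So a single signature check on the tag, plus cheap hash checks on each downloaded object, is all the on-the-fly downloader would need, even over plain HTTP.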
I'm talking about here, mostly. Originally I wanted to get others to actually do the work for me; that didn't work out, so I have to do it myself now. Luckily this stuff is solved for application deployment with solutions like Ubuntu Snappy, but for us the specific requirements are a bit different, so I'm still hoping that I can finish this at some point.

All right, questions? None. Oh, yeah?

Couldn't you leverage git's shallow checkout to do this?

How would you want to use git shallow for that?

Well, if you just want to have one specific version, and you do a shallow checkout, that at least is smaller.

I don't know git shallow well enough — I'd have to look at the manual. What you could do is something like, again, have the Git repository on an SSHFS mount or something like that, and then do a shallow clone from that, and have the local copy be different. But the disadvantage of that would be that when you do the first checkout, you do a full checkout, so you get all the files from it, although you only need like a third of them. So yeah, it's another solution one might attempt. Other questions?

Not a question, a request: we have enough time, could you do a quick demo?

No, but I can show you the repository link. Thank you. This stuff, unlike the first attempt, also has quite a few tests, so you can see what it should do, and if it doesn't, you see that it doesn't. Cloph?

Did you also consider just going the waste-disk-space route: create a version on the server that has the different versions in separate directories, and just switch the directory on your local mount?

So you mean having the complete thing on the server and then — yeah, no, I haven't considered that, but it's another option to go forward. Although given how good the compression of Git is, with 60 megs per checkout instead of 250 megs, that's quite a difference you would lose there.

I have a question. Armin?
What if you marry Cloph's idea with checking out one version on the server and mounting that locally?

Sorry?

You could check out one specific version on the server to a temp directory and mount that locally?

Yes, but then you would need one temp directory —

You could do that as a service.

One temp directory per user, yeah. Yeah, but that requires the user to be able to trigger changes and stuff on the server, which is admin stuff that I didn't want to get involved in. My idea here was that the admin just serves that file system and is done with it. But I see there are lots of great ideas here, so go ahead, implement them. This is just a hobby of mine, as you can see from the time frames involved. So thank you.