This talk is about hacking on APT for fun and profit. My name is Michael Vogt, and for those who watch this later: this is DebConf 2014. First, let me say, because this is recorded and will be watched later: when I say hacking, I obviously mean taking delight in understanding how a system works, not breaking into a system. So if you are here because you like to break into systems, that's the wrong talk; go away. I also have a confession to make: when I said profit, I lied. I'm sorry. I tried to lure you into coming to my talk, because I like to talk in front of a lot of people. That was also a lie. But what I didn't lie about was the fun part. It really is great to hack on APT. So, what is APT? Wikipedia says it's the "Advanced Packaging Tool", and I don't think that's quite correct, because that sounds like it could be debhelper, or maybe a different version of debhelper. If I read the history correctly, APT stands on its own; it's not an acronym for anything. It's just apt. And of course it is the tool that we use in Debian to manage our packages: we search, we download, we install, all this stuff. It's also a bunch of command-line interfaces that we use every day, or some of us use every day. And there's also a library. Two libraries, actually; well, nowadays it's three, but two public ones. So why is it fun to hack on APT? Well, if you contribute to APT, you help 20 million people, at least; it's probably way more. 20 million people is the figure that was given a while ago for how many users there are, and APT is used well outside of Debian and Ubuntu. It's used in Mint. It's used internally in SteamOS. And I was told at this conference that it is apparently used on jailbroken iOS devices. So there are a lot of people who use APT.
And if you contribute, you help 20 million people, at least. Just think about that. I'm mentioning it because sometimes people write their own window manager or their own text editor, and chances are you won't reach 20 million people that easily. I mean, some of us will. So APT is great because it is important. It's a central piece of our infrastructure: we use it in the distro, we use it on actual devices, we use it as part of dak, our FTP archive infrastructure, we use it in the Debian installer. We use it in all sorts of places, and it is part of what makes Debian great. Joey Hess gave a talk about the Debian cosmology at the last DebConf, and it's a great talk on its own, but the part about APT was very interesting because he highlighted how influential APT was back in the day. It still is, but nowadays having a package manager is something everybody does; even .NET has a package manager now, I was told. APT was one of the first ones, and it was certainly very influential. Bindings are available for lots of languages, like Python, Perl, Ruby. Frontends are available for lots of toolkits: Synaptic, Muon, aptitude, of course. And it's the building block for systems that sit one level of abstraction higher, like PackageKit; deep down, there is still APT managing your packages. And there are a lot of relatively low-hanging fruits. Low-hanging for people in this audience, for sure. Nudge, nudge, wink, wink, know what I mean? One is bug triage: we are drowning in bugs, not because APT is so buggy, but because it's so important and used so much. People want to use it in different and interesting ways, and we just can't keep up. So just having some people help with triaging and with reproducing issues would be immensely helpful. Another is more features; I will talk about some specific ideas later on.
But even though APT is quite mature and very useful, there is still stuff we want to do, and can do, to make it better. And of course there are always bug fixes. There is always this "oh, it doesn't quite work correctly on kFreeBSD", for example, because the kFreeBSD PTY handling is apparently different from the Linux handling. And we have very friendly people, very friendly, hanging out on IRC and on the deity mailing list. So, friendly people. This is the current team, the people who are currently working on APT. That's David Kalnischkes. And David, I hope I pronounced your name right; it's a difficult one, but that seems to be a tradition in the APT team, that we have difficult names. (Audience: how did the mailing list get its name?) So the question was how the mailing list got its name, and I don't really know; it happened a long time ago, before my time. I will talk a little bit about the history later on, and I see some people in the audience who have been around for a very long time, so maybe somebody can answer this question later. So: David Kalnischkes, and Julian Andres Klode, and again, I hope I got your name right. He's mostly working on python-apt, but he's also contributing to APT itself. And of course me, Michael, mvo. Your name could be on this slide, with your picture. Just saying. So let me give you a very brief history, and I call it a brief history according to Michael, because I'm actually too young for the full history. APT got announced on the 1st of April in 1998, before I was involved in Debian. It was announced by Jason Gunthorpe, and Jason, I hope I got your name right too. The first commit we have is from 1998; it was CVS at the time, but we converted it over to, I believe, tla, then Bazaar, then bzr, and then Git. So now it's all maintained in Git.
And I should note that the basic principles, the basic design, really stood the test of time. Sure, it expanded a lot; we added lots of features, and lots of code got changed or rewritten. But the core design, the core principles, the way the cache is generated, the way the acquire system communicates: that's still the original design. And I think that's a pretty big achievement. The code is 16 years old, it's still going strong, and it scales pretty well too. So, some interesting milestones. In 2003 we added apt-secure. In 2007 we added translated package descriptions, which is really useful because not everybody understands English. We got automatic dependency removal from aptitude; Daniel Burrows was the person who invented it, and it was ported over so that it's not only available in aptitude but in all the frontends. And we got HTTPS support, which is not that important for us, because we have a signed package infrastructure anyway, but it's still nice for certain use cases. Then, in 2010, we got multi-arch in APT, which was a big deal. Of course, multi-arch is much bigger than just the APT implementation: it required support from the packages, and the toolchain had to change. Multi-arch is really important for Debian, and I think we are the strongest distribution when it comes to supporting multiple architectures this way. We also got ButAutomaticUpgrades, which makes integrating backports into the sources.list file easier. We got InRelease support, which means we have inline signatures for the Release file. And we got the external dependency solver protocol, which is nice for people who do research on how to resolve dependencies.
This is an interface where you can have an external application solve your dependencies. You say: here's my package universe, this is what I want to do, please tell me how to do it. And it communicates over pipes, so you can write your resolver in great languages like OCaml, or whatever language you want. And in 2014, we got the apt 1.0 release. It's a time-based release; it took 16 years. (Audience: is it exponential?) No, it's not, and let's hope we are not set on that particular release schedule. We also got the apt binary, which is kind of a funny story. We have apt-get and apt-cache, and new people asked: why do we have two different ones? It kind of makes sense if you look at it from an implementation perspective, but it doesn't really make sense from a user perspective. So we wanted to have an apt binary, but the name "apt" was already taken by the Annotation Processing Tool that is part of Java. Fortunately for us, it got deprecated, in Java 6 I believe. So right when the binary name dropped from the Java package, we uploaded an apt package that had a shell script basically saying: this is our name. Nowadays it actually does useful stuff; the binary is meant to provide all the commands that you need on a day-to-day basis. Oh, and we also got client-side pdiff merging, which means that if you run apt-get update on a Debian system and it fetches pdiffs, it's now really fast instead of really slow, which is great. So let me talk about some stuff that may not be so well known. The new apt install output is colorful and has a progress bar; well, some people like it. And apt-mark hold lets you set packages on hold, and the hold will be respected by dpkg and by aptitude.
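As a quick illustration of the hold mechanism he describes, a session might look like this (the kernel package name is just an example; output shown is what I would expect, not captured from the talk):

```
$ sudo apt-mark hold linux-image-amd64      # keep the package at its current version
$ apt-mark showhold                         # list everything currently on hold
$ sudo apt-mark unhold linux-image-amd64    # release the hold again
```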
We got autoremoval for kernels, which is nice. We didn't autoremove kernels for a long time, because we were just afraid of removing the running kernel; that's bad if you want to boot again. But nowadays we have a script that checks the running kernel and the latest kernel and makes sure those stay on your system, while the other ones get removed. This is mostly relevant for systems like Ubuntu, where new kernel ABI versions land in the archive a lot. Oh, and yes, we also have apt-get moo, and apt moo, and apt moo moo, and apt moo moo moo. And we have lots of useful debug options. How many people in the audience have used two of those? Yeah, I guess they should be better known, because they are really useful. If stuff doesn't install, for example, the pkgDepCache::Marker debug option is really useful. If the dependency solver gives you really weird output, the debug option for the problem resolver is really helpful. Same if you have network problems: the HTTP debug output is nice. All right, a very quick demo. I hope you can see it okay-ish; the details are not relevant, just look at the bottom of the screen. That's the progress bar. Originally I didn't have this slide, but I was talking to people at lunchtime and asked whether they had seen this progress bar, and they said no. So my sample size was small, but I figured I might as well show it, because end users like it. (Audience: what's the line that says something else?) Oh yeah, that's apt-btrfs-snapshot. If you install apt-btrfs-snapshot, it will take a snapshot every time you install a package, so that you can roll back to that snapshot if you are unhappy with the particular package. All right. So what's next? We currently have an ABI break in experimental, and I really hope to get it into unstable before the freeze.
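To make those debug options concrete, here is a sketch of how I believe they are enabled on the command line ("somepackage" is a placeholder; the option names are the standard apt debug options, not something shown on the slides):

```
# Why did apt mark these packages for install or removal?
$ apt-get -o Debug::pkgDepCache::Marker=1 install somepackage

# What is the problem resolver doing when dependencies conflict?
$ apt-get -o Debug::pkgProblemResolver=1 dist-upgrade

# Trace the HTTP transport when downloads or proxies misbehave:
$ apt-get -o Debug::Acquire::http=1 update
```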
It is quite a lot of change, but it should be fine. And it has some neat stuff, things people have been asking for basically for years. You can run apt-get install and give it a .deb package, or a bunch of .deb packages, and it will install them; you don't have to run dpkg and then "apt-get, fix it for me, please". And it will tell you beforehand what it needs to do. We have apt-get build-dep, which you can now point at a .dsc file, and I understand this is useful for sbuild, for example, where it lets us get rid of some code. apt-get source uses strong hashes now; it didn't before, which was a bummer. That has been in Git for at least a year, but because it's an ABI break, and we try to be really careful about ABI breaks, it isn't in unstable yet. Soon. Then we have apt-get update, which can now use a by-hash scheme to get the index files. Essentially this means that instead of saying "give me the Packages.gz file", it asks the server for a file named after a long hash of its content, which means proxy setups are much simpler, because the name of an index file is now unique instead of, you know, not unique. This will fix a lot of the hash sum mismatch errors that happen while a mirror pull is in progress. Okay, thank you, I guess that was your question. Yes: right now there's an inherent race condition when a mirror pull runs, and this mitigates it. There's more work to do to make it really go away, but this is an important step forward. We also have the deb822 sources.list format. Instead of having a single line where you say "deb http://...", you can describe the source just like you would describe a package in a Packages file: you say "URIs:" and give it the URI, then give it the suite and the components, and you can add comments.
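For comparison, a sketch of what one stanza in the deb822 style looks like (file name and mirror URL are illustrative):

```
# /etc/apt/sources.list.d/debian.sources
# Old one-line form: deb http://ftp.debian.org/debian unstable main contrib
Types: deb deb-src
URIs: http://ftp.debian.org/debian
Suites: unstable
Components: main contrib
```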
And this is nice because it is much easier for a machine to read, much easier for a machine to write, also much easier for a human to read and write, and much more extensible. Right now, if you want to add options like trusted=yes, you have to use brackets, and it looks really weird, and software breaks because there are so many custom parsers for the old format. This should make all of that go away. And while I'm here, and while I have people listening to me, I want to take the opportunity to address some common misconceptions. This room is probably much less affected, but you see these on the internet, in forums, or sometimes on user mailing lists. "apt and aptitude are incompatible": this is entirely untrue. apt and aptitude get along just fine. They have different resolver libraries; the aptitude resolver uses a different technique for dependency problems, and in some ways it's much more clever, while apt's is more heuristic and in some ways more predictable. Sometimes one gets it right, sometimes the other. But they are not incompatible at all; you can mix and match, it's just fine. "The apt and aptitude developers hate each other": no, we don't, really. We get along just fine, we port features back and forth, no problem at all. "pdiffs are slow": not anymore; at least in unstable, pdiffs are really fast. The Recommends handling on upgrade is very simple: if an upgraded package has a new Recommends, it gets installed, but all the other Recommends are left alone. If you have unsatisfied Recommends, apt will not touch them at all; it only looks at new ones. And apt is not going to be rewritten in Brainfuck or in Whitespace, even though David keeps threatening me with it. David, if you're listening: you are not going to do it.
And I should point out that it's unlikely apt is going to be rewritten in any other language. There have been serious attempts, and that's fine, but I think the value of apt is really its ecosystem. If you rewrite apt in a nicer way, that's certainly a good goal, but you would have to replace a lot of the upper layers as well, and that's going to be a lot of work. Right. So, just in case you're interested now: how can you hack on it? What can you do? First, let's talk about the major building blocks of apt. apt is kind of complex. It's not very big in lines of code, but it is complex in the sense that it needs to do quite a bit of stuff. First of all, it needs to read the index files: the Packages files, the Sources files, the stuff it downloads from the internet, and the status of what you have installed on your system. It reads all of this into a binary cache, a memory-mapped data structure. It's a cache in the sense that if your files haven't changed, apt just memory-maps the file, and that's really quick. If something has changed, it rebuilds the cache; that won't be as quick, but once it's done, it's written to disk and it is fast again the next time. So that's the static data, the available packages. On top of this, apt builds the dependency cache: it looks at which packages depend on each other and builds a data structure called the pkgDepCache. This data structure is also used to mark packages for installation or removal. And once you do that, once you mark a package for installation or removal (let's take the installation example), the policy comes in. The policy is the part of apt that decides which version of a package you actually mean when you say "I want to install, let's say, cowsay".
In the most simple case it's of course the latest version, but you can override this. You can say "I want packages from testing instead", and those get a higher score; internally, it's all mapped onto scores. Once you have marked a package for installation, and apt has decided which version to pick, it marks the dependencies for installation. It may run into a situation where packages conflict with each other. In that case the dependency resolver (internally it's called the package problem resolver) comes into play, and it tries to get your system into a consistent state again by removing packages or by setting packages back from install to keep. Hopefully, when it has finished, your cache is in a consistent state. Once it is, the acquire system comes into play; that's the part of apt that downloads or copies stuff. Packages can be downloaded over HTTP, fetched from CD-ROM, or fetched via SSH, and there are external transports available for BitTorrent and Tor, so it's quite flexible. This part brings the packages onto your local system. At that point the package manager interface takes over; that's the part of apt that drives dpkg. It runs dpkg in the right order and executes it with the right arguments. So that's the library part I described here. And then, of course, we have the command-line frontends for the library that apt itself ships: apt-get, apt-cache, apt itself, and also apt-ftparchive, which is also part of the apt Git tree, though of course it's not installed on most systems. So what else do you need to know if you want to hack on it? It is written in C++. But it's not the scary kind of C++; it's not this modern lambda C++. It's really a bit old-fashioned. So people who like C will...
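The scores he mentions are exposed to users as pin priorities through the apt_preferences mechanism. A minimal illustrative fragment (the file name is made up):

```
# /etc/apt/preferences.d/prefer-testing
Package: *
Pin: release a=testing
Pin-Priority: 650
```

With this in place, versions from testing outrank the default priority of 500 that other configured releases get, which is how "packages from testing get a higher score" looks in practice.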
I think they will like the apt code base as well, to a certain extent. We do use the STL, but we don't use, for example, lambdas or auto; we don't use a lot of the modern stuff. We also have a pretty good test suite, which is really good, because we don't want to break stuff. It's mostly written in shell: currently 140 individual shell scripts that use a framework to set up a chroot-like environment for apt, so that we can test things like "is the policy working correctly, given these preferences and these list files". And each of these 140 files has a bunch of additional subtests, so by now it's quite a good test suite. We also have unit tests based on GMock. Not as many; we would like to have more here, but we have some and we try to expand them. It's also integrated into Travis CI and debci. So when I do a Git commit, Travis CI runs the entire test suite on it, which is nice, because obviously it means that if I break stuff, I know about it. Sometimes I know about it because David yells at me on IRC, but I prefer Travis sending me emails. Right. We actually have a README, and it is a good README: it tells you exactly how to set up your environment and what you need to do in order to contribute. But essentially the workflow is very simple: you git clone the repository, run make, and then run whatever command you changed and see if it works. And of course you can also run the test suite. You do need to set LD_LIBRARY_PATH, because a good chunk of apt is a library. There are, of course, some gotchas that you need to be aware of. ABIs in C++: I don't know about you, but I find them kind of difficult. They are fragile; you are not allowed to do a lot of things. There is a very well-written wiki page that describes what you can do and what you can't do.
And the list of stuff you can't do is quite long. There are some techniques that help, like d-pointers, which we use, but really, keeping an ABI stable is serious business. At least that's my experience. And while I have the opportunity, I would really like to rant a little bit; people who know me know that I really like ranting, I'm a very ranty kind of person. Maintainer scripts make life very difficult for me, because what a maintainer script does is run arbitrary code as root, and it can do anything. It can alter the system state in any way it wants. Which means, of course, that we can't downgrade a package. Which means that we can't roll back a package, we can't roll back a transaction. Which means that when an upgrade goes wrong, the entire system is in a broken state and we can't do much about it, which is a real shame. Now, I have seen lots of systems where upgrades went wrong, and there was not a single one I couldn't fix: the right dpkg and apt incantations, and it's all good again. That's true, it definitely is. But even experienced sysadmins sometimes don't know how to do it, they don't have the time to do it, and frankly, they shouldn't have to know the ins and outs of apt and dpkg to fix a broken upgrade. If we had a more declarative way of saying "I want to add a user, I want to do whatever", then we could provide all of this automatically, and that would be really, really neat. I totally understand that this is a long way off and that it's difficult; we use maintainer scripts in all sorts of ways. But still, it's something that I think we should really talk about and discuss more inside the project. Right, back to the topic. Thanks for letting me rant a little.
So, just in case I got you interested, there are quite a few areas you could work on, or anyone could work on, or that eventually we will work on. First, let me say that all development is demand-driven. Well, sometimes we do stuff anyway, but generally we try to do stuff that's useful to people. And one thing that is useful is having more integration tests. If you or your application depends on a feature of apt, say you run apt-cache with some weird command-line options, we do not want to break your application, but we may do so by accident if we don't have a test for it. So just contributing a test case for it would be helpful. Tests are really simple: everybody who can write a maintainer script can also write a test case. We have a framework with a lot of commands, a lot of helper functions you can use; it's not difficult at all. What would also be nice is improving the install progress. I showed you the progress bar earlier; that progress bar is currently calculated in a very simple way, basically from the number of packages. But instead of the number of packages, we could take the size of the packages into account. Obviously, a kernel package takes much more time to unpack than a small library package. So that would be nice, and it's probably not a lot of work. PTY handling on kFreeBSD is broken, I was told; somebody looking into that would be appreciated, as I don't even have a kFreeBSD system. We have this great redirector service, http.debian.net, that redirects you to the closest mirror, and having apt understand the metalink information from that service would be really cool. Some work has been done here, but it's not finished, so that's probably a really interesting area. There are SRV records for security.debian.org, another interesting area. And there are the improvements to apt-get update to make it really race-free while a mirror update is happening.
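The size-weighted progress idea he sketches is simple enough to show in code. This is not apt's implementation, just a toy Python illustration with invented package sizes:

```python
def size_weighted_progress(done_bytes: int, total_bytes: int) -> float:
    """Fraction complete, weighting by package size instead of package count."""
    if total_bytes <= 0:
        return 1.0
    return done_bytes / total_bytes

# A hypothetical upgrade: one big kernel package and one tiny library.
sizes = {"linux-image": 60_000_000, "libtiny": 200_000}
total = sum(sizes.values())

# After unpacking just the kernel, a count-based bar reports 50%,
# while a size-based bar is already almost done:
count_based = 1 / len(sizes)
size_based = size_weighted_progress(sizes["linux-image"], total)
```

The point of the weighting is visible immediately: the count-based bar sits at half done while dpkg is still grinding through the kernel, whereas the size-based bar tracks the actual remaining work.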
We are about halfway there, I believe, but there is still more work to be done. We could port some of the aptitude matchers, because everybody likes the aptitude matchers; I think they are really great, and it would be nice to have them directly in the library instead of in an external application. Then there is the ordering algorithm we have in apt. The ordering algorithm is the part of apt that decides in which order dpkg needs to be run: it will, say, remove sendmail first and then install postfix. Currently it optimizes for speed; it tries to invoke dpkg as few times as possible. People may prefer different orderings. Some may prefer an ordering that keeps the system as consistent as possible: every time you upgrade a package, minimize the amount of brokenness during that upgrade. And this ties into the next point: can we drive dpkg without using the various --force options, like --force-depends? That would be a really interesting problem for someone who's keen on working on algorithms, for example. Having a binary interface for the external dependency solver protocol would also be kind of nice, because while it's a great interface, it tends to be a little on the slow side: apt needs to hand all the information it already has in its cache to an external application, which then needs to parse it and build its own package universe. A binary plug-in interface here may solve this speed issue. And yeah, that's just a bunch of ideas; there is more, and I'm sure David has lots of ideas as well. Uh-huh; not much time left, I'm told. Okay, so that was my talk. Thank you very much, I hope you enjoyed it. We do have time for questions, I believe. (Audience:) You said that you have support for doing a snapshot immediately before installing, so that you can do a rollback.
Did you think about maybe turning that on its head, so that you build the snapshot out of the thing you're installing and then just atomically say "okay, this is the new system image", to avoid this in-between, middle-of-an-upgrade state? Yeah, so we do have the apt-btrfs-snapshot package, which, as I understand it, does what you want. But no, maybe I misunderstood the question. Oh, I see what you mean: run the upgrade inside the snapshot. Yes. There was a sandbox mode; I wrote some code that did this, but it was before we had all this fancy namespace magic. So yes, it's a very good idea. (Audience:) You mentioned the binary cache, which is one of the bits of apt with the most spooky action at a distance: you assign to variables, and stuff turns up in the cache, which can be quite confusing, and it's sometimes a source of quite obscure bugs. Has anybody ever looked at replacing that with something like SQLite, or are there reasons that just wouldn't work or wouldn't be performant enough? As far as I know, nobody has looked at replacing it, and it would be quite a bit of work, but there's no intrinsic reason not to do it. (Audience:) Getting rid of maintainer scripts is something that I have previously been a big fan of as well. Having worked on Lintian for a while, I feel your pain, because Lintian tries to figure out what the heck you're doing in a maintainer script, and whether you're doing it properly, and that means that what Lintian tries to do is parse shell. Except that no one ever implemented a full shell parser in Lintian, so Lintian parses some subset of shell with lots of regular expressions, which mostly match how people write shell, but not really. So, to expand on that a little: I think there are several things people could be thinking about in terms of how we can significantly reduce the number of maintainer scripts we have in the archive.
We're never going to be able to get them to zero, but I think we have considerably more than we actually need. Triggers have helped quite a bit. I'll point out that we don't have policy documentation for triggers, mostly because we lack reviewers to drive the documentation all the way through to completion. I know there aren't a lot of people who feel like they're experts in triggers, which is part of the problem, but if you know enough about triggers to think that you might be able to review that policy document, getting documentation for exactly how to use triggers into policy would, I think, let more packages use them, and that would make more maintainer scripts go away, because triggers are the biggest thing we've managed to do to get rid of maintainer scripts. The other thing to think about is: is there anything else where we can take something that's in a maintainer script right now and make it declarative? In the systemd talk, Josh mentioned sysusers and the way that might be able to get rid of adduser/deluser pairs, or whatever we want to do about user removal, which is a whole other conversation. But there's a lot more stuff like that. If you read maintainer scripts, you see lots of boilerplate, lots and lots of packages doing the same thing, and oftentimes there's debhelper code to write that thing out. By the time debhelper is writing shell script fragments for you, it feels like we should have a configuration file, or something somewhere in a system service, that does this properly. Yes, I totally agree. Great. (Audience:) Just following on from that: one thing I've thought about in the past is that debhelper autoscripts are an intermediate step between total mess and properly declarative, and I don't know if anybody has written something to audit whether a given maintainer script is the result of a sequence of debhelper autoscripts and nothing else.
That might be a useful thing to try to figure out. (Audience:) For me, the debhelper scripts are very helpful, but then I rebuild packages and they start to act differently, and I realize that the old package was built with a really old debhelper and the maintainer script snippets have changed since. Suddenly we realize that our archive can have a lot of out-of-date code in its maintainer scripts. If it had been declarative, you wouldn't need to rebuild anything; suddenly everything would use the latest, greatest, whatever. I don't know how to detect that or do an archive scan either. Can Lintian scan debs and see whether there are old debhelper snippets in there? (Audience:) The latest maintainer scripts that I've written were only calls to dpkg-maintscript-helper. There is support in dh_installdeb for doing that for you. Okay. And that, I guess, also adds the snippets to the script. And I wondered if I could just have a file shipped in the debian/ directory that gets picked up at install time and does it, so there are even fewer maintainer scripts? (Audience:) About two or three years ago, I got debian/foo.maintscript support into debhelper, so you can basically put the arguments to dpkg-maintscript-helper into that file. You don't need to write them in manual maintainer scripts. Isn't that what dh_installdeb does? dh_installdeb, I think, yes. It's still a debhelper autoscript, but it writes the autoscript for you; you don't need to actually write the code by hand. I'm much more worried about people writing code by hand, because that tends to be wrong. You're right that the generated code does get out of date, but like I said, it's an intermediate step. (Audience:) I did not find it in the dpkg-maintscript-helper documentation. I'll show you later. Yeah, I agree with Colin: I'm worried about people writing maintainer scripts by hand as well, because, contrary to popular belief, shell is actually quite difficult to get right.
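To make the debian/foo.maintscript idea concrete: each line in that file is just the arguments for one dpkg-maintscript-helper call, and dh_installdeb generates the surrounding boilerplate in the maintainer scripts for you. A sketch, with the package name, paths, and version invented:

```
# debian/foo.maintscript
# Drop a conffile that was removed after version 1.2-3:
rm_conffile /etc/foo/old.conf 1.2-3~ foo
# Rename a conffile in the same upgrade:
mv_conffile /etc/foo/foo.cfg /etc/foo/foo.conf 1.2-3~ foo
```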
It looks so simple, but it's not. Yeah. And you're in for pain if you try to parse it. (Making people move the microphone around for a quick joke:) People think there's more than one way to do it in Perl? Try parsing shell. All right. Any further questions or comments, or does somebody want to comment on the history, the pre-2003 era? All right, I guess we are done. Thank you very much again.