Okay, let's welcome Matthias for his talk about the status of AppStream in Debian.
Okay, thank you. I will be talking about AppStream in Debian, a project designed to enhance the metadata we have available in the software archive. It's also known as DEP-11, the Debian Enhancement Proposal 11, but since everything that was in DEP-11 has been merged into AppStream, you can basically use the two terms interchangeably. So AppStream is DEP-11.
Because AppStream is one of the lesser-known projects in Debian: first of all, what is AppStream, actually? It has a very bad name to talk about at conferences, because it's constantly confused with "upstream" with a U, so I hope I can make it clear when I'm talking about upstream projects and when about AppStream, the metadata. AppStream is a metadata format which is shipped by upstream projects. It's a small XML file, commonly found in /usr/share/metainfo, or in /usr/share/appdata as a legacy path. It is also the name for the metadata shipped by distributions, which is an XML file that aggregates everything upstream projects ship in terms of metadata. Additionally, it includes some things distributions know about, for example package names or information extracted from desktop files. It also comes with a set of PNG icons, in case you want to build an application center and provide icons for it.
So which metadata do we actually provide? The first thing AppStream does is separate the software we have in a distribution into a set of components. There are a few component types: generic, which can be used to describe any software; desktop, which is for GUI applications; then of course firmware, if you want to have metadata for firmware; and fonts, codecs, input methods, and add-ons, which extend other components.
This is what we currently separate the software in the archive into. There are a few ideas for more categories, but I will come to that later.
AppStream also provides a unique identifier for every piece of software we have, which is set by the upstream project. A unique identifier means that there's one tag the application is associated with, which can be used across all distributions. So if there's an org.kde.gwenview.desktop identifier, we can refer to the application by that name on Fedora, Debian, Arch, and whatever else the software is running on. Ideally this can be used, for example, to quickly check whether software has security issues: you check which version you have of the software with this particular name, compare it against a reference list of those names, and see if anything is wrong with it. Or you can just refer to it, or have a button which installs it. It's basically the equivalent of a package name, but distribution-agnostic.
The metadata also includes a name, summary, description, URLs, categories, keywords, screenshots, icons, etc., so anything you can think of which helps users make an informed decision on whether they want to install this application, or which helps the system determine whether the user might want it. For example, it also includes supported MIME types, modaliases, binaries, libraries, etc., which can help guide the user to installing an application. If the user has some new file type which they can't open, the system can suggest a set of applications which might be used for opening that file.
This is what it looks like in the upstream metadata, so this would be a file that upstream projects ship. I think I can skip this.
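The slide is not reproduced in this transcript. As a rough illustration only, the sketch below parses a small metainfo file with Python's standard library. The component, its ID, and all field values are hypothetical; the element names follow the AppStream metainfo format as commonly documented, and the `read_component` helper is an assumption for this example, not part of any AppStream tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical metainfo file for an imaginary image viewer; every value
# here is made up for illustration.
METAINFO = """\
<component type="desktop">
  <id>org.example.viewer.desktop</id>
  <metadata_license>CC0-1.0</metadata_license>
  <project_license>GPL-2.0+</project_license>
  <name>Example Viewer</name>
  <summary>Browse and view your pictures</summary>
  <description>
    <p>Example Viewer is a small image viewer.</p>
  </description>
  <url type="homepage">https://example.org/viewer</url>
  <provides>
    <binary>example-viewer</binary>
  </provides>
</component>
"""


def read_component(xml_text):
    """Return the basic fields of a single metainfo component as a dict."""
    root = ET.fromstring(xml_text)
    return {
        # A component without a type attribute is treated as generic here.
        "type": root.get("type", "generic"),
        "id": root.findtext("id"),
        "name": root.findtext("name"),
        "summary": root.findtext("summary"),
        "license": root.findtext("project_license"),
        "binaries": [b.text for b in root.findall("provides/binary")],
    }


component = read_component(METAINFO)
print(component["id"], "-", component["summary"])
```

Real consumers would normally go through the AppStream library rather than parse the XML by hand; the point here is only to show which fields such a file carries.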
It contains a long description, the ID, the short summary, etc., and all of this can be translated upstream, which helps us a lot because we do not need to translate things downstream anymore. The translation work is done once, in the upstream project, and we as Debian don't need to do the initial translation work because upstream has done it for us.
So this is an architecture overview. It's slightly dated, but still useful for understanding how this works. In Debian, we have a compose server, which is the thing that creates the final metadata we ship to the user. It takes the data the upstream project ships, enhances it with things we already know, for example desktop file data or package config information, and then assembles a huge XML file, or in Debian's case a huge YAML file, which gets transferred to the user's system. There it is used, for example, by software centers, which are the primary users of AppStream metadata today.
So, a few examples of what you can do with AppStream. You can, for example, check which applications provide a handler for a certain modalias; for this one in particular you have a tool (name unclear in the recording) which can make use of this device. Then you have a full-text search, and you can even install applications by their ID instead of referring to a package name.
So who uses this in Debian? Mainly the software centers: GNOME Software, Plasma Discover, and the elementary AppCenter. There are a few more, so I couldn't include them all, but those are the main ones that I know about. Then there's a tool called isenkram, which suggests packages to install when you insert new hardware, which is a really nice way to browse the archive and see what software you could use to get the most out of your hardware. Flatpak and Limba are software bundling solutions which also use this metadata as their primary format to get information about applications and to enhance the data we have for software centers.
So the software centers can decide: do I want this installed from a distribution package, or from a Flatpak bundle, for example. fwupd is a service which flashes firmware onto devices, which is a very interesting topic, but it would be a talk on its own, so I will just mention it here. And obviously the distributions, at least those mentioned below, currently fully support AppStream. There might be more; I think at least Mageia is investigating it, but I'm not sure whether they are already implementing it.
So what do we do in Debian? We use a YAML representation of the distribution data. Initially this was supposed to be XML, but our FTP masters don't like XML, so we now use YAML as the main format. And as you might remember from the talk at the last DebConf, this was not happening yet; it was just an idea that we might ship this in Debian. Now we do, as you can see when you open GNOME Software and it has AppStream metadata.
What we did was deploy a tool called the DEP-11 generator on appstream.debian.org, which extracts the metadata from packages on an external machine and then sends it over to the FTP masters, who include it in the archive. We also use a new feature of the APT packaging tool to make it automatically download this data for us and have appstreamcli put it in the right places and build caches for it.
Then, after we had deployed the DEP-11 generator and had already made Ubuntu use it, we decided to replace it again. Not to annoy Ubuntu, but because we found some really severe limitations in the DEP-11 generator, mainly bad performance. There were a lot of Python 3 multiprocessing issues, which were not only due to Python weirdness but also because the libraries and tools we were using weren't really designed for the multiprocessing case.
It relied on Contents.gz, which is bad: we had one case where the Contents file wasn't updated properly in Debian, so a lot of things weren't extracted by the generator, and it was even worse for Ubuntu, which only generates this file daily or weekly. So, for example, if packages moved icons around, we didn't have enough data for it. It was also Debian-specific, which was a deliberate design decision at the beginning, but in the end it was obvious that it didn't need to be, so we thought about making it broader and allowing other distributions to use it, because it was really useful for other distros as well. There were also some very bad design choices in there, which made it really hard to implement features that Ubuntu needed but that we didn't want in Debian. So rewriting it was basically the result of working together with Ubuntu to get it into their infrastructure. There was also code duplication with AppStream itself, which was initially intentional, but it's much more maintainable to have one implementation (well, actually two implementations) which reads and writes AppStream metadata, rather than having to maintain another implementation in Python for the same task.
So the replacement is now simply called AppStream generator. It's a complete rewrite, and it's distribution-agnostic. It's very fast compared to what we had before. It can read .deb files directly and doesn't rely on the Contents file anymore to get the file contents. And yes, it's written in D, which is kind of obscure, but for this tool it was the right choice.
Going on to appstream.debian.org: these are the current statistics on the metadata we have in the archive. The red parts are errors, and the green is valid data. One thing you need to know when looking at these graphs is that one package might contain multiple valid metadata components and might also emit multiple errors.
So this isn't an accurate mapping of packages to errors. But as you can see, there's still a lot of work to do. There are many errors, and any error found there could have been a successful component extraction. The warnings are usually old metadata which needs to be updated to the new format, or a failure to download some screenshot from upstream. These are things which need to be fixed in the packages, and we cannot really do anything about them on the AppStream generator side.
If you look at the evolution of the metadata, this looks quite poor, but that's because it only starts at the point in time when we began using AppStream generator on appstream.debian.org. If you include the data the DEP-11 generator had, we see linear growth of valid metadata, but with a very, very small slope, so there could be much bigger growth in valid data. The bump you see there is from when the counting of errors and warnings changed, so unfortunately it is not people uploading lots of valid data to the archive.
Which brings me to the future plans and ideas I have for this project, and for AppStream generator in particular. One of them is to fill in the provides entries for libraries, binaries, and Python modules automatically. One reason is that tools like pip, for example, could automatically download the Debian package if there's an equivalent version in the archive, instead of using the bundled version. But this would blow up the data; we would produce much more metadata, so we first need to check whether it's really worth it and whether it would be used by the respective tools. There are also some ideas to include all the binaries we have in the archive, so that command-not-found no longer needs to download all the Contents information and extract the binaries from it, but this is something that needs to be evaluated in the future.
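To make the provides idea more concrete, here is a minimal sketch of a reverse index from provided items to package names, which is roughly what a command-not-found style lookup would need. The records, field names, and the `build_index` and `suggest` helpers are all hypothetical and are not how AppStream generator or command-not-found actually store or query their data.

```python
from collections import defaultdict

# Hypothetical, hand-written component records; real distribution metadata
# carries many more fields and a different layout.
COMPONENTS = [
    {
        "id": "org.example.viewer.desktop",
        "package": "example-viewer",
        "provides": {"binaries": ["example-viewer"]},
    },
    {
        "id": "org.example.tools",
        "package": "example-tools",
        "provides": {"binaries": ["extool", "exconvert"], "python3": ["extools"]},
    },
]


def build_index(components, kind):
    """Map each provided item of the given kind to the packages shipping it."""
    index = defaultdict(set)
    for comp in components:
        for item in comp.get("provides", {}).get(kind, []):
            index[item].add(comp["package"])
    return index


binaries = build_index(COMPONENTS, "binaries")
modules = build_index(COMPONENTS, "python3")


def suggest(command):
    """Return the packages that ship a binary with this name, sorted."""
    return sorted(binaries.get(command, ()))


print(suggest("exconvert"))
```

The trade-off mentioned in the talk shows up directly here: every extra provided item is one more entry in the metadata each user downloads, so the index is only worth shipping if tools actually query it.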
Font metadata is something I'm working on, which we don't support in Debian right now, but Fedora does. Fonts are really hard, because work is needed from the font package maintainers, who really need to include a metainfo file, especially because the data in the font files themselves is almost unusable for a nice presentation to users. But as soon as support has landed in AppStream generator, I will reach out to the font people and ask them to include metadata.
This next one is a controversial entry. It might be useful to provide the metadata that is currently in contrib and non-free in main, so that if the user inserts a new device which requires proprietary firmware, the system can ask: "This device could work better if we had this firmware, but you need to enable non-free for that. Do you want to do it?" This is currently not possible, because the system simply doesn't know that there's something in non-free which could help. So this is an idea to work around that problem, but on the other hand we would of course have a very visible reference to non-free in the main archive, which might not be desirable for a project like Debian.
Obviously one of the biggest things that needs to be done is to show those errors and warnings on tracker.debian.org in order to increase the visibility of metadata issues. There's currently a patch for it, and I'm working on making it merge-ready for the package tracker maintainers.
There are also a few issues in the metadata which are not caught, and we could do some stricter validation, but this will only follow once we have a substantial amount of data in the archive.
Multi-arch support is something which is mainly useful for Ubuntu, and for Skype. If you have Skype in an archive for an architecture which is not your main architecture, you will not have metadata for it. Adding support in AppStream for that is a bit tricky, but it will be done.
Also, splitting the metadata into GUI and non-GUI parts might be useful for servers, because there apparently are people who want it on systems which have no GUI front end and no application store, mainly to find out which library is in which package, or for command-line applications. So this might be useful to achieve smaller download sizes. Creating binary diffs for the icon tarball is in the same area of work, to reduce the download size. There's a bug report for this, but it's a very hard problem to solve, so I'm not sure if we can do it or if we should do it.
A mechanism to replace bad screenshots: for example, if someone uploads pornography and we download it into Debian, we want to replace it. Or if someone uploads a screenshot which contains copyrighted material, we cannot ship it with Debian and display it there. So we need a mechanism to easily replace bad screenshots. I'm working on this, and it will happen soon.
So this is my favorite slide: what can you do? The main thing is to write metainfo files and submit them upstream, because that's where they belong. Also, fix all the issues highlighted on appstream.debian.org. So ideally, at this DebConf, go to the page, go through the issue pages, and see if your packages are affected by some problems. If you don't know why some issue is shown there, please talk to me, and if you think it's a bug in AppStream generator, please file a bug report for it. Obviously patches are welcome, bug reports are also welcome, and there are instructions on the wiki on how to properly package and create metadata. So that was it. This is what the page looks like. Are there any questions?
So we have three minutes for questions.
About 15 or 20 years ago there was something called an LSM file, which was a format for the meta information of a package. Have you considered parsing it, or reviving it, or just extending it?
I must admit that I've never heard of that.
It was actually reasonably popular. I think the Software Map even had some search functionality, and if you look around in the very old archives you'll see that many .tar.gz files also have a .lsm file. So it might be something to extend.
Okay, if there are still packages in the archive which provide these LSM files, we can definitely think about parsing them in AppStream generator and producing the final XML from them.
Cool.
For a user to make an informed choice, they often want to know what license the package is under. Have you considered including licensing information in the AppStream file?
Yes. It's not mandatory, but it is a recommended tag to add licensing information. We use SPDX license identifiers for that, so it's machine-readable as well, and so far I think almost all projects use it. I would need to check the statistics to see whether some metainfo files lack it.
Do you think we could generate debian/copyright from that, or do you think it would make sense to do that?
I don't think so, because it's not as fine-grained. In the metainfo files you have, for example, LGPL and GPL for the whole project or for the whole software component, while the debian/copyright file goes down to the file level in terms of license information, and that's maybe a bit too much for most users.
Oh no. Like, an easy question. We've started a little initiative in Ubuntu to try to increase the coverage of AppStream in the archive, because as you know, we added AppStream with our 16.04 LTS release. People found it hard on both sides. Contributors found it hard to understand exactly what they had to do, so I guess that's a question of documentation. But sponsors also found it hard to know whether the package they built was actually going to be accepted by the generator in the end, right?
So I'm wondering if you have any ideas about how people can go about testing that, or if maybe it's possible to make a submission service so that you can see whether your package is good or bad, or to run the generator in a mode that just takes one .deb and tells you whether it's going to work with the archive, or something like that. Any ideas or thoughts about what we could do there to make it easier for sponsors and for newcomers to come and work on this? It seems like it would be a nice initiative for new people to work on, but it's also not so easy to come in and start working on it.
At least for the AppStream metadata files, you can run appstreamcli validate on them, which checks for formal errors in the format, or appstream-util validate as well. But most issues in the archive exist because there are XPM icons or missing icons for something. This is a bit harder to check, and currently there is no easy tool for it. You can of course fire AppStream generator at it, but that's a bit of overkill. So yes, I think it might be useful to provide such a service and an easy tool for this.
So you generate AppStream data from desktop files most of the time, and these are the things that people are working on. It wasn't actually stuff with metainfo files. It was typically something which doesn't come with a good enough icon, or something like that. I know you have guidelines as to where you want to put them, where they should end up, and what format they should be in, but the question was just: how do I test whether this .deb that I've made is actually going to be accepted at the end of the day?
Yes, there are instructions on the wiki which are, I hope, detailed enough, but we should expand them if there are problems.
Thank you very much. We are out of time. So thanks again.