Okay, so let's start with a fairly short talk, even shorter than what I planned. I'm going to talk about the Ultimate Debian Database, which is a fairly old project now inside Debian.

The initial motivation for UDD was that we have a lot of different services, more or less small services, inside Debian, generating a lot of data in different formats. We have text files, Berkeley DB, SQL databases, JSON, YAML, and we often need to combine them all, mainly to do QA. For example: what are the packages of priority higher than standard with RC bugs? Who are the maintainers who have lots of outdated or buggy packages? Doing that kind of query before UDD was quite difficult, because you had to parse data coming from different services and combine it using an ad hoc script.

So the idea of UDD is to take all that data and import it into a single PostgreSQL database, which makes it much easier to query because, well, mostly everybody knows at least some SQL. It's also the proper way of joining data together, and then you have no need to write a problem-specific script. You just query UDD for what you want to know about Debian.

It started in 2008 as a Google Summer of Code project. The student was Christian von Essen, who left after the project. We were three mentors at the time, with Marc Brockschmidt and Stefano.

A few slides about design choices in UDD. Really, the goal was to do something that was not problem-specific, so not optimized for typical queries, but rather something that makes it easy to ask about basically anything. The typical user is a human, because it is expected that most of the time will be spent thinking about the queries. So performance is important, and we try to optimize for it, but it's clearly not the first goal. When we have to choose between ease of use and performance, we usually choose ease of use. One consequence of that: there are no surrogate keys.
Surrogate keys are things like columns with an abstract ID identifier instead of a set of text fields, for example.

Another design choice of UDD is that correctness is critical. Correctness of the data we have in UDD is critical. So we don't do partial updates of data, except in one case, because it's usually quite difficult to figure out what has changed and apply the diff. So we do complete data reloads for most importers. The only exception is the bugs importer, because doing a full import requires reading a lot of separate text files, since bugs.debian.org uses text files as its main backend. So there we do partial updates: we update the bugs that got modified for some reason, and do complete data reloads only every few hours. This is completely hidden from the user, because we use transactions to avoid having empty tables at some point.

Another design choice is that we want to be inconsistent, because Debian is inconsistent. For example, if you look at the various services we have, we have various definitions of packages. One good example of that is popularity contest, where you have a package column in what popcon exports, but that's actually just a copy of what users of popcon provided. So if they put a path or a file name in that, then it ends up in the data that popcon.debian.org exports. You can look at what's in there; there's some interesting stuff. Probably some users have data corruption issues. We keep that inconsistency in UDD because it could be interesting for QA, which means that we don't try to have foreign keys between different data sources. But we also provide some views that hide the inconsistency, to make it easier for the user to run queries.

It's currently running at udd.debian.org. There's also a public mirror hosted by Asheesh. It uses PostgreSQL 9.4. You can connect from the QA host and from Alioth using psql "service=udd", which means that even non-DDs can connect.
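The reload strategy described above (replace a table's whole contents, but inside a transaction so readers never see it empty) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the `packages` table, its columns, and the `full_reload` helper are hypothetical names, not UDD's actual schema or importer code. The table uses a natural composite key rather than a surrogate ID, in line with the design choice mentioned earlier.

```python
import sqlite3


def full_reload(conn, rows):
    """Replace the whole table in one transaction (hypothetical helper)."""
    # "with conn" commits on success and rolls back if an exception escapes,
    # so a failed import leaves the previous data in place.
    with conn:
        conn.execute("DELETE FROM packages")
        conn.executemany(
            "INSERT INTO packages (package, version, suite) VALUES (?, ?, ?)",
            rows,
        )


# In-memory SQLite database standing in for the real PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages ("
    " package TEXT, version TEXT, suite TEXT,"
    " PRIMARY KEY (package, version, suite))"  # natural key, no surrogate ID
)

# First import succeeds.
full_reload(conn, [("hello", "2.10-1", "sid")])

# Second import is broken (duplicate key), so the whole reload is rolled back.
try:
    full_reload(conn, [("hello", "2.10-2", "sid"), ("hello", "2.10-2", "sid")])
except sqlite3.IntegrityError:
    pass

rows_after_failure = conn.execute(
    "SELECT package, version, suite FROM packages"
).fetchall()
print(rows_after_failure)
```

Because the delete and the inserts share one transaction, the failed second import does not leave the table empty: the rows from the first import are still there afterwards.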
There are a couple of tables that are not available if you connect using the guest account rather than as a DD, but usually you can just connect using the guest account and, well, there are two tables that are unavailable. I think it is the LDAP table and the PTS subscription table, which doesn't make sense anymore anyway. And there's a wiki page with all the info about how to connect.

So what's in it? The main stuff is sources and packages for each suite in Debian, bugs data, and stuff about developers' identities. Carnivore is a mapping between people, GPG keys and email addresses. Then there is the Debian LDAP, the list of Debian Maintainers, and PTS subscriptions. We have stuff about Debian derivatives, mainly for Ubuntu, with sources and packages, bugs, upload history, popcon and Lintian, and also some of it for aptosid. We could add more: if you would like to have your own derivative added to UDD, that's really easy. These are just the ones where someone cared at some point.

And additional stuff. There's a long list here, which I'm not going to go through because it would be boring, but it covers basically most of the sources of data that exist inside Debian. If you know of something that isn't here, well, tell me, and probably we can find a way to add it. I know of at least one which isn't included.

So what can we find out about Debian using UDD? One example query is the number of different Lintian errors or warnings, so the packages with the highest number of different errors or warnings. Those are good examples to stress-test Lintian or something like that. The first package covers Lintian quite well. Yeah, dead bugs here. And who uploaded the packages currently in sid? That's the list. I'm not sure if he is in the room, but he is around. But yeah, Clint is around here as well.

There are also a few dashboards that are built on top of UDD. Oops, let's try to resume. Can you still read? So the first one is the UDD bug search.
This is a big form that you can use to search for bugs. What's interesting is that it provides a few views for bug squashing parties, or bug squashing in general during the release. There's a bug squasher view: bugs affecting sid and testing, not marked as done, not tagged patch. There's a sponsor view: typically bugs with a patch waiting for someone to upload them, or bugs in a strange state. And there's a release team view to sort out migration issues. You can also use it to query for bugs in specific packages. There's a list of teams here. You can query by email, which means that you can search for your own bugs. So for example, that's the list of bugs in packages I maintain, co-maintain, or sponsored uploads for. That's a good way to build your to-do list and search for bugs. And of course, that's basic JavaScript, but you can play with the columns, et cetera.

Another dashboard is the maintainer dashboard. Similarly, you can search using emails or just a list of packages, and this generates several different views. First, there's a to-do list with the likely things that should be fixed in your packages. So if you have a few hours to dedicate to them and you don't know what to do, that's a good place to start. That's my page. I know it's not really in a good state, but most of those packages are either co-maintained or not very important, so I should be fine. It's listing stuff like missing builds, testing migration issues, and new versions either upstream or in the VCS. There's a big table with the versions of packages in each suite, upstream, and on mentors. And there's a big table of bugs, security issues and QA checks, covering most of the QA tools we have. Just playing with that, you can see which packages currently have issues reported by the reproducible builds people, for example. And there's a last table about the status in derivative distributions.
So for example, you can see that this one has an outdated version in Ubuntu, and this one has an outdated version in Ubuntu but with a patch. It's a way to highlight needed actions with respect to derivative distributions.

The last dashboard I wanted to mention is Bapase. This one lists packages with some issues that might not be problems. This one is not really a problem, but for example, this one is quite interesting: packages that are maintained with NMUs. For this package, the last eight uploads were all NMUs. I think in that case, most of the NMUs were done by the same person, so it's probably someone who should adopt the package but doesn't want to commit to adopting it.

Just some thoughts about things to improve. UDD is mostly a one-person project, which means that when something fails, and that happens quite frequently because some services like to change their output format, for example, it can take some time to get fixed if I'm not paying attention. So it would be really great to find someone who has time on a regular basis just to care about watching UDD. There's a CGI here that lists the current status: the last run time of each data importer and whether it was successful or not. That's a good starting point to contribute to UDD.

Until yesterday, there was no really good development environment. This is now fixed, so you can just run Vagrant and get UDD running on your laptop in a few minutes. This doesn't include importing the database, because it's quite large. But similarly, I have a script to just download and import it; you just need to give it some time to finish. It takes about one hour here to download the database.

More collaboration with DDPO and the Tracker would be great. There is some overlap: data is parsed in several places. One clear limitation to collaborating more is that it's actually quite hard to share data inside the Debian infrastructure.
If you want to access, from one machine, files that are generated on another machine, there are a lot of not-so-nice solutions to do that: sharing SSH keys, but the DSA will scream at you, or using an HTTP export, but the DSA will also complain. So there's no easy, standard way to do that. We use different ways in different data importers, and that's probably something that could be streamlined a bit.

So if you have questions, well, I can answer questions. And of course, if you have questions after this talk, you can reach me on IRC on #debian-qa or on the debian-qa mailing list. Thank you.

Any questions? Can you hear me? We're slightly over time, so we can take one question. No questions? Cool, then we're almost back on time. Thanks. Thank you.