maintain still some of the flat file system, that should increase the speed. Before I actually start talking about that, though, I have to admit that my primary motivation for giving a talk was to try to force myself into actually doing the work that I'm going to talk about in the talk, and in that regard I'm only slightly successful. I managed to do some of it, but I've become too much of a professional procrastinator to be influenced by even giving a talk. But we'll see how this works out.
So the first couple of things that I'm going to talk about are just some general bug statistics; I enjoy showing some plots. I'm going to introduce the basic architecture of Debbugs. Just to back up: Debbugs is the system behind bugs.debian.org. So if you've filed a bug, you've fixed a bug, or you've wondered what bugs exist, you've interacted with Debbugs. It has both a web front end and a mail system. I'm going to talk a little bit about some of the new features that have been implemented that you may not know about if you haven't followed along, but I'm pretty sure most people here follow along pretty closely, so that might be all old information for you. I'm going to talk about some planned features, especially features that'll happen if I suddenly have two or three more people willing to help, and hopefully I'll plead with all of you, or people who are listening to this talk online or who may listen to the recording later, to assist me in implementing some of these features. I'm a nice person. We have some nice people, and I'd like to see any Perl hackers or CSS hackers or JavaScript hackers or people who want to write documentation help me. So that's what I'm going to talk about.
Okay, the goal of the BTS is to report bugs, track the evolution of bugs, fix bugs, and hopefully reduce the impact of bugs both on maintainers and on users. This is how many bugs we've got versus time, so as you can see our bug growth is roughly linear over time.
It's actually decreasing slightly, but we have a huge number of bugs. People like to track exactly how many bugs we have, and Christian Perrier runs a fun contest of guessing when particular bugs might be filed. For example, the 760,000th bug will be filed, I think, September 2nd, and the 800,000th bug will be filed almost a year from now in September 15, assuming the linear progression maintains itself. Christian will enjoy that, but he's not here, so maybe he'll see it online. Anyway, that just shows the bug reporting rate. We average roughly 142 bugs filed a day, so you can see that's a huge number of bugs.
This is the bug closing graph. It's technically not bug closures; this is actually bugs being archived, but for the most part this approximates the bug closure rate with a lag time of about two weeks. So we close roughly 95 bugs a day. From that you can imagine that the bug system is gaining 50 or so bugs every day that are not being fixed. Unfortunately, in this graph you can see that the bug closure rate is decreasing. In context with the bug reporting rate also decreasing, this is something that I've seen in previous posts that I've made on my blog. This is actually kind of disturbing. I'm not sure what that means for Debian as a whole, whether it means anything, but I'd much rather see the overall rate increasing than decreasing over time.
This graph you're all familiar with: this is RC bugs, which are the bugs that are most important. Luckily the RC bugs, those that matter for the next release, are decreasing. So we're getting in line for a new release there. Of course there are always too many RC bugs. So that was enough fun graphs.
Now I'm going to talk about the actual Debbugs system and how it works. Debbugs has two main components. There's a mail back end, which is what you interact with when you email control@bugs.debian.org or submit@bugs.debian.org or a bug number at bugs.debian.org.
That system runs on a machine called buxtehude, which has all of the files and processes your email. The other aspect of Debbugs is a web front end. That's what displays information on what bugs are in which package and the bug logs that you can interact with. It's also mirrored onto another machine called beach so that it's ideally slightly faster to interact with.
Debbugs interacts with dak, which is the software responsible for maintaining the archive. So dak tells Debbugs who maintains which packages, so Debbugs knows who to send email to if there's a bug in a package. It also tells Debbugs which packages are in which suites and which architectures, so Debbugs can calculate whether a bug is present in a particular suite. So, for example, whether the bug affects unstable or whether it's fixed in testing or stable, which is what was calculated in the previous graph I showed you. Britney is the testing migration software that migrates software from unstable to testing. It uses information from the BTS as well, in regards to whether a package is becoming more buggy or less buggy by upgrading it. The actual thing that does that is sort of attached to Debbugs; it's called bugscan, and it provides a list of bugs and it also does the RC bug graphs. But it provides the list of bugs to britney.
Debbugs itself looks like this. Mail comes in. There's spam processing that happens first; we try to throw out as much spam as possible. Blars Blarson, who I believe will be here soon, I'm not sure if he's here yet, is primarily responsible for keeping the bug tracking system relatively free of spam and expunging the little bit of spam that actually makes its way into the BTS. He does a largely thankless job, so if you see Blars, thank him for doing that. It's not a task that I would ever want to take on myself, and he's done a really great job.
Anyway, after the mail has been de-spammed it goes to processall, and then process is responsible for handling email that gets sent to submit and email that gets sent to bug numbers. So if you send an email to 12345@bugs.debian.org, that's where it goes. Service is responsible for handling all of control, so an email you send to control is handled by service. Now, with the advent of control at submit time, or control at any other time you want, this diagram has gotten a little bit blurred. There's actually an abstraction that service talks to that process can talk to as well, but that's the basic idea. Then all of the information is stored in flat files in a db-h directory, which has a small hash function to split out the bugs and is indexed with a couple of flat file indexes, and then the CGI scripts use both the indices and the flat file system to display bugs to users.
Okay, so that's how Debbugs looked before I started working on the database. The current plan is to basically replace the indices with a database layer. I'm going to keep parts of the flat files just because that's a well-tested system. There are lots of things that already parse the flat files and know how to deal with them. On top I'll add a PostgreSQL-based database that the CGI scripts will actually utilize in order to display information to users. This will help both increase the speed at which you get results back if you're looking at complicated packages, and also enable you to do more complicated things, like finding bugs that actually affect a particular version, without waiting huge amounts of time for a query to complete. So, for example, if you wanted to look at all security bugs which affect unstable, well, that's actually a really hard query to do without a database layer. So that's one of the major things that the database is going to do. And the script that actually handles loading things into the database is called debbugs-loadsql.
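The flat-file layout described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl that Debbugs is written in, and it assumes that the "small hash function" is the last two digits of the bug number; the spool path, the helper names and the file suffix are hypothetical.

```python
from pathlib import PurePosixPath


def bug_hash_dir(bug_number):
    """Assumed hash: the last two digits of the bug number, zero-padded."""
    return f"{bug_number % 100:02d}"


def bug_file(spool, bug_number, suffix):
    """Build the path of one per-bug flat file under the db-h directory."""
    return PurePosixPath(spool) / "db-h" / bug_hash_dir(bug_number) / f"{bug_number}.{suffix}"


# Hypothetical spool location and suffix, for illustration only.
print(bug_file("/srv/spool", 12345, "log"))
```

Splitting on the low digits spreads consecutive bug numbers evenly across 100 directories, which keeps any single directory from growing without bound.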
So Debbugs is written in Perl. If you don't like Perl, I'm sorry, but Perl has recently come quite a ways in its handling of databases. Most everybody has adopted the Perl idiom of using DBI in order to talk to databases. It's a fairly successful database abstraction layer. But anybody who's ever written code in DBI knows that it's extremely tedious to do joins and complicated statements, where you're constantly writing SQL and dealing with escaping, etc. Or using placeholders, but still it's something that you have to keep track of. DBIx::Class is an extension that gloms together a huge number of Perl modules into a really coherent database abstraction service where, if you give it your schema, it will build classes that enable you to talk to each of the result sets from your database. So it's a complete system. You can actually write a schema entirely in DBIx::Class that you can then convert into SQLite, PostgreSQL, MySQL, etc. In this case, though, I'm primarily interested in writing for PostgreSQL. That'll actually be the primary database back end. There might be an option eventually to use SQLite for testing, but my primary goal is to deploy to PostgreSQL. I'm also using a bunch of classes that are specific to Debian. For example, the Debian version extension to PostgreSQL that enables you to sort by Debian version, because that's extremely important for the BTS, and that's something that's handled very well in PostgreSQL.
So what I've actually done is write the schema directly in SQL, and DBIx::Class has an extension called Schema::Loader which handles converting the SQL schema into the class declarations for DBIx::Class automatically. So you just write plain old SQL like you're used to and it automatically creates all of the database-related Perl classes that you use to talk to the results from the database.
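The point about placeholders versus hand-escaped SQL can be illustrated with a short sketch. It uses Python's standard sqlite3 module rather than Perl DBI or DBIx::Class, and the table, columns and rows are hypothetical; they are not the Debbugs schema.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bug (id INTEGER PRIMARY KEY, package TEXT, severity TEXT, done INTEGER)"
)
conn.executemany(
    "INSERT INTO bug (id, package, severity, done) VALUES (?, ?, ?, ?)",
    [
        (1, "foo", "serious", 0),
        (2, "foo", "normal", 0),
        (3, "foo", "serious", 1),
        (4, "it's-a-package", "serious", 0),
    ],
)


def count_open_bugs(conn, package, severity):
    """Count open bugs; the ? placeholders leave quoting to the driver."""
    row = conn.execute(
        "SELECT COUNT(*) FROM bug WHERE package = ? AND severity = ? AND done = 0",
        (package, severity),
    ).fetchone()
    return row[0]


print(count_open_bugs(conn, "foo", "serious"))
print(count_open_bugs(conn, "it's-a-package", "serious"))
```

Even with placeholders, every query is still a hand-written SQL string, which is the bookkeeping an object layer generated from the schema is meant to remove.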
There's another module which handles deployment, so it can do automated upgrades between different schema revisions. As you change your schema, it handles doing both upgrades and new installs of a new schema. It can also do downgrades, and in addition to just executing SQL ALTER statements you can also run Perl code or anything else you want against the database at each upgrade step. So that'll enable much easier changes to the schema in the future. Finally, the actual module in Debbugs that sort of abstracts this all out is Debbugs::DB, and all of the database interaction classes in Debbugs are under that module.
So this is the schema. It's kind of complicated, but this basically tracks all of the bug relationships. It tracks who corresponded with the BTS, and has all the source package versions, binary package versions and version dependencies, so, for example, when you upload a version that was based on a previous version, this enables all that to be tracked. I've taken the dak SQL schema as inspiration, but unfortunately the Debbugs SQL diverges from dak, and maybe if I was smarter it would diverge less, but I think it makes sense currently. But anyway, if somebody is a PostgreSQL genius or an SQL hacker and is interested in offering suggestions where I could make these more identical, I'd definitely be interested to talk to you about it.
Okay, loading is actually pretty easy: you just call debbugs-loadsql bugs and it'll load them. There are two different sets of bugs in Debbugs: there are the ones that you can actually modify, and then there are the archived bugs, so this handles dealing with both sets of bugs. You can also load versioning information; this loads which versions are dependent on which other ones. And debinfo loads the architecture and source versioning information.
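The idea of automated upgrades between schema revisions, mentioned above, can be sketched in a few lines. This is not the Perl deployment module the talk refers to; it is a minimal Python illustration using SQLite's user_version pragma to record the current revision, and the upgrade steps themselves are hypothetical.

```python
import sqlite3

# Hypothetical upgrade steps: entry N moves the schema from revision N to N + 1.
UPGRADES = [
    ["CREATE TABLE bug (id INTEGER PRIMARY KEY, subject TEXT)"],
    ["ALTER TABLE bug ADD COLUMN owner TEXT"],
]


def schema_version(conn):
    """Read the revision number stored in the database itself."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upgrade(conn, target=len(UPGRADES)):
    """Apply every step between the stored revision and the target, in order."""
    for step in range(schema_version(conn), target):
        for statement in UPGRADES[step]:
            conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {step + 1}")
    conn.commit()
    return schema_version(conn)


conn = sqlite3.connect(":memory:")
print(upgrade(conn))
```

Because the revision is stored in the database, the same call serves both a fresh install, where every step runs, and an existing install, where only the missing steps run.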
So the SQL is actually working. This is an example of a handwritten SQL query, but you could also write this using DBIx::Class. Let me show you that this actually works. Well, in theory. Okay, I'll just try it here and show you the results, so it'll be easier. So I can run the select statement, which is just selecting the count of bugs which have been modified since, I think that's June or July, which are not done, sorry, which are done, and for which there's an owner set. And the answer is 521 currently. So you can see that's a full load of all the bugs in Debian, and the actual SQL query executes fairly quickly. Of course I'm repeating the same queries, so everything's been cached, but it'll give you an idea that it's still relatively quick.
I had hoped to have more of this done by the time of this talk, but there's still a lot of work that's needed. The log files currently are not loaded. The log files are all the correspondence with the bug, so that's needed to enable full text searching of the BTS. It also currently doesn't do status caching, and that's what will enable faster loading of the package report page. And then it needs more work on the deployment, or rather I actually need to deploy it to the servers so you guys can use it and it's not just sitting on my development setup. Okay, so that's the major work with SQL, which is the major thrust of this talk.
I'd like to talk just a little bit about some of the new changes that have been done. One of the more recent ones is using mailto: links. So let's pick a bug here. Let's do this one. Okay, here's a nice bug. So now there is a reply link which includes the subject, the References header, basically everything. You can click it and it'll open up mutt or whatever your MUA is that handles mailto: links. It'll populate the References and the In-Reply-To and will give you most of the message. The question is: why doesn't the subject also quote the bug number? It probably actually should; I think that's probably a bug. It just
quote, yeah, okay. Yeah, it probably should quote the bug number, but currently all it does is add "Re:" to the original subject. I was really lazy when I did that, but yeah, I should include the bug number in the subject. Anyway, it quotes the original message, and it does that for all of the messages in the bug log, so you can go down. Is the To: always the bug number? Yes, the To: is currently always the bug number when you click on the reply link. No, that's correct, it does not go to the submitter, even if you're replying to the submitter. That's another long-standing problem that really needs to be fixed, and I'll talk about what needs to be done in order to fix that.
Okay, so other things. These are actually old, well, relatively old. forcemerge now does the right thing. You might have seen forcemerge failing occasionally, but it at least does all the operations that you would have had to do to merge them manually, so it calculates what has to be done and does it all for you. The other major thing is control at submit time: you send any message, you use Control: as the pseudo-header, and you write control commands as usual. At submit time the bug that you're submitting is -1, or the bug that you're mailing is -1. So you could actually send control commands that influence a whole set of bugs. For example, if you sent the same email to 10 or 15 bugs and used Control: reassign -1 to some package, well, that applies to however many bugs you emailed. So you can actually use that as well to do multiplexed control messages, if for some crazy reason you wanted to.
Okay, this is a set of the future features that I'm trying to work on. Status caching: that'll come as a consequence of the SQL work, and it'll also enable you to do reverse status lookup, where you could look up bugs by their status, which you can't currently do. You can only look up bugs in a
general way and then exclude on the basis of the status, which is really slow and not good. One of the other major things that I'd like, if I ever have enough time, is better statistics. I've shown you the plots at the beginning; all those plots are sort of manually generated. I had to generate them before I gave the talk today. It'd be much nicer if a huge set of statistics could be generated all the time, hourly or even up to the minute, so that people can see what's going on with the bugs, and who's fixing the most bugs, and which packages need more help because their maintainer's not responding, or who's doing a really good job triaging bugs so we can identify them and thank them and promote them. All those sorts of things are things that better statistics will help with.
Some other things. I would like to implement a web-based reporting system. Not completely web-based, but at least with submission of bugs using HTTP to some CGI, with reportbug or something else as the actual interface to that. That'll get rid of people having to have a working MUA. Or rather, reportbug doesn't currently require a working MUA or MTA, but it does require that port 25 or 587 work so you can talk to bugs-master. Unfortunately a lot of people's networks currently don't allow 25, and even more are starting to disallow 587 outgoing, but most people at least allow 80 outgoing. So that's something that will need to be fixed. The major reason why I haven't already implemented that is because I want to make sure that people who are submitting the bugs actually have a working email address. Now, you can mail bugs without a working email address, but at least that requires that you can send mail. So I want to set that up so that people get an email saying, hey, you emailed Debbugs, click this link, and once you do that once it won't ever ask you again. But that's something that I want to do.
A second thing is I actually want to release Debbugs again. Debbugs hasn't had a release outside of experimental
in, I mean, the entire time I've been working on it, which is a very long period of time, and that's mainly my fault. So it'd be good to do that.
The third thing that's on this list is bug mailing lists actually in Debbugs, and this is what's actually going to solve the submitter issue. The basic idea is that the submitter will be subscribed to a per-bug mailing list by default, where they can easily opt out of it. It'll do proper headers and bounce handling, and by doing this in Debbugs it can also avoid the same person getting duplicate email. So if you are the submitter and also the maintainer, this will keep you from getting multiple duplicates of the email if you're also subscribed to the bug list and the packaging list. The package email from the PTS you might still get, but this will at least help reduce the number of duplicates. It will also enable people to set up defaults BTS-wide, so if they never want to see emails to the submitter, or to bugs which they've submitted, then they can opt out once and they'll never see them again, or they can decide later to opt in and they'll be able to see them.
Okay, the other one is merged bug reports, which currently have separate log files that aren't combined. Anybody who's ever dealt with a merged bug knows that the history of the bug is sometimes difficult to ascertain. You have to go view all the bugs, and if it's two bugs that's annoying, but okay, you could do that. If it's 20 bugs that have been merged, which sometimes happens, it's almost impossible to figure out which bug has all of the history in it and which one's the most important. So that's something big that needs to be done. Another one is threading of the email in the bug report view, so you can actually see who's responding to whom. Another major one is that user categories currently cannot be easily duplicated or replayed, so if a package has a user category or user
tags that you want to emulate, it's really difficult for somebody else to duplicate that. Unless you keep the email that you sent to the BTS somewhere and modify that, there's no way to take what's currently in the BTS and pull it back out and replay it. So that's actually something that somebody who is interested in helping out could write up just as easily as I could, so if somebody's interested in working in Perl, that would be a really useful thing to merge.
Another thing is remote attachments. There's an RFC that enables you to have email attachments which are not included in the email, which you can obtain remotely. It would be nice to not email out core dumps to people who probably don't want to receive a, you know, 20 or 100 meg core dump in their email that they're going to download from the BTS anyway, eventually, if they ever want to look at it at all. So that sort of thing would be very useful. And the CGI needs smarter options, so that the query strings aren't as long and they do more of the right thing all the time. Some of these have actually already been done; this is a little old. But there are a lot of places that need work.
So if you're interested in helping, and hopefully you, or at least some of you, are interested in helping, please get in contact with me. This is how you can actually get started. All of the code that's running on debian.org is on bugs.debian.org. The debian branch is the branch that's actually running on bugs.debian.org, and you can check that branch out, and the branches are also checked out as well if you just want to browse exactly what's running. If you want to follow what I'm doing, my branch is at git.donarmstrong.com/debbugs, and I generally try to keep it in sync with what's on bugs.debian.org, but sometimes I'm running behind. The mailing list is debian-debbugs@lists.debian.org; feel free to email that. You can also email me, I'm don@debian.org. There's also an
IRC channel, #debbugs, on irc.debian.org, and I'm dondelelcaro there. So I'm friendly; I'd love to talk to anybody. If you have questions, the BTS isn't working the way you expected it to, versioning isn't going the way it should, feel free to contact me on IRC or email me, and you can also get in contact with owner@bugs.debian.org as well; owner is the group that runs the BTS. This is the team, and hopefully you will join us. So with that, any questions? I apologize, I'm going to be disappearing relatively soon after this talk, so if you have questions find me quickly or find me online. I'll be at Burning Man the rest of the week. So, any questions or comments or complaints?
Do you have a performance goal on the update once you have Postgres running? I mean, what do you mean? Like, when you do a query on an individual bug right now it's usually about five seconds. Yeah. And that's kind of okay for some things, but if you're doing discovery it can be really, really onerous.
I mean, it should be less than a second, especially for individual bug logs. So viewing an individual bug log, that's a slightly different problem than the database. There's an issue there because it has to read the entire bug log in order to display it, and I'm working on splitting out the individual messages so the entire bug log doesn't have to be read in order to start displaying it. Unfortunately, part of the reason why I haven't done that yet is because I don't want to reinvent the wheel. What needs to happen is some sort of maildir-like thing with some customization that also allows for detached attachments, so that you can separate out a very, very large attachment that you're never going to actually display into a separate file. I'm assuming somebody's invented this, which is why I haven't actually sat down and done it myself. So I've sort of been lazy trying to figure out exactly how that's going to work.
Yeah, so just quickly, so I've done a
lot of work with the historical bug data which you had in there. Is the database going to erase this distinction between archived bugs and non-archived bugs?
So it probably won't erase the distinction, at least as far as emailing goes. The historical reason for archiving bugs was that they were deleted at one point in time, and so saving them was a brilliant idea, and now they haven't been deleted in a long time. The other major reason why we archive bugs, and why that's actually useful, is because it means that they can no longer be mailed, and really old fixed bugs primarily collect spam; that's all that goes to them. So it's quite possible, I mean I'm leaning in this direction, that I will disable email to really old bugs. But besides that, I mean, you can still do control changes to them and things like that. Exactly. Yeah, there'll be basically no distinction between them. At least that's the idea.
So, back to the bug address and the bug dash submitter address. This is something that has caused problems, like people mailing the bug and the submitter not getting it, like asking the submitter "could you maybe try something different" and the submitter doesn't get the email. I think the mailing list thing is a great idea, but it's probably a lot of work. Could it be possible to have a new address, like dash all or something, and have the link in the web page use this new address so that everyone gets the email?
So yeah, it would be possible to change it so that emailing the bug number dash submitter would email the submitter as well as the bug number. Currently all it does is email the submitter. This is a change that's been talked about, and I should have just done this already, but I always forget about it. So yeah, I'm going to change dash submitter to actually email the bug and the submitter no matter what, even if they opt out in the mailing list, eventually, once that happens. Not separate from dash
quiet? Okay, so the issue is that emailing submitter sets the Reply-To to dash quiet. Yeah, I'll fix that; that's dumb. The unfortunate thing is there are a lot of things that made sense historically that have changed, and some of the things that Debbugs does, I don't even know that it does, because I don't always pay attention to what it's doing. So, okay, I'll fix that. Trying to get to everybody.
One thing that I noticed that's a little bit different between how Debbugs works and how a lot of other bug tracking systems work is that there doesn't seem to be a way to close a bug as done but not actually done, like wontfix. Yeah. Like, you could do close, but the policy seems to indicate you shouldn't do so. Is there a historical reason for that?
So you can just tag it wontfix and close it if you're not ever going to fix it. The major reason to not close it is historical, and it's also because it's easier to discover common bugs that you're not going to fix if they're not closed. But yeah, that's largely a historical thing. There isn't really a way to say that it's not a bug at all, like the user is mistaken, but you can just close it in that case with a note.
Aside from the critical mass of, you know, 800,000 bugs, what does Debbugs offer as far as distinction from other bug tracking systems, and why do we invest in our own solution? What is it about Debbugs that is unique for our distribution's bug tracking needs?
This is a really good question. There are two aspects to it. The major one is that it's totally interactable via email, which is a major thing. The absolutely critical aspect, though, for Debian, and why we couldn't use any other BTS without a lot of work, is that Debbugs does versioning. So, assuming that you set up the found and fixed versions correctly, it knows in the version tree which nodes have been fixed and which are not fixed, and that enables you to figure out whether a package is buggy in testing,
whether it's buggy in stable, etc. All that would have to be added on to any other bug tracking system that we used, were we to switch.
I'd just like to say thanks. Thank you. Did I miss anybody who had a question? Wave at me. And feel free to ask me as well over email or whatever, or you can even call me if you want. Okay, thank you very much.
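The version tracking described in the last answer can be sketched as a walk up the version tree. This is a simplified illustration in Python, not the actual Debbugs algorithm: it assumes each version records a single parent version it was based on, that the nearest marked ancestor decides the outcome, and that a fixed mark wins if a version carries both marks. The version numbers and suite contents are hypothetical.

```python
def bug_present(version, parents, found, fixed):
    """Walk from a version back through its ancestors; the first mark decides."""
    seen = set()
    current = version
    while current is not None and current not in seen:
        seen.add(current)
        if current in fixed:
            return False
        if current in found:
            return True
        current = parents.get(current)
    return False  # no found version anywhere in the ancestry


# Hypothetical version tree: each version maps to the version it was based on.
PARENTS = {
    "1.0-1": None,
    "1.0-2": "1.0-1",
    "1.0-2+deb8u1": "1.0-2",  # stable update branched from 1.0-2
    "1.1-1": "1.0-2",
}
FOUND = {"1.0-2"}
FIXED = {"1.1-1"}
SUITES = {"stable": "1.0-2+deb8u1", "unstable": "1.1-1"}

for suite, version in SUITES.items():
    print(suite, bug_present(version, PARENTS, FOUND, FIXED))
```

With the version in each suite known from the archive, the same walk answers whether a bug affects stable, testing or unstable without anyone marking suites by hand.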