All right, welcome to Exploring Our Bugs. I'm Ben Cotton, the Fedora Program Manager. I like to start off my talks with a little bit of housekeeping. This talk is licensed under the Creative Commons Attribution-ShareAlike 4.0 license. If you have nice things to say, there's my Twitter handle; I'm happy to hear your nice feedback. If you have not-nice things to say, you can keep those to yourself.

So what is this talk? As you might expect, it's a look at Fedora's bugs. I started with Fedora Linux 19 and went through Fedora Linux 32. The reason I started with 19 is that it's the first release where we had the EOL closed status, so it feels like a good dividing point for the modern era of Fedora Linux bug reports. I stopped at 32 because that's the last release that has reached end of life; 33 and 34 are still supported, so those bugs are changing on a daily basis.

I also didn't include Rawhide bugs, for a couple of reasons. The first is that this talk is really focused on the user perspective, and by and large our users don't care whether a bug gets fixed in Rawhide, because it never made it to them. Yes, I know there are some people running Rawhide as a daily driver, but that's a very narrow case. The other reason is that for each release I would have had to look up the branch dates, do the Bugzilla search, and shove the results into the CSV file, and that felt like more work than I wanted to put into this talk.

This talk is based on curiosity, not convincing. I've drawn some conclusions from doing this, and I have some ideas, but it was really about exploring what's going on, not about convincing you that we should do a particular thing. I don't have anything actionable yet, because this talk asks more questions than it answers. The way it got started was wondering: how are we doing with end-of-life bugs? And are there components that feed into a major downstream, where we could say, hey, maybe you should increase the investment you make in Fedora so these bugs get fixed before they reach the downstream distribution?

OK, so first let's take a look at a few basics. The most obvious question you might ask is: how many bug reports do we get? That's what this chart shows, counting only the non-duplicate bug reports. I do want to highlight something Matthew made sure to say when I was going over this with him last week: these are bug reports, not bugs. That's an important distinction, because all software has bugs, at least all non-trivial software, and just because nobody has noticed and reported a bug doesn't mean it doesn't exist.

So if we look, the count is definitely going down over time. These are the numbers with duplicates removed, because duplicates are a special class that we sometimes want to consider and sometimes don't. Now, you might be happy that the line is trending downward. I'm not entirely convinced that's a good thing. Karl Fogel, in his book Producing Open Source Software, talks about how a bug tracker with a lot of bugs in it is a sign of a healthy and vibrant project, and I think there's a lot of truth to that. I have a much longer tangent I could go off on about that concept, but that's for another day. So I don't know whether we should take this downward trend as a good thing or as a lack of engagement. It could be both.
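For reference, here's a minimal pandas sketch of that per-release count. The CSV file name and the "version" and "resolution" column names are illustrative assumptions; the actual notebook and CSV files in the repo may use different ones.

```python
# Minimal sketch: count non-duplicate bug reports per release.
# "fedora_bugs.csv", "version", and "resolution" are hypothetical names.
import pandas as pd
import matplotlib.pyplot as plt

bugs = pd.read_csv("fedora_bugs.csv")

# DUPLICATE is a special class we sometimes want to exclude, so drop it.
non_dupes = bugs[bugs["resolution"] != "DUPLICATE"]

# Count remaining reports per release and plot the trend.
counts = non_dupes.groupby("version").size()
counts.plot(kind="bar", title="Non-duplicate bug reports per release")
plt.show()
```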
All right, now let's look at the components that get the most bugs, again over the entire 19-to-32 window. There's really nothing here that's a surprise; these are all things most contributors will have heard of. The kernel has the most, partly, I think, because that component tends to be a bit of a dumping ground when people don't really know where a bug belongs. They just say, well, it's probably in the kernel. And you can see a very sharp drop, from 9,000 to 6,500 and on down from there, so it falls off pretty quickly.

So then you wonder, which component has the fewest bug reports? A lot of them, it turns out. Roughly 98.5% of our components that have bug reports have fewer than 100 of them over the 19-to-32 span, and 86% have fewer than 10. So that's a very small number. If we graph the distribution, we can see it's heavily weighted toward the low counts. On the x-axis you see the number of bug reports, and the y-axis is a log scale just so you can see the trailing end.

The next thing I want to look at is priority and severity, and I'll admit this result completely surprised me. In my head, every bug gets marked as urgent, because everyone thinks: this really affects me, I took the time to report it, so clearly it's a problem. And when everything is urgent, nothing is urgent. But it turns out the distribution is roughly what you would actually expect. Because Bugzilla defaults to unspecified, that has the largest share; most people don't bother to set a priority.

Now, in Bugzilla the intent is that priority is the developer's or maintainer's ranking of what order they're going to work on things, and severity is the user impact. I actually looked at the Bugzilla documentation to see what the intent was there, but I don't think most people really make that distinction; most people probably treat the two as versions of the same thing.

Anyway, if we take out the unspecified and look only at bugs where it's set, we see a very narrow sliver of urgent, which is what you'd expect, a little more high, the bulk at medium, and fewer at low. And I think this is where the bug-versus-bug-report distinction matters: if it's truly a low-priority bug, a lot of people probably won't bother reporting it. They might not even realize it's a bug; it might just be a minor polish issue they don't pay attention to. So if we somehow had an omniscient view of every bug that ships in Fedora Linux, low would obviously be a much bigger share, but in the reports we expect it to be a narrower slice.

Severity looks roughly the same, although the unspecified "Pac-Man" slice is a little bigger. And again, when we take out the unspecified, we see the distribution we expect, with a slightly bigger urgent slice. This chart has a really cool feature: if you turn your head to the side a little, it makes a pretty good peace sign.
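If you want to reproduce those two distribution views, a sketch along these lines should work, again assuming the same hypothetical file and column names:

```python
# Sketch: per-component report distribution and priority breakdown.
import pandas as pd
import matplotlib.pyplot as plt

bugs = pd.read_csv("fedora_bugs.csv")  # hypothetical file name

# Histogram of reports per component, with a log-scale y-axis so the
# long trailing end of high-count components stays visible.
per_component = bugs["component"].value_counts()
per_component.plot(kind="hist", bins=100)
plt.yscale("log")
plt.xlabel("Number of bug reports")
plt.ylabel("Number of components (log scale)")
plt.show()

# Priority breakdown with the "unspecified" default removed.
specified = bugs[bugs["priority"] != "unspecified"]
specified["priority"].value_counts().plot(kind="pie")
plt.show()
```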
All right, let's look at duplicate bugs. This is the graph of duplicates in magenta and non-duplicates in blue, per release. One thing that stood out to me as interesting is that the number of duplicates doesn't really change much, and it's fairly independent of the non-duplicates. I would have expected duplicates to increase relative to the number of bug reports, with maybe an inflection point where, once you get above, say, a thousand or ten thousand bug reports, it's just much harder to keep state: searching for an existing report takes longer, so you get more duplicates. It turns out that doesn't really seem to be the case. And if we look at it by percentage, we see it's not a consistent relationship. Maybe you can kind of draw a line, but it doesn't really seem to mean anything.

So we look at the components with the fewest duplicates. There are two where less than 1% of their bug reports are duplicates. I don't know that this tells us much, other than that maybe the people who file bugs there are really good at searching for duplicates, or perhaps the maintainers are really bad at closing bugs as duplicates when they are.

Then I wanted to see which components have the most duplicates, and it turns out there are 63 components in Bugzilla where 100% of the reports are duplicates. That was really interesting to me. I went and looked at a few (I didn't look at all of them) and several were cases of "hey, this new version is available" that were fixed in Rawhide, so the bug filed against the release was closed as a duplicate of the Rawhide bug. If we take out the 100% components, these are the ones with the most duplicates. Again, I'm not sure it tells us much.

So I then wondered, is there a way to look at the relationship? And there's not really a lot of signal here. The percentage of duplicates looks roughly steady once you get above about 1,000 total bug reports, with a little bit of a tail at the end, which again I think says more about reports not getting triaged properly than about there actually being fewer duplicates.

Another thing to look at is the source of the bug reports: are they coming from humans or from automated systems? In the limited time I had available, I basically split them into ABRT and not-ABRT, by searching for "[abrt]" in square brackets in the bug summary. The percentage of ABRT bugs is kind of all over the place. If we look at the raw numbers, it's fairly steady with a big drop in 32. I'm not really sure why; maybe somebody has an explanation for that.

So now we get to the juicy part: bug resolutions. If you haven't paid a lot of attention to Bugzilla, when you close a bug there are a handful of resolutions you can pick from. To make this hopefully more meaningful, I put the closures into three categories. There are happy resolutions; sad-user resolutions, where basically the user said "here's a bug" and we said "go away"; and sad-maintainer resolutions, where the user said "here's a bug" and then either never followed up with further information, or it wasn't actually a bug, or whatever.

Again, I excluded duplicates here. There's a case to be made for counting those as sad user, sad maintainer, or both. There's also a case for calling upstream a sad-user resolution rather than a happy one. I've seen people use it both ways: as "no, go talk to upstream about it," and as "this is fixed upstream and we'll get it in when the next upstream release happens." So it's hard to make a clean call on that. This is what I went with.
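To make that concrete, here's a sketch of both the ABRT split and the three-way categorization. The resolution-to-category mapping below is my reading of the description above, not necessarily the notebook's actual grouping, and the column names are again assumptions:

```python
# Sketch: flag ABRT reports and bucket resolutions into three categories.
import pandas as pd

bugs = pd.read_csv("fedora_bugs.csv")  # hypothetical file name

# Automated vs. human reports: ABRT puts "[abrt]" in the bug summary.
bugs["abrt"] = bugs["summary"].str.contains(r"\[abrt\]", case=False, na=False)

# One possible grouping of Red Hat Bugzilla resolutions; UPSTREAM could
# arguably go in "sad user" instead, and DUPLICATE is left out entirely.
CATEGORIES = {
    "CURRENTRELEASE": "happy", "NEXTRELEASE": "happy", "RAWHIDE": "happy",
    "ERRATA": "happy", "UPSTREAM": "happy",
    "EOL": "sad user", "WONTFIX": "sad user", "CANTFIX": "sad user",
    "DEFERRED": "sad user",
    "INSUFFICIENT_DATA": "sad maintainer", "NOTABUG": "sad maintainer",
    "WORKSFORME": "sad maintainer",
}
bugs["category"] = bugs["resolution"].map(CATEGORIES)

# Share of each category per release.
print(bugs.groupby("version")["category"].value_counts(normalize=True))
```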
At the end of this talk I'll have a link to the Jupyter notebook I used; you can change those categories in your own analysis and see what comes out of it. If we look at it by category, happy is not nearly as big a section of the bars as we'd like, and the biggest chunk is sad user.

I want to look at the EOL closure percentage more specifically, because every six months, when I go through and close bugs as end of life, I see it on social media, on mailing lists, and in the bugs themselves: "Fedora never fixes my bugs. They're the worst. Why do they always close these bugs as end of life? I'm never using Fedora again." And it turns out maybe there's some justification for that: we've had several releases where 50% or more of the bug reports were closed as end of life. Although we do seem to be getting better.

What stood out to me here is the very periodic nature of the chart. I wondered what that was, and I thought maybe it had something to do with the Red Hat Enterprise Linux development cycle: maybe there's a time when Red Hat engineering is really focused on RHEL, and the people who participate in both RHEL development and Fedora end up ignoring Fedora for too long. I looked at the release schedule, quickly, and I didn't see anything that really explained it. Maybe it's just a fluke; it's hard to say. But this is definitely one where, if somebody has some ideas, I would love to hear them later.

So once again we looked at the EOL closures by component. It turns out that if you don't want your bug closed as end of life, file it against curl, apparently: less than 1%, which is actually really impressive. Then I wondered which components have only end-of-life closures. It's about 2,000 of them. That's an awful lot. So, filtering down to just those 100% components, here they are. This is not to shame anyone, because I know everyone's doing the best they can, and I'm not sure what we can do to fix this. But from the perspective of helping our user community, this is something I think we should try to do something about.

Another question is not just whether a bug gets fixed, but how long it takes: the time to resolution. Most bugs get closed (I should say closed, not fixed) in the first 100 days. I put it on a log scale and, again, it's kind of a line. Why is it a line? Who knows. It's a natural-exponential-decay kind of thing. Again, more questions than answers.

So we did a box plot by release, and these are for all closure types. You look at it and say, all right, I think I see a downward trend, though obviously there are a lot of outliers at the upper end. If we look at it as a line graph: yeah, we're definitely trending downward. Both the mean and the median have been trending downward since about Fedora 29 or so. They're really high for Fedora Linux 19 in particular, and because that was the first release with the EOL closure category, I think there may have been a lot of really old bugs that finally got closed there.

And then, again by component, the lowest medians: there are a few components where I'm pretty sure the maintainers just file bugs for themselves and immediately close them because they fixed the issue, and move on. That's a perfectly valid use of Bugzilla, and that's awesome. Maybe there's something else going on, but these are probably a very small number of bugs, all closed very quickly. On the other hand, some take a long time. There are a few components with a median of roughly 10 years. I'll get into what I wish I could do with this data in a few minutes, but it seems really bad for half of a component's bugs to take 10 years or more to close. I'm really hoping this is just me completely screwing something up in the data analysis.
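For the EOL percentage and the time-to-resolution box plot from this section, a sketch might look like this, assuming hypothetical "opened" and "closed" timestamp columns:

```python
# Sketch: EOL-closure percentage per release and days-to-close by release.
import pandas as pd
import matplotlib.pyplot as plt

bugs = pd.read_csv("fedora_bugs.csv", parse_dates=["opened", "closed"])

# Percentage of each release's reports that were closed as end of life.
eol_pct = bugs.groupby("version")["resolution"].apply(
    lambda s: (s == "EOL").mean() * 100
)
print(eol_pct)

# Days from report to closure, box-plotted by release (all closure types).
bugs["days_open"] = (bugs["closed"] - bugs["opened"]).dt.days
bugs.boxplot(column="days_open", by="version")
plt.show()
```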
So again: is this a function of the number of reports? It kind of seems like it's not. Once you get above about 750 or 1,000 reports, you see a pretty steady line. And again, there aren't a lot of components with that many bug reports, so it's a little sparse as a data set. It would be really interesting to do this again in 10 years, when we have a much bigger corpus of data to work through. This is only, I forget, something like 60,000 or 70,000 bugs, maybe more.

Anyway, we did the same thing for only the happy resolutions, and we're definitely getting better at actually fixing things. That's good: when we fix something, we're fixing it faster than we ever have before. And again, these are the components with the fastest and the slowest happy closures.

Then let's do the same for the sad users. One of the stories I like to tell is that I once applied for a job and got the rejection notice within 45 minutes, and that was one of my favorite job applications ever, because I didn't have to sit there and wait; the band-aid got ripped right off. If we're not going to be able to fix something, or we're just not going to get to it, I generally feel it's better to let the user know quickly than to let the bug languish forever. We're pretty steady in how long the sad-user closures take, and there's a pretty big concentration at around 380 to 390 days, which is the end-of-life point. Doing the line chart again, we can see the general downward trend, with a little bit of upward movement. And we'll quickly go through the components with the lowest and highest sad-user times.
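Before moving on, here's a sketch of that per-component view, using the happy closures as the example. The 100-report floor is an arbitrary threshold of mine to keep sparse components from dominating, and the column names are the same assumptions as before:

```python
# Sketch: fastest and slowest components at reaching a happy closure.
import pandas as pd

bugs = pd.read_csv("fedora_bugs.csv", parse_dates=["opened", "closed"])
bugs["days_open"] = (bugs["closed"] - bugs["opened"]).dt.days

# "Happy" resolutions as in the earlier categorization sketch.
HAPPY = {"CURRENTRELEASE", "NEXTRELEASE", "RAWHIDE", "ERRATA", "UPSTREAM"}
happy = bugs[bugs["resolution"].isin(HAPPY)]

# Median days to a happy closure per component, keeping only components
# with at least 100 reports (an arbitrary cutoff against sparse data).
stats = happy.groupby("component")["days_open"].agg(["median", "size"])
meaningful = stats[stats["size"] >= 100].sort_values("median")
print(meaningful.head(10))  # fastest happy closures
print(meaningful.tail(10))  # slowest happy closures
```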
All right, so what's next? You'll notice I didn't do sad maintainer. It's not that I don't care; it's that this is a 25-minute talk and we're 20 minutes into it, so I wanted to be able to get through.

You might ask, why didn't you include things like version changes? I would really, really love to see how many times a bug gets moved from 20 to 22 to 24 and so on, punted down the line, because we know that happens a lot with bugs that are closed end of life: somebody says, yes, this is still an issue, and it gets reopened and moved to a new version. The Bugzilla query I used doesn't have a good way of getting that data. There's probably a way to get it somehow, but I just didn't have it available.

Another thing I thought about was including assignees. Not looking at individual people, just the count of assignees: does that change over time? Is it going down while the number of bug reports goes up? Is it staying steady, et cetera? The reason I didn't is twofold. One, I feel like a lot of the time the person who actually fixes the bug is not the assignee, and in some bugs the assignee never shows up in the comments while other people do. So while counting assignees might be interesting, I'm not sure it tells us much in terms of actually useful information. And the other reason is that the assignee is an email address. Even though it's pretty easy to get from Bugzilla (basically, if you're logged in, I think you can see it), I didn't want it sitting in a CSV file in a repo that everyone could get to, because that felt a little open to spam. So that's why it got left out.

I also didn't include the freeze process, mostly because I just didn't pull the whiteboard field when I did the queries initially. I could go back and redo that pretty easily; I just didn't get around to it. I do think it would be interesting to look at whether, over time, we're granting more freeze exceptions or fewer, et cetera. So I do want to look at that at some point.

In the future, hopefully soon, definitely Real Soon Now, I'll be writing some Community Blog posts with some of these graphs and some of my extended thoughts on them, so we can have a discussion. I welcome your theories to explain some of them, because for some of these I don't know why the graph is the shape it is; I just know that it is that shape. And probably more graphs too. Now that I have the Jupyter notebook set up, it's going to be pretty easy to dump the Fedora Linux 33 bugs into it when that release goes end of life this fall or winter, and easy to redo this on a regular basis, especially now that I've relearned pandas to make it happen. And if the release party timing works out, this could be a session at every release party: a quick look at what the updated bug graphs look like.

So I do invite you to explore it yourself. There's the link, in text and also as a QR code. The repo is available, and the Jupyter notebook and the CSV files are there for you to peruse. I will accept pull requests against the Jupyter notebook; there's some information in there about clearing the output first, to keep the size low. If there are graphs you think should be added, or tweaks that should be made, this is intended to be something the community can contribute to.

With that, we have a few minutes left, so I'll go over to the Q&A tab. Edward asks: if a bug is not fixable in Fedora but in an upstream project like PipeWire, is the user responsible for reporting the bug upstream? I have a lot of feelings about that. My personal philosophy is that, as the maintainer, we should have some responsibility for helping the user with that. In particular, the maintainer is sort of the first line of contact, and we can aggregate multiple related bugs into a single upstream report, and things like that. That said, I think if you do it in a kind way, and are willing to help the reporter if they don't know how, directing them to the upstream tracker is fine. Fedora does have an upstream-first philosophy, so we'd much rather a bug get fixed in the upstream than carry a patch in Fedora forever. On the other hand, some upstreams are slow to issue new releases, or it's a particularly dangerous bug, or it violates our release criteria, or something like that, and then we want to at least patch it locally as soon as we can. So the answer is: in an ideal world we would always do it for the reporter; in the real world, not always.

Okay, we have five more minutes, or zero, depending on how you want to start the timer. Are there other questions? All right, I don't see any more questions.
So thank you, everyone, for being here. I know there's a lot of conversation in the chat that I haven't had time to read, so I'll go through the scrollback and see if anything cool stands out. As always, you can find me in my weekly program management office hours on Wednesdays, one in the U.S. morning and one in the U.S. afternoon. Check Friday's Fedora Facts for the exact times; the UTC times will change with daylight saving in the U.S. I'd be happy to have follow-up conversations there, and look for the Community Blog posts soon. Thank you, everyone.