[The opening remarks are garbled in the transcription.] …but we're looking at the product, as opposed to software quality in the narrow sense, so it's not necessarily all about the code. This is all based around a large pan-European research project that's going on right now. So all I'm going to do is spend a little bit of time introducing the work of that project, what it's all about, the kind of things we're trying to introduce into free software quality assessment. And then there are a few questions I want to put to you — you know, Debian package maintainers, and Debian as a whole — to see what you think about when you think about quality: when you're evaluating your own packages, or when you evaluate Debian as a whole. So I've got a few questions which will be sort of based on the talk.

The talk itself is material I've given people before; it's actually based on what I'm giving at aKademy in a few weeks' time, so I'm going to keep it short, maybe 10 or 15 minutes, and then it's really open to discussion from that point. It really is all about best-practice sharing and things like that. Feel free to cut me off at any point and ask questions: this is a discussion much more than it is a talk.

[Audience:] Have you published the slides? — Yeah, the slides aren't public yet, but I'll make them public immediately after the talk and I'll give you the URL. — If you could just copy them somewhere now, we could look at the slides while you talk. — Sure. This is what happens when you're giving a talk to geeks. [Some back-and-forth about the conference network and a P2P slide-sharing system; the speaker's machine can't reach the web server, so the slides will be published after the talk.] The presentation itself is not important; it's the basis for what I'm going to be giving to the KDE developers in a few weeks. This really is a discussion. But yes, I'll publish it afterwards.

Okay. So, in my day-to-day work I'm a free software researcher. I've been working for a few years now, generally on large pan-European projects, and the key point for this discussion is one of those: a project called SQO-OSS — "squash", unfortunately — the Software Quality Observatory for Open Source Software. We're building a platform to evaluate quality based on as much public data as we can get our hands on. Whereas in the past people have focused primarily on the code, and have run simple metrics based on SLOC counts and things like that, what we now want to bring into the equation is basically anything else we can get our hands on. So, by looking at mailing lists: how can we see how discussion affects the quality of the final product? How can we see, from the metadata in Subversion, CVS, Git, whatever, how what goes on inside there affects the final quality of the product? It's basically looking at new ways of evaluating quality based on all the public data that's available, not just the things that are commonly looked at, like the code.
The reason we as a project were particularly keen to engage with Debian is basically the packaging: you as developers are not so focused on the code in terms of development, but you are in terms of maintenance. So what we're looking at is how Debian, in that respect, differs from the actual upstream development projects: in terms of how the final system might actually be used, what it might be used for, and in terms of what you're actually interested in as package maintainers when you think about what makes something a good-quality package.

The problem we pose with this is that quality is completely subjective. What I think of as good-quality code, a good-quality final product, is not necessarily what you all might think. So when we do try to evaluate quality, we're forced to create our own guidelines, our own starting point. For instance, in Debian you have guidelines that define a minimum quality bar for packages; certain projects have actual processes they expect you to go through when submitting code, and so on. Those end up being, in effect, our quality definition, and that's the thing we're ultimately trying to see if we can measure in some manner.

So what we're trying to do is build a system which allows us to automate all of that. The idea is that it's going to be a plug-in-based number cruncher, quite frankly. All of the things that you consider to be contributors to quality — what we're trying to do in this research project is, first off, in sessions like this, get an idea of what those things are, and then look at ways of actually evaluating them. Having tools which run metrics against code is common practice, and it's easy, quite frankly; building plug-ins around them for our final system will be trivial. However, people then start talking about things like: actually, I could see how the ratio between the size of a package and the number of maintainers might have some kind of bearing on quality. Or it might be the turnover of maintainership: how many times has this package changed maintainer since forever? These things become harder to measure when the public data about them is in some unstructured form; we then have to write quite complex tools before we can run the results against the code, or against whatever quality model, to see how they actually contribute to quality. Those are the things I'm particularly interested in focusing on when I finish talking and we start talking more genuinely as a discussion.
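To make the maintainer-turnover idea concrete, here is a minimal sketch of how it could be computed from a package's debian/changelog, assuming the python-debian library is available. Approximating "the maintainer" by the upload author is a simplification (NMUs would count as changes too):

```python
# Sketch: estimate maintainer turnover from a Debian changelog.
# Assumes the python-debian library; approximating maintainership by the
# upload author is a simplification, not how Debian records the Maintainer.
from debian.changelog import Changelog

def maintainer_turnover(changelog_path):
    """Count how many times the uploading author changed between uploads."""
    with open(changelog_path, encoding="utf-8", errors="replace") as f:
        blocks = list(Changelog(f))
    # Changelog blocks are newest-first; walk them oldest-first instead.
    authors = [b.author for b in reversed(blocks) if b.author]
    changes = sum(1 for prev, cur in zip(authors, authors[1:]) if prev != cur)
    return changes, len(set(authors))

changes, distinct = maintainer_turnover("debian/changelog")
print(f"{changes} maintainer changes across {distinct} distinct uploaders")
```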
Now I'm going to introduce one: an example of the kind of thing that might affect quality that you might not have thought about. You'll have to forgive me — this is where the maths kicks in — so please, once I post the URL, go and find the slides and the maths will be there in front of you. Something I've been looking at in my own particular part of this project is how a project manages to engage its developers over time. The reason is that my general area of interest — my specific area of interest, rather — is agility: is there a relationship between how agile a development process is and the resultant effect on quality? That's specifically why I'm looking at this.

So, as an example of the kind of thing people don't necessarily think about, I want to introduce something called mean developer engagement. This is a measurement of how well developers are used over time by the project: is the project making the most of its developer resources? Free software projects are all in competition for developers and users; that's just a fact of our ecosystem. Being able to engage developers is really important, because quite often they're former users who have become developers — or, hopefully, the other way around: people with skills you want to attract into your project, in the hope that they'll become end users as well. So it's quite an important measurement.

I won't talk through the maths, but I'll show you the result. Basically, if all of your developers are used all the time, you'll average a score of one: if you graph this metric over time, you'll get a straight line across at one, 100% utilisation. But that never happens, especially early on in projects, where someone starts a project and says "this is my personal itch which needs scratching", and then someone else comes along and joins them, and there's tension and friction, and one person drops out and another takes over. All sorts of things tend to go wrong at the very beginning of a project.

I'm afraid it's huddle-around-the-screen time to show you what's going on. What I've got here, the red line, is this metric, the mean developer engagement: how well does the project manage to engage its developers over time? As a reference point — I'll show on the next graph why it's important — the green line is the actual number of developers in the codebase. This is for the entirety of the Subversion repository for KDE, so we're talking about 1,600 developers and a 50-gigabyte repository. What we see here, as I said earlier, is an anomaly early on in the project — almost all projects have it; I've actually yet to find one which doesn't — and it's basically caused by turmoil early in the project's life: someone starts it up and then thinks, actually, I can't do this, and someone else takes over. That sort of thing seems to happen early on. These curves are also almost always asymptotic to some level; for KDE it's just below 30% utilisation of their developers over time.

Now, the problem with this in its current form — what we're measuring is the number of active developers over the total number of developers, over time; that's effectively the simple measure graphed here — part of the problem is understanding where and when someone actually drops out of a project. When you're calculating the total number of developers based purely on Subversion, Git, CVS, whatever, it's hard to detect when someone drops out: they may have been inactive for five weeks, but there's nothing to say they won't be active in week six. I'm currently looking at ways to get around that — I mean, there are ways of getting around it.
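As a concrete illustration, here is a minimal sketch of that rudimentary calculation, assuming commit history already reduced to (author, week) pairs; this is an illustration in Python, not the actual Perl script mentioned later:

```python
# Sketch of the rudimentary MDE calculation: per week, engagement is
# (developers active that week) / (developers ever seen so far), and MDE is
# the mean of that ratio. The (author, week) input format is an assumption;
# a real run would parse `svn log` / `git log` / `cvs log` output.
from collections import defaultdict

def mean_developer_engagement(commits):
    """commits: iterable of (author, week_number) pairs."""
    active = defaultdict(set)                  # week -> authors who committed
    for author, week in commits:
        active[week].add(author)
    seen, ratios = set(), []
    for week in range(min(active), max(active) + 1):
        seen |= active.get(week, set())        # the total only ever grows
        ratios.append(len(active.get(week, set())) / len(seen))
    return sum(ratios) / len(ratios), ratios

mde, per_week = mean_developer_engagement(
    [("alice", 0), ("bob", 0), ("alice", 1), ("carol", 3)])
print(f"MDE = {mde:.2f}")                      # 1.0 would mean full engagement
```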
But this is a very rudimentary measure, where we basically say: if you're in the version control system, you're in the project. The total number of developers only ever goes up in this scenario. As a result, these figures are slightly lower than what you'd actually expect — you'd expect mean developer engagement to be slightly higher, because I'm still counting developers who have actually left. So the things to note here are the slope — how quickly does it drop off? — and the level it drops off to.

To give you a good example, it's huddle-around-the-screen time again. Here's the same thing for Evince. So we've gone from KDE, which is obviously a very large project, to Evince, which is just a small subset of GNOME. The reason I chose it is how dramatically different it is: it drops off very quickly, there's a lot of friction going on, and then it's asymptotic to a very low level — around 2% developer engagement. And so, as quality evaluators, if we're able to see this kind of thing over time and recognise it as an event — hey, I'm only managing a developer engagement of 2%, but KDE can bring in 30, 40% — that's probably something we should look at as a project: how could we better engage our developers to make the most of what is, in effect, a very scarce resource?

Now, the interesting thing here is that at around week 300, the MDE actually grows. Over time — like I said, it's asymptotic to somewhere — to get MDE to go up again needs something pretty extreme to happen in the developer community. In this case, you can see the green line shoot off to the top of the graph. Actually, when you compare this to the previous one, the scales are different, so the growth in developers here is still not as large as the growth of developers in the KDE project; but in comparison to how it was, this is extraordinary developer growth. And the reason it happened is that Evince sat in the GNOME repository doing next to nothing for about six years. It had been in development for 300 weeks, and week 300 was the first release, and the number of developers shoots through the roof, because everyone goes "oh, I like that tool, but it doesn't do what I want it to do", and everyone piles in. So at that point the mean developer engagement actually goes up: not only are there a lot more developers, the project has actually become better at using them. So this is the kind of thing that we as quality evaluators can start to think about, rather than just code, when asking "am I producing a quality product or not?"

All this is doing — this is part of the script that I run against a mirror of the repository. Basically, it's a Perl script which parses an SVN, CVS or Git log; it takes the plain-text `svn log` output, not the XML form. In its current form I can run this against repositories. — [Audience:] But you don't have such a log for Debian; that's the problem. We have maybe a quarter of the archive in Subversion, on Alioth.
[Audience:] And we couldn't, of course, just ask the VCS to tell us how many contributors those parts have had. So we could do something, but it's not as easy as it is for KDE. — No, sure. What's interesting is that I wanted to do a direct comparison between KDE and GNOME, but you can't actually do that, because GNOME doesn't have one repository: it has separate repositories for all the separate parts, whereas KDE has one repository where the separate parts are branches within it. Which is why Evince was done by itself.

So that's just to show you the kind of thing we can find — the kind of thing you can flag up and say: OK, I can see now that something's going on here, and this might have a knock-on effect on the quality of my product. Here you can see two things: there's this massive leap going on here, and the other thing is that the curve is asymptotic to a very low level.

[Audience:] A decreasing curve can actually mean at least two things. Maybe the project is just stable and supported. Evince, I think, was very steady and hadn't changed for maybe seven, eight years, and then someone says, "OK, we need to introduce feature X", and suddenly this changes. — Sure, yeah. Well, that's the important thing: this only works for people who are actively involved with QA, who can interpret it properly. You have to know what's going on in your own product. Because I'm not involved with Evince, I had to do a lot of groundwork to work out what's going on here, for instance, and why this level is so low. But if this is an evaluation of your own product, you know what's going on, so you can look at this over time and say: oh, there's a dip now, what's going wrong? Or maybe it's dipping and you know why, because you know it's become stable and really doesn't need much more work, or whatever. As someone evaluating your own product, you know what's going on. It's more difficult from my perspective, when I'm trying to evaluate hundreds of projects and don't necessarily know what's happening in each one.

For instance, it took me a long time to dig out what's going on here, because it's extremely hard to see in that graph. So what I've got here is a visualisation of what was going on in Subversion. Reading from left to right you have columns, where each column is a week of development; from top to bottom you have the developers. Quite simply, if a developer did something in a given week, they get a green block to show they did something that week. And what you find between here and this point here — time running in this direction — is that 300-week period I showed you before, where nothing happened. The reason the engagement gets so low is that for the first 300 weeks — I've never seen a project so bad at engaging its developers — it takes on about 36 developers, and in that whole period only one hacker ever committed in any one given week. And you can really see that there are a lot of people coming in, doing one or two commits, and then disappearing again, week by week.
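A rough sketch of how such an activity grid can be rendered, reusing the same assumed (author, week) commit pairs as above; the real picture was generated from the Subversion log:

```python
# Sketch: print a developer-activity grid, one column per week, one row per
# developer, with a mark wherever that developer committed in that week.
def activity_grid(commits):
    """commits: iterable of (author, week_number) pairs."""
    commits = list(commits)
    weeks = range(min(w for _, w in commits), max(w for _, w in commits) + 1)
    authors = sorted({a for a, _ in commits})
    active = set(commits)
    for author in authors:
        row = "".join("#" if (author, w) in active else "." for w in weeks)
        print(f"{author:>10} {row}")

activity_grid([("alice", 0), ("bob", 0), ("alice", 1), ("carol", 3)])
#      alice ##..
#        bob #...
#      carol ...#
```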
That's how it went for almost 300 weeks: someone came in, did a week, and then dropped out, and there was never more than one person active in any given week. Then basically this person came in here and maintained it for a long time. But after almost a year of development they dropped out, while someone else came in for a few weeks; then they came back for a week, then the other person came back. It was almost as if they were passing maintainership back and forth whenever one of them got bored of it: "I'll have that back again." That's what the log tells us for those 300 weeks, and as a result the figure is quite low. So despite the fact that they had, I think, 34, 36 developers, even here it's still one developer at a time; it's here, at week 300, that you start seeing commits from multiple committers in the same week.

[Audience:] Wasn't it maintained at all? Or maybe people just had access, or more people came to do a small change and…? — Yeah. Basically, from looking through the logs, the situation is that someone says "oh, I want to make a better graphical wrapper for XPDF", so they check in the libraries for XPDF and a few other things, and then — if you read the commit messages, it's "added small feature X", "added small feature Y" — it was piecemeal for a long time, and it remained that way. It was built up very slowly, by one person at a time, basically, over 300 weeks. I don't know the culture of GNOME and I don't know why that was the case; all I'm suggesting, as someone from the outside looking in, is that that very quick drop followed by sustained near-inactivity seems like something the developers should maybe have been concerned about. So, enough of that.

[Brief exchange about the slides and recording: this session is not being streamed, only taped; the slides will go into the archive along with the videos.]

Another thing to think about — this is some of the other work we're doing, looking at KDE; again, just think about things other than code that we can measure — is license incompatibility. It's just an example of something else we can actively measure. A lot of projects have this as a problem: they've got code committed which says "this is GPLv2", and in the same code base you've got files with the "or newer" clause — the "or later" clause, rather. And at the same time people send them patches, and they're in the situation of: OK, do I assume the patch is under the same license as the original file? Do I just turf it? Do I make the effort to go back and ask the patch author which license they intended? Again, this is something that, if you keep the data, or the data is already stored, you can measure, and we can posit some kind of knock-on effect on the resulting quality. For example, if in your day-to-day work you find you're having to turn code away because the patches aren't licensed, then if you did chase up whoever wrote each patch to find out which license they intended, your rate of development or maintenance might actually go up.
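A minimal sketch of such a license-consistency check — the header patterns here are simplistic placeholders, nothing like a real license parser:

```python
# Sketch: flag a tree that mixes "GPL version 2" with "GPL version 2 or
# later" headers. The regexes are deliberately crude placeholders.
import os, re

PATTERNS = {
    # Checked in order: the "or later" wording must be tried first, because
    # every "2 or later" header also matches the plain "version 2" pattern.
    "GPL-2+": re.compile(r"version 2.*any later version", re.I | re.S),
    "GPL-2":  re.compile(r"version 2", re.I),
}

def scan_licenses(root):
    found = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".h", ".cpp", ".py")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as f:
                head = f.read(2048)            # license headers live at the top
            for tag, pattern in PATTERNS.items():
                if pattern.search(head):
                    found.setdefault(tag, []).append(path)
                    break
    if {"GPL-2", "GPL-2+"} <= found.keys():
        print("warning: tree mixes GPL-2-only and GPL-2-or-later files")
    return found

scan_licenses(".")
```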
So again, these are just suggestions about things outside the normal source-code-centred thinking. Basically, for any of these metrics — these things we can think about and say, yes, this could potentially be part of our quality model — if we can write a tool for it, then by and large, as long as the output is in some machine-readable form, we can wrap it up and have it as a plug-in to the system we're developing.

The system that's currently proposed — we're still going into our design phase — is going to be based on a framework called OSGi. Is that something you've come across? OSGi, the Open Services Gateway initiative, is basically a way of distributing jobs over multiple machines, so you can have a jobs engine and so on, and it's plug-in based: it's very easy to write wrappers for jobs, which then get scheduled. It's a standard, not an implementation; I think the best-known implementation is called Equinox, which is actually the back end of Eclipse. So if you've ever used Eclipse and have a feel for that, that's the kind of system we mean: as well as writing tools to evaluate whatever we consider important to quality, we can aggregate the results however we like — giving, for instance, more weight to what Linda tells you about your packages than to what MDE tells you about how effectively things are going upstream in the developer base. That's the kind of system we're planning to build. The hope is that in the end this will be a system that developers — and, in your case, specifically package maintainers — will actually want to use, just to keep a running eagle eye on what quality is doing in your project or your package, how it's changing over time, so you can adapt accordingly.

[In answer to a question:] I didn't want to touch too much on the workings of the project — I'm really focusing on the tool — but yes is the answer: we're developing a tool which is free software. The idea is that we want people to go out and use it for themselves, on their own projects. However, because as the developers we've got a head start, we're going to build up a large database of certain metrics over time; we're going to pick a lot of low-hanging fruit during the lifetime of the project. For instance, we'll create databases of things like SLOC counts — `wc -l`, you know — per revision, per project, for any project people are interested in. So there will be a publicly available data source for certain projects where we've run them. But the tool is free software: you'll be able to grab it and run it against whichever projects interest you, whether that's just your package, or the upstream version as well, or anything else you might be interested in. But yes, rudimentary data will be available for certain things, because it's going to take a while to run `wc -l` for every file in every revision of certain projects — doing that against KDE is going to produce quite a massive table in a database somewhere. Compared to that, though, the metrics I've been talking about here are computationally more expensive, so as a project we're going to be running them against certain projects.
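The plug-in idea can be sketched in a few lines. This interface and the line-count metric are purely illustrative — the real SQO-OSS plug-ins are OSGi bundles, not Python classes:

```python
# Sketch of the plug-in idea: each metric is a small wrapper whose output is
# machine-readable, so the platform can schedule it and aggregate the results
# with whatever weighting the quality model assigns.
import os

class MetricPlugin:
    name = "abstract"
    def run(self, checkout_path):
        """Return a {metric_name: value} dict for one checkout/revision."""
        raise NotImplementedError

class LineCount(MetricPlugin):
    """Cheap size metric: total lines across all files, like `wc -l`."""
    name = "line-count"
    def run(self, checkout_path):
        total = 0
        for dirpath, _, files in os.walk(checkout_path):
            for fname in files:
                with open(os.path.join(dirpath, fname), "rb") as f:
                    total += f.read().count(b"\n")
        return {self.name: total}

def evaluate(plugins, checkout_path):
    """A toy scheduler: the real system would farm these out as jobs."""
    results = {}
    for plugin in plugins:
        results.update(plugin.run(checkout_path))
    return results

print(evaluate([LineCount()], "."))
```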
But for developers, it'll take a long time to crunch those numbers. So part of the problem with the finalised system is that how long it takes to evaluate quality will depend purely on your quality model: if you do include some of the things I've been talking about, it might take a long time to compute, depending on what kind of hardware you can throw at it.

So that's an eagle-eye view of what we're doing as a project. I'm giving a presentation followed by a BoF at aKademy, and I'm doing the same at Warwick, and as a project we have three questions we want to pose to developers, to get feedback on whether we're going in the right direction. I'm hoping those can form the basis of the discussion from now on.

The first question is simply: what are you interested in when you talk about quality? When you look at your packages you might say, OK, we've got the Debian guidelines for package maintenance, we've got Linda, we've got lintian, we've got the other automated checks. In terms of your product quality — and I mean either Debian as a whole or just your packages — how far do you go? What is quality to you, in terms of the product?

[Audience:] Speaking as a user — and leaving the more developer-ish answer to people like Andy — I think the response time to bug reports is really key. You can have a wonderful software product which has a really bad Debian package, and the same the other way round, because the maintainer of the package is your front end to the package. So if I send a bug report, I would like to see an ack back. If I send a patch, I would like to see whether it's applied or not — or get an answer: "I sent it upstream", or "why don't you send it upstream", and so on.

Bug reports are actually a really good one, because they're again something which can tell you about quality, but not necessarily in the way you might first think of. Take the average lifetime of a bug report within your bug tracking system: you'd hope to have many bug reports with a short lifetime and maybe a few with a long lifetime.

[Audience:] More than the lifetime, I think, it's the responsiveness — the amount of time until the bug is at the appropriate severity level. If somebody files a bug report now about something that doesn't work, the first step is for someone to qualify it, look at it and say: wait, actually this is a wishlist bug, or it's this or that type of bug. That response time is what matters to the user; the developer cares about how long bugs stay around. If a developer says "well, what you reported is actually a wishlist bug", or somebody writes a patch and gets it included, then from our point of view the bug is handled correctly. And you can have a bug which is nasty to fix but not that important, so it can have a long life cycle, as long as you see progress. For me, if a bug is really bad, I'm going to make sure it has a short lifetime.

[Audience:] The really interesting point is the time the project takes to become conscious of a bug report — to look at it and make a decision about what to do with it. For example, I have seen cases where someone reports that a package is not working at all, but reports it with normal severity; nobody looked at it, and something like one year later another person reported it as an RC bug, and then the community fixed the package within something like two weeks.
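These two suggestions — average bug lifetime, and responsiveness or staleness — are easy to sketch once bug records are available. The record format below is hypothetical, standing in for whatever the BTS actually provides:

```python
# Sketch: average lifetime of closed bugs, plus open bugs that have sat
# unmodified past a cutoff. The dict format is hypothetical; real input
# would come from the BTS.
from datetime import datetime, timedelta

def bug_stats(bugs, stale_after=timedelta(weeks=2), now=None):
    """bugs: dicts with 'opened', 'last_modified' and optional 'done'."""
    now = now or datetime.utcnow()
    closed = [b["done"] - b["opened"] for b in bugs if b.get("done")]
    stale = [b for b in bugs
             if not b.get("done") and now - b["last_modified"] > stale_after]
    avg = sum(closed, timedelta()) / len(closed) if closed else None
    return avg, stale

avg_lifetime, stale_bugs = bug_stats([
    {"opened": datetime(2007, 1, 1), "last_modified": datetime(2007, 1, 3),
     "done": datetime(2007, 1, 10)},
    {"opened": datetime(2007, 2, 1), "last_modified": datetime(2007, 3, 1)},
], now=datetime(2007, 6, 1))
print(avg_lifetime, len(stale_bugs))    # 9 days average, 1 stale bug
```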
[Audience:] That's a good cue for me to show what's going on with Debian QA — I can show you some bug statistics of mine; most of you already know them. That's the number of RC bugs. What you can see happening here is basically the release of etch: at this point you see the release of etch, and the count blows up. That just means that the people who were fixing lots of RC bugs before stopped doing it, and a couple of RC bugs had just been ignored for etch reasons — "it's bad, but not so bad that it should delay etch" — and at this point, where this last step up is, we removed the ignore tags, which is why the bug count exploded. And also a new upstream version… So yeah, that's just measuring, a proxy for quality. What we don't measure is what you said, the initial response from the community. We could perhaps now track that a bit better, because we can now ask the BTS how old a bug is and when it was last written to; if it's been in the same state for more than four weeks, we might put it on a list to evaluate. — I don't know why that data isn't in the BTS summary files, but it is in the BTS itself. — You could just say: I want to see any bugs that have been in the same state for more than two weeks. That should be fairly easy to implement on the normal package reports. — Which data? — The normal package reports: as was just shown, you can see when a bug was last modified and when it was created; you can just compare the two and say, I want to see the bugs untouched for two weeks. So you could create a list of all the untouched bugs. — That's a cool thing, actually. Interesting.

[Audience:] Another interesting point to measure, which I think I've observed as a common factor in a lot of QA effort, is how much a package adheres to standard practices. If we introduce something new in policy, or a new method for handling a package, I consider a good package to be one which is up to date with these practices. A lot of QA tools which harvest the packages and run checks on them automatically can be understood as checking whether a best practice is used in a package. — In that sense you have to hope that whoever maintains those tools is actually updating them to measure against the new guidelines. — For example, a package using debhelper: if I look at the source and see that the debhelper compat level is ancient, I start feeling, OK, this package is not really maintained. That's an example: if you have a new technique for new packages and something is not up to date with respect to it, something might be wrong. — That's a good heuristic, because there are a few signs like that. When I'm doing NMUs, if I see a package that's clearly well maintained, I don't NMU it right away; but if you can see, oh, this package is not really maintained, I'll be quite fast about it. I couldn't even tell you exactly what criteria I apply, but there definitely are some.

And for Debian as a whole, rather than individual packages: one of the reasons I brought that up is that earlier on debian-qa there was an email saying, hey, no one seems to have maintained this package in X months, but in that time the upstream version has changed Y times. Would having the system report on activity within packages be something useful to the project as a whole — like detecting when a maintainer looks like they're dropping out?
[Audience:] We do that, in a way — but actually, how do you detect that a maintainer is dropping out? What happens is that most of these bugs are reported not by ordinary users but by people who are, I would say, definitely community. For example, what I did just yesterday: I collected a few bugs where the package needs to go out of testing because it's too broken. What I usually do then is look at the developer and see: oh, this developer hasn't uploaded anything for a year. There's a special alias for that in Debian — the "missing in action" handling. So I move it over, and another team picks it up: they ping the maintainer and ask, what's your status, do you still work on Debian? If the maintainer doesn't answer, the packages are taken away and someone else can take them over — or, if nobody does, they get removed. So it happens, though in some cases perhaps not fast enough. But on the other hand, when would you say a maintainer is dropping out? If they haven't uploaded for three months, that could be totally fine, especially for packages which don't need frequent care. — It's the same point you made earlier about the maturity of the product: you'd expect things to slow down. — Right. I have had a package which I uploaded only once during the entire etch cycle, and that was only because of a rebuild; I'd decided it's better for quality not to keep uploading when it's the same version as before. I would still say I maintained it — it just didn't need any action.

And again, that's the kind of thing you can measure over time. As something is in development, you can look at the upstream side of what you're packaging and ask: how much of this is maintenance, and how much is actually new features? As maintainers, this matters to you, probably, in terms of ensuring it actually works with everything else — ensuring successful integration. Upstream maintenance and upstream feature work have different ramifications for you as package maintainers, and that's also the kind of thing you could track: the gap between the upstream version and the packaged version.

[Audience:] But I think it should be weighted with respect to what other distributions do. There might be a justified gap — maybe an upstream version sucks, so it stays in experimental or something — but if other distributions are, say, two versions ahead of what we have in Debian, that might be an indication of something relevant. — Yeah, we should also look at what other distributions are doing with a given package: if one distribution is updating a package and another isn't, that tells you something one way or the other. — Are you specifically talking about other deb-based distributions, or RPM-based ones too? — If you look at upstream versions, it doesn't matter at all. There might be some exceptional cases which other distributions don't care about the way we do — copyright, say; we have some exceptional packages which are maintained basically not by upstream but by the distributions, and which take quite a while to get ready to go in because of licensing issues — but of course any of those cases already means a package of bad quality, so that's definitely part of it. OK.
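A minimal sketch of that version-gap check, assuming Debian's python-apt for dpkg-style version comparison; the version data here is invented for illustration:

```python
# Sketch: is our packaged version behind upstream, and which distributions
# carry something newer? Uses python-apt's dpkg-style version comparison.
import apt_pkg
apt_pkg.init_system()

def version_gap(ours, upstream, other_distros):
    behind_upstream = apt_pkg.version_compare(ours, upstream) < 0
    distros_ahead = {d: v for d, v in other_distros.items()
                     if apt_pkg.version_compare(ours, v) < 0}
    return behind_upstream, distros_ahead

# Invented example data:
behind, ahead = version_gap("7.0-122+1", "7.1",
                            {"fedora": "7.1", "ubuntu": "7.0-122+1"})
print("behind upstream:", behind)      # True
print("distros ahead of us:", ahead)   # {'fedora': '7.1'}
```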
So, the next question: as a project, we're wondering what other sources of data are already out there that we should be thinking of. — [Audience:] One thing is adoption of a given package. Usually, if a product is widely adopted, we can assume a certain quality. It's not exactly true, but it's a popular theory. — Does popcon do any weighting? Everything that's in the base install you'd expect to have extremely high popularity, because it gets installed whether you want it or not. Does popcon take that into account? — That's what I was about to say: it's tricky. Sometimes a package which is supposed to be mainstream actually happens to be removed by many people, and that's a real signal: many people choose not to use this software and use another one instead, so they remove it from the default install. So instead of having, say, 18,000 installs, it's only 17,000, and people chose another one — maybe the other one has the real quality. That's something to take into account.

You've hit one of my favourite points. When you get the chance to see the slides, the first point I make is that free software products are competing for users, and for developers and other contributors, and the thing that ultimately ends up splitting them is quality. The way I see it, people are attracted to a project either because they think it's good quality and they want to contribute, or because they think "this is something I want to contribute to because it needs help". Either way, quality is important.

I'm glad you've brought up popularity-contest, because this is basically the next big question: with this kind of system, where we can plug and play whatever metrics are important to us as developers, is there a Debian-specific role that you can see a system like this playing? For example — I don't know if you're on the debian-qa list — the suggestion I posted there was that package maintainers could use the system with whatever plug-ins they choose as part of their quality model. Maybe one is a wrapper for lintian. Maybe one just asks: how does my quality compare with the upstream quality? Am I actually improving the product by putting it into Debian, or am I doing something that isn't working so well as a package maintainer? And as part of that, you might then say: as part of my quality model, what's the popularity of my package? That could form part of the individual quality model for your package. So the question is: is there a potential Debian-specific role for this type of system?

[Audience:] Are you aware of a piece of software called Mole? It's a project where you basically collect most of the information on packages into an accessible database, used for statistics and various reports and so on. A lot of the information is fairly loose in structure; there are certain things you can combine in certain ways. But it's a useful data repository for lots of things. I don't know how actively it's being continued, and I have different interests at the moment. — If that's a publicly available data source, that's exactly the kind of thing it would be very interesting to look at. Tools which access that data source could then become part of your individual quality models — in this scenario, one or multiple plug-ins to whatever your quality model is.
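Going back to the popularity-contest point for a moment: a popularity plug-in could be as small as the sketch below. It assumes the plain-text by_inst report published at popcon.debian.org, whose column layout (rank, name, inst, vote, …) is not a stable API, so the parsing is deliberately loose:

```python
# Sketch: fetch a package's installation count from popularity-contest.
# Assumes popcon.debian.org's plain-text by_inst report; comment lines
# there start with "#", data lines with a numeric rank.
import urllib.request

def popcon_inst(package, url="https://popcon.debian.org/by_inst"):
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            fields = raw.decode("utf-8", errors="replace").split()
            if len(fields) > 2 and fields[0].isdigit() and fields[1] == package:
                return int(fields[2])      # reported installations
    return None

print(popcon_inst("vim"))
```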
Is this the kind of thing — as package maintainers, what are your individual practices? Do they vary widely from package to package? — [Audience:] Well, I think you should chat with Martin Krafft, who is basically doing a PhD on this: researching how new technologies and new practices spread in the community. — The reason I ask is that the spread of practices — across free software as a whole, let alone just within Debian — will have ramifications for how many plug-ins we need, whether they can run concurrently, how computationally expensive they are: just how long it will take. That was really my motivation for the question. If something is computationally hard, it's obviously not the kind of thing people would think of doing by hand.

[Audience:] In that case, I think it would be really interesting to look at how many patches are applied in Debian packages with respect to the upstream package. Of course, you should exclude the patches which are specific to Debian packaging — though that's not so easy, because a lot of packages are maintained with tools like dpatch, in which you separate the patches that matter upstream from the patches needed only by Debian. And not only is the number already significant, but also how many patches flow upstream. Like, I'm maintaining vim: we have tens and tens of patches, but just before an upstream release we usually ping upstream, and almost all of them flow up and get integrated into the next version of vim. I think that's a clear example where Debian packaging is improving the software. — [Another suggestion, partly inaudible, about looking at a system supporting 15,000 packages.]

I'm conscious of the fact that we're running towards the end of the hour. Are there other Debian-specific quality issues where you can see a system like the one being proposed helping you, either as a project as a whole or as individual maintainers — anything where you'd say, this could help me in some specific way? Ultimately, what we're trying to do in this project is build a tool, a system, that people will actually want to use. We're very conscious that KDE is a partner in the project, but we don't want to end up with a tool which is only useful to KDE.

[Audience:] What would actually be useful would be to get some tools into Debian — the Mole thing we talked about — and the other thing would be to get the output, or at least some indications, onto the Package Tracking System. It's a page on the web, and it's the main page I look at when I look at a package: what was the last upload? So I can see, oh, five NMUs by five different people, the last maintainer upload was in 2001, there's no watch file. Or I see a lot of maintainer uploads with sometimes an NMU in between, and I can say, OK, that one looks fine.

So, we've only got a couple of minutes left; I'll wrap up now with my thoughts. What I've got out of this is that we've talked about the potential to look at upstream versus package quality, popularity-contest, the data that comes out of Mole — and these could all be potential plug-ins for one overall quality model, which this system could then compute, in effect as part of continuous integration if need be.
[Audience:] Actually, from my point of view — how should I put it — of course we use statistics a lot, because you can't handle so many packages and so many bugs otherwise, if you don't use such methods. But what I'm happy about is that we currently have the approach of saying: we have pages where we just put up a lot of statistics from different sources, and then it's the developer who makes the final call — which is a thing I like very much. And basically, the Package Tracking System is trusted: it displays the different versions, it displays potential issues detected. So I think it would be good to have the output there, to see what's going on quality-wise relative to upstream, or where the upstream version is. That could just go on the Package Tracking System as-is.

I've got one last question, which is a little bit out there, but it's about where I'm going to take my own work further. Do you believe there could be a relationship between the web of trust and the quality of the code?

[Audience:] At events like this, everybody signs everybody's keys — keys aren't signed because of the code — so if you equate signatures with keysigning parties and conference attendance, you'd probably find a relationship there, but not to the code. — Actually, there are some relationships, because I've noticed, when I look for missing-in-action maintainers, there are some people who just appear for a bit and then disappear, and these people often have only one or two signatures. I doubt there's much difference between someone with 10 signatures and someone with about 100 — that only says he signed a lot at some party — but between two signatures and ten there's a large difference. So for finding missing-in-action maintainers, I think the web of trust would tell you a lot; for the quality of the code, I really don't think there's much of a relationship. First of all, when you sign a key, you only attest that the person is who they say they are, so it's not really a quality trust: it shows that you've met other people. — The reason I ask is precisely that: to sign, you have to have met the person, and they have to have identified themselves to you. So it's a way of measuring how often people meet, and the quality of the meetings on a project. — If you say quality of meetings, I agree. But you can have a wonderful contributor who has basically never met another fellow developer. — Of course, what you say is true: these are estimates, not direct measurements. A parallel from my own work: if you have 10 RC bugs, that just says there are 10 issues. They might be 10 easy issues or 10 bad ones, or a few far worse ones among many easy ones; in the end it's a call for the developer. Numbers and measurements give good indications, but they are not the final answer. That's something we always need to keep in mind.
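If one did want to put a number on the web of trust, a first cut could be signature counts per key in a keyring. This sketch relies on GnuPG's machine-readable --with-colons output, where a "pub" record starts a key and "sig" records follow it; the keyring path is an assumption, and counting raw signature lines (one per signed user ID) is only a rough proxy:

```python
# Sketch: count third-party signatures per key in a keyring, as a crude
# web-of-trust connectivity measure. Parses `gpg --with-colons` records:
# field 5 of "pub" and "sig" lines is the key ID.
import subprocess
from collections import Counter

def signature_counts(keyring):
    out = subprocess.run(
        ["gpg", "--no-default-keyring", "--keyring", keyring,
         "--list-sigs", "--with-colons"],
        capture_output=True, text=True).stdout
    counts, current = Counter(), None
    for line in out.splitlines():
        fields = line.split(":")
        if fields[0] == "pub":
            current = fields[4]
        elif fields[0] == "sig" and current and fields[4] != current:
            counts[current] += 1           # skip self-signatures
    return counts

counts = signature_counts("debian-keyring.gpg")   # path is an assumption
print(counts.most_common(5))
```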
Good. On this social-structure angle, there's another piece of work that interests us from a different angle — what you were saying about the mailing lists. There was a talk at EuroPython last year where they'd taken discussions on the Python mailing lists and looked at whether threads actually reach a resolution: they tracked what sort of discussions progress to something, versus where problems happen in a project, and so on. They were doing quite interesting analysis, and that kind of thing could feed in as well, so we could look at discussion — specifically who has responded to whom, on a per-thread basis, and how much that reflects quality.

These are all things which might not seem to relate to each other, and might not seem to have an effect on quality — the web of trust being the extreme case — but they're all potentials: all things we can look at and, in conjunction, build a quality model around. The reason I particularly wanted to bring up the web of trust is that it's something that's actually there, especially in Debian; it's hard to ignore it and say it can't have some effect on quality, which is why I asked — and it has a number attached.

Okay, thanks very much. If you've got keyboards at the ready, or pen and paper, here's the URL where I'll post the slides, so you can write it down. I'll do that, and I'll post it on debian-qa as well for those of you who weren't quick enough. If there's any more discussion, debian-qa is probably the place, or feel free to just drop me an email. I'll mail out the URL as well. Thanks very much.