Hello, welcome, and thank you for coming. This is a birds-of-a-feather session, my first one of this kind, about quality metrics in Debian. My question is whether there should be a minimal quality requirement for software, so that the software is worth including in Debian. I have collected some thoughts here in a Gobby document. There is already somebody else editing. Please don't delete all my points, because then I have nothing to tell you. I would be happy to introduce the topic with my points, and afterwards we can complete the document and have a discussion about it, so that the document will also be a record of our discussion on this topic. So let me start.

The initial idea for doing this was that I packaged Hadoop, the Apache Hadoop project, for Debian. It's a very popular solution now for big data. It's hyped as some of the most important software of 2011 and has won some prizes. Together with Hadoop there is another package, ZooKeeper, also from this family. I had some problems with ZooKeeper, so I got involved with upstream and provided patches to fix my problems. By getting involved with upstream I learned, first, about the code quality of ZooKeeper, their internal code quality in my opinion, and second, how upstream thinks about code quality. Their idea was mainly: if it works, don't touch it anymore. And if you want to provide patches, you provide them only to fix bugs or to introduce new features, and new bugs by that. My idea was that you need constant refactoring to keep your software clean. I had a bunch of patches in their bug tracker which nobody was interested in, because there was no feature and no bug fixed; it was just cleaning up the code. In the end I thought that, from my personal judgment, this software does not belong in Debian. And now there are a couple of people who are angry at me because it was in Debian and I kicked it out again, and this didn't make a good impression about the software.
So now I'm here to discuss this issue. Or something else: one of the numerous intent-to-package reports on debian-devel. "Hi, I have a nice PHP content management system and I would like to package it for Debian." How do you tell this guy, without being too impolite, that the interest in this might be low? I don't want this kind of discussion to be only personal and subjective. I would like to make it more objective, with numbers.

The first question, of course, is why. Why should we care at all which software comes into Debian, as long as it's free and Lintian doesn't have anything to complain about? Why should we ask questions? Some people say: well, if we care about our users, then we should package software so that users can use it, and if we don't provide software for our users, that means we don't care about them. That is a fair point. I looked in the social contract and found at least two passages that relate to quality. "We will make the best system we can, so that free software will be widely distributed and used." So "the best system": does it only talk about our packaging, or also about the upstream code quality? Or: "We will provide an integrated system of high-quality materials." Does that mean we package only stuff that's of high quality? But the Debian policy doesn't have any section about upstream quality or code quality. What is quality?

Caring about users can also mean that we don't expose bad software to them. If we package everything on earth, it only makes it hard for users to pick a good solution. So Debian is important: when I meet a new software project and I ask myself whether I should use it or not, my first question is, is it packaged for Debian? Is there any Debian developer who thought this software is any good? If we package everything, then this quality of Debian is lost.

So, why? The developers of this upstream project said my patches are only aesthetic patches; it's only about style.
It's only about taste. My idea about this kind of patch to clean up code is: you reduce bugs, even bugs that you haven't encountered yet but that might happen one day. You provide a foundation so that in the future you can fix bugs faster, because you understand the code better and can maintain it better. Clean code means you have fewer security holes, because if you understand the code there won't be so many places where you have overlooked something. Clean code may lead to fewer crashes or no crashes, and no data loss. Adaptability to other environments: for example, with the ZooKeeper project I needed to patch it so that it could compile on ARM, because there was some assembler issue. I have no idea about that, but if the code is clean then it should be easy to run on different architectures and for different use cases. And clean code may mean that the software will be around for a longer time. In two or three years there will still be people who have fun working on this code, and you don't need to change your solution because nobody is interested in it anymore. And if we also say that developers are users, then we should also care about those poor developers who have to use the software and have to understand the code.

So that was my motivation: make it less personal and more objective. Keep crap outside of Debian. It's an advertisement to be in Debian. Debian policy says nothing about upstream code quality. Okay, I had all of this already.

So should we make a first round of questions? Are there comments so far, before I continue? Could we have another mic? There should be two mics in this room.

One more try. Hey. Okay, so one thing I like in Debian is the huge diversity of software we have, and I think if there's no other software available in the free software market...

Uniqueness. I had the same idea. So uniqueness of a piece of software should be one kind of quality metric.
Okay. Solved.

Well, there are also some types of software where there are dozens of them in Debian, and I still see some uniqueness there sometimes, even if maybe others don't see it. Like web browsers: I package three web browsers myself by now, and I see completely different use cases for them. So each of them has its own uniqueness. So that's maybe something where the, let's say, demand for specific software should also be part of that.

I think that we cannot come up with hard numbers and then set a limit: okay, under this number we won't include it in Debian. But what we could work on is at least a framework to reason about whether the software should be part of Debian or not. Some parts of this framework could be hard numbers, and some parts could be arguments like: well, this has the same purpose as another package, but it's a text-only package, so this is a type of uniqueness and therefore we should include it. And we can write it down so that there doesn't need to be any further discussion about it, and then we know: okay, this is an argument from the uniqueness category.

One way around that would be to use debtags a lot more effectively, because if you can't show in the debtags system that package A has any different tags from package B, then really, are you arguing about anything at all?
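As an editorial illustration of the debtags comparison suggested here, a minimal sketch could compare the tag sets of two packages directly. Everything in it is an assumption for illustration: the function names are hypothetical, the tag sets are hand-written examples rather than data pulled from the real debtags database, and Jaccard similarity is just one possible way to put a number on "too close and too similar".

```python
def tag_overlap(tags_a, tags_b):
    """Jaccard similarity of two tag sets: 1.0 means the tags are identical."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        # Two untagged packages cannot be told apart by tags at all.
        return 1.0
    return len(a & b) / len(a | b)


def distinguishing_tags(tags_a, tags_b):
    """Tags carried by exactly one of the two packages."""
    return set(tags_a) ^ set(tags_b)


# Hypothetical, hand-written tag sets for two web browsers.
browser_a = {"web::browser", "interface::x11", "uitoolkit::gtk"}
browser_b = {"web::browser", "interface::text-mode"}

overlap = tag_overlap(browser_a, browser_b)
unique = distinguishing_tags(browser_a, browser_b)
print(f"overlap={overlap:.2f}, distinguishing tags={sorted(unique)}")
```

An overlap close to 1.0 with no distinguishing tags would be the signal described in the discussion: either the packages really are redundant, or some tags are missing.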
You've got two packages: if you can't discriminate between them with the debtags that we currently have, or with tags you can justify adding, that should be a useful indicator that these packages are too close and too similar.

Yeah, or it may point out some missing tags.

We also need to put this into debtags, for example, but also integrate something with UDD, so that at a first look we can find, for instance, if two packages have similar characteristics, which one has a lot of bugs or which one is a poorly maintained package. Then we can decide which one to remove: not just look at what is entering Debian, but mostly remove packages that are already crappy packages in the archive.

Okay, I haven't yet seen the UDD database, because I'm not a Debian developer, so I can't say anything about this. Maybe somebody else?

I would like to stress that quality of Debian, because I think that many people look at Debian to understand, as you said before, what can be trusted to be quality software and what cannot.
So I think that this must be an important element in our discussion.

Yeah, I think very much the same, because I got involved in the Apache community, and my personal impression was that, at least in the project I have looked into, the Apache community has a totally different approach to quality than Debian. It would be a loss for the software world if Debian got compromised over time with regard to quality.

So, continuing with the next section: types of quality metrics. Well, there are some quality metrics which have been researched by PhDs at universities and are well known, and most people agree that less is better or more is better for these metrics. Some of them are: you shouldn't have duplicate code, you should have higher test coverage, you should have less coupling, you should have lower complexity. And there are some established best practices for different languages, for example JSLint for JavaScript; there is the same for Java with FindBugs or PMD. So I think there are a couple of tools already available which we could just take, run over the archive, come up with numbers, and just see what the current state of code quality in Debian is. Then we have one point from which we could start to argue. When there is a new intent to package, somebody could look it up: oh, my package is so low in quality compared with the rest of Debian, maybe I should think again about packaging it.

Raphael is not here, sadly, because he already started a project, DACA, Debian Automated Code Analysis, where he already had the idea: collect these tools and let them run over the archive. But of course this is an enormous amount of work, and actually you would need a couple of full-time developers to keep such a system running. But I think the idea is the right direction and would help us a lot.

So there are other metrics. We could have a look at how they manage their project. Is there a bug tracker? Is there a mailing list?
Do they have a version control system, a public one? Is it distributed? Maybe better. Do they have long-term support versions, or do they just not care anymore about older versions once a newer one has been released? Do they have a proper versioning scheme, or is it like with many Java libraries, where they break their API/ABI compatibility with minor version upgrades? How many developers? How mature is the project? Is it just fresh, or has it already proven for one, two, or three years that it's stable? Do they have documentation? Do they have separate documentation for users and for developers? And are they used by other projects? I will come back to this point. And this one is my personal thinking about the ZooKeeper episode: what do they think about code quality? But it's hard to measure this one.

And then, Debian metrics. Once we gather such metrics, what do we do with them? First, there will be no way ever to introduce a hard limit into Debian; this would be a revolution. So the only choice is to propose that people should orient themselves by these metrics and have a look at them. One could work with social pressure, but there will never be a Debian policy that says you must have that many points on the metrics scale.

When you know a bit about how Google calculates PageRank: they have something like 100 or 200 different signals, as they call them. Signals are: how many inlinks does a page have, and what's the PageRank of the page linking to this one? Is the HTML of this page correct? Is this a trusted domain, or is it some island where there are only spammers? So all these metrics come together, they have different weights, and together they form one PageRank for a website. The same principle could be applied to Debian packages.
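As an editorial illustration, the idea of combining many weighted signals into one number per package could be sketched as follows. This is only a sketch under stated assumptions: the signal names, the weights, and the choice of a weighted average over signals normalized to the range 0 to 1 are all hypothetical and not something the speaker specified.

```python
def package_score(signals, weights):
    """Weighted average of normalized signals, each expected in [0, 1].

    `signals` maps a signal name to its value for one package;
    `weights` maps the same names to their relative importance.
    """
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        raise ValueError("at least one signal needs a non-zero weight")
    weighted_sum = sum(signals[name] * weights[name] for name in signals)
    return weighted_sum / total_weight


# Hypothetical weights: test coverage counts three times as much
# as merely having a public bug tracker.
weights = {"test_coverage": 3.0, "low_duplication": 2.0, "has_bug_tracker": 1.0}

# Hypothetical measurements for one package, already scaled to [0, 1].
signals = {"test_coverage": 0.5, "low_duplication": 1.0, "has_bug_tracker": 1.0}

print(f"score = {package_score(signals, weights):.2f}")
```

Dividing by the total weight keeps the result between 0 and 1, so packages measured with different sets of signals stay roughly comparable; a plain weighted sum, as mentioned in the talk, would work just as well if every package is measured with the same signals.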
So we have many metrics, we give different weights to these metrics, just multiply them by the weights, and come up with just some numbers: category automated code metrics, category uniqueness, category arguments, I don't know what to call it.

Then I had the idea, when there is such a PHP content management system: first I thought we should have general points subtracted if a package is written in PHP or Ruby. But that would not be fair. And then I had the idea: well, we have dependency graphs in Debian. So if we could use the dependency graphs, and if a package depends on PHP, it will also inherit a bit of PHP's code quality, and this will automatically subtract many points from this package. Trust me, I've seen the inside.

Yeah, I know. But sorry, what's the question?

The code quality of PHP gives no indication of the code quality of software written in PHP, does it?

If your package relies on another package, and you have written just a tiny library on top of another package, but the other package is just crap, then you should think again about whether you want to keep this crap in Debian just because there is this tiny library which needs it. I think there is some relation: if you rely on crappy software, then you should be punished. We could discuss PHP much longer, I think.

Yeah, I have an idea regarding PHP. I think PHP encourages some kind of code which should not be called quality code. That's my take on it. There are many good frameworks and much good code written in it, but I think there is some encouragement of bad behavior in coding.

I have to stop myself from talking about PHP any longer. I've suffered it for five years now and I won't do it again.

Yes, because we talked about uniqueness: position in the dependency tree. Say there is a package which has many reverse dependencies, that is, many other important packages are relying on it.
So it gets rewarded for this. So maybe PHP remains in Debian because there are so many packages which need it, sadly. Popcon numbers would of course be one indicator; that doesn't need to be discussed. I will just go through my points here. Sorry, otherwise I can't go that fast. I'll just mention them.

This would be my dream: if you file an intent to package, there would be some possibility to upload your software to some Debian-provided portal, the metrics get calculated, and you attach the results directly to the intent to package. That's the end goal. Related to this, I thought we could help upstream projects, because it's hard to set up all this infrastructure: continuous integration systems, unit testing frameworks, and have them run continuously. So for small projects, maybe Debian could help out with a SaaS for that. And I thought about how much horsepower we need to calculate all this stuff, and whether this is possible on Debian infrastructure. I have no idea how many servers Debian has and how we could acquire more CPU time.

So I already mentioned the project from Raphael. There is a mailing list set up, there is this website. Go there, start contributing. I think it's a fantastic idea; we should move it forward. I found on the net another project from some Greek university, I think. They already started a project to measure the quality of open source software. We could contact them.

And three other ideas that we could discuss later. There is another mechanism to introduce higher quality.
I have read that there are not many hard requirements inside Google for clean code, but there is one law inside Google that may not be broken: every line of code needs to be reviewed by another engineer. And there is no formal review process in Debian. Have you ever thought about requiring reviews inside Debian, so that another Debian developer should have a look over your package? I'll just continue with these two points and then we discuss, okay? Peer review: well, I will give a skills exchange session about Gerrit, a review tool on top of Git, so mark it if you are interested. And I think this whole area is a wonderful area to write your thesis about, your PhD thesis or your master's thesis, if you're in this situation right now. Thank you, and let's start discussing.

One more comment: you claim that there's no review process in Debian. There actually is; it's called sponsoring. But it's only for non-DDs. And there are also already quite a few personal how-tos on how to do such a sponsoring review. They are all a little bit different, but they also have several parts which are common to every personal list. They are in the wiki, on wiki.debian.org somewhere. Have a look there.
Yeah, thanks.

Actually, I think it would be possible to use the mentoring process also for DDs. On the one hand, there is too little manpower in the mentoring team, sort of; it's already hard for people to find someone to mentor them and upload their packages. There's also a sort of review process during the freeze, by the release team, and for stable updates, but that's fairly limited to these special areas where we really want to do the review. During the regular release cycle, uploads to unstable are usually not reviewed, and we have the testing transition for that. Proper review in that time would also be appreciated, but like Zack said in his talk, there's too little manpower for doing that properly.

It would be an interesting discussion whether code review is so much work or whether it reduces work in the long term, but that's another question.

So there are several things I wanted to say. The first is that you mentioned SETI@home and the like. I think that's really a side problem, and you shouldn't try to address it. If you write tools that can easily be run on a tarball or on a source package, then it's easy for someone else to just use your tools to run them on the whole archive. I can provide CPU time for that, and some other people can too. The real key problem is to develop the tools to do it and to analyze the results. For example, if you show me that you can run it on 5% of the archive, and you just don't have the resources to run it on 100% of the archive, I can do the remaining 95%. That's not the problem.

Okay, that's promising. Thanks.

Something completely different: you mentioned master's and PhD theses. What you are talking about is a really active area of research, with lots of people working on it.
So I just added two other projects working on that, but there are a lot more. I think it would be really great if you could contact those people, tell them that you are interested in applying what they do to Debian, and just reuse what they are doing, because they probably already have tools. Let's not reinvent the wheel. It's very complex stuff, and it's probably hard to make a contribution or to develop something new in that area.

Sorry, I'm just going through my list. The last point I wanted to make was about peer review. I think it's something that should be tested inside a team, because it's difficult to convince the whole project to switch to review by two DDs before any upload. The easiest way to test it is to find a team that is willing to experiment with the idea, and then come back six months later and report on what you learned from it. The Ruby team is not a candidate for it, because we'd have problems finding two DDs to review, but maybe some other teams with more DDs could do that.

The first thing I'd like to say is that for some software we are kind of stuck with the code quality, like GPG or OpenSSL, if any of you remember the debacle and stuff. On a more concrete level, I know that Zack has started to use a tool called Coccinelle, which knows how to do semantic analysis of C code, so you can actually grep for code patterns. This is a tool that I encourage a lot of people to look into if they're interested in measuring, or having metrics of, C code quality; it could be used directly for, I don't know, use-after-free and this kind of stuff.

And the last thing is, I think we really miss a process to actually remove packages from the archive. Because it's one thing not to let bad packages into the archive, but I mean, if they land in the archive and start being used by a lot of people, maybe there will be volunteers.
They will improve the code quality. But if that doesn't happen after years, then hey, you know, we should just remove them. It's been kind of blurry for me what the proper process is: who can actually propose that a package be removed because the quality is so bad, and how we could have some kind of vote on that. If many developers think a package should be removed, having some kind of process would be good, I'd say.

Well, there are already some tools that measure the UDD statistics with popcon and bugs, so you can easily find some unmaintained packages already in the archive and then see if they are orphaned. Then you can just request removal of the package from the FTP masters, but I don't know exactly how it works; I never did a removal myself. I see that people are doing it, but I sometimes feel afraid to just go ahead and ask to remove a package, even if it's bad. So maybe we need guidance on that.

There are fairly simple ways of doing it. You should get the removal bug sorted out first. You've got to work out whether it's actually only affecting testing or whether you want to remove it from unstable as well, and you've got to look at the reverse dependencies. If it's a package with no reverse dependencies and you've got justification to remove it, then filing the bug for that is actually very simple, and the package will be removed very easily. So it's how much
trouble it causes for the rest of the archive that has to be taken into account when you're thinking about removing.

It was proposed to have something like mentors for all packages, not only for the ones that have to be sponsored, and I think this could also be a risk. Having packages in unstable is actually a way of verifying them, because people use them, and someone who uses unstable is supposed to know that they may encounter problems. Delaying the upload to unstable can both frustrate whoever is working on that package and lose the testing that comes from having packages in unstable.

I tend to think that it could be useful to modify, for example, the patch tracker we have, so that people can approve patches or say that a patch isn't good, for the patches of the various packages. The number of patches that are confirmed or not could be a metric used to evaluate package quality.

And I don't think it is a good idea to automatically file RC bugs against packages with a low quality index, because I don't think it's sustainable. But maybe we could modify the process of migrating to testing a little, so that a package in unstable which has a low quality index isn't automatically migrated to testing, but is migrated, for example, if the maintainer requests it with some good reason, or something similar. So not automatic, but not blocked either: you have to explicitly want it and have a good reason for it.

Are there any other questions?
Otherwise, well, it would be interesting to get some feedback from the audience on whether this is really a necessity for Debian, or whether you think it's not that necessary and we can move forward without it; whether this should be a high-priority project for Debian, to introduce such quality metrics, or whether you just think: fine, if somebody does it, fine, but I don't think that's important to get Debian moving forward.

Well, I just wanted to point out that there's a parallel to an idea that we were toying around with, I don't know, five or six years ago. We called them upload certificates back then. The idea was basically that, not just for ITPs but for every single upload, the changes file would have to have a certain number of certificates attached to it before the archive would accept it. For the OpenSSL case, for instance, one would say that OpenSSL is such an important piece of software that it needs to be reviewed by two or three people, and it needs to have passed the Lintian checks, and it needs to have passed the piuparts checks, and so on. Every single one of those things constitutes a certificate that you attach to the changes file, and unless the changes file meets a certain minimum requirement, the upload will just be rejected automatically. I think that this would fit in well with that scheme, and it would also solve some other problems.

To answer your question whether this is necessary or not: I think it would be good to have this sort of peer review and quality flow at the stage of uploads, and it would include the ITPs as well as regular changes, because there's not really that much of a difference. I mean, okay, an ITP is new software in Debian, and you're trying to prevent crappy software from entering our archive, and package inflation as well. But at the same time that we solve this problem, we could also solve the problem of people making changes, you know, potentially late at night
under the influence of beer or something like that, which people might not appreciate.

Okay, then I think I shouldn't hold anybody back from lunch. Thank you for your contributions, your ideas, and your feedback.