Good afternoon everyone. So Henry Guru is a guy with many hats in Debian, and I didn't expect it, but he will talk about new hats. He's a good young contributor of hats, so please welcome Henry Guru.
No, but he is a guy with many hats in Debian. Our chief weapons are developing weapons. The development of science is developing weapons. There are two weapons. The development of science and the development of science and all the other kinds. Three weapons are developing weapons. They've been maintained at the count and by the quote. Four most weapons. Amongst our weaponry are such elements as that the development of the American people. Artificial devotion to the social contract. Nobody expects that it contributes. Amongst our weaponry are such diverse elements as the integral step in maintenance, and it accounts back the forces and made it misrepresent of that. That's more possible. We clearly need another hat to make sense of all this. So let's make a new hat. I would like to create the hat of Debian Contributor, so that we can say: nobody expects the Debian Contributors. Our chief weapon is Debian Contributors, full stop. A hat for everyone that contributes to Debian.
Now, the details are here. This is the file with notes. Is it readable at the back? No, you have much better eyesight than that. No, sorry, I'll open it in a different way. This is the text file with notes about what I plan to do. I would like to go through it with you and then edit it together with you. At the end of these 45 minutes I will have a proposal that I'll try to implement during DebConf, and fail unless somebody helps me, probably. But anyway, we'll see how much of this we want to do and how much of it can be implemented during DebConf, and at the end of DebConf I'll give another talk about what we changed in Debian. So this is probably good. Okay, so the problem we want to solve, the problem I want to solve, and that you probably also want to solve once you realize you have the problem, is word wrapping.
Now that we have solved word wrapping: okay, so in Debian we have lots of contributions that we do not thank people for, in the sense that we do not acknowledge anywhere the work of people who did translations, or of people who record the products. There's lots of contributions outside of Debian, but if we look around the Package Tracking System or the Debian Developer's Package Overview, it's all package based, because that's where we come from. So we track uploads there, and we acknowledge uploads, mostly. But there's so much more that we do not acknowledge at the moment.
Also, over time we created so many ways of contributing to Debian outside of being a Debian developer. The head of the Italian translations in Debian is not a Debian developer, and nobody knows who she is. Well, I know who she is, Francesca told me, and there are at least two Debian developers that know who she is. I have no idea who's the head of the Vietnamese translation. Raise your hand if you know who is the head of the Vietnamese translation in Debian. I assume, yeah, we all know who the French one is, but I need a way to credit people properly. Debian is so much more than uploads and she's a wine boss.
So there's this idea of creating a hat of Debian Contributor, which is the ultimate do-ocracy hat. There is no bureaucracy involved. If you contribute to Debian, you are a Debian contributor. Automatically: there is no new Debian contributor process. If you stop contributing to Debian, you stop being a Debian contributor. That's it. When you are a Debian contributor, you get such diverse privileges as having your name on a list of Debian contributors. And possibly, if you are not a Debian contributor anymore, your name will still be on the list of Debian contributors in 2012. And if we want to be really fancy, when you click on a name in the list, you get a page about that person that lists what they've done.
And that's about it. That's all I had in mind. You report a bug, you become a Debian contributor. Then maybe all you ever do for Debian is report a bug on the 11th of August, 2013; then you will be a Debian contributor from the 11th of August, 2013 to the 11th of August, 2013. At least you are acknowledged; I didn't have that in. Mostly I've said all this. In order to do this, there needs to be a way of collecting this information. There are data sources we can easily tap into, such as package uploads, and we can see mailing list traffic easily. We can get bug tracking system activity, hopefully easily. We can hook into commit logs of the Alioth version control systems. This system, at any point of its usage, will be unfair, because there will always be some bit of Debian that isn't yet providing data for the list. However, the point is that by having such a system in place, we create a proper way of acknowledging people's work, and we give teams a motivation to contribute contribution information to the list. Questions so far?
Sorry, I didn't understand the second point. Wouldn't this raise privacy concerns?
Good question. If we collect this data automatically, would there be privacy concerns? Yes, possibly. To be discussed. All the information presented here is already public: there is a public record of package uploads, mailing lists are publicly archived, the BTS is fully in the open, commit logs on Alioth are public. I would only mine or get data from sources that are either public or that people agree to disclose. However, it's the usual issue that the information is public, but not presented in that way. For example, when I collected information from Debian changelogs to see the history of people's contributions in the process, I did not make that information publicly available, because I would want to have a discussion about whether we want to present that information collated in such a way. So yes, that is an issue.
My general idea would be that once the system goes public, there should be a way for people to opt out of having information about them displayed. In most of the privacy scenarios, as I understand it, the default would be that somebody likes their work to be credited. However, there may be people that would have issues if some of their work is credited. Most things revolve around job hunting, like recruiters looking at the internet, and you either want to be found, or, if you don't want to be found, you already have a problem when contributing to Debian: you need to be careful about contributing with different names or something. But yes, my assumption is that it's not a big privacy issue to get this started, as long as we give people an opportunity to opt out. Does it seem reasonable?
Maybe by default it should be opt-in?
I would be fine with opt-in. I wonder how many people would know that they can opt in and how to opt in. At the moment, the current Debian infrastructure is not even opt-out. You can't say: please don't show the messages I sent to the mailing lists. I would find it surprising that such a thing would be opt-in, because everything is already out by default. On the other hand, I could be convinced about making it opt-in. It seems to be, at the moment, something that would impact the start of such a project: before somebody gets a useful list, I would need to go and tell a thousand people to opt into the system so that the list can get populated. Yes?
I think you should also look at legal aspects. You are collecting personal information. I don't know the law by heart, but as far as I understand it, it would be illegal in half of all European countries.
Then let it be illegal.
No. Currently, you are not collecting this information. You have it in different pieces and at different places where it is necessary. You have it in a bug report because the person wrote it there and it's that...
The bug tracking system would be illegal.
No. This would be illegal. There you don't collect information about the person. There you have it for a specific purpose and this is not illegal, but if you collect all the information for a specific person, or for all persons, and collate it, this could possibly be illegal, as far as I understand it.
I would like... You said that you are not sure. I would like to talk with somebody who is sure, because we already have things that would potentially be illegal in that view, like the bug tracking system, which goes beyond listing the uploaders because it collects everything visible about the person, bugs and so on.
Would it be... Sorry, would it be useful to construct this kind of thing by starting with some reasonable set of people listed as contributors? You could start by assuming that people who have said "I would like to be a Debian developer" or "I would like to be a Debian member" have implicitly given their authorization to be listed as Debian contributors. It's a subset of that. We might have to run that by a lawyer, but it seems a bit more reasonable.
I was just going to say, I think the legal thing should probably be separated off, because in practical terms, if you are only worried from a legal point of view, you just need to get someone who is in Debian in the US to do this on a US server, because the US basically doesn't have these kinds of laws. The legal thing is kind of a red herring; the relevant question is about what we want to do and what we think is fair and so on.
Well, I keep thinking we're doing something much more lightweight than Launchpad is doing.
Another way to enable this would be to do this automatically without publishing the data, but afterwards sending, about once a year or possibly even less often, a mail to every contributor asking if they want to be listed. So we have an opt-in instead of an opt-out. If we do it seldom enough, it would not even stress people much.
Yeah, that makes sense. You don't need to wait a year; you can do it the first time you detect someone new. When someone starts contributing you can say: hello, you did something in Debian, we would like to credit you. Or wait for a month after the first contribution and, if you get more, then ask whether they want to be thanked. Makes sense.
Even without waiting, which I think is what is done for the Linux kernel. You know the stats about which companies contribute to the Linux kernel: they check for new committers and they just send them an email to ask for their affiliation. There are other systems that calculate karma. Those are illegal, because they produce information that could be used against people from a job point of view.
Would that credit only be for persons, or also for organizations and companies?
I'm interested in entities directly contributing to Debian. So if there's some email and GPG key which belongs to an entity called Fubar LTD, then Fubar LTD will be credited. I'm not interested in whether there is a real person behind it. I'm not interested in whether that's the real name of a person. It's just whatever is chosen for contributing. We only check identities when we have to, which is when we give people upload rights, because we want to know where they live and go there to tickle them.
So I take it that the main index will be the email. Even behind the scenes, I mean, probably the main index will be the email.
That is an issue, because depending on the data source the index could be an email, in the BTS for example, an Alioth account for commits on Alioth, or the login name for sponsoring uploads. So possibly there could be some need of merging things at some point, which is a bit of a separate problem. I don't mind starting with that not being perfect, or being fixed by hand. And if we evolve, at some point there may be a way for people to log in and prove that they control that GPG key, that email address or that Alioth account.
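The point above, that each data source may index people by a different kind of identifier and that identities may need merging later, could be sketched roughly as follows. This is only an illustration under assumptions: the `Identity` and `IdentityMerger` names, the kind labels and the example addresses are all hypothetical, and nothing like this is described as existing in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """One identifier as seen by one data source (hypothetical model)."""
    kind: str   # assumed labels: "email", "alioth", "login"
    name: str


class IdentityMerger:
    """Groups identities that have been claimed by the same contributor.

    Implemented as a small union-find structure: merging is explicit,
    so nothing is guessed automatically from names or spellings.
    """

    def __init__(self):
        self._parent = {}

    def find(self, ident):
        # Register unknown identities as their own group.
        self._parent.setdefault(ident, ident)
        # Walk up to the group representative, shortening the path as we go.
        while self._parent[ident] != ident:
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def merge(self, a, b):
        """Record that identities a and b belong to the same contributor."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a, b):
        return self.find(a) == self.find(b)


# Example: a contributor proves control of both identifiers,
# so their two sets of contributions can be shown together.
merger = IdentityMerger()
bts_identity = Identity("email", "jane@example.org")
vcs_identity = Identity("alioth", "jane-guest")
other_identity = Identity("login", "someone-else")
merger.merge(bts_identity, vcs_identity)
```

Keeping the merge step explicit matches the preference voiced in the discussion for letting people fix their own data rather than relying on manual curation or heavy automatic detection.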
You know, send an email challenge and say: that's my email. But okay, it is. Yes, okay, then merge the two sets of contributions. Possibly for next year, because it starts being a bit heavyweight.
But there's another aspect of doing this collection: it is something that may greatly help the MIA team.
So this in the end can be something that can be used to detect when people are not contributing anymore. I would also want to make it so that if you are a Debian developer you are not automatically a Debian contributor. If you are not a Debian developer you are not automatically a Debian contributor. You are a Debian contributor only if you contribute to Debian. If you are a Debian developer that does not contribute to Debian, then you are not on that list. You keep being a Debian developer. I'm not interested in changing the rules for getting removed from the Debian keyring; I think what we have is fine. It's perfectly all right if I'm not active in Debian for a year but I still care about the project and follow things; but if I don't contribute anymore, then I'm perfectly happy that I'm not listed among the contributors for that year. There was a hand.
Yes, I'm relaying a comment from IRC: wouldn't recognizing every small thing, like just one bug report, devalue the weight of being a Debian contributor, with other Debian contributors, like the head of a translation team, being equal to one bug report? That's the comment; I'm not saying I agree.
The way I see it, and again you can tell me if you don't like this, they are the same. Maybe one has been a Debian contributor for much longer than the other, so the time span of the contribution is the only extra bit, or you can click on a person and list the contributions, but I don't want to calculate a number of how much one is a contributor. I don't see why one would want to rank people by contribution. I'm not interested in that. I think it creates a kind of community that I don't want to be in, personally.
I mean, that would be my personal feeling, but I would like to be able to say thank you even if a person only reports a bug: it takes time to report a bug. There was an idea, while talking about this informally earlier, of having some data sources used only to compute the time span but not the status of contributor. For example, mailing list traffic. Many people don't like the idea that if you send an email to a mailing list you are a Debian contributor. There are indeed many people that send lots of emails to mailing lists but do not contribute at all. However, if somebody is a contributor for some other reason, then it makes sense to look at mailing list involvement to say: well, you reported the bug today, but you were active on mailing lists for a year, so you are a Debian contributor because you reported the bug, and you've been a Debian contributor for a year because you've been into Debian for a long time. So, adding some data sources used only for computing the time span and not the status.
I would like to add something about the issue with detecting the people, because we had some experience with the team maintenance thing, and in UDD you have this carnivore database which is based on the key fingerprint, and even there we have somebody with five different names, name spellings and so on. People are using different email addresses, so I really, really, really doubt you can manage this for, say, more than 500 people, because it's manual work; we tried it and it's hard to cope with.
No, no, I don't want to do manual work.
Yeah, but, okay, good luck with automatic detection.
No, I don't want to do that much automatic detection either. I would like things to be fixed as much as possible at the lower level of the data sources; when that is not possible, I would like to offer people a way to fix the data for themselves. There's that unpronounceable and evil website that tracks developers, that in my opinion should be illegal but isn't.
It starts with O, O-L-H-O-R or something like that. I actually had an argument with their CEO: I asked to opt out of their system and they told me to fuck off. So, talking about privacy of these things, I guess if they can do what they do, we don't have a problem saying thank you to a bunch of people. But they do have a mess; their dataset is a mess, but people still like that. I am listed about 20 times in their system, and if I really cared about contributing my identity for free for them to send it out, then I could log in and merge these identities for them. I guess that's something we can offer, and that can involve things like carnivore and so on, and possibly we can have personal updates for contributors.
So they actively contribute to their individual identification?
Yeah, at least we've done all we can to thank them. Well, you know, if I tell you thank you, I don't expect you to say: you didn't thank me in the appropriate way. Right? On the other hand, I'm okay if people say: I'm the sort of person that doesn't need to be thanked so many times, and that's fine. But I'm also perfectly happy if people want to use multiple identities for contributing to Debian. Somebody in Debian Science may want to contribute to Debian Games under a different name. For similar reasons I don't want to track the time frame of contributions below the month level. I don't want to say you have contributed to Debian Games between 10 a.m. and 11:30 a.m. on a Thursday when you were in a meeting, right? So I want it to be coarse; it really needs to be mostly about thank you and about building reputation.
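Two ideas from the discussion, that some sources only extend the time span without conferring contributor status, and that time frames are not tracked below the month level, can be combined in a short sketch. Everything named here is an assumption made for illustration: the `Contribution` record, the source labels and the choice of `"mailing-list"` as the only span-only source are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contribution:
    identity: str   # whatever identifier the data source uses
    source: str     # assumed labels such as "bts", "upload", "mailing-list"
    day: date


# Assumption: these sources can lengthen a time span but never,
# on their own, make someone a contributor.
SPAN_ONLY_SOURCES = {"mailing-list"}


def to_month(day):
    """Coarsen a date to (year, month) so nothing finer is ever stored."""
    return (day.year, day.month)


def contributor_spans(contributions):
    """Return {identity: (first_month, last_month)} for actual contributors."""
    by_identity = {}
    for item in contributions:
        by_identity.setdefault(item.identity, []).append(item)

    spans = {}
    for identity, items in by_identity.items():
        # Skip identities that only appear in span-only sources.
        if all(item.source in SPAN_ONLY_SOURCES for item in items):
            continue
        months = [to_month(item.day) for item in items]
        spans[identity] = (min(months), max(months))
    return spans


# Example: alice reported one bug but was active on lists a year earlier;
# bob only ever posted to a mailing list.
example = [
    Contribution("alice@example.org", "mailing-list", date(2012, 8, 20)),
    Contribution("alice@example.org", "bts", date(2013, 8, 11)),
    Contribution("bob@example.org", "mailing-list", date(2013, 3, 2)),
]
spans = contributor_spans(example)
```

In this example alice is listed from August 2012 to August 2013 because the bug report grants the status and the mailing list activity extends the span, while bob is not listed at all.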
Reputation is also what we can reward people with, which is kind of the point of the exercise, and that reputation is nice to acknowledge. At the same time, it's nice that we can look up someone's reputation. At least as an account manager, if I can see somebody's reputation and have a reasonable look, then I can make the process much swifter for them. Especially for non-uploading DDs it's going to be very hard otherwise, because I can't go and look at package changelogs to see that they've been active for 10 years in the project, and if somebody is a non-uploading DD that does translations, at the moment I have a very hard time seeing what they've done; possibly I don't understand the language so well. So that makes it easier.
Another outcome that I really want to see out of this is that we start to actually see in front of us that Debian is not just about technical development. There's a lot more, but people still think: oh, I'd like to contribute to Debian but I'm not a technical person. And many of us may have difficulty in pointing out the places, but one can go and have a look at such a list of people and say: hey, look, there's people over there that do all sorts of other stuff. It could be indexed by topic of contribution as well: these are contributors that do uploads, these are contributors that do translations. That may be introduced at some point, so that we can turn on a spotlight on several aspects of Debian that we don't usually look into; we just take them for granted.
I have another question.
Another concern, more than anything. One thing you said at the beginning is that maybe some people would be quite interested in this for work reasons, for their CV. If it's as easy as filing a bug, which can be very hard or very easy to do depending on the bug and on how much you put into it, wouldn't this be like calling people to file trivial or easy stuff just for being in the list?
Yeah, well, at that point it really depends on the recruiters. If you're a recruiter that just does a random Google search and contacts whoever comes out, like Google recruiters: I keep being contacted about things like managing clusters, and if anyone had ever looked at what I do, I don't manage clusters. So if there are recruiters like that, then there's nothing you can do. If people care to actually go in and look at what the person has done, then there is a way to find out what the contributions actually were, just by linking to the usual mailing list archives, or uploads, or commit logs and so on.
So if I understand correctly, you are ready to use the BTS as a source for the contributors, and you could have a lot of nicknames that could be the Pope, or plk8, or the name of a serial killer or whatever. Are you ready to use that? Maybe there could be some issue with people that would be falsely credited.
Well, it depends; we try and see what happens. It could be that for the bug tracking system we need to have some extra intelligence built in, to avoid crediting China Manufacturing Corporation or Viagra Supplies For Cheap. Mailing list archives are filtered somehow. But again, maybe there's lots of noise and we want to set a threshold, so that one contribution to the bug tracking system is not going to be enough; or maybe we can look at whether that bug was closed. No, because you close spam bugs anyway. If there's cruft, if there's noise, then we can filter somehow, and maybe we lose some contributions, but at least it wasn't for lack of trying. I mean, I don't want to go out of my way to thank someone. If I need to find out where you live to come to your home and say thank you because I don't see you anymore, or because I met you in a crowded square and you gave me directions but I didn't take down your phone number, then it's unfair that you won't be thanked when I reach my destination, but that's life. So yeah, maybe it takes 10 emails to the BTS with different content to be acknowledged. I would leave freedom to the people doing the import, the ones getting the data out of the bug tracking system; give them freedom to work it out. And I don't want anything to be perfect; the system is not going to be perfect. If somebody would really like to be credited but isn't in the list, and what they do is report bugs, I would be surprised, because if somebody who wants to be credited only reported one bug, I can say: well, just report some more. I'm not sure I want to have people in Debian that contribute in order to be credited. If contributing some more is a way to get credited, then yeah, we don't need to be perfect; we just ask people to contribute some more. The problem is when people contribute a lot but they're not credited; then maybe we need to figure out how to import data from another part of Debian that we currently don't track so well. I'm totally not looking for perfection here.
Have you thought about other ways to use the mailing list archives? Because you said we don't always want to use them as a source of contribution, but more to track the length of contribution. But there are some lists, for example the l10n translation lists, where I guess most people who post are real contributors, and sometimes they also use some code: the email subject could be tagged, like request for review or something. So you could add some hints on the type of contribution, more precise than just "sent a mail to the mailing list". And maybe also for user mailing lists, not tracking the questions but only the answers, because while asking a question is not a real contribution to Debian, helping someone else to use the Debian system is a contribution, and one of the most important ones.
To start this, I guess decoding mailing list patterns fairly is not easy, as far as I understand, and I guess it changes from list to list. But that could be solved by adding a general mailing list source that only looks at the time frame, and having the translation team say: in our mailing list there's more semantic information that can be extracted, so I'll make an extra importer for my team that decodes things and puts them there properly. As long as you can have many things dump data into the bucket, then that works. I currently have no idea, well, I only take ideas, on what the protocol to collect this data should be. From the central bit, I wouldn't go much further than having a list of URLs from where I download something on a regular basis, because that's the simplest protocol; and what's in the file that gets downloaded could be identity, initial time, final time, and then each data source makes a file available somewhere. I was thinking of something super simple-minded like that. But then it's not enough, because if you want to make a link showing the contributions, then that needs something more about how to make this data available for collection at the central point. I want it as simple-minded as possible. I don't want the core of this to be about data collection, about mining; it should just be about collecting what has been mined.
Last question.
Just a follow-up on what you said: there's a student project on fedmsg, which makes it very easy to have a bus collecting lots of information from various sources. Among those there are already monitors that... I guess the BTS will be a source, yeah. But I guess even those events could carry enough metadata, like identifying a person. The source is fedmsg, F-E-D-M-S-G; it's from Fedora, and one of the students is trying to adapt it for Debian. It's just a bus where you get lots of events from various sources, and, assuming that it goes further than this summer, it could be an interesting use of this project.
Okay; as the mentor of this project, I agree. I think the student is arriving today.
So we're out of time, so thank you.
And we're good? Yeah. And we can talk about this: I plan to work on it during DebConf. Shortly there should be Martin Ferrari coming, who was also intending to work on a site similar to this, and I guess we should all get together and think of something. Feel free to stop me to talk about this and offer help. That's the main thing I want to do at DebConf, and I want to get something done before the end of DebConf so we can present it properly. Thank you very much.
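The "super simple-minded" collection idea mentioned in the discussion, where each data source publishes a file of identity, initial time and final time and a central point combines them, could look roughly like the sketch below. The file layout is an assumption made for this illustration (tab-separated fields, ISO dates, `#` comment lines); the talk does not define a format, and the actual download from a list of URLs is left out so that the example stays self-contained.

```python
import csv
import io
from datetime import date


def parse_source_file(text):
    """Parse one data source file into (identity, first, last) records.

    Assumed layout, one record per line:
        identity <TAB> first-date <TAB> last-date
    with dates in ISO format and lines starting with '#' ignored.
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        identity, first, last = row
        records.append(
            (identity, date.fromisoformat(first), date.fromisoformat(last))
        )
    return records


def merge_sources(files):
    """Combine several source files into one span per identity."""
    merged = {}
    for text in files:
        for identity, first, last in parse_source_file(text):
            if identity in merged:
                old_first, old_last = merged[identity]
                merged[identity] = (min(old_first, first), max(old_last, last))
            else:
                merged[identity] = (first, last)
    return merged


# Example with two hypothetical source files, as they might be
# published by an upload tracker and a bug tracker.
uploads_file = "jane@example.org\t2010-01-05\t2013-06-30\n"
bugs_file = (
    "# identity, first, last\n"
    "jane@example.org\t2009-11-02\t2012-02-14\n"
    "joe@example.org\t2013-08-11\t2013-08-11\n"
)
merged = merge_sources([uploads_file, bugs_file])
```

The central part here only combines what each source has already mined, which is in line with the wish that the core should collect results rather than do the mining itself.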