Hello, and welcome to the Debian Contributors talk, where I will present the status of the Debian Contributors idea towards the end of the conference, after we worked on it or talked about it for most of the conference. I spent about half an hour trying to lay out bullet points in OpenOffice, and then I thought it was too hard, and there's already something that does bullet points very nicely indeed. So I switched to a much more advanced, bleeding-edge presentation tool, which I think is fantastic, and we're trying it out.
So: Debian Contributors, a revolution in Debian project headwear fashion. We're making new hats for the Debian project; virtual hats, I mean titles. I presented the idea at the beginning of the conference to start the discussion, then collected a lot of input on it, did a little bit of prototype code as well, and now I'm going to show you.
So in Debian we have many kinds of contributors: Debian developers, Debian maintainers, Alioth committers, bug reporters, wiki contributors, translators, mailing list contributors, sponsored uploaders, package reviewers, language reviewers, conference organizers, events people, press people, system administrators, service developers, video team volunteers, food people, barbecue people, poetry people, dance teachers, massage people, cheese and wine organizers, and I'm sure I forget lots. So it's too complicated; therefore let's just do Debian contributors. Easy, that's it. Innovation. Everyone that helps Debian is a Debian contributor. You just get a hat by doing things, and that's how it should be. I wonder why it wasn't that way before, but anyway, nobody noticed, so now we do it and we say we invented something. We don't say that we missed something, we say that we invented something.
So what is a Debian contributor? There is no process to become a Debian contributor: you just contribute to Debian and you are a Debian contributor.
And when you are a Debian contributor you get such diverse elements of power as having your name on a list, which is very important indeed. You can also choose not to have your name on a list. You can see a summary of all your contributions, get acknowledged for your work, get thanked for your work, get reputation for your work. It is unacceptable that we still fail to do this for so many parts of Debian. And then we all think that Debian is only uploading packages, because the only thing we can see that happens in Debian is package uploads. Hey, there is so much more, right?
So the idea is that there is a central place that shows the list of Debian contributors, and this place gathers data from all of Debian. But the central place does not do data mining; it only accepts submissions. Debian is far too diverse. It is impossible to have a central place in Debian that knows everything that goes on and can understand commit logs from translators, or perhaps email subjects in translator mailing lists, and the dak database, and the bug tracking system, hacked mailbox files, and I don't even want to go into more details. So each team knows what goes on, and they are the people that best know about communicating that. So the idea is that each team can submit a track of contributions to this Debian contributors list, and of course it's going to be as easy as possible, otherwise teams will not do it.
However, many teams did not track contributions, not because they didn't want to, I think, but because they had other stuff to do. They didn't have a strong enough motivation to invent something because, well, if you are the games team, it's very good that the thing you mostly focus on is having good games in Debian, and the point of a team should not be to list the members of the team. That should not be the priority of the team.
However, now if there is a little motivation to tell who does work in the team, then perhaps people in the team can feel more a part of Debian, can actually feel that they are Debian contributors, which they are. So this gives us a chance to make visible a larger range of contributions that at the moment we cannot see, and I very much look forward to that.
So people contribute, and then maybe after some time people don't contribute anymore, and in Debian it's a problem to track when people stop uploading packages, for example. Here, instead, it's very simple. When a contributor is on the list, we also know the last month in which they contributed something, and if people haven't contributed after the year 2012, we can move them to a list of people who contributed until 2012. It's so easy, right? So the list manages itself when people become inactive, and inactive people still get credited, which I think is important.
I forgot to add one thing. If you are a Debian developer, you are not automatically a Debian contributor. You surely have been in the past, but if you haven't done anything in the last two years, then you are a Debian developer that hasn't contributed to Debian in the last two years. And it's good that Debian developers are the same as everyone else. If you want to be a Debian contributor, you contribute something to Debian. It is okay to be an inactive Debian developer. It has been observed that some Debian developers had kids recently, and perhaps for a year you have to focus on kids, or two years, or 18 years, so it's fine to be a Debian developer and not a Debian contributor, and then at least you get credited for what you do. It's a different thing from being trusted by the project to vote, for example.
Now, design choices for this system are not about perfection. When thinking about design choices for Debian contributors, I think it is useful to make a parallel with thanking people.
So I want to make a good effort at thanking people, but I do not want to go out of my way to thank people. If I were to thank a friend for driving me home because one evening, I don't know, my car wasn't there, whatever, then I'll just say thank you. You drove me home. I don't need to say thank you for driving me home tonight, and going out with me yesterday, and a week ago I recall we exchanged a few words on the phone, and a year ago I needed some change for a drink and you had it ready, and list all the contributions. I would have no friends if I did that every time. So it's good to make an effort to thank people, but that's it. Listing all the contributions of people goes beyond the scope of this project. And if a person isn't credited but can get credited by contributing a little bit more, that is okay. We don't need to go and look for the single contribution that, if we miss it, means that person can't be counted anymore, because it would be very hard to do.
We would need to think, for example, about listing contributors if they wrote anything to the bug tracking system. There is spam in the bug tracking system, so things need to be filtered somehow. So it's perfectly okay to say you need to have 5, 6, 10 mails in the bug tracking system over a span of a month. If that keeps spam away, then it's okay that we cannot thank people that report only one bug. It's a shame, but on the other hand, it means that such a system can actually be built, and we don't spend one year trying to get real contributions, and not spammers, out of the bug tracking system. However, if a person has done a lot of contributions already, then we should find a way to credit them. So if, once the system is deployed, we find out that there is a team that hasn't started contributing to this, we should help that team to thank their members. And I repeat, we do not need to be perfect.
I repeat, keep that in mind in the Q&A, in the question and answer session after the talk. We do not need to be perfect. You know, sometimes it's worth saying at that point that the intention is not to be perfect. Otherwise, you get asked very corner-case questions by very precise people. So yeah, we are somewhat lossy. We are saying that a person has been contributing to Debian from 2005 to 2013. That's it. Actually, you know, that person has uploaded packages in that time range. That person has made translations between 2008 and 2012. That person has been active in the bug tracking system since 1998. That's the sort of thing that I would like to see on that list.
So, coarse time granularity: a month will do. There is no point in saying thank you for buying me a drink yesterday at 8:37 and 52 seconds at those coordinates. You don't thank people like that, right? Just thank you. You bought me a drink recently. That's fine. Only begin and end months are stored for each contribution. It's okay not to track gaps. It's complicated to implement something that says you've been active from January 1998 to March 2001, and then you've done nothing from March 2001 to April 2006, and then from April 2006 to June 2006 you've uploaded one package, and then from... What's the point? Well, thanks for uploading packages between 1998 and 2013. Yes, everyone's been inactive for some gaps. I don't upload packages all the time. Sometimes I sleep, right? Just storing begin and end is enough. You have contributed to Debian in that time range. The system does not want to say how much people contributed to Debian, only that they did. And there could always be a link from here to a team page which has details about contributions. This system should link to that information but should not try to import it.
It is okay if new contributions take a day or two to show up on the list. I would like data sources to be able to just put their data mining thing in a cron job that runs once every night.
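
The coarse, gap-free bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `to_month`, `ContributionRange`, and `summarize` are hypothetical and do not come from the prototype shown in the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


def to_month(d: date) -> date:
    """Truncate a date to the first day of its month (coarse granularity)."""
    return d.replace(day=1)


@dataclass
class ContributionRange:
    """Hypothetical record: only the first and last active month are kept."""
    begin: date
    end: date


def summarize(events: Iterable[date]) -> Optional[ContributionRange]:
    """Collapse individual activity dates into one begin/end month range.

    Gaps between events are deliberately not tracked, and the number of
    events is discarded: the range says that someone contributed, not how much.
    """
    months = [to_month(d) for d in events]
    if not months:
        return None
    return ContributionRange(begin=min(months), end=max(months))


# Example: three uploads spread over fifteen years become a single range.
uploads = [date(1998, 1, 15), date(2006, 4, 3), date(2013, 8, 17)]
summary = summarize(uploads)
```

A data source could run something like this in its nightly job and submit only the resulting range.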
There's no point in doing anything more complicated than that. And if one night the cron job doesn't run, it's okay, nobody needs to rush and run it manually. We can just wait for the following night. The time granularity is a month. It's okay if it takes a day or two to show up on the list. And I don't think people should contribute to Debian and then start refreshing the list to see if they show up. People contribute to Debian because they have a reason to contribute to Debian, not because they want to end up on a list. Otherwise they can just contribute to a wiki and add their name to a list. Actually, we could make a separate tool, a site where you add your name to lists, so if people's highest goal in life really is to have their name on a list, they can go there and list themselves. But that's not the point of this. This is just to look at people's contributions. Yeah, it's okay to miss small things. I think that's been mentioned before.
And in terms of requirements for teams, these need to be very simple. If a team spends more than a couple of hours hacking a script to send me stuff, they'll give up. So submitting data must be as simple as it can be. Some teams can track history. FTP master does have a database with every upload that people have ever done in their life. So that can be queried, and they will know that I started uploading in year 2000, whatever. Other teams will only know who's a member of the team today, and they cannot tell me that a person has been a member since a given date. And that's perfectly okay. So both kinds of teams should be supported.
And of course, there's the problem of privacy. If I report a bug, perhaps I do not expect that my name shows up on a list of Debian contributors. Perhaps I was just typing something at the terminal. I didn't even have any idea that a mail would be sent out.
In fact, I don't know if people, when they first report a bug, realize that their email address will be disclosed on a publicly archived, Google-indexed website. So people should be able to know that they are going to be thanked, and choose not to be thanked. If, for example, they contribute to Debian during their boring paid time as nuclear facility monitoring people. And so ideally, you start contributing to Debian and after some time you get an email saying: oh, I noticed that you started contributing to Debian. Thank you very much for it. We would like to list you on that site. If it's okay with you, click on this link. That would be the general idea. We should not publish email addresses by default, otherwise we create a nice tool for spammers or for Google recruiters.
And we should allow people to manage what is credited to them. Maybe I would like to hide some types of contributions, because my boss thinks that I only do cloud stuff, and then you can see that I also contribute screenshots to the games team, and maybe I don't do it in paid time, but my boss is a bit weird and it's just best if they don't see it. My life will be easier, and I don't want to make people's lives harder because I thank them. You can be very obnoxious if you thank people too much.
Actually, I worked for a month teaching web app development for a course, and we were hosted by a center that dealt with psychiatric patients, people with mental problems. And so we were together with the people that were in the center. One of them said that he liked puzzles very much, the broken picture that you put together. And so one of the students of the course gave him a puzzle, and he was so grateful that every single day they met he would thank him. Very deeply and heartfelt thanks; it was very hard to hate him for it, but then every time we would go for dinner there would be that minute of thank you.
And sometimes we were teaching and the door would open and he would get in and say: by the way, thank you so much, I really appreciate your gift. That was too much thank you. So people should be able to say: okay, I've done this, fine. You don't need to thank me for it, actually. I'm fine just having done this. Yeah, people should be able to manage their online identity and what is visible of their contributions. However, since I am Debian account manager and I need to say whether a person should be a new Debian developer or not, it would be very useful for me to see, for somebody who has only uploaded one package yesterday, whether they were active in translations for the last 10 years. I should be able to see that, because then I can say that person must have some experience of how Debian works, although they've only uploaded a package yesterday.
So in the last 20 minutes, I'll show what we have made at Debian so far and take questions. I can take five minutes of questions on the general idea and then move on to show some details. Demo, demo. Okay, oh, this one was good.
So this is the data model: what we are collecting. For one data source, and for data source you can think of the wiki team, or FTP master that manages package uploads, or the Italian translation team, or all the translators. For each data source we have a name of the data source, so "wiki". A description, "the Debian Wiki". A URL, http://www.wiki.debian.org, and a list of possible contribution types with descriptions. Now for the wiki this would be editing or admin. For FTP master it could be uploading, sponsoring. For translators it could be translating. Other teams may have a complex workflow, and there could be reviewing and so on. So, what kind of contributions? This person has done uploads to FTP master from that date to that date. Now these are all the kinds of contributions you can do in that source, and it is used so that you get a description to show on the website. And that's it.
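
As a rough sketch of the data source description just outlined (name, description, URL, and a list of contribution types), the wiki example could be written down as a plain Python dictionary. The field names and the `describe` helper are assumptions made for illustration; the talk does not fix an exact format.

```python
from typing import Dict, List

# Hypothetical field names; the talk only says a source has a name, a
# description, a URL, and a list of contribution types with descriptions.
REQUIRED_FIELDS = ("name", "description", "url", "contribution_types")

wiki_source: Dict = {
    "name": "wiki",
    "description": "The Debian Wiki",
    "url": "http://www.wiki.debian.org",
    "contribution_types": [
        {"name": "editing", "description": "Editing wiki pages"},
        {"name": "admin", "description": "Administering the wiki"},
    ],
}


def describe(source: Dict) -> List[str]:
    """Return one display line per contribution type of a data source."""
    missing = [field for field in REQUIRED_FIELDS if field not in source]
    if missing:
        raise ValueError(f"data source is missing fields: {missing}")
    return [
        f"{source['name']}: {ctype['name']} ({ctype['description']})"
        for ctype in source["contribution_types"]
    ]


lines = describe(wiki_source)
```

The descriptions exist only so the website has something readable to show next to each kind of contribution.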
A description of a data source should just be this: something to show the name of the data source and a description of all possible contributions that can be done in it. And then data about one person doing something. Is it at all readable? Okay. So this has a list of many kinds of contributions. We are saying that the person has done uploading from this date to that date, and perhaps a URL to the page which tracks uploads in that source. And that's it. Now that describes the kind of contribution and the time it happened. And then we need to identify the person somehow. And so there are one or more, but generally one, identities. An identity could be the login to Debian systems or to Alioth, an email address, a GPG key fingerprint, a wiki name: something that identifies a person in that data source, and then the name. So the person with this email address has done that from this date to that date. That's it.
The idea is that this should be submitted via HTTP POST. Because if we have each data source make it publicly available on the internet, we risk disclosing information that people would not want to be disclosed. So a data source will collect this information and then post it via HTTP to the central Debian contributors list, together with a secret password, so that you don't wake up one morning and start contributing trash to the system. That is the general data model. Understandable?
Then I can show you the database model. We have a person; a person would be me, who does many things with many different identities in Debian. Identity not in the sense that one day I call myself Enrico and another day I call myself John, but identity in the sense that one day I call myself enrico@debian.org and one day I call myself GPG fingerprint with ID 797EBFAB. Because we don't have a single naming system in Debian, it's hard to tell if an email address corresponds to a key fingerprint. But a person can have many identities and choose to not show up on the list.
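
The submission step described above (a data source collects identifiers and contribution ranges, then posts them over HTTP together with a shared secret) might look roughly like the following Python sketch. Everything concrete here is an assumption for illustration: the endpoint URL, the JSON field names, carrying the token in the request body, and the helper `build_submission` are hypothetical. The sketch only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Dict, List


def build_submission(url: str, source: str, token: str,
                     records: List[Dict]) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one source's records."""
    payload = {
        "source": source,       # name of the submitting data source
        "auth_token": token,    # shared secret; assumed to travel in the body
        "records": records,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One person, known to this source by email, with one kind of contribution.
records = [{
    "identifiers": [{"type": "email", "id": "contributor@example.org"}],
    "contributions": [{
        "type": "uploading",
        "begin": "2005-03",     # month granularity
        "end": "2013-08",
        "url": "https://ftp-master.example.org/uploads/contributor",  # optional
    }],
}]

request = build_submission(
    "https://contributors.example.org/post",  # hypothetical endpoint
    "ftp-master",
    "not-a-real-secret",
    records,
)
# Sending would be: urllib.request.urlopen(request), ideally over HTTPS.
```

Posting privately, instead of having each team publish a file, keeps the raw identifiers off the public internet, which is the reason given in the talk for preferring this route.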
Then we have identity: type of identity (login, email, fingerprint, wiki name; possibly more will show up), name, and which person has claimed that identity. And again, I can choose to hide everything done in the name of my email address, or of one of my email addresses. So I can choose to upload game packages with one non-work email address, and I can choose to upload cloud packages with my work email address, and then I can choose which of the two I want to show. Probably both, maybe one of them, it depends. Now there's nothing wrong in working with the games team. It's just an example. Yes, a question? A microphone for sir? Can we have a microphone?
Will it be possible to maintain something like two identities, like one private identity and one work identity?
Ideally, yes. I would like exactly that to happen. In Debian, when one becomes a Debian developer, there is one identity, because Debian developers take responsibility for everything that happens in Debian and we need to know where they live, sort of. Actually we don't, but we don't tell people that we don't. So generally, yeah. Anyway, here we're just saying thank you, so there's no reason to be that strict. So yes, definitely. Now with GPG key IDs it gets tricky, but then I can hide contributions that only show up by GPG ID and only show the ones by email. I can also manage two person records and associate different identities with one or the other. That's the idea in the database model; it does not exist yet in the interface, but I would like that to be supported. The model I have in mind is that you see contributions by email and you say: that email is mine, claim this identity. You get sent an email with a link, you click on that link, and you attach it to your person. Or you say: that person is me, that other person is also me, make them one. Details to be sketched out, yeah.
So is that a bit like the Ohloh website that tracks open source contributions on different VCSs?
I mean, you can actually claim an ID in a VCS?
A bit similar to what they are doing, but you are the only one that's allowed to claim, and Ohloh also allowed other people to say something about you, which I find dreadful. But yeah, that would be the idea. By default each identity reported by data sources is separate. I can make an effort to merge some. I can take the keyring and check all the identities on a key that have been signed by a Debian developer, and therefore say all those correspond to the same person and to that key ID, and then automatically merge by default. I can do something like that. Perhaps I will have to do something like that, because otherwise I cannot see a fingerprint and then send a person an email, because the fingerprint is not an email, right? But, well, that remains to be seen. I don't want to do too much, yeah?
A couple of suggestions I have. First I'll give you a couple of suggestions that I don't think matter very much, and then talk about things that do matter. You can decide to throw those away. First, I would suggest changing your identity name to identifier name, because identities are kind of about humans.
Cool.
Secondly, I find it kind of weird that you're not using RFC 822 in your data format. If you could, that'd be cool; if not, whatever.
RFC?
Like email headers, key-value email headers.
What? A key fingerprint is not an email.
But what I'm saying is, you showed us about five minutes ago the data model, the syntax for what gets posted to you.
Yeah.
And that looks like JSON or something.
Yes.
If it could be RFC 822, that'd be cool.
Oh, the data format, do you mean?
It also doesn't matter, so whatever.
Possibly more than one data format as well. Some data sources may be very happy serializing to JSON, some to YAML. And it doesn't take much effort to...
Okay.
And then I've been working with some people to start a Debian welcoming team recently, which we actually have to merge with a different welcoming team, but that's another story. But in that welcoming team, it seems like we would also like to have the historical records that you have. So maybe we can talk afterwards about whether other internal teams with some privacy requirements can use the same internal data that DAM would get.
Yeah. I could see the missing in action team using this; even just the user-to-identifier mapping could be useful. Obviously with some privacy requirements. Otherwise, we are making some nasty echelon of doom.
Just to make people a bit less uneasy about that, I guess the idea is that if you have different persons with different identifiers attached to them, nobody will match them from the system; they will be kept separate.
Yeah. So if you want to keep these separate things separate, you can. This data would be private, and the idea is that it shouldn't be exposed or leaked.
Speaking of exposing or leaking the data: in addition to having other teams have access to the database that you have, it would be nice if other teams, like the welcoming team or MIA, might have access to the raw data that you have been given. So, I think that's it.
Yeah. No problem with that. The raw data is stored below.
So that's the data source: name, description, URL, authentication token for posting. A shared secret will do; SSL can do the encryption. Then a contribution type. So for one source, if I say uploading, it means this description, and one contribution of this type will be described as this. So this is, oops. This is "package uploaders". This is "uploaded a package". This shows up on top of the page. This shows up for each person. Then for each contribution type, I need to have settings that say: for this identifier and this contribution type, I don't want it to be shown. So do not show wiki contributions from this wiki name. I was silly at the time.
And then the contributions, which have a type, which is specific to a source; an identifier that has done that contribution; begin and end time; and a URL for tracking all package uploads of that person on the team site. And of course the URL is optional. It is a very simple database model. Would somebody want to see it in SQL? Was that "it would be good" or "we are good"? Okay.
So I've done a random generator for contributions, which is good for testing. Now in this case, the data source only knows when the contribution began, and the end defaults to today. In some other case, is there any with begin being...? No, of course not. Oh, there you are. This data source knows that a person has done a contribution today, but doesn't remember when it began. So begin can default to the previous time I had in the database. A data source does not need to track history. It all gets imported nicely. This one knows that that person is a sponsor. It doesn't know when it began, it doesn't know when they were last a sponsor. But we can assume that they were a sponsor until today. And hopefully they'll take that person off the list when they stop being a sponsor.
Finally, I've done an import function for this random stuff. And let's find a web browser. The beginning of randomly generated data, obviously. This needs a lot of thinking. At the moment, I show emails. I should not. But then what do I show? As I start getting data out of data sources, I will have a better idea of what I will show here. Obviously, I cannot show key fingerprints. So I will need to retrieve somebody's key from somewhere and have a look at the primary user ID. But then I cannot look at the primary user ID if it has not been signed by anyone, otherwise people may troll the system. Or I can let people troll the system. We have their key ID, so we can say that's the last time you uploaded to Debian. It depends on how it works there.
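
The defaulting rules for missing begin and end dates described above can be sketched as a small Python function. The name `fill_missing` is hypothetical, and one case is an assumption not covered in the talk: when there is neither a submitted begin nor a stored one, the sketch falls back to the end date.

```python
from datetime import date
from typing import Optional, Tuple


def fill_missing(submitted_begin: Optional[date],
                 submitted_end: Optional[date],
                 stored_begin: Optional[date],
                 today: date) -> Tuple[date, date]:
    """Compute the begin/end to store for one submitted contribution.

    - A missing end defaults to today (the source vouches for "now").
    - A missing begin defaults to the begin already in the database.
    - With no stored begin either, begin falls back to the end date
      (an assumption of this sketch, not stated in the talk).
    """
    end = submitted_end or today
    begin = submitted_begin or stored_begin or end
    return begin, end


today = date(2013, 8, 17)

# Source with history: knows when it began, end defaults to today.
with_history = fill_missing(date(2000, 1, 1), None, None, today)

# Source without history, person already in the database since 2010.
known_person = fill_missing(None, None, date(2010, 5, 1), today)

# Source without history, person seen for the first time.
new_person = fill_missing(None, None, None, today)
```

With rules like these, a team that only knows who its members are today can still submit useful data every night, and the stored begin date stays put between runs.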
Those are considerations that are yet to be made and can only be made once we have real data in front of us. Links go nowhere. That's what I managed to do this morning. But I already have the system that can import that kind of submission, compute the missing begin and end fields, show people on the list with the Debian website team, and compute the last contribution date. The day should be hidden as well. There's no point showing the day. Lunar?
Easy thing. If you want people to opt in to being on that list, let them give the system the name they want to show on that list.
Oh, yes. Totally in my plan. Absolutely. The tricky part is how to let them know that they can log in. So how do I call them before they have told me how to call them? But my gut feeling is that when I actually get to do it, the result will be obvious. There's no point in thinking too much before starting in this case. So that was the demo. Last round of Q&A, if there is still time? Three more questions, I think.
I have a comment about... So it's a list of contributions, but what we really are introducing is a new concept in the project. That's what I care about, more than technical details about the list. That's why it's so loosely, so imperfectly specified. I want to have a concept of Debian contributors. I want that shortly, in six months, in one year, in Debian we will be able to say: oh, we are setting up the welcome team. It's cool. And do we really need Debian developers only in that team? There can also be any Debian contributor with some reasonable experience. Any Debian contributor can do it. Any Debian contributor can join that team, or any person can join that team if that team can take new people. But let's have a concept of Debian contributors, where all Debian contributors can do things and are recognized as a resource to the project.
I have a question, but it's really not for Enrico, it's for the audience. So I think this sounds great.
Does anyone here not think this sounds great? Okay, cool. So I think we all love this idea. Thank you.