Yeah, [name inaudible], working at Red Hat. I think most people here know me, or maybe not. I'm actually working on LibreOffice, and before that on OpenOffice.org, and before that on StarOffice, for, well, 23 years now. During that time, starting in 2001 I think, I gave the first talk at the first OpenOffice.org conference, and since then I've dived deeper into the source code with each talk, until last year, when I explained some details of how I implemented table structured references. After that talk I just saw blank faces staring at me; it looked like nobody could follow me, and nobody had a question either.

So this year I wanted to talk about some new feature or whatever, and when I thought about what I would pick, I really had no idea what to do. Then I thought, well, why not just introduce who is working on these things, from a code contributor's side. I'm not talking about features; you can read those in the release notes anyway. Instead I spent some time drawing data from the Git repository to get some numbers and to either rectify or verify some assumptions I had.

Maybe you already heard the term "long tail" from Italo. It is about contributors to an open source project, where you always have this curve: a few people contribute very much and have many, many commits, and then the long tail is the many people who contribute maybe one to ten commits. There is a rule that if you take the number of commits and divide it by two, that number of commits is most times done by three, six, to maybe ten people. For the entire LibreOffice we reach this number already with the top five committers, who are just 1.7% of all authors, and these contributed half of all commits. I just wanted to find out whether that is the same or not in Calc. What do you expect?
Would it be a similar distribution, or would it be one peak that then falls off, and how many contributors are there? My assumption was that we would have maybe five core contributors, ten people working a little bit more on it, and maybe 20 or 30, I don't know, who now and then contributed one commit or so. I was quite surprised to find almost exactly the same distribution in Calc. We have the 2,100-something commits there, so 1,000-something is the half, and the top four authors contributed half of the commits; the other half was done by, in this case, 103 authors. So there are more authors than I actually thought. It seems I missed every single commit of all the people who did just one commit or so, because I never noticed them. And yes, it simply is the same long tail. So if you contributed just one commit, it means you contributed to one half of the entire changes in Calc, like all the other authors.

Doing these numbers for Calc and the entire LibreOffice, it just occurred to me that there's some fun factor in that; I call it the five fun factor here. I have no idea whether it is something meaningful or not. I also checked in Writer, and there it is about the same number. It is the average number of commits per author in relation to the total number of authors: if you divide one by the other, you get a factor of about five. It varies between projects, between four and six-something, which of course is related to the number of top committers and the average number, and this relation shifts the factor around. So my assumption is that if this factor varies a lot and is much higher than these five, then the fun of the project goes away, because you only have people working full-time on the project and no one else is contributing to it. You can do the math with the numbers for yourself. I'll give the actual commands later.
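To make the two measures above concrete, here is a minimal Python sketch. It assumes the factor is the number of authors divided by the average number of commits per author; that reading is an inference, chosen because it gives roughly five for the Calc figures quoted above (about 2,100 commits, 4 + 103 = 107 authors) and 0.016 for the four-author project mentioned later in the talk. The function names and the `toy` distribution are made up for illustration and are not real project data.

```python
def authors_for_half(commit_counts):
    """Return how many of the busiest authors it takes to reach half of all commits."""
    total = sum(commit_counts)
    running = 0
    # Walk through the authors from most to fewest commits.
    for n, count in enumerate(sorted(commit_counts, reverse=True), start=1):
        running += count
        if running * 2 >= total:
            return n
    return 0  # only reached for an empty list


def fun_factor(num_authors, num_commits):
    """Assumed definition: authors divided by the average commits per author."""
    average = num_commits / num_authors
    return num_authors / average


# Hypothetical long-tail distribution (commits per author), NOT real data.
toy = [400, 300, 200, 100] + [10] * 20 + [2] * 30 + [1] * 50

print("authors needed for half of the commits:", authors_for_half(toy), "of", len(toy))
# Figures quoted in the talk for Calc: roughly 2,100 commits and 107 authors.
print("fun factor for Calc (approximate inputs):", round(fun_factor(107, 2100), 2))
```

With per-author commit counts taken from a real repository, the same two functions can be used to check other projects.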
So I'm not sure whether this number has any meaning; it just occurred to me on the spot, just by eye, without calculating anything, and after that I did the same for Writer and found about the same number there.

The engagement and diversity in the project can actually be expressed by two numbers. One, other than the average, is the median, which means 50% of the authors commit more than this number of commits and 50% of the authors commit less. Then there is the ratio of the average to the median: the nearer that goes to one, the more uniform the project is and the more equal the distribution of commits per person. That means the nearer it goes to one, the less you have this long tail and the more you have a more or less flat line. If it's one, it's a flat line, exactly or almost exactly. Here we have numbers: for the entire LibreOffice project, for example, it's about 13.5, for Calc it's 6.7, and for Writer 9.5. So I think sane numbers are between 5 and 15 here.
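The average-to-median ratio described above can be sketched in a few lines of Python. The two input lists are hypothetical commits-per-author distributions, invented only to show the contrast between a flat line and a long tail; they are not taken from any repository.

```python
from statistics import mean, median


def average_to_median_ratio(commit_counts):
    """Ratio of mean to median commits per author; 1 means a flat distribution."""
    return mean(commit_counts) / median(commit_counts)


# Hypothetical data: four authors contributing exactly the same amount.
flat = [250, 250, 250, 250]
# Hypothetical data: a few heavy committers followed by a long tail.
long_tail = [400, 300, 200, 100] + [10] * 20 + [2] * 30 + [1] * 50

print("flat:", average_to_median_ratio(flat))
print("long tail:", round(average_to_median_ratio(long_tail), 1))
```

The more the heavy committers pull the average away from the median, the larger the ratio gets.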
Doing the same for a no-fun project: we have a particular project with four authors, where each author contributes more or less the same. Of course one is always busier than the others, but there are only four people working on it, and they contribute about the same amounts. In this case the top two do half of the work and the other two the other half. The fun factor I had before is not five here but 0.016, so there is no fun. And the ratio of average to median I mentioned on the slide before is exactly one: the average is 250, the median is 250, so exactly one, a flat line. So there is no diversity; there are no people coming in, doing something, wandering off, coming back after a year or whatever. It's just steady work. So there is no diversity, and this project has no fun factor.

The commit types in Calc: of these 2,100 commits, we have 444 code cleanups and fixes, mainly by Noel Grandin with all his plugins, and Stephan Bergmann coming along fixing things after Noel broke them, or whatever. We have 42 Coverity bugs, for example, mostly done by Caolán. And we have 570 bug-related commits. Bug-related here just means the bug number is mentioned somewhere in the commit summary, and this of course also includes enhancements, feature requests and so on. These were touched by 77 authors, which is about three quarters of all committers. Of that, 114 commits are in the QA unit test directory only, the bug-fix-related unit tests; that basically means these commits most times added some test related to the bug that was fixed.

We have over a thousand commits without bug numbers, which means someone just implemented something, changed some code or removed some code, without there being a bug for it, just out of interest or just to implement something. Like I do: I move things around or change code without a bug number; I just commit things. So half of the commits are just "let's do something"; we don't
have an actual bug as the reason for that, and that can be anything, from a small change to the largest-scale feature, whatever it takes. Of these over a thousand commits we had 419 commits in the unit tests, which is quite a lot, and given that 30 authors are involved in that, it's quite a high number. Of these 419, 219 are the new spreadsheet function tests that raal does; I always forget his real name and only remember his nickname. These are all the new function test documents we have: for each function there's one flat ODS document that tests this function, or can test the function, in various aspects with different parameters and things. That still leaves 200 unit test commits added to Calc that test something else, without any bug number, to improve the quality.

Then we have the commits that actually mention some TDF bug number. We have 566 commits with a TDF bug number, ranging from very old bugs like 30,456, which I think was reported already at the beginning of 2011, and the next one at the end of 2011, up to the 100,001-something, which is just from this summer. The two oldest ones are: one, enhance the Merge Cells dialog regarding emptying cells, where you can now choose whether you want to empty the cells, combine them into the first cell, or keep the strings in the cells where they were; and the other, improve the precision of the MDETERM function, which in some cases didn't produce an exact number where it should have produced one, or whatever.

In these 566 commits with a TDF bug number there were 327 unique bug numbers, so many bug numbers were mentioned more than once in commit summaries. In the top five there was my own implementation of wildcards, which I did for Excel compatibility and for ease of use for some users, so that they can be used instead of regular expressions. Then of course there were many commits for the new Excel functions that Winfried implemented; unfortunately he is not here, maybe he will come to FOSDEM
next year. And, well, translating German comments is still ongoing, it seems; we still have 10 commits with that bug number, so we also still have German comments, I just saw some the other day. And so on: some bug numbers with seven commits, and so on. The list is very long; I can show you if you are interested, I put it all in a pivot table where they are counted.

Next, of course, we have many lines changed. Under the entire sc module we had 583,000, no, 58,385 lines removed but 7168,440 lines added, so quite a big ratio. I also split this into the different subdirectories: core, filter, UI and QA. As for the graphs, the top graph shows the stacked percentage there. The middle one includes these function tests from raal, so you actually see each function test document, which is about, I don't know, 40,000 lines or so; it's a flat ODS, an XML file, and of course the lines are duplicated and multiplied. Stripping that out to get a real picture, the third column there is the actual distribution without these function test documents, so you get some picture of how the changes are distributed over the core, filter, UI and QA directories. The graph below is just the same, only with the actual numbers stacked, not as a percentage.

What is visible is that in core, for example, more is implemented than removed. In filter it's a little bit the other way around: more code is removed from the filters than is actually added. The reason is that the XLSX filter code originally used UNO to call back into the Calc core, and we are replacing that bit by bit with direct calls and then weeding out the UNO stuff and things like that. So that actually gets less code instead of more, even though we are able to import more things than before. You can also see that much is still going on in the UI there. And, well, the include directory is always touched as well if you make changes, so that's not that important.

Some commands are used to gather these
statistics, so when I put this online you can do that for other projects as well if you want. Basically it's always git shortlog and git log, and for the line changes I just did a git diff based on the commit from one year ago. I do some sed magic there to form a proper line, so you can just import it as a CSV file and evaluate things in Calc to generate graphs or get statistics, whatever.

All these were done without using a .mailmap file, which can be used for git log commands. Some authors can have different email addresses or even use different real names, and you can add these to the .mailmap file. I didn't do that. I just skimmed over it, and I think we have about three aliases in Calc, so I just ignored that. There are a few more in the entire project: we have about ten alias cases where authors show up as different authors even though they are one. So one could add these, but who wants to skim over a thousand people and spot which email address could belong to the other guy as well? Yeah, I didn't do that.

Just one fun graphic as the last one: these are the commits per hour of a week. This is from the entire project, actually, not only Calc. So that's just at what hours of the day people commit. Of course the night times are very rarely used, but even there, at four o'clock or five o'clock or six o'clock, people do commits. That's always local time, not UTC. And we actually have one spike, this huge red pile, and strangely that is Tuesday at eleven o'clock. I don't know why; maybe people who were working over the weekend still take a look at their code again on Monday and then finally commit it on Tuesday, I don't know. The other one is not very visible here, it's the light blue one; it's always around seventeen, in the afternoon, so before people leave work, I assume. You see these light blue ones in the background there, so that's the other spike. And Tuesday was funny.

So, questions? You're able to ask mean
questions. Mean questions get mean answers.
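The data gathering described in the talk (git shortlog and git log output turned into CSV for import into Calc) could look roughly like the Python sketch below. It is not the speaker's actual command set, which used sed and is not reproduced in this transcript. The path `sc`, the `--since` window, the function names and the sample author names are all assumptions made for illustration.

```python
import csv
import io
import subprocess
from collections import Counter


def shortlog_text(path="sc", since="1 year ago", repo="."):
    """Run `git shortlog -sn` for one directory; needs a git checkout at `repo`."""
    # HEAD is passed explicitly: without a revision and without a terminal,
    # git shortlog reads a log from standard input instead.
    cmd = ["git", "-C", repo, "shortlog", "-sn", f"--since={since}", "HEAD", "--", path]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_shortlog(text):
    """Turn lines of the form '   123<TAB>Author Name' into (name, commits) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        count, name = line.strip().split(None, 1)
        rows.append((name, int(count)))
    return rows


def to_csv(rows):
    """Format (name, commits) pairs as CSV text that a spreadsheet can import."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["author", "commits"])
    writer.writerows(rows)
    return out.getvalue()


def commits_per_weekday_hour(log_text):
    """Count lines such as 'Tue 11', e.g. as printed by
    git log --format=%ad --date=format:'%a %H' (time in the author's own zone)."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


# Hypothetical shortlog output, used here so the example runs without a repository.
sample = "   120\tAlice Example\n    45\tBob Example\n     1\tCarol Example\n"
print(to_csv(parse_shortlog(sample)))
print(commits_per_weekday_hour("Tue 11\nTue 11\nFri 17\n"))
```

If a .mailmap file is present in the repository, git shortlog applies it when grouping authors, which would merge the alias cases mentioned in the talk.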