Hi, and welcome to this session. We set up this session to talk a little bit about all the analytics and metrics that we collect within the OpenStack project, especially focusing on development activity, to get an overview of all the tools that we have right now, and also to use this time to gather feedback from consumers of the data, the people who actually look at it, so we get a better understanding of what the needs are. I'm Stefano Maffulli, the community manager of OpenStack, mainly focused on developers and the development of OpenStack, and we have three of the people who are mainly involved in building this metrics system, so I'll give you time to introduce yourselves. The way we want to structure this conversation is to have an overview, give the members of the panel a chance to talk about the tools that we have in place, and then have time to discuss why we're collecting all these metrics and get feedback from you.

Thank you, Stefano. My name is Alex Freedland. I am the co-founder and chairman of Mirantis, where these days I focus on our community efforts and participation.

Hi, I'm Dan Stangle. I'm a software engineer at HP in the open source program office, and I've been working on the Git DM analysis tools for a little more than six months now.

Well, I am Daniel Izquierdo. I am co-founder of Bitergia, a company specialized in open source software development metrics. We are, in this case, in charge of the activity board.

Perfect.
So when I joined OpenStack a couple of years ago, I was interested in understanding very rapidly the level of activity of every company inside the OpenStack ecosystem, so that I could help newcomers become effective more rapidly by discovering the places where they could focus their efforts, but also understand what the existing companies were doing inside OpenStack, because I had a hard time even understanding what Nova was: who was mostly active in Nova, who was increasingly or decreasingly active in Glance, and so on. That was my drive to start looking into the data of OpenStack, and then slowly things have evolved. So Alex, why don't we start by looking at one of the tools that we have available now to measure things inside the OpenStack community, which you have built.

Thank you. Do you want me to stand up? Sure. Okay, very good. So when we started to actually do the work in the community upstream, we decided that we needed to introduce metrics internally, and we developed a tool that we used internally that we liked so much that we just made it available for everybody else. That's how it was originally born, and I think it's here somewhere. That's right here: there is a website called stackalytics.com, and the tool is called Stackalytics. The idea behind it is that we wanted to take data from the publicly available data sources, from GitHub, from Gerrit, from mailing lists, and then look at this data from different angles. So you can look at the lines of code that people write, you can see who the people are who commit in a specific time frame, you can look at the number of commits; those are the different dimensions here. So let's look at something very simple. Let's look at Havana, for example: you choose Havana here in this drop-down, and then you decide what you're going to look at, right?
And here, there are multiple definitions of what constitutes OpenStack itself, so you can have just core projects, or you can have projects in incubation. We decided to list documentation separately; Infra is also a separate group. Then there is the greater Stackforge, and that's where people put different projects that sometimes are just innovations around the OpenStack ecosystem, or something that will later make its way into OpenStack proper. So you can look at each of those separately, or you can look at all of the innovation that's happening in the greater OpenStack ecosystem on Stackforge. So let's just look at the larger metric and what it shows you. This is a Windows machine and I'm not used to Windows now; it's been a while, two years since, after 24 years of using Windows, I switched, and I have a hard time remembering. Basically, it shows you the overall activity, so you can see how active people are, and you can see that here is a freeze and things are going slower. Then the companies that are actually contributing, and this particular metric is commits. So you look at how many commits, the number of commits per company, and then you can do a drill-down and see who made those commits. We can click on HP, and sure enough, Monty Taylor would be the number one guy, as we would have expected, right? He's right there listed, and then you can drill deeper, should you decide to go there, and see where Monty from HP committed, and so on and so forth. So basically it's a very primitive analytics tool, but it's also very visual. And the reason we like it is because the data is coming directly from the public sources.
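The two-level view described above, commits totaled per company and then a drill-down to the individuals behind one company's number, can be sketched in a few lines. This is only an illustration of the aggregation, with invented commit records and names, not Stackalytics' actual code or data:

```python
# Hypothetical sketch of the aggregation a tool like Stackalytics does:
# count commits per company, then drill down to individual authors.
from collections import Counter

commits = [
    {"author": "monty", "company": "HP"},
    {"author": "monty", "company": "HP"},
    {"author": "alice", "company": "Mirantis"},
    {"author": "bob",   "company": "HP"},
]

def commits_by_company(commits):
    """Top-level view: number of commits attributed to each company."""
    return Counter(c["company"] for c in commits)

def drill_down(commits, company):
    """Second level: who made those commits within one company."""
    per_author = Counter(c["author"] for c in commits if c["company"] == company)
    return per_author.most_common()  # sorted by commit count, descending

print(commits_by_company(commits))   # Counter({'HP': 3, 'Mirantis': 1})
print(drill_down(commits, "HP"))     # [('monty', 2), ('bob', 1)]
```

The real tool pulls these records from Gerrit, GitHub, and the mailing lists rather than from an in-memory list, but the ranking logic is the same shape.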
The source code for Stackalytics is available on Stackforge, and anybody who wants to contribute to it can. In fact, the philosophy that we wanted to subscribe to is that statistics of a project like OpenStack should be completely transparent to anybody who is participating in the project, and it should be governed by the community just like any other project. Another useful thing we use Stackalytics for internally is this: in open source, to be successful you have to build community. So how do you see how community is built? When we first put it out, people found out about it, we evangelized a little bit, and it was unclear whether it would become a popular tool. Today you can just go to the module selector here and pick Stackalytics, and it should come up. There it is, Stackalytics. Click. And we can take HP out for this particular report. Sure enough, in Havana most of it has been done by us, but you can see there is already a very large number of companies that are partaking in it and adding things to it. So there is somewhat of a community building around it, and we're hoping that in Icehouse the numbers will be larger; let's see what's happening in Icehouse. You see our participation goes from 80-plus percent to 70-plus percent, and I'm hoping that's going to continue to drop further, because we want everybody to participate in it. It's a platform that, in our opinion, all of us should develop further and use to see statistics about OpenStack. So finally, the statistics that we see today are based on blueprints completed and drafted, commits, emails on the mailing list, lines of code, reviews, and top mentors. That's just the beginning, and whatever other ideas we have we can put here, and hopefully you guys here and the community will offer ideas that can be integrated here, and join in the development.
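The participation figure mentioned above (one company's share going from 80-plus to 70-plus percent between releases) is a simple ratio of that company's commits to all commits in a release. A tiny sketch, with invented per-release numbers:

```python
# Minimal sketch of the "share of contributions" figure: one company's
# percentage of all commits, per release. All numbers are invented.
def share(per_company, company):
    """Return company's percentage of the total commits in the mapping."""
    total = sum(per_company.values())
    return 100.0 * per_company.get(company, 0) / total

havana = {"Mirantis": 82, "Others": 18}      # hypothetical commit counts
icehouse = {"Mirantis": 71, "Others": 29}

for name, data in [("Havana", havana), ("Icehouse", icehouse)]:
    print(f"{name}: {share(data, 'Mirantis'):.0f}% from Mirantis")
```

A falling share here is a healthy sign in context: the absolute contribution can grow while the percentage drops, as more companies join in.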
Let's just spend one minute describing the last one, top mentors. The top mentors is a heuristic that we use, so let me see who the top mentors for Havana would be. Oh yeah. We'll clean this, and we'll see it a lot bigger, so you can see the companies and the individuals who are actually doing reviews. If you look here, these are the people, and it essentially shows you the number of reviews, the largest number of reviews, and the metrics and ratios. I don't remember the exact heuristics we use, but essentially this is the level of usefulness that people have over the overall number of reviews, and how useful those reviews were to the community.

Yeah, I like it because it gives another view of how reviews are an important activity inside the OpenStack community. Reviewing somebody else's code is something that we have built into our systems from the very beginning, and we wanted to be very open: have committers, but let anybody submit code, and make sure that code is actually good through public reviews.

So we do the overall metric on reviews, and you can just do reviews on each individual project, but I think top mentors is the first one to go beyond that; reviews is right here. Yeah, top mentors is the first time we're doing some heuristics, like the effectiveness of reviews, and the algorithm is actually described on the wiki page, I just don't remember it offhand. But the idea here is that once you get the data, there are all kinds of things we can do with it, but again it has to be done in a transparent way for everybody to consume.

Cool. Dan, do you want to tell us what you're doing?
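The speakers don't quote the actual top mentors formula (it lives on the wiki), so the following is only a hypothetical illustration of the shape of such a heuristic: rank reviewers by review volume weighted by a "usefulness" ratio, for instance how often a review agreed with the final outcome. The reviewer names and counts are invented:

```python
# Hypothetical review-effectiveness heuristic, NOT Stackalytics' actual
# top mentors algorithm: volume of reviews scaled by a usefulness ratio.
def mentor_score(total_reviews, useful_reviews):
    usefulness = useful_reviews / total_reviews   # fraction judged useful
    return total_reviews * usefulness

reviewers = {
    "reviewer_a": (200, 150),   # (total reviews, useful reviews) -- invented
    "reviewer_b": (300, 120),
    "reviewer_c": (100, 90),
}

ranked = sorted(reviewers, key=lambda r: mentor_score(*reviewers[r]), reverse=True)
print(ranked)   # ['reviewer_a', 'reviewer_b', 'reviewer_c']
```

Note how the ranking differs from raw volume: reviewer_b does the most reviews but scores below reviewer_a, whose reviews are useful more often, which is exactly the kind of nuance a mentorship metric is after.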
Yeah, so I won't take too long on Git DM, but just briefly: I'm Dan Stangle, and Git DM is kind of the Ford Model T of contribution analysis tools. It's very simple, fairly reliable, and easy to use, but it doesn't have a whole lot of features; for just about anything you want to do, it doesn't go very fast and it doesn't look very pretty, but it does the job, it'll get you there. When HP asked me to start running and compiling community contribution metrics, that was kind of the only game in town, so we sort of took over from where Mark McLoughlin left off. And like I said, it's really simple: this, on the command line, is basically all you do to run a Git DM analysis. But it doesn't have the nicer UI features that you find in the Bitergia tool set or in Stackalytics, and it doesn't have a lot of potential for future growth, because almost everything that we've had to do to get Git DM to work with the OpenStack community architecture has been somewhat of a kludge. So I think there are newer, better tool sets available, some of them built in-house or within the OpenStack community, that in the future may provide a better option for us. But we'll keep working on Git DM because it works and it's there; it's not going away, but it probably won't look any better in six months.

Could you, so you mentioned that HP, somebody at HP, asked you to compile statistics. What kind of data are you collecting, and why?

Yeah, so I'll give you a quick glance. Git DM doesn't produce these charts; unlike Stackalytics, you kind of have to do this by hand, importing CSV files and such. But Git DM covers all the most basic first-order metrics: lines of code, number of changesets, for the most part Launchpad defects, although that's sort of broken right now for some of the projects, and Gerrit code reviews. So it's
kind of just the basic, by-the-numbers canvassing of what's happening in the community.

And why is your company, why was and is HP interested in getting this data?

Yeah, a lot of people ask that question. Over the last couple of years HP has, I think somewhat obviously, ramped up our investment in OpenStack dramatically, so there are a lot of things that we can do internally to gauge how well we're doing and how much we're able to generate and contribute back. This is just another, in this case external, measure of what we're able to do in the community, and it also, frankly, gives us a better picture of what everybody else is doing. So it's valuable on lots of different levels.

Cool, thank you. So, Daniel?

I prepared some slides here, and I also have a live demo if you want to have a look. What we want to provide here, as we were discussing with Stefano and some other guys from the OpenStack community, is development data about how the OpenStack Foundation is performing, how companies are doing, and how everyone fits in; as Joshua said in the previous talk, it's something like chaotic, so I think we can try to understand this and try to have a path all together to provide such information. In this tool we are providing some basic and some advanced metrics. This is just an example; we can go later to the live demo. Here you can have information about the whole life of all of the projects aggregated, measured in commits, so there is something like 50,000 commits, and then you can have information about the seven-day change, or 30-day change, or anything else. You can see, for instance, that during the last year there was an increase of 73% with respect to the previous year, which is quite a lot. And then we can have something a bit more elaborate, because what we are doing here, in the end, is counting potatoes. We can say, okay,
these guys are from this company, these other guys are from that company, so we can say that all of these commits are coming from such company or such organization. But we can probably go a step further and try to have something more useful; I wouldn't say what we have now is not useful, but it's basic, let's say, and we can go a bit further, like the top mentors idea, which is great, or, I don't know if you know them, the Russell Bryant metrics about code review timing, etc. I think we should go there, and what we are providing here is just an example. These are, well, really ugly charts, but they are population pyramids. The left part is the developers that are still developing, and the right part is the developers that were born, that came in, at some point. So if you check here, if you go to the 15, which basically means 15 months old, this says that there are 47 guys that are still committing since they were born 15 months ago, and on the right side you can check that 15 months ago there were 122 new developers coming to the OpenStack community. It means that the retention rate of the OpenStack community for that cohort of 15 months ago is close to 40%. This is the kind of thing we can have, and we can have the evolution of this retention rate, or the attraction rate as well. This is the kind of useful thing we think we can provide to the OpenStack community. This is based on open source tools, the MetricsGrimoire community; it's all on GitHub, so go there and download all of the tools. I would say it's a bit more complex than Stackalytics or Git DM, but then what you get is all of the data: you can get MySQL databases, or you can get JSON files, to build your own dashboards of metrics or feed any other visualization tool you may have. So, some of the things that Stefano asked us to talk about was the future of these tools. One of the
things I would like to see for the OpenStack community is to have, let's say, open data for everyone, and to have a quality source of data. This is probably something that we have to do all together: we were all dealing, in different ways, with unique identities and affiliations for each of the companies, and, as again Joshua said, Monty Taylor was one of the developers who changed companies a lot, so this is the kind of thing that we should put together and work on, so we have better data in the end. Another point here is that all this visualization platform that we have is quite easily extensible by others: once you get the JSON files and everything, you can simply change metrics around. And the question here is probably: what would you like to have and to see, maybe from a developer point of view, maybe from a more managerial point of view, maybe from third parties just willing to invest some resources in the community? Let me show you. This is the main website of the activity board. You can see some information around: the first line here is Git activity, some information about the community, some evolution here; this is the chart I copied into the slides. But you also have participants in Launchpad activity, what type of things they are doing, also evolution, etc., and the same for mailing lists here. You see that there are close to 4,000 different ticket participants at some point in Launchpad, which is good, and you also have discussion participants in mailing lists, which is close to 1,600. Then there are a bunch of options around, like the study published at each release. Oh yeah, the summary, okay. Well, this is another example, another view of how the community is evolving; this is specifically for the case of OpenStack for the
Havana release, and you can see again a column for each of the companies here. This is just another view; all of these charts you can visualize here are based on the same JSON files, so, as I said, you can take all of that open data and feed your own visualization platform. This is what we think might be interesting for you, but it would be great to have feedback from you, and we can build other things if you are interested. It's the same for Launchpad activity or mailing list activity, etc. Just let me throw out some other ideas we have to go further with this: metrics like time to fix or time to review, as done by Russell, or mentors metrics, or maybe we can even go for meetup measurements, like the number of people attending, etc. So yeah, just some ideas.

Yeah, I know I have a bunch of needs from the community manager perspective, and I keep asking Bitergia and the others to contribute to their tools, to get their tools improved. But if we have questions from the audience: what would you like to see, what kind of metrics do you collect about OpenStack, or would you like us to collect and expose?

It's absolutely true. I think it's a very difficult line to balance: on one hand, you cannot improve what you do not measure, and on the other hand, every time you measure something, or look at something, you disturb it. Physicists know this very well. So yeah, you guys, what do you think? You're the experts in stats, and smart people in the room. It's a hard problem, so how do we solve it, how do we address it?

Yeah, I mean, I work at a large company, and a lot of the feedback that I hear is more on the negative vein, that we're somehow trying to game or influence the metrics to make
us look better, and I think that kind of criticism might be leveled at other players too. But from my perspective, I don't think that's necessarily the case, and I think it's better to have the measures in an open and transparent form, where everybody can see how they are generated, and the data behind them, rather than not have them available in any form. So there are benefits and there are negatives to this gamification, but, and I don't know if I speak for most of us, I'd rather have the data than not, and take that risk.

Yes. I have a question for the audience: who is using what? This group here, do you actually measure statistics? Do you care, do you look at it? Who uses Stackalytics today? And on this side of the house, who uses some kind of tool to measure statistics? You raised your hand: what tool do you use?

If you don't mind, I'll take that. I haven't been thinking about defining specific profiles for groups, but for individuals I did have something like that in mind. I have it here, on the second part of the activity board. The activity board right now is made of two main tools: one is the dashboard that Daniel showed, and the other is this tool that basically aggregates data from the different tools that we have right now, Launchpad, Git, and Gerrit, and builds profiles for each of the developers. It doesn't look very good, right? For example, we get personal pages for each of the people involved in solving bugs, committing code, or doing reviews, pulling information from, this is the OpenStack ID, the profile page for the member of the OpenStack Foundation. We get pictures and things, and then a little bit of charts; it doesn't look great at this resolution, I have to say. And we also get, like Stackalytics, the details of the
individual commit, with the date and the repository, and this tool is capable of creating connections between contexts, so if there is a commit that is related to a bug, then it shows the dependency between the two on one page. So this is an initial piece; we could try to extrapolate behaviors and create generic profiles.

I'd also add that there are some third-party providers of that type of profiling information. A great example is Ohloh.

So you're talking about, oh, you mean like personas for StoryBoard. Okay, cool.

So I just want to throw something that I mentioned briefly more specifically at this audience. In open source, the way you develop technology is that people come up with good ideas, they follow the process, and the community moves it forward. So why should statistics be any different? We all care about them for different reasons, and we have wonderful ideas about what can be measured, and the way to handle this in the community is for whoever has an idea to publish a blueprint in any of the tools that make sense, and then the people who think it's a good idea and are passionate about it will go and get it done. And then there is a weekly call and IRC chat, and if there are different groups trying to do similar things, you get together, you talk, and you decide how to marry them, because clearly there is a lot in common. So I noticed, for example, in your statistics that NEC suddenly became a large contributor in lines of code. We saw that too, but I think there was some synthetic renaming, and that kind of threw the statistics off, so we have some logic internally that discounts that number, and I think you guys have that as well. That's an example of how maybe the way we process lines of code, and the heuristics of that, could be taken into a separate place, and
we all can share in it, so we don't have to write three algorithms, and in fact we agree on how we do the basic data processing. And maybe the Bitergia tools, if you guys have spent time thinking about that, should become the foundation for that. But for all the ideas we have about what to measure: blueprint it in the community, and let's just do it together. We don't know what we don't know, and there will be so many things. I mean, the reason we have this is because it came from the community. We started with lines of code, we got tremendous criticism, and people said you can't just do lines of code, it means nothing, you should do reviews; oh, you should do commits, and we did commits; you should do this and that, and we did. And now you have Josh fixing bugs, and other people saying, well, this doesn't reflect things properly, and now they're fixing it. So it starts to live, and I think that's the best way in open source: let's measure it ourselves, let's have a community-transparent way of doing it.

Absolutely. Are we running out of time? Oh no, we still have three minutes. Daniel, you were about to say something?

I would say that probably the point is to have this quality data as a base for all of us, and then to produce whatever other charts you may be interested in. All of the data could be the feed for your own dashboards, let's say. That's probably our point here: go for this, and then anyone can choose whatever they want to visualize.

True. I think that one of the main points of concern was, and still is, the affiliation between people, the individual engineer or the individual person who has reported a bug, and the company that he or she works for. The affiliation data field is mandated by the OpenStack Foundation membership, so
remember, when you need to commit code you need to be a member of the Foundation, and being a member of the Foundation forces you to declare explicitly, and keep current, who you're working for, or who has paid you sixty thousand dollars during the past 12 months to work on OpenStack, and that affiliation needs to be shown in the individual member profile. Until a week ago we had a problem inside the member database that prevented people from logging more than one affiliation, and prevented logging multiple affiliations with the time period when each affiliation was actually valid. We landed a patch before the summit, so now it's possible for you to specify the time when you were working for HP, for IBM, or whoever, finally. And in the next month we're going to make all of this code that runs OpenStack.org public, so we will be able to accept patches. So at least I think we will get a much better source of affiliation data, so that every project that is right now recreating a master data source will have a master data source, or at least a better way to improve the data.

Yeah, and I think key to that, which you're also working on, is making it really simple for individuals to update their profiles, seamlessly and across all the projects. Right now, because of the shortcomings of the OpenStack Foundation's member database, I think we have all arrived at our own individual databases of user affiliations for each of our efforts, and that's a silly duplication of effort that we don't need.

The other thing the Foundation is doing is making this member profile more useful, because right now you only need to create it once and then you basically forget about it: you needed to create it when you wanted to land your first patch and realized that you need to be a member first, so you created it and you're done.
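The multi-affiliation fix described above amounts to date-aware attribution: a developer carries a list of affiliations with validity ranges, and each commit is attributed to whichever one covers its date. A minimal sketch; the handle, companies, and dates below are hypothetical, not taken from the member database:

```python
# Sketch of date-aware affiliation resolution: each developer has
# (company, start, end) ranges, and a commit is attributed to the
# range covering its date. end=None means the current affiliation.
from datetime import date

affiliations = {
    "mordred": [  # hypothetical example data
        ("HP", date(2012, 7, 1), None),
        ("Rackspace", date(2009, 1, 1), date(2012, 6, 30)),
    ],
}

def company_for(developer, commit_date):
    """Return the company whose validity range covers commit_date."""
    for company, start, end in affiliations.get(developer, []):
        if start <= commit_date and (end is None or commit_date <= end):
            return company
    return "Unknown"

print(company_for("mordred", date(2011, 3, 15)))  # Rackspace
print(company_for("mordred", date(2013, 11, 5)))  # HP
```

With a single shared source of ranges like this, the per-project affiliation databases the panel complains about become unnecessary: every tool resolves the same commit to the same company.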
We're launching a new portal for user groups that will use an OpenID provider, and the OpenID provider is the members database. So starting now with the user groups portal, but later also with other properties, we will switch from using Launchpad as the OpenID provider to OpenStack.org being the OpenID provider. So at least there will be more reasons for you to log in to this system and keep your data up to date, slowly, well, actually I hope rapidly. And all of this will be available publicly on Git at git.openstack.org, with Gerrit reviews and all of that. So, I see people coming in, and we are probably done. If there are no more questions, I would say thank you, thank you all.