Okay, good morning. Can we start? Well, thank you for being here. I'm Jesus González Barahona, and I'm going to talk about the numbers of four of the main cloud projects nowadays. If you want, and if you have some device connected to the internet, you can download the slides from bit.ly/[unclear]. There are a lot of URLs to click on in them, so if you download them you can go to the URLs that I'm going to show during the presentation.
This is the plan for what I'm going to talk about. First of all, I'm going to provide you with a bit of context about myself and the study itself. Then I'm going to talk about three main aspects of each of the four open cloud systems: the source code, I mean what they are producing; the process, that is, how they are producing it; and the community, who is producing those systems. Then, since this is the OpenStack Summit, I'm going to include some extra information on a couple of aspects of OpenStack, which I hope are interesting to you. And, well, some conclusions, the usual conclusions at the end.
So, to begin with, I have like two hats. One is my company hat.
I am the founder of Bitergia, a company doing software development analytics, and it is in this context that I mainly did the analysis, with the help of my colleagues there. I'm also at the Universidad Rey Juan Carlos, where I've been working for a while trying to understand software development.
The four systems that I'm going to consider in the study are CloudStack, Eucalyptus, OpenStack and OpenNebula. Obviously there are some others, but these are probably the most popular ones when it comes to open source systems in the, let's say, cloud infrastructure domain. I'm going to try not to talk about them in any special order, really. The only thing is that I'm going to try to order them from less to more, less activity or less size or whichever, to more, just to ease the presentation. But that's it.
All of these systems claim to be similar in some respects. Obviously there are differences; they are not exactly the same, but they are sort of comparable. All of them are free, open source software, but they have different licenses, they belong to different communities, they use different business models, and they are written predominantly in different languages.
Obviously, there are many differences there. All of them are popular, but of course they have different levels of popularity, different markets, different targets, different ways of understanding what a system like this is. So we have systems that are comparable, but it's very important that we realize that they are not really the same thing. So this is not a ranking or something like that. This is just taking four different systems and trying to get some numbers from them. The conclusion is not going to be "this one is better than that one", because they are similar enough to be compared, but too different to be ranked. That's something you have to keep in mind during the whole presentation.
So let's go to the study. First of all, what we did and what we didn't. We are focusing on how these systems are developed. That means, as I said at the beginning, looking at the source code, which is basically what they produce; looking at the processes, how they are performing in terms of activity and some other metrics; and looking at the community, who is working on it. The source code is obviously important because it is the product that you finally have, but the metrics here are going to be very simple, because this is not intended to be a quality analysis or something like that. The idea is just sizes, languages, things like that, to put into context the other two parts, which from my point of view are the really important ones.
With respect to processes, what we are going to see is how active the communities are. That's very important, because more activity means more, let's say, momentum around the software. And we are also going to see how big the community is, and we're going to talk a bit about it.
Its structure, who is contributing to it: that's also important, because the future of these systems obviously relies on the community. So if you look at the community, to some extent you are seeing how this product is going to behave in the future, at least in terms of activity and other things.
We didn't analyze functionality, so you are not going to see here how many functions the systems have or things like that. We didn't evaluate runtime performance either, and we didn't evaluate popularity. There are studies out there that you can consult if you are interested in that.
For each of the projects we have produced a dashboard. This dashboard basically includes the kind of information that you have at the top of the slide. This is the OpenNebula dashboard. You can click on the URL below and go to the real thing. There are many different panels with information about different aspects. I'm only going to summarize the main numbers, but you can drill down into a lot of detail if you go to the real dashboard and work a bit with it.
All of the dashboards are structured the same way on the main page. On top you have source code management, which is git for all four of these systems. The second row is tickets, and the lower row is mailing lists. The upper row basically tells you how the system is being developed in terms of number of commits, which is the middle chart, and number of authors, which is the chart on the right. If you look at each of the rows, you always see persons on the right and activity in the middle. For git, activity is commits; for tickets, activity is closing tickets; and for mailing lists, activity is messages sent to the mailing list. In the right column you have first the authors of commits. Then you have ticket closers.
I mean persons closing tickets, unique persons. And on the bottom right you can see authors of messages, unique authors. If you look at those charts you can very quickly see the evolution, and from the raw numbers you can at least get an idea of the order of magnitude of the activity that they are having and of the people contributing to that activity.
I'm going to go quickly, because I will come back to this later. This is for Eucalyptus. Again, if you click on the URL you go to the dashboard. In this case we don't have mailing lists, but we have git and the tickets. This is for CloudStack; again, you have git, tickets and mailing lists. And this is OpenStack. With OpenStack maybe you are a bit more familiar, because you can find it at activity.openstack.org. Again, you have the usual rows and columns.
Just pretty quickly, look only at the left column. I'm going to go back through the other slides so you can very quickly realize the differences in volume. This is OpenStack. Look now: this is CloudStack, look at the left. This is Eucalyptus; again, look at the left. And this is OpenNebula. So just a very quick view.
You can see that obviously there are differences, and those differences, as I'm going to say later, matter a lot.
Just to finish with this introduction to the study: we also did a kind of transparency analysis. The main point here is to what extent these projects are transparent, in the sense that we can really understand how they are being developed. You know that not all open source projects provide you with enough information to know how they are developed. Some are developed in-house, for instance, and you don't have any information about that. In this case, we found that all of them have public source code management repositories in git, and it seems that the activity really goes on there. It's not just a matter of dumping code from time to time; they are really using those repositories for development, so you can really measure activity by looking at the git repository. All the code seems to be there, and it seems to be the main place for development.
In the case of OpenStack and CloudStack, all tickets are in public issue tracking systems. That's important because both projects have as a policy that if a ticket belongs to, I mean has to do with, the community, it should be in the community ticketing system. Of course there are companies with internal ticketing systems, but at the moment that something becomes an issue for the community, it has to come to the, let's say, official ticket system. Then there is the case of OpenNebula and Eucalyptus.
That's not that clear for them. When we contacted both, they both said that maybe there were tickets in other systems. Both are led by a company, so it's very likely that, well, the companies have an internal ticketing system for clients or customers or whichever. That only means that for OpenStack and CloudStack the activity with respect to tickets seems to be fair and comparable, while for OpenNebula and Eucalyptus it could be underestimated, because a part of the story goes on elsewhere. So from the point of view of transparency, we are probably lacking a part of the information there.
Let's go to the stuff. So, source code. This is a very basic analysis of size. You can see how all of them have, well, a similar size to some extent. By the way, this is July; since July things have changed a bit, but the order of magnitude remains the same. And you can quickly realize how OpenNebula is an outlier compared to the others: it's much smaller. Despite this, they claim to have similar functionality to the others. I don't know why; I don't know the details of the project well enough to know about that. But the difference is here, and it's not just a matter of these guys implementing a bit less functionality. There is something else there.
With respect to languages, the mixture in each of the projects is really different. This is OpenNebula, and you can see how OpenNebula is basically a C++ and Ruby thing. Of course there's JavaScript; all of them have JavaScript. But with respect to, let's say, the server architecture, it's Ruby and C++. If you go to Eucalyptus, well, again you have JavaScript, you have HTML, things like that, but it's basically Java, right? For CloudStack it's even clearer: basically Java, so it's almost mono-language, with some Python. And OpenStack is, well, you know, basically Python. There are some other things, but it's basically Python.
So these differences in languages are interesting, because the systems are very different because of that: the facilities that the different languages provide are different too, and that means the architectures of the systems are different, and so on. And again, it's very interesting that all of them are in the same domain, because you could go there and look at how the different projects are using the different languages to provide similar functionality, and in which ways they are working with them. Some of the languages are interpreted and some are compiled; that can also make a difference. Well, as I said, this is just to put into context the rest of the study, which from my point of view is the interesting meat.
So let's go there now: the process. This is basically something that we already saw in the previous dashboards, but right now we are going to focus on activity. On the left you have people active, and on the right you have activity. As I said, for git, which is the upper part of the slide, activity is commits, and for tickets we have considered closed tickets. In both cases you can see that there is a reasonable community.
This distinction here comes just from looking at how people are contributing. You know that in most free software communities something like 10% of the people do something like 80% of the code. So in this case we have used that 80% to define what is core. These seven guys have written 80% of the commits to this project, OpenNebula in this case, so it's a community with a core of seven developers. Regular are those that contributed the commits from 80 to 95 percent. And, well, these are the guys doing the rest, I mean the last five percent of the code, so they are, let's say, occasional contributors.
With respect to ticket participants, we have also characterized them as fixers, I mean people who are actually closing tickets, and submitters. Submitters tell you how many people are interested enough to open a ticket.
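As a rough illustration of the core / regular / occasional split described above, here is a minimal Python sketch. It is not the tooling used for the study: the function name `classify_authors`, the input format (one author name per commit) and the tie-breaking order are assumptions made only for this example.

```python
from collections import Counter


def classify_authors(commit_authors, core_pct=80, regular_pct=95):
    """Split authors into core / regular / occasional by cumulative commit share.

    commit_authors: iterable with one author name per commit (hypothetical input).
    Authors are ranked by number of commits; those needed to reach core_pct of
    all commits are "core", those covering up to regular_pct are "regular",
    and everybody else is "occasional".
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    groups = {"core": [], "regular": [], "occasional": []}
    covered = 0  # commits already accounted for by more active authors
    # Most active authors first; ties broken by name so the result is stable.
    for author, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered * 100 < core_pct * total:
            groups["core"].append(author)
        elif covered * 100 < regular_pct * total:
            groups["regular"].append(author)
        else:
            groups["occasional"].append(author)
        covered += n
    return groups
```

With 100 commits split 50/30/15/5 among four authors, the first two would form the core, the third would be regular and the fourth occasional.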
I mean, those are people involved to some extent, but they are not usually developers. Developers are those really fixing the bugs and landing the code. So in this case we have a community of fixers which is, you can see, very similar to the core. This is usual in most projects: the number of people fixing is quite related to the number of people in the core development team, because they are basically the same guys.
With respect to activity, you can see an increasing trend, but it's in the order of, I don't know if you can see the numbers on the left, 300 to 400 commits a month, which is a fair amount, but not that big an amount compared to some other players that will come later. And with respect to closed tickets, I already said that maybe this is underestimated, but it's in the order of 50 to 100 per month.
Okay, next one: Eucalyptus. By the way, I have a bug here that I realized just a moment ago. This number should be the sum of these, so 200 and something is the total number of developers. Sorry for the mistake. You can see here a core team that is a bit bigger, well, like three times bigger, and the number of fixers, which as I already said is usually correlated, is also a bit bigger. But the number of submitters is smaller. So again, we are probably not seeing all the activity; maybe there are some other tickets in another place and we are not measuring them. If you look at the trend in the code, you can see again an upward trend, and you can see that it is a bit bigger than the previous one: this is between 500 and 1,000, so this is the 1,000 line, right, per month. And this is the number of tickets closed.
So the chart by itself shows that there is activity somewhere else, because obviously the number is very low, and the project is too big for that to be real.
CloudStack. We have more core developers, a bigger core, and we have a very large number of fixers. So in this community it seems that to fix bugs you don't really need to be in the core team; some of the other people are also fixing bugs. And the number of submitters is bigger too, so there are more people involved up to the point of at least opening a ticket, which is a barrier. With respect to the trend, it's a bit more stable than the others, and it's in the order of 1,000 commits per month. This is the 1,000 line, right? And with respect to tickets, well, they had a peak around here, probably because they were closing old tickets or something, but you can see that the usual number of tickets closed per month is like 50 to 100.
And OpenStack. As I said already, OpenStack is a bit different. This is the community, and 250 people or so are in the core team. So it's much bigger than the others, but you also have many more regular and casual developers. And if you look at the numbers for... I'm sorry, this is an error, because this is code review, so let's forget about that. If you look at the numbers for tickets, you have them in the dashboard. You can see how there are a lot of people closing tickets, many more than in the core, and a lot of submitters of tickets, a lot more.
This is a different number. This is now, for all the projects again, the number of tickets closed versus tickets opened. This shows very briefly how the project is reacting to issues.
So the blue line is tickets closed and the green line is tickets opened. When both lines are quite close to each other, as you see here, that basically means that they are, well, sort of dealing with what's happening. That's the case here.
Compare this with this other one, for instance, which is for Eucalyptus. Again, we probably don't have all the activity for Eucalyptus. But anyway, here you have a lot of tickets opened, maybe because they were transitioning from some other system or something, and they are not following up at this point. So this peak is part of what they were closing of what had been opened at that time; it comes, well, like eight months later. And then you can see the current trend: well, they are sort of coping with what is coming in.
CloudStack. Again, you have the two lines. Remember, blue is closed and green is opened. For a while they were opening much more than closing. They had the peak here, trying to close a lot of tickets, and, well, they recovered, but they are still a bit below what is being opened. So there is a gap there.
And this is OpenStack. You also have a gap, and the gap is widening. This is up to July. It's interesting to see how things have evolved since then, and you can look at that in the dashboard, but basically the gap is still there. Having a gap is natural, because in many cases you have old tickets that you are really never going to deal with. At some point you are going to close them just because they are too old, for instance.
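The opened-versus-closed comparison in these charts can be approximated with a running backlog per month. The following Python sketch is only an illustration under assumed inputs: `backlog_by_month` is a hypothetical helper, and the month labels are assumed to be "YYYY-MM" strings extracted from ticket open and close dates.

```python
from collections import Counter
from itertools import accumulate


def backlog_by_month(opened_months, closed_months):
    """Return {month: tickets opened so far minus tickets closed so far}.

    opened_months / closed_months: one "YYYY-MM" label per ticket opened or
    closed (hypothetical input format). A value that keeps growing means
    the gap between opened and closed tickets is widening.
    """
    opened = Counter(opened_months)
    closed = Counter(closed_months)
    months = sorted(set(opened) | set(closed))  # "YYYY-MM" sorts chronologically
    net = [opened[m] - closed[m] for m in months]
    return dict(zip(months, accumulate(net)))
```

A backlog that stays flat corresponds to the two lines running close together; a rising backlog corresponds to the widening gap.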
So there's always some gap. It would be interesting to have a more detailed study, and you have the information in the dashboard, to try to find out whether the gap means something else, like that you are not really coping with all the issues that people are opening.
Well, now let's talk about the community. This is the core team over time, semester by semester, since 2012. You can see how for OpenNebula the core team is very stable. That's natural, because it's backed by a company, and these are basically people hired by the company to work on the system. It's a relatively small team, five or six at some points. Eucalyptus is a bit larger, around, well, you see, 15 people, pretty stable over time; it seems to be declining a bit. CloudStack was growing up to the second semester of last year, and is now a bit more stable, at around thirty-three people. This number is different from the number I gave before for the core team, because that earlier core team was for the whole history of the project, and this is per semester, so this number is usually smaller than the other one, right? And this is OpenStack: growing, growing, growing, and you are at around 260 people during the first semester of this year. Remember that in all of these cases core team means those guys who are contributing 80% of the commits.
This is a different chart. It tries to show the evolution of the population. Picture that each of these is like a generation. These are six months, and the two bars try to show how many people entered the project and how many people from that generation are still around. The newest generation is this one, and in this case they basically got 15 new people, and 15 of them remain, basically because it's still, let's say, the youngest generation. We consider that somebody has left when they have been inactive for six months.
So basically, for those guys, all of them are still here. So again, green is attracted, I mean people entering the community in that period of six months, and blue is retained, people remaining from that generation. This is the next generation, that is, the six months before. You can see how they attracted like eight people, but only three of them remain right now. And you can go up and see how for older generations they are still retaining some people. The old talent is no longer here. That doesn't mean that they are not linked to the project somehow; they could be managers, for instance, but they are not committing anymore. And, well, they are still retaining some people from like three or four years ago, and they lost some people around here. Since this is a very small community, it's difficult to see anything apart from the hiring policy of the company that is driving the project, right?
In the next one you start to see a bit more, because the numbers are larger; the community is a bit bigger. You can see how again, obviously, for the last generation all of them are still here. They attracted like 15 people. But then you can see, remember this is attracted, that they were attracting a lot of people in the past. They are not attracting that many people now, and they are retaining some of them, but they are retaining very few from the older generations. That's very usual: having somebody four years in the same project is not that common. It depends on the project, of course.
This is CloudStack. The community is bigger. We can see how they have been attracting very stable numbers for the last two years, these green lines around here, and they're retaining, I would say, pretty well. You will see your numbers later, I mean for OpenStack, but these numbers are very good for a community. Retaining like 50% of the people, or well, maybe 30% of the people, after one year is very good.
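The attracted / retained bars can be reproduced, roughly, from commit dates alone. The sketch below is an assumption-laden illustration and not the study's code: `attracted_and_retained` and `semester` are hypothetical names, a generation is taken to be the calendar semester of an author's first commit, and "six months inactive" is approximated as 183 days before a chosen snapshot date.

```python
from datetime import date, timedelta


def semester(d):
    """Map a date to its (year, half) generation, e.g. (2014, 1)."""
    return (d.year, 1 if d.month <= 6 else 2)


def attracted_and_retained(commit_dates_by_author, snapshot, inactive_days=183):
    """Return {generation: (attracted, retained)}.

    commit_dates_by_author: {author: [date, ...]} (hypothetical input).
    An author belongs to the generation of their first commit and counts
    as retained if their last commit is within inactive_days of snapshot.
    """
    stats = {}
    for author, dates in commit_dates_by_author.items():
        gen = semester(min(dates))
        attracted, retained = stats.get(gen, (0, 0))
        still_active = (snapshot - max(dates)) <= timedelta(days=inactive_days)
        stats[gen] = (attracted + 1, retained + int(still_active))
    return dict(sorted(stats.items()))
```

Note that, as in the talk, an author with a single commit still counts as attracted, which pulls the retention figures down.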
Remember that we consider somebody a part of the community if they committed just once. Consider that many people commit just once and never come back, so those count as people not retained in this chart.
And these are your numbers; this is for OpenStack. Again, this is the current generation. Well, this is July, so this is basically the first semester of this year. You attracted about 600 people, and all of them are still there, of course. But you can see how the attraction rate has been growing and growing over time, and how you are retaining very well, at least for all the history that we have here. These numbers have started to change a bit. I don't have the current numbers in the slides, but the current numbers with respect to retention of the old generations are starting to change a bit. If you are interested, I can show them to you later, but you can also consult them in the dashboard if you want.
Now let's go to another point. I'm going to go very quickly through this, because I guess it's very well known here: company diversity. Company diversity is important in many cases because you want to know which companies are supporting the community. In this case it's a very simple count; it's just counting commits by company. Of course this is biased for many reasons, but at least it can give you an idea of who is contributing in terms of corporate support.
In the case of OpenNebula, it's basically the company, OpenNebula itself, which is, well, sort of a company too, because they are not really a foundation, and the university where the software started. And then [unclear] is there; maybe they made some commits. For Eucalyptus it's basically a company thing too. We have a couple of other companies, but it's basically Eucalyptus, the company. CloudStack is many companies with very low participation and one company, Citrix, with a lot of participation.
This has changed by now. Remember that this is for the whole history of the project. If you take, like, the last year, the balance is much more even. But anyway, for the whole project, this is what you can see.
And this is OpenStack. This is the usual power law, which means that we have a very open community. Again, this is for all the history of the project; if you look at the numbers right now they are different. But you can see how there are Rackspace, Red Hat, HP, IBM and the rest. If you just look at the number of companies, you have like 50 companies active per month, which is again very, very interesting.
This is a time zone analysis. This is just to try to figure out where people are really living. Of course this is just guessing. What we are doing is using the time zones in the commit records in git, and those can be misleading in some cases, but in most communities they are pretty useful to try to infer where people come from. Of course, these are time zones: you cannot say whether this guy comes from Europe or from Africa, for instance. But if you have a bit of information about the project, well, take the case of this project, for instance. This is UTC. This is basically European time. This company is located in Spain, so it makes sense that their developers are there. Nothing new here.
For Eucalyptus, you can see this is basically a United States thing. This is the West Coast, this is the East Coast, and there is something in Europe.
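For readers who want to try this kind of time zone count themselves, here is a small Python sketch. It assumes the author dates have already been exported as strict ISO 8601 strings with a UTC offset (for example with `git log --format=%aI`); the function name `utc_offset_histogram` is hypothetical, and this is not necessarily how the dashboards compute it.

```python
from collections import Counter
from datetime import datetime, timedelta


def utc_offset_histogram(author_dates):
    """Count commits per UTC offset, in hours (e.g. 2.0, -7.0, 5.5).

    author_dates: ISO 8601 strings such as "2014-07-01T10:00:00+02:00"
    (assumed input format). Timestamps without an offset are skipped.
    """
    hist = Counter()
    for stamp in author_dates:
        offset = datetime.fromisoformat(stamp).utcoffset()
        if offset is None:
            continue  # naive timestamp, no time zone information
        hist[offset / timedelta(hours=1)] += 1
    return dict(hist)
```

The same caveats from the talk apply: an offset is not a location, offset zero is over-represented, and summer time shifts some commits by an hour.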
Well, remember that time zone zero is a bit special. Some developers want to keep their laptop in time zone zero wherever they are, or they are people traveling a lot, so in those cases we are mistakenly taking them as living in time zone zero. But, well, you can factor that out, and this is basically a United States project.
This is CloudStack. CloudStack is much more diverse in terms of geography. Here you have the United States again, West Coast and East Coast. Around here you have Europe, and you have India, because they have facilities in India collaborating on the project.
And this is OpenStack. For OpenStack you again have a lot of diversity. Again, West Coast in the States, East Coast of course, Latin America somewhere there too, Europe and Africa, if there were developers in Africa, Eastern Europe, five is India, four is the Middle East and part of Russia, eight is China, and then you have, if I'm not mistaken, Korea, Japan and Australia. Okay, remember also that we have summer time, which messes things up a bit, but you can get an idea at least of the diversity.
And then, I'm finishing: a bonus track about specific things for OpenStack. This is no longer a comparison; this is just something that we did for OpenStack. This is code review, and this is time. One of the main concerns right now in OpenStack development is how people are behaving with respect to code review, how fast code review is going, and how that is impacting time to deploy and things like that. Well, this is basically taking all the code reviews that you had per quarter, starting in the last quarter of 2012 and up to now, up to the third quarter of this year. The time is time to review, measured from the moment the developer submits the change proposal to the moment that it lands in the code, right?
So that means several patchsets in the process, and so on. The yellow-brown line is the mean and blue is the median. The distribution is very skewed, which means that the mean probably doesn't have that much meaning, but, well, it's there just in case. Probably the most meaningful thing is the median. Remember that the median means that 50% of the code reviews took that time or less. Time is in days. So that means that right now you are taking like one week for 50% of the code reviews, and for the other 50% you take longer, so that your mean right now is at 22 days. You can see this growing trend in the mean, but if you look at the median, it is not growing that much; the median has been pretty much stable during the last year. The mean usually grows because you have some code reviews that take a very long time, months, and when you factor those in, well, the mean increases. That's why in this case the median is probably more important. But if you are interested in the outliers, of course, the mean is giving you more information.
Now let's look at what's happening and why we get these numbers. This is patchsets per changeset, so how many rounds do we have here? Of course, remember that a round can be triggered because of automatic validation or code review or something like that. But the fact is that the proposer submits something and has to submit again.
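To see why the mean and the median diverge on a skewed distribution like this one, consider the following Python sketch. The helper names `review_days` and `summarize` and the sample durations are made up for illustration; they are not data from the study.

```python
from datetime import datetime
from statistics import mean, median


def review_days(submitted, merged):
    """Time to review in days, from first submission to landing in the code."""
    return (merged - submitted).total_seconds() / 86400


def summarize(durations):
    """Median and mean of a list of review durations in days."""
    return {"median": median(durations), "mean": mean(durations)}


# Illustrative (made-up) durations: four quick reviews and one that took months.
sample = [1, 2, 3, 4, 90]
summary = summarize(sample)  # the single long review drags the mean far above the median
```

One very long review is enough to pull the mean well away from the median, which is why the median is the more stable indicator of the typical case.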
How many times? Right now the median is like three times, and the mean is higher, five and something. Here you are pretty stable in both mean and median, which means that the number of iterations is not growing very much. So we don't have here the culprit for the increase in the numbers.
If we look here, we have a clue about what's happening. This is how much of the time is, you can say, due to the reviewer and how much is due to the developer. Remember that when I submit something as a developer, I have to wait until the review finishes. At some point the review finishes and they say "submit a new patchset", and I have to submit it, so there is some time now counting on my side. Okay, so in this case you have waiting for reviewer on the left and waiting for submitter on the right. You can see how waiting for submitter is pretty much stable, well, growing a bit, but waiting for reviewer had been growing up to the second quarter of this year. Interestingly enough, during the third quarter of this year it got lower, both in median and in mean, which basically means reviewers are performing better: they are attending to reviews quicker than they were before. So it seems that there was a problem that has started to be controlled. On the side of the developer, the mean is not growing, and if you look at the median, it is growing a bit, but not significantly.
And just to finish this OpenStack-specific analysis, this is the time zone analysis specific to OpenStack. I wanted to show you this one because it shows the differences between 2010 and, below, 2014, that is, now. You can see how this project started out as mainly a United States thing with some European contributions. Remember, this is time zone zero, which is a bit special, so not necessarily all of those are really living in time zone zero. And this is right now.
It's much more diverse. Asia entered, Europe is having a higher share, especially Eastern Europe and Russia around here, and the distribution within the States also changed a bit: now you have more development on the West Coast than you had, right? And this is all with respect to the analysis.
Just some final remarks. Of course there are many differences; you already saw them. OpenStack activity and community are clearly different from the others. Remember, that doesn't necessarily mean that OpenStack is better or worse. It means it's different. It's much larger than the others in every metric that you can measure, but as I said, depending on what you are interested in, that could be good or bad. Look at the details. You have the dashboards and you have the databases. You can even download the databases and do your own queries if you want, and you can look for the numbers you may be interested in.
The bottom line is quite important for me. All four projects have a very high level of transparency with respect to how they are developing. They are not hiding these numbers. They could, because you could develop in-house, especially in the case of the companies. They are not doing that. Anybody can go there, do the same analysis and come to any conclusion.
So that's very important, because we don't have to rely on what these communities say about themselves. We can just go there, measure them and draw our own conclusions.
Final disclaimer: we are working for a couple of people involved in OpenStack and, sorry, OpenStack and CloudStack. All the data has been checked, but it could have some errors, and you have database dumps, JSON files and everything to work with if you want. Go to the data sources entry in the dashboards and there you can find everything. Well, in the end, the idea was to show you the numbers, the numbers about how these communities are developing.
Just a reminder of the dashboards there, and the presentation. Well, this is the OSCON presentation, because this one is based on that one. If you happened to be at OSCON you also have access to the video of it, which is quite similar to this, but from some months ago. That's it. I don't know if we have some time for questions yet. I think so. If there are any, please.
I was kind of wondering: are there any advantages between any of the projects based on the programming language? In a session this morning, kind of looking four years back from OpenStack, the question was asked as to whether, if there was one thing that could be changed, what would it be, and the resounding answer was the actual programming language, Python. And it was really looking at it mostly from a scalability perspective, so there were certain things where they are now limited because of the chosen language. So is there any advantage that any of the others have over OpenStack because of the language chosen?
I cannot say anything about advantages. But a very interesting point, from my point of view, is if you look at these numbers, which are not exactly about language but about size: OpenNebula is really very, very different. Of course OpenNebula is not providing exactly the same functionality as the others. But if you look at the languages OpenNebula is written in, it's basically a Ruby and C++ thing. So for some reason they have found a way of doing a lot of the functionality that the others are doing with less code, and, I don't know, maybe that has something to do with the languages. Really, we would need a much bigger sample to be able to say this. I mean, from the statistical point of view this doesn't mean anything. But, well, it's something that is very, very different if you look at the numbers there. That's the only thing I can say. With respect to the others, the size is quite similar and the languages are basically Java or Python, so there doesn't seem to be a difference there. But the difference could be somewhere else, I mean in scalability, for instance. I don't know; that's something that is not here.
So things that could possibly be done in certain ways can't be done in those ways because of those restrictions. Yeah, those restrictions. Yeah.
Thank you. Anything else? Okay, thank you very much. Go there and download the slides if you want. Thank you.