Good morning everybody, and thank you for coming. I understand that a Sunday morning in Poland can be a little bit hard if you're not that interested in the topic. Thank you very much, Marwa. If you're still here, I invite you to talk with us a little bit about community management and data science. This is more or less the outline for the talk. I have plenty of slides, and the demo is probably not going to work, as usual, but don't worry, that happens all the time.
I see myself as something of a community junkie; I tend to get addicted to a lot of communities. I'm still in the HP Calculators Computer Club in London, although for the time being I don't have much of a chance. I've been in several Linux user groups in Spain, I've been in GPD, I've been in... Another thing: I'm actually working at a company, and one of the things we do is community metrics, which is a very busy topic. Let's talk about it.
I remember when I started at university. I was from a little town in the north of Spain, and I came to Madrid to give a talk, and we took a photo of the community we had there in that city, a photo with three guys. I don't know if you know them. Do you remember? Okay. The young part of the area, I met there, I worked with him, I was at home with Micah. You realize you are part of something bigger than you thought when you were in the little town. And then, this is from two years ago, at the CLS in Portland, the Community Leadership Summit: you see a lot of communities, and they are sharing the same problems, the same issues.
So, first of all, a disclaimer: I'm not a computer scientist or a data scientist, so any of you probably know more than me about the technology and about data science. That makes it interesting to talk about data science and community management, and I think it's going to be fun.
When you talk about open communities and open development communities, people usually start by asking: okay, what's a community? What's a community for us? It's not just a group of people. Why do you usually go to the UK? The UK is usually not sure, so they go to the UK. What is important is for people to share something in common. It's not only about people, but it's always about people. When we talk about communities, you interact with people, people that share things. Specifically in open development communities, those things are not only code but also discussions and meetings, and almost everything is online.
The community has a vision, something it is all about. Everybody in the community has their own goals: maybe you are in a community to learn, or because you want to improve the technology. But the community itself has some kind of target. That's the idea. We are seeing communities especially in open source, I think that's the people side of open source development, and now we are also seeing inner source, which is about creating communities inside companies to get better innovation and better products. I don't know how it's going to work, but it's happening. And communities basically rest on three points: self-awareness, governance, and transparency.
By self-awareness I mean that you need to know where you are in the community, where the community is, and where it is going. You need to know who you are reaching, who the potential people you could reach are, who is doing most of the activity, how the community is distributed, and how far you are from the community. Governance: it could be written down or it could be de facto, I don't know, but usually there is some kind of policy, like we were discussing before. You are following that policy somehow, you need to follow it, and you also need to make sure the community does. And of course, transparency.
For me, transparency can be divided into two different parts. First, transparency within the community, which is tied to governance, because if you hide things from your community, you are going to have problems, big problems, I promise you. Second, you need to be transparent to third parties, because that generates trust. If you hide things from third parties, I promise you are going to have very big problems, because someone is going to bring it up.
When people start to worry about these things, a role appears: the community manager. There was an interesting discussion about this last year, in Australia and in London: do communities need to be managed? Because the word "management" suggests something like "you are going to do this because I say so", and that is something very strong in a community where, okay, we are all equals. So there are these discussions: what is the role of community management in the community? What should we worry about? And when you talk with community managers, the feeling you get is that they are worried about things like the health of the community: is the community active, is the community alive, is the community productive?
Productive in the sense of: okay, we are releasing something every three months, but is that just the policy being followed? I mean, is the community actually working this way in the end? Are we getting through all of that? And is the community visible? For me that's one of the latest concerns, and probably not many people worry about it. But when you talk with people in the JavaScript environment: how many JavaScript frameworks do you know? There are probably more JavaScript frameworks than human beings. But when you ask people which JavaScript frameworks you should use, they are going to name the ones they have heard about. If you ask why, the answer is "someone told me" or "I saw on Twitter that this one is so amazing". Those are the criteria. There are probably even better frameworks that nobody knows about, because they are not visible yet. They don't have a marketing campaign.
So when you want to know what's going on in your community, you need to measure it somehow. It's okay to measure in order to know. If you don't have data, you are just someone with an opinion. So: facts, people, not opinions. If you do metrics, and we do metrics, do metrics with care, because if you do metrics, people are going to try to cheat the metrics. Say I want people to close tickets, so I put up something like a gamification board showing who closes the most tickets. People start closing tickets, and one month later they are even creating tickets just to close them faster than anyone else. So you have to remove that gamification from the community, because you are getting nothing out of it.
So okay, when you are talking about open development, and you start thinking about analytics and transparency, you can start thinking: okay, can we do open development analytics? What's going on in the system? Your community managers start to like analytics. Usually you use this approach, okay.
It's not that easy, but I need information, and my time is limited. Where should I look? Where do things happen? When you look around, of course, everything is online. How many of you know something about Python? Please. How many of you know something about JavaScript? How many of you know what a REST API is? So for you, if you are a community manager, it's appealing to say: okay, let's go to, I don't know, Discourse or Stack Overflow, they have a nice API. Let's go ahead and create a script to start pulling data from there. Yeah, but my problem is that my community is discussing on Stack Overflow, committing to GitHub, opening issues in GitHub, and discussing in Slack. And all of them are very different. So you start wasting 80% of your time, and this has been said by some community managers: "I've been wasting 80% of my time doing something I'm not paid for, which is writing scripts to gather data and process it, not to take decisions, just to know what's happening now." And the decision based on the data that I have may be killing half of the community.
So there are some interesting approaches here, and the names are probably known to some of you. How many of you have heard about Open Hub? Open Hub? Okay, a few of you. Stack Overflow even has data that it publishes regularly. Stackalytics is a solution that is used for OpenStack, for example. GHTorrent, GitHub Archive. Just today, I don't know if he is in the room, someone was talking a little bit about doing Google BigQuery analysis on GitHub, and people are using that as something fresh. So if you need to know a little bit more, with BigQuery and things like that you even have a custom API. But you are still looking at only a part of the picture. Stack Overflow is okay, Open Hub is okay, Stackalytics too.
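The point about every data source being different can be made concrete with a small sketch. It maps two differently shaped records into one common event shape so they can be analyzed together. The sample records, the field names of the common shape, and the flattened commit record are all assumptions made for illustration; only the nested `owner`/`display_name` and epoch `creation_date` layout is modeled on Stack Overflow-style question records.

```python
from datetime import datetime, timezone


def from_stackoverflow(item):
    """Map a Stack Overflow-style question record to a common event."""
    return {
        "source": "stackoverflow",
        "author": item["owner"]["display_name"],
        # Question timestamps are assumed to be Unix epoch seconds.
        "when": datetime.fromtimestamp(item["creation_date"], tz=timezone.utc),
        "kind": "question",
    }


def from_git_commit(item):
    """Map a flattened (hypothetical) commit record to a common event."""
    return {
        "source": "git",
        "author": item["author_name"],
        # Commit dates are assumed to be ISO 8601 strings with an offset.
        "when": datetime.fromisoformat(item["author_date"]),
        "kind": "commit",
    }


# Hypothetical sample records, one per data source.
so_item = {"owner": {"display_name": "alice"}, "creation_date": 1486200000}
commit_item = {"author_name": "bob", "author_date": "2017-02-04T10:00:00+00:00"}

# Once normalized, events from different sources share one timeline.
events = [from_git_commit(commit_item), from_stackoverflow(so_item)]
events.sort(key=lambda event: event["when"])
```

Every extra data source needs its own mapping function like these, which is where the hand-written-script approach starts eating a community manager's time.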
So what I'm here for is to talk about another solution that people can try. It's an open source toolkit to analyze open source development. Not only software development; in general, you can track information from almost any data source used in open development. There's the URL. You can go there. There's some documentation. I don't know if I'm going to be able to demo it. I'm not going to bore you with the details of the architecture; I'm just going to give a brief overview.
You gather data from the data sources, right? You write a config file: okay, this is where my community is defined. This is my project. This is the data. This is Stack Overflow. These are the mailing lists. These are, I don't know, the Discourse forums, because I have set up Discourse for the community. Everything goes into a config file in the system. And then it produces the dashboards. You have an API, a REST API. You can query it. This is something that is still under development. I'm going to go through this as quickly as possible. I know. Yeah.
One of the key points here is that if you don't trust the people that are doing the analysis for you, you can download the system, run it yourself, and see how things are being measured. This is very important when you look at the tools that are out there. You need to trust whoever is measuring your community. Here, you can even check for yourself how it is measuring your community. That's quite different from several other cases.
Just briefly, the technology used in GrimoireLab: it's a lot of Python, a little bit of Elasticsearch to store the data, and some Kibana to do the visualization. To name some of the pieces of the architecture: Perceval for retrieving information, and SortingHat for doing the magic of matching identities across the different data sources.
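To give a rough idea of what matching identities across data sources involves, here is a deliberately naive sketch that groups identities sharing the same email address. It is not how the toolkit itself does it; the function name, the tuple layout, and the sample identities are all hypothetical, and real matching has to cope with people who use several addresses and names.

```python
from collections import defaultdict


def merge_identities(identities):
    """Group (source, name, email) tuples that share an email address.

    Emails are compared case-insensitively. This is the simplest possible
    heuristic, used here only to illustrate the idea.
    """
    by_email = defaultdict(list)
    for source, name, email in identities:
        by_email[email.strip().lower()].append((source, name))
    return dict(by_email)


# Hypothetical identities seen in two different data sources.
identities = [
    ("git", "Jane Doe", "jane@example.com"),
    ("mailing_list", "jdoe", "Jane@Example.com"),
    ("git", "John Roe", "john@example.org"),
]

profiles = merge_identities(identities)
for email, seen_as in profiles.items():
    print(email, seen_as)
```

With the identities merged, activity from every source can be attributed to one profile per person instead of one per account.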
So you can profile a person or a company or organization across the whole community, to know what's going on there and what they are doing. And then there are things like predefined Kibana dashboards that you can import, or even Kibiter, which is a fork of Kibana. We are contributing to Kibana to allow some things, like menus. These are the data sources that are already supported. So in my indie time I was surprised to see many people as well. I don't know. I don't know what it is. Bugzilla is already supported. Sorry, I was in the middle of trying to keep the project. And then it has been out of the way.
I am not going to show you what Kibana is and what you can do with it. The idea was to do a quick demo, so I am going to leave that for the end, if I can find the time. There are plenty of things I want to show about how to play with this. But in brief, getting started is easy. This is the geeky way to start. You don't need to run Kibana. There is a tutorial that you can follow. It covers the basics: how to develop with GrimoireLab. We will talk about Python later, and about how to use the tools as a Python developer. If you want to work more on it, you can join us. This is one of the things I want to show later if I can. It's actually a Kibana dashboard built from scratch. It takes something like 20 minutes, or even less if you already have the data loaded.
The next thing I would like to show is where you can start thinking about data science, because the idea is that once you have this running, you can play with the data sources. You can think about network analysis, for example. Do you know that Kibana has this graph analysis, which is totally valid, or that there is a plugin that allows you to visualize your community in this way? This one is for our own Git projects on GitHub. You can see things like, for example, the nodes. The nodes are the people working in the project.
The size of a node is the number of commits that person is doing. The blue boxes are the repositories they are working on. So you can see things like: this repository is maintained only by Santiago. You can see that Alvaro is doing a lot of things, he is committing here, there and there, but Daniel is working alone on the panels. So okay, that's five minutes, don't spend more than that, but maybe you need to talk to Alvaro, maybe you need to talk with Daniel about his team, to rearrange things somehow. Or probably it's not in your domain, but you can think about it. This is the kind of thing you can do. You can even color the nodes, for example by organization, so you can see which organization is working with which organization on which repository. Here we are talking about repositories, but remember the data sources: you can think about forums, you can think about ASC, you can think about dependencies.
And this is something that someone told me about. Do you know the pony factor? At the Apache Software Foundation they define the pony factor. It's a number, and it's a very nice formula: okay, you can see who is doing 50% of the commits.
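A minimal sketch of the pony factor as it is described here: the smallest number of people who together account for 50% of the commits. Treating "50%" as "at least half" is an assumption, as are the function name and the sample author list; the same function gives the company-level figure if you pass company names instead of people.

```python
from collections import Counter


def pony_factor(commit_authors, share=0.5):
    """Smallest number of authors covering at least `share` of the commits."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the share is reached.
    for rank, (_author, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if covered >= share * total:
            return rank
    return len(counts)


# Hypothetical commit log: one entry per commit, holding the author.
concentrated = ["dev_a"] * 6 + ["dev_b"] * 3 + ["dev_c"] * 1
spread_out = ["dev_a"] * 4 + ["dev_b"] * 3 + ["dev_c"] * 3

print(pony_factor(concentrated))
print(pony_factor(spread_out))
```

A low value means the project depends on very few people, which is a hint about who to talk to first.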
Over a period of time, on the whole project, you can query it and see: okay, I should talk with these people. In this case the pony factor is one, so in that project one person is doing something like 50% of the commits. This is the typical thing: who is doing 80% of the activity, who is doing 50% of the activity. Or the elephant factor. We are very innovative with names here: if there is a pony factor there should be an elephant factor, the number of companies doing 50% of the commits or contributions. That is something that can be measured, and you can see the names. So ponies are people, elephants are companies. The bus factor, I think, is something like 75%: the people who would have to be hit by a bus, or something like that.
The return of the factor is when you look at the code that is actually there right now, and at who the authors of that code are: it is the number of people owning 50% of that code. In the case of the Linux kernel it is around 202, and in the Linux kernel 50% of the code is from 250. That means there are a lot of people there. When you look at the companies, and that's the united fruit company factor, it is around 10 companies. It's the same thing as the elephant factor, but applied to the present-day code.
One of the things, again, about Kibana analysis and this technology is that you can see the evolution over time. Probably in the beginning the return of the factor for the Linux kernel was one. You can see how this has evolved in the project, how code has been removed or not, and who the people there are, and you can see it online. You can see things like the geographical and institutional diversity of your community. Usually you rely on data like: okay, in GitHub you have the activity, and people have already said where they are. So you can see the results. For example, in Mozilla they have this Reps activities panel, fed from the Reps site,
where they are storing which events are happening and how they are reaching the community. So you can see all those activities, for example. But take commits, for example: some people configure their laptop to use the zero time zone, but most of them don't. With this pattern, for example in Mozilla, you can see quite often where the Mozilla authors are. And again, when you look at the evolution over time, you can see things like: when it started it was basically West Coast development, and three years later you can see how commits from Asia, China, India start to grow. So something is happening there. If I am a community manager, there is probably a group growing there, and I can find that out quite easily. I don't need to run a survey; I just look at the data I already have in my community.
Then there are things like demography. I think this is one of the most challenging things in community management, because as a community manager you are often thinking about how to engage people, how to get more people into my community. Usually the problem is not how to get more people, although that's a problem too. The pain point for me in community management is how to retain the people I already have in my community. I start organizing events, and one month later I see that nobody from the first event is coming back. Here you can see when people made their first contribution and when they made their last contribution, and you can even draw, for the people that joined in a given period, how many of them start to leave the community. In that case, if someone has not made a contribution in three months, I probably need to send an email: okay, what's happening with you? Be friendly. Some of those people are probably dealing with other issues and things like that. When you take care of people, people are going to thank you, and they will probably be more engaged, and that can also last long.
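The retention idea, tracking each person's first and last contribution and noticing who has gone quiet, can be sketched as follows. The 90-day threshold stands in for "three months", and the function name, the input shape, and the sample data are assumptions for illustration only.

```python
from datetime import date, timedelta


def first_and_last(contributions):
    """Return {author: (first_date, last_date)} from (author, date) pairs."""
    spans = {}
    for author, day in contributions:
        first, last = spans.get(author, (day, day))
        spans[author] = (min(first, day), max(last, day))
    return spans


def inactive_contributors(contributions, today, max_gap_days=90):
    """Authors whose last contribution is older than `max_gap_days`."""
    cutoff = today - timedelta(days=max_gap_days)
    spans = first_and_last(contributions)
    return sorted(a for a, (_first, last) in spans.items() if last < cutoff)


# Hypothetical contribution log: (author, date of contribution).
contributions = [
    ("ana", date(2016, 6, 1)),
    ("ana", date(2017, 1, 10)),
    ("li", date(2016, 8, 15)),
    ("li", date(2016, 9, 1)),
]

# People worth a friendly "what's happening with you?" email.
to_contact = inactive_contributors(contributions, today=date(2017, 2, 5))
print(to_contact)
```

The same first and last dates can be grouped by joining period to draw how many people from each cohort are still around.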
Then there is gender. During the last year we have published three analyses of gender in technical contributions, for example in the industry, OpenStack, and so on, and we have seen very interesting things. When we approached the analysis, we set a reference like: it should be 50-50, for example. That's the goal. We started talking to companies to see how they are performing in this area. They are publishing things like: okay, in my company 30% of my people are women. That's the case for Google, Dropbox, many of them. When we ran the analysis for these projects, we found a very big gap. You see something like around 10%, even allowing that maybe some women are committing using male names. That's probably a 20% gap with respect to the industry, which is quite huge. When you compare that to 50% in the wider world, there is a big gap, so something should be done. When you put that in numbers and show it to the outreach programs, they start to think: okay, I like these numbers, because now I can know who the ones are that are enjoying the community and staying in the community, so I can get more out of it, I can learn from those people.
Someone asked a very interesting question, the most difficult question in this part: how we know the gender. There is an API you can query with a name, and it answers, with a certain accuracy, whether the name is male or female. The main issue is with Asian names, because I think you can use the same name for male or female. We have a Chinese guy in the company, so we ask him: do you know in this case? When you have that list, you can even ask them: are you interested in helping us with this somehow? So again, you are taking care of the community. For companies, again, it's the same: you can run the analysis.
Other things you can do are related to performance in the project, to code review. Code review is, from my point of view, one of the core processes when you are doing open source development: you are talking with people,
accepting patch sets and pull requests; you are talking with them. And sometimes people only worry about: hey, we had like 1,000 pull requests last week. Yeah, wow. But how long does it take to get one accepted? What, we have 1,000! How fast are you accepting them? Which reviewers are faster, who are the people sending more pull requests, and even worse, who are the people that sent a pull request, were told "no, please improve the code", and never came back? Because you are losing them. For me, those are people that want to lend you their brains, and you say "no, it's not aligned with my idea of clean code", and you are losing them. They want to help you, and you are not taking care of them because you are busy with "hey, we have 1,000 pull requests", right?
So here you can measure things like how long it takes, and who the people are that are handling those pull requests faster. You can even measure how long it takes for the submitter to send a new version, and how many iterations are needed to get the code accepted. So you can even say: okay, these people are doing this many pull requests; please, would you join my company? Please join my core reviewers; we need help with code reviews, and things like that. I mean, how many... this is called the open review one, so it's open in review, not yet closed.
And when you are analyzing all this information, and remember the data sources, from Stack Overflow and mailing lists to Jenkins, you can measure things in a lot of ways. You can measure things like how long it takes for someone to open a user story in JIRA, then open some issues in JIRA, then someone commits against those issues, those commits are reviewed through Gerrit, and when they are okay they get accepted, deployed with Jenkins, and then you have the product. So you can measure things like how long it takes for an idea to go from user story to deployment. That's something you can do when you have all this information. Can you also measure the
bugs, so the commits of code, and then correlate them with how many bugs a person is generating? There is some research around that, and one of the companies has done some research about it. We can talk about it later; these people have some research based on that kind of analysis. That's something that we are... I have to stop. That was just an overview of things.
You can also measure how people are reviewing each other, even across companies. This is the WebKit code review analysis, and the spots are the companies. Here you have the number of accepted reviews, and here the number of iterations needed for acceptance. Why does that matter for companies contributing to something like WebKit? For some companies, that's the main product they are basing their business on. So if I send a pull request or a patch set there, it is something related to my innovation, my business depends on it, and it has to be accepted by the community. So you can see things like who was getting more accepted and who was getting accepted faster. We can discuss it later, because I know people involved in that. You can read the paper here, and you can see which company for the project at the end of this process and create a WebKit version.
Think about your issue management system: what is the process like? You can look even at this very complex thing and identify where the bottlenecks are, and whether the real process goes the way you have defined it, according to the data.
One of the things we are working on, as an idea, is this contributors funnel: the idea that people, observers or attendees, come through meetups, power tools, join your Slack channel, and then start to say: okay, I would like to help, but I don't have the background. Why don't you go to Bugzilla or GitHub and open an issue? Oh, I've seen this bug. Okay, open an issue here. How is it going, how long does it take? If their technical
skills are good enough, they are going to say: okay, I know how to solve this issue, I am going to submit a patch, because I have learned how to manage the code. I am going to submit a patch, and it eventually gets accepted. There is the previous model champion, and you are thinking: I need to attract people to my community to do things like that. This is the typical contributors funnel. For people that come from marketing and sales, this is what is called the marketing or sales funnel: how many people can you reach, and then who are the buyers, in this case who are the people likely to become contributors. So you can learn where people are coming from, how many people, how many of them are in Bugzilla. You learn how to do that from analytics.
And that was all, if I remember. But that was not all: you can test it live at cauldron.io. And this is the real... oh my god. You can go to cauldron.io and get an overview of five organizations on GitHub. All of you have GitHub accounts; you can use your GitHub account to get in, and everything is based on one hundred percent open source. So that was all from my side. There is no time for the demo, but I am a great lady. Do you want more? Thank you, that's it.