Okay, hello everyone, on to the next talk. We have here Jesús M. González-Barahona, and the topic will be GrimoireLab: free software for software development analytics.

Thank you very much. Good morning, first of all. The slides should be uploading to the website right now, I think, I hope. But just in case, I was tweeting about them a moment ago, so go to Twitter if you want to have a link to the slides. That may be important, because part of what I'm going to tell you now is how to run our software from scratch using Docker containers. So if you happen to have a laptop here with you, you can just run the software while I talk, and you have the exact details in the slides.

Very quickly about me: I'm working in the university and in a company called Bitergia, and in both places I'm basically working on software development analytics, from different points of view. I've been working on this for a while, at the beginning in the university and then also in the context of the company. We have been writing free software for doing this kind of analysis, and this is a presentation on the state of the art of the software that we have, which is called GrimoireLab.

GrimoireLab right now is part of a Linux Foundation initiative which is called CHAOSS. In fact, it's a combination of the Linux Foundation with other foundations, like the Mozilla Foundation or the Eclipse Foundation, which are already in CHAOSS. What we are trying to do is to find ways of doing analytics on free software, and part of CHAOSS is doing that with free software itself, which means producing free software, as you can read there: producing free software for analyzing software development in general, but of course free software in particular. So the idea is to have our own stuff for analyzing our own stuff.

The software that I'm presenting today is GrimoireLab. GrimoireLab is the second generation of the software that we have been developing in the university and in the company. It's a free, open software project where anyone, of course, can participate, and the idea is to retrieve information from data sources and put it into a database.
Right now we are using Elasticsearch, but we could be using other databases, and then you go and analyze the database. Since you have all the data there, you don't need to go to the data sources, to the APIs of the different systems, once and again; you can just retrieve everything from the database, because what you have there is mainly a copy of the data that you have in the original data sources.

When I'm talking about data sources, I'm talking about almost anything related to software development. We are right now supporting around 30 different data sources, from Git, GitHub, Bugzilla or Gerrit, to stuff like Slack or IRC or Telegram. However you are developing, it is very likely that all the channels that you are using to communicate with others, and all the things that you are automating, are covered. So have a look at the data sources supported; I'm going to show them in a moment. The idea is, again: go to those data sources to retrieve the information, put it into a database, and then try to figure out what the data is about.

For that we are providing two main things. One is the dashboard, which is based on Kibana, and the other one is a reporter, which is basically producing PDF reports with simple data, a kind of summary of the project. Both things are customizable, so that you can adapt them to your needs, if you want, in a relatively easy way.

So this is, let's say, a screenshot of the dashboard; you can see the real thing here. This is for the OPNFV project, which is one of the projects for which we have a public dashboard deployed. If you look at it, it's just the visualization of the data in the database. On the top you have some main metrics: in this case, this is the list of companies collaborating there, this is the list of people there, and these are the trends in commits, authors, etc. over a period of time, in this case months. So here you have Git, here you have Gerrit, here you have mailing lists; here you have activity, and here you have the people working in that repository.

For each of those data you can drill down. You can click on everything, and if you are familiar with Kibana or other visualization systems of this kind, you know that the dashboard reconfigures and shows you a filtered, drilled-down view of the information it is presenting. In the real thing, on the left you have a menu, and you can see more detailed panels for each of the data sources, so that you can drill down and find out stuff like which ones are the most relevant developers, and you can click on them and see their specific activity over time, or select a period of time and find out how they are committing or reviewing or whatever. So again, the idea is to visualize the information in the database.

If you want to try it, and that's why having the slides handy may be useful, it's just like this. You need about four gigabytes of RAM in your laptop, or wherever, and you run the container. You need a lot of memory because the container includes Elasticsearch, MariaDB, Kibana and the GrimoireLab software itself, and when it's run automatically like this, it produces a demo dashboard of GrimoireLab itself. An important thing is that you need connectivity, and if the connectivity breaks, very likely the container is not going to work, because it's not designed to tolerate a lot of network failures.
I'm saying this because today maybe the network here is not that good for trying it, but have a try anyway, or try tomorrow. In any case, once you run it, what you get is the container retrieving all the data, loading it into the Elasticsearch database, and producing the kind of indexes that Kibana needs. If you then point your browser to this port on localhost, you can just access the Kibana instance, and you can look at a dashboard quite similar to the one I presented to you.

The structure of GrimoireLab is like this. It's in fact a set of components, written in Python; everything here is Python, except of course for the databases and Kibiter. Kibiter is our version of Kibana, which is very, very similar to Kibana; we try to track Kibana as much as possible, and we are only including in it the kind of stuff that we need very specifically for our own use. I will talk a bit about that later.

If you look at the general architecture, usually you start here. These are the data sources, the kind of repositories where the information for the project you want to analyze is. For all of that data, you use, first of all, Perceval, which is the component retrieving the data and storing it into the database, into those raw indexes that you can see here. Raw index means this is the index where all the raw information is; it tries to mimic as much as possible the information that we have in the original data sources.

In the middle you can use Arthur, if you want. Arthur is basically a machine for doing this in the large, because it is designed for working with thousands of repositories if needed. Arthur is basically continuously going to the repositories and updating the information. Continuously, of course, means trying to be friendly to the APIs: maybe it just waits for ten minutes and comes back to see if there is something new, and stuff like that. Basically, here you can configure how you interact with the different repositories, and how you fetch in parallel; you can run it on different nodes if your project is really huge and you need many nodes for downloading the data, etc.

And, as I said, Perceval is the part related to how to get the information out of there, and one of the interesting things about Perceval is that it provides a common API for all the different data sources, which is basically a Python generator.
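To make that common generator API concrete, here is a minimal sketch of iterating over the commits of a Git repository with Perceval. The repository URL and the local clone path are just placeholders, and the module path is the one used in recent Perceval releases:

```python
from perceval.backends.core.git import Git

# Backend object: where to fetch from, and where to keep the local clone.
repo = Git(uri='https://github.com/chaoss/grimoirelab-perceval.git',
           gitpath='/tmp/perceval.git')

# fetch() is the common API: a Python generator yielding one dict per item
# (for the Git backend, one dict per commit).
for item in repo.fetch():
    print(item['data']['commit'], item['data']['Author'])
```

Every backend yields the same kind of item: a plain Python dictionary with some metadata (origin, timestamp, and so on) and a 'data' field holding what came from the data source.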
Right after my talk there is a talk specifically on Perceval, so if you are interested in how we retrieve the data and so on, please stay for the next one.

Once we have the data in Elasticsearch, we use GrimoireELK for doing two things. One is dealing with identities, because, you know, developers may use many different identities in different places, and we want to track persons as much as possible, not just identities. So you can optionally use SortingHat, which is on the top. SortingHat tries to find out the different identities of the same person; for that it is using some heuristics and some information that you can configure. If you are analyzing your own project and you know the different identities of your developers, you can write a very simple YAML file and put the identities there, so that SortingHat uses those for you and finds the same person across different data sources, for instance. You can also do things like defining how you want the persons to be named, or the affiliation of a person, I mean for which company they work, and stuff like that.

GrimoireELK uses that information, and some computing, for producing the enriched indexes. The enriched indexes are the ones that have a summary of the information that we have in the raw indexes, and that summary is specifically designed for the analytics. That's where we are, for instance, annotating every commit with the company of the person doing the commit, so that later on you can query on that very easily if you want. So again: GrimoireELK uses the data from identities and produces the enriched indexes, which are a summary of the raw indexes.

The enriched indexes can of course be used directly, if you want. You can just connect a pandas notebook here, for instance, do queries directly in Python, put the results into a pandas DataFrame, and do all kinds of stuff with it (I'll show a small sketch of this in a moment). But you can also just connect them to Kibiter, which is the dashboard, and you have a nice visualization of all your data, or you can connect them to Manuscripts, which produces PDF documents with a summary of the project itself. So the idea is that once you have the information in the database, you can exploit it in many different ways.

And on the top right you have Mordred. Mordred takes care of orchestrating all of this; it is a single tool that can run everything, if you want. Of course you can run every tool by itself, but you can also run everything together, which is usually what you do when you are analyzing a project for the first time, and that's what the container that I presented to you a moment ago does.

So, I'm going to go very quickly through the architecture, because you have the next talk on Perceval. We have a lot of data sources, and, as I said, Perceval takes care of providing a common API in Python for all of them. It also has a common data format, which is quite simple: it basically produces a JSON document for each of the items. An item can be a commit, or a ticket, or a message in Slack, and in all cases it is a JSON document, with different components of course.

This is the way you can use Perceval from the command line: just install it, and then this way you get this Git repository analyzed, or this way you get this GitHub repository analyzed. In the case of using GitHub, you'd better include a token; you know that GitHub can provide you these tokens via the user interface, to be able to access the API quicker. If you are not using a token, you can do around 60 queries an hour or so; if you are using the token, you can do around 5,000 queries an hour.
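As an illustration of the token, a minimal sketch of the GitHub backend might look like this. The owner, repository and token are placeholders, and the exact form of the token parameter depends on your Perceval version (newer releases take a list of tokens, older ones a single string), so check the documentation for the version you have:

```python
from perceval.backends.core.github import GitHub

# With a token, the GitHub API allows roughly 5,000 requests/hour
# instead of about 60 for unauthenticated access.
repo = GitHub(owner='chaoss', repository='grimoirelab-perceval',
              api_token=['YOUR_GITHUB_TOKEN'])

# Same generator API as every other backend; by default the GitHub
# backend yields one dict per issue.
for item in repo.fetch():
    print(item['data']['number'], item['data']['title'])
```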
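And the pandas connection I mentioned a moment ago could be as simple as this sketch. It assumes an Elasticsearch instance on localhost with an enriched index named 'git', queried in the elasticsearch-py 7.x style; the index name and the 'author_org_name' field are how GrimoireLab typically names them, but treat them as assumptions and check your own deployment:

```python
from elasticsearch import Elasticsearch
import pandas as pd

# Query the enriched index directly and load the hits into a DataFrame.
es = Elasticsearch(['http://localhost:9200'])
result = es.search(index='git',
                   body={'query': {'match_all': {}}, 'size': 1000})
df = pd.DataFrame([hit['_source'] for hit in result['hits']['hits']])

# Example: commits per company, thanks to the affiliation annotations
# added during enrichment.
print(df.groupby('author_org_name').size())
```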
So, again: if you use Perceval with GitHub, use the token. And the slide shows how you do it in Python, which, as you can see, is quite simple. On the top you have the import of the package; then you have the two data items that we are going to use, in this case a Git repository and the place where we are going to clone it; then you have the declaration of the object for doing the stuff; and then you just iterate on fetch(), which is the Python generator that we are producing. And it's always the same, so that you have the data as Python dictionaries, which are very convenient to use, to feed into pandas, or wherever.

This is the architecture for enrichment that I already mentioned, where we are dealing with identities, raw indexes and stuff, and this is an example of how to run that. For that you can use another script, which is p2o, which basically does the conversion of the raw index into the enriched index, and at the same time it is already calling Perceval, so that you have the complete process from data source to enriched index. In this case, the command, for instance, assumes that you already have Elasticsearch deployed.

This is an example of the information that GrimoireELK keeps producing. This is for a commit, and there is much more information down here, but you can get the idea. It's just a simple JSON document where you have some information related to Elasticsearch, which is on the top, and then some fields which are the information that we are extracting from the data sources. And since it is a JSON document, or in Python a Python dictionary, it is very easy to work with: because it already has the structure of a table, you can directly import this into a pandas DataFrame to do things with it, for instance, if you want.

And this is a summary of the architecture for exploitation. Everything is in the Elasticsearch database, so you can directly connect Python, or anything else, to it and exploit the information that way. You can also connect other tools here. By default, let's say, we have these two ways, remember: Kibiter, which is Kibana, and Manuscripts, producing PDF stuff directly. But for instance some people have been connecting Grafana here instead of Kibana, and it works as well, so there is no problem in connecting any other thing to the database.

This is the way of producing a dashboard from the command line, if you just use the Python packages directly. But instead of this, I think it is better that you try this one, which is again the same that I showed you at the beginning, using the Docker container. Have a look at that one; if you do that, remember that you then point your browser here. And you have a lot of information about GrimoireLab on the website, or go directly to the tutorial, where you have a section on installation and everything, and on how to produce a simple dashboard from scratch.

And that's it. Thank you very much. I think we still have some time for one question, comment, insult, whatever. Yes, please?

(Question from the audience, inaudible.)

Not by default, but you could do that, and it's basically a matter of connecting, let's say, a collector for those webhooks, and letting Arthur go to the collector instead of the original data source. We don't do that by default because Arthur assumes there is no specific relationship with the repository, including that maybe you cannot set up a webhook. But if you can, the only thing that you need to do is to have the webhook pointing to some collector, and then let Arthur go to the collector instead of the original data source.

Right. Okay, thank you very much.