Okay, now we have GrimoireLab, a Python toolset for software development analytics. Thank you very much. Good morning. I'm sorry I don't have music, but I have some nice charts, so I hope that also makes for a good day. First of all, if you want, you can download the slides. They are in my Twitter stream, and you can also get them from SpeakerDeck. That may be interesting because they have some links you can click on, and so on. I'm Jesus Gonzalez-Barahona, and I'm going to talk about GrimoireLab. GrimoireLab is a new toolset for doing software development analytics. The idea is to retrieve information from software development repositories and use that information to produce any kind of thing you may be interested in, from dashboards to reports to nice webpages where you can see what's happening in a project, for instance. The point is not to analyze source code but to analyze processes and activity. That means you can go to Git repositories, to Bugzilla repositories, to IRC channels, to mailing lists, or to Stack Overflow, and get the information about what's happening there. Our main focus is to track where development is happening, or where people are talking about software development, get that information, and then do analytics on it. The main aim is to find ways to learn how open source software, and software in general, is being developed, and also to find ways to visualize that information and make it clear, so that the players can be self-aware of what's happening with them. The structure of the talk is going to be like this. First, I'm very briefly going to give some context about myself and GrimoireLab. Then we are going to start looking at the software, and then we will become practical: I'll show some examples of how you can use Python to retrieve information from repositories and things like that. At the end, there is something for you.
I've been researching this topic at the university for some years, and at some point some people in my research group and I decided to start a small company doing the same kind of thing. It is that company that built the software I'm presenting today. The company intends to run the software as a community project. Right now, most of the contributions come from the company itself, but anybody is welcome to come and join the development and, of course, join the decisions on it. The company is 100% free, open source software, both in the software we use and in the software we produce. If you want to see what GrimoireLab can do, go to cauldron.io. It's basically a web service where you can enter a GitHub organization, an owner or an organization, and it does the analytics for that. It's going to pull everything in the GitHub repositories and everything related to issues and pull requests, and after some time it produces a nice dashboard for you. You can also browse dashboards already produced. All of that is done with free software, and let's say the glue software is GrimoireLab. Of course, there is a lot of other software behind it, but still, you can get an idea of what you can do and at the same time analyze your pet project. By the way, the Python organization on GitHub has been analyzed, so you can look for it, look at the dashboard, and see how Python is being developed. The software doing the magic is GrimoireLab. We have a long history of using Python to go looking for information in software repositories. We started with MetricsGrimoire maybe 10 years ago; that was a set of tools for retrieving information from repositories and storing it in SQL databases. We used that technology for research for many years, but at some point, when we started the company, we decided it was time to learn from the past and restructure everything from scratch. And we decided to rewrite everything.
One of the things we decided was to use only Python; we had some other languages in the previous toolchain. The second one was to make it as pluggable as possible, so that we could very easily add support for new kinds of repositories, for instance, or for new kinds of panels in a dashboard, or for new kinds of studies. So we now have a pluggable structure where it's very easy to provide support for new things. And that's GrimoireLab. We started about one year ago, and in fact this is the first time we are presenting it fully. GrimoireLab has a very simple structure from the point of view of the data flow. Usually the data flow starts in repositories: Git, GitHub, Mailman, whatever. We have a tool called Perceval, which basically goes to repositories, extracts information, and uploads it to Elasticsearch if you want, because Perceval itself is completely database agnostic. It only gets the data and produces a collection of JSON documents that you can upload anywhere you want. In our case, we have software for uploading it to Elasticsearch. That's what we call the raw index. An index in Elasticsearch is something like a database. So we have raw indexes which hold exactly the same information you have in the original data source, which means that whenever you need to query, you no longer need to go to the original data source; you can just query Elasticsearch, which is, of course, much easier. Then, for producing, let's say, added-value indexes, we use GrimoireELK. GrimoireELK goes to the raw indexes and enriches the information. For instance, it can calculate how long it took to close a ticket, or try to find out whether a pull request is still open or not and calculate numbers on the basis of that, and things like that. With that information we produce, again in Elasticsearch, new indexes that we call enriched indexes.
Those indexes are designed to be visualized with Kibana. They can also be queried with Python scripts, for instance, but you can visualize them with Kibana. Our version of Kibana is Kibiter, which is a soft fork of it. So it's very similar, and you can also use plain Kibana if you want. Kibana produces the dashboards for the front end. But with the same information that we have in Elasticsearch for Kibana, you can also use scripts to retrieve it and do other nice things. Let's now go through the components one by one. But before that, if you want, go to the web page of GrimoireLab, where you can get much more information, of course. Let's take them one at a time. First, Perceval: remember, going to the repositories, retrieving information, and producing JSON documents. Then we have GrimoireELK, which I already talked about: remember, producing the raw indexes, also driving Perceval if needed, and uploading the information. Then we have Arthur over there. Arthur is going to be released as stable during the next weeks, but you can already play with it. Arthur is for orchestrating retrieval. When you are retrieving data from thousands of repositories of different kinds, and you want to do that continuously because you want to keep the information updated, it's a complex task. Arthur is going to help us with that. Basically, it uses a Redis database to know what's happening and keep a list of the jobs being done, and so on. It's very useful if you have to retrieve information at scale. To give you an idea of scale, right now we're working with the Linux Foundation on getting information from something like 10,000 Git repositories, continuously updated. Kibiter I already mentioned; our fork of Kibana is the only component that is not Python. And Panels: Panels is, in fact, the configuration for Kibiter, which holds the information about the dashboards we are using, the visualizations, and all of that. And then we have SortingHat.
SortingHat was developed several years ago, and the idea is to track identities. It can do things like merging similar email addresses into the same person, or tracking affiliations for people, based on email addresses or on gitdm information, for instance, if that information is available. The idea is to make a mapping from identities to real persons and then, if that's important for the project, map those persons to organizations, to companies, whatever. All of them, as I said, except Kibiter, are written in Python, and in fact they are Python modules that can be used directly from a Python script. This is a tutorial we are preparing; you have the link down here. It tries to cover everything in the GrimoireLab toolset. It's still a work in progress, but have a look at it; that's where you can find some more details that I'm not going to have time to talk about now. Let's be practical, as I said, and start with Perceval. Once again, Perceval gets information from a repository and produces JSON documents. Let's look at how to do that. It's quite simple. It's Python 3, so you just get a Python 3 environment, you install from PyPI, and that's it: pip install perceval, you have the latest version, and then you just run it. Perceval is basically a Python module, but it also ships a script, so you can run the script right away. In this example, I want to run it with the Git backend on that URL, which is a Git repository; in this case, it's the Git repository of Perceval itself. Then it starts fetching everything. Of course, it clones the repository to get the information, then runs git log with all sorts of options and whatnot, and it starts producing JSON documents with the information in the repository. You can do exactly that for about 20 different data sources. And that's it: you don't need to fight anymore with the APIs of the different repositories and such.
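To give an idea of what those JSON documents look like, here is a minimal sketch, using only the standard library, of reading Perceval-style output. The two items below are made up for illustration; their top-level fields (uuid, origin, category, updated_on, with the raw repository data under "data") follow Perceval's item layout, but treat the exact values as assumptions.

```python
import json

# Two fabricated items in the shape Perceval emits: metadata at the top
# level, the raw data from the repository nested under "data".
raw_output = """\
{"uuid": "ab12", "origin": "https://example.com/repo.git", "category": "commit", "updated_on": 1500000000.0, "data": {"commit": "d1e8f0a", "message": "Fix typo"}}
{"uuid": "cd34", "origin": "https://example.com/repo.git", "category": "commit", "updated_on": 1500003600.0, "data": {"commit": "e2f9b1c", "message": "Add test"}}
"""

# One JSON document per line; parse each into a Python dictionary.
items = [json.loads(line) for line in raw_output.splitlines()]
for item in items:
    # Each item records which data source it came from, so later analysis
    # never needs to go back to the original repository.
    print(item["category"], item["data"]["commit"], item["data"]["message"])
```

Because every item is a plain dictionary, any further processing is ordinary Python, with no repository-specific API in sight.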
If you want to do the same thing from Python, it's like this. You can see how I'm just importing the corresponding module from Perceval. This is for GitHub, by the way, but it's basically the same for any backend. Then I define the repository I'm interested in, and you can see how I just instantiate a Python class for GitHub with the corresponding information: the repository name, the owner, and the API token for GitHub. Of course, you need to get an API token, but you know that's very easy from the GitHub user interface. Once that's done, you get a nice Python generator with all the items from the API. In this case, the API returns both pull requests and issues. So you get nice JSON documents with all the data available. This is, for instance, very simple code to discriminate which items are pull requests and which are issues, because in the data we retrieve there is a field which states whether an item is a pull request or not; it's just a matter of checking that. This is the list of backends that are supported right now. It's very easy to write a backend, by the way. If you are interested, it's usually between 100 and 200 lines of Python code, and most of that is just copying from a backend with a similar API. So it's quite simple; you can do it yourself. Perceval has a pluggable structure which makes this very easy, but, of course, we are more than happy to get your contribution if you want to produce a new backend for something. For instance, I have a backend for GitLab which is still not here, but it took me, I don't know, two afternoons to write. The next component is GrimoireELK. As I said, GrimoireELK basically takes charge of storing the information, and it can also run Perceval and use its JSON documents; in fact, it is producing Python dictionaries.
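The pull-request-versus-issue discrimination can be sketched like this. In the GitHub API, and hence in the data payload of the items Perceval returns, an issue that is also a pull request carries a "pull_request" key; the sample items below are fabricated for illustration.

```python
def is_pull_request(item):
    """Return True if this GitHub issue item is actually a pull request."""
    # GitHub reports pull requests through the issues API too; the extra
    # "pull_request" field is what tells them apart.
    return "pull_request" in item["data"]

# Fabricated items in the shape Perceval's GitHub backend produces.
items = [
    {"data": {"number": 1, "title": "Crash on start"}},                        # plain issue
    {"data": {"number": 2, "title": "Fix crash", "pull_request": {"url": "..."}}},  # pull request
]

prs = [i for i in items if is_pull_request(i)]
issues = [i for i in items if not is_pull_request(i)]
print(len(prs), len(issues))  # → 1 1
```

That is really all the "discrimination" amounts to: one membership test on the dictionary.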
So GrimoireELK can run Perceval, get those Python dictionaries, and upload them as raw indexes to Elasticsearch, and then use the same information to enrich it and produce the enriched indexes. That means that basically GrimoireELK can produce dashboards, because the information stored there is what Kibana uses to visualize. This is the way to use GrimoireELK. Again, it's a Python module, so you can install it from PyPI. You also have another tool, Kidash, which is part of the same package but is different: the job of Kidash is to upload the panel definitions to Elasticsearch so that Kibana can use them. Remember that Panels is the module I was talking about, with definitions for visualizations and dashboards and such. The work of Kidash is basically uploading that information for Kibana, so that the dashboard can be produced automatically. Of course, you can also do that via the Kibana user interface, but this is faster. Once you install GrimoireELK, you have a module which is called grimoire_elk and a tool which is p2o. p2o basically produces the raw and the enriched indexes. So that's everything. You only need to specify the names of the indexes, git-raw and git in this case; where the Elasticsearch instance is, in this case my laptop, so localhost; and a few options: here, the no-incremental option means I don't want it to work incrementally, I want everything from the very beginning, plus a debug flag just to show some information about how it is working; and where the repository is. In this case it is a Git repository, and that's the whole command. Basically, that is going to do the same thing as Perceval did before, but storing the information in Elasticsearch. Once you do that, you can upload the dashboard definitions, and that's it.
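For a feel of what the upload step involves, here is a hand-rolled sketch of an Elasticsearch bulk request body, which is newline-delimited JSON: an action line naming the target index and document id, then the document itself. The index name "git-raw" and the documents are made up, and GrimoireELK's real upload logic is more involved; this only shows the wire format.

```python
import json

def bulk_payload(index, docs):
    """Build an Elasticsearch bulk API body for indexing docs (keyed by uuid)."""
    lines = []
    for doc in docs:
        # Action line: tell Elasticsearch where this document goes.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uuid"]}}))
        # Then the document itself, on its own line.
        lines.append(json.dumps(doc))
    # Bulk bodies must end with a trailing newline.
    return "\n".join(lines) + "\n"

docs = [
    {"uuid": "ab12", "category": "commit"},
    {"uuid": "cd34", "category": "commit"},
]
payload = bulk_payload("git-raw", docs)
print(payload)
```

POSTing that body to the cluster's `_bulk` endpoint would index both documents in one round trip, which is why bulk uploads matter when you are loading thousands of repositories.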
Now you have the two indexes and a Kibana dashboard: point your web browser to Kibana, and there is the dashboard for this repository. You can do that for many repositories if you want, because the index can be reused and include information for many other repositories. And if you run the same command again without the no-incremental option, it's basically going to work incrementally over the data source: it's going to ask only for what has changed since the last time it got the information. So it's very efficient, and it can be run every 10 minutes, for instance, to stay pretty much synchronized with, I don't know, a GitHub project. So far I have talked about Perceval and GrimoireELK. When you are using Python, you can also use the information in Elasticsearch directly, because in many cases you are not exactly interested in producing a dashboard; you want to get some specific information in some other way. Basically, you want to access the Elasticsearch data and do whatever you may want with it. This is not something that we built; I'm explaining it so that you can see the value of having access to the data from Python. For this we are using the standard Elasticsearch API, which is a nice REST API where you can query using HTTP. This is using curl just to get one element of the index, so you can have an idea of what you have in there. Of course, this is a JSON document, and you have the basic information, in this case for a commit.
In the real thing you have something like 30 different fields obtained from the commit, adapted and enriched by GrimoireELK, but you get the idea. So this is what you get with curl. Doing similar things from Python is a matter of using the specific Elasticsearch packages, which access Elasticsearch in a nicer way. The people from Elastic produce two packages for Python: elasticsearch and elasticsearch-dsl. elasticsearch is the low-level one; it basically mimics the REST API, and it's quite simple to use if you know the API, but a bit difficult if you want to do complex queries. That's why they also have elasticsearch-dsl. The idea of elasticsearch-dsl is quite similar to the idea of SQLAlchemy, if you know it: you write composable queries, so that making a query is just chaining calls on the same object, one after another, and you can build very complex queries in a simple way. In this case, in the first line I create an Elasticsearch instance for Python, which says where my Elasticsearch lives. Then I create a basic search, just saying I want to search something on an index, and the index is git in this case. Then I start adding parameters to the query. I begin by saying I want files greater than zero. For each commit we keep track of how many files it touches, and we don't want commits that touch no files; usually those are merge commits, and for this we don't need them. Then I say I want everything where the author date is greater than 2016; that means: give me all the data from that date on. Then I ask for a cardinality aggregation on the hash field, which basically means: give me unique commits. If the same commit is repeated in several repositories, it has the same hash, so count it as one. And then a date histogram, in this case by quarter, so we are going to get several buckets, one per quarter. In short, it's going to give me the count of commits by quarter, ignoring merge commits, since some date. To run it, you just execute it; you get the results back, and you can print whatever information you may want. So you can see it's quite simple and quite easy as well.

I'm starting to finish. This is, as I said, the tutorial. If you look at its structure, there is a part for Perceval, which is the most complete and very easy to follow; then there are some hints on how to produce dashboards and how to improve that part; and then there is a part about Python scripting. Probably during the next few weeks I'm going to write something about how to combine this with pandas for doing, let's say, advanced analysis and things like that, but you can imagine that it's not that difficult. And again, remember, if you want to see everything working, you can just use cauldron.io, and there you can get an idea of what can be produced with the software. There are many dashboards already produced that you can check, including the one for the Python organization on GitHub. And with that I finish. So enjoy, and remember that we have the main web page for the project here, where we welcome contributions. We are trying to create a community, and this is the right moment to join, because this is the first public announcement to developers; we have been using it internally for a while. If you want to write a backend, or you have an idea for a new kind of analysis, or you just want to use it but can't figure out how, please let us know; we are more than happy to support you. Thank you very much.

Okay, we have some time for questions or comments, if you have any. Yes, please. Thank you for the talk. Does Perceval provide any API, say if I want to build a chatbot or something? No, Perceval is quite simple. It only provides JSON documents; in fact, as you saw, it provides a Python generator, and as you iterate over that generator you get Python dictionaries. With that you can do anything you may want, but right now we have no support except for uploading to Elasticsearch, as I mentioned. Any other comment or question or whatever? Okay, thank you very much. Thank you.
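To close with something concrete: the query walked through above (drop merge commits, keep everything since 2016, count unique hashes per quarter) can also be expressed in plain Python over documents already fetched from the enriched index, which is handy when you want to check results without an Elasticsearch cluster at hand. The field names (files, author_date, hash) follow the enriched-index fields mentioned in the talk; the sample documents are invented.

```python
from collections import Counter
from datetime import datetime

def commits_per_quarter(docs):
    """Count unique, non-merge commits since 2016, bucketed by quarter."""
    seen = set()
    counts = Counter()
    for doc in docs:
        if doc["files"] == 0:                 # merge commits touch no files
            continue
        date = datetime.fromisoformat(doc["author_date"])
        if date.year < 2016:                  # only data from 2016 onwards
            continue
        if doc["hash"] in seen:               # same commit in several repos:
            continue                          # same hash, count it once
        seen.add(doc["hash"])
        quarter = f"{date.year}-Q{(date.month - 1) // 3 + 1}"
        counts[quarter] += 1
    return counts

docs = [
    {"hash": "a1", "files": 2, "author_date": "2016-02-10"},
    {"hash": "a1", "files": 2, "author_date": "2016-02-10"},  # duplicate: skipped
    {"hash": "b2", "files": 0, "author_date": "2016-03-01"},  # merge: skipped
    {"hash": "c3", "files": 1, "author_date": "2016-07-15"},
    {"hash": "d4", "files": 3, "author_date": "2015-12-31"},  # too old: skipped
]
print(commits_per_quarter(docs))
```

The elasticsearch-dsl version in the talk pushes exactly these three steps (the filters, the cardinality on the hash, and the date histogram) down to the cluster instead of doing them client-side.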