Hello everyone, and welcome to the "Community Metrics by Stackalytics" session, presented by me and Herman. Let me take a minute to bring the slides up on the screen.

Well, it looks like we will have fewer than 10 people at the presentation, so I want to invite everyone to move to the first row and have a closer look. Unfortunately, our presentation was declined at first, but later somebody dropped out and we took the slot. Today we will talk about Stackalytics and about metrics: we have collected some interesting facts about contribution to OpenStack.

The presentation will be given by Ilya Shakhat, the community lead at Mirantis, and by me, Herman Narkaytis. I'm a founding member of Mirantis in Russia.

Let me first tell you how Stackalytics was born. Initially we decided to organize a group of developers who would work on the community, and the first question we asked ourselves was how many contributions we had made. We built a prototype using an open source OpenStack library and published it internally inside our company. At that time Boris Renski, whom you probably know, another founding member of Mirantis, in the US, came to us with the idea that we could make a public service out of this. He actually invented the term "Stackalytics": it is a combination of "OpenStack" and "analytics". We started this work probably two years ago, so at the summit in Portland we already had an internal demo, and probably a year ago we released it publicly.

Yeah, okay, so actually we started a year and a half ago and we were ready for the Portland summit.
Before we started developing the project, we set some goals for ourselves, and for us as engineers it was a very interesting and challenging task.

We wanted a tool that processes commits, reviews, and other types of contribution made to OpenStack. We wanted the statistics to be split and analyzed by release, by company, by engineer, and by project. We did not want statistics once per release or once per year, because that is a rather long time to wait before analyzing them. We wanted the data set to be refreshed several times a day, so that we could look at how our engineers work in almost real time.

The next goal for us as users was rather complicated: we wanted a very, very fast UI. We wanted it to be responsive, and we wanted every query to our API and every click in the UI to take no longer than a second. So we wanted these near-real-time statistics, with filtering by release, project, company, and engineer, to complete within a second.

We were also afraid that someone from outside our company could say, "This is your private tool, we don't trust you, we don't believe you." That is why our goal was to make it independent of our own installation, so that anyone who clones our GitHub repo and deploys it in their own environment gets the same results.
So we don't have any persistence inside stackalytics.com at all.

Just a moment, it's a bit slow.

Okay, so what is Stackalytics now? We are able to analyze several types of contribution: regular commits to our repos and, of course, lines of code; reviews, patch sets, and change requests made in Gerrit at review.openstack.org; emails, so everything sent to openstack-dev is analyzed by Stackalytics; filed and resolved bugs; drafted and completed blueprints. Every activity in Launchpad is in Stackalytics too.

We are able to filter our statistics by five dimensions, not including the metric itself: release, project type, module, company, and engineer. We have a set of reports, and we will show some of them later. We have a JSON API for those who want to grab the data from the Stackalytics back end directly and use it at home. And we are actually loved by Google: if a person searches for their name, in most cases a link to Stackalytics shows up on the first page of Google results, and that's cool.

So we grab different types of records from different sources: commits; blueprints and bugs from Launchpad; change requests, patch sets, and reviews from Gerrit; and mails from the mail archives. In Stackalytics we store all of this as a unified record. We decided not to implement an object model for every type of contribution; we just store records with some attributes. If you take a look at this record, it shows how we later filter them easily in the front end.

Initially, when we started the project, as I told you, we took an open source library that was based on MySQL. The challenge we had was that at that time there were already 25k commits in OpenStack, and processing those commits was already not very fast.
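Going back to the unified record described above, here is a minimal sketch of why flat records are easy to filter. It is only an illustration: the attribute names (`record_type`, `release`, `module`, `company`, `user_id`) and the sample values are assumptions, not the actual Stackalytics schema.

```python
from collections import Counter

# Hypothetical unified records: one flat dict per contribution,
# whatever source it came from.
RECORDS = [
    {"record_type": "commit", "release": "icehouse", "module": "nova",
     "company": "Mirantis", "user_id": "alice"},
    {"record_type": "review", "release": "icehouse", "module": "nova",
     "company": "HP", "user_id": "bob"},
    {"record_type": "commit", "release": "havana", "module": "neutron",
     "company": "Mirantis", "user_id": "alice"},
    {"record_type": "email", "release": "icehouse", "module": "nova",
     "company": "Mirantis", "user_id": "carol"},
]


def select(records, **criteria):
    """Yield records whose attributes equal every given criterion."""
    for record in records:
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record


def count_by(records, attribute):
    """Count records per value of the given attribute."""
    return Counter(record[attribute] for record in records)


icehouse_nova = list(select(RECORDS, release="icehouse", module="nova"))
per_company = count_by(icehouse_nova, "company")
```

Because every record carries the same kind of attributes, one generic filter serves commits, reviews, and emails alike, and no per-type object model is needed.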
So initially we made a proof of concept to decide which technology we would use as a back end. Our goal was to run an aggregate query, like the one you see at the top of the slide, on a million records within a second on a VM with one CPU. That was our target, and we evaluated several technologies against it. MySQL took half a minute to aggregate one million records. Mongo was even worse, more than a minute. Then we turned to memcached, and memcached took 10 seconds. So none of those technologies fit our goal. As you see, there is no technology that can meet those requirements, but we still do it. Actually, it's quite simple: we store all the data in one big data set inside the memory of the application, and then we do all the aggregation in memory in real time. Ilya will tell you more about the architecture.

Okay, so as a persistence layer, as data storage, we don't have any database. We directly use the data that is stored in git, in Gerrit at review.openstack.org, in Launchpad, and in the mail archives. We just pull all the data from these sources. Even on a cold start of a new Stackalytics deployment, it first pulls data from all the sources, parses it, converts it into records, and stores them in memcached. Memcached is used here as the single storage for all Stackalytics data.

On the front-end side we have several workers based on uWSGI and Flask. Once a worker starts, it reads the data from memcached, builds indexes in its own memory, and is able to provide the JSON API to the client side. The client side itself is really very thin: it's just a set of JavaScript libraries that show all these charts, tables, and reports.

An interesting fact about the front-end part is that we came to the point where all the data we store required about six gigs of RAM, and we made some optimizations.
It's now about two and a half or three gigs, so we are able to run two front-end workers in eight gigs of RAM. That was one of the challenges we overcame during our implementation.

Okay, so the processing is quite simple. There are actually two different parts of Stackalytics. The first is the processor. It is a single-threaded process that currently runs every four hours. It pulls changes from all sources: from git, from Gerrit, from Launchpad, and from the mail archives. For git it's pretty simple, it's just a git fetch and a git log of the difference. For Gerrit there are pretty good queries that return only the changed change requests. For Launchpad, for bugs it is possible to get only the updates; for blueprints it is not, so we just grab all of them. And for the mail archives we check the last time the archive was changed.

The processor then stores all these records in memcached, or updates existing ones, and it also stores a list of the IDs of the records that changed. The dashboard side, which is a uWSGI process, first reads the list of changed record IDs and then reads from memcached only those records that were changed. This allows us to do the update incrementally: even if the processor adds some updates to memcached, the list of changed record IDs is still short, so the dashboard side doesn't need a long time to do the update. The dashboard also updates its internal cache.

Okay, the next interesting thing is the user profile. A user profile actually consists of different parts. For a user we have an email, that's pretty simple. We have a Launchpad ID, which is available if the user does something in Launchpad.
We have a Gerrit ID, since the user can do reviews, patch sets, change requests, and so on. And we have affiliation: the company where the user works. This is actually the main know-how of Stackalytics, because we identify the affiliation for the user automatically, and nobody else can do this. What we do here is track the list of domains that appear in the emails and attribute users according to this list.

It doesn't work once again. Well.

Okay, so as Herman said, the user profile can be built automatically. From different sources and different types of contribution we get different parts of it, but it is possible to match these pieces, and there are two ways that help us. First, we can ask Launchpad for the Launchpad ID by email; it's an API call to Launchpad. If the user has a profile with all their emails listed in Launchpad, then we guarantee that their profile will be created automatically and correctly. Second, the Gerrit ID and the Launchpad ID can differ, and for most users they really are different, but we can ask Launchpad whether there is an existing user whose name is the same as the Gerrit ID. If it is the same, we match them as well. And affiliation is found by the domain in the email.

This works in most cases, but sure, there are some custom cases. If a user has more than one affiliation, so they worked at one company and then at another, they need a custom profile. It is also required if not all emails are listed in Launchpad, or if the company domain is not listed in Stackalytics yet. In all these cases users come to our default_data.json file and add their own profile. The interesting thing is that in some companies it looks like the first thing a person does when they start working on OpenStack is add themselves to our default_data.json. We observe this because Gerrit always shows a welcome message for newcomers.
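The affiliation rules described above (domain lookup by default, a custom profile when someone changed companies) can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the domain table, the `CUSTOM_PROFILES` layout, the `*independent` fallback label, and both function names are hypothetical.

```python
from datetime import date

# Hypothetical domain list; the real mapping is maintained in the
# project's default data file.
DOMAIN_TO_COMPANY = {
    "mirantis.com": "Mirantis",
    "hp.com": "HP",
}

# Hypothetical custom profiles: (last day at the company, company);
# None marks the current employer.
CUSTOM_PROFILES = {
    "alice": [("2013-12-31", "HP"), (None, "Mirantis")],
}


def company_by_email(email, default="*independent"):
    """Guess affiliation from the email domain, trying parent domains too."""
    parts = email.rsplit("@", 1)[-1].lower().split(".")
    for start in range(len(parts) - 1):
        candidate = ".".join(parts[start:])  # "eng.hp.com", then "hp.com"
        if candidate in DOMAIN_TO_COMPANY:
            return DOMAIN_TO_COMPANY[candidate]
    return default


def company_for(user_id, email, on_date):
    """Prefer a custom profile's dated history, else fall back to the domain."""
    for end_date, company in CUSTOM_PROFILES.get(user_id, []):
        if end_date is None or on_date <= date.fromisoformat(end_date):
            return company
    return company_by_email(email)
```

For example, `company_for("alice", "alice@gmail.com", date(2013, 6, 1))` resolves through the custom history, while a user without a custom profile is attributed purely by email domain.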
Ilya is talking about HP, actually. And we use the standard community process for approving all changes to this JSON file, so anybody can come, see those changes, and say yes or no to a change.

We also use project classification, and we do this in the standard way. For everything under the OpenStack project type, it's based on the official programs YAML from the governance project. We can say that we are one of... well, we are the only user, and we are very motivated to help governance adopt the new format of this file, just to make it easier for us to show the correct data. We also correctly track the integrated and incubated status of projects. For example, Sahara, which was integrated in Juno, will not be shown as integrated during Icehouse; in Icehouse it is incubated. And under StackForge we show everything that is under StackForge. For module groups we also support official programs, as listed in the official YAML file, and it's possible to define them manually as well.

Okay, Stackalytics numbers. As I told you, when we started there were about 20k commits in OpenStack, and we made a projection that in five years it would be one million records. Because of this, the proof of concept and the checks we made initially were based on the assumption that it would be only one million. As you see, right now we already have about half a million records in OpenStack itself and more than a million and a half in StackForge, so in total it's about two million records. No, no, with StackForge it's one million and a half in total. Okay, one and a half. It was one million probably four months ago, when we made one of the releases, and I wrote about this in an email. And we can still do all the queries within a second, though some of the records are really heavy.

Okay, about reports. We have a number of reports. The first report is the contribution summary report.
It's actually very useful, and it's used, for example, when a person wants to become a core engineer, when they are proposed as a core. This report shows a summary of the person's contribution to the project over the last month or the last three months. The main metric here is the number of reviews: the community likes it when a person does more reviews, because it means the person knows the project better. Another important metric is the disagreement ratio. When a person votes, for example gives +1 to some patch set, and a core then also gives +1, it means they are in agreement and the person is doing a proper job, not just giving +1 to everyone. It also shows commits and mailing activity, so it's an integral report on the person's life in the project.

We also have an interesting thing called the activity report. It shows the contribution made by a person or by a company as a stream, like a Twitter stream. Google loves this because it's a static page and can be crawled. It also shows a nice punch card, which is useful when you need to find out when a person works and when a person sleeps. For example, we know some people, even from Mirantis, who work almost 24 hours a day, but usually guys work eight hours a day. And yeah, it's usually on the first page when you google a name.

We also invented some metrics that you can find only in Stackalytics. Probably half a year ago we had a discussion with Monty Taylor about the status of active contributor. The concern was that right now we give the status to anyone who has made at least one commit, and everybody knows that a lot of people make just one commit to become an active contributor and go to the summit, for example. That's why we decided to find a metric that shows how much time a person spent contributing to the community, and we named this metric person-day effort.
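A minimal sketch of how a person-day metric could be computed, assuming it counts, per person, the distinct calendar days with at least one record of any type. The record fields (`user_id`, `date` as a Unix timestamp) and the choice of UTC day boundaries are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Made-up records; "date" is a Unix timestamp in seconds.
SAMPLE = [
    {"user_id": "alice", "record_type": "review", "date": 1400000000},
    {"user_id": "alice", "record_type": "commit", "date": 1400003600},  # same day
    {"user_id": "alice", "record_type": "email", "date": 1400086400},   # next day
    {"user_id": "bob", "record_type": "commit", "date": 1400000000},
]


def person_days(records):
    """Map each user to the number of distinct UTC days with any activity."""
    active_days = defaultdict(set)
    for record in records:
        day = datetime.fromtimestamp(record["date"], tz=timezone.utc).date()
        active_days[record["user_id"]].add(day)
    return {user: len(days) for user, days in active_days.items()}


effort = person_days(SAMPLE)
```

Two contributions on the same day count once, so a burst of commits in one afternoon weighs the same as a single review that day.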
The metric counts on how many days a person did at least something: made a review, submitted a patch, or even sent a mail to the mailing list. What we found is that in Juno there were three and a half thousand different people who made some contribution, only one thousand worked more than two weeks, and, most interestingly, only two hundred engineers worked full-time. Probably this is the right metric for giving the status of active contributor. We plan to discuss these issues with the board and make some decisions on this. Here you can see the contribution in Havana; in comparison, in Juno it grew by 50%.

Yeah, and among these two hundred engineers who work full-time, most actually work more than full-time, and there are some who work almost every day. By full-time we mean that within half a year they have at least 90 days of work. The standard would be about one hundred and ten, minus vacation, minus whatever, so we took 90 working days during the half year as the cutoff.

Okay, and besides contribution analysis we have some complementary projects. The first one is DriverLog. With DriverLog we wanted to have a central list of all drivers for all projects that exist in OpenStack. We wanted not just to list them, but also to know which versions these drivers work with. We wanted the ability to verify that a driver really has an external CI and that this CI actually works. And we wanted a standard way for vendors to update and maintain this list. We started this initiative at the last summit, and today we have drivers from Neutron, Nova, Cinder, and Sahara, and recently the Ironic team joined us; they actually did it on their own, without any help from us. Another interesting thing is that DriverLog's data is used in the Marketplace at openstack.org.
It's a single page where we show all the drivers that exist for the current release of OpenStack. We intended to propose this tool to the Foundation as a platform for a certification program, but it's all up to the Foundation. It's pretty straightforward: we have three filters and all the data in a table.

We also have a member directory. Everyone who joins the OpenStack community needs to register at openstack.org, and at openstack.org there is a directory that shows user profiles with avatars. But it's very hard to analyze, because it is just a list of all users, which is, well, useless. So we decided to crawl all this data and store it inside Stackalytics. Now we are able to process it and list all the members registered there. We can filter them. We are definitely interested, for example, in users who joined during the last week, because they are newcomers, and probably they are from companies that have only just joined. For example, at some point we found that Yandex, which is the largest search engine in Russia, had joined us, and we were surprised.

Some interesting things about this: there is a moment approximately three months before the summit when registration increases threefold, and in this time frame, for example, it brings 200 new companies. Over all time we have 17k members from almost 4,000 companies. Okay, this is how it looks. You can browse individual members and companies.

Another way we use Stackalytics is to set the metrics that our developers should achieve. We have a distributed team and a lot of interns, so it's pretty hard to track all their progress. That's why we decided to make a tool that helps our management set measurable goals for our engineers and check whether they achieve those goals or not.
So we came to the point where the manager writes some kind of YAML like the one shown on the screen. Actually, we started with something different: a manager wrote a document in Google Docs, then an engineer wrote some JavaScript in Stackalytics and got the result, so it was a very hard and long process. So we decided to make it very easy for top management to write this document. It's just English text with a couple of dashes in front, so it's a human-readable form. You can see a self-evident structure there: within some release cycle, within some project, some engineers should make some kind of effort. Here we just set a goal for Dina Belova, who is a core engineer in Ceilometer. Everyone loves her, but top management wants some statistics on her as well. For example, for this release cycle there was a goal to become a core engineer in Ceilometer, and right now this goal is achieved, and you will see the mark in the report that it's done.

The report itself looks like this. The manager creates the Google doc and just puts the URL of that Google doc into the report in Stackalytics, and it automatically shows the marks, green or red, as you see on the right. So it's pretty straightforward tracking for each of the goals. We usually do this at half-year reviews, when the manager sets new goals, though not only in this form; this is the tooling that we use internally.

Before we give some final links,
I just wanted to say a couple more words about the custom reports that we make for our engineering management. We don't integrate all of those reports into Stackalytics, because it's much easier to make a Google Docs spreadsheet that queries the Stackalytics API and, on top of that, just shows a table and some fancy charts. For example, this is how we implemented statistics on the number of cores in each company. Can you switch to another tab? So it's pretty easy to track how many core reviewers each company has and how many core reviewers we really have at Mirantis. The trick is that you don't need to make any changes in Stackalytics itself; all the manipulation of the data is done within the Google doc.

Well, it looks like it's a problem to bring this page to the screen. Just a moment, I'll try to bring up all the pages. Okay, so that's how it looks: the JSON API just returns some JSON, which, well, only Herman knows how to process in a Google spreadsheet. But it works.

Okay, can you scroll to the right? Yes, trying to scroll to the right, just a moment. We passed the point here. At the left, in column H, you will see the list of companies, the next column shows the number of core reviewers, and after that comes the list of the reviewers themselves. For example, you can see that Mirantis is number four by number of reviewers. And if you go a little bit lower, you will see the chart. Just a moment, we need that.
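The spreadsheet itself is not shown in the transcript, but the same idea (query the JSON API, then shape the result outside Stackalytics) can be sketched in a few lines. Everything specific here is an assumption for illustration: the host, the `/stats/companies` path, the `release` and `metric` parameters, and the `{"stats": [...]}` response shape. The payload is made up so the example runs without a network call.

```python
import json
from urllib.parse import urlencode

# Hypothetical host and path; check the deployment you are querying.
BASE_URL = "https://stackalytics.example/api/1.0"


def stats_url(kind, **filters):
    """Build a query URL with filters in a stable (sorted) order."""
    return f"{BASE_URL}/stats/{kind}?{urlencode(sorted(filters.items()))}"


# Made-up payload in the assumed response shape.
SAMPLE_PAYLOAD = """
{"stats": [
  {"id": "hp", "name": "HP", "metric": 3},
  {"id": "mirantis", "name": "Mirantis", "metric": 4}
]}
"""


def to_rows(payload):
    """Turn the JSON payload into (name, value) rows, largest value first."""
    stats = json.loads(payload)["stats"]
    return sorted(((item["name"], item["metric"]) for item in stats),
                  key=lambda row: row[1], reverse=True)


url = stats_url("companies", release="juno", metric="reviews")
rows = to_rows(SAMPLE_PAYLOAD)
```

The rows can then go into a table or chart in whatever tool the manager prefers, which is the point made in the talk: the report logic lives outside the service.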
So it's pretty simple for management to make some high-level reports without any changes in Stackalytics itself, using only Google Docs.

Okay, and the links. Just a moment. Okay, I'll keep it this way. So we have all the documentation in the wiki at wiki.openstack.org. We have instructions on how to add yourself to the user profiles; they are available when you click next to your gravatar in Stackalytics. Another nice thing is that Stackalytics now processes not only OpenStack contributions: there is also a fork for the OpenDaylight community, and they use the same code for processing their own contributions. Well, it's great.

Well, if you have any questions, this is the time for the Q&A session. But it looks like we have fewer than 20 people in this audience, and Stackalytics is not the hardest topic at this summit. Yes, Pasha, please go ahead if you have any questions.

Well, actually we use it only internally and we never planned to publish it for public use, but it's available in Stackalytics right now. It has been available for three or four months already. It's just a secret URL: slash reports.

Yep, inside Mirantis.
Actually, I use it for tracking my interns. I have about 15 interns, and I use this tool just to track their progress.

Probably you set the filter to OpenStack, but you need to switch to StackForge; after that it automatically grabs all the data from Launchpad, and all the bugs should be there. Let's double-check this offline, because it should work 100%. Any other questions?

Yep. Actually, it's enough to have only an email and you will have a profile inside Stackalytics. The most complicated part is how we merge all those profiles. If you have separate activity in a mail thread and you have a review in Gerrit, we can send a query, using the email, to get the Gerrit ID and all the information that you have in Launchpad. But in some cases it doesn't work, for example if a person used a personal email and after that a corporate email. In this case the user needs to create a profile manually in our configuration file, default_data.json, and write down there the list of all the emails they use, plus the Gerrit ID and Launchpad ID if they used different IDs in different systems. After that all the data will be merged into one profile. Before that, if Stackalytics can't automatically detect that this is one person, it will not merge the data. There were probably four or five cases that were not handled automatically.

But just as Ilya said, there are onboarding procedures in some companies, like HP: when newcomers come into the community, the first commit they make is a commit to Stackalytics, to default_data.json.

Yeah, let me comment on this. Before, it was commits, but probably half a year ago Monty Taylor came to us, and his proposal was to use reviews.
There was a long discussion around that, and finally we agreed that reviews are a much more coherent way to measure contribution. That's why we changed the default metric from commits to reviews probably half a year ago. Actually, this is a configurable parameter in our configuration file; not in default_data.json, there is another configuration file used for configuring the application itself. So another deployment can have some other default metric, and being the default doesn't change anything.

Actually, from OpenStack we use only the programs YAML, and the rest is applicable to any other open source project that uses the same tools as OpenStack, like git and Gerrit. Like OpenDaylight. Yep, OpenDaylight is a good example, because it is a Java project and it is tracked absolutely fine with the same tools, so they didn't make any changes to our code.

All the code that we have in Stackalytics was written by Ilya, and we did it in pair programming, so I was sitting right beside him. Some small pieces of code were implemented by our interns. So the two main contributors are here on the stage.

Yep, right now we run Stackalytics in production on a VM with eight gigs of RAM and two virtual CPUs, and this is enough. Moreover, you can run it on four gigs of RAM if you want. So the minimal requirement for Stackalytics is four gigs of RAM and one virtual CPU; that will be enough.

It's all stored in memory, in memcached, because persistently all this data is already in git and Gerrit. All we do is retrieve the data and store it in memcached, and after that all the front ends just synchronize the data from memcached. And you will need at least two gigs of storage to store all the git history. Yep, the hard drive should be at least two gigs, just to store all the repos that we retrieve from OpenStack and StackForge.
Well, it looks like it's possible to run Stackalytics on an Android phone.

Okay, if there are no other questions, that's it. Let's go and visit us online. Yep. Thank you.