Good afternoon, everyone. Thank you for the opportunity to speak at OSPOCon Japan, and thank you for participating. We'd like to talk about a data analysis approach for contribution-centric open source strategy planning. Let us introduce ourselves.

My name is Kazumi Sato. I am a Distinguished Engineer at Sony Group Corporation, and I have been working on Linux-based system software for various Sony products. I have also been working on OSS compliance and on relationships with the community at Sony. Since 2002, when Sony started to use Linux, I have been leading system software development using Linux, introducing it into products, and complying with open source licenses. I am a member of the software strategy committee at Sony Group Corporation. Thank you.

My name is Hiro Fukuchi. I'm a senior alliance manager at Sony. I work on OSS compliance and on building relationships with the OSS community. I'm the leader of the planning subgroup of OpenChain Japan, and I'm also a volunteer translator, English to Japanese, for documents such as those of OpenChain and SPDX.

We have two collaborators, and they are in this room. In the Q&A session, we will invite them to answer your questions. We will introduce them later.

This is today's agenda. First, we will explain the OSPO, especially the OSPO maturity model. Today, this is the third time it has been shown. Then we will explain the OSPO challenge for us in the contribution phase, especially how to plan a strategy for contribution. Then we will introduce the data analysis method, which we think can solve these OSPO challenges, and explain the approach and our proposed method. Then we will show brief results of applying this method, with graphs and some considerations. And we will end this session with a short conclusion.

First, the OSPO. This is the third time today. It is more popular than yesterday.
So this is the OSPO maturity model. This is a very good model created by the TODO Group. I think this model gives insight into the OSPO journey and how to grow to the next step. With this picture, we can look ahead to what the next step is and what the issues will be in the future. The model shows that we first start with adopting open source. Then we move on to the compliance phase, where we build a structured compliance system in the organization. Then we move on to the next stage, the area marked with the red line. We focus on two areas: one is adopting open source, and the other is the community relationship challenge. These two are very different. So if you are at the contribution phase, you may face a new challenge that is very different from your past experience. So we need to consider a strategy. The red-lined area contains community relationships, engagement, and strategy leading.

So we want an open source contribution strategy. What is a strategy? In this session, I mean choosing among many options: fostering your own project, contributing to a neutral organization, contributing to other projects, and so on. That is the kind of strategy we are thinking about. To plan it, we need data: technology trends, popular open source projects, and also our own position. It is very hard to know our position in the open source world. The open source world is very, very big, so it is very hard to know where you stand.

So we thought about a data analysis method. In this session, data analysis means GitHub commit log analysis. We limited ourselves to this area. Why do we use the GitHub commit log? Because most open source projects are hosted on GitHub, and even those that are not hosted on GitHub may have a mirror there. So we can observe contribution activity on GitHub. That is why we use the GitHub commit log. The basic idea is the commit log plus mail address domain filtering.
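
As an editorial illustration of the "commit log plus mail address domain filtering" idea, here is a minimal Python sketch. It is not the speakers' tooling: the domain table, the record layout, and the function names `org_for_email` and `commits_by_org_and_repo` are all hypothetical, and the sample commits are made up purely to show the mechanics.

```python
from collections import Counter

# Hypothetical domain-to-organization table (assumption for illustration;
# a real company list would be much larger and maintained separately).
ORG_DOMAINS = {
    "example-corp.com": "ExampleCorp",
    "sample-semi.co.jp": "SampleSemi",
}

def org_for_email(email):
    """Return the organization for an author email, or None if unlisted."""
    domain = email.rsplit("@", 1)[-1].lower()
    for known, org in ORG_DOMAINS.items():
        # Match the listed domain itself and any of its subdomains.
        if domain == known or domain.endswith("." + known):
            return org
    return None

def commits_by_org_and_repo(commits):
    """Count commits per (organization, repository) pair.

    Commits whose author email is not on the list are skipped.
    """
    counts = Counter()
    for commit in commits:
        org = org_for_email(commit["email"])
        if org is not None:
            counts[(org, commit["repo"])] += 1
    return counts

# Made-up sample records, only to exercise the functions.
commits = [
    {"email": "alice@example-corp.com", "repo": "org-x/project-a"},
    {"email": "bob@mail.example-corp.com", "repo": "org-x/project-a"},
    {"email": "carol@sample-semi.co.jp", "repo": "org-y/project-b"},
    {"email": "dave@private.example", "repo": "org-y/project-b"},
]
result = commits_by_org_and_repo(commits)
```

Aggregating by `(organization, repository)` keeps both views available: summing over repositories gives an organization's activity, and summing over organizations gives a project's popularity.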
This gives you the organizations' behavior: which projects they contribute to, and who contributes to them. So it is very nice. As a first analysis, by aggregating these data, you can see the popular projects that collected many contributions, or the technology areas where many organizations gather. This is very simple, but very powerful.

Our approach is very simple. We use OSCI, the Open Source Contributor Index. It is based on open data and uses open source tools, so everyone can do the same. It is very nice, and we can collaborate in the open source way. As for the observation point, we focus on the company or organization, not the project. For example, the CHAOSS project gives many good statistics on a project, such as whether it is healthy or not. We, instead, focus on organizational behavior. That's the point, because organizations make the trends.

OSCI is developed by EPAM, the company in Europe. It is an index of each organization's active contributors. An active contributor means someone with 10 or more commits per year. So it is very nice for seeing organizations' activities. The data is the commit log, and the tools are open, so the approach is the same as what we had in mind.

This is a chart of the OSCI calculation. First, the GitHub commit log is downloaded and filtered by email address. The company list is also limited, so that we can reduce the number of records. These tools and lists are open source, maintained by EPAM, so everyone can join and collaborate, and everyone can run it themselves. It is very nice. Then we calculate the OSCI, and you get this kind of data. The top leading companies have been growing their contributions recently. We can see an overview of their contributions.

OSCI is very nice, but we want further details of the contributions. So we thought about further analysis.
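
The "active contributor" count described above (10 or more commits per year) can be sketched as follows. This is an illustrative Python sketch, not the OSCI implementation; the input layout, the function name `active_contributors_per_org`, and the sample data are assumptions.

```python
from collections import Counter

# Threshold as stated in the talk: 10 or more commits per year.
ACTIVE_THRESHOLD = 10

def active_contributors_per_org(commits):
    """Count active contributors per organization.

    `commits` is an iterable of (org, author_email) pairs, one per commit,
    assumed to be already restricted to a single year and to listed
    organizations.
    """
    commits_per_author = Counter(commits)
    active = Counter()
    for (org, _author), n_commits in commits_per_author.items():
        if n_commits >= ACTIVE_THRESHOLD:
            active[org] += 1
    return active

# Made-up sample: one active and one occasional author at OrgA,
# one author exactly at the threshold at OrgB.
sample = (
    [("OrgA", "a@orga.example")] * 12
    + [("OrgA", "b@orga.example")] * 3
    + [("OrgB", "c@orgb.example")] * 10
)
active = active_contributors_per_org(sample)
```

Counting people rather than commits keeps a single very prolific author from dominating an organization's score.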
For example, which projects they contribute to, the technology areas, and the ratio of incoming and outgoing contributions. The ratio is very important for us. Should we foster our own project? Or should we join other open source projects? That is the point. So we added this red part. At this stage, the company list has 300 companies. That is much smaller than the whole of the GitHub data, but it is still big. So we reduced the company list to around 10 companies. We picked leading companies across industries, such as cloud vendors, semiconductor companies, IT vendors, and media companies. So we can observe trends among these leading companies, and also the differences in strategy among them. That is our way.

OK. We'd like to talk about our application. In applying the proposed method, we analyzed the total contribution by company. This shows the strength of a company's contribution activity, and it can be observed using OSCI. Then we categorized contributions. One category is inner contribution, which is contribution to the company's own organization. The other is outer contribution, which is contribution to an outside organization: to a project under another company's governance, or to a project under neutral governance. With this analysis, we can find the key players in a focused technology field.

This page shows the categorization. We analyzed where an organization contributed. If a company contributes to its own projects, we categorize that as inner contribution. If a company contributes outside its organization, we categorize that as outer contribution. And we analyzed the rate between inner contribution and outer contribution.

This page shows the result by contributors. The X-axis is total contributors, and the Y-axis is the rate of outer contributors among total contributors. So the further right on the X-axis, the stronger the organization's contribution activity.
On the upper side, the company is strong in outer contribution, and on the lower side, it does more inner contribution.

We also analyzed the contributors received from outside. If other companies contribute to company A's projects, we categorize them as contributors from outside. This rate between inner and outer contributors shows how successful the company is in fostering its projects. This page shows how organizations receive contributions from outside. There are many contributions from outside. We displayed some projects receiving outer contributions.

We also analyzed activity in projects of neutral organizations. This shows how companies utilize neutral projects. This chart shows contributions to Linux Foundation projects. Leading companies contribute to Linux Foundation projects. So we suppose contributing to Linux Foundation projects is very good, and we should utilize Linux Foundation projects. This chart shows the technology areas. It shows that cloud-related projects have many contributors from other companies. Many organizations contribute in cloud areas. This page shows the Apache Software Foundation case. This chart shows the difference from the Linux Foundation case. This page shows the Eclipse Foundation case. This chart shows the difference between the Linux Foundation and the Eclipse Foundation.

Let's talk about brief observations and considerations. Contribution patterns differ. Some companies are inner-centric: most contributions go to their own projects, and the organization may develop its software projects as open source. Some companies are outer-centric: more than half of their contributions go to outer projects, and the organization may depend on open source ecosystems. As for receiving outer contributions, the receiving rate is observed to be 2% to 15% among leading organizations. So fostering an open source project is very tough. Leading companies can get outer contributors, but some projects got only 2%. So a 5% or 10% outer contribution is successful, I think.
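
To make the inner/outer rates discussed above concrete, here is a small Python sketch. The talk does not give exact formulas, so the definitions below are assumptions: the outer rate is taken as the share of an organization's contributors who contribute to at least one project owned by someone else, and the receiving rate as the share of contributors to an organization's projects who come from other listed organizations. The record layout, function names, and sample data are hypothetical.

```python
def outer_rate(records, org):
    """Share of `org`'s contributors with at least one outer contribution.

    `records` holds (contributor, contributor_org, project_owner) tuples.
    """
    mine = {c for c, c_org, _owner in records if c_org == org}
    outer = {
        c for c, c_org, owner in records
        if c_org == org and owner != org
    }
    return len(outer) / len(mine) if mine else 0.0

def receiving_rate(records, owner_org):
    """Share of contributors to `owner_org`'s projects who are outsiders."""
    everyone = {c for c, _c_org, owner in records if owner == owner_org}
    outsiders = {
        c for c, c_org, owner in records
        if owner == owner_org and c_org != owner_org
    }
    return len(outsiders) / len(everyone) if everyone else 0.0

# Made-up sample records, only to exercise the functions.
records = [
    ("a1", "OrgA", "OrgA"),               # inner contribution
    ("a2", "OrgA", "OrgA"),               # inner contribution
    ("a2", "OrgA", "NeutralFoundation"),  # outer, neutral governance
    ("a3", "OrgA", "OrgB"),               # outer, another company
    ("b1", "OrgB", "OrgB"),               # inner contribution
    ("b2", "OrgB", "OrgA"),               # outer; received by OrgA
]
outer_a = outer_rate(records, "OrgA")          # 2 of 3 contributors
receiving_a = receiving_rate(records, "OrgA")  # 1 of 3 contributors
```

Under these assumed definitions, an organization's position on the chart would be its total contributor count on the X-axis and `outer_rate` on the Y-axis, while `receiving_rate` corresponds to the 2% to 15% figures mentioned for leading organizations.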
As for neutral open source projects, some organizations utilize neutral open source projects. Neutral open source may be a good place to collaborate with each other.

Let's talk about the open source tools. The filter, the company list, and the download tool are developed and open sourced by EPAM. So we, or you, can try this method. As for community collaboration, several issues with the tools were discussed. The issues were communicated to the community on GitHub, and fixes for the issues were created and contributed by the community and Sony.

We have some limitations. The accuracy of the analysis is affected by some tools. The analysis period is limited: our presentation is a snapshot of January this year. The organization list used for filtering is limited by our interest. There are many bots, so we ignored a lot of data. The organization list uses corporate domain addresses, but not all contributors use a corporate mail address; some use a private address. We didn't count individual contributors. The selection list does not cover all organizations. And our analysis only uses the GitHub commit log. As you know, there are important projects outside of GitHub, such as the Linux kernel.

Our conclusion: we introduced an analysis approach using OSCI and commit log analysis. We showed some application results: inner contribution, outer contribution, and some differences in company strategy. The tools are open source. Collaboration is welcome. That's all. Thank you very much. If you have any questions, please feel free to ask us.

Before the Q&A session, we introduce our colleagues to guide Arlene and Kuwata-san. Please stand up. They did very well in making the tools and doing the analysis. So thank you, and now the four of us can answer your questions. Please, if you have questions or comments, let us know.

Yeah, thank you for the nice presentation.
So my question is, when you try to measure contribution, do you measure by the number of lines of source code, the number of commits, or the number of contributors? I'm a bit confused about that.

Okay. In this chart, we analyze by contributors. We also analyzed by commits, but in many cases the figures are almost the same. So in this presentation, we use the contributors view.

Thank you very much. Another question is about technique. When you use GitHub to track the commit log, and you rename a source file or move it to another directory, usually the commit log breaks. So I would like to know how you deal with this issue.

Thank you for the question. Actually, we analyze day by day from the start day. Let's say you have a file in directory A at the beginning of the month. You will have the history from the beginning of the month for that file. Then on the 10th of the month, it is moved to a new folder. So for the first 10 days, you have the old history, and then you have the new history as a continuation. Yeah. So thank you very much. Yeah. If the file was moved three years ago, then yes, we miss it. It's a limitation of the tool.

Thank you very much. Thank you for the question. Any other questions?

So first of all, very nice presentation. My question is whether you have taken any actions or made decisions, where you say, okay, we now explicitly approach this company, or we explicitly take this project to investigate further, or whether it is first of all more of a theoretical analysis to get a better understanding.

This is our trial, a PoC, and we convinced ourselves that this method works well. The next step is to scale the period. This is just one month. We want to scale to several years and see the transition of contributors' or organizations' activities. Then we may do analysis for our business units, and they can plan their business strategy using this open source data. Thank you.

Thank you.
So my question is: this is mostly for strategic contribution, but are you also using that data, or will you be using that data, when you consume strategically? I mean, Sony is a very large company. Will you try to get Sony to use the same open source components in all business units because you like this data for a certain project, or how will this be used?

I personally think that this is early-stage, work-in-progress data analysis, and we will probably decide later, after we see how things look. Thank you.

I work at Sony, so I'm going to answer that a little bit too. One of the first things I did looking at this data was to see where the contributions were. As I said, we are doing this in the early stages, but we hope eventually to be able to use this for strategic planning, as you described, to find out which projects Sony is involved with, or maybe should get involved with, that will be beneficial for our product lines. Thank you.

And I think the first thing to do is to move along the X-axis, and then you should consider upper or lower. If your number of contributors is very small, the first thing is to encourage engineers to contribute to the world. Thank you.

So how many companies are in your organization list? I thought it was an organization list, so I guess here.

Yes, yes. There are two lists. One is here. It is maintained by EPAM and contains 300 companies. Then we select several leading companies, around 10. We can increase that, but we need more computing power. Because our computing power is not sufficient, we limited it to around 10 companies, but we can do more.

Okay, thank you very much. And another question, maybe. There are two types of organization: one is inner-centric, and the other is outer-centric. Can you share more information about these two types? What is the difference, and which one is better, in your opinion?
We are neutral. They each have a business strategy, so it depends on the business strategy. For example, we found that semiconductor companies are relatively in the upper part, because they need their hardware platforms to work with many software platforms, like the Linux kernel or other open source platforms. So they want to connect to such platforms. On the other hand, there are companies like Unity or Elastic. They have a very strong software core business. So they are perhaps, I don't know, but perhaps they are inner-centric. But it depends on the business strategy, so we cannot say which is better or worse.

Okay, thank you very much.

Okay, thank you very much.