Hello, everyone. My name is Shino Iwami, from NEC Corporation. Today I will talk about data science for OSPO: OSS trends derived from developers' activities.

First, I will talk about the background. Why does an OSPO use data science? This figure shows the stages of an open source program office, and it was made by the TODO Group. At the first stage, stage 0, there are people in the organization who use OSS. [unintelligible] an OSPO is adopted. This is stage 1. Of course, organizations go on using open source up to stage 4. [unintelligible]

Now, from the perspective of technology management, there are evidence-based policy making and data-driven management. [unintelligible] If you pay attention to the shortcomings of evidence, there are many useful points, such as being able to make automatic judgments based on conditions and making work more efficient.

In the previous slide, I explained technology management. In this slide, let's look at the position of technology management. The upper right corner shows the sections of AOM, an academic society related to management. Among them, technology and innovation management is a section that should also incorporate ideas of technology management generally available to engineering and other sciences. Meanwhile, the left side shows the sections of computer science from ACM. As you know, the open source model is one field of computer science, and an OSPO is located at the intersection of technology management and computer science. So data science for OSPO must use methodologies from technology management.

The left figure is an analysis of robotics, in the field of science, using clustering and other data science methodologies. In the OSS field, a large project is the CHAOSS project, which has run since 2017, and it held a hackathon at the international conference MSR in 2022. MSR is short for Mining Software Repositories. Lethert published a network analysis of an in-company OSS community in 2019. As shown in the center figure, NEC published an analysis using a co-occurring keyword network to give an overview of the fields of OSS. I will explain it again later.

In order to get hints from the
existing analyses, next I compare the histories of science and OSS. In science, metrics started with making indicators and then moved on to overviewing, detecting emerging technologies, and forecasting. Forecasting is for the continuous improvement of technologies, and distributed innovations are expected to jump up from forecasting. The metrics of OSS are rapidly catching up with the metrics of science. The causes are improvements in hardware capabilities and the popularity of machine learning. Now, I think OSS is at the stage of overviewing and detecting emerging technologies, this one and this one.

At NEC, several analyses have been prepared to capture the status of OSS. Today, I introduce three of them.

The first analysis is OSS categorization by clustering co-occurring keyword networks. At OSSJ 2022 last year, I introduced a clustering methodology using the Louvain algorithm. In this session, I will introduce the result of using the Infomap algorithm. The explanation on this slide may be too detailed, so to explain the methodology simply: a network is built, and a clustering algorithm is applied. Original ideas are incorporated into the network building, and different algorithms can be applied. When selecting an algorithm, I am careful to select one that automatically sets the number of clusters.

This is the result from last November using the Infomap algorithm. The data comes from active OSS that has commit records on GitHub between 2020 and 2022. The largest cluster is about JavaScript and related languages. The second cluster is about machine learning. The third cluster, the green one, is about cloud computing, such as Kubernetes and Docker. The result continues for about 200 clusters.

Next, looking at the rank shift, the yellow cluster about Java and microservices and the orange cluster about databases are the two growing clusters. The red cluster about C++ and games seems to be declining.

It is normal to look at the top clusters, so let's check the lower-rank clusters. We can find current, temporary topics such as COVID-19 here, and log4j. Another case is quantum computing and drones here. Maybe a future
big market. An interesting cluster is this one. This is medical OSS, such as bioinformatics and genome sequencing. Using data science, we can gain human-understandable information like this.

Next, the differences between the two algorithms were revealed. This year uses Infomap, and last year used Louvain. The differences between these two algorithms affect the size and number of clusters. So, when you look at smaller clusters, the Infomap algorithm may be better. Among smaller clusters, we can find emerging OSS fields.

The second analysis is about OSS shifts based on developer movements. For this analysis, I will explain substitute goods and complementary goods from economics. Substitute goods have a relation like that between tea and coffee. When one is consumed, the other is not. Complementary goods have a relation like that between bread and butter. When one is consumed, the other is also consumed. I think these relations can be applied to the analysis of OSS. When some OSS are substitutes, developers will shift from one to another. When some OSS are complementary, developers will actively develop both at the same time.

So this analysis calculated the correlation coefficient between the timelines of the number of commits of two repositories. At the same time, the share rate of developers between the two OSS repositories was checked. I think a higher share means a stronger causal relationship.

So, this is the result. As an example, the case of PyTorch is shown. This is a part of the result between PyTorch and other repositories. As an interesting result, when the top five with a correlation coefficient of 0.4 or higher are checked in descending order of share rate, TensorFlow is a complementary OSS to PyTorch. Next, when the top five with a correlation coefficient of 0.4 or lower are checked in descending order of the share rate of developers, all the OSS repositories related to Meta and Torch are substitutes for PyTorch. Most of the results are
reasonable. However, when I analyzed Docker and containerd, a shift from Docker to containerd was detected. Although I had not known about the shift from Docker to containerd, data science gave us that awareness without field knowledge.

The third analysis is about key developers and their specialties. In this analysis, the methodology uses a network again. This network comes from the following and follower relations of accounts on GitHub. On the GitHub web pages, there is data on followers and following, and we can get these data via the GitHub API. In this analysis, I use degree centrality and betweenness centrality, mainly degree centrality. A higher degree centrality score means that the account collects interest from many people. Meanwhile, a higher betweenness centrality means that the account acts as a bridge between groups.

This is the methodology. The analysis computes centrality after building a network. The network in this analysis consists of following and follower relations. The more data there is, the longer the computation takes. Therefore, I wanted conditions that reduce the amount of data.

This table is a result. While searching for those conditions, Linus Torvalds is located at the higher ranks, here and here, like this. From this result, it means he collects interest from many people. This condition seems to be a good one, because this area does not include Torvalds and other top accounts.

This figure comes from another analysis. Compared with this analysis, these are the two accounts highlighted in the previous slide. [unintelligible] They are React and Vue developers. Their projects have been going on for a long time and have many subscribers.

Considering why this condition is good: firstly, accounts with few followers are considered beginners. Beginners are likely to follow important people in the community. Based on this result and discussion, this condition reduces the amount of data to compute and is regarded as useful. Data
science can utilize the collective intelligence of a lot of people.

This is the latest result, and Torvalds is number one. The yellow cells are famous organizations: Microsoft, Facebook, Google. OpenTof is here.

Lastly, there is a side finding of this analysis. While investigating the top accounts in the result, I noticed the existence of GitHub sponsorship and independent developers. From this, I sensed a new work style and found that it is becoming easier to work independently.

That's all. Thank you very much.
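
The second analysis in the talk, which correlates the commit timelines of two repositories and checks how many developers they share, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the speaker's implementation: the function names, the sample data, and the Jaccard-style definition of the share rate are all hypothetical, and treating a correlation of -0.4 or lower as a substitute candidate is an assumption, since the talk only says "0.4 or lower".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def share_rate(devs_a, devs_b):
    """Assumed definition: shared committers divided by all committers."""
    a, b = set(devs_a), set(devs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relation(commits_a, commits_b, threshold=0.4):
    """Label a repository pair from its commit-timeline correlation.

    Assumption: r >= threshold suggests complementary OSS and
    r <= -threshold suggests substitute OSS; anything else is unclear.
    """
    r = pearson(commits_a, commits_b)
    if r >= threshold:
        return "complementary candidate"
    if r <= -threshold:
        return "substitute candidate"
    return "unclear"


# Hypothetical monthly commit counts for three repositories.
repo_a = [10, 20, 30, 40, 50]
repo_b = [12, 18, 33, 41, 48]   # rises together with repo_a
repo_c = [50, 40, 30, 20, 10]   # falls while repo_a rises

# Hypothetical committer sets for two of the repositories.
devs_a = {"alice", "bob", "carol"}
devs_c = {"bob", "carol", "dave"}

if __name__ == "__main__":
    print(relation(repo_a, repo_b), relation(repo_a, repo_c))
    print(share_rate(devs_a, devs_c))
```

In practice the pairs would then be sorted by share rate in descending order, as the talk describes, so that pairs with many shared developers are inspected first.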