Yeah, this is the second session from the end of this summit. Thank you for being here; I appreciate your time and attention. Today I'll be talking about "From SBOM to Call Graphs: Harnessing OSS Tools to Streamline Change Impact Analysis in Cloud Services." Let me introduce myself. My name is Noboru Iwamatsu, and I've been working for Fujitsu for over 20 years. I started as a researcher, then moved on to leading our cloud services, and currently I'm focused on promoting innovative cloud service update methods. Here's the agenda of this talk. The top two topics, supply chain attacks and SBOM, you have probably already learned a lot about at this summit, so I will only explain them briefly. For the main topic, update change impact analysis, I will explain the details and share our results. So let's get started. A software supply chain attack is an increasingly common type of cyber attack. It's not a direct attack; instead, it injects malicious code into a third-party library, compromising the integrity and security of the final product. The reason this kind of attack is on the rise is that the modern software we develop relies heavily on numerous third-party components, and "third-party" here largely means open source. This extensive use of open source means that vulnerabilities or malicious code in open source can have a widespread and severe impact on our society. An SBOM, or software bill of materials, is a comprehensive list of all components in a software product. It is considered a key defense against supply chain attacks. Its importance is mentioned in the executive order by President Biden linked here. As for the SBOM's benefits, it enhances transparency by clarifying software parts, their origins, and their licenses. 
It detects vulnerabilities by scanning each component for risk, and it helps promote a prompt response with fast identification and fixes for security problems in components. Here are some of the open source SBOM tools and solutions, especially those with cute icons; more and more are becoming available, both open source and commercial. This is an example of using SBOM tools. Integrating SBOM tools into the DevOps process is essential for clear software component tracking and risk management. In the deliver phase here, where the software is packaged and prepared for release, the SBOM is generated and registered into the SBOM database. By scanning the SBOM database in response to published vulnerability information, security risks can be detected. If any risks are found, the incident management system immediately alerts the developers so they can initiate the remediation process. Thanks to the SBOM tools, these processes are automated and the turnaround gets faster and faster, but the actual remediation effort for developers is escalating: developers must now address an increasing volume of complex security risks. In the rush of updates, we easily tend to fall back on the ad hoc and problematic approaches described at the top here. We briefly skim the release notes, just the release notes, which may lead us to miss some important changes; and blindly trusting the package manager is also dangerous. Moreover, we tend to overestimate our test coverage, which can result in an unchecked release. To address these issues, we propose introducing update change impact analysis here. This approach enables us to evaluate all impacting changes and dependencies in advance. As a result, we can expedite our update decisions and enhance the reliability and availability of our development and testing phases. So I'll explain update change impact analysis. The basic idea of update change impact analysis comes from this study. 
"Can We Trust Tests to Automate Dependency Updates?", a case study published in 2022. Its implementation is named Uppdatera, which seems to be the Swedish word for "update"; if you Google "uppdatera", you will probably hit IKEA furniture, so try it later. About the study's findings: it calculated the test coverage of direct and indirect dependencies for more than 500 projects, and it claims that test coverage, and its ability to detect defects, is insufficient in most projects. The study's analysis proved to be approximately twice as effective as the tests. Its implementation, Uppdatera, analyzes semantic code changes and call graphs to identify the impacting changes. Inspired by this work, we have developed our own update change impact analysis system, leveraging open source tools, to evaluate our in-house cloud application implemented in Node.js. Here is our architecture. Our prototype is implemented with Python scripts and a Jenkins pipeline. It consists of five stages: update simulation, semantic change detection, call graph construction, reachability analysis, and change history mapping. In the first stage, update simulation, we set up a pre-update environment and a post-update environment. This shows the setup interactions of the pre- and post-update environments. Both environments need to be built from the application source code. First, we launch Node.js in a Docker container, then obtain the application source code, then reproduce the exact dependencies described in package-lock.json, including the application repository; we use the npm ci command here. Then we perform the update. In the second stage, we extract the semantic changes between pre- and post-update. This shows how to find only the differences in the program code, starting by identifying the package version differences. 
We analyze the package dependencies and then compare the versions of the same package, like this, and we identify the package versions from the included package.json files. If we detect updates, next we find the modified program files. It's essential in this step to exclude documentation files, like LICENSE or README.md, and to include only JavaScript and TypeScript files, .js and .ts. Even with this filtering, we unavoidably retain test and sample code because of their file types. You might not imagine it, but the packaged code contains a lot of test and sample code; I think they are unnecessary, but they are actually included. Finally, we use the diff command to check the differences. After identifying the differences in the program code, we use GumTree to extract the functions that have semantically changed. GumTree is a syntax-aware diff tool based on abstract syntax trees (ASTs), and it also has academic origins, referenced here. GumTree parses source code into an AST representation like this, then locates the differences, components that are deleted or updated and so on, looking at the differences at the AST level. We trace this location information to the specific line numbers of functions, then compile them into the list of changed functions. Here is an example of diff output and GumTree output. You can see that differences in commented sections, documentation, or mere style changes are ignored, and only semantic changes were identified by GumTree. In the third stage, we construct the call graphs. To do this, we use Jelly. Jelly is a static analyzer for constructing call graphs for JavaScript and TypeScript on the Node.js platform. It is based on the academic studies below. Jelly is a tricky tool, so to prevent out-of-memory errors, we create multiple call graphs by splitting the dependent packages to be loaded. The first time, we ignore dependencies to create a baseline call graph. 
Then, from the second loop onward, we load each dependent package, and we merge the graphs later. Here's an example of Jelly's HTML output. If you zoom in on the center part, you will see something like this. The rectangles represent packages and modules, the circles represent functions, and the arrows and lines represent package dependencies and function calls. The call graph data included in the HTML output looks like this: there are many JSON elements representing packages, modules, functions, and calls. We use them as the nodes and edges, in graph theory terms. The next stage is reachability analysis. We use NetworkX here. NetworkX is a well-known Python library for manipulating complex networks. At this point we already have the pre-update call graph, the post-update call graph, and the changed function list. So we load all the JSON into NetworkX and link the changed functions to the call graphs, like this: library A is updated and some function is deleted, library D adds a new function here, and library C updates a function, like this. Finally, we analyze the reachability of the changed functions, whether they are connected to the application. NetworkX will then identify the impacting call paths. The last stage is change history mapping. This is our original part, which wasn't mentioned in the previous studies. To begin with, we need to get the source code of each package. Since the installed packages don't come with this information, we use the npm view command to retrieve the repository details and then proceed to clone the repository. The next step is selecting the correct tag that corresponds to our package version. This can be challenging, since a version number like 1.2.3 doesn't always match the tag name. For example, 1.2.3 is the version number, but the tag name may start with "v", or have the dots replaced with underscores, and so on. Then there is the case of a monorepo, a style that consolidates multiple repositories into one repository. 
In the case of a monorepo, the tag might include an organization prefix and the package name, then an underscore, then the version number, like this. So it's very challenging and difficult. We use multiple similarity algorithms, such as n-gram, Gestalt pattern matching, and Levenshtein distance, to identify the most similar tag. Unfortunately, some repositories have no tags and no version information in their commits; I can hardly believe it, but in those cases we gave up. After successfully retrieving the source code and identifying the correct tag, the next step is to map the package modules, the JavaScript modules in the installed files, to the source files. In the simple case, it's straightforward: just replace the top directory, from the installed package directory to the cloned repository directory. But we also have to deal with package modules that are bundled, transpiled, or minified, meaning multiple files merged into one file, TypeScript transcoded to JavaScript, or code compressed by removing unnecessary whitespace and line breaks, and so on. If a .map file is available or provided in the package, we use a source map decoder. This helps us restore the original files' information, including the source file paths. Then the last step: we use git log with a revision range and options to identify the commits from a function name. It's an amazing feature: Git identifies the commits that touched the named function. If that command doesn't work, we use a line number range instead. Finally, we collect all this change information into one Excel file with openpyxl. Now, let's look at our experimental results. We conducted three distinct use cases to understand the impact of package updates. The first is updating one dependent package that has no dependencies of its own, the simple case. The second is updating one dependent package that has a lot of dependencies. The third is updating most of the outdated packages at the same time. 
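Before going to the results, the core reachability query described earlier can be sketched briefly. In our pipeline, NetworkX answers it (e.g. via `nx.descendants`); the dependency-free equivalent below uses a plain breadth-first search so it's easy to follow. All node names are hypothetical examples, not real call graph output:

```python
from collections import deque

def reachable_changes(call_edges, entry, changed_functions):
    """Which semantically changed functions can the application reach?

    call_edges: (caller, callee) pairs from the merged call graph.
    entry: the application's entry function node.
    changed_functions: functions flagged by the semantic change detection.

    This BFS computes the same set as intersecting NetworkX's
    nx.descendants(G, entry) (plus the entry itself) with the changes.
    """
    adj = {}
    for caller, callee in call_edges:
        adj.setdefault(caller, []).append(callee)
    seen = {entry}
    queue = deque([entry])
    while queue:
        for callee in adj.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return sorted(seen & set(changed_functions))

# Hypothetical example: one changed function is reachable from the app,
# the other sits in an unused corner of a dependency.
edges = [
    ("app.main", "axios.request"),
    ("axios.request", "follow-redirects.wrap"),
    ("lodash.merge", "lodash.baseMerge"),
]
changed = {"follow-redirects.wrap", "lodash.baseMerge"}
print(reachable_changes(edges, "app.main", changed))
# -> ['follow-redirects.wrap']
```

This is why a package can ship hundreds of semantic changes yet produce zero impacting changes for a given application: only changes on a call path from the application count.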
The first target, the single package case, is Axios, a popular HTTP client for the Node.js platform. It's not an application, it's a library, but we updated its dependent package, follow-redirects, from 1.15.0 to 1.15.3, from the minimum requirement of Axios to the latest version. follow-redirects is a small, single-file package that has no dependencies of its own. Here is the Axios result. Updating follow-redirects from 1.15.0 to 1.15.3 resulted in five semantically changed functions being detected in follow-redirects, and these five changes affect 646 call paths in Axios. That's a large impact. So let's track these five changes through the change history mapping result. Most of the changes were related to a newly introduced function here. This type of change appears when refactoring introduces a new low-level function. The causes of the impact were consolidated into two commits: there are so many changes, but they were triggered by just two commits. So developers can make an update decision after carefully reviewing just these two commits. That's the first result. Let's move on to the next case. Here's an Azure sample application, the Node.js docs Hello World. We updated Express from 4.17.1 to 4.17.3; 4.17.1 actually has a vulnerable dependent package in it, and the latter, 4.17.3, has it resolved. Before looking at the results, a brief description of Express: it's one of the most widely used Node.js web application frameworks, and it has over 20 dependent packages. It calls itself minimalist, but it's quite a large package. Here is the Hello World result. After updating Express from 4.17.1 to 4.17.3, 543 semantically changed functions were detected. However, no impacting changes were found. It's a small note, but it's important. We investigated the post-update environment, and it really has 76 packages and almost 200 modules, but according to the call graph, they're not used much in this application: only about half of the files were used. 
So in this case, updating would not cause any problem, or perhaps there is even no need to update, we think. Let's move on to the last case, the Azure SDK sample. It's another sample program from Azure. In this last case, you may not be aware of it, but it has a pretty old package-lock.json file here. So we used this as the pre-update environment to build, and then updated all outdated packages at once. Here are the results. In this case, more than 40,000 semantically changed functions were detected. The post-update environment really includes 55 packages, but the call graph used just 20 or so, and the code that was actually reachable was pretty small. The reachable changes numbered 69, and surprisingly, those 69 changes affect just one call path. We investigated this: the application simply calls one function here, the public catalog function, and that core function and its dependent packages were updated at the same time. Let's look into the change history mapping results. We found that more than half of the changes were due to the removal of one package, an intermediary API package; as you can categorize here, they are marked as removals. The other causes, 1,432 changes, were narrowed down to 10 commits here. It's very helpful for developers that only 10 commits need to be reviewed, despite such large update changes. Also, I'd like to mention that all the changed files listed here are dist/index.js, so they are transpiled files. This also proved that our source map decoder worked well in this case. So let me summarize my talk. This slide shows the key takeaways. 
So SBOM, or software bill of materials, management is a crucial defensive measure against software supply chain attacks, but remember that an SBOM alone isn't enough to make software updates easy. Our update change impact analysis revealed the following insights. Package dependencies in Node.js applications are very complex, but many of the included packages, and much of their source code, are actually not used. By analyzing the semantic changes in the source code together with the call graphs, only the changes that truly have an impact are identified. Furthermore, by associating these changes with the Git log histories, only the key changes that should be focused on are pinpointed. So we believe UCIA, update change impact analysis, can expedite vulnerability response planning, accelerate development, and clarify the scope of testing and verification. Last but not least, all these analyses can be achieved using open source tools, so thank you to all their developers and contributors. The last slide is the future work. We have a lot of work to do to improve our system. As one example, speed optimization of the mapping is important. We use many open source tools, and we found some bugs and fixed some of the tools, so we'd like to feed these fixes back to the community. We are also now preparing support for the Go language and the Java language, and we'd like to add some dynamic analysis integration, because we'd like to enhance runtime behavior analysis. Also, we are now migrating our UCIA application from Jenkins to an application running on Kubernetes. And the last item is not yet determined, but we'd like to publish our result as open source. That's it. Thank you for listening. Do you have any questions? 
I guess it's partly a comment: you could actually use a similar kind of analysis going from distribution packages to the applications on top of them, a similar kind of mapping, with different tools but a similar mapping of concepts to code changes. So it has a lot of broad applications. Well done. Thank you. Thank you. This idea is applicable to basically all programs, I think. Theoretically, yes; we used npm packages in this presentation, but for other source packages and build artifacts you could of course construct call graphs with ctags or tools like that, so it depends on the programming language. Our key components are the call graph constructor and GumTree, the syntax-tree-based diff tool; if they support your language, it's theoretically applicable, I think. Thank you for your time.