Thank you for the introduction. My name is Georgios Gousios. I'm a professor at Delft University of Technology in the Netherlands. You can find me on the internet using this tag here, on Twitter, GitHub, or any other social media of your choice. What I'm going to be talking about is a project that we are currently running, which tries to scale static analysis to the ecosystem level for programming languages. We're in a room about dependency management, so I don't need to explain at length what dependency management is. In any programming language, we can have a dependency on a package, and this package can have its own dependencies. Dependency to dependency to dependency: if you're a computer scientist, you immediately see that this forms a graph. And the graph has certain properties. For example, it is versioned, so every time there is a new version of a package, we have a new version of the graph. The graphs are pretty big for common languages. The Maven repository, for example, currently has around 3.5 million artifacts, and an artifact in Maven is a package plus a version, so just in terms of nodes it's pretty big, and the number of connections can be pretty big as well. What we have been seeing lately is that such graphs are also very fragile. One thing that triggered much of the research community, at least, to do research into ecosystems happened in 2016, when somebody removed a very small package, just 11 lines of code, from NPM, and in response the whole internet basically died. The package was very central to the NPM dependency graph, so dependency resolution failed for many hours, and there was quite a bit of trouble there. The Equifax incident is also related to dependency management.
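To make the versioned package graph described above concrete, here is a minimal sketch, not project code, with invented package names and versions, where each node is a (package, version) pair and edges point to dependencies:

```python
from collections import defaultdict

class DependencyGraph:
    """Nodes are (package, version) pairs; edges point to dependencies."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_dependency(self, client, dependency):
        # Both arguments are (name, version) tuples; a new release of a
        # package introduces a new node, hence a new version of the graph.
        self.edges[client].add(dependency)

    def transitive_closure(self, root):
        """All (package, version) nodes reachable from root."""
        seen, stack = set(), [root]
        while stack:
            node = stack.pop()
            for dep in self.edges[node]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

g = DependencyGraph()
g.add_dependency(("app", "1.0"), ("left-pad", "1.3.0"))
g.add_dependency(("left-pad", "1.3.0"), ("util", "0.2"))
print(g.transitive_closure(("app", "1.0")))
```

A small, very central node such as left-pad in this sketch appears in the closure of many clients, which is why removing it broke resolution across the ecosystem.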
More recent cases of failures: the event-stream incident, where somebody injected code to steal Bitcoin from user wallets, and the rest-client incident, which happened two or three months ago, if I remember correctly. rest-client is a REST API client for Ruby, and somebody injected code to capture login information and post it to another URL, and so on. The list goes on; every week there is a new failure that we need to account for. People have been doing research on that front, and there has been quite a bit of interest lately. Tom will surely be presenting a lot of his own work on that front, but I have some numbers here. We did some work in 2017, shortly after the left-pad incident happened, and when we looked into the dependency graphs we found that the average JavaScript project had 54 dependencies. Some people replicated our work this year, and the number had grown to 80. And I heard, I think it was Jeff today, who raised the number to more than 100. So the average package on NPM has around 100 dependencies, and this is growing very fast. Another thing that happens is that these ecosystems change at a very high rate. In our own research, we found that 50% of the transitive dependency closure, so all the dependencies that we get by doing package resolution, changes over the course of six months. Why? Because new package versions are released. So even though we don't change our client, the code that we include in our client will change. It is therefore very important to understand what's going on in the periphery of our dependencies, not just in our immediate dependencies. We also found that there exist packages in ecosystems like RubyGems, for example, such that if we just remove them, and I can tell you which ones and which versions, around 40% of the ecosystem will collapse. So we would have another left-pad incident.
Another dark side of dependency maintenance is that a large number of core packages is maintained by very few people. What the researchers Zimmermann and colleagues found is that just 400 people maintain the 10,000 most central packages in the NPM ecosystem. Now, from the consumer side, from the developer side, developers complain that it is very difficult to assess the impact of an update. There has been research there: because there is no tooling, you can update, but you don't know what code you're going to bring in, and you don't know what the impact of this update will be on your own code. What Raula Kula found in his research is that 85% of the dependencies in Maven are outdated, even in 50% of the most important packages, the top packages in terms of incoming connections. Even more alarmingly, 70% of the dependencies have some kind of link to a dependency that has a security issue. Why is that? Developers said that they were simply unaware of those security problems. As a result, vulnerabilities proliferate in ecosystems: there was a technical report by Comcast in 2017 showing that one fourth of all library downloads actually include a vulnerability in the transitive closure, not necessarily in the library itself. And one third of the top 100K sites include a vulnerable dependency in their JavaScript code base. I'm pretty sure those numbers are a bit alarming. So if we aggregate what researchers have found over the last two or three years, during which this area has been very active, we can identify, let's say, four basic problems. There's the observability problem.
We don't know whether a new dependency version was released and what the impact of this release is. This feeds into the update problem: if I update, what will break? Then there's the compliance problem; there was a whole discussion this morning about that. How do I know that I'm not violating anyone's copyright? How do I know that the code I distribute is compatible with my license obligations? And there's also the trust problem. I have my precious data; how can I entrust my precious data to code that somebody else has written on the internet? That somebody else might be a company, but even that doesn't actually give any assurance. There is no assurance. Now, on the maintainer side, if you maintain a library, we have, let's say, the maintenance problem. If I change a function signature, or remove a function, how many clients will I break? Because we cannot answer this question, what happens is that people keep stuffing code into the same dependency, so dependencies grow bigger and code bases become really big. There is also the lack-of-incentive problem. This is a meta-problem, I would say: how can we ensure that maintainers who do such an important job are adequately funded by their clients, even though those clients are not explicit and there is no contract with the big companies? Why should an open source developer respond to issues coming from big companies when those companies are not contributing to the development? As for the state of the art in dependency management, what people do is either lock their dependencies so they don't pull in updates. This might be good, but it also means they cannot pick up security patches.
Another thing that emerged in the last couple of years is dependency checkers, bots, especially on GitHub, that read the package descriptors and know when there is a new update to a specific library included in a package descriptor. They then propose either a pull request or, in the case of GitHub, a message in the repository. But this is not necessarily a panacea, because the analysis is very high level. Just because we include something in our transitive dependency set doesn't mean that we actually use the code, so there are lots of false positives. And beyond package version management there is not a lot more: we don't help maintainers maintain those dependencies, there is no support for deciding which libraries to include in our dependency set, and there is no support for assessing updates. This is the state of dependency management, at least as it has been documented by researchers over the last two or three years. Well, we believe that we can do better than that. To get to the root cause of the problem: while most dependency management is done at the package level, and the analysis is done at the package level, the actual use of the dependency happens in the code. So something might look like a dependency tree, or a linear dependency chain in this case, but the actual usage of the dependencies might reveal, for example, that this part of the code is not used at all. So if there is a security bug there, we might want to update, but it's not necessarily important. We believe this is the root cause: the analysis that we are currently doing is too high level. What we propose, instead of doing analysis at the package level, is to do it at the call graph level. What is a call graph? A call graph is a graph, as the name says, that links function calls across a project. It is as simple as that.
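The point above, that a declared dependency is not necessarily used code, can be sketched with a toy call graph. All function names here are invented; a call graph is simply an adjacency list over fully qualified function names:

```python
# Toy call graph: each function maps to the functions it calls directly.
call_graph = {
    "app.main": ["app.render", "liba.parse"],
    "app.render": [],
    "liba.parse": ["liba.tokenize"],
    "liba.tokenize": [],
    "libb.encrypt": ["libb.keygen"],  # libb is a declared dependency...
    "libb.keygen": [],
}

def reachable(graph, root):
    """Every function transitively callable from root (depth-first walk)."""
    seen, stack = set(), [root]
    while stack:
        fn = stack.pop()
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

used = reachable(call_graph, "app.main")
# ...but no libb function is reachable from app.main, so a package-level
# checker flagging a vulnerability in libb would produce a false positive.
print(sorted(used))
```

A package-level analysis would report libb as a dependency of app; the call-level view shows that none of its code is actually exercised.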
Of course, it's a bit complicated to do static analysis in all programming languages in order to get call graphs, but we are working towards building tools that will allow us to do that. If we have such a call-based dependency network, as we propose, we can solve quite a few problems in one go. Perhaps not all the problems; we cannot solve the trust problem, for example, because this is a larger, meta-level problem, I would say. But for many of the problems that we are currently facing in dependency management, we can at least do something. One of them, for example: does this vulnerability affect my code? Because we can have a direct path from a vulnerability in a function down to our code. If we find such a path in the call graph, this means that we are affected and we should update. We can also do more precise impact analysis to help maintainers. We can say, for example: if I remove an argument from a function, how many clients would I break, before I release my update and the clients actually see what I have broken? If I would break lots of clients, perhaps I might decide not to release this update, and instead add a function overload. So what we are effectively trying to do is to augment soundness, and by soundness in static analysis we basically mean capturing the ground truth, with more precision: making the analysis of actual reuse more precise by changing the unit of analysis. We have done this in an initial prototype, which we presented last year at FOSDEM. We built call graphs for 70% of all the Cargo packages; Cargo is the package manager for Rust. It was a first attempt to construct these huge call graphs. The problem was that, in our case, the call graph generator was in very bad shape back then. We're working on it currently, but it didn't allow us to show the full potential of the approach.
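The maintainer-side impact analysis described above, if I change this function, how many clients break, amounts to reachability over the reversed call graph. Here is a minimal sketch with invented names, not the project's implementation:

```python
from collections import defaultdict

def reverse_edges(call_graph):
    """Invert a caller -> callees adjacency list into callee -> callers."""
    rev = defaultdict(set)
    for caller, callees in call_graph.items():
        for callee in callees:
            rev[callee].add(caller)
    return rev

def impacted_callers(call_graph, changed_fn):
    """All functions that directly or transitively call changed_fn."""
    rev = reverse_edges(call_graph)
    seen, stack = set(), [changed_fn]
    while stack:
        fn = stack.pop()
        for caller in rev[fn]:
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen

graph = {
    "clientA.main": ["lib.format"],
    "clientB.run": ["lib.helper"],
    "lib.helper": ["lib.format"],
    "lib.format": [],
}
# Changing lib.format's signature would break both clients, one of them
# only transitively through lib.helper:
print(sorted(impacted_callers(graph, "lib.format")))
```

If the impacted set is large, the maintainer might, as suggested above, prefer an overload to a breaking signature change.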
But still, it was a very promising prototype. Based on these ideas, we designed and acquired funding for the FASTEN project, a big European project with seven partners across Europe that aims to implement the Präzi technology, or idea in this case, for Java, C, Python, and Rust. On top of the actual call graphs, we can run various analyses that will be offered as part of a package manager, analyses such as: can I safely update? Security fix propagation, and so on. So let's see how this whole thing works, and I can tell you where we are and where we're heading. We get updates from the four repositories that we support, Maven, Debian, PyPI, and Cargo, in a streaming fashion, so we have streams of new package releases. We have a call graph generator per language. For Java, there are already quite a few that are high quality. For Python, we are developing our own because there is no tooling. For Rust, we are also developing our own. For Debian, we are using existing tools. This feeds into two databases. The metadata database is based on Postgres, or will be based on Postgres, actually, because it's not fully implemented yet. We are also building an in-memory, high-performance graph index that will allow fast reachability queries on top of huge graphs. If you take Maven, for example, which already has millions of package releases, we are expecting billions of nodes and perhaps hundreds of billions of edges, so it's very important to have a custom query layer to do propagation analysis. Then we have this metadata database, which we augment with data coming from community sources. One of them is ClearlyDefined; there is going to be a talk about that afterwards.
Another one is GHTorrent, a tool that collects all data from GitHub, on top of which we do some analysis. We also try to collect vulnerability information from open sources. Snyk is not currently open source, but for some reason I have the logo here. The other ones are open, so we are trying to collect this information and annotate the appropriate location, either a function, a file, or a package, with information from all those sources. Another thing we're trying to do is come up with a bill of materials, which was a very hot topic today, at least in the meeting rooms I was in, and to get that information and annotate our metadata with it. This we do by building the actual packages. On top of our two databases, we have a custom query layer that allows combining data between the two. For example, if you want to get a call graph for, say, five dependencies, you can give it a dependency tree and it will get the individual call graphs for all the packages and stitch them together into a global call graph. The analyses sit on top of the query layer, of course, and there is a REST API that will allow the package manager to get information from all those sources. The streaming sources, which are updated as new packages come in, will also be available to the public; you can take a look at the CodeFeedr project relatively soon, hopefully in a couple of months. Now, the question that arises is how we identify a function uniquely. What is a node in our project? A node is basically a function. So how do we identify a function uniquely across everything? We designed a custom URI scheme to support that, and of course we have a grammar to validate those URIs and so on. The core technology that we are using, we call call graph stitching. The idea is that we only want to build a call graph once per package version.
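To give a feel for what such a function identifier might look like, here is an illustrative parser for a simplified scheme of the form forge!package$version/namespace/function. This scheme and all the names in it are assumptions for illustration; the project's actual URI grammar is richer than this:

```python
import re

# Simplified, hypothetical function-URI scheme:
#   forge!package$version/namespace/function
URI_RE = re.compile(
    r"^(?P<forge>[^!]+)!(?P<package>[^$]+)\$(?P<version>[^/]+)"
    r"/(?P<namespace>[^/]+)/(?P<function>.+)$"
)

def parse_function_uri(uri):
    """Split a function URI into its components, or raise on bad input."""
    m = URI_RE.match(uri)
    if m is None:
        raise ValueError(f"malformed function URI: {uri}")
    return m.groupdict()

parts = parse_function_uri("mvn!org.slf4j:slf4j-api$1.7.30/org.slf4j/Logger.info")
print(parts["forge"], parts["package"], parts["version"], parts["function"])
```

The point of such a scheme is that a node in the ecosystem-wide graph names not just a function, but a function inside a specific version of a specific package on a specific forge.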
Usually, what call graph generators do is build the whole project first and then traverse it to come up with nodes and draw edges between them. If you scale that up to the Maven scale, for example, you will be analyzing the same things over and over and over again, because there are lots of packages that are reused: Log4j, for example, SLF4J, projects like that. So instead of doing that, we analyze packages at the package version level, we create a call graph for each, we annotate the extension points, and then we load them and link them together, on request. We call this process call graph stitching. This is how our call graphs look. We have designed a format to exchange call graphs, to get the information from the call graph generator into our metadata database. This is part of the open data that we'll be releasing relatively soon. It records information about the class hierarchy, the graph itself, which is basically an adjacency list, and information about the products that we are analyzing. Now, how can we use this? One particular pain point that I have described is how we do dependency updates. The prototype that I will be showing here is not based on FASTEN currently, but it will be in the next release. What people usually do for dependency updates is use Dependabot. This is the default tool on GitHub; it was actually acquired by GitHub a couple of months ago. Pretty successful: 2 million pull requests and so on. What Dependabot and similar tools do in order to assure updates is run the tests. In the description of all those bots it says that you run your test suite and this assures you that the dependency update will be successful. Who thinks that this is a good strategy? Only one hand? What do you think?
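The stitching process described above can be sketched as follows. This is a toy model, not the project's exchange format: each package version carries only its local adjacency list, internal callees are plain names, and external callees are already qualified with the target package's identifier, so linking is just a merge. All package and function names are invented:

```python
def stitch(revision_graphs):
    """Merge per-package-version call graphs into one ecosystem graph.

    revision_graphs maps a package id (e.g. "app-1.0") to its local
    adjacency list. A callee containing "::" is an external call that is
    already fully qualified; plain callees are qualified with the owning
    package id. External edges are what link the graphs together.
    """
    merged = {}
    for pkg, local in revision_graphs.items():
        for fn, callees in local.items():
            merged[f"{pkg}::{fn}"] = [
                callee if "::" in callee else f"{pkg}::{callee}"
                for callee in callees
            ]
    return merged

app = {"main": ["helper", "log4j-2.14::Logger.info"], "helper": []}
log4j = {"Logger.info": ["Logger.format"], "Logger.format": []}
merged = stitch({"app-1.0": app, "log4j-2.14": log4j})
print(merged["app-1.0::main"])
```

The payoff is the one described in the talk: Log4j's graph is computed once per release and then linked into every client's global graph on request, instead of being re-analyzed for every client.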
Why is it not a good strategy? It's not testing transitive dependencies. Is it testing direct dependencies? What do you think? We don't have to think a lot, also in the interest of time. We have done this for around 500 projects. What we did is measure function coverage, and by function coverage we mean: if there is a call that leads into a dependency, does it get executed while we are running the tests? We instrumented the JVM while the tests were running, and we tried to find, out of all the dependency calls, how many are executed during a test run. In the case of direct dependencies, it was actually not that bad, to our surprise: around 60% of direct dependency calls are executed by tests. But if you factor in transitive dependencies, it's only 20%. So of all the paths that lead to transitive dependencies, only 20% are executed. Not a great scenario, because statistically most of the updates will happen in the transitive set, not in your direct dependency set. So what we do is take two dependency versions and compute a source-level AST diff, using a tool based on Spoon. What Spoon allows us to do is come up with a list of precise changes at the function level. We get information like: this if-statement changed its condition from x to z, something like that. So we get a very detailed list of changes. Then we build the call graphs, and we try to see whether, from the changed functions in the updated dependency set, there is a direct path back to the project that we are analyzing. This is basically a simple form of reachability analysis. In order to test this idea, we took some pull requests from Dependabot, very fresh ones. We have a project called GHTorrent that collects data live from GitHub, so we mined very recent Dependabot pull requests.
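The function-coverage metric from the study above can be stated in a few lines. This sketch uses invented call sets whose sizes merely mirror the reported 60%/20% figures; it is not the instrumentation itself, which hooked into the JVM during test runs:

```python
def dependency_call_coverage(dependency_calls, executed_calls):
    """Fraction of calls into dependencies that the test run exercised.

    Both arguments are sets of fully qualified function names: the first
    holds every call site that leads into a dependency, the second every
    function observed executing while the tests ran.
    """
    if not dependency_calls:
        return 1.0
    return len(dependency_calls & executed_calls) / len(dependency_calls)

# Invented example sets, sized to mirror the talk's findings:
direct = {"lib.parse", "lib.render", "lib.save", "lib.load", "lib.close"}
transitive = {"dep.a", "dep.b", "dep.c", "dep.d", "dep.e"}
executed = {"lib.parse", "lib.render", "lib.save", "dep.a"}

print(dependency_call_coverage(direct, executed))      # 3 of 5 -> 0.6
print(dependency_call_coverage(transitive, executed))  # 1 of 5 -> 0.2
```

Low coverage of the transitive set is exactly why "the tests passed" is weak evidence that an update is safe.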
And what Dependabot tells you is that there is this version here that was updated. It's not even a minor update; it's a patch version that changed, so theoretically you wouldn't expect a ton of changes. This is the comment that we actually inserted into the Dependabot pull request. What we can see here is that, yes, it is indeed a patch version that changed, but we found that 773 functions actually changed, 84 of which actually affect our own code. And we have information about the affected paths, and a sample of the affected functions, which allows the developer to see where the problem in their code is. We have also done a bit more research: we tried to evaluate whether the tool that we have developed actually detects the changes. We introduced artificial changes in the transitive dependency set. In the case of tests, the detection rate was around 40%; in the case of our tool, it was around ninety-something percent. So it's way more precise. This part here basically represents reflection: if code uses reflection, we are lost. It's as simple as that. Anyway, my time is up, so I wanted to show you somehow how FASTEN would look in terms of developer workflow. Think about the update scenario that I presented being integrated into pip, just to give an example. If you want more information about the project, it's here. We are running a survey about dependency management; if you want to help us, this is the URL over there. And we will be around for questions and so on. Thank you. Yes. We are doing call graphs for Debian binaries, yes, using CScout and SVF.
The example you had mentioned was that you could use the call-graph-level dependency graph as a false positive reducer for impact analysis when you make a breaking change, so you can tell whether you're actually using that part of the library. Did you also do that when you were analyzing whether or not the downstream packages were testing their dependencies? No, but that's a very good suggestion. We submitted this specific paper just a couple of days ago. Yesterday, when I was putting the slides together, it crossed my mind, so we didn't include it. But yes, we could add this information to the comment that we post on the Dependabot pull request, and it would be very helpful, actually. No, it's actually based on the call graph. We don't do that, because we expect the compiler to be able to do that in compiled languages. If you compile against a new version and an API breaks, the compiler will tell you; that check is valuable for languages that don't have a compiler, but we don't do it currently. Yes, at the moment, yes. Usually, with static analysis, you tend to over-approximate, which will generate some false positives, but for use cases like security, for example, it's better to have false positives than false negatives, because a false negative gives a wrong impression to the developers. So the direct answer to your question is that we don't have any filters; we basically let it explode. The biggest challenge is that call graph generation is not a solved problem. It's actually an unsolvable problem, to be precise, because you can never be 100% sound by definition. So the better the tooling, the more assurance we can give to the developers. For languages like Java, for example, the state of call graph generation is pretty good. For languages like Python or JavaScript, which are completely dynamic, it is a question whether it even works; there is only so much we can do with static analysis.
Yes, so one plan we have, because we are doing this at a massive scale, is to actually run the tests of all the projects. In many cases, tests allow projects to execute paths that are affected by reflection. We will take those edges and put them into our call graph, so our call graph will become richer by running the tests across all projects. We also plan to crowdsource the test running: to allow people that use our tooling to upload the parts of the call graph that are not in their own code base back to our database. There is only so much we can do with reflection otherwise. There are people doing that for Java with toolkits like Doop, but it is not extremely scalable, and it also relies on heuristics.