Okay, we can start; thank you. So my talk is about similar experiences; there will be a tiny amount of overlap with the first talk, because we have been researching the same type of topics. The main difference is the scale of the analysis: during the last three years or so, we have been exploring the dependency issues people face in a number of different programming language package distributions, for JavaScript, for Cargo and so on, up to seven different package distributions. What we are hoping to learn and understand is whether the same types of dependency issues are being faced, to a larger or smaller extent, in the different packaging ecosystems. Everything we did was based only on the package metadata; we did not dive into the source code with static code analysis or dynamic code analysis, because at this scale that is not really feasible at all. All of this research is also part of a larger project that is going on in parallel. If you want much more detail, the talk is based on five papers that we have published in conferences and journals.

You have already heard from Gustavo that there are many different dependency issues, so this slide is basically a repetition. One of the problems we find is technical lag: you have outdated dependencies, and because of this you are missing opportunities to get new functionality, and missing opportunities to benefit from bug fixes or fixes for security vulnerabilities. On the other hand, maybe you don't want to update: it's too costly, it's too much work. We also heard in previous talks the notion of dependency health.
Basically, you might have too many direct dependencies, and even worse, too many indirect, transitive dependencies; as we will see later, the transitive dependency graph can be really, really deep. It's not even manageable to start looking at everything that is happening in all of your transitive dependencies. Another problem is that if you actually decide to upgrade, you might run into broken dependencies, backward incompatibilities. Sometimes you might have different dependencies that you cannot even install together, which is a big problem. Some of the reasons for these issues are more social in nature: maybe we depend on a package that is not updated. Why? Because there is no maintainer anymore for that package. And then there are other problems that I think were also mentioned; for example, you might have licensing problems, and it's not always clear what the update policies of the different packages are. For some of these things there are tools available; for the licensing problem, for example, there are probably many more tools than the one I'm mentioning, Tidelift, which as far as I know has support for detecting license problems.

The transitive dependency problem was already mentioned in two previous talks. I would not go as far as saying that it broke the entire internet, that might be a little exaggerated, but it is indeed quite a huge problem. We studied the actual effect of the left-pad incident on only the npm ecosystem, so on all the npm packages that existed at that moment in the npm dependency graph. Within the ecosystem itself, 5,400 packages were directly or indirectly affected by this single package.
Of course, all these packages were also used by external clients, by websites, and there the impact was much bigger than inside the ecosystem, where it was 5,400 packages. That is pretty big: it corresponds to about 2% of all the packages in the ecosystem at that time. There was a similar incident much earlier on in, which one is this? Sorry, I forgot the logo. It's RubyGems, of course. In that case there was one package that was used by other packages that were in use, a similar problem, and it broke 5% of all the packages.

So what we think is needed is tooling that helps you to assess, for a particular package that is transitively depended upon downstream, what the possible impact of any change, any breaking problem, any security vulnerability might be on the rest of the ecosystem. In the next slides I present some analyses. To get the data for the different ecosystems we are studying, we use Libraries.io, a dependency monitoring service. They have lots of metadata for all packages, all dependencies, all releases over time, for all of these different package managers. Of course, we didn't study all of them; that would be too much. We also only wanted to study those for which we found the package dependency metadata to be sufficiently precise, so we manually checked: for some of them, like Maven and PyPI, it was not reliable to use this data, and for the others we did some checking to verify it.

As a first study, we looked at seven different package managers for seven different programming languages; most of them are for statically typed or interpreted languages. Some of them are pretty old: CPAN and CRAN date back to the 90s.
Some of them are pretty new, like Cargo, only created in 2014. In terms of the number of packages available: the study dates from April 2017, and the counts ranged from a few thousand packages for Cargo to 460,000 packages for npm. Today it's already more than 1,200,000 packages for npm, I think; I didn't check. For dependencies it's even much more, like 13 million dependencies, so it was quite a challenge to do the analysis.

The first thing we looked at is the evolution over time: how fast are these different ecosystems growing? We looked at the growth in terms of number of packages and number of dependencies. You can see the growth here; it looks like moderate growth, except that this is on a logarithmic scale. In most of the cases we found exponential growth, both in the number of packages and in the number of dependencies. The worst student in the class was npm, which is growing much faster than the others; here you can see the difference, as of 2017, between npm and the others. Cargo was growing as well. It was very small, because it was only born somewhere in 2014, but it is growing much faster, and its growth rate is now already higher than npm's, so it's interesting to see how this will continue to evolve over time. Because we have exponential growth in the size of these package dependency networks, the analysis also becomes more challenging over time, especially if you want to store the transitive dependency networks.

For the number of updates, we also checked how frequently packages are being released, how many new package updates there are for existing packages. We saw that for half of the packages there is an update within two months, on average.
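As a side note, the exponential growth claim can be checked with a simple log-linear fit: under exponential growth, the logarithm of the package count grows linearly over time. A minimal sketch, using made-up yearly counts rather than the actual Libraries.io data:

```python
# Sketch: testing for exponential growth by fitting a line to log(count).
# The yearly counts below are hypothetical, not the study's real data.
import math

def log_linear_fit(xs, ys):
    """Least-squares fit of log(y) = a + b*x; returns (a, b).
    Under exponential growth y = C * exp(b*x), exp(b) is the
    per-unit-time growth factor."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    ml = sum(logs) / n
    b = sum((x - mx) * (l - ml) for x, l in zip(xs, logs)) / \
        sum((x - mx) ** 2 for x in xs)
    a = ml - b * mx
    return a, b

# Hypothetical registry roughly doubling every year:
years = [0, 1, 2, 3, 4]
packages = [1000, 2100, 3900, 8100, 16000]
a, b = log_linear_fit(years, packages)
print(f"estimated yearly growth factor: {math.exp(b):.2f}")
```

If the points fall close to the fitted line in log space, exponential growth is a reasonable description; npm's curve simply has a larger growth factor than the others.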
So that's quite frequent. And the younger a package is, or the more packages depend on it, the more frequently it gets updated.

Another thing we tried to analyze is the type of network being formed: a dependency network forms a kind of complex network, and we find evidence of power-law-like behavior everywhere. For example, there is a small set of packages that receives most of the incoming dependencies: something like 20% of all required packages cover 80% of all the incoming dependencies.

One of the things we wanted to know, we already mentioned this problem of transitive dependencies between packages, is: can we actually measure the potential transitive impact that a change in a single package would have on all the other packages in the ecosystem? If you remember the earlier slide, the impact of left-pad was 2% of all packages within the npm ecosystem, and 5% for ActiveRecord in RubyGems. So here we focused on that 5% threshold and counted, over time, how many packages at a particular point in time would, if they break, affect at least 5% of all the other packages in the ecosystem. Here we see the case for npm; it's growing over time, more than the others. In 2016 there were almost 250 packages that had an impact on at least 5% of the entire ecosystem, until somewhere around March 2016, which is when the left-pad incident happened; then it goes down a little bit and stabilizes again. So apparently they learned from their mistakes: it goes down and then stabilizes again.
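The transitive impact measure described here amounts to counting, for each package, how many packages can reach it in the dependency graph. A minimal sketch, with a hypothetical toy graph standing in for the real npm network:

```python
# Sketch: potential transitive impact of a package = the set of packages
# that directly or indirectly depend on it. The toy graph is made up.
from collections import deque

def transitive_dependents(reverse_deps, package):
    """BFS over the reversed dependency graph: reverse_deps maps a
    package to the packages that directly depend on it."""
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dependent in reverse_deps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# left-pad-like toy example: many packages reach 'leftpad' only indirectly.
reverse_deps = {
    "leftpad": ["line-numbers", "babel"],
    "babel": ["react-scripts", "webpack-plugin"],
    "react-scripts": ["my-app"],
}
impacted = transitive_dependents(reverse_deps, "leftpad")
print(len(impacted), "packages would be affected")  # 5
```

Dividing the count by the total number of packages gives the percentage impact used for the 5% threshold in the talk.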
I don't know what happens now, but npm is probably not growing even faster, and it's still a lot higher than all the others. I would be interested to find out what is really happening for Cargo, because you can see that it started very low and is now becoming the second one. So maybe this is becoming a problem for Cargo as well, in terms of transitive dependencies.

We then tried to explore why this is happening, what is actually going on, by looking at the transitive dependency depth. Because of computing resources it was impossible to do this for all packages in the ecosystem, so we only looked at the top-level packages, that is, those packages that don't have any incoming dependencies from within the same package manager. They might have dependents in other ecosystems, but not within the same one. What we can see is that, over here for example, more than 50% of all packages have a dependency depth of 6 or more. That means a dependency of a dependency of a dependency of a dependency of a dependency of a dependency, and so on. This is really the worst case, a very deep level of transitive dependencies; for npm it's also quite high, you can see it here. So depending on which ecosystem you take, some, like Packagist, Cargo, npm and maybe NuGet, go quite deep, while others like CPAN and CRAN are much more well-behaved.

Then we have the problem of outdated dependencies, which I already mentioned before. What should you do? Should you upgrade to the newest version, so that you can benefit from bug fixes, security fixes and new features of your dependencies? Or should you keep your old version, because it's easier and you don't want to introduce breaking changes, even though you might have vulnerabilities as a result?
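The dependency depth discussed above is the length of the longest chain of transitive dependencies below a package. A small sketch on a hypothetical dependency DAG (assuming any cycles have already been resolved away):

```python
# Sketch: dependency depth = length of the longest dependency chain.
# The graph below is invented; real ecosystems have millions of edges.
from functools import lru_cache

deps = {  # package -> its direct dependencies
    "app": ["web", "log"],
    "web": ["http"],
    "http": ["parser"],
    "parser": ["strings"],
    "strings": [],
    "log": [],
}

@lru_cache(maxsize=None)
def depth(package):
    """Longest chain of dependencies below 'package' (0 if it has none)."""
    children = deps.get(package, [])
    if not children:
        return 0
    return 1 + max(depth(c) for c in children)

print(depth("app"))  # 4: app -> web -> http -> parser -> strings
```

Memoization (`lru_cache`) keeps this linear in the size of the graph, which matters when half the packages in an ecosystem sit at depth 6 or more.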
It's difficult to say, but we also tried to analyze, in this case for npm, to what extent the fact that a package is outdated depends on which dependency constraints it uses. Here you can see the different constraint types in different colors. I think this color shows the strict constraints: about 1 out of 3 packages specifies strict constraints in its dependencies, which basically means it cannot benefit from minor upgrades or patch upgrades of its dependencies unless the maintainer manually modifies the constraints. That's a pretty high percentage, and it might be one of the reasons why there are so many outdated packages in npm.

One of the things we are currently working on is proposing a new measure, called technical lag, which is a way to measure to what extent a package is outdated because it has outdated dependencies. Let me give an example. Suppose I have this package, one of the packages in npm, that has three direct dependencies and one indirect dependency, with these dependency constraints: a strict constraint here, a caret constraint there. Because of the constraints, this particular version can only depend on the version satisfying the constraint, 2.7.0, so it's missing three newer versions and is therefore outdated. You can express the outdatedness in terms of time: the lag here would be 4 days, because the most recent version is 4 days younger. And for this indirect dependency, you are outdated by, what is it, 6 months or so. So transitively you are outdated by 6 months, while for the direct dependencies it's only 4 days. Sorry, if you would only look at the direct dependencies, it looks okay; if you also look at the indirect ones, the picture is much worse.
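The time-based technical lag from the example can be sketched as follows; the versions, dates and resolved version below are made up for illustration, not taken from the actual npm package:

```python
# Sketch of time-based technical lag: the lag of a resolved dependency
# version is the time between its release and the latest release overall.
# All version numbers and dates here are hypothetical.
from datetime import date

releases = {  # version -> release date of a hypothetical dependency
    "2.7.0": date(2019, 3, 1),
    "2.8.0": date(2019, 4, 15),
    "3.0.0": date(2019, 6, 1),
    "3.1.0": date(2019, 9, 1),
}

def time_lag(resolved_version):
    """Days between the release the constraint resolves to and the
    newest available release (0 means fully up to date)."""
    latest = max(releases.values())
    return (latest - releases[resolved_version]).days

# A strict constraint pinning 2.7.0 misses three newer releases:
print(time_lag("2.7.0"), "days behind")  # 184 days behind
```

The same idea works with versions behind instead of days behind; summing or maximizing the lag over all (transitive) dependencies gives the package-level lag from the example.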
You can measure how outdated you are in terms of time, or in terms of the number of versions you are behind, but you could also do the same for vulnerabilities: if I upgrade, could I reduce the number of vulnerabilities in my component, because my dependencies would be more up to date? If you want to know more about this, I presented the technical lag framework yesterday at the CHAOSScon conference; there isn't a video of it yet, but I think there will be one soon. To some extent, some tools are starting to provide similar support. For example, the dependency monitoring tool David for npm tells you things like: okay, I'm up to date, or I have out-of-date dependencies. As far as I know it only supports direct dependencies, not indirect ones, and it's not really a measure of how up to date you are.

Another study we wanted to do is to find out to what extent the different ecosystems support semantic versioning. I hope I don't have to explain to you what semantic versioning is. Basically, if you use a caret-style constraint, then you are conforming to semantic versioning, because you assume that you can upgrade to any minor or patch version. The problem when you want to compare different ecosystems is that there is not really a single, unique constraint notation: the same expression might have a different interpretation depending on the ecosystem, so semver compliance might mean something different depending on which ecosystem you are looking at. In this table, everything in red is more restrictive than semantic versioning, everything in green is more permissive than semantic versioning, and everything in white complies with semantic versioning. Based on this, we analyzed whether the different ecosystems are increasing in semantic
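The red/white/green classification of constraints relative to semantic versioning could be sketched like this for npm-style constraint syntax; the rules are deliberately simplified and only cover a few common constraint shapes:

```python
# Sketch: classifying npm-style dependency constraints relative to
# semantic versioning (restrictive / compliant / permissive), mirroring
# the talk's red/white/green table. Simplified, illustrative rules only.
def classify(constraint):
    c = constraint.strip()
    if c in ("*", "latest") or c.startswith(">="):
        return "more permissive than semver"      # accepts major upgrades
    if c.startswith("^"):
        return "semver-compliant"                 # allows minor + patch
    if c.startswith("~") or c[0].isdigit():
        return "more restrictive than semver"     # tilde range or strict pin
    return "unknown"

for c in ("^1.2.3", "~1.2.3", "1.2.3", ">=1.0.0"):
    print(c, "->", classify(c))
```

Because the same syntax can mean different things in different package managers, a real comparison needs one such classifier per ecosystem, which is exactly the difficulty mentioned above.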
versioning compliance over time. We saw that this was clearly the case for Cargo, which is going almost to 100%; for npm it's around 80% or so, and similarly for Packagist, so that's a good sign. RubyGems is not supporting semantic versioning at all; its curve is much lower. But we also found that there is still quite a high percentage of overly restrictive constraints, strict constraints, which could be relaxed to achieve more semver compliance.

Then of course you have the problem of security vulnerabilities. Depending on components with known vulnerabilities is one of the top application security risks according to the OWASP Foundation. We did a small analysis to quantify to what extent security vulnerabilities in packages may affect dependent packages. This was a smaller-scale study, since we did not have access to lots of vulnerability data: only 399 vulnerabilities in 269 packages. But in total these affected something like 762,000 dependent packages. We also looked at how long it takes to discover a vulnerability: after 2.5 years, 30 months, there is still a fraction of the vulnerabilities that has not been discovered. How long does it take to fix a vulnerability? After one year, you still have about 20% of the vulnerabilities that are not fixed yet. And if a package's vulnerability has been discovered and fixed, how long does it take before the dependents also pick up the fix? Because a fix may be available in the package while the dependent package still uses an older version that still has the vulnerability. We have an analysis of that as well; just look at the blue curve, which covers all the cases: the percentage of situations in which a dependent of a package with a known, fixed vulnerability has not yet picked up the fix. In the cases we studied, for 1 out of 3 dependent packages
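The "how long until fixed" analysis boils down to a survival-style computation over disclosure-to-fix delays. A minimal sketch with invented delays (`None` meaning never fixed), not the study's actual data:

```python
# Sketch: fraction of vulnerabilities still unfixed after a time horizon,
# computed from disclosure-to-fix delays. The sample delays are made up.
def fraction_unfixed(fix_delays_days, horizon_days):
    """fix_delays_days: days from disclosure to fix per vulnerability,
    or None if no fix was ever published."""
    unfixed = sum(1 for d in fix_delays_days
                  if d is None or d > horizon_days)
    return unfixed / len(fix_delays_days)

delays = [10, 40, 200, 400, None, 30, 500, None, 90, 15]
print(f"{fraction_unfixed(delays, 365):.0%} still unfixed after one year")
```

Evaluating this at every horizon yields the survival curves shown on the slides, for discovery time, fix time, and the time for dependents to adopt a fix alike.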
there was still a vulnerability present in the dependent package. So it's quite high, and this was actually surprising to us. Why could this be the case? One simple reason could be that these dependent packages are not maintained anymore: there is no one around to apply the fix. It could also be that you are using constraints that are too restrictive, making it hard to pick up the fix; maybe you cannot do it because there are incompatible changes; or maybe you are simply not aware that there is a vulnerability, because you are not using any security monitoring tool. There can be many different reasons, and it might be a combination of all of them.

So what we definitely need is monitoring and update tools for vulnerable dependencies. We already heard some tools mentioned; there is Snyk, for JavaScript there is for example retire.js, and there is of course also the tooling being developed in our project, which could be used for this. Another one, similar to this, that I only heard about yesterday, so it is very new to me, is Eclipse Steady, which combines static and dynamic analysis for Java and Python; that also looks like a promising project for analyzing package dependencies.

To conclude, there is still a lot of work to do, and there are quite a lot of challenges, especially the one that was mentioned just before: if you want to detect whether a vulnerability really matters, how can we avoid false positives and false negatives? There are real challenges with these transitive dependencies: the transitive dependency network becomes so huge that analyzing it consumes a lot of resources. We have to take semantic versioning into account, but semantic versioning is just a policy; you can follow it, but how can you be sure that you are actually following it? How can you detect that a change is actually a breaking change? To do this
you have to do behavioral analysis, which is very difficult; in general it is actually undecidable. Okay, that's it.

Question: Interesting study. Did you find any particular package manager that does better than the others, with ideas you think the others should follow?

Answer: Yes and no. One of them was really better than the others; if we go back to the earlier slides, it is CRAN. Why is it better? Basically because CRAN uses a rolling release policy: you always have to rely on the latest version of all the packages, and if you don't keep up, your package will automatically be archived after a while. So you are putting the burden of dependency management on the dependents of a package, not on the package itself. This is actually a good policy at the ecosystem level, if you want a healthy ecosystem in terms of all these metrics; of course, it is much more effort for the people that depend on the packages. So I don't know what the best approach is: should you put all the effort on the maintainers of the dependent packages, or on the package itself? These are really two different sides of the same coin. CRAN is very good from that point of view, but we also did a study comparing R packages, and lots of people are actually moving away from CRAN, saying: there is too much effort here, I'm just going to put my package on GitHub; everyone who wants to use it can just download and install it from there, and I don't have to take all of these constraints into account, because it's too much effort. So actually the problem is just being shifted elsewhere.

Question: On the slide where you listed the tools for dependency management activities, do you know if any of them support indirect dependencies?

Answer: No, most of the tools do not. Maybe a good exercise would be to do a study taking
into account transitive dependencies; until recently that was not supported. For this one I think it is supported to some extent, but it is brand new, so I could not check it. So that's the main takeaway today: there is quite a lot of good support for direct dependencies, but almost no support for indirect, transitive dependencies, let alone at a more fine-grained level where you also look at what actually changed.

Question: If you don't have that kind of tool, how do you check your transitive dependencies? How do you even learn about them, given that a package can easily have ten or more?

Answer: Of course. To continue the story, in terms of the last question of the previous presentation: how can it actually happen that we end up with so many transitive dependencies? In many cases, at the moment you decide to start depending on a package, that package has zero other dependencies itself. You say: okay, I'm going to depend on it, it's safe, I can manage it. Then after a year you see that this package has started to depend on others, and we found cases where this happened without the dependents even knowing about it. So you have to keep monitoring: what are your dependencies doing in there?

Question: Do you have any advice, or do tools exist, where you can say: I want to limit my transitive dependencies? A tool that measures the number of transitive dependencies and the dependency depth you are facing, and gives warnings like: watch out, you have either too many direct dependencies or a too deep transitive dependency tree. Does that kind of tool exist?
Answer: Yes, I know of some. For people that are interested: we are building tools for this, especially designed as an open-source replacement for the likes of WhiteSource or Black Duck, the commercial tools that have these kinds of capabilities. One of the things we identified is the question of scale: we have thousands of packages that could create this problem, so we decided to just build it ourselves. As far as I know it's feasible, but we need support. I'm going to stop here.