So, hi, we're going to talk today about the mystery of the disappearing npm package. I'm Bella Wiseman, I'm the lead engineer for open source at Goldman Sachs. I'm also a second-generation woman in fintech. My mother was working in fintech before the term was coined, which means that I grew up hearing about things like detecting cycles and garbage collection algorithms and thunks and ALGOL, if anybody recalls what that is, and other fun things like that. I've been at Goldman Sachs for 10 years, architecting, building, and scaling systems across areas like regulatory reporting, OTC derivatives, and most recently open source. In terms of languages, frameworks, and technologies, I grew up a bit on Java. I've done some work in Scala as well. I enjoy lots of new technologies and new frameworks. And I've enjoyed watching the JavaScript ecosystem in particular mature over the last decade, from a point where testing was an afterthought, if you did it at all, and where the only architecture anyone really knew about was MVC, to today, where the ecosystem is really vibrant, with such a great community around it and such a great community to be a part of. And that's what this talk today is actually going to be about: the intersection between the wonderful ecosystem of npm and JavaScript and open source, specifically open source license compliance. So, let's get into it. On April 28th, 2021, in JavaScript land at Goldman Sachs, there were several build failures, all of them on teams that were using Gatsby.js in their software. So, what is Gatsby? Gatsby is an open source framework that lets you use React.js for your actual coding, but then generates static HTML, so that when you deploy the site, each page loads faster, giving your customers a faster browsing experience. Goldman Sachs uses Gatsby in a few different places. I would call out two of them here, whose builds were actually impacted by this issue.
One of them is developer.gs.com, which is our external-facing site that enables access to Goldman Sachs APIs, Goldman Sachs documentation, consoles, and related tools. Awesome site. I actually recently built a blog into it, so go and check it out, developer.gs.com. And our internal documentation site for developers, NChub, was impacted as well: not the site itself, but its build. So, what was the problem and why were the builds failing? There was an npm dependency called smartwrap, which was quarantined and blocked and inaccessible to builds. And since these projects actually depended on smartwrap, they couldn't build, so that was the problem. And the reason that it was quarantined and blocked was that its open source license was GPL 2.0. So, now we will dive into a bit of background on open source licenses. I am not a lawyer, and this is not legal advice. If you take this as legal advice, you are doing so at your own peril. And this is obviously also not entirely accurate. This is an oversimplification of the very long topic of open source licenses, which our legal team could probably talk about for several hours across many, many days. So, this is just a brief overview. But broadly speaking, there are two types. There are permissive licenses and copyleft licenses. Permissive licenses are growing in popularity. You have things like MIT and Apache 2.0. These are very popular and used in large enterprises for the reason that they're very permissive. Once you use that open source software in your code, there are very few strings attached and very few obligations. You're basically not allowed to sue the maintainer. They disclaim any guarantee of quality, so you have to do your own due diligence. There are some other clauses as well, and any lawyers in the audience will probably tell me that I'm vastly oversimplifying it, which I am.
But that's kind of MIT, Apache, the permissive license side of things. On the other side, you have copyleft licenses. Copyleft is a play on words on copyright. If you are creating a derivative of that software, and again, the word derivative is something you could talk about for hours, and you distribute that derivative, which again is something you could talk about for hours, you then come under certain obligations, usually to actually open source the resulting derivative work. Because of that obligation, there are many large enterprises that limit or do not allow the use of GPL or other copyleft licenses in their internal source code. Goldman Sachs is currently one of those organizations. So, digging into the actual dependency tree, we have Gatsby, which depended on gatsby-cli, gatsby-recipes, @graphql-tools/schema, value-or-promise, @changesets/cli, tty-table, and smartwrap, which is GPL. Notice here that this is eight levels nested down, right? Notice also that it happened from one build to the next, which we'll get to in a minute. We take open source compliance very seriously at Goldman Sachs. We introspect our licenses all the way down, and this is what was actually causing the build failures. So, what changed? At first, the answer seemed to be nothing, right? There was no policy change. Teams would reach out to us and they would say, hey, did you maybe start blocking GPL 2.0 recently, because our builds just started failing? And the answer was no. GPL 2.0 was always blocked in general at the firm. So, there was no policy change. And then in turn, was there a code change? And the answer that the teams gave us was no. There was no code change at all. The build just worked an hour ago. It just worked yesterday, whatever the situation was. There were no commits in between. It just randomly started failing from one moment to the next. How could this be possible? And when I looked into it more deeply, I looked at smartwrap.
smartwrap didn't change its license. It has been GPL since 2017. So, that wasn't the culprit either. And that's how we got to the mystery of the disappearing npm package, right? What exactly was going on? What happened? So, to solve the mystery, we need to understand a bit about npm versioning. npm uses semantic versioning, x.y.z, where x is the major version, y is the minor version, and z is the patch version. When you run npm install --save with a package, it will, by default, and this is configurable, install the latest version that's available, and it will install it with a caret in front. So, if the latest version is 1.0.0, it will install it and put it in your package.json as ^1.0.0. So, what does that caret mean? Why is it putting that into your package.json? What is it going to do? Well, what it will do is something pretty interesting. It will actually say that, from here on in, 1.0.0 may be the version you installed right now, but when you run the build subsequently, this is actually compatible with many versions. Specifically, the caret means that any future minor or patch version would also be compatible with this. So, if you have ^1.0.0, it means 1.1.0, 1.2.0, 1.2.1, 1.13.9. All of these would be compatible with what's in your package.json. And just as a side note, if you put in a tilde, ~x.y.z, either manually or by configuring npm to do that, that would allow auto-upgrades of the patch version only. So, x and y stay the same and just z can get upgraded automatically. So, this is awesome. I have a lot of experience in the Java world also, and in the Java world, this is not done. I mean, it's possible, but it's really not considered best practice. In the JavaScript world, this is the default. This is what everybody's doing, and it comes with a lot of benefits. You can run an npm update.
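To make the caret and tilde rules concrete, here is a minimal sketch of how a version could be checked against a range. It is not the real `semver` package that npm uses: the `parse` and `satisfies` helpers are hypothetical, they only handle plain x.y.z versions, and they ignore prerelease tags and the special caret rules for 0.x versions.

```javascript
// Minimal sketch of caret (^) and tilde (~) matching for plain x.y.z versions.
// Assumptions: no prerelease tags, and major version >= 1 (carets behave
// differently for 0.x versions in real semver).
function parse(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  return { major, minor, patch };
}

function satisfies(version, range) {
  const operator = range[0];
  const hasOperator = operator === "^" || operator === "~";
  const base = parse(hasOperator ? range.slice(1) : range);
  const v = parse(version);

  // No operator: only the exact version matches.
  if (!hasOperator) {
    return v.major === base.major && v.minor === base.minor && v.patch === base.patch;
  }

  // Both operators keep the major version fixed.
  if (v.major !== base.major) return false;

  // Tilde: minor stays fixed, patch may move up.
  if (operator === "~") {
    return v.minor === base.minor && v.patch >= base.patch;
  }

  // Caret: minor and patch may move up.
  if (v.minor !== base.minor) return v.minor > base.minor;
  return v.patch >= base.patch;
}

console.log(satisfies("1.13.9", "^1.0.0")); // true: newer minor, same major
console.log(satisfies("2.0.0", "^1.0.0")); // false: major changed
console.log(satisfies("1.0.5", "~1.0.0")); // true: newer patch only
console.log(satisfies("1.1.0", "~1.0.0")); // false: minor changed
```

The point of the sketch is that a range in package.json describes a set of acceptable versions, not one version, so which one you actually get depends on what has been published when the install runs.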
You can upgrade all your dependencies to the latest minor or patch version without really having to think too hard about it. And this lets you focus on writing your code and not on upgrading dependencies. And as we all know, from a security perspective, you want to keep your dependencies up to date. So, this is a good thing. But your builds may not be reproducible if you don't follow certain guidelines, which we'll get to later. Specifically, if you run an npm install on your build, what that produces can actually change. It can change based on whether there are already cached dependencies on your build server that are compatible with the version range that's specified. It can change depending on when you're running it, from one day to the next, depending on what's been released into the npm ecosystem in the interim. And also, just from a developer experience perspective, the results that you get on the build server might be different from what you have on your machine. And specifically, there's no real record of what exact dependencies were used for each build, right? You know that it was something in this fuzzy range, but which exact dependency you used is an unknown if you don't follow certain best practices. So, you might say, I am going to follow those best practices. I do not like fuzziness. I do not like ambiguity. I want exact versions. So, every single thing in my package.json is going to be an exact version. And while that sounds great, that's actually not going to solve your problem, because your dependencies in the JavaScript world are almost certainly using the caret notation or some other approximate range. And when you resolve your transitive dependencies, right, your dependencies' dependencies, those will be resolved with the same kind of fuzzy logic that we mentioned previously. And an example of this, surprise, is Gatsby, right?
So, let's take a look, let's peek at gatsby-cli and drill down into gatsby-recipes. You can see this is their package.json, and almost every single one of the dependencies uses the caret notation, right, very common. And specifically, we have @graphql-tools/schema at ^7.0.0. So, let's take a look at what I found when I dug down: 7.1.4. It's a patch version, 7.1.3 to 7.1.4, a small patch, adding in a new dependency, value-or-promise, to manage async values and, you know, make things prettier on the code side. So, if you look at it at first glance, it looks perfect, because @graphql-tools/schema has an MIT license, right, a permissive license, and value-or-promise also has an MIT license. Seems like a match made in heaven. However, not quite, right, like we saw before. Five levels down, you've got smartwrap. smartwrap is GPL. So, we've solved our mystery. On April 27th, Gatsby did not rely on smartwrap. On April 28th, Gatsby did rely on smartwrap. There was no change in Gatsby, and there was no change in smartwrap. The only change that happened was that @graphql-tools/schema added a new dependency on value-or-promise. And value-or-promise had a pre-existing transitive dependency on smartwrap, which had been, like we mentioned, GPL since 2017. So, now we've solved our mystery, and if this were, you know, a detective story, we might say that this is the end. But I'm a developer, and I think most of the people here are engineers, so you might actually have some additional questions besides just what happened, right. And that would be things like: how did you actually fix your build? How can I stop this from happening to me? More importantly, I have a release tomorrow. How can I make sure my build is not going to break before the release tomorrow?
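The resolution step behind the mystery can be sketched in a few lines. This is an illustration under stated assumptions, not npm's actual resolver: the two registry snapshots, the `compareVersions` and `resolveCaret` helpers, and the empty dependency list for 7.1.3 are all simplifications made up for the example, using only the version numbers mentioned in the talk.

```javascript
// Illustrative registry snapshots for one package. The version numbers come
// from the talk; the dependency lists are simplified assumptions, not data
// copied from the real npm registry.
const beforeRelease = {
  "7.1.3": { dependencies: [] },
};
const afterRelease = {
  "7.1.3": { dependencies: [] },
  "7.1.4": { dependencies: ["value-or-promise"] },
};

// Compare two x.y.z strings numerically; negative when a is older than b.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Pick the highest published version allowed by a caret range.
// Assumption: major version >= 1, no prerelease tags.
function resolveCaret(range, published) {
  const base = range.slice(1);
  const major = Number(base.split(".")[0]);
  const candidates = Object.keys(published).filter(
    (v) => Number(v.split(".")[0]) === major && compareVersions(v, base) >= 0
  );
  candidates.sort(compareVersions);
  return candidates.length > 0 ? candidates[candidates.length - 1] : null;
}

// Same range, same consumer code, different day: the resolved version differs.
const resolvedBefore = resolveCaret("^7.0.0", beforeRelease);
const resolvedAfter = resolveCaret("^7.0.0", afterRelease);
console.log(resolvedBefore, beforeRelease[resolvedBefore].dependencies);
console.log(resolvedAfter, afterRelease[resolvedAfter].dependencies);
```

Nothing in the consuming project changes between the two calls. Only the set of published versions does, and that is enough to pull a new package, and everything underneath it, into the build.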
And if you maintain an open source package, you may want to know, how can I make sure that my open source package doesn't have changes that have unintended impact on teams? So, we'll take a multi-pronged approach here, and the first point is that you don't want to be taken by surprise. Changes to the dependency graph are inevitable. But you want to have some level of control over when they're actually going to happen. You also want consistency between your machine and the build server, and you want reproducibility of the build. And that's important from a security perspective as well: the ability to reproduce the build and understand what's happening during that build in a deterministic fashion. So, there is a solution for this. And it's very simple, it's lock files, right? You have package-lock.json, pnpm-lock.yaml, yarn.lock, and, in older versions of npm, npm-shrinkwrap.json. If you're still using npm-shrinkwrap.json, you may want to upgrade. It's probably past time. So, what does a lock file do? It specifies the exact version of each of your direct and your indirect, or transitive, dependencies. So, you want to generate them. That happens by default with most of the popular JavaScript build frameworks. And you want to commit them to source control. That's underlined, that's bolded, and that's for a reason. Do it. If you run npm install, it will tell you: commit this file to source control. If you don't do that, then the file doesn't help you at all, because it's not shared amongst developers and it's not there on the build server. The second important step is to use npm ci in your build pipeline. If you run npm install, like we mentioned before, you will be plagued by all the issues that we talked about previously, because npm install can overwrite your lock file. So, there's no point in having a lock file if it just gets overwritten as soon as you run an npm install. npm ci has a different behavior.
npm ci will actually just check that the version in your package.json matches up with the version in your package-lock.json. If not, it will fail the build. And it will install exactly those versions that you specified in your lock file. So, you have reproducibility. An important point to note: just like we said, generating a lock file doesn't help unless you commit it, and committing your lock file doesn't help unless you run npm ci. So, all three things are necessary in order to get you to a reproducible state. Just a comparison of the two approaches. With the recommended approach, lock file plus npm ci, you will need to run npm update in order to update your dependencies. Every change to a dependency will actually have a corresponding commit in the lock file or in the package.json or both, which is a good thing. It means that you can look back and say, hey, this broke because of this change. That's always a good thing to be able to point at. npm ci will also delete any dependencies that are already installed on the build server, the existing node_modules, so that if there's anything there, it just won't use it. And then if you run a clean build locally, it will match that clean build on your build server. So, you have that compatibility between the two. On the other hand, if you just go with npm install on your build, it's just loose. It can change based on when you build and what artifacts were already installed, there's no corresponding commit, and you can't necessarily reproduce it on your local machine. And just pointing back again, from a security perspective, reproducible builds are the foundation of any security scanning that you may want to do on your builds to make sure that nothing gets tampered with. So, now moving on to the next prong. Even now that the build is reproducible, a break like this will still happen, right? So, how can you fix it? The first thing is a quick fix. This is not something that you want to keep in your source code for months or years. This is something you might want to do for days or weeks.
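Going back to the npm ci check described above, the idea of "fail the build if package.json and the lock file disagree" can be sketched as follows. This is only an illustration of the idea, not npm's implementation: `allows` and `findLockMismatches` are hypothetical helpers, the package names are made up, and the lock object is a flattened stand-in for the real package-lock.json format.

```javascript
// Sketch of a consistency check between declared ranges and locked versions.
// Assumptions: plain x.y.z versions, only exact versions and caret ranges
// with major >= 1, and a simplified { name: version } lock shape.
function toNumbers(version) {
  return version.split(".").map(Number);
}

// True when `version` is allowed by `range`.
function allows(range, version) {
  if (!range.startsWith("^")) {
    return range === version;
  }
  const [major, minor, patch] = toNumbers(version);
  const [baseMajor, baseMinor, basePatch] = toNumbers(range.slice(1));
  if (major !== baseMajor) return false;
  if (minor !== baseMinor) return minor > baseMinor;
  return patch >= basePatch;
}

// Returns one message per dependency whose locked version is missing
// or falls outside the range declared in the manifest.
function findLockMismatches(manifest, lock) {
  const problems = [];
  for (const [name, range] of Object.entries(manifest.dependencies)) {
    const locked = lock[name];
    if (locked === undefined) {
      problems.push(`${name}: missing from lock file`);
    } else if (!allows(range, locked)) {
      problems.push(`${name}: locked ${locked} does not satisfy ${range}`);
    }
  }
  return problems;
}

// Hypothetical manifest and lock contents.
const manifest = { dependencies: { "example-lib": "^2.1.0", "other-lib": "1.4.2" } };
const goodLock = { "example-lib": "2.3.1", "other-lib": "1.4.2" };
const staleLock = { "example-lib": "1.9.0" };

console.log(findLockMismatches(manifest, goodLock)); // nothing to report
console.log(findLockMismatches(manifest, staleLock)); // one out of range, one missing
```

A build step in this style would stop when the returned list is non-empty, rather than quietly resolving new versions, which is the property that makes the build reproducible.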
But you might want to pin to the last known good version. In this particular case, the version that broke the build was 7.1.4. You might want to pin yourself back to 7.1.3. Pinning to an old version is bad for all sorts of reasons, right? So, it's not a long-term solution. But if you did want to do this in the short term, there are a few different options. On vanilla npm, there's a tool that will actually rewrite your package-lock.json before every build. That's just one hacky option, but it's out there and it can be done to kind of save the day, right? If you're using a monorepo manager like Rush or Nx, there's support there as well. In Rush in particular, there's something called preferred versions, which is what one of our teams used to stay on 7.1.3. So, you can use that to, you know, kind of manage the exact versions that you're using. And the benefits of a monorepo also come out here. In this particular case, if you have a monorepo and you have 25 different packages that are relying on 7.1.4, you can change it in one place and then have everybody on 7.1.3 until you figure out what you're going to do about it. So, moving on, in the medium to long term, you actually want to fix the underlying issue, right? Maybe "fix" is too strong a word, but you want to change something so that it will work for you. So, there are a few options in this particular case. Option one: you might say to the maintainer, hey, you know, you introduced this new dependency, and it's breaking our build for these reasons. Is there any chance you might want to back out that PR? That would be one option. Option two is smartwrap, right?
That's the library seven or eight levels down. You could reach out to the maintainer and say, hey, you know, would you mind switching over to a different license from GPL? That could work in some circumstances, but it's actually not as helpful as you might think, because changing from GPL to MIT, or adding a new license, will only help if all the contributors to that project agree to it. And depending on the project, that could be one person, or it could be 500, right? So, that may or may not be an option. Option three: in this particular case, the dependency on the Changesets CLI actually should never have been a production dependency in the first place. It should have been a dev dependency. A dev dependency is something you rely on only for building the actual project itself, so your dependents do not actually need it to be pulled in. Somebody correctly pointed out on GitHub that, hey, the Changesets CLI doesn't really need to be a dependency. Can you move it to a dev dependency? It's breaking us. And the maintainer did that. The last two were actually done on GitHub. Number three, I think, is what solved the problem. So, this is just a timeline. April 28th, the new dependency was introduced. April 29th, someone raised an issue on GitHub. And April 30th, the maintainer fixed the issue using the dev dependency fix that I mentioned before, and it was pushed out to npm. So, a pretty quick turnaround in terms of actually getting the fix out. Now, moving on to package maintenance best practices. If you're maintaining a package, first of all, be aware that adding a new dependency can actually be a breaking change, right? That's true even if that dependency appears to have a compatible license, as in this case.
If you're adding a new package as a dependency, you might want to check and make sure that all the transitive dependencies are also compatible with the license that you're using in your project. Google has a tool called deps.dev, which you can use to actually look through the dependency graph for, I think, Java, npm, Python, and maybe a couple of others. That could be a manual way to check. And then you may want to explore different tools for automating that as well for each new PR. So, that's a wrap. The npm dependency graph is always changing. You can't avoid that, nor would you want to. You want to follow best practices so that you're not taken by surprise. You want to engage with open source maintainers to solve problems and make things work better. And if you're a maintainer, you should be aware that your transitive dependencies matter. Just a bit about Goldman Sachs Engineering. Goldman's been around for 150 years, serving our clients with excellence. We're also a great place to work for engineers. We have 12,000 engineers, about a third of our organization. It's a great place to learn and to grow. And on that note, we're hiring. My team in particular, for open source, is enabling and encouraging engineers to contribute to open source. There are many different levels, most locations, and an amazing opportunity to make an impact, and you can feel free to go to GoldmanSachs.com slash careers to find a role on my team or any other team that you're interested in. Or you can reach out to me on LinkedIn to find out about opportunities. So, I'll just open it up now for any questions on any of the subjects I covered or anything else. Sure. Yes, hi. Yes. It was right at the beginning, when I had first joined the open source program, six months in. This was one of the first things that I encountered and I just jumped on it. I'm like, hey, what's going on here? And actually digging into it was a lot of fun.
Maybe because I was kind of new to the team, and the team itself was a new team as well. And I was the one who was like, hey, let me find out what's actually going on here. So, yeah, it was a lot of fun. Sure. Any other questions, comments? Yes. Sorry, maybe a little bit louder. I can't say exactly which vendor it is, but it is a vendor product that we use to block dependencies that have unacceptable licenses or vulnerabilities. So, it actually did. It sent a message, but that message still didn't solve the mystery, right? There was a message that this dependency is blocked because of GPL 2.0, but it didn't tell the developers why this happened all of a sudden, why today, or why they were depending on smartwrap when they didn't even realize it. So, the entire dependency graph was not clear from the build message. Maybe that would be doable, but, you know, I don't necessarily know if I would call it a fix. It would definitely be useful information to have. Like I told you, the message says this is the package, this is GPL 2.0. Perhaps providing a dependency graph would be helpful, but it's something you can get from npm as well. But yeah, thank you. Great point. Sure. Any further questions, comments? Thank you, everybody.
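The question about showing the dependency graph in the build message can be illustrated with a small sketch that walks a graph and reports the path to any package with a blocked license. Everything here is an assumption made for illustration: the graph is a toy built from the chain described in the talk, the `@changesets/cli` and `tty-table` spellings are inferred from the spoken names, licenses are only filled in where the talk states them, and `findBlockedPaths` is a hypothetical helper rather than part of any real scanning product.

```javascript
// Toy dependency graph based on the chain described in the talk.
// Licenses are set only where the talk states them; null means "not stated".
const graph = {
  "gatsby": { license: null, dependencies: ["gatsby-cli"] },
  "gatsby-cli": { license: null, dependencies: ["gatsby-recipes"] },
  "gatsby-recipes": { license: null, dependencies: ["@graphql-tools/schema"] },
  "@graphql-tools/schema": { license: "MIT", dependencies: ["value-or-promise"] },
  "value-or-promise": { license: "MIT", dependencies: ["@changesets/cli"] },
  "@changesets/cli": { license: null, dependencies: ["tty-table"] },
  "tty-table": { license: null, dependencies: ["smartwrap"] },
  "smartwrap": { license: "GPL-2.0", dependencies: [] },
};

// Depth-first walk that collects every path from `root` to a package
// whose license is in `blockedLicenses`.
function findBlockedPaths(graph, root, blockedLicenses) {
  const results = [];
  const visit = (name, path) => {
    if (path.includes(name)) return; // guard against dependency cycles
    const node = graph[name];
    if (node === undefined) return; // unknown package: nothing to follow
    const nextPath = [...path, name];
    if (node.license !== null && blockedLicenses.has(node.license)) {
      results.push(nextPath);
    }
    for (const dependency of node.dependencies) {
      visit(dependency, nextPath);
    }
  };
  visit(root, []);
  return results;
}

// Print each offending chain so the reader can see why the package is there.
for (const path of findBlockedPaths(graph, "gatsby", new Set(["GPL-2.0"]))) {
  console.log(path.join(" > "));
}
```

On a real project, `npm ls smartwrap` prints the same kind of path from the installed dependency tree, which is the information the build message was missing.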