Yeah, I'm going to be talking today about supply chain security, specifically JavaScript supply chain security. This conference has the OpenSSF track and a bunch of other tracks where, if you've gone to them, I'm sure you've heard about this topic quite a lot, and you've probably heard about it quite a lot over the last year or two as well. I think that link works, so you can follow along with the slides or go get them yourself. There are lots of links and lots of content in here.
A little bit about me. I see a lot of familiar faces, but I'm sure there are some folks who don't know who I am. My name is Darcy Clarke. I've been in software development for over 20 years. I was the staff engineering manager for the npm CLI for the last three and a half, almost four years, and I left GitHub in December. I also built a company called Themify, a commercial WordPress theme company, and did lots of consulting with all those big brands that you see there. If you're still on Twitter, I'm @darcy.
Most notably, as I said, I worked on the npm CLI team for three and a half years; it was part of the acquisition by GitHub in 2020. So a lot of my experience comes from managing a team that supported a portfolio of almost 100 packages with billions of installs monthly. Who's run npm install in the last month? Awesome. Cool. So I'm hiding in your dev dependencies somewhere, I'm sure. I didn't write a ton of code, since I was doing more management, but a bunch of the folks I used to manage are in the front rows here, so blame them for any problems you have with npm. Specifically Luke, who is still on the team right now. Exactly. And yes, please go to Luke's talk tomorrow. He's going to talk all about how we orchestrate releases with those hundreds of projects.
But yeah, significant impact: roughly 2% of all traffic that we saw on the registry was for that portfolio of projects. And this might not surprise some of the maintainers I see in this audience, but a small team of about four or five supported 2% of the entire registry's traffic. So when you're talking about supply chain security, you're talking about the packages you depend on all day, every day, and there's a very small minority of folks doing the lion's share of the work to make sure we have a secure infrastructure and a secure supply chain. I see Jordan in the back there, too. We'll bubble wrap you later. There's a bunch of maintainers in here, so let's hope there's not an asteroid that comes crashing down here.
Cool, so when we talk about supply chain security, what are we talking about? And why do we even care? Well, it's really that open source has taken over. It used to be that 89% of the software we were shipping was closed source, and now that has totally flipped: 89% of the software you run in production is actually open source. So securing the open source supply chain means we are securing our own products' supply chain as well. It's critical to get this right in open source so that we can have safe closed source software too.
And when we talk about the supply chain, what are we talking about trusting? Let's take a look at the current ecosystem. This is going to help frame what I'm talking about, especially because different talks, I'm sure, go into different parts of supply chain security. Specifically within JavaScript, there's a whole bunch of tools, and the landscape has different players in it, different open source projects. There's the runtimes and engines.
There's the package managers, the transpilers, and then a whole bunch of other packages as well. And really, this is the layer that you see in npm. All these categories of projects are the ones that are actually in the registry, the ones we can interface with, download, and install. Today I'm going to focus primarily on package managers because that's what I know. No big surprise. We're going to talk about why package managers are so important and what we can do to have more secure package management.
So what does supply chain security look like in practice? Well, it's the dependencies you have, right? And of course, JavaScript has a lot of dependencies. npm is the largest software registry in the world. We see over 3 million packages. Does everybody here have a package on npm? Hopefully most of you, or some of you. And we see a ton of downloads: 219 billion downloads a month.
You've seen the memes, right? This is the stuff that gets a lot of likes on Twitter. People share this idea that we are a very greedy ecosystem that just consumes open source en masse. I look at these and I'm sad, because it seems like a failure that we don't really have good observability. I think a lot of people look at their node_modules folder and think it's a black box. That's a bit of a failure on the tooling side of things, and I'll take a bit of responsibility for that.
Of course, a large portion of those dependencies aren't things that you've included yourself. They aren't the direct dependencies. I know, Feross, you've talked about this all the time in your talks, and it's been documented quite widely. According to the Octoverse report back in 2020, there's an average of 683 transitive dependencies in JavaScript projects today.
That's crazy compared to the fact that there were only, I think, about 10 direct dependencies on average in those projects. Compared to all the other ecosystems, it just looks insane how many transitive dependencies we actually see. If transitive dependency is a new term for you, this little diagram helps: it's something that you haven't included directly. It's coming into your project because it is a dependency of a direct dependency, or even deeper in the graph.
The fun insight is that these deeply nested dependencies, the ones you didn't even include yourself, are the ones found to have the most vulnerabilities. Roughly 75% of vulns are living there.
And you would have seen this next stat, I think, in one of the keynotes as well. It's from Sonatype's State of the Software Supply Chain report from 2022: we've seen an increase of over 742% in attacks on the open source supply chain. This is what that looks like on a graph. Graphs, yay. It's kind of scary when you think that this is attackers, people who actually want to disrupt our ecosystem, not just benign vulnerabilities. This is increasing, right? And the JavaScript ecosystem, being the largest, is ripe for disruption and has a lot of low-hanging fruit in terms of the API surface area we support. I don't want to know what this looks like next year, but I'll see you next year, and hopefully we're all still here. Everybody in this room will probably have a job next year if you take this back to your companies and say, I know more about security. You'll have a job for a long time, right? So this is great. Everybody here is going to switch to being security experts.
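To make the direct versus transitive distinction concrete, here is a minimal sketch that walks a toy dependency graph and splits what it finds into the two groups. The graph and every package name in it are hypothetical and only for illustration; a real tool would read this information from a lock file or an installed tree.

```python
from collections import deque


def split_dependencies(graph, root):
    """Return (direct, transitive) sets of package names reachable from root."""
    direct = set(graph.get(root, []))
    seen = set(direct)
    queue = deque(direct)
    # Breadth-first walk: anything reached below a direct dependency
    # that is not itself a direct dependency counts as transitive.
    while queue:
        name = queue.popleft()
        for dep in graph.get(name, []):
            if dep != root and dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return direct, seen - direct


# Hypothetical toy graph: package name -> names it depends on.
graph = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["router", "logger"],
    "router": ["path-utils"],
    "logger": [],
    "path-utils": [],
}

direct, transitive = split_dependencies(graph, "my-app")
print(sorted(direct))      # ['logger', 'web-framework']
print(sorted(transitive))  # ['path-utils', 'router']
```

Even in this tiny graph the project asked for two packages and received four, which is the same effect as the 10 versus 683 figures above, just at a smaller scale.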
In terms of GitHub, because I know that system well from when I used to work there: GitHub's own advisory database had about 2,900 advisories for the npm ecosystem. A lot of those had been migrated from the npm advisory database. And roughly 80% of all Dependabot alerts are for the JavaScript ecosystem, which is kind of crazy. We're talking about a lot of compute that GitHub gives away for free with that tool. There are also roughly 8,000 malware takedowns now represented in the advisory database. They're kind of hard to find, so if you want to know how to see those, let me know. And that doesn't represent all the malware takedowns. I know some companies and representatives here know that not all malware takedowns are in advisories, which is unfortunate, but this is a pretty large sample of what we're dealing with.
So who feels like this stat is true? Feel free to raise your hand. Wow, okay. We've got a lot of ICs in the room. You feel like it's low? This was the last data point I could find, and it's a couple of years old, so you're probably right. It's probably a lot larger than this. You get a sense that dealing with dependency updates, and the fact that you're getting hit with advisories, is becoming unmanageable. It feels overwhelming, and I have some examples of this further down the line. But yeah, there's a 59% chance that in the next year you're going to get an advisory. I bet you'll probably get hit with something in the next week.
So what are the actual threats when we're talking about supply chain security? I'll try to move quickly through these because many of them are known and get rehashed a lot. Vulnerabilities are the easiest thing: I unintentionally did something that exposes me to a cross-site scripting attack or something like that. Malware is sort of the worst case scenario.
People are actively trying to write software that is malicious. Typosquatting is fat-fingering something and getting socially engineered into downloading something you didn't mean to. Dependency confusion is a fairly unique type of exploit, where the infrastructure that enterprises and corporations have created to proxy the npm registry has produced a situation where you can be socially engineered into downloading software you didn't mean to inside your company.
And then there's registry compromise. This is very unlikely. Does anybody know if the npm registry has ever been compromised, that you know about? No. And the funny thing is that most of you essentially have a cached version of most packages with integrity checks, and can tell whether a package might have been modified in some weird way. That's why lock files exist. Folks that are mirroring the registry, if you're writing an npm registry follower, can also flag this to the ecosystem. We would hear that a tarball had been compromised long before it ever reached your front door.
And then the worst case scenario is account takeovers. Somebody, I don't know, offers you a beer and then starts publishing packages from your laptop, or is able to do some sort of session spoofing. They're able to get your credentials and publish as if they were you.
How can we mitigate some of these? GitHub and npm have been trying to do a lot of work here. I can't speak to any initiatives in the last four months, since I quit in December, but there was definitely a focus on doing more active scanning. I know a lot of companies in the ecosystem have been doing active registry scanning for malware and helping to report it back to the npm registry.
There are also new technologies, AI models looking at the behaviors they see within package contents. The key here is focusing on package contents for malware. The source code truly doesn't matter as much as the actual thing you're about to download, so a focus on contents is key. And of course, surfacing this back to the canonical registries is important.
Typosquatting we can definitely mitigate through heuristics. There are known patterns, or dark patterns, for social engineering: you can imagine that pluralizing, prefixing, or suffixing package names is an easy way to get packages downloaded by accident. Finding those patterns and building heuristics, potentially also based on downloads or package authors, lets us create tools that help prevent typosquatting attacks. And then there's policies and enforcement: how do you configure the tool that's installing the thing to make sure it's not downloading a typosquatted package?
In terms of how we mitigate dependency confusion attacks, it's really tough. The key here is to own a scope. That's what we've told people for years. Own the public scope in the npm registry if you're going to have private packages that you publish to an Artifactory or Nexus instance, so that you don't get exploited, and always make sure your .npmrc file has the config for that scope to be proxied through.
In terms of registry compromise, we already have the tools to check this. If you're not using lock files for some reason, I'm sorry, I don't know why you're doing that. We store information about the package tarballs in lock files, and we do integrity checks today that should catch any kind of asset from a compromised registry.
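The integrity check mentioned above can be sketched in a few lines. Lock file integrity values use the Subresource Integrity string format, an algorithm name followed by a base64 digest, for example `sha512-...`. The sketch below assumes that format and is not npm's actual implementation; the function names and the stand-in tarball bytes are hypothetical, and accepting any matching hash rather than only the strongest one is a simplification.

```python
import base64
import hashlib


def sri_digest(data: bytes, algorithm: str = "sha512") -> str:
    """Build an SRI-style string such as 'sha512-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Check bytes against an integrity field holding one or more hashes."""
    for entry in integrity.split():
        algorithm = entry.partition("-")[0]
        # Simplification: accept a match on any listed SHA-2 hash.
        if algorithm in {"sha256", "sha384", "sha512"}:
            if sri_digest(data, algorithm) == entry:
                return True
    return False


# Stand-in bytes; a real check would hash the downloaded .tgz file.
tarball = b"pretend these are the bytes of a package tarball"
recorded = sri_digest(tarball)  # what a lock file would have stored

print(matches_integrity(tarball, recorded))         # True
print(matches_integrity(tarball + b"!", recorded))  # False
```

The point of the design is that the expected hash is stored on your side, in the lock file, so a tarball that changes on the registry after you first resolved it no longer matches.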
On account takeovers, npm did a ton of work enforcing mandatory 2FA. A lot of folks are not happy about that, but it does raise the bar in terms of potential account takeovers. The login experience was also improved with modern WebAuthn, and web login was implemented in the npm CLI itself. So if you type npm login as of v9 and beyond, we've defaulted to that web auth experience. You get thrown to the website, where you put in your credentials or use a WebAuthn experience in the browser to log in, or to authenticate when you're publishing.
So those are the known attack vectors. I'm sure you've heard many times over about the five or six categories of exploits I just talked about. But there's a whole bunch of other areas that are less talked about. Specifically, the noise within supply chain tooling is a huge problem today. There's a lot of confusion about how dependency graphs should be represented. There's some obfuscation around APIs as well. And generally there's a real lack of tooling to help you deal with advisories and updates. There's also a pretty huge lack of standardization around package management. And the very last one: there's a ton of mutability when we're talking about installing software within JavaScript, and it's a huge problem. Going to take some water.
So this area, mutability and non-determinism: I'm not sure if a lot of folks in this room have dealt with it, but it's likely that you've installed a package with a lifecycle script, a postinstall script, that executes and modifies files on your system in some way, so that what's on disk is different from the artifact that was published. It might even reach out to the network to fetch something. What's now living on your system is inherently not the same thing that was published.
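One way to see which installed packages can mutate your system like this is to look for them in the lock file. The sketch below assumes a package-lock.json in the v2 or v3 layout, where each entry under `packages` may carry a `hasInstallScript` flag; the sample lock file contents and package names are hypothetical.

```python
import json


def packages_with_install_scripts(lock: dict) -> list[str]:
    """List lock file entries flagged as having install lifecycle scripts."""
    flagged = []
    for path, entry in lock.get("packages", {}).items():
        # The empty-string key is the root project itself, so skip it.
        if path and entry.get("hasInstallScript"):
            flagged.append(f"{path}@{entry.get('version', '?')}")
    return sorted(flagged)


# Hypothetical lock file contents, trimmed to the relevant fields.
lock_text = """
{
  "name": "my-app",
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "my-app", "version": "1.0.0"},
    "node_modules/plain-lib": {"version": "3.0.1"},
    "node_modules/native-addon": {"version": "2.1.0", "hasInstallScript": true}
  }
}
"""

flagged = packages_with_install_scripts(json.loads(lock_text))
print(flagged)  # ['node_modules/native-addon@2.1.0']
```

A list like this tells you where to look, not what the scripts do; reading the scripts themselves is still a manual step.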
So I thought I would share a quick example of mutability and non-determinism within the ecosystem. Who's ever used Create React App? It's okay if you're not a front-end developer. And not to pick on Create React App, it's a great project. This is after initialization; I think I ran npx create-react-app. This is what your dependencies look like in your package.json. You get seven direct dependencies added to your project when you're starting a new React project. And you're like, okay, great, that's awesome.
So what happens if I decide to use one package manager or another? Well, to install those seven direct dependencies with Yarn, I actually get 1,200 dependencies in my graph. You go from seven direct dependencies to an inflated number of transitive dependencies to fully realize the dependency tree. And you also start to see this crazy pattern, which is that none of these numbers are the same across the package managers. This is a huge problem. The representation of that project, of that package.json, is different to pnpm than it is to Yarn. It's different to npm. It's different to Bun and Deno, these new projects that are also doing seamless no-install experiences. There's a difference of almost 850 dependencies between these package managers. It's pretty insane, right? This is where the lack of standardization in how we resolve your dependencies is a huge problem, and probably a bigger problem for supply chain security than folks realize.
So you're probably like, wait, what the heck is going on here? I shared this insight on Twitter, and I'll show you the responses in a second. What's going on is that each package manager has a very different understanding of, and different capabilities for, resolving the dependencies in your package.json.
It depends on the context they're installing in, but also on their understanding of development dependencies, optional dependencies, peer deps, and any overrides or resolutions features that your package manager might have. Lifecycle scripts are also going to introduce some mutability and potentially change the number of dependencies installed. Each package manager can take a different stance on any one of these features and create a different representation than another. So any security tooling that relies on one representation may be wrong or invalidated if you use a different package manager the next time you install your project.
So I posted this on Twitter and some people were like, what the heck? I think ZB was kind of like, okay, but are they fast? As long as Bun is blazingly fast, it doesn't matter which packages I get on my system, right? I know Wes is in this room. Some of these people are in this room, so I'll let them take a picture. Some of the responses were really funny, like, what the heck is happening here? Jordan, I didn't put your response up, but he was digging into how I got this data. It was a bit of manual work: for Deno and Bun I had to go into the caches to get those results. But yeah, this is pretty interesting.
Wes's comment specifically, and Wes is sitting over there, is super interesting. He's asking, I think very sarcastically, because he knows the answer, though not everybody here might: what would happen if I ran that same package manager again on the same package.json? It would be pretty funny. Well, there's actually this saying.
When we talk about supply chain security, there's been a lot of talk about reproducible builds, hermetic environments, hermetic builds. That brings me back to Heraclitus of Ephesus, I think, the Greek philosopher who said that no man ever steps in the same river twice. I like to say that no package.json ever installs the same twice. And that's a hipster from San Francisco. Specifically full stack. Probably using Vercel. But it's true. It's an unfortunate fact that npm install is not consistent. It's not reproducible. Part of it has to do with time, and it gets very philosophical: are you the same person today that you were yesterday? But at the same time, there are real-world problems with these tools.
So how can we avoid the worst case scenario, where you keep using the same tool and get different results over time? If I run npm install twice in a row, how do I ensure I get the same thing? The key, at least with your dependencies, is that you should be avoiding mutable references. This isn't very well documented. Take distribution tags: a specifier like @latest, or a pre-release tag, where there's a distribution tag that lives within the npm registry. That's kind of considered a release channel, and it can update all the time. It's not a specific version that you're opting into. Same with remote tarball URLs. You can specify a package as if it's an individual file, and that has no registry integrity value stored back in the registry, so it can change over time. And same with Git repositories: the references there aren't locked in time. In fact, we don't store back an integrity value at all for Git repositories.
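As a rough illustration of spotting those mutable references, here is a minimal sketch that classifies the specifier strings in a `dependencies` map and flags the three kinds called out above: distribution tags, remote tarball URLs, and Git references. The patterns are deliberately simplified assumptions and not npm's real specifier parser (for example, the bare `user/repo` GitHub shorthand is not detected), and the dependency names and URLs are hypothetical.

```python
import re

EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")
TAG_LIKE = re.compile(r"^[A-Za-z][A-Za-z0-9._-]*$")
GIT_PREFIXES = ("git+", "git://", "github:", "gitlab:", "bitbucket:")
MUTABLE_KINDS = {"distribution tag", "remote tarball", "git reference"}


def classify(spec: str) -> str:
    """Roughly classify a package.json dependency specifier."""
    spec = spec.strip()
    if EXACT_VERSION.match(spec):
        return "exact version"
    if spec.startswith(GIT_PREFIXES):
        return "git reference"
    if spec.startswith(("http://", "https://")):
        return "remote tarball"
    if TAG_LIKE.match(spec):
        return "distribution tag"
    return "semver range or other"


# Hypothetical dependencies section of a package.json.
dependencies = {
    "pinned-lib": "4.17.21",
    "ranged-lib": "^18.2.0",
    "tagged-lib": "latest",
    "tarball-lib": "https://example.com/tarball-lib-1.0.0.tgz",
    "git-lib": "git+https://example.com/org/git-lib.git#main",
}

mutable = {}
for name, spec in dependencies.items():
    kind = classify(spec)
    if kind in MUTABLE_KINDS:
        mutable[name] = kind

for name, kind in mutable.items():
    print(f"{name}: {kind}")
```

Semver ranges are left out of the flagged set here because they resolve through the registry and the chosen version is recorded, with an integrity value, in the lock file.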
Specifically, we don't store an integrity value for Git repository dependencies in your lock file. So the key here is to be mindful that these references are mutable. Another warning is that not all the package metadata in the registry has been validated.
So how can we get rid of this mutability, or avoid it? Use lock files, plain and simple. Understand what's in those lock files: know what those values are and what can and can't be checked for integrity, like I just said about Git repositories. And then you can use time travel, which is pretty cool. For any registry dependency, you can lock in time the manifests and versions that you'll fetch from the registry you're configured to use. This is a feature in the npm CLI, and it essentially restricts you to the subset of versions that were available at that point. Unless the npm team has removed a dependency for malware or something like that, this should be a pretty foolproof way of getting an almost reproducible build. And of course, if you're a robot, just cache and bundle everything. Once you've fetched it, hold on to it. Yeah, this is a little robot that's come back from the future to tell us what we should be doing. And it's still running .NET in '23.
I only have about 10 minutes left, but I also want to quickly look at the state of the current tooling, because I know it's not super easy to understand what's going on. And sometimes you might be afraid to ask how to do certain things. You might be afraid because of this. Here's another example. I'm using Create React App again, no shade, but this is a fresh install of Create React App and I already have six high-severity vulnerabilities. I generated this today.
So anybody starting a project with the tool is unfortunately starting off with some vulnerabilities already in the project. But it's great: npm showed me this post-installation, and I ran audit again on the project just to make sure I wasn't going crazy. It showed me that there were six high-severity vulnerabilities and told me how I can fix them: run npm audit fix --force. So I'm like, sweet, I'm about to do that. Luke, what's going to happen? Say it again? Okay, and we've got 84 vulnerabilities.
This is really unfortunate, right? We have this tool, it tells you there are some problems, we say you can fix them by running this command, and then you end up with what? What's the factor? Anybody, quick math? Oh yeah, we've got all kinds of new dependencies, new problems. And what's probably even more infuriating for end users is that we tell you to run npm audit fix again. And you're just thinking, go back. No, I want the old set, right? This is really frustrating. We get people constantly complaining about this. And this is the state, as of today, of this project and these tools. I like that everybody in here is laughing, because I know you've all experienced this, and it's actually sad. It's not a good situation.
There are some companies really trying to do great work and innovate in this space, though. GitHub obviously has Dependabot, Mend has Renovate, and Socket, and Feross is in here, is trying to do great work as well, improving the state of security insights and these advisory tools. But I do want to note some red herrings and issues. The npm audit tool is a great example: oh my gosh, I do this thing, I get another thing. Some of the advisory tools, and npm audit is a good example, are giving false positives.
They're creating noise, and we're doing all this work and creating all this noise under the guise of, well, a false negative would be worse, right? We want to tell you about everything, and even if we're wrong, that's better than failing to tell you about something.
SBOMs: I'm sure you've heard the term, software bill of materials. In the JS ecosystem, we've had lock files for a long time. Just be mindful that some of the standards don't map one-to-one onto our ecosystem. I've been able to generate SBOMs from GitHub's new tool that don't match reality, which is unfortunate. So these may or may not provide value to you, but at least within the JavaScript ecosystem you already have, or should have, a ledger and an index of the packages in your project.
This is another one that's a bit concerning. There's a lot of work and focus that's gone into artifact signatures, when you really care about the package contents more than anything else. I know people care very dearly about provenance, and npm did ship a feature here: if you want to validate the integrity of the signatures, you can do that with npm audit now. So there is a tool if you care about these things, but I think the focus is a little bit misplaced.
In terms of scorecards and brand badging, this is something that I think every org needs to weigh against its policies, and be concerned about the biases. Definitely have a zero trust mentality when it comes to security. Don't trust me; validate everything I say. Don't trust the big orgs. Look into these things yourselves.
And in general, watch out for panaceas. There are a lot of companies telling you they'll fix all your problems, or that SBOMs are going to save us. It's not true. There's no one silver bullet here. There are a lot of different tools, and a lot of different things we have to fix.
So with the last few minutes I have with you, I want to talk about the future state and some solutions, because I know this has been a little bit dark. Don't worry, there's hope. On insights, I mentioned before that there are companies doing great work bubbling up what's inside the contents of packages and creating great metadata we can use. That's something we've been missing from the dependency graph: great data and insights about packages that we can build great tooling around.
I also think that, as an ecosystem, we really need reproducible installations. The only way we can get there is by removing our need for, or crutch of, the mutable installation process with postinstall scripts. What I've been pushing for is first-class support for package distributions. It's an idea that has been around for a while, and there's been some support or push for it with Yarn. At npm I put together an RFC for this. It's theoretical, just a spec we've been putting together, but it would create a blessed way to have per-environment or per-condition variants of packages.
There's also been some great work on runtime isolation with the experimental permission work that went into Node. Do you know when it landed? Okay, so it's been around for a while. Same with the experimental policy API; Bradley, who unfortunately isn't here, did a lot of great work there a while back. So we have these two great features within Node today that you can go try out.
This is very similar to what Deno has implemented with policy enforcement in the runtime. npm has an RFC open for permissions, and I would love to see per-package permissions. I have a screenshot here. Has anybody ever written a web extension? Yeah, a couple. Web extensions have permissions and scopes baked into their manifest format. I would love to see us eventually adopt something like that to make it easy to provision at the package level. That's what I think we're trending towards, and I hope we get there.
The last thing I wanted to talk about, and I know I only have about a minute or I'm almost over, is introspection. Last year we did something really cool. I put together a spec for what is called the dependency selector syntax. It's basically lifted from CSS syntax, and it makes it super expressive to write queries against your dependency graph. This is a set of examples of what that looks like. If you've ever written CSS, you're probably like, holy cow, I know how to find all the React versions easily. You can also do really interesting things like finding your peer deps. Attribute selection is especially interesting because you can query for any attribute that lives in your package.json. So you can see how, with more information in the dependency graph, we could do really cool queries.
Here are some more examples of exactly what you can do with this syntax. This landed back in npm v8.16, and big applause to Ruy and Luke for the work on the npm CLI that got this done. The very last example here is a query for all the lifecycle scripts. You can use this programmatically today. If you want to go write some interesting queries, you can query it just like you would in the DOM, right?
Just as you'd write some JavaScript to query your HTML documents, you can query your dependency graph with querySelectorAll, which is really cool. And if you want to, you can use the command line: npm query. This takes your node_modules folder and says, let's stop letting this be a black box and do some investigative work.
Notable selectors: you can do some interesting things with semver, and we've created some pseudo-selectors that are really cool. Not yet implemented, but specced out, are CVE and CWE selectors. Also RFC'd today is query support for npm audit, so that crazy noise you see today, which might not be relevant to you, could go away very quickly by filtering down to the things you need. And lastly, with the query selector you can write really expressive policies. Imagine ESLint, but for enforcing installation policies on your dependencies.
So imagine a world where we eventually standardize package resolution, similar to what the browsers did with HTML5. Imagine we have an amazing query syntax that lets you expressively traverse the dependency graph. You would not feel like you're in the dark; you could find the things you need. I truly think that's the way forward. Obviously this talk is a lot about the accuracy of our dependency graphs and securing them. We definitely need more standards. If you stay in a zero trust mindset, you're going to be a lot safer than you were coming into this. Please share any discoveries you make. And if you're going to use a package manager today, please use npm; as far as I know, it's the most accurate. But yeah, that's it. Thank you.
I think I'm at time, so I don't know if we have time for Q&A, but I can talk to you all afterwards. Any immediate questions, I guess? We might have five, ten minutes. Okay. Does Arborist have a method? You showed the querySelectorAll.
Does it have a matches where you could give it, yeah, go back. Wait, no, the Arborist API specifically. You see tree.querySelectorAll? I'd be interested if, when you have a reference to a node in an Arborist tree, you could ask, does this match my query selector? Does that API exist? If we standardize it, it can exist. Yeah, we can definitely do that. The hope, with the talk of standardization towards the end here, is that over the next year I would love to bring a lot of this work and tooling to OpenJS and to standards.
Anything else? Any other questions? Yeah. Did you try running the fix on Create React App another time? Did I what? Did you try to run npm audit fix --force again? Yes. Do you know the result? No, it's... You go back to the first one. Yeah. Yes. No, no, it's actually intermittent. I think there's potentially a race condition as well. Yeah. So that's another issue, truly. No npm install is ever the same twice. Anything else? Cool, thank you so much for coming.