You know, we're all using open source software to build our projects and get things out the door quickly. And so it's pretty hard to avoid using open source software in some way. So this topic, how to secure your software when you're using open source code, is super important. So thanks for coming to my talk, and hopefully we'll have some good stuff to talk about today. So just a little bit about me. I'm Feross. I started this company called Socket, which you can learn more about at socket.dev. And most of the things that I'm going to share today, we're putting them into this website, this tool that's free for people to use. But I also teach web security at Stanford. And I've worked on a lot of Node packages, a bunch of different NPM packages over the years. Maybe two you might have heard of are StandardJS, which is a style guide and a code linting tool, and WebTorrent, which is a project that I started back in 2013, 2014, which was trying to make the BitTorrent protocol work on the web using a JavaScript library and WebRTC. And it's been a really fun project to work on over the years, and I learned a lot about open source and a lot of things about the supply chain. So all that stuff is going to go into this talk, hopefully. And then also two other just random facts. I was part of the Node Foundation for a while, and then also I helped with an episode of the Silicon Valley TV show, which was a pretty funny experience. They flew me to LA and I got to help give advice for that show, which is hilarious, by the way, if you haven't seen it. Cool. So what are we going to talk about today? So four things. The first is I'm going to tell you a story of a real world supply chain attack with some attack code. And we're going to go into the attack code and look at how it works. Then we're going to talk about why is this issue happening now more than ever? And, you know, kind of what has changed to make it happen now? 
How does the supply chain attack actually work in detail? And then finally, some tips and thoughts on how to protect your app. But I don't have like a full solution to this, just some ideas and some recommendations. It's still kind of an open problem, I would say. Cool. So let's start. Let me tell you a story. On January 13, 2012, over 10 years ago, a developer named Faisal Salman published a project called UAParser.js. This is actually the first commit message from that project. And the goal of the project was to parse user agent strings. So kind of a very simple JavaScript library. Lots of people found it useful. And so over 10 years of steady work, he continued to develop the package with help from many open source contributors, publishing ultimately 54 versions over all those years. And the package continued to grow to the point where now it has around 7 million downloads per week. And it's used by 3 million GitHub repositories. So a very huge project. Now let me tell you a different story. On October 5, 2021, on a notorious Russian hacking forum, this post appeared. This is a hacker selling the password to an NPM account that controls a package with, what does it say, over 7 million weekly downloads. If that number sounds familiar, it's referring to the UAParser.js package. So two weeks after that post appeared, the UAParser.js library was compromised and three malicious versions were published. And malware was added into each of these versions so that whenever a user would install them, the attack code would execute immediately. So if we actually open up one of these compromised versions, this is the package.json file, which is the metadata manifest file for JavaScript projects. And you'll notice this line here, the preinstall script, is basically a shell script that NPM will automatically execute any time this package is installed. 
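As a rough sketch of the idea just described, here is one way you might check a package.json for scripts that run automatically at install time. The hook names (`preinstall`, `install`, `postinstall`) are the standard NPM lifecycle scripts. Everything else here is an assumption for illustration: the function name `find_install_scripts` and the example manifest are hypothetical, and the manifest is only modeled loosely on the description in the talk, not copied from the real compromised package.

```python
import json

# NPM lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(manifest_text):
    """Return the install-time scripts declared in a package.json string."""
    manifest = json.loads(manifest_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


# Hypothetical manifest for illustration only.
EXAMPLE_MANIFEST = """
{
  "name": "example-package",
  "version": "1.0.0",
  "scripts": {
    "test": "node test.js",
    "preinstall": "start /B node preinstall.js & node preinstall.js"
  }
}
"""

if __name__ == "__main__":
    for hook, command in find_install_scripts(EXAMPLE_MANIFEST).items():
        print(f"{hook}: {command}")
```

A check like this only tells you that an install script exists, not whether it is malicious. Plenty of legitimate packages declare one, so treat a hit as a prompt to go read the script.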
And this is a feature of the NPM package manager that allows the author to specify code to run automatically on installation. And as you can see here, it's running two files. The first one, the slash b flag here, is actually telling Windows to start this script in a hidden terminal window to make it a little bit more sneaky and not show the output to the user. But in any case, it's running this file. So let's open up the file, pre-install.js, to see what it's doing. Can everybody see this? Hopefully it's OK. Cool. So what we have here is the file that runs. And the most important piece of it is it does different things on each platform. So the first kind of if statement is for the Mac. And really, it does nothing other than set this variable here. So Mac users are fortunate they didn't bother making an attack for them. Now on Windows, it spawns the command prompt and runs this bat file. And on Linux, it calls this function here, which ends up spawning bash to run pre-install.sh. So let's open up pre-install.sh and see what's in there. So here's the script. The first line, you'll see it's curling this URL here to go and get the user's country. And if the user happens to be coming from any of the countries that they're grepping for, so Russia, Ukraine, Belarus, or Kazakhstan, then it will set this variable. And then if it's set, then it will terminate the script. So basically, if the user is coming from one of these four countries, the malware does nothing. This is pretty common in attack code, where usually the bad guys don't want to antagonize their local law enforcement, so they usually will not attack people from those countries, the countries where they live or where they operate. So this is probably what's happening here, but who knows. And then, so assuming you're not in one of those countries, then it looks for whether the malware is already running on the machine using the pgrep program. So it's grepping for this process. 
If the process exists, it exits the script. Otherwise, it will download the file from this IP address, and then it will make it executable and run it. So this is the actual malware payload. It's running this file now. And if you can see the arguments here, you can maybe guess what this program is doing. minexmr.com should give you a hint. It's mining Monero cryptocurrency. So it's basically a cryptocurrency miner that steals your system resources and mines for the attacker. So that's basically what the payload did. Now let's look at the Windows version. So the Windows version is very similar, but it does one extra thing, which we'll see. So it starts off very similar, but then if we go down here, you'll see it's actually downloading an extra .dll file. And also, by the way, this is kind of funny. It tries to download it three different ways. It tries curl, which is not usually installed on Windows machines. Then it falls back to wget. And then if that's not available, it tries this thing called certutil.exe, which is a tool for signing certificates, but apparently also allows you to just download files from the internet as well. So that's handy. So it will download the .dll file. And again, it runs the malware payload. And then also, at the very end, it registers the .dll. And that piece there is the really scary part, because that .dll file actually scans the whole machine and steals passwords from over 100 different programs on the machine, as well as the Windows credential manager where your passwords are stored. So yeah, pretty bad piece of code. And remember, this was added to a project with many, many downloads, like 30 million downloads a month. So it was affecting a lot of people very quickly when it was added. So this was the aftermath. The package was basically published for about four hours before it was discovered. 
And this is actually pretty fast for a piece of malware to be discovered on NPM, mainly because this was a very noisy attack. It was mining cryptocurrency. So you would notice your computer was getting hot, basically. So it was not very sneaky. So it was discovered in four hours. And this is the post from kind of the maintainer apologizing for basically getting his account compromised. But anyone who installed this package during this time period would have been compromised. And any software builds done in projects without a lock file would have been compromised or affected by this. And finally, anyone who merged a PR to update the dependency would have been affected by this. So anyone who is basically unlucky enough to use a new version, one of these new versions would have been affected. And this is just the tip of the iceberg. So this was just one story. But there's been over 180 packages removed from NPM for security reasons in just the last 30 days. And this trend is accelerating because attackers are basically taking advantage of the open ecosystem and our trusting natures in the open source world. I'll just mention a few other examples that maybe you've heard about. So there has been, in January of this year, maintainer of two packages, Colors and Faker, added code to those packages to denial of service the users of them. And it was a protest of big corporations using open source and not contributing anything back to the community. But this affected a lot of people's projects who were using this. And this is an interesting attack because it was the maintainer himself who basically became the attacker. So it's very hard to stop that. A lot of people like to talk about, let's use code signing to solve this problem. And it's true. It's a valid criticism. NPM should probably have code signing like APT and other package managers do. But that wouldn't help here. They would not solve this problem. 
And then in March 2022, there was a package that added code to delete data. If it suspected you were coming from Russia or Belarus, it would basically look at the IP address again. And if you appeared to be coming from one of those two countries, it would run some code which would loop over every file on your hard drive and replace the contents with a heart emoji. And it would do this automatically when you installed the package. So it was like a protest against the war. Obviously, people use VPNs. There's all kinds of ways the IP address could be mis-detected and innocent people could be affected. So this very clearly crossed the line. This is clearly malware. And then a few months after that, there were multiple other maintainers who actually joined in this protest but did it in a more mild way. So the maintainers of these packages here, event-source-polyfill, es5-ext, and styled-components, added messages to their projects that print out during the install process to tell people whatever their opinion was on various issues. Very much less destructive, but still probably unwanted by most users of the software. And in particular, one of them actually would redirect you. So if you included this code in your website and built it into the final product, then users would be redirected after like 15 seconds. So it was sort of like a sneaky, unwanted behavior added into the web code that would execute in certain circumstances. So these are just a few examples, but there's been many more as well in the news. So why is this all happening now? I think there's four main reasons. So the first one is that open source has clearly won. We're no longer in the days of reinventing the wheel every time by writing everything in-house. So we're relying on a lot of code from third parties. And we're not copying code from Stack Overflow anymore. So most of the lines of code in our apps actually come from open source. And I think everyone already knows this. 
But obviously, it's a better way to write software. We're not going back to the world where we don't do things this way. So this is like one big kind of shift that's happened that's causing this dependence on third parties. But also, the way we write software has really changed. So we use dependencies a lot more than we used to in the past. We have a lot more transitive dependencies. And this is no more evident than in this example of if you go to install a React application today, the Hello World that they recommend that you follow is to use this package called Create React App. And if you use that, you will get nearly 1,400 packages just to get the Hello World going for React. So this demonstrates kind of the massive number of packages that are coming in in this new way of writing software. And if you look at one example here, just taking one example out of the blue, Discord is a popular chat application. It's a proprietary chat app that a lot of young people use. And it's an electron application. If you open it up and you look at the JavaScript packages inside, there's almost 20,000 packages. And coming from almost 400,000 contributors, if you look at every commit that's contributed to this. So in a way, it's amazing. I mean, this is like proof that open source works in Discord is basically built on this mountain of open source work. But it's also, obviously, a huge amount of risk for any app that's built in this way. In 2019, there was a paper that came out at the USENIX security conference that found that installing an average NPM package would introduce trust on 79 third party packages and 39 maintainers because of the transitive dependencies. So this is how you quickly get to the number of dependencies. As you install one, you get 79. And so it's not a surprise that you end up with a lot of dependencies over time. So this is a visualization that we made at Socket that demonstrates just one dependency. 
So this is Webpack, the Webpack popular package that everybody uses. And what's going on here is each gray box is a package and every purple box is a file. And we're peeling back each level of the dependency tree, one level at a time. And so you can see within each package, you have multiple other packages and multiple files. And the size corresponds to the amount of code in each of those files. And this is just one dependency. I can restart it just to, you can see. So this is Webpack, you peel back Webpack. These are all the gray boxes, are all the dependencies inside Webpack and the purple ones are the files. And then if you peel back another layer, you can see more packages underneath, more files. And we were just trying to come up with a way of visualizing like what is going on inside of a package that makes it kind of clear the complexity and this is what we came up with. So the third reason is that no one really reads the code anymore of the dependencies that they use. I don't know if they ever did, but what we're doing now when we use open sources, we're downloading code from the internet written by unknown individuals that we haven't read, that we execute with full permissions on our laptops and our servers where we keep our most important data. So it's like a miracle that this actually works. Like how, why are we doing this? Why does my computer not blow up every time I run NPM install? I don't know. But most of the time people are good and so this isn't a problem, but if even 0.1% of the time you have a bad maintainer or someone getting compromised then this can be a big issue. So I think it's a miracle that this even works today. And especially with like the way that I don't know how many people here use Dependabot if you use GitHub and work on open source projects, but this is a very common bot that will help you update your dependencies. And usually people when they see this, they just say, okay, cool, new version came out, looks good, merge. 
And every time you do that, you're bringing in code from somebody that you haven't read, and so it's mind blowing to me that this works. And the other thing is attackers often publish different code to the package registry than they do on GitHub, because if you expect your users to actually maybe read the code, then they're gonna go to GitHub to read it and they're gonna say, oh, it looks fine, and they'll go install it, and the code they get from the registry, from NPM, is actually different than what's on GitHub. So this is actually a very common technique, and the package manager doesn't do anything to confirm that the code is the same, unfortunately. And then you might think, okay, I don't have to read the code, someone else will read the code. We can rely on Linus's law, where given enough eyeballs, all bugs are shallow. You've probably heard this quote before. But the thing is, if everybody thinks like this, then nobody is actually gonna read the code, and you end up with, in many instances, malware that remains undiscovered for hundreds of days. And that's actually what was found in this 2020 paper that did an analysis of this. They found that on average a malicious package is available for 209 days before it's been discovered and reported. So that's a long time before people are finding this stuff in the common case. And then another paper in 2021 at NDSS, which is another prestigious security conference, found similar results, including that 20% of malware persists in package managers for over 400 days and has more than 1,000 downloads before it's discovered. So it's a huge problem, and not all of the malware is being removed in just four hours like the first example I told you guys about. So these are my reasons why this is happening more and more. Now let's actually talk about how a supply chain attack actually works at the mechanical level. So I'm gonna focus on six specific TTPs. 
So if you're not familiar with the term TTPs, it just means tactics, techniques and procedures. It's a common term used in the security community, referring to just sort of how attackers do what they do. So we're gonna talk about these six. Let's start with hijacked packages. So this is the most common one that you see news stories about. Whenever you see a story about some NPM library being compromised, it's usually because of a package getting hijacked. And there's many ways you can have a package get hijacked. Obviously the maintainer could choose a weak password. That's what happened with UAParser.js. The maintainer could give access to a malicious actor. This happened with event-stream. Maintainers could become malicious themselves, which is hard to stop. That happened in those two examples there. Maintainers could protest. They could use their package to make a point. Maintainers could have malware on their laptops that steals their credentials. And finally, NPM itself doesn't really enforce 2FA. So if you reuse your password on another site, then you can lose your package that way. So then let's talk about the next one, which is typosquatting. So I'm gonna show you two package names. One of these is malware and one of them is the correct package. Raise your hand if you think the first one is the real package. Okay, raise your hand if you think the second one is the real package. Okay, everyone who raised your hand, you're wrong. Yeah, the correct package has the worse name, the confusing name, and this is the malware. And so yeah, people need to think of these package registries as basically a wiki, where anyone can register a page that hasn't been registered before. And so that's what happens a lot of the time in these types of attacks: you just register a package and you hope that people will accidentally type it in, and you'll get some installs that way. So that's what happened here. 
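To make the typosquatting idea concrete, here is a minimal sketch of one heuristic: flag a name that is within a small edit distance of a package you already know. The list `KNOWN_PACKAGES`, the function names, and the threshold of 2 are all assumptions chosen for illustration. As the example in the talk shows, a name alone can't tell you which package is the legitimate one, so this only surfaces near-collisions for a human to look at.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, using a rolling row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Hypothetical list of names the team actually intends to depend on.
KNOWN_PACKAGES = ["express", "lodash", "react", "webpack"]


def possible_typosquats(name, known=KNOWN_PACKAGES, max_distance=2):
    """Return known packages that `name` is close to without matching exactly."""
    return [k for k in known
            if k != name and edit_distance(name, k) <= max_distance]


if __name__ == "__main__":
    for candidate in ["expres", "loadsh", "express"]:
        print(candidate, possible_typosquats(candidate))
```

An exact match against the known list returns nothing, so the check stays quiet for the names you meant to type.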
So if we open up the malware again, we'll see it uses an install script, because that's the most obvious thing to do. Why not just run your code immediately when the user makes the typo? So this is the file that it's gonna run. And if you open up the file, what do we find inside? We find this. So I didn't bother to try to deobfuscate this, but all I can tell you is I guarantee you don't wanna run this on your computer. Exactly. So let's talk next about dependency confusion. So this is similar to typosquatting. This is another technique that is closely related, but the way this one works is when a company publishes packages, sometimes they do it to an internal NPM registry. So it's private to their company. And they may use a name for those packages that has not been registered on the public registry. And so if an attacker later comes along and registers the public name, then the internal tools need to be very careful to always use the internal version of the package and not the public version. But some tools make a mistake and they will say, oh, if the public version exists, use that one instead of the internal version. And so very simply the attacker can register the public version and now they have code running inside the company, just like that, right? So this is called the dependency confusion attack. And it's hard to know for sure how often this happens, but if you look through the list of packages that are being published publicly, you can find very many examples of what are probably dependency confusion attacks happening. So this is just a list. We were looking through which packages have been deleted from NPM recently due to them being malware, and we found all these packages here which have names that appear to be the names of internal company packages. So like stuff from Yahoo, URID, 18F, Palantir, DuckDuckGo, Shippo, and then even more like Wix, the Unity game engine, Grubhub. 
Like people just publish these and then they end up getting installed by internal tools and run within the company. So that's a very bad one. Okay, then the next one is install scripts. We've already been talking about install scripts a lot, but I'm just gonna mention again because most malware is actually inside install scripts and there's a paper published this year that found almost 94% of malicious packages used at least one install script. And unfortunately, this feature has some legitimate uses and so it's actually pretty hard to just disable this feature and have your project keep working. So this is a huge attack vector. And again, like we said, this is kind of how you do it. In this example here, they're actually using three different attack payloads, running them in parallel and we'll dig into those in a second to see what they do. But let's talk about data exfiltration. So this is a very common thing that attackers wanna do. And if we actually, I'll show you these three scripts here, you can see kind of what they do. This is data exfiltration. So this is a very, very common thing that you'll see when you look at the stuff that's malware on NPM. And it's not even tricky, like you can just read it. It's very clear what it's doing here. It's making an HTTP request to this IP address and the host is some pipedream.net URL, which is their site they're using to collect the messages from their victims. And then the data that it's sending is process.env, which is the object that represents all the environment variables in the environment. So it's gonna steal all the tokens and API keys and whatever other stuff is set as an environment variable. And then they don't wanna rely just on HTTP because maybe you have a firewall or something that would catch this or maybe there's some way this would be logged. And so they also often use multiple methods. So this piece of malware also uses DNS as an exfiltration mechanism. So they do it in a clever way. 
They make a custom DNS resolver. So they're basically saying use this IP address as the DNS server. And then they do a lookup for a domain that will go to that server. And the domain they look up is built like this: they loop through every environment variable and they make the subdomain be the contents of the key. So basically they're doing a DNS lookup, and the data's going out through the DNS lookup. And then finally, data destruction or ransom is another kind of thing that you'll see a lot of attackers doing. And I won't go too much into this example because I already mentioned it, but this is the code from that piece of malware that would replace all your files with the heart emoji. So this one is sort of obfuscated, but the most important line is later in the file. If you go to this line here, you'll see that it's basically writing over your files with the single heart character to destroy all your data. So anyway, those are some of the techniques I think are the most interesting ones. Now let's talk a little bit about how you can protect your application. So first let's talk about what won't work. I think vulnerability scanning is a big red herring. It's really not enough to solve the problem, and the entire security industry is really obsessed with scanning for known vulnerabilities. And you know, it's fine, it's good to do that I guess, but it's an approach which is too reactive to actually stop a supply chain attack, because vulnerabilities can take weeks or months to be discovered, and we're merging dependencies a lot faster than that oftentimes. And so there's not enough time for a CVE, which is the officially reported vulnerability, to be created and make its way into the tools that everyone is using. So long story short, a vulnerability scanner is not gonna stop an active supply chain attack. So vulnerabilities are very different than supply chain attacks. 
Vulnerabilities are accidentally introduced by maintainers, by the good guys, and they have varying levels of risk. So sometimes you have a vulnerability that's very low or medium severity, and it's actually okay to just ship those to production and not worry about solving them immediately, because maybe an attacker won't find it, won't exploit it, or maybe it's just a minor risk. Maybe it's a denial of service or something like that. It's not good, but it's not gonna be the end of the world if you ship it, because it may not be discovered. And you know, a lot of times we have so many vulnerabilities being reported by this tooling that we ship anyway. I mean, a lot of the companies I've talked to know that they have like 1,000 vulnerabilities in production, right, and they're working to get the number down, but it's a project that's gonna take many months to do. And so you can have vulnerabilities in your software and they may not be exploited or even reachable through public API endpoints. So maybe they're not even really gonna be affecting you in any way. On the other hand, a supply chain attack is very different. So a supply chain attack is malware, and it's intentionally introduced into the package. It's usually not introduced by the maintainer. It's usually somebody else, although it can be the maintainer, and it will always end badly if you ship malware to production. You don't have a few days or weeks to solve the issue. You will be owned the moment that it runs. So you need to catch it before you install it. So this is the key difference. So basically to summarize, a vulnerability scanner will not catch the next supply chain attack. So by all means, keep using vulnerability scanners. Keep using Dependabot or whatever tool you're using. It's fine, but know that that's not protecting you from supply chain attacks. So what can you actually do to protect yourself from supply chain attacks? 
So this is the part where I just have ideas, no perfect solutions, but the first one is, I think we should support open source maintainers better. 23% of open source projects have only one developer contributing the bulk of the code. And 94% of projects have fewer than 10 developers accounting for more than 90% of the lines of code. So you can see these projects are basically supported by a handful of people. Yeah, and there's actually a lot of parallels here to another place where you'll see a lot of these types of attacks happening, which is the browser extension ecosystem. So if you look at the Chrome extension web store, or whatever they call it, you'll see a lot of people have created extensions which have millions of users, but they have no way to make money from it. And maybe after a few years, they get tired of working on it. And so some company will come along and say, I'd like to buy this extension from you for 10, 20,000 dollars. And the developer says, okay, yeah, I'm tired of working on this thing anyway. I never made a cent from it and I don't care anymore. So sure, give me the money. And when somebody buys it, they actually can add tracking code or change the behavior of the extension in some way. So it's a very similar situation here with open source dependencies. Yeah, next, I think we should really change how we think about dependencies. We need to shift our mindset around it. I think a lot of people think if the code is inside of a dependency, then it's not my problem. If there's a bug or something in a dependency, it's not my fault, it's my dependency's fault. But the thing is, you're shipping this code into production. You're shipping it in an app that your users are gonna use, so you're really responsible for it. 
It doesn't matter whether you typed in the code on your own keyboard yourself or whether the code came in through a dependency at the end of the day, all that code gets bundled up into a single process and it runs as one process. So effectively, that third-party code is your app, so you're responsible for it. So if we think about it that way, then we sort of shift how we approach the problem. And if you open up the most popular open source license, the MIT license, you'll actually see it literally says this in here. It says, the software is provided as is without warranty of any kind, express, or implied. So even the MIT license is saying, it is not our problem if this code has problems, it's your problem. Okay, the next idea is sort of, okay, so say we now accept this and we're trying to, okay, we wanna do something about it. So what do we do here? We need to update our dependencies at the right cadence. So a lot of people these days are using tools like Dependabot to keep their dependencies at the latest version. And this is usually a good idea. The thinking goes, the quicker you update your dependencies, the more likely you will be taking in the security fixes and the improvements. And if a new security fix comes out, if you're on a newer version, it should be easier to update to the fixed version, instead of being years behind on an old version, it's gonna be a lot harder to update when you need to to fix a critical issue. So this is good. However, the quicker that you update your dependencies, the fewer eyeballs that have had a chance to look at that code. So, in this example here, this Dependabot is configured to send a pull request anytime a new dependency comes out. And so you'll get these within 24 hours of a new version coming out. And so if you click merge, you're basically running code that has only been live for less than a day. 
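The tension described above, where a pull request within 24 hours means code that has been public for less than a day, can be made explicit with a minimum-age check. This is only a sketch under stated assumptions: the seven-day `MIN_AGE` is an arbitrary placeholder, not a recommendation, and `published_at` is assumed to be an ISO 8601 timestamp with a UTC offset that you have obtained from your registry's metadata by some other means.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy value: how long a release must have been public
# before we are willing to install it. Pick your own trade-off.
MIN_AGE = timedelta(days=7)


def is_old_enough(published_at, now, min_age=MIN_AGE):
    """Return True if a release published at `published_at` (ISO 8601
    string with a UTC offset) has been public for at least `min_age`."""
    published = datetime.fromisoformat(published_at)
    return now - published >= min_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    fresh = (now - timedelta(hours=20)).isoformat()
    # A 20-hour-old release does not pass the seven-day check.
    print(is_old_enough(fresh, now))
```

Raising `min_age` buys more time for someone to notice malware, at the cost of staying longer on versions with known vulnerabilities, so any real policy would also need an exception path for urgent security fixes.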
And so very few people have had a chance to look at it and check for problems and check for malware or other types of attacks. So the question is then, okay, how quickly should you update? And this is a very hard question. This is something that we struggled with a lot when we were building various tools and websites. So I think you can think of it as a continuum from slow to fast. If you update too slow, then you've really exposed yourself to known vulnerabilities, because you're running code that everyone knows has vulnerabilities. So you don't want to be too slow. But on the other hand, you don't want to be too fast. If you update too fast, then you're now exposing yourself to supply chain attacks, because you're running code that very few people have even read. It could have been published a few hours ago. And so you're installing that version, you're running it and you haven't read it. You haven't looked at the code. No one else has looked at the code. So you're just hoping for the best. Doesn't seem like a good idea, right? So I don't have an answer here. There are trade-offs, no perfect solution. One thing I've seen: there's a product you guys probably all know about called Signal, the Signal messaging app. Their desktop app is actually written in Electron. It uses JavaScript dependencies. And their policy is actually quite interesting in terms of this balance. So what they do is they keep all their dependencies six months out of date, except for critical security updates. So they're always six months behind, and they assume that any supply chain attacks will be discovered in that six month period, hopefully. And then if there's a critical known vulnerability that comes out, they will specifically update that one dependency to fix the critical vulnerability. And that's their policy. So that's one idea of something you could do. But again, trade-offs, there's no perfect solution here. Yeah. Here's one that you can do. 
So you can dig deeper before choosing a dependency. I really like this idea, because if you read the code, you'll be 100% sure what it's doing before you run it. So ideally, you audit all your dependencies, right? Yeah. So how closely should you audit your dependencies is the question. You could do a full audit. And obviously, you're not going to do that. That's why you were laughing. There are actually a few companies that do full audits. I've been talking to a lot of companies about this, and for example, Google has a security team that has to effectively audit any open source code that Google uses. They check it in to their own repo and treat it as their own code. They vendor it in. It's their code now, and they're responsible for it after they do a full audit. The problem is that this is a lot of work. It's a slow process, and it's expensive in terms of time and the number of people needed. And the other thing is, a lot of the time the whole reason you're using open source is that you don't want to understand the problem. You don't care. I don't want to read the code. I just want my problem solved. I'm not an expert in this topic. If I were an expert, I would have just written it myself. So you don't want to read the code, right? That kind of defeats the point of open source for a lot of people, or at least defeats the point of using these dependencies to make your project go faster. So this is not the best solution. And then finally, doing nothing is what most of us are doing today, and that has its own problems. It means you're completely vulnerable to supply chain attacks. It's risky and it's expensive in a different way. 
It's expensive in terms of losing your users' trust, losing your users' data, or getting your own computer compromised and losing all of your important files and personal data. So again, there's a trade-off here. I think we can do something in the middle, though: a kind of semi-audit using automation that catches the highest-risk dependencies. That's what I'm going to explain here. Most of us have some process for choosing a dependency. We don't just use the first one we find. Usually we do a little bit of research. We ask: Does this dependency get the job done for me? Does it have an open source license? Does it have good documentation? Does it have a lot of downloads and a lot of stars? Does it have recent commits, so it's maintained? Does it have types, if I use TypeScript? Does it have tests? So you do a little bit of poking around. Maybe you spend five minutes investigating it. And then you say, okay, I'm going to use this dependency. It's the one for me, right? But I think in 2022, with the rise of supply chain attacks, we need to go beyond this basic set of checks. We really should be asking much more pertinent questions. We should ask: If I install this package, is it going to run a shell script on my computer immediately? Does it have native code? Does it have an executable inside, a binary, where I can't even audit the file if I wanted to? Does it talk to the network when I run it? Because if you're installing a date picker web component, that should not be making HTTP requests. That's a UI component. If you're installing something that should not be talking to the network, you should know if it is talking to the network. Does the package run shell commands? Does it read your environment variables? 
Does it gather telemetry? Does it phone home data to the maintainer? Does it contain giant blobs of obfuscated code, where you don't even know what they do? You could find these things if you really wanted to spend the time to dig into the package and look for each of these items. And I recommend you do this, certainly. But I think a lot of this can be automated, because we can detect some of these things with static analysis tools. That's basically what we built, and I'm going to share a little bit about it now. So this is Socket, the tool I've been working on with a team of friends. It's a tool that can protect you from certain types of supply chain attacks. We're downloading and auditing every NPM package, looking for malware, typosquats, hidden code, and use of permissions like network, file system, environment variables, et cetera. And we're tagging each package with those attributes. This is an example of what it looks like if you search for a package on Socket. You get a list of scores at the top, which tell you the security rating, the quality rating, the maintenance status, the number of vulnerabilities, and the license. And then we call out the most important security issues at the top of the page. So this package will run a shell script immediately when you install it, and it contains native code. These are the two things we want you to know about this package before you install it. Now, this package is actually good. You can click this button here and see what the install script does, and it's innocent. It's not doing anything wrong. So you could feel comfortable installing this without any issues. But what about a package that is doing something a little more sketchy or nefarious? 
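As a rough illustration of how some of the questions above could be automated, the sketch below flags a package that declares an install-time script or whose source text mentions shell, network, or environment-variable APIs. It is a deliberately crude heuristic built on regular expressions, and it is not how Socket or any real scanner is implemented. The `scanPackage` function, the `Finding` labels, and the example package are assumptions invented for this sketch. The only external fact relied on is that npm automatically runs the `preinstall`, `install`, and `postinstall` scripts during installation.

```typescript
// Crude heuristic sketch. NOT how Socket or any real scanner works; names are hypothetical.
type Finding = "install-script" | "shell-access" | "network-access" | "env-access";

interface PackageManifest {
  name: string;
  scripts?: Record<string, string>;
}

// npm runs these lifecycle scripts automatically during `npm install`.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// Naive text patterns. A real tool would parse the code instead of pattern-matching it,
// so expect both false positives and false negatives here.
const SOURCE_PATTERNS: Array<[Finding, RegExp]> = [
  ["shell-access", /\bchild_process\b/],
  ["network-access", /require\(\s*['"](?:node:)?(?:http|https|net|dgram)['"]\s*\)|\bfetch\s*\(/],
  ["env-access", /\bprocess\.env\b/],
];

function scanPackage(manifest: PackageManifest, sources: string[]): Finding[] {
  const findings: Finding[] = [];
  const scripts = manifest.scripts ?? {};

  // Flag any script that would execute as a side effect of installing the package.
  if (INSTALL_HOOKS.some((hook) => typeof scripts[hook] === "string")) {
    findings.push("install-script");
  }

  // Flag each capability that appears in at least one source file.
  for (const [finding, pattern] of SOURCE_PATTERNS) {
    if (sources.some((source) => pattern.test(source))) {
      findings.push(finding);
    }
  }
  return findings;
}

// Invented example: a "date picker" that has no business touching the network.
const exampleManifest: PackageManifest = {
  name: "example-date-picker",
  scripts: { postinstall: "node setup.js" },
};
const exampleSources = [
  'const https = require("https"); const home = process.env.HOME;',
];

console.log(scanPackage(exampleManifest, exampleSources));
// [ 'install-script', 'network-access', 'env-access' ]
```

A result like this does not say the package is malicious. It only says which capabilities deserve a human look, which matches the point above that a UI component talking to the network is something you would want to know about.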
So this is another example of a package you can look up, and you can see the score is lower on security and we found some issues in the dependencies. Say you click that and go look at the issues in this package's dependencies. Because remember, that's the thing too: the issues might not be in the package itself, they could be in the dependencies of the package. So that's important. What we found in the dependencies is that it's going to run code upon installation and it has telemetry. These are the two most important ones here. What the telemetry is doing is, if you install this React component to build a feature on your website, it's collecting information about your system, including your Git remote URL, I think your IP address, and a few other pieces of data about your machine, and sending it to the maintainer. One of the things it collects is the name of your project as well. And the Git remote URL might contain the URL of your Git server, right? So it's collecting a bit of personal information. That's good to know. But you could say maybe you don't care, maybe it's fine, you still want to use this package. Okay. Finally, there's a third category. We looked at a good package, we looked at a kind of sketchy package, and now we're going to look at a malware package. So this is actually a piece of malware. This package was ultimately removed from the registry. You can see the security score is zero, and you can see all the things it's doing: install scripts, network access, and the list scrolls off the page. It does a lot of things. And if you click into one of these alerts, it links you directly to the line of code where it's doing that behavior. So we're able to tell you, okay, it's accessing the environment variables on that line. 
And on this line, it should be saying that it's going to send data to the network. There was actually a bug here. We've fixed it, but this should also be telling you there's a network request here. But you can see, we basically tell you exactly which line the behavior is happening on. So very helpful. And then finally, I think it's important to monitor for changes. If you use automation, you don't need to remember to use this tool every time you're picking a new dependency. The way to do this is to run some type of static analysis on your dependencies and detect whenever they're doing something privileged, using privileged APIs, or contain obfuscated code. And then if something is detected, you do need a human to do a manual audit. So the human needs to be in the loop there. The tool can warn you that something suspicious is happening, and then the human has to go and look and say, okay, what is the shell command doing? Is this good or bad? And we believe you should put this information into the PRs, so developers get it directly where they're working and can act on it. So the developers are empowered to solve their own security issues before they become a problem. And again, I'll just mention the tool: if you use Socket, right now we have a GitHub app you can install, and we're also working on GitLab and a CLI and so on so you can use it in other systems. But this is what it looks like on GitHub. When the developer makes a mistake, we will leave a comment and say, hey, you installed Bowserify instead of Browserify. Maybe you meant Browserify, because it has 170,000 times more downloads and it's one letter off, so it's probably a typo, right? So we catch that. This is another example where the developer installed a package that was gathering telemetry and running code upon installation, so we warn them about that. 
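The "one letter off and far fewer downloads" check mentioned above can be sketched with a plain edit-distance comparison. This is a hypothetical illustration, not Socket's detection logic. The `findLikelyTypo` function, the `minRatio` threshold of 1000, and the download counts in the example are invented for this sketch.

```typescript
// Hypothetical typosquat heuristic. Thresholds and download numbers are made up.

// Classic Levenshtein distance, computed with a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);

  for (let i = 1; i <= a.length; i++) {
    let prevDiagonal = row[0]; // distance for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i-1, j), saved before overwriting
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, prevDiagonal + cost);
      prevDiagonal = above;
    }
  }
  return row[b.length];
}

interface PopularPackage {
  name: string;
  weeklyDownloads: number;
}

// Returns a much more popular package whose name is exactly one edit away, if any.
function findLikelyTypo(
  name: string,
  weeklyDownloads: number,
  popular: PopularPackage[],
  minRatio: number = 1000
): PopularPackage | undefined {
  for (const candidate of popular) {
    const oneEditAway = candidate.name !== name && editDistance(name, candidate.name) === 1;
    const farMorePopular = candidate.weeklyDownloads >= weeklyDownloads * minRatio;
    if (oneEditAway && farMorePopular) return candidate;
  }
  return undefined;
}

// Invented numbers, for illustration only.
const popularPackages: PopularPackage[] = [
  { name: "browserify", weeklyDownloads: 1700000 },
];

const suggestion = findLikelyTypo("bowserify", 10, popularPackages);
console.log(suggestion ? `Did you mean ${suggestion.name}?` : "No likely typo found");
// Did you mean browserify?
```

Requiring both conditions keeps the noise down: plenty of legitimate packages have similar names, so a name match alone would flag too much, while the popularity gap is what makes an accidental install the likelier explanation.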
And for the telemetry, we even tell you how to disable it: if you set this environment variable, that will tell the package to stop sending telemetry, so we can tell you how to opt out. Yeah, so those are just a few things you can do with the GitHub app that we have. And finally, I think there are a few things we could do in the language itself. I'm very JavaScript-focused, and I know a lot of people here probably use other stuff as well, but just within the JavaScript language, I think there are a few problems that, if we could fix them, would leave us in a much better security situation. The first one is that right now everyone is drowning in vulnerability alerts. There are just so many. If you install something, you usually get a message that says you have 500 vulnerabilities, right? And you're just like, okay, I can't do anything about this. I accept it, I'll just take my chances. So we need some way to lower the volume on these alerts and make only the important ones rise to the top. Second, there's been some interesting experimentation in the Deno project, which allows you to sandbox the process and give it certain permissions using flags like --allow-net. And I think we should try to bring that into Node.js. And finally, there have been some language proposals such as Secure ECMAScript, ECMAScript realms, compartments, and then this flag in Node. So there are a lot of interesting experiments and proposals that would allow us to build a system where we could create per-package permissions. Similar to a smartphone app, a package could declare which permissions it needs and only have access to those permissions. If you had something like that, I think it would go a long way toward making sure that a rogue update or a hijacked package could not suddenly start sending all of your files off to some IP address in a faraway land. So those are my ideas. And yeah, I mean, hopefully this overview is helpful. 
You know, we covered a lot of ground. We talked about kind of what the problem is, some examples of attack code, and finally some ideas for solutions. But like I said, this is an unsolved problem. We're trying to solve it at Socket. And yeah, thanks for the time and I appreciate you having me here. Thank you.
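As a footnote to the per-package permissions idea from the talk, here is a toy model of what "a package declares which permissions it needs" could look like. It is purely illustrative and is not an API of Node.js, NPM, Deno, or any of the proposals mentioned. The `Permission` labels, the `declaredPermissions` table, the package names, and the `guard` function are all hypothetical.

```typescript
// Toy model of per-package permissions. Everything here is hypothetical.
type Permission = "net" | "fs" | "env" | "shell";

// Each package declares up front which capabilities it needs,
// similar to a smartphone app's permission list.
const declaredPermissions: Record<string, Permission[]> = {
  "example-http-client": ["net"],
  "example-date-picker": [],
};

// Deny by default: a package that declared nothing gets nothing.
function isAllowed(pkg: string, requested: Permission): boolean {
  const hasEntry = Object.prototype.hasOwnProperty.call(declaredPermissions, pkg);
  const granted = hasEntry ? declaredPermissions[pkg] : [];
  return granted.indexOf(requested) !== -1;
}

// A runtime would call something like this before letting package code use a capability.
function guard(pkg: string, requested: Permission): void {
  if (!isAllowed(pkg, requested)) {
    throw new Error(`${pkg} did not declare the "${requested}" permission`);
  }
}

guard("example-http-client", "net"); // passes silently: the permission was declared

try {
  // A hijacked update to the date picker tries to open a network connection.
  guard("example-date-picker", "net");
} catch (err) {
  console.log((err as Error).message);
  // example-date-picker did not declare the "net" permission
}
```

The hard part, which this sketch skips entirely, is enforcement: the runtime has to know which package is executing at the moment a capability is used, and that is the problem the sandboxing experiments and language proposals are trying to solve.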