Now, for the next talk, we've got a talk that made the news a little bit, in fact quite a lot, regarding repo jacking and its implications for dependency management, and how it can impact your organization or your apps. So, a really interesting talk featuring Indiana Moreau. Indiana is a security engineer at Security Innovation who specializes in testing web applications, APIs, and cloud configurations. He has a background in web development. He has previously worked in telecommunications and banking, performing penetration tests and security assessments. In his spare time, he works on personal coding projects and eats copious amounts of sushi. So please, let's give a warm welcome to Indiana Moreau.

Thank you very much, Lionel. So let me share my screen here. Can you see that? Okay, hopefully so. Welcome, everyone. Welcome to my talk on repo jacking and how GitHub usernames expose a lot of open source projects to remote code injection. I'm Indiana, and today we'll be talking about repo jacking, which is a novel dependency supply chain attack. We'll start by talking about supply chain attacks in general, and then we'll talk specifically about repo jacking. I'll go into the details of it: how to exploit it, how it came to be. We'll also look at how we managed to scan all of open source software for this vulnerability, so we'll be able to see just how widespread it is. And we'll also talk, obviously, about remediations and how to protect yourself.

So let's talk about supply chain attacks in general. There are two broad categories of supply chain attacks: hardware supply chain attacks and software supply chain attacks. I'll be talking about software supply chain attacks. In recent years there have been quite a few examples of software supply chain attacks, the most notable of which being SolarWinds. But there have been a couple of examples over the past couple of years. Now, these are referred to as vendor supply chain attacks.
That is where an attacker attacks a vendor in an attempt to reach a company that uses them as a vendor. I want to talk more specifically about a slightly different type of software supply chain attack: open source supply chain attacks, or dependency library supply chain attacks. Now, open source is obviously a huge part of a modern application, right? Everyone's using open source, whether it's for your web framework or your crypto library. As such, it's become a very lucrative target for attackers. We're seeing increased attacks in this space, especially supply chain attacks, so it's very important to protect your libraries.

Now, there's a key difference, I think, between a vendor supply chain attack and an open source supply chain attack. When an attacker goes after a vendor for a supply chain attack, they're going to try to hit the biggest vendor, right? SolarWinds, because they're used by lots of companies. But in an open source supply chain attack, it's not always the biggest libraries that have the most impact. Here's this page called Unseen Infrastructure, maintained by Libraries.io. The premise is that Libraries.io keeps track of libraries which are highly depended upon, that everyone uses, but that aren't very popular. So if we take a look at example number two here, concat-map: this is a library that's depended upon by 800,000 other repositories, but it only has 30 stars and one contributor on GitHub. That's why it's unseen infrastructure: it's basically the infrastructure used by so many different projects, but it's not very popular. So in vendor supply chain attacks, you go after the biggest guy, but in this case, you don't have to. You can go after concat-map, this small library that's actually depended upon by a whole host of different frameworks. I'm sure you recognize a lot of these.
In fact, if you look at the third one down here, Angular, that's AngularJS, the web framework. And that's actually the framework that SolarWinds uses as their front end. So if you wanted to break into SolarWinds, all you would have to do is break into concat-map, have that code loaded up by Angular, and Angular would get loaded into SolarWinds. So you can see how dangerous open source libraries can be, since they don't have the same security protections.

In fact, if you don't want to find a vulnerability in concat-map, what's happened in the past is people have just asked. There's an example back in 2018 with a library called event-stream, very similar to concat-map: not very well known, but very depended upon. A malicious attacker reached out to the then maintainer and said, hey, can I take over this library? I want to maintain it, under the false pretext of wanting to update it. The maintainer had moved on to other projects, said sure, and passed it on. Immediately after that, the attacker started to inject malicious code and malware into that library in an attempt to exploit a crypto wallet up the supply chain. Now, in this case they were caught, but the scary part of the story is that they were only caught because they used a deprecated API in the malicious code, which meant that every time a user ran it, they got a warning. If they hadn't used that deprecated API, who knows how long this could have gone on. So that's a great example of how these small libraries can be really foundational to the security of the modern software community, and so it's important to protect them.

And so, cue repo jacking. A while ago, when I was on an engagement, I was doing a source code review, right? So I'm looking through the source code and I run into a dependency, and it depends on a GitHub repository. Pretty common. I've seen that a bunch of times. I've done it myself hundreds of times.
And so I followed the GitHub URL to see where the code actually was. When I did, I got redirected to a different GitHub repository. I thought, that's kind of weird. Tried again, same thing. So if we look at the actual GitHub URL format, you've got that first part, the protocol and domain. That's standard, that's not going to change. Then you've got the username followed by the repository name. Now, the repository name is chosen by the user, right? So the username is really the key identifying feature of who controls that URL, and it's important to protect it.

So I browse to this GitHub link and it redirects me. Then I go check out the username. Who's this user? I browse to the user and I get a 404 error. User not found. That's weird. I said, you know, this project is clearly using this dependency, but the user doesn't exist. And I asked myself, what would happen if someone were to create an account and re-register that username? So I tried it: I created an account, re-registered the username, and it worked. Afterwards, I recreated the original repository. And all of a sudden, instead of getting redirected to a different repository, the project was loading code directly from the repository I controlled. I had achieved code injection into this project.

And that's the core idea of repo jacking. Essentially, GitHub allows username reuse. So when a user deletes or renames their account, anyone can then go in, recreate an account with that name, and take over any URL pointing to the old username.

So here are a couple of key points about repo jacking. First of all, it's conceptually similar to S3 bucket sniping or subdomain takeover, if you're familiar with those, in the sense that an organization or someone points to an old URI that's now available to be re-registered by anyone. In this case, it's the GitHub URL that's vulnerable. It's very impactful.
Since GitHub is a place to store source code, obviously people are using it to reach out to dependencies or libraries, so it often results in code injection. Another scary part of repo jacking is that you can become vulnerable without ever knowing. One day your code might be perfectly secure while relying on a GitHub dependency. Then all of a sudden, the user who owns the repository you're relying on changes their username. Now you're relying on a non-existent URL, anyone can recreate that user, and you could be vulnerable to code injection. And it's trivial to exploit, because anyone can create a GitHub username. So that's the trifecta: easy to exploit, very impactful, and hard to detect. It's definitely a pretty bad vulnerability.

So let's look at three specific scenarios where a repository can be hijacked. Scenario number one is the obvious one, right? The user deletes their account. If a user deletes their account, that username is immediately available to be re-registered. That makes sense. This has actually happened in the wild before. Kodi, which is a video streaming platform, allows users to have add-ons that they can load in from GitHub URLs. For one popular add-on, the user who owned the repository deleted their account. So all of a sudden, that username was up for grabs, and all of these add-ons that were checking back to the repository for updates were hijackable. And that's exactly what someone did. Someone recreated that user and started serving new updates to the add-on. Now, in this case it wasn't malicious, but it very well could have been.

So that's scenario number one, the user deletes their account. The issue with this scenario is that it's hard for an attacker to exploit, because the moment a user deletes their account, everyone else is going to start getting 404 errors.
The repository is not found and all the links start breaking. So anyone who has that URL as a dependency is going to start getting errors. They're going to look into it, say, oh, this user doesn't exist, and point their dependency somewhere else. So you have to be a very well-timed attacker to exploit scenario one, and it's a pretty niche kind of scenario to exploit.

Now, two more scenarios here. Scenarios two and three are distinctly more dangerous because of one small GitHub feature, which I'll talk about in a moment. Scenario two: a user renames their account. Same deal, the user renames their account, so their previous username is up for grabs immediately. Scenario three is when a user transfers a repository. So they transfer the repository to another user and then they delete their account. It sounds a lot like scenario one, but it's a lot more dangerous for one reason, and that's repository redirects.

Back in 2013, GitHub introduced a new feature called repository redirects. If you rename a repository or a username, GitHub will conveniently set up a redirect, so any of the old URLs now point to your new repository. It's super convenient for a developer, because it means you don't have to go into your readme file and change all the URLs, or change all of your dependencies or your Git remote references. So, a very useful feature. The issue, though, is that it sets up a far more dangerous repo jacking scenario. If we take a look at how repository redirects actually work, you can go right now to github.com/twitter/bootstrap and you will be redirected to github.com/twbs/bootstrap. That's the Twitter-specific account for managing Bootstrap. Now, this is an example of a repository redirect, where Twitter transferred that repository over to another account.
If Twitter were to delete their account or rename that twitter username, the repository redirect would still exist, but now anyone could register the username twitter and take over that redirect. So anyone looking at this top link, who used to get redirected, would now land on the attacker's repository. That's where the danger lies. Of note, these repository redirects apply even for Git operations. So if we look back at scenarios two and three, these are far more dangerous. In scenario one, you might become vulnerable, but everyone would kind of know, right? In scenarios two and three, the repository redirects hide the fact that the project has become vulnerable. Everything keeps working for a project that depends on a GitHub repo, but now it's vulnerable to code injection, and that can stay in the code for months or years.

So at this point, we had a pretty damning vulnerability, and we reached out to GitHub to say, hey, GitHub, you have a problem here. You've got your repository redirects, you've got username reuse, and these are pretty dangerous. GitHub's response was, yep, we know, here's what we did about it. They pointed us to a page describing the mitigation they have in place. The mitigation, which they developed a while ago, is that if a repository had more than 100 clones in the week leading up to it being deleted or renamed, then it can no longer be re-registered. So for example, Microsoft's TypeScript, the example they give here: if that repository were to be renamed, it has more than 100 clones a week, so it can't be re-registered. Now, the issue with this remediation is that oftentimes with repo jacking, it's not these big repositories that are affected. These big repositories are going to be properly put on a package manager, and you'll access them that way.
It's often the smaller guys, something like concat-map, or the ones which aren't necessarily put on the proper package managers and which people link to directly, that are vulnerable. Or, for example, if someone were to fork a project, make a small code change, and then link directly to that fork, it won't have 100 clones a week. And if that person were to rename their account, then all of a sudden that repository is vulnerable to repo jacking. So this mitigation definitely does not catch every scenario, and it leaves a lot of projects quite vulnerable.

So at this point we've got this vulnerability that's very impactful and trivial to exploit, and GitHub has said that they're not going to fix it any more than they already have. We wanted to know just how widespread it was, so we did some mass analysis. I'll walk you through the eight steps I took to go through all open source projects and see how widespread this vulnerability is.

Step one in scanning all open source projects is data collection. We used two data sets for this. Data set number one, the GitHub Activity Data published by GitHub, is a three-terabyte data set that includes raw source code for a lot of repositories. The issue with it is that it was last updated in 2019, so it's not the freshest data, but it does allow us to search for GitHub links and find the ones which are outdated. Data set number two is the Libraries.io open data, from the same people who maintain that Unseen Infrastructure page. They maintain a graph database of open source dependencies, which allows us to do our reverse dependency analysis. So those are our two data sets.

The first thing we had to do was extract all the GitHub URLs that we could find, right? We were left with 3.9 million unique GitHub links in source code. The issue is that not all of these are used meaningfully. A lot of them were in a readme file, for example, right?
You've seen this many times: a GitHub link in a readme file. That's not going to load in code, so it's not as impactful security-wise. Or maybe it's in a comment, right? If someone's just pointing to where they found the code, that's not going to result in code injection. But take something like this Dockerfile, which does a wget to a GitHub link. If that protocol buffers user were to delete or rename their account, that would be hijackable, and all of a sudden this project would be vulnerable to repo jacking. So that's what we're interested in.

The other thing we had to consider was that all the different package managers have different ways of linking back to GitHub code. An interesting one to look at here is that npm and Ruby also have this shorthand format, where you can just put in a username, a slash, and a repository name, and npm will automatically assume that you're referring to GitHub. That really speaks to just how common it's become to link your dependencies directly from GitHub, since even npm just assumes it's GitHub if you give it the bare minimum amount of information. So we had to grab all these links where code is meaningfully being used and imported, and we were left with 2.1 million unique impactful GitHub links. These are links that reach out to GitHub to load in source code.

Okay, so we have these links. Now we need to find out which of them are actually hijackable, right? If we extract the usernames out of those 2.1 million links, we have 650,000 GitHub usernames, and we have to check all of them to see which ones exist. The GitHub API has some rate limiting in place, so it took quite a while to scan through them all. But eventually we were left with 50,000 unregistered usernames, which accounts for about 7% of the total number of usernames we found. That's a very large number. And when I first heard this, I was very surprised.
I was expecting sub-1% maybe, but 7% of all the usernames we found were unregistered, and that kind of foreshadows how widespread this vulnerability is. So that was pretty simple. Now that we had our unregistered usernames, all we had to do was a reverse search to find out which links are vulnerable. We were left with 92,000 hijackable GitHub links. These are links used in code that load code from GitHub and are hijackable. Then it's a simple question of doing the reverse search again to find which projects use one of these links. It turns out that we have 18,000 directly vulnerable projects. These are projects that link directly to GitHub using a username which does not exist and can be registered by anyone. 18,000 projects which are vulnerable to repo jacking.

Let's take a look at one example project: algoliasearch-helper version 3.2.2. It's since been remediated, but over here we have the package.json file, which is the npm package manager file. You'll notice it has this superwolff/metalsmith-in-place repository. This is that shorthand format we talked about, so it's pointing to the user superwolff. The first thing we do is take a look at this user. Browse to that user, and it does not exist. 404, user not found. So the superwolff user doesn't exist, which means that this link can be hijacked. Then if we browse to the full repository, we'll notice we get redirected to metalsmith/metalsmith-in-place. This is an example of that repository redirect, right? This is why algoliasearch-helper kept working: GitHub was redirecting it. But that repository is in fact vulnerable to repo jacking, because anyone can re-register that name. So that's an example of a directly vulnerable project. There are 18,000 of these.
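
The link extraction and reverse search described above can be sketched roughly as follows. This is a minimal Python illustration, not the tooling used in the research: the function names `parse_github_reference` and `find_hijackable`, the regular expressions, and the shape of the input data are all assumptions made for the example. It recognises full GitHub URLs and the bare `owner/repo` shorthand, then reports which projects depend on an owner from a given set of unregistered usernames.

```python
import re

# Full GitHub URLs (https, git+https, ssh style); captures owner and repository.
_URL_RE = re.compile(r"github\.com[/:]([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)")
# Bare "owner/repo" shorthand, optionally prefixed with "github:".
_SHORTHAND_RE = re.compile(r"^(?:github:)?([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)$")


def parse_github_reference(spec):
    """Return (owner, repo) if a dependency specifier points at GitHub, else None."""
    spec = spec.split("#", 1)[0]  # drop a trailing "#<commit-ish>" pin, if any
    match = _URL_RE.search(spec) or _SHORTHAND_RE.match(spec)
    if match is None:
        return None
    owner, repo = match.group(1), match.group(2)
    if repo.endswith(".git"):
        repo = repo[:-4]
    # GitHub usernames are case-insensitive, so normalise the owner.
    return owner.lower(), repo


def find_hijackable(projects, unregistered_owners):
    """Map each project to its dependencies whose GitHub owner is unregistered.

    `projects` is assumed to look like {project: {dependency_name: specifier}}.
    """
    unregistered = {name.lower() for name in unregistered_owners}
    findings = {}
    for project, dependencies in projects.items():
        for dep_name, spec in dependencies.items():
            parsed = parse_github_reference(spec)
            if parsed and parsed[0] in unregistered:
                findings.setdefault(project, []).append((dep_name, "/".join(parsed)))
    return findings
```

The set of unregistered usernames would have to come from a separate existence check against GitHub, which is subject to the rate limiting mentioned in the talk and is left out of this sketch.
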
But as we saw before with the concat-map example, with only 30 stars on GitHub and one contributor, looking only at the directly vulnerable projects doesn't give you the whole scope of the total impact. What we had to do was a reverse dependency search to find which projects were actually using one of these libraries. So if we take the algoliasearch-helper example, this is a directly vulnerable project, but it has 37 dependents. So okay, let's take a look at all the projects that depend on algoliasearch-helper. We run into vue-instantsearch, and that has 14 dependents. We keep doing this kind of reverse dependency analysis on the graph until we find something interesting. So we keep going, and at a depth level of three here we've got @vue/cli-ui, and that has dependents, and eventually we reach the Vue.js command line tool. This is the official Vue.js command line tool that is used for all Vue operations. If you're familiar with the Vue.js framework, it's a very popular web framework, and essentially any Vue developer is going to have this installed. And algoliasearch-helper is one of its dependencies.

So if we go back, what we have here is that this superwolff user has changed their name to metalsmith, most likely. By doing so, they made algoliasearch-helper vulnerable to code injection, which in turn made the Vue command line tool vulnerable to code injection, which in turn basically compromises the entire Vue.js ecosystem. So simply because this user on GitHub changed their name, you now have the entire Vue.js ecosystem exposed and vulnerable. I think this really speaks to just how dangerous repo jacking can be, and how dangerous repository redirects paired with the interconnectedness of modern open source have become. So that's why it's important to do this dependency analysis, right? So let's take these 18,000 projects.
That's considered a depth of one, and we move on. Okay, so now at a depth of two, we've got 32,000 projects. Great. Keep going: 38,000, 55,000. By the time we reach a depth of five, we're at 70,000 impacted projects. We actually had to stop at a depth of five, because when we tried to process depth six, it took too long. At depth five we had run into some huge fundamental frameworks and libraries used by basically everyone, very similar to that concat-map example, which meant it was too computationally expensive for the computer to process. At this point, though, having 70,000 impacted projects and seeing the type of projects that were impacted was enough to lead to the conclusion that this vulnerability is clearly very, very widespread and very relevant in open source today.

So these are the key findings from this mass analysis. First of all, at a minimum, 70,000 affected open source projects. This spans every language you can imagine, because any language can load code in from GitHub, right? Go, JavaScript, Python. We found virtually every language to be impacted. Every major open source provider was impacted: GitHub, Facebook, Microsoft, Google. We found some impact of repo jacking in basically all of them. Now, this last stat is a little hard to quantify, because not all package managers publish daily downloads, but at the very bare minimum there are at least 2 million daily downloads of impacted projects. These are projects that anyone on the internet can go in right now and achieve code injection on. So this vulnerability is clearly very impactful and relevant.

Now that we had all this data in a graph database, though, we thought it'd be nice to actually do a visualization of it.
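
The depth-by-depth reverse dependency analysis described above amounts to a breadth-first walk over a "who depends on whom" graph. The following is a minimal Python sketch under stated assumptions: the function names `impacted_by_depth` and `cumulative_counts` are made up for the example, and the graph is assumed to be an in-memory mapping from each project to the projects that depend on it, whereas the talk used a graph database. Directly vulnerable projects sit at depth one, and the walk stops at a chosen maximum depth, much like the analysis stopped at depth five.

```python
from collections import deque


def impacted_by_depth(directly_vulnerable, dependents, max_depth):
    """Breadth-first walk over reverse dependencies.

    `dependents` is assumed to map a project to the projects that depend on it.
    Returns {project: depth}, with directly vulnerable projects at depth 1.
    """
    depth_of = {project: 1 for project in directly_vulnerable}
    queue = deque(depth_of)
    while queue:
        project = queue.popleft()
        depth = depth_of[project]
        if depth >= max_depth:
            continue  # do not expand beyond the chosen depth limit
        for dependent in dependents.get(project, ()):
            if dependent not in depth_of:  # count each project only once
                depth_of[dependent] = depth + 1
                queue.append(dependent)
    return depth_of


def cumulative_counts(depth_of, max_depth):
    """Number of impacted projects at or below each depth, from 1 to max_depth."""
    return [
        sum(1 for depth in depth_of.values() if depth <= level)
        for level in range(1, max_depth + 1)
    ]
```

Recording each project only at the shallowest depth where it is first reached keeps the totals cumulative, which matches how the counts in the talk grow from one depth to the next.
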
So I've set up a little visualization here where every circle, every node, like this blue circle here, is one project, and the size is based on its popularity. So this is one project with 1,000 downloads a month. As we zoom out, we'll see all the other projects that exist. This is depth level one, right? These are all the directly vulnerable projects. If we take a look, you've got all these projects here, and you can see that thousand-downloads-a-month project compared to some of the other projects that are directly vulnerable. Some are quite popular, and these white lines show how they connect with each other, the interconnectedness. So that's depth level one, right?

As we go through our dependency analysis, we reach depth level two, and we see that quite a few more projects come up. We reach depth level three. We keep walking through these levels, and you'll already notice some very big projects on the left side here and some tight bundling based on how interconnected they are. By the time we reach depth level five, you'll notice on the right side these huge libraries which are affected. These are the libraries that made it so that we basically couldn't progress further. If you look on the left side over here, this is basically the Vue.js ecosystem, so you can just imagine the size of these other affected libraries. I think it's a very cool representation, to be able to visually see just how big the impact is. And if you look at that original thousand-downloads-a-month project, you can barely see it compared to the size of some of these other ones.

So that is repo jacking. It's very prevalent. It's very dangerous. How do we fix it? Remediations. The most obvious remediation, and the most effective one, is to not link directly to GitHub repositories. It goes without saying, but GitHub repository URLs are not an immutable source of truth that you can load code from. That's not their purpose.
Always use a package manager, because that's what a package manager is for. If you load directly from GitHub, even if you're safe today, you never know when all of a sudden your project could become vulnerable to remote code injection.

Recommendation number two here is version pinning. Version pinning is when you take a specific dependency and tell your package manager to only download a very specific version of that dependency. In the case of a GitHub URL, that amounts to a Git commit hash. If we look at this package.json file, which is the npm package manager file, you'll notice that our superwolff example actually has a little hash sign at the end followed by that Git commit hash. So this package was actually doing version pinning. Now, the issue is that while I was doing this research, I also looked at package managers and their version pinning, and I was able to bypass the version pinning of virtually all the major package managers, which means that the superwolff example is actually still vulnerable. I don't want to go into too much detail about that, because I'm still working with the vendors and the package managers to get all of those fixed. But be on the lookout in the future for an article detailing all of that. For now I'm going to say it's not a silver bullet. Be careful, because it might not always work as expected. And if you're doing version pinning, don't pin to a tag or a branch. Tags and branches can be changed, and if someone hijacks a repository, they can just change them.

Second potential remediation, if you do need to link to GitHub repositories, is to use a lock file. A lock file is a file which specifies a specific version to download, very much like version pinning. Now, the issue with lock files is that they're not all made equal. They don't all have the same security considerations. For example, this is the npm lock file.
If you look at this events dependency, it comes directly from the npm registry and it has an integrity check, right, a hash. But if you look at this express dependency, it comes from GitHub and it does not have an integrity check or hash. It relies on the Git commit hash to specify a version, and as you saw with version pinning, that can be bypassed. So version pinning and lock files are always great features to implement, but they're not foolproof.

Basically, if there's one thing I want you to remember from this talk, it's that GitHub repositories can be hijacked, and that results in code injection. So just don't link to them directly. Try to find another way to refer to dependencies. And if you have a project or a library that you're maintaining right now, I would recommend you go take a look at it, see if you're using GitHub URLs anywhere, and make sure that you're not already vulnerable to code injection.

I have written more about repo jacking on the Security Innovation blog, in an article entitled "Repo Jacking: Exploiting the Dependency Supply Chain". So if you go check that out, it has more information which I wasn't able to cover in this talk. I'm reachable through any of the standard methods if you have any questions, and I'll also be on the NorthSec Discord later if you want to reach out to me directly. Thank you very much for attending my talk, and hopefully you learned something new about a pretty cool vulnerability.
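
As a rough way to act on the lock file point above, the following Python sketch lists lock file entries that point at GitHub and carry no integrity hash. It is an illustration only: the function name `github_entries_without_integrity` is made up, and the field names it reads (`dependencies`, `packages`, `version`, `resolved`, `integrity`) are assumptions about the npm lock file layout that may differ between npm versions. It expects the lock file already parsed into a dictionary, for example with `json.load`.

```python
def github_entries_without_integrity(lock):
    """Return sorted names of lock entries that reference GitHub and lack "integrity".

    `lock` is assumed to be a parsed npm lock file (a dict). Both the nested
    "dependencies" layout and the flat "packages" layout are checked.
    """
    flagged = set()

    def check(name, entry):
        source = (str(entry.get("version", "")) + " " + str(entry.get("resolved", ""))).lower()
        if "github" in source and "integrity" not in entry:
            flagged.add(name)

    def walk(dependencies):
        for name, entry in dependencies.items():
            if not isinstance(entry, dict):
                continue  # skip plain version strings
            check(name, entry)
            walk(entry.get("dependencies", {}))

    walk(lock.get("dependencies", {}))
    for path, entry in lock.get("packages", {}).items():
        if path and isinstance(entry, dict):  # the empty key is the root project
            check(path.split("node_modules/")[-1], entry)
    return sorted(flagged)
```

A result from this check is only a starting point for review. An entry it flags still needs the manual check from the talk: does the GitHub username it points to still exist?
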