Step on too much of your time. You know, we really appreciate the flexibility of Ross. He picked up this brief at the very last minute, swooped in, and built an awesome deck. You're going to hear this journey of open-source software, mapping to the latest recommendations, how to think about 161 and beyond, and then some pro tips on techniques, TTPs, and things like that. So over to you, Ross.

Thank you. So my name is Ross Bryant. I'm the head of security research at Phylum, and we're an open-source software supply chain security company. Originally Aaron Bray, our CEO, was slated to give this talk, but he wasn't able to attend, so I'll be sharing his thoughts with you as well as some of my own thoughts and observations from our research team concerning recent malicious attacks against open-source software and developers. This is my first Open Source Summit, and I've learned a lot about what the community is saying around open-source supply chain security. It seems reasonable to me to assume, here in this room, that we're all on the same page when it comes to our common understanding of the term open source. So before I delve into trusting open-source software supply chains, I'll pose a simple question. Next slide. Who is a supplier? Specifically, in the present context, who is, and what are the conditions that satisfy the qualifications of, an open-source software supplier?
I've heard so far this week a number of attempts at an answer to this by drawing on analogies to other supply chains. For example, just outside this room is a spectacular view of Vancouver Harbor. There it is, right outside our door, and we can plainly see all of the industry that moves goods and provides services by land, sea, and air. That gives us some notion, a point of reference, for what a supplier is in a supply chain. But even what we see is only a tiny fraction. Hidden beneath that is a tremendous amount of logistical complexity and orchestration that we just naively assume just works. Well, hopefully we've learned in recent years that these complex systems are fragile and shouldn't be taken for granted.

But another analogy that I've seen get a lot of traction in this space is our food supply. You hear this when software bills of materials, or SBOMs, are referred to as the list of ingredients or the nutrition facts that you find on processed foods. This analogy can help, but a one-to-one correspondence to software I don't think is helpful, and we can easily be misled by pushing the analogy too far.

A slight variation on this analogy is healthcare and pharmaceuticals. There was a thoughtful article by Google from last August about SBOMs as the prescription label on a bottle of medication, and Supply-chain Levels for Software Artifacts, or SLSA, as the child-proof lid and tamper-proof seal guaranteeing the safety of the medication. Going further, Vulnerability Exploitability eXchange, or VEX, as the bottle's safety warnings. This actually resonates with me somewhat, because I'm old enough to remember my mother throwing out all the Tylenol in our house because someone in the Chicago area laced several bottles with cyanide, which killed seven people. That prompted the industry to adopt tamper-proof seals, which are now ubiquitous across everything. All these analogies are helpful to the extent that they help us try to wrap our heads around the vast complexity
surrounding software development and engineering, but they do fail in one critical respect. These all describe supply chains of things composed of matter in our physical universe. And the pesky thing about matter, at least as we commonly experience it in our daily lives, is that a thing made of matter can only be in exactly one place at one given time, and furthermore, it displaces all other things in the space it occupies. For example, a farmer grows a banana, which he exchanges for some form of currency with a supplier, who repeats that exchange with other suppliers, until I go to H-E-B and exchange my currency for that banana in order to consume that banana.

If we're considering a physical medium which holds software, then this supply chain analogy is robust. The medium holding the software is a thing in time and space. But it doesn't take a tremendous amount of thought about software per se for it to be evident to us that software doesn't exactly behave like this, like the other things made of matter in the physical universe. We commonly do use language like "we are running the same software," and enforce this sameness with things like cryptographic hashes, even though we may be in two very different locations at two very different points in time. But it's patently preposterous to say, in the same way, "we are eating the same banana" when we are in two different locations at two different points in time. And so hopefully, with those in mind, and to keep in check some of the fallacious notions we may still harbor about software supply chains, we're still stuck with our original question: who is a supplier?
And so, just as I pointed to the harbor outside, I think part of the answer is right under our noses here: package maintainers. The people who invent, produce, maintain, innovate, and disseminate open-source software. They are suppliers. Some of them are actually here in person. We can speak with them face-to-face and exchange thoughts and ideas with them, and this ability to meet others and exchange information has been the bedrock of establishing trust among individuals since before recorded human history. These things too, I hope we have learned, are fragile and not to be taken for granted.

Perhaps some of the maintainers of the very software packages your institution, agency, or enterprise fundamentally relies upon are in this very building right now. So since you're listening to a talk on trust, I'll ask you: have you sought any of them out? Would you even know who they are? To be completely transparent, I haven't, and I use open-source software in some form or another every day. So if it isn't in-person, face-to-face human interactions, which virtually all other systems of human trust have been built upon since the dawn of civilization, then what's our foundation for this trust? What are you trusting when it comes to your supply of open-source software?
And so I think there's a decent depiction of this phenomenon, first in literature, that was then made into a popular movie. Ready Player One was written by Ernest Cline a little over a decade ago and was made into a Steven Spielberg movie in 2018, both of which I thoroughly enjoyed because they overflow with pop culture references from my childhood. If you're not familiar, the setting is a dystopian world a couple of decades from now in which the majority of human interactions take place in a virtual reality universe known as the OASIS, which bears a striking resemblance in many ways to the present. I suppose that resemblance to the present is fundamental to making all these dystopian stories resonate with us, but at any rate, there's a scene in act one of the movie that I think captures this paradoxical notion of trust that we all find ourselves in, and exposes for comedic effect the naivete of the protagonist and his trust in a recently discovered love interest.

So the avatar of the protagonist, the short one on the left, Z, and the avatar of his best friend Aech, the big one on the left, are talking about an upcoming date that Parzival is going to have with another avatar, Art3mis, and it goes something like this. "Z, you've got to be more careful about who you meet out in the OASIS." And Z says, "Aech, Art3mis gets me. She'll get my outfit. There's just this connection. I mean, sometimes we even finish each other's sentences." "Yeah, we have that, me and you." "Yeah, I know, dude. That's because we're best friends." "She could be a dude too, dude." "Nah, come on."
"I'm serious. She could actually be a 300-pound dude who lives in his mama's basement in suburban Detroit, and her name is Chuck. Think about that." So I won't spoil it for you, but this plot point gets delightfully resolved in act three of the movie.

But the point is that this scene mirrors our own common experience. We do the same thing when it comes to trusting all of the humans involved in all of the processes that supply our open-source software. Namely, we trust and use software from strangers on the internet. Strangers we'll never meet. Strangers in such vast quantities, across every aspect of the software development life cycle, that it is hilariously unlikely that we will ever truly know the full extent of all the identities of all of our open-source software suppliers. And this is true of every consumer of open source, public or private.

Not that long ago, governments were inoculated from these open-source risks by bespoke software solutions, but the financial costs, coupled with the quality of some open-source solutions, have made these ubiquitous. And so the response has been new federal requirements for supply chain risk management to manage and mitigate risks. This DoD memo highlights two primary concerns for using open source within the DoD. First, that open-source software creates a path for malicious adversaries to introduce malicious code into DoD systems. I'll get to that in a moment. And second, that the imprudent sharing of code developed for DoD systems potentially benefits adversaries by disclosing key innovations. And so even though this memo concedes that open-source software is critical for delivering software faster, policies have to be set in place that articulate the manner in which the DoD operates with the open-source software community.

NIST 800-161 soon followed. This impressive tome lays out in fine detail the guidance from NIST for cybersecurity supply chain risk management practices, but in its vast 326 pages, open
source software gets 12 mentions, the largest being Configuration Management 10 in Appendix C, which refers the reader over to Executive Order 14028, "Improving the Nation's Cybersecurity." But the guidance there boils down to these three things: track open-source software use and documentation; check licenses and monitor distribution; periodically audit your supply chain. And so if we're going to hope to do the last bullet point well, we're going to need to understand how supply chain attackers are currently operating today.

And so I have a brief survey of some of the attacks that we've seen in open-source ecosystems in the first quarter of just this year. So here are some statistics that we gathered across seven major open-source software ecosystems: Cargo, Golang, Maven, npm, NuGet, PyPI, and RubyGems. I won't read them all to you, but some notable ones are 18,000 packages that execute suspicious code on install, more than 12,000 packages making requests to a direct IP address, 2,500 packages with obfuscated code, 1,600 packages executing code from a remote source, almost 2,200 typosquatted packages, and over 800,000 spam packages.

The spam packages encompass a wide variety of subjects: free streaming services, get-rich-quick crypto schemes, herbal supplements, and others. This first column here shows 357,000 of these were created by 17 quote-unquote users in npm, though
I don't think those usernames look all that impressive. The second column is an ebook piracy scam in PyPI: 110,000 packages. And while we don't claim these are sophisticated attacks, what they lack in sophistication they make up for in volume.

Here's an example of a typosquatting campaign that we discovered in February in PyPI. The real package name is ccxt, which is a JavaScript, Python, and PHP cryptocurrency trading library with support for more than 130 exchanges. All these packages are just a simple typo: a repetition, a deletion, or a transposition of a character in the legitimate package name. And here are all the packages that were targeted by this campaign. As you can see, they didn't just target cryptocurrency libraries, but rather a wide variety of targets: web development, scientific computing, artificial intelligence and machine learning packages, and other developer tools. Over the course of a few days, this attacker registered 899 out of a possible 1,042 typosquats around all of these packages. So they just laid a bunch of little landmines around these, and so if a developer mistyped pip install matpoltlib (I think I've typed matpoltlib about a hundred thousand times in my life), this would have been the setup.py file, which would have automatically run upon installation. Definitely not matplotlib. In fact, underneath these layers of obfuscation is a crypto wallet clipboard stealer that gets installed.

Here we see a novel evasion technique, which uses a little-known feature of how the Python interpreter handles Unicode characters to hide in plain sight. That may be too difficult to see, but here's a zoom in. It appears that those words are in just some weird font, and they're not. Actually, these are various Unicode code points that get normalized by the Python interpreter just before execution. And this particular specimen, once you peel back these obfuscation layers, installs the notorious and popular credential stealer called W4SP.
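The Unicode trick described above can be reproduced in a few lines. This is a minimal sketch, not the attacker's code: it assumes the disguised text was built from the Mathematical Bold letters (the real campaign may well have used other look-alike code points), and the helper names `to_math_bold`, `normalize_identifier`, and `looks_disguised` are hypothetical. It relies on the documented behavior that Python converts identifiers to NFKC normal form while parsing.

```python
import unicodedata


def to_math_bold(word: str) -> str:
    """Map lowercase ASCII letters to Mathematical Bold small letters (U+1D41A onward)."""
    return "".join(chr(0x1D41A + ord(ch) - ord("a")) for ch in word)


def normalize_identifier(name: str) -> str:
    """Return the NFKC form, which is what Python uses for identifiers."""
    return unicodedata.normalize("NFKC", name)


def looks_disguised(name: str) -> bool:
    """Flag text that is not plain ASCII but normalizes to plain ASCII."""
    normalized = normalize_identifier(name)
    return name != normalized and normalized.isascii()


disguised = to_math_bold("value")

# The disguised name looks like a "weird font" in an editor, yet the
# interpreter stores it under its plain ASCII spelling.
namespace = {}
exec(f"{disguised} = 42", namespace)

print(repr(disguised), "->", normalize_identifier(disguised))
print("stored as plain 'value':", "value" in namespace)
print("flagged as disguised:", looks_disguised(disguised))
```

A check like `looks_disguised` is only one possible signal for a scanner; a string search for the ASCII spelling of a suspicious name would miss the disguised form entirely, which is what makes the technique effective at hiding in plain sight.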
What we're seeing is that there are emerging threats in the gaps, and gaps in the tooling. About one out of every three packages was either malware or spam. SBOMs and SLSA and VEX attempt to mitigate the risk associated with the software product, but the software product was not the target of these attacks. It was developers themselves. And SBOMs and SLSA and VEX don't seem to have any efficacy against these kinds of attacks. And again, what I was going to say is, if you or any of your organizations mirror these popular repositories, PyPI and npm, for your developers across air-gapped networks, it's reasonable to suspect that these malware packages are in your mirrored repositories. And so we have a long way to go in terms of establishing and verifying trust in our open-source supply chain, and we can't just leave it to developers to defend themselves. Thank you for your time. This is a link to our research blog, where the details are on all of these attacks, and with that I'll take any questions. Yes?

All right, this was really cool. I mean, I want to thank you for that. But I also want to pick on you for a minute, because at the beginning you talk about all these open-source developers we don't know. Who grew your lunch today?

Oh, I don't know.

You have no idea.

No.

But do you have a thought on why, say, our food supply chain... I feel like we're not all keeling over dead.

Well, most of us aren't.

Sure, but the software supply chain clearly needs some work.

No, I know, that's a great analogy. We have all of these layers of trust that we just implicitly assume just work, until they don't, right? Until somebody starts dropping cyanide in Tylenol, or polluting water supplies, or on and on and on, right?
I think your point is spot on. I don't disagree. And so I guess to go further, I don't think it's just software developers who are the problem. In fact, I'm just saying, people... I run people's code all day long, and I have no idea who they are or what their purposes are. And these attacks... a lot of these packages, probably none of us in here, except for the typosquats maybe, most of them none of us would ever intentionally install. And yet we've seen other ways, where they'll take these packages and hijack a dependency of a dependency of a dependency, so that you never really saw that that little extra package was getting installed when you installed React or something along those lines. I'm just making up an example, but yeah, fair question. Any others?

So a big portion of this talk was about typosquatting and pointing out the issues. Have you seen any meaningful solutions or proposals, either in the open-source community or in proprietary conversations, that you're willing to share?

Yeah, because this is reasonable to probably share. So I have two examples. So the ccxt example that I showed: when this attacker was trying to register these packages, it was pretty easy. You take a word, and you can write a little Python script that says, generate every repetition, deletion, and transposition, and you get a list of stuff. And they apparently just scripted the ability to say, try to upload this to PyPI. One of them that they missed, I believe it was cctx. When you go to PyPI, ccxt actually registered cctx and said: oops, dear developer,
I think you may have made a typo; we've gone ahead and registered this for you. Just like they used to do with domain squatting back in the day, where Google registers all the variations of Google. But that was their initiative to do that. And the longer the package name, obviously, the more the number grows. It's linear, but it's about three times the length of the package name. And now you begin to wonder, okay, is an organization going to take it upon themselves to register every possible typosquat? Because that would be a solution, but that's an individual choice, to maintain all of those, you know, squat on all those yourself, so that way nobody else can take those names.

The second one that occurs to me is, I don't know if any of you remember the Yandex code breach from February. We saw, over the course of two or three days, the Yandex security team register every single package that they had with a large version number to prevent dependency confusion attacks. If you're not familiar with those, basically you're trying to update your package, and your package manager says, let me just check the internet real quick to see if there's a newer version. And somebody's registered the same package, or the same package name in a different namespace, and it goes out and gets package version 99.999.99, and the package manager gives you that, which has the secret sauce in it.

Those are the two that come to mind of what we have seen others do. But as far as a broad solution that applies across the board to everybody, I don't think so. And in fact, I think this one is actually... it's nefarious. But the ones that I think are really going to be difficult to detect: there's a paper from a couple of years ago talking about typosquatting and what they call combosquatting, where instead of just making a typo, just fat-fingering something on your keyboard, they actually just take common words and smash them together in hopes that you, naive developer, don't know
what the real package is. So instead of "OpenSSL Python," they register "Python OpenSSL." And now, if you don't do any of your due diligence, you know, some developer might be like, yeah, that's what I want, I want the OpenSSL version for Python. Except that's the malicious package, and that isn't really a typosquat anymore. That's just sort of preying on our preconceived notions, like, oh yeah, that looks fine to me, without actually vetting it. Great question. Any others? Sure, that's fine.

So I think another thing you mentioned that's really important is that these are targeting developers, whereas with the things we're talking about, like SLSA and FRSCA and VEX and all that stuff, there's little focus on developers. There was a panel a couple of hours ago I was in about SLSA and S2C2F and FRSCA, and that was the question I asked: you talk about these attacks, but we also kind of hand-wave them aside as, oh, we could solve SolarWinds with SLSA. It's like, can you, though? And so I think this is a place of research. We don't understand exactly how to stop, or even what some of these frameworks will stop.

That's right.

And so I would love your thoughts on where that exists today and what you think we do next.

Right. So I was hired by Phylum about 18 months after they started in March of 2020, and so they already had the ball rolling before SolarWinds. It was started by a bunch of software developers looking at that precise thing: who's defending the developer?
I mean, not just from simple mistakes that we might make out of human frailty like this, but like, hey, I was doing a really pretty good job, I'm not an insecure person, but somebody got me because they were attacking me, not the software product that I'm producing. I don't see a lot in this area, because it's so much easier to take a piece of code and create all these artifacts and run them through these wash cycles, but it never gets to the human being. I mean, you know, we have spam filters now, so we don't get as much spam, and that was a technical solution to a human problem, which is, oh, somebody sent me a really good-looking thing about my UPS package, let me check the tracking number. And that's just human frailty. We had to be taught that that wasn't us being dumb. That was people taking advantage of us. And I think probably, as a software developer and a mathematician, we have some hubris about this, like, I would never get caught clicking on blah. You probably wouldn't.

I mentioned those 800,000 spam packages. It's interesting. A lot of them are in npm, and when you look at the packages, there are two files: a README and a package.json. The package.json is trivial. It's like the bare minimum. It doesn't have anything in it whatsoever. But the README is in Markdown. And when you go to npm and look up a package because you're doing your due diligence as a developer, like, what the heck is this, what is that, and you go to that page, you get the README right there on the front page. And guess what renders? That link from a link-shortened URL.
And I don't know about you, but with my touchpad, I have often tried to swipe past a web page or get to another browser, you know, pressed it a little too hard, and I'll click on some random link and a tab will pop up. I think that's an attack against the developer, because there doesn't seem to be any other reason, except for maybe search engine optimization and Google indexing a bunch of these packages. But why did you put it on npm if you weren't after developers in the first place? And again, like the spam problem, if I send out 800,000 pieces of spam but I get eight solid clicks, it's a win-win. The barrier of entry is low and the reward is high, so why wouldn't I give it a try?

Originally these things came out sort of in little bespoke batches, but the amount of automation that we've seen recently is pretty staggering. The list of 17 usernames that I showed you, they've actually adapted a little bit, and now we're seeing a username registered, and maybe here's 25 spam packages, and then they never use it again. Here's 40. Here's 30. And so they're improving their tradecraft to get better at making it harder for us to detect them.

But no, I think this is precisely where Phylum is trying to be right now, which is to say, yeah, but what about the developers themselves? Who's protecting them? We see a lot of Discord token stealers right now. Not sophisticated. All it does is look in some path for a file. And it's not too far of a trip to say, if you can go grab a file from that directory and ship it out, well, why can't you go into .ssh and pick up that key and ship it out? And now, I mean, that's a real attack on a developer, because in every sense of the word, if you've got a key and possibly some other things, then at least you have a fighting chance to be that developer, if they don't have two-factor authentication set up. And I mean, these are real problems.
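The file-grabbing behavior described here suggests one crude defensive check: look through a package's source for references to the places a stealer would read from. What follows is only a sketch under stated assumptions, not how Phylum or any real scanner works. The marker list is illustrative rather than exhaustive, the function name `flag_sensitive_paths` is hypothetical, and plain substring matching would miss any sample that obfuscates its strings.

```python
from typing import List, Tuple

# Illustrative markers only (an assumption, not an authoritative list):
# SSH key locations and the kind of local storage a Discord token stealer reads.
SENSITIVE_MARKERS = (".ssh", "id_rsa", "discord", "leveldb")


def flag_sensitive_paths(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) pairs for lines that mention a sensitive path."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        lowered = line.lower()
        for marker in SENSITIVE_MARKERS:
            if marker in lowered:
                findings.append((line_number, marker))
    return findings


# A harmless stand-in for the kind of install-time code described in the talk.
sample = 'import os\nkey = open(os.path.expanduser("~/.ssh/id_rsa")).read()\n'

for line_number, marker in flag_sensitive_paths(sample):
    print(f"line {line_number}: references {marker}")
```

A hit is a reason to look closer, not proof of malice; plenty of legitimate tools touch `.ssh`, so a check like this would need to be combined with other signals such as code that runs on install or makes outbound requests.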
It's a great question. Anything else? Well, I really appreciate your time here late in the day. It was great talking to y'all, and thank you very much.
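The typo generation mentioned in the Q&A (every repetition, deletion, and transposition of a package name) can be sketched in a few lines of Python. This is an illustration under assumptions, not the attacker's script and not Phylum's tooling; the function name `typo_candidates` is hypothetical, and real campaigns may use additional edit types.

```python
from typing import Set


def typo_candidates(name: str) -> Set[str]:
    """Generate single-edit typos: repeat, delete, or transpose one character."""
    candidates = set()
    for i in range(len(name)):
        # Repetition: double the character at position i.
        candidates.add(name[:i] + name[i] + name[i:])
        # Deletion: drop the character at position i.
        candidates.add(name[:i] + name[i + 1:])
        # Transposition: swap the characters at positions i and i + 1.
        if i + 1 < len(name):
            candidates.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Swapping two identical letters reproduces the original name; drop it.
    candidates.discard(name)
    candidates.discard("")
    return candidates


for package in ("ccxt", "matplotlib"):
    typos = typo_candidates(package)
    print(package, len(typos), sorted(typos)[:5])
```

A name of length n yields at most 3n - 1 candidates (n repetitions, n deletions, n - 1 transpositions), which matches the remark that the count grows linearly at roughly three times the name length. A maintainer could run the same generator defensively to decide which names to register, as in the cctx example.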