Hi, my name's Max, as you've probably gathered by now, and today I want to talk about malware on the Python Package Index. This is an interesting topic to me for a lot of reasons, and I'll share them very soon. Before that, I just want to mention that there's a company that paid for my flight and hotels, so I'm going to spend about 30 seconds on what we do. We do stuff with communications APIs: SMS, voice calls, video chats, two-factor authentication codes, things like that, all via API. If that's interesting to you, I've got so many stickers, so come grab me.

More importantly, as part of my job I maintain some of our Python tooling, specifically the Vonage Python SDK, which is on PyPI under the name Vonage. I've also recently discovered a new version of this that's appeared, which is named similarly but is not the same thing. We'll come back to this in a little bit, but before we get there, I want to take us all on a little trip down memory lane. Does that sound good? Memory lane, awesome.

So who remembers this old font, this old site? Anyone remember that? That's fine. That's fine. That was intentional. Who thought that said Google.com? Oh, half the room is honest. Okay. This actually says Goggle.com, and it's an example of something called a typosquat. This was a real site, and in the early 2000s you could visit it and it would do bad things to your machine. It preyed on people who misremembered the name Google or spelled it incorrectly, and when they went to this website it would do a drive-by download of malware onto their machine. What's interesting is that I was able to get some footage of a computer being infected after visiting Goggle.com, and I'm going to show that to you now.
So here's the computer before visiting Goggle.com, and here it is right after they've typed that into the address bar. So you can see this kind of malware is a big deal, right? It's a big deal.

So in this talk, what are we going to do? What am I going to share with you today? Well, we're going to talk about malware on the Python Package Index, as you might expect. We're going to talk about how it gets onto your machine, how it's made to look legitimate, how it works, and how we can protect ourselves from it. Does that sound good? Open question? No. Oh, no. I'm sorry. You should probably leave before they lock the doors, really.

So, right, let's get started with a quick disclaimer. First of all, malware constantly evolves and changes. I have had to rewrite this talk three times, because every time I give it again I have to change the information in it. This is changing fast. And I am not a security professional. I'm also not a rapper. This is something that I'm very passionate about, but it's not something that I'm a super expert in. I just want to share with you what I've learned, so that you can understand it as well, and hopefully care about this situation as much as I do and understand the value of that. Is that cool with everybody? Yes or no? So, that one was yes, yes, no, versus crowd noise, but I'll take it anyway.

So let's start with the cost of malware. Now, if you haven't guessed, my talks are quite audience-participation heavy. To prove the point, Guido van Rossum walked out of one in April, and I'm okay with that, because I'm very in your face. So, cost of malware. I'd like to ask you something, and this is a genuine question.
This comes from a study done by the Ponemon Institute in 2022. They surveyed a few hundred organizations, I think 450, to find out certain information about them, and so I want to put this question to you. What do you think the average cost of a data breach was for one of those organizations? Half a million. Half a million? Ten million. Two hundred thousand. Two hundred thousand. Can someone other than those two people guess, please? Five dollars, maybe slightly higher. A quarter of a billion. Well, that's fair. It's about four million euros. In koruna, that's 97 million koruna. Just thought I'd put the conversion in. I don't understand how the money here works, but it's pretty. Speaking of which, actually, am I intelligible? Can you hear me okay? Can you understand me okay? Awesome.

I've got another question for you. What percentage of breaches were caused by stolen or compromised credentials? So, not phishing. This is people either having their data scraped or being socially engineered in some way, but not specifically phishing. What percentage of breaches were caused by compromised credentials? In percent? One. One. Twenty. Twenty. Forty. Ninety. Seventy.

Again, this is a big deal, but what does this mean? Well, I'll tell you what I think it means. I think it means that developers are a very lucrative target for malware actors, right? And there are two reasons for that. One is that developers are going to download stuff from PyPI. The other reason, maybe the more important one, is that through developers the attackers get to end users, because people are going to use the software that these developers make, and that can be a problem as well. So, ideally, what a hacker wants to do is be able to execute some code remotely on your machine or in your environment.
And what they'll actually want to do is things like stealing your credentials, ransomware, or even crypto mining, or crypto diversion, where whenever you make a crypto payment or receive something, it just goes to their account instead. That's a real scam that I've seen. So there's lots of interesting stuff they can do there.

End-user targeting usually works like this. They might have malware injected into a package which behaves as normal. It behaves as expected, but it might include some vulnerabilities, or it might use outdated versions of code, things like that. And in that way, your users can be vulnerable as well. So do we see now how this can be a bit of a challenge for developers? Awesome.

So this thing, this beautiful thing, is also a wretched hive of scum and villainy, I think, is the expression. But it's a place which we've all hopefully used. Hands up if you've used PyPI for anything, if you've pip installed anything in your life. Oh, wow, the whole room at PyCon. I'm not surprised. So awesome. Well, I have too, but there are some challenges that come with this. There are two main ways that malware gets onto PyPI and then gets to users: typosquatting, and infecting existing projects. What I want to do is tell you how each of these works, show you some examples, and then show you some real malware that I've got on my machine, which I made myself just for this talk.

I'm now going to drink some water, so please talk amongst yourselves. You know what? When I gave a talk at PyCascades, I went to the US, and someone asked me, what were you doing in Canada, and why are you now in the USA? I said, I'm a software engineer, and the person at the border looked me dead in the eyes and said, what's the difference between that and a hacker? I was terrified. Anyway, typosquatting. Let's talk about this first. Please come in.
Please come in. You're very welcome. Typosquatting, what is that? We saw an example of this already. We talked about this with Goggle.com, and about what it can do that's bad for you. Sorry, I'll put it back. Essentially, what happens here is that the user is expected to mistype or misremember the name of the package. That was the situation with Goggle.com, and with PyPI we have that too.

So I just want to ask this question. What percentage of PyPI packages are estimated to be typosquatting right now? In percent? 30, 40, 2, 45, 10, 5, 40. Okay, varied answers. It's 3. Might be higher now; this was a slightly older study. But yeah, there are some people in the front row who are very happy about their guesses, which I respect.

Here's a better question, then. What percentage of downloads from PyPI are estimated to be typosquatting packages? This is people who think they're getting something, but they're getting something else. What percentage is this? 12. 12? 8. 1. 1. Also 3. 15. A lot of varied guesses. The actual answer is a little smaller: 0.5%. But that's 1 in 200. Think about how many packages are downloaded every day. That's a really, really big number. So even if any one typosquat isn't very likely to be hit, with the law of large numbers they're likely enough to actually get someone, right?

So let's talk about types of squats. When I thought of this pun, I was so happy that I didn't do anything for the rest of the day. I just signed out. Too much honesty there. So there are misspelling typosquats. Everyone knows this package. Has everyone seen this? This is requests, a package that lets you do HTTP requests. Hands up if you know it. Has anyone ever installed this? Keep your hand up. Okay. Has anyone ever installed this? Or this? Or even this? These were all real examples of malware found on PyPI that was typosquatting on the requests package. This is real.
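The misspelling typosquats above all sit within a character or two of the real name, which is something a short script can flag before anything is installed. The sketch below is a minimal illustration, not a vetted tool: the allowlist `KNOWN_PACKAGES`, the function names and the 0.85 similarity threshold are assumptions made for this example, and the comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical allowlist: the packages you actually intend to install.
KNOWN_PACKAGES = {"requests", "numpy", "pycurl", "vonage"}


def normalize(name):
    """Lowercase the name and collapse runs of '-', '_' and '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquat(name, known=KNOWN_PACKAGES, threshold=0.85):
    """Return the known package that `name` closely resembles, or None.

    An exact match is treated as the real package and returns None.
    """
    candidate = normalize(name)
    if candidate in known:
        return None
    best_match, best_score = None, threshold
    for real in sorted(known):
        score = SequenceMatcher(None, candidate, real).ratio()
        if score >= best_score:
            best_match, best_score = real, score
    return best_match
```

With this allowlist, `possible_typosquat("requets")` returns `"requests"`, while the exact name `possible_typosquat("requests")` returns `None`. A fuzzy threshold like this will miss some squats and flag some legitimate names, so it is a prompt to double-check rather than a verdict.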
There's another type here, confusion typosquats. This works where the user might misremember the name, or there might be some separator confusion. They might mess up the order of the words in the name of the package, or they might get the versioning wrong. For example, this is a piece of malware that I actually saw on PyPI a while ago. Request three, zero, zero, zero. If you saw this, you might go, oh, it's like a beta version of the new stuff. I wonder what new stuff they've got in Requests v3. Don't install that. That was malware. So this is real.

What I want us to do now is play a little game. Is that okay? No choice. You're in the room now. Tell me which of these is malware. Top one, bottom one. Which one's malware? Can you tell? Hands up for top. Hands up for bottom. Everyone thinks the bottom one. They're correct. Don't worry. You can redeem yourself. Which of these is malware? libcurl or pycurl? Hands up for libcurl. Hands up for pycurl. So this is a sneaky one, because libcurl is the actual system package, but the PyPI package that uses it is called pycurl. And this is a real typosquat that was made. It's very sneaky. So people are getting quite clever with some of these, right?

There's the other thing attackers can do, which is corrupting existing projects. Again, talk amongst yourselves. Thanks. So corrupting existing projects works in a few different ways. One might be that a project doesn't have any malware. It's a normal project. But then it builds a user base, and it adds malware in a later release. This was the case for a package called FastAPI Toolkit, which started off as a utility module, just adding stuff for FastAPI. But over time, there was one commit that came out which changed that and added malware. Alternatively, a repo might get a new maintainer, because a lot of this stuff is open source, right? And people need help.
They need all the help they can get. And it has happened that someone starts committing to a repo and contributing to it, being helpful, and then they get maintainer rights and suddenly they add malware. This also happens. You've got to trust whoever you're letting have the keys to the Lamborghini. Alternatively, a maintainer might just get hacked. This is the simplest one, but it does happen. People can get hacked. They can get socially engineered in some way. Then you get malware, okay? So this one's a little simpler.

But how does this stuff look legitimate? This is the real question. Once it's on your machine, how do you look at it and go, well, this looks okay, this looks normal? So, a few ways. One way is malware dependencies. This is where there's an innocent-looking package, but it has a malicious dependency. So it looks okay, but something it pulls in is actually pretty bad. Oops, sorry. The package itself might behave as normal to avoid suspicion, so you might not know the difference, but then the dependency will do something bad.

There's also, has anyone heard of this, starjacking? This is an interesting one. And it works. It's so easy to do. I'll show you. Essentially, what happens is that on PyPI, the URL that you give for your project when you set up your pyproject.toml is not verified. And that means that it can be abused. So basically, I upload a package to PyPI, and I can pass in the URL of any GitHub repository whatsoever, and it will give me that number of stars. So, quick one here. This one's going to be obvious. Which of these is malware? That one. How do I know? I made it. I put it on PyPI. That's how I know. More importantly, if you look at this, first of all, look how popular I am. You might be able to see here, and if you can't, I've got 900 stars on this. I made this last night.
So this actually is a real example of starjacking. And it's used because if you now visit this PyPI page, it will look legit. If you name this after requests, and you give it the stars of requests, it's going to look more legit, right? Goodness me.

So, a typical chain of events might go something like this. The user installs a typosquatted package, which might depend on a malicious package. That package might run and decode some base64-encoded code. This is usually the first stage. The second stage is that the real malware actually gets downloaded. So the first stage basically just pulls the malware. And that malware can then look like anything. They don't have to hide it, because they've just pulled it directly onto your machine. The result of that is a very sad snake.

So I've talked enough about this, I think. I think I have. I think we're ready for something a bit more. Let me show you this thing. This is an SDK that I maintain for my company, Vonage. It does lots of communications API stuff, so I'm using lots of requests, lots of HTTP calls, things like that. You can scan it if you want. I put this in because they paid for my flight. Thank you. But more importantly, there's a new version that has appeared recently, i.e., last night. Definitely not Vonage. Now, I didn't want to make a typosquat that was too close to the real name, because I don't actually want anyone downloading this malware. But this is something that I uploaded yesterday, and it's on PyPI right now. So here, we can see it actually looks a bit like the real one. You can see I've used starjacking again, so it has got the right number of stars and things like that. But it's not the real package. And so what I want to do is show you what happens if I try and run this. I've got these two bad packages, which are typosquatted: definitely not Vonage, and requests toolbelt.
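Because starjacking relies on PyPI displaying whatever repository URL the uploader declared, one rough check is whether a package links to a well-known repository that belongs to a differently named project. The sketch below assumes the declared URLs are already in hand as a dict (PyPI's JSON API at `https://pypi.org/pypi/<name>/json` exposes them under `info.project_urls`). The `TRUSTED_REPOS` mapping and both function names are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse

# Hypothetical mapping: well-known GitHub repo -> the PyPI name that really owns it.
TRUSTED_REPOS = {"psf/requests": "requests"}


def repo_slug(url):
    """Return 'owner/repo' (lowercased) for a GitHub URL, or None otherwise."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        return None
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        return None
    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{parts[0]}/{repo}".lower()


def looks_starjacked(package_name, project_urls, trusted=TRUSTED_REPOS):
    """True if the package links to a trusted repo owned by a different package."""
    for url in (project_urls or {}).values():
        slug = repo_slug(url)
        owner = trusted.get(slug) if slug else None
        if owner is not None and owner != package_name.lower():
            return True
    return False
```

This only catches a package that borrows a repository you have already listed, so it complements rather than replaces reading the package's code. A legitimate fork or plugin that links to its upstream repository would also be flagged and needs a human decision.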
So this one is spoofing a module called requests toolbelt, which is real. It helps you do HTTP requests using the requests module. This is a real package that I've just added a v2 to, and I've added malware to it. What we're going to do is require it inside of here. So I'm going to show you now what that looks like, because, oh, this is small. I'm going to make this a lot larger. That's too large. One second. I'll be right with you. There you go.

So this is my fake, definitely not Vonage package. Can we all see this okay? Or shall I make it bigger? Bigger? No, it's okay. I'll do one. There we go. So this is a piece of code. This repo looks a lot like the real one. It has all the stuff of the real one. It should behave as expected. The only difference is that it requires this extra thing here, requests toolbelt v2. And you can see in the client file here that I use a package called requestsUtil, which again is a different name. What this will do, essentially, is run my piece of malware. And why will it run if it's in a client file? Because in my init file, I import everything. So when I import Vonage, which is the name of this package, it will also import my malware. As soon as I try and use this, I'll get malware on my machine.

Now, this is the fake dependency I've got here. You can see the requests file here. This method doesn't do anything. It just prints something out. It's not useful. And if you look at my setup file, I'll move this so you can see better maybe, it doesn't look weird. It looks pretty normal. This is, by the way, a real author name used by a malware actor recently. I thought I'd use it as an homage. So this all looks pretty innocent, right? So let's look at the init file again. This looks pretty normal, right? So where's the malware in the dependency? This is my dependency.
So where is the malware inside of here? It's in here. Someone there actually just said the answer. This is a real technique that is used to obfuscate code. It's not just base64 encoded. It's also pushed a long way off to the side. If I just scroll along a little bit, a little bit more. Oh, dear. Oh, what have we got here? Oh, no. A base64-encoded payload. I wonder what this could do. So we can see here, I've got some malware. Okay. What's cool here is that this is a real technique that is really used. I mean, that's not actually cool, but I find it interesting.

What I will say is, I was going to run this live, but just before I got here, someone tweeted me, a volunteer security organization, saying, we found your malware, we reported it to PyPI, and it has been removed. So this file is no longer on PyPI because, luckily, someone saw it. This is real. This happened just before the talk. I'm having to wing this part. So I'm going to have to show you my local copy, because it's literally just been removed.

So what I'm going to do is show you this locally. If I open up my terminal, I've got this here. I'm going to make this really big. I'll just do this. There we go. Cool. Okay, is that visible? I'm going to make it bigger anyway. So I'm in this virtual environment with this malware right now. I've pretended I've just pip-installed it, although, obviously, it's not on PyPI now. But if I do a pip list, what we can see here is I've got definitely not Vonage, and I've also got my malware dependency here. So are we all ready for me to run this on my machine? Shall we all see what happens when I do? I see people's eyes, like, I want blood. Okay, this thing's going on fire. All right, I'll get you. Okay. You're right. It is, isn't it? Interesting. Yeah, let me do that. Oh, goodness. Oh, this one, Paulie. I like your idea. We need more ideas like that is what I'm saying.
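The hiding technique shown in the demo, a payload pushed far off to the side of the screen and base64 encoded, leaves traces that a simple text scan can pick up. The following is a minimal sketch under stated assumptions: the thresholds (40 whitespace characters, 60 base64-like characters, 200-character lines), the list of risky calls and the function name `suspicious_lines` are choices made for this illustration, not a description of any real scanner.

```python
import re

# Code, then a long run of spaces or tabs, then more code on the same line.
PADDING = re.compile(r"\S[ \t]{40,}\S")
# A long unbroken run of base64 alphabet characters.
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]{60,}={0,2}")
# Calls that often appear when a hidden payload is decoded and run.
RISKY_CALLS = ("exec(", "eval(", "b64decode(")


def suspicious_lines(source, max_len=200):
    """Return (line_number, reasons) for each line that looks obfuscated."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        reasons = []
        if len(line) > max_len:
            reasons.append("very long line")
        if PADDING.search(line):
            reasons.append("code hidden after whitespace padding")
        if BASE64_BLOB.search(line):
            reasons.append("long base64-like string")
        if any(call in line for call in RISKY_CALLS):
            reasons.append("dynamic execution or decoding call")
        if reasons:
            findings.append((number, reasons))
    return findings
```

A scan like this could be run over the files of a downloaded package before it is installed or imported. Plenty of legitimate code contains long lines or `eval`, which is the false-positive problem the talk mentions for automated scanning, so the output is a list of places to read, not proof of malware.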
I'm going to do that. I wish. I wish I was this good. This looks like death. Okay, anyway. I'm now running my Python shell inside of there. So if I import that package, it's just called Vonage. That's the name of the package, right? So if I import this, what's going to happen? Well, let's find out. It's the worst malware of all. It's always beautiful to see Rick. Always makes my day. Okay, so let's go back, shall we? That's the first time it's ever played live. Usually the conference Wi-Fi is too poor for that.

So let's refocus now. What have we done? We've made some malware. We've uploaded it. And in general, it's actually very easy to get hold of. In this one case, thank you, we were lucky, in that a security team managed to find this and save all of you from getting Rickrolled. But luckily you're in my talk, so you got it anyway.

So the real question, and this is a real question because there's so much malware on PyPI: what can PyPI do? This is a question I get whenever I give this talk. And the answer is, unfortunately, more limited than we would like it to be. A couple of months ago, PyPI actually suspended user and project registrations, just because they couldn't cope with the amount of malware that was being put on every day. The amount of typosquats, the amount of vulnerabilities that get put in, it's really, really large. With PyPI, there are some things they want to do. There's PEP 458 and PEP 480, which are to do with package signing and basically trying to stop people tampering with packages. But that doesn't actually help with typosquats at all. Also, they're just starting to work on PEP 458 now. It just got approved. And PEP 480 is still in the works. So these security things are going to be a long time coming. Right now, the best way to do this is to manually flag malware.
So, like happened to me last night, if I upload malware and someone sees it, they can flag it. And you should do this, too. If you see malware, you can report it to PyPI and they will remove it for you, and they are pretty quick when they get a request.

They also want to eventually implement automated scanning, so that they can scan a package and work out whether it looks like it contains malware like mine. Now, the issue here is that they want to do this, and it's been in the works for a while, but it's hard, because the tools they have give so many false positives that they can't really be used, and they want to improve them. But the issue is really that it's an open source project and it's quite small, so the real issue is this. It's the funding. We just need more support for this stuff. We need more organizations and enterprises to actually fund this.

So in the absence of that kind of support, how can we protect ourselves? That's the real question, right? There are two different streams of answers, depending on whether you're a maintainer of a project or just a user of a project. If you're a maintainer, you might want to just accept the fact that your dependencies may become compromised in some way, and so you might want to think about minimizing them. You might also want to consider defensively typosquatting your own packages. This is actually quite a useful technique. For example, if I owned the requests package, I might make some packages with similar names that are safe, that don't do anything. They just say, hey, go download the real one. And why would I do this? Because if you own a PyPI project name, no one else can, and so no one can typosquat those names. If you have a popular package, you might want to consider that. As well as that, there was actually a researcher who did this. There was a person, William Bengtson, who downloaded...
Sorry, he actually created, I think, several thousand, three to five thousand packages, which were typosquats of real packages. And in two years, they got 500,000 downloads each. If he hadn't done that, a lot of those downloads would have gone to typosquatted names that were open to malware, right? So a lot of those would have been malware.

So, for everybody, not just maintainers, what I would say is, first of all, don't just type pip install and then the thing, because if you just hit the wrong key, you could get malware. You might want to think about a requirements file. This will just give you a bit more time to vet what's in there before you run it. If you're using a mirror, which is often a good technique because it means you can vet what's there, you might want to check what the latest safe version is, because mirrors are often configured so that they don't roll back. For example, FastAPI Toolkit was a normal package until, after one commit, a new version suddenly came out with malware in it, and a mirror wouldn't automatically roll that back even though that release was pulled from PyPI. That means, essentially, that the mirror was holding on to the malware while you thought you were getting something safer.

Also, if you see something, say something. You know, like the security researcher who found my malware and got it removed, if you see something dodgy, report it to PyPI, because they will remove it for you. If you see it, say it, only you can prevent forest fires, and all that good stuff. Also, think about automated scanning tools. For vulnerabilities, you might want to look at things like Mend for GitHub or Dependabot. These are tools that tell you if any dependencies you use might have become compromised. Think about these. So, finally, to sum up, first of all, I think this snake is so adorable.
I would put that in my shopping basket. So, typosquatting on PyPI is an attack vector, but so are benign packages that become malicious over time. You want to vet your dependencies really, really carefully, and you want to minimize the number of those dependencies. You also want to use automated scanning tools if you can, and you want to be really, really careful when you're using mirrors. So, I'd like to leave us with a few words from an eminent poet of our time that I've just paraphrased slightly here. So, it never gives security up. It will never let you down, run around or, and I quote, desert you. Thank you very much.

So, just before we start the Q&A, I've got some links here if you want to connect with me. Please feel free to scan that. If you want to make a free developer account with Vonage, please scan this one too. This is the final thing. I've got so many incredibly shiny stickers with pythons on, and if you want some, come and talk to me, because let me show you. Let me show you real quick. Thank you for your questions. I don't know if you can see this at the back, but it's very cute and I've got a load of them. So if you want some, come talk to me.

Thank you, Max, for this very informative and entertaining talk. We have a few minutes for questions. Anyone, raise your hand. Nope. I see a question. I'm running here.

Thank you so much for your talk. If you install the packages inside a virtual environment, maybe you know this, maybe you don't know, how much capacity does a new package have to get out of that virtual environment and onto the machine?

That's a really good question. Thank you. I think I can help with that one. I'll say the question back to you, just to see if I understand it: if I do this in a Python virtual environment, does that make me any safer? No. Because in a virtual environment, you can still interact with the whole system. In a virtual machine, you are safer.
In a virtual environment, all it does is containerise your Python dependencies, but you can still access any file on your system. It's still at risk. Question?

As a follow-up, would using a Docker container work or no?

Yeah. I think that would be better. Basically, it means that it's your container that is at risk, in the same way as with a virtual machine. So the same kind of thing. Actually, related to this, there's a security talk after me, I believe, from Sebastian, who's talking about this. I think a lot of people will take a container from the internet and not really mind where it's from, as long as it's got the dependencies they need. They won't check what else is in there, and then they'll chuck that onto their local site, which means that they can install malware, too. So actually, sometimes containers can be less safe. Go to Sebastian's talk in an hour's time if you want to see that.

Thank you for the talk. It looks like it's a lot of patching here and there. So you take care of the malware and use strategies. But is there any vision for completely changing the way PyPI works? Is there anything that we could think of for the next 10 years, or five, or something like that?

That's a really good question. I'll try and say it back as well, just to understand it. I think the question is: right now, PyPI is pretty insecure in these ways, but is there anything that could be done in the longer term to make it more secure as a platform? There are some things that have come in. For example, PEP 458 being accepted. If PEP 480 gets through, that's to do with package signing. It would mean that the risk of dependencies being compromised on the way up to PyPI or down to a user would be limited. That would be a good step, with more funding. I think something else would be getting some funding and getting people to actually work on that.
But I think you're really talking about a fundamental change to how PyPI works, aren't you? Not just these little incremental things. Sadly, I think to do that, we'd have to make it something more like Maven. We'd have to really change how the entire thing works, in a way that essentially trades off openness against security. I think that PyPI have chosen where they want that line between openness and security to be. It's possible, for example, that having to prove who you are and that you own everything could be something that's implemented. But I think that's unlikely, just because of the ethos of PyPI, honestly. Thank you very much.

I don't think we have any more time for... Maybe it should be right there. Sorry. No time.

Come talk to me at the end. I'll chat to you.

Please reach out to Max in the hallways, or on those links, or on the Discord channel. And thank you very much, Max, again. Thank you very much.
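The talk's advice to install from a requirements file rather than typing package names by hand can be backed by a small check that every entry is pinned to an exact version. The sketch below is a simplified illustration: the function name `unpinned_requirements` is hypothetical, and the parsing deliberately ignores edge cases such as URLs containing `#`. It assumes the plain `name==version` style of requirements line; pip's `--require-hashes` option can add a stronger guarantee on top of exact pins.

```python
def unpinned_requirements(text):
    """Return requirement entries that are not pinned with an exact '=='.

    Blank lines, comments and option lines (those starting with '-')
    are skipped. Wildcard pins such as 'pkg==2.*' count as unpinned.
    """
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in spec or "*" in spec:
            problems.append(spec)
    return problems
```

Given a file containing `requests==2.31.0`, `vonage>=3.0` and a bare `requets`, the function returns the last two entries. Pinning does not tell you whether a name is a typosquat, but it does mean the list is written once, can be reviewed, and will not silently pick up a newer release that has had malware added.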