You're a target. You know that? There are people out there. There are attackers out there targeting you, trying to break into your systems and your software to steal your data, your money, your identity. Or maybe they're not targeting you. They're trying to use you to get to someone else: your processing power, your network connections, your credentials. Either way, that makes you a target. And these attackers are not creative. They're not coming in with magic hacking software. They're using your own software against you, legitimate tools used for illegitimate purposes, exploiting combinations of software that the original developers never even considered. How are you, as a regular user or a regular developer, supposed to protect yourself against these kinds of attacks?

The good news is you're not alone. There are whole teams of people helping here. And I mean more than just devs fixing bugs, more than just researchers trying to get bug bounties. There are entire teams working on the detection, prevention, and response to these issues as they're discovered. Experts who analyze the risks and offer fixes or workarounds. People who are working to help you avoid becoming a victim.

I'm part of two of these teams. One is the Python security response team, which is a group of volunteers from the Python community who receive reports about issues in Python and determine how to respond to them. I also work with the Microsoft security response team when they have issues that come up involving Python. Today, I'm going to share some stories from working with these teams and the work that we've done. Now, I want to point out that the specific issues I'm talking about are secondary. So don't get too hung up on the details, but I'm more than happy to chat about them later. I really want to focus on the process. I'm going to discuss how we decided on the right action to take for a particular issue.
I'm going to call out ways that you can protect yourself from those issues and similar ones in the future, and ways that you can help us to protect you and your users. And I also kind of want to help you see that we're all just humans trying to do a really difficult job, and any sympathy is welcome. I have three stories to tell. So, sorry? No, blank slide. Blank slide, it's fine. Three stories to tell, so let's bring up a slide and get started.

The first one relates to the release of Python 3.10.3, where, a few months ago, we did a set of security releases. Actually, most current versions were released on the same day because there were a lot of fixes in here. There were 20 CVEs fixed in this set of releases. Now, the majority of those were in dependencies that hadn't been updated in a long time. But there were some relating to Python itself, and there was one in particular that I handled from start to finish. So that's the one I want to talk about, and it's this: CVE-2022-26488, the Windows installer enabling escalation of privilege, which essentially means that if you'd installed Python on Windows using the installer, an attacker could come in as a regular old user and turn themselves into an administrator. Really bad, really bad.

As I say, the detail is not so important. What I'm going to walk through is how this played out. I'm going to go through the timeline from when the report came in and the steps we went through to deal with it and eventually get those releases into your hands.

So, the timeline. Second of March 2022, 10 p.m. This is UK time because that's where I'm based. The initial email arrives at the Python security response team. It's encrypted, because we accept encrypted emails. There's a GPG key, because that helps protect reports from being intercepted along the way or ending up on random mail servers. And it came in from a research team at a company who had done a really good job. This was a really good report. It was straightforward. It was clear.
It told us what was wrong. It told us how to trigger it. They had some assessment of the impact of it. And it was really easy to read and say, yes, this looks like a legitimate issue, which we did. By 1 p.m. the following day, we had responded to them and said, thank you for the report. Because that's what we always do first: we want to get back as quickly as we can and say, yes, we've got your email.

In this case, we didn't have more information. But a lot of the time, we've ruled out an issue by this point. We're able to see that either we know about this one, or this one's already public, or this is not actually a security issue, or this relates to something we don't handle. And so a lot of issues get resolved in that first response, where we'll say something like, hey, this is public. It's on the issue tracker already. Go here. Feel free to discuss this in public. In this case, we thought this was a legitimate report. And so we accepted it, but didn't offer any more details there.

I then spent the rest of the day confirming it. By 9 p.m. I was able to respond to the team internally, and to the researcher with a separate email, saying, yes, we're accepting this. This is a vulnerability. We are going to handle it from here. Thank you so much. And then I spent the rest of the night on it. By midnight, I had a draft patch fixing it, which involved just a few updates to the Windows installer. And I'd pre-written the announcement, because the important thing about these vulnerabilities is that when we disclose them, we have to describe them in a way that lets users know whether they're affected. It's really easy to describe the technical change, but that doesn't tell people whether they have to be worried. It doesn't tell people whether they're impacted. And so getting a good description of the vulnerability together and then sharing it within our group, because they're the people who have agreed not to leak anything early, is a really important thing.
So I shared that that night. I also requested a CVE. Now, the CVE identification number system, which you've probably heard of, is commonly seen as a way of notifying people about vulnerabilities. That's not strictly true. The numbering system is a way of identifying vulnerabilities. The notification is separate. I'd say the industry is somewhat moving towards the idea that once you get an ID number, you've notified everyone, and people only respond when there's a number. There's a bit of a weird dynamic there, which I'll come back to in a later story. But in this case, we said, yeah, we definitely need a CVE ID for this. People need to be able to say, here is the specific one I'm talking about. So I sent off the request for that.

I forget now, but I think the fifth and the sixth were a weekend, so I didn't do anything. But then by 6 p.m. on the sixth, we had the email come back from MITRE saying, here's your CVE ID. They didn't have the details at that point. We just requested it vaguely, because we hadn't disclosed anything, and they know we want a number. And so I shared that ID with the rest of the team and basically said, hey, I'm going to publish this tomorrow. Any more feedback? Any more discussion? Any more concerns? There were none. And so by 5 p.m. the following day, I had pushed a set of pull requests to our GitHub repo with the fixes. I sent an email to our notification email list so that everyone would know about it. And I got back to the researcher and said, thank you so much. We've fixed this. Here's the link to all the information.

Now, if you're keeping count, this is five days, including a weekend, between the initial report and the final disclosure. That's unusually quick. Normally we're not that quick. Normally no one is that quick. But this was very smooth. We try to get most vulnerabilities at least to the point where we believe it's possible to disclose and say, here is the patch for it, at a minimum, within 90 days.
That's kind of the industry standard for this, largely driven by some researchers basically saying they'll go public themselves if the project takes longer than that. But in this case, we got it done much quicker.

So this story is basically about me and what I did. But I want to share that with you. What should you do to handle this kind of issue coming up, which is completely outside of your control as a user, but still makes you vulnerable?

The first thing is to get security notifications. As I said, the CVE ID is not a notification. It's just a number. It's a reference. We provide the notification. If you search for Python security-announce, you'll find our mailing list. Sign up for that. It gets about one email a month, hopefully less, because it's only announcements. Discussions go on a different thread if you're interested in discussing them afterwards. But if you care about issues in Python, this is where you get the first notification of anything the Python security response team has handled.

Next, we need you to install updates. And not just the ones that we tag as security updates. Every single update has bug fixes that may have security impact. They don't all get CVE IDs. We don't always know that they are vulnerabilities at the time they get fixed, because every bug could be abused. You're generally going to have two options for getting your updates. If you usually get them from python.org or directly from GitHub, that's going to be the quickest way to get the most timely official release. But it won't have gone through a whole lot of testing at that point. It'll have gone through our standard test suite. If you normally get Python from your distributor, you want to be encouraging them to deliver the security update to you once they've run it through their additional tests, possibly further patched. And of course, if you're really worried or highly impacted by something, it's usually possible to patch Python itself.
Most of our issues tend to be in Python code, somewhere in the standard library. And so if you need to just go into your installs and modify it, that's often very possible.

And finally, we need you to report security concerns to the Python security response team. To do that, you start by going to python.org and scrolling down to the bottom of the page, and there's a link down here which says "report a security issue". It doesn't actually flash blue on the site. I just did that for here. If you click on that link, that's going to get you to a page that has our email address, details about encrypting issues, details about some things that we never want to receive on that team and that can always just go public, and a few examples of things that we do want to hear about. So that's where you can go if you have any concern or issue at all. And we do get concerns; they're not always issues. We have people asking questions about things that they saw changed that they think shouldn't have changed. And those are great. We're the right place to ask a question where you don't think the details should be in the general public yet. And we're happy to make that decision.

Let's talk about PyPI. PyPI is the best package repository for Python. By far. It has all the packages. Anything you could ever want is there, and anything that you want to release can go straight up there. But full of malware? Yes, but it's a little bit more nuanced. So let's dig in. The reality is, yes, there's quite a lot of malicious code on PyPI. But it's virtually all typosquatting. What do I mean by typosquatting? Let's have a picture. Imagine this is PyPI. You've got all these awesome packages. There's NumPy there, and Pandas, and Addis is somewhere in there, probably the biggest one. And these are all legitimate. They're good packages. But there are all these gaps between them, right? There's all this space in between those packages. And that's where the malicious code goes.
They don't have the names of the real ones. They have different names. They're packages that no one has ever wanted to use. But they're there so that if you misspell a name, you might get one by mistake. To date, we've had exactly one known takeover of an existing package, which was entirely outside of the control of PyPI. That was due to the author messing up. And so you really don't have to worry about known packages that are already on PyPI suddenly turning malicious, short of trusted people suddenly changing their mind and doing it. For most of the malware, if you double-check your package names before you install, you're never going to get it.

We also have third parties regularly scanning PyPI. We often get notified about these new packages within hours of them being published, because there are various groups scanning everything that goes up and detecting it no matter what it's trying to do. They're often stealing credentials or stealing environment variables. And we get notified really quickly. PyPI is also doing some work to add first-party scanning. So when packages get uploaded, and before they ever go public, we'll be able to detect and flag some of these.

But there is one aspect to this that is a little less obvious but is unfortunately really, really exploitable. And it's known as dependency confusion. I worked on this one with the Microsoft security response team, because the researcher targeted us, targeted Microsoft, via PyPI initially. And so I got looped in. We spent about six months discussing this one. That's a lot of time spent analyzing, trying to understand the impact, understand the mitigations, what we could do about it, and testing the various fixes that we came up with. It started for us with PyPI. It spread out very quickly, because we found other package managers have similar issues, and certainly other languages as well. But here's how dependency confusion works.
First off, you need a private index and you also need to be using PyPI, which you can easily do with pip. You can add extra indexes to install from. And you're going to have some private API that you use internally on your private index, where it should be. And let's say you're also using requests from the public index, because there's a good chance most of us are. As I showed before, there's mess in the namespace, but only on the public server. Our private one is clean because we restrict who can upload to it. And so we don't get all this malware being thrown up there. Despite the mess, pip is fine. It knows that when you install private API, it will grab it from the private index. It will grab requests from the public one. Everything installs just fine. It works and we're all happy.

But what happens if private API shows up on the public server? What happens if it's a higher version than yours? pip treats multiple indexes as mirrors. It assumes that they're all interchangeable, which is a historic requirement from before we had content delivery networks. And so it's going to say, here is a higher version of this package you requested, and it helpfully runs off and grabs that one instead. And you've just been attacked.

Our end result after working on this for six months was guidance. It's a big document with a few tips. We tried to keep it fairly readable and easy to follow, but we put it out as a white paper on mitigating the risk of this situation. Unfortunately, you can't always fix the software. Sometimes you just have to explain to people how to use the software in a way that won't make them vulnerable. And in this case, what we found is that modifying pip to protect against this would have required everyone else to do just as much work as what we eventually recommended, which I'll show in a sec. And so with that trade-off, it was like, well, we don't even need to modify pip then, because you've got to do that much work to change everything anyway, so we can do it without the modification.
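The selection behaviour described above can be sketched with a toy model. This is an illustration only, not pip's actual resolver: the index contents, the package name `private-api`, the version numbers and the helper `pick_candidate` are all hypothetical, and versions are simplified to tuples.

```python
# Toy model of "multiple indexes are treated as mirrors".
# NOT pip's real code: names, versions and the helper are hypothetical.

PRIVATE_INDEX = {"private-api": [(1, 2, 0)]}
PUBLIC_INDEX = {
    "requests": [(2, 27, 1)],
    "private-api": [(99, 0, 0)],  # same name, uploaded publicly by an attacker
}


def pick_candidate(name, indexes):
    """Pool candidates from every configured index and take the highest version."""
    candidates = [
        (version, label)
        for label, index in indexes
        for version in index.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no index provides {name!r}")
    return max(candidates)


both = [("private", PRIVATE_INDEX), ("public", PUBLIC_INDEX)]

# With two indexes pooled together, the attacker's higher version wins.
print(pick_candidate("private-api", both))

# With only one index referenced, the public upload is never considered.
print(pick_candidate("private-api", [("private", PRIVATE_INDEX)]))
```

The point of the sketch is that nothing in the pooled selection records which index a name was supposed to come from, so the only input an attacker needs to control is the version number.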
It might seem silly to have taken six months to get to just a document, but we tested a lot of stuff, and it wasn't just, does this technically work? It was, can we explain this easily? Can we take the tens of thousands of developers within Microsoft who were impacted by this issue and, in one single email that they all read (okay, already lost there), one single email that their manager eventually reads to them, have them understand what the issue is and make the changes they need to make to mitigate it? That's much harder than it sounds. And of course, for us, it extends far beyond pip. It includes NuGet. npm is usually somewhat protected in almost all cases, luckily. It does affect Maven, but again, the public service for that is a little bit more protected. And yeah, we found what worked. Incidentally, this document never mentions dependency confusion, because we wrote it before the researcher published his paper, and so we didn't know what it was going to be called. We picked some corporate speak and went with that, and didn't know that we didn't have the cool name.

So here's what the recommendation basically comes down to. There's more nuance in the document, but I don't want to leave you hanging on this one. If you're in this situation, the best way to protect yourself is to actually mirror the public packages into your private index, and then only refer to that one and never touch public PyPI anymore. The critical piece is only referencing one index at a time, because once you've referenced multiple indexes, that's where the installer can pick and choose between them. Now, if you trust all of them completely, then that's all right. But that involves trusting that no one is ever going to upload a package by the same name to more than one index. We tried that. It was very unlikely that that was ever going to work, and so we boiled it down and said: one index, only reference one index, copy what you need in there.
Or, if you've got the tooling for it, which a number of online feed services will let you do, have it automatically forward packages and do that mirroring for you. It's not quite as good, because the typo could still get through, but it's going to avoid someone stealing your package name if you've already got it internally.

All right, this is another big one: Trojan Source. Whether you've heard of this or not, anything you've heard is probably drastically overblown. I'm just going to say that up front. It was sold as a critical threat. No one can trust any code anymore. It could all be malicious and there's no way to find out. Two CVEs came out of this one, CVE-2021-42574 and CVE-2021-42694, against the Unicode standard, which is not a product; you can't just update it. The CVE system occasionally gets misused like this. In this case, they were reported by the researcher, and the Unicode Consortium found out when people started contacting them saying, hey, you just told us you have a vulnerability, what are we meant to do? And Unicode was kind of like, we didn't tell you that, because they had no idea.

Before I go on, let's see the two issues in action, just to help give you a bit more context. Though I will warn you, I've also been working on this one for six months, and it keeps blowing my mind every week when we meet. So it's an incredibly deep one.

So we'll look at some code. Imagine someone submitted this as a pull request to your repo and you're reviewing it. Relatively straightforward: we have some scope that defaults to admin. If the user's not an admin, we're going to use the user scope instead, and then call some function that makes a decision based on that. And I think it's fair to assume that if they're not an admin, they shouldn't be getting admin privileges. But you merge this in because it looks fine, and then you start getting reports that, hey, we've been exploited, because people who are not admins have admin access. What's going on?
There's a homoglyph attack. This is one of the two vulnerabilities. And it's in this scope variable. Can you see it? No, that's the point. Okay, let me expand these out to the underlying code points. This is what the compiler sees. The first one and the third one are the letters SCOPE. The second one is, I believe, Cyrillic letters that are not SCOPE, but they look like it when rendered in most common fonts. Chances are, if you see this in whatever your editor is, they're going to look exactly the same, and you'll have no indication that it's actually a completely different variable. And in Python, this works just fine, because it creates a new variable, puts user in it, and then we happily go along and pass admin through. So this is one of the two issues with the Unicode specification that were raised by Trojan Source.

Now, someone sent this pull request as a fix, which looks better. It looks better. And we've checked SCOPE. They're all the same scope now. That bit's fine. We've inverted the logic a bit, which is nice. We default to user and then only upgrade access if it passes the check, which is a good move. And then we go ahead and use it, and they've helpfully added a comment that says it should be admin if they need access. And so you merge this one because it looks good. And now you still have the same problem. What's going on this time?

In this case, it's the other vulnerability: it's misusing the Unicode bidirectional algorithm. And what the compiler sees is actually this. Now, the bidirectional algorithm is what allows computers to always have characters going from left to right in memory, even for a right-to-left language, and particularly for text that switches between right-to-left and left-to-right at various points. There are special characters, and special behaviors of certain characters, that help it render correctly for those languages.
If you only ever use English, you've never run into this before, because computers just always kind of default to that. But if you've used any, normally I would say any Arabic language, but every time we talk about this, there's like 17 languages from all over the world that run into this all the time. There are just so many people out there who need this functionality.

But what's happened here? That string actually has a right-to-left embedding in it. That's the control character. And that says the following characters should be treated as a right-to-left language. And then the PDF, pop directional formatting, essentially means reset that state. And so the user that we saw in the string before is actually in the comment. And there's a Unicode control character in the string. So we're not passing user into allow access. We're passing something it probably doesn't understand. And it's choosing to fail in the way it is.

And the challenge here is that the compiler's right. The code that we see is actually the thing that's wrong. The human reading this is reading something and interpreting it differently from how the compiler's going to read and interpret it. But ultimately the compiler's right, and also the person's right. The researchers originally suggested, and they do still suggest on their website, that compilers could see this and raise a syntax error. But that simply doesn't work. Strings are allowed to have Unicode characters in them. If it's a Unicode source file, that's allowed. Comments are allowed to have Unicode characters in them. That's allowed. Probably half the people in the room would not be able to write their name in a source file if that was an error. So then the CVEs went to the Unicode standard and said, well, Unicode should prevent this. And so out of that, we've had a working group going for about six months now.
It's a multidisciplinary team of experts from Unicode, various programming languages, and various editing tools, just meeting regularly to try and work through all of the implications of this. How can we make this work? Some tools have already added various notifications. If you open this file on GitHub, it's going to have a little notice at the top saying this file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. It doesn't tell you where it is. It doesn't tell you how it's going to be interpreted or compiled. It says it's there. And that's enough. That's about as much signal as they can reasonably give that, you know, you need to pull this down and inspect it more closely. I haven't checked, but I suspect if you're working in any normally right-to-left language, every single file is going to have this marker on there. So there's only so much value in this, but what else are they meant to do?

Some tools are rendering it like the lower one, with the control characters visible, but there's a big discussion about whether the characters in between should be shown right to left or left to right. They should be shown right to left because that's correct. They should be shown left to right because that's what the compiler sees. And it's a really, really complicated issue.

Unfortunately, there's no happy ending for this one yet. We've been working on it for a long time, and I think what we're going to end up with is some updates to the Unicode annexes for source code identifiers, UAX #31 and UTS #39 if you're interested in looking those up. They are going to be updated with more guidance for languages, tools around languages, and editors or renderers of text, to help people reading code understand how it's actually going to work, without preventing users of right-to-left languages from being able to use their normal language in the code that they write.

Security is everyone's problem.
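A check in the spirit of the notices described above can be sketched in a few lines of Python. This is a minimal sketch, not how GitHub or any particular tool implements its warning; the function names and the sample source are hypothetical, and the sample spells its unusual characters as `\u` escapes so nothing invisible is hidden in the listing. Non-ASCII identifiers and bidirectional text are perfectly legitimate, so the output is a prompt for closer review, not an error.

```python
import io
import tokenize
import unicodedata

# Explicit bidirectional formatting characters: embeddings, overrides,
# isolates and their terminators (U+202A..U+202E and U+2066..U+2069).
BIDI_CONTROLS = {chr(c) for c in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


def find_non_ascii_identifiers(source):
    """Return (line, identifier) for each name token containing non-ASCII text."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits


# Hypothetical sample combining both tricks from the talk: a Cyrillic
# look-alike of "scope" on line 3 and an embedding inside a string on line 4.
sample = (
    'scope = "admin"\n'
    'if not is_admin:\n'
    '    \u0455\u0441\u043e\u0440\u0435 = "user"\n'
    'allow_access("\u202buser\u202c")\n'
)

for hit in find_bidi_controls(sample):
    print("bidi control:", hit)
for hit in find_non_ascii_identifiers(sample):
    print("non-ASCII identifier:", hit)
```

Reporting the line and column goes slightly further than a file-level notice, but it still leaves the judgement to a human reviewer, which matches the constraint that such text cannot simply be rejected.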
The good news is there are many people out there who are working to help, experts around the world. For you, remember to install your updates. Double-check your packages before you install them. Review contributions carefully, especially if you don't know who they're coming from, such as first-time contributors. And please report concerns upstream. But most importantly, use and enjoy Python with confidence, knowing that there are people all around the world, a huge network, dedicating their time to research, report, review, resolve, and release a reliable Python runtime. Thank you.

Again, if you have any questions, can we use the mic? We definitely have time for two or three questions. Thank you. Do we have any remote questions? If there are no questions: Steve, thank you very much. That was a very interesting talk. Oh, no, a question. Excellent.

Yeah, so I have a question for you. Your solution is great, because since you mirror it, you get this thing where if PyPI was hacked and someone replaced a package there, then you are not really impacted by it. But you said that there were no changes made to pip, right?

There could be changes made to pip. They would have to be in the form of new command line arguments, and probably new configuration, that you use to mitigate it. And when we were checking that out, I was like, well, that's going to be more work for people. First we have to update pip, then everyone else has to update pip, and then they still have to do more work to use it than they would if they started mirroring packages. Now, I guess in our context, we have an internal package feed that we use internally, Azure Artifacts. It's available externally as well. We were able to add the mitigation into that first and say, okay, everyone use this and then you'll be okay. So we had that advantage, but yeah, the feature going into pip we decided against, because it wasn't actually going to save people work.
Well, it could, because then people wouldn't have to mirror the whole index, which may have gigabytes of data. And another thing that might be good for others is some UX: when you install a package and it's found in two different repos, there could be a notification. But then it would only work in a terminal, so.

Exactly, there are a lot of options. There are a lot of ways to inform people that things have gone differently. pip already tells you what index a package has come from. I doubt anyone but me has ever read the verbose mode output for a pip install, but if you run it in verbose mode and read it, you'll get that information and you're able to check it out. But yeah, absolutely, there are a hundred different ways to resolve this, which is why it took us six months to choose one.

Yeah, and another question: what was the tool you made the presentation in? Because it was awesome.

What was the?

What is the tool that you made your presentation in?

It's called Blender 3D, and a lot of time. Yeah, every single animation I've done in Blender.

So did you make the animations in Python?

Sorry, what was that?

Did you make the animations in Python?

I did use a little bit of Python. Because Blender 3D includes Python in it, and runs largely on Python, you can script the tool with Python. So a little bit.

Yeah, thanks.

Yeah.

Would the change be acceptably compatible if you just say: if your first repository has a certain version of a package and the second repository has a higher version, then you block loading from the second one?

Yeah, unfortunately not, because then you have a similar issue kind of in reverse, where if you're intending to get a package from the second index and someone uploads a lower version to the first one, then they can still hijack you in reverse.
And so, yeah, if we did get to a point where we had multiple indexes available through pip, it would probably be some syntax that says, you know, this package from this index, this package from this index. And then, you know, the dependencies of this package: do we want them all from the private index, or do we want some from the public index? It turns into a very big, complex problem. Hashes are a good solution, incidentally.

Yeah, with linters, then, I think the first Unicode problem would be detected, because you're assigning a variable that's not used.

Exactly, yeah, yeah.

I think you're very great to listen to. Thank you.

Thank you very much. You're welcome.

I think we might be covered with other questions.