Hi everyone. So this talk is called Secure Python Development: Tips, Tricks, and Tools. I'm Dominic de La Rere. I'm a software engineer at Red Hat. I thought I had Red Hat under software engineer on the slide, but I think that disappeared somewhere. I work mostly on internal tools and internal processes within Red Hat. So what I'm going to present to you is not a project. It's not really even anything new, exactly. What it is is a summary of things that you can get started with if you want to bring some secure development practices into your Python projects. This is just a summary of what I've discovered as part of a team working on trying to make our coding more secure, particularly in an environment that uses a lot of Python and a lot of CI. So I'd like to lay a few ground rules. The first is that I'm not the security police. I'm not the Python security police. I can't tell you what to do. I'm not your mom. So anything I say here is advice from my experience; it's not the law. Security can only be improved, not perfected. You've probably heard that before if you've read anything about security, but it bears repeating: nothing here is going to make your coding perfectly secure. It's only going to maybe help you get to that next step of a more secure environment. Also, I'm going to mention a number of tools here. You'll see logos for different projects that you can use. They're not really endorsements. These are only the tools that happen to be familiar to me, and they're here as examples. So I'm not necessarily recommending one thing over another. We're going to focus on a few different areas for secure Python development. First, I want to focus on secure dependency management: basically, how to know what Python packages you're pulling in, and how to formalize that process. Then secrets management, because like any software project, a Python project can have a dependency on credentials, API keys, et cetera.
And Python has particular ways that you can manage those safely. Then static application security testing, or SAST. Again, this is not unique to Python, but Python has a unique way of interacting with SAST. Then some basic network security reminders around Python, and how to enforce best practices in Python coding so that you have a common starting point for secure development in your Python project. So here's the Python Package Index. I think, well, I don't want to speak for everyone, but I like the Python Package Index. I like being able to pip install things. It's very nice. But there are some security challenges around that model of installing software and adding dependencies to your project. So the challenges we have are: isolating Python packages from the system-wide Python; making sure that library versions match across different environments, which is a challenge for everything, but obviously one Python environment could have different library versions from another; vetting package releases before trusting them in your application, just as you would in any other language that has its own package management system, though we have particular tools for that in Python; and verifying download integrity, just the basics of making sure that you're downloading a verified package, mitigating against so-called man-in-the-middle attacks, et cetera. So in terms of isolating packages from the system-wide Python, the very basics that you have likely heard already: please don't sudo pip install on your main system. Instead you want to be using a virtualenv. You've probably heard that, and they've been around for a while, but it bears repeating. It also helps to use tox to provide dependencies. tox is a great system for running all your unit tests in one place and in a predictable way. But it can also provide a kind of front end to various other things that you might be doing with your Python application or Python scripts.
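As a sketch of what that looks like in practice, here is a minimal tox.ini along those lines. The environment names and dependencies are illustrative, not from the talk; each environment gets its own throwaway virtualenv with its declared dependencies, torn down and rebuilt as needed.

```ini
[tox]
envlist = py311, lint

[testenv]
# Dependencies are installed into a temporary, isolated virtualenv.
deps = -r requirements.txt
commands = pytest

[testenv:lint]
# tox can front-end other tooling too, not just unit tests.
deps = flake8
commands = flake8 .
```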
It provides a very reproducible way to install dependencies right when they're needed into a temporary environment, and tear them down when you're done. So that's good for lint checks and unit tests, but again, there's lots of other stuff you can run with tox. You may also want to consider something more comprehensive than tox. There are whole Python packaging systems that will integrate that kind of dependency management. Those include Hatch and Poetry; both have been recommended to me. And we definitely want to make sure that library versions are matching across different environments. tox, for example, often takes a requirements.txt. A requirements.txt doesn't have to have versions in it; you could just be writing the names of packages, and that could result in inconsistencies. One thing to help with making sure that you have the same versions everywhere across different development environments is pip-tools. pip-tools is a project that has basically two main utilities in it: pip-compile and pip-sync. pip-compile converts a requirements.in file to a requirements.txt file. So in a requirements.in file, you can be a little bit vague. You can say, I want requests, but you don't have to say what version of requests you want if you're okay with it having a strategy of pulling in, say, the latest available version or the current default. And a really nifty thing about pip-compile is that you can pass the --generate-hashes flag to it, and it will compute checksums, SHA-256, so a nice secure hash, that you can use to validate those requirements when you're pulling them in. So later, when you run pip and install all of those, it's actually checking hashes as it downloads the packages. The --annotate flag is also very useful if you've got this in a shared project and you want to see where a dependency came from, because you install requests and then it pulls in something else. There are certainly packages with much deeper dependency chains.
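To make the hash-checking idea concrete, here is a minimal sketch of what that verification amounts to. This is not pip's actual code, and the payload is a stand-in for a downloaded package; it only illustrates the check that a pinned SHA-256 hash enables.

```python
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True if the artifact's SHA-256 digest matches the pin.

    In essence, this is the comparison done for every download when a
    requirements file carries --hash entries.
    """
    return hashlib.sha256(data).hexdigest() == expected_sha256

# Illustrative only: pretend these bytes are a downloaded wheel.
payload = b"fake wheel contents"
pin = hashlib.sha256(payload).hexdigest()

print(verify_artifact(payload, pin))      # True: download matches the pin
print(verify_artifact(b"tampered", pin))  # False: a modified download fails
```

The point is that the pin lives in your repository, so a package altered in transit (or on the index) no longer matches and the install fails loudly.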
So --annotate will show you how you ended up with that dependency in your project in the first place. pip-sync is the other command in the pip-tools project, and it will update your virtualenv to help you keep those packages fresh. So this is a smaller solution than one of those full packaging solutions that I mentioned earlier, like Hatch or Poetry, but it helps you manage the dependencies in a similar way. Of course, as with any language with a package manager, you want to be vetting the package releases that you are using before you trust them. It's really simple, but again, just ask the basic questions. What other projects are using this package? Who maintains it? Do they respond to bug reports? How active is this project? And you can be checking changelogs to see what's coming into the project, what new features are in there. To help you with checking those dependencies, there are dependency scanners. The example I chose for this presentation is called Safety. Safety will scan your project for dependencies that have known vulnerabilities and tell you how to remediate them. So if there's an upgrade without that vulnerability, it can recommend that, or even if there's a small downgrade that avoids the vulnerability, it can recommend that as well. You can integrate it into a CI/CD pipeline. The thing is, this is an open source project, but it's on a model where for non-commercial use, you can use their free database. It's very good, but for any kind of commercial use, their license asks, or requires, rather, that you buy an API key from the developer behind Safety, which is PyUp. Verifying download integrity is pretty easy once you've got a tool like those package managers I mentioned earlier, or pip-tools. So there are specific suggestions that the pip team has for verifying download integrity. The first is that you use pip instead of running setup.py directly. setup.py is arbitrary code that you're invoking.
I mean, you could read it first, obviously, but it's considered safer to use pip these days because it does more than just running the setup.py; there are certain checks involved. You can pass a --require-hashes flag to pip, and that will verify the downloads using those hashes that you added in with pip-tools. And it's recommended that you use the --only-binary flag. If you run with --only-binary, pip will pull in binary packages instead of the source package, which means that it will simply place the files where they need to be instead of running setup.py, et cetera. Then this is kind of an extreme case, and it's a lot more work, but if you have an environment where you really need to place a barrier between yourself and PyPI, there's the option of caching PyPI. You can have your own mirror of PyPI. You can vet everything that comes into it, all the updates, and you can choose what you approve and what you don't approve. Nexus Repository is just one example of an enterprise solution that one company has proposed; I provided that as an example. But I should also mention, you may have seen the table for a project called Pulp lying around. That's another example of an open source project that's out there trying to provide various ways of mirroring different types of repositories, not just PyPI, but other sorts of repositories as well. Now, like anything, a Python application might need passwords, keys, certificates, tokens. These are all secrets, things you keep secret. The obvious thing that you have probably had drilled into your head for years is: don't hard-code secrets into your Python source code. But also, don't store your secrets in source code management. There are private source code repos, obviously, but really, no source code repo was ever meant to hold secrets. They're meant to share things, not to keep things secret.
So if you really have to store secrets in SCM, because I'm not the security police and I can't tell you not to do it, then at least use git-crypt, so that all the secrets are PGP-encrypted. git-crypt is a plug-in for Git that works transparently. Once you integrate it with GnuPG, it will use your public/private key pair to just transparently encrypt any secrets that you're adding into the project. It's a little more complicated than that on a project with multiple developers, where you probably have multiple keys from your team, but it will handle the math, essentially, of using all of those keys to encrypt the secrets so that any one of your developers can then go back in and decrypt them. But that will probably make revocation harder if you ever have to do that. Rooting out plain-text secrets in source code management also has some tools. Some examples are projects called detect-secrets and Gitleaks. You run these either locally or in a CI/CD pipeline, and they scan your project and say, hey, this looks like an API key or a password. Often it's scanning against lists of known credential formats, or it might do a little bit of entropy detection, but either way the idea is to point out: these might be secrets. You can either mark one as a false positive, or you can take it out of your source code and, obviously, revoke that secret. You don't want it live anymore. So anytime it's possible, instead of using source code management for your secrets, you want to use an actual secrets management system if you're sharing them with a whole developer team. HashiCorp Vault is the big commercial example out there. I mention it here because it happens to have good programmatic access to secrets, and the way you can do that in Python is with hvac, the Python client for HashiCorp Vault. That's a community-supported client, so it's not an official part of the supported HashiCorp Vault system, but it's good and it's out there.
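As a rough illustration of the entropy heuristic those scanners use, here is a toy version. The threshold and the example strings are made up for demonstration; real tools like detect-secrets combine entropy with known credential formats and are considerably smarter than this.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; high values suggest random tokens."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def looks_like_secret(s: str, threshold: float = 4.0) -> bool:
    # Long, high-entropy strings (API keys, tokens) stand out from prose,
    # which reuses a small set of characters with uneven frequencies.
    return len(s) >= 20 and shannon_entropy(s) > threshold

print(looks_like_secret("please update the changelog"))       # False: ordinary prose
print(looks_like_secret("9f8aK2Lq7ZxR4mW1pT6vB3nYs0dEjHg5"))  # True: token-shaped
```

Anything flagged this way is only a candidate: you still decide whether it's a false positive or a live credential to remove and revoke.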
But moving on from secrets, we have security linting. There's a good chance you're already linting your code for style, and a security linter does something similar: it builds an abstract syntax tree from your code. A style linter will usually throw an error if it notices that your code isn't syntactically valid and then move on to style checks. A security linter, instead of going into style checks, goes into security checks that look for common security issues. A good option for security linting is Bandit; that's open source. Again, run it locally, run it in CI/CD, but it just looks for common security issues in your Python code. Including things like, for example: you've run a shell command with subprocess, or whatever standard library function you've got in Python for running shell commands, and you've just thrown a variable in there. Bandit will throw an error, and you have to decide if it really is a security vulnerability in the context of your project. For example, you can mark it as a false positive if what you're developing is a tool that developers run on the command line in their own environment, and they're allowed to give input that is going to go right into a shell command. They're just doing it on their own system, so that's not really compromising security. Or, for example, you are already thoroughly sanitizing the variables that you're putting into your shell command; you can say, well, it's not a vulnerability here because we've thoroughly sanitized these variables. But maybe it is a vulnerability. Maybe you're taking user input from a web application and putting it into a shell command, and then you'll want to fix that instead. So static application security testing kind of pulls all of these things together. There was a great talk earlier in the conference about effective SAST, so I won't repeat all of that here, but the examples that were given there mainly used Semgrep, which is nice; it's an open source SAST tool.
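To make the shell-command case concrete, here is a sketch of the pattern a security linter like Bandit flags, plus the safer alternatives. The hostile input is invented for illustration.

```python
import shlex

user_input = "file.txt; rm -rf /"  # hostile input, invented for illustration

# What a security linter flags: interpolating a variable straight into a
# string destined for subprocess.run(..., shell=True). The "; rm -rf /"
# suffix would run as a second shell command.
risky = f"ls {user_input}"

# Safer: pass an argument list (subprocess.run(["ls", user_input])) so no
# shell ever parses the input -- the whole string is one literal filename.
safe_args = ["ls", user_input]

# Or, if a single shell string really is unavoidable, quote the variable.
quoted = f"ls {shlex.quote(user_input)}"
print(quoted)  # ls 'file.txt; rm -rf /'
```

Whether the risky form is an actual vulnerability still depends on context, as described above; the linter's job is just to make you stop and decide.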
The example I decided to provide here is SonarQube. This is kind of an open-core type thing that you can use for SAST. There is a community-supported open source version of it, and then there's an enterprise version that they sell. It does do all those dependency checks that I talked about, the ones you can do with something like Safety, but it also performs the security linting, and it provides you with what are called code smells. If you're not too familiar with SAST, a code smell is essentially something that is maybe just not best practice, not necessarily a security vulnerability, but something that makes your code more prone to errors in the long run. For example, the first time I started using SonarQube, I got some code smells about string literals that I was repeating a few times in the same piece of code, and it said, hey, this could be a constant, which I hadn't thought of, and I just went and corrected it, easy to do. It will also pull in your unit test coverage metrics. So if you're running pytest, maybe you're running pytest through tox, or you're running some other system that runs all of your unit tests, it can take the output metrics and parse them. And a wonderful thing about it is that if you're integrating this into CI/CD, then it's going to be able to focus on just the code that's changed in a merge request or pull request and tell you what percentage of that new code is covered, or how many code smells are in that new code. For gating, that's very good, because you don't have to filter out a bunch of false positives off the bat. You can just focus on improving the code that you're checking in now. So you can start adding better-formatted, more secure code now and gradually improve the security and formatting of your overall code base. And you can set up, I think I said you can set up gating, but anyway, that's basically just, you've got a percentage, right?
I need this percentage of coverage, or it needs to be under a certain threshold of code smells in the new code, et cetera. But this is, again, just one example. I like it for this presentation because it does integrate with pytest coverage and things like that. So it will tell you how those metrics apply to your code, your pull requests. Now, some basic network security reminders. Again, nothing new. Please use TLS. Anytime you're interacting with APIs, you really want to be using HTTPS rather than HTTP as your prefix, and you want to be using certificate verification. And this is a super niche issue, but just as an example of the way this can go wrong: say you're in an organization that has a self-signed certificate authority for things on an internal network. Requests has this weird quirk where it doesn't use your system CA bundle to verify TLS. You might have a custom certificate authority installed to check these things, and Requests just won't use it. So the temptation in this example is to do something like just turning off SSL verification for certain operations. But instead, you can just use an environment variable to point to your CA bundle. So this is an example where, unfortunately, the secure thing to do isn't the easiest thing to do. It's kind of a weird edge case, but I like this example because it points to an area where you would ideally like to make the secure thing to do just the easiest thing to do, the path of least resistance. Anyway, you also want to check for precise, correct URLs. We live in an age of typosquatting: you've got two letters switched around and you end up at the wrong URL, one controlled by some attacker. You also want to specify the fully qualified domain name if you're doing container image pulls. This is not directly related to Python, but a lot of us are going to be working with containers connected to our Python projects.
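Going back to that Requests quirk for a moment: the environment-variable workaround looks something like the sketch below. The bundle path is illustrative; REQUESTS_CA_BUNDLE and the verify= argument are real Requests features, but adjust the path to wherever your CA bundle actually lives.

```python
import os

# Point Requests at the internal CA bundle for the whole process.
# (Illustrative path -- substitute your organization's bundle location.)
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/pki/tls/certs/internal-ca.pem"

# Alternatively, pass the bundle explicitly on a single call:
#   import requests
#   requests.get("https://internal.example.com/",
#                verify="/etc/pki/tls/certs/internal-ca.pem")
#
# The tempting-but-insecure shortcut is verify=False, which disables TLS
# certificate verification entirely -- exactly what we want to avoid.

print(os.environ["REQUESTS_CA_BUNDLE"])
```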
And there was an old way of doing things where people didn't specify, like, docker.io slash the name of the container, and that's not good, because nowadays there can be multiple registries, and the order that registries are queried in could be different. That is a similar problem to typosquatting. And obviously you want to rely on standard, well-known cryptography and authentication libraries. You don't want to be making your own solutions for these things, because there are tested solutions, and if that's not the focus of your project, then there's really no sense in creating something that's less tested. Now I want to talk about enforcing best practices on a team project. You want to set expectations for code quality, and some of these things might not seem security-related. The issue is that code that follows certain best practices and is consistent across a team is going to be code that's easier to review, and good reviews of code lead to more secure code in the long run. So you want to be running unit tests on all merge requests, all pull requests. You want to have unit tests to begin with; I think I bring that up again later, but it's always good to repeat. You want to create code coverage or SAST code quality rating thresholds. That's the gating that I was talking about earlier, and there's a number of ways to do that; SAST makes it pretty easy. You maybe want to be using code formatters. Not everyone likes code formatters, but they can help if you want that consistent code style across a whole team for, again, easier code reviews. So if you're going to do something like that, I recommend Black. They call it an opinionated Python formatter. You may have seen it around. Not everyone likes it, because it has very specific ideas of how your code should look, but it also takes all the work out of formatting. It just formats. Linters like Flake8 are also good.
So maybe you're okay with having a looser format, but you want to avoid anything that the Python PEPs tell you not to do; Flake8, for example, is good for that. And this is really what I said at the beginning of this section, but code style can be something of a security issue, just because it's easier to be on the same page when you have a consistent style. But back to these super basic things that you've probably heard a million times already. Use Python 3. If you've got legacy Python 2 code, I know what it's like to have legacy code lying around, and there are ways to make it a little bit easier. There are libraries like six that you may have used years ago, and maybe you want to pick them up again. They're good for bridging that gap as you're moving from Python 2 to Python 3. Obviously the reason for that is just that Python 3 gets security updates, and Python 2 doesn't get updates anymore. You want to have unit tests, definitely. There are so many things you can catch with them that you might not otherwise notice. And it can be incremental. You can go from no tests to, I don't know, 10% coverage, and that's better than putting it off until you can reach 80%, right? So if you want to start by just requiring a certain amount of code coverage for new code that's coming in, that's often a great approach that lets you build up coverage over time. And you want it to be meaningful coverage; you want your tests to actually make some kind of sense. But even if your test is just running a function, then at least you're catching something catastrophically wrong that prevents your function from running, right? So, some coverage is better than no coverage. Sorry, go ahead. About requiring coverage for new code, do you have an example of a way to do that technically, or is it just, like, manual gating on merge requests? Yeah, so the SAST tools that I talked about can apply code coverage metrics to just what's included in a pull request or merge request.
So it's not too manual; it's actually fairly automatic. I'm most experienced with GitLab CI, and if you're integrating those kinds of tools into GitLab CI, you can actually say: throw up a badge if it doesn't reach that code coverage threshold on just the new code. So if the new code is above 80% covered, then it gets a green badge. If it's under, then it gets a red badge, and the code reviewer says, hey, it's not ready yet. So that can be super helpful. It doesn't have to be manual; it can be easier if you have the right automation in place. Thank you, that was a good question. Using established libraries for cryptography is super important. I think I already said it, but again, don't roll your own crypto, as they say. But also, don't build your own SQL requests. These are maybe obvious things, but maybe you're coding fast and you start to get into these habits of reinventing the wheel. So don't; there are great libraries for handling SQL. If you're doing SQL and HTTP requests and stuff, you probably want to have a good framework, whether it's something as big as Django or something lighter; maybe you just need Flask or something. But you probably don't want to go into territory where people have already done the work of doing these things securely and just kind of forge ahead on your own. Instead, you've got community-supported projects to lean on. So I'm going to end now. If anyone has any questions, I'd love to hear them. Hey. Yeah, I'm curious about the part that came at the beginning, pip-tools, where you generate your file with all the versions pinned. In some cases those pins might get old compared with the latest releases, for example, I don't know, with fast-moving libraries. So how do you reconcile those two things? Do you have some periodic bumping of your libraries?
Yeah, I think that's mainly what pip-sync is for: you can run it periodically, and it can help you upgrade things. Or, well, it's been a while since I tinkered with pip-sync, but pip-compile itself you can also run again. It will regenerate things, and there are different settings you can run it with. So you could basically wipe out your requirements.txt and rerun pip-compile on the requirements.in. But you don't have to do it that drastically. I don't remember all the options, but you can do just portions of your dependencies where you pull in the newer versions. I think you can target, like, oh, I want to upgrade this dependency, and run pip-compile that way and just say: upgrade that one, the rest of them are fine. So there's some customizability there. Yes, please. You mentioned Poetry. Yeah. So, as we're at this particular conference, are you aware of a way to make Poetry install just the project you're in, like the one you downloaded and want to work with, for, say, using in a spec file? I have to be totally honest: I have very much no experience with these comprehensive systems like Poetry or Hatch. They were recommended to me, but personally I've been using the approach of all these other individual tools that focus on the individual problems instead of the all-encompassing thing. I expect this would actually be a topic for a whole talk of its own. Yeah. I'm sorry I don't have the answer, but if you give a talk like that, I will attend it. Yeah, I'm not going to answer that one. Any other questions? You mentioned Bandit. So will Bandit be something that runs inside these SAST tools? Yeah, I'm not super sure about, like, the provenance of the security linting within SonarQube, for example. I think there might be some in-house stuff going on there, but if you're not running something like SonarQube that's already going to do security linting for you, you can run Bandit as a standalone thing.
Any other questions? Thank you all for coming. Especially those of you who are here live in the room in the middle of a Sunday. Thank you.