Here we go. Okay. Well, thank you all for showing up. I know it's been a long week and we're near the end of our Thursday, so thanks for hanging in there and coming to listen to us. The subject of this presentation is securing the OpenStack code base with Bandit. I'm Jamie Finnegan. I work on the security team for HP Cloud with responsibility for the Helion portfolio. We've got Tim Kelsey here as well, who's on the same team as me. We're hoping to cover three main areas today. The first is why Bandit, essentially. Next up, we'll call it developer 101: what is this thing, what is it telling me, and what can I do with it? Then the other thing, developer 201, essentially: how does this thing work and how can I extend it? Hopefully you'll be able to see Bandit in use, go away with some feel for how it works and how you can use it, as well as the internals and how you might be able to add additional tests, extend it yourself, and contribute back if you're that way inclined.

Just a basic introduction: OpenStack is big. There's about 2.5 million lines of code, and about 76 percent of that is Python, according to OpenHub as of yesterday, when I think I pulled these numbers. It's relatively secure, but we in the OpenStack security project team want to find and fix vulnerabilities before anybody else does, and prevent the introduction of new security issues as well. That's the context that we work in. In the session after this one, Rob Clark's going to be presenting on the OpenStack security team; Bandit is one of our initiatives, but there are others that he'll be covering to some degree there. So if you're interested in that, stick around.

So yeah, this is the environment we're working in. About a year and a half ago now, we were sitting around in a mid-cycle meet-up talking about how part of our project team's role is providing GFV as a service — grep for vulnerabilities as a service. We'd find an issue in one project and then figure out a way to identify that issue in other projects across OpenStack. A classic one was shell=True being used, introducing essentially a command injection vulnerability. Grep worked to a point, but grep only gets you so far with Python, with things like whitespace, and things like the fact that True could also be 1 or various other values. It only gets you so far. We wanted to have other options, so we looked at some: improving our usage of grep, using OpenStack hacking and the flake8 tool set, extending Pylint, using PyChecker. But those were more focused on general Python code-base hygiene as opposed to being security-specific. So by the end of that week, we ended up with a proof of concept, essentially an AST visitor, that takes Python source code, turns it into an abstract syntax tree, and then processes it, analyzes it, executes tests against it, and presents possible security issues. And that's what Bandit is. It's a security-specific, Python-specific, easily extensible source code linter. We built it with OpenStack in mind. It's a command line tool; I'll give a demo of it shortly. And yeah, there are various contributors. I know there's a few in the room who've got code in there, so thank you for that. You can see the companies listed there. But it's run under the umbrella of the OpenStack security project. Interesting fact: this guy was offered the role of Han Solo in the first Star Wars movie. He passed on it. In 1977, the second highest grossing movie was Smokey and the Bandit.
The highest grossing movie was Star Wars, so. Anyway, time for a demo. So Bandit is pip installable. We've got source code repos out there as well, but the easiest way to get it is to create a virtualenv first — I'll spell that right — and then just do a pip install of Bandit. That was a good start, wasn't it? Permission denied, sorry. I'll activate the virtualenv. Thank you. I will activate the virtualenv first, and then I will install. There we go, that's better. Okay, Bandit's only dependency is PyYAML; we use that for the config file. If I just run it straight up, you can see the command line arguments. We have a very useful -h, which is exactly what you see here. You can tell it to recurse, so you can feed it a single directory and it'll go find all the Python files in that directory. You can tell it to aggregate its reporting per file or per vulnerability that it identifies. You can ask it to show you context — a number of lines, similar to grep — with the -n argument. You can feed it a config file or a profile. You can ask it to give you text output, JSON output, or CSV output. That's the bulk of it. It also has a debug flag, so if you really want to see the AST pieces that Bandit's actually looking at, debug mode is great for that.

If I actually run this thing: we have an examples directory in our source tree, which has a number of files that contain potential security vulnerabilities in them, essentially. If I run Bandit against that directory, you'll see the run — it runs fairly quickly — and it gives you a relatively long list of issues; obviously in this case the directory is full of them by design, with the examples we have. We use color to indicate good, bad, and ugly, essentially, and then there are the context lines there. We can also do things like filter by severity. So if I give it one -l, it drops all the essentially informational issues. If I give it three (-lll), then it's only going to report the high risk issues. So you'll see here we're filtered down to include only the issues that we consider to be of high severity. That's more or less it for an initial demo. I think, let's swing back over here.

So anytime you talk about static analysis, people immediately think, oh, you're going to give me a list of thousands of file names and line numbers, and maybe 10 percent or fewer of those are going to be actually useful to me; there are going to be a lot of false positives. That is true with many static analysis tools. In our case, we think Bandit being specific to Python and specifically designed for OpenStack, with a customizable test suite, makes it easier to eliminate those false positives than it could be otherwise. The workflow that we've found to be effective for us is to run Bandit locally against an OpenStack project, look at the results, and assess them for accuracy and relevance. You can tweak the tests — you can modify a test directly. You can tune your configuration to enable or disable specific tests. I demonstrated the filter before, so you can make it report only issues of a certain severity level. And there's also a nosec tag: similar to the noqa tag that flake8 uses, a # nosec comment will make Bandit ignore whatever line it's appended to. And then just rinse and repeat, essentially.
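To make that nosec tag concrete, here is a minimal, hypothetical line that Bandit would normally flag, annotated so it gets skipped:

```python
import subprocess

# Bandit normally flags shell=True as a possible command injection risk.
# Appending "# nosec" tells it this occurrence has been reviewed and accepted.
subprocess.Popen("ls -l /tmp", shell=True)  # nosec
```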
The first few runs will probably be painful, but as you get a feel for what tests are relevant to your project, and maybe make some changes to your project itself and to the configuration that you're using, hopefully false positives will become much less of an issue fairly quickly. Obviously, there's the caveat down the bottom here: if you do find serious issues, do the honorable thing and push them through the VMT for appropriate handling. With that, I'll hand over to Tim for a little more information on how it works.

Thank you very much, Jamie. So my name's Tim Kelsey. I'm a security engineer working at HP, and I'm gonna give a kind of high-level whistle-stop tour of Bandit. Bandit effectively breaks down into two main pieces: we have our core, which contains our scanning code, functional bits and pieces, utility code, reporting, that sort of thing, and a fairly extensive library of plugins. Within the core, we've got a whole bunch of different bits and pieces, but the most interesting ones are the node visitor, the context, the test set, and finally the result store. So essentially, what Bandit does is take Python source code and transform it into an abstract syntax tree. To do this, we actually use the built-in ast module, which exposes the interpreter's own compiler mechanisms to user space. So we can be pretty confident that anything that is valid Python — based on being able to run it — will also produce a valid AST that we can analyze. No matter how complicated the code is, it's not a problem for us. So, the node visitor: once we've actually got an AST, we iterate over it using the visitor. You can see here we've got a very basic Python statement, which breaks down into four AST nodes: a variable, an assignment, a method call, and a string. As the AST visitor traverses the tree, we encounter nodes of various types, and we do whatever we've been told to do upon encountering that node type. Most of the time, this is building a context. The context contains various bits and pieces of information that we've acquired thus far, and as we process more and more of the AST, we can acquire more and more information. So we can support things like modules that have been imported under an alias with a different name — things of that nature don't matter, we know what module it was you imported; you can call it anything you like, it won't confuse the tests. So in this case, we've now traversed to the point where we've encountered the call node. We've built our context, and we provide the actual AST node that we encountered, because some tests might find that useful. We have a qual name — that's the fully qualified name of the function, so in this case it's module.doCall — and then the name of the actual element. So here we've actually gone from working with strings and matching characters to working with actual AST nodes. Once you have the context constructed, the next element we're gonna use is the test set. The test set contains the complete list of all of the tests that we have been configured to use from amongst our various plugins. And each of those tests advertises its interest in a particular AST node type by using a function decorator. You can see there in that blue box, the top line, @checks('Call'), indicates that this particular test is interested in processing call nodes — AST nodes where a Python call is actually being performed rather than where a function is being defined; the semantics there are slightly different.
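As a quick aside, you can reproduce the kind of tree Bandit walks with that same built-in ast module; this is just a standalone sketch using the statement from the slide (the names are illustrative):

```python
import ast

# Parse the slide's example statement with the interpreter's own compiler,
# the same mechanism Bandit relies on. Nothing is executed here.
tree = ast.parse("x = module.doCall('some string')")

# Dump the tree: an Assign node whose value is a Call, whose func is the
# Attribute module.doCall, with a string constant as its argument.
print(ast.dump(tree))

# Walk the tree the way a node visitor would, printing each node type seen.
for node in ast.walk(tree):
    print(type(node).__name__)
```

Running this shows the node types (Assign, Name, Call, Attribute, and so on) that the visitor and tests are actually keyed off, rather than the raw text of the statement.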
So we pass the context through, we invoke the check function, and we get back a result. The results that come back from these calls are effectively a tuple containing three values. The check will do whatever it's been programmed to do: it will either identify nothing or, if it finds an issue, it will return this tuple. We have a severity value — a low, medium, or high score — which indicates how severe we feel, or the author of the test feels, the particular issue they've discovered is from a security standpoint. We have a confidence. Now, some tests have an element of fuzziness to them, so the confidence value effectively indicates how happy the author of the test is that they have identified the specific issue they're trying to check for, based on the pattern of AST nodes and the context they've been presented with. And finally, an information string, which just lets us report some useful feedback to whoever's running these tests. We actually have another undertaking within the security project to create a catalog of security best practices, guidance, notes, and bits and pieces that we'll be referencing from within these Bandit information strings.

So once we've run all of that and we've got our result set complete, we produce a summary, and it looks a little bit like this. This is a very simple summary: we've just tested a single file from within our example set, in this case the random module file. You'll notice that it's got a score of 39 there. That's kind of a legacy mechanism that we were using for our own internal functional testing: based on the number of issues and the severity of the issues, we would score the file, and then we could check that the score didn't change after we made modifications. We actually don't need that anymore, so it's probably going away. I hope no one's actually using it; if you are, let me know, because with the new confidence system, things are a little bit different. Anyway, there we can see we found the import random statement — that's the first statement in the summary there, labeled one — and we printed out the appropriate message that you should not be using random for anything cryptographic or security related. I'm sure we all know why. And you'll notice we have a severity of low and a confidence of high. That's because what we've actually tested for here is whether or not you imported random. Just because you imported it doesn't mean to say that you're going to do anything that's not particularly sensible with it — hence our severity is low; there's a chance there, but we don't know for sure that's what you're doing. Our confidence, however, is high, because what this test is doing is asking, did you import random? Well, yes, yes you did. So we're very confident that the check is correct, but it may not actually be that big a deal.

We also, as Jamie mentioned, support output in other formats. Here is the exact same summary again, but presented as JSON, and we also support CSV output as well. That's designed to allow people to integrate Bandit into a more complete pipeline of processes that might want to feed one into the next, or maybe generate some form of web report or whatever — it's machine readable. So finally we come to our plugins. We have quite an extensive library of plugins: so far we have 18 distinct plugins, and each one of these encapsulates a number of related checks.
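Purely to illustrate the shape Tim describes — a decorator that advertises an AST node type, and a (severity, confidence, text) result — here is a self-contained toy sketch. This is not Bandit's actual plugin API; the checks decorator, the telnet_used test, and the bare driver loop are all invented for illustration:

```python
import ast


def checks(node_type):
    """Toy stand-in for the decorator Bandit uses: tag a test function with
    the AST node type it wants to be called for (the real plugin API lives
    in the Bandit source tree)."""
    def wrap(func):
        func.node_type = node_type
        return func
    return wrap


@checks("Call")
def telnet_used(node):
    # A real Bandit test receives a richer context object; here we only look
    # at the raw Call node. Return (severity, confidence, message) on a hit.
    if isinstance(node.func, ast.Attribute) and node.func.attr == "Telnet":
        return ("MEDIUM", "HIGH", "Telnet detected; traffic is unencrypted.")
    return None


# Tiny driver: walk a snippet and run the test against every matching node.
tree = ast.parse("import telnetlib\nconn = telnetlib.Telnet('example.org')")
for node in ast.walk(tree):
    if type(node).__name__ == telnet_used.node_type:
        result = telnet_used(node)
        if result:
            print(result)
```

The point of the sketch is only the pattern: the decorator routes nodes of one type to the test, and the test hands back the severity/confidence/text triple that ends up in the report.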
So, for example, the insecure SSL/TLS plugin: that's a whole library of checks against various versions of SSL that you might have specified in your code base. So if you say, please give me SSLv2, we will say no. Well, we won't say no, but you should say no. And then, for example, the blacklist imports plugin there — those are functionally similar tests rather than being related by issue. You can specify in your configuration a list of modules that you consider not suitable for importing, and we will then flag it up when those imports happen. That could be very interesting if you wanted to do something like, say, catch the use of a module that's not in the global requirements list — you could configure Bandit to catch that in a check there. So there are some interesting non-security-specific uses as well.

So what can we actually test for? Well, as I said before, we use a decorator to advertise interest in a particular node type. Currently the node types we support are call — calling a function — import and import from, which are basically the same thing but slightly different, exec, string, and assert. It's not a very big list, but it gets us quite a lot of mileage actually. And it's very easy to add any of the other AST node types in there should they be useful. So here's an example of a very simple test. This is a check for the use of exec. That's a keyword in Python, masquerading as a function, but it is a keyword that will execute an arbitrary string as if it were Python code. Potentially there are plenty of security issues with that. So we've got a severity of medium, and a confidence of high because exec actually has a unique AST node that nothing else uses, so we're pretty confident that is what it is. And then our text summary.

So one of the interesting things with Python is that it is incredibly dynamic, and this is where analyzing the AST is infinitely better than using grep or regular expressions or what have you, no matter how complicated you want to make them. For example, these four statement blocks here are effectively doing the same thing: we're using the subprocess module and we're calling Popen with shell=True. There are various different ways of doing it — different ways of using the import semantics to present the Popen call either fully qualified, abbreviated, or aliased, shell equal to 1 rather than equal to True. So there are a few different ways you can say exactly the same thing, and all of these statements will produce effectively an identical AST once we've run them through the AST processor. So we just write an analysis of those AST nodes and we can catch all of that, no trouble.

So, adding tests — it's demo time again. Right, so what I'm gonna do is actually modify an existing test. I just wanna show people how easy it is to add tests to this system. So, oops. This is our test for the use of assert. Now assert, much like exec, has a unique AST node type. And it was discovered fairly recently that there were some projects using assert to ensure various requirements were met in interface calls and bits and pieces. That's fine, except assert gets removed if you compile with optimization enabled — it's actually stripped out completely, and obviously those checks are no longer there and anything goes. So we added a test into Bandit to detect it. What I'm gonna do is just change this enum value; the check is almost identical to the exec example we saw earlier.
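Going back to the shell=True point for a moment, here is a small sketch of those equivalent spellings (the command string is purely illustrative):

```python
import subprocess
from subprocess import Popen
import subprocess as sp

# Four spellings of the same operation. A grep pattern would need to know
# about every one of them; once the visitor has tracked the imports and
# aliases in the AST, they all resolve to the same fully qualified
# subprocess.Popen call with the shell option enabled.
subprocess.Popen("ls -l", shell=True)
Popen("ls -l", shell=True)
sp.Popen("ls -l", shell=True)
subprocess.Popen("ls -l", shell=1)  # True spelled as 1
```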
Back to the demo: we're just validating that we've received an assert. Now I'm gonna run Bandit. Actually, I'm gonna create a test file first. So, first off, okay. I'm gonna run Bandit against that file and point out a couple of interesting things. There we go. You notice that the text there is highlighted in yellow because this is a medium severity result; I just changed that from a low severity result. Very simple to do. So you can adjust the severity of the individual tests based on how severe you think the issue actually is for your project, and then of course you can set thresholds and filters to make sure you don't display results which fall below the required severity or confidence levels. And you notice it says that the location of the issue it's discovered is assert.py line two. Line one contains the word assert, but only in a comment, which is completely benign of course — a simple grep would obviously have flagged line one. The test indicates the issue is on line two. And we've given you three lines of context there, but you can configure how many lines of context you get. It's designed so that a developer who's working on a bit of code and runs a check can easily find where exactly that problem is.

So, Bandit gate tests. Bandit is a pretty useful tool to run as a standalone helper — just run it locally against your own stuff — but where it really has value, and what we've really built it for, is inclusion in the CI/CD gate tests. It's actually really easy to do this. I mean, we didn't wanna make it problematic for people, having to fiddle around with a bunch of complexity to get their stuff into a gate test; we wanted this as simple and as easy as possible. So it is pretty straightforward. The first thing you need to do if you wanna add Bandit into your project is to provide a bandit.yaml file. This is our configuration file, where you can specify the specific tests that you want to run, where you can provide any configuration information — like blacklisted modules, whatever — that is specific to your project, and where you can configure various thresholds and bits and pieces. The second thing is to add a test-requirements-bandit.txt. This file is identical to the existing test requirements file that you already have in your project, and it does the same job: it's just specifying the dependency packages to build the environment that Bandit will run in, and that's just PyYAML and Bandit itself. You could mix that in with your existing test requirements if you wanted to, but the way we've seen it done at the moment is to use a separate file. And finally, you need to edit your tox.ini to add an environment for Bandit to run under (there's a rough sketch of what that looks like at the end of this bit). This is all documented on our wiki; in fact, there are boilerplate sample files you can just copy and paste and you're done. It's really easy. Once you've done the necessary changes to your actual project, you need to make some changes to the infra project. This is the same as you would for any other gate configuration that you might want to do, so there are no surprises here. And the whole process probably takes less than 20 minutes to go from zero to a full Bandit test environment.

So who's using Bandit at the moment? Well, we know for sure that Magnum, Barbican, Anchor and Keystone are all using Bandit in their gate tests in one way or another. That might be as an experimental gate or a non-voting gate, but it's in there. And outside of OpenStack, we know HP are using it, because we are. Rackspace are using it. And also Uber are using it, which is pretty cool.
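For reference, the tox environment mentioned a moment ago tends to look roughly like the following. This is a hedged sketch rather than the exact boilerplate from the wiki, and "yourproject" is a placeholder:

```ini
# tox.ini (excerpt) -- exact flags may differ from the wiki boilerplate.
[testenv:bandit]
deps = -r{toxinidir}/test-requirements-bandit.txt
commands = bandit -c bandit.yaml -r yourproject -n 5
```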
There are probably others too — it's out on PyPI; you can download it and run it. So if you are using Bandit, or you feel like you might want to use Bandit, then feel free to contact us. You can either jump into the OpenStack security group — I beg your pardon, security project — meetings that we have weekly on IRC, or grab us at the end of this talk, whatever. And with that, I will hand the floor back to Jamie, and yeah, he's gonna talk to you about our future plans.

Cool, thanks, Tim. So hopefully that's given you all a pretty good feel for how you can use Bandit and also what its internals look like and how you can extend it. As far as roadmap goes, we're pretty happy with where Bandit is at at the moment. It's been useful to us. I've used it, I guess you could say, in anger against OpenStack projects and against internal HP projects, and found security vulnerabilities with it. It does take some work to tune, but yeah, it's been useful to me and I know to others. So really at this point, we wanna get it adopted. We're talking to as many projects as we can to get it pushed into gate tests. Like Tim said, we can start experimental, non-voting. We're also quite happy to have somebody from our team, or from the security team, work with you in your projects to do that initial triage, whittling down the false positives and building a config that works for your project. As far as Bandit itself goes, we'll be looking at adding more plugins and tests in future, both to address other vulnerabilities known at the moment and, when new issues are identified, we'll be adding tests as we go. We're also looking to improve and extend the framework: to improve the UX a little, to build out the security project's set of documentation that we'll be linking to, and to build more context — a better view of the state of a file when we analyze it — to improve the level of analysis we can do against a source code file. Taint checking is something that we've talked about doing; we're also concerned about making sure we don't go insane while we try to do that. And then security review tagging is another use case that we've talked about. So if Bandit's running in a gate and it sees that there's a change coming through that has, let's say, something crypto related in it, then it could potentially add a tag to that Gerrit review saying, hey, maybe somebody with security expertise should look at this change. Just another thought that we've got in mind. So yeah, next steps for the community: we'd like to get it out there. We've got a wiki page that has links to the repo and various other information. The readme is worth a look; that's in a reasonably accurate state at this point, I think. Use it locally, use it at your gate, and yeah, come join us in contributing if you're that way inclined. Thank you very much. I think with that we'll call it, and if you have any questions there's a mic over there that they've asked us to have people use.

Hi, this is great work. Thank you. I had a question about the Jenkins part, the gate, because it seems like if you run this on a particular project you would get some errors and you'd kinda look at them and go, yeah, we understand, those are false positives. So do you have a way of establishing a baseline, like we expected these 20, or things like that? Is that how you integrate it into this gate check?

So at the moment we're not doing anything like that. You can set the gate check up to be non-voting, in which case it's purely an informative check.
You get a certain amount of mileage out of that, but it depends on people actually going to look at the results. And you can configure your actual test set so that only those tests that you're particularly interested in will run over the code. If you get a particularly noisy test which is generating a whole bunch of false positives, you can just disable it or configure it, you know. And you can set thresholds for the confidence level and so on and so forth. So you can do some tuning to get that down, but at the moment we don't have any sort of baseline mechanism in place. Maybe we should — that's a good thing to have. I'd be interested in that. So I don't know, maybe we can even help you with that. Brilliant, absolutely fantastic. Thank you.

Static analysis and linters in general find a lot of false positives. If I've got false positives in my code, how do I stop Bandit bugging me every time it sees them?

Yeah, okay. So we have — I mean, most people are familiar with the noqa comment that you can use to switch off stylistic issues that are not, you know, not relevant or just plain annoying. And we have a similar mechanism, noqa — I beg your pardon, nosec? Yes, it's nosec. So if you tag a line of code with nosec as a comment, then Bandit will ignore that. So yeah, you can use that to quiet, you know, spurious results. Is that it? Cool. Sounds good. Thank you. Thank you very much.