My name is Anthony Harrison. I'm speaking to you from the UK. I'm sorry I can't be with you, but the logistics didn't quite work for me. Today I'm going to describe software bills of materials and how a number of Python tools can be used to help. Don't worry if you don't understand what a software bill of materials is. I'm going to introduce the key concepts through this presentation.

A bit about me. I'm a solution architect and a cybersecurity consultant with a particular interest in software security. My journey into Python was primarily through education outreach and teaching students the joys of coding, often through the CoderDojo Foundation. In the past few years I've been supporting the Google Summer of Code initiative, and in particular one of the Python Software Foundation applications. You'll see that later in this talk.

Our agenda this afternoon will provide a brief introduction to what an SBOM, a software bill of materials, is, before describing a number of tools which could form part of your development lifecycle. I will then conclude with a brief summary.

So what is a software bill of materials? Who's going to use it, and why should I be interested? Briefly, a software bill of materials is a formal set of machine-readable metadata that uniquely identifies your software package, typically including version and supplier information. Contents may also include other relevant information, typically copyright and license data. A good analogy is to think of an SBOM as the list of ingredients in your product. SBOMs are typically designed to be shared across organizations, hence the importance of a formal definition, and are particularly helpful in providing transparency of the software supply chain. Organizations are increasingly concerned with software security in the supply chain, so SBOMs are increasingly a cornerstone of a cybersecurity strategy.
SBOMs aren't that new. They've been around for more than a decade, but I think in the more recent past they've become more important, and in the last 12 months I've noticed a significant upturn in interest in SBOMs across industry. If we look at last year, in May 2021 the US President's Executive Order on improving cybersecurity identified SBOMs as a key enabler for providing greater transparency into the construction of delivered software. Then in December we had the famous Log4j vulnerability. I know that's a Java incident and you may wonder why it's interesting to a Python programmer, but what it demonstrated to me was the need for SBOMs in an organization, because organizations really struggled to say whether they were impacted by the Log4j vulnerability. Did they know they actually had Log4j in their software products? What it also identified was the importance of software dependencies and the different types of dependencies. More about that in a minute.

In February the Linux Foundation produced a research report looking at the awareness of SBOMs across industry and organizations. They came up with two conclusions: that SBOMs were going to enable better transparency in the supply chain, and that they would also provide a better understanding of software dependencies. And finally, in May, OpenSSF published a number of recommendations about improved cybersecurity practices, particularly in open source. One of the recommendations was simply titled "SBOMs Everywhere". That's a wonderful statement, with the aim of improving SBOM tooling and training to drive greater adoption. I'm expecting later this year to start seeing some industries mandate SBOMs as part of their product delivery, particularly in areas like the healthcare market. So you can see there's a lot happening.

And if we also look at dependencies, a typical Python application is a bit like an iceberg.
We write our code, maybe import a package, test it and then release it. However, have we really thought about what we've done with a simple import of a package? In addition to the obvious benefit of saving us writing code, it may also have brought additional baggage with it, often hidden. You can often see this when you do a pip install and you get more than one package being installed. However, these packages are often hidden, certainly from your code, because you don't have any import statements visible for them. So these hidden dependencies may also be hiding vulnerabilities, which might be of interest for security. You can see this on the right-hand side, where a red package is installed. The blue blobs are the direct dependencies and the green ones are the implicit or hidden dependencies, just like an iceberg.

There is a minimum set of elements that you need in an SBOM. Obviously the product name, its version and the supplier name, but the dependency relationships are also key. So is the timestamp of when the data was assembled, because which software releases we were running at a particular time could be quite an important part of a security audit. The author of the SBOM data also doesn't necessarily need to be the creator of the software. It could be the consumer, who is producing it maybe as part of their overall software audit or software asset management.

I can't remember how many times I've come across multiple standards wherever standards exist, and SBOMs are no different. There are two standards: SPDX, which is promoted by the Linux Foundation, and the CycloneDX standard, which originated from the OWASP organization. The SPDX format is the older of the two. It has been around for more than a decade and is now an ISO standard. It is under continuous revision, and a major upgrade is expected later this year which is going to have better support for security vulnerabilities.
SPDX supports the production of SBOMs in a number of different formats, including JSON, tag-value (essentially text), YAML and XML. CycloneDX is a community-driven standard, so it's slightly more dynamic, and it provides SBOMs in both JSON and XML format. Although there are two standards, there is good interoperability between them, so you can transform an SBOM from one format to the other relatively straightforwardly.

So now we hopefully understand what an SBOM is. Let's now look at some of the tools and how we can use them in a development lifecycle. The first item I'm going to consider is the production, or generation, of an SBOM for a Python module. When I started developing a module, there were three issues I was particularly interested in. One, what was the best way of capturing all the package dependencies? Secondly, how could the Python ecosystem support SBOM creation? And thirdly, how difficult is it to generate the necessary content in the correct format for an SBOM?

So let's look at dependencies first. I think we need to distinguish between explicit dependencies and implicit dependencies. In Python, explicit dependencies are typically specified within a requirements.txt file or equivalent and are therefore relatively easy to identify. However, what that file doesn't do is say explicitly which version of a component is installed. There may be some version constraints, but it doesn't necessarily have the exact version, and it doesn't identify the supplier of the module. Implicit dependencies are the hidden dependencies, the dependencies of dependencies. So how can these be determined? Well, the Python ecosystem comes to the rescue, in particular pip. We can see that pip show displays a lot of the metadata associated with an installed module, and most of this data is useful in the SBOM. We can see the name of the module. We can see the installed version and the supplier or author.
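The same fields that pip show prints can also be read from Python. The following is a minimal sketch, assuming Python 3.8 or later and the standard `importlib.metadata` module; `describe` is a hypothetical helper name used only for illustration, and it is not how any particular SBOM tool is implemented.

```python
from importlib import metadata


def describe(package):
    """Return the SBOM-relevant metadata of an installed package, or None."""
    try:
        meta = metadata.metadata(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return None
    return {
        "name": meta.get("Name"),
        "version": meta.get("Version"),
        # Author is optional; some projects only fill in Author-email.
        "supplier": meta.get("Author") or meta.get("Author-email"),
        "license": meta.get("License"),
    }


if __name__ == "__main__":
    # Any installed package name will do; pip is only an example.
    print(describe("pip"))
```

Any of these values can be missing or free-form, because package authors fill them in by hand.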
pip show also lists the license. All good stuff. But it's the Requires attribute that I found most interesting, because that shows the list of dependencies, and that list can then be used to find the implicit dependencies, by treating these dependencies as a list and examining the metadata for each of those packages. So in this example we've got pytest, and we can see attrs is a dependency. We can then go and find the dependencies of attrs by just repeating the process recursively.

So there's a module that we've created called sbom4python, which generates an SBOM file for an installed module and is compliant with the minimum content of an SBOM. It produces machine-readable output in one of the two standards and is currently available on PyPI for evaluation.

But while I was generating SBOMs, I found that the consistency of the metadata was variable, in two attributes in particular. If we look at the license metadata, even for standard licenses like Apache or MIT there was no consistent way of specifying them. If you look on the left-hand side, you can see all the instances I found of the way that Apache 2 was specified. All probably quite readable, all quite understandable, but not consistent. Both SPDX and CycloneDX use the SPDX license list, and the formal definition of Apache 2 is "Apache License 2.0" or, in its short form, "Apache-2.0". You can see on the right-hand side the MIT license, where the only difference between the entries is the case, a capital L versus a small l. It would be really good if Python modules adopted a consistent approach when specifying their licenses. Maybe the SPDX license list would be a good standard to follow.

Similarly for supplier identification, the SBOM standard wants to identify whether it is an organisation or an individual person who has created the module.
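The recursive walk over the Requires field described above can be sketched as follows. This is an illustration only, assuming Python 3.8 or later; `requirement_name` and `walk_dependencies` are hypothetical names, and the sketch is not the sbom4python implementation. It ignores environment markers, so optional extras that happen to be installed are listed too.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare project name from a requirement string such as
    'attrs>=19.2.0' or 'colorama; sys_platform == "win32"'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0) if match else None


def walk_dependencies(package, seen=None):
    """Return {name: version} for a package and every installed package it requires."""
    if seen is None:
        seen = {}
    # Normalise the name so 'Foo_Bar' and 'foo-bar' count as the same project.
    key = re.sub(r"[-_.]+", "-", package).lower()
    if key in seen:
        return seen  # already visited; this also stops dependency cycles
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return seen  # optional or platform-specific dependency, not installed
    seen[key] = dist.version
    for requirement in dist.requires or []:
        name = requirement_name(requirement)
        if name:
            walk_dependencies(name, seen)
    return seen


if __name__ == "__main__":
    # Prints an empty dict if pytest is not installed.
    print(walk_dependencies("pytest"))
```

Recording the version at each step gives the name and version pairs that the minimum SBOM content needs for every component, direct or hidden.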
The metadata in an SBOM needs to say which it is, an organisation or an individual, but how can we determine that from the metadata that's captured in the Python module? Can we just work on the number of words, for example? Is the name of the organisation consistent? Probably not. Maybe again we need to think about some consistency. Is there a convention we should be adopting when we specify the metadata, to say whether it refers to an organisation, an individual or a set of individuals?

This is what an example of an SBOM looks like. This is the format which is called tag-value. You can see the key attributes, things like the package supplier name, which I've identified here as an organisation, the version and the license. There are also various other things, like the date that the file was created. This is all quite understandable and is ideal for being passed into other tools to process.

Let's now look at managing an SBOM, because it can support a number of use cases. Remember I said earlier that SBOMs were increasingly being used in activities to support cybersecurity. These are the typical questions that people are going to be asked: am I using this product? Am I using this version? I wish I had had a tool like this when Log4j was around, because this would have been a really efficient way of identifying and triaging so many products. A key use case is vulnerability analysis. We'll talk about that in a minute. The SBOM manager manages a collection of SBOMs in either CycloneDX or SPDX format. It allows you to readily identify the components that have been included in the software builds, and it supports these use cases. Again, this is available on PyPI for you to enjoy.

Here's an example. Am I impacted by a vulnerability? If we look at this vulnerability in sudo, it's asking: do we have a version of sudo installed that is earlier than 1.9.5 patch 2?
By just interrogating the list of SBOMs that have been loaded into the manager, we can see that yes, we have a version of sudo in one of our SBOMs and it is a version less than 1.9.5.2. We can say we are possibly impacted. It obviously needs a little bit more analysis, but at least it's been an efficient triage to identify whether I'm impacted or not. This is the tool that I would love to have had in December. If I'd had all my SBOMs and a set of Java applications, that would have been a really happy day scenario.

Finally, I want to look at how we can use SBOMs and scan them for vulnerabilities, and introduce you to another very useful tool called cve-bin-tool, which was initially a binary scanner to determine whether binary files contain potentially vulnerable packages or libraries. But it also scans SBOMs for vulnerabilities in their components and produces a list of the components with reported CVEs and the associated severity. This is the tool that Google Summer of Code is enhancing, so there are a number of students currently actively working on it to add new facilities and capabilities. This is the output that it will generate. As you can see, it's a list of products, versions and vendors or suppliers, with the associated CVEs and severity. A word of caution, as with any of these types of tools: security scanning doesn't guarantee that it has found all the vulnerabilities, nor does it guarantee that you are affected by all the vulnerabilities it reports. But let's hope it's an aid to your process.

So, nearly finished now. I think we've now got a suite of tools that can support SBOMs, particularly as part of a DevOps or DevSecOps pipeline, and particularly for managing your security exposure and the security vulnerabilities of your installed products.
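The "am I possibly impacted?" triage from the sudo example comes down to a version comparison across every stored SBOM. A minimal sketch follows, assuming each SBOM has already been reduced to a list of name and version pairs. The inventory data, `parse_version` and `possibly_impacted` are all made up for illustration and are not the SBOM manager's API, and the comparison only handles purely numeric dotted versions.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.9.5.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def possibly_impacted(sboms, product, fixed_in):
    """List (sbom_name, version) pairs where `product` is older than `fixed_in`."""
    threshold = parse_version(fixed_in)
    hits = []
    for sbom_name, components in sboms.items():
        for component in components:
            if (component["name"] == product
                    and parse_version(component["version"]) < threshold):
                hits.append((sbom_name, component["version"]))
    return hits


# Made-up inventory: two SBOMs, each reduced to a list of components.
sboms = {
    "web-server": [
        {"name": "sudo", "version": "1.8.31"},
        {"name": "openssl", "version": "1.1.1"},
    ],
    "build-agent": [{"name": "sudo", "version": "1.9.5.2"}],
}

print(possibly_impacted(sboms, "sudo", "1.9.5.2"))
# prints [('web-server', '1.8.31')]
```

A hit only means "possibly impacted": as the talk says, it still needs further analysis before you know whether the product is actually affected.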
So to summarize, hopefully you now know a little bit more about SBOMs and a little bit more about the importance of understanding your dependencies. I think the Python community has a role to play in trying to ensure that we have consistency across our metadata, to support SBOMs being used in the wider software and systems ecosystem as a way of supporting better vulnerability management processes. And just a thought: is it now time that, when we do a pip install, we actually create an SBOM and do some vulnerability scanning as an integral part of the installation?

There are a few references there to resources I've mentioned in this talk. Copies of the slides are available in my repo on GitHub, where there's also access to some of the tools. And there's access to cve-bin-tool on Intel's repo as well. Thank you very much. Any questions?

Hi, thanks a lot. You mentioned the presidential notification about SBOMs. Are you aware of any UK or EU based moves to put them in place as an aid to cybersecurity?

Not currently. I've asked about the UK, and they have said that they will just follow best practice initially. This is the National Cyber Security Centre. They hope that industry will probably dictate best practice, but there's no formal requirement at the moment, and I'm not aware of anything from the EU. Maybe, being in the UK, I'm less focused on the EU these days after Brexit.

Thank you for the talk. The creation tool you showed there, sbom4python, is used for finding information about Python dependencies. Do you know of any libraries that can help with non-Python dependencies, such as operating system libraries and things like that?

A good question. There appears to be no tool that is a one-size-fits-all tool. The tools tend to be focused on a language ecosystem, so Python, Go, Rust. For operating system dependencies, I'm not aware of anything.
I think I've seen something from Google looking at things like Kubernetes infrastructures and containers, which generates SBOMs for a container image, but I haven't seen one for an operating system distro.

Yes. Thank you.

I'd like to create one, because I can see there's a really important need for something like that, to see what's installed and what's running in your operating system.

Yeah, it seems like you have to use a patchwork of tools at the moment to get a full bill of materials.

Yes. Thank you. I'm not aware of any tools that cover all the bases. I don't know whether that will change over time.

So, thank you very much Anthony for your talk. It was very interesting.

Okay, thank you very much. Bye.