Good afternoon, everybody, and welcome. Thank you preemptively for your attention. Welcome to Untrusted Execution: attacking the cloud-native supply chain, so hot right now, and I hope to give you an attack-driven defensive perspective. I'm Andy. I'm CEO at Control Plane. Very lucky to have a number of my excellent colleagues here today. We have a booth; please do stop by. We've got all sorts of good stuff, including the supply chain security best practices flashcards, covering such interesting things as eBPF for observability, content addressability for immutable artifacts, everything from VEX through to FRSCA, and we'll talk about some of those concepts. I am also the author of Hacking Kubernetes, along with my esteemed friend, Mr. Michael Hausenblas. We'll do a book signing at half past four, so that is all the advertising briefly out of the way. I'm lucky to be involved with the CNCF's Technical Advisory Group on Security, where we assure the open-source CNCF projects that look to achieve sandbox or incubation status. The goal there is to provide a safe space to collaborate with maintainers, and some of the things that we'll talk about today, involving threat modeling and best practice, are how we interface with those projects coming through. I'm also, in my spare time, CISO at OpenUK. OpenUK is an open-source advocacy group: a charity that asks UK government and the private sector to remunerate open-source maintainers for their time shipping security fixes, and that helps to advise the UK government on policy. My background is development. I've also got a deep security interest, of course, but these things have to be usable. There's no point shipping a security control that is constrictive, that has no quantifiable benefit, and that ultimately places a long-term maintenance burden on the users of the system. Developer retention is important, and needless security controls are diametrically opposed to it. The book is also available as a PDF for download. 
I lied when I said I was done shilling for myself; that is the end of it, I assure you. Control Plane: we are security specialists and open-source cloud-native advocates. We have just opened an office in North America, and we're about to open an office in Asia-Pacific, in New Zealand. If cloud-native security challenges, nice people, and interesting work are your bag, please do come and have a word with one of my colleagues to understand what we do. Okay. We'll talk about the supply chain problem: what is it, why is it a problem, how is it directly exploitable, and generally the problem space. Then we go into the threat model of the thing and how we quantify the controls, and then we look at the tooling that we can use to help secure ourselves. This is how hackers operate. Be very afraid. Okay, so what is the supply chain problem? A supply chain is anything we depend upon. We know this: we have producers and consumers. At any point that we publish code on GitHub, we become producers, and we end up as part of the supply chain problem. Supply chains are long, they are transitive, they can be difficult to reason about, and they exist in all aspects of life, from, as we see here, pharmaceutical and military through to the things that we care about today: open-source technology. So what is a supply chain attack for shipping open-source code? Here we have, first of all, our trusted producers. This is anybody who publishes code to GitHub, but from a trust perspective, of course, we trust root CAs, we trust our operating system distributors, we trust our hardware manufacturers, and we actually trust the postal service that ships laptops to us. Plenty of interception potential there. From a software perspective, though, we're just talking about the software dependencies. If we can intercept that code in transit, or we can push a malicious update, or we can compromise the SSH keys or the GPG keys of somebody who's shipping software, then this supply chain ends up looking like this. 
That is the nefarious Dread Pirate Captain Hashjack, and his primary goal in life is to run his malicious code, his implants, in our production systems. What could he possibly ship that would be a problem? Classically, it has been something like a crypto miner, although that is slightly less appealing these days: a lot of cloud providers have detection for these things, and they will shut down a crypto-mining instance. So actually a reverse shell, just getting command and control within somebody else's infrastructure, is a perfectly reasonable thing to do with a supply chain attack. What is a reverse shell? Well, it's what happened with the interpolation attack in Log4Shell. It is running malicious code on a victim's device that is then able to connect back to a public endpoint that we as an attacker control. From that perspective, we then gain command and control. Why shouldn't this happen? Well, we should be dealing with correct firewalling for egress, but of course it is expensive for a security team to maintain outbound firewalls. So, as we saw from Log4Shell, as soon as you are able to execute code out of context, which in that case was an interpolation attack, you can do this kind of thing. You can fire a TCP connection to somewhere that the attacker controls, and then they have a shell running inside your infrastructure. There are multiple layers of control that can help to detect and prevent this, especially in cloud-native, Kubernetes-flavored systems. But this is the fundamental desire for an attacker if they're looking to persist and then pivot and escalate their access. How could that work in a Kubernetes system? Well, as we see here, we have got a CI/CD pipeline in pink on the left-hand side of this diagram. What happens? Well, first of all, this malicious code in a supply chain either gets built into an image or into a dependency. 
Just pulling and running something from Docker Hub is a classic version of this, because ultimately, as the packager and distributor, they don't necessarily look at what's in that image. In this case, the malicious dependency goes into the CI/CD pipeline and is built and trusted. That then runs in a production system. This is our reverse shell firing back with the payload. And what has happened here? A malicious update to a software package has resulted in a compromise of a production system. As I say, we should have multiple layers of control that prevent this kind of thing, but this is exactly the Log4Shell pattern, except for the fact that that was actually in the software for a long time. We could still see a supply chain attack, a replay attack, where somebody forces a victim to install an old version of Log4j and the exact same dependency attack is in play. So, how do we actually attack a supply chain? What can we do? We can infect a trusted supplier. This is somebody who has commit access or push access to a package repository or to GitHub or some version control system. That's either infecting the dependency or the package itself. Of course, we have this question of transitive dependencies, which means we get into SBOM and CVE scanning. We have to understand exactly what we have installed, because the attack surface is the size of every dependency running in that package. We can infect the source code itself. So it's not attacking the dependency; it is attacking the program, say the web server or the application that we're shipping. We can play with the build infrastructure as an attacker. This is the SolarWinds SUNBURST-style attack, where attackers were able to infect and then establish a presence in the CI/CD infrastructure and tamper with the build in flight in a difficult-to-detect way. 
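One of those layers of control, the egress firewalling mentioned earlier, can be expressed in Kubernetes as a default-deny egress NetworkPolicy. A minimal sketch (the namespace name is illustrative):

```yaml
# Deny all egress from pods in the "prod" namespace by default;
# a reverse shell then cannot dial out unless explicitly allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: prod        # illustrative namespace
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Egress
  egress: []             # no egress rules: all outbound traffic is denied
---
# Then allow only what is needed, for example DNS to kube-system.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: prod
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
```

The design point is allowlisting: start from zero egress and add destinations deliberately, rather than trying to enumerate attacker endpoints.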
The cloud-native community has responded with tools like Witness, which we'll talk about briefly later, that can now detect this kind of thing, but it is a nuanced and difficult attack to detect without some form of consciousness of it. Or finally, infecting the runtime environment: why worry about a package when you could infect or persist on the underlying host? So what then happens if we're compromised? Well, all the negative outcomes. Ultimately, APTs, advanced persistent threats, are the worst nightmare for an organization, because they result in probably a slow drain of your IP or your key material, or potentially allow the attacker to attack somewhere that that producer is being consumed. For example, the Okta breach: that was a support company who were breached, and who had access into the Okta systems themselves. Cryptojacking and extortion: while crypto mining is no longer quite so lucrative, encrypting all of an organization's assets and also leaking them is a pretty bad day. It means that even with restore from backup, there is potential IP loss. And then we have, of course, further supply chain attacks downstream of the producer. TAG Security maintains a catalog of different types of supply chain compromise. It is interesting to peruse; it is kept up to date and gives a view on the different categories. It's not an exhaustive list, but it is a categorical exposition of the types of compromise. Okay, that's the abstract background. Let's have a look at the problem space. Why is this a difficult thing? Well, the problem space is very large. Software supply chains can be super complex and multi-dimensional, and not just in terms of technology. There are also people involved, and generally people are the biggest threat. Technology is rarely the most complex part of the problem. The flexibility and adaptability of supply chains is also a threat. 
We potentially introduce non-deterministic behavior, and that is something that we need to manage. So how do we navigate this problem space, through these dark waters, without being hijacked by nefarious pirates? Threat modeling. This is the process of quantifiably building a map of our systems and applying security controls. We might use different forms of diagram, like an attack tree, to look at the kill chain, the specific steps by which an attacker compromises the system. On the back of these, we have a quantifiable model that we can then use to apply controls. Again, this concept of developer retention, and not making a system constrictive or overly restrictive to use, is really important to maintain and develop the trust between the security and development parts of an organization. So how do we go about threat modeling? Well, there are some differences. For the sake of this presentation and the next steps, we'll look at a high-level reference architecture here, purely as an example. Artifact storage may include binaries and images. Metadata depends on the stage, but may include our test reports, vulnerability reports, pipeline execution records, and SBOMs. Build might have a local runtime with temporary storage for artifacts before they are pushed to the main artifact store. And we're looking at node workloads; pipeline attestors and observers, which are capturing verifiable metadata from the pipeline processes; and finally an admission controller, both when the pipeline is instantiated and when workloads are deployed to production. We will come back to that. What is threat modeling, then? A systematic approach that democratizes the gathering of information and the surfacing of threats. Development teams, test teams, product owners, and security people: different individuals in that group know where different bodies are buried. 
And bringing everybody together, again democratizing the information-gathering process, leads to a significantly better result than security working in isolation, for example. Okay, so threat modeling, in the STRIDE model popularized by Microsoft: what are we building? What could possibly go wrong, an exercise in extreme catastrophization? How can we reduce the risk of catastrophe occurring? And then finally, are we ever finished? Well, no. Threat models expand into the time allocated, so we scope them in order to have a reasonable chance of completing things, and rinse and repeat. So, what are we building? Well, we're looking at the supply chain and the flows of data within it; use cases as well, anything that is supplied or produced; the business impacts, confidentiality, integrity, and availability, the classic CIA triad; and the operating model, the consumption and use of that thing by teams. A quick shout-out to my colleague Nick Simpson from the Control Plane security team. We have toxic combinations here, and we can reference individual threat models for each specific piece. The point here is decoupling, downscoping, making sure that we have an attainable problem space in which to work. Because, as I say, work expands to fill the time; threat models will easily expand into the available space that we give them. So, what's the worst that could happen? The way that I personally enjoy doing this the most is to operate in a completely open free-for-all. People just throw threats out, and you basically invent the very worst things you could imagine occurring to a system. That can go anywhere, from "a cipher is cracked and someone was recording the information" onwards. We look back to previous TLS revisions: it has happened. But what's the actual probability of it happening? What's the impact? Well, the probability is probably quite low, but the impact would be reasonably high. 
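That likelihood-times-impact calculation, and the ordering it gives us, can be sketched in a few lines. The threat names and 1-to-5 scores below are invented for illustration, not taken from a real model:

```python
# Rank threats by an abstract risk score: likelihood x impact.
# The threats and scores are illustrative placeholders.
threats = [
    {"name": "TLS cipher cracked and traffic replayed", "likelihood": 1, "impact": 5},
    {"name": "Malicious dependency update",             "likelihood": 4, "impact": 4},
    {"name": "CI/CD build tampering",                   "likelihood": 2, "impact": 5},
]

for t in threats:
    t["risk"] = t["likelihood"] * t["impact"]

# Highest abstract risk first: this gives precedence for discussion,
# not a mandate for the order in which controls are implemented.
for t in sorted(threats, key=lambda t: t["risk"], reverse=True):
    print(f'{t["risk"]:>2}  {t["name"]}')
```

The point of the exercise is not the numbers themselves but that they make the catastrophization conversation comparable across threats.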
We can multiply those two numbers together to get an abstract risk score, and then we can order those threats by our abstract view of risk. That doesn't necessarily mean that we implement controls in that order, but it gives us some precedence. This is a very abstract exercise, and bringing some sort of quantifiability and reason to it is useful, allowing us to have a conversation. Starting with that very abstract catastrophization, we get those threats. We can then cross-reference them against something like STRIDE, where we're saying: well, what happens if we spoof this particular thing, or we impersonate it, or we tamper with it? By having the very loose, diffuse thought exercise to begin with, we give space for a lot of creativity; then, by going through the STRIDE process, we apply a little bit of rigor and make sure that we've exhausted the potential permutations of those different threats. We can then go through and look at a standard document. NIST and CIS produce guidance for whatever it is specifically we're talking about. For a cloud-native system, there are CIS benchmarks for how to secure Kubernetes, for example. They exist for your cloud provider. They exist for your operating system. And then finally, once we've exhausted those, and again this is a question of time, what I've just described could be two hours or could be two weeks; it depends upon your availability, really, and your organization's desire to pay for it, then we get to a place where we can start charting the attack paths. Which different problems do we have to chain together to end up in a problematic situation? We saw in that compromise diagram earlier: if we start from exploiting an open-source library, and we get into a CI/CD build, and then we run that thing in production, well, we shouldn't have the ability to open arbitrary shells or attach a reverse shell to a bash process. That should be detected by intrusion detection. 
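That kind of intrusion detection is typically expressed as a runtime rule. A sketch in the style of a Falco rule (the macro and field names follow Falco's default ruleset conventions, but treat this as illustrative and tune it before use):

```yaml
# Alert when an interactive shell starts inside a container; a reverse
# shell attaching to bash is exactly this pattern.
- rule: Shell Spawned in Container
  desc: Detect a shell process starting inside a container
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh, dash)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```

In practice you would add exceptions for legitimate debug sessions; the value is in making "a shell appeared in production" a loud, attributable event.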
We shouldn't be able to punch outbound to the internet through our egress-controlled firewall. So we look at the paths and we say: what's the simplest control we could apply to cover as many cases as possible, and to ensure that the threats we've identified are not actually executable or exploitable in our environment? We can also use CIS benchmarks and various other guidance produced by SIGs and other standards bodies. MITRE ATT&CK, of course. STRIDE and PASTA, both as threat enumeration frameworks, let's say. So threats are against the security properties we want and must preserve in our supply chain; for instance, integrity: integrity of the source, integrity of the build, and integrity of our runtime. Let's see how integrity can be compromised. Threats by themselves can be useful, but the exploitation view is much more useful. I'll publish all these slides on Twitter, because there are a number of diagrams coming up here. After we've identified those threats, we can build something like this: an attack tree is a view of exploitability, chaining together those individual points of compromise. It is a visual representation of an attack, and the likelihood and impact are what we can multiply together to give, again, an abstract risk score. So, step three: what can we do about the threats that we have identified? It is rarely possible to implement all of these controls. Generally, we wouldn't even want to. Again, there's this question of allowing developers the freedom to do what the business needs, which is ship features; ultimately, there's no point having a secure system if we're beaten to market or our feature set is reduced because developers can't ship. Breaking the attack chain is where we layer our controls across the branches of the attack tree in order to get the minimum viable set of security configurations, again with that view on developer productivity. And finally, did we do a good job? 
Looping back to the beginning of the process: an artist is only as good as when they started their last piece of work, conceived the final piece, and then filled in the gaps. And again, we get to the end and think: okay, we've now completed one iteration. What did we learn? What specifics of the system may we have descoped that we want to bring into scope? Or where did we not have enough time to spend, that maybe we want to go back and look at? This cycle of threat modeling then runs either on a temporal cadence, say every six or twelve months, or maybe on a feature-triggered cadence: we're about to change or substantially alter a piece of functionality that has either seen exploits and vulnerabilities before, or is business-critical, or is going to be refactored, so there's a lot of code change. It's also a constant observational loop we can go through at that point. We can test these things dynamically; we can apply security testing in the shape of a penetration test; and then, as we see, revise the model, going back and reviewing. So at this point we understand that with threat modeling we have a way to identify threats, list them, and assess how much damage they can do, prioritizing the order of controls to apply. We also understand that controls can break attack chains, or at the very least slow down our adversaries. But historically, not many people have been focused on supply chains. We did not see this come into stark relief until the advent of the SUNBURST attacks. This has changed dramatically with various executive orders from the US government. So let's have a look at what's available to us in terms of frameworks, architectural recommendations, best practices, and controls for supply chain security, starting with SLSA. SLSA is a framework to quantify the defensive posture of an organization's supply chain activities. It recommends code signing, artifact signing, build provenance, and policy as code. 
Provenance is just a claim that some entity, generally the builder, produced a software artifact or two: it executed a recipe with some artifacts as input, and then we stamp the output with a signature to say that the person in control of this key, at this time, liked this sufficiently to give you some cryptographic metadata about it. The Cloud Native Supply Chain Security white paper is a TAG Security effort that details these controls and is a reference guide. I strongly recommend that you have a look. There is not only a supply chain security white paper but also, secondarily, a Cloud Native Security white paper in general; huge props to friends and colleagues in TAG Security for putting that together. So the guidance is that we take these five stages and protect, first of all, our source code; third-party artifacts and dependencies; the build infrastructure itself; the artifacts that we build on that infrastructure; and finally the runtime environment into which we are deploying. This is a full list of controls for each of the principles. It shows you broadly how the controls and practices always hinge on those set-out principles. Again, there's a lot of information density here; these will be up on Twitter very shortly. And this is mapping each of the individual recommendations from the supply chain security white paper onto our abstract infrastructure. As you can see, there are a great number of different things to be aware of here. Once we've been through all of that process, we still need to secure the runtime environment, and that's not fully covered by the supply chain guidance. It is an effective remediation point, but it's sufficiently complex in its own right. So, back to SLSA. This is an OpenSSF project, the Open Source Security Foundation, of course. And it's a set of standards and technical principles that we can adopt to improve artifact integrity and build towards more resilient systems. 
It's not a single tool, but a number of different steps that we can take to achieve more hardened deployments, and the steps are meant to address more and more sophisticated threats as we advance towards SLSA level 4. What we're looking at here is a list of potential compromise points for the build of the system, from this consumer-producer perspective. There are broadly four stages. Level 1 is supposed to be easy to adopt, giving you visibility into your supply chain and the ability to generate provenance: that is, metadata about where this thing came from, which in the event of compromise allows us to backtrack and at least get some visibility into what's happening. Then we start to protect against software tampering, adding minimal build integrity guarantees. SLSA level 3 is hardening the infrastructure against attacks, integrating more trust. And then finally, level 4 requires a two-person review of all changes and a hermetic, reproducible build process. That reproducibility is important because it allows us to do geographically dispersed builds. This is how Debian, Arch Linux, and various other operating systems build: in the event of a compromised build infrastructure, the ability to run duplicate builds in geographically redundant locations and compare the cryptographic hashes of the outcomes gives us a way to detect if there has been tampering with the build infrastructure. Any non-determinism, which includes timestamps or, for example, locale changes that alter text sorting order, will fundamentally, by definition, alter the hash, the content address, of the data. And so we have to strive for reproducibility as the foundation for that approach. So SLSA asks us to protect the source code. Obviously, a lot of these things are just best practice. SLSA comes from Google's internal practices since 2013, and is ultimately a view on how they have looked to respond to an advanced persistent threat that took over their systems and resulted in a loss of IP. 
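The duplicate-build check just described reduces to comparing content addresses. A minimal sketch, where the two "builds" are stand-ins for artifacts produced on independent infrastructure:

```python
import hashlib

def content_address(artifact: bytes) -> str:
    """Content-address an artifact: identical bytes give an identical digest."""
    return hashlib.sha256(artifact).hexdigest()

# Stand-ins for the same source built on two independent builders.
build_a = b"\x7fELF...imaginary-artifact-bytes"
build_b = b"\x7fELF...imaginary-artifact-bytes"

# A reproducible build must yield byte-identical output, so the digests
# match; any divergence flags possible tampering, or non-determinism
# such as embedded timestamps or locale-dependent sorting.
if content_address(build_a) == content_address(build_b):
    print("builds match: no tampering detected")
else:
    print("digest mismatch: investigate the build infrastructure")
```

This is why reproducibility is the precondition: without byte-for-byte determinism, a mismatch tells you nothing.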
There's an interesting point here about protecting the build, which I'll get to with the concept of SLSA minus one. And finally, protecting the provenance: once we've generated this metadata, ensuring it's not tampered with. The thing missing from the SLSA specification is the concept of the integrity, the sanctity, of the infrastructure that things are being built on. If we are a Google-sized organization, we're going to be relatively confident that our build infrastructure is already resistant, that there's good access control, and that we can detect tampering as it occurs. For the rest of us, this plays into the shared responsibility model, and we have to be cognizant that even if we get to SLSA level 4, if somebody has compromised our build infrastructure, we will still have a bad day. So this concept of SLSA minus one looks to address that for the rest of us: preventing tampering with the build system's controls by ensuring the build is only accessible to authorized users, which is a difficult problem in and of itself when CI/CD is nothing more than remote code execution as a service, albeit intentionally so. This is basically the state of the ecosystem as we see it today, again duplicated in our supply chain security best practices flashcards, should you be so inclined. Looking at the base image: I mean, we know these things. We want to ship minimal dependencies and have hardened base images. For some organizations, if we can just drop statically compiled applications into essentially a scratch container, with just our locale information and our root certificates in there, then we're in a really nice place. Because consider the event of remote code execution; and let's go back to Log4Shell, for example; actually, that's not a good example, as it didn't spawn a bash shell specifically; but something like Shellshock, let's say, which does spawn a reverse shell to get that TCP connection back out. 
Well, if we don't have bash in our base image, it can't be compromised. If someone's able to run code remotely, generally they'll try to shell out to bash and use that to establish persistence. If we don't have the things for the attacker to exploit, we're in a more hardened position. This is one of the original promises of containers, so we know that. It's nice and simple. Many organizations require a lot more than that to go into base images, to do with compliance or their own roots of trust. So at least making sure that we have minimal operating system dependencies gives us that reduced attack surface. We are just running a microcosm of Linux within a container, and so we apply our standard best practices. Making sure developers can debug images is useful, and there are now excellent ways to do this: thankfully, Kubernetes has introduced ephemeral containers, so we can actually run a debug container alongside. More interestingly, perhaps, on to code commit signing itself. This is really contentious. Certainly the Linux kernel suggests that only a merge commit needs to be signed. More broadly, I think it actually makes sense to sign everything. Why is that? Because, again, a signature says the person in control of this key, at this time, liked this enough to do a thing. If we have a password on our SSH key, that makes it a lot more difficult for a compromised key to be used to push code to GitHub; but with local device compromise, an attacker can still sniff the keystrokes, exfiltrate that key, and push code. If we have a GPG key, we need a physical second factor stuck in the side of the machine for a period of time to actually perform the signature and the push. If we also enforce branch protection, then, in my opinion, that is significantly more hardened than just requiring control of the SSH key. Static analysis: reachability analysis ties into something called VEX, the Vulnerability Exploitability eXchange format. 
I gave a talk at the Open Source Summit in Dublin on this point. If we have a CVE in a project that we depend upon, say we have a web server and a CVE is in one of its dependencies, and there is no call graph, no code reachability, to exploit that CVE, our vulnerability assessment teams should not have to worry about it. VEX is a way for the producer to communicate that this is not vulnerable, it's not compromisable. Reachability analysis and AST observation, abstract syntax tree walking, is how we can get some way towards doing that. There's also a symbolic execution mechanism for doing it; beyond the scope of this talk, but certainly things that we're interested in contributing to. Dependency analysis: this is minimum viable security. There's a lot of scanning for CVEs, but we do have to respond to them. SBOM generation: SBOMs are great, as long as they're correct. Generally, SBOMs are not correct and people don't ship them. That's because, like CVE scanning, they just look at package manifests. If you've got a pom.xml or a package.json, that's what generates your SBOM. That's not really that useful if people are also downloading stuff off the internet and dropping binary blobs in. There is some support for this in Rekor, which is the public signature registry attached to the Sigstore project: you can push multiple versions of something attached to a specific hash, and then it's about the recency and your trust in the producer of those. But yes, this is an open and developing area of security interest. OpenSSF Scorecard. Moving on to the build: build tracing, with tools like Witness, which will run essentially as your PID 1 and trace all your build behavior to help derive what you actually do and how that is built. Pipeline metadata collection we put under in-toto and TUF. 
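An in-toto-style attestation wraps exactly that kind of metadata around an artifact. A rough sketch of the statement shape; the field values here are invented, and the real schema lives in the in-toto Attestation Framework and the SLSA provenance predicate:

```json
{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "registry.example.com/app",
      "digest": { "sha256": "e3b0c44298fc1c149afbf4c8996fb924..." }
    }
  ],
  "predicateType": "https://slsa.dev/provenance/v0.2",
  "predicate": {
    "builder": { "id": "https://ci.example.com/builders/1" },
    "buildType": "https://example.com/build-types/container@v1",
    "invocation": {
      "configSource": { "uri": "git+https://github.com/example/app" }
    }
  }
}
```

The statement binds a subject (the artifact, by digest) to a predicate (who built it, from what), and the whole thing is then signed.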
This is ensuring that our builds are actually what we think they are, and that we sign the inputs and outputs so that we know exactly where they came from and where they go. Evidence lakes and ledgers: GUAC, which just landed on GitHub yesterday, is a Graph for Understanding Artifact Composition. It essentially allows you to query your supply chain. It's an incredible piece of work; a really strong recommend. Rekor is the ledger, of course, at the back of Sigstore. And artifact signing: Cosign and Notation, which are like GPG signing for binary blobs. And then we know most of the rest of these, from application images to admission control. I am almost at time, so I'm moving a little quickly through the end. Pipeline metadata is the only way that we can piece things back together; it also gives us a different type of observation. That is it, time to wrap up. The problem space is significant. Threat modeling gives us a view on how we can quantifiably defend ourselves. There are plenty of things available to defend with. Retrofit and slowly mature your supply chain, and you are not alone: TAG Security and the OpenSSF welcome all contributors. Thank you very much for your attention. We have time for just a question. Okay. "Andy, have you seen a lot of companies actually start measuring their SLSA level? And is it reasonable to start asking one's vendors about their SLSA level?" I think we should absolutely ask vendors. It's probably a long time from being something that will be externalized; I think asking for SBOMs is probably a closer first step. The existence of a vendor performing SLSA work is a useful indicator of maturity, but there's a lot of other things that are not covered, like mean time to remediation. If you're being shipped binary artifacts, how long do the vendors take before they actually patch? So SLSA is good for that specific SUNBURST-style supply chain attack. 
I personally would like to see SBOMs distributed and also externalized by, for example, SaaS providers and cloud providers, but I think the mean time to remediation is actually the interesting metadata, I suppose. And it's difficult to do as well, is the point; it's computationally expensive and takes developer time. I have a few of these supply chain security best practices flashcards, if anyone would like one, and I think I've got a book signing in half an hour at the booth, if you would like that too.