So, way better, I think. Yep. Thank you. All right, so, my name's Kareem Citerli. I'm a former operations engineer; now I work as a developer advocate at HashiCorp, where I get to tell people about best practices and how to implement them. Generally, I don't get paged when stuff breaks in production. It's kind of like platform engineering — love it. You tell people what to do, you don't actually have to do it. In a past life, I worked on high-performance ad tech, serving the stuff that everyone uses ad blockers for — appreciate you all for killing my job. After that, I switched to the Amsterdam airport and worked in their industrial IoT team, where we made a lot of security-conscious choices. There was a lot of learning on the job, if you want to call it that. I'm at KCiterli pretty much everywhere, so if you have thoughts about this talk, let me know. If you want to talk about chocolate cakes, definitely let me know — very big into that. So, let's start with something fun and get everyone excited. We've been here for three days, some of us even longer. It's been amazing — 16 tracks — and the conclusion really is: OSS has won. There was a competition, and OSS definitely killed it. And that makes me happy, because it means we validated an approach to engineering where we can bring in engineering skills from other teams that donate their time to make our applications — or whatever else we do with code — better. The apps you run on your phone, your Docker images, the software you use on your PlayStation, even your car's navigation interface: all of that has open source in there. Pretty amazing. 98% of applications have OSS dependencies, and if you let that sink in for a second: cool, there's no way back. You simply can't go any other way. And it's great — except now, every one of those teams is in your repositories and in your source code.
We stand on the shoulders of giants who themselves stand on the shoulders of titans, and we've gotta trust a whole lot of people. It's beautiful, it's magical, and mostly it's worrisome to me, because having everyone in your repository is not my favorite thing. So, quick level setting. Let's talk about the software development lifecycle. We'll start with the most simple part: building, usually on your local machine — these days, maybe GitHub Codespaces. If you've never tried that before, I suggest grabbing a tablet and seeing how much you can get done without the distraction of your computer. It's magical. Then compiling, of course. It used to happen on your machine. Hopefully now it happens on a build server, and the reason for that is really that you have more control. Control is important, especially if we're talking about supply chain security and delivering value that is reproducible. Next up, you build artifacts, and you've gotta put them somewhere. It used to be that the build server was also the server that ran our stuff — maybe it was your machine with some ngrok in front of it. That's great for a hackathon, not great for your environment, and not great for your auditors. From a security aspect, there are two choices you can make: either you help your auditors be happy right now, or you help them be happy once the incident has been triaged. I prefer to make them happy now. And of course, running: Kubernetes, Nomad, whatever you use to run your applications in production — the patterns are pretty much the same. And then one thing I like to add in there, if we're talking about confidential workloads, about making sure we know what's running: we also need to figure out a way of making things go away. So, revoking. Revoking access, revoking access credentials, but also revoking access to specific images — outdated images, vulnerable images, vulnerable binaries — and having a process around that.
And the way I think we can do this is to split it into four parts, but they all come down to a single thing: it's all about building trust. Each part in this lifecycle is very much about exactly that — trusting people, all the time. Trusting is hard. If you've worked with people — and in IT, that usually happens — you've had to trust people. You've had to trust your seniors to tell you the right information for you to be able to work. As a junior, you have to trust that what you're doing is useful for the product. As a product manager or project manager, you have to trust that your engineering team does the right thing. There's a lot of that going around — lots of entities that build on each other, that require each other. So, level setting out of the way, let's talk about some of the patterns we can use to protect the process — the build process, in our case. Quick show of hands: who here works for a company that has more than 1,000 employees? 5,000? I'm sorry for you. If you work in an organization that has a corporate IP — sorry, IT department, not IP — and at 5,000 people you most likely do, then any device you have usually gets encumbered, or enhanced, depending on whose side you're on, by some corporate malware. And that's great. I work at a company that locks down laptops. It's beautiful, because I can leave this device here and I don't have to worry. It also means that when I actually want to work, this is not the best place to be. These apps make it harder for you as a developer to create things and to deliver value — but they help other teams deliver their value. Important layers of protection. I like to joke about it, but it is important that your endpoints are protected. If you work at a startup, you likely have none of these problems. I envy you; it's beautiful. But for the rest of us, you know it's kind of like Jurassic Park: life — and engineers — find a way to make these problems go away. And the way we do that is real simple.
Somehow, you always find a way to be in the root user group. And as long as I have root access to my machine, this is no longer a trusted endpoint — or it shouldn't be, no matter what the auditor thinks. You might hit a couple of green checkmarks in whichever system your company uses, but if you have root access, that means somebody else can use this machine as an attack vector. And we don't want that. Our corporate IT department has asked me to tell everyone that endpoint protection is important: your laptop, your phone, anything that can talk to corporate IT infrastructure is worth protecting. Please let people know. So there you go — these are true words. Controlled environments are important. If you're building safety-critical systems, if you're working on software that makes the light go on when I push a button, I'd like for you to have a workstation that is secure and trustable. I also realize, as a developer, that that is not very easy and usually not very fun for you. But I also pay a monthly subscription to my utility, so I really don't care. Ultimately, laptops are not controlled environments. So from a software-building perspective — if we're talking about hermetic builds, reproducible builds, supply chain security — you can't really trust a developer's machine. I've got root access on here. Any malware that lands on it, I can only hope gets sandboxed away and doesn't impact me. So for all intents and purposes — and this is an important one — for the safety of your applications, assume hostile intent for everything that comes from a development environment. You can't build enough protection onto a machine where your engineers have root access. That's okay. You can't hit 100% protection on that device anyway, so I'd say: why bother at that point? Corporate IT has thoughts on that. The point is, if you don't have that protection, you can't trust the machine. So we've gotta shift the trust somewhere else.
Somewhere where I can't skip checks, where I can't work around the system and come up with my own shadow IT. Which is important — we wanna get shit done. We all know that build pipelines usually have that little escape hatch where, if you set the right variable, all those checks that usually take 45 minutes to run automatically pass. Because the folks want to test. That's okay. I saw somebody laugh, because you know what I'm talking about — appreciate that. I usually say in my intro that I used to be a failed stand-up comedian, which is why I now work in IT. The one person laughing is why it says "failed." So let's talk about some of the patterns we can use to protect the compilation process. And when I say compiling, it doesn't matter if you're building a web application where you're just webpacking everything, assembling containers, zipping files, or actually compiling things for different operating systems — we're just gonna use it as an umbrella term. We talked about hostile intent. Very important. It means that the code that ends up on our build server is not something we can trust, which puts our build server in a very, very weird position: something comes in, and it's not sanitary. I worked at the airport, where we had a split between the clean group of people and the dirty group of people. It's not a race thing — it's about who got checked by security and who didn't. If you were checked by security, you were considered clean and you could access different parts of the airport. Do the same thing with your build server. Any code that comes in should not have a direct path to your application environment. If it does, any other boundary you come up with is worthless, no matter how much energy you put in there — and not just on a network level, on every level. Sealing things off from each other is a good thing. Sealing things off from people and having silos between teams: not a good thing.
Silos between your code, the place where you build the code, and the place where you run the code: definitely a good thing. You've gotta verify everything automatically. Run all the tests. And for this trust to be established, it really comes down to signing, to storing things the correct way, and to verification — no matter if you're using SLSA, the Supply-chain Levels for Software Artifacts, a different framework, or just running very simple, stripped-down software composition analysis. You need to automate these steps. I think it was mentioned in the keynote either yesterday or the day before: you can't scale security if you don't automate it. And you can't scale supply chain security if you don't automate it. If you're an application developer or an operations engineer, you deal with secrets all the time. We finally figured out that we don't need hard-coded secrets; we can request them at runtime. Turns out all these patterns also work for a lot of other things, such as signing our software. I'm a huge fan of GitHub Actions, for the simple reason that I wasted too much of my life on getting a certain Java-based build server to run. And GitHub Actions, despite the downtimes, is a magical platform. 80 bucks a year gets you 3,000 minutes a month, and if your builds take longer than that on your personal account, then you know what? They deserve more money, which probably gets all of us more uptime. You can substitute the patterns I'm gonna show you with legacy build servers such as Jenkins, or more cloud-native ones like Spinnaker. It's not important which tool you use; it's important that you understand the pattern. And for this — ignore the title, code slides are hard, but definitely look at the code — I'll use Terraform to configure my GitHub organization. I'm an operations engineer. I'm a lazy engineer. I like to do things once, properly.
And my preferred way of doing that is to codify it — express it as code. Because then, once I've done it, I can apply it to all the GitHub organizations I work in. Codification makes sense, because when you look at the code, you know what's happening. This is the first layer of transparency. You don't have to question which things somebody did or didn't click in the UI; it's just explained right there. So, a quick step back to SLSA. We talk a lot about levels one, two, three. We have scorecards that help us figure out where we are in the journey. It's all great. But it sucks that we don't talk about level minus one, or level zero — whichever one you want to start counting at. Basically, the foundational question: is our build infrastructure secure? Is the server where we run our builds trustworthy, or is it just another developer's laptop? Put differently: how do you establish trust between what you're doing and what's happening on your build server, in a reliable way, so that you can trust that what the build server tells you is true and hasn't been tampered with? It's a hard question. It's not a question you can answer in 30 minutes. It's a question that requires multiple teams to work together, and it very likely requires you to re-architect part of your pipeline. And whenever you run build servers — most of us work in the cloud — you've got a shared responsibility model. AWS tells you: look, this is what you're liable for, this is what we're liable for. It's great. We all click yes, because, I mean, it's an end-user license agreement; of course you're gonna click yes. It only becomes a problem when stuff leaks and you end up on a news site, and then you have to crawl back and see what you actually said yes to. The long and short of it: you must carry out certain actions to protect your infrastructure, and the cloud provider agrees to do the same — for a fee, of course.
That being said, proving that your infrastructure has not been tampered with is a whole different talk. It's a talk worth having, so if you're interested in that, definitely let's talk later — especially if you're in a regulated industry, anything healthcare, anything financial. There's a lot of fun that happens there, and I say "fun" very ironically. But they're fun engineering challenges. So, back to our code. Let me quickly have a look at the preview here. We start by defining which tasks we want to run. I'm using GitHub Actions, but again, if you want to run this through Jenkins, the same pattern applies. I've got an allow list of actions that I want to run. And this is important: these are human-readable. You can look at this and see that I'm using version 1.37 of the Actions linter, making sure my workflow files are good. And from that, I can do something important. v1.37 is great for humans to read, right? It is not very secure. If you've ever created a tag with Git, you know that those tags can be recreated. Raise your hand — or don't — if you've never had to recreate a Git tag just because you screwed something up. I've certainly done that. 15 years in IT, and I still do it at least once a month, which is why I like to have tooling that does it for me. Because when it's attached to a service account, nobody knows it was my fault. This is good. Transparency is very important, unless it makes me look bad. So we define a lot of these. Quick scroll down: there are about 15 different actions in here, all of them human-readable. Any new engineer, even one who's never written Terraform before, should be able to figure this out: we're defining a GitHub repository, an action, maybe a path if needed, all in a human-readable way. And this is where it gets fun. Because GitHub has APIs for everything, we can query the GitHub API with those releases and get the commit-ish string.
Commit-ish is a very weird word, but that's what the API calls it: the Git hash that will uniquely identify your result. We can resolve it based on a tag. And the reason this is important: put a few of those lines together — probably 20 lines — and we're able to get all of that and use it to configure the allow list inside the project. It looks kind of like this. We're explaining to GitHub, in codification: I want only GitHub-owned actions to work. All of them have to be verified. It has to apply to all repositories. And that's important — there is no cheating. There is no "oh, you're an admin on this org, you're a super admin, you're part of the operations team, this doesn't apply to you." It applies to everyone. And then we just write out the list. And in our UI — let me highlight this — this is what you get. These are not human-readable strings. I mean, you can read them, they're just not gonna be fun to read. Play a game of telephone with these and you'll get a 100% miss rate every time. The reason is that these are unique content addresses. They're relevant for computers. They don't have value for you and me in a conversation, but they have value for our supply chain security, because with this pattern, we can uniquely identify which action is running. That v1.37.0 on the first line — the address you see there, you cannot tamper with. You cannot bypass it, because my build server, my GitHub Actions, are required to go through that. You can't even fork that action and run it in your own organization, because the fork is not on the allow list. And so the first pattern is: make sure you always have multiple layers that enforce the same thing. Sounds like duplication — it is. It's like having a front door and the door to your house right behind it. Two different keys and slightly more work to get in — but not just for you, also for the others. It's one step in a chain of steps you have to take, and it gets you somewhere.
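The resolution step described here — turning a human-readable tag into an immutable content address, while keeping the tag nearby as a comment — can be sketched roughly like this. This is not the speaker's actual Terraform; it's an illustrative Python sketch, the owner/repo/tag names are placeholders, and note that for annotated tags the GitHub ref endpoint returns a tag object that a real implementation would still have to dereference to a commit.

```python
import json
import urllib.request


def resolve_tag_to_sha(owner, repo, tag, fetch_json=None):
    """Resolve a human-readable Git tag to its content address via the
    GitHub REST API (GET /repos/{owner}/{repo}/git/ref/tags/{tag}).

    A fetch_json callable can be injected for testing; by default we
    hit api.github.com directly."""
    url = f"https://api.github.com/repos/{owner}/{repo}/git/ref/tags/{tag}"
    if fetch_json is None:
        def fetch_json(u):
            with urllib.request.urlopen(u) as resp:
                return json.load(resp)
    data = fetch_json(url)
    # Caveat: for annotated tags this SHA identifies the tag object,
    # not the commit; dereference it in a real pipeline.
    return data["object"]["sha"]


def pinned_uses_line(owner, repo, tag, sha):
    """Emit a workflow `uses:` reference pinned to the immutable SHA,
    keeping the friendly version as a trailing comment for humans."""
    return f"uses: {owner}/{repo}@{sha}  # {tag}"
```

Used inside a templating step, this is what produces allow-list entries that are secure for machines but still readable for people.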
This pattern you can apply to everything that ends up being a GitHub release. It works with Spinnaker too — Spinnaker in this case also has a Terraform provider, where you could almost copy-paste the same code, just change the GitHub references to Spinnaker, and generate your job files this way. When you generate stuff like this in a predictable way, based on content hashes, you're increasing your security — and you're increasing it in a way that is not upsetting to users. If my security team makes me type things like that by hand, I will focus my life on something else. It's not fun. I understand the value of it; I don't wanna be bothered by having to actually do it. So, now that we've figured out that we can't trust developer machines, and we've figured out a way to make sure our builds run through a predictable process — as long as GitHub Actions is online — let's talk about storing, and what we can do there. If you talk to any public official who's tasked with protecting structures, no matter if that's an airport or a school, they will tell you that safety and security are a byproduct of consistent behavior. That doesn't just hold true for not burning things down; it holds true for the integrity of your software artifacts. That's beautiful. Doing the right thing should be easy, because when you're doing the right thing and you don't have to think about it, you'll do it more often. And for the people who raised a hand for the 5,000-plus companies: you know corporate security is hard. There's usually stuff involved that makes it not fun, and that's usually because those workflows and processes are not built with the end user in mind. Vendors will sell you amazing stuff — they make great products — but it's not always frictionless. And any time there's friction in the security process, people will find ways around it. If you're in operations and you've ever had to rotate secrets across more than one service, you know how much that sucks.
But when we have dynamic secrets, and applications that know how to retrieve those secrets from a system — where all you have to do is set it up once and then forget it, while still knowing it works safely and securely — we've increased security, we've lowered the maintenance, and we've made more joyful code. I like joyful code. But before we can get to that part of having actually joyful code, we need to talk about what we can do before we store something. It doesn't matter if you use Git commit signing directly, or something like Gitsign — which is actually pretty cool, providing signing based on your OpenID-provided identity, usually your Google account or your Azure AD username. It gives you ways to sign code and make sure you are attached to it. This can be both a burden and joyful: a burden when the code is bad — but at least it's signed by you, not by me. And when we sign things, when we build things, we get build logs. Logs are important. Especially if you're in a regulated environment, we need to make sure those logs are trustworthy. Reproducible builds are great, but for actually understanding why an incident happened, we need to be able to go back and look at our logs. The way to do that, in my opinion: when you have build logs, store them in a safe place, get a checksum, and store that checksum in a secrets management solution. One simple reason: when you have a checksum that tells you this file has not been altered, you know that log is good. You know your attackers can't mess with it. It also adds another barrier your attacker has to go through, no matter if that's an internal attacker or an external one. And really, what we're doing here in IT is what the Romans figured out 1,700 years ago. Defense in depth means you have to have more than one layer. You can't just have a single door. You have to have a moat. You have to have walls.
You have to have much more that makes life harder for the attacker. So: use the API of your build server, extract the log, get the checksum for it, and store that somewhere. Don't store the full log encrypted. One, it's a lot of stuff to encrypt. Two, you can't use any good tooling to look through it — either you keep everything unencrypted in memory, at which point it's not really encrypted, or you're just wasting your time. And when we do that, when we have a checksum we can trust, what we're really talking about is authenticatable — doesn't sound like a word — trustworthy metadata for an artifact, to which we can attest its origin. And that's important. When I hand you a file that you can independently verify came from me or my build server, has not been tampered with, and that I was the actor who kicked off that build, you have something you can trust — which brings us all the way back to the first part. We're creating trust not because I tell you this is good, but because you can independently verify it. If you've ever installed a package on Linux, it works much the same way: we have keys we can trust, and we're finally bringing that to more of our infrastructure. So it doesn't matter if you're using SBOMs — software bills of materials — or any other approach; this pattern scales for all of them. You're just taking the metadata and signing or encrypting that, because that will tell you if something has been tampered with — that perfect little evidence zip bag the police like to hand around. And of course, while you're at it, attest to all the stuff that's in there. If you have known CVEs in your code, and they're within the risk management profile you've associated with that application, and you're okay with having them in there: attest to that. Sign a log with your username. That helps people figure out: we knew this was in there, it's intentional, and it's in there for this and this reason.
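The log-checksum pattern just described can be sketched in a few lines. This is an illustrative sketch, not the speaker's implementation; where you store the digest (Vault or another secrets manager) and where you store the log itself are left open.

```python
import hashlib


def checksum_log(log_bytes: bytes) -> str:
    """Compute the SHA-256 content address of a build log.

    The full (unencrypted) log goes to ordinary blob storage so normal
    tooling can search it; only this small digest goes into the secrets
    management solution."""
    return hashlib.sha256(log_bytes).hexdigest()


def verify_log(log_bytes: bytes, stored_digest: str) -> bool:
    """During incident review: re-fetch the log, recompute, compare.

    A mismatch means the log was altered after the build ran, so it can
    no longer be trusted as evidence."""
    return hashlib.sha256(log_bytes).hexdigest() == stored_digest
```

The point is the asymmetry: the attacker who can reach your blob storage still has to separately compromise the secrets manager to make a tampered log verify.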
In automated processes, this burden goes down by a lot, because when you're signed into your build server, your identity is known, and attestation gets really, really easy. So we have the hard part mostly done. We've got sort-of-trustworthy builds. We know how to inspect them when organic matter hits the fan afterwards. So the next step is really figuring out how to get to hermetic builds — artifacts that are not stored on the application server. And I mentioned this before: it is very important to have that boundary. Your build server may not be as open as your development machine, but it's still not the most trustworthy place. You definitely don't want to be running production environments on it, or even have build agents that connect to production environments. So let's talk about the orchestration process. When we create a hermetic build, the first thing we're doing is making sure all the dependencies are in that single artifact that we deliver. Your application server doesn't need to go out and download things at runtime — as cool as some of those demos look. If I'm starting up my Node app and at that point I have to run npm install in production, that means I'm not running the same code I was running before. You might get a new version, you might get a fudged update, maybe something that got broken. And you're no longer able to attest that this is actually trustworthy. When you inject build dependencies at runtime, it's a bad time. It's key to prevent this mistake, because it is the only way you can trust what runs in production. And trusting what you run in production is hard. We have stopgap measures for all of this; every orchestrator these days has ways to deal with it. But we're at the stage where we can actually figure this out properly. We have code that we can trust. We've got build logs that we can verify. So how do we run it? The software factory has done its job, right?
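One way to picture the hermetic-build property — every dependency vendored into the single artifact, nothing fetched at runtime — is a manifest of pinned hashes that gets checked before anything starts. A hypothetical sketch; the manifest format here is invented for illustration, not taken from any real build system.

```python
import hashlib


def verify_hermetic_artifact(manifest: dict, read_file) -> list:
    """Check every dependency recorded in the artifact's manifest
    (path -> expected SHA-256 hex digest) is present and byte-identical.

    Returns the paths that are missing or altered; an empty list means
    the artifact is self-contained and intact, so nothing needs to be
    downloaded at startup."""
    failures = []
    for path, expected in manifest.items():
        try:
            data = read_file(path)
        except FileNotFoundError:
            failures.append(path)
            continue
        if hashlib.sha256(data).hexdigest() != expected:
            failures.append(path)
    return failures
```

Contrast this with running `npm install` in production: there, the dependency set is decided at start time, and nothing lets you attest afterwards that what ran matches what was built.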
So really, it comes down to one simple thing. This is a pattern I personally enjoy a lot: pre-start, post-start, and post-stop hooks that allow you to interact and do cleanup or preparatory work before your application runs. We have an image that we think is okay, but there's a disconnect between what the build server knows and what our machine knows. Kubernetes, Nomad — they're great at caching images, which means the image you might be running might not be the image that you should be running, in the sense that our security team could have revoked something. But because the image is cached, there's still the problem that, well, it's there, the binary is there — we could be running it. So for our pattern here, we use a pre-start task. This is a Nomad job description; you can express this as YAML and run it through Kubernetes in almost the same way, though at that point I would probably see if I could inject it as a sidecar as well. It works with every modern orchestrator. And what we do is define an artifact that has gone through the same pipeline. In this case, it runs a pre-flight check. It has access to everything the primary artifact — the actual workload — has access to. And our pre-flight check checks against a revocation list. Is this image good? Does this image meet the right scores? Is this image actually supposed to run in this geographic region? Your registries don't care: as long as you authenticate, they will give you the image. We need to make sure our orchestrators have ways to verify what's going to be running, even if it's locally cached. In this case, the pre-flight check is very simple. It checks a couple of variables and then bails if the image is on a deny list — specifically, even if just that image's version has been denylisted. Every organization approaches this slightly differently, so it's not worth showing you the code of the pre-flight app itself.
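The shape of such a pre-flight check — deliberately left abstract here, since every organization does it differently — might look like the sketch below. The deny-list source is a placeholder; the one property that matters is that any failure to verify results in a refusal to start.

```python
def preflight(image_digest: str, fetch_denylist) -> bool:
    """Fail-secure pre-start gate for a (possibly cached) image.

    The workload may start only when the revocation list is reachable
    AND the image's content address is not on it. Any error while
    checking counts as a denial: fail secure, not fail safe."""
    try:
        denied = fetch_denylist()
    except Exception:
        return False  # can't reach the verification server -> don't start
    return image_digest not in denied
```

Wired into an orchestrator's pre-start hook, a `False` here means the main task never launches — the cached binary stays on disk, but it never runs.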
Just think of it as an HTTP API that gives you either a 200 or a 401: should I run this or should I not? It's a very binary output, and that's exactly how it should be. I had somebody ask about this yesterday, which is why I wanted to highlight this specific part: all the code I'll share with you later uses content addresses and never human-readable versions. The reason is that it's all generated code — I use templating for everything. So content addresses show up there, as they should, because that's the only way to actually be secure — but paired with the comfort of a human-readable string right after, as a comment. And so ultimately, the pre-flight check just checks your boarding pass: is this passenger good to go? And if you get denied boarding, well, here that's actually a good thing. From a safety perspective, this gives you a fail-secure approach. Instead of failing safe — which means, well, there's an emergency, all the doors unlock so everyone can leave or come in — we're failing secure. If we're not able to check with a verification server, it fails. If we don't see that version listed, it fails. We want all of this to fail before we even start the image, because once an unapproved, unverified workload is running, it's too late. And we would generally like to avoid that. So, we've shifted some parts of our supply chain security left, and we've talked about some patterns that make sense. Where do we go from here? These patterns work from an infrastructure perspective. They work mostly from an applications perspective, but they're not gonna be 100% applicable to everyone in here. Every organization has different needs. So take it as inspiration — take it as a way to think differently about how you approach supply chain security, and think about which things you can tack on to make it even harder for attackers. Talk with your engineers: front end, back end, security, networking.
Don't just talk about other stakeholders — talk with other stakeholders, and figure out how to make this process less leaky for your organization. Security is never just a you thing, and very much in the spirit of Zelda: it's dangerous to go alone. So do it as a team. You can definitely show leadership by advancing your application's or your team's security stance, but it doesn't mean you have to be the only one who cares about this. It shouldn't be just you. Defending against attacks, both from the inside and the outside, is hard. It's tiresome, and it's easy to make mistakes. If you're curious about these patterns, check out github.com/workloads. We've got some of this in best-practice implementations, with a focus on HashiCorp tooling. Ultimately, all the code is over-documented, so even if you're not using HashiCorp tools, you're still gonna be able to figure out the patterns. The GitHub API is the GitHub API, and the GitLab API works in almost the same way. There's a lot of search-and-replace you could do to bring this to other version control systems. And with that, we're right on time. So thank you for your time. You've been amazing. Thanks. Thank you.