So thank you everyone for coming out. GitHub Actions is one of the most popular CI tools in use today, but if you need to or want to run it yourself, there's really not a lot of guidance out there, and there's a whole lot of decisions you need to make. A lot of them have security implications. The most popular choice in Kubernetes is Actions Runner Controller, which is an open source community project for managing and scaling your runner agents. It's a controller, so it's not terribly difficult to think through the idea of running your agents as pods, but you're running arbitrary shell commands, and there's not a lot of security guidance on that.

So hi, I'm Natalie. I'm a solutions engineer at GitHub. I work exclusively with our most regulated and security-focused user communities, having led one myself for years. This is a really fun intersection of security, technology, and compliance. So let's talk about developer enablement, but doing it securely and scalably and in containers.

Here's where we're headed. First, we're gonna talk about why on earth you would ever do this. GitHub Actions has the magic already done for you, so why would you ever want to go down this really hard path? We're gonna talk a little bit about Kubernetes cluster settings, then GitHub settings and deployment scopes, and then a little bit about how multi-tenancy works with Actions Runner Controller. We're gonna end on looking at your runner images, and then a little bit of wrap-up and conclusions. I am very sorry if you came in expecting me to tell you that with this one file of YAML you will be completely unhackable, because that's not how any of this works. However, this is where I've seen problems, weird edge cases, and security not being considered as companies start to adopt this. And across the spectrum of maturity of container adoption, this is a really weird use case.

But first off, I wanna be very upfront about this: I have a bias, and throughout my career I have continually confirmed this bias. And that's that friction, the force of resistance to movement between two surfaces, is the leading cause of users, administrators, and developers doing insecure things. Eliminating any and all friction wherever a security concern is involved will inherently lead to proportionately fewer insecure things. To be clear, I am throwing no stones in this glass house, because I have totally been that developer. There are extremely few network protections that creative use of SSH port forwarding can't bypass, and this has been both worrying and extremely helpful to me over my career.

So let's talk a little bit about GitHub Actions. We're not here to talk in depth about it as such, but there are some things we should all know as we start to look at self-hosting this. It's GitHub's built-in automation platform. It's frequently used for CI/CD, but it runs most of GitHub, if we're being uncomfortably honest here. And the cool thing about it is this great open source marketplace. There are 17,000-some-odd actions in that marketplace of reusable workflow bits. However, when a user on github.com just uses Actions, you're actually getting a ton of engineering that's designed to be completely and totally invisible to you. You just say, hey, run on a Mac, and it does. Or run on ubuntu-latest, and it does. Magic, done for you. When you do this, though, you're getting an ephemeral virtual machine with a ton of stuff already preloaded for you.
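As a quick sketch of what that difference looks like from the user's side (the self-hosted labels here are hypothetical; yours are whatever you register):

```yaml
jobs:
  hosted:
    runs-on: ubuntu-latest              # GitHub-managed ephemeral VM, tooling preloaded
    steps:
      - run: echo "GitHub's compute, GitHub's problem"
  on-your-cluster:
    runs-on: [self-hosted, linux, x64]  # your labels, your cluster, your problem
    steps:
      - run: echo "same YAML, but now all the security operations are yours"
```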
There's so much preloaded that the ubuntu-latest image clocks in at about 50 gigabytes. That works well for a VM, but it's a little unwieldy for a container. And you can get an SBOM for it; it's readily available, it's updated weekly, it's scaled, it's magic, it's SaaS. We're not here to talk about that. We're here to talk about what happens when you bring that into your own network and run it on your own Kubernetes cluster. And that is completely free. Well, you don't pay GitHub for it. You pay your commodity cloud compute, your data center colocation, whatever compute cost you have. For that, we ship the runner agent. It's open source; the repository is in the slides. The trade-off is that all of the security operations, and all of the rest of the operations as well, are on you and your team. And this is an uneven and often unexpected burden for our security teams. And not everyone has the option of just using SaaS.

Here are some really common reasons we see self-hosted runners. The first is GitHub Enterprise Server: if you self-host GitHub, you're self-hosting all the compute that goes around GitHub as well. Then there's custom hardware, custom software, and honestly, because you want to. And I'm not here to judge you, I am here to help you.

So we need a controller, and this is where Actions Runner Controller comes in. It's an open source project. It started out as summerwind/actions-runner-controller, moved to the actions-runner-controller organization, and now it is officially the autoscaling solution for self-hosted runners within GitHub. It does ship some images for the runners (I'm sorry), but most users build their own. I'm gonna pause here and preface that there is a ton of work going on in this project right now. This architecture diagram is what is currently in Actions Runner Controller, as are the CRDs shown here. Like I said, the exact CRDs and the implementation are about to change, and the additional work will provide better scaling as a supported path. So here's a high-level overview that'll age a little better than milk, or this diagram: autoscaling is pull-driven over APIs. In the current implementation that's a very short polling interval; the newer implementation will have a longer polling interval on new APIs. You can see an overview of the new CRDs, the architecture diagram, all of that good stuff, in the documentation in that repository. We are collaborating completely in the open on this.

In any case, users are responsible for all of the infrastructure security here, and that leads to some really unique security challenges. The first I wanna really emphasize is that GitHub has 100 million developers, not counting all of the people that self-host, so it's really hard to provide opinionated guidance to an audience that wide. Very sorry. One of the key security challenges is the user expectation of "it just works," which enables this weird anti-pattern where containers are used in very VM-like ways. That does a couple of things, in addition to making us kinda grossed out by 50 gigabyte container images, or 5 gigabyte container images for that matter: it increases the true vulnerable surface area of your agents, and it also increases the noise from container scanners in your security tooling. Both of those together mean it's really hard for the team to know where to invest their time. And then there's this continual tradeoff between caching for runtime speed versus just pulling your dependencies each and every time.
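Before we get to those images, here's roughly what a deployment looks like with the current CRDs, as a minimal sketch. The org name and registry are hypothetical, and remember these are exactly the CRDs that are about to change:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org-runners
spec:
  template:
    spec:
      organization: my-org                                 # hypothetical org
      image: registry.example.com/runner:v1.2.0-20230131   # pinned, never :latest
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: org-runners-autoscaler
spec:
  scaleTargetRef:
    name: org-runners
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: PercentageRunnersBusy   # polled over the GitHub APIs, as described above
      scaleUpThreshold: "0.75"
      scaleDownThreshold: "0.25"
```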
That caching makes for some really big container images. And then the next really hard challenge is that there are economic incentives that encourage poor security posture, especially among smaller or less experienced teams. What I mean here is that running a build system in and of itself does not bring a company any money, unless your business is running CI systems. So there's just not a lot of reason to spend time doing this securely when you're only looking at the up-front economics.

Because of this, I see quite a few installations with a handful of really common threat models that we'll talk about a little more in depth. The first is poorly scoped permissions or deployments. I have this thing where I'll screen-share with somebody and I see them check all of the boxes for the API scopes. Please don't do that. The API scopes are there for a reason. Privileged pods: they are so much easier than writing things to be containerized properly, and expediency doesn't always help you here. Disabling or unsafely altering key security features: disabling SELinux comes up way, way more often than I'd like. Using latest to deploy images: using latest is not continuous deployment. I see quite a lot of "we're gonna pull the image every time, and the image that we're pulling is latest." Hard to debug, all of that good stuff, but it is hellishly hard to fill out an incident report when you're just deploying off of latest. So whatever you do, please, if you take nothing else away: latest is literally the worst. Don't ever use it. Purge it from your vocabulary. And lastly, upstream repos or mirrors that are wildly out of date. After the whole left-pad incident in 2016, and a whole bunch of other related shenanigans, running your own internal everything-repository became super popular, and for great reason. Understanding the dependencies of your software in full is a fantastic way to increase the security posture of your code and your organization. However, it requires regular care and feeding to not just harbor unsafe, well-known, exploitable vulnerabilities.

And to understand why these challenges are really unique to Actions in Kubernetes, let's take a tiny detour into what GitHub Actions really are. Like I said earlier, there are about 17,000 of those actions in the marketplace, and when you just use GitHub and our hosted runners, things work for you. You don't actually have to think too in depth about what's going on in that action if you don't want to. This is the first point where things get tricky and weird in Kubernetes. Under the covers, a GitHub Action can be one of three things. It can be JavaScript, which has to be purely JavaScript: Node 16 (don't use Node 12, it's going away), and it can't rely on other binaries. Composites are what you think of when you think of a traditional reusable CI pipeline. It's a bit of a catch-all category: you can run arbitrary scripts inline, you can call binaries on the host directly, you can chain multiple other actions together. And then lastly is a Docker container action. It builds a container on each and every run, then executes the container with inputs and outputs defined in a YAML file. You put stuff in, magically stuff comes out. And the last thing I'll note is that Podman is not a drop-in replacement here. This looks kind of like serverless if you really squint and look at it from far away.
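That YAML file is the action's metadata. As a minimal sketch of a Docker container action (the name and input are hypothetical; the shape follows the standard metadata format):

```yaml
# action.yml for a hypothetical Docker container action
name: hello-container
description: Builds and runs a container on every single invocation
inputs:
  who-to-greet:
    description: Who to greet
    default: World
runs:
  using: docker
  image: Dockerfile                 # rebuilt on each run, which is why Docker matters here
  args:
    - ${{ inputs.who-to-greet }}
```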
But this means that as we're modeling threats and trying to figure out a more enterprise-y policy internally, there's a lot more stuff you need to be aware of, because there are three types of security concerns.

There are two points I'd like to highlight for JavaScript. The first is that it likely relies on dependencies in npm. Most likely this is covered by your company's internal everything-repository, so we need to make sure that repository is configured for use on the runner image and every other container in that pod. And because the runner pods are ephemeral, you're gonna be pulling from and hitting that mirror quite a bit. The next is the possibility of script injection from your users. Two things to consider here are the trustworthiness of the action in use, a straightforward problem but totally out of scope for today, and the fact that your users can pass arbitrary input into a GitHub Action. Sometimes silly, sometimes malicious (I like to assume silly a little more often), this is a decent way to try to escape your container or otherwise partake in security shenanigans.

Script injection is also a concern for composites, where you can run inline scripts and call arbitrary binaries. Do you allow users to use sudo? In the more VM-like GitHub Actions patterns, you can alter your environment at runtime, which is not really all that safe to do in a container. Same for granting privileged access, installing software, messing with mounts. This can get really dangerous really quickly, especially when you're adding privileged pods. And then for containers, you have all the regular image provenance concerns: what's going on in that build, all that good stuff. And I kind of saved the best for last here: Docker-in-Docker requires privileged pods. But most importantly, you're not just allowing privileged execution of random user input, right? Right? No one does this, right?

Because that's the next thing, and the first thing I really like to talk through when people want to build their own in-house implementation of Actions on-prem: do you really trust your neighbors? And who are your neighbors? Zero trust is not a default vanilla setting; there's no one button to click and you have zero trust in Kubernetes. It's possible, but it requires skilled work and upkeep to pull off. Namespaces do a pretty good job of providing resource quotas, sharing secrets, setting policies, all that good stuff. But the team running this will need access to all of that, and it becomes really tedious to manage at scale, going back to security and friction. The opposite direction you can take also has problems: cluster sprawl is a thing. You're adding extra infrastructure to inventory, to secure, to patch, to maintain, to stream into your SIEM, all of that good stuff. There's no right answer to this balance. I just want you to think through the risks of multi-tenancy versus managing lots of single-tenant clusters.

And I found all the privileged pods. When I was learning Kubernetes a few years ago, there was this moment reviewing all of the documentation and the security material where I thought, well duh, of course you don't run stuff as root. Duh, didn't we already solve this with SELinux and AppArmor and the other fun mandatory access control things we built into the kernel for exactly this? It turns out all of the privileged pods are running various continuous integration systems in Kubernetes.
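For anyone who hasn't stared at one of these, here's a sketch of the pattern I keep finding. The pod name is hypothetical; the one flag is the whole story:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: runner-with-dind        # hypothetical
spec:
  containers:
    - name: docker
      image: docker:dind        # nested Docker daemon sidecar
      securityContext:
        privileged: true        # all capabilities, host devices, no seccomp: most isolation gone
```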
CI is basically the only place I see privileged pods. And I'm not here to tell you it effectively removes all the fun process protections, because we all know that. And I'm not gonna tell you "thou shalt never use privileged pods," because you have compliance frameworks for that. It just never seems like it was a risk deliberately thought through and chosen when I ask, hey, did you mean to run this as privileged? And it doesn't have to be this way.

So no talk about this is complete without the big icky gross thing in the room, which is Docker-in-Docker. And it is super, super risky. It provides the most SaaS-like experience, though. Users want this for two reasons: either they're building containers directly, or they're using Docker actions. But there are some alternatives and some compensating controls to think through that might work a little better for you. The first is: don't use Docker container actions, which is easier said than done. Next, using VMs for working with containers; I'm not here at Cloud Native SecurityCon to tell you to revert to virtual machines, but don't throw it off the table immediately. My small contribution to this space is a rootless and sudo-less Docker-in-Docker pod, but it is still privileged. We're gonna talk a little bit about Kata Containers and Firecracker, then about the runner with Kubernetes jobs, and then container builders. So there's some worthwhile work, there's some hard things to do, and there's some easier things to do.

So, rootless. It really is rootless, and it provides some coverage by removing the user's ability to run arbitrary stuff as root. You can see here that we try to mount something and we get a no; we try sudo and sudo's not there; and we can't su to root. But docker run hello-world still works. However, there are some significant trade-offs in this approach. On the upside, again, we're preventing a lot of that user silliness. There's still a shell, though, and some common utilities that I would consider risky in this context, things like curl or wget, are available in the pod, because that pod is going to want to run a GitHub action that might curl something down or install npm packages. And it makes a runner image that generally just works, both for Docker actions and for building containers more generally. But it's still privileged, for all the same reasons that regular rootful nested containerization needs privileged access. Next, there's no elevated access inside the container, so any software or configuration changes will need to happen before runtime. In this setup you cannot say, hey, yum update or apt-get install the thing I need to build my software. This means users should use this image only for working with containers, and otherwise rebuild the image to add more software. And at the very bottom of this slide, I have an example Dockerfile you can look at for this.

And it's normally at this point in that architecture review where someone in the room says, "just use Firecracker, it works for AWS," or any other of the make-Kubernetes-orchestrate-virtual-machines technologies. And I am not here to dismiss it at all. It's a solid idea. VMs are much, much more isolated, both from each other and from the host, than containers are. However, here's where I've seen teams going down this route have a difficult and insecure time.
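On paper the happy path is short. A minimal sketch, assuming a cluster that already has Kata Containers installed with a runtime handler named kata (the pod and image names are hypothetical):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata                   # must match the handler configured in your CRI runtime
---
apiVersion: v1
kind: Pod
metadata:
  name: runner-in-a-vm          # hypothetical
spec:
  runtimeClassName: kata        # this pod now runs inside a lightweight VM
  containers:
    - name: runner
      image: registry.example.com/runner:v1.2.0-20230131
```

The paper cuts show up after this point.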
The first obstacle is a little bit industry-specific for me: there aren't many managed providers in this space, doubly so if you need specific compliance certifications like, say, FedRAMP. That means the team is likely running their own cluster on existing and authorized bare metal or VM infrastructure. That's not a problem in itself; it just adds cost and complexity that isn't always immediately obvious, and it increases how much of the platform security is, again, on your team. The next thing the team faces is all of the paper cuts you encounter because Kubernetes expects to manage containers, and the petty hill I will die on is that containers are not VMs. The most common ways of addressing these paper cuts ultimately undermine the isolation that drove this architecture decision to begin with.

The first thing I like to talk through is shared files and mounts. Many workloads rely on secrets, config maps, service account credentials, all of the things that are normally shared as mounts, and sharing a mount into a VM is significantly more difficult. And while firecracker-containerd has a CNI-compatible network plugin, and it's super, super cool, not every pod-as-a-VM technology has this, so keeping your isolation boundaries intact can be a little difficult. And then insecure things on the administration side are still quite possible. The first thing I see with teams going down this route is disabling or unsafely altering seccomp. You can also drastically oversubscribe your physical resources. You can bridge networking between your pod-VM thing and your host, so you're not as isolated as you really want to be. And then lastly, and most shockingly to everyone: if you do insecure things on a VM, it's still insecure. What I mean here is disabling SSL verification (my personal pet peeve), messing with your software repositories, downloading unmanaged dependencies, piping stuff to a privileged shell. More importantly, this is the foundation of your company's software supply chain, the way you're building your software. If you're publishing to an internal registry, writing to a shared mount, or otherwise persisting data from an insecure platform: yes, it's a VM, yes, it's more isolated, but you're still doing dumb things. And lastly, I'll call out that this is a fantastically documented project, and they ship improvements super duper fast, which is amazing and also sometimes difficult for some enterprise teams to truly keep up with. Not a bad idea; actually a pretty good idea. Just lots of time, money, and smarts go into pulling it off.

However, just like in this picture, there is a lovely paved path that does not require Docker-in-Docker. We call it the runner with Kubernetes jobs. It's a little longer up front to use it, so instead of cutting that corner through the muddy path, you're staying on the paved path and keeping things clean. It uses hooks (see the link in the slide) and a service account to un-nest Docker-in-Docker and shove that into its own non-privileged pod running in the same namespace. There are some trade-offs here, though. The first is that Docker actions don't work as-is. If you want a Docker action, what I normally see people do is build that Docker container, keep it in an internal registry, and use a composite action to docker run it with all of the inputs and then capture all of the outputs. And container builds will still need something like Kaniko.
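For the shape of it, here's a minimal sketch of turning that mode on in the current CRDs. I'm going from memory of the ARC docs here, so treat the exact field names as assumptions and double-check them against the repo:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: k8s-mode-runners
spec:
  template:
    spec:
      organization: my-org            # hypothetical
      image: registry.example.com/runner:v1.2.0-20230131
      containerMode: kubernetes       # job steps run as sibling pods via the container hooks
      workVolumeClaimTemplate:        # shared workspace between the runner and job pods
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard    # hypothetical storage class
        resources:
          requests:
            storage: 5Gi
```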
But most importantly: just use things for what they're meant for, and Docker-in-Docker is not what it was meant for. So, avoid privileged pods. Friends don't let friends run privileged, and we're all friends here. Segregate the workloads that need privileged access. I can't tell you that you'll never, ever run into a legitimate business case for a privileged pod, but when you do, maybe put it in quarantine. The runner with Kubernetes jobs is super duper cool and you should use it. If you absolutely, positively have to run Docker-in-Docker, and maybe sometimes that's okay, the rootless and sudo-less image is probably a good way to go. And then multi-tenancy, especially poorly-thought-out multi-tenancy, can be quite hazardous, and it can come at the expense of security.

So let's talk a little bit about the ins and outs of multi-tenancy in ARC. Your security boundary is going to dictate the right group to place your runners and your image at. You can be as broad as you would like and as narrow as you need. Runners can exist at the repository level, the organization level, or the enterprise level. But fundamentally, every machine that can access GitHub on TCP port 443 can be a runner, and this is both absolutely fantastic and a reason you might need to think through your security boundary a little bit.

For the controller authorization, there are two authentication methods. There are GitHub Apps, which you should always use when you can: they provide granular control and they have higher API rate limits. However, in this case, again going back to current Actions Runner Controller versus the very near future state, Apps are only available for repository and organization runners. If you want enterprise runners, you're using a personal access token with the enterprise admin scope. It is a narrow subset of the enterprise permissions, but it is still an enterprise administrator. So when you see that checkbox list of API scopes, please don't ever just check all the boxes; you only need the one in this case. And it's the only authentication method for enterprise deployments. Apart from that, there's no difference in functionality for Actions Runner Controller.

The reason I really wanna spend a little time on this is that, being a community project, it did what was reasonably expedient and easy: the authentication is stored as a secret in the controller's namespace, normally actions-runner-system, and it's passed to the pod to register it as a runner. So please do not give that token or that GitHub App more than it needs. And I would like to highlight that this is changing to just-in-time authentication very soon. Lastly, this is not the token that GitHub Actions uses at runtime; that's already a just-in-time generated token.

So in practice, this is what multi-tenancy tends to look like. You have a shared cluster, and a shared Actions Runner Controller can manage all of these deployments. Our cluster controls hardware management, ingress, egress, log forwarding, all of the good cluster-wide things. Our namespaces set resource quotas, pod admission controllers, security policies, network policies, all of the good stuff there. And then our deployments control the scope: who can use it, the image to use, vertical and horizontal scaling, and shared mounts. And you always have room to grow. As for recommendations: the wider you go on the scope, please default to safe, and allow and govern your narrower deployments as needed.
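To make that namespace layer concrete, a minimal sketch using the built-in Pod Security admission labels and a resource quota; the tenant name and numbers are hypothetical:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a-runners                              # hypothetical tenant
  labels:
    pod-security.kubernetes.io/enforce: baseline    # rejects privileged: true outright
    pod-security.kubernetes.io/warn: restricted     # nags about root, privilege escalation, etc.
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a-runners
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    pods: "20"
```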
Having been in this administrative enablement position in a very regulated environment, I promise you that picture is absolutely accurate when someone says, "I really, really need root access to something." Empowerment in this situation comes in the form of "no, and here's how I'm going to help you do this safer."

So let's talk a little bit about that runner image. A typical runner, and like I said, most users build their own, is based on a broad OS tag. This is actually not that bad: the base software is updated for you beforehand, so you don't have to bloat your image up running apt update or yum update and then trying to clear the apt or yum cache out. Again, this is kind of a weird use of containers. Big, big, giant red flag if there's no user account set up in that Dockerfile. Friends don't let friends run as root. You're going to set up the runner agent and all of its dependencies, copy in some shims, and then sometimes you're going to include an init process, dependencies that your users expect, cached tools, cached dependencies, all of that good stuff. I saw a couple of people go "ugh" when I said they're putting an init system in a container, and I know that's a controversial opinion. However, keep in mind that an init system is actually kind of good in this use case, because your runner agent is already a process, and it's going to kick off other processes as well, so you're going to have multiple things in that container. It's very, very common to have an init process in these. And yeah, this is starting to sound a little bit like a VM, isn't it?

Here are some examples to get you started. Actions Runner Controller currently builds three runner images, all based on Ubuntu. Most people will either use them as-is or build their own internally. There's a runner without Docker, there's a Docker-in-Docker runner, and then there's the rootless and sudo-less Docker-in-Docker one. And at the very bottom there's a link to tons of other community projects where people have built their own images.

Things you may have forgotten in building your own image: your in-house repositories need to be configured for all the things you're using, and they need to be up to date, or at least reasonably up to date. And if you have self-signed SSL certificates or intermediaries, custom root CAs, whatever you're doing, if you're messing with SSL, just keep in mind that the number of developers who notice your SSL interception, let's call that X, increases the number of jobs that bypass or disable SSL verification by like 10X. Because it's easier for somebody to look at Stack Overflow and say, oh, there's this magic flag, -k, that I can just put onto curl and my job will work again and I can get on with my job, without realizing that's the flag that disables SSL verification.

Lastly, your logging is easy to overlook. There's a super easy button, so let's talk a little about what you want to know from logging. There's built-in audit log streaming, and that captures the things that were done: Git events, package publishing, creating and editing pull requests. Super easy: hit the button, stream it to your SIEM, you're done. A little less easy are the Actions run logs, the console output from each run. There's an API to interact with these, and you set the retention policy for them.
However, for Actions Runner Controller itself, both the manager and all of the pods, logging piggybacks off of your Kubernetes settings, so make sure you don't overlook that. And lastly, this is a quick screen grab of the default logging, where the agent is just catting some stuff out to the terminal, more for debugging than anything else.

Sharing your mounts is not caring. Here's why you see shared mounts a lot in this use case. Inner and outer Docker don't share a build cache. Your setup-language actions add runtime and network ingress and egress: if you're using actions/setup-python, it expects a bunch of stuff to be there, namely the thing you're asking it to set up, and if it's not there, it reaches out and installs it. So sharing mounts is very popular there. Your IO-intensive builds can get super fast disks. And really, at the end of the day, rate limiting is very expensive. I once accidentally rate-limited a very large company running docker pull; for the next six hours or so, no one could pull any Docker images. It was not a fun day. But more importantly, your shared mounts aren't always version controlled, and there's always that risk of accidental data persistence at the intersection of privileged pods and writing data to disk. So think that through if you want to go down that path.

Roughly speaking, a pipeline to manage your runners is gonna look like: build it, scan it, SBOM it, sign it, tag it, test it, ship it. GitHub Actions is super modular, so swap in whatever you have in place for scanning, signing, tagging, deploying, all that good stuff. I do wanna talk a little bit about building and tagging, because these are both kind of interesting in this use case. Keeping the Dockerfiles for these containers, the deployment YAML, and any other files you're adding in a shared innersource repository is having a remarkably powerful bottom-up influence in the most regulated environments. A Git repo is of course a history of who requested, reviewed, and approved which changes and when, but more importantly: "my project needs whatever framework, and this other team already has this image." "My build is failing on an image tag; here's a PR with tests to prevent this going forward." "Show me all of the images that are vulnerable to the CVE du jour." In GitHub, that last one is really, really simple: you search the CVE, and then you search whatever tool would generate that report. And then for tagging: tagging by semantic version and date is this one weird scheme I've only ever seen in this use case, but I've seen a couple of users arrive at it organically, and it seems to work well. They bump the semantic version for software changes and the date for routine rebuilds. And just like I said, if you take nothing else from this: never use latest.

Balance your permissions by your scope. The user should be able to ask, hey, I need to change my runner, I would like a new image, I need a network exception for something, at the scope they're asking it for. I'm very fond of pushing this process through a pull request. That way I know which user requested it and who reviewed it, and you can always add multiple reviewers to a pull request. Everyone is a stakeholder, and more importantly, in a large company everyone has an SLO to meet here. No one should have to wait too long.
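To tie the pipeline and that tagging scheme together before we wrap up, here's a minimal sketch of the build-and-tag job as a workflow. The scan, SBOM, sign, and test steps are elided, and the registry and version are hypothetical:

```yaml
name: build-runner-image
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 6 * * 1"              # weekly rebuild just to pick up base image patches
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and tag by semver plus date
        run: |
          TAG="v1.2.0-$(date +%Y%m%d)"   # bump semver for software changes, date for rebuilds
          docker build -t registry.example.com/runner:$TAG .
          docker push registry.example.com/runner:$TAG
          # deliberately no :latest tag, ever
```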
The biggest risk in this foundation of your software supply chain isn't that user asking to include a small utility or to have privileged access for their one build. It's that user getting frustrated waiting, having that process-heavy IT service management ticket to nowhere, and having to wait another month after that for the next change management window. So they get frustrated, they give up. They configure their own build server. They disable SSL certificate verification. They copy in their own bootlegged version of that utility that's never gonna get updated again, because no one knows it's there. Just to get things done and move on with their job. Reducing your friction makes everyone happier and safer. We're all grownups, and grownups don't ask for random things, not without a reason. Be thoughtful and take care of your team; they'll take care of you. Questions? I don't know what time we're at; I think we're right at time.

So the question, sorry, because I don't think anyone could hear that, was about the future of Docker actions within Actions Runner Controller. And the answer is: I don't know of anything coming to make that easier. Right now, using the container hooks is the recommended solution.

The next question is: does Actions Runner Controller issue just-in-time tokens? No. However, what you're asking about, whether that pod has a just-in-time token to clone a repository or publish to GitHub Packages or something like that, is actually controlled by GitHub, either github.com or Enterprise Server. The authentication there is generated by the Actions side of things, not Actions Runner Controller. The token that Actions Runner Controller uses is something that you give it, and right now it is a long-lived credential. That is changing to just-in-time access as well; however, that is not the current state of the project. If you go to the repo, you can actually see where we're adding just-in-time access support and all of that good stuff now.

I think we might have time for one more question if there's one. The question, sorry, just repeating it, is about logging in GitHub Actions, specifically whether the Actions Runner Controller logging can piggyback off of GitHub settings. I don't know; I would encourage you to open that as an issue in the Actions Runner Controller repo, because I like the idea. I think the hard part is, if we look at this slide, these things all generate logs from a different source. The audit logs come from Enterprise Server or github.com. The Actions run logs are generated by the runner and then sent to GitHub to say, hey, we passed, we failed, here's how you view it in that nice pretty window. But I don't know about the future of Actions Runner Controller logging.

All right, any more questions? I will be floating around here the whole event, so always happy to chat.