Most of that time was active duty military. I was an Air Force guy for 17 years, which is funny given that I'm here talking about submarines. After 17 years active duty I did civil service as a government civilian for a few months, hated being a politician, so I got out and started the company with a few other folks. While I was in the Air Force, most of my time was spent on satellite communications, radio terminals, that kind of stuff, but I also did platform development, software development, and a few other fields in that area. During that time I got to deploy Kubernetes on what we call JWICS, which is a top secret network on the government side. We deployed it on classified secret networks, we deployed it on fighter jets. So we got to put Kubernetes in a lot of different places. A few weeks ago I had a chance to have a one-on-one with Kelsey Hightower, and I told him about some of the places we had put Kubernetes, and his response was, "That's terrifying. You know there's bugs, right?" I'm like, I know, we know.

But as you'll hear, most things in government tend toward redundancy. One of the things we believe in strongly is declarative state: declaring an environment the way we want it to be, through code, which lets us reproduce it over and over. Within a submarine, as in many weapons systems in government, there is redundancy. We expect things to go wrong. We know Kubernetes has bugs; we know all the software on top of the Kubernetes ecosystem has bugs. That's okay. We expect things to go wrong, so we build in redundancy to accommodate it.

I've spent the past two decades doing air gap work and living in this problem space. The CNCF landscape, the Kubernetes landscape, has grown huge in the past few years, and there are so many great tools, but a lot of the systems we support in government and other really restricted spaces are air gapped. So the question really is: okay, air gap, what's the big deal? I've heard this a bunch from really smart engineers: "You just need a registry and a couple of other things and you have air gap solved." But there are things people take for granted if they've never had the pleasure of working with no internet connection at all. You can't Google things. No memes, though somebody did tell me there are actually memes on top secret networks, very old ones, so that's something. No Stack Overflow, no Stack Exchange, no GitHub. All the tools we use as SREs, as engineers, you just don't have them. And the submarine is the most extreme case, because it may be submerged with no connectivity at all beyond some very, very low frequency communications. Air gap is a problem you don't know the pain of until you've really experienced it. You can simulate it by killing your Wi-Fi for a while and seeing how good you are at deploying things and updating software. It gets really tricky really fast.

Here's the kind of issue you run into. The first time I tried to deploy Kubernetes onto a top secret network a few years ago, I was on Red Hat, so I had YUM, or DNF. I could actually yum install kubeadm. I was like, sweet, Kubernetes is on top secret, this is awesome. It ran, but I couldn't actually initialize a cluster, because kubeadm tried to reach out to Google to pull the images.
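If you've never hit that wall, here's roughly what the workaround looks like: stage the images on a connected machine, carry them across, and load them before kubeadm init ever runs. This is a sketch, not a recipe, and the exact image list varies by Kubernetes version:

    # On a connected machine, list and pull everything kubeadm needs:
    kubeadm config images list > images.txt
    xargs -n1 docker pull < images.txt
    docker save $(cat images.txt) -o kubeadm-images.tar

    # Carry the tarball across on approved media, then on the air-gapped node:
    docker load -i kubeadm-images.tar
    kubeadm init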
So I'm like, well, that's not super helpful. It was packaged, but I couldn't actually run anything. On a different box I tried to install Docker, and I couldn't get it to work because some of the dependencies Red Hat expected weren't the right version; they were mismatched. So one thing you run into is this expectation that binaries will just run, and when they're dynamically linked, when you have lots of dependencies, you get what you may have heard called dependency hell: you're trying to find the right dependency, this thing needs these three dependencies, those are incompatible, and how do you trace that back? There are tools in CentOS and Red Hat land, and Debian has a few as well, that let you walk dependency trees and find out what you need. But then you have to stitch it all back together and hope you're not conflicting with some other dependency needed somewhere else. And yes, there's Snap and there are other tools that can assist with this, but this problem of dependencies from dynamically linked libraries is actually a big deal in the air gap, because you don't really know what you have available to you, and it's probably out of date. As the community moves toward SBOMs and supply chain security models where we try to track back all our dependencies, there are places we work with at Defense Unicorns where those systems are pretty old and pretty out of date, and it's really hard to get a handle on what's actually there, because sometimes we can't even see it. This problem is very prevalent in government.

One of the things I try to communicate to other SREs is that Kubernetes and most of these tools just don't care about images. What is Kubernetes? It's a resource scheduler. It happens to do that through a container runtime interface, and it happens to do that with images right now, but there's Crossplane, there are tools that use it for other things, for VMs in some cases, and a few others. There are all these things Kubernetes can do beyond images, and there's a CRI for that. There was a discussion a few months back on Twitter about why there isn't a registry built into Kubernetes, because that would make this problem easier. And many of the people who helped create Kubernetes, some of the folks from Google, said that's not part of the design; that's an external concern; that's what the CRI is for. Well, that's great, but the CRI client is the kubelet, right? Kubernetes talks through that interface to a container runtime, Docker or containerd or CRI-O, pick your poison, and the runtime does the actual work of figuring out how to get the image and how to trust the image. So one of the things we ran into is: I can mirror images into containerd, I can mirror images to some place, but if I want to pull something, I need to trust it, and the TLS trust actually sits at the host level, at the node. You have the node, you have the container runtime, and then you have Kubernetes as a client talking to that runtime, expecting it to figure out how to trust things, how to route things, how to get around firewalls and restrictions and all of that. It just becomes very complicated. And then if you think about a highly available multi-node system, you may have 20, 30, 40, 100 nodes. They all have to trust that same TLS chain, and they all have to be able to route to the registry.
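To give a flavor of the dependency side, here's roughly how you pull a full RPM closure on a connected machine of the same OS release and carry it across. This assumes dnf-plugins-core is installed, and the package name is just an example:

    # Connected machine, same RHEL/CentOS release as the air-gapped target:
    dnf download --resolve --alldeps containerd.io

    # Move the resulting RPMs across on removable media, then install offline:
    dnf install ./*.rpm --disablerepo='*'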
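And on the trust side, this is the kind of per-node containerd configuration I mean; every node needs the mirror and the CA wired in before a single image can be pulled. This is the older inline config style (newer containerd prefers certs.d drop-ins), and every hostname and path here is a placeholder:

    # Appended to /etc/containerd/config.toml on every node:
    cat >> /etc/containerd/config.toml <<'EOF'
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
      endpoint = ["https://registry.boat.internal:5000"]
    [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.boat.internal:5000".tls]
      ca_file = "/etc/containerd/certs/airgap-ca.crt"
    EOF
    systemctl restart containerd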
They all need firewall egress and ingress policies set for that registry, and they all need to trust it. This becomes a really complex problem. And if you look at Helm, Kustomize, any of the popular tools for working with Kubernetes, for the most part they don't track the OCI dependencies. They give you the manifests, the YAML, the definitions, the specs; they'll point to images, but you have to go find those yourself. Or operator patterns might not list them at all: an image can be embedded in the code, hard-coded into the binary, where you won't see it at all. We've seen this a number of times with different operators, and it's really frustrating, because we're trying to figure out what we need to deploy, and we end up guessing: deploy it in some online cluster, then go look at the image list to see what actually got pulled, or go dig through the docs and see what's documented. Now, Helm did merge a PR a couple of months ago, for a future Helm release, that will let you track this, and there are other efforts out there to track image dependencies; I think SPDX and others might eventually get there. But today, if you tell me, go deploy this thing, this Helm chart, this operator, in the air gap, it's up to you to figure out how the heck you do that.

So the real problem here (not the dog or the engine department on the slide) is: where is the registry? This is a problem we ran into with the submarine. We want to deploy assets, we want to push things into a registry, but into what registry? Kubernetes doesn't have one built in. Some tools like kind use a KEP, I think it's KEP 1755, to specify a registry pointer you can use, but you still have to figure out how to connect to it. It doesn't tell you how to authenticate or how to route; it just says, here's an entry in a ConfigMap that tells you where to go find one. And that's not really used by anything besides kind and, I think, K3s right now. So there's no standardization around this, because again, Kubernetes doesn't care about images.

So one of the things we run into is: okay, cool, you could just docker run registry:2, the famous Distribution spec registry. But how did you get that image? If you're in an air gap, did you docker save and docker load it? How did you get Docker? Are you really using Docker for your HA registry? Are you doing persistence and recovery? Are you using S3? Do you have S3? The submarine doesn't. If you take a bare metal system with no cloud services and you want to make it highly available and durable, the registry becomes a very important consideration, and this goes back to DNS, to routing, to TLS trust back at the node level, all these problems. A number of things have to happen to get it right. It's really easy to say "just docker load, docker save," but you may not have the Docker CLI available to you, or it might be the wrong version, an outdated version, as on a lot of government systems. So it becomes one more thing on your list of things you have to find a way to install and run.
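Just to make that chicken-and-egg concrete, the naive bootstrap is something like this, and every step assumes tooling the air gap may not have:

    # Connected side:
    docker pull registry:2
    docker save registry:2 -o registry.tar

    # Air-gapped side (assumes Docker exists there, at a compatible version):
    docker load -i registry.tar
    docker run -d -p 5000:5000 --name registry registry:2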
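And for reference, the KEP 1755 pointer I mentioned is just a ConfigMap in kube-public, roughly like this. Note that it's discovery only: no credentials, no routing, no TLS:

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: local-registry-hosting
      namespace: kube-public
    data:
      localRegistryHosting.v1: |
        host: "localhost:5000"
        help: "https://kind.sigs.k8s.io/docs/user/local-registry/"
    EOF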
Okay, so what do we do? For the submarine (and there's a link down here to the paper on Zarf we're going to talk about in a second), there is a highly detailed research paper done with the Naval Postgraduate School. These slides are available on the schedule, and I believe the paper is posted there as well. The paper, by Bridger Smith, really explores this problem in depth from an academic perspective. At the same time, I had left government and wanted to work on this problem exclusively. Again, I'm the Air Force guy, and this Naval Postgraduate School student wanted to research the problem. So we got together with a few other people; Bridger did all the proper documentation and research and put his paper together, and I just wrote some code around the problem. Then we built a team around that. His paper goes into the problem in real depth, down to hardware specifications, more than I'm willing to get into here, because I actually know what I can and can't say.

Some of the highlights of what we found: we wanted to be declarative, because the people we were targeting weren't going to be SREs. On a submarine, a sailor has many jobs, and there are many different things happening; running Kubernetes clusters is not their only job. So we needed something simple, approachable, and reproducible. And a very important thing for us was being able to destroy it and recreate it aggressively. Thinking about that problem, the first thing that comes to mind in Kubernetes land is K3s. We're huge fans of K3s, and K3s provides a lot of that function out of the box. It doesn't necessarily solve the registry problem, and it doesn't solve some other problems, but it's a single statically linked Go binary that includes kubectl, includes containerd, includes all these things vendored in for you out of the box. You just drop it in there and it's magic, right? So our first iteration with the Naval Postgraduate School was: what if we take K3s and wrap some tooling around it to make it nicer for the submarine user? That first iteration, last summer, was deploying K3s with some small YAML to define images, then packaging them up and putting them into a registry. What we literally did was lift and shift files into the directories K3s needed. And it worked. There were some limitations, and I've talked to Darren Shepherd since then about some of our decisions; he confirmed some of our suspicions about the durability of that process. But it worked, it was good enough, though there were a bunch of unknowns. The way K3s works is you drop manifests in a folder, you drop Helm charts in a folder, you drop images in a folder, and eventually K3s picks them up through its controllers and does magic. We wanted to be fully declarative, and we sort of got that, but we had no way to manage the lifecycle of those deployments and no way to track what was happening. So we were still a little blind, and that made us nervous, because we need to be extremely prescriptive and do this over and over. K3s gives us the limited-resource, small-footprint piece: perfect. And it does solve getting into the air gap, because you can bring a tarball of the images you need along with the binary itself, stick them on a machine, and it just works.
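Concretely, the lift-and-shift looked roughly like this; the directories are K3s's standard drop-in paths, and the filenames are just examples:

    # K3s imports any image tarballs found here on startup:
    cp app-images.tar /var/lib/rancher/k3s/agent/images/

    # Manifests (and HelmChart CRs) dropped here get applied automatically:
    cp app.yaml /var/lib/rancher/k3s/server/manifests/

    # Then run the install script against the binary you carried in
    # (assumes the k3s binary was already copied to /usr/local/bin/k3s):
    INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh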
So that kind of checks the box, but we needed an experience that was intuitive for a sailor: push button, create thing; push button, destroy thing. And to Kelsey Hightower's point, there are going to be bugs, and when things went wrong we could just destroy it all and start over again, even down to the data. The first app we did this with had a pretty large data set, I think 67 gigs for the smaller of the data sets we had to push in. So we even wrote some logic into the tool to let us declaratively shove the data in. It's rudimentary, and it's not what we want as a final end state, but this is what we pushed onto the submarine hardware for the test bed: Zarf, the thing we're going to talk about in a second, creating the cluster with K3s and then injecting the data declaratively. The goal was to give the sailor an experience that was push button but predictable.

So this is what we built. Zarf is an open source tool, Apache licensed. One of the very first things we said was that we wanted to be like K3s: a static binary, written in Go, because the tooling there is great and, obviously, CNCF. We wanted it to be declarative, and the only dependency we were willing to have was maybe systemd, and that's just for K3s. We wanted something we could compile and run anywhere with zero dependencies, because in the air gap we don't know what we're going to get. In some of these systems we have no idea what version of RHEL or Ubuntu or Debian we'll run into, or what packages are available; we assume we have nothing. In fact, parts of Zarf deploy as containers built from scratch, as we'll talk about in a second; one of the things that mattered to us was getting this as light and lean as possible.

So over time we defined what we needed. Obviously, for federal customers, SBOMs are super important; they're actually a White House mandate now. Syft is a marvelous tool, and we use Syft, packaged via Witness from TestifySec, to create the SBOM assets we need automatically. When you collect artifacts, it produces those for you, and I'll show you in a second what we did with that. We added some signing support, using sget for now, and we're also looking at Cosign for image signing and package signing. We do have some questions for the Sigstore folks on bundling in the air gap; there have been some PRs that support this, but I have questions around keyless in an air gap. I think there are still some pieces to solve to make that trust chain work.

Then we eventually realized that some of our customers wanted to deploy this not just onto the submarine but also into cloud, and they wanted to do it the same way. So we ended up rebuilding Zarf to not just push files into directories for K3s, but to talk to a cluster through a kube context and push what it needs automatically. We don't want to replace Flux or Argo or those tools; Zarf has built-in functionality to support a Flux or Argo model by deploying its own registry and its own Git server embedded in the cluster. So we can handle that declaratively, but really we wanted to let somebody deploy this application the mission owner was using onto the submarine and also into cloud, with cloud just being for testing. The submarine was the real thing, but we wanted to give them the same experience on both sides.
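For the SBOM and signing pieces I mentioned a minute ago, the underlying primitives look roughly like this if you want to try them standalone; Zarf wires the Syft part in for you at package-create time, and the image name here is a placeholder:

    # Generate an SPDX SBOM for an image:
    syft registry.example.com/app:1.0 -o spdx-json > app.sbom.json

    # Sign it with a local keypair (keyless is the open question in the air gap):
    cosign generate-key-pair
    cosign sign --key cosign.key registry.example.com/app:1.0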
We also had to do some work around image cataloging and a mutating webhook to make this cleaner, so we did that as well. There's a diagram you can't really see here, I know; it's on the web page in more detail. But the weird problem from before is: you need a registry, right? So how do you solve needing a registry when you don't have one already? Now, if you have cloud available to you, there are air gapped clouds; the government has top secret and secret cloud regions from various companies. So S3, for example, I could get to on a top secret network; it's just only reachable from that network. So we do have ways to use ECR or other services sometimes, but sometimes you just need to be able to create a registry temporarily.

So we have this crazy harebrained approach. We use a roughly 500 kilobyte Rust binary we wrote that gets injected on top of an image that's already there. We look across the cluster and ask: what images are already running on this cluster? We map them to the nodes, then create an ephemeral pod on a node using an image we know is already running there, and we inject this Rust binary into it and run it. We also inject a series of ConfigMaps, themselves split into 512 kilobyte chunks. The Rust binary checks a SHA sum, combines the chunks back into a single payload, extracts a tarball, and that gives us an ephemeral registry running in the cluster within a few seconds. So we have twenty-something ConfigMaps injected into this random ephemeral pod; once the real registry is running, Zarf deletes all of that and it's gone, and you're left with a registry running in the cluster. That gets around the chicken-and-egg problem for us. We looked at a lot of different ways to do this, and there are others. We tried netcat; we tried kubectl cp, which is just an exec wrapped around tar; we tried different strategies. We found that ConfigMaps were the only approach that wasn't going to be blocked by Kyverno or run into other policy issues, because it's just ConfigMaps, binary ConfigMaps. We tuned it so the chunks stay just under the roughly 670 kilobyte ceiling, about the largest blob you can fit once a binary is base64 encoded. So that's what we do now: inject the Rust binary, inject the stack of ConfigMaps, and go from there.
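You can sketch the ConfigMap trick with plain kubectl. Zarf's injector does all of this programmatically, but the mechanics are roughly this, with made-up names:

    # Split the payload (registry image plus binary) into 512 KB chunks:
    split -b 512k payload.tar.zst chunk-
    sha256sum payload.tar.zst > payload.sha256

    # Load each chunk into the cluster as a binary ConfigMap:
    for f in chunk-*; do
      kubectl create configmap "injector-$f" --from-file="$f"
    done

    # Inside the ephemeral pod, the injector reassembles and verifies:
    cat chunk-* > payload.tar.zst && sha256sum -c payload.sha256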
Okay, so time is weird right now. I know we're wrapping up in eight minutes and some of you will have to leave, so before the actual demo I want to pause for a second and give you a chance to ask questions. Feel free to shout one out or raise your hand as I start this. We'll try to keep this brief because I know you have other places to get to; sorry, the schedule's been kind of weird. Again, this is Apache 2 licensed. We're looking at donating it to the CNCF; we have some hurdles to get through to decide whether that's actually possible given some of our government customers' constraints on who can commit right now. So we're still working through some policy things on the government side, making sure we do right by the government customer, but we plan on keeping this thing Apache licensed. We as a company believe strongly in open source and we want to stay that way. So whether it's in a foundation or not, it will always be open source, and there will be no freemium model, no vendor lock-in, anything like that. We just want it to be free and open, and we're funded by our government customers to make it that way.

Okay, so there is a website, zarf.dev, which has instructions and more information about what this is, and of course the Git repo you can look at as well. Quickly, the way Zarf works: it's a static binary, compiled right now for Linux and Mac (we haven't cut a Windows build yet; it's on the backlog), and it can deploy to a local cluster like kind. So in this case I'll just create a kind cluster. Oh, that hurts, thank you. How do I make this... okay, is that better? Can you see that okay? Okay, so we'll create a kind cluster real quick and just do a zarf init.

I mentioned the SBOM stuff we built in. I had a frustration with SBOMs: they're cool, but how do you actually use them? Because it's a bunch of JSON. So we put together something that generates this little SBOM viewer. Now, the Zarf agent is a scratch-based image, so there is no SBOM for it because there's nothing in it. But for something like Gitea, as you can see, we can go through and look at the different assets and what's built in. This is something we built that runs in the air gap: a static HTML site that loads up the output from Syft, the SPDX packages that were created, and lets you search through for whatever you're looking for. We did that because we wanted users to be able to see exactly what they're getting before they deploy.

Everything Zarf does is driven by this YAML being dumped to the screen. These are actually all API calls you're seeing, and it's not going to act unless you say yes, so we'll say yes. I'm not going to install the other pieces right now because we're out of time. But as you can see here, it's now injecting those ConfigMaps into the cluster. While it does that, I'll open another window and make it bigger so you can see. We vendor a few things like K9s, because we love K9s. As you can see here, it just created that Zarf injector, which is the Rust binary with the embedded registry payload. It's loading that now, and that part's done, so it's bootstrapping a traditional Docker registry in the cluster. The seed registry is what we call that first one, and now it's bootstrapping the long-term registry. Let's see here... okay, the regular registry is booting up now. Typical. This is still the seed one; you'll see it roll over to the new one, just the regular Docker registry, the Distribution registry:2. The Zarf agent is the mutating webhook going through everything: if it sees an image, it rewrites it to point at the internal cluster registry; if it sees a Flux CRD for a Git repo, it rewrites it to match the Gitea server, with the credentials it needs. So you don't have to change anything to use Flux (right now we're only doing Flux, with Argo to come), and for any image, you just define it and it gets pointed back into the registry.

And then lastly, because I don't want to keep you, as this finishes we'll go over here to an example, if I can find it. So, one of the weird requirements we had from the Navy was that they sometimes have really old systems, and they asked, could we run DOS too? They wanted to run some DOS applications. I was like, okay, yeah, DOS, that's perfect.
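Before the DOS bit, it's worth noting the whole flow you've just watched is only a handful of commands. Flags are omitted here and the package filename is approximate:

    # Throwaway local cluster, then seed Zarf into it:
    kind create cluster
    zarf init

    # Build a package from a zarf.yaml, then deploy it:
    zarf package create
    zarf package deploy zarf-package-dos-games-amd64.tar.zst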
So one of the things we did, and I'm going to create a package now just so you can see what this is, is driven by the zarf.yaml. This is the one we built to show them DOS. I don't know if I can make that bigger; my zoom-in doesn't work in this editor, sorry, and that was totally the wrong direction. But the point is, we basically said, we'll show you DOS in a cluster, right, using cloud native tooling. So I'll go ahead and create this package, and it prompts me: do you want to create these things, with this version of Zarf, this build information? I say yes and let it do its thing; it handles things like the SBOM for us. And now we can do a zarf package deploy and find that package. We won't look at the SBOM this time; we've seen that before. So we'll let this deploy. We also did something called zarf connect, which is a way to do dynamic tunneling to different services, because again, we're trying to make this simple for the Navy user. Originally we just did Doom, because that seemed right, but then I felt like we really needed to upgrade it, so we added a few more; I'm sure Disney's mad at us right now. But the point is: okay, yeah, cool, you're playing Doom, but it's DOS, it's DOSBox, right? I wanted to show them DOS on the cluster, deployed in the air gap, automatically, streamlined, push button; that was the point, to give them that experience. And one of the things we did to make that better was zarf connect doom, because sometimes you want to go straight to Doom, right? That's all driven by YAML: annotations in the manifests we deployed define shortcuts that the user can zarf connect to.

And the very last thing I want to show you: it's hard to find images and things, so one of the tools we wrote inside Zarf is zarf prepare. What it does is build the Kustomize, template the Helm, try to read in the operator YAML, and fuzzy search inside all of that for images. It will template the Helm chart, build the Kustomize, or take your raw manifests, and go search through them looking for images. It also does a fuzzy match: if it finds something that looks like an image, it tries to connect to it to see if it actually is one, and if it is, it says, I think I also found this thing.
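For a sense of what drives that, here's a stripped-down sketch in the shape of the public dos-games example, plus the image-finding helper. Field names follow Zarf's package schema at the time; treat the details as approximate:

    # zarf.yaml (abbreviated sketch):
    cat > zarf.yaml <<'EOF'
    kind: ZarfPackageConfig
    metadata:
      name: dos-games
    components:
      - name: baseline
        required: true
        manifests:
          - name: multi-games
            files:
              - manifests/deployment.yaml
              - manifests/service.yaml
        images:
          - defenseunicorns/zarf-game:multi-tile-dark
    EOF

    # The helper that fuzzy-finds images in charts, kustomizations, and manifests:
    zarf prepare find-images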
So that's Zarf. Are there any questions? I'm sorry? So the question was about Kubernetes release upgrades. There are a couple of ways we deal with this, depending on whether Zarf is managing the cluster or not. For actual Kubernetes version upgrades, 1.22, 1.23, 1.24, in K3s there's a pretty easy path for us because of the way K3s manages upgrades. If you're talking about, say, some cloud cluster, Zarf is not maintaining that right now; if you provisioned it outside of Zarf, which is what we expect most cloud cases to be, we won't really get involved in the Kubernetes upgrade lifecycle. We're only really focused right now on clusters we manage ourselves, which, being K3s, have a pretty straightforward upgrade path. Are there other questions? Yes? Yeah, so no, initially it was just a K3s wrapper, functionally speaking. We later pivoted to handling manifests and images ourselves rather than having K3s do it for us. We've tested it on MicroK8s, kind, K3s, K3d, TKG, TKG Community Edition, OCP, EKS, AKS, GKE, and I think a few others I'm missing. We test daily on some of the local distros; our pipelines run kind, K3d, and K3s every time, and we also test regularly on cloud clusters. It just needs a kube context, basically. Any other questions? Okay, I'll let you all get where you need to go. Thank you so much for your time. Thank you.