Hi everyone. Welcome to the Flux maintainer track. Today we're going to talk about how you can use Flux to amplify your GitOps setup with OCI and Cosign. I'm Sanskar. I'm a Flux and Flagger maintainer. I work at Weaveworks. And my name is Kingdon. I'm also a Flux maintainer and I work at Weaveworks. This is a maintainer track, so I'm going to assume most of you are familiar with Flux. For those of you who are not, let me do a quick intro. Flux is a CD tool which lets you do GitOps for your Kubernetes clusters. What that means is you can deploy your applications in a Kubernetes cluster in a GitOps fashion by storing them in a Git repository or a Helm repository. It's a CNCF graduated project; we achieved graduation status last year. We have multiple integrations with Terraform and AWS CloudFormation, and there is a free and open source UI you can use as well to view your Flux resources. And we are being used by companies like AWS and GitLab for their own GitOps offerings, as well as companies like Orange and Deutsche Telekom for their 5G deployments. So, good news everyone: Flux finally has a GA release. Thank you. What that means is that the core GitOps APIs, that is the GitRepository API, the Kustomization API and the Receiver API, are all considered v1. Which means that there are going to be no breaking changes, so you can upgrade those APIs whenever you want, at your own pace. Flux, if you don't know, is built of multiple components and multiple APIs. We like to evolve and iterate on these APIs at our own pace so that we can give the best experience to you users. What that means is certain APIs are still not considered GA, which means they might have breaking changes. For example, Helm and OCI. Helm is considered close to GA. We are working very hard on it. It's tricky to get it right because Helm is Helm. So it's in the pipeline. 
There's also the OCIRepository, a relatively new API which lets you fetch sources from OCI registries, and then there are the image automation APIs, and then there's also the notification API which lets you do all sorts of alerting and updates to Slack and so forth. The basics of Helm with Flux: you have three CRDs that each map to a particular artifact. The HelmRepository maps to an index.yaml, and a HelmChart is an instance of a chart from that index. So if you know how Helm repositories worked historically, that makes some sense. It'll make more sense as we go on. The Helm controller, together with the source controller, applies the HelmRelease in the Kubernetes API. It's using the Helm SDK under the hood. Everything is compatible with the Helm CLI, so if you're using Helm already, it's a seamless transition. Also, the notification controller is involved so that we can send alerts when things happen. On the right you see there are various places that charts can come from: Harbor or GitHub, or that's ChartMuseum on the bottom there. These are the resources. The HelmRepository resource on the left, we said, maps to that index.yaml in the legacy version. This is our legacy HelmRelease and HelmRepository. On the right you see the sourceRef points at that HelmRepository. We have selected a particular chart, since legacy Helm repositories can store many charts and many versions, and we select a version. We're using a semver wildcard so that we get the latest version within a range. There's other configuration that you can do, but these are the basics. All of this has been working fine, but we have continuously run into issues reported by all of you, saying that Helm acts funky or there are certain problems; sometimes it's too slow. From what we have seen, the majority of these issues can be traced down to one root problem: the fact that index.yaml just does not scale. 
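Before moving on, the legacy setup just described looks roughly like this in YAML. The names, URL and versions are illustrative, and the API versions are the ones current at the time of the talk:

```yaml
# Legacy (index.yaml-based) HelmRepository: the URL points at the
# location that serves the repository's index.yaml.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  url: https://example.com/charts
---
# HelmRelease selecting one chart and a semver range from that repository.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 10m
  chart:
    spec:
      chart: podinfo
      version: "7.0.x"   # semver wildcard: latest version within the range
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
```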
The approach of having to store all your charts' information in one file is just not a very scalable approach. There are several reasons for this. First of all, it's because it's a YAML file, and YAML parsing is just slow. Matt Farina, a Helm maintainer, recently did a benchmark test comparing YAML parsing and JSON parsing, and JSON parsing was significantly faster. So you have to parse the entire index.yaml file, load it into memory, and then look up the exact chart entry that you care about. That's a lot of CPU and RAM that you are spending on that. Helm repositories can contain hundreds of different charts, and those charts can have thousands of different versions. That means you can end up with a very, very big index.yaml file, and there is no way to filter out the charts or the versions that you don't care about. So you end up downloading a lot of stuff that you just don't need. Lastly, verification requires provenance files. Provenance files have been around for a long time, but the problem with provenance files is that it's another file that you need to care about, another file that you need to manage and distribute. And these are not some theoretical predictions that we are making here; you can see these issues out in the wild. Bitnami has a very famous repository which has multiple Helm charts. Recently they had to purge a significant amount of their index.yaml because CloudFront was not able to serve it due to traffic limits. So what is the solution? Of course, it's scrapping the index.yaml file. How do we do that? So, how many of you are familiar with OCI? That's great. For those of you who don't know, OCI is this open governance body which standardizes everything related to images: how they are built, how they are packaged, how they are run, and, most importantly for us, how they are distributed. 
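To make "how they are distributed" concrete: pulling a chart from an OCI registry comes down to a couple of well-known HTTP endpoints from the distribution spec. Here is a rough sketch of just the URL construction, with hypothetical registry and repository names:

```python
def tags_list_url(registry: str, repository: str) -> str:
    """URL for listing a repository's tags per the OCI distribution spec."""
    return f"https://{registry}/v2/{repository}/tags/list"


def manifest_url(registry: str, repository: str, reference: str) -> str:
    """URL for fetching a manifest by tag or digest; the manifest in turn
    points at the blobs, which in our case include the chart tarball."""
    return f"https://{registry}/v2/{repository}/manifests/{reference}"


# Example with made-up names:
print(tags_list_url("myregistry.azurecr.io", "charts/podinfo"))
# prints https://myregistry.azurecr.io/v2/charts/podinfo/tags/list
```

This is only the addressing scheme; a real client also negotiates authentication and media types, which clients like Helm and Flux's source controller handle for you.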
So there is a distribution spec which standardizes how container registries work and how they are supposed to distribute images. The best thing about the OCI distribution spec is that it's supposed to be agnostic of content types, which means that it's built with images in mind, but you can store anything. You can store an mp3 file in there if you want to. So if you can store an mp3 file, why can't you store a Helm chart? This is a meme that I stole from Dan Lorenc on Twitter. It essentially boils down the OCI distribution spec into this one image, where you give it certain parameters and it gives you a URL which points to a tarball, and in our case this tarball contains a Helm chart. So basically what you can do is list the tags; the OCI distribution spec has a very good API for listing tags and fetching content. You can just list the tags you care about and go fetch the tarball that has your chart. So what does this look like in practice for users of Helm, people who package charts and publish them? `helm package` doesn't change at all. It generates a tarball, and that has the Chart.yaml and the metadata in it. `helm push` changes a little bit: now we're pushing to an OCI URL, that's a registry, on the right-hand side. There's no third step unless you're interested in provenance, which is that third step there: `cosign sign`, which we'll see in more detail. The spec has some changes. We've added `type: oci` and the `oci://` prefix to the URL. We've added a provider here, and that's for authentication. On the right-hand side we've also added `provider: cosign`, which is for verification of the provenance. So what are some of the benefits that you get when you use Flux, Helm and OCI together? First thing right off the bat is the fact that you have all your apps, images and signatures in one place: your container registry. You don't need to have 10 different tabs for several different things. 
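Put together, the spec changes just mentioned look roughly like this. The registry URL and names are illustrative, and the API versions are the ones current at the time of the talk:

```yaml
# OCI-backed HelmRepository: note type: oci, the oci:// URL prefix,
# and the cloud provider used for passwordless authentication.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  type: oci
  interval: 10m
  url: oci://myregistry.azurecr.io/charts
  provider: azure
---
# HelmRelease referencing it, with Cosign verification of the provenance.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 10m
  chart:
    spec:
      chart: podinfo
      version: "7.0.x"
      verify:
        provider: cosign
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
```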
You get passwordless authentication and keyless integrity verification; we'll touch on those in the next few slides. And most importantly, you can get rid of your index.yaml, which means that you don't have to spend as much CPU and RAM as you were spending earlier, which translates into cost efficiency. You don't need to spend that much on egress traffic, which again translates into cost efficiency, and you don't run into issues like Bitnami did with network bottlenecks due to size. Okay, so let's talk about how passwordless authentication works in this scenario. It uses something called workload identity. Workload identity is basically a cloud IAM role binding to a particular workload. What you do is you create a cloud IAM role which has access to your container registry. So let's say you're running on GKE and your container registry is on Google Artifact Registry. You can create a service account which has read access to that container registry, and then you can bind that service account to the pod or the node that is running in your Kubernetes cluster. Now, since you have bound this role to your pod or node, that pod or node also has access to the container registry and can pull images from that registry. The benefit right off the bat here is the fact that there are no secrets. There are no static credentials, so you don't need to manage all of that dangerous stuff. Flux's implementation integrates seamlessly with Azure, AWS and GCP; we're going to demo Azure today. And most importantly, it's native to Kubernetes. Workload identity uses service accounts and OIDC token projection, which means that you're part of the Kubernetes world; you deal with Kubernetes APIs. Right now, Flux's implementation only works at a global, controller level, which means that if you wanted to use different workload identities in the same source controller instance, you would not be able to, but work is being done right now to make that happen as well. 
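To make the binding concrete, on GKE it would look roughly like this: a Kubernetes service account annotated with the Google service account it impersonates. The project and account names are illustrative, and the GSA also needs an IAM policy binding (`roles/iam.workloadIdentityUser`) on the Google side. AKS and EKS use analogous annotations:

```yaml
# GKE workload identity example: bind the source-controller's Kubernetes
# service account to a Google service account that has read access to
# Artifact Registry. All names here are made up for illustration.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: source-controller
  namespace: flux-system
  annotations:
    iam.gke.io/gcp-service-account: registry-reader@my-project.iam.gserviceaccount.com
```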
So you will be able to specify the service account name on the HelmChart object or the OCIRepository object itself, with multiple workload identities working on the same source controller. You also get keyless integrity verification, which is great because you don't need to manage PGP private/public key pairs, which is again another elimination of the need for secrets, which is great because secrets are dangerous. How this works is that it's bound to your OIDC identity: you sign your chart with Cosign. Cosign is great because it uses OIDC behind the scenes, and all you have to do is sign it with Google or sign it with GitHub, and the signature is bound to your OIDC identity. Flux also supports matching against OIDC identities, like the exact identity. It's not out in a release yet, but the PR has been merged, and the next release will have this feature as well. What this means is that you can be super strict about where your Helm charts are coming from. So, for example, if you wanted to make sure that it is indeed our GitHub workflow which generated this chart and pushed it off to your container registry, we can do that. As you can see here, there's an issuer field which says token.actions.githubusercontent.com, which is basically saying that the issuer of this authorization token is the GitHub OIDC provider, and the subject is the repository that has the workflow which generated the packaged chart. Okay, so this is the workflow that we're about to demonstrate. In the beginning, a chart is pushed as it's released, the package step I described earlier; this happens in CI. Flux pulls it into the cluster automatically in the staging environment and deploys it. And you can do destructive tests, because this is a staging environment. In this case we're going to run Helm tests only in the staging environment; you can have different configuration. 
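Before the demo, the issuer/subject matching described a moment ago would look roughly like this in the verify section. Since at the time of the talk this feature was merged but not yet released, the exact field shape may differ in the final release, and the repository name here is illustrative:

```yaml
# Strict keyless verification: require that the chart was signed by a
# GitHub Actions workflow in a specific repository (values illustrative).
verify:
  provider: cosign
  matchOIDCIdentity:
    - issuer: "https://token.actions.githubusercontent.com"
      subject: "https://github.com/my-org/my-charts-repo.*"
```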
Assuming everything goes okay, this step triggers a dispatch event that creates a pull request, which a user can review and then merge, and at that point the upgrade proceeds in production in an attended fashion. So we're probably a bit too early for the demo, but we have a few more slides after this, so we're going to do the demo first and then we're going to do a recap and reassess the state of Helm and Flux. I'm going to first walk you through the file structure. Is this font size okay? Okay, great. So I have two clusters, staging and production. If you look at the file structure, there is clusters, and there's production, and there is staging. Both are pretty similar: they have Flux installed and they have one app, podinfo. So this is the HelmRepository for the podinfo registry. As you can see, it points to an OCI registry which is hosted on Azure Container Registry (ACR), and you can see the provider is azure here. This basically tells Flux to try to use Azure workload identity to access the container registry, so that the user doesn't need to provide any kind of secrets with static credentials. This is pretty much the same for both staging and production. So let's take a look at what the HelmRelease looks like for staging. Here, as you can see, it's pretty standard, but if you notice, it says 7.0.x for the version. This .x is basically a wildcard which says: deploy the latest patch release. So if the latest patch release is 7.0.2, deploy that; if it's 7.0.3, deploy that. It also has the verification stuff here, which basically says: make sure it's the GitHub OIDC provider which created the authorization token, and it's my repository here which is the owner of this workflow. We have a production HelmRelease file here. The only difference between the production HelmRelease file and the staging HelmRelease file is the fact that the production one is pinned to a particular version. There is no semver wildcard here. Which is great; which is what we want. 
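As an aside, the wildcard selection just described amounts to "pick the highest version matching the range". Here is a toy sketch of that rule for a `MAJOR.MINOR.x` wildcard; this is only an illustration, not Flux's implementation, which supports full semver ranges:

```python
def resolve_wildcard(range_spec: str, versions: list[str]) -> str:
    """Pick the highest version matching a MAJOR.MINOR.x wildcard.

    Toy illustration of the staging promotion rule: given "7.0.x" and the
    tags published so far, return the latest patch release in that range.
    """
    prefix = range_spec.removesuffix("x")  # "7.0.x" -> "7.0."
    matching = [v for v in versions if v.startswith(prefix)]
    # Compare numerically, so "7.0.10" beats "7.0.9".
    return max(matching, key=lambda v: tuple(int(p) for p in v.split(".")))


print(resolve_wildcard("7.0.x", ["7.0.1", "7.0.2", "7.0.3", "7.1.0"]))
# prints 7.0.3
```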
We don't want to deploy random stuff, or the latest release, into production. We want to make sure that we know what we're deploying into production. That's why it's pinned to a particular version, and it can't move without our explicit approval. Let's take a look at notifications. So this is a Provider. A Provider is basically a way for Flux to contact other external systems and let them know about Flux events, and Alerts are basically a way of defining which Flux events trigger a notification. As you can see here, whenever a Helm upgrade succeeds for this particular HelmRelease, it's going to create an alert, and that alert is going to be sent to a GitHub dispatch provider. A GitHub dispatch provider lets us trigger workflows based on certain events. And this is a GitHub workflow that we have. This GitHub workflow basically gets triggered by the GitHub dispatch provider. It's pretty standard; we're not doing anything revolutionary here. The most important business logic here is the fact that we are getting the version from the alert, so whatever is the latest version that got deployed in staging. Let's say that's 7.0.3. We take that version and we use yq to do some YAML manipulation, and we change the value of the version in the production HelmRelease file to that version. So we're basically changing the pinned version from 7.0.2 to 7.0.3 in the production HelmRelease file. And that's how we make sure that we never deploy the latest thing blindly: whenever we do deploy a new thing, it's with approval. And then we go ahead and create a pull request with that change. Okay, cool. So this was the basic intro to the file structure. I'm going to go ahead and create a new chart version and push that. I have a GitHub workflow which, whenever there's a new version tag, will basically package the chart and push it off to Azure Container Registry. 
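For reference, the Provider and Alert pair described above would look roughly like this. The repository address, names and the exact filter are illustrative, and the API versions are the ones current at the time of the talk:

```yaml
# Provider that forwards Flux events as GitHub repository_dispatch events.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Provider
metadata:
  name: github-dispatch
  namespace: flux-system
spec:
  type: githubdispatch
  address: https://github.com/my-org/my-fleet-repo   # illustrative
  secretRef:
    name: github-token
---
# Alert that fires when the staging HelmRelease upgrade succeeds.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: podinfo-promotion
  namespace: flux-system
spec:
  providerRef:
    name: github-dispatch
  eventSeverity: info
  eventSources:
    - kind: HelmRelease
      name: podinfo
  inclusionList:
    - ".*upgrade.*succeeded.*"
```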
So let me just tag this here and push it. Okay, cool. So let's see. Okay, cool. So there's a GitHub workflow which is running right now. Let's see what it's doing. Okay. It's packaging the chart. Right now it's logging into ACR. Cool. It signs the Helm chart. Okay, so let's take a look at Azure. As you can see, there's a chart tagged 7.0.4 here, and this is its signature here. Right, so now that there is a new chart in our container registry, what we expect from our clusters is that the new chart should be deployed in the staging cluster. So I'm going to go ahead, let me just make sure I'm on staging. Let me change to staging. And I'm going to go ahead and reconcile the Helm chart in the staging cluster. Okay, cool. So as you can see, Flux saw that there's a new version and it deployed 7.0.4 in staging. And we can verify that by doing a get on the HelmRelease as well. Right now it's pending; it's in progress. Let's do that again. Okay, cool. So now staging is running 7.0.4 of our application, which is great. So what this did was create an event which was sent to Flux's notification controller, and Flux took that alert that it got and triggered a workflow which created a PR, and we're going to go and check out that PR. So let's reload this. Okay, cool. So now there's a PR in our repository which contains our Kubernetes manifests. And what does it say? Oh, okay. So it says to change the production HelmRelease file from 7.0.3 to 7.0.4, which is essentially what we want. We want to make sure that all updates to our app versions happen with our explicit approval. So if I don't approve this PR and if I don't merge this PR, even though staging is running the latest thing, production will still be on the thing which we approved it for. So right now you could push another chart version, 7.0.5, and that would get deployed in staging as well, but production would still be on 7.0.3, because we haven't approved anything yet. 
So I'm going to go ahead and approve it, and it gets approved as well on his phone. Yeah, here we go. Okay, cool. I've approved it. So let's go ahead and merge this PR. Okay, cool. Now that this PR has been merged, I'm going to go ahead and switch to production, and I'm going to reconcile the Kustomization. Okay. So now let's see what the status of our Helm chart in production is. Okay. As you can see, it has pulled 7.0.4, because we merged the PR. And what's interesting to note here is that it also says "source verified: verified signature of version 7.0.4". And this is because we included that verification section in the HelmRelease. So the source controller, or rather Flux, makes sure that we always verify the signature of the artifact that we're pulling. If the signature was invalid, or if there was some kind of a problem, this reconciliation would not succeed. So let's go ahead and see what the status of our HelmRelease is. And here you go: now we have 7.0.4 in production. But the difference is that we explicitly approved this. It didn't happen on its own; Flux didn't pull 7.0.4 on production and deploy it by itself. We explicitly approved this via the PR workflow. Yeah, that's it for the demo. Okay. So let's recap what we've just done. This is a well-documented workflow; it's been around for a long time. The main things that we've added here today are the passwordless and the keyless parts, for OCI and Cosign. We have the advantage of Helm tests. It's a battle-hardened workflow and it's extensible, so you can use it on other platforms. I've heard of people using this with Azure DevOps; they use a generic webhook instead of GitHub dispatch. So this is great. But there is a drawback if you are not building Helm charts: it's a rather large investment to get started, and there are some very good reasons not to use Helm in 2023. We don't need to go into them. So what can you do if you don't want to use Helm? 
What Flux would suggest is you could use an OCIRepository instead of a HelmRelease. We haven't documented this yet, but if you follow the workflow, we'll update the docs within a couple of days, hopefully, to describe this. It's very similar; it's like three lines of changes. It's not really a big deal. So you get all the same benefits of keyless signing and identity using an OCIRepository. It's not well documented yet, and there are a few other drawbacks. The main one is that you won't have Helm tests. So if you were using Helm tests, or Helm for other reasons, the lifecycle features, the rollbacks, you'll use Git for that instead. And of course we also saw the semver wildcard promotion method, which we used in staging. This is portable across most of the source kinds in Flux. You can set a semver range on a HelmChart, an OCIRepository or a GitRepository, and it will work in all of those places. This is a great way to get releases out faster if you're trying to publish and iterate quickly without a lot of friction. You don't necessarily need that friction unless you want it. We know we want it in this case, so that's why we demonstrated it. But really, to be clear, this is a workflow for staging environments. I wouldn't recommend you do this in production without a great deal of ceremony applied some other way. One possible way would be to use Flagger canaries, which we're not going to go into today; we don't really have time. Does anyone here use Flagger canaries? Okay, that's cool. Yeah, great. So that's a faster workflow. It has less friction, but it's not really for production, because we want that friction for production. We want a manual approver. We want all of that. It's good. So as far as the takeaway for users: you should be using OCI for Helm everywhere you can, especially if you're using Flux. And that's really the one takeaway. But as a vendor, what do you do? Do you support both? 
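For completeness, the OCIRepository-plus-Kustomization alternative mentioned above would look roughly like this; names and the registry URL are illustrative, and the API versions are the ones current at the time of the talk:

```yaml
# Plain manifests published as an OCI artifact, with semver selection,
# passwordless auth and keyless Cosign verification.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  url: oci://myregistry.azurecr.io/manifests/podinfo
  ref:
    semver: "7.0.x"
  provider: azure
  verify:
    provider: cosign
---
# Kustomization that applies the contents of the artifact to the cluster.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: OCIRepository
    name: podinfo
  path: ./
  prune: true
```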
Do you publish to your legacy repository and also publish OCI, or do you just go with one or the other? That's not really up to us. From a Flux perspective, there's no need to support both; there's no valuable work done by index.yaml for Flux. But there may be other reasons, not relevant to Flux in particular, why you should support both. From an end-user perspective, use OCI if you can, because you're avoiding all that needless waste. You pay for that waste, and even if you don't pay for that waste, it's environmentally friendly to avoid waste. So what if your chart vendor doesn't support OCI? Well, you can talk to them and ask if they will do it, but maybe don't worry about it, because it's really not a big deal. Legacy is probably not going away. For years we've heard that legacy Helm repositories would go away; there really is no sign of that happening anytime soon. It really only affects you if it affects you. And the way to know whether it affects you is to monitor your performance. We have a great monitoring guide that's been updated; there's a link to it here. So go ahead and check that out and see if you have a performance impact. If you switch, it will be quick and clear. So that is our talk, and we can take questions at this point. This QR code will actually take you to it, and it'll be online in a short amount of time. But back to this slide here. Yeah, that's the link up at the top; that's the link you want. I'm just not really familiar with Helmfile; I'm not sure how that works. The question is: is there a locking mechanism? There's no lock file. Yeah, it's either automated or it isn't. If you want it to be pinned to a particular version, then you just pin it in the file. So here we have a wildcard, and if you want it pinned to a particular version, for an incident or something, or just to implement this workflow as we've done in production, it's pinned to a particular version. So there's no automation. 
It's not automatically promoting anything unless you do it. So Flux doesn't really work that way. Flux basically pulls the Helm chart and then it just deploys the Helm release, and it continuously reconciles against what is in the Helm chart and what is in the HelmRelease. There is no concept of a lock file where you are tracking what has been deployed at what time. It always fetches from the source: it fetches the chart, compares it with what's in the HelmRelease and the installed release, and that's all. So Flux is not committing back; the GitHub Actions workflow is generating the commit. It happens in CI; Flux doesn't do it. Yeah, but if you use image automation, then you can probably mix and match. You can have an image policy which says, use 7.0.x, and then the image automation controller commits back and pins it to 7.0.4, but then you need to use the image automation APIs. You can't do that with just the Helm controller. So it's basically an OCI artifact. I think he means the OCI signature. Are you talking about this one? So that's just the digest of the artifact. It's a .sig file; that's what Cosign generates. That's the signature of your artifact. You can take a look at the workflow. You mean with PGP? We wanted to demo keyless because keyless is just easier: you don't need to handle PGP key pairs. It's always easier to do it without keys. Flux also does not implement PGP verification; it's not implemented now. I think keyless here refers not to the Helm provenance, but to the Cosign private/public key pair. You can do Cosign keyful verification; yes, there are keyless and key-based workflows. You just need to store the key in a secret. Oh, there's a mic there. I didn't know that. Hello? How would someone integrate Flagger to do the production release upgrades in this case? In this case, they're really two orthogonal things. You can use Flagger wherever. 
You can use it in a staging environment or in a production environment. Flagger is not tied to Git or Helm; it's completely independent of all of these things. It's just about what's happening in your cluster. As soon as the Helm release upgrade succeeds and the deployment changes, let's say the app is a Deployment and the image version changed or whatever, Flagger would kick into action and start the canary process. It does not really matter what your Flux setup looks like; Flagger can be deployed agnostically. You don't need Flux to use Flagger, and you don't need Flagger to use Flux. They're completely independent, but they work great together. Just a question on the production deployment side: the deployment was approved in GitHub and then, obviously, the deployment happened. You didn't actually trigger anything; that happened automatically. The Flux system is pulling from Git, is that right, to get the latest changes? We also merged the pull request. After that. I guess the production system still needs to have Git access; you've got some Git access secrets installed in the environment. You need Git access because all your manifests are stored in a Git repository, so you need Git access. Do you have any other way of having OCI-backed access? You could do it with an OCIRepository. Right now the default bootstrap process uses Git, but you could definitely store all your manifests in an OCI repository instead of a Git repository and then do it from there. We do want to take Flux in a direction where OCIRepository and GitRepository are equivalent to each other, as in whatever is supported by GitRepository should also be supported by OCIRepository. If you did that, my question would be: then how does the Flux system know to detect that change? Is it constantly polling the source system, like the OCI repository in that case? 
It would be polling at the interval, or you can set up a Receiver, and the Receiver would work for a GitRepository or an OCIRepository in a similar way. Is that like a webhook or something? Yeah. In terms of dealing with legacy Helm repositories, I just wanted to call out, for anyone who didn't see it, there was a lightning talk on a utility to proxy legacy Helm repositories through OCI, so that might be a solution for that as well. Great. Awesome. Thank you. Hi. In terms of verifying the signatures on Helm charts using Cosign, you said it wouldn't reconcile if the signature verification failed. Is that always going to fail, or is there a way to audit that instead? Could you come again? Sorry, I didn't catch your question. So when verifying OCI artifacts using Cosign, when the source controller pulls them down, you said that it wouldn't reconcile that artifact if the signature verification failed. Is there a way to just audit that instead and output it in the log? I think there's an event that gets emitted, so you could look out for that Kubernetes event. But the purpose of the signature verification is to be sure that it is from the correct source, so I'm not sure that would be implemented as an optional thing. If the verification fails, then the reconciliation will fail. There is no "try to verify, but it's fine if it doesn't verify"; that's not an option.