So my name is Bartek, and I will be your moderator today. We have Stefan with us, talking about Flux and security. I would like to remind you about the code of conduct, and that we should wear masks unless we are speaking or drinking. I would also like to say that we have two seats over there for two people. Let's go. Two people. Come on. But these are probably groups of six, so you cannot split. OK. We'll have some time, hopefully, for questions. And if not, there is definitely time afterwards where you can catch Stefan, or you can use the Slack channel or the virtual hybrid platform meetings. There is also the Flux booth. I will talk about it. Flux booth, yeah. You will probably be there a lot. All day. With this in mind, let's start. Over to you, Stefan. Thank you. Welcome, everyone. Applause, by the way, first. Welcome, everyone. I'm very excited to be here. Two years of virtual conferences wore me down, and it's so nice to see everyone face to face. For those that don't know me, I'm Stefan Prodan. I work at Weaveworks, and I've been a Flux and Flagger maintainer for five years now. I'm very happy to talk to you about Flux and the security aspects of Flux. But first of all, if you want this shirt, please come by the Flux booth. You'll have to answer a short quiz, and if you are here, you'll definitely be able to answer it. So don't be afraid of it — it's really, really easy. Come by and get our "Flex your Flux" t-shirt. OK, so a little introduction to the Flux project. We have an organization called FluxCD, which is under the CNCF. We are an incubating project; we applied for graduation, and this is going well, so hopefully by the next KubeCon we will be a graduated project. The main project in the Flux organization is Flux 2. Why Flux 2? Because when we started Flux nearly six years ago, there were no CRDs, and there was no operator idea inside Kubernetes upstream.
So Flux version one was a single daemon: if you wanted to configure it, you did that at install time, and if you wanted to reconfigure it, you had to reinstall it, in a way. There was no dynamic configuration because we didn't have custom resources. Two years ago, we started working on Flux version two. Flux version one was a monolith — it did all the things, like talking to Git, applying things on the cluster, and so on. In version two, the flux2 repo holds the CLI and references to all the other Flux components. So Flux has now joined the microservices idea, where we have specialized controllers for everything — sources, Helm releases, and so on — and we'll talk about that a little bit. Another project inside the Flux organization is Flagger, which does progressive delivery for you. What it does is allow you to decouple the deployment process that Flux — or any other continuous deployment tool — does from the release process. Flagger works with any Kubernetes continuous delivery tool, or you can even do kubectl apply from your Jenkins job, whatever; Flagger works with any of that. So Flagger is not dependent on Flux, but it works great with Flux. Flagger works with networking, so you have to have some kind of ingress controller or service mesh. Based on that, Flagger is able to route traffic and slowly expose the new version of your app to your users. And the magic behind it: you can deploy on Fridays, and if it fails, Flagger rolls it back. That's the gist of it. This is what Flux looks like as a continuous delivery platform. It's made out of Lego pieces, so you can build your own platform on top of Flux; it's very flexible. We don't have a UI, so you have to build that as well. At the core of everything that Flux does are these source types.
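As an illustration (not from the talk itself), these source types are declared as Flux custom resources. The names, URLs, and credentials below are placeholders:

```yaml
# A Git source: Flux's source-controller clones this repo on an interval
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example/my-app  # placeholder repo
  ref:
    branch: main
---
# An S3-compatible source: works with AWS S3, MinIO, and similar
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
  name: desired-state
  namespace: flux-system
spec:
  interval: 5m
  endpoint: minio.example.com   # any S3-compatible endpoint
  bucketName: desired-state
  secretRef:
    name: minio-credentials     # access/secret key pair
```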
So the idea behind Flux, and the idea behind GitOps, is that when you want to do something on your clusters, across your environments, you don't reach out to Flux. You tell Flux what to do by committing YAMLs into different sources. And of course, being GitOps, we started off with Git as the main source of truth for your whole infrastructure, for the clusters that make up your fleet, and so on. But we've expanded Flux over time to different types of sources, so it's not only about Git, and Flux is moving into new territory — I'm going to talk about that a little bit. The main sources that we support today are Git repositories, Helm repositories, OCI container registries, and S3 buckets. And if you see S3 there, it's everything that's compatible with the S3 API — for example, you can use your MinIO cluster, store your desired state there, and Flux will do exactly the same thing. So it's not only about Git; Flux is no longer only about Git. The idea of a continuous delivery solution here is that, OK, you have the sources, but you can also send events to Flux. Flux is reactive; it reacts to all sorts of events. So it's not only "I'm pushing something to Git, then I'm going to wait for Flux to notice that change." You can ping Flux: we have a notification controller, and from your CI you can tell Flux, "hey, do something now," so it doesn't have to scan the container registry or poll the sources constantly — you get a faster reconciliation cycle without making Flux reach out to the sources every second to detect changes. So it has this event listener where you can tell it to do things. But Flux will not do anything besides what's described in Git. Even if you ping Flux — "hey, do this stuff" — Flux will do whatever you said in Git. So you cannot control it from outside through events; you can only tell it to do faster whatever you configured it to do. And the other side of events is that Flux emits events for everything it does.
We have a component which issues events inside the cluster, as Kubernetes events, but also outside the cluster. So in a way, you push changes to your sources, Flux does something with that change, and the result of that change Flux can post back to you — maybe on your Slack channel, Microsoft Teams, PagerDuty, whatever, even Alertmanager and so on. The idea here is that you have this continuous feedback loop. It's not just "I'm pushing something to my sources, then Flux does whatever and I have no idea what it did." It can tell you if something succeeded, and if something failed, it will actually tell you what failed. That's part of our observability story. We also expose Prometheus metrics, we give you Grafana dashboards, and so on. So we have a good observability story even without a dedicated UI. What I'm going to talk about more today is how you can extend Flux across clusters, and how you can make Flux aware of tenants running on those clusters and isolate them. So today we are going to talk about how Flux is made and which are the things that make up Flux; how we release Flux; how Flux deals with secrets — because if you store secrets in sources, in Git repos, S3 buckets, and so on, those secrets must be stored encrypted and secure, and we'll see how Flux deals with that. And finally, we are going to talk about multi-tenancy: is Kubernetes really multi-tenant or not? We're going to see how Flux works with different multi-tenancy models. At the end, we're going to talk about what is needed for Flux version two to reach GA, like a 2.0.0 release. OK, let's start with: what is Flux made of? I said a couple of things at the beginning, but there are many, many things making up Flux. Of course, we have Kubernetes API extensions called CRDs. Flux has no HTTP API.
Everything you do, you have to do through the Kubernetes API, in a declarative model, by storing these custom resources in some sources — or you can directly apply them on the cluster. But this is how you control Flux: through custom resources. So Flux extends the Kubernetes API with its own kinds. Of course, all these custom resources have to have something that operates on them, so we build Kubernetes controllers powered by Kubernetes Controller Runtime, which is a great SDK for extending Kubernetes. If you are thinking about building your own controller, I definitely recommend Controller Runtime and Kubebuilder. They are great tools made by the Kubernetes community, and they work really well. So we base Flux version two only on upstream Kubernetes components; our controllers are built with Controller Runtime. The Flux command line tool is akin to kubectl. It's built on top of the Kubernetes cli-utils library, and it feels and behaves like kubectl. All the flags that you are used to from kubectl — you can give it a kubeconfig flag, you can use impersonation — all these configurations work the same in the Flux CLI, so it's really easy to switch between one and the other. We have the same expressions as kubectl, and so on. The command line tool has many functions, starting with generating YAMLs for the custom resources of Flux. If you don't want to type your YAMLs, the create commands have an export function: you give them command line arguments, and they will write the YAML somewhere on disk, or apply it directly on the cluster. And we've seen a lot of people building all sorts of utilities around it — you can wrap it easily in a bash script or anything else you want to do. The command line tool is also the way to get started with Flux and install Flux on clusters. It can create Git repos, it creates deploy keys, and it works with the GitHub, GitLab, and Bitbucket APIs to allow your team members access to those repos.
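As a sketch of those two workflows — generating YAML with the export function, and bootstrapping a cluster — where the owner, repository, and paths are placeholder values:

```shell
# Generate a GitRepository manifest instead of hand-writing the YAML
flux create source git my-app \
  --url=https://github.com/example/my-app \
  --branch=main \
  --interval=1m \
  --export > ./clusters/my-cluster/my-app-source.yaml

# One-time setup: install Flux on the current cluster, create the
# repo if needed, and configure a deploy key so Flux can clone it
flux bootstrap github \
  --owner=my-org \
  --repository=fleet-infra \
  --branch=main \
  --path=./clusters/my-cluster
```

`flux bootstrap github` reads a `GITHUB_TOKEN` environment variable for the GitHub API; GitLab and Bitbucket have equivalent subcommands.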
And it exposes a simple command called flux bootstrap — maybe I'll have time to demo it — where you say "flux bootstrap this particular repository," and it sets everything up. Now, that's the first and the last time the Flux CLI has to touch your cluster. What that bootstrap command does is install Flux on your cluster, then set up the Git repo in such a way that Flux can clone it. So it generates a unique deploy key per repository and sets the deploy key through the GitHub, GitLab, whatever API you have. And then all the other actions you can do without access to the cluster, only through those sources. If you want to reconfigure Flux, you can reconfigure it in the Git repo, because the Flux definitions are stored in the repo. Flux is able to upgrade itself and so on, and we'll talk about that a little later. And for those of you who use Terraform to create your clusters, we have a Flux Terraform provider. So if you don't want to use our CLI to bootstrap Flux, you can use our Terraform provider: after you create your cluster, you can use that provider to set up Flux in the same way as the CLI does. Finally, Flux is made up of many Go and some C libraries. We use all the Kubernetes things. We adopted kstatus, which is a Kubernetes library for determining the readiness of a resource. That's why Flux, after it applies something on your cluster, is able to monitor what's happening with those resources. OK, I've modified the deployment — but what if the deployment fails with those changes? Flux is able to monitor that using this library and report back. "I've changed 10 deployments, one is failing" — and Flux will tell you which one is failing and why. For dealing with Git: in Flux version one, we were shelling out a lot to the Git CLI and so on, and from a security point of view, shelling out to other binaries is a very bad thing.
You need to mount files, SSH keys, and so on. So in version two, we decided that all the Flux components should be single-process daemons. They shouldn't exec anything; they shouldn't even have a shell installed in the container. For dealing with Git especially, this is quite challenging if you don't use the CLI. So we had to use two libraries. One is called go-git, which is a pure Go implementation of Git protocol version one. And for Git protocol version two — what Azure DevOps and AWS CodeCommit use — we had to use libgit2, which is a C library made by the Git team. And yes, that's why we statically build Flux now and include all these C libraries, which also include OpenSSL, libssh2, and so on. For making Flux integrate with other tools in the CNCF landscape — Kustomize, Helm, MinIO, OCI, encryption and decryption, and so on — we are also using only the Go libraries, the SDKs of these tools. So we don't shell out to anything. We don't use the Helm binary or whatever; we use the Helm SDK, we use the Kustomize SDK, and so on. And we try to contribute our changes, our improvements, back upstream. Also, for integrating Flux with cloud providers and all sorts of services — OK, I want to reach out to some container registry, ECR or whatever — we use the upstream SDKs for that. We always stay on the latest version of those SDKs and try to keep them all on the latest stable release. So the question is: how can a handful of people maintain hundreds of dependencies and tens of integrations across all the cloud vendors? And the answer is quite complicated. We dedicate a lot of time. We have seven core maintainers and 18 maintainers across all the repos. We have a very helpful community: every time we do a Flux patch release, if something goes wrong, in like five minutes someone will post on the open source Slack.
"Hey, I've tried this out on my staging cluster, something breaks." So yeah, the community has been very, very helpful. Slack is crazy, but that's life in open source. And even with a helpful community, even if you are very careful, whatever you do without automation, it's almost impossible to ship something stable every time. We roughly do a release every two weeks across all the controllers. There are six controllers, 12 custom resource definitions, and our own libraries — I don't know, 10 or 12 of them. So we focused a lot on automation, and I'm going to talk about that part now. This is the Flux release pipeline for the libraries, the controllers, and the CLI. We have unit testing, and we use envtest from the Controller Runtime library, which allows you to run a Kubernetes API server and an etcd instance inside your testing process. It runs those processes locally; you don't need a container or anything like that. All our unit tests target envtest, so we can test against different versions of the Kubernetes API, but it's not a full-blown cluster. You cannot run deployments there; what you can do is test that your custom resources and the logic of your controllers — the reconcile loop — are according to the spec. We've also integrated with Google OSS-Fuzz, so our controllers and our libraries are continuously fuzzed by the Google project, and I'm very happy about it; we found some issues that way. Of course, we build all our controllers and the CLI targeting all the OSes and architectures out there — AMD64, ARM64, ARMv6 for Raspberry Pi, and so on. And finally, we do full-blown end-to-end tests for everything, and if all goes well, we release either a library or a controller. But when we release a controller, that doesn't mean you can actually deploy it very easily on your cluster. Why?
Because we have all these controllers, but if we test each controller independently, how can we make sure they work nicely together? That's where the flux2 repo comes into play. We have a GitHub bot, and what it does is assemble all the controllers: once they are released, the bot puts them together in the flux2 repo, and then we have the release pipeline of Flux 2 as a distribution of our controllers. Here we integrate with security scanning — GitHub CodeQL and Snyk. Then we run end-to-end tests, so we test all the controllers together — the bootstrap, the CLI, everything — inside GitHub Actions, using the Linux and macOS runners. Now, the issue with GitHub Actions is that it doesn't have support for ARM64; you can have runners only for AMD64. We applied for credits — CNCF helped us a lot — and we now have a testing grid on Equinix Metal that we can use. We have our GitHub runners deployed there, hosted on those bare metal machines, and that's where we run all the end-to-end tests for ARM64. We have a large user base that runs Flux on Graviton2, on Equinix, and on other ARM64 clusters. So for us, ARM64 is as important as AMD64. We make no difference; we want to make sure everything works on both. And of course, we have a lot of home users running Flux on their Raspberry Pis, which are ARMv7. For those, we couldn't find a provider that would just give us Raspberry Pis, so there are Raspberry Pi communities which help us: they run their own tests, and if they find problems, they come back to us. In the last year we didn't have any issues; we had a couple at the beginning with ARMv6, due to the C libraries that we use and so on, but those are sorted. All these tests run using Kubernetes kind, and kind is a great way of running end-to-end tests because it runs a whole cluster in a single container.
So you only need Docker or something like that to spin up a whole cluster, and you can test on different Kubernetes versions. Flux supports, for example, Kubernetes 1.20 to 1.24, so we can test on all those minor versions and make sure we didn't break something — because the Kubernetes API changes a lot, and client-go and all the Go libraries change a lot. We really try to make everything possible to catch any kind of issue there. Finally, we do our cloud end-to-end testing on Azure, EKS, and GKE, and at the release step, everything that's part of the release is signed with Cosign, and we also publish a software bill of materials. These are the release artifacts — this is what happens at every Flux release. We have multi-arch container images, signed images, checksums, and deployment manifests. We also publish OpenAPI specs in JSON format for all custom resources. You can import those in your IDE — IntelliJ, GoLand, Visual Studio Code, and so on — and when you type the YAMLs, you get auto-completion and validation for all the Flux custom resources right in your IDE. We publish the CLI binary with packages — Homebrew, Arch, and so on. The Terraform provider goes to the Terraform Registry, and we also have a Flux GitHub Action that we publish with each release. OK, so back to security. What makes the Flux controllers secure? First, as I said before, we don't shell out to third-party binaries, and we don't depend on OS packages and so on. The code that runs in your cluster is only what we wrote. And because of that, we can seal Flux down: we drop all Linux capabilities inside the container, the root file system is read-only, and we use the default seccomp profile, the runtime default.
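As an illustration — a sketch of typical settings, not necessarily the exact upstream manifests — that hardening maps to a Kubernetes container security context along these lines:

```yaml
# Hardened container security context: no root, no privilege
# escalation, no Linux capabilities, read-only root filesystem,
# default seccomp profile
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault
```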
Our controllers don't run as root, and we use the Kubernetes impersonation API every time we do something on the cluster — I'm going to talk about that later, on the multi-tenancy side. Now, Flux versus the competition. As I said, most continuous delivery products out there allow you to plug in some binary or some script, and this is how you drive continuous deployment. From a security point of view, we felt that we don't want that in Flux, and in a way it limits Flux's features: we have to implement every kind of integration ourselves, because you cannot just throw a script or a binary into the Flux container and make Flux run it. So basically, Flux is statically built, with no shelling out to anything — no git, no kubectl, no Helm, no nothing. There is no HTTP API; you can only control Flux through the Kubernetes API, so RBAC and everything else applies to everything that Flux does. For every action Flux performs, it creates a Kubernetes event, so if you enable the audit log, and if you store these events in some external storage, you have a full trace of what Flux does, and you can compare that with your Git history. In the GitOps world we say, "hey, the audit log is the Git history," but a small change in Git can result in hundreds of actions on your clusters. So if you store both things and aggregate them somewhere, you have the single pane of glass where you see everything that's happening: this Git action triggered all these events. Flux's execution is predictable; there is no way to extend Flux with plugins or scripting. The way you extend Flux is to build new Kubernetes controllers, and we have an SDK called the GitOps Toolkit. You get started with Kubebuilder and Controller Runtime, then you import our libraries in your own controller, and that makes you part of Flux — you can easily extend Flux like that. One example is Weaveworks, which has created a Terraform controller that extends Flux into the Terraform territory.
So you push Terraform code to your Git repo, then there is a controller in the cluster that applies it, and it integrates with everything in Flux: it waits for other Flux things, it can trigger other Flux events, and so on. And there are other controllers, like a jsonnet controller and so on. OK, who trusts Flux? Flux is embedded in other distributions — Azure Arc, VMware Tanzu, D2iQ. Also, the US Department of Defense and the US Air Force are using Flux; it's part of Platform One. Deutsche Telekom has a platform for deploying Kubernetes at the edge for 5G, and that also embeds Flux. And there are many more examples, like SAP, Fidelity, and so on. So a lot of platforms opted to include Flux in their own platform, because Flux is not opinionated: you can pick and choose controllers. Maybe you only want to use the Flux source-controller and the kustomize-controller, and your platform doesn't deal with Helm. You can pick and choose any Flux controller, and it's easily integrated in your platform. I think that's our strongest point. We don't have opinions; we don't say, "hey, only like this can you install it, only like this can you use it." And that's why we see adoption here. OK, how secure is Flux? So people trust us, but how secure is it? We had a security audit sponsored by the CNCF and carried out by ADA Logics. They discovered some things; we had a great collaboration with them, and we addressed all the security issues that they found. Based on that audit, we put in place an RFC process: everything that touches Flux's security posture — say we want to add a new custom resource that changes the security model — has to go through this RFC process. That means Flux changes are not as fast, and new features don't land as fast as you may want, but we think that everything that touches the security aspect must undergo this process.
All the Flux core maintainers have to approve it, and so on. So we have a process in place, which is dictated by our governance. This year, the Flux team has focused a lot on security hardening. We have done our own internal audits; we found vulnerabilities in our multi-tenancy model and patched them. We improved secrets management and decryption in multi-tenant environments a lot, and lots of other things. And by the end of this year, CNCF TAG Security will look at Flux again, and maybe then we'll release Flux GA — we really need to have our security story straight before that. Is Flux bulletproof? Of course not. There is no such thing as bulletproof in software. Even if some software has no CVEs, it probably has CVEs that no one has found yet. Since we launched Flux version two, we have had these four high and critical CVEs affecting multi-tenancy: there were ways of elevating privileges from a tenant to cluster admin. Those were mostly about kubeconfigs — kubeconfigs allow you to exec out to a binary — so we had to strip down the kubeconfigs and disable any kind of shelling out. Even if our code doesn't shell out, some libraries that we use do, so we found those things and removed them, and we now sanitize all the inputs that come into Flux's multi-tenancy model. How do you keep Flux up to date? As I said, we release very often — every two or three weeks — and it's really painful if you do this manually. Because Flux is able to upgrade itself, you can use our GitHub Action if you are using GitHub. Every time we release a new version of Flux, the GitHub Action will open a pull request on the repo where you bootstrapped Flux, with the new Flux version. Once you merge that, Flux says, "oh, there is a new version of me, let's upgrade."
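A minimal sketch of such a workflow, following the pattern documented for the Flux GitHub Action — the schedule, paths, and action versions here are assumptions:

```yaml
name: update-flux
on:
  workflow_dispatch:
  schedule:
    - cron: "0 * * * *"  # check hourly for new Flux releases
jobs:
  update-flux:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Flux CLI
        uses: fluxcd/flux2/action@main
      - name: Regenerate Flux manifests
        run: >
          flux install --export >
          ./clusters/my-cluster/flux-system/gotk-components.yaml
      - name: Open a pull request with the new version
        uses: peter-evans/create-pull-request@v4
        with:
          branch: update-flux
          commit-message: Update Flux components
          title: Update Flux components
```

Merging the resulting pull request is what triggers the in-cluster upgrade, since Flux reconciles its own manifests from the bootstrap repo.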
And if you are using GitLab, Bitbucket, or other solutions, the people from Renovate created an extension for Flux, and it does the same thing as our GitHub Action. So you can keep Flux up to date really easily, with no effort. Please, if you are using Flux, consider this strategy for staying up to date. And our APIs are stable — that's a promise we made to users. We are not going to break backwards compatibility, and you can upgrade: we have people that installed Flux, I don't know, six months ago, they upgrade to the latest version, and everything works. So I encourage you to keep your Flux up to date. OK, so from a GitOps perspective, when you have Git or any kind of source involved in your continuous delivery, there are challenges that come with it. First: how do you keep your secrets safe? You cannot store a secret in plain text in your repo. How do you restrict access to sensitive data? What happens if someone steals your GitHub token, for example? They can log in; they can do things on your behalf on GitHub. And how do we prevent destructive cluster operations — what happens if someone deletes everything? So how does Flux protect you? Flux works with OpenPGP, and you can tell Flux: don't trust the committer. Even if someone has write access to a repo, only these people, only these keys — only the commits signed with these keys — are allowed to make changes on the cluster. So even if someone takes over your GitHub account by some means, even if they commit as you on GitHub, if you have given Flux your PGP key, that commit will be verified against the key. And if that verification fails, Flux will not apply the modification; it will issue an error event and say, "hey, this commit is not authorized — it's either not signed, or it's signed with a different key than the one I know." And that key is stored inside the cluster, not on GitHub — because if you also store your keys on GitHub, then, yeah.
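The commit verification just described is configured on the Git source. A sketch with placeholder names — the `pgp-public-keys` secret is assumed to hold the trusted OpenPGP public keys:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/my-org/fleet-infra
  ref:
    branch: main
  secretRef:
    name: flux-system        # deploy key for cloning
  verify:
    mode: head               # verify the signature of the HEAD commit
    secretRef:
      name: pgp-public-keys  # trusted public keys, stored in-cluster
```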
And preventing destructive operations is also about telling Flux to limit what a source repository is allowed to do on the cluster. OK, so: are my secrets safe in Git? Flux works with any kind of controller or operator that deals with secrets, but we felt we had to come up with a solution where you don't have dependencies on other controllers, like Sealed Secrets, the HashiCorp Vault operator, or anything else. So we built secrets decryption into Flux, and we chose to integrate with Mozilla SOPS. We are committed to SOPS development and maintenance; we have made a couple of improvements to the SOPS integration inside Flux, and now we are backporting them upstream. SOPS is basically a CLI: you use it to encrypt your secrets, and then you configure Flux to decrypt them. This works with static keys — for example with age encryption, one of my favorite tools in the encryption landscape and a great replacement for OpenPGP — but we also work with OpenPGP. And what SOPS gives us is the power of integrating with all the cloud vendors: we have Azure, Google, and AWS KMS integrations, and also HashiCorp Vault and so on. If SOPS adds a new provider, it's very easy for us to extend Flux and integrate with that provider as well. This is how the secrets operation looks. You have a public key, and you can store the public key in your Git repo. You encrypt your secrets with it and push them to Git, and then Flux either takes the static key from a Kubernetes secret, if you use something like age, or calls out to the cloud KMS using workload identity — so you don't store the master key anywhere — and it decrypts the secrets. OK, and finally: is Kubernetes truly multi-tenant? Well — soft multi-tenancy is really hard; hard multi-tenancy is easy if you have the right tooling. In soft multi-tenancy, multiple tenants share the same cluster, so there are global objects like custom resource definitions, namespaces, and so on.
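Picking up the SOPS flow from a moment ago: as a sketch, decryption is enabled on a Flux Kustomization like this, where `sops-age` is a placeholder secret assumed to hold the age private key:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: secrets
  namespace: flux-system
spec:
  interval: 10m
  path: ./secrets            # directory of SOPS-encrypted manifests
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age         # omit to use a cloud KMS via workload identity
```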
You cannot let your tenants control those global objects, so you have to have the separation between platform admins, or cluster admins, and tenants — and tenants have to be restricted to namespaces. Hard multi-tenancy is where tenants get dedicated clusters and a dedicated Flux instance. Or you can run Flux on your management cluster — for example, where you have your Cluster API providers. You run Flux there, and Flux listens to the Cluster API events: it detects that a new cluster has been created, takes that kubeconfig, and provisions all the tenant's workloads from the management cluster onto the tenant cluster. So we have two ways of doing hard multi-tenancy: either you give your tenant full control over Flux, or the tenant has no idea Flux is even running, because Flux runs on a management cluster — and that's how you pair a Git repo owned by the tenant with their cluster. About tenant isolation boundaries: there are these two things, because Flux, as a GitOps tool, works with sources — Kubernetes has a tenancy model, sources and Git have their own tenancy model, and Flux is the one that bridges the gap between the two. So, as I said, you have dedicated clusters and dedicated repositories if you do hard multi-tenancy; and if you do soft multi-tenancy, then you have to create dedicated namespaces, role bindings, node taints, and so on. And this is how it looks. You have sources for admins, which can run as cluster admin on all the clusters. Then you have Git repos for your tenants, and what Flux does is use the impersonation API and reconcile all the tenants' workloads under their own service accounts. So they are highly restricted: they cannot touch other namespaces, or global objects, and so on. And — no time for the demo; you can come by the booth and I can give you the demo there. It's really easy.
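For illustration, the tenant isolation just described shows up on the tenant's Kustomization — the names below are placeholders, and the service account is what the kustomize-controller impersonates when applying the tenant's workloads (the `flux create tenant` command helps scaffold the namespace, service account, and role binding):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: tenant-a-apps
  namespace: tenant-a
spec:
  interval: 5m
  path: ./
  prune: true
  serviceAccountName: tenant-a   # reconcile under the tenant's RBAC
  targetNamespace: tenant-a      # confine objects to the tenant namespace
  sourceRef:
    kind: GitRepository
    name: tenant-a               # repo owned by the tenant
```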
We are integrating with Helm OCI, and we are building OCI support for any kind of Kubernetes manifest — so instead of using Git repositories, you can use OCI registries. That's the thing we are trying to finish for GA. Some additional resources — and that's it. Thank you.