Hello, everyone. My name is Stefan Prodan. I've been a Flux core maintainer for many years now, and I'm very happy to be here with you. I have some exciting news about the Flux project roadmap included in this presentation. Half of it will be about scaling Flux and the different ways of doing that, and hopefully in the other half we'll have enough time to go over the roadmap for this year. So let's get started with: what's Flux?

Every year I change the definition a little bit; Flux is so many things. So this is what I'm going to say today. The idea I'm trying to convey with "Flux is the foundation layer for continuous delivery platforms" is that Flux is not the platform. It's not an end product. It's not something you go to and use directly. It's at its best when you integrate Flux with your own things and build your own abstractions on top of it.

From a purely technical perspective, Flux is a Kubernetes extension, and it extends Kubernetes in many, many ways. That's 13 CRDs. I don't know, maybe we should stop, but I don't like 13, so maybe just one more next year. And, yeah, six controllers. That doesn't mean you have to use all of this. The minimum Flux deployment requires you to understand two CRDs: you have to create two custom resources, let's say a GitRepository and a Kustomization, and you use only two of these six controllers. But given that almost all of you are using Helm, you'll definitely need a third controller, which is helm-controller. And you should probably be interested in when Flux fails, so then you'll need notification-controller. And maybe you want to fully automate your pipelines: when you push a new Helm chart to your container registry, or when you build a new app image, a new container image, maybe you want to fully automate that on your staging clusters, and when you're doing that you'll need the image automation controller. So that's how you get to the whole of Flux. But Flux was designed to be extensible.
So it's not only these controllers. You can add more things to it. For example, there is what's now called the tofu-controller, there is a CloudFormation controller, and so on. We've created an SDK, which is called the GitOps Toolkit. It's all written in Go; Flux is all Go. So you can use this SDK to write your own controllers.

From a security point of view, we decided when we created Flux version 2, three years ago, that we are not going to build Flux around the idea of plugins and shelling out to other things. If Flux needs to reach out to Git, Flux should have the Git implementation in its own code base. So everything has to be native, written in Go. That makes Flux predictable, and it has a smaller attack surface. When we go through security audits, and Flux has gone through many of those, by just looking at the code and analyzing it you know exactly what Flux is doing. So we don't do shell execs to Helm, to Git, to anything. Everything has to be built in.

So how would you extend Flux then, if there is no way of doing it like in CI, where to add a new step you just put a binary there? The extension model for Flux is that we provide an SDK built on top of Kubernetes controller-runtime, and you can build your own controllers which do other things and extend Flux to other types of systems, targeting things maybe outside of Kubernetes, or build your own things inside Kubernetes which Flux is not made for.

So we have these API constructs in Flux, which I split into four categories. The first, of course, is workload definition. The main purpose is that you want to deploy things that are running in your cluster.
So we have an abstraction built on top of Helm and Kustomize, and we also support plain Kubernetes manifests. I'm saying abstraction, but it's not quite that. It's more a reflection, in a declarative model, of the things that you do with the Helm and Kustomize CLIs. We don't abstract it that much; we just make it declarative. So it's up to you to build other things on top of these constructs, to have, I don't know, an app definition. Flux does not tell you what an app is; it's up to you. You can make it from all these sorts of bits.

Another category is desired state acquisition. We started Flux as a GitOps tool, right? Everybody knows Flux for GitOps. But since the version 2 design, we decided that Git should be just one way of acquiring the desired state. We pretty quickly added S3-compatible storage to the Flux source-controller, so you can use buckets: Azure Blob, Google Cloud Storage, MinIO, and so on. And in the last year and a half to two years, we have been focusing more on the OCI artifacts story. That doesn't mean that Git is going away from Flux. Flux is still GitOps. You'll still use Git when you collaborate on how you define your clusters and how you configure your applications. This is more about where the state is stored, where Flux comes in and reconciles those clusters. And for some use cases, in, I don't know, restricted environments, some organizations say: "Hey, we only allow S3 as storage, which is audited for us. Whatever is on the cluster cannot go to github.com. We don't want to run a Git server in our production system."
So for those types of restricted environments, having S3-compatible storage is a great way to do GitOps even if you don't run Git as a dependency of your production system.

The next category is reconciliation. I'm guessing everybody here understands what reconciliation is? Okay. Yeah, so this comes from Kubernetes; it's not something we invented in Flux. Everything is a controller, and there is a control loop. You have a desired state and an actual state. In our case the actual state is what's running in the cluster, and the desired state is what's coming from outside. Typically you do two operations on these: you need to discover whether there is a drift between them, and if there is a drift, how you can correct it.

How does Flux do it? Flux relies on Kubernetes server-side apply, and we are highly optimized for this. Even if you have tens of thousands of workload definitions, let's say Deployments, Services, all the things, in a single repo, in a single directory, and you want to apply that with one Flux Kustomization, if you change only a Service or only one Deployment, we are not going to apply all the things all the time, which is what Helm does, or kubectl, and so on. We detect exactly what changed. Server-side apply, using the dry-run functionality, tells Flux that only this thing has drifted, and we apply only that part. That's why Flux can manage hundreds of resources, thousands of resources, and be efficient while doing it.

And the last part, which is very, very important, is observability. Flux does not have a UI, so it's complicated: how do you know what's happening? We mainly rely on events. Flux emits Kubernetes events for everything it does, and we also log these things with more details. So if you enable something like the Kubernetes audit log, or you capture these events, you can get a really good understanding of what's happening. Every time Flux does something, there is a Kubernetes event for it, and we annotate these events.
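The drift detection described in the reconciliation part above can be sketched in a few lines. This is only an illustration of the idea, not how Flux implements it: plain dictionary comparison stands in for the server-side apply dry-run, and the names `detect_drift` and `reconcile` are hypothetical.

```python
import copy
from typing import Any, Dict, List

State = Dict[str, Dict[str, Any]]  # object name -> spec


def detect_drift(desired: State, actual: State) -> List[str]:
    """Return the names of objects that are missing or differ from the desired state.

    Assumption: a dict comparison stands in for a server-side apply dry-run.
    """
    return sorted(name for name, spec in desired.items() if actual.get(name) != spec)


def reconcile(desired: State, actual: State) -> List[str]:
    """Apply only the drifted objects and return their names."""
    drifted = detect_drift(desired, actual)
    for name in drifted:
        # Only the objects that drifted are written; everything else is left alone.
        actual[name] = copy.deepcopy(desired[name])
    return drifted


if __name__ == "__main__":
    cluster = {"Service/web": {"port": 80}, "Deployment/web": {"replicas": 2}}
    repo = {"Service/web": {"port": 8080}, "Deployment/web": {"replicas": 2}}
    print(reconcile(repo, cluster))  # only the Service is applied
    print(reconcile(repo, cluster))  # nothing left to apply on the second pass
```

The point of the sketch is the second call: once the actual state matches, a reconciliation pass does no writes at all, which is what keeps large directories cheap to reconcile.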
So, for example, when something fails, you know which source it originated from. Let's say a Deployment fails: Flux will tell you this Deployment failed, but also that it comes from this Git repository, this branch, this commit. So it can basically trace it back to its origin, and you know where to go to fix it. But we've seen that aggregation for Kubernetes events is not something that every cloud provider offers by default, or it's expensive, or maybe you want to build some kind of logic where you react to Flux events and do something else. So we have this notification-controller, where you can route the Flux events to external systems, and we have many integrations there. We'll talk a little bit about that in the roadmap.

So those are, let's say, the four categories through which Flux allows you to build your continuous delivery platform. One of the things that we really tried with Flux is not to be opinionated. That's why Flux is complicated. That's why Flux has so many CRDs and so many controllers: because we want Flux to adapt to your organizational structure, and not the other way around. We don't want you to change your whole mindset when you adopt GitOps. GitOps already implies a mindset change, and it's actually hard to make that step. We want Flux to be as easy as possible to adapt to all your dev teams, your platform teams, which team has which type of access. So we try to make Flux fit into the organizational structure.

Our goal, now that Flux is GA, is for Flux to keep up with the growth velocity of your organization, so let's talk a little bit about that. One of the biggest factors, if you are into microservices, is that you keep adding them. Adding more things, more microservices, means a lot of pressure on the continuous delivery pipeline.
You have more things to deploy. Maybe they are independent, and that means you are doing microservices correctly. But most organizations I've seen play the microservices game while in fact they are doing distributed monoliths. You have 10 microservices, but they all need to be deployed at the same time, you have to match all those versions, and you have to deploy them in a specific order. So you'd be way better off with the monolith, but I'm not getting into that.

Another growth factor is, of course, disaster recovery and high availability. Let's say you deploy 10 apps. That's not that much, right? But then you want to deploy those as close as possible to your customers, so you will want to deploy them all over the globe. You want to have backups. You want to have high availability. So even if the number of apps doesn't grow, those apps still keep adding to the continuous delivery load. It builds up. Security constraints as well: maybe you need dedicated clusters, or you need to do hard multi-tenancy.

And lastly, business expansion. You add new apps as your business expands into new territory, but you'll never close the old apps, because that's what's making money. If it makes money, it will stay there forever. So you have to maintain more and more things over time.

Recently I've been getting more and more of these types of messages on Slack from users. They realize: "Okay, we started with a couple of things, and all of a sudden, next year we're going to deploy thousands of workloads on hundreds of clusters. How can we do that? Can Flux keep up?" You'll start with Flux in a monorepo; probably you use the example repo. And that's fine.
It works, but at huge scale you need to rethink your setup a little bit. And when you realize that, you can go into panic, like: "I don't see how I can store thousands of apps or hundreds of clusters in a single repo." But Flux can actually do well with a monorepo with thousands of apps and hundreds of clusters. It's more about your cognitive load and whether you can deal with that. This is what I'm talking about: Flux can actually do it no matter how you organize things.

Okay, so, scaling strategies: how you make Flux work when you run at huge scale. I suggest you go through this journey. Instead of jumping directly to "I want Flux horizontal scaling, I want to run hundreds of Flux instances," and so on, I recommend you look into source optimization and controller fine-tuning. That's the first thing you should do. Then go for vertical scaling. And lastly, if nothing else works, then you should do sharding.

Let's see source optimization. There are some things you can do to majorly improve Flux speed, and one of the things I keep saying every time is: move away from Helm HTTPS repositories. Store your Helm charts in your container registries. It helps Flux a lot with memory management. If you have hundreds and hundreds of Helm charts, and hundreds and hundreds of versions of those Helm charts, and you store them in an HTTP repo, then, and that's how Helm does it, everything is stored in a single file. It's a huge YAML file that Flux needs to load into memory every time it needs to figure out whether there is a new version. So that operation is quite expensive, and you can get rid of it. You can basically eliminate all that logic by just doing a helm push to your container registry.
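To make the cost of a single index file a bit more concrete, here is a rough sketch. It is not Helm's or Flux's code: JSON stands in for the YAML index so that only the standard library is needed, the index layout is simplified to a name and a version per entry, and `build_index` and `latest_version` are hypothetical helper names.

```python
import json
from typing import Tuple


def build_index(charts: int, versions: int) -> str:
    """Build a simplified repository index: every version of every chart in one document."""
    entries = {
        f"chart-{c}": [
            {"name": f"chart-{c}", "version": f"1.{v}.0"} for v in range(versions)
        ]
        for c in range(charts)
    }
    return json.dumps({"entries": entries})


def _version_key(version: str) -> Tuple[int, ...]:
    """Compare versions numerically (1.11.0 is newer than 1.9.0)."""
    return tuple(int(part) for part in version.split("."))


def latest_version(index_text: str, chart: str) -> str:
    """Find the newest version of one chart.

    The whole document has to be parsed first, even though only one chart is needed.
    """
    index = json.loads(index_text)
    return max((e["version"] for e in index["entries"][chart]), key=_version_key)


if __name__ == "__main__":
    small = build_index(charts=1, versions=1)
    large = build_index(charts=200, versions=200)
    print(len(small), len(large))  # the document grows with charts x versions
    print(latest_version(large, "chart-7"))
```

With a registry, each chart is addressed on its own, so a lookup for one chart does not need to touch the metadata of all the others.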
Another thing I see as an issue is when you have a huge app repo with thousands and thousands of commits in it, and you place the YAML definitions for Flux next to your source code. That works, but not if you have hundreds and hundreds of repos like that. Why? Because the Git history is quite large, and Flux doesn't need all your app code. It just needs those definitions. So you can either create a dedicated repo for, I don't know, a group of apps, and store only the YAMLs there, or you can publish those definitions to the container registry as well. So not having Flux clone and pull from your app repos is one thing that will help you scale.

Flux has a way to apply things in parallel; it has a concurrency flag. But if you store everything in a single directory, apply everything from that directory to the cluster, and use a single Flux Kustomization, you don't have much room to improve it. You can't take advantage of parallelization and have Flux apply things concurrently. So that's another strategy: split up your things somehow, logically. I don't know: cluster add-ons in one directory, apps in another.
You group them by dev teams or whatever, and you have multiple Flux Kustomizations with depends-on relationships between them, and that's how you can speed up the reconciliation a lot.

Now, controller fine-tuning. You can do two things. One is to use persistent storage for the Flux source-controller. What that helps with is when you upgrade Flux, or the node where source-controller runs goes away, and it needs to start up again. Source-controller creates a cache of all your external sources. So if you have thousands and thousands of Helm charts and you delete source-controller, or you upgrade it, it's a new pod, and it needs to download all those things again from upstream. But if you use a persistent volume for that, the cache is still there, so you will not hit that huge spike on your network.

Another thing, which I found very recently, I think late last year, is that on some clusters the slowness of the disk where kustomize-controller runs can be very impactful for kustomize build operations. The way Flux runs it, it basically gets the artifact from source-controller, then programmatically runs something that resembles kustomize build. A way to speed this up and eliminate the disk latency is to use an in-memory volume for the temp directory, so all the build operations will be done in memory. It's not a big issue, because these files are cleaned up all the time, so the memory will not grow. That's also a good thing to do for optimization.

Vertical scaling. Let's say you did all the optimizations.
You're still not happy. You want to deploy everything in one minute, but it doesn't happen; it takes five, whatever. You can bump the limits: you can add more CPUs, give it more memory. What's really important here is that you should also increase the concurrency level based on the CPU limit. Concurrency is how you tell Flux how many goroutines to use, and how many Helm releases it can reconcile in parallel. So, as you can expect, there is a relationship between how much CPU it has and the concurrency level. You can exhaust the CPU if you set concurrency to 100 but you give it a very tiny vCore on the node.

The problem with vertical scaling is that it will hit the ceiling at some point, and the most important thing you should look for are rate limits. You are trying to speed up Flux and the reconciliation, but you may have the opposite effect: if the Kubernetes API is under pressure and can't keep up with the amount of operations, it will rate limit Flux, and not only Flux, it will rate limit all the other controllers that are running. You'll end up with a slower reconciliation than you wanted. So you really need to monitor rate limits; that's very important for vertical scaling.

For Flux version 2.2, which we released in December: people kept asking us how many Helm releases Flux can reconcile, how many Kustomizations, and so on. We didn't have an answer for it. We still don't have a clear answer. But we decided to create a repository. We have sponsorship from GitHub and CNCF, so we have access to the large runners in GitHub, and we decided to write a benchmark, which is, you know, a reference. You should run this benchmark on your own infrastructure, because there are so many things that can make it way, way faster or way, way slower. For example, I ran this on an M2 CPU and it's way faster than what you can see here, right?
So it really depends on things, but what I'm trying to show you here is that Flux in the latest version is quite fast. It can do 1k Helm releases in eight minutes, and 1k Kustomizations in around four minutes. This is about deploying 1,000 versions of your apps at the same time. That's what it means here. It's not that we have 1k Helm releases in total; it's that you want to run 1k upgrades at the same time.

And you may think: who does that? There is no company out there that ships that fast. It's crazy to do this kind of testing. But it's not that crazy if you think about it: hey, there is a CVE in the base image, and all my microservices are Node.js or whatever technology we're using, and you are patching that base image. You are rebuilding all your apps and you have to deploy them. So Flux has to be as fast as possible at doing that deployment, and you may end up with these numbers. I put the link to the repo here; you can take a look at it and try it on your own.

Okay, what about horizontal scaling? You've scaled vertically; maybe you want to upgrade more than 1k releases at the same time, or you want to do some kind of isolation, and so on. Controllers in Kubernetes are not horizontally scalable. You can't just increase the number of replicas and all of a sudden things will work. Custom resources are like entries in a database, so if you want to scale controllers, you have to apply the same methodology you would apply to databases, and that's how we did it for Flux.
It's through sharding. There are many ways you can shard and spread the load in the cluster. I added two strategies here. You may want to spin up a Flux instance per tenant and have it dedicated; that also adds to the security posture of the whole thing. Or maybe you run Flux on a management cluster, and then you want to run an instance per group of clusters, or maybe one for each cluster. It's up to you. But the idea is that you need to figure out how you want to shard the custom resources, and then apply a label to assign Helm releases, Kustomizations, and sources to a particular shard. You can move resources from one shard to another. You can do canary releases for Flux: upgrade only one instance, assign some Helm release there, and see whether the new helm-controller is working. So the sharding mechanics can also work as a rollout strategy for the Flux controllers.

This is one example of how you can do it. What I want to highlight here is that with sharding, you run a primary installation of Flux, which you can easily do with bootstrap, and then Flux itself manages its shards. The idea behind Flux and flux bootstrap is that Flux does its own upgrades. It's a single command that you run once at cluster creation, and then you don't touch the cluster; Flux knows how to manage itself. And when you apply sharding, Flux will also manage all its shards, so it will know how to upgrade them, it applies the labels, and so on. This is one nice thing about Flux: we built it that way. If Flux is managing all your apps and knows how to upgrade your apps, Flux should definitely know how to upgrade and manage itself.

And here is an example of how you can structure things, but this is all on you, of course. We have users who said: "Okay, do I need to label all these things on my own? I have hundreds of Helm releases now."
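The label-based assignment described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Flux's implementation: the label key `sharding.fluxcd.io/key`, the instance name `main`, and the helper names `selects` and `partition` are assumptions made for the example. The model is that each shard instance reconciles only the resources labeled with its name, and the primary instance reconciles everything that carries no shard label.

```python
from typing import Dict, List

SHARD_LABEL = "sharding.fluxcd.io/key"  # assumed label key, for illustration only
MAIN = "main"  # hypothetical name for the primary Flux instance


def selects(instance: str, labels: Dict[str, str]) -> bool:
    """Decide whether a controller instance should reconcile a resource."""
    shard = labels.get(SHARD_LABEL)
    if instance == MAIN:
        return shard is None  # the primary instance takes unlabeled resources
    return shard == instance  # a shard takes only resources labeled with its name


def partition(
    resources: Dict[str, Dict[str, str]], instances: List[str]
) -> Dict[str, List[str]]:
    """Group resource names by the instance that reconciles them."""
    return {
        inst: sorted(name for name, labels in resources.items() if selects(inst, labels))
        for inst in instances
    }


if __name__ == "__main__":
    releases = {
        "podinfo": {},
        "team-a-api": {SHARD_LABEL: "shard1"},
        "team-b-api": {SHARD_LABEL: "shard2"},
    }
    print(partition(releases, [MAIN, "shard1", "shard2"]))
    # Moving a release to another shard is just a label change.
    releases["team-a-api"][SHARD_LABEL] = "shard2"
    print(partition(releases, [MAIN, "shard1", "shard2"]))
```

Because ownership is derived only from the label, moving a resource between shards, or trying a new controller version on a single shard, is a matter of changing one label.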
"I have to add all these labels on my own," and so on. You can use an admission controller. I've seen this use case, where you have an admission controller policy that mutates all the Flux objects. You can say: if it goes into that namespace, and you know that namespace belongs to a tenant, then add the label for tenancy, and then you move the reconciliation of that whole namespace to a particular tenant or to a particular cluster. We don't have a mutating admission controller in Flux, and we probably never will, but there are solutions out there. In CNCF there are so many options that you can use here.

Okay, so that's sharding. Any questions about sharding? No? Okay, I'll move to the second part.

Let's talk a little bit about the roadmap. Two weeks ago we published the roadmap for this year. As you may know, Flux is going through a big change: we are moving from a single-vendor project to a multi-vendor project, with multiple individuals managing it. So it's a big change for us. We didn't manage to update the roadmap in January, so we are kind of late, and we are going to skip a minor release. But we are now in good shape. I've met so many people at this KubeCon who want to help us, and so many organizations that want to contribute to Flux. I have high hopes that Flux will be in a very good position by the end of the year, and will grow even more.

What have we set out to do for the next two quarters? First is general availability. We are graduating almost 80% of the Flux APIs to GA. For now, Flux is GA only for GitRepository, Kustomization, and Receiver. This year we are promoting all the Helm constructs to GA, which means we are telling you we are not going to break anything.
We kind of never did; we've kept backwards compatibility for Helm releases for three years now. But this is us telling you: we ran the benchmarks, we refactored helm-controller, it's in very good shape, and it's now ready to go GA. A lot of people pushed us, like, "You should have done GA for Helm two years ago." We weren't ready. Now we feel we are in a place where we can put this stamp on it. And we are doing the same thing for the image automation resources.

So one direction is GA. Another direction is adding features to Flux, and this is a sensitive matter. I'm becoming very sensitive about adding new features, because any new feature comes with new dependencies, more maintenance work, and so on. We have an RFC process in place, and what we are trying to do is encourage the people who want to add these features to Flux to also help us maintain them in the long run.

An example: we are going to ship the Notary integration in the next Flux version. We already have Cosign, but I don't know if you've seen the Bitnami announcement, I think it was this week: all Bitnami charts on Docker Hub are now signed with Notation. So if Notation is becoming so popular, Flux should definitely be able to work with it. Big thanks to Microsoft for contributing the Notary integration to Flux. It's been a long journey. They had to refactor bits of source-controller to make that happen. It's a huge pull request that we are getting into shape, but I hope, I'm 99% sure, that it will make it into the next release.

Another thing that we are doing: the people from Ericsson, together with the CDF organization, had a lot of use cases for Flux. They use Flux and they want to better integrate it with Tekton, so they are contributing the CDEvents integration. In the next release we are going to allow Flux to react to CDEvents. For example, Tekton builds something, I don't know.
Say it's an OCI artifact, and Tekton can tell Flux, "Hey, go and reconcile that," by sending a CDEvent to the Flux notification-controller. So we are extending the Receiver with CDEvents, and later on we want to make Flux translate all the Flux events into CDEvents. Then Flux can tell Tekton: "I've deployed that, now you run the end-to-end tests, then you call me back, and then I do something else." So CDEvents is really nice for being able to mix together different components of your platform, and I'm very thankful to Ericsson for doing all this work.

This is going through the RFC process, and it's a good example. If you look at CDEvents and how we are shipping it into Flux, you basically open an RFC, you set out the use case, you explain why this is good, who is doing it, where we are going to implement it, and what the impact is on the Flux components. Building a receiver is quite easy; it's not a big effort. But translating all the Flux events and making them compatible with CDEvents will be a big challenge. You can easily translate Helm operations to a CDEvent, because you have an install, you have an upgrade, you have an uninstall. But with all the other things that are happening in Flux, we don't have those concepts. Flux is reconciling. There is no such thing as install or upgrade; you have this state, and you move to this other state. So we need to figure out how we can map what Flux is doing to what CDEvents expects. That will be an interesting journey for us this year, through RFCs, to see how we can achieve that.

Another thing that we are doing is moving forward with Helm OCI improvements. We are going to allow a Helm release to reference an OCI repository. So we are kind of expanding beyond what the Helm CLI and the Helm SDK can do. You'll be able to pin a Helm chart by its OCI digest.
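Why pinning by digest protects you from a re-pushed tag can be shown with a small in-memory model. This is only a sketch: the `Registry` class and its methods are hypothetical and do not correspond to any real registry client; the one real mechanism it relies on is that a digest is a hash of the content, so it cannot point to different bytes later.

```python
import hashlib
from typing import Dict


def digest(content: bytes) -> str:
    """Content-addressed identifier in the usual sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class Registry:
    """Hypothetical in-memory registry: tags are mutable pointers, digests are not."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # digest -> content
        self.tags: Dict[str, str] = {}  # tag -> digest

    def push(self, tag: str, content: bytes) -> str:
        d = digest(content)
        self.blobs[d] = content
        self.tags[tag] = d  # pushing to an existing tag silently moves it
        return d

    def pull_tag(self, tag: str) -> bytes:
        return self.blobs[self.tags[tag]]

    def pull_digest(self, d: str) -> bytes:
        content = self.blobs[d]
        if digest(content) != d:  # verify what was received against the pin
            raise ValueError("content does not match pinned digest")
        return content


if __name__ == "__main__":
    reg = Registry()
    pinned = reg.push("1.2.3", b"chart as reviewed")
    reg.push("1.2.3", b"chart pushed over the same tag")
    print(reg.pull_tag("1.2.3"))  # the tag now returns the new content
    print(reg.pull_digest(pinned))  # the digest still returns the reviewed content
```

A deployment that references the tag picks up whatever was pushed last, while one that references the digest keeps getting exactly the bytes that were pinned.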
So, you know, a tag is mutable: someone can push the same chart, the same version, over that tag, and if you want to protect yourself from that, in the next version you'll be able to pin the chart to a digest. It will also be way easier to debug Helm releases, because you won't have a HelmRepository and a HelmChart generated from the HelmRelease. Just like for Kustomizations, you'll have the source, which is an OCI repository, and the Helm release.

Another interesting thing that people have been asking us for a long time was: "Hey, I want to deploy only release candidates of my Helm charts on staging. I don't want the stable releases there; the stable releases should only go to production." And semver does not allow you to do that. If you write a semver range, you can't say "only pre-releases." So we are also adding this facility. We already have this concept in Flux where you can filter the versions with regexes before we apply the semver range. So in the future you can actually say: if it's a release candidate, it has -rc, it goes to staging; if it has -test, it goes to the testing cluster, and so on. It gives you more power over how you do things with Helm and how you drive automatic Helm upgrades. So we'll probably push more for this type of construct from this moment on, where you will have a Helm release and an OCI repository. And, yeah, major thanks to Soulé for being involved in all of this, and, yeah, ControlPlane.

Okay, final words. I've told you a little bit about this already. We really need your help. As a community, we should strive to make Flux sustainable. The way I see it, and I think all the core maintainers have the same impression as me, is that we should be able to give people who want to add new features to Flux ownership of those features. The problem here is, of course, that you can contribute to source-controller or to, I don't know, helm-controller, and we will say at some point:
"Hey, do you want to become a maintainer?" But you only added, let's say, a verification feature to source-controller, and we'll propose: "Hey, can you become a maintainer of source-controller?" And most people will say: "Source-controller has so much complexity. All the Git protocol is there, OCI repositories, Helm repositories, all that stuff, and then there's my little thing that I have contributed." So it's a lot of responsibility that we are putting on people when we want them to become maintainers.

We are now trying to move to a new model for this, where we want to say: "Hey, if you have contributed this feature, we can make you a maintainer, but only for that feature." I'm hoping people will be more inclined to do that, and that it will be easier, because you'll help us maintain the thing that you have contributed. You are an expert on that little part, because you added all that code to Flux. We have a large pool of people who are making contributions and a very small number of people who are maintainers. So hopefully, with this new model of how we assign responsibility, we will be able to increase the number of maintainers without giving them all the responsibility. We give them a little bit of responsibility, hoping that in the future they'll become more knowledgeable about Flux as a whole, and they'll say: "Yes, I now understand everything."

So, yeah, please help us. Come and see us, talk to us on Slack, help us with the roadmap. I'm looking forward to working with all of you on the future of Flux. Thank you very much.