My name's Alex, and I'll be your speaker today. Outside of my work, I moonlight on a bunch of the things on the left here. I primarily work for TAG App Delivery as one of the technical leads, performing due diligence for projects and creating content. When I'm working, I'm the director of Kubernetes at Canonical, so I think about things like MicroK8s and upstream Kubernetes. So, not an obvious fit for giving a talk about feature flags, but feature flagging is something I find very compelling, because I think we have a lot of opportunity right now to take feature flagging beyond the browser, where it traditionally lives. So I just wanna get some forced participation here, in the most introverted way possible. If you wouldn't mind just scanning that: you don't have to talk to me, you don't have to say anything, you can just scan that on your phone and we're gonna do a little survey. It will be 60 seconds, I promise you. I'll leave it up there for a few seconds, and when I change slides you'll see it again, so don't worry if you miss the opportunity to scan the QR code; it'll be up in a moment. So let's have a look. Let's go right back to the beginning here and start with something simple. How are you enjoying KubeCon? It's my first time in Detroit and it's been amazing. I think it's an incredible city: brutalist architecture, brilliant people, great conversations. Good, that's the correct answer. I'm happy to hear that. Okay, well, let's move on. What does feature flagging make you think of? This is an interesting question, because I think of a lot of different things. Dynamic, that's a good answer. A/B testing, that's traditionally what I would think of as well. Blast radius, that's a really, really good one. Progressive deployment, these are great. Access control, safety toggles, et cetera, et cetera. So it sounds like the audience here is pretty well versed in the idea of feature flagging; thank you for humoring me on that question. And there's one final question. I'll just let these last two participants finish typing. There we go. Final question: where do you think feature flagging fits best? There are a few options up here, in no particular order. I think this is a difficult one to answer. As I said at the beginning of the talk, until I got involved in OpenFeature, it felt like something you only saw in the browser or on the server, maybe in your Node.js runtime or your Flask server. Okay, interesting, but a real spread, right? A real spread. So thank you. I can see the results climbing on client and server, and that's sort of what I anticipated most people might say. So that's good to know. So, that's the survey done. Feature flagging is effectively all of the things you described a moment ago in that word cloud. It is how you deliver functionality rapidly but safely. And exactly as some of you said, it allows you to do gating of content and A/B testing. Think about geofencing: that's a feature that's being flagged. You might have certain capabilities in your movie streaming service that are only available in certain regions, and it'll look at the metadata of your profile, your email address or your geolocation or something like that. So feature flagging is really important.
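To make that concrete, here's a minimal sketch of what a context-aware evaluation like that geofencing check might look like with the OpenFeature Go SDK. The import path and exact signatures may differ by SDK version, and the flag key and attributes are made up for illustration:

```go
package main

import (
	"context"
	"fmt"

	"github.com/open-feature/go-sdk/openfeature"
)

func main() {
	// A no-op provider for illustration; in practice you'd register a real
	// provider (a vendor's, or flagd's) here.
	openfeature.SetProvider(openfeature.NoopProvider{})
	client := openfeature.NewClient("streaming-app")

	// The evaluation context carries the profile metadata the flag rules
	// target: a targeting key plus attributes like geolocation.
	evalCtx := openfeature.NewEvaluationContext(
		"user-123",
		map[string]interface{}{"country": "GB", "email": "alex@example.com"},
	)

	// "hd-streaming" is a hypothetical flag key.
	enabled, err := client.BooleanValue(context.Background(), "hd-streaming", false, evalCtx)
	if err != nil {
		fmt.Println("evaluation error:", err)
	}
	fmt.Println("HD streaming enabled:", enabled)
}
```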
And you'll see that I put up a QR code. I just discovered these before this talk, so you're gonna see a recurring pattern here. Pete Hodgson put a really great article on Martin Fowler's blog, which was all about feature flagging, and I decided that rather than try to do it a disservice, I would just refer you to the canonical source, no pun intended. So where does feature flagging typically occur? Client side and server side is sort of how you think about it. A web browser, a server; whether it's server-side or client-side rendering doesn't really matter, but it might make a call, which checks the SDK, which then checks its feature flag configuration. That's the simplest kind of occurrence. You can think of it sort of like this: you have an API, and an SDK that integrates with that API. But there's more to it than that: there's caching, there's evaluation, there's synchronization. So I don't mean to do a disservice to feature flag providers; it can be quite complex. Then you've got RBAC, users, management, state, et cetera, et cetera. But again, this is primarily the domain of the browser and the server. And then we also have a lot of vendors that build feature flags. These vendors are awesome, but they all have their own interpretation of the world, right? Here are a few little examples of what an API or a function call might look like. There's no particular winner; it's just an implementation. It's how they view the world, how they think about evaluating a feature flag. Get my thing; tell me if my thing is blue or green, as you'll see. And so I think that, given we're in the age of interoperability, of bringing things together and trying to build open interfaces, this is perfect fresh ground for innovation. If you think about things like Service Mesh Interface, and other projects like OpenTelemetry, where a consortium of vendors has come together and decided to build something common, it means that users benefit, which is great. So the goals, generally speaking: everyone likes a vendor-agnostic approach, because it reassures customers that they're not being locked in, and it enables interoperability, because you can start picking and mixing where things go. And of course, we've talked about this idea of performing complex evaluations. So what I've ended up with here is effectively a wish list for what such a project might look like. And the project I'm describing is OpenFeature. OpenFeature is a brand-new CNCF project, as I'll talk about in a moment, for feature flag management. It gives you a single, consolidated schema, plus a set of providers and SDKs contributed by a collective of vendors: people who are incentivized to build really first-rate experiences for their customers through OpenFeature. And you'll hear me touch back on OpenTelemetry a few times, because that's one of the more successful attempts at doing this across real-world vendors who really want to make something better for their customers in open source. The API in OpenFeature is simple, flexible, and extensible. It's designed both by people who are experts, people who have spent 10 or 15 years doing feature flagging and understand their customers, and by first-time contributors like me, who have no idea how feature flags have grown historically. So OpenFeature grew out of an incentive to build something better, so that we'd have that interoperability.
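And the payoff of a shared API is that your call sites don't change when the backend does. A sketch, assuming the flagd provider from open-feature/go-sdk-contrib; the constructor and options here are from memory and may differ by version, and the flag key is invented:

```go
package main

import (
	"context"
	"fmt"

	flagd "github.com/open-feature/go-sdk-contrib/providers/flagd/pkg"
	"github.com/open-feature/go-sdk/openfeature"
)

func main() {
	// Point the SDK at flagd; swapping in another vendor's provider is a
	// one-line change here, and the evaluation call below stays identical.
	openfeature.SetProvider(flagd.NewProvider(
		flagd.WithHost("localhost"),
		flagd.WithPort(8013), // flagd's default port
	))

	client := openfeature.NewClient("checkout")
	// "button-color" is a made-up flag key: "tell me if my thing is blue or green".
	color, err := client.StringValue(context.Background(), "button-color", "blue", openfeature.EvaluationContext{})
	if err != nil {
		fmt.Println("evaluation error:", err)
	}
	fmt.Println("button color:", color)
}
```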
It was started by Dynatrace, but quickly grew to include a lot more folks from OpenTelemetry, from feature flagging ISVs, and from end users and software delivery companies like mine. And if you can believe it, only this year, we're already a Sandbox project. From one KubeCon to another, we went to Sandbox. My participation in all this was based on a chance meeting. I was just larking around in Valencia, as you do: going to parties, being at my booth, trying to talk about Canonical and all the things we do. And I happened to get involved with some of the OpenFeature folks. A colleague of mine from App Delivery said, hey, you should check out OpenFeature, we're trying to do some interesting stuff there. And of course, being me, I was thinking: well, I don't do web development, I'm not super interested in that. I'm more interested in things like top-of-rack switches, hyperchannels, Kubernetes orchestration, everything that's not that. And so I said to myself: well, I work for a company that builds an operating system that runs 65% of all Kubernetes in the world. What should my incentive be? What philosophy would I like to bring to this? And so I thought: what about a sort of Unix approach? What if we thought about running this almost like a kernel module, as a systemd process? What if we had feature flagging for the shell, for C++, for web servers, for the kernel? How would that actually work? How would you build something that could talk to a feature flag file and perform evaluations? And then I started thinking about the implications of that. So on the 26th of May, super hyped up on Coca-Cola, tapas and good weather, I started thinking about this thing called flagd, which is a feature flagging daemon that runs more or less anywhere. It's got Power9, s390x, ARM64, ARMv7; it supports about nine or ten architectures. It follows the Unix philosophy: compactness, completeness, doing one thing really well. And what it does is integrate with OpenFeature. So OpenFeature is this standard, this schema, this idea of how you should do things; it's a collection of SDKs and also a set of providers. And flagd became a provider for OpenFeature: it matches the API. Some great people then joined the project who were experts on this, because I'm not; my expertise is in building robust software, but their expertise was the subject matter itself. So in this very teeny-tiny demo, you can see I started it running on my Mac, and you can't read it because it's tiny, but what you should take away from this ASCII terminal is that I'm showing you flagd running. So we had the ability, in your terminal, to do flag evaluation from a file. File in one tab, curl in the other. Simples.
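If you want to recreate that file-and-curl loop yourself, the evaluation call is just an HTTP POST, since Connect exposes the gRPC service over HTTP/JSON as well. A sketch in Go, assuming flagd's default port 8013 and the resolve path from the flagd README of the time; the flag file contents and start command in the comments are illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumes flagd is running locally with a flag file, started with something
	// like `flagd start --uri ./flags.json` (the exact flag varies by version),
	// and a definition along the lines of:
	//   {"flags":{"color":{"state":"ENABLED",
	//     "variants":{"blue":"#0000FF","green":"#00FF00"},"defaultVariant":"blue"}}}
	body := bytes.NewBufferString(`{"flagKey":"color","context":{}}`)
	resp, err := http.Post(
		"http://localhost:8013/schema.v1.Service/ResolveString",
		"application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // e.g. {"value":"#0000FF","variant":"blue","reason":"STATIC"}
}
```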
That wasn't quite enough for me, though. Because even though we had this compact and simple daemon, I couldn't help thinking about IPC, right? I couldn't help thinking about using AF_UNIX sockets, about other channels and protocols and ways this could work. And then of course, once you start extrapolating, you're thinking: well, why does it have to just work on Linux, right? Who's gonna use this, and where? You know, right now this is a command-line process. So I started thinking: well, Kubernetes is something I do in my day job. What if we took flagd into Kubernetes? Right, and I thought, okay, that's an interesting idea. So I sketched out this design, and I'm gonna do my best to talk you through it; I'm not sure how legible it is, it may well not be. So this started as a coffee-shop scribble, and then I did it in Excalidraw, which I promise I'm not paid to mention, but I make all my slides with it. So I thought, okay, let's think about how it would actually work. We'd have some sort of agent running in a container. That container would need to be a sidecar, because the host process wouldn't want to be bound to it. I didn't want to compile anything in; I didn't want it to be an SDK inside that container, because then there'd be no way to deliver it independently: an application team would have to build flagd into their image. I wanted it to be something separate. I wanted it to support RPC, TCP, maybe UDP datagrams, I don't know; I was just thinking it through at the time. And so it started to coalesce into this new project. I was on a roll here; I think I merged both of these into the OpenFeature project on the same day. The next project was called the OpenFeature Operator. This operator effectively runs in your cluster and looks for feature flag configurations, which it then injects, via flagd, into your workload. And it does that in a few ways. It would create, basically, a config file that flagd would read, and flagd, with the help of my esteemed collaborators, now did some fairly decent evaluation. So I could say, from my Flask app: go get me the current color, and flagd would evaluate that based on some conditions. It's even more sophisticated now than it was then: it can do percentile-based evaluation; it can do all the things I mentioned at the start of this talk. So what's really exciting is that we're starting to see smart logic driven by flagd and consumed by your application. In this example, small as it is: get fruit, right? Apples or bananas. Everyone can relate to that. But the flags themselves could be changed via the CRD: the custom resource in Kubernetes would contain the flags, so I wouldn't have to touch that little config file. So that was the initial thought. We then road-tested this a bit, and I realized it was not the way to go. ConfigMap mounting inside Kubernetes is not the way to go, because the refresh on a mounted ConfigMap is too slow: if you want to update it, you have to wait for the kubelet to resync, which can take up to 60 seconds. So I thought: we need to put this into sixth gear, we need some speed on this. So I went back to the drawing board with: let's use a shared informer factory, which is a capability of client-go, the Go client library. It talks to the API server directly. It says: hey, I'm this flagd pod, I have a bit of code in me that connects to the Kubernetes API, and I want to watch this resource. So, from left to right, the way this works is that I have a deployment. I simply put an end-to-end label on the deployment, and I have a custom resource whose name matches end-to-end.
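That shared informer piece, sketched minimally with client-go's dynamic informer factory; the group/version/resource for the flag CRD is an assumption from memory and may not match the operator exactly:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Out-of-cluster config for the sketch; inside the flagd pod you'd use
	// rest.InClusterConfig() instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Illustrative GVR; the real group/version lives in the open-feature-operator repo.
	gvr := schema.GroupVersionResource{
		Group:    "core.openfeature.dev",
		Version:  "v1alpha1",
		Resource: "featureflagconfigurations",
	}

	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(dyn, 30*time.Second, "", nil)
	informer := factory.ForResource(gvr).Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			// Push the new flag state into flagd's in-memory store straight away,
			// rather than waiting up to 60 seconds for a ConfigMap remount.
			fmt.Println("feature flag configuration updated")
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```

The point of the design is that the event handler fires as soon as the API server sees the change, instead of waiting for a kubelet resync.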
The admission controller hands off to the operator. The operator says: hey, I can see this deployment wants to create pods that are OpenFeature pods. It then validates that the custom resource exists, because there's no point creating the pod if there are no flags to read, right? It then mutates the pod spec. The pod spec probably only contains one container; the operator injects a second container and sets up the permissions. It creates a cluster role binding, because of course RBAC is a big challenge here, since you're talking to the API server directly. So it creates a cluster role binding, all the machinery to get that working, and then your workload container is able to consume it. Now, the beauty of this is that because we also build the SDK, the client SDK you work with as a developer, you treat it like localhost, always. So if I'm on my Linux machine building this workload container, I can just spin it up from the CLI, and if I go to K8s, it's the same experience. I don't need to put in any connection strings; I don't do any of that gubbins. All I need to do is say gRPC or HTTP or HTTPS; we support the lot. In fact, we're using the Go Connect library, which lets you multiplex gRPC and HTTP on the same port, which is quite cool. So, flag-enabling Kubernetes workloads: that's only step one. For me, that's vanilla. Let's go to something a bit more interesting. What if we could take this further and start building operators that consume flags, operators that can actually be controlled with flagd? So I started thinking about an example program I wanted to build, called Watchman. It's a validating admission controller, and the idea is that I want to be able to turn it on and off: I should be able to pull the plug on pod creation in my cluster. That's actually a pretty attractive proposition if you're in a lockdown situation on a cluster. How else would you do it? You'd have to revoke user RBAC, or cut off the API server, or find any number of hacky ways of doing it. But with this, you could trigger it via a flag in flagd, and it would simply raise the drawbridge: the validating admission controller in the admission chain would reject everything. And I thought: this is really interesting, I think we're onto something here. So I'm gonna show you a few demos, because I've spoken a lot. The first thing I wanna show you is the demo some colleagues made using flagd and the OpenFeature Operator. You have the controller manager, which is the operator, the gubbins I helped put together, and what that does is inject flagd into this thing. That thing is the demo deployment. It is one container, but the operator has injected flagd as a sidecar. I'm sure you're a very savvy audience who are very familiar with sidecars, but I thought I'd mention it in case you haven't used them before.
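Under the hood, the interesting part of that mutation is just appending a container to the pod spec. A simplified sketch; the real operator also wires up RBAC, probes, and sync configuration, and the image tag, sync URI, and port here are illustrative:

```go
package operator

import corev1 "k8s.io/api/core/v1"

// injectFlagdSidecar appends a flagd container to an application pod: a
// simplified sketch of what the mutating webhook does.
func injectFlagdSidecar(pod *corev1.Pod) {
	pod.Spec.Containers = append(pod.Spec.Containers, corev1.Container{
		Name:  "flagd",
		Image: "ghcr.io/open-feature/flagd:latest", // illustrative tag
		Args: []string{
			"start",
			// Watch the named custom resource instead of a local file
			// (the URI scheme is illustrative).
			"--uri", "core.openfeature.dev/default/end-to-end",
		},
		Ports: []corev1.ContainerPort{{Name: "flagd", ContainerPort: 8013}},
	})
}
```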
So let's port-forward this demo and see what's going on. The first thing I'd like to do is put the demo on the left, and I'll put my terminal on the right so you can see exactly what's going on. Let's refresh this. Cool, this is a very cool little Fibonacci demo. I made absolutely all of it. No, that's a complete lie, I made none of it, and it's great; it's really useful to demo on, so thank you again for helping me with this. So what happens here is that if we look at the demo deployment, you'll see that you've got the demo, and you've got certain flags being checked. Now, if I go to the feature flag configuration, I can see that my OpenFeature demo has an end-to-end set of feature flags. This is really cool. This, by the way, is the spec for OpenFeature, the schema, so you can see that we can define the variants. I'm just doing a very basic example here, but if we go to, say, line 38 and change this to green, save that YAML file, then what happens in the background is it updates, and the app changes to that color. Now, what's also interesting is, well, let's do a live demo, it's always exciting. Oh, I think my port-forward might have timed out. Let's give it another little whirl. Let's have a check. Oh, the port-forward exists. Okay, there we go. Demos aren't supposed to work the first time; that's just the rule. Let's have a look. Okay, so let's go to the demo deployment. There we go. This is the nerve-wracking bit, when it never quite works. Let's have a look. It's got "flag not found". Again, like I said, I didn't make any of this, so I'll just remind you that I gave myself that caveat at the get-go. Let's have one more little look, and if it's still not playing ball, I'll move on to the second part of my demo. Oh, just as I started terminating the pod, it decided to kick into gear. Let's try it one more time. So let's just get rid of this pod and try again; we've got a little bit of time. So it's pulling my image down. Interestingly enough, this cluster is in my shed in South London. I'm using Tailscale, and it's going through there, which is why I have a little bit of latency. I thought it was cheaper than hosting it on a cloud provider, so that's something for me to learn for the future: the Wi-Fi isn't necessarily the best. So whilst that's running in the background, we're gonna switch to the other part of the demo. I mentioned Watchman, this controller manager. It's a pretty simple example of an admission controller. Quite simply put, if I go to my code directory, into watchman, and I kubectl apply the sample pod from config/samples, right? It's a simple nginx pod, something like that. I don't know what it does; let's have a look. You can see it's starting up. You can also see I've probably got some network latency, because my pod images aren't being pulled very quickly. Yeah, there we go. Let's kill that; that's a standalone static pod, so no harm done there. What I want to show you is how quickly this updates with flagd underneath OpenFeature. So this has a very simple set of feature flags, an on/off variant; that's what's happening here. So I'm going to change the default variant to on, there we go, then go down to here, and what you can see is, yes, a demo that works, finally. You can see that the flag is set to blocking admission, right? So imagine you're working at a bank and you've got a cluster that is mission-critical. You know that you've either had a bad actor join that cluster, or you've got a developer who's inadvertently creating resources, destroying something, or mutating something critical. By simply flipping that feature flag, you've blocked access for everyone in the cluster, which puts it in a safe state, because you can treat it like a sealed box and forensically go through it.
So this example of Watchman is designed to show you how you can do that with flagging, and just to show it's not smoke and mirrors, we can go and fix that. So we go to K9s, and we'll turn it back on momentarily, and we'll see if the other thing's finished pulling. No, that's in for the long haul; it's a Node.js image, so it's probably about 15 gigs. Okay, all right, we'll turn the Watchman system back on, and I'm silently relieved that I chose to do two demos rather than one. That turns the variant back off. I'm not sure if I print anything in the log; yeah, I think I do, so you can see here what actually happens as I check the log. I'm gonna do a bonus round here, since we didn't get to see the other bit of the demo, and just show you very quickly what the code looks like from an implementer's point of view. So I hope you can see this. I have a webhook here. In my webhook, I have a helper called getFlagdValue, right? That calls the OpenFeature client SDK. There's no connection string, as mentioned. You can set the type of connection; I do that in main.go here, you see. I tell a lie, I did put a connection string in, but it defaults to localhost anyway. So I set the provider, and I set gRPC, because gRPC is cool. And as you can see, it's gone through, it's checked whether the flag exists (this is just in a timeout loop), and it's then gone and done what I needed it to do.
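For reference, the shape of that webhook is roughly the following: a sketch assuming controller-runtime's admission package and the OpenFeature Go SDK, with the type name and flag key invented for illustration, not taken from the real repo:

```go
package webhook

import (
	"context"
	"net/http"

	"github.com/open-feature/go-sdk/openfeature"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// PodGate is an illustrative validating webhook that consults flagd
// (via the OpenFeature SDK) on every pod-creation request.
type PodGate struct {
	client *openfeature.Client
}

func (g *PodGate) Handle(ctx context.Context, req admission.Request) admission.Response {
	// flagd runs as a sidecar, so the provider defaults to localhost;
	// no connection strings in the workload.
	block, err := g.client.BooleanValue(ctx, "block-pod-admission", false, openfeature.EvaluationContext{})
	if err != nil {
		return admission.Errored(http.StatusInternalServerError, err)
	}
	if block {
		// The drawbridge is up: reject everything in the admission chain.
		return admission.Denied("pod creation is currently blocked by feature flag")
	}
	return admission.Allowed("")
}
```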
So if we go back here, we can create our pod once more, and I promise that'll be the end of my messing around with this. All right, turn that back off, great. Let's create our nginx pod again: apply. There we go, pod created, exciting. Cool, so let's summarize. We didn't get to see the playground application working; however, you can try that yourself on your own hardware, somewhere that pulls images down properly, and play around with it; it's generally available. The QR codes I showed previously should give you some context as to where to get it; it's on our GitHub, under open-feature. And the second demo was hitting the brakes: stopping the creation of pods in a cluster if you need to. And of course, that's just the tip of the iceberg, right? Doing something in admission control is super simple; we can take it a lot further. We can put it inside a module for an ingress, or inside some other piece of microservice behavior that needs to switch between services inside the cluster, or that perhaps needs to mutate something else. So there are a lot of applications for this. And of course, that made me think: what's next, right? We're now able to mutate and modify the incoming pod spec and put flagd inside it, which is really great; we're able to enable users. But what about non-user-facing workloads? What about system workloads? So I started another project, because I get bored quite easily, and I thought I'd try another one. This one's not quite ready for prime time: it's the Kubernetes API server, rebuilt with flagd inside it, so you can turn parts of the API server on and off dynamically. Feature gates, you can start to flick on. Now, if you've been a Kubernetes admin before, you'll know that you have to restart the component, the kubelet say, and pass new parameters in to enable certain capabilities. That's not actually a code limitation; it's a decision to do it that way. And what I've started going down the route of is building a project that allows you to do that in the API server. I think that's exciting, because we can start to do upgrades, play around with capabilities, and give developers access to features without fully committing. We can also go in another direction. I've mentioned workloads from the persona of the management cluster, and from the application workload, but there's another dimension too: taking it further down the stack, into the infrastructure layer, top-of-rack switches, DPUs, SmartNICs, starting to look at what needs network acceleration. Say I have something going on in the data path; I can analyze that and decide whether or not to route it through a network-accelerated node. So putting flagd inside a scheduler could be quite interesting. And the evaluation system, which I didn't cover in great detail, is getting fairly robust now. It also overlaps with my previous talk about data operations: with this sort of thing, you can start to build declarative configuration for different types of failure mode, right? So effectively, when a certain type of request comes in, you can build failure modes with the configuration flags. And I don't think that's something anybody ever really anticipated doing with feature flagging. One final thought on this: flagd also has the capability to contact the remote API of a provider. So if you have a feature flag provider in the cloud, flagd can connect over HTTPS and talk to their API. Hands in the air: we haven't matured this very much, and we haven't built against a specific vendor API, but the capability is there; the code is all there. So what you could then do is turn a feature flag on and off in the browser, and it restarts your load balancer, or it modifies something in your cluster. And I think that's a very interesting prospect for democratizing who, and what, gets to run experimentally on your cluster. In terms of our roadmap: we're here, right? October; November soon; Christmas. By next March, going on April, we're going to be looking at maturing the project even further. Our roadmap will be released shortly. One thing I didn't mention is that I've just joined the governing board as a fairly neutral party, to try and help lay out this roadmap, because we want people to have the confidence that they can start building production-grade infrastructure against this. Finally, I wanted to share some links for those of you interested in getting involved: the website, openfeature.dev, the repositories, the community. Please do get involved; the project is only as good as the number of contributors and people who want to help us. We've had a really great start: we've already got one production use case coming in fairly soon, and a lot of other folks are looking at bringing it into their organizations. I thought I had a last slide; however, that is me, and I'd like to thank you for your attention. Cheers. I think we have time for some questions; if you'd like to ask any, I'm happy to try and answer them. "Can you tell us anything about who has joined OpenFeature, any vendors or anything?" Can I tell you who has joined OpenFeature, any vendors? Yes, I certainly can.
We've had some diligent work done on our community repository already, which outlines fairly well the interested parties, the interested contributors, and the folks who are already working on it. There are going to be, no doubt, some very familiar names among those participants, and we look to grow that. In fact, I think we had folks approach us just at KubeCon, a feature-flagging and experimentation company, who want to build a provider for flagd. If you want to build a provider or an SDK, we have all of this inside the OpenFeature community project; please do check it out and ask us more questions. We're happy to help. "Hi. Sorry. Thank you for the talk and the demo. You mentioned earlier that there are some complex schemas supported by OpenFeature, and I was wondering if you could show an example of that, maybe?" So, I want to preface this by saying it's not my area of expertise, so to the people in the crowd from the project: I don't want to do it a disservice. However, there is a schema that has been agreed upon by the community, and it's being revised continuously. I'll just outline a very high-level example of that full schema for you to see. This is something that I know is starting to crystallize, but as I mentioned, things like evaluation context and fractional evaluation are in there already, and I wouldn't be surprised if we do further work in this direction. However, we treat it with a certain sense of responsibility. We look to OpenTelemetry as well: we have to be very careful about iterating on the schema once people have started building towards it with SDKs and providers. But yes, it's under the schemas repository. We have issues (we all have issues), but we have GitHub issues if you want to participate. Also, if you're interested in the architecture, we have OFEPs, OpenFeature enhancement proposals. So if you want to get involved in the project and you think: you know what, I think you folks are doing a great job, but we should go in this direction, then get involved. It's a great way to keep things moving. I hope that answers your question. Thank you. Any more? Okay, well, thank you again for your participation. To everyone flying out tonight and tomorrow, safe flights, and thank you so much.