Hi everyone and welcome to this CNCF on-demand webinar, an introduction to feature flagging and OpenFeature. In this call, I'm going to explain why, as software developers and SREs, our software needs to be delivered safely. I'll talk about some ways the industry already does this and how those techniques relate to progressive delivery. I'll explain how feature flags can be your secret weapon to improve your continuous delivery processes, and why OpenFeature is even necessary in today's tooling ecosystem. And of course, I'll give live demos of feature flagging with OpenFeature. The first demo will be in a non-Kubernetes setup and the second will be on a Kubernetes cluster. So who am I? I'm Adam Gardner. I'm a CNCF and Continuous Delivery Foundation Ambassador. I work for Dynatrace and I contribute to and maintain a number of open source projects. As you can hear from the accent, I'm originally from the UK, but now live in Brisbane, Australia, and outside work you'll probably find me in, on or under the sea, because my favourite hobby is scuba diving. So enough about me. What is OpenFeature? Well, I'll get into the details later, but for now understand that OpenFeature exists to bring standardisation across the open source feature flagging tools and the commercial vendors that are out there. Feature flagging is useful for both day-one activities, that is, the actual deployment of the software, and day-two activities, the day-to-day running of the application. I'll talk about both, but let's start on day one. Let's start with the basics: getting software to the end user. You want to get that software to users as quickly as possible, so let's talk about some of the existing techniques that are currently used to deliver software to end users on day one. The most basic way, of course, is just to set a time of day that your website or service will be down.
You tell your end users about this in advance, and when the time comes, you send them all to a holding page. You turn off version one and then you turn on version two. When you're nice and ready, you send everyone back to the website and now everyone gets version two. All done. This technique is simple and cost effective. It requires little or no additional staff training or tooling, but it's not so good for mission-critical applications, you know, the really important ones that actually make you the money. Imagine Google going down every time they wanted to do an upgrade. It can also cause you other headaches. Imagine this is an internal system, and you update it at 4 a.m. on a Sunday. That's fine: no one is using the system, so there are no complaints. But if something broke, when will you first know about it? Monday morning, when everyone tries to log in. And now guess how many tickets you're going to have sitting in your queue by Monday lunchtime. The second technique is called a blue-green deployment. With this technique, you spin up a complete replica of the environment, of production, for the new version, the green version. When you're ready, you switch everyone from old to new, from blue to green. This has the benefit of being, on paper at least, a fairly easy deployment method. But the big downsides here are cost and complexity. To do this safely, you really need to run two complete copies of production, and that's easier said than done. Even after you've switched to the new version, to green, it's common to leave blue running for a while so that you can switch users back if there is an issue. And of course, running all of that is expensive.
You've at least doubled your cloud bill, and then add on staff costs and the time to build the new environment, figure out how to actually do this, build in any security or observability that you need, et cetera, et cetera. The other thing you might have noticed about these deployment strategies is that they're blunt instruments. Either everyone's on version one or everyone's on version two. Wouldn't it be nice to be a little more granular about who is on which version? This is where canary deployments come in. Coal miners used to take canaries, the birds, down the mines with them. The idea was that the birds were the safety net. Coal mining, of course, can release deadly gases, and the birds were more sensitive to those gases than the miners. So if the birds felt the effects, the miners were forewarned and could get out of the mine quickly. The canaries were the miners' safety net. So what's that got to do with software? Why do we call them canaries? Well, a canary deployment acts in the same way. You intentionally release version two to only a small subset of your users, for example one or two percent. You observe that user group extremely closely. If anything goes wrong, yes, that group is impacted, but you now know about it, and you know that you cannot roll out to other users. On the other hand, if nothing goes wrong, you know that you can roll out to a slightly higher percentage of users, for example five percent. You repeat the process until, hopefully, everyone is on version two. So canary releases can limit the blast radius of changes, and with that you're starting to progressively deliver your software, that is, deliver software in small, controlled increments. The advantages of canary releases are that you're testing in production, with real users, real traffic and real infrastructure. The users are using the system the way real users do.
You can do true side-by-side comparisons of version one and version two. Canary deployments can be lower cost than blue-green because, at the end of the day, you've only got one production environment. Having said that, costs can easily spiral with canary deployments. You need extremely thorough observability to make sure you catch anything that goes wrong for that subset of canary users, and observability is not an easy thing to get right. You'll probably also need to buy or build additional tooling to actually perform the canary releases, and then you'll have all the hidden costs like staff training and enablement on all of this new technology. To take canary releases one step further, wouldn't it be nice to have a way to be even more granular about who is targeted? What about only the testers getting the new version, or only the customers who want the latest and greatest and have maybe signed a disclaimer accepting the potential risk? What about people in a certain country? I think I heard a story that Facebook rolls out new features to New Zealand first, because they're first in the world to start the day and it's a small country, so it's a natural canary. With traditional canaries, to do that you need extra infrastructure, maybe a service mesh or a load balancer that's able to inspect layer-seven traffic and make those intelligent decisions based on some value in the traffic. That, though, increases cost and complexity. So if standard deployments and blue-green are the hammer, and canaries are the knife, then what are feature flags? Well, feature flags can be your scalpel. Feature flags give you the ability to do that cool, advanced canarying and layer-seven traffic routing without requiring new infrastructure or redeployments.
There is a mindset shift here, because unlike the other deployment methods, you don't deploy and run two copies, two versions, of the code. You run one copy, you deploy the features, and all new features are hidden behind feature flags. Unless you explicitly enable the feature flag, or enable particular users to access it, the feature itself stays hidden. Feature flags also live outside of your application, so changes to them can take effect immediately, without a redeployment or a restart. And if you think about it, using feature flags you've gained the ability to do canary deployments almost automatically. Feature flags can also help with day-two operations, that day-to-day, business-as-usual running of the application once it's actually in production, because if the SREs find an issue in the application, they can reconfigure the feature flags, again from outside, to change the behavior of the application at run time. You'll actually see this later in the demos. Now, if there is a downside to feature flags, it's that they do need to be put into the application by the developers; feature flags are not something that can be bolted on after the fact. Feature flags, then, are a software development concept that allows you to enable, disable or change the behavior of a feature in the application without modifying the source code or requiring a redeployment. So what does the architecture of a feature-flag-enabled application actually look like? First, of course, you need a feature flag provider. This is the source of your feature flags. It could be something you developed in-house, an existing open source project, or a paid-for vendor. It could be as simple as a database or a JSON file, all the way up to a full platform with analytics, user authentication and so on. Once you have your back end, your application code then calls the provider to ask for the value of whatever feature flag you need.
Your application can also send contextual information such as the user's location, their email address, their loyalty tier, or whatever else is important, because the flag provider can then use that information to change the value it sends back to your application. So, for example, imagine a new feature that's disabled for everyone by default. It is released safely because it is disabled. Your flag definition, though, contains a rule that says: if I get contextual information that the user is in the testing group, then instead of sending back disabled, enable it for them, send back enabled. That way the testers can access the feature in production, but only they can. And that sounds perfect, right? But we live in the real world, so let's scale that up to a realistic size. Now you have multiple services and multiple teams in your organization. Imagine the best-case scenario here: the business has mandated that all of the teams in the organization use a particular feature flag provider. Problem solved, right? Well, what happens if the business decides to change the feature flag provider? Now every team has a huge job to recode all of that plumbing in their application and redo it for whatever new provider the business decides to use. That's, of course, a lot of time and money spent just to move providers. Now imagine a more realistic scenario in which each team has organically adopted feature flags and chosen whatever provider they liked at the time. Developers from one team can't easily transfer to another team; they need to learn that new feature flag provider and the ways to interact with it. More importantly, nothing's the same.
There's no standardization on how feature flags work, what functionality they have, the providers, the support, or even the languages that are supported. So really, what's needed here is a new layer in between the application and the provider, a sort of magic API layer that would standardize access to the back ends. If we had that, we could say: set the provider to tool A, then go get me the feature flag foo. This magic layer would figure out the how, i.e. translate that request into the provider-specific code. Then every team has a standard way to use feature flags in their application, and teams can experiment with different providers until they find one that really fits their requirements. Could you write such a layer yourself? Yes, of course you could, but it's time and effort to write and maintain it. And besides, what I've just described is the OpenFeature specification. At this point you're probably also thinking: that's nice, but I've got environment variables, I'm absolutely fine, these are working perfectly. But environment variables are primitive feature flags. The configuration can't be changed without a redeployment, a restart, or at least a recompilation. Environment variables also have no contextual awareness, and they produce little to no meaningful telemetry data, so if you change an environment variable, judging the impact of that change is hard, if not impossible. You could, and probably will, decide to extract the environment variables to somewhere outside of your application. For example, you might start by storing them in a database. This is a step in the right direction, as it can also add contextual support. But your application code still contains hard-coded knowledge of those conditions. So in that example, you're stuck with checking users' locations, and without a recode and a redeployment, nothing else.
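A sketch of the primitive approaches just described, an environment variable plus a hard-coded condition in application code. All the names here are hypothetical, just for illustration:

```python
import os

# A primitive "feature flag": an environment variable you can only change
# with a restart or redeployment, plus a targeting condition that is baked
# straight into the application code.

def promo_banner_enabled(user_country: str) -> bool:
    # The on/off switch lives in an environment variable...
    if os.environ.get("PROMO_BANNER", "off") != "on":
        return False
    # ...and the condition is hard coded: without a recode and a
    # redeployment, this check can never be anything but a country check.
    return user_country in ("NZ", "AU")

os.environ["PROMO_BANNER"] = "on"
print(promo_banner_enabled("NZ"))  # True
print(promo_banner_enabled("US"))  # False
```

Notice there is no telemetry, no way to vary the answer per user beyond the one baked-in condition, and no way to flip the switch for a running process.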
Moving all of that logic out into a feature flag provider gives you really advanced capabilities. Now experiments can be run, deployed and updated independently of the application code. And it's also fully observable: OpenFeature is compatible with OpenTelemetry, so you can see traces, spans and metrics for which flag variants were used, why they were used, when they were used and, more crucially, what impact they had on the end user. Here are some other scenarios that feature flags can target. What if the code is ready but the business is not? How do you ship the code and move on without holding up your other teams? What if the feature is implemented across multiple services? How do you coordinate the deployments of those services? What if you want to enable the feature pseudo-randomly to see how it impacts real-world usage or performance? So now that we have a background on why feature flags are so necessary, what exactly is OpenFeature? OpenFeature is a CNCF Sandbox project. It's an open standard that provides a vendor- and tool-agnostic API for feature flagging that works with your favourite feature flag management system. OpenFeature is not a feature flag provider; it is the standard to which feature flag providers conform. Here's an example of an OpenFeature code snippet to retrieve a Boolean value. Notice the first line: we tell OpenFeature about our provider, and then we request that OpenFeature retrieve a Boolean value from that provider. The provider is the translation code which transforms that standard getBooleanValue call, which, by the way, never changes regardless of which provider you decide to use in the future, into the tool-specific API calls. And don't worry: the tool maintainers and the vendors write the provider code, so that part is actually already done for you. You don't even need to know their APIs. So what does all of this mean?
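Before answering that, here's a rough Python sketch of the set-provider-then-evaluate pattern the snippet describes. This is illustrative only, not the real OpenFeature SDK; the class and flag names are made up, and the toy provider also bakes in the testing-group targeting rule from earlier:

```python
# Illustrative only -- NOT the real OpenFeature SDK. It sketches the
# pattern: set a provider once, then evaluate flags through one
# standard call that never changes.

class InMemoryProvider:
    """A toy 'vendor' that resolves flags from a dict, with one
    hard-wired targeting rule for the testing group."""
    def __init__(self, flags):
        self.flags = flags

    def resolve_boolean(self, flag_key, default, context):
        # Targeting: testers see new features even when disabled by default.
        if context.get("group") == "testing":
            return True
        return self.flags.get(flag_key, default)

class OpenFeatureAPI:
    """The standard layer: applications code against this,
    never directly against a vendor."""
    def __init__(self):
        self._provider = None

    def set_provider(self, provider):
        self._provider = provider

    def get_boolean_value(self, flag_key, default, context=None):
        if self._provider is None:
            return default  # safe default when no provider is configured
        return self._provider.resolve_boolean(flag_key, default, context or {})

api = OpenFeatureAPI()
api.set_provider(InMemoryProvider({"new-feature": False}))
print(api.get_boolean_value("new-feature", False))                        # False
print(api.get_boolean_value("new-feature", False, {"group": "testing"}))  # True
```

Swapping vendors means swapping the object passed to set_provider; the get_boolean_value call sites never change.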
Well, first, trying out a new provider is as simple as changing that first line of code. Second, you write your flag retrieval code once, the getBooleanValue call, and you never have to touch it again; it doesn't change based on which provider you use. And third, every team now has a single, common, well-understood and well-specified way to use feature flags. Talking of the tools and the vendors, OpenFeature is being developed with the industry. It's not as if OpenFeature is coming along and saying: this is how you do it. What that means is it almost certainly already works with your favourite tool or vendor, and even if you have a homegrown flag provider, like a database, it's easy to add support for your homegrown solution as well. So it's time to see OpenFeature in action. For the first demo, I'm going to be using an open source, OpenFeature-compliant flag provider called flagd. flagd is a tool that implements OpenFeature; flagd is not OpenFeature. flagd comes as a binary and a Docker container, so it can run as a service, as I'll show you in the first section, or, as you'll see in the second demo, as a sidecar in Kubernetes. flagd reads one or more flag sources, that is, JSON or YAML files, and makes those flags available via an API. In this demo, I'll show you how feature flags are read at runtime without a redeployment of the application being necessary. I'll then show flagd reading from multiple flag sources, in my case JSON files, and then I'll extend a basic flag to include contextual information. The flag is reconfigured to return one of three values: AAA by default for all users, BBB for any user where we pass an email ending in @openfeature.dev, and CCC for any user using the Chrome browser. Finally, I'll show fractional evaluation, which is useful if you want to pseudo-randomly assign requests or users to different buckets.
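The idea behind that sticky, pseudo-random assignment can be sketched like this. This is a toy model, not flagd's actual fractional-evaluation algorithm: hashing a stable attribute such as the email address means the same user always lands in the same bucket.

```python
import hashlib

# Toy model of fractional evaluation -- not flagd's real implementation.
# Hashing a stable context attribute (here, the email) makes the
# pseudo-random assignment "sticky".
BUCKETS = ["red", "blue", "green", "yellow"]  # 25% each

def assign_bucket(email: str) -> str:
    digest = hashlib.sha256(email.encode("utf-8")).hexdigest()
    return BUCKETS[int(digest, 16) % len(BUCKETS)]

# The same email always yields the same color, across calls and restarts.
print(assign_bucket("user1@faas.com") == assign_bucket("user1@faas.com"))  # True
```

Whatever color a given address hashes to, it will hash to that color every time, which is exactly the behaviour you want for consistent user experiences.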
In the demo, any users with email addresses ending in @faas.com, our fictional company, will match the rule. Those users will then be pseudo-randomly assigned into either the red, blue, green or yellow bucket, but, importantly, whatever bucket a user is assigned to, they will always get that value, because you wouldn't want someone re-logging in or refreshing the page and seeing a different color or a different logo or different content each time. The second demo will be a Kubernetes-native deployment of flagd. It will use the OpenFeature operator, whose job is to watch for custom resources and ensure that the right pods have the right values, dynamically and at runtime. The OpenFeature operator makes these values available without a restart of the pod and thus without any downtime. This demo scenario, this UI, this web page, will be available for you to experiment with. It uses Killercoda, so thank you to Killercoda for providing the infrastructure. As you load this, everything installs in the background and you get Gitea, which is a Git repository. We've preloaded a repo in here with some flag values. This JSON file is the format in which flagd expects flags. You can see here we have a number of flags, one called MyBoolFlag and one called MyStringFlag, and they just return different types of variants. Variants are the possible results. If we look at MyBoolFlag, you'll notice that the default variant is on, meaning everyone right now gets on. However you request MyBoolFlag, you're always going to get on returned. These names, MyBoolFlag and MyStringFlag, are up to you; that's your flag key. If we scroll down and look at something a little more complicated, we've got a flag called fibAlgo. It has possible variants of recursive, memo, loop and binet, and of course the default variant is recursive.
So you can read that as: there are a number of possible results when we evaluate this flag, but right now, by default, everyone is just going to get the recursive string. Our application will use that to select the recursive algorithm for the Fibonacci calculation. But there is a targeting rule here that references a rule set. It says: if this emailWithFaas rule matches, if it's true, then the evaluation returns the binet variant. Otherwise, if it evaluates to false, it returns nothing, so we drop back to the default variant. So what is that rule? What is emailWithFaas and how do we specify it? If we scroll all the way to the bottom, we've got this emailWithFaas rule, and it basically says: if the var is email, and the value @faas.com is contained in the string we pass, then the rule is true. You can read that as: our application is going to take the user's email address and pass it through OpenFeature to flagd. flagd will then use that contextual information, and if the email ends in @faas.com, those users, and those users only, get the binet algorithm, but everyone else remains on the recursive algorithm. That's the way you can start to canary your deployments. So let's see it in action. Let's start flagd in tab one, then I'll open a new tab, leave that running, and just test that flagd actually works. I've just done a POST to the flagd endpoint, notice it is on port 8013, I've asked it to resolve a string, and I've passed it the header-color flag key. I haven't given it any context, and so the value has come back as red, the reason is default, and the variant is red. If we look at our flag definition, we can see what's going on in the back end. We've got a number of possible variants: red, blue, green and yellow. The default variant is red, but we've also got a targeting rule, and again it's that emailWithFaas rule.
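Pieced together, the fibAlgo flag and its emailWithFaas rule described above might look roughly like this in flagd's flag-definition format. This is a reconstruction from the demo narration, the exact field names and rule syntax may vary across flagd versions, so check the schema for the version you run:

```json
{
  "flags": {
    "fibAlgo": {
      "state": "ENABLED",
      "variants": {
        "recursive": "recursive",
        "memo": "memo",
        "loop": "loop",
        "binet": "binet"
      },
      "defaultVariant": "recursive",
      "targeting": {
        "if": [
          { "$ref": "emailWithFaas" },
          "binet",
          null
        ]
      }
    }
  },
  "$evaluators": {
    "emailWithFaas": {
      "in": ["@faas.com", { "var": "email" }]
    }
  }
}
```

The targeting block is JsonLogic-style: if the shared emailWithFaas rule matches, return the binet variant; the null branch means no match falls through to the default variant.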
If you notice, the context here is empty: I didn't pass an email, so of course I get the default variant, which is red, and that's exactly what we want; that's the safe default. Now let's change that to something different. Let's change the default variant to yellow. I'll commit the changes and re-request the feature flag, and now, of course, we get yellow, as expected. In a real system you could have multiple sources of flags. You might have multiple teams or multiple developers working on different bits of code, and they have their flags and you have yours. How do you have flagd read both of those sources of flags? Well, that's possible. What I'm now going to do is create a new file with a new flag called brandNewFlag and write it to disk. Now I'll attempt to retrieve that new flag value from flagd, and of course this should fail because we haven't told flagd about the new file yet. And there we are: flagd returns a flag-not-found error, as expected. Now what I'm going to do is restart flagd, so I'll flip back to tab one and restart it pointing at both files. I've just added a second source URI: one points to our existing flags stored on a web server in Git, and the other to a local file, /tmp/local-flags.json. Now I'll go back to tab two and retrieve that new flag again, and this time we do get a value back, and the reason is static. That proves flagd is reading both of those flag sources and making them both available via the API, and as a developer I then don't have to care where my flags are stored. Okay, so that was a very basic set of flags; apart from the targeting rules, they were pretty much environment variables, they were on or off. Let's look at something a little more complicated now. In the flags.json file in Git you have a flag called targetedFlag.
Again, we've got a number of variants, first, second and third, and the default variant is first. But now we've got a couple of different rules. We've got a rule that says: if the variable email is passed and @openfeature.dev is in that string, then those users get the second variant, so BBB in this case. Or, if email isn't passed but user agent is, and Chrome is in there, then those users get third. Everybody else just gets the default variant. So you can build up quite complicated layers of behavior here. Let's curl this, and by default, of course, we get AAA, the default variant. Now let's try passing the context with email as the key and me@openfeature.dev as the email address, and you can see it's a targeting match, because we get BBB, the second variant. Finally, let's do Chrome: the user agent is Chrome123 and we get CCC. Now let's look at the thing I talked about earlier, pseudo-random evaluation. If you look at the header-color flag again, we've got some variants, we've got a default variant that we previously set to yellow, and we've got the targeting rule you have already seen, emailWithFaas, but this time we've also got a fractional evaluation. What it does is take the value of the email address that comes through, and then 25% of those matching email addresses get the red variant, 25% get blue, 25% get green and 25% get yellow. If you keep sending the same email address, i.e. the same string, it will always get the same color: always red, or always blue. Let's see that now. I'm going to pass user1@faas.com. I don't know which variant it's going to get, but now that it's got red, it will always get red; no matter how many times I run this command, I'm always going to get red. If I pass user2@faas.com, I randomly get green, and I'm always going to get green, and so on and so forth.
If I don't pass a context, we're back to yellow; remember, that was the default variant. So there you are: by mixing and matching those rules, you can create very complicated logic, and that logic lives outside of your application at runtime, so you don't need any restarts or redeployments as you change it. And most feature flag systems allow things like this: most vendors support targeting rules and contextual evaluations. So that is the first demo, and now let's jump into the second demo, which again uses flagd, but in a Kubernetes environment. Here we are looking at the online demo; the links will be in the video description so you can play around with this. We have a Kubernetes cluster with the OpenFeature operator installed. What does the operator do? Well, here is our demo application, and you can see Fibonacci-as-a-service is pretty slow: it takes about four or five seconds for our business logic to generate the number the customers need. So let's see how we can use feature flags to test out our new algorithm. The OpenFeature operator will watch for and read any custom resources that define feature flags, so let's have a look at our feature flag as it stands at the moment. We have a custom resource of kind FeatureFlagConfiguration, and this one is called end-to-end. Within it we have a number of feature flags: the first is called new welcome message, the second is called hex color, and so on and so forth. Looking at the feature flag that controls the Fibonacci algorithm in use, the demo scenario talks about a slightly different use case, and you can see that it's currently targeting users with any @faas.com email. We're going to change that now to target just Sue; remember, in our scenario, Sue is the only user that should be able to use the new algorithm initially. Okay, so the way to read this Fibonacci feature flag is that it is enabled, and it has a couple of different
variants. These are the possible values that could be returned to our application, which our application can then use to perform its logic. We have the recursive algorithm, memo, loop, and a binet algorithm. By default, everyone gets the recursive algorithm. However, we've added an optional targeting rule to say: if the email address we pass from the front end is sue@faas.com, then Sue, and only Sue, gets the binet algorithm; otherwise, if it doesn't match, we're back to the recursive algorithm. What this means is that all users will get the recursive algorithm except Sue. So let's apply our updated feature flag. Now, as a logged-out user, I should still get the slow version. That's the safe default; that's exactly what we want. However, if I open a private browser window and log in as sue@faas.com, I now get the quick version. Sue is now able to experiment on her own with the new features without impacting anyone else, and remember, this is in production. Now let's extend our scenario to add Bob to the allowed users. To do so, we simply add another line here and reapply the feature flag. Sue remains on the quick algorithm; opening yet another incognito window and logging in as Bob shows that Bob gets the new quick algorithm. But if we log out of Bob and log in as Ian, Ian should still get the slow version, and indeed he does. So we're starting to progressively roll out our software in a safe way: over time, more and more users can use the new software, without impacting the users we don't want to impact. Now imagine we've finished rolling out to individual users or teams and we want to roll out to anyone with a @faas.com email address. Update the flag, apply the changes and test the configuration. Now anyone logging in with a @faas.com email address should get the quick algorithm, and anyone with another email, or who is unauthenticated, should still get the slow version. Unauthenticated users still get the slow version: that's good.
Bob still gets the quick version: that's good. And now log in as Sarah, and because of the catch-all rule, Sarah should get the new version. Fantastic. We're in complete control, at runtime, of who is able to use this new feature and who isn't. Eventually we'll be in a position where we want to roll this out to all users, not just selected or internal users, and that's easy to do. Just remove the targeting rule, change the default variant from recursive to binet, save the flag and reapply the changes, and now even unauthenticated users get the fast algorithm. Doing progressive delivery in this way shows that we can roll out, and roll back in case there is an issue, safely and at any time. One other thing to point out is that at no time did the pod actually need a restart. You can see our pod is still running and has been for 16 minutes, so there was absolutely no downtime while we made all of these changes. So, just to summarize: you install the OpenFeature operator, which installs flagd; you create your feature flags according to the custom resource definition specification; and then you annotate your application with two new annotations, openfeature.dev/enabled and openfeature.dev/featureflagconfiguration, the latter naming the feature flag configuration that your application needs. So there we are: an introduction to feature flagging and OpenFeature. If you want to get involved, the project would love to have you. flagd, the tool you've seen demoed today, and OpenFeature both live on GitHub and at openfeature.dev. The project is also on the socials, on Twitter and LinkedIn, and in the OpenFeature channel of the CNCF Slack. OpenFeature is a CNCF Sandbox project, and believe me, there's still plenty of work to be done and plenty of voices that we are yet to hear from and want to hear from. So do join the community, get involved in the discussion, and let's push OpenFeature forwards together.
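For reference, the two annotations mentioned in the summary sit on your workload's pod template, roughly like this. The Deployment name and image are hypothetical, and the exact annotation keys and CRD schema depend on your OpenFeature operator version, so treat this as an illustrative sketch:

```yaml
# Sketch only: annotating a workload so the OpenFeature operator injects
# flagd as a sidecar. Names and schema may differ by operator version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fib-service
  template:
    metadata:
      labels:
        app: fib-service
      annotations:
        openfeature.dev/enabled: "true"
        # names the FeatureFlagConfiguration custom resource to use
        openfeature.dev/featureflagconfiguration: "end-to-end"
    spec:
      containers:
        - name: app
          image: example/fib-service:latest
```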