Hey everyone, I'm Adam Gardner, and the session today is building progressive delivery and safety into your applications with feature flags. I'm assuming, because you're in this room, that most of you are app developers. Yeah? Cool, I like that energy. So who am I? I'm a dev rel; I work on the developer relations team at Dynatrace. I'm also a CNCF and CD Foundation Ambassador, and I've spent the last decade or so in the observability space, obviously with Dynatrace, but that encompasses a lot; it's a wide field.

Quick show of hands, don't be nervous: how many of you have heard the term progressive delivery? Good, about 20% I would say. That's important, because different people have different ideas: different open source projects talk about it in different ways, and different vendors co-opt the term to mean slightly different things. What do I mean, for the purposes of this talk? Any tactic or technical procedure that enables a software system, an application, a microservice, to be gradually enabled or disabled. If you have a way to not go straight from version one at a hundred percent to version two, that's all I'm talking about with progressive delivery: any strategy that enables that to happen. And really that is a corollary for control and precision, because if you've got that, you can control the rollout and the rollback, and you've got the precision to target who you're rolling out to. So how do you do that? A lot of it
relies on duplication, basically: you duplicate things and then figure out a way to point people to this copy or that one. Even on VMs, that means reconfiguring load balancers or reconfiguring DNS; this isn't cloud native specific. We've got blue-green deployments, where you spin up version A and version B, version one and version two, and you just send the traffic to the left or the right. Canary deployments, where you say: I'm on version one and I want to pick a subset of users, maybe they're in Australia, where I'm from, or maybe they have blue t-shirts on, whatever your criteria is, and you only send the new version to that set of users. You observe, you make sure it's healthy, and then you increase the size of that cohort. Over time, if you continue down that path, your version two becomes the primary version, and you repeat the process. Client metadata strategies are another common way. Here I'm talking about leveraging things on the client side, and I don't just mean the browser.
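Canary cohorts like that are usually picked deterministically, so the same user stays in or out of the cohort across requests. A minimal sketch of that idea, with a hypothetical helper (not code from the talk), assigning roughly N% of users to the new version by hashing their ID:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inCanary deterministically assigns a user to the canary cohort:
// the same user always gets the same answer, and roughly
// canaryPercent% of users land in the cohort.
func inCanary(userID string, canaryPercent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < canaryPercent
}

func main() {
	// Each user gets a stable yes/no for a 10% canary.
	for _, user := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s in canary cohort: %v\n", user, inCanary(user, 10))
	}
}
```

In practice a service mesh or ingress does this routing for you, but the stable-bucketing idea is the same.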
I mean any client. Things like header values, such as the user agent: you might want to send Chrome users to one version and Firefox users to another. Using cookies, using geolocation, and pretty much any other aspect you can pick out, you choose how you target that cohort of users. Then you get more into the marketing-type ways of doing things: A/B testing, but also multivariate testing. These days you don't just have two variables; you've got hundreds of different capabilities and combinations, and the marketing team loves this, because they're going to come to you and say: we're doing this new rollout, this new campaign, and actually we've got ten versions of the page. A button on the left, a button on the right, two more versions where the button has different text, and a page with no button. You not only have to code all of that, but ultimately it's got to run, and you've got to get the statistics: is it healthy, is it working, is it converting, and so on and so forth.
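Multivariate assignment follows the same stable-bucketing idea, just across more than two variants. A sketch with a hypothetical helper (not from the talk's demo) that maps each user to one of N page variants:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// assignVariant buckets a user into one of the named variants.
// The mapping is stable per user, so someone who saw
// "button-left" yesterday sees "button-left" today, which is
// what makes the conversion statistics meaningful.
func assignVariant(userID string, variants []string) string {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return variants[h.Sum32()%uint32(len(variants))]
}

func main() {
	pages := []string{"button-left", "button-right", "alt-text-a", "alt-text-b", "no-button"}
	fmt.Println(assignVariant("alice", pages))
}
```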
And then we've got feature flags. Everything else on this page you could do with duplication; it might not be nice to do, but you could. Feature flagging sits somewhat separately, and obviously there are lots more techniques. At first glance you might think it's this versus that: on one side the tooling aspect, using something like Flagger or Argo Rollouts or any other tooling you care to pick or purchase, and on the other side feature flags; they're different, and I need to pick a road, stick with it, and hang my hat on that. But actually it's not quite that simple, because the tooling is great for big, broad, bold changes. Think about changing the underlying container image, the version of the JVM, the version of Python: something that is big, where logically you can't run both in the same container; it doesn't make sense. The tooling aspect is great for that because you've got, not physical, but you get my point, concrete duplicates, where you can really compare in the observability data and the statistics: A or B, did it make any difference? Hopefully not.
Hopefully it got quicker; who knows, we'll see in the data. Feature flags are a little bit different. You can bolt on the tooling, by the way: as app developers you can write your code just as you have, pass it to whoever runs it, and they provide all of the tooling, the service meshes and the ingress rules and the capability to send traffic from A to B. But feature flags you must develop in your application. What that allows, if you hide your new features behind those flags, is to decouple your deployments, the act of getting that thing running, from your releases, from actually having users use the thing. Feature flags are great for small, dynamic changes, and they're very good at looking at environmental conditions, as we saw before: things about the user, things about the client. So there are definite use cases for both, and one of the things I want you to take away from this talk is that it isn't a versus, it's an AND. They can and should coexist, and they work best when you use them and leverage them together.

Going back to the tooling: this is how tooling, in this case Flagger, works. On the top there you've got your primary; that's version one. Let's say you're sending a hundred percent of your traffic to version one, because that's the only thing running. The tooling will then spin up the experimental version, version two, the canary, as we call it. Then Flagger, or you, or the DevOps people, or the SREs, will define the rules. You say: 10% of my traffic goes to the canary, I'm going to test it, it's healthy; okay, now I'm going to shift 20%, and so on and so forth. And we all love YAML, who doesn't love YAML: that's what the Flagger configuration looks like.
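For reference, a Flagger Canary definition of the kind described is roughly this shape. This is a trimmed sketch from the flagger.app/v1beta1 schema as I recall it; the names, thresholds, and metric here are illustrative, not the exact values from the slide:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:                      # the Deployment that Flagger duplicates
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80
  analysis:
    interval: 1m                  # check health every minute
    threshold: 5                  # failed checks before automatic rollback
    maxWeight: 50                 # stop shifting at 50% canary traffic
    stepWeight: 10                # shift 10% more traffic per healthy check
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99                 # roll back if success rate drops below 99%
        interval: 1m
```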
So you're saying: every minute we're going to check the health; we've got thresholds and weightings; and basically you're trying to get to 100%. And it's a similar thing with Argo Rollouts, where we're saying: at each step we're going to progressively roll out more and more traffic, and we're going to wait a certain duration. The first step has no parameters, so that's a manual wait; someone actually has to go in and click go. But after that we wait 10 minutes, we progress to 60% of the split, 60% of our traffic, we wait 10 hours (these are obviously arbitrary), and then up to 80% we wait an hour. At any point, if your metrics signal that the canary, the new version, is unhealthy, you can trigger the rollback.

Feature flags, though, just in case you're not fully across what feature flags are and the use cases they can really target: I've seen them used for feature toggles, obviously, as we've been talking about. Testing in production: as an app developer, if you can write your stuff behind a flag and say to people, don't worry, you can deploy, because I know it's disabled, that's safety in itself, and it actually allows you to do whatever you want, because only when they're ready do you actually enable that feature. Kill switches.
I've seen them used for temporarily mitigating DDoS attacks: we detect, we flick the flag, and all of a sudden you're being served from the CDN rather than the live page. It's not the fix, of course, but it's buying your company some breathing room. User targeting, which we've talked about. Security mitigations: if you're using a library and next week a vulnerability appears in the world, and you have a feature flag to toggle that functionality off, or fall back to a different way of doing that thing, you're actually making your app more secure, because the first thing you're going to be asked is: are we exposed? Yes. How quickly can we mitigate this? Instantly; just toggle the flag off. And I don't know if you've heard, but large language models and ChatGPT are pretty big news these days, and I don't think you were expecting to get through this conference without hearing about it, so there you go, I get the bingo square if you haven't already heard it this morning: structured content for LLMs. They like content in a certain way, so they can crunch it and do what they do, which no one understands. You can serve the normal, good-looking content to humans, and then the stripped-down version to the bots, to Googlebot or ChatGPT.

Let's get some code. This is what a feature flag looks like. You typically start your journey building something in-house. This is very simple: it's a GET request on the home page, we're getting the flag value called foo, and then we're using that.
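A self-contained sketch of that starting point, with hypothetical names (the talk's actual demo code isn't reproduced here). The thing to notice is the handler calling getFlagValue directly:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// The typical first in-house flag "system": a guarded map.
var (
	mu    sync.RWMutex
	flags = map[string]bool{"foo": false}
)

// getFlagValue is the integration code the talk warns about:
// every handler that calls it is coupled to this backend.
func getFlagValue(key string) bool {
	mu.RLock()
	defer mu.RUnlock()
	return flags[key]
}

// home branches on the boolean flag "foo".
func home(w http.ResponseWriter, r *http.Request) {
	if getFlagValue("foo") {
		fmt.Fprintln(w, "new behaviour")
	} else {
		fmt.Fprintln(w, "old behaviour")
	}
}

func main() {
	// Exercise the handler without starting a real server.
	rec := httptest.NewRecorder()
	home(rec, httptest.NewRequest("GET", "/", nil))
	fmt.Print(rec.Body.String()) // old behaviour
}
```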
So it's the most basic feature flag you can have; it's effectively a boolean. The problem here, though, is that getFlagValue: you had to write that code, and it's the integration code reaching out to your flag system, whether you've built it or bought it. What you've actually done is tightly couple your application (shiver, we don't like that) to whatever flag backend you're using: in-house, paid, or an open source solution. So how do you solve that? Well, we rip that code out, effectively. It's the same code, but we wrap it in an interface: we wrap it in an OpenFeature provider. OpenFeature is a standard with a very well-defined interface, so your connection code goes into an OpenFeature provider, pretty much unchanged, and then you can just call getFlagValue. What that actually looks like is: you get an OpenFeature API instance and set the provider. If you've built something in-house, you would create the provider yourself and set it. Then, as an app developer, you say: OpenFeature client, get me a boolean, get me a string, get me an integer, whatever it is. You don't care, day to day, where that's coming from, because there is effectively a layer in the middle. What you've really done is move the tight coupling, that integration code, back into the provider layer, so now your app is completely decoupled from the backend provider. Take Dynatrace, for example: we're assuming everything's nice and you've got one feature flag system, but let's be honest, you don't; these things grow independently, and I think we've got six or seven different feature flag providers and vendors at Dynatrace. Hence we needed OpenFeature. The next question, probably, is: who creates or maintains these providers?
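Before answering that, the decoupling just described can be sketched like this. These are made-up types illustrating the pattern, not the real OpenFeature SDK signatures (which differ per language); the point is that app code only ever talks to the client, and the backend-specific code lives behind the provider interface:

```go
package main

import "fmt"

// Provider is the one interface the application depends on.
// It loosely mirrors what an OpenFeature provider gives you:
// the vendor-specific connection code lives behind it.
type Provider interface {
	BooleanValue(key string, defaultValue bool) bool
}

// inHouseProvider wraps whatever backend you built yourself.
type inHouseProvider struct {
	flags map[string]bool
}

func (p inHouseProvider) BooleanValue(key string, def bool) bool {
	if v, ok := p.flags[key]; ok {
		return v
	}
	return def
}

// Client is what app code uses day to day; it never sees the backend.
type Client struct{ provider Provider }

func (c Client) BooleanValue(key string, def bool) bool {
	return c.provider.BooleanValue(key, def)
}

func main() {
	// Swapping backends means changing only this one line.
	client := Client{provider: inHouseProvider{flags: map[string]bool{"foo": true}}}
	fmt.Println(client.BooleanValue("foo", false)) // true
	fmt.Println(client.BooleanValue("bar", false)) // false (missing flag falls back to the default)
}
```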
Obviously, if it's something you've developed in-house, it's you. If it's open source, then the open source community, the project, will create that provider, so you don't even need to care how you're connecting to the backend; you just set the provider. And if it's a paid vendor, of course (and they are here at KubeCon), they will write the provider.

Demo time, enough slides. Tell me the Wi-Fi hasn't died. Come on. Yes, I will increase the font. What you're looking at is a very, very basic demo: the OpenTelemetry demo dice roller. If you want to instrument things with OpenTelemetry, you go on their website, and they've got this demo application that basically generates a number from one to six. I've instrumented it to send the OpenTelemetry traces to Jaeger, and I've also feature flagged it, so we can see the impact that feature flags can have. So there we are: we're running, everything's nice and quick to regenerate my numbers, and if I go into Jaeger, we can see we have an OpenTelemetry trace of the action. Looking at the details, the span attributes on the trace, we can see the feature flag provider. I'm using an open source provider called flagd. You can run it in various ways: as a Kubernetes operator, as a standalone Linux binary, and I think they're even working on in-process evaluations now. We have our feature flag key, called slow-your-roll, and the value, the variant, is false at the moment. There I'm saying I don't want to slow this down, so it's fast: 74 microseconds. So now let's look at the flag definition. Big enough? Okay. This flag is called slow-your-roll, as we've seen from the trace. It's got two possible variants, on and off; as I say, this is the most basic version. And the default variant, what everyone is getting right now, is off. Now, if I just turn that to on, you'll notice we get the spinner, and it's slow.
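From memory, that basic flag definition looks roughly like this in flagd's JSON format. The exact flag key spelling and schema fields are my best recollection; check the flagd documentation for the definitive shape:

```json
{
  "flags": {
    "slow-your-roll": {
      "state": "ENABLED",
      "variants": {
        "on": true,
        "off": false
      },
      "defaultVariant": "off"
    }
  }
}
```

Flipping "defaultVariant" from "off" to "on" in this file is the whole toggle; flagd watches the file and picks up the change.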
It does work, but it's slow. Going back into Jaeger, now we've got a trace that's two seconds long, and again, drilling into that, we can see the slow-your-roll flag is true. The real benefit of feature flags here is that I didn't need to restart anything or redeploy anything; it just happened at runtime, because flagd is watching that file, that source of truth, and affecting my application at runtime. That's what I mean: if you have a security issue, you can just toggle something, and then it just works, kind of.

Let's make this a little bit more realistic and a little bit more advanced, though. flagd, and most feature flag vendors, give you the option of targeting, so I'm going to reconfigure this flag, or redefine it, I should say, if I can find the right tab. What's happening now is that it's still slow for me, but if ChatGPT came to this website and hit it, it would be fast for ChatGPT. Why? Because of the feature flag definition. It's very similar: we've got the same flag, it's still enabled, it's still got two possible variants, but the default variant is on this time, so by default the website will be slow. But we've also got a targeting rule. Now, all of the feature flag vendors will do targeting rules, but this is how flagd implements them, and it's JSONLogic. What I'm saying here is: if the user agent that I pass contains GPTBot, which is part of ChatGPT's user agent, then it's off; otherwise it's on, of course. What you've done there is taken some of the logic that you would otherwise need to build into your code and put it outside. Now, as an app developer, that's one less thing to worry about, because if the DevOps folks or the SREs need to redefine anything, they can do that as JSONLogic. You've shipped; you've done your job; it's up to them to decide who gets what versions. The question then becomes: how did we get all of this stuff, and how do we pass this contextual information? Am I only limited to the user agent?
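The targeted version of that flag definition, again roughly as flagd expresses it: JSONLogic's "in" operator does a substring check when its second argument is a string, and the context attribute name "userAgent" here is an assumption about the demo, as are the exact schema fields:

```json
{
  "flags": {
    "slow-your-roll": {
      "state": "ENABLED",
      "variants": { "on": true, "off": false },
      "defaultVariant": "on",
      "targeting": {
        "if": [
          { "in": ["GPTBot", { "var": "userAgent" }] },
          "off",
          "on"
        ]
      }
    }
  }
}
```

Read it as: if the userAgent attribute in the evaluation context contains "GPTBot", serve the "off" variant (fast); everyone else gets the default "on" variant (slow).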
Why the user agent? Well, no, actually, you're not limited to that. If I show you the code: we start off by setting the provider, which by now you'll know is flagd. I'm running flagd as a binary, so I'm passing it localhost and a port; that's the only time I need to care about where my flags are coming from. Then, in the actual code, I can say: here's a new evaluation context, and I'm going to pass the user agent, which I grab from the incoming request. When I evaluate the boolean, when I ask OpenFeature for a boolean, I ask it for the key, slow-your-roll, and I pass it the evaluation context I've just built. I shove that off to flagd, and flagd can then do whatever it likes with the evaluation context. So this is the glue: this is how you get user-specific or front-end-specific things into your backend. And what if you wanted to change to a paid-for vendor now? The only bit you need to change is those three lines where you set the provider and connect to a different backend, because everything else is handled by OpenFeature, and because it is that standard, the vendors and the open source projects are agreeing to give us back the data in the right format.

And there we are. Do we have time for questions? Nope, I've waffled on enough. Well, thank you very much; come grab me at the OpenFeature booth, we'll be there, or find us in the OpenFeature channel on the CNCF Slack, and at openfeature.dev. Very quickly, as I've got four seconds left: one, the tooling and flags are complementary; you can pick both. Two, OpenFeature is future-proof.
So by doing that, by putting OpenFeature in the middle, you're saving yourself work in the future. Three, the wheel exists: don't go and rewrite the feature flag engine; use an open source project, buy a vendor, or use flagd to get started. Four, as you've seen, the observability is essential, because if you can't tell why one thing's slow and one thing's fast, you're kind of lost. And also, ask the vendors about this: they provide the OpenFeature providers, so ask them if they have a provider in the language you code in. Thank you very much for your time.