Hello! And welcome to another Argo presentation, this time about Argo Rollouts. So, can I see hands: how many people know what Argo Rollouts is? Okay, how many people are using Argo Rollouts? How many people are using Argo Rollouts in production? How many people are using Argo Rollouts without Argo CD? You're the best.

My name is Kostis. I'm working at Codefresh as a developer advocate. I'm also working with the Argo team, mostly with Argo Rollouts, of course. And I'm also the co-author of the first ever GitOps certification, which you can find at the link there and in the QR code. I have to say some things about Codefresh: it's an enterprise solution on top of all the Argo projects, not just Argo CD. And if you have seen the news recently, Octopus Deploy said, "oh, they are doing some nice things, let's acquire them."

So, what are we going to talk about today? First, we are going to explain the problem and why it's a problem. Then I'm going to tell you about some things that you might not know: the Kubernetes Downward API and ephemeral metadata (labels). And then we will see a demo, because this is why you are here. Yes, and hopefully it will work.

So let's start with the basics. If you have never seen Argo Rollouts before, this is a two-minute introduction, and I will talk about stateless services first. If you get Argo Rollouts, you get a Kubernetes controller that gives you access to progressive delivery. So you can do stuff like blue-green deployments, where you deploy a second version of your application. Nobody is touching it. You can run some QA tests, some smoke tests. You can give it to your favorite friend; they can test it, and then once you're ready, you say "OK, now promote", and you are off to the races. There is also the canary deployment, where you do a similar thing, but instead of having a single point where you do a switch, you gradually send traffic to the new version.
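As a concrete illustration of the canary strategy just described, a Rollout manifest declares the gradual traffic shift as a list of steps. This is a minimal sketch; the name, weights, and durations are illustrative, not from the talk:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app            # illustrative name
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20   # send 20% of live traffic to the new version
        - pause: {duration: 1m}
        - setWeight: 50
        - pause: {}       # pause indefinitely until someone promotes
```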
And as soon as you gain more confidence, you send it more traffic, until you reach 100%. Another thing that you might not know is that Argo Rollouts has built-in support for several traffic providers. But until recently you had to request a provider (say "I'm using NGINX, HAProxy, Solo, whatever") and they had to add support for it in the project itself. Now there is a plugin mechanism in Argo Rollouts, and I'm also involved in the Gateway API plugin for Argo Rollouts, which means that from this point onwards you don't need to wait for the project to add support for something. If your traffic provider supports the Kubernetes Gateway API, it will automatically be supported by Argo Rollouts as well. So if you evaluated Argo Rollouts some time ago and said "oh, it doesn't support my traffic provider", you need to re-evaluate.

Now, as part of my job I always stay in the CNCF Slack, and I follow the Argo Rollouts channel there, and I see all the questions people are asking. One of the most common questions was: "OK, Argo Rollouts works great for a single application, but I have two applications." Usually it's a frontend and a backend: "I want to do progressive delivery for both, and I want my new backend to talk with my new frontend." This was a very popular question, and I did a presentation last year about it, so if you have this question, you can either read the blog post or watch that presentation.

So that's the intro. Today we're going to talk about using Argo Rollouts with stateful services, and because "stateful services" means different things to different people, essentially it's this scenario: you have an application that is not an HTTP application; maybe it doesn't have an endpoint at all, like an HTTP endpoint. The classic example is a worker. You have an application, you launch it, it goes to a queue, reads some stuff, runs something, and maybe saves the result back to the queue.
Or some people abuse the database as a queue, so you read something from the database and then you store it back. And instantly you can see that Argo Rollouts knows nothing about this connection. It's not something that you do with a traffic provider. So you say, "okay, that's great, I will use Argo Rollouts with my worker." You select, let's say, a blue-green deployment. You start the new version. Instantly it goes to the production database and starts reading tasks, and almost always this is not what you want. You want to start the new application, leave production untouched, and maybe run some tests, as we said, on the new application until you are certain about it.

So people saw this and said, "okay, Argo Rollouts doesn't work with stateful services." And I know this because, as I said, I monitor the Slack channel, and this has been one of the most popular questions this year. People are asking how to use Argo Rollouts with stateful applications. There is even a whole discussion on GitHub, and they all ask the same thing: "I have an application which is not speaking HTTP. It doesn't have traffic that my traffic provider supports. What do I do?" And until recently they said, "okay, I'm not going to use Argo Rollouts." But you shouldn't do this. You should use Argo Rollouts even for these applications.

So how do you solve this? The simplest scenario is to say: "okay, I have my application running, it's using my production database, and now I'm going to start the new instance, and I will create a brand new database, or a brand new queue, it doesn't matter. And this will be just for the new version." So the production version will not see anything at all. Live users will continue looking at version 1.0, and they will still use the production database, and your new application will go to a new database created just for it.
So not only is production not affected by what you're doing, but you can also take this magic preview database and put in, let's say, test data, some custom scenarios that you want to test. So you also have an easy way to give custom test data to your new version, then run smoke tests, QA tests, manual tests, and then decide and say, "okay, now this is ready to go."

So how do we do this? We can do this scenario pretty easily with some components that I'm going to talk about. The first component is the Kubernetes Downward API, and this is an API that you get for free with Kubernetes. It's unrelated to Argo Rollouts, and essentially it's a very nice feature where you can put some special labels on your deployment or your pods, and these labels can then be mounted as files in the pod. You can also use environment variables, but that's not a very good solution, because environment variables are only set when the container starts, while mounted files are updated in place while the pod is running, and we will rely on that later. Essentially you can say "take these labels and mount them in the file system at any path you like". Here I have an example at /etc/pod-info/labels, and then your application can read the labels like a normal file. So the application doesn't know anything about Kubernetes; it doesn't even need to know that it's running in Kubernetes. It just reads a configuration file, and you have passed the configuration file via Kubernetes.

So we can use this with our problem and say: we will pass some special labels to our application, and we will tell the application whether it's running right now as the canary or not, making the application smarter, and we will also have our source code read those labels, so the code can adapt to the situation it is running in. The Downward API, as I said, also supports environment variables, but we are not going to use them; stick with files. So that was one component.
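As a sketch of how the Downward API mount just described looks, assuming a hypothetical worker pod and the /etc/pod-info path mentioned in the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
  labels:
    role: active              # the label our application will read;
                              # Argo Rollouts can manage labels like
                              # this one as "ephemeral metadata"
spec:
  containers:
    - name: worker
      image: example/worker:1.0   # hypothetical image
      volumeMounts:
        - name: podinfo
          mountPath: /etc/pod-info
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: labels      # becomes the file /etc/pod-info/labels
            fieldRef:
              fieldPath: metadata.labels
```

The mounted file then contains one `key="value"` line per label, and Kubernetes rewrites it in place whenever the labels change, which is exactly the hook used later in the talk.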
The second component is Argo Rollouts itself, and essentially it's a capability that Argo Rollouts gives you: when you define your canary or your blue-green deployment, you can tell Argo Rollouts, "hey, while the canary is running, the active service should have these labels and the preview service should have these labels". These are managed automatically by Argo Rollouts: when the canary is not running, the preview labels are not there; once it's running, they are added by Argo Rollouts itself.

So now you can imagine that if you put all those components together, we get the magic sequence. We have the application running in its initial state; it's using the production database, and we have told it via labels: "hey, right now you're running in production". Then we say "ok, let's do a blue-green deployment". We launch a new application, and we tell that application via labels: "hey, you're not running in production yet, so go look at a special preview database that we have just created just for you, and leave production untouched". At this point you can pause the canary as normal and do whatever you want with the new version; production stays unaffected, and live users don't see anything strange. Then at some point you promote the canary or the blue-green deployment, and you say "ok, I completely remove my original version, it's gone", and for the new version: "hey, from now on you're running in production, so change the way you access the database or the queue". From that point onwards you're back to square one, with the new version as the stable version.

From this diagram you can see that the first part is kind of easy, but the magic moment is in the third square, where you say to the application: "hey, your role has changed, do something different now; you're not in canary mode anymore, now you're in production". So we need another component for this, which is auto-reloading of configuration. I've seen a lot of applications that have this simple
behavior: you start up as an application, you read the configuration, and then you're done. That's not the correct way to do it. The correct way is to read the configuration from files (you should of course have configuration files for your DB or your queue), then monitor the files, understand when they have changed, and auto-reload the configuration. And, unfortunately or fortunately, you need to talk with the developers about this. I know that sometimes you think "the developers don't know what they are doing and we are the best". No, you should talk with the developers and tell them what you need. Actually, you should tell them that this is a good practice to have anyway: even if you're not using Argo Rollouts, it's good practice for your application to understand when its configuration has changed and auto-reload it. And because I've had this discussion with developers, today I'm giving you the ammunition: I can tell you that there is a library for their favorite programming language that does exactly this. So if they tell you "no, we don't know how to do that", you're going to ask them: oh, you're using Spring? There is the refresh scope. You're using Ruby? There is this thing, and this thing. The demo that we are going to see today is a Golang application. In most cases it's very easy, just one or two lines of code, where you say to the application: "hey, load the configuration, then keep monitoring it in order to understand if it has changed or not, and if it has changed, reload it". So it is possible, but they need to do something about it.

In all my presentations I select one slide and I say "this is the most important slide of the presentation". In all my past presentations this was a technical slide, but in this presentation it's this one: for this thing to work, you need to talk with the developers, convince them, make them understand what you need, and have them make their changes. If they don't make the changes, then
everything I'm talking about today either will not work at all, or it will work with a lot more effort. Because I know what you're going to say: "oh, I want the application to reload, so I will kill the pods and they will restart and load the configuration again", which is not how you do it.

Ok, so enough about the theory; let's see the demo. I already have an application deployed here on my cluster. This is the Argo Rollouts CLI, and nothing is happening right now; there is only one version, which is both the stable and the preview, everything. I've actually made this application just for demo purposes, and it shows me what it's doing. So this is the application that's running right now. It says "I am version 1.0, and I know that my role is active; I'm in production right now". It's also printing its configuration, i.e. where it gets the configuration from: it's from /etc/pod-info/labels. This is another good practice that I suggest: applications should say where they load their configuration from, because it makes it easier for you to understand why something is working or not. So even if you're not using Argo Rollouts, I think this is a good idea to follow. And then this application also says "I'm going to use a RabbitMQ server". I'm using RabbitMQ for the demo, which is just a queue implementation; you could use Kafka or a DB or whatever you want, this is just an example. And this application knows that it uses my production queue. So this application is a bit smarter than other applications: it knows it's running in production right now.

So what I'm going to do now is say: "ok, this application runs fine, let's create a brand new version". I'm coming here to my manifest, and I'm going to edit it, and here you can see the magic labels that say what you do in the preview case and in the active case. You can see I pass different labels for the role: here at the top it says your role is active, but at the bottom it says your role is preview. And also you can see
the RabbitMQ server stays the same, but I'm using a different queue. This is just an example; you could use a completely different instance of RabbitMQ as well. So I'm coming here, and the only thing I'm going to change is not any setting; I'm just going to set the new version. So even as an administrator my job is very easy, because I just change the version. I don't care about the settings; I've set them once in the definition. I change the version, I say I want version 2, and then I apply, and I have a new version.

So if I go now to my rollout, I have two versions running: one is running in production, the other is the new version, and you can see right now that the active one is still the old one. Just for demo purposes it's paused; nothing happens automatically, because I want to show you what is happening. Of course, in a real production scenario this could be automatic. So now, if I restart my port-forwards: this is production, and nothing has happened; production still runs version 1.0, and it's using the production database. But I have a brand new version which is running version 2.0, and this application (and it's easy in this demo to see the change) has a completely different configuration here. In this example my configuration is just the location of RabbitMQ and the queue, but in a real application I might connect to three databases, five queues, or whatever. This application clearly says that its role is preview, so it knows it's not running in production.

I can also verify this with RabbitMQ. This is my RabbitMQ dashboard, and you can see that right now I have two queues there: one for production, one for non-production. I can also play with my tester, which in a real scenario would be my production instance using this service. So here I'm saying "this is production, so send messages to production", and if I go to production you can see the messages here, the ones I just sent. So my production messages go to production, and my new version doesn't
know anything about them, because it's talking to a completely different queue. And then here, let's say it's my integration tests, or my smoke tests, or my QA team, or whatever, and I'm sending preview messages to my new version. These are only picked up by the preview version, and production knows nothing about them. So I have achieved what I wanted just by changing the version; on my side I have two applications running, and they are completely isolated, even though there isn't a traffic manager in between, and I have all the time in the world to do as much testing as I want.

So now we reach the magic point, which, if you remember, is the promotion. I'm going to promote, and I'm saying "okay, I'm ready, I like the new version, everything works fine". And now, if I go back to the rollout, you can see that the new version is active, and in 19 seconds the old version will disappear. So if I start my port-forwards again, now everything should be the same: this is version 2.0, and now it knows it is active. The preview is also the same; we don't need to look at the preview, you see it's active. And I think the best way to show this is if I now send some messages to production. So I'm sending some more messages to production, and if I go here and look at the log, the cut-off point is pretty clear: while the application was running in canary mode, it was picking tasks from the preview queue, and as soon as I promoted my canary, the application said "okay, I'm running in production, so I'm going to stop picking tasks from the preview queue and start picking tasks from production".

So that's it for the demo, but I also want to show you, in this particular case (and it's not important that it's Golang; as I said, it can be any language), that as far as the developers are concerned, the only thing I had to add is here. Here you can see I'm saying: these are my properties, these are the things that you read from the configuration, you are going to search for
them; these are the standard paths. So, as I said, the application doesn't even know that it's running inside Kubernetes. And the magic lines are these four lines from line 35, which essentially say: "hey, place a monitor on your configuration, and if the files have changed, auto-reload". So you don't need to restart any pods or kill anything in order for the application to pick up the new configuration; this happens automatically. And I already gave you all the examples for the other programming languages. So as far as the application is concerned, this is the only change they need to make in the source code.

And as far as you are concerned, the only thing I added was the part about the labels that I showed you before. This is just a standard blue-green deployment; it has everything that you would expect, and the new section is this one, the preview and active metadata, that is, the labels themselves. And then at the bottom you can also see the Kubernetes Downward API, where I'm saying: take these labels and mount them at /etc/pod-info.
So this part is your responsibility; the source code is the responsibility of the developers. I want to show how clear the distinction between the tasks is: if you cooperate with the developers, everything works as it should. This is the same example I used, and the QR code you have seen is completely open source, so you can follow it by yourself.

So, what have we seen today? It's possible to use Argo Rollouts for stateless services, and Argo Rollouts has built-in support for many popular traffic providers. Even if your traffic provider is not supported by Argo Rollouts directly, check the Kubernetes Gateway API and see if your traffic provider supports it; there is a plugin for that. And if you want to use Argo Rollouts for stateful services, with the technique you have seen today, using the Downward API and auto-reloading of configuration, you can use it for these applications as well. So essentially there is nothing stopping you right now from adopting Argo Rollouts for your applications. I will monitor the Slack channel, and if I see any more excuses, I will create a blog post or a presentation. Thank you!

Thank you, Kostis. If I may ask a question: thank you for this presentation, that's a very interesting technique, very useful. I have a specific question. You mentioned that the preview application reloads the file, right? And then you said that 15 seconds later the old version is removed. Now, my question is: do you have some way for this application to tell the Argo Rollouts controller "hey, I've finished"? Because what if this new application is stuck in some way, or cannot read the new file? You don't want to remove the old version before the new one has successfully reloaded the config file.

So that's a great question, and one thing I didn't show you, because it was Golang-specific: here, if you look at the source code, apart from reloading, it also runs custom code that says to the application "stop now" first. So if your application is doing something at this point in time, with this "stop now" function, or whatever you use,
you can say "wait 5 seconds until you are finished". So again, it's up to you to decide when this cutoff happens, but you need to talk with the developers and tell them what you're doing, because they know best how to stop something in the middle.

Thanks. I have another question, if you don't mind. It's a really great demonstration of how it could work with a simple database. Let's imagine that we have a database that is password-protected, and we cannot have cloud federation; it could be an external service that is not integrated with a cloud provider. Any hints on which direction we should look for a solution?

So, how you use secrets with the Argo projects is probably question number zero... no, question number one: question number zero is how you deploy Argo CD, and question number one is how you do secrets. It depends on the secrets solution that you are using, whether it's Sealed Secrets, an external secrets operator, or something else. I would try to make the setting not the secret itself, but, for example, the token that you use in order to fetch secrets, or the authentication method that you use to fetch secrets. That way the application is responsible for fetching its own secrets, and you don't have to put all this stuff in Argo Rollouts; in Argo Rollouts you just enter the new way of fetching secrets. But you might be in a company that objects, because the application, as soon as it's launched, has access both to preview secrets and production secrets at the same time; maybe for some people this is acceptable, maybe for some it's not. There is another big discussion, which is not for now, which is whether your application loads all the configuration it needs during startup, or loads configuration dynamically as it goes. That's a very good topic, but not for today.

Ok, thank you. Another question here: we are stopping the application inside the pod, inside the container. So when the config map, which is basically the source of the change, changes, all the
applications and all the pods in the StatefulSet will restart, so basically we might drop some packets that are going to them. So is there any difference between this and restarting all the pods, except for speed? We have downtime anyway.

So in this demo, which was a simple demo, the application just worked with the database; maybe in a real application you would have requests there as well, and then you would use the traffic provider as well, on top of this technique.

Ok, last one... and that's it, thank you very much!