Okay, hello everyone, I'm Marco Amador, Chapter Lead for Data Engineering and SRE at ANOVA, and I'm here to present a little bit about ANOVA's journey with GitOps and progressive delivery.

ANOVA is a leading global provider of industrial IoT and is the result of a merger of five well-established companies. With that, we had five platforms and five different ways to build and deliver software. So we created a new, unified platform, trying to follow cloud native culture and best practices. The platform we've created has a state-of-the-art, cloud-based tech stack: we are running everything on Kubernetes, with Istio as the service mesh, and a pretty nice observability stack with Prometheus, Loki, and so on.

We had a few key aspects to take care of when creating a new platform from scratch. One of those was scalability: we needed our clusters and workloads to scale automatically according to the load. We also needed multi-geography, multi-region clusters, because we need to segregate the data of customers from different geographies, and multi-region basically for data replication and read latency. We have multiple environments, staging and production, in all of those geographies. And of course, we needed to enforce our security policies in all of these clusters. More importantly, we have different product teams that we wanted to have full ownership of their infrastructure and applications, so we also needed a way to configure the tenants themselves and their roles.

For that, we used GitOps, more specifically Flux. We started with Flux v1, and for a while now we have been using Flux v2, which brings us a few controllers: the source controller to handle all of our Git repositories, the kustomize controller that applies kustomizations on those Git sources and creates the Kubernetes resources themselves, and the helm controller to take care of Helm releases and perform all the Helm operations we need to install them.

As I said, we have multiple Git repositories. One of them is for our platform team; it's basically where we describe all of our Helm releases for operational things like the observability stack, the Istio service mesh, and the tenants themselves, and we also specify the permissions here. And each tenant has their own Git repository where they describe their own workloads at will.

With so many geographies, environments, and regions, we needed a way to keep our repositories free of duplicated code, and something to prevent drift between so many environments. For that, as I said, we are using HelmReleases from Flux, which is a CRD, and Kustomize, which allows us to apply overlays on top of some common base manifests. Then we can apply the small differences that we have between environments while keeping the base clean and without duplicated code. Yeah, I usually show my repos; I have some example repositories on my GitHub that we can check after this.

Well, how do we continuously deliver our workloads? We use a pretty nice feature of Flux as well, which is image automation, and that brings two additional controllers. One is the image reflector, which keeps fetching the new image tags that we keep pushing from our CI pipelines. And the other one matches, or tries to match, our image policies with the image tags that it finds in our container registries.
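As a rough illustration of the Flux v2 setup described above, here is a minimal sketch of a GitRepository source, a Kustomization reconciled by the kustomize controller, and a HelmRelease handled by the helm controller. The repository URL, paths, namespaces, and names are illustrative assumptions, not taken from the talk.

```yaml
# Source controller: watches a Git repository (URL is hypothetical).
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: platform-gitops
  namespace: flux-system
spec:
  interval: 1m
  url: https://example.com/acme/platform-gitops
  ref:
    branch: main
---
# Kustomize controller: applies manifests from that source to the cluster.
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: platform
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: platform-gitops
  path: ./clusters/staging      # hypothetical repository layout
  prune: true
---
# Helm controller: installs/upgrades a chart declared as a HelmRelease CRD.
# Assumes a HelmRepository named "grafana" already exists in flux-system.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: loki
  namespace: monitoring
spec:
  interval: 10m
  chart:
    spec:
      chart: loki
      sourceRef:
        kind: HelmRepository
        name: grafana
        namespace: flux-system
```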
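The base-plus-overlay approach mentioned above could look roughly like the sketch below: a shared base per workload, and a thin overlay per environment that patches only the small differences. The directory layout, names, and the patched field are assumptions for illustration.

```yaml
# Hypothetical layout:
#   tenants/base/podinfo/helmrelease.yaml
#   tenants/base/podinfo/kustomization.yaml
#   tenants/staging-eu/kustomization.yaml
#   tenants/production-us/kustomization.yaml
#
# tenants/staging-eu/kustomization.yaml — the overlay reuses the common base
# and carries only the environment-specific difference as a patch.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base/podinfo
patches:
  - target:
      kind: HelmRelease
      name: podinfo
    patch: |
      - op: replace
        path: /spec/values/replicaCount
        value: 2
```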
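And here is a minimal sketch of the image automation resources described above: an ImageRepository that the image reflector scans, an ImagePolicy that selects acceptable tags, and an ImageUpdateAutomation that writes the resulting tag bump back to Git. The image name, semver range, branch, and paths are assumptions.

```yaml
# Image reflector: scans the registry for new tags (image is hypothetical).
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  image: ghcr.io/example/podinfo
  interval: 1m
---
# Policy: which of the discovered tags are eligible for rollout.
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: podinfo
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: podinfo
  policy:
    semver:
      range: ">=1.0.0"
---
# Image automation controller: commits the new tag to the Git repository.
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: platform-gitops
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot
        email: fluxcd@example.com
      messageTemplate: "chore: bump images"
    push:
      branch: main
  update:
    path: ./clusters/staging
    strategy: Setters
```

For the "Setters" strategy to pick up a field, the manifests being updated carry a marker comment such as `# {"$imagepolicy": "flux-system:podinfo:tag"}` next to the image tag value.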
If any of those new tags matches an image policy, a patch is applied in our Git repos, or in any Git repo that references that image policy, and then we roll out the new Helm release with the new version.

But not so fast, because sometimes we don't want to simply roll out the new image, since a lot of bad things can happen. We might be introducing a regression, we might be missing some integration tests that we forgot to do, or suddenly our APIs might start taking longer than our SLO allows. So we are using Flagger for that, which has many deployment strategies: canaries, A/B testing, and blue/green. We are mostly using canaries, where the principle is very simple. We spin up new containers using the new image that we detected with our image policy and start progressively forwarding traffic to those containers. During that, which is called the canary analysis, we keep monitoring the metrics, or whatever we want to watch, to see if the SLO still holds. If everything is okay, the canary is promoted to the primary workload.

Here is the kind of chart that explains this a little better. We start progressively forwarding traffic to the canaries; if everything goes well, we keep increasing that amount of traffic, and if everything is still okay at the end, the primary release starts using the new version and the canaries are killed.

Sometimes we don't have enough traffic to produce metrics, so the canaries wouldn't progress, because there are no metrics for us to analyze. In that case we can use different strategies, like generating our own traffic with tools like hey or k6, so we can perform load tests during the canary analysis. While the canaries are progressing, we can keep watching our dashboards to see how they are performing. The canaries are ephemeral: they only last while we are doing the canary analysis.

Sometimes we want to keep the new version from being exposed to all of our users, so we use dark launches for the cases where we cannot test the new feature using feature flags, for instance when we start using a completely new architecture, and we want to make the completely new version available to only a specific set of users. For that, we are using an Istio VirtualService feature called delegation, where, depending on an HTTP header for instance, we can delegate the traffic to another Istio VirtualService. In this case, anyone who knows that specific header will hit the new version that we are releasing, which is available only to the users that know it. We can keep running canaries the same way. We have all of this set up in our Helm charts, and differently from the canaries, when the canary analysis finishes we still have the new version available for the ones that know that specific header.
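A minimal sketch of a Flagger Canary of the kind described above is shown below, combining progressive traffic shifting, SLO-style metric checks, and a load-test webhook for when real traffic is too thin to produce metrics. The service name, port, thresholds, and URLs are assumptions, not values from the talk.

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: apps
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    port: 9898
  analysis:
    interval: 1m        # how often the metrics are checked
    threshold: 5        # failed checks before the canary is rolled back
    maxWeight: 50       # maximum share of traffic sent to the canary
    stepWeight: 10      # traffic increment per interval
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99       # SLO: at least 99% successful requests
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500      # SLO: request duration under 500ms
        interval: 1m
    webhooks:
      - name: load-test # synthetic traffic when there isn't enough real traffic
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.apps:9898/"
```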
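For the dark-launch case, a sketch of the Istio delegation idea could look like the following: the main VirtualService routes requests carrying a specific header to a delegate VirtualService that exposes the dark-launched version, while everyone else keeps hitting the stable service. The hostnames, gateway, header name, and service names are illustrative assumptions.

```yaml
# Root VirtualService bound to a gateway; requests with the header are
# delegated, all other traffic goes to the stable destination.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app-main
  namespace: apps
spec:
  hosts:
    - app.example.com
  gateways:
    - public-gateway
  http:
    - match:
        - headers:
            x-dark-launch:      # hypothetical header name
              exact: "enabled"
      delegate:
        name: app-dark          # hand traffic off to the delegate below
        namespace: apps
    - route:
        - destination:
            host: app-stable.apps.svc.cluster.local
---
# Delegate VirtualService: no hosts or gateways of its own, only reachable
# through the delegation above.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app-dark
  namespace: apps
spec:
  http:
    - route:
        - destination:
            host: app-new-arch.apps.svc.cluster.local
```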
And basically, that's what we have been doing with GitOps and progressive delivery. We still have a few challenges: for instance, we have a different observability stack on every cluster, and we'd like to have a unified observability stack. And that's it, I hope you enjoyed it.

We do have time for questions if you want to ask one. Just go to the microphone. If you'd like to ask a question, there's a microphone in the middle. If somebody's up front, you can wave your arms and I'll throw a mic at you. You have four minutes. Any questions? We got one, I'll bring the mic over.

Thanks for the presentation. It was very good and a very interesting talk, so I liked your choice of stack. Just one question. You mentioned that for the canary release you sometimes don't have real traffic, so you execute tests, for example. Is the quality enough to be sure it's good enough to release, given that you're creating the tests on your own? Maybe it's not really the behavior of the users you're trying to see?

True, true, but we are using specific containers for that specific service where we define a bunch of test cases that we perform. I think it's probably even more accurate, because we can just go through every endpoint that we have to test it, while if you are only getting traffic from outside, we might be hitting only a few of those endpoints, and then we are not really analyzing the canary correctly.

Any other questions? Raise your hand or go to the mic in the middle of the room. We have a taker. Raise your hand if you're going to have a question as well and I'll get you set up next, or line up. Go ahead.

I was just wondering, how are you keeping Istio and Flux up to date in your production clusters?

Well, we are keeping them updated basically manually. Right now we are using Istio 1.13, I guess; it's pretty up to date. And Flux is the one thing where we are actually not using image automation, although we could also use image automation to update Flux itself. We are not doing that; we are doing it manually, because we want to test first in staging, so we are doing flux install and so on. But Flux is in its own GitOps repository, so we are doing everything as code; we are just updating that operations GitOps repo manually, with the Flux CLI itself. Thank you.

Yeah, I really enjoyed the talk. How are you securing your Git repositories to make it so that only privileged people can push new code to production?

Yeah, basically, we have everything on Azure, so we have AD groups, and only the members of a specific tenant have access to the GitOps repository that they own. Thank you.

Yeah, got another one? Yeah, thank you for the presentation. So how do the application teams and the developers interface with Flux? Do they decide the threshold for A/B testing, and how is that configured?

Yeah, we have one single Helm chart that supports it. It's very flexible, so they can decide what latency they want; basically, they can decide their SLO for that specific service. It can be latency, it can be error rate, it can be whatever they want. They have complete ownership. The Helm charts are flexible enough to be configured as they want.

All right, thank you, Marco. Everybody give a round of applause. Thank you. Thank you.
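As a purely illustrative sketch of the last answer, a shared chart's values file might let each team declare the SLO used for its canary analysis along these lines. The key names below are assumptions, not the actual chart interface described in the talk.

```yaml
# Hypothetical per-service values consumed by a shared Helm chart that
# renders the Flagger Canary with the team's own SLO thresholds.
canary:
  enabled: true
  stepWeight: 10
  maxWeight: 50
  slo:
    latencyP99Ms: 500      # maximum latency allowed during analysis
    errorRatePercent: 1    # maximum error rate allowed during analysis
```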