Good morning, everyone. I'm Anshul Mehta and I work on the platform team at Atlan. And I'm Sumandas; I also work on the platform team at Atlan. Today we are going to talk about how we were able to do 300 hours of repetitive work in just three hours by leveraging Argo Workflows and Argo CD.

Before moving forward, let me first talk about why Atlan exists and what Atlan is. We started building Atlan with just one mission in mind: to help the humans of data do more together. Humans of data are diverse: analysts, engineers, business and product people, and scientists. Each has their own tooling preferences, skill sets, and DNA. And when these diverse sets of people come together, they create magic. They do amazing things, from curing cancer to building self-driving cars.

Now, what is Atlan? Atlan is the single source of truth that data teams, or as we call them, humans of data, can use to discover, trust, and understand the data they care about. And to effectively close the loop on all of this, what we need is data governance.

So what exactly is data governance? Data governance is the practice of setting clear rules and responsibilities for handling and using data within an organization. It ensures that data is managed effectively, adds value, and is used only for its intended analytics and information purposes. These are a few pillars of data governance; let's talk about each one of them. Business glossary: defining, certifying, and aligning core business terms, metrics, and business documentation. Lineage and impact analysis: understanding the source of data. Where did the data come from? How has it evolved over time? And how does it change over time? Profiling: standard characteristics of your data, such as minimum value, maximum value, and standard deviation. Access management: making sure that the right person has access to the right data at the right time. Security: providing more context on how data is used, who is using it, and for what. Active metadata: the ability to leverage metadata to trigger automated actions, like firing a Slack alert when something happens to the data or updating documentation when something changes, all by leveraging the existing metadata.

Cool. Just a quick recap: we have talked about why Atlan exists, what Atlan is, and what data governance is. Now let's come to the problem. What's the problem? As Atlan continued to develop, the core microservice that powered data governance began to accumulate critical and significant technical debt. This in turn resulted in three major challenges. Number one, reliability: we started facing a lot of customer support tickets, and that number was increasing every single week. Number two, maintainability: just keeping the governance microservice and all its dependent services running was becoming very hard. And number three, innovation: our architecture didn't allow us to innovate as much as we wanted to and as much as the business needed.

So yeah, we were in big trouble, and our nights started looking something like this. And our days, something like this. We decided that the only way forward was to deprecate the current microservice and design a new one to get rid of all the technical debt. And trust me, that was a huge decision for us. But in just four weeks, we built a new service that could replace the existing one and solve all our challenges. That was a big moment for us. But wait, how do we roll it out? Rolling the service out wasn't as easy as we thought it would be.
It had four major challenges.

Number one, schema evolution. We made some schema changes with the new service, and to make sure everything kept working, we had to do a data migration. Number two, API deprecation. We deprecated certain APIs, which meant that all the other services that leveraged the governance service would need code changes. Even though schema evolution and API deprecation would make the rollout harder and add some short-term complexity, it was very important for us to innovate and build for the future. Number three, Atlan's multi-instance model. We run in a model where each one of our customers gets their own Kubernetes cluster instance. This matters because organizations today do not want their data stored on shared infrastructure, and there are also security and compliance concerns. And the fourth major challenge was zero downtime. We didn't want our customers to have even a single second of downtime while all of this rolled out.

So we came up with a rollout strategy that could work on a single instance. Number one, data migration: we would migrate the data. Number two, data validation: we would validate that the migration was successful. Number three, release: rolling out the changes, at the same time, in all the other services that leveraged the governance service. And the most important one of all, post-release testing: once all these changes are rolled out, how do you ensure that nothing breaks and that users have the same smooth experience as they did before?

But now that we had a way to roll it out on one single instance, how do we do it on hundreds of instances? The simplest way was to painfully repeat these steps again and again for every single instance. Based on our calculations, one customer instance would have taken us around three hours, and doing it for hundreds of instances at least 300 more hours. So we were in trouble. But how did we actually do it? We did it with the help of Argo. Passing over to Suman to talk about the details.

Thank you. So as Anshul discussed, we have four steps: data migration, then validation, then release, and then testing, and we have to ensure all of them happen on every instance. What we realized is that for the data migration, for validating that the data was migrated successfully, and for the post-release testing, we could simply write Python scripts and execute them via an Argo Workflow. A rough sketch of how these steps can be packaged into one workflow follows.
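Later in the talk, all four steps are packaged into a single Argo Workflow that runs per instance. A minimal sketch of what such a packaged WorkflowTemplate could look like is below; the template name, script names, image, and the referenced argocd-release template are illustrative assumptions, not Atlan's actual manifests.

```yaml
# Illustrative sketch only: names, images, and script paths are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: governance-rollout            # hypothetical template name
spec:
  entrypoint: rollout
  templates:
    - name: rollout
      steps:
        - - name: migrate-data        # step 1: data migration
            template: run-script
            arguments:
              parameters: [{name: script, value: migrate.py}]
        - - name: validate-data       # step 2: data validation
            template: run-script
            arguments:
              parameters: [{name: script, value: validate.py}]
        - - name: release             # step 3: switch dependent services (sketched later)
            templateRef:
              name: argocd-release
              template: release
        - - name: post-release-tests  # step 4: post-release testing
            template: run-script
            arguments:
              parameters: [{name: script, value: smoke_tests.py}]
    - name: run-script
      inputs:
        parameters:
          - name: script
      container:
        image: python:3.11-slim       # assumed image; scripts baked in or mounted from a ConfigMap
        command: [python]
        args: ["/scripts/{{inputs.parameters.script}}"]
```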
Now let's talk about the release: releasing the changes in all dependent services. The governance service was one of our central services, and a lot of other services depended on it, so we had to release changes in all of those dependent services. What do we want? Right now, the new governance service we built is up and running, but all the dependent services are still using the old governance service. So our release target is to switch them over to the new governance service. Once all the services are using the new governance service, we consider the release done. That is the state we want to move from and to.

Before we get into how we did it, let's take a look at the platform architecture we use at Atlan. There is one control plane cluster, which runs Argo CD and Argo Workflows, and each customer has their own virtual Kubernetes cluster, so each customer has their own separate instance. Argo CD controls the releases: if we merge any change, it syncs the latest release to all the instances. That is how the platform architecture looks at Atlan.

From the previous slide, we have seen that Argo CD controls the releases. So what we want is that, from inside the workflow running in an instance, we can communicate with the Argo CD sitting in the control plane and ask it to release the change into that specific instance. How can we do that? Argo CD ships a CLI, so we installed the Argo CD CLI in the Argo Workflow, and from there we can trigger releases. The problem we wanted to solve was how to communicate effectively from the Argo Workflow running in the instance to the Argo CD sitting in the control plane, and ask it to release when we want, in each instance.

For that, we had to solve a few things. First, creating an Argo CD user and policy. Why was this important? For the Argo CD CLI to work, we first have to authenticate as a user, but we can't let any one user update other instances. So for each instance we created one service user and one RBAC policy in the control plane. Second, we created an Argo CD parameter that updates the ConfigMap of the services. What does this ConfigMap do? We configured every dependent service so that if the ConfigMap instructs it to use the new service, it starts using that service. So in each customer's Kubernetes cluster, if we just update the ConfigMap, all the dependent services switch to the new governance service: when the ConfigMap is updated, the pods restart, and after that they talk to the new governance service. And we control this ConfigMap through the Argo CD parameter itself. So our target is simply to update the Argo CD parameter via the Argo CD CLI, and that rolls out the change. Rough sketches of the per-instance Argo CD account and RBAC policy, and of a release step driven by the argocd CLI, follow below.

So we had data migration, data validation, release, and testing, and we packaged all of these steps into just one Argo Workflow that we can run in any customer instance we want. We can repeat this in each customer instance independently, and it runs without any dependency on the others.
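Here is a minimal sketch of what the per-instance Argo CD service account and RBAC policy could look like, using Argo CD's standard argocd-cm and argocd-rbac-cm ConfigMaps. The account name tenant-acme, the project, and the application name acme-atlan are made up for this example.

```yaml
# Sketch only: tenant-acme and acme-atlan are hypothetical names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # one service account per customer instance
  accounts.tenant-acme: apiKey, login
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.csv: |
    # the tenant account may only read, update, and sync its own application
    p, role:tenant-acme, applications, get,    default/acme-atlan, allow
    p, role:tenant-acme, applications, update, default/acme-atlan, allow
    p, role:tenant-acme, applications, sync,   default/acme-atlan, allow
    g, tenant-acme, role:tenant-acme
```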
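And this is a rough sketch of the release step itself: the argocd-release template referenced from the packaged workflow above. It logs in with the per-instance account, updates a Helm parameter that is rendered into the shared ConfigMap, and syncs the application. The server address, application name, parameter name, and secret are all assumptions for illustration.

```yaml
# Sketch only: server, app, parameter, and secret names are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: argocd-release
spec:
  templates:
    - name: release
      container:
        image: quay.io/argoproj/argocd:latest    # image that ships the argocd CLI
        command: [sh, -c]
        args:
          - |
            # authenticate as the per-instance service account created in the control plane
            argocd login argocd.control-plane.example.com \
              --username tenant-acme --password "$ARGOCD_PASSWORD" --grpc-web
            # update the parameter that is rendered into the shared ConfigMap,
            # pointing dependent services at the new governance service
            argocd app set acme-atlan -p governance.serviceUrl=http://governance-v2:8080
            # sync so the ConfigMap change is applied and the dependent pods restart
            argocd app sync acme-atlan
        env:
          - name: ARGOCD_PASSWORD
            valueFrom:
              secretKeyRef:
                name: argocd-tenant-creds        # hypothetical Secret holding the account password
                key: password
```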
Now the target is: how do we run it on all the instances? We have hundreds of instances, so how do we run it everywhere? There are two parts to this: first, we have to install the workflow templates in all the instances, and then we have to trigger the workflows when we want.

Let's take a look at how we actually install the Argo Workflow templates in all instances. We use Argo PM, a tool built and open sourced by Atlan; we presented it at the last KubeCon. What is Argo PM? Argo PM is a package manager for Argo Workflows. It enables developers to distribute and consume Argo Workflow templates. If you have lots of workflow templates to manage, with a dependency graph forming between them, you can easily use Argo PM and handle them as packages. Using Argo PM, we published the workflow as a package to packages.atlan.com, our central repository, and from there we installed the packages into all the instances.

Now let's trigger the Argo Workflows. As you can see, in the control plane we have Argo CD and also Argo Workflows. The Argo Workflows installation in the control plane manages all the platform-related tasks, and from there we can submit workflows. So we just pass in the list of instances, and it triggers the Argo Workflow in each instance; a rough sketch of such a fan-out appears below.

So this is how we did it. Let's take a small recap of how the whole flow looks. First, we trigger the platform workflow; it takes the list of instances and triggers the workflow on every instance we specified. Each of those then runs data migration, then validation, then release and testing, beautifully, with Argo CD and Argo Workflows. Argo Workflows, in tandem with Argo CD, really helped us release changes much more easily and in a manageable way.

And what was the impact? We rolled this out successfully across 100 percent of our instances, we reduced customer support tickets by 96 percent after all these changes, and in the last six months we have added three major capabilities related to data governance. Any questions? Thank you. Please post your feedback there.
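As a rough illustration of the fan-out described above, here is a sketch of a control-plane workflow that takes a list of instances and creates one per-instance rollout workflow for each. The instance names and the resource-template approach are assumptions; in practice, each submission would target the customer's virtual cluster rather than the control plane itself.

```yaml
# Sketch only: instance names and submission mechanism are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: governance-rollout-fanout-
spec:
  entrypoint: fan-out
  arguments:
    parameters:
      - name: instances
        value: '["acme", "globex", "initech"]'   # hypothetical customer instance names
  templates:
    - name: fan-out
      steps:
        - - name: rollout-per-instance
            template: submit-rollout
            arguments:
              parameters:
                - name: instance
                  value: "{{item}}"
            withParam: "{{workflow.parameters.instances}}"   # one step per instance in the list
    - name: submit-rollout
      inputs:
        parameters:
          - name: instance
      resource:
        action: create
        # creates the per-instance rollout workflow; in reality this would be submitted
        # to the customer's virtual cluster, not to the control plane cluster
        manifest: |
          apiVersion: argoproj.io/v1alpha1
          kind: Workflow
          metadata:
            generateName: governance-rollout-{{inputs.parameters.instance}}-
          spec:
            workflowTemplateRef:
              name: governance-rollout
```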