Hey everyone, thank you for joining. My name is Prasad and I'm an engineer on the workload identity team at Uber. This is the story of changing the SPIFFE ID of all SPIRE-enabled workloads, and the challenges and learnings that came out of that exercise. So let's get into it.

Here's a brief agenda. We'll go over the background, which covers the current state of SPIRE running in Uber prod and some details about the infra. Then we'll move to the problem statement: what problem we were trying to solve and what approaches we took. Then we'll go over the challenges and learnings from this exercise, and we'll end with Q&A.

So let's look at the scale. Uber has thousands of hosts with a SPIRE agent running on them, and we operate in dozens of data centers. Our infra is still evolving; it has different orchestrators which schedule different types of workloads. This is an important detail, because I'll come back to it later. It also means these workloads are going to have different authZ requirements. Moving on to the identities, we have around a million-plus unique identities across different zones. And authZ here is an in-house solution: a policy-based system where you write a policy for your workload with an allowlist of a bunch of SPIFFE IDs. It's all wrapped in a library we built on top of go-spiffe and java-spiffe.

Now let's look at a sample SPIRE registration entry and what it looks like. It has fields like the SPIFFE ID, which is the identifier of the workload; the parent ID, which is what it rolls up to; and the selectors, which in this example are Docker labels: service name foo and partition prod. The way it works is that when a Docker container matches those two values, service name foo and partition prod, that container will receive this identity.
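To make the selector matching concrete, here is a minimal sketch. The field names, label keys, and matching function are illustrative only, not the real SPIRE API (the real system is consumed through go-spiffe / java-spiffe); the sketch just shows the idea that a workload receives an entry's identity only when all of the entry's selectors match its attested attributes.

```python
# Illustrative sketch of a SPIRE registration entry; not the real SPIRE API.
def matches(selectors: dict, workload_attrs: dict) -> bool:
    """A workload gets the entry's identity only if every selector matches."""
    return all(workload_attrs.get(k) == v for k, v in selectors.items())

entry = {
    "spiffe_id": "spiffe://example.org/foo/prod",
    "parent_id": "spiffe://example.org/spire/agent/host-1",   # where it rolls up to
    "selectors": {
        "docker:label:service": "foo",
        "docker:label:partition": "prod",
    },
}

# A container carrying both matching Docker labels receives the identity.
workload = {"docker:label:service": "foo", "docker:label:partition": "prod"}
assert matches(entry["selectors"], workload)
# A container missing or mismatching a label does not.
assert not matches(entry["selectors"], {"docker:label:service": "bar"})
```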
And the identity will contain the SPIFFE ID: example.org, which is the trust domain, then the one environment variable that we have defined, and then foo/prod.

So the problem here was, as I mentioned, that our infra is evolving, and the infra team planned to change some environment variables and their values. Like I showed in the previous example, our SPIFFE ID looks like the trust domain, then some environment variables, then the workload identifier. Now, if the infra changes the values of those environment variables, the SPIFFE ID will change, which means the same workload is going to get a new SPIFFE ID. And that means all the authZ policies associated with that workload need to be updated to the new SPIFFE ID.

So what approaches did we take to address this situation? One of the basic approaches was to keep the old SPIFFE ID: even if the underlying environment variable changes, just have custom hard-coded logic that keeps assigning the previous value. We felt this was not the right solution; it's papering over the problem rather than really solving it, and it would involve changes to the SPIRE registration flow with some complex logic that might work, but not necessarily all the time.

The other approach was getting rid of this environment variable from the SPIFFE ID. This variable was really not needed to uniquely identify a workload, so it made sense to remove it, because it could change in the future as well. Deciding to remove this unwanted variable from the ID essentially meant that all SPIFFE IDs following this format needed to change. And that sounds exactly like a migration, and everyone involved in migrations knows how fun those are. As we dug into this approach and got into the details, we realized we had a lot of authZ strategies in place for the different types of workloads.
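The coupling problem can be sketched like this. The builder function and the `zone-a` segment are made up for illustration; the point is that when an infra-owned variable is part of the SPIFFE ID path, renaming that variable silently mints a new identity for the exact same workload.

```python
def spiffe_id(trust_domain: str, env_segments: list, workload_id: str) -> str:
    """Toy builder for an ID shaped like ours was:
    trust domain + env-derived path segments + workload identifier."""
    path = "/".join(env_segments + [workload_id])
    return f"spiffe://{trust_domain}/{path}"

# Same workload, but infra renamed its variable from "zone-a" to "zone-a-v2":
old = spiffe_id("example.org", ["zone-a"], "foo/prod")
new = spiffe_id("example.org", ["zone-a-v2"], "foo/prod")
assert old != new  # every authZ allowlist naming `old` now needs an update
```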
The reason for that is that some workloads were onboarded before we even created our custom in-house authZ solution, and some of them had different requirements, like circular-dependency cases where they couldn't depend on the solution we had developed. One of the weird cases we saw was the SPIFFE ID hard-coded into the code as part of the config. This essentially translated into us chasing all these different stakeholders who were directly consuming SPIFFE IDs, updating them, deploying those changes, and making sure all the authZ policies were updated.

As we looked at this work, we saw another option: we could create a new registration, so the same workload would receive two identities. In SPIRE upstream a workload can get multiple identities, and the same is true with our authZ solution, but there's no real way to choose a preferred identity. So in the previous example I showed, foo could get two identities, but there was no way to choose a preferred one, and the same was the case in our authZ solution, where there wasn't a way to define a preferred identity. What this meant was that as soon as we created a new registration, the workload could have picked up the new identity and caused failures if the authZ policies had not been updated beforehand.

So the migration steps required us to update all authZ policies before we even changed the format. And as I mentioned, this was a time-consuming process. We had to chase various stakeholders to update their configs, we changed some configs for them, and we waited for the deployments. And there were some snowflake cases (think of platform services, which run on every single host) where workloads could take days to receive a new build. Essentially we were blocked until all of these workloads had received the new build, which accepts both the old and the new SPIFFE IDs.
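As a sketch of what a preferred-identity mechanism could look like: this is purely hypothetical, since neither SPIRE nor our authZ solution supported it, but something this simple (an ordered preference list consulted before falling back to whatever the workload got first) would have let us create the new registration without risking the failure mode above.

```python
def pick_preferred(identities: list, preference: list) -> str:
    """Hypothetical: choose which SVID a workload should present when it
    holds several. Walk an ordered preference list; fall back to the
    first identity issued if nothing on the list is held."""
    for pref in preference:
        if pref in identities:
            return pref
    return identities[0]

held = ["spiffe://example.org/zone-a/foo/prod",   # old-format ID
        "spiffe://example.org/foo/prod"]          # new-format ID
# Until authZ policies are updated, prefer the old ID; flip the list later.
assert pick_preferred(held, ["spiffe://example.org/zone-a/foo/prod"]) == held[0]
# With no preference configured, behavior is unchanged: first one wins.
assert pick_preferred(held, []) == held[0]
```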
For some of the snowflake cases we had to create allowlists so we could move ahead with the other workloads that were ready to consume the new format. So that was the first step. Then we went and updated the SPIFFE ID format and created all the different new registrations.

Another thing we had to look at is, as I mentioned, that we have a lot of registrations. We had to look at scalability in terms of how many registrations the SPIRE servers are able to handle. Recently we've also seen that the agent, which caches the different identities, can run out of memory. So that was one issue where you cannot just go ahead and update all the identities at once. We had to do it in small batches, introducing the new entries and removing the old ones, so that we didn't blow up the total number of registrations or the per-agent registration count. And then the last step was just removing the old SPIFFE IDs from the authZ systems.

From this entire effort there were obviously a few learnings. One of them is that we need to advocate for a uniform authZ solution across the different workloads. If we had that, we wouldn't have had cases like SPIFFE IDs being consumed directly in code. That would have limited our interaction to a small number of teams who could have prioritized the changes and done the deployments, saving the time we lost in the first step of the migration.

The second point is around the handling of multiple identities, and this one is still under discussion. We're not sure where the right place to handle multiple identities is: should the SPIRE registration entry have a field for a default identity or a preference number, or should the authZ systems settle on some time-based or preference-based way of choosing an identity? This is something where we could really use help from the open source community and get ideas on how to handle it.
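The batching step can be sketched roughly like this. The batch size and entry shape are made up; the point is that each batch creates the new entries and retires the matching old ones before the next batch starts, so the total registration count (and each agent's identity cache) never spikes to double its steady-state size.

```python
def plan_batches(entries: list, batch_size: int):
    """Yield small create+delete batches for a SPIFFE ID migration.
    `entries` is a list of {"old": ..., "new": ...} pairs (illustrative shape).
    Within a batch the new entries are created, then the old ones removed,
    so at most `batch_size` extra registrations ever exist at once."""
    for i in range(0, len(entries), batch_size):
        batch = entries[i:i + batch_size]
        yield ([("create", e["new"]) for e in batch] +
               [("delete", e["old"]) for e in batch])

pairs = [{"old": f"spiffe://example.org/zone-a/svc{i}",
          "new": f"spiffe://example.org/svc{i}"} for i in range(5)]
batches = list(plan_batches(pairs, batch_size=2))
assert len(batches) == 3                 # 2 + 2 + 1
assert batches[0][0] == ("create", "spiffe://example.org/svc0")
assert batches[0][-1] == ("delete", "spiffe://example.org/zone-a/svc1")
```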
The other learning we took from this exercise is that the SPIFFE ID format needs to be as concise as possible, and it should have a very limited number of dynamic variables, only the ones needed to uniquely identify the workload. Adding static fields is fine, but if the dynamic variables in your SPIFFE ID format depend on a lot of stakeholders, then chances are they will evolve and require you to do a SPIFFE ID migration.

So that's something we thought through, and we came up with this new SPIFFE ID format. It obviously contains the trust domain, and then we introduced the orchestrator as a field. The orchestrator is nothing but the scheduler of a workload. The reason for introducing it into the format is that there are usually no uniqueness guarantees between two workloads: for example, foo may be scheduled by scheduler A and also by scheduler B. That can cause confusion if the scheduler field is not present in the SPIFFE ID format, and someone would need to define separate policies for those. Another advantage of adding the orchestrator, we felt, comes when we enable RBAC. If the orchestrators are the ones doing the registrations, then we can simply use this as a prefix: orchestrator A can only create and delete SPIFFE IDs under the trust domain plus the orchestrator A prefix.

The last part of this SPIFFE ID format is the unique workload identifier, and this one depends on the type of workload and its authZ requirements. For typical stateless services we can have something like service A, plus the partition, which could be production or staging, as the identifier. Some other low-level services may not even care about this partition field. So it really depends on their authZ requirements, but our goal, as we decided, is to keep the identifier as small as possible.
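The new format and the RBAC idea can be sketched together like this. Names such as orchestratorA and serviceA are placeholders, and the prefix check is a sketch of the idea rather than any real SPIRE feature: an orchestrator may only manage entries under its own path prefix.

```python
def new_spiffe_id(trust_domain: str, orchestrator: str, workload_id: str) -> str:
    """New format: trust domain + orchestrator (the scheduler) +
    a unique workload identifier kept as small as possible."""
    return f"spiffe://{trust_domain}/{orchestrator}/{workload_id}"

def may_manage(orchestrator: str, spiffe_id: str, trust_domain: str) -> bool:
    """RBAC sketch: an orchestrator may create/delete only the IDs
    that live under its own prefix."""
    return spiffe_id.startswith(f"spiffe://{trust_domain}/{orchestrator}/")

sid = new_spiffe_id("example.org", "orchestratorA", "serviceA/production")
assert sid == "spiffe://example.org/orchestratorA/serviceA/production"
assert may_manage("orchestratorA", sid, "example.org")
assert not may_manage("orchestratorB", sid, "example.org")  # wrong prefix
```

The same serviceA scheduled by two different orchestrators now yields two distinct, unambiguous IDs, so policies never have to guess which scheduler an identity came from.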
So this was our experience and I'd be happy to take any questions. Thank you for listening.