So I'm going to switch topics now and talk about upgrade, going back to my overall theme of us evolving over time from TFS into a cloud service. Upgrade was a big deal. With TFS, and it's still the case with TFS, you've got to take the server down to upgrade it. Early on with VSTS it was the same thing: take the service offline, go upgrade it. That's a complete non-starter for a global service. Somebody somewhere has a critical dependency on VSTS. They're trying to ship something, they're trying to patch their service. It doesn't matter what it is; they've got stuff going on. There's never a good time for everybody. So we've got to be able to do this online.

Now, if you're going to do an upgrade, not everything can change simultaneously. It's just not possible. So if we can't change the application tiers, the job agents, the virtual machines, and the database together as a unit, who's going to have to handle the fact that they're different? Where are we going to absorb that complexity? We've chosen to absorb it in the application tiers and the job agents. A lot of this comes back to what I mentioned before: we've got a ton of SQL. And the thing with SQL is, SQL with if statements is completely awesome, right? We all enjoy writing SQL with lots of branches in it. It's crazy. So we said, instead of further complicating our SQL, we're going to handle this complexity in .NET. You could do this in Java or any number of other languages; for us, obviously, it's .NET. We created a set of factory classes that understand the SQL versioning. Every sprint, you create a new interface with that version, and you kind of march along in time. That way there's always a set of binders that match whatever version the database is at. This also, by the way, allows for easy rollback of the binaries, because the first thing we do is deploy the binaries; only after we deploy the binaries do we kick off the database upgrade. So if we deploy the new binaries and something goes horribly wrong, we can roll the binaries back easily enough. And this is much easier to test, because now we're testing standard .NET code. We can write unit tests for it. It's much easier to step through and debug than a crazy set of if statements and branching in SQL.

So how does this actually work? We need to be able to do these schema upgrades online, and like I said, the first phase is deploying the binaries. The binaries in a given sprint deployment, let's say we're currently deploying sprint 123, understand the sprint 123 database schema and they understand the sprint 122 schema. So N and N minus one. The binaries will query SQL, find out which schema version they're talking to, and load the matching binder; again, that's the factory class loading what matches the database. Then, when you decide you need to upgrade your data because you've added some new feature, let's say a new feature in work item tracking, you go add, say, a set of nullable columns. You start populating them with data. You may even put a SQL trigger in place to keep it all in sync. But before any of the actual upgrade happens per se, you've got to create the data. Because, as you'll see, when we want to do an online upgrade, if we're not going to take you down, it's got to be invisible to you.
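To make the binder idea concrete before going on, here's a minimal C# sketch of what such a factory might look like. The interface name, the sprint numbers, and the dbo.SchemaVersion query are all made up for illustration; the talk only says there's a per-sprint interface and a factory that loads whatever binder matches the database.

```csharp
// Minimal sketch of a version-aware binder factory. Names like
// IWorkItemBinder and dbo.SchemaVersion are assumptions for illustration,
// not the actual VSTS identifiers.
using System;
using System.Data.SqlClient;

public interface IWorkItemBinder
{
    void SaveWorkItem(SqlConnection conn, int id, string title);
}

// One binder per sprint schema; the sprint-123 binaries ship both of these.
public sealed class WorkItemBinder122 : IWorkItemBinder
{
    public void SaveWorkItem(SqlConnection conn, int id, string title)
    { /* call the sprint-122 sprocs */ }
}

public sealed class WorkItemBinder123 : IWorkItemBinder
{
    public void SaveWorkItem(SqlConnection conn, int id, string title)
    { /* call the sprint-123 sprocs */ }
}

public static class BinderFactory
{
    // Ask the database which schema version it's at, then hand back the
    // matching binder. The version branching lives here in .NET, not in SQL.
    public static IWorkItemBinder Create(SqlConnection conn)
    {
        using var cmd = new SqlCommand(
            "SELECT MAX(Version) FROM dbo.SchemaVersion", conn);
        var version = (int)cmd.ExecuteScalar();
        return version switch
        {
            122 => new WorkItemBinder122(),
            123 => new WorkItemBinder123(),
            _ => throw new NotSupportedException(
                $"Schema version {version} is not supported by these binaries.")
        };
    }
}
```

This is also why rolling back the binaries is cheap: the old binaries still contain a binder for the database version they'd be rolling back onto.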
So if I'm going to do data transformations at any kind of scale, they've got to be done before I lock the database schema, before I block anything that you're doing. Because when I take that lock, it's got to be fast. So the first thing we do is go manipulate the data, whatever that means for the feature in question. For something truly large, this could take multiple sprints. When we changed work item tracking from a wide schema to a long schema, that was multiple sprints, and every sprint made some changes to the schema. Some changes, like that one, are very complex. Most, of course, are much simpler.

Once we've gone past the phase of creating this extra data in nullable columns, and it could also be in brand-new tables with different names that we'll later swap in, we go into what's called deployment mode for the application tiers and the job agents. In deployment mode, when they make a call to the database, they grab a reader lock on the schema. From the standpoint of using the SQL, the job agents and the ATs are effectively readers of the schema. It's a little bit of odd terminology, but that's what they are. Meanwhile, the upgrade itself is the schema writer, and it's trying to grab a writer lock. So there's this dance that goes on in the code: every time a call comes from an AT or a job agent into the SQL database, it grabs the reader lock on the schema, and meanwhile the upgrade is sitting there trying to find a moment in time to grab the writer lock. Try to grab it, nope, can't do it. Try again, can't do it. Oh, wait, there are no readers: grab the writer lock. Make the final set of changes, which is update the metadata, swap in the new procedures, swap in the new types, maybe even swap the name of that new table I built on the side so it takes the place of the original. Do that swap, a very fast, relatively small number of operations, and then release the lock.

And what you as a user should see is: you should never notice. If you happen to be using your account at the moment this happens, let's say you go save a work item, it may take five or ten seconds to save that particular time, and you go, huh, that was kind of slow. But then everything goes back to normal. The most you should see is that something slows down for a few seconds, and then it all goes back to normal. You don't lose any data. If you're in the midst of updating a work item, that all happens for you; none of that data gets lost. It's completely invisible to you.

And by the way, as part of this, and I won't really dive into it, we also have to have the web UI handle online upgrade. When we upgrade the JavaScript files and the style sheets and the icons and all of this stuff, let's say I do a major facelift to some particular area of the product: if I've changed that UI, and you hit save, and suddenly the call goes to the new code, and the new code wasn't expecting the data in the format the browser sent it, it's all going to fail. So even the web UI is versioned. We've got versions of the TypeScript, the icons, the style sheets, everything, and it all loads from a version folder. That way, until a full page refresh happens, you're still using, quote, the old UI. Then you do something along the way that triggers a full page refresh, you switch hubs or something, and you get the new web UI. But again, everything's set up so that you don't notice the upgrade happened. You just get new functionality.
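Going back to the reader/writer dance for a moment: the talk doesn't say which primitive backs the schema lock, but one plausible way to build it on SQL Server is sp_getapplock, which supports shared and exclusive application locks. Here's a rough C# sketch of the shape of that dance, with a made-up resource name:

```csharp
// Rough sketch of the schema reader/writer dance built on SQL Server's
// sp_getapplock. Using sp_getapplock at all, and the resource name, are
// assumptions for illustration, not necessarily what VSTS actually does.
using System;
using System.Data;
using System.Data.SqlClient;

public static class SchemaLock
{
    const string Resource = "SchemaLock"; // hypothetical lock name

    static int GetAppLock(SqlConnection conn, string mode, int timeoutMs)
    {
        using var cmd = new SqlCommand("sp_getapplock", conn)
            { CommandType = CommandType.StoredProcedure };
        cmd.Parameters.AddWithValue("@Resource", Resource);
        cmd.Parameters.AddWithValue("@LockMode", mode);        // 'Shared' or 'Exclusive'
        cmd.Parameters.AddWithValue("@LockOwner", "Session");  // not tied to a transaction
        cmd.Parameters.AddWithValue("@LockTimeout", timeoutMs);
        var ret = cmd.Parameters.Add("@ReturnValue", SqlDbType.Int);
        ret.Direction = ParameterDirection.ReturnValue;
        cmd.ExecuteNonQuery();
        return (int)ret.Value; // >= 0 means the lock was granted
    }

    static void Release(SqlConnection conn)
    {
        using var cmd = new SqlCommand("sp_releaseapplock", conn)
            { CommandType = CommandType.StoredProcedure };
        cmd.Parameters.AddWithValue("@Resource", Resource);
        cmd.Parameters.AddWithValue("@LockOwner", "Session");
        cmd.ExecuteNonQuery();
    }

    // Every AT/job-agent call wraps its database work in the shared
    // ("reader") lock; many readers can hold it at once.
    public static void AsReader(SqlConnection conn, Action databaseCall)
    {
        if (GetAppLock(conn, "Shared", timeoutMs: 10_000) < 0)
            throw new TimeoutException("Could not take the schema reader lock.");
        try { databaseCall(); }
        finally { Release(conn); }
    }

    // The upgrade keeps retrying until it catches a moment with no readers,
    // then does the short final swap (metadata, sprocs, types, table renames)
    // and lets go.
    public static void AsWriter(SqlConnection conn, Action finalSwap)
    {
        while (GetAppLock(conn, "Exclusive", timeoutMs: 500) < 0)
        {
            // Readers still active; try again.
        }
        try { finalSwap(); }
        finally { Release(conn); }
    }
}
```

The key property is the one described above: taking the reader lock is cheap and happens on every call, while the writer only ever holds the exclusive lock for the short final swap, which is why the worst a user sees is one slow save.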
Yes, question? How are you deploying the SQL: using scripts, using DACPAC files? Good question. Ed Glass is going to talk more about how we do the actual deployment, but a lot of it is driven through scripts. All of our SQL is checked into version control; we've got lots of .sql files. And for servicing, there's a set of things that get auto-generated to make some of the servicing steps happen and so forth. It's all SQL in text files, so it's essentially a script. Then we have something we call light rail, which is a set of PowerShell scripts that actually drives the upgrade. We built all this stuff a long time ago. Over time I expect to move to something newer, but right now it's working well for us, and there's no real need to go crack it open. But Ed Glass will talk more about how that actually gets deployed.

Question? So are all your deployments manual? Good question: are the deployments manual? Thankfully, the answer is no. We would go insane if we had to do 192 scale units manually. They're highly automated. What actually happens, and I think he'll show you some screenshots, is that we use Release Management: VSTS deploys VSTS. And that does mean we have to have a way to deploy VSTS if VSTS is down, right? But VSTS Release Management drives the overall deployment, and there's a set of scripts that run the actual steps. Somebody goes to the UI and says, hey, I'm ready to deploy sprint 123, kicks that off, and it progresses through the rings, as he'll show you. It automatically goes from ring to ring. It's not a manual thing where somebody says, OK, I've done ring zero, let me go queue a deployment for ring one, now let me queue a deployment for ring two. It doesn't work that way. It will pause, though. There are cases where we say, hey, we want it to pause here, we'll wait a day before the next step, and somebody has to go say yes, it's OK, because if something goes wrong, we want to be able to react to it and not have it propagate out to the rest of the accounts, of course. He'll go through it in detail, but it's all fully automated. And that wasn't always the case, by the way.

Question? For all the different assets that you version, whether it's stored procedures or files, JavaScript, whatever, do you have a complete copy for every version of the product internally, or are you doing some other kind of scheme, like some form of copy-on-write? Good question. It's actually a full copy of every version. So, for example, in Azure Storage there's a full copy of every version of the JavaScript files, CSS, et cetera, and the deployment itself carries a full copy of the SQL. Now, as part of the servicing, it generates deltas, so it knows what to change. When we do the upgrade, it's not changing one sproc into another when they're actually identical. We detect all that at build time and generate the deltas, so we know what actually needs to be upgraded. But we've got full copies of all this stuff so that the versions are fully independent.
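To illustrate that build-time delta detection, here's a tiny sketch of the idea: hash every checked-in .sql file for the previous sprint and the current one, and only the sprocs that actually changed become servicing steps. The directory layout and the hashing are assumptions for the example, not the real servicing tooling.

```csharp
// Tiny sketch of build-time delta generation: hash each checked-in .sql
// file in the previous and current sprint trees, and emit only the ones
// that are new or changed. Layout and hashing are illustrative assumptions.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class SprocDeltas
{
    static Dictionary<string, string> HashAll(string root) =>
        Directory.EnumerateFiles(root, "*.sql", SearchOption.AllDirectories)
            .ToDictionary(
                path => Path.GetRelativePath(root, path),
                path => Convert.ToHexString(
                    SHA256.HashData(File.ReadAllBytes(path))));

    // Relative paths of sprocs that need a servicing step; identical
    // sprocs are skipped entirely during the upgrade.
    public static IEnumerable<string> Changed(string previousSprintDir,
                                              string currentSprintDir)
    {
        var before = HashAll(previousSprintDir);
        return HashAll(currentSprintDir)
            .Where(kv => !before.TryGetValue(kv.Key, out var hash)
                         || hash != kv.Value)
            .Select(kv => kv.Key);
    }
}
```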
Question? What do you call this, a blue-green deployment? Would I call it a blue-green deployment? Actually, no, because the way we do deployments today, since we're using PaaS, is with something called a VIP swap on the Azure load balancer. We actually spin up, and I think Ed will cover this too, a new set of virtual machines in a staging slot, and then we do a VIP swap and swap all of them out at once. What's currently in production goes into the staging slot, and the staging slot becomes production. It's not atomic, but it's close enough to being atomic. So that would be kind of a blue-green deployment, right? You have a separate set of servers which are live, and then a separate set of servers that will go live later. Yes, I guess when I think about it, it is; there's no rolling deployment, for example. The entire set of binaries, the entire set of virtual machines, is always one sprint or another. There's never a mix in there. Thank you.

Question? So it seems like you maintain at least two versions each time you deploy, right? Yes. So when do you do that cleanup? Because at some point you have to; on the next deploy, you're going to have to clean up the previous version and keep the current version and the next. Do you actually do that cleanup, is there something in the background making it easier, or is it manual? Interesting set of questions. On the binaries and the virtual machines: once we do the swap and they go into the staging slot, it's not long, half an hour or so, before they disappear; we delete those. In the database, of course, there's only one copy of the data. There's never two copies of the data; that would be incredibly expensive. As for the schema, as the servicing runs and replaces the sprocs, there's only ever one copy of the sprocs active at a time, and there isn't a full second copy of the sprocs in the database, because again, the servicing has generated the deltas and knows exactly what to go change. The closest thing to two things at once is the binaries, since they're capable of talking to the old DB and the new DB. If you went and looked at the code, you'd see milestone 119, 120, 121, 122, 123, 124. As teams go add those, eventually they rip the old ones out.

The other interesting challenge, and I'm not going to talk about it at all here, is on-prem upgrade. In the cloud, we upgrade every three weeks. On-prem, you could be coming from TFS 2010. So there's a whole separate conversation around how on-prem upgrade works. But it leverages the same functionality, because we couldn't have two literally separate upgrade systems. We'd go nuts.
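To make the VIP-swap semantics above concrete, here's a toy illustration of the slot-pointer idea. This is not the Azure API, just the shape of why the cutover is near-atomic and why the old build briefly survives in the staging slot before it gets deleted.

```csharp
// Toy illustration of VIP-swap semantics (not the Azure API): traffic goes
// to whatever the production slot points at, the new build is stood up in
// staging, and the swap just exchanges the two pointers. No VM is rebuilt
// in place, which is why the cutover is near-atomic and why the old build
// remains in staging, available for a quick swap back, until it's deleted.
using System;
using System.Collections.Generic;

public sealed class ScaleUnit
{
    public List<string> ProductionVms { get; private set; } = new();
    public List<string> StagingVms { get; private set; } = new();

    public void DeployToStaging(IEnumerable<string> newVms) =>
        StagingVms = new List<string>(newVms);

    // The swap exchanges the slot pointers.
    public void VipSwap() =>
        (ProductionVms, StagingVms) = (StagingVms, ProductionVms);

    // A while after a successful swap, the old build is torn down.
    public void CleanUpStaging() => StagingVms.Clear();
}

public static class Demo
{
    public static void Main()
    {
        var unit = new ScaleUnit();
        unit.DeployToStaging(new[] { "vm-s123-a", "vm-s123-b" }); // new sprint build
        unit.VipSwap();        // new build takes traffic; old build now in staging
        unit.CleanUpStaging(); // roughly half an hour later, per the talk
        Console.WriteLine(string.Join(", ", unit.ProductionVms));
    }
}
```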