Tenancy Con, and I'm the very first speaker, so no pressure. I guess the bar is really, really low because there isn't a bar, so it's all going to be up from here. Thank you for joining me today. My name is Bob Walker. I'm a field CTO at Octopus Deploy, and today I want to talk to you about lessons learned with multi-tenancy. And I need to look at my slides before I start talking sometimes.
So just a little bit about me. I've worked for a number of different companies before I worked for Octopus Deploy. I'm a developer at heart. I started off as a developer way back in the early, early days of .NET. In fact, I remember learning about .NET in university, going, have you heard about this brand new thing? The very first job I had, we converted classic ASP pages to ASP.NET web applications. Now, the funny thing is that at every company I've worked for, we've done multi-tenancy in some form or fashion, but the way we approached solving it was very different. So I'm going to take you on the developer journey, so to speak. I saw we have a lot of platform engineers and a lot of SREs here, so I want to give you the perspective from the other side.
But really, the goal for today is to make sure that nobody makes the same mistakes I've made over the past 20 years. So don't be like me is essentially the goal of today's talk. Or really, don't be like past me, because past me, well, he's a bit of an idiot at times. And while I'm an idiot, I'm also confident, sometimes a little bit overconfident, and so I made a number of mistakes. The whole point of this talk is to walk through the different approaches to multi-tenancy, learn from some of the mistakes I've made, and cover some of the things that you, as platform folks and DevOps folks, need to keep an eye on when you start talking about multi-tenancy.
So first we're going to talk about the key approaches to multi-tenancy. Then we're going to work through the three most common ones that I've encountered. And finally, the evolution of multi-tenancy, specifically once we started talking about Kubernetes.
So let's talk a little bit about the approaches. In my career, these are the three most common ones that I've been involved with. The first is the single website, single database, which I haven't had a ton of experience with; I've only worked at one company where we did this. Then there's the single website, multiple database. This is where everyone signs into the same UI, and then they get a different database connection, or they're hosted on a different database. And finally we have the isolated infrastructure, which is probably what a lot of folks know best today in terms of different namespaces. Every customer gets their own sandbox.
I've also put together different ways to rate the approaches. The very first one we're going to talk about is tenant maintenance: how long it takes for a tenant to be added, updated, or removed, along with the underlying platform costs that go along for the ride. Then we're going to move on to code complexity. How does the approach impact your code decisions? Some multi-tenancy decisions are a pretty big choice, because they can be very, very expensive. Then we get to crosstalk. This is probably the thing that drives me the most batty, as well as scares me the most, when it comes to multi-tenancy: when one tenant can see or access another tenant's data, either intentionally or unintentionally. We'll talk a little bit about the noisy neighbor and some experience I've had with that. And we'll be talking about deployments, because I work for a deployment company.
So we'll be talking a little bit about deployment time and the impact that has. And then finally, we'll talk about contractual complexity. It sounds like I'm saying we're going to talk about this and this and this and this, but it does typically go pretty fast.
What I've done is put together a rating for each of these approaches. Oh, good, it did turn out pretty good on the slide. Awesome. From the looks of this, you might be saying, oh my gosh, isolated infrastructure, that's definitely the way to go, that's how I should do it going forward. But in some cases, the single website, single database approach might be a pretty good approach as well. Out of all of these, I actually have the most experience with single website, multiple database, and my goal today is basically to dissuade you from doing that. In terms of the different companies I've worked for, this is the approach that we took at each of them. And I think it's very interesting that companies still make the different decisions that they do today.
So first up, we'll talk a little bit about the single website, single database approach. And if you forgot the diagram, don't worry, you're going to see this diagram an awful lot. This is where everyone signs into the same website, and then we just show the data based on some sort of credentials. Now, in terms of tenant maintenance, if you're worried about how you're going to scale and how customers can self-serve in terms of multi-tenancy, this is the best approach you have, because all of the logic to add a new tenant lives in the same code base as the application. It's incredibly easy for a tenant to self-service, add themselves, and make some changes, or to have your support people make changes, because again, it's all in the same UI.
And it's as simple as adding, updating, or changing a record in a database. But that has a cost, and that cost is extremely high code complexity. One of the mistakes I made: at one of the companies I was working for, I was brand new to the application. I thought I knew what we needed to do. And it was like, we just need to get this feature out the door, and it's just for this one customer. That's all we have to do it for. And so I hard-coded the customer ID. Every query has a where tenant ID clause, and I hard-coded where tenant ID equals 045. I can remember that number probably till the end of my days. We pushed that out to QA, and the number of tickets I got coming in from QA was just astronomical. Like, are you out of your mind? What did you just do? And I was like, what's the problem? This is just for this one customer. I didn't understand all of the business rules. So not only is your code complexity incredibly high, but now you have to train all of your developers to make sure they don't make the same mistakes over and over again.
If you have an existing application, doing something like this requires a significant amount of effort and time. Within Octopus Deploy, one of the things we did is add a new feature called Spaces. For all intents and purposes, they walked and talked a lot like different tenants. So in every single one of our queries, instead of where tenant ID, it was where space ID equals. And as we were doing this, we went, oh my gosh, our permissions model needs to change, because how we've done it in the past isn't going to work. So we had to retrofit our permissions model, and then we had to add in more complexity on top of that. What should have taken a couple of months ended up taking nine, 10, 11 months to fully get out the door. The last thing is that authentication and authorization has to be perfect.
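To make the hard-coded ID mistake concrete, here is a minimal sketch of a tenant-scoped query. It is written in Python with the standard-library `sqlite3` module purely for brevity; the applications in the talk were .NET. The `invoices` table, its columns, and the `fetch_invoices` helper are hypothetical names chosen for illustration. The only point is that the tenant ID arrives as a parameter taken from the signed-in session rather than being written into the SQL.

```python
import sqlite3


def fetch_invoices(conn, tenant_id):
    """Return (id, amount) rows for one tenant.

    tenant_id is assumed to come from the signed-in session. It is bound
    as a query parameter, never hard-coded (WHERE tenant_id = 45) in the SQL.
    """
    cursor = conn.execute(
        "SELECT id, amount FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return cursor.fetchall()


# Hypothetical shared table: every tenant's rows live side by side.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, tenant_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
    [(45, 100.0), (45, 250.0), (46, 75.0)],
)
```

In a real code base the filter would more likely sit in one shared data-access layer than be repeated in every query, so that a new developer cannot forget it or hard-code it.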
And it requires a significant amount of effort to get perfect, because if it's not perfect, then you have the risk of crosstalk. I've rated this as a medium, and this was one of the things people pushed back on. They said, why is this a medium? Why isn't this incredibly high? Because someone doing something like I did, where I hard-coded an ID number, could happen, and it's really easy for that to happen. Or from a security point of view, someone could assign somebody the wrong tenant, either intentionally or unintentionally. Or there's an avenue of attack. But the key difference here is that something like that is typically found during the testing phase. You can put a ton of automation around it, and it's pretty easy to find. You can also find it during code review or any number of other stages. And if an issue is found where a tenant's data was incorrectly written to the wrong tenant, or whatever the case may be, it's typically easy to fix because you're updating the same database as everybody else.
But while it's only a medium for crosstalk, the noisy neighbor risk is incredibly high. At the company I worked for, we had reporting capabilities. This was a company that provided software for the oil industry, and you can imagine some oil companies are bigger than others. One of the bigger oil companies decided, I want five years' worth of data. That consumed all of our database resources and, on one of the web servers, all of its resources. So we started getting all these tickets from people saying, we're getting all these timeout errors, until we looked into it and said, this one customer asked for this one thing. And we basically bricked our system, because everything is shared. Everything.
But one benefit: very easy deployments, because you have to update one code base and one database.
If I need to make a change and write some sort of migration scripts, there's one database for me to target. In fact, I can come in, take a copy of that database, scrub it for PII, and we're done. I get a testing database. I can get multiple copies of that and write my migration scripts. Very easy to do. Now, the migration scripts might take a while to run, because with this approach you're measuring databases in the hundreds of gigs, not the tens of gigs. But it's typically pretty easy to get out the door. If everything is automated and you have it perfect, you can probably knock out a deployment in 30 minutes. Not a big deal.
Contractual complexity was one of the funnier ones, because I didn't realize sign-off was a thing. I was all about automation. We want to get things out the door. We want to get things up to staging. We want to get things out to production. We want to deploy faster and faster and faster. We couldn't do that, because we had to have customers sign off on all of these changes. So it didn't matter how fast we pushed things out the door if our slowest customer took two weeks to finally approve every last little thing. We started getting upset with that customer, going, all right, we're going to start adding feature flags to disable that functionality until you sign off on it. Well, now we've just added more complexity to our code base.
The other problem is that your data lives side by side with other customers' data. That's really hard to get approval for when you start talking to some of these companies and they bring up data sovereignty laws. They're saying, wait a second, my data is going to live with your other customers? I'm PepsiCo, and Coca-Cola is going to live in the same database as me? I don't think so.
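The sign-off workaround described above, hiding a feature from a tenant until that tenant approves it, can be sketched as a per-tenant feature flag. This is a minimal illustration in Python, not how any of the companies in the talk implemented it; `FeatureFlags`, `record_sign_off`, `is_enabled`, and the feature name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    # Maps a feature name to the set of tenant IDs that have signed off on it.
    signed_off: dict = field(default_factory=dict)

    def record_sign_off(self, feature, tenant_id):
        """Mark a feature as approved by one tenant."""
        self.signed_off.setdefault(feature, set()).add(tenant_id)

    def is_enabled(self, feature, tenant_id):
        """A feature stays hidden from a tenant until that tenant signs off."""
        return tenant_id in self.signed_off.get(feature, set())


flags = FeatureFlags()
flags.record_sign_off("five-year-report", tenant_id=45)
```

Every call site that checks `is_enabled` is another branch to test, which is the extra code complexity the talk warns about.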
And so this is typically why people shift their logic and go, okay, we're going to have that one website, but then we're going to have multiple databases. Every customer, or every tenant, gets their own database. It makes a degree of sense when you go down this approach. In fact, I thought this was the right approach for quite some time because, oh yes, we have one code base to manage and every customer gets their own database. Win-win.
Your tenant maintenance goes from a low to a high, because tenants can no longer easily self-service. You need to add automation, because every time you add a new tenant you have to create a new database, and that's not an easy thing to do. On top of that, you have to start worrying about how you're going to manage all of these hundreds, if not thousands, of connection strings. Unless you have automation, it's just not possible.
Your code complexity, well, the good news is it drops from a high down to a medium, because there are no more customer-specific WHERE clauses. This was, to me, the biggest appeal. This is why I was completely on board. For every multi-tenant application that took this approach, I was like, yes, no more WHERE clauses, I'm going to solve it with this. I wasn't thinking about the additional layer you have to add, where now you have to manage your connections. When someone signs in, what connection string do they get? Okay, let's make sure they're assigned to the correct one. You also have to have feature flags. That's not optional, because everyone's using the same user interface.
And your authentication and authorization has to be perfect all day, every day, because between the single website, single database approach and this one, the risk of crosstalk here is much, much, much higher. And the problem with crosstalk in this approach is that it's much more insidious to find.
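The connection-management layer mentioned above, the piece that decides which connection string a signed-in tenant gets, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: `ConnectionRegistry`, `UnknownTenantError`, the tenant names, and the connection strings are all hypothetical. The one design choice it shows is failing loudly for an unknown tenant instead of falling back to a default database.

```python
class UnknownTenantError(LookupError):
    """Raised when a tenant has no registered connection string."""


class ConnectionRegistry:
    def __init__(self, connection_strings):
        # Copy so later changes to the caller's dict cannot alter lookups.
        self._by_tenant = dict(connection_strings)

    def resolve(self, tenant_id):
        """Return the connection string for exactly this tenant."""
        try:
            return self._by_tenant[tenant_id]
        except KeyError:
            # Never fall back to a default or shared database here:
            # a silent fallback is one way crosstalk creeps in.
            raise UnknownTenantError(
                f"no database registered for tenant {tenant_id!r}"
            ) from None


# Hypothetical tenants and connection strings.
registry = ConnectionRegistry({
    "acme": "Server=db1;Database=acme",
    "globex": "Server=db2;Database=globex",
})
```

In a real system the mapping would come from a secrets store or a tenant catalog rather than an in-memory dict, and creating that entry is part of the automation needed whenever a tenant is added.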
So, one of the applications I was working on. I'm trying to avoid company names so I don't get people into trouble, so sorry for the vagueness at times. At one of the companies I was working for, we discovered a core bug in the .NET session management where, if the right condition happened, the wrong connection string got to the wrong session somehow. We didn't even know this was happening until someone called us up and said, why am I seeing this financial data stored in my particular record? There's no way we could test for that. I mean, how could you test for a core bug that only happens if, like in 2001: A Space Odyssey, everything lines up just perfectly? Once we figured out what the issue was, Microsoft was like, oh yeah, we have a patch for that, go ahead and install the patch. And you're like, well, that would have been nice to know.
But then we had to fix the data. We had to get the data from one customer's database into another customer's database, and that was not an easy thing to do. That took another week for us to solve, because you have one chance to do it right, outside of backups and everything like that, and you want to keep the downtime to a minimum. And it's not like you're going to add a ton of automation to that, because it's not a problem that's going to happen every single day.
But one piece of good news is the noisy neighbor problem. That goes from a high down to a medium, because although the CPU and RAM resources are shared, every tenant can get their own isolated database resources. If you're doing something like Azure SQL, you can even assign it down to the DTU level, where this customer gets 100 DTUs and another customer gets 50 DTUs, if you're going down that approach. But again, back to that one user who wants five years' worth of data: all the CPU and RAM resources are still going to be consumed as it's processing through all of that data.
Deployment time was another one of those fun things that I didn't think about until we had to do this for 100-plus tenants, where we had to update every single tenant's database because everyone was using the same code. And you had to do it all at once, at the same time. So if you enjoy getting up at 2 a.m. on a Saturday or 5 a.m. on a Sunday, then you'll kind of get into this, because I had to do that to update the schema for every one of these customers before I could update the actual code.
And then for contractual complexity, the good news is the data is isolated from other customers, so that makes things much easier. But at the same time, if they have sign-off, which happens quite frequently, especially for business to business, you don't have a good way of handling that. So again, you're back to the whole feature flag thing.
So let's talk about isolated infrastructure. I suspect that's what most folks in this room are pretty familiar with. This is where every customer gets their own namespace, or every customer gets their own sandbox in some form or fashion. They get their own database. Now, in terms of tenant maintenance, creating new infrastructure for a tenant, especially if I need to stand up a Kubernetes cluster or some new VMs, whatever the case may be, is going to take a significant amount of time unless you have automation. And for us, for Octopus Deploy, we actually had to create a completely separate application. We have a cloud platform application, because we needed the capability for someone to sign up for a brand new cloud instance and be up and running within, I think our goal was, 60 seconds. So we have an entire platform devoted just to doing that, to managing all of this isolated infrastructure, because once you start doing this at scale, and when I say scale, I mean thousands of customers.
You need to have something to manage all of that for you. And this is not something that you can just whip up in a week. This is an ongoing application, with an entire team devoted just to managing it. We actually went through two versions of this. We're on V2 now. The first version had some quirks, but it gets better. And like I said, we have an entire team devoted to it.
The good news with this approach is that although it's rather expensive, there's almost no code complexity, because we don't have any customer-specific WHERE clauses. We don't have to have any sort of fancy interface or layer that's managing the credentials for us. That said, at one company I worked for, every single customer got their own fork of the code base. Please don't do that. That is incredibly hard to manage. So this is low code complexity as long as you don't do that, because once we did, every time we added a new feature we had to merge it across every single customer. And no matter what, you're still going to need feature flags for tenant differences. I don't think there's any way to get around that, especially if you're going to have a single code base. Unless, again, you want to fork the code base; then you don't have to worry about feature flags. You trade one hell for another.
The other good news is that there's no crosstalk. The chance of a customer talking to another customer is really, really low, because everyone has their own credentials and they're in their own sandbox. Now, there are some arguments to be made about namespaces, and I can completely understand that. But generally, the chance of one customer accessing another customer's data is exceptionally low. On top of that, there's no single place where a customer can access all of the credentials. They're typically going to be in that cloud platform system that we have, and there's no customer access to it.
And if one tenant is compromised, it should only impact that one tenant. You also have a really low noisy neighbor risk because, again, with something like Kubernetes, or if you're doing EC2 instances, everyone gets their own dedicated resources. So if one tenant is consuming all of their resources, well, that has no impact on the other tenants. We actually see that quite frequently in Octopus Cloud, where one tenant is doing all kinds of crazy stuff and everyone else is perfectly happy, but that one customer is going, I'm getting these weird timeout issues.
Strangely enough, the deployment time is not high. It's more of a medium, and the primary reason is that you can update individual tenants, and those updates are relatively quick. When I say relatively quick, I mean about five to 10 minutes. But it can take days or weeks to update all of your tenants unless, again, you have it automated. You also have to consider things like maintenance windows and time zones: when are they using the application? But you can do more of a rolling deployment, and that makes things much easier. So although it takes a day or two to get everyone updated, you can do this on a rolling schedule. You don't have to come in on a Saturday at 5 a.m.
I meant to put down low contractual complexity, not high. That's what happens when you cut and paste too fast. So you do have a low contractual complexity risk on this, sorry, because you can deploy to each tenant as they sign off. If they're not ready to go, you just don't deploy to that one tenant, and everyone else gets the features and functionality. No need to add any extra feature flags. No need to worry about your data, because that's completely isolated.
So I'm going to spend the last couple of minutes talking a little bit about the way multi-tenancy has evolved, especially over the last five to 10 years.
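Going back to the rolling, per-tenant deployment described a moment ago, the selection logic can be sketched in a few lines. This is a hedged illustration in Python, not a description of how Octopus Cloud or any deployment tool works; the `Tenant` fields, the `plan_rollout` function, and the version strings are hypothetical. It picks only the tenants that have signed off and are not yet on the target version.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    signed_off: bool  # has this tenant approved the new release?
    version: str      # version currently deployed to this tenant


def plan_rollout(tenants, target_version):
    """Return the names of tenants to deploy to now.

    Tenants that have not signed off are skipped rather than hidden
    behind a feature flag; they are picked up on a later run.
    """
    return [
        t.name
        for t in tenants
        if t.signed_off and t.version != target_version
    ]


tenants = [
    Tenant("acme", signed_off=True, version="1.4"),
    Tenant("globex", signed_off=False, version="1.4"),
    Tenant("initech", signed_off=True, version="1.5"),
]
```

A fuller version would also filter on each tenant's maintenance window and time zone, which the talk calls out as things to consider.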
So when you look at these different approaches, if you had asked me five to 10 years ago which one I would pick, I would 100% have picked either single website, multiple database or single website, single database. And that's primarily due to... Wow, that did not come through. That's primarily due to an old-school way of thinking. Because again, I was originally a developer back in the early days of .NET. Actually, this predated virtualization, so everyone would have their own servers and everything like that. So I was like, how can I prevent all of that from happening?
Today, with the tools and functionality available to us with cloud providers, with Kubernetes, with dockerization, with infrastructure as code, I would 100% do isolated infrastructure. I think that's the best approach to go with. But if you do this, especially if you're running things at scale, you really need to invest in that underlying platform, that Octopus Cloud platform. It's going to take money. It's going to take time to build. But if you want to do it at scale, that's the only way to really do it. And when I say Octopus Cloud platform, I mean a separate tool that is not Octopus Deploy. It's just something that we have internally. I just want to make that very clear. I'm not trying to sell you anything. This is an internal tool that we have, that we control, that we manage. And it's really important that if you're going to go with this approach, you have something similar.
But not all use cases are the same. Ultimately, the question is, where should multi-tenancy complexity live for you? Should it live in the code base, or should it live in your infrastructure? Which is the best approach? There are a lot of different things to consider. But I'll wrap it up by saying, please don't pick the single website, multiple database approach.
It's really the worst of both worlds. You still have to build the tenant infrastructure management system. You still have all the code complexity. It makes things just so much more complex. So please don't go with that approach. Don't make the same mistake I did. So thank you very much. If you're interested, this is the link to my slides.