The next talk we have is "A Thousand Djangos Within, or Django Multi-Tenancy", by Lorenzo Peña. Thank you. Okay, so hello, everybody. I'm quite honored to be part of this online version of PyCon India 2020. I hope you're having a great first day of the conference, and I hope you're going to enjoy this talk, as soon as my connection behaves properly. So, "A Thousand Djangos Within, or Django Multi-Tenancy" is exactly the title of the talk, and let's get to it. Well, before any syntax error triggers in your head, let me just say that "Djangos" is my humble attempt at pluralizing the word Django, you know, the web framework. Apparently it's possible to use both the -s and the -es version, but when you use the -es version you seem to trigger memories of sweet tropical fruits in the heads of your audience, and it also leaves room for some really, really bad puns. So, as far as Djangos go, I hope we can move on with this type of pluralization. Anyway, hello officially. Hi, I'm Lorenzo Peña. I'm a proud citizen of Cuba. I've been a Django developer for 11 years now, and lately I've acquired a taste for chasing multi-tenancy challenges. Now, the main question driving this talk is how to fit a thousand Djangos in a single Django and, most importantly, why does it matter at all? So please gather around, because today I have for you a quick tale of fitting Djangos that is, honestly, a crazy combination of housing, web development, and Dungeons and Dragons.
So let's get to it. Once upon a time, Customer Red had a problem, and our brave hero developed a solution in Django. But then Customers Blue, Green, and Yellow showed up with exactly the same problem. So what should our hero do, given that a Django solution is already implemented? There are some options. The first option, the one that comes up first when you roll the dice, is to clone that Django solution into multiple different Djangos, which is what I like to call multiple single-tenancy. While you can do that for three or four customers, it becomes a very difficult task if you need to do it for a thousand customers. So if you go down this road, well, you're going to gain some tenacity, strength, and patience, but let me tell you that there is a better way. Please take a look at that shiny white belt on the left: the better way is to fit many Djangos within a single Django. With this, my dear audience, you'll see that you're growing vertically, the sky is going to be your limit, and you're going to gain a huge deal of intelligence and charisma, a mysterious gauntlet, and a brand new badge, which, as you may have guessed, is multi-tenancy. Because yes, the art of fitting many things into something else is what is typically called multi-tenancy in housing, in software development, and, yes, in Dungeons and Dragons too. So let's get a bit formal now. Multi-tenancy is the software architecture in which a single instance of the software is capable of serving multiple tenants. Some examples of businesses that implement multi-tenancy are Dropbox, Shopify, Slack, and WordPress. Tenants, then, are the isolated spaces in which users with specific privileges interact: for example, accounts at Dropbox, workspaces at Slack, blogs at WordPress, servers at Discord, stores at Shopify, or sites at Stack Exchange. Now, why is all this relevant? Why does it matter?
Let me be honest with you: I have this firm belief that if you are a web developer, at some point in your career you're going to cross paths with multi-tenancy. And let's say that multi-tenancy is a medium-to-complex task. It's not a piece of cake, and it can be daunting at times. So I think it's fair to use this analogy: at some point in your career, if it hasn't happened already, it's going to happen soon, because multi-tenancy is like Thanos, kind of inevitable, you know, like in the movie. But the good news is that we have nothing to fear, because we are Django developers, and so far there is no major challenge that Django hasn't been able to overcome. So please, wherever you are right now in the world, clap your hands. Let's clap our hands for the framework that turned 15 years old and, as of today in 2020, is more than capable of lifting all the heavy weights of modern web development. So, awesome. Yes, this is awesome. But how do we actually implement multi-tenancy in one of our existing Django projects? Or, how do we actually fit a thousand Djangos into a single Django?
Well, there are a number of packages available to you, and you can just use one of those. But truth be told, one does not simply add multi-tenancy to a project. Even if you become an expert in one of those packages and learn its nuts and bolts, there are some underlying principles and some general-purpose knowledge that outlast any existing package, and that you and I would do well to know. By knowing them, we won't approach any existing solution as a black box: we'll be able to understand what's going on, and even extend it or make our own decisions. So, on this path to a thousand Djangos within, what I'm proposing is that we take a look at the underlying principles of multi-tenancy in Django in general. This is the agenda I have for you today: first the active tenant, then the three architectural choices you'll have to make for multi-tenancy, and finally the scope of everything else in the framework. Are you ready?
I hope you are. Let's begin with the notion of the active tenant. Remember we mentioned Stack Exchange previously? Suppose you are at Stack Exchange and you run the query Question.objects.all(). You are expecting to get some questions, but from where? Stack Exchange is a big site with multiple subsites, like Stack Overflow, Server Fault, Super User, Area 51, and others. So welcome to this new concept of the active tenant. This is kind of the cornerstone of multi-tenancy, and the idea is that you're now going to be operating in the context of an active tenant in your database access, your URL reversing, the admin site, the cache, every single part of the framework. Notice that this happens even outside the request-response cycle, because we have things like management commands and Celery tasks, where there is no request object to interrogate. So we have to be ready to operate within the context of an active tenant everywhere, even when there is no request object. Django has a couple of APIs you're probably familiar with where there is already this notion of getting and setting an active something: that's what you do when you get and set the time zone and the language. Therefore, we can totally extend this behavior to also get and set an active tenant, and this is what most packages do. This is a possible implementation of this pair of functions. Notice that we are using asgiref's Local, which is a drop-in replacement for threading.local that works well with asyncio; you know that Django has lately been getting more and more async compatibility. But if you take a look at the code, you'll notice that what we are actually creating here is a global variable, and globals are generally discouraged in Django. It's a pattern the Django codebase itself hasn't embraced much; Django doesn't use globals very often. So this is a legitimate use case, but in general, creating new globals is not recommended.
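Since the slide code isn't reproduced in this transcript, here is a minimal sketch of what such a get/set pair might look like, assuming asgiref is installed (Django 3.0 and later depend on it) and falling back to threading.local otherwise. The names get_current_tenant, set_current_tenant, deactivate_tenant, and tenant_context are hypothetical, chosen to echo Django's time zone and translation helpers; they are not any specific package's API. A tenant here can be any object, which leaves the "what type is a tenant?" question open.

```python
from contextlib import contextmanager

try:
    # asgiref's Local: a drop-in replacement for threading.local that also
    # keeps values separate between asyncio tasks.
    from asgiref.local import Local
except ImportError:
    # Fallback so this sketch runs without asgiref (thread isolation only).
    from threading import local as Local

# Module-level global: one storage object for the whole process, but each
# thread (and, with asgiref, each async context) sees its own attributes.
_active = Local()


def get_current_tenant():
    """Return the active tenant, or None when no tenant is active."""
    return getattr(_active, "tenant", None)


def set_current_tenant(tenant):
    """Make `tenant` the active tenant for the current thread/context."""
    _active.tenant = tenant


def deactivate_tenant():
    """Clear the active tenant, if one is set."""
    if hasattr(_active, "tenant"):
        del _active.tenant


@contextmanager
def tenant_context(tenant):
    """Activate `tenant` inside a with-block and restore the previous state
    afterwards; useful in commands and tasks that have no request."""
    previous = get_current_tenant()
    set_current_tenant(tenant)
    try:
        yield tenant
    finally:
        if previous is None:
            deactivate_tenant()
        else:
            set_current_tenant(previous)
```

Outside the request cycle, a management command or task would wrap its work in `with tenant_context(tenant):` so that everything inside sees the same active tenant, and everything after it sees whatever was active before.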
The reason globals are discouraged is that the more you depend on them, the more coupled your code becomes and the harder it is to test in isolation, so they're not recommended in general. Now, about the active tenant, there are two interesting questions to ask ourselves. Number one: what is the type of a tenant object? Is it always going to be an instance of a particular model? Not necessarily; I'll just leave this open as food for thought. Second question: what happens if, for some operation, there is no active tenant? Is that a bug? Is it a valid use case because you're running an operation that is tenant-agnostic? Or is the lack of an active tenant an indication that you want to run the operation in a wildcard mode, and therefore run it across multiple tenants? These are all interesting questions that we won't answer specifically here, but we can discuss them later. So, once you have the concept of the active tenant in your head, you now face three architectural choices that you have to make in order to implement multi-tenancy in your project: users and tenants, database architecture, and tenant routing. Let's take a look at each one of those. Users and tenants: we previously mentioned that tenants are the spaces in which users with specific privileges interact, but this relationship between users and tenants can take three different shapes. The first is one in which users exist outside the context of tenants: you have users, you have tenants, and there can be a relationship between them, but that relationship is loose, not strict. Examples are WordPress, Shopify, and Discord, where each one can exist separately. In the second shape, users exist within the context of tenants; that is, you cannot imagine a user taken outside of a tenant, and the perfect example is Slack, where you cannot even log in unless you specify a workspace first. Thirdly, we have the shape in which users and tenants are the same thing, and as an
example, we have Gmail and Dropbox. It's up to you to pick which of these cases is the most suitable for you, because this has implications for how you model your users, and the baseline question to ask is: how many tenants do I expect a single user to be associated with? That's the first choice. The second choice is the database architecture itself, that is, the layout of the database, and if you look around the internet you'll find that there are typically three approaches: the isolated approach, in which there are multiple databases, one per tenant; the shared approach, with one database and a tenant column on entry-level tables; and finally the semi-isolated approach, which takes advantage of Postgres schemas to store tenant data. Let's take a look at each one of those. The isolated database approach: well, you know that Django supports multi-database configurations, so in practice there's nothing stopping you from having settings like these, where each tenant corresponds to one database. In this case, you will need a translation function from a tenant into a database alias, and then you can use this alias for all your query operations, like saving your objects or running your queries, or even creating a scoped manager so that all your subsequent queries already operate in the context of that database. If you want to automate things a little bit, all this work can be offloaded to database routers: since we have this translation function that converts a tenant into a database alias, the router can return its value as the database for reads and for writes. The good thing about the isolated database approach is that it is optimized for isolation, because your tenants are already separated into different databases.
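The settings, translation function, and router described above might look roughly like this. It is a sketch under assumptions: the tenant names (red, blue), the tenant_<name> alias scheme, the SQLite file names, the myproject.routers path, and the tenant_db_alias helper are all hypothetical, and a bare threading.local stands in for the active-tenant API so the block runs on its own. The router needs no Django import because Django database routers are plain classes whose db_for_read and db_for_write methods Django calls by name once the class is listed in DATABASE_ROUTERS.

```python
import threading

# --- settings.py fragment: the tenant catalog lives in DATABASES ---------
DATABASES = {
    "default": {"ENGINE": "django.db.backends.sqlite3", "NAME": "default.sqlite3"},
    "tenant_red": {"ENGINE": "django.db.backends.sqlite3", "NAME": "red.sqlite3"},
    "tenant_blue": {"ENGINE": "django.db.backends.sqlite3", "NAME": "blue.sqlite3"},
}
# Hypothetical dotted path to the router class defined below.
DATABASE_ROUTERS = ["myproject.routers.TenantDatabaseRouter"]

# --- minimal stand-in for the active-tenant API ----------------------------
_active = threading.local()


def set_current_tenant(tenant):
    _active.tenant = tenant


def get_current_tenant():
    return getattr(_active, "tenant", None)


# --- translation function and router ---------------------------------------
def tenant_db_alias(tenant):
    """Translate a tenant name into its database alias."""
    alias = f"tenant_{tenant}"
    if alias not in DATABASES:
        # Adding a tenant means editing settings: the approach's main drawback.
        raise LookupError(f"no database configured for tenant {tenant!r}")
    return alias


class TenantDatabaseRouter:
    """Send reads and writes to the active tenant's database.

    Returning None tells Django this router has no opinion, so it falls
    back to the next router or to the "default" alias.
    """

    def _active_alias(self):
        tenant = get_current_tenant()
        return None if tenant is None else tenant_db_alias(tenant)

    def db_for_read(self, model, **hints):
        return self._active_alias()

    def db_for_write(self, model, **hints):
        return self._active_alias()
```

Without the router, the same alias would be passed by hand, for example Question.objects.using(tenant_db_alias(tenant)) or obj.save(using=tenant_db_alias(tenant)); QuerySet.using() and the using argument of Model.save() are standard Django APIs. Returning None when no tenant is active is only one possible answer to the earlier question about operations without a tenant.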
The bad part of the isolated approach is that you cannot have relations across databases, because Django imposes this constraint, and that adding tenants requires reconfiguring your project, because your tenant catalog now lives in your settings. The not-so-fun part of the isolated database approach is that your operational costs skyrocket the more tenants you have. So unless you're planning to have a number of tenants in the low tens, or you can afford to have one of these gentlemen as your billing manager, this approach is generally not recommended. The second approach is the shared database approach, in which all records are mixed together and identified by a tenant column. In this case, entry-level tenant-specific models require a pointer to the tenant they belong to. Notice that I'm saying tenant-specific models, because you may have other models that don't make sense inside a tenant; those can be tenant-agnostic. Also notice that I'm saying entry-level, because you don't have to annotate every single tenant-specific model, as long as you can reach the tenant it belongs to through a reasonable number of joins.
You can afford to skip the tenant annotation on those deeper models. Now, with this approach you'll have to pass your tenant explicitly in all your queries, both for creating objects and for filtering, no matter whether the tenant is right there in the model you're operating on or you have to make a number of hops through joins to get to it. You can also automate this process a bit, and this is something that some packages already do. To automatically assign the tenant when saving a model, you can use a default value for the field, which can be a callable, in this case get_current_tenant; or you can write a custom field, in this case a ForeignKey subclass with a pre_save hook; or you can resort to a pre_save signal on the relevant models. As for querying, well, some of the, let's say, magic that packages out there perform is to use custom managers and custom querysets to inject the active tenant into the query, so that you don't have to do the filtering yourself. But in some particular cases, like subqueries or certain annotations, you may still need to pass the tenant explicitly. The good thing about the shared database approach is that it is optimized for scalability, because adding tenants is just a matter of adding rows to existing tables.
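To make the two automation steps concrete without requiring a Django project, here is the same idea sketched with Python's built-in sqlite3 module in place of the Django ORM. That swap is deliberate, and the table, column, and function names are hypothetical: create_question plays the role of a callable field default or pre_save hook that fills in the tenant, and all_questions plays the role of a custom manager that injects the active-tenant filter.

```python
import sqlite3
import threading

# Minimal stand-in for the active-tenant API.
_active = threading.local()


def set_current_tenant(tenant_id):
    _active.tenant_id = tenant_id


def get_current_tenant():
    return getattr(_active, "tenant_id", None)


def _require_tenant():
    # Design choice for this sketch: no active tenant is treated as a bug.
    tenant_id = get_current_tenant()
    if tenant_id is None:
        raise RuntimeError("no active tenant")
    return tenant_id


# One shared table; every row is tagged with the tenant it belongs to.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE question ("
    " id INTEGER PRIMARY KEY,"
    " tenant_id TEXT NOT NULL,"
    " title TEXT NOT NULL)"
)
# Every tenant-scoped query filters on tenant_id, so index it.
conn.execute("CREATE INDEX question_tenant ON question (tenant_id)")


def create_question(title):
    """Insert a row, assigning the active tenant automatically."""
    cur = conn.execute(
        "INSERT INTO question (tenant_id, title) VALUES (?, ?)",
        (_require_tenant(), title),
    )
    return cur.lastrowid


def all_questions():
    """Return only the active tenant's rows, filtered automatically."""
    rows = conn.execute(
        "SELECT title FROM question WHERE tenant_id = ? ORDER BY id",
        (_require_tenant(),),
    )
    return [title for (title,) in rows]
```

After set_current_tenant("stackoverflow"); create_question("How do I reverse a list?") and then set_current_tenant("superuser"); create_question("Why is my fan so loud?"), calling all_questions() returns only ["Why is my fan so loud?"]. Raising when no tenant is active is just one answer to the earlier open question; a tenant-agnostic or wildcard mode would need a different branch.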
The bad part of the shared approach is that data isolation takes extra development effort, because it's your responsibility to return only the results that belong to the tenant requesting the data. Consequently, the not-so-fun part about the shared database approach is that it's extremely easy to forget to filter by the tenant. So if you don't want to wake up in the middle of the night wondering whether or not you filtered by the active tenant, my recommendation is that you bookmark all your tenant-scoped queries, create automated tests for each one of them, and make sure they actually return only results of the active tenant. You can even take it a step further and test that you have tested each one of them. Yes, it sounds like a riddle, but it's life-saving most of the time. And please remember: tests are the softest pillow you can have in software development in general, but especially if you go with the shared database approach. The third and final approach is the semi-isolated approach, in which Postgres schemas are used to isolate tenants within a single database. What are schemas in the context of Postgres?
Well, they are a layer of abstraction between the database and its tables. They are equivalent to namespaces, and in the face of ambiguity, if there are multiple tables with the same name in different schemas, you can resolve it by configuring a precedence through a Postgres concept called the search path. With this approach, queries remain practically unchanged, and this is one of its big selling points. But I have to warn you that this comes at the cost of an increased technical challenge somewhere else, and this is what packages implementing the semi-isolated database approach handle under the hood. If you're going to do it yourself, these are the challenges you'll face. You'll have to implement a custom database backend in order to set the search path in Postgres before performing all other queries. You'll also need a custom migrate command to operate with schemas, because Django knows how to operate with databases, but schemas, which sit between databases and tables, are something Django doesn't know how to handle, so you have to extend the migrate command so that it works at the schema level. And finally, you also need a custom database router with the allow_migrate hook, so that, in tandem with the migrate command, Django can determine whether or not a specific model can be migrated in a specific schema. Now, the good thing about this approach is that you are optimized for isolation, because you're using schemas to separate your tenant data, but you also get increased scalability, because you're still using a single database, and adding schemas is easier than adding databases. The bad part is that it takes extra effort to understand and control how schemas interact; it's like a new dance you have to learn. And the not-so-fun part about the semi-isolated database approach is, well...
Let's just say that one of the not-so-fun things is that migrations are going to take a lot of time, because now you have to iterate through all your schemas to run migrations, and this becomes a very time-consuming process the more tenants you have. That said, we can now ask the interesting question: which one is the best? As you may have guessed, none of them. Why? Because there are pros and cons in all of them, as you saw, and because it depends on your specific use case. There are going to be use cases in which each one of these database approaches shines, so none of them is ruled out of the box. Of course, if you want to engage in a respectful ice cream fight about all of this, I'll be more than happy to take any of your ice cream balls, I mean, any of your questions, after this talk. And now the final architectural choice: we've seen users and databases, so we come to tenant routing, that is, how to activate a tenant from an incoming request. Let me remind you that around 90% of your Django operations happen within the request-response cycle, so it's very useful to be able to activate a tenant from an incoming request and then have it activated for the rest of the cycle. In general this is perfectly possible as long as your tenant can be inferred from the user, the session, or the headers, or, even better, if your tenant can be inferred from the URL itself, through the domain, subdomain, subfolder, or query parameters. Now, where is the place to do it?
Well, the perfect place to do it is a middleware, because in a middleware you can use any translation function from any part of the request into a tenant, and then activate that tenant if there is no tenant already activated. If you take a look at this code, you'll see there's no major complexity: you only need to create the function that translates from a session into a tenant, and it's up to you to determine what that means. Now, if you have multiple retrieval methods, that is, multiple translation functions from different parts of the request, you can also have multiple middlewares that you chain, and therefore activate tenants through multiple different methods. So you have choices here. Now, what if you want to do the opposite? What if you are already within the scope of a tenant and you want to generate a canonical URL that points not just to a particular path of your project, but also to a very specific tenant, just by means of the URL? In some cases this is simply not possible, because users, sessions, and headers cannot be encoded in a URL easily or securely, and I don't recommend you try. But if you are deriving the tenant from the URL itself, it's perfectly possible, and it works really well. Whether you're doing it via domains, subdomains, subfolders, or query parameters, you'll only have to tweak the URL reversing process, and packages will help you with this: Django only reverses the path part of the URL, so you have to either prepend, interpolate, or append the tenant you are operating with. As a bonus, let me tell you that you can also use different URLconfs based on the active tenant, which could be interesting in some scenarios. Django already provides a hook for this: if you assign a URLconf module to the urlconf attribute of the request, then this module is the
one that is going to be used for the rest of the request instead of the default one. So, as long as you provide yet another translation function that converts a tenant into a URLconf module, you can have different URLs for a particular tenant. So these are the three architectural choices you'll have to make: users and tenants, database architecture, and tenant routing, and you have to make a choice for each one of them. Some combinations work really nicely together, while others won't make sense at all, so it's up to you to analyze what is most appropriate for your project and then decide. Now, there's more to Django than just that, so let's take a quick look at the scope of everything else, in at least five parts of the framework. Let's begin with management commands. It's extremely useful to run management commands within the scope of a tenant, and for new commands you can include a tenant argument, so that the command activates the tenant and then performs its regular operation. But for existing, non-tenant-aware commands, which is everything that ships with Django or with any third-party application, this isn't possible. So you'll have to define a special command wrapper, and this is something that comes with the packages. This command wrapper takes a tenant argument and also an inner command with its own arguments, and the wrapper is only in charge of activating the tenant and then calling the inner command. You'll find that some packages even do the nice thing of blending the arguments of the wrapper and the wrapped command. So it's probably something you'll take advantage of without having to implement it yourself, but it's good to know that this is happening behind the scenes. As for file storage.
Well, it's also useful sometimes to separate files by tenant, for instance by prepending a string representation of your tenant to the paths, and you can define a custom file storage to do that; if you need a different implementation, you can always write a new file storage. Now, in higher-security contexts, where you cannot afford one tenant accessing files from another tenant, you can do a couple of things. You can generate pre-signed URLs: with pre-signed URLs, clients get time-limited, pre-signed access to that specific path. But if you cannot afford even the chance that someone else accesses that URL during its lifespan, you can use a proxy view, that is, a Django view in charge of determining whether or not the incoming request has the appropriate security clearance to return the file. This is something you'll find in external packages, and it comes with some overhead, because all your files go through this proxy view. As for the cache, let me tell you that the cache that comes with Django out of the box is tenant-agnostic. Therefore, if you use it as is, you'll get cache clashes, because one tenant could set something in the cache and another tenant using the same key could retrieve it or override it. Notice that this is a significant security concern, because you could be leaking information if you don't make the cache tenant-aware. The way to solve this problem, almost always, is to incorporate the tenant into your cache key through a custom key function, and then configure your cache to use that function, so your cache becomes tenant-aware. This is a very important consideration to keep in mind. As for Celery tasks, well, it's more a matter of discipline than code, because Celery tasks are tenant-agnostic initially. But if
you pass a tenant identifier as part of your task arguments, you'll be able to retrieve the tenant from that identifier and then activate it. You'll find this behavior in existing packages, but, as I've been saying for too long now, it's useful to know the underlying principles. And finally, Channels. This requires a bit more work, because you'll have to implement a custom middleware to activate the tenant from the request, and Channels needs different middleware from regular Django middleware because it operates at a different level: it operates with protocols. We have four more minutes. Okay, I'm wrapping up. You'll also have to namespace your consumer groups: if you cannot afford messages leaking across tenants, you need an approach similar to what we did with the cache, and include a string representation of your tenant in the name of your consumer groups. This isn't easily found in packages out there, so if you're going to do WebSockets through Channels, it's something you'll have to implement yourself most of the time, though you can take inspiration from some existing code. Now, for everything that was not covered: we left a lot of things out, like a tenant-specific admin, tenant-specific templates, or how to integrate all this into the worldview of other third-party applications.
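Most of the "scope of everything else" comes down to two moves: activate the tenant before running code that has no request, or fold the tenant into a name. The standalone sketch below shows both. Every function name is hypothetical, a plain threading.local stands in for the active-tenant API, and the "public" label for tenant-agnostic work is an assumption. tenant_cache_key follows the (key, key_prefix, version) signature that Django expects from a function referenced by the KEY_FUNCTION option in CACHES; Django's default key function joins the same three parts with colons.

```python
import functools
import threading
from contextlib import contextmanager

# Minimal stand-in for the active-tenant API.
_active = threading.local()


def get_current_tenant():
    return getattr(_active, "tenant", None)


@contextmanager
def tenant_context(tenant):
    """Activate `tenant` for a with-block, restoring the previous value."""
    previous = get_current_tenant()
    _active.tenant = tenant
    try:
        yield tenant
    finally:
        _active.tenant = previous


def _tenant_label():
    # Assumption: tenant-agnostic work shares a fixed "public" namespace.
    tenant = get_current_tenant()
    return "public" if tenant is None else str(tenant)


def tenant_cache_key(key, key_prefix, version):
    """Like Django's default cache key function, with the tenant prepended."""
    return f"{_tenant_label()}:{key_prefix}:{version}:{key}"


def tenant_upload_path(filename):
    """Prefix stored file paths with the tenant."""
    return f"{_tenant_label()}/{filename}"


def tenant_group_name(name):
    """Prefix a Channels group name with the tenant. Channels only accepts
    a small ASCII character set in group names, so keep labels slug-like."""
    return f"{_tenant_label()}.{name}"


def run_in_tenant(func):
    """Wrap a command or task so it takes a required `tenant` keyword
    argument, activates that tenant, and then runs the wrapped code."""

    @functools.wraps(func)
    def wrapper(*args, tenant, **kwargs):
        with tenant_context(tenant):
            return func(*args, **kwargs)

    return wrapper


@run_in_tenant
def send_digest(user_id):
    # A task body written as if it were tenant-unaware.
    return f"digest for user {user_id} in {_tenant_label()}"
```

Here send_digest(7, tenant="blue") runs its body with "blue" active and leaves no tenant active afterwards. In a real project the task argument would typically be a tenant identifier that the wrapper looks up before activating; the identifier is used directly here to keep the sketch short.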
I have the firm belief that principles are generally extensible, so everything we've covered here can be extended and applied to whatever wasn't specifically covered. Now, let me give you some packages, in case you're shopping for a package to implement multi-tenancy. Notice that the packages are opinionated by the database approach they use; these are the ones I consider the most prominent, and you can find more in the multi-tenancy grid at djangopackages.org. And that's it: this is how you fit a thousand Djangos within. I hope I have answered some general questions and clarified the problem of multi-tenancy in Django. If not, or if you have more questions, feel free to reach me on Twitter, GitHub, or email; I'll be more than happy to continue the debate there. One final remark: in times of global pandemic, the new cool is wearing masks, so please do it, and enjoy the rest of your conference. Thank you very much. Thanks a lot; it was a great talk.