Okay, so Lorenzo is joining us from Cuba. Is this your first EuroPython, Lorenzo?

Yeah, my first EuroPython ever.

Cool, welcome. It's also my first EuroPython. It's not my first Python conference, but it's my first EuroPython. So thank you very much for presenting. A reminder for everyone: you can ask questions in the Q&A, and I will ask those questions after the talk. This talk is 30 minutes. So Lorenzo, all yours. You're ready to go.

Well, thank you so much, and good afternoon, or good night, or good morning. Thank you for joining me today in this journey to discover yet another package for multi-tenancy in Django. My name is Lorenzo Peña. That's my handle, where you can find me pretty much everywhere. I have been a software developer for over a decade, and I have around 11 years of experience in Django, with around three packages in the PyPI registry. So I hope you're having a great time in this online version of EuroPython 2020. And I hope, as long as my connection behaves properly, that you're going to enjoy this talk as well.

So, yet another package for multi-tenancy in Django. Let's play this title kind of backwards, and let's begin with Django. Firstly, let's all get on the same page in that 2020 has gotten a little bit out of control. And with the rise of GPT-3, the future is uncertain in some aspects. Probably we're wondering whether we're going to be keeping our jobs by the end of the year. But I honestly think we don't have to be too worried about it, because we are Pythonistas and Djangonauts, and we are in the world of Django, the web framework for perfectionists with deadlines. It just turned 15 years old, which is a great milestone. It is mature, solid and battle-tested. It has an amazing community and great momentum, and it is getting stronger than ever with more async and reactivity as of late.
So just in case you're wondering whether it's still a valid investment to be thinking about Django and multi-tenancy as of 2020, I think it is. I think Django can perfectly handle the rest of the year and the upcoming decade, and provide for you and for me, and we're actually rooting for it.

So, multi-tenancy. What exactly is multi-tenancy? Let's work through an example. Suppose there is a Customer Red, which has a problem, and you develop a solution for that customer. You deploy the solution and it's working great, but now customers Blue, Green and Yellow have exactly the same problem. So what to do? You have a solution that is already working for Customer Red. There are two things you could do. You could just copy and paste that solution and provide multiple single-tenant solutions. Or you could make the jump into a multi-tenant solution, which is a single instance of the software that can provide for the needs of all your customers. That's exactly what multi-tenancy is. It's a software architecture in which a single instance of the software serves multiple tenants. As examples, you can see Dropbox, Shopify, Slack and WordPress.

So what are tenants? Well, tenants are the isolated spaces in which users with specific privileges interact. These are the accounts at Dropbox, workspaces at Slack, blogs in WordPress, servers at Discord, stores at Shopify, or sites/communities at Stack Exchange. So, all in good shape so far.

Now, how do we actually get to implement multi-tenancy in Django? Well, one does not simply add it to the project. Why? Because if you want to implement it from scratch, there are a number of things to do and do right. And even if you're going to use any of the existing packages out there, it takes some knowledge to properly determine which one is going to be suitable for your needs. So about this thing of packages: will you actually end up needing a package for multi-tenancy? Well, most likely, yes. And there are many of them.
I made one of those. I contributed towards the entropy, and it is kind of a fork, actually, with some conceptual changes. But the truth is that there is no one size fits all. Packages tend to be very opinionated in a number of architectural decisions that need to be taken, and so there is no silver bullet for us when selecting packages for multi-tenancy.

So am I really going to give you yet another package for multi-tenancy in Django? Well, not exactly. Instead, we're going to take a look at the building blocks, the pieces that form the foundation of multi-tenancy in Django itself. Instead of taking a package-first approach, we're going to be taking a knowledge-first approach. We're going to be pretending that we are implementing multi-tenancy from scratch without actually doing it. And I hope that with this knowledge you will be able to select, understand, debug, and contribute back to just any existing package. So this is not just yet another package. This is the package to rule every other package. And if you think of it, it could actually be the ultimate package. So, my dear audience, I give you the ultimate package for multi-tenancy in Django. I hope you're ready to get your hands happily dirty in the concepts and notions we are about to cover right away.

The first of these notions is the concept of the active tenant. Suppose we are inside Stack Exchange, in some of the internal sites, and we run this query: Question.objects.all(). Okay, but where are we expecting to get questions from? Because Stack Exchange is too big. It could be Stack Overflow, it could be Server Fault, Super User, maybe Area 51. So welcome to this new concept, the active tenant. The idea is that one tenant has to be the active tenant, and the framework needs to be aware that it's going to be operating in this scope, in database access, URL reversing, the admin site, cache, pretty much every part of the framework.
But notice that this has to be in place even outside the request-response cycle, because we have things like management commands and Celery tasks, where you don't have a request object to interrogate for the active tenant. Django currently has a couple of APIs that you're probably familiar with where there is this notion of the active something: the time zone and the language. You probably know this couple of functions, get current and activate, where you don't have to be using a request at all. So we could take some inspiration from this and actually create a couple of functions, get current tenant and activate tenant, so that we are able to enter the scope of a tenant by activating it and then retrieve the active tenant further on.

This is a possible implementation of the active tenant. Notice we're using here asgiref's Local, which is a drop-in replacement for threading locals. Without getting too deep, this is actually a global variable that is thread-safe, but it's still a global variable, and the use of globals is generally discouraged. Why? Well, there is a reason why this pattern is so scarce in the Django code base itself. The reason is that the more you depend on globals, the more coupled your code becomes and the harder it is to test in isolation. Therefore it's not super recommended, and Django hasn't fallen into this pattern except when it's absolutely necessary. I would consider this a perfectly valid case for it, but it is a pattern that is frowned upon otherwise, so please don't walk out of this talk just creating globals for the sake of it.

There are two important questions about the active tenant. One is: what is the type of the tenant object? When we get and set a tenant as the active tenant, what type of object are we working with? The other question is: what happens if, for some operation, there is just no active tenant? Is that a bug?
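As a minimal sketch of the get/activate pair described above: the talk's version is built on asgiref's Local, while this one swaps in the standard library's `contextvars` so that it runs without any dependency. The names `_active_tenant` and `deactivate_tenant` are assumptions made for illustration, and the tenant here can be any object, which leaves the "what type is a tenant" question open.

```python
from contextvars import ContextVar

# Context-local storage: each thread or async task sees its own value.
# (Assumption: the talk uses asgiref's Local here; ContextVar is a stand-in.)
_active_tenant = ContextVar("active_tenant", default=None)


def activate_tenant(tenant):
    """Enter the scope of the given tenant."""
    _active_tenant.set(tenant)


def deactivate_tenant():
    """Leave any tenant scope, going back to 'no active tenant'."""
    _active_tenant.set(None)


def get_current_tenant():
    """Return the active tenant, or None when no tenant is active."""
    return _active_tenant.get()
```

Because nothing here needs a request object, the same two calls would work from a management command or a background task: activate first, then let the rest of the code ask for the current tenant.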
Is that a possible situation? Is that a wildcard scenario in which you are actually hoping to run the operation in multiple tenants? These are all valid questions, but it's more of food for thought.

So as we get settled with this concept of the active tenant, there are three architectural choices that we need to make: users and tenants, database architecture, and tenant routing.

Regarding users and tenants, there are three types of relationships between them. One is the type in which users exist outside the context of tenants. That is, you can have tenants, you can have users, and there could be relationships between them. This is kind of loose, non-strict. Examples are WordPress, Shopify and Discord. Another type is the one in which users are bound and constrained within the scope and the context of a tenant. The perfect example is Slack, where you cannot think of a user outside the concept of a workspace. And there's a third type, in which users and tenants are pretty much the same thing. The examples of this are Gmail, Dropbox and the like. So which one to pick? It will completely depend on your use case and how you expect your users and tenants to be interacting. The baseline question here is: how many tenants do you expect a user to be related with?

The second architectural choice to be making is the database architecture itself, and there are typically three approaches. One is the isolated approach, in which you have multiple databases, one per tenant. Another is the shared database approach, in which you have a single database and a tenant column on entry-level tables. And the third one, which is kind of recent, is the one in which you have one database but you use Postgres schemas to store your tenant information.

So how would it be if you were to implement multi-tenancy with isolated databases?
Well, Django is compatible with multi-database configuration, so there's nothing stopping you from having multiple databases there. The only thing you would have to do is switch between databases when accessing and storing your data. Here I am using a translation function from an active tenant into a database alias. All you would have to do is use that alias for saving your objects, or for filtering your queries, or even for generating an already scoped manager, so that all your subsequent queries are already in the context of that database alias.

This could be somehow offloaded to database routers. Since we have this function to get the current tenant, and we also have a translation function from a tenant into a database alias, we could be providing default values here so that we don't have to do the previous thing. The router makes it so that whenever you do a data access, this is the default database, and you could still resort to doing manual, specific scoping if you need to override the default behavior.

The good thing about isolated databases is that you are optimized for isolation. The bad is that there are no relationships across databases, because Django doesn't allow it, and that adding tenants requires reconfiguring the project. Why? Because your tenant catalog is actually living in your settings, so every time you add a new tenant, you have to update your settings. The not so funny thing about the isolated databases approach is that as you scale, it's going to become quite expensive in operational costs. So unless you're planning to have a number of tenants in the low tens, or unless you're planning to have Scrooge McDuck as your billing manager, this is generally not a recommended approach.

The shared database approach is the one in which, remember, we have mixed records in a single database.
So in this case, entry-level tenant-specific models will require a pointer to the tenant they belong to. Notice that I'm saying tenant-specific models, because not every model needs to be tenant-specific. You could have models that are used to share information across tenants. And notice that I'm also saying entry-level, because models determine relationships between them, so you don't actually need to provide a pointer for each one of those, as long as you're able to reach the tenant with a reasonable number of joins. This is an example of a hypothetical Question model, following the Stack Exchange example, where you have a foreign key to the site, which is kind of the tenant in this example.

In this case, you still have to rely on the active tenant in order to complete the missing part of your queries, because you will have to use that tenant in order to create your records if you have provided a tenant-agnostic interface to the client. And you still have to filter by tenant, no matter if you're filtering a model that has the tenant pointer right away or if you're doing a number of joins in order to get to the tenant.

You could do some of this automatically. You could try to assign the tenant automatically by means of a default value for the field that holds the tenant, which could be a callable, and in this case, if appropriate, could be get current tenant. Or you could have a custom field with a pre-save hook. If you're using a foreign key, you could even subclass that field. Finally, you could resort to having a signal on relevant models in order to complete the model prior to saving it.

As for querying, well, you could use custom managers and custom querysets in order to pre-filter by your active tenant, and therefore subsequent queries can count on being already filtered. However, there could be some annotations or subqueries where this customization probably doesn't work.
So you still have to be open to doing manual scoping from time to time.

The good thing about shared databases is that you are optimized for scalability, because adding tenants is just a matter of adding rows to your tables. The bad is that your data isolation will take extra effort on the development side. And the not so funny thing about shared databases is that it's very easy to just forget to scope a specific query by the tenant. Therefore, if you don't want to wake up in the middle of the night with your brain wondering whether you have filtered by the active tenant or not, my recommendation is that you bookmark all your tenant-scoped queries, that you make automated tests for each one of them, and that you make sure they are returning results in the scope of the active tenant. And even that you take a step further and make an automated test to test that you tested each one of them. It's kind of a riddle, but it's going to save you at some point. Remember that tests are the softest pillow you could have in software development in general, but specifically if you're going for the shared database approach to multi-tenancy.

As for the semi-isolated databases approach, this is going to rely on Postgres schemas in order to isolate the tenants within a single database. So what are schemas? Well, this is a Postgres-specific concept. They are a layer of abstraction between databases and tables. They can be thought of as namespaces, but the good thing about these namespaces is that they are not mutually exclusive. So you can organize them and find tables in many of them by means of properly configuring your search path, which is also another Postgres-specific concept. In this case, your queries remain practically unchanged. This seems like magic, but you're going to pay the price with an increased technical challenge somewhere else. Two major things are required in order to get this approach working.
First, we're going to require a custom database backend based on the Postgres backend, or any other backend that is Postgres-friendly. This backend will have to be able to convert the active tenant into a sequence of schemas, and then it will have to run a query to activate those schemas by means of setting the search path.

The other interesting challenge, and this is kind of the biggest challenge, is that we have to teach Django to do migrations again. Why? Because Django doesn't know anything about schemas. Django knows about databases, and you can pass a database in order to run your migrations, but when it comes to schemas, that is not part of the world view of Django itself. So the idea here is that we are also using a database router, through the allow_migrate hook, in order to tell Django whether or not it's legal to migrate a specific model based on the active tenant. This also requires the migrate command itself to be kind of tweaked, because migrate needs to operate at the schema level.

The good thing about the semi-isolated database approach is that you're optimized for isolation, and at the same time you have increased scalability, because since you have a single database, you can scale faster by means of adding schemas. The bad is that it's going to take extra effort to understand and control how schemas interact. The not so funny part about the semi-isolated database approach is that since you're able to scale faster, you're going to be hitting the thousands of tenants very soon, and therefore thousands of schemas, and you still have to run your migrations on each one of those schemas. So you're going to be paying the price with an increased time to run your migrations. Please be advised about this.

So which one is the best? Well, none of them is. Why? Because there are pros and cons in each one of them.
It will depend on your specific use case, and each one of those approaches is capable of shining in its specific use case. Anyway, if you want to engage in a respectful ice cream fight, you can hit me up in the breakout and we can continue the debate there.

The third architectural choice is tenant routing, that is, how you expect to take an incoming request and generate an active tenant out of it. It's generally possible to do that because your tenant is somehow encoded in the incoming request by means of the user, session or headers. Or it could be even better if your tenant can be inferred from the URL itself: you can do the translation from the domain, subfolder or query parameter.

So where is the perfect place to do this translation from an incoming request into an active tenant? That place is the middleware. It's a perfect place to do it, because you're capable of taking the incoming request and just activating the tenant, and therefore the rest of the request-response cycle for that request is guaranteed to be in the scope of that active tenant.

This is the implementation of one of those middlewares. In this case, the middleware is translating from a session into a tenant. Notice that the major complexity here is actually creating the translation function, that is, taking something from the session and converting it into a tenant to activate, because the rest is pretty much checking that there is no active tenant already, and then activating it. The good thing about this is that you only need to provide different translation functions for different retrieval methods, and you could have similar functions for users, headers, domain, subfolder and parameters. You could still chain one middleware after the other if you want to combine the power of these retrieval methods. You only have to take care that the order will determine the precedence of those retrieval methods.

So what if you want the opposite?
What if you are already in the scope of a tenant and you just want to generate a canonical URL, one that, if you share it, can take you not only to a specific path within your URLs but also to a specific tenant? It's not going to be possible in some cases, because user, session and headers are not easily encodable as part of the URL. But you can totally do that if you're inferring your tenants from domain, subfolder or query parameter, because even though Django only reverses the path part of the URL, you can still prepend, interpolate or append the tenant in order to allow the canonical URL to land on a specific path of a specific tenant. This, of course, will require a little tweaking of the URL reversing process.

As a bonus, I'll also tell you that it's possible to provide custom URLconfs based on the active tenant. So if you have different types of tenants and you want to provide different URLs for those tenants, you could also provide a translation function in order to convert a tenant into a URLconf module. Django already provides this hook, where you can just assign the URLconf module to this variable in the request, and you will be using that different module.

So those are the three major architectural choices you have to make. As you see, there are multiple options for each one of those. Some combinations make sense and play nicely together, while others don't make much sense at all, so it's going to be up to you to determine whether or not things are playing nicely for you.

But there is more to Django than just those three architectural choices. Let's take a quick look at the scope of everything else. While this list is not going to be comprehensive, let's at least see five major places where this dynamic of the active tenant is going to be an important thing.
So, management commands. It's going to be very valuable to be able to run management commands in the scope of a specific tenant. For new management commands, you could just include a tenant argument, so that the command is capable of first activating the tenant and then performing the operation that the management command is expected to do. Now, for existing non-tenant-aware commands, and these are basically everything that ships with Django and third-party applications, you will have to define a special command wrapper that basically takes the tenant as an argument, but also an inner command to be run. This wrapper should be in charge of activating the tenant and then calling the inner command. You will find some existing packages out there that even take a step further in elegance and combine the arguments of both commands. This is kind of trickier to implement, but it's completely possible to do.

As for file storage, it is also going to be extremely valuable to be able to somehow organize the files by tenant. In this case, you can totally define a custom file storage in order to, perhaps, prepend your URLs with a string representation of your tenant, etc. Notice that I'm not providing any code example here, because there are multiple types of file storage depending on your backend. It could be the file system or it could be something in the cloud, so this actually takes a little bit more of use-case-specific thinking.

Now, in higher security contexts, where you cannot afford one tenant accessing files of another tenant, there are two kinds of workarounds you can use in order to increase the security level. One is generating pre-signed URLs, so that your static or media URLs are short-lived, and therefore it's harder to visit tenant-specific URLs for files.
The other one is actually using a proxy view, so that the view acts as a middleman and determines whether or not the incoming request has security clearance to access the specific file storage. There are some interesting packages that do most of this themselves, so this is more like the underlying concept.

As for cache, well, out of the box, the cache that comes with Django is tenant-agnostic. So if two tenants are using the same cache key, there will be a clash. And most dangerously, you could be leaking information by means of one tenant storing some data in the cache and then other tenants retrieving it. That's a leak. So one thing you could do is generate a special key function that you could then use as part of your cache configuration. The only thing that key function needs to do is augment the string representation of the key itself with the current tenant.

For Celery tasks, well, it's more a matter of discipline. The idea is that you pass an identifier of your tenant, so that the first thing you do in your tenant-specific Celery task is to activate the tenant, and then you can proceed with the rest of the task itself.

And finally, for channels. Well, you're going to require a separate middleware for your websockets. Whatever you do for your regular requests, you'll have to do it again in the middleware for your channels, because the two kinds of middleware are not compatible. So if you want to activate tenants from the incoming scope, you're going to be kind of duplicating your middleware. And if you're using channel layers, you will also have to take a similar approach to the one we took with cache for your consumer group names, because if you use a tenant-agnostic name, you will end up leaking messages between the groups of multiple tenants.
So for everything else that was not covered, like the admin site or tenant-specific templates, I am certain that the principles here are generally extensible. I'm almost sure that you will be able to extend the principles in order to cover pretty much everywhere in the framework. Otherwise, feel free to continue the discussion in the breakout.

Now, some of the packages that actually implement multi-tenancy and help you in bringing multi-tenancy to your project, which you could choose from. Here on display are four packages. Notice that they are kind of opinionated based on the database approach. These packages were taken from the multi-tenancy grid of djangopackages.org, which I consider to be the market for Django packages. Please visit it. You could find more interesting things there. Just make sure you check whether or not the package is Python 3 compatible, whether or not it's production-ready, and whether it's actively maintained, because otherwise you might end up in a very difficult situation.

If you want to contribute back to any of those packages, the ones I mentioned and the ones I didn't, by means of reporting bugs, implementing new features or even improving documentation, I will dare to speak on behalf of all the package maintainers and tell you: please come. You're more than welcome to do so. It could be a golden opportunity for you to put into practice everything you may have learned here, or everything you have learned in your own experience and progress through multi-tenancy in Django.

So that's it. We can keep in touch for more. I would like to give special thanks to Russell Keith-Magee, Orlando William and Raphael Michel, who were of huge help in the preparation of this talk. And lastly, thank you, my dear audience, for joining me today in this journey. Thank you.