Hi, thank you. Can everybody hear me? So, first of all, this one will be a bit more technical than the previous ones, which were more theoretical or philosophical, so I hope it doesn't bore you too much. I'll go a bit through code and stuff like that, all in the presentation; I won't code here. Again, ask a question whenever you want, no problem.

I know I'm not very famous, so a bit about myself. I've been in computers since forever. I owned my first computer in '81, so probably before some of you were born. I went through every technology you can imagine, from mainframes to PCs, everything, and about three or four years ago I ended up with Ruby after doing nine years of Windows and C++, and it was the best change I ever made: no more Windows. I'm a firm believer in the best tool for the job, so I won't tell you that Ruby is the best language to write an operating system in, or the space shuttle control system, but I do think that there are a lot of things it can do, and for web applications and stuff that relates to that, it's really the best thing I've found to work with.

What I'm going to talk about is multi-tenant applications. How many of you have had to write multi-tenant applications? Okay, quite a few, but I see that a lot haven't yet. So what is a multi-tenant application? It's an application that supports multiple accounts or multiple clients on one system, so I run one application and I have multiple companies on it. For example, Basecamp is a multi-tenant application. We don't each get our own server or application; it's all shared. It requires a few things to support it, and one is data separation, which is very critical. I don't want you to see my Basecamp clients or my bugs in Lighthouse.
It requires customization, to be able to adjust it to each client, so each client can have a few different things. And it is designed to optimize resources, because if you think about it, I could run one server and one application for each of my clients, but I do it all in one so that I deploy only one set of code, I fix a bug once, and I don't have to manage 500 servers because I have 500 customers. So that's the main idea of why we want it.

When don't we want to use it? First of all, for things like, you know, a simple website. My company's public website, who cares if it's multi-tenant or not. Then social networks, or any other application where the main idea is sharing data between people. We don't want to separate the data if we want to share the data. So those kinds of applications that are mostly public, let's say a newspaper application, don't need to be multi-tenant.

What we'll do now: the views have mostly nothing to do with multi-tenancy. They just show the data. The controllers and the models are the important things that we need to manage in a multi-tenant application, and I'll go first over the controllers and then over the models and how we do multi-tenancy in each. What are the options? Like I said before, I believe in having many options and choosing the best one. I'll go over three or four options for each, with the pros and the cons, and then whenever you need to use it, you can decide which one is better for you.

So first of all, the controllers. One of the very basic things that a controller needs to do is to recognize the account. Okay, who is the person that is logging in now? Do they belong to IBM or to HP? Where are they coming from? Because we need to show the correct data. There are three main ways to do it. One is domain-based, which is what Basecamp uses, and many, many other applications, and Fadol uses it.
That's where we have the account name in front of the domain: account-name.basecamp.com, account-name.app.com. The second way is URL-based, where everyone keys off the same host, but there is a slash, the name of the company, and then the rest of the URL. So it will be, let's say, /john/contacts/1. And the last one is login-based, where I think the best example is the Google applications. I don't know if any of you have set up a domain with Google. If you set up, you know, more.ph on Google, you go to more.ph, but then it becomes Google, and just by your login they know who you are, and they separate it by the login. So the URL makes no difference. It can be cookie-based, it can be a session ID on the URL, but the URL itself is the same URL.

After we recognize the account, we need to set the context so that the models work with the correct context. It can be the current user, it can be the current account, it doesn't matter so much. Usually you go from the user to the account, unless you are domain-based, where you find the account based on the domain. And then all your data access goes through the current user: current_user.contacts, current_user.contexts, and all of that.

So how do you recognize an account from the domain? It's funny, because when I was writing the presentation, DHH posted something on the 37signals blog exactly about that, and it's actually on GitHub, so you can see the demo. What we do is go into the application controller and add a before_filter, set_current_account. And if you look at set_current_account here, what we do is a find by the subdomain. Subdomains in Rails, you know, returns everything but the base domain. So if you have google.com, it will return everything else, aside from google.com. So we take the subdomains, and actually we take the last one. So if someone typed www.john.app.com, we'll take the john and not the www. Okay? So here we just find the account.
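As a rough illustration of the lookup just described, here is a minimal sketch in plain Ruby rather than the actual Rails filter from the demo. Account, ACCOUNTS, account_subdomain, account_from_path and find_account are hypothetical names, and the in-memory list stands in for the accounts table. The tld_length parameter is modeled on the optional argument that Rails' request.subdomains accepts.

```ruby
# Hypothetical stand-in for the accounts table.
Account = Struct.new(:id, :subdomain)
ACCOUNTS = [Account.new(1, "john"), Account.new(2, "ibm")]

# Domain-based: returns the account label of a host name,
# e.g. "john" for "www.john.app.com".
# tld_length is the number of labels in the top-level domain
# (1 for ".com", 2 for ".co.uk").
def account_subdomain(host, tld_length = 1)
  labels = host.downcase.split(".")
  # Drop the base domain (one label) plus the TLD labels.
  subdomains = labels[0...-(tld_length + 1)] || []
  # Take the label right before the base domain, so a leading "www" is ignored.
  subdomains.last
end

# URL-based: "/john/contacts/1" -> ["john", "/contacts/1"].
def account_from_path(path)
  _, name, rest = path.split("/", 3)
  [name, "/#{rest}"]
end

# What a set_current_account style filter would do with the host.
def find_account(host, tld_length = 1)
  name = account_subdomain(host, tld_length)
  ACCOUNTS.find { |account| account.subdomain == name }
end
```

With the default tld_length of 1, a host such as john.app.co.uk would yield "app" as the account name, so hosts under two-part TLDs need tld_length set to 2.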
Now we have the account, the current account, set based on the domain. And if we go, for example, to the customers controller, you see that we get the customer by going through the current account's customers. Okay, so every place we access data, we access it from the current account. The current account is the key for everything, and everything hangs off the current account. We'll see later what we do in the model to make it work, though it's pretty standard Rails code.

If you want to recognize the account from the URL, where we say app.com/john, again, there are a few ways to do it. The easiest way is using path_prefix, which is new, I think, from Rails 2.2, but maybe 2.1. path_prefix lets us prefix the whole path with something else, so it can be the account name, and if we write it as a symbol-like thing, we can actually match it in the routes just like any other part. So it becomes a part of the route and the name is available there. Another way is to use resourceful, or one of the similar plugins, that let you actually extract it from the URL. By the way, it's very popular to do this in localization, where you add the locale name in front of the rest of the URL, so you get en/whatever or es/whatever. It's not as clean as the others, because you have to think about it: you have to put it in your routes, and you have to take care of, you know, unmangling the URL. Another way to do it is to do the replacement at the server level. So the server does a redirect from /john to john.app.com and passes this into Rails. Again, it's not as clean as the others. I personally prefer not to do it this way, but if you need to, you can still do it using those tricks.

Recognizing the account from the login. When the user logs in, we know what the account is. There's no different URL for different companies or different accounts. And I use Tracks as an example.
The reason I use it, and also the other example from GitHub, is that after the presentation you can go there, look at it, and really see code working in real life. It's all open source, so you can play with it yourself. Tracks uses the current user as the anchor, because Tracks is multi-tenant, but for individual users. So it's not per company, with multiple users per company; there is one user per account. The user is stored in the session, so whenever they come back with the browser, we get the user from the session, and everything hangs off of this user. Again, the user could be an account, or we can take the user and, from the user, the account.

Here is the way it's done there. I cut a bit of the code so you can read it, I hope. We have the login from the cookie in the authentication system, so you log in from the cookie or do the regular login. Then we have get_current_user, which finds the user and returns the preferences and the user. And again, here the preferences could be the account, so account equals user.account, and you have access to the account. And we set the current user, so from now on, after the login, which calls set_current_user here, we have current_user set, and it works as a global function, so anywhere we call it we can get the current user.

In the application controller we use the regular login system, we have access to current_user and to the preferences, and we added login_required, which does the login and sets the current user and the preferences. And in the contexts controller, for example, we just do contexts equals current_user.contexts. So current_user is a global function, and it actually accesses this internal variable, which is the current user. It's very similar to the other way, only we have to do the login to know where the user comes from; other than that it is the same.

Which one is better? There is no difference; it's all aesthetics. Some places like to have their own URL, john.app.com, and not go to Google.
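The login-based flow described above can be sketched in plain Ruby. This is not the Tracks code: User, Context, FakeController and the in-memory USERS and CONTEXTS lists are hypothetical stand-ins for the ActiveRecord models, the controller, and the session, chosen only to show that every read starts from current_user.

```ruby
# Hypothetical stand-ins for the models; in a Rails app these would be
# ActiveRecord classes with has_many :contexts on the user.
Context = Struct.new(:id, :user_id, :name)
User = Struct.new(:id, :login) do
  # Plays the role of the has_many association: only this user's rows come back.
  def contexts
    CONTEXTS.select { |context| context.user_id == id }
  end
end

USERS = [User.new(1, "john"), User.new(2, "mary")]
CONTEXTS = [
  Context.new(10, 1, "home"),
  Context.new(11, 2, "office"),
  Context.new(12, 1, "errands")
]

class FakeController
  def initialize(session)
    @session = session
  end

  # Looks the user up from the session once and memoizes it.
  def current_user
    @current_user ||= USERS.find { |user| user.id == @session[:user_id] }
  end

  # Stand-in for the login_required filter.
  def login_required
    raise "not logged in" unless current_user
  end

  # Stand-in for the contexts controller action: data is reached
  # only through the current user, never through Context directly.
  def contexts_index
    login_required
    current_user.contexts
  end
end
```

Calling FakeController.new({ user_id: 1 }).contexts_index returns only the contexts whose user_id is 1, while a controller built with an empty session raises before any data is read.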
That's the reason Google did the thing where you can redirect your domain to gai.com. If you look at it, once you log in you are at Google, at gmail.com, and not at your own domain, but it's a way for you, if you want, to send your clients to gai.app.com. There's no performance difference between them; they're all exactly the same. Domain-based, the one where we say john.app.com, is the easiest to partition at the web server level. So if you want to have multiple app servers, it's very, very easy to partition it like that, because you can send each one to a different application server, or actually have multiple DNS entries pointing to different front-end balancers. So it's a bit easier to scale.

One thing to note is that for SSL access, if you go domain-based, you need a wildcard certificate. You cannot use regular SSL certificates because of the way web servers work: the requests arrive at the same server, and the SSL negotiation is done before we know the server name from the header, and that's why you need a wildcard certificate. Wildcard certificates don't get these nice green things they now have in the new browsers, because they are not specific to a domain and an IP.

The second part we have to do is the models. What do we do in the models for multi-tenancy? Actually, I would argue they are more important than the controllers, because this is where we do the data separation. If we don't have data separation, we have failed completely at what we're doing. We'll talk about three different models here. One is separate DB, where each of our clients has their own database. The second is schema separation, which is the less known of the methods here, and I'll discuss it in more depth. Schema separation is actually built on a feature of Postgres where you can have different schemas in a database and completely separate the data. It's not schema in the sense of the DDL; it's actually a namespace in the database. We'll go over it. And the last one is scoped access, which is what we showed with the current user and stuff like that.
In this mode we hang everything off of a user, and if you think about it at the SQL level, every single query you do ends with account_id equals whatever. So it's actually separated by the specific ID of the account.

Separate database. The pros: it's the ultimate in data separation. There's no chance of mixing the data, because they are completely different databases. They don't have to be on different database servers; they can be on the same server, but as completely different databases. It's share-nothing by design: because you have a database for each client, nothing is shared, which means it's easier to vertically scale it, to do whatever you want with it. There is a very high degree of failure localization. If one database is corrupted or something happens, it impacts a single user. The other users are not impacted by the fact that my database is destroyed, so the failure is very, very localized to a specific account. It's very easy to do customizations, because each one is a different database. Do whatever you want with the database; when you connect to the database you get the new DDL or whatever, and as long as your code can live with the different structure, there's no problem. In Rails it's pretty easy to do, because mostly the models don't care about the internal structure.

The cons of separate DB: a huge resource overhead. If I have to create a database for every single client, we're talking about a huge overhead. It's harder to create a new account: I have to go and create a new database on the server, meaning I need superuser access to create the database, define the connection to the database, and reload all the data. It's very hard to cache, because everything is different. Even the cache that Rails uses internally, the small model cache, can do almost nothing when the connection is different and we have to reconnect. And I see something is missing here, but there is a lot of overhead in the
connection, both at the protocol level, I don't know if you are aware, but TCP connection handshaking is a three-phase process that takes a lot of time compared to sending data, and at the server level, where most database servers have a high overhead in creating a connection, because they have to check security, where the IP is coming from, and all the other things to set up the session. So there is a very high level of overhead in establishing new connections all the time.

Schema separation. The pros: there is strong data separation. It's mostly share-nothing, although there is some sharing in it, and we'll discuss it. It's almost transparent to Rails; we have to do one thing to make it work completely transparently. Because of the data separation, it gives us independent account migration, so we can migrate one account while leaving the others, and we have a smaller amount of data to migrate when we need to migrate. It's very easy to take a legacy application that was never designed to be multi-tenant and make it multi-tenant.

The cons: there is some resource overhead, so for example I wouldn't want a million accounts on one database with this technique, and we'll see later why, but it's not as bad as separate databases. The connection is the same connection; we're not doing multiple connections here. It's a bit harder to create a new account, because again you have to generate the table definitions themselves in each new schema you create. There is a way to make it automatic; I prefer to just create it using SQL, just dump the SQL in and create a new account. It needs modified migrations. The regular Rails migrations will not work this way. I once did something called a schema iteration migration: it's a migration that iterates over the schemas and changes them one by one. So it's not hard to do, but it doesn't use the regular migrations.

Scoped access. The pros: very low resource overhead, just another WHERE on the query. It's very natural to Rails, especially with scoping today, and
with 2.3, with the default scope, it's even easier. It's very easy to create a new account: you just create an account in the table, and you don't have to do anything to the data. It's very easy to do data aggregation, because if I want to count how many contacts I have altogether in the system, I do one query and I get the result. If it's distributed over databases, I have to connect to each different database and get it, and if it's schema separation, I have to actually iterate over the schemas and query each of them. So here the data aggregation is very simple.

The cons: it's a very weak data separation model. One mistake from one of the developers, who forgot to put in the scope, forgot to start from the account, did something, and we can expose other people's data in our system. It's hard to migrate large data sets. Imagine changing the structure of a table with 50 million records. Not a nice thing to see, especially in databases that do not support ACID over DDL, like MySQL: if something fails in the middle of the migration, you might be stuck with a half-done migration. Postgres doesn't have this problem, especially since Rails 2.2, where they use a transaction around the migration, so if it fails it rolls back to the original state. It requires sharding or partitioning if you want to scale. At some stage the data will be too big, we will have way too much data in one table, and we'll need to partition it, and it's harder to partition in this model. It's not impossible, but it's a lot harder. You need a lot of logic behind the partitioning: deciding who goes where, how to move people from place to place, and how to continue sharing the data, because it wasn't designed to be shared between different databases.

Separate DB: I won't get into the details of how to do it, because first of all it's very much against the grain in Rails. In Rails it's hard. One of the tricks is using the magic models, but even that is not built to do 1,000 clients, 1,000 databases; it's
very iffy whether it will work correctly. It's a better model if you do per-client MSP, what used to be called in the past managed service providing, where you actually install an application for the user. Each one gets their own server and pays the money; it's their server, and it's not really shared.

How do we do scoped access? Scoped access is actually very, very easy. Let's say we take the class User. It has_many contexts, with all the regular options, whatever we want to do, positioning and all the rules. But as you see, it's a has_many, so by default when I access a user and I do user.contexts, I will get it already separated by the user, because it belongs to the user, and it will automatically add user_id equals 5. We have here the belongs_to account, so I can actually extract the account from the user, and everything is good. There's nothing else that I need to do, as long as I always access through the associations in Rails. If I try to do funky things outside of that, all kinds of direct queries and stuff like that, that don't go through here, or I go to the contexts directly and not through the user, I might expose data. So you need to be very careful with it, and you need to follow this direction all the time.

Now we go over schema separation, and because this is a part that most people are not aware of how it works, I will go into a bit more detail here. If you have questions, just ask. I'll use Postgres as the example. I think that Oracle can also do the same thing, and probably DB2, but let's stick to open source. What is a schema in Postgres? A schema in Postgres is like a directory, or if you want to put it in Rails terms, it's like a namespace. So in the same logical database you can have multiple schemas that contain exactly the same table. I can have the users table in five different schemas in the same database. They don't have different names, they're all called users, but they're completely separated
because we have a namespace on top of them, which is the schema. We'll see in a second how we access them separately from each other. We can access them directly with a qualification, so we say schema name, dot, table name, just like in object-oriented code. But the cool thing about Postgres is that it allows us to define a search path, and a search path is very similar to an operating system search path. Let's say you have the same file, ls, in five different directories, but there's a path that defines where the system looks, and the first one it finds is the one it uses. Same thing with schemas: if I define a search path, the system goes over the search path, and the first one it finds in the search path is the one it will use.

So let's see an example here. We start by creating one schema and then a second, so AAC user A and AAC user B. Then we create a table called test one with a field F1. It will be created in what's called the public schema. Someone can remove it, but by default there's a public schema where everything goes. Now I create AAC user A dot test one with F1, fully qualified, and it creates another table called test one. If you tried it without the schema, it would say table already exists, right? Because we already have a table test one. In this case nothing bad happens, because it's qualified by the name. If I set the search path to AAC user B and now create another table test one, it's exactly like I created the table as here, but I didn't have to specify anything. I just put AAC user B in the search path and created test one, and it was created there, because this is the first schema on the search path. By the way, if AAC user B already had a test one, it would fail with table already exists.

Okay, now a bit more of this same magic; let's do the rest. We start by resetting the search path, so it will be like the server just started, and we insert into test one. We insert two values; those values are
actually inserted into public dot test one. Then we do the same thing, qualified with the schema AAC user A: those two values, user A1 and user A2, are actually inserted into AAC user A dot test one. Now we set the search path to AAC user B and we insert again into test one. Look at this one and this one: it is exactly the same SQL, no difference, the same statement, but those two values will actually go into AAC user B dot test one. And then, because they are all in the same database, we can actually query across schemas. If you look at it here, we did select star from test one union select star from AAC user A dot test one, and we got all of the records out. Okay, so we can do some cross-schema work and we can share some of the data, but it's very easy to completely separate the data.

This is all the change we need to make in Rails to actually work with this. First of all, there's a small helper that I wrote; we can actually do without it, but it's easier with it. It's called schema utils, just a module, and what it does is add a schema to the path. I give it a schema, and it adds it to the path by executing set search path to this. Now, this is a call to the server, but it's actually only set for the current session, so I can have multiple people connecting to the server, each having a different search path. Okay, it's tied to the specific connection and not to the whole server itself. And this one is the same, it just resets the path back to the default for the Rails login on the Rails database.

In the application controller all I do is add a before_filter, set_account, and an after_filter, clear_account. By the way, I don't really have to do the second one, but I prefer to do it to catch errors, if there are errors. All set_account does is add the schema to the path by finding the subdomain, you remember, the same call that we did before with the subdomain. We find the account, we find the DB schema of the account and add it to the schema path, and after we finish the request we actually reset the schema path.
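The helper and the filter pair described above might look roughly like the following sketch. It is not the speaker's module: SchemaUtils, add_schema_to_path, reset_search_path, with_account_schema and RecordingConnection are assumed names, the schema name acc_user_b is made up, and keeping public at the end of the path is an assumption about how shared tables stay visible. The recording connection only collects the SQL; in a Rails application the object would be ActiveRecord::Base.connection, which also responds to execute.

```ruby
module SchemaUtils
  # Only plain identifiers are accepted, since the name is interpolated into SQL.
  SAFE_NAME = /\A[a-z_][a-z0-9_]*\z/

  # Put the account's schema first on the search path. Keeping public after it
  # is an assumption, so that shared tables remain reachable.
  def self.add_schema_to_path(connection, schema)
    unless SAFE_NAME.match?(schema.to_s)
      raise ArgumentError, "bad schema name: #{schema.inspect}"
    end
    connection.execute("SET search_path TO #{schema}, public")
  end

  # Back to the default, so the next request on this connection
  # does not inherit the previous account's schema.
  def self.reset_search_path(connection)
    connection.execute("SET search_path TO DEFAULT")
  end
end

# Test double that records statements instead of talking to Postgres.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

# Stand-in for the before_filter / after_filter pair around one request.
# The ensure clause resets the path even when the request raises.
def with_account_schema(connection, schema)
  SchemaUtils.add_schema_to_path(connection, schema)
  yield
ensure
  SchemaUtils.reset_search_path(connection)
end
```

Wrapping the request in one method with ensure is a design choice for the sketch: it guarantees the reset runs on errors, which is the reason the talk gives for having the after filter at all.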
The only reason we do it here is that if there is any error, there's no schema defined, and nobody will see my data. Because there's no schema defined, it will go to public; it doesn't know about my schema. If you look in the database and ask what tables there are, and you didn't set the schemas in the search path, it will say there are no tables in the database. So this is the way we do it with schema separation.

A few gotchas and tips for these things. First of all, multi-threading is dangerous with this approach. If you try to do schema separation with multi-threading on a connection pool, it doesn't work. I mean, it works, but you get bugs, because if one thread set the search path and then another one came in and set the search path on the same connection, we now have a different path. But aside from that, I personally recommend you not do multi-threading anyway. Like I said before, I did Windows C++ before, and I did a lot of multi-threaded programming. The oddest, hardest-to-find bugs ever are multi-threading bugs. And by the way, automated testing doesn't catch most of those, because they are very timing-related, related to a million other things, like exactly when the processor left your thread and went to another thread. Be careful with it; usually you can scale and do things without multi-threading.

Migrations must be schema-aware. If you go this way, you need to make your migrations schema-aware, which just means that before you run the migration you have to set the schema as well, and if you have multiple schemas, you have to iterate over all the schemas and run the migration for each of the schemas that you have. Site creation is best done with a SQL file, you know, because you need to create the tables. What we do when we change the schema is dump a file from the system, and then we just load the SQL back into the new schema and it creates it.

And another thing, which relates to everything and not only to this: anything that works on recognizing by
account, you need to be aware of the TLD. The Rails way of doing it, when you ask for the subdomain in Rails, takes into account only two parts for the domain, with only the .com as the TLD, the top-level domain. So google.com will be the domain and the rest will be the subdomain. But if you have google.co.uk, you'll get co.uk as the domain, and google will be part of the subdomain. So whenever you do something like that, be careful with it, and either pass in, you know, the size of the TLD, or just plan for it beforehand.

And one thing about caching: if you do caching, you need to include the account in all of the cache keys, because you don't want to return contact one of one user as contact one of another user. The account or the user must be taken into account when caching. Always remember this. You cannot cache the regular way, because you will again mix data.

And that's the end, and I'm open for questions. Yes?

What database were you using there?

Postgres. Yeah, this works in Postgres, and in any other system that supports schemas or some other way of namespacing. It doesn't work on MySQL. Yes?

I just want to take the last one. I'm using schemas for, like, if you haven't heard of it before, let your mind wander. And one thing: first of all, I will tell you to use Postgres for everything that you do, and not MySQL. But if you use it and you want to learn, the Postgres documentation is very, very good. The schema documentation is really good. All of it, even just for learning SQL, the Postgres documentation is a very, very good place to do it. And the Postgres mailing lists and forums are very, very good; the actual developers of Postgres answer newbies' questions on the mailing list.

More questions? Yes. So, the schema separation migration: shoot me an email and I'll show you. It's really, really easy, there's almost nothing to it. And again, in Postgres it's nice because migrations are transaction-aware, so if the migration didn't
complete, it will roll back to the original without leaving you with the migration half done.

The public schema affects all the... actually, it doesn't. There's no difference between public and the rest, aside from the fact that it's there by default. I know of people that actually delete it. It's just a regular schema. But if you have a regular Rails application and you run the migrations, it will create everything in public unless you did something to prevent it. One of the tricks with this is that usually the users or the accounts will sit in the public schema, so whenever you do the initial part of the login and all that, you do it on the central schema, on public, and then you go and work on the others. If the users table or the accounts table appears only in public, then even if you set the search path, as long as public is there you will see it. So Rails will see it; it doesn't even know it's in a different schema. Rails is completely unaware of it, and that's why it allows us to do this: because we set it at the level of the database server, Rails is completely unaware of what's going on below it, and it will let you do whatever you want. And that's why I said in the beginning it's easy to take an existing application, let's say you have some contact manager that you want to make multi-tenant, and by using schema separation you don't have to change a single thing in your code. Even if you didn't do scoping on the user and all that, just take it, put it on a different schema, and it's multi-tenant.

Other questions? Okay, thank you.