Alright, I'm going to go ahead and get started. Today's talk is about data sharding, specifically how to develop scalable data applications with Drupal. Can everybody see me? My name is Toby Hagler. I'm a senior web developer at Phase2 Technology, a Drupal shop based just outside of DC. One thing I do want to remind everybody: for the official DrupalCon London party, the Batman Live World Arena Tour, the buses are leaving from outside Fairfield Halls sharply at four o'clock. So I'll make this fast, so that everybody can get to the bus.

Here's what I'm going to go over today. First, the reasons you may have for sharding data and the problems you might run into. Then a couple of use cases that are generic enough that they may apply to you. Then the types of scaling and sharding: the how and the what of horizontal partitioning, vertical partitioning, and federation. Finally, some of the options you have for sharding data in Drupal.

First and foremost, you're here because you want to talk about sharding specifically for scale, so I want to talk a little bit about the difference between horizontal and vertical scaling. Just by show of hands, is anyone here familiar with the difference between horizontal and vertical? OK, good number of folks. The short answer is this. With horizontal scale, you're going out horizontally, and it's really easy to just add more machines. Think of a load-balanced environment: you've got a load balancer and three web machines behind it. When you need more horsepower, you just add a fourth, a fifth, an nth web server behind it. Vertical scale is making the machine bigger and stronger: adding more memory, more capacity, more resources to the same machine. The downside with horizontal scale is cost.
It's more costly to create more machines, to instantiate more cloud servers. With vertical scaling, no matter how much memory or how many more CPUs you put in that machine, you're going to hit a wall at some point. Sharding helps with that. So I'm going to talk a little bit about what sharding is, and specifically about its two main types, partitioning and federation. Those are pretty much synonymous with horizontal and vertical sharding, respectively. I'll talk about how sharding helps, and about the differences you have to think about when you're dealing with what is normally a monolithic Drupal database.

What exactly is sharding? Simply put, sharding is breaking a whole thing into shards, into smaller pieces. Why would you do that? Smaller pieces are more manageable, and you can divvy them up into separate physical databases. The real trick is putting everything back together seamlessly.

So what are some of the reasons you might have for sharding? Scaling your application is the first thing people think about. You might also shard because you're sharing data between multiple applications. One of the use cases I'll talk about in a minute is dealing with resumes. You're taking resumes and applicant data from your website, and you want to put that in a second database. Your HR department needs access to that data, but it does not need access to the database your website runs on. So sharding helps federate your data in a lot of ways. It also lets you leverage other technologies. Your website does not have to be just Drupal: it can be Drupal for content management and Node.js for building AJAX applications. You might also use other storage technologies like MongoDB, which I'll talk about later on. Those are some of the reasons you might shard. So how does sharding actually help?
Well, basically this maps one-to-one to the last slide. Sharding helps scale your applications because it reduces the amount of data stored in one place. One particular type of sharding, horizontal sharding, is the most difficult. It lets you split alternating rows of the same table out into separate physical databases. That's great because each data set is smaller, which implicitly means your indexes are smaller and replication is faster. That's how it helps with performance.

Back to the previous slide's reason of sharding for shared application data: how does sharding help with that? You can take secure, sensitive data, like that HR data and the personally identifiable information from resumes, and isolate it somewhere else. Your website has to play it a bit loosey-goosey with certain permissions so that its content is readily available. But if you're already going through the expense of HTTPS to accept that data, you can give the sensitive data a different transit route and store it in a more secure place. Sharding also helps you segregate your data into manageable chunks, and that's how you can leverage different technologies. I'm not really going to go into Node.js and the things you can do with it, although a lot of people are interested in playing around with it. You might also have a Python or a Ruby application that you need to share data with. Sharding your data helps you segregate that out and helps scale your applications.

Before we get into sharding specifically for performance gains, I want to make sure we've covered everything else you should be doing for performance. First of all, make sure you're using memcached. If you think about it, that really is data sharding: you're splitting out the data that normally goes into the cache tables in MySQL.
You shard it out and put it on dedicated cache servers. The same may actually be true of the Boost module. By generating static HTML files, you shard some of the responsibility for displaying pages from Drupal over to Apache. I alluded to load-balanced web servers earlier. Then there's MySQL master/slave replication. Drupal 7 is really good about letting you mark certain queries as slave-safe so that they go only to the slave. That frees your master for writes, or hopefully only for writes, and helps spread the load on your data. One thing people always forget about is Views. Once you've built views to display pages and blocks, it's sometimes a good idea for performance's sake to take the most complicated ones and turn them into custom queries, as optimized as you can get them. If you turn on the Devel module, it will show you which queries take the longest. You want to do all of these things before you really consider sharding your data, and only if accessing the data is actually the bottleneck in your site's performance.

Those are the highlights, the four or five things people always tell you to do to help scale your site, but there's a lot more you can try. For one thing, you can add more memory to your database server or more CPUs to your web servers. Remember, memory is cheap and DBAs are expensive. Sometimes it helps to just throw more memory at the problem, and maybe it'll go away for a little while. That's a perfect example of vertical scalability. You can move all of your .htaccess directives into the vhost config so that Apache doesn't have to parse them on every request. There's Apache tuning and MySQL tuning; you can go on and on. Evaluate whether you actually need all of those PHP libraries. Show of hands: who actually needs the PDF libraries that come with most installations of PHP? Maybe 10% of us in the room have really had to do anything with PDFlib.
So you can recompile or reconfigure PHP to be as bare-bones as possible to help with performance. After you've tried all of these things, you've done the 90% that is the easy work of making your site scale. At that point your web environment is probably going to look something like this. You've got your load balancer, and n web servers doing the horsepower: crunching the PHP and displaying content to users. You may have a Varnish server between the load balancer and the web servers, or Varnish may be your load balancer, or you might have a hardware load balancer. You may also have a CDN involved somewhere between the internet and your load balancer. You've got cache servers, multiple cache servers, because again, memory is cheap. And you've got master/slave replication creating database clusters. This is a pretty typical, well-balanced environment for serving high-traffic Drupal websites. Say you've done all that and you still have performance problems related to getting your data out. Maybe you have complicated data that needs joins, and those joins tie up the database until it becomes a bottleneck. That's when we want to talk about sharding.

So what are the types of sharding? There are two primary approaches to sharding your data: partitioning and federation. Partitioning is an example of the horizontal scale we talked about earlier, where you just keep adding databases horizontally next to each other, basically in a sibling relationship. Take a table, for instance the node table, which everybody is familiar with. To horizontally partition that table, all the even nids go to the first database and all the odd nids go to the second. Your data is now cut in half. Each table has half the index size, half the data size, and roughly half of the overhead involved in getting that data out.
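The even/odd split of the node table can be modeled in a few lines of Python. This is a toy sketch, not Drupal code. Plain dicts stand in for the two partitioned node tables. Each partition's ID generator mimics MySQL's auto_increment_increment and auto_increment_offset settings (increment 2, offsets 1 and 2), so the two partitions never hand out the same nid. Which partition gets the even IDs is arbitrary. The helpers `insert_node`, `load_node`, and `find_by_title` are hypothetical.

```python
from itertools import count
from typing import Dict, Iterator, List, Optional

NUM_SHARDS = 2  # two partitions: odd nids and even nids


def id_sequence(offset: int, increment: int = NUM_SHARDS) -> Iterator[int]:
    """Mimic an AUTO_INCREMENT column under auto_increment_offset/increment."""
    return count(start=offset, step=increment)


def shard_for_nid(nid: int, num_shards: int = NUM_SHARDS) -> int:
    """Shard k is assumed to use offset k + 1, so it owns every nid where
    (nid - 1) % num_shards == k: shard 0 gets 1, 3, 5...; shard 1 gets 2, 4, 6..."""
    return (nid - 1) % num_shards


# Hypothetical in-memory stand-ins for the partitioned node tables.
shards: List[Dict[int, str]] = [{} for _ in range(NUM_SHARDS)]
sequences = [id_sequence(offset=k + 1) for k in range(NUM_SHARDS)]


def insert_node(shard: int, title: str) -> int:
    """Insert on the given shard; that shard's own sequence assigns the nid."""
    nid = next(sequences[shard])
    shards[shard][nid] = title
    return nid


def load_node(nid: int) -> Optional[str]:
    """Known nid: exactly one lookup, against the owning shard."""
    return shards[shard_for_nid(nid)].get(nid)


def find_by_title(title: str) -> List[int]:
    """Unknown nid: fan out to every shard and merge the results."""
    return sorted(nid for shard in shards for nid, t in shard.items() if t == title)
```

The asymmetry is the point. A lookup by a known nid touches exactly one partition. Any query that doesn't include the nid has to fan out to every partition, which is why horizontal partitioning needs its own routing layer.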
That sounds like a good approach, but it's really the hardest thing to do. The benefits are great, though: much smaller indexes and much faster queries for the same amount of data. Federation, on the other hand, literally means creating a set of things. It's a little more conceptually logical, because you divide the data along logical lines like geography. Say you have a list of users who have submitted their resumes. You might break them into geographical regions, North America, Europe, Asia, something conceptually logical rather than every other record. Federated sets are much easier to deal with. They're also much easier to break up across multiple physical databases, so you don't have to horizontally scale your master/slave replication.

There's also a difference in what happens when something fails. With partitioned data, if one database goes down, you've lost half your data. Federated data tends to be very discrete and atomic, so if one federated database goes down, your website is still up. You just don't have access to that particular thing for the time being. People can still read about your company; they just can't apply for a job right now.

So this is the hard one. We'll get it out of the way first, because honestly, if you have to do horizontal partitioning, there may be other problems with the application you're developing, whether it's in Drupal or anything else. But a discussion of sharding isn't complete until you've talked about horizontal partitioning. It definitely helps application performance. Querying data out of your database is faster because you're dealing with smaller, leaner indexes. You have a distributed data load, so you run into fewer resource contentions. Each database handles roughly half the queries. Well, half plus one, to be honest, since some queries have to hit both.
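For contrast with the parity split, here is a small Python sketch of federation by geography, as in the resume example above. The connection keys, the region names and the tiny country-to-region table are all hypothetical. The point is only that routing uses a logical attribute of the record rather than its ID, so an outage is confined to one region's database.

```python
from typing import Dict, Set

# Hypothetical connection keys: one federated resume database per region.
REGION_DATABASES: Dict[str, str] = {
    "north_america": "resumes_na",
    "europe": "resumes_eu",
    "asia": "resumes_asia",
}

# Hypothetical, deliberately tiny country-to-region lookup.
COUNTRY_REGION: Dict[str, str] = {
    "US": "north_america", "CA": "north_america",
    "GB": "europe", "DE": "europe",
    "JP": "asia", "IN": "asia",
}


def database_for_applicant(country_code: str) -> str:
    """Route by a logical attribute (geography), not by row ID parity."""
    region = COUNTRY_REGION.get(country_code.upper())
    if region is None:
        raise KeyError(f"no federated database for country {country_code!r}")
    return REGION_DATABASES[region]


def can_accept_resume(country_code: str, down: Set[str]) -> bool:
    """An outage only affects applicants whose region lives on the failed database."""
    return database_for_applicant(country_code) not in down
```

With the European database down, a Japanese applicant can still submit a resume, and the rest of the site is unaffected.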
So quite honestly, horizontal partitioning is the shard of last resort. It's important to note that this is not the same as master/master replication. If you use MySQL MMM, for instance, that's great, and it really does help a lot. But that's still vertical scaling, because you can only do it for so long before you run into replication contention. In horizontal partitioning, your rows are divided among physical databases. The more you can spread them over different physical databases, the better it will perform for you, and if you can do that indefinitely, that's perfect horizontal scale. The downside, especially in Drupal, is that it requires custom database APIs. I'll show you in a second how complicated this gets, and you can see that I was running out of room on the slide to describe all the caveats of even and odd partitions.

Here's how this is achieved in MySQL, just so everybody knows. Normally every row inserted into an auto-increment table, like every node inserted into the node table, gets an ID one higher than the last. MySQL's auto_increment_increment and auto_increment_offset settings let auto-increment values step by something other than one. You can step by two or by three and give each server a different offset, and that's how you get even and odd tables. Then you essentially have to do round-robin querying to find the data you need. If you know the node ID, you know exactly which database to go to. That's why performance comes out at about half plus one: slightly more than half of the queries go to each database. That all sounds well and good, but your web cluster, which was already getting kind of complicated, suddenly starts looking more like this.
You'll notice that each web server now has an additional API element to it: that little thin blade sticking out of the side. That's basically the traffic cop that says you need to go to this database, or you need to go to that one. Every master and slave in your primary database needs a counterpart in your secondary database. Your tables have to be consistent across the board, because the databases are structurally mirror images of each other and simply alternate data between them. So horizontal scale by sharding between those databases sounds great in theory, but it gets out of hand pretty quickly.

So what I'm actually recommending is federation, or federated sharding. It's still vertical, so you still have a ceiling that you'll eventually hit. It lets you cut your data stock in half and replant it somewhere else. Each piece will eventually grow and hit that ceiling, and then you can cut it again and replant it. It's also perfect for sharding shared application data. Your website needs access to user-submitted data, the HR department needs access to it, and the payroll department might need access to that same data. So all of that lives in a separate database from your website. That helps with manageability and with security. Here's one security example. You can grant your primary website's MySQL user pretty much everything it needs to read, write, and update tables in that database. That's fine: if that database is lost you have a backup, and nothing dangerous is ever revealed. Meanwhile your resume database sits elsewhere with much stricter permissions. For example, a user there might only have permission to SELECT, which makes it essentially a read-only database.
You might also have a different MySQL user that talks to that database, so that if one is compromised, you're not compromising the whole. At first glance, vertically scaled databases look similar to horizontally scaled ones. The difference is that you don't need the full database cluster replicated every time. In fact, every federated database cluster you deal with can be different. In this case, we have the primary MySQL master and slave. In the second instance, we have a master and a single slave. That slave might only be used to replicate data for an internal user, so the data is read-only for them: they have read access to the database, but they can't actually touch it. A third instance might use MongoDB or CouchDB, or you might have an Oracle database somewhere. Every time you federate your data to a new data cluster, it can be completely different; it doesn't have to follow the same pattern every time. That lets you evolve your data applications as you work on them.

That brings me to application sharding. It's not just data you can shard. You can do the same thing to your application, to your website, by providing web services. Software as a service is a big thing, and essentially that's what this is: sharding the application as well. For instance, I'm sure most people have heard of Disqus, a third-party service that handles commenting. Say you decide that commenting on your site is becoming too much of a resource hog and hurting your ability to serve content. So you switch to Disqus, or Facebook comments, to provide the same functionality on your site. You're still letting users comment on your content, but you've offloaded that to a separate application environment. You can do this to shard all sorts of components of your site.
Think about content distribution networks. With a CDN, you can use Edge Side Includes (ESI) to pull in specific pieces of content. You can actually serve a static HTML website, served by Akamai or whoever. Your Drupal instance then exists purely for content management, creating just the ESI fragments that are included in that static site. Whether you use a CDN like Akamai or Varnish, these support Edge Side Includes. So in this case Drupal is pure content. It doesn't display the content; it just manages it and creates the ESI fragments that are included elsewhere. Those are some basic examples of application sharding.

Now some other sample use cases. I've talked a bit about collecting resumes on your existing site and being able to do something with that data beyond what's in your website. Another is an ideation tool, where users submit ideas for your company to consider for its next product. Other users can then vote those ideas up or down and comment on them. These are two examples of applications you can shard from your primary website. I'll get to some code samples and more technical explanations in a minute, but here's the resume case. You're taking resumes on a large corporation's website. Users submit a form, and that form data somehow magically appears in the HR department's database, where they have access to it. That way, you don't have to give admin access to every single person at the company. Only the three or four people managing the website, the webmaster, the site builder, the content editor, have access to sensitive data on the website at that point.
And the HR person, who would otherwise need an account on the website to see that information, doesn't need one anymore. They use their internal data application to read the data you collected through a different application, the website. So how would you do that? There are a couple of different approaches to sharding schema. You can use the same physical database; it doesn't have to be a separate database if you're just interested in sharing data among applications. In that case you use database prefixing to talk to a different schema. Or you can use different physical databases. Either way, you can still use the Drupal DB API, such as drupal_write_record() to write a new row or update an existing one. It doesn't have to be the same physical database your website is using, which is pretty handy.

Database prefixes are all set up in settings.php. This isn't module code; it's your Drupal instance's setup, and it's really simple. I'm sure everybody is familiar with MySQL's dot notation: you can reference a database other than the one you're currently using with SELECT * FROM database_name.table_name. That's essentially what you're doing when you prefix tables this way. It does require that the MySQL user your Drupal instance uses has permission to at least select from the other schema. In Drupal 6, it looks something like this; I think this particular example is taken from the default settings.php. In this case, we're sharing all the pertinent user information between Drupal instances. This is actually a fairly common trick when you're running multisite but want to share the same user base. You simply reference the schema name followed by a dot. If you look at the shared schema, the tables themselves wouldn't be prefixed with anything.
Drupal doesn't care. It just assumes you want it to talk to this table, that table happens to have a schema prefix, and it talks to it just fine. Drupal 7 is very similar. It's a little more complicated, but the idea is the same: in your default prefixing, you can reference the same user tables. As an aside, you can even have a Drupal 6 instance and a Drupal 7 instance share the pertinent user tables. A user can then log into the Drupal 6 site and still be logged into a separate Drupal 7 install, with a couple of caveats. First, the user tables have to be updated to be compatible with Drupal 7: the password hashing scheme is a little different, and there are some other things going on with sessions. Once you've modified the tables to be Drupal 7 compatible, Drupal 6 still reads them fine, unless a password has been rehashed by Drupal 7. So you should always log in on the Drupal 6 site. Redirect all user/* paths on the Drupal 7 site to the Drupal 6 site, so that everybody always logs in there. The domain names need to be the same or similar, and the cookie domain needs to be set so the two sites are compatible. Then when you log in on the Drupal 6 site, through these shared user tables, you'll be logged in on the Drupal 7 site as well.

One reason you might actually want to shard applications like this is to take advantage of what Drupal 7 has to offer, like new modules, without upgrading your whole site. You can create a "sitelet," so to speak. Use something like mod_proxy to serve your Drupal 7 site, which lives on a separate tier, as if it were part of a multisite. That's how you'd handle it if you wanted to share the same physical database but use different schemas.
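Here is a small Python sketch of a schema-qualified prefix that runs without a MySQL server. It uses SQLite, whose ATTACH DATABASE provides a second named schema. Tables in that schema can be referenced as `shared.users`, much like MySQL's `database.table` notation. The prefix map mirrors the shape of a per-table prefix array like Drupal's, but `prefixed()` and the table layouts are hypothetical. Drupal performs this substitution itself when it rewrites `{table}` placeholders in queries.

```python
import sqlite3
from typing import Dict

# The site's own database, plus a second schema attached under the name "shared".
# SQLite's ATTACH stands in for a second MySQL schema on the same server.
site = sqlite3.connect(":memory:")
site.execute("ATTACH DATABASE ':memory:' AS shared")

# Tables in the shared schema carry no prefix of their own.
site.execute("CREATE TABLE shared.users (uid INTEGER PRIMARY KEY, name TEXT)")
site.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, title TEXT)")
site.execute("INSERT INTO shared.users (uid, name) VALUES (1, 'admin')")
site.execute("INSERT INTO node (nid, uid, title) VALUES (10, 1, 'Hello')")

# Per-table prefix map, in the spirit of a settings.php prefix array.
PREFIXES: Dict[str, str] = {"default": "", "users": "shared."}


def prefixed(table: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Return the table name with its configured prefix (hypothetical helper)."""
    return prefixes.get(table, prefixes.get("default", "")) + table


def node_authors():
    """Join a local table against a table that lives in the other schema."""
    sql = (
        f"SELECT n.title, u.name FROM {prefixed('node')} AS n "
        f"JOIN {prefixed('users')} AS u ON u.uid = n.uid"
    )
    return site.execute(sql).fetchall()
```

In MySQL the equivalent cross-schema join only works when both schemas live on the same server and the site's MySQL user has at least SELECT on the shared schema.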
Now, for true performance gains, you're probably going to want to use different physical databases. You can set up connections in settings.php, and in Drupal 7 you can also set them up in your module code. db_set_active() is the magic function that handles everything for you. It switches which database connection is being used. You can still use all of the Drupal API, and then you set the connection back to the site's original connection. The thing you really have to watch out for is schema caching. I'd put that in bold, because you will get burned while you're developing this. Any kind of error can trigger an error cascade. If you have a PHP error or PHP warning, Drupal tries to insert it into watchdog, and the watchdog table isn't there. That raises another error, which it also tries to write to watchdog, which still isn't there, and it just keeps going downhill from there.

In Drupal 6, it's pretty much as simple as this. Your $db_url is usually a string, but it can also be an array. That's not very well documented, but you can make it an array. In Drupal 6 these do have to be the same type of database, though. If your primary database is MySQL, all the secondary and tertiary databases also have to be MySQL; you can't mix PostgreSQL and MySQL and things like that. If you do want to do that kind of thing, that's when application sharding comes in handy. In Drupal 7, you specify the database connection information in the settings file, and it looks like this. One of the keys is 'default', and you can add more connections alongside it. Each connection can even have its own table prefixing.
In Drupal 7, that connection definition can also live directly in your module code. Say you have a seasonal application, like collecting children's wish lists for Santa, something you'll only run for three months out of the year. You might keep the connection in module code rather than in settings.php to keep connection overhead down. It's pretty much as simple as using Drupal's Database class to add the additional connection information with Database::addConnectionInfo(). Then you go about your business: set the active database, execute your queries, and switch back. That's pretty much all you have to do.

Once you've created your connection information, you first want to load the schema for the tables you're going to write to. These are the tables you've defined in your module's install file. It's not explicitly required that you call drupal_get_schema(), but trust me, that's what you want to do. Here's why. When you call db_set_active() and connect to a new physical database, you're talking to a completely different database. Then you call something like drupal_write_record() to write a row of data to a table. The first thing drupal_write_record() does is look up the schema definition for that table from cache. It looks in the static cache first. Because we didn't load the schema beforehand, it's not there, so it looks in the cache tables. Those cache tables don't exist in your other database, so it freaks out and decides it needs to warn the administrator about this. So it looks in the system table to find out where errors should go.
The system table isn't going to be there either, so it falls back to watchdog. That's where you get the downhill slide of errors that will just ruin your day. That's why you want to load the schema first: that way it's in the static cache, and you don't run into those problems. Then you switch your database, execute your queries, do whatever has to be done, and immediately switch back before doing anything else. You'll notice that the second time we call db_set_active(), we don't pass a parameter; it just assumes 'default'. That's how you save data in another database.

So what are the advantages of switching database connections in the first place? You can still use all the Drupal schema definitions that come from your module's install file, and you still have all of the database APIs that everybody knows and loves. It also keeps your website's database smaller, which is great if you have master/slave replication, because it helps keep replication lag down. Without that, you might write something to the database and then try to read it on the next page load, and it may or may not be there yet because of replication lag. And it just makes things more manageable. With less data in your website's database, it performs a little better as well.

In our resume use case, the resume is submitted through the form on the website. The applicant has filled everything out and uploaded their actual CV, their Word doc, PDF, whatever they have. The form's submit function takes the data, which has already been validated at this point, and that's where all of this happens. It'll load the schema and connect to the HR instance of MySQL.
It'll write or update the record, depending on whether this is a new resume or an update to an old one, and then immediately switch back. That way, you don't have to worry about reports running or exporting that data. The HR director, or whoever manages that sort of thing, has the updated resume in their database right away. You're not storing data in your website's database that the website itself doesn't need, and your HR director has that information immediately. It's also more secure, because the data isn't physically stored in the same database as the rest of your website. If your website gets owned, the attackers don't have access to data outside that database. At least that's the hope.

So that's if you want to use MySQL. There are lots of other options. You can use Oracle; lots of corporations have legacy internal applications that use it. You can also use MongoDB or CouchDB. These are NoSQL databases. Is anybody familiar with NoSQL? Good. Yeah, it's a lot of fun. Essentially, it's schemaless. There's no table definition like you're used to in MySQL, and there are no columns; you're just shoving documents into the database. MongoDB stores those documents in BSON, which is binary JSON, and CouchDB stores them as JSON. You can query them much as if you were working with a JSON object. The syntax is very JavaScript-like, which is one of the reasons it plays so well with Node.js, which is also JavaScript. MongoDB is very fast, and for read-heavy workloads it can deliver significantly faster reads than MySQL. Say you're building something like an ideation tool, where writes are occasional: somebody submits an idea, or votes on one.
But most of the time, people just want to read the different ideas and see how their idea is coming along with all the votes. That's fast for two reasons. First, the data lives in a separate database that's only doing that. Second, it's document-based, so you get everything all at once in one document. That's not specifically because you're using MongoDB; document databases are geared toward storing documents, so the data tends to be very denormalized. I'll show you some examples. If you want more information about MongoDB and you're in or around London, 10gen, the folks who build MongoDB, have a conference here next month. They actually have several Drupal-related sessions, so there's going to be some good crossover there. Unfortunately I can't make it, but it should be pretty good.

For working with MongoDB in Drupal, there's the MongoDB module. It's been around for a little while, and there's a D6 and a D7 version. The Drupal 7 version is really, really good and mature. It has the basic MongoDB API, plus submodules that let you store all your field content, your cache, and your sessions in MongoDB rather than in MySQL. Take fields: once you've created a node, you're not really doing a whole lot of writing to it. So you want that data to be as read-friendly as possible, and if you store it in MongoDB, the fields will load quicker. You can also use this module to create your own connections to MongoDB at any of its four object levels. When you're dealing with Mongo connections, you have the connection itself, which is an object. Then you have the database, which is the equivalent of a schema in MySQL.
You have the collection, which you can think of as a table in MySQL. That's the collection of all of those particular documents: the entire collection of your resumes, the entire collection of your comments, the entire collection of ideas in your ideation tool. Then you have the cursor object, which holds the results from a query. The PHP drivers for Mongo tend to be very object oriented, so if you're comfortable with that style, they're a good fit. So a sample MongoDB document might look like this. Again, if you're familiar with JSON, there are no surprises here. It is JSON format; technically it's BSON, because it's stored as binary. This is a sample document that you might have in MongoDB. Now, say this guy didn't have an existing title and didn't fill out an address; then that information would simply not be present in the document for his resume. Then, to query using the MongoDB module, you'd write something in your module that looks like this. This gets you the collection object that you see here. Then applicant is the actual cursor that you create by calling find on the applicants collection. So, just a real brief intro to the syntax. The first array says I'm looking for surname Smith, with a social security number. The social security number condition is just a one-or-zero flag: has this applicant put in a social security number or not. Say that's all you're looking for in your search. Then the second part says just give me back the first name and the last name. That's all that's going on in this query. That's all well and good: you can write a custom module in Drupal that gets back some of the resume results.
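The slide's PHP isn't reproduced in this transcript, so here is a rough, hypothetical Python emulation of the query semantics just described: schemaless documents where missing fields are simply absent, a filter on surname plus "has a social security number", and a projection returning only the name fields. The field names (`surname`, `ssn`, `first_name`) and sample values are made up. A real driver does this server-side (PyMongo, for instance, exposes `Collection.find(filter, projection)`) and supports far more operators; this sketch handles only equality, `$exists`, and inclusion projections.

```python
def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`.

    Supports only plain equality and {"$exists": flag}.
    """
    for field, condition in query.items():
        if isinstance(condition, dict) and "$exists" in condition:
            if (field in doc) != bool(condition["$exists"]):
                return False
        elif doc.get(field) != condition:
            return False
    return True


def project(doc, projection):
    """Inclusion-only projection: keep fields flagged 1; `_id` stays unless set to 0."""
    if not projection:
        return dict(doc)
    keep = {field for field, flag in projection.items() if flag}
    if projection.get("_id", 1):
        keep.add("_id")
    return {field: value for field, value in doc.items() if field in keep}


def find(collection, query, projection=None):
    """Yield projected copies of matching documents, like a simple cursor."""
    for doc in collection:
        if matches(doc, query):
            yield project(doc, projection)


# Hypothetical sample data; note the documents don't share a fixed schema.
applicants = [
    {"_id": 1, "first_name": "John", "surname": "Smith", "ssn": "000-00-0001",
     "title": "Developer", "address": {"city": "London"}},
    {"_id": 2, "first_name": "Jane", "surname": "Smith", "ssn": "000-00-0002"},
    {"_id": 3, "first_name": "Bob", "surname": "Smith"},  # no ssn entered
    {"_id": 4, "first_name": "Ann", "surname": "Jones", "ssn": "000-00-0004"},
]

for row in find(applicants,
                {"surname": "Smith", "ssn": {"$exists": 1}},
                {"first_name": 1, "surname": 1}):
    print(row)
# {'_id': 1, 'first_name': 'John', 'surname': 'Smith'}
# {'_id': 2, 'first_name': 'Jane', 'surname': 'Smith'}
```

Keeping `_id` by default mirrors MongoDB's own projection behaviour; pass `"_id": 0` to drop it.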
You might have something similar going on in the HR department: they have their desktop application or some sort of internal intranet application, which might be built with Drupal, with Open Atrium or something like that, where you're getting that resume data. But here's the interesting thing: you don't actually have to get the data out with a Drupal module at all. Mongo has a very simple REST API built in. So if you're on the same domain, you can use Ajax, jQuery, whatever JavaScript you're comfortable with in the browser, to make an Ajax request to the MongoDB servers themselves, hitting the REST interface, and pull that data out as JSON. If you need something more complex, you can write your own. You can also use Sleepy Mongoose, which is built in Python, or mongodb-rest, which is a Node.js implementation that sits on top of Mongo and lives on the database servers. So in this case, sharding has helped you by letting you build a JavaScript application that's fairly static. It's static HTML at this point, and you're bypassing the web server entirely once you've served that initial page, because you can query the databases yourself with JavaScript. For those of you that aren't familiar with REST, it really is as simple as making a request to a specific URL. In the first path, ideation is the database and ideas is the collection of documents that you're dealing with. The second sample here just says, hey, I want all of the comments out of the ideation database whose parent ID matches that long ID. To do anything more complicated, you're likely going to need a dedicated MongoDB REST interface, but check out Sleepy Mongoose and the Node.js version; they're both pretty good. Anyway, that's enough about Mongo itself. I talked earlier about dealing with applications on separate web tiers.
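Returning briefly to the REST examples above: a small, hypothetical Python helper that builds request URLs in the shape just described (database, then collection, then an optional filter). The URL format (`/<database>/<collection>/`, `filter_<field>=<value>` query parameters, default HTTP port 28017) is an assumption based on MongoDB's legacy built-in HTTP interface, which later MongoDB releases removed. The host, the `parent_id` field name, and the assumption that the parent ID is stored as a plain string are all illustrative; the browser would issue the same GET from JavaScript.

```python
from urllib.parse import quote, urlencode


def rest_url(base, database, collection, filters=None, limit=None):
    """Build a URL for MongoDB's legacy simple REST interface (assumed format).

    Each filter becomes a `filter_<field>=<value>` query parameter.
    """
    path = f"{base.rstrip('/')}/{quote(database)}/{quote(collection)}/"
    params = [(f"filter_{field}", value) for field, value in (filters or {}).items()]
    if limit is not None:
        params.append(("limit", limit))
    return path + ("?" + urlencode(params) if params else "")


BASE = "http://db.example.com:28017"  # hypothetical host

# All documents in the "ideas" collection of the "ideation" database.
print(rest_url(BASE, "ideation", "ideas"))
# http://db.example.com:28017/ideation/ideas/

# Comments whose (hypothetical) parent_id field matches a given idea.
print(rest_url(BASE, "ideation", "comments", {"parent_id": "4e3f0a"}))
# http://db.example.com:28017/ideation/comments/?filter_parent_id=4e3f0a
```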
So keep in mind that application sharding is effectively the same thing as data sharding. Once you've separated out the application, wherever that application lives, it has to talk to its data in a different location anyway. So where you write your data and where you read it from is really inconsequential as far as your primary Drupal website is concerned. It doesn't have to be the same database. In fact, you can have multiple Drupal instances: one Drupal instance that just manages your website, another that manages your ideation tool for collecting new ideas about products, and another that just collects resume data. For whatever other use case you can think of, you can create a separate Drupal instance that's very lightweight, lean and mean, and does just that one thing, rather than building yet another module into a monolithic Drupal website that runs everything. So how would you do that? At the very least, you could use mod_proxy to create a new path on your primary Drupal website, /applications or whatever you like: example.com/applications is now actually a mod_proxy reference to this other web tier entirely, but to the user there is no difference. This slide, proxy web clusters, is just an illustration of your original web tier. These look similar, but you can create completely different web architectures for everything behind the load balancer. So you have your primary Drupal website, represented by the large cluster on the left. Then you have your ideation tool and your applications tool; each can be a separate web tier, and they're all served out of the same website as far as the user can tell, even though they're actually sharded into separate applications. So, in short, those are some of the different techniques you can use to take advantage of sharding. Does anybody have any questions?
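The path-based routing behind the proxy setup described above can be sketched as a pure function. This is not mod_proxy itself (in Apache you'd express each rule with a `ProxyPass` directive); it's a hypothetical Python model of the decision a reverse proxy makes, with made-up backend hosts. Choosing the longest matching prefix is a design choice here so that more specific paths win regardless of table order; Apache instead uses the first matching rule, which is why specific rules are conventionally listed first.

```python
# Hypothetical path prefixes and backend tiers; none of these hosts are real.
ROUTES = {
    "/applications": "http://apps-tier.internal",
    "/ideas": "http://ideation-tier.internal",
}
DEFAULT_BACKEND = "http://main-drupal-tier.internal"


def route(path, routes=ROUTES, default=DEFAULT_BACKEND):
    """Return the backend URL that a request path would be proxied to.

    The longest matching prefix wins; anything unmatched goes to the
    primary Drupal tier.
    """
    for prefix in sorted(routes, key=len, reverse=True):
        # Match "/applications" and "/applications/...", not "/applicationsfoo".
        if path == prefix or path.startswith(prefix + "/"):
            # Strip the public prefix so the backend sees its own root.
            return routes[prefix] + (path[len(prefix):] or "/")
    return default + path


print(route("/applications/apply"))  # http://apps-tier.internal/apply
print(route("/about"))               # http://main-drupal-tier.internal/about
```

To the user, every URL still lives under example.com; only the proxy knows which tier answers.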
Anything that you want to see more of or talk more about? That's a really, really good question, and I thought about talking about that. So the question was, how would you hook that into Views and some other really common modules? That is really tricky. If you're dealing with MySQL data in a separate schema, in that very first instance, where it's a separate schema on the same physical database server, it's very easy: your table name is going to be prefixed with the database name. So Views is not going to know any different; it's going to work identically to how it did when everything was in the same schema. If it's in a separate physical database, that's where it gets a little tricky. In that case, the module that actually manages talking to the separate database, the one that calls db_set_active, does the queries there, and switches back, is also going to be responsible for maintaining a lightweight lookup table in your primary Drupal database. So you create a sort of stub table that contains just the minimal information Views needs to at least see the data. If you're on Drupal 7, you can define a handful of entities that Views will know about; you define the entities, but you don't actually have to store the information there. So you've created the definitions that Views is going to be aware of. Then you can use Views' alter hooks: whenever that view comes along, you can alter the results of the Views query by injecting the data from the other physical database. You would take basically the same approach with MongoDB. If you're using Oracle or anything else to store data, you basically just have to handle each one as a separate case. Any other questions? Yes.
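A minimal sketch of the stub-table approach just described, assuming a toy in-memory registry of "databases". `ConnectionRegistry`, `active_database`, `inject_remote_fields`, and the table names are all hypothetical; in Drupal 7 the real switch is `db_set_active()`, and the merge step would live in a Views alter hook. The Python only mirrors the control flow: switch to the other database, fetch every record the page needs in one pass, always switch back (the context manager guarantees this even if the query raises), then merge the remote fields into copies of the stub rows.

```python
from contextlib import contextmanager


class ConnectionRegistry:
    """Hypothetical stand-in for a connection registry; not a real Drupal API."""

    def __init__(self, databases):
        # Each "database" maps table name -> {primary key: row}.
        self.databases = databases
        self.active = "default"

    def set_active(self, key):
        # Same shape as db_set_active(): switch, and return the previous key.
        previous, self.active = self.active, key
        return previous

    def table(self, name):
        return self.databases[self.active].setdefault(name, {})


@contextmanager
def active_database(registry, key):
    """Use `key` inside the block, and always switch back, even on errors."""
    previous = registry.set_active(key)
    try:
        yield registry
    finally:
        registry.set_active(previous)


def inject_remote_fields(registry, remote_key, remote_table, stub_rows):
    """Hypothetical post-query alter step for a view built on stub rows."""
    ids = [row["entity_id"] for row in stub_rows]
    with active_database(registry, remote_key) as db:
        records = db.table(remote_table)
        # One batched lookup for the whole page, not one query per row.
        remote = {i: records[i] for i in ids if i in records}
    # Back on the primary database; merge without mutating the stub rows.
    return [{**row, **remote.get(row["entity_id"], {})} for row in stub_rows]


registry = ConnectionRegistry({
    # Primary Drupal DB: stub table with just enough for Views to list rows.
    "default": {"resume_stub": {101: {"entity_id": 101}, 102: {"entity_id": 102}}},
    # Separate HR database holding the full resume records.
    "hr": {"resumes": {101: {"surname": "Smith", "title": "Developer"},
                       102: {"surname": "Jones"}}},
})
stubs = list(registry.table("resume_stub").values())
for row in inject_remote_fields(registry, "hr", "resumes", stubs):
    print(row)
# {'entity_id': 101, 'surname': 'Smith', 'title': 'Developer'}
# {'entity_id': 102, 'surname': 'Jones'}
print(registry.active)  # default
```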
Okay, so the question is: I talked about the read performance of MongoDB, but I didn't really talk about how writing to MongoDB compares to MySQL. It's still faster; quite a bit faster, in fact. I don't have numbers, and that's a really good question; I wish I'd thought to get some numbers for you on that. In short, MongoDB writes still outperform MySQL writes, for a couple of reasons. One, in Drupal, most of your writes actually span multiple tables because your data is normalized, so a single save means writing to several tables. In MongoDB, you've very likely denormalized your data, so you're writing one document one time. So it's a little unfair to say that MongoDB itself is faster; your writes are also faster because of how your schema is designed. But apples to apples, if you're writing the same sort of documents, MongoDB writes do fairly well as long as you have very few indexes. Where MySQL regains ground is when you're doing a lot of updates to existing documents and you have a large data set. Now, with that said, MongoDB's threshold for a large data set is usually about four to five times bigger than MySQL's. In MySQL, you start to notice performance drops when you hit one to two million rows; that's where you really start seeing degraded performance. Mongo tries to do everything in memory, so it does as little as possible to write to disk; the disk is usually freed up, and you don't have file I/O contention. With MongoDB, you might get to 10 to 15 million documents of the same type you'd store in MySQL before you start seeing performance issues. So did that answer the question? Okay. All right, anything else? All right. If you do think of more questions, here's my contact information: thagler@phase2technology.com. You can follow us on Twitter at Phase2.
I actually don't have it on here, but my Twitter is thagler, the same as my email address and my username on drupal.org. These slides are going to be available on agileapproach.com, which is Phase2's technology blog, as well as on the session page for this conference. All right, thank you.