I'm kidding, I'm kidding, it's not that many. All right, hopefully you had a nice lunch. I certainly had. It was definitely necessary after last night's events. I'm happy that I still have my voice, because we ended up singing karaoke in some bar, but that's beside the point. All right, so I'm talking about MongoDB, giving a MongoDB introduction. Initially, I submitted two talks: a MongoDB introduction talk and a MongoDB and Drupal talk. They picked the non-Drupal talk, so if you don't see too much Drupal information here, that's because that's the talk that was picked. If you have any questions, feel free to raise your arm or hands or legs if that's easier. You can ask them now, or email them, or ask them on Twitter, things like that. All right, let's get started. Just for my perspective, how many of you have played with MongoDB? Okay, that's a fair few. And how many of you are using it in production at some point? Okay, so the people that are running it in production might not get a whole lot out of this talk, because this is a MongoDB introduction; if you run it in production, hopefully you already know something about this product. Sometimes you wonder, though. Anyway, I'm going to explain where MongoDB comes from, when it's useful, and how it could be useful for Drupal. All right, a little bit about me. I'm Derek, I'm Dutch but I live in London. I'm one of the PHP driver maintainers for MongoDB, hence the t-shirt. I've also written a whole lot of other PHP extensions; you might have heard of that one. I've also done that, and I love maps in any sort of form. There are still a few more seats in front, by the way. All right, so where MongoDB comes from is... well, there are lots of very fast key-value stores out there. Memcache and Redis are all the way up in the top left. They don't have a lot of functionality: you basically can only store things with a key and a value. 
On the other side, on the bottom right, we have the relational databases. They have a lot of functionality. Most of those things you would never need, but they have a lot of functionality, and with that comes a penalty in making it faster and more scalable. So what we try to do with MongoDB is not to be a key-value store, but to have almost all the functionality of a relational database — we don't have all of it, of course — while still being very easy to scale and really fast. That is the underlying thought behind MongoDB; how well we're doing with that is, of course, up to the users to decide. All right, a bit of terminology, in case you're familiar with relational databases. How many of you have never played with a relational database? Okay, good, then this will map nicely. In MongoDB, we don't have tables and rows and views and joins and things like that; the terminology is quite different. We still have a database, but a database doesn't contain tables, it contains collections. And a collection stores not rows but documents. Even though a collection is very similar to a table, a document is not similar to a row. A document in MongoDB can have a very rich data structure; it is not only fields and values like you have in a relational database. Something interesting is that documents in a specific collection don't need to have the same fields. They can, but they don't have to. Indexes we still have, and they are just as important as in a relational database: without indexes, things will be slow. All right, let me turn that off. I usually think about that, but it's gone now; we'll see how that goes. And yeah, something that MongoDB doesn't have is joins. You can't write a query to join two collections together. Instead of that, we have something called embedded documents, and I'll get back to that in a second. Okay, documents in MongoDB are stored on disk, just like in any sort of database. 
The format they are stored in is Binary JSON, or BSON. It's a format created to be very fast to convert to JSON or JavaScript, and it's binary because it's compact. We also had to add a few extra data types to it, such as IDs and dates, which JSON doesn't have. The primary idea is that you can just talk JSON to the database, and you'll see this reflected in most of the driver APIs as well. So in the PHP API, you don't actually have to write JSON, but you will create structures that are very similar to JSON. In many cases in the documentation, you'll see JavaScript or JSON structures as examples, and converting those to PHP is really simple; I'll show it in a second. Now, every document needs to have an _id field. It's your unique key. There's an index on that field by default, and you can't remove it. That noise is getting annoying... oh, I know what it is. Google Talk. Right, so here's a simple document with three fields. You have _id, which has a really ridiculously long object ID. That's the default value it gives it, but you can set your own values for _id as well. Then we have other fields like handle and name, which are my Twitter handle and my name, or my drupal.org username if you want. But what you can also do is make richer documents. In this case, I have three fields again: _id, name, and talk. But talk is not just a value; it is actually an array of values. So if, for example, in Drupal you have a node where a field, say author, can have multiple values, in MongoDB you can just store those as part of your document without having to link to another table. This is the embedded document that I spoke about, and in most cases it means that you don't have to do any joins. Each of those JSON objects in the array again has two fields, title and URL. Okay, so if you design a MongoDB schema, you still need to put in some thought on how you want to do this. Take, for example, a blog system. 
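The two documents described above can be sketched as plain JavaScript structures, which is essentially what you type into the MongoDB shell (the field values here are illustrative, not from the actual slides):

```javascript
// A flat document: _id, handle, name. The _id value shown here is a
// placeholder; normally the driver generates an ObjectID for you.
const simpleDoc = {
  _id: "507f1f77bcf86cd799439011",
  handle: "derickr",
  name: "Derek"
};

// A richer document: "talk" is an embedded array of sub-documents,
// each with a title and a URL, so no join to a second table is needed.
const richDoc = {
  _id: "507f1f77bcf86cd799439012",
  name: "Derek",
  talk: [
    { title: "MongoDB Introduction", url: "http://example.com/intro" },
    { title: "MongoDB and Drupal",   url: "http://example.com/drupal" }
  ]
};
```

In the shell you would insert these with `db.collection.insert(richDoc)`; the embedded array is just part of the document.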
It is a very simple thing. We have a blog, and a blog contains blog entries, and blog entries have comments and tags and authors. If you want to model that in a relational database, you often end up with five tables. In the Drupal case, fields are all stored in their own separate table, for example. And, yeah, that is how it makes sense to do it if you have a relational database. With MongoDB, you just don't do it like that. There's no way to transform a relational database schema into a MongoDB schema automatically, because you need to put thought into how you want to do that. One example of how you could do this is by creating two collections: a user collection and a blog collection. In this case, the blog collection has a name, a URL, and an author, and then one of its fields holds the blog entries as an array. Each of those blog entries can then have comments embedded in another array. Tags can be set, in this case, only on the blog and not on the blog entries, but that's just how this example is. So in this case, you have two collections. How exactly you want to do this depends a lot on how you access your data and how you update your data, because all those decisions have performance implications. There's no guaranteed correct way of doing it, unlike in relational database theory, where there is exactly one correct way of doing it: a fully normalized database schema. Fully normalized is really nice, but it's not going to be the fastest, whereas this focuses on modelling your data in such a way that it's easy to access, as opposed to avoiding duplicated data. All right, a few things that I have to mention, though. There are no joins between collections, but you often don't need them because of embedded documents. And the only thing that is atomic is a write to a single document. There are also no transactions. 
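The two-collection blog schema described above might look like this as a sketch (field names and values are illustrative; the talk's slides may differ in detail):

```javascript
// The users collection: one small document per user.
const user = { _id: "u1", name: "Derek" };

// The blogs collection: entries and their comments are embedded,
// so fetching a blog needs no joins at all.
const blog = {
  _id: "b1",
  name: "My Blog",
  url: "http://blog.example.com",
  author: "u1",              // a "link" to the users collection: just an ID, no foreign key
  tags: ["php", "mongodb"],  // tags live on the blog, not on the entries, in this example
  entries: [                 // embedded array instead of a blog_entries table
    {
      title: "First post",
      comments: [            // comments embedded one level deeper
        { author: "u1", text: "Nice post!" }
      ]
    }
  ]
};
```

Whether comments belong embedded here or in their own collection depends on your write patterns, which the talk comes back to later.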
The reason for that is that if you introduce transactions and you have a cluster with multiple nodes, it makes things really difficult and not actually faster. So that is the trade-off made by not having transactions. You can sort of work around this by doing a few tricks, but yeah, it's not there. Okay, here's a very simple example of a database schema. Say I want to store shapes: circles, squares and rectangles. All of those shapes will have an area, but a circle will only have a radius, a square will only have a diameter, and a rectangle has both a length and a width. Now, if you want to model this in a relational database, one of the options looks something like this, right? You have a table called shapes, where the ID is the ID of the shape. Then you have the type, and then you have the area, which is common to all of your shapes. But the other columns — radius, diameter, length and width — only apply to specific types of shapes, so you end up having lots of NULL values. In MongoDB, you just don't store the fields that you don't need, so they don't take up any space in your collections either. And this is perfectly normal: not all the documents in a collection have to look the same, and in many cases it makes sense not to have them look the same. A bit of a better example is probably storing lots of products, like Amazon, which will store both computer parts and books. Now, books will not have CPU speeds, and neither do computer parts have a number of pages; that simply makes no sense. But it makes a lot of sense to put it all in one collection, because it makes it very simple to index on specific fields, as well as to search the whole collection and find all products that have the word Drupal in them — which will probably not be computer parts, but books and t-shirts at least, for example. So, yeah, that's where the strengths come from. 
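The shapes example can be sketched as one collection where each document carries only the fields that make sense for its type — no NULL placeholders, unlike the single relational table:

```javascript
// One "shapes" collection; each document has only the fields its type needs.
const shapes = [
  { type: "circle",    area: 28.27, radius: 3 },
  { type: "square",    area: 16,    diameter: 4 },
  { type: "rectangle", area: 12,    length: 4, width: 3 }
];

// Unlike the relational table, a circle simply has no "length" field at all,
// rather than a NULL value taking up a column.
const circle = shapes.find(s => s.type === "circle");
```

A query on a field only some documents have, like `db.shapes.find({radius: {$gt: 2}})`, just matches the documents that carry that field.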
And if you're using Drupal, then you have different content types, and it's perfectly okay to store nodes of similar, but also different, content types in one collection. It's not a problem at all. There's no maximum size on collections, and although indexes get larger, it's often better than storing very strict documents in different collections. So, another small example. Like I said, I love maps, and I do lots of OpenStreetMap stuff, so this is another example of how you would store data. In OpenStreetMap there are three different kinds of objects: nodes, ways, and relations. Every object will have an ID field; in this case, it's 4442243. And then each object will have tags. Tags say what sort of thing it is: whether it's a road or a building or a river, and things like that. Some of them have names, but not everything has a name; some buildings are just building blocks, they don't have a name. And because this is a data set where every object can have its own distinct tags, it makes a lot of sense to store it in a NoSQL database, because you don't have to come up with a difficult schema or do 16 different joins. If any of you have ever used Magento, you know how interesting their EAV model is. Not very interesting, actually: it's horrible and slow. Whereas in MongoDB, you can just put everything in there, so you don't have to deal with that. A very simple way of storing this would be just converting the tags to fields. Another way of doing it would be to collect all the tags besides name in a separate subarray. Whether you want to do that is all up to you. Sometimes it makes sense, sometimes it doesn't; it just depends on how you want to store your data, really. Now, after inserting data, you can obviously also query it, which is rather useful in a database. In the JavaScript shell, what you'll see is that we always use db. That is not the name of the database. 
It's just db, because that's the current database object. Then .poi, the name of your collection, which stands for point of interest, and then you call the method find. If you look at this on the PHP side, $db will be the variable pointing at your database, and then, instead of the dot, you use the arrow to access the collection and then the arrow to call the method. All right, I'll get back to the _id — and of course I should repeat the question, before I forget it. So, a very simple query: I'm looking for documents by tags.highway. This is already an interesting thing, because you can actually do queries on subfields; you don't have to do the queries on the top-level fields in your collection. It's perfectly fine to query either name or tags.highway; for MongoDB, it doesn't matter. It only matters because you need to set an index on those fields as well if you want to make that fast. And, yeah, you just query by supplying a document that you want the documents in your collection to look like. So, in this case, it basically shows me every document where tags.highway equals secondary. You also have some special operators, such as $exists for checking whether a field exists, and there's a whole lot more operators that you can use. All right. What is important, though, is that in this case I want to find all pubs, because that's what people like to do in England, find pubs. And that's something that works really well with OpenStreetMap, because they also like pubs. Strange, isn't it? And if you want to find them with my database schema — say I query tags.amenity equals pub — then if you use explain, which is very similar to MySQL's explain, it shows you the query plan: which indexes are being used, how many documents are returned, and how many documents are scanned, scanned meaning scanned on disk. 
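The dot-notation matching described above — querying a subfield like `tags.highway` by supplying a document to match — can be mimicked with a tiny matcher in plain JavaScript, just to illustrate the semantics. This is a sketch, not the real query engine:

```javascript
// Walk a dot-separated path like "tags.highway" into a nested document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// A query is a document of path -> expected value pairs, e.g.
// { "tags.highway": "secondary" }, and every pair must match.
function matches(doc, query) {
  return Object.keys(query).every(path => getPath(doc, path) === query[path]);
}

const road = { name: "Some Road",  tags: { highway: "secondary" } };
const pub  = { name: "The Crown", tags: { amenity: "pub" } };
```

With these, `matches(road, {"tags.highway": "secondary"})` is true and `matches(pub, {"tags.highway": "secondary"})` is false — the same shape of query you'd hand to `db.poi.find()`.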
So, in this case, n is 3,895 items, which means there are that many pubs in my database — which I understand is quite a lot, and it's just London — but it has scanned two to three million objects in my database. Cursor says BasicCursor, and that means a full table scan, something you don't want to do, just like you don't want to do it in MySQL. So, in order to fix that, we create an index on it. An index you create with a command called ensureIndex; again, in PHP, change db to the variable pointing to your database, use the arrows instead of the dots, and convert the curly braces to the array keyword — that's how you transcribe the JSON into PHP. So, I've created an index on tags.amenity with value 1, and value 1 means sorting ascending. You can also use -1, which sorts descending, or you can use 2d for geospatial indexes; I won't go too much into those. Then, with the index set, I run the explain again, and you see that nscannedObjects now equals the number of documents being returned, which is of course awesome, because it could use the index. So, I've created an index on amenity; then I also figure out that I need to find things by highway, so I create an index for that; and then I want one on reference number. And you can see this is not a very clever way of doing it, because it means that for every tag that I want to search really fast on, I need to create an index. You can only have 64 indexes per collection — you will run out of indexes. Also, lots of indexes make things slower, because on every insert, update and delete, all the indexes have to be updated as well. So thinking about the data schema is important. Now, what I have here is running in PHP, and you might see the square brackets: in PHP 5.4, the opening square bracket is equivalent to the array keyword with an opening parenthesis, and the closing square bracket at the end is the closing parenthesis. 
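The index naming convention that shows up later in explain output — field names and sort orders joined with underscores — can be sketched as a small helper. This is an illustration of the convention, not code from the MongoDB source:

```javascript
// MongoDB derives a default index name from the key specification:
// { "tags.amenity": 1 } -> "tags.amenity_1",
// { countryCode: 1, population: 1 } -> "countryCode_1_population_1".
function indexName(spec) {
  return Object.entries(spec)
    .map(([field, order]) => field + "_" + order)
    .join("_");
}
```

So the name you see in explain's cursor field, like `BtreeCursor countryCode_1_population_1`, is just this concatenation of the `ensureIndex` arguments.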
If you use MongoDB, that syntax is really, really useful to have, so that's why most of my examples will use it. So, I have a little document here with name and tags, and in this case I don't set an _id myself, so there's no _id field that I set. If you don't set an _id field, the driver will generate an ID for you. The ID that's generated depends on the time, the process ID, part of your MAC address, and things like that. So it's not a globally unique identifier, but I've never seen it collide. Like I said, you need to have lots of indexes in this case, but it's very easy to extract all the tags from this document, because in PHP you can just query it and loop over all the tags and display them, which is very fast. But if you want to search on it, you need to create lots of indexes to make that fast. Now, another way of doing this is storing the tags with k, which stands for key, pointing to the key of the tag, and v pointing to the value of the tag. With this, you can create one index on tags.v, tags.k, which is a compound index on both fields. When you do this, you only have to create one index to be able to search all tags. But it's more difficult to extract all the tags out of it, because you can't just loop over tags and get the key and the value; instead, you get an array from which you then have to pick out the key and the value. PHP 5.5 will have a way around that, but I'm not quite sure how solid that functionality is. It's also important that if you want to look for a specific key and value, you need to use an operator called $elemMatch, because otherwise MongoDB doesn't actually check that the key and value are both members of the same array element; it would just check for a document where tags.k, for any element, has the key, and tags.v, for any element, has the value. 
In that case, if I would, for example, search for k being name and v being pub, it could still match, because different array elements satisfy the two conditions. So that's why I use $elemMatch: to restrict the match to just one array element. And finding all roads is still easy as well, because I can just query tags.k; if that's highway, that means it's a road. Right. Now, there are a few other situations where this might or might not work. I've written an article about that; my last slide has a big QR code with a short URL, which has a whole list of resources, and my article describing this data schema is in there as well. Right. So, there is always the decision whether I'm going to embed content in my documents, creating nested structures, or split it up into multiple collections, because in some cases that makes sense. If you, for example, have a document with lots of comments constantly being added to it, MongoDB will have to start moving this document around on disk, because when you first store a document, it adds a bit of padding so that the document can grow a little bit. MongoDB actually calculates how much padding to add on its own, so it's clever with it. But if you have a document with lots and lots of comments, you'll see that your document keeps getting moved on disk, and that's going to be slow. If that's the situation, it makes more sense to put your comments in a separate collection, because a separate collection is a lot easier to write to: nothing has to be updated, it's just data that you append. For things like the watchdog, that is what you want. You don't want one document with all the entries in there; no, you want one collection that you just write to as fast as you can. In those cases, you want to split your comments out of your documents, and that is called linking. Linking is the MongoDB term for just having two IDs that point to each other. There's no foreign key concept. 
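The $elemMatch behaviour described above can be mimicked in a few lines, to show exactly what goes wrong without it. This is an illustrative sketch of the semantics, not the real operator:

```javascript
// Tags stored as an array of {k, v} pairs, as in the schema above.
const doc = {
  tags: [
    { k: "name",    v: "The Crown" },
    { k: "amenity", v: "pub" }
  ]
};

// Without $elemMatch: the k condition and the v condition may each be
// satisfied by a DIFFERENT array element, so this can match too loosely.
function naiveMatch(tags, k, v) {
  return tags.some(t => t.k === k) && tags.some(t => t.v === v);
}

// With $elemMatch: one and the same element must satisfy both conditions.
function elemMatch(tags, k, v) {
  return tags.some(t => t.k === k && t.v === v);
}
```

Here `naiveMatch(doc.tags, "name", "pub")` is true even though no single tag pairs "name" with "pub", while `elemMatch(doc.tags, "name", "pub")` is correctly false.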
It's just a conceptual thing: how you want to store that in your documents is totally up to you, really, and you need to figure out what makes sense and what doesn't. There are lots of articles on the Internet describing such situations, and if you have specific problems, you can ask on Stack Overflow, write to the Google Groups mailing list, or ask on IRC; there are many ways to get some other thoughts on how to design your schema. All right. So, a little bit about Drupal and MongoDB. There is a module called MongoDB. It provides some functionality, but it doesn't allow you to fully replace MySQL with MongoDB in Drupal. It has sub-modules for sessions, watchdog, blocks — I have no idea what a block is, so somebody please explain it to me afterwards — and queues, which I also have no idea how they're used, but cache and field storage I do understand. Storing all the sessions in MongoDB makes a bit of sense, because you can set up MongoDB in such a way that you have multiple nodes replicating to each other. But it's also useful because, with this module, if you store a session in MongoDB, all the fields will be split up, instead of having a serialized session blob like you have when storing things in memcache. So it makes it quite easy to query this session collection to find session elements for a specific user ID, for example, which you can't do if you store sessions in memcache. And the same thing for watchdog. With the original one, as far as I understand, if you want to do any sort of data processing on the log, you need to read it, process it, and import it into a database, whereas if you enable the watchdog MongoDB module, you get the watchdog entries in one collection. It's also interesting that MongoDB supports something called a capped collection. 
It's a collection that has space for only a specific amount of documents, say 10,000, and once the 10,000th entry has been written, the next one will start overwriting the first one. So there's no problem with running out of space, because it only uses the allocated number of elements, or size. Now, field storage is also an interesting thing: instead of having to store field data in a different table for each different field type, in MongoDB you can store it all in one node, in one document. If I have time at the end, I will come back to this and show some of it, because it's often better to show it than to hear some person talking about it. There's also a module called mongodb_dbtng, which is a very experimental database driver that parses SQL to make sure that you can do everything in MongoDB. This doesn't work very well; it's a proof of concept. Now, for Drupal 8, we hired a guy from the Drupal community to help us make it possible to have only MongoDB running, with no need for MySQL anymore, and hopefully that makes it into Drupal 8. All right, I will skip over the slides that tell you how to install it. Actually, since I've written the slides, there's now an RC2 instead of RC1. When I post my slides, go look at it, install it and play with it; they are just here for reference. All right, so... no, field conflict is not interesting. Right, so if you store fields... or what is this one? Sorry. I created a field called author2, and you'll see that different nodes will have... actually, this is the same node. It will have two values in the field: there's both 'der', which stands for my name, and 'han', which stands for my co-worker's name. Actually, I can just scroll that and see the full names. So in this case, I've stored both my nodes and fields in MySQL. 
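The capped-collection behaviour described above is essentially a ring buffer, which can be sketched in a few lines (this is a simulation of the concept, not how MongoDB implements it — real capped collections are bounded by bytes on disk, optionally by document count):

```javascript
// Simulate a capped collection: a fixed number of slots where the oldest
// entry is overwritten once the collection is full.
class CappedCollection {
  constructor(size) {
    this.size = size;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.size) this.docs.shift(); // drop the oldest
  }
}

// A 3-slot "watchdog": after inserting four entries, the first is gone.
const watchdog = new CappedCollection(3);
["a", "b", "c", "d"].forEach(msg => watchdog.insert({ msg }));
```

In the real shell you'd create one with something like `db.createCollection("watchdog", {capped: true, size: 1048576})`, and it never grows past that size.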
And you'll see that there are now two entries, one for each of the authors, because it's a multi-value field. Now, if I store that in MongoDB, instead of two rows in a fields table, you can see here both the title as well as the author field, with just the two authors in there, without needing to do joins to pull the data out. In this example the body field isn't in here, because I did something wrong, but you can of course store that too. Now, a little bit about replication, because replication makes MongoDB easier to scale. Replication in MongoDB works as follows. You have one primary node and two or more extra nodes. Which node is primary from the start is determined by which node you start first. At any point in time, any of your nodes can go down, and another one will take over. The drivers for each language — in this case, the PHP driver — will detect that and switch to using only the new primary for writes, and any of the remaining secondaries for reads as well. So in MongoDB, you can only write to your primary node, but you can read from any of the nodes. Any node can go down, and another will take over automatically. You can scale this up to 11 nodes. So what replication gives you is failover and availability — nodes can go down — but you can also scale your reads. If you have a really read-heavy application, your application can read from all of the nodes in your set, which of course makes reading faster, because you can read from more servers. There are primary nodes, secondary nodes, and there's the arbiter. An arbiter is a node that participates in elections for who gets to be the new primary, but it doesn't store data. You would use an arbiter in the case where you only want to have two data nodes: because you need an uneven number of voting nodes, you use an arbiter to cast the extra vote in those elections. All right. 
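The reason for the arbiter can be shown with a tiny majority calculation. This is a sketch of the voting arithmetic only, assuming the standard "majority of voting members" election rule:

```javascript
// An election needs a strict majority of the voting members.
const majority = voters => Math.floor(voters / 2) + 1;

// Two data nodes only: majority of 2 is 2, so if one node goes down,
// the remaining node cannot form a majority and no primary is elected.
const twoNodesSurviveOneFailure = (2 - 1) >= majority(2);

// Two data nodes plus one arbiter: majority of 3 is 2, so the surviving
// data node plus the arbiter can still elect a primary.
const withArbiterSurviveOneFailure = (3 - 1) >= majority(3);
```

The arbiter costs almost nothing — it stores no data — but it turns an even, failure-fragile set into an odd set that can survive losing a node.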
If replication alone isn't good enough, you can go to sharding — and if you need to go there, you have an enormous amount of data. If you go to a sharded setup, your setup gets slightly more complicated, and you need a few more machines, as you can see here. Now, it's not as bad as this looks, because the config servers are very tiny databases; you can put them on the same physical machines as your data nodes, as long as you make sure you don't put them all on the same one. You can have either one or three config servers — one is okay for development, three in production; it can only be one or three. Now, each of the shards is responsible for a specific part of your data. In my example, the first square, containing a replica set of three nodes that replicate among themselves, will store all the documents selected by my shard key. The shard key is a key that you designate: I want to split my data up on this key. That could be surname, like I do in this example. So the first shard will contain all documents where the surname is from A to H, the second one from I to P, and the third one from Q to Z. But you don't have to set the ranges yourself; MongoDB will automatically pick the correct ranges depending on your shard key, because in surnames the Ns and Ms are more prevalent. So it's not up to you to decide exactly where your data lives; which data lives where is stored in the config servers. Then you have one other thing, this bit: this is basically your web server, where your client — the PHP driver — talks to a mongos process, and the mongos process is basically a proxy. Mongos talks to the config servers and knows where all the data lives. It is also responsible for moving blocks of data between shards in case one becomes too full or too empty; that also happens sometimes. Now, this is a really complicated setup. 
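The range-based routing that mongos does can be sketched like this. The fixed ranges below just mirror the surname example from the talk; in reality the ranges live in the config servers and are rebalanced automatically:

```javascript
// Each shard owns a half-open range [min, max) of the shard key.
const shards = [
  { name: "shard1", min: "a", max: "i" }, // surnames A-H
  { name: "shard2", min: "i", max: "q" }, // surnames I-P
  { name: "shard3", min: "q", max: "{" }  // surnames Q-Z ("{" sorts just after "z")
];

// A query that includes the shard key can be routed to exactly one shard;
// a query without it would have to be broadcast to all shards.
function routeByShardKey(surname) {
  const s = surname.toLowerCase();
  return shards.find(sh => s >= sh.min && s < sh.max).name;
}
```

So `routeByShardKey("Jones")` goes straight to the second shard, which is why shard-key queries are fast and everything else is a scatter-gather.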
If you ever need to go here, your project is doing really well, and in those cases you might want to look at getting some support as well. All right. Yeah, go for it. Yes. Right, so mongos will handle both query routing and result merging for you; that's why I call it a proxy. If you search for a document by just your shard key — so you're only searching on surname — it's really fast, because mongos only talks to one of the shards. But if it's a query for which the results could live on multiple shards, it will have to send the query to all of your shards, and then mongos will collect the data and also sort it for you. So there's some logic built into mongos to do that routing. Yes. Okay, so the first question here is whether replication is asynchronous. Yes, it is. So if you send a document to your primary, it takes some time for your data to be available on your secondaries. However, from the client side, you can control, from that specific client's perspective, whether that's okay for you or not. You can say: the client just sends off the insert or the update or whatever and doesn't wait. But there are other modes where you can say: send off the query and make sure it is saved on one node, or make sure it is saved on all three nodes, before the client returns and issues the next query. So there are lots of configuration possibilities in there; I just don't have enough time to explain them all. Now I forgot your second question. Oh, sorry, now I remember. So yes, the question was how the replication works between nodes. The replication algorithm is clever enough to try to read from the closest node. In this case, the primary and the two secondaries all sit in the same data center, so they will probably both read from the primary node. But what you can do is put a third node somewhere on the other side of the country, in a different data center, and have two nodes there as well. 
So you have the secondary in the same data center as your primary read from your primary; one of the nodes in the second data center reads from either the primary or the secondary in the first data center; but the second node in your second data center then reads from the node that is already there. And that algorithm is getting better. I think, but I'm not 100% sure, that from the next version of MongoDB, which is going to be released very soon, you can actually configure which nodes you want to read from initially, but it will still automatically optimize how that happens for you. All right, let's see how I'm doing on time. 24 minutes, okay. So, indexes are important in MongoDB. They're important in a relational database, and they're just as important in MongoDB. If things are going slow, in most cases adding an index will fix your problems — not always, because in some cases your data schema just won't be able to handle that specific load, and you might have to re-architect your database schema. That is not that difficult, because you don't have to create tables and you don't have to run ALTER TABLE queries; in MongoDB there's no such thing as tables and hard-coded fields anyway, it's all whatever the client sends. But indexes are your best tool for fixing things. Now, the _id field is always indexed, and additional indexes can be created with a command called ensureIndex. There's no such thing as createIndex. What ensureIndex basically does is check whether an index already exists; if it does, it won't recreate it, and if it doesn't exist, it will create it for you. It's just a convenience method. The PHP driver actually doesn't have createIndex implemented; some of the other drivers do, but it makes no sense to have both. Now, you can create an index on a single column — sorry, a single field; I still have too many relational database terms in my head sometimes. 
You can also make compound indexes on more than one field, and you can also create indexes on subfields, which can be really useful in some cases. Now, I have a whole presentation about indexes that I gave at the Munich MongoDB user group two days ago. Those slides are not online yet, but I'll publish them very soon once I get back home, and you can have a look at those as well. Okay, so let's imagine a very simple index. An index basically has the field that's indexed and then the documents associated with each value; in this case, it's the longitude and latitude of cities. Very simple index. Now, in this case, I'm creating a compound index of country code and population. So I call db.cities.ensureIndex with countryCode 1 and population 1, and the 1 stands for sorting ascending, for both fields. In most cases, the sort order doesn't really matter too much, but there are a few edge cases. Now, the query plan I already showed you a little bit with explain, but there are a few more fields in there. There's millis, which shows you how many milliseconds the query took. In this case, it took zero milliseconds — clearly it's not actually zero, but it's less than one, so it shows up as zero. There's a bunch of information like nYields and nChunkSkips, which I don't even know the meaning of, so not very important; but there's also isMultiKey and indexOnly, and those are quite useful. indexOnly means that I'm only requesting data that can be found directly in the index, which means that MongoDB doesn't actually have to look up the document, which, as you can understand, is faster. But you need to be a bit careful with putting too much information in the index, because then the index will be as large as your data store, and then it won't be any faster, of course. isMultiKey is used for indexes on a field that can have multiple values. 
So I already showed you that the author field in my MongoDB-Drupal example had two values. Now, you can set an index on an array field that has multiple values, and when that is queried, isMultiKey will flip to true just to show that kind of thing. indexBounds shows you, for each of the keys in the index, which values are being searched for. So in this case, I'm looking for GB and a population anywhere from half a million up to the maximum integer size. I won't try to pronounce that number out loud. So basically everything larger than 500,000. And then you get a bunch of results; four, in this case. Now, something else that is useful here is that the first field, cursor, tells you which index is being used. If you see something called BasicCursor, that means no index is being used, which is not a good idea. If it shows BtreeCursor, that means an index is being used, and in this case the index called country_code_1_population_1 is being used, which is basically a concatenation of the field names and the sort orders with underscores. So explain is very useful to figure out what's going on. Now, in MongoDB, if you have slow queries, slow queries often being queries that don't have an index, they will be logged to the MongoDB log if they take more than 100 milliseconds. You can configure that setting. And when you see that happening, it is a good idea to look at those queries and see why they're slow. Very often you'll find that there's no index for them. In some cases it makes no sense to put an index on it, because it's a query that you only run once every two days, right? So don't bother putting an index on it; it's okay if a query like that is slow once in a while. But if it's a query that you're running a lot, then, yeah, have a look at whether you can put an index on it. So I'm showing a small example with another compound key on population and DEM, where DEM stands for Digital Elevation Model; in this case, it's the elevation.
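That index name in the explain() output is nothing magic: it's just the key specification flattened with underscores. As a small illustration (my own helper function, not part of any driver), the default naming scheme can be reproduced like this:

```javascript
// Reproduce MongoDB's default index naming: field names and
// sort orders joined with underscores, so that
// { country_code: 1, population: 1 } becomes
// "country_code_1_population_1".
function defaultIndexName(keySpec) {
    return Object.keys(keySpec)
        .map(function (field) { return field + "_" + keySpec[field]; })
        .join("_");
}

console.log(defaultIndexName({ country_code: 1, population: 1 }));
// country_code_1_population_1
console.log(defaultIndexName({ dem: -1 }));
// dem_-1
```

Knowing the scheme helps when reading explain() output: you can see at a glance which fields and sort orders the chosen index covers.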
So I have a couple of cities in here. Again, I have a few in Andorra, which are both over 1,000 meters, with a small population. And my index is set on population and DEM, so my compound key is called population_1_DEM_1. Now, if you run specific queries that use this index, like "find all the documents where the population is more than 2 million and the elevation is more than a mile above sea level", then although MongoDB can use the index, it's not the most efficient way. Because if you do range-based searches on both parts of your compound key, it can only use the index for the first part: the population part of the query. And in that case, you'll see that n, the number of documents being returned, is 6, but nscanned is 127. That means that many keys in your index had to be scanned. If that is the same as the number of documents returned, that's really good. If it's slightly higher, it's not really a problem, but it is something that you do need to keep an eye on if you abuse an index like this. Actually, I have a slide about that, so let me get back to it. Now, if you modify this query slightly by sorting by negative elevation, descending by elevation, then another flag pops up, called scanAndOrder. When that shows up, it basically means we could use the index for the query part, but we couldn't use the index for the sorting part. And, again, with 127 objects, that is all right. If it's 4 million, it's not all right, because it takes too much time to then manually sort those documents again. So how you choose the index, even the sort order, can be important. Okay, so also, if my compound key is population, DEM, then MongoDB can't use the index for only the second part of your compound key.
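A sketch of those two queries in the shell (my field names, and 1609 meters as a rough mile; the explain() numbers will of course differ on your data):

```javascript
// Range conditions on BOTH parts of the compound key
// { population: 1, dem: 1 }: the index is only used tightly
// for the first part, so nscanned can be much larger than n.
db.cities.find(
    { population: { $gt: 2000000 }, dem: { $gt: 1609 } }
).explain();
// Compare n against nscanned in the output.

// Sorting against a field the index can't deliver in order
// adds scanAndOrder: true, meaning the matched documents had
// to be re-sorted in memory after the index scan.
db.cities.find(
    { population: { $gt: 2000000 }, dem: { $gt: 1609 } }
).sort( { dem: -1 } ).explain();
```

The rule of thumb: the closer nscanned is to n, and the less often scanAndOrder shows up, the better your index matches the query.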
So if I would do a query on just elevation, which is the second part, it falls back to a BasicCursor. And in this case, it scans almost 23,000 documents and returns 614. So MongoDB can not use only the second part of a compound index; it can't use the index at all if you're doing a query or a sort on just elevation. So let's create an index on elevation, descending, as well, and see whether that makes things better. And in this case, you see that for the query itself it uses the original index that I had, because I'm doing the search on both keys of that index. But scanAndOrder is still true, even though I have an index on elevation descending. And that is because MongoDB can only use one index at a time. So any query you run can use only one index, with a few exceptions: if you have an $or query, it can use an index for each of the specific parts of the $or clause. All right, so a quick summary of indexes. MongoDB uses B-tree indexes; that's where the word BtreeCursor comes from. And B-tree indexes are quite well optimized for range searches, so this is perfect for most cases, really. All the principles that apply to creating indexes in a relational database also apply to MongoDB. Like: more indexes make writes slower, because more indexes need to be updated. A collection can have at most 64 indexes, and because _id already has an index by default, you can define 63 of your own. And MongoDB can only use one index per query. Also, you need to be a bit aware of redundant indexes. The order of the fields in a compound index is important. If you have an index on elevation, population, then this index can be used for queries on both elevation and population, as well as on just elevation. But not on just population, because it can't use the second part of the index on its own. So you would need an additional index on population in case you want to search on that alone as well. All right.
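The prefix rule from that summary, sketched with my example fields (index and query shapes only; real plans depend on your data):

```javascript
// Compound index: elevation first, population second.
db.cities.ensureIndex( { dem: 1, population: 1 } );

// Can use the index: the query touches the leading field.
db.cities.find( { dem: { $gt: 1000 } } ).explain();

// Can also use it: both fields of the key are constrained.
db.cities.find(
    { dem: { $gt: 1000 }, population: { $gt: 50000 } }
).explain();

// Cannot use it: population alone is not a prefix of the key,
// so without a separate { population: 1 } index this falls
// back to a BasicCursor (full collection scan).
db.cities.find( { population: { $gt: 50000 } } ).explain();
```

So when you design compound indexes, put the field you also query on its own first, and you may save yourself a second index.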
So there are a few cases where indexes can't be used. If your query uses any sort of negation operator, like $ne for not-equal, $not for negating another condition, or $nin, it just can't use an index in those cases. It can also not use an index if you use some specific operators such as $mod; the index is just not built for doing that. If you do a query with a regular expression, which MongoDB supports, then it can only use an index if the search part of your regular expression is anchored to the start of the string. So it can't use it if you search for "ondon" to find London, but it can use it when you search for "^Lon" to find London. If you use any JavaScript in your queries, and I haven't shown you that, but in MongoDB you can do some extra filtering with JavaScript, in those cases the index can also not be used. And for any sort of map/reduce job, your indexes are not used either. All right. That is as much as I had to talk about. Are there any questions? I have about 12 minutes. I'll start over there. Okay, so the question is: what happens with indexes on slaves? Basically, every member of your replica set contains its own data set and its own indexes. The protocol for replication only replicates the documents themselves. So the slaves are responsible for maintaining their own indexes at the moment. All right. Next question, here in front. So the question is: what about full-text searching? At the moment, MongoDB doesn't have any full-text search capabilities, but it's being worked on. So hopefully we'll see that in the next version, but I can't promise that, because I don't control that. All right. Is there a Search API back-end for MongoDB? Okay. I'm not aware of there being a Mongo back-end for the Search API right now. Sorry. So I don't know how that works. Yes. All the way in the back. So, in my sharding example, whether mongos is a single point of failure: no, because you can have multiple of them.
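The regular-expression case, sketched against my cities collection (the `name` field is an assumption from my earlier examples):

```javascript
// Anchored at the start of the string: an index on { name: 1 }
// can be used, because "^Lon" translates into a range scan over
// a contiguous slice of the B-tree.
db.cities.find( { name: /^Lon/ } ).explain();

// Not anchored: there is no starting point in the B-tree, so
// every key would have to be inspected and the index does not
// help.
db.cities.find( { name: /ondon/ } ).explain();
```

Also note that a case-insensitive anchored regex like /^lon/i loses the optimization again, because upper- and lower-case keys are not adjacent in the index.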
And from the next version of all of the language drivers that are being released, the drivers can connect to multiple mongos instances at the same time. And when one of them goes down, they just use the other one. So that functionality is currently being built into the drivers; I basically fixed that yesterday for PHP. Anything else? There. Go ahead. Okay. So what are the different data types in MongoDB? It's basically all the types you have in PHP as well. So you have your strings, which in MongoDB are always UTF-8 strings. Then there are arrays, because you can have multiple values. There's a MongoDate type. There's also... what are the other ones? There's a whole lot, but I really can't remember them all right now. Sorry. But the most important one here is the MongoDate type. Okay. All right. Yes. So the question is more about what richer things I can put in there. It is possible to store blobs in there. There is a MongoBinData type that does that for you, and that works for data up to 16 megabytes, or slightly less than 16 megabytes, actually. If you want to store larger files, all of the drivers implement something called GridFS, which allows you to store large files. It basically chops up the file, and the driver then handles that for you. On the other side, there is no spatial data type, but there is a spatial index, which works on just a simple array. So if you have an array with a coordinate pair, the index on that one is spatially enabled. Anything else? Yes. The spatial searching: you can find things near a given point, or whether a point is part of a polygon, and we're going to extend it a little bit more. All right. One more question there. So the question is what would be a reason for not using this on any Drupal site. That's a bit of a loaded question, of course, because if you ask me, I can't think of anything.
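That spatial index, sketched on my cities example (the `loc` field name and [longitude, latitude] order are assumptions; explain things on your own data):

```javascript
// A plain array of two coordinates plus a "2d" index is all it
// takes to make the field spatially searchable.
db.cities.insert( { name: "London", loc: [ -0.1275, 51.507 ] } );
db.cities.ensureIndex( { loc: "2d" } );

// Find documents nearest to a point, closest first.
db.cities.find( { loc: { $near: [ -0.12, 51.5 ] } } ).limit( 5 );

// Or test containment in a bounding box.
db.cities.find( {
    loc: { $within: { $box: [ [ -1, 51 ], [ 1, 52 ] ] } }
} );
```

So there's no dedicated geometry type: the convention of "an array of two numbers" plus the "2d" index is the whole spatial story at this point.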
Besides MySQL being available on a lot more shared hosts than MongoDB is at the moment. Setting up MongoDB is actually easier than MySQL, because the only thing you do is download it and start it. There's no tuning, and no need to create databases or collections or anything like that; that all happens on the fly. But with that, of course, anybody can create new databases. So MongoDB does have authentication built in, but I can understand that some shared hosts are not really happy doing this. But it is possible, and there are a few dedicated MongoDB hosting providers out there that will provide you with a MongoDB database as well. As for the data model itself, I can't think of any reason for a Drupal website not to do this. If you, however, go into financial transactions, or financial things where you really need transactional support, then MongoDB isn't a really good fit, because it doesn't do transactions. One more. So, whether I have data or speed comparisons between using MongoDB and MySQL: I don't have those. At 10gen, we don't create those benchmarks, because you can't really compare them; the data storage is so different. At least, that's our point of view. I am sure that some people who have used MongoDB and MySQL with Drupal very heavily will have those details, but I don't have the data. Just send me an email and I can see whether I can find some examples for you. All right. I think I have time for one more question, but there is none, so that's also good. Okay, hang on. So this QR code: if you scan it with your smartphone, it will take you to this URL, and once I get home, I will upload my slides to that URL. It also has a long list of resources, and you can send me more questions. If that's it, I hope this was useful, and thank you very much. Nope. It's okay to walk up to it and scan it. Hi. One second, we're just turning off my mic.