We're really excited today to have Andrew Morrow. This is actually the second week in a row where we've had a Brown alum come talk with us. So that's pretty awesome. That shows you how awesome Brown is, but also how CMU is awesome, because we get people to come to us and give you great talks. So Andrew is a lead engineer at MongoDB. He's been there about four years now? No, two. Two years. Before that, he was at Google. And before that, he was at a startup for about seven years. So today he's going to give us a talk about a high-level overview of what MongoDB is and why you actually want to use it, but then he's also going to spend some time getting into the details of some more technical issues that they face in the real world. OK? Thanks, Tom. Hi, everyone. I'm Andrew Morrow. As was mentioned, I work at MongoDB. I've been there for about two years. I'm currently the C and C++ drivers tech lead. Before that, I spent some time on the kernel team. Most of the stuff that I did during the 2.6 server release, which is the current stable release, was on the update refactor, which I'm going to talk about a little bit at the end of the talk. I've also been involved pretty heavily with our work on migrating the code base to C++11. There are almost 200 slides in this deck. We're not even going to come close to covering that. There are so many slides because there are a lot of features in MongoDB. And I want to be able to discuss some of them in depth. So later in the talk, we'll have an opportunity to sort of decide what we want to talk about. And we'll skip to the relevant slides. Clearly, we're not doing 200 slides in an hour. So there are a couple things that I want to talk about. I want to give an overview of MongoDB in general. Actually, it would be interesting to see how many people have used MongoDB for anything. Fair number of hands. How many people liked it? Fair number of hands. 
Anybody actively dislike it? Cool. All right. There were no hands. That's great. I also want to discuss, among those features, why those features and not others. And part of the way to understand that is to look at where did MongoDB come from? How did it get started? And how did that origin and the views of the people who were working on it inform the feature set that it has? And finally, I want to talk fairly briefly about where it's headed next. And then finally, towards the end, I want to pivot a little bit and talk more about what are some of the challenges in actually implementing a new database. Every piece of software starts with old, terrible pieces of code. Well, it doesn't start with old pieces of terrible code. But there are always dusty corners. And you have to go and rewrite these things. And how you do that in a system that people rely on is an interesting challenge. There are also interesting language-level subtleties in how you deal with data that might not be apparent at first glance. And then I want to talk a little bit about how to keep the code base current, again, in the context of C++11, which is something that I have been working on a lot and care about a lot. I'm also curious, how many people here are working with C++ in some capacity? A good handful. And how many of you are using C++11 or 14? Sort of, getting there? OK, I'm actually really excited about C++11, especially in the context of writing system software like databases. And I hope we'll get a chance to talk about that at the end. So let's talk a little bit about what MongoDB is. At a very high level, these are the pitch points that people use when they discuss it. It's a document database, as opposed to your traditional row-oriented database. It's open source. There are, of course, other open source databases, and many of them now. But it's still an important aspect for a piece of software that you're going to rely on. It's high performance. 
It would be unfortunate if you were trying to sell a database that wasn't. It's horizontally scalable, which is one of the unique selling points of the database. And it's full-featured. And we'll get into what those features are soon. So for those of you who haven't used MongoDB, can I actually see the hands for who hasn't, again? Who has never used it, no experience with it? So when we say a document database, this might be obvious, but people get it wrong sometimes. We're not talking about PDFs or doc files. Essentially, we're talking about a typed associative array. If you're familiar with JSON, we're talking about JSON documents. Or if you're familiar with Python, we're talking about nested Python dictionaries, or a Ruby hash, or a PHP array. So if you have some sort of variant type in one of those languages, and you build a recursive dictionary of those, you essentially get something that looks a lot like JSON. We store something very much like that in our database. So that's what we mean by documents. It's open source. You can go clone our repo from GitHub. The URL is up there. And actually, I encourage you to do so. It's an interesting code base. It's AGPL. It was started and is sponsored by MongoDB. This slide is a little out of date. We changed the company name at one point. There are commercial licenses available. And we are interested in open source contributions. So if you're interested in experimenting with a database or you find something that's broken or that you want to change, we do take pull requests. There was a period where we weren't doing a good job about that. But we've made an active effort to improve our sort of relationship with the community. So we're taking pull requests at a much faster rate. And we would love to see some from all of you. So the high performance note: it's written in C++. The idea being that if you want to write a database, you probably want to be as close to the bottom of the stack of abstractions as you can get. 
It makes extensive use of memory-mapped files. The basic IO mechanism of the database is to take its data files and map them into memory, and then read and write directly to those rather than implementing a block store. There are other storage engines on the horizon that will not function this way. And I hope we'll get a chance to talk about that a little bit later, too. The data serialization is BSON, which I alluded to a minute before. That's a binary encoding of JSON. We won't look at the BSON specification too closely, but if you just Google "BSON spec", you'll get a link to the specification, which lays out the binary format. We'll mostly, when we talk about documents, look at them as JSON, because that's more or less how human beings interact with it. We have full support for primary and secondary indexes. And the idea of the document model, as opposed to the relational or row model, is that it's really actually less work. So we mentioned full featured earlier. This is just a quick rundown of some of the features that MongoDB offers. We offer ad hoc queries, right? So we have a query language, and we have this document model. You can basically combine a pretty rich set of query operators to query that data in rather arbitrary ways. We have real-time aggregation. You can build a pipeline of aggregation steps that you can ship to the database. It will be executed in C++ on the database. Those pipelines are expressed in JSON. We also have a MapReduce facility that will allow you to execute JavaScript on the server, although our general belief is that the aggregation framework is a preferable way to accomplish the same sorts of things you might do with the MapReduce system. And we're replicated, in that you can spin up multiple machines and arrange for data to be replicated across them. You can use this facility to achieve read scalability and failover in the situation where a machine dies. 
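Since BSON just came up: the spec is small, and a toy encoder makes the "binary encoding of JSON" idea concrete. This is my own illustrative sketch, not the real serializer, and it handles only int32-valued fields, following the layout in the public BSON spec: a little-endian int32 total length, a list of typed, named elements, then a terminating zero byte.

```javascript
// Toy BSON encoder for documents whose values are all int32s.
// Illustrative sketch only -- real drivers support many more types.
function encodeInt32Doc(obj) {
  const body = [];
  for (const [name, value] of Object.entries(obj)) {
    body.push(0x10);                          // element type 0x10 = int32
    for (const ch of name) body.push(ch.charCodeAt(0));
    body.push(0x00);                          // cstring terminator for the field name
    // int32 value, little-endian
    body.push(value & 0xff, (value >> 8) & 0xff,
              (value >> 16) & 0xff, (value >> 24) & 0xff);
  }
  body.push(0x00);                            // document terminator
  const total = body.length + 4;              // total length includes the length field itself
  return [total & 0xff, (total >> 8) & 0xff,
          (total >> 16) & 0xff, (total >> 24) & 0xff, ...body];
}

// {"a": 1} encodes to 12 bytes: length, one int32 element, terminator.
console.log(encodeInt32Doc({ a: 1 }));
```

The point is just that the parser never has to scan for quotes or braces: every element announces its type, and every string and document is length-prefixed, which is why it is cheap to traverse compared to parsing JSON text.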
We have a number of geospatial features that allow you to do both 2d and 2dsphere queries. We have drivers that support many, many programming languages. I probably can't list the official set that we offer off the top of my head, but it's about 10. And it's the ones you'd expect: C++, Java, Python, Ruby, Scala, PHP, a few others that I'm missing. It has a flexible schema. Other people like to refer to the database as schema-less. I think flexible is probably a better way of putting it. Your data still has to be able to be expressed in this nested document structure, but that's a pretty rich structure. You can express a lot of things in that. And as mentioned briefly before, it's horizontally scalable. So this graph, very lacking in metrics, lays out the idea of where MongoDB is supposed to fit in the depth-of-functionality versus scalability-and-performance landscape. It's supposed to fall somewhere in between. It's supposed to be high performance and highly scalable with a lot of functionality. But we do trade some things off. There are features that a traditional RDBMS can offer that we can't. And we'll get into why that is and what some of those restrictions are in more detail in a few slides. So this is sort of the conceptual mapping of the components of an RDBMS to MongoDB. So in an RDBMS, you have tables and views. The MongoDB term of art for that would be a collection. A collection is essentially a grouping of documents. There's no restriction that says all these documents must fit a given schema. It's literally more or less a bag of documents. Again, a row would be a document in a collection. An index is still an index. A join is an embedded document, which is sort of an interesting thing to think about. A foreign key would be a reference, and a partition would be a shard, right? So if you are striping your data across several systems, we refer to that as a shard or sharding. I'm gonna skip that. And this. 
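To make the "a join is an embedded document" row of that mapping concrete, here's a tiny sketch in plain JavaScript, with invented example data (not the talk's demo): where a relational design would join a books table to a publishers table through a foreign key, the document model just nests the publisher.

```javascript
// Relational: a books row holds a publisher_id, and you join to a
// publishers table to get the name. Document model: the publisher
// lives inside the book document, so one fetch returns everything.
const book = {
  _id: 1,
  title: "The Hobbit",
  publisher: {               // the "join" is just an embedded sub-document
    name: "Allen & Unwin",
    city: "London"
  }
};

// One document lookup, no second table, no join:
console.log(book.publisher.name); // "Allen & Unwin"
```

The flip side, of course, is that if many books embed the same publisher and the publisher's city changes, you update it in many places; that's part of the trade-off discussed later.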
So we're gonna run through a quick example to sort of look at what this document model looks like and how you could use it to do a very simple model of a library management application. I don't wanna spend a ton of time on this, but it's good to see it and understand why it works the way it does. So if you were doing this in a relational database, you'd have some complex diagrams that laid out the relationships between the different types of entities. You'd create tables. You'd probably use an ORM, or you might use an ORM, to map those tables to objects in your programming language of choice and build an object model there. You'd have tables just to join other tables together. So you'd have a lot of tables with just five things, right? And it would probably take you a while to really get the schema right, unless you're very good at it, right? In which case you can probably just hammer it out. So MongoDB emphasizes a different approach, which is: start building the application that you wanna build and allow the data to evolve independently. We don't have to come up with a schema at the beginning. We don't have to lay out the tables at the beginning, right? So we're gonna create a couple collections in this model application. We're gonna create a users collection, books, authors, and publishers. So let's look at a user. So this is a simple user. If you're familiar, is everyone familiar with JSON notation? Anybody not familiar with it? Okay, great. So this is a simple JSON document, right? Or whatever you wanna call it, a dict that defines a user, right? So this user has a username. Apparently we like redundancy: the first name is Fred and the last name is Jones. So we can insert that record. This is showing how you would do it in the MongoDB shell, which is an interactive JavaScript interpreter that knows how to communicate with the database. So we would be able to take this object that we constructed and insert it into the collection called users. 
You could also type this as a literal into this statement and it would still insert it. You don't need to have captured it in a variable. And then we can query that collection, right? We can ask the users collection to find one of the elements it contains. It only contains one at the moment. And we get handed back the object that we put in, a little bit tweaked, right? Now it's got this _id component that it didn't have before. The _id component is the primary key. It's automatically indexed, and it's automatically created when you insert something if an object ID does not already exist. So in the case of this object, we didn't give it an object ID. So when we inserted it, it added one for us. If you're interested, you can break down the object ID into some internal components. It's got a timestamp, the MAC address of your local machine, the PID of the process, and an increment. These are used mostly to simplify making unique IDs. For what it's worth, you are not required to use our object ID type. You can use pretty much anything as an object ID, though they must be unique. So we're gonna work on another collection now. This actually shows the style of inserting an object as a literal rather than from a captured variable. And here we're gonna insert some information about J.R.R. Tolkien with a pretty lengthy biography. And again, we can find this object. But this time we've included a filter, which allows us to specify which objects we're interested in. In this case, there's only one object in the collection, but you can imagine if there were many, by passing this last-name filter, we will only have selected this document out of the collection. And then we're gonna insert something into the books collection, right? Something interesting to note here is that we are inserting it with an object ID already. As I mentioned, that's completely legitimate. You don't have to rely on our object ID auto-generation. 
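As an aside, the object ID breakdown mentioned a moment ago can be sketched in a few lines. This is my own illustrative parser, assuming the classic 12-byte layout (4-byte timestamp, 3-byte machine identifier, 2-byte PID, 3-byte counter, i.e. 24 hex characters); the hex string below is an arbitrary example, and this is not a driver API.

```javascript
// Split a 24-hex-character ObjectId into its classic components:
//   4 bytes timestamp | 3 bytes machine | 2 bytes pid | 3 bytes counter
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) throw new Error("not an ObjectId");
  return {
    timestamp: parseInt(hex.slice(0, 8), 16),   // seconds since the Unix epoch
    machine:   parseInt(hex.slice(8, 14), 16),  // machine identifier
    pid:       parseInt(hex.slice(14, 18), 16), // process id
    counter:   parseInt(hex.slice(18, 24), 16)  // per-process increment
  };
}

// An arbitrary example id; the timestamp component doubles as a
// rough creation time for the document.
const parts = parseObjectId("507f1f77bcf86cd799439011");
console.log(new Date(parts.timestamp * 1000).toISOString());
```

That layout is also why freshly generated ObjectIds sort roughly by creation time: the most significant bytes are the timestamp.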
And we also have something in here which is an array. So we've defined it as belonging to two different genres. So in this case, we can also, when we do a query, project which fields to get back. So in this case, we're pulling out the genre field. As you notice, the book actually had more descriptive entries, but we don't see them in this query because of the {genre: 1} projection. And here we can actually query with single values or multiple values the same way. And here's an example with a nested value, right? So this allows us to get just the publication. And we can see all the nested fields that are contained within it. And then finally, you're not restricted to just accessing top-level fields. You can also access what are called dotted fields, so we can use the publication date as a filter as well. So we can reach into documents and query on their nested sub-objects. We can update objects, right? So this allows us to go in and set a new field. So you can imagine that if you were using an RDBMS and you'd described your schema for the books table and you had neglected to have an ISBN number for your books, and you wanted to update your application so that it actually inserted ISBN numbers, you'd have to go back and recreate your tables. MongoDB does not require you to do that. You can just add new fields into these objects in the collection. So in this case, if we go back to the book, there's no ISBN number. After we execute this update, that one document is affected by it. Notice that because object IDs are unique, we are selecting exactly one object here. So we know that we're only updating the object that we want, and we're assigning it an ISBN and a page count. The other objects in this collection do not have ISBN numbers or page counts, but now this one does. So the idea is that your application is now responsible for understanding the sort of varieties of documents that can exist in your collection. 
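Two of the query behaviors just shown, dotted paths reaching into sub-documents and equality filters matching against array fields like genre, can be sketched with a toy matcher. This is only an illustration of the semantics described above, not the server's implementation, and it handles nothing beyond scalar equality:

```javascript
// Resolve a dotted path like "publication.date" against a plain object.
function getDotted(doc, path) {
  let cur = doc;
  for (const key of path.split(".")) {
    if (cur == null || typeof cur !== "object") return undefined;
    cur = cur[key];
  }
  return cur;
}

// Toy filter matching: an equality filter on an array field matches
// if the array contains the value, which is how {genre: "fantasy"}
// selects a book whose genre is ["fantasy", "adventure"].
function matches(doc, filter) {
  return Object.entries(filter).every(([path, wanted]) => {
    const v = getDotted(doc, path);
    return Array.isArray(v) ? v.includes(wanted) : v === wanted;
  });
}

const sampleBook = {
  title: "The Fellowship of the Ring",
  genre: ["fantasy", "adventure"],
  publication: { date: 1954 }
};

console.log(matches(sampleBook, { genre: "fantasy" }));         // true: array contains it
console.log(matches(sampleBook, { "publication.date": 1954 })); // true: dotted path
console.log(matches(sampleBook, { genre: "romance" }));         // false
```

The real query language layers operators like $gt, $in, and regex matches on top of this same field-resolution behavior.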
The database doesn't actually care. So we can see, if we go and execute a query against the books, we'll actually now get back something that shows the ISBN number and the number of pages. You can also index on these fields, right? So here we're creating some indexes on the title and genre. If you remember, the genre is actually an array, which is an interesting facility, and we can also select the order of those indexes. There are actually many, many different index types in MongoDB. As I mentioned earlier, there are some geo query indexes, but for the most part we're going to stick to discussing traditional indexing. You can also query with a regex. So this lets us look for anything that starts with "Fell". That'll match "Fellowship of the Ring". So we can insert a couple more books. We'll have The Two Towers and The Return of the King, so we have a whole trilogy. And now when we do a find, we're going to get back not just one object, but we potentially get back many, right? And the result of find is actually a cursor, which you can iterate to get each document that matches your query. So here we look for a book by title. We can look for a book by author. So that's like a really, really quick overview of sort of MongoDB's query, update, and indexing model. Any questions on that? Yes. Is there a way to declare synonyms or something like that? Well, aliases for fields? Not easily, no. You basically need to know the dotted path to a field if you want to manipulate it. So if I have one set of users that calls something one thing, and the same logical entity is just called something else somewhere else? I would have to... Your application would have to be aware of the fact that you had different names for the same things. Yeah. So one of the sort of, and I'll get into this a little bit later, but one of the philosophical views of this database is to push some of the complexity to applications. 
So in this case, yeah, if you have data that doesn't fit like a logical schema in your collection, your application needs to be able to deal with that, or be prepared to deal with that. So we talked really quickly about schema-less design, right? The fact that you can inject any new field into this. At a high level, the other features that I have slides to discuss in here are the indexing system. It's a pretty standard indexing system based on B-trees. If you're familiar with B-tree-based indexing in relational databases, you're familiar with indexing in MongoDB. We have a replication system. This slide gives actually a fairly decent overview of what the replication system is able to accomplish. Typically, you have a primary node which is able to accept writes. Those writes are propagated to some number of secondary nodes. Secondary nodes can be used by the client application to consume data, so you can read from secondaries if that is acceptable to your application. There are reasons why it may or may not be acceptable. And we have sharding, which allows you to stripe your data across many potentially replicated systems, almost always replicated. There's an additional component, actually a few additional components, that come into play in a sharded configuration, particularly this other server called mongos, which is the MongoDB sharding server. And there are also config servers, which are used to store metadata about your configuration. So I have much more detailed slides on all of these areas, as well as aggregation. I think we're probably gonna skip aggregation. But I also wanna talk a little bit about, given that set of features, why are those features the features that MongoDB offers, right? So you've got a huge number of databases that have sort of sprung into being recently, right? There have always been sort of a wide population of research databases, and then the sort of big population of traditional well-known databases. 
But if you go and look at the sort of DB-Engines rankings, whether you believe in those or not, the list is now enormous, right? There are literally, I don't know, hundreds of databases on there. And there seem to be new ones every day. 250. 250, yeah. There are a lot of people writing databases. And it's sort of interesting to me to draw a parallel: you have this punctuated equilibrium point where people have created a huge number of databases, and some of them are going to be like Hallucigenia up there, an interesting fossil, and some of them are going to be very successful. And I think to navigate that space of databases, it's important to look at them in the context of why they have the features they have, and what part of the database design space they inhabit. So new databases don't spring into existence fully formed, right? Existing databases evolve into new projects, they get forked, they get turned into experiments, they discover new niches. Academic research and spin-offs are another source of new databases. And there's also the case where somebody recognizes that there's a deficit, right? There's a space that isn't being covered effectively by the databases out there, and there's a good commercial opportunity. Column stores in finance are an interesting one. Or maybe unusual requirements, right? Like I need an embedded database that I can ship on some very limited system, or this very esoteric system. So MongoDB didn't exactly get started any of these ways. Long in the past, long before I worked at MongoDB, long before most of the people that I know from MongoDB, there was an app server called Babble. And you can actually go read about the history of Babble at the link that's here. If you just Google Babble MongoDB, I think you'll find this link from Kristina, who was one of the early engineers at MongoDB. 
And the idea of Babble was to be essentially a Google App Engine-like service, except that the language for applications was going to be JavaScript. So right there, you can sort of see one of the early relationships, right? Which is: if you're building an app server in JavaScript and you have JavaScript objects, presumably the thing that you want to store in your database is JavaScript objects. And the most natural way to deal with that is to restrict it to JSON. And then if you're gonna do that, you probably want an efficient encoding of JSON that you don't have to parse as text, so you end up with a binary encoding. So even right away, you can sort of see how BSON might have come into being, right? So as it turned out, the JS app engine wasn't the particularly compelling part of what early Babble produced, but the persistence layer sort of was. So every database that is out there is a set of design choices, right? And one of my friends used to make this joke about UnicornDB, right? It has this impossible feature set, right? It's perfectly consistent, perfectly available. It's strictly typed and easy to upgrade. It's got unfettered performance and ACID, horizontal scalability, and joins, right? The whole list. And you know, this is not achievable. I'm not the right person to get into the academic reasons for why it's not achievable, but I believe that people have good reasons for making those arguments. And as an engineer working on the code base, I believe it as well, right? Looking at the complexity of implementing any part of this design space, trying to cover the whole thing seems pretty unreasonable. So you have to make choices, and those choices have design consequences. So I want to talk about what some of the trade-offs are that MongoDB makes and why it makes them. So we give up joins. This seems like a big thing to lose, right? But the trade-off for giving up joins is you get easier horizontal scalability. 
If I don't have to worry about going out and collecting data from several places in some sort of consistent way, then I can stripe my data across systems and I can migrate it from place to place with far fewer constraints, right? I give up transactions and ACID, but I gain saner replication semantics, and again, it simplifies horizontal scalability. So if your application can tolerate those two trade-offs, well, then MongoDB starts to look appealing, right? If your application can't, then it's probably not in the right part of the design space for the solution that you're trying to build. A similar one, right? We give up on schemas and SQL, but what you gain is flexibility in data modeling, right? And in particular, we were talking earlier about how I might have one book in a collection that has data that's different than the others. That's a problem potentially for your application, but it's also a benefit. Once you've written your application to be aware that the form of data might vary, especially over time, you can build in intelligent mechanisms to handle those kinds of variations. You can actually get a lot of flexibility. I can deploy a new version of my app that's gonna start writing user records with new fields in them without ever having to shut my database down, or do a schema migration, or fail over to another system that's already been upgraded, right? So a lot of the people who were working with this very early, some of the early adopters of MongoDB, a lot of them were web startups. And for them, it was huge, because they were iterating very rapidly. They didn't want to be tied to a particular schema. They didn't know exactly what they were building, right? So if you're doing a lot of upfront design and you have a really good process for determining what software you're building and what exactly it's going to be, and you can do the upfront schema design, that's great. 
But if you can't, having something that allows you to just pour data into it and deal with the consequences later is kind of nice. But there's also no requirement that you use that feature, right? Your application is free to enforce a schema. You can build collections that have structured data in them where you know exactly what you're working with. So it's, again, a trade-off, right? If your application isn't prepared to deal with the fact that your data might not be exactly what it's expecting, it is gonna be a problem. But if your application can deal with that, it might be very useful. And then another interesting one that we're gonna chat about more later is giving up on single-node durability. And the idea there is that you gain performance, right? If you have a single node and you don't have to worry about whether it dies or not, and doing all the complicated recovery mechanisms that are involved in single-node durability, your node can write data a lot faster. So again, if you go back to that sort of early history, you can sort of see where some of these ideas are interesting to someone building an app engine, right? So if you're going to have dynamic load, right? People have applications that suddenly have a mob of people who are interested in them, then that horizontal scalability is very valuable to you. If you can just go to AWS and spin up new machines and tie them into your cluster, that's very... Sorry, did you have a question? I did, but I haven't fully formulated it. Okay, we'll come back to it. So if I think about the ways that people use MongoDB, there's the stupid way, which is: we just haven't put time into thinking about our schema, and therefore we're gonna throw a bunch of crap in and everything's gonna break. Sure. And then there's probably a smart way. 
And is the right way to think about the smart way that MongoDB is for applications whose data is describable by a grammar that doesn't actually fit into the fairly simple schemas you'd typically write? It's an interesting way of looking at it. I look at it sort of as: the smart way to use MongoDB is that there's sort of a skeleton of your data that exists in every document, right? There are certain assumptions that you can make about your data that are true for pretty much all the documents in your collection, but there might be variations in those documents outside of that sort of implied schema. Is that right? So you're sort of suggesting that I might be traversing this document and I may or may not find this field, and if I do, I can expect that there'll be a pattern in that sub-document, and so forth. That's why I call it a grammar. It's something where it's more flexible. A well-formed XML document, or an HTML page if someone actually did it right, follows a grammar. It's an ugly one, but it's a rigorous set of rules that applies to it. Yeah, well, kind of, I mean, sort of. I mean, JSON does have a grammar, right? So there already is a high-level grammar, but I think you're asking more about a grammar in the sort of semantic sense. What someone's throwing into the data? Yes, I think that is the smart way to use it, more or less, right? You don't want to be dealing with a collection where there are no facts that you can prove about the collection, right? It's very hard to write an application in that environment, and most people using MongoDB aren't working in that environment. They have documents that are quite regular, actually, but they might vary. Another way to look at it is you might have documents that vary over historical windows, right? So you have documents that were generated during sort of like V0 of your application. 
So when you deploy V1 of your application, it has to be aware of what the structure of V0 documents was, and it has to be able to react to them intelligently. That intelligent reaction might be to upgrade them to the V1 schema of the document (I say schema, but...), and then it doesn't have to worry about it. So you sort of age your elderly documents into this newer regime. I think people who are using it intelligently are thinking about it that way. Yeah, it's definitely not: pour a bunch of random stuff into the collection and expect to be able to write a meaningful application against it. It might be interesting to try, right? Like, give it an arbitrary collection: what facts can you prove? At least you can tell how many documents there are, so there's at least one thing you can prove. So I think we were just sort of talking about flexible schemas. And then the last one that I've mentioned a couple of times, and is also related to what we were just discussing, is pushing complexity to the application, right? And that's a real consequence, right? If you were writing a real-world application that you intended to deploy, you do have to be cognizant of the fact that you might get a really weird object back from the database, especially if your application, you know, was erroneous in some fashion, right? And wrote a document that wasn't something it expected to consume. You probably won't get an error when you insert that document. So you should be prepared to handle such things when you consume them. So the early versions of MongoDB, again, sort of before my involvement, had what I like to think of as an opinionated take on these trade-offs. So if you have replication, right? Like, why care about single-node durability? I've replicated my data. It's already over on this other machine. I've got failover. My application knows how to deal with it, presumably. And if the node dies, that's fine, right? 
So why would I actually ever care about single-node durability? So it didn't implement it. Similar view of writes. If you have an intelligent application, right? Because we've already assumed you're putting intelligence into the application, why would you need to check in on the state of every write you do? You might not care. Maybe you're consuming clickstream data, and if you lose a click because of a network hiccup, it's really okay. Or maybe you're not, and you actually do need to understand whether your data was written. In the early days of MongoDB, the default was that writes were unacknowledged. But you could always ask for an acknowledgement of a write to understand whether it was successfully written. And you could actually specify criteria in terms of how it had been replicated across the cluster. And finally, if you have horizontal scalability, right? If you have lots of machines to which you're sending data, why would you worry about locking on a single node? Presumably you've deployed a huge pile of machines, right? Like the early-days Google architecture: I've got lots of single-core boxes, and I don't really care about concurrency. So if I'm only doing one write at a time, I only have one core to do it on. So if I have horizontal scalability, a per-process database lock actually sounds fine. And there are other trade-offs like this that were made in the early versions of MongoDB. So over time, MongoDB's take on these trade-offs has not gone away, right? Like, it still represents the same point in design space. But we've gotten maybe less opinionated about how we express these design choices. So we now have single-node durability. That's actually been in there for some time. Yes? Is that the journal feature? Yes, it is. Can you switch it on or off? You can switch it on or off. It is on by default in all modern versions, but you can turn it off. 
I think the flag is just called --nojournal when you start up MongoDB. You should think carefully before doing that. Writes are now acknowledged by default. An interesting thing about this: if you issue a write now, by default you will get a status for it. This is really more of a drivers issue, changing the default for this. And we now offer database-level locking, not server-level locking. So if you have multiple databases inhabiting the same server instance, they don't compete for the same lock. You still have a global lock per database, and we have many things that we actually do to mitigate the consequences of that. People hear "database-level locking" and they assume that it's just a big mutex at the top. That's not strictly accurate. I don't really want to get too much into the locking details because they're actually changing a fair amount right now. So we sort of migrated away from this opinionated view of these trade-offs to safety by default. But if you understand the consequences, and you want the performance, or to go deeper into that part of the trade-off space, we give you the ability to make those choices. Yes, sorry? Could you elaborate a little more about what exactly a write acknowledgement is? Is it similar to, sort of, the Cassandra write consistency? I'm not super familiar with Cassandra, but the basic idea is that if I issue a write to the database, after I've sent the write, I can issue a request to block until it has reached a certain number of nodes. I can specify that I want to block until it's replicated to a majority of nodes in the set, and there are some other things you can request. So you can choose one, majority, or all? Yeah, or unacknowledged. Or unacknowledged. So that's pretty much like Cassandra. Yeah, I think so. So there are some interesting questions about this. One thing that we're looking at is document-level locking.
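The write-acknowledgement choices just described — unacknowledged, acknowledged by one node, acknowledged by a majority — boil down to a simple predicate over how many nodes have confirmed the write. A minimal, hypothetical sketch of that decision logic (the `WriteConcern` struct and `isSatisfied` function are invented for illustration, not MongoDB's real types):

```cpp
#include <cassert>

// Hypothetical model of a write concern: w is the number of
// acknowledgements required (0 = unacknowledged), and majority
// overrides w to require more than half the replica set.
struct WriteConcern {
    int w;
    bool majority;
};

// Has this write gathered enough acknowledgements to satisfy the concern?
bool isSatisfied(const WriteConcern& wc, int acks, int replicaSetSize) {
    if (wc.majority)
        return acks > replicaSetSize / 2;
    return acks >= wc.w;
}
```

With a 3-node set: `w = 0` is satisfied immediately, `w = 1` after the primary acknowledges, and `majority` once 2 of the 3 nodes report the write.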
This has been a popular request, because if you think about it, maybe you're not in the model where you have hundreds of inexpensive machines. Maybe I actually do have a couple of big-iron machines, and I've got 128 hardware threads, and I really want to exercise this machine. Moving to something that gives you document-level locking would allow you to do that, and you don't have to be in this model of "I need lots of physical machines." There's been talk about schemas, right? I haven't really thought about this too much, but it gets to the idea that you were talking about a little bit — I'm sorry, I forgot your name. Sorry, Dave. It's an interesting idea, right? If you could actually prove facts about objects in your collection. So anyway, we've moved to a less opinionated, more flexible model, where you can get those advantages if you want them, but the defaults are more reasonable. Overall, the point that I wanted to get across is that MongoDB is not something that people just sat down and wrote one day. It was a set of design choices that were informed by its history. There's a particular set of design trade-offs embedded in the database. We originally started with a very opinionated presentation of those trade-offs and have moved towards a safer one, but they're still there if you want them. So the next part of this talk — I think we have about 20 minutes left, actually a little more — is sort of choose-your-own-adventure. I have several topics that I have slides for that might be interesting. In particular, I've got some slides on design, indexing and query, replication, horizontal scaling, and aggregation. I think we should try and cover one or two of those. I think replication is usually the one that people find more interesting. I'm not so keen on doing aggregation or schema. So maybe replication and sharding. Yeah. Yeah, okay. Yeah, it's the cool stuff.
All right, so ignore the fast-forward here. Okay, so we'll talk a little bit about replication and replica sets. We're gonna be playing with this diagram for a good bit. This represents three physical machines — well, maybe three logical machines, let's say. And we wanna set up a replica set between them: we want data that we write to one of them to become available on the other ones. So we set up replication. It's pretty easy to do. There are some very good tutorials if you look through our documentation, and the MongoDB books from O'Reilly also have a pretty good walkthrough of how to set up a replica set from scratch. So let's assume that we've done that and talk about what the mechanics here are. All of these nodes are communicating via heartbeats — they're all pinging each other to see whether the others are there. One of these nodes is designated as the primary. We don't do multi-master: there's only one machine in the replica set that accepts writes at any one time. The other two machines are secondaries. This diagram is actually a little bit misleading in that it makes it look like data is pushed from the primary to the secondaries. That is sort of conceptually what is happening; in reality, the secondaries are pulling data, in a sense. But it doesn't matter — conceptually, the data is flowing from the primary to the secondaries as writes come in. So you can imagine that our primary dies. The primary goes down, has a hardware fault, it's no longer functioning. Now the remaining two nodes — the two secondaries — have to decide amongst themselves who should become the new primary, because we want the cluster to become available again so it can start accepting writes. At this point, we've had a successful failover: node two has now been elected as the primary. And the criteria for how nodes decide who should be primary and who should not are fairly complicated.
But it basically involves who's believed to be most up to date. There are several situations in which nodes can decide that some other node is not eligible. So the election process is interesting. And then node three eventually comes back online, and that is a recovery phase where it now starts replicating from the new primary. So now nodes one and three are up again, node two is available to accept writes, and we've failed over, with one and three becoming secondaries. That's the basic mechanism of replication and failover. So as I mentioned, only one node in the set is able to accept writes — always the primary. So if you are reading from the primary, you will read the data that you wrote. We are strongly consistent in that sense. However, there might be situations in which you don't actually need that level of consistency. It might be okay for you to read data slightly out of date — or even considerably out of date, depending on what you're doing. In that case, your client application can make the decision that it's acceptable to read from secondaries. And this gives you an idea of how that works: your writes still flow to the primary and are replicated to your secondaries, while your clients are directly connected to the secondaries and performing reads against them. This is actually a pretty useful feature in some cases. So we can look at how this works more practically, in terms of organizing your machines for a replica set. If we just have one data center, we're pretty vulnerable. Top-of-rack switch failure, power, network — the whole thing can go down, and then your replica set is gone. So really, you want to move at least one of your members to a physically isolated data center. This will get you some of the way there. But really what you want is probably five nodes. That way, you can guarantee writes to two data centers.
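The "most up to date" criterion above can be pictured as a toy selection function: among the nodes that are reachable and eligible, prefer the one with the most recent replication optime. The types and field names here are invented for illustration — real MongoDB elections also involve priorities, votes, and veto rules:

```cpp
#include <cstdint>
#include <vector>

// Invented, simplified view of a replica set member for illustration.
struct Node {
    int id;
    std::int64_t optime;  // position in the replication log (higher = newer)
    bool reachable;       // responding to heartbeats
    bool eligible;        // e.g. not hidden, nonzero priority
};

// Pick the reachable, eligible node with the highest optime;
// return -1 if no node qualifies.
int pickCandidate(const std::vector<Node>& nodes) {
    const Node* best = nullptr;
    for (const Node& n : nodes) {
        if (!n.reachable || !n.eligible) continue;
        if (best == nullptr || n.optime > best->optime) best = &n;
    }
    return best ? best->id : -1;
}
```

In the failover scenario above, node one (the dead primary) is unreachable, so the surviving secondary with the newer optime wins.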
There's a facility in the way writes are managed where you can tag nodes and specify which tags a write must reach. This gets into the thing we talked about earlier, about how you can specify your writes. You can issue a write and say, I want this to go to machines tagged in the following way, and use those tags to guarantee that you get multi-data-center resiliency for your writes. So that's a quick overview of replication. Before we move on to sharding, I should give a chance for questions. I'm sure there are some. Yeah — is there similar behavior for reads? If you have multiple copies of a value replicated throughout your data store and you ask to read that value, can you specify sort of a read consistency level? So the technical terms are write concern and read preference, and you can specify a read preference that allows reads from secondaries. So you can say, give me any value of this variable, I don't care if it's consistent? When you're reading from secondaries, you're definitely not consistent. You're definitely not consistent? Well, okay — you are not guaranteed consistency. It's possible that a reader will see the writes it did, if the timing works out. But yeah, there's no guarantee of that. And in fact, you can do some interesting things: you can create delayed secondaries — though presumably you wouldn't want to read from those, so you'd probably want to make them invisible. You can also have hidden nodes in your network for various reasons, like you want to take a backup or a snapshot, or you want to have a delayed copy. Say you do something terrible, like drop a collection — that can be pretty difficult to recover from unless you have a delayed secondary, where that operation may not have replicated yet. Or maybe you've corrupted a whole bunch of documents somehow.
So there are a lot of interesting aspects to replication that I'm glossing over in the interest of time, because I want to get across a feel for what this database offers rather than do a deep dive on any one piece of technology. But there is a lot of information out there about how replication works, and I really encourage you to check out the books, because they do a pretty good job of explaining all of these aspects in more detail. So let's talk a little bit about the other side of distributed databases. There's this tradition of vertical scalability: my database is growing larger, so I need a bigger machine. And machines get expensive really fast. Back in the day, I remember going to the Sun Microsystems website and, just for fun, pricing out one of their 128-core SPARC machines along with storage, and it was several million dollars. And people would buy those sorts of systems to run databases on for exactly this sort of reason: they needed a huge amount of RAM to store the working set for their database. Horizontal scalability is the opposite approach, which says: I don't want a single machine, I don't want a piece of big iron. I want to be able to add new machines to an existing system and scale it by just turning up new EC2 instances or turning up new physical machines in my data center. And it again comes back to this idea of the working set. Every database that is based on the idea of stored documents and indexes over those documents has a working set of data that includes both the documents and the indexes. As that working set expands, you're great as long as it all fits in RAM: things will be fast, things will update quickly, and the operating system — or your database's own block storage subsystem, if it offers one — will do a great job of keeping data in the right place.
MongoDB's approach to keeping the working set in memory is to leave that responsibility more or less entirely to the kernel, with the current storage engine that ships in MongoDB 2.6. We simply memory-map the data files, read and write to them, and let the operating system worry about which pages should be in memory when. But at some point, if your database keeps growing, your working set is going to exceed your physical memory, and at that point you start thrashing. And the idea is that if you're thrashing and you're doing big iron, now you need bigger iron. If you're thrashing and you're doing horizontal scalability, now you go turn on new machines. One is much easier than the other. There's also the issue that you might be IO bound, not memory bound. Again, if you can stripe your data across multiple machines that have independent IO subsystems, you can achieve better IO throughput that way. So the idea is that you have a key space associated with your collection. You can define what's known as a shard key, which says: there's an element in my documents that allows me to decide, on the basis of each document, which of the machines that make up my sharded environment this document should live on. So you can imagine a range that goes from minus infinity to plus infinity, and each chunk owns a segment of that line. So when we start distributing data, it's distributed in chunks across these machines. Here we have two shards, and we've got three chunks that live on the first shard and a fourth chunk that lives on the second shard. This is kind of unbalanced. MongoDB will detect when shards become unbalanced and automatically rebalance data across them. So let's talk a little bit about how queries, updates, and so forth work in a sharded environment. This shows what's called mongos. Mongos is the MongoDB shard router. It's a daemon process, much like mongod, which is the database server process.
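The chunk picture above can be sketched as a routing table: an ordered map from each chunk's lower bound to the shard that owns it, where a lookup finds the greatest lower bound at or below a document's key. The shard names and the use of a plain integer key are illustrative assumptions, not MongoDB's real metadata format:

```cpp
#include <climits>
#include <map>
#include <string>

// Routing table: chunk lower bound -> owning shard. The first entry's
// lower bound is LONG_MIN, standing in for "minus infinity".
using ChunkMap = std::map<long, std::string>;

// Find the chunk whose half-open range [lower, nextLower) contains key,
// and return the shard that owns it.
std::string routeToShard(const ChunkMap& chunks, long key) {
    auto it = chunks.upper_bound(key);  // first chunk starting after key
    --it;                               // step back to the owning chunk
    return it->second;
}
```

Conceptually, this is the lookup the shard router performs for every targeted operation, against metadata it caches from the config servers.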
When you want to run in a sharded environment, your application no longer talks directly to the actual mongod servers. It now talks to the mongos server. The mongos server is responsible for understanding which chunks live on which shards and routing requests as appropriate. So this shows sort of the lifetime of a read or a write — in this case, a read. The query is issued to mongos by the application. Mongos, since it's responsible for knowing where data lives across these nodes, is able to route it to the appropriate shard, and then the data comes back to mongos, which returns it to the application. So essentially, because of auto-balancing, once you've turned on sharding, you're pretty much good: you enable sharding, then you can turn it on for a collection, and it will automatically partition and balance your data across the nodes. Now, a shard here is shown as a single node. It can be either a standalone mongod instance, or it can be a replica set, from our previous discussion. In reality, these are almost always replica sets. You would lose one of your shards if that machine were to go down, and if you bothered to set all this up, you probably don't want that to happen. So in most deployments, I would expect you to find that every shard is in fact a replica set. The metadata about the topology of the cluster and where data lives is stored on the so-called config servers. You can get away with having one of these if you're experimenting, but for a production environment, you should not do that. And it's not a replica set — it uses a different commit protocol. Config servers are an interestingly different beast, but you can somewhat ignore them in daily life if things are working well. So mongos, as I mentioned, acts as a router and balancer. It doesn't have any local data. It actually uses the config database on these config servers to manage the metadata.
So you can have one of these or many. It doesn't represent a single point of failure in your stack: every app server could potentially have its own mongos running, co-located with it on the machine. So this is what the whole picture looks like. You have your app servers, each with an associated mongos. This relationship could be many app servers sharing a mongos, or it could be one-to-one — it's really up to you. You have the config servers, which mongos is using with its two-phase commit protocol to ensure that it understands the organization of the data among the shards and understands when things like rebalances are necessary. And then you have the data that's actually stored in each shard. So there are a couple of different modes that a query can take. If I issue a query that I can identify as targeting exactly one shard, I only need to actually send it to that shard, because I know the data lives there. So I'm basically done: the shard returns the result to mongos, and we get that data back in our application. It's a straightforward process. But it may be that I issue a query that isn't targeted — the data might live on several of the shards in my database. In that case, we actually have to fan out the request to all the shards in the database. Mongos will take care of that for us; we don't have to go and iteratively communicate with them ourselves. And then mongos accumulates all of the results that are returned from the different mongods for the shards and returns a coherent result to the original client. There's also sorting that can be done. You might have a sort associated with your query. The sort is performed locally on the shards, and the shards return those sorted results to mongos. Since they're already sorted, it's actually a fairly simple merge sort to turn these into what you really want, which is a sorted view over the data in all of the shards.
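Since each shard returns its portion already sorted, the merge described above can be sketched as a standard k-way merge with a min-heap. Ints stand in for documents here; this illustrates the idea rather than the real mongos code:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// K-way merge of per-shard result batches, each already sorted ascending.
std::vector<int> mergeShardResults(const std::vector<std::vector<int>>& perShard) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;  // value, shard, offset
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Seed the heap with the first element from every non-empty shard batch.
    for (std::size_t s = 0; s < perShard.size(); ++s)
        if (!perShard[s].empty()) heap.emplace(perShard[s][0], s, 0);

    std::vector<int> merged;
    while (!heap.empty()) {
        int value;
        std::size_t shard, offset;
        std::tie(value, shard, offset) = heap.top();
        heap.pop();
        merged.push_back(value);
        // Advance within the shard batch the smallest element came from.
        if (offset + 1 < perShard[shard].size())
            heap.emplace(perShard[shard][offset + 1], shard, offset + 1);
    }
    return merged;
}
```

Each element costs O(log k) for k shards, so the router does far less work than re-sorting everything from scratch.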
And that's returned to the application. So we should talk a little about shard keys, because shard keys are very important. If you are setting up sharding, you must pick a shard key, which is a field in your documents that you are going to use to decide which shard each document will live on. The shard key becomes immutable: once you've decided that something is the shard key, those fields may no longer be altered in the document, and the database will actively stop you from doing so. Actually, implementing shard key immutability was an interesting technical challenge in its own right. Why? It's expensive, right? If I have a shard key and you send me an update, that update may not have affected anything way down in the depths of the document, but I still have to do the work to go validate that you didn't actually touch parts of the document that you're not allowed to, and that requires that I go look at other parts of the document. You'd hope that when you update a document, you only have to look at tiny little parts of it. But having a shard key is this odd side case where you have to go re-examine it and say: okay, what did I change? Did I change anything in that way? There are similar restrictions on manipulating the _id field. There are also limits on the size of the shard key, so it can't be huge. There are many, many things to consider when setting up sharding. The cardinality of your shard key is important: you wanna make sure that it has lots of different values that can be used to distribute things in good ways. You wanna understand where writes are gonna be going. You wanna understand what the impact on your queries will be. So if you expect you will be setting up sharding, it is important to do the upfront work to reason about the shard keys and about how you want your cluster to behave.
It's very easy to get in trouble with sharding by picking a poor shard key. Yes? When the data gets rebalanced across shard key ranges, how do you tell this to the application? They shouldn't know. I believe that applications are isolated from knowledge about rebalancing. During the rebalance, I think the data exists in both places while it's migrated, but I'm not up on the details of how balancing and rebalancing work in practice. I'm pretty sure, though, that applications do not need to be aware of it. It's handled transparently. Okay, so the application just has the responsibility to choose an attribute as the shard key? The designer of the cluster has that responsibility, yeah. Okay, and then it's MongoDB's choice to place a record anywhere? Correct — among the shards that are available, yeah. Okay, so the application does not specify, I want to have records within a certain range on a certain shard? No, it does not. It's intended to be as transparent as possible. The application should be agnostic about the details of how the sharding infrastructure works. A lot of what mongos's responsibility really comes down to is hiding the details of the sharding from the application: taking care of routing, dealing with rebalancing, coordinating with the config servers to maintain an understanding of where data is currently living. So mongos has a lot of responsibilities. It does. And also, you know, what is to prevent the application from changing the shard key? As you said, you shouldn't change it. Well, you can't change the shard key. You also can't change the shard key's values, which is an interesting thing: once you are using certain values in your documents as the shard key, you're no longer allowed to issue a write to mutate those values. Yeah, I understand. Because then, where is it, right? The document was living in one shard, you changed the key, and now it's an alien.
It's in the wrong place. So you have to prevent that, and the database will stop you from doing it. If you attempt to do so, we will issue an error. There were some edge cases where it was difficult to do that, but we have resolved those kinds of things. So it's pretty good — it will tell you if you attempt something you shouldn't. Yeah. Overall, I feel MongoDB shifts a lot of responsibility to the application developer. It definitely does. For example, if you traverse a secondary index, you are not guaranteed to turn up all the documents — suppose some document is missing that attribute. Yeah, and a lot of the consistency questions too: there are places where strong consistency is not achievable, right? It's in conflict with other goals in the database. And the application is absolutely responsible for being aware of that. As I said, if you go back to the UnicornDB slide, everyone loves writing applications against it, because it all just works perfectly. But we believe that applications are already complicated, and they're used to dealing with the real world, and asking them to deal with this too doesn't seem to be too much in practice. No, I agree with you. I actually like the database records being expressed in JSON. It's just much more flexible. It is. You can evolve. It is way more flexible. And avoiding having to take down your machines and do schema migrations, especially for people who are iterating rapidly on an application — it changes your style of development in a really pleasant way. Technology that does that is unusual, maybe. I think we're getting close on time. I want to skip ahead over aggregation, since I said I didn't want to talk about it. The aggregation framework is really interesting and useful. If you're using MongoDB, I definitely encourage you to play with it.
It's one of the more powerful facilities for writing interesting applications against the data in your database. So we'll go over these really quickly. I'd hoped to talk about this more, but we are running out of time. I want to talk about some of the things that I've been working on recently that are relevant to actually writing a database that you can run in production and believe in. Any code base has interesting problems. There's the old line: in theory, there's no difference between theory and practice. I've modified this to: in practice, you are doing terrible things with pointers — you probably just don't know it yet. Sometimes whole subsystems need to be rewritten. We've done this more than once. How do you do that safely? How do you take a system that has been developed and evolved and is working correctly, and rebuild it to do the same thing, in a way where you actually believe you've gotten the same results? And the ecosystem in which you write your software is evolving all around you: languages, coding standards, tooling — all these things are changing. And it's my belief that if your code base doesn't keep up with those changes, it eventually stagnates, and nobody wants to work in that code. So there are interesting tools and technologies emerging for how to write applications, and I think some of them are particularly relevant to writing a database. So here's an example of evil. Can anybody tell me what's wrong with this code? You're casting a char pointer into a value? I'm taking a buffer, I'm jumping ahead some offset, I'm casting that to a pointer to double, and I'm dereferencing that to get a double. Where do you want us to start? Anywhere you like. Alignment, and the representation of floats in a machine-independent way. Yeah, exactly. So first of all, it's just plain undefined behavior. The compiler is legitimately allowed to do whatever it wants with this line of code. It could just decide: I'm not gonna compile that today.
You can't grep for this — you've used a C-style cast. Even if you wanted to do this for some reason, you could use reinterpret_cast, and then I could at least find it. But you've done it with a C-style cast, which is just awful, right? You mentioned unaligned reads. Unaligned reads of a double will work fine on x86. As soon as you try it on SPARC, you're gonna get an immediate signal — process is just terminated. And as you said, it might not be IEEE 754. I think I got that right, 754; it was early when I wrote this slide. How do you know the machine actually uses that encoding for doubles? You pretty much don't, right? And we didn't even bring up endianness, which is yet another wrinkle. So how do you fix these kinds of errors? Because these actually happen, right? People have written such code, and that code runs. You can't just wait for it to crash, and you can't go through the entire code base looking at things one by one. But Clang's undefined behavior sanitizer will flag this, and we can grep for reinterpret_cast, as we already discussed. So you need to convert code of this kind. Think about how this database was written. We have all these memory-mapped files, and we have this encoding called BSON. And BSON reads and writes integers and doubles and other things into and out of these buffers all the time as part of its daily operation. It assumed x86, and it assumed that you were fine with unaligned attempts to read and write through reinterpret_cast'ed pointers. We no longer want to do that, so we're undertaking an effort to actually go and fix all of these. That's been an interesting project. We're actually trying to make MongoDB endian-neutral, alignment-safe, and aliasing-safe, so that we can turn on things like the undefined behavior sanitizer in Clang and identify the real undefined behavior — the interesting undefined behavior, or more importantly, novel undefined behavior that gets introduced.
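The usual safe replacement for the reinterpret_cast-and-dereference pattern is to memcpy the bytes into a properly typed local. That has defined behavior and no alignment problem — endianness still has to be handled separately. A minimal sketch (the function name is invented for illustration):

```cpp
#include <cstddef>
#include <cstring>

// Read a double from an arbitrary (possibly unaligned) buffer offset.
// Unlike *reinterpret_cast<const double*>(buffer + offset), this is
// well-defined C++; compilers typically lower it to a single load on
// platforms where that is safe.
double readDoubleAt(const char* buffer, std::size_t offset) {
    double out;
    std::memcpy(&out, buffer + offset, sizeof(out));
    return out;
}
```

This is also greppable and sanitizer-clean, which matters for the "make novel errors stand out" goal described above.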
It's very easy to write code that looks like it's just fine, but in edge cases, compilers are allowed to do things that you might not expect. For instance, comparing for ordering two pointers into pieces of memory that are not the same aggregate is undefined behavior. Compilers are allowed to just assume that they point into the same thing and not even do the check. So we would very much like to use the undefined behavior sanitizer. Has anybody used any of the Clang sanitizers? AddressSanitizer? If you haven't, and you're writing C++, I definitely recommend trying them. They will find interesting problems with your code. We have found interesting problems with our code by running them. And we would like to be able to run them automatically every day. But we can only do that in a meaningful way if we fix all the existing errors, so that novel errors stand out. These kinds of projects sound like they wouldn't be that hard, but they actually require a lot of effort from a lot of people. The build team needs to be involved, and the toolchain people need to get the right things in place. Yes? If you're using Java to access MongoDB, which library would you recommend? In terms of writing an application in Java to communicate with MongoDB, I'd recommend using the official MongoDB Java driver. There are probably some other interesting ones, but the official one is the official one — I think it's probably the best place to start. Am I about out of time? Should I try and wrap up? Yeah, I think that would be good. I wanted to talk some about how to do refactorings as well, but I think we'll have to stop there. So, anyway, any questions to wrap up? Yes? I have two questions. The first: you talked about migrating to C++11?
On the same track — I'm just interested in what particular features you're interested in that will help you. Oh, sure. Oh — second question before I forget. Yes. You were on the kernel team. So, what do you see in the kernel for MongoDB? Okay, I'll do the second question first. The second question is: what lives in the kernel for MongoDB? And the answer is nothing. It's probably a misnomer, but the kernel team is the database server team. So, not the Linux kernel, for instance. Oh, okay. I was just wondering. Yeah, no. So the first question, about C++11. I'm actually really excited about C++11. I think that while it's a much more complicated language, the majority of that complexity is of interest to library authors — people who are writing containers or resource management classes. Once you have a good environment of those resource management classes, many of which are already provided by the standard library, you can start writing code that operates in terms of value types. You don't need to write your own copy constructor or your own assignment operator anymore. You end up, as a user of the language and as a user of the libraries, writing much simpler code. And the second thing that I like about it is the ability to express things that were difficult to express before, like ownership. If I have a function that takes ownership of a pointer, in C++03 my options are essentially to pass a raw pointer and hope for the best. In C++11, I can use unique_ptr as a return value, or as a value parameter to a function, to express capture of ownership or handing-off of ownership. And if you use resource management classes like that consistently throughout the code base, you eliminate a broad space of programming errors — of a type that we've seen come up in our code base, in practice, repeatedly.
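The ownership point can be made concrete: taking a std::unique_ptr by value says "this function consumes the object", and returning one says "the caller now owns it". A small sketch with invented names:

```cpp
#include <memory>
#include <utility>

struct Resource {
    int value = 42;
};

// Returning unique_ptr hands ownership to the caller.
std::unique_ptr<Resource> makeResource() {
    return std::unique_ptr<Resource>(new Resource());
}

// Taking unique_ptr by value captures ownership: the Resource is
// destroyed when r goes out of scope at the end of this function.
int consumeResource(std::unique_ptr<Resource> r) {
    return r->value;
}
```

In C++03 both signatures would have been raw `Resource*`, with the ownership transfer living only in documentation; here the compiler enforces it — the caller must `std::move` the pointer in, and afterwards their pointer is empty.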
And there are also just a lot of nice things about it. I mean, who really wants to type std::vector&lt;std::string&gt;::const_iterator i = expression, when you could just type auto i = expression and get on with your life? So, I'm actually very excited about it. We're hoping that in our next release we'll be able to start capitalizing on some of it — so not the release that's coming up this year, but the one after that. Excellent. This is great. So, let's give Andrew a round of applause and hand it back to him. Thank you.