And his talk, a deep dive into the PyMongo MongoDB driver, is going to start now. Thank you.

Thank you. So, a quick poll so I know how many slides I can skip: who here is new to PyMongo or MongoDB? Okay, we'll do a little bit of introduction. Those experts amongst us who have already been through replica sets and sharding will just have to bear with me for a couple of minutes.

My name is John Willis. I run the developer advocacy organization for MongoDB in Europe, so my job is to get you to start using MongoDB and to stick with it. Those other databases? They're no good. MongoDB is going to rule the world.

We've been doing this for about nine years, so it's a pretty mature database, and it was designed from the outset to take away the pain of programming. Anybody who has ever built a program on top of an SQL database knows the pain of object-relational mapping and the horror of taking beautifully written object-oriented or functional code and casting it into SQL statements. So from the outset MongoDB was designed with a set of native drivers in mind. Drivers are essentially the client APIs that you use to interact with the database. And it was purpose-built so that you barely need to know about JavaScript Object Notation, which is the coin of the realm for the database. That's what we store in MongoDB: not Word documents and PDFs, but JSON documents. When you interact with a driver like the PyMongo driver, the Java driver or the Node.js driver, you're using the types and objects of the language you program in. So in Python you're using dicts and tuples and lists and datetimes and all the objects you know and love. Although on the wire it's all JSON — actually a binary version called BSON, which I'll talk about — you don't need to worry about any of that. We handle all of that complexity for you.

Beneath the driver there's the document model, which is JSON-based, and underneath that you effectively go straight to one of the storage engines. The default operating mode for somebody downloading MongoDB for the first time is to run up a single node on their laptop, and in that instance you're running directly on what we call the WiredTiger storage engine. It's the default storage engine in version 3.2 of MongoDB, designed to be highly performant and to run very efficiently on clusters of high-memory instances with lots of cores. That's the design center. We also have a legacy storage engine called MMAPv1, which we used up to version 2.6 of MongoDB; if you're running legacy applications built on 2.6 or earlier, you can start off on MMAPv1 before you do the conversion. We also have an in-memory storage engine in beta at the moment, which will be generally available in 3.4, coming out at the end of this year. And we have an encrypted storage engine, so if you're storing credit card information, e-commerce data, medical data, or anything you want secured at multiple levels, you can put it in the encrypted storage engine.

But the key thing is that these layers are insulated from you. They're operational decisions. The driver doesn't care about them; it works independently of all of those layers. So the great thing about MongoDB is you can start running a single node, then move to a replica set or a sharded cluster, change storage engines, move to an encrypted storage engine.
You don't need to change a single line of code, and that's the real utility of MongoDB: you separate the concerns of operational deployment from development. We have a range of security and management frameworks and technologies that wrap around this for production deployments. I'm not going to talk about any of that today, but I can point anybody who's interested at lots of information at the end of this talk. And we're doing lots more talks about MongoDB at the MongoDB Europe conference on November the 15th in London. I do recommend you sign up for that, and I've got a discount code for anybody who's interested as well.

The drivers support just about any programming language you could conceivably use in the modern world. Obviously Python, since we're here today, but also PHP, C++, C, .NET, Node.js, Java, Ruby, and a lot of the frameworks. So we're ubiquitous. And we're no longer in those monocultures where one programming language rules all; a modern, distributed, mature development organization uses the most appropriate language for each time and place, so being able to support all of these languages is important. One of the key tenets of the driver philosophy at MongoDB is that drivers should have effectively the same semantics, so that when you move from Java to Python to Ruby you don't have to radically relearn what you're doing. A lot of the semantics we'll talk about today are cross-driver in their application, although I'm going to talk specifically about PyMongo.

Now, underneath the covers we don't use plain JSON. If you were putting JSON text into the database it would be horrendously inefficient; you'd have to reparse it every time. So BSON is essentially a binary encoding of JSON. It's a public spec that's freely available, and BSON adds two things we don't get from JSON: type information, so I can tell ahead of time that this is an int or a document or an array or a null or a datetime; and length, so I can skip objects efficiently. I can run down a collection very quickly because I can tell there are this many objects of this size, skip to the end, seek into it. Every driver uses BSON, so every driver ships a BSON library inside its driver code. And BSON is usable whether you use MongoDB or not; lots of people take the BSON library out and just use it as an encoding mechanism for JSON. In fact, there's a company in Germany doing exactly that at the moment. If you want an idea of what it's like, it's like Google protocol buffers or Thrift, or, if you're as ancient as I am, Abstract Syntax Notation One, ASN.1. It's essentially a binary encoding of text data.
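As an aside, the bson package that ships with PyMongo can be used on its own for exactly this. A minimal sketch, assuming a reasonably recent PyMongo 3-era install; the helper names have shifted slightly between versions, so treat this as illustrative rather than definitive:

    import datetime
    from bson import BSON  # ships inside the PyMongo distribution

    # Encode a plain Python dict, including typed values, to BSON bytes.
    doc = {"name": "EuroPython", "year": 2016, "when": datetime.datetime.utcnow()}
    data = BSON.encode(doc)       # bytes, with type and length information baked in
    print(len(data))              # the length prefix is part of the format

    # Decode it back into a Python dict; types round-trip (int, datetime, ...).
    print(BSON(data).decode())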
So there are three standard topologies, or layouts of servers, that you're going to see in MongoDB, and it's important to understand them for what follows, because the way we interact with these clusters shapes how the driver helps us. The standard one, the one we start off with when we download MongoDB and pull in a driver, is: I run some application code which links in the driver, and I connect to a single server. I've done this on my laptop a million times; it just runs a single MongoDB process pointing at a data directory, generally /data/db. The data files live there; great for development, not recommended for production or anything other than development, but very simple to get up and running.

The next, and the most common, deployment for MongoDB is as a replica set. A replica set is the way we address durability concerns over the data: it lets us say how important this data is, and therefore how many copies we want to keep and how they're distributed across a particular cluster of systems. If you look at our MongoDB Atlas service, which is our database in the cloud, we can mandate that each replica set member is in a separate AWS region. That means we're isolated in terms of network, power supply and geography, so we can guarantee that even if a region goes down the replica set will survive.

Replica sets are designed to give us data durability, and they're constructed as a primary — the node at the top here, the only node that accepts writes; we can't write to any node in a replica set except the primary — plus a set of secondaries. The canonical replica set is a three-node replica set, which gives us fault tolerance for any single node failing, but you can have up to 50 nodes in a replica set. When we write to the primary, the primary's job, and the replica set's job as a whole, is to ensure that those writes are replicated to the secondaries. And we can specify something called a write concern that tells the cluster how important those writes are to us. If I'm writing log data, I might set the write concern to just acknowledged, which essentially says I get a round trip to the server and everything's hunky-dory. If, on the other hand, I'm writing transaction data for a bank, I need to be sure that every write really lands, so I would specify a write concern of majority, which ensures that a write is only reported as successful if a majority of the nodes accept it. And write concern majority is the way most of our production clusters run.

Now, replica sets fail. If the primary fails, I suddenly have no node I can write to. Well, the replica set is designed to recover from this automatically, and the way it recovers is by holding an election: the remaining nodes agree on the most stable, most up-to-date member and elect that member as the new primary. So that secondary now becomes the primary, we have a stable replica set again, and we can accept writes again. That typically happens in two to three seconds, and it can be faster if you've got a faster network and faster machines. Recovery is when we add the failed node back: it essentially resyncs with the primary and comes back up with a complete copy of the data, and we're back to where we were. Note that the primary has moved, and the whole idea is that the primary can rotate around the replica set at will. We don't really need to care where it is, because the replica set will manage where the primary is, and part of what the driver does is re-establish the connection to the primary when it moves or fails. We'll talk about the protocols it uses to do that in the next couple of slides.
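Before we move on: to make the write concern point concrete, here's a minimal, hedged sketch of how you might ask PyMongo for different write concerns. The option names are standard PyMongo 3, but the database and collection names are just placeholders:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://host1:27017,host2:27017/?replicaSet=replset")

    # Fire-and-mostly-forget log data: acknowledged by the primary only (w=1).
    logs = client.mydb.get_collection("logs", write_concern=WriteConcern(w=1))

    # Bank-style data: the write is only acknowledged once a majority of the
    # replica set members have it.
    accounts = client.mydb.get_collection(
        "accounts", write_concern=WriteConcern(w="majority")
    )

    logs.insert_one({"event": "login"})
    accounts.insert_one({"account": 42, "balance": 100})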
The last cluster environment is the sharded cluster. The sharp-eyed amongst you will have realized that there's a finite limit to how much traffic you can push through a single node in any compute cluster. So primaries give you a cap; essentially a cap on the amount of bandwidth or memory or CPU that a single node can handle. A sharded cluster removes that limit by letting you run a unified cluster in which there are multiple replica sets, which give you multiple primaries. In that scenario you also run some additional daemons called mongos processes, which are shard routers: they route queries, writes, reads and deletes to the appropriate shard based on a shard key that you define. We're not going to talk much more about sharded clusters today, although you should be aware that very similar driver code to what we'll discuss is used inside the mongos to establish and manage connections to the shards. The key thing to remember is that your driver code doesn't change. Start on a single node: no change. Move to a replica set: no change. Move to a sharded cluster: no change. There are some code changes we suggest you make, and we'll talk about those, to handle failures that happen at the replica set level.

So the driver does a bunch of things. It's not just a passive API, a library that you load; more often than not, in drivers like PyMongo and the Java driver, it's a multi-threaded client that starts additional threads. It handles authentication and security. It does the Python-to-BSON mapping I talked about. It handles error handling and recovery for when we see failures. It manages the wire protocol: how we bundle BSON objects onto a socket, send them to the server and handle the responses. It maintains a connection pool, so we can manage a set of connections over a set of servers without needing one connection per client. And finally it does topology management, and that's a very specific choice of words, because there's a specification, which I'll show you at the end, that defines what topology means from a driver's perspective: essentially, the driver's understanding of what the cluster looks like. A lot of the rest of this talk is about how the driver gathers topology, understands it, uses it to let the client interact, and recovers when the topology changes. If you look inside the PyMongo code there is a Topology object that encapsulates all of this, so when you start reading the spec — it's called Server Discovery and Monitoring, and there's a link at the end — I would keep the PyMongo code open alongside it in your favorite editor, because it makes it real to see how this stuff is implemented. And I have to say the PyMongo code is some of the easiest code to read I've ever come across; they've put a lot of hard work into making it a manageable, easy-to-understand project.

I'll skip over this, since most people here seem to know MongoDB, but this is just the standard set of Python calls we might make: we call MongoClient and give it a host and a port — this is for a standalone instance, so we're connecting to localhost — we create a database and a collection, and then we insert, we find, we update and we delete. They're all JSON documents on the wire, but in PyMongo they are, of course, just Python dicts.

So what happens when we're actually connecting to a replica set? When I want to connect to a replica set I can give the client some number of nodes as a seed list; one node will work. In this instance we're giving it two hosts, and we're giving it the replica set name, 'replset', which is the default name used by mtools, which I used to try out some of this stuff. mtools is a package written by a MongoDB employee and is just a great way of building test clusters very fast and very easily; I do recommend you look at the mtools GitHub repository.
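Pulling that together in code — a hedged sketch rather than a definitive recipe; the host names, ports and the 'replset' name are just the values from this example:

    from pymongo import MongoClient

    # Standalone instance on your laptop.
    client = MongoClient("localhost", 27017)
    coll = client.test.things            # database 'test', collection 'things'

    coll.insert_one({"a": "b"})                        # insert a plain dict
    print(coll.find_one({"a": "b"}))                   # query it back
    coll.update_one({"a": "b"}, {"$set": {"a": "c"}})  # update in place
    coll.delete_one({"a": "c"})                        # and delete it

    # Connecting to a replica set: a seed list of hosts plus the set name.
    rs_client = MongoClient(
        "host1:27017,host2:27017",
        replicaSet="replset",
    )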
So when we ask the client to do this, what happens? The MongoClient call actually returns immediately. If you called MongoClient in PyMongo 2 it would stall, because it established connections to the servers explicitly. In the PyMongo 3 client, which I recommend everybody use if you're not using it already, it returns immediately, because what it's really doing is spinning up some threads in the background. Those are monitoring threads, and their job is to establish the topology of the cluster.

We've given it two hosts, and the way the spec operates, in PyMongo we spin up one monitoring thread per server. We only know about two servers, because we only told MongoClient about two servers, host one and host two, so we start monitor thread one and monitor thread two. What these monitor threads do is send a command called isMaster. Now, isMaster is an anachronistic name for a function that should really be called getStatus: it predates replica sets, from when we only had a master-slave deployment model, but effectively it's a status call. So isMaster is sent to each of these nodes, and the first isMaster response to come back, in this case, arrives at monitor thread two. It says two things. First, ismaster is false — that means this node is not a primary — and, just to drive that home, secondary is true, telling me explicitly that it's a secondary. But it also returns a list of hosts. Remember, when a replica set is constructed it knows about all its members: every member of the replica set is aware of every other member, and stays aware through the heartbeats the replica set maintains.

So monitor thread two gives you a bunch of data. I ran this earlier on, and it actually returns quite a substantial dictionary, but the parts we're interested in for this talk are: the list of hosts (in this case I ran them all on the same box, which I don't recommend in production), ismaster false, secondary true, and the replica set name, 'replset'.

So we take the current topology, which right now is just "I know about two nodes", the seed nodes passed into MongoClient, and we combine it with the new information we've gathered by running isMaster against the first node that replied. That isMaster has told us there are three nodes in this cluster: host one, host two and host three. So right now we've got monitor thread one, which hasn't yet had a response from isMaster, and monitor thread two, which has. We now know for sure that the topology contains one confirmed member, whose state is RSSecondary — there's a state field for each server — and we also know there are three hosts in this cluster, so we spin up a third monitoring thread. Now we have three threads, all monitoring servers, all of which have sent isMaster, and we've had one isMaster reply.
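For reference, this is roughly what that reply looks like if you run the same command by hand; the exact fields vary by server version, so treat the output as indicative:

    from pymongo import MongoClient

    client = MongoClient("host2:27017")          # talk directly to one member
    reply = client.admin.command("ismaster")     # same call the monitors make

    # The handful of fields the topology logic cares about:
    print(reply.get("ismaster"))    # False on a secondary
    print(reply.get("secondary"))   # True on a secondary
    print(reply.get("setName"))     # e.g. 'replset'
    print(reply.get("hosts"))       # the full member list, e.g. three hosts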
And remember, these threads are running in parallel, but with all the constraints of the global interpreter lock and Python's threading model, so they're not exactly lickety-split fast. So monitor thread two has returned; monitor threads one and three are still waiting on isMaster. In the meantime, what's your code doing? Your code called MongoClient and it returned immediately, and if it returns immediately, well, I'm going to do something with the database at some point. So obviously I might call an insert: insert one simple document, {'a': 'b'}, against that same replica set. This insert is going to block. It blocks because I can't write to a replica set unless I have a primary — you can only write to the primary in a replica set; that's one of the constraints of the cluster. So the MongoClient holds that write, because it knows it doesn't yet have the complete state of the replica set; it only knows about one secondary.

Then the isMaster response from host one arrives, and it tells me this is the primary node for the cluster. Now, there's a server selection protocol — it's a whole separate spec, which I'm not going to go through in detail today — that tells the driver how to pick servers for particular operations, but the one thing server selection certainly knows is that writes can only go to a primary. So now that the primary is known, the client doesn't have to wait any longer. It doesn't need to know about the rest of the cluster, because the cluster will manage itself: the client knows that if it can get the write to the primary, the primary will either satisfy the write concern, replicating to a majority of the nodes, or return an error, which the driver can surface to the application. So it can go right ahead: the write goes to the primary host, the primary responds with an OK, the response comes back, and the insert is satisfied. In most setups this whole round trip only takes a couple of hundred milliseconds.

So we have the write succeeding, and a cluster topology with three monitor threads running but only two confirmed members. At some point later, host three responds, and at that point we update the topology again: we now have three monitoring threads with three isMaster responses and a complete view of the topology. This is what we call steady state, the standard operating model for a cluster. From here on, these threads wake up every 10 seconds and run an isMaster; the isMaster says everything's okay, they update the topology and go back to sleep. They may run in parallel, they may drift slightly out of sync, and that's all fine: as long as the inserts keep succeeding, nobody minds.

Unfortunately, life has a nasty habit of intervening. The six saddest words in IT: "we have not experienced a failure". I remember I was one of the first people to use Amazon's S3 service, in 2006, building an online backup system, and everybody who used it said this stuff is never going to go down. And sure enough, for about two and a half years, it didn't go down.
And then it went down, and it took out applications across the whole eastern seaboard when that first region went down. Amazon had told all its developers: you need to be multi-region to be fault tolerant and resilient to the level we want you to be. But everybody had piled their nodes into the default region, US East in Virginia, and the outage took out most of the startups in North America. So "we've never had a failure" — I guarantee you, next week will be your first failure, and then you can never, ever say that again.

So, life intervenes, and we whack the primary host. Now, the monitor threads only wake up every 10 seconds, so you could kill the primary and, in a lightly loaded cluster where there isn't much write activity, nobody might ever notice: the replica set would recover, the 10 seconds would tick by, the monitors would see there's a new primary, the topology would update, and clients would never see it. But of course, in a heavily loaded cluster — and why would you be using MongoDB if you didn't have a heavy write load? The whole point of NoSQL is to handle thousands to millions of writes per second — it's likely that before a monitor thread wakes up, an insert is going to hit the failure. And now what happens is unpredictable: the client might send the request but never receive a response, it might be halfway through sending, it might get the socket closed under it. We don't know; but all of those errors are wrapped up by the driver and surfaced to your code as a connection failure exception. You don't need to worry about sockets or timeouts or any of that networking detail — we do all that for you. All you need to know is that at the point you were making a write, the primary died in some way that interrupted your write.

So what do you do in this situation? You retry. I'll talk about how many times you need to retry in a couple of minutes, but you retry. And what happens when you retry? The primary is down and you retry the insert. The MongoClient now knows the primary is down — it's no longer a surprise — so it holds your write, just as it held the very first write, puts it in a queue, and essentially puts out an all-points bulletin: it wakes up all the monitor threads and says, hey, there's no primary in this cluster, that's a big deal. And instead of waking every 10 seconds, those threads now poll roughly every half a second until a primary returns. Meanwhile, the insert waits. Elections, as I said, happen pretty quickly, in two to three seconds, so within a couple of seconds of putting out that bulletin, with all of these threads calling isMaster, we elect a new primary: in this case, host two. And host two — if you've followed our guidelines — is the same server size, bandwidth, network and memory as host one, so we shouldn't see a drop in performance. We're slightly less fault tolerant, because we're now replicating across two nodes rather than three, but we've still got replication. So now the insert will succeed. Remember, the client's view is: once I have a primary, I'm good to go.
So now I can send the write straight into the server. And then, at some point, the third node recovers: we send somebody down to the data center to plug the network cable back in, we replace the power supply, we install a new node in a different AWS region, we put the fire out in the data center, we take the cup of coffee out of the network connector — and the system recovers, and we're back in steady state. Note that in many cases we won't end up with the same primary, and that's fine: the topology is comfortable with the primary moving around, just as the replica set is. Now, you can change that. You can set a priority, which essentially says: whenever host one is available, I want it to become the primary. What happens then is that you get a second election the minute host one comes back, and every time you have an election you're likely to get a retry error or a connection failure in your code. So we don't really want to provoke lots of elections. If you're managing a cluster — say you're upgrading all of the members to a new version of the operating system — the usual advice is: take out each secondary first, because secondaries don't interrupt the driver's ability to write to the cluster, and take the primary out last, at the time when your traffic is quietest. We all run services now that are global in scope and never turn off, but there's always a window when you have the least traffic, and that's the time to take your primary down. The drivers will handle it — as long as you don't take the primary down for an hour.

So what does this mean for you as a programmer? Well, if you want to establish that you actually have connectivity to a server, calling MongoClient on its own is meaningless. MongoClient will wait 30 to 40 seconds before throwing an error, because it's got those monitor threads waking up every 10 seconds wondering whether a server is available, and getting a client object back gives you no state information about the server at all. What you need to do is run an isMaster yourself, explicitly. You can run client.admin.command — admin is a database that always exists, so you can't accidentally create a new database — and if isMaster returns, you've got a connection; if it throws an exception, you know you still don't have a server available. IsMaster has a couple of other interesting properties: if you've got authenticated servers that require a username and password, isMaster does not need authentication to run, so it's effectively a server ping you can run very cheaply. And it gives the driver round-trip-time information it can use to work out the best server to use for a range of queries. So don't rely on the MongoClient call; use the admin command.
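In PyMongo that check is only a few lines. A hedged sketch of the pattern, not the one true way to do it; the lowered timeout is just for illustration:

    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure

    client = MongoClient("host1:27017,host2:27017", replicaSet="replset",
                         serverSelectionTimeoutMS=5000)  # don't wait the default 30s

    try:
        # isMaster is cheap, needs no authentication, and proves we can reach a server.
        client.admin.command("ismaster")
        print("server is reachable")
    except ConnectionFailure:
        print("no server available")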
What does it mean for queries? Well, find with recovery. Remember what happens when we have a failure: the query fails, and on the retry the driver queues it until it re-establishes connectivity with the cluster. But MongoDB is not a panacea. MongoDB will manage MongoDB, and the cluster will stay up for as long as you want — clusters are enormously stable — but we can't manage your network, your IT team, the rogue individual who walks into your data center and kinks a fibre-channel cable, as happened at Bank of Ireland a couple of years ago, or other external influences. I worked at Digital Equipment Corporation way back in the day, and the number one thing that used to take out the data center — and this was a plant with 5,000 people — was somebody with a JCB digging up the power cable. It happened all the time. For those kinds of outages, your driver is eventually going to time out and tell you something is wrong. If the replica set doesn't come back, it means something bigger is up. So if you get a second connection failure, you need to percolate that up into your application, because that replica set isn't coming back any time soon, or in any time frame that makes a difference to you as a developer. So the guidance for drivers is: we're not going to automatically resolve your issues — you must participate in your own salvation — but you only need to retry once. Those retry loops you see in lots of code, "for i = 1 to 5: retry", don't do that. One retry is enough. If it doesn't work after one retry, it isn't going to work on the fifth attempt either. So it's very important that you do that one retry, and only that one.

What does it mean for inserts? With inserts we have to be a little careful, because an insert might succeed but the response might not come back: we send the BSON document and the insert command to the server, the primary applies it, then we lose the network, and the response never arrives. So we want to make sure we don't do a double insert, and the way to do that is to explicitly add the ObjectId to the document before we do the insert. The client would generate one anyway, but if the document already has an _id, it won't override it. Now when we do the insert, either it succeeds and everything carries on as before, or it fails and we retry — and when we retry, if the insert had actually succeeded the first time, we get a duplicate key error, because _id has a unique index on it and you can't insert two documents with the same _id. So the duplicate key error tells you the insert has already been done; you're good as gold and you can carry on as before.
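Here's a hedged sketch of both patterns — retry a read exactly once, and make the insert safe to retry by fixing its _id up front. Note that the retried insert gets its own try/except so the duplicate key error is actually caught, a detail that comes up again in the Q&A at the end:

    from bson import ObjectId
    from pymongo.errors import ConnectionFailure, DuplicateKeyError

    def find_with_retry(coll, query):
        """Retry a read exactly once after a connection failure."""
        try:
            return list(coll.find(query))
        except ConnectionFailure:
            # Give the driver a chance to find the new primary/secondary, then retry once.
            return list(coll.find(query))

    def insert_with_retry(coll, doc):
        """Make an insert safe to retry by fixing its _id before sending it."""
        doc.setdefault("_id", ObjectId())
        try:
            coll.insert_one(doc)
        except ConnectionFailure:
            try:
                coll.insert_one(doc)          # retry once
            except DuplicateKeyError:
                pass                          # the first attempt actually made it: done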
Now, updates are trickier. Updates change a document in place, and there are three ways to think about this — two of which I always find a little odd, but people are odd and we all have different ways of programming. The first is: it doesn't matter if I undercount. If I call update and it doesn't succeed, so what? Carry on. Maybe we're just counting log entries and missing a couple isn't the end of the world. The second is: it doesn't matter if I overcount, in which case I can simply retry every increment. But if you can't afford to overcount or undercount, you need to think about building an idempotent protocol on top of the database, and that's beyond what the driver can do for you. You need to think about turning updates into a series of inserts; those inserts may be duplicated, and you can use the client to eliminate the duplicates. Or, more often than not, you use an event sourcing model — I did a big talk on event sourcing, so maybe you can catch me again and catch that talk. If you're using an event sourcing model and you turn all your updates into plain inserts, you can eliminate duplicate writes with the same kind of ID trick: add a unique ID to each write and look for clashing IDs, or, if you use the _id field itself, the server will send back a duplicate key error. Then you accumulate and aggregate those values into the final state server-side with the aggregation framework. But the whole point is that there are some operations that simply can't be made recoverable from the driver's perspective. That's the nature of a distributed, clustered database system, and it's one of the trade-offs we make. MySQL and Postgres and Oracle will tell you "we never have failures" — until you do have a failure, and when you have one it's catastrophic: you can't write to the database at all. What we're trying to do is avoid that catastrophic failure.

Two more tuning knobs for you. There are two parameters you can pass to the client. connectTimeoutMS is how long a monitoring thread will wait to establish a connection to a server when it pings it with isMaster. This is something we can't choose for you. If you're a high-frequency trader with a cage of servers right next to the New York Stock Exchange, connectTimeoutMS might be two milliseconds: if you can't recover a cluster in two milliseconds, you'll lose billions of dollars of transactions. On the other hand — I don't know whether anybody is running MongoDB on the International Space Station, but you can be damn sure the latency on a satellite link up there is a lot worse than between us and an Amazon data center — in that case you might want to extend connectTimeoutMS to well over 30 seconds. The other one is serverSelectionTimeoutMS. connectTimeoutMS is on the monitoring thread; serverSelectionTimeoutMS is on the client. We used to use the same value for both; serverSelectionTimeoutMS was added because you want to be able to say how long you're willing to wait in that queue for an operation to complete. If you've got a system that can afford to queue writes up — if you've got back pressure, say your writes are coming in off a Kafka queue and that back pressure can percolate back down the queue all the way to the clients — then you can afford to set this timeout very high. If, on the other hand, you've got IoT events coming in that you need to process in real time, you want it set much lower, in which case you'll get an error back in the client when the timeout expires. So these, again, are areas where you can tune your environment to control what happens in your application.
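Both knobs are just keyword arguments (or URI options) on MongoClient. A hedged sketch; the values are illustrative, not recommendations:

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://host1:27017,host2:27017/?replicaSet=replset",
        connectTimeoutMS=2000,            # how long a monitor waits to open a connection
        serverSelectionTimeoutMS=10000,   # how long an operation waits for a usable server
    )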
So all of this is documented in a great spec written by an engineer in the US, A. Jesse Jiryu Davis. He did a talk on this at a MongoDB conference a year ago, so I'm somewhat channelling his talk, and he has a set of links — I don't know whether you can read them up here, but I'll put these decks up on SlideShare afterwards, so you should be able to get them from my account, Joe Drumgoole. The full Server Discovery and Monitoring spec is on GitHub. It's very easy to read and download, and we accept change requests on it, so if there's something you don't understand, do come and ask us. The key thing is that it has established a consistent model for how drivers interoperate with clusters — both sharded clusters and replica sets, although I've only talked about replica sets today — for all drivers. All MongoDB drivers now comply with this spec, so if you're writing code that works in PyMongo, the same code pattern should work in Ruby and Java in the same way. It may work slightly differently in C, because the semantics for single-threaded drivers are different from multi-threaded drivers, and we typically don't build multi-threaded clients for languages that don't have threading baked into the language, otherwise we'd be pulling in too many third-party libraries.

So the cluster will manage itself, but the driver requires your help. When you're writing MongoDB client code, don't forget that you're in a distributed, networked system. The fallacies of distributed computing are as old as the hills, but they're still true. We try to help you as much as possible, but you must participate in your own salvation: engage with and read this spec. You're not going to become an expert from my talk alone, but reading the spec will really help you understand how topology is managed inside a driver.

That's pretty much it from me. Those of you who read the talk summary will have wondered: what about inserts and queries and updates — how do they work? Well, when I started to put this talk together I hadn't really got my head inside the driver spec. I used to run pre-sales for MongoDB, but they wouldn't let me write code in pre-sales, so I moved to developer advocacy, because now I get to write code at least half the time and talk about writing code the other half. All the client-side operations in MongoDB, apart from topology management — which is the most complicated and interesting part — follow the same model. Validate the parameters (and there's a lot of parameter validation). Convert the data into BSON — actually, there's an intermediate format called SON, which is essentially an ordered dictionary; SON objects know how to convert themselves to and from BSON. We take the SON object, we look for a socket in the connection pool, and we ask for a server that's suitable for this operation — that's the server selection protocol I mentioned; again, there's another big spec on GitHub that I can't go into today. The simplest case of server selection is: I need a primary for writes, but for reads I can use secondaries. Once I've found a server that fits the selection, I take the BSON object and push it down the server's socket — it's just a straightforward TCP/IP socket — wait for a response, which comes back as BSON, unbundle that into a SON object, and then unwrap it into the dicts or lists or whatever the client expects to see. The only subtlety with queries is that we don't actually return anything up front. When you do a query, we just return a cursor, which is client-side only, and it's only when you ask for an element that the iterator goes and fetches documents from the server, and it fetches them a batch at a time — on the order of a hundred documents per batch — rather than one by one.
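One last hedged sketch to make the cursor behaviour visible — nothing is fetched until you iterate, and you can steer reads to secondaries with a read preference. The names are the same placeholder collection as before:

    from pymongo import MongoClient, ReadPreference

    client = MongoClient("host1:27017,host2:27017", replicaSet="replset")

    # Let reads be served by a secondary when one is available.
    coll = client.test.get_collection(
        "things", read_preference=ReadPreference.SECONDARY_PREFERRED
    )

    cursor = coll.find({"a": "b"})      # no documents fetched yet, just a cursor
    for doc in cursor:                  # batches are pulled from the server lazily
        print(doc)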
So that's the MongoDB client — the PyMongo client. I've got time for some questions if people want to ask them. Otherwise, thank you very much for listening today. Question down the back there — wait for the mic, otherwise I'll have to repeat it, and I'm lazy.

Okay, the first question: you mentioned that people often use the BSON serialization logic on its own, without MongoClient. So maybe it would be a good idea to refactor the serialization and deserialization module out of the driver package? If I remember correctly, I only wanted the serialization logic — just BSON — and I had to install the whole PyMongo package, and colleagues came to me asking why we need Mongo here when I just wanted to use BSON.

Yes, you make a good point, but it's a trade-off. We're here to support MongoDB developers primarily, and for them it's easier if they pull one package which is the totality of what they need. If we separated that package out, it would make their lives more difficult. So it's a trade-off: you have some pain because you just want BSON, but PyMongo users have a much easier life because it's all bundled together.

Could you also go back to the slide with the logic for retrying an insert? In that logic you insert first; let's assume it fails; then you insert again, but you said the second insert can throw a duplicate key error — and you won't catch that error, because the second insert isn't in a try block, it's in the except block, so the second except won't catch it, if I'm right. My error; I will amend that code. Okay, so thanks.

Yes — Anna. Anna also works for MongoDB, so she's allowed to make announcements. I just wanted to say really quickly: if any of you use MongoDB with scientific Python or NumPy, you should come and talk to me, because we're developing a new package and I would love to get some feedback.

One more question here. Thank you for the talk. I'd like to ask about the Motor driver — the Motor asyncio driver — what's the plan for bringing it up to PyMongo 3? I don't know offhand; I'd have to ask the driver team. Come to me afterwards, give me the question, and I'll send an email to the driver team and find out what their plans are. Okay, thank you.

Do you know about libraries like MongoEngine, and what do you think about them? They make MongoDB look like a normal SQL database, make everything look the same. I have a personal opinion — this is not a MongoDB opinion, and all developer opinions are personal opinions, no matter what people say. Why would you use something that makes MongoDB look like an SQL engine? The whole purpose of MongoDB is to make the things you put in the database very similar to the things you work with in your programming language. Why would you obfuscate that with an SQL-like layer? I know what you're going to say: I already have SQL code that I want to run alongside it. If you have a compelling reason to write SQL queries against the database, then absolutely use a layer. One of the things we have in the commercial offering is a BI connector. It won't let you write to MongoDB, but you can definitely run SQL queries against MongoDB — we support the idea of querying MongoDB with SQL — and there's a mapping capability that essentially lets you unwrap arrays and nested objects into new tables, so you get a tabular view of your nested data. But you can't write that way. You've got to write with a driver.
So MongoEngine may allow you to write, in which case, fair enough: if you absolutely have to work that way, use the tools you need to use. And I'd bet that underneath the covers MongoEngine is using the PyMongo driver anyway.

Just to add on MongoEngine: it could be useful for validating documents before MongoDB 3.2, because before that there was no server-side validation, so MongoEngine could easily validate your documents for you.

True — I didn't talk about it today, but we do now have document validation in the server, since MongoDB 3.2. Well, I was at a conference recently, and somebody quoted an engineer from their organization — I think it was Gil — who had coined a law, Andres' Law: we can build it today, or we can wait six months and Amazon will build it for us. There's a little bit of that with MongoDB: things that you build today, we're probably going to build tomorrow, if there's a broad community of people who need the capability, and we spend a lot of time talking to the community. I can't make any forward-looking statements about what we will or won't do, but, surprise surprise, having some kind of SQL capability against the database had been talked about for a long time, and we introduced it in the last release, MongoDB 3.2, and we're going to continue to enhance it. And we rarely drop features — this is why isMaster still runs: isMaster was the original status call from the very first version of MongoDB. One of the testaments to our CTO is that he's obsessive about backwards compatibility, so we tend not to break old things in order to bring new things forward. Lots of our customers are still running MongoDB 2.2 — shame on them — but they are.

One down the back there. I have a question about automatic recovery of the cluster. You told us that when you have a cluster of three nodes and the primary fails, a new primary becomes available afterwards. Not long ago I had a five-node cluster, and when three of the nodes failed, one of them being the primary, I ended up with two secondaries that couldn't elect a new primary, and this could only be fixed either by forcing a reconfiguration of the cluster or by waiting for at least one of the failed nodes to recover. Is this normal behaviour?

That is normal, documented behaviour — that's the way the cluster is meant to respond. A cluster will only elect a new primary if a majority of the nodes survive a failure and that majority can reach one another, which is to say they're all on the same side of, say, a network partition. If you lose a majority of the nodes, the remaining two nodes will come up, but they'll come up in secondary mode only. The driver will then say: I can't write, but the server selection spec says I can still query, so reads against the secondary nodes will keep working, and any write will fail. And that's what I mean by a catastrophic failure: a catastrophic failure is when I can't elect a primary because I don't have a majority of the nodes. That can be caused by a failure to isolate your nodes from each other, or by a data center failure; it doesn't really matter. One of the things I always say is that neither MongoDB nor any other distributed database is a panacea. People ask for 99.99999% uptime because they don't know what that means.
And then I say to them: do you know that the power provider for your national grid doesn't have that level of uptime? So what are you going to do when the national grid goes down? And they say something dumb like, "but we have a generator", and I say: who is it for? Nobody else has a network; the whole grid is down. Yes, you're running, but the rest of the world can't connect to you. So, you know, we're responsible for ourselves, and the rest of the world has to worry about the rest of the world. We'll make sure the cluster recovers. If you set it up so that the nodes are independent of the single points of failure within your domain of control, you shouldn't have three nodes failing at the same time. If that does happen, it means one of two things: you've tied them all to a single point of failure, like a single machine or a single Amazon region, or, more often than not, you've poisoned your own cluster — you've destroyed it with a code error. That happens all the time. Okay, thanks.

Okay, no more questions? Go forth and write excellent driver code. Thank you very much.