You can run an app on Parse using our APIs. We take care of all of the backend stuff: push notifications, databases, indexing, schemas, user management, file storage, data modeling, you name it. We host around 100,000 apps right now, and we run a lot of MongoDB. We run MongoDB in ways that nobody else in the world does, so we tend to run into a lot of edge cases, and that's a fancy way of saying bugs. So we've gotten pretty good at dealing operationally with a lot of problems that, if you just have a regular web stack, you can usually address just by changing your code or your queries.

So today I'm going to talk about managing a maturing MongoDB ecosystem. This is not really about what happens when you're first learning Mongo, like how to choose your hardware and how many nodes you need and that sort of thing. I want to talk more about what happens next, as your demands for stability and availability go up, and at the same time as you're starting to run into interesting challenges. We run Parse completely on AWS, so some of what I have to say will be cloud specific, but not all of it. Let's see, I like rainbows. There will be some of those, too.

A few other things I'm going to talk about: what kinds of resources to provision, how to set up your replica sets for high availability, which instance flavors and disk backing to choose, how to do backups and restores, and how to tune your file system and your block devices to get the best performance out of them. I'm going to talk about how to provision nodes and how to treat your infrastructure as code. I see a guy in an Opscode t-shirt, so, nice. How to provision from snapshot or from initial sync, and what the various performance tradeoffs are there, which leads into some strategies for dealing with fragmentation as your data grows. And lastly, I'm going to touch briefly on some really fun stuff: how to track down and kill bad queries, and how to deal with certain types of failures and outages, degraded performance, that sort of thing.

I assume you all are running Mongo. How many of you are on 2.4? 2.2? 2.0? And some of you are using Chef? Puppet? Okay, cool. And are any of you using Mongo sharding? Brave, brave soul. I will leave some time for questions at the end, and I'm also going to touch on some tools and scripts and blog posts. Don't worry, there's a glossary at the end, and the slides are on the web and everything.

So anyway, let's start with replica sets. This is the basic building block of high-availability MongoDB, and you should always use replica sets. Technically, you can run Mongo in a master-slave configuration, but I really don't know why you ever would. One of the best things about MongoDB is that it can elect a primary, and if you give that up, it's kind of like, what's the point? I personally am willing to trade a lot, breaking in new features, new databases, all this new shit, in exchange for being able to elect a new master. It's amazing. It's so much better than having to fail over by hand.

So, replica sets. The most important things to remember about replica sets are: you need to have at least three votes, and more votes are better than fewer, because you need more than 50% of the votes to elect a master.
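To make that concrete, here's a minimal sketch of standing up a three-node set like this in the mongo shell; the hostnames are made up, one per availability zone:

    // Initiate a three-node replica set, one member per availability zone.
    // Hostnames here are hypothetical.
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "db1-us-east-1a.example.com:27017" },
        { _id: 1, host: "db2-us-east-1b.example.com:27017" },
        { _id: 2, host: "db3-us-east-1c.example.com:27017" }
      ]
    })

    rs.status()  // verify all three members are healthy and voting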
You can have a maximum of seven votes, and you should always have an odd number of votes, obviously. And if you're running on AWS, you should always distribute those votes as evenly as possible across as many availability zones as possible. So if you have a three-node replica set, use three availability zones, and if you need to fill in votes in random places to satisfy these rules, you can always use arbiters, which we'll talk more about.

So here's your basic replica set: three nodes, so we have an odd number of votes. Excellent. Each of the nodes is in a different availability zone, so this replica set is resilient to any single node going down, any availability zone going down, or EBS going down in any availability zone. Any of these things are cool. If any of these nodes or AZs goes down, the other two nodes will form a quorum and elect a primary, like magic. It may not even wake you up in the middle of the night.

Here's another example of a replica set. This one has two nodes and an arbiter, and this is why you should never run Mongo in the old, deprecated master-slave way: it's just as easy and just as cheap to run a two-node replica set with an arbiter as your third node, and then you still have high availability if something goes down or something goes wrong with your primary.

So, what are arbiters? Arbiters are just mongod processes that do nothing but vote. They don't have any data; all they do is vote. And because of this, they're very stable and very lightweight. They use hardly any resources, and they are rock solid, which is something you cannot always say for Mongo. Money is a good reason to use an arbiter instead of a third data node: we use m2.4xlarges for our Mongo nodes, which have 68 gigs of RAM, but for arbiters, we literally toss a dozen of them on a single m1.medium. So they take basically no resources, and they really help a lot. Also, if you have four nodes, you have two options: you can take a vote away from one of your data nodes, or you can add an arbiter. If you're operating on the principle that more votes are always better and more robust, it's better to add the arbiter.

So, let's say you have a cluster that's two nodes and an arbiter. Why would you even want to spend money on a three-node replica set? Well, you have to run backups somehow, and running backups generally takes a lot of resources. You really can't expect to use your backup node for serving any production traffic, even just read queries: they will time out for, like, the first 30% of the EBS snapshot, and they'll just pile up. And if you actually try to serve real production traffic off a node that is snapshotting, your site will go down for the entire first 30% of the snapshot.

If you're on EC2, you probably want to use EBS snapshots to back up your Mongo. So let's designate one node of our replica set as a snapshot node. You don't want your snapshot node to ever become primary, or your site goes down, so you set its priority to zero. And you don't want it to serve any queries, even read queries, so you set it to be hidden. And if you have enough votes that you can form a quorum without your snapshot node, it's not a bad idea to take the vote away from your snapshot node, too.
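In the shell, that configuration looks roughly like this; which member is the snapshot node, and the arbiter's hostname, are made up for the example:

    // Run on the primary. Assume members[2] is the designated snapshot node.
    var cfg = rs.conf()
    cfg.members[2].priority = 0   // can never be elected primary
    cfg.members[2].hidden = true  // never serves client reads
    cfg.members[2].votes = 0      // can't stall an election while snapshotting
    rs.reconfig(cfg)

    // And if you need to round the vote count back up, arbiters are cheap:
    rs.addArb("arbiter1.example.com:27017")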
Why take away its vote? Because if you're trying to elect a new primary while you're taking a snapshot, your nodes can sometimes form an inconsistent view of the cluster, because the snapshotting node will respond with a vote to some nodes and won't respond within the right amount of time to others. And so they can sit there. I once had it take five minutes to elect a new primary because one of the nodes was trying to vote while snapshotting, and I was sweating bullets the entire time. I didn't know if it was ever going to actually elect anything.

So, anyway, now you have one node that does nothing but snapshots, and maybe various utility functions like running compactions or whatever, and you still have high availability with your other two nodes.

We already talked about some of this stuff, but yeah: be sure to lock Mongo and sync the file system (db.fsyncLock() in the shell, db.fsyncUnlock() when you're done), or stop Mongo completely, before you snapshot; priority zero; hidden; possibly continuous compaction. And you should really snapshot often; we snapshot every two hours. An EBS snapshot is actually a differential backup, so your first snapshot may take a long time, but after that they'll be pretty fast. They are faster the more often you do them. And Amazon, this is kind of cool, only charges you for the differential blocks. So storing a lot of snapshots can get expensive, but it's not much more expensive to take them more often.

So you can use EBS snapshots if you're using EBS-backed volumes, and in my opinion there's really no reason to use anything else. If you're using ephemeral storage, you can set up LVM and use LVM snapshots. Or some people use mongodump: they mongodump to disk and then upload to S3. Or there's this new service from 10gen, MongoDB backups as a service, that looks kind of cool. I haven't used it yet, but I've heard good things. It is still in beta, so buyer beware. Also, it uses Mongo. I think that's funny.

So now we've talked about replica sets and snapshots; let's talk about some options when it comes to resource types. You need to choose an EC2 instance type, and you need to choose some sort of volumes for your disks. The question people usually ask is: what instance type should I choose? And the answer to that question is: whatever instance type has enough memory that your working set will fit into it. The working set, in Mongo terms, is the data and indexes that are actively being used at any given time. Reading from disk is really slow, so you want as much of that working set as possible to fit into RAM.

The obvious follow-up question here is: how do I estimate the size of my working set? Well, that's a harder question. In 2.4, Mongo will actually try to estimate the size of your working set for you, which is sweet. It doesn't show up in your default serverStatus output, for backwards-compatibility reasons, but you can ask for it with a runCommand (there's an example just after this), and it will guesstimate how big it thinks your working set is. If you're not running 2.4, and to some extent even if you are, it's more of an art than a science. If your performance is getting sluggish and you're seeing a lot of page faults, your working set is probably too large and you probably need more RAM. You should always keep an eye on your paging statistics over time, because they will let you know when you're getting close to the danger zone. If you don't have enough RAM for your working set, you have two options: get more RAM, or shard your data.
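Before moving on, here's what that 2.4 working-set estimate looks like from the shell; treat the math at the end as a rough approximation that assumes 4K pages:

    // 2.4+: request the working set estimate from serverStatus.
    var ws = db.runCommand({ serverStatus: 1, workingSet: 1 }).workingSet
    printjson(ws)  // pagesInMemory, overSeconds, computationTimeMicros

    // Rough size in megabytes, assuming 4KB pages:
    print(ws.pagesInMemory * 4096 / 1024 / 1024 + " MB touched over " +
          ws.overSeconds + " seconds")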
And obviously, the only real scaling, eventually, is horizontal scaling, but sometimes you can just throw money at the problem, and that is awesome. So, RAM: super important. Obviously, it's not the only thing that matters, though. Unless your data completely fits into memory, you still have to hit the disk for writes. So let's talk about your options there. If you're on EC2, you have a few choices: classic Elastic Block Store (EBS), ephemeral storage, dedicated SSDs, or EBS with provisioned IOPS. If you're not on EC2, this is a really easy question: just get SSDs. You do not want to be dealing with spindles and spinning disks.

I said EBS was an option. That's really a lie, and this is why. This graph is our disk latency on classic EBS; you can see it goes up to 2.5 seconds. This is the same workload with provisioned IOPS. So really, your options are SSD, provisioned IOPS, or ephemeral.

The SSD flavor is the hi1.4xlarge, and it is a beast. It's got eight virtual cores, 60 gigs of RAM, and a couple of SSDs. They estimate 120,000 IOPS for random reads per second, and 85,000 IOPS for random writes per second. It's also really expensive: it costs about twice as much as the m2.4xlarge, which has slightly more RAM, which, like I said, is your most important resource. And it's actually really hard to push that much write load at Mongo unless you have a very specific type of write load. For us, our working set basically fits into 60 gigs of RAM; we don't hit the disk enough to justify that many IOPS. I provision 3,000 to 4,000 IOPS per shard, and we almost never hit that.

So for most people, I think the best choice is provisioned IOPS. With provisioned IOPS, you can provision up to, oh, that's a lie now, it's up to 4,000 IOPS per volume, up to a terabyte per volume, and then obviously you can RAID those together. The variability is guaranteed to be less than 0.1%. It's also interesting to note that there are no performance guarantees with the ephemeral-storage SSDs, but there are performance guarantees for provisioned IOPS. And if you need more disk space or more IOPS, you just RAID volumes together. Amazon doesn't say publicly what kind of disks back their provisioned IOPS, but if you benchmark them, they smell like SSDs.

So how do you know how many provisioned IOPS to provision? This is the best way I've found: run sar -d 1, which prints disk activity every second, look at the tps column, and provision about two or three times that. And if you're wondering what it looks like when you run out of provisioned IOPS: it's bad. You don't want to do that. Basically, your disk just kind of stops for a minute. So give yourself some headroom, and if you have a really spiky workload, maybe multiply by three or four.

Ephemeral storage is also a totally respectable option, as long as you understand what you're dealing with. You never have to worry about network latency or EBS issues, and it's basically free. But it's not quite as fast, you can't use EBS snapshots for backups, and if you stop the node for any reason, your data goes away. So you really do have to treat those nodes as disposable.

So now that you have your disks, let's talk file systems. I think you should use ext4. Technically, they say use XFS or ext4, but goddamn, I have been bitten by so many XFS bugs that I cannot, in good conscience, tell anyone to ever use XFS. ext4 is great. You should also raise your file descriptor limits. You absolutely will hit those limits, and probably at the worst possible time.
You don't want to have to stop Mongo, fix the limits, and start it back up in the middle of your biggest spike ever, so just fix them preemptively. If you're on Ubuntu, you may actually need to use Upstart to get this to work; for some reason, the sysvinit scripts don't actually apply your ulimit changes. So even if you put the limits in your init scripts, you should always verify that they actually got applied by doing cat /proc/<pid>/limits. That one bit me.

Also, think about how many connections you're going to need to open, because the connection limit is actually based on the file descriptor limit. It's hard-coded, and this is funny, to 80% of your soft ulimit, with a hard cap at 20,000 connections. You cannot have more than 20,000 connections. (You can watch current versus available connections in db.serverStatus().connections.) We only actually reach the connection limit when we hit certain bugs in the Mongo Ruby driver, but that happens.

You can also put the journal on a separate physical disk. This can help if you have a really heavy write load, or if you don't have SSDs; it can make a big difference. And mount /var/lib/mongodb with noatime and nodiratime, just because every little bit of work you can remove from the file system will give you slightly better database performance.

Bugs, yeah. And to be fair to the XFS guys, it's been a while. My scars are deep, but they're in the rear-view mirror. There's nothing recent; they're probably all fixed.

blockdev. Your default blockdev settings are wrong. This is some really deep voodoo magic, and again, it's more of an art than a science. On Ubuntu, the default readahead is 2048, which is huge. You probably want it to be somewhere in between: if your blockdev readahead is 32, it's almost certainly too small, and if it's larger than 512, it's probably too large. (You can check and set it with blockdev --getra and blockdev --setra.) The issue here is that if your readahead is set too large, every time you read something from disk you're going to page in a lot of empty space. The number one sign that your blockdev settings are wrong is actually that you're not fully utilizing all of the memory on the box. If you're running nothing but Mongo, it should obviously be taking as much memory as possible. At one point we were only using like 12 gigs of our 68 gigs of RAM, and that's because our blockdev settings were set way too big. We had all this blank space we were paging into memory, and it was just a waste.

Another awesome thing about 2.4 is that it will warn you every time you launch the Mongo shell if it thinks your blockdev settings are obviously wrong. So that's kind of cool. And if you're using our Chef cookbook, there's a per-cluster attribute that will set your blockdev readahead correctly every time Mongo starts up. So that makes it easy.

Speaking of Chef, let's talk a little bit about not what you're provisioning but how you're provisioning it, and some of the performance implications of bringing up your nodes in different ways. Your infrastructure is code. You really don't want to do this by hand for any longer than you have to. There are a lot of tools available at varying levels of sophistication. I think the best one that's publicly available is the Chef community cookbook that we work on at Parse. I do have some friends who are using Puppet to manage Mongo, but they've had to write a lot more stuff themselves. One of the things the Chef community is really good at is community cookbooks, where not everyone has to reinvent the wheel and rewrite basic functionality.
So I like Chef, but I'm not religious; using any of these things is much better than using nothing. I haven't used CloudFormation, but some people are using it, and it's kind of cool. And a lot of people, surprisingly to me, also seem to manage their Mongo deployments with homegrown scripts. I think the very best tool in that category is MongoLab's mongoctl. It lets you do things like configure a cluster, do restarts and rolling upgrades, stuff like that.

Some highlights of our cookbook. I'm not going to go into it super extensively here, but I have a whole presentation from ChefConf this year if you're interested in more detail on how to use it. It'll assemble and provision EBS RAID for you, configure provisioned IOPS, and handle multiple clusters. There are recipes that will do the backups for you: it'll perform a simultaneous snapshot across, say, your three RAIDed EBS volumes and tag them, and then when you next provision a node for that cluster, it'll restore from those snapshots automatically. It's pretty sweet.

So basically, however you do this, there are two ways to provision new Mongo nodes: from initial sync, where you just bring the node up, join it to the cluster, and it starts slurping down all the data; or from snapshots. Both ways have their pluses and minuses.

Provisioning from snapshot is really fast and easy. It takes less than five minutes to provision a new node with Chef and knife. And importantly for us, though it may not matter at all for other people, it will not reset your padding factors. The padding factor works like this: whenever a record grows, Mongo has to move it around on disk, and every time it moves a record around on disk, it increments the padding factor so it allocates slightly more space the next time. When you do an initial sync, it resets all of those padding factors to one, and for us, performance just eats shit for two months afterwards. If you provision from snapshot, it will not reset your padding factors, so you can expect the exact same performance characteristics, which is good.

Some important caveats about snapshot restores. If you want to bring up a new node and make it the primary, don't do it until you've pulled down all of the blocks from S3. Because the reason an EBS snapshot restore is so fast, like magic, is that the blocks actually still live on S3 until you pull them down, and it pulls them down very lazily. So whenever I restore a new node from snapshot, I usually just go into the data directory and dd all the data files out to /dev/null, just to pull them down.

And you should warm it up, too. You need to warm up both your indexes and your data; you need to get your working set into RAM. We have some pretty cool scripts for this. The way we do it is: we have a script that runs on the primary for, say, an hour, samples db.currentOp() every quarter second, sorts those collections by most frequently accessed, and outputs a list. Then you feed that list of the most frequently accessed collections and indexes into a script that you run on your fresh secondary, and it loads them all into memory (I think it does a natural-order sort or something like that). If you're running at least 2.2, there's also the built-in touch command, which will read a collection and its indexes into memory. So that works too.
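Here's what the touch command looks like in the shell; the collection names are made up, and in practice you'd feed in the hot list from the sampling script:

    // 2.2+: page collections and their indexes into RAM on a fresh secondary.
    ["users", "sessions", "events"].forEach(function(name) {
      printjson(db.runCommand({ touch: name, data: true, index: true }))
    })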
If you're provisioning with initial sync: some people prefer to do this because it compacts and repairs, and it rebuilds all of your indexes. It's basically like doing a repairDatabase, so it can cure you of some crazy states you can get into. But it may take hours, it may take days, depending on how much data you have. You may not even be able to do it if you have too much data; one of our clusters takes almost a week to do an initial sync and catch back up.

Another thing you should be aware of is that it will kill your primary. It will be very bad for your primary if you just bring up a new node and let it do an initial sync off the primary, because an initial sync basically does a full scan of all of your data, and your primary just can't handle it. If you're running at least 2.2, you can force it to sync from a secondary instead: you start mongod, log into the Mongo shell, type rs.syncFrom() with the name and port of the secondary, and then button-mash that shit until it actually connects, because you have to get it in there before it actually starts syncing, otherwise it won't work. If you're on a version before 2.2, you can technically force this by using iptables to block your new secondary's view of the primary; it will connect to the replica set and start syncing from a secondary, and then you just restore its ability to talk to the primary later. It's kind of janky. I don't recommend it. Just upgrade.

So why are we even talking about the various ways to bring up new secondaries? Well, a big part of this question is fragmentation, because your data will age and become fragmented as it ages. This is how much fragmentation sucks: this is a cluster of ours that had been running for six months, and these are the latency numbers; you can see exactly where we rotated in a compacted version. We thought for a long time that, since we don't do a lot of deletes, fragmentation was probably not a big issue for us. But it turns out deletes are not the only source of fragmentation. Records that grow are also a huge source of fragmentation, because they have to get moved around on disk a lot, and every time they move, they allocate more space and the padding factor gets incremented slightly. You can also look for the strings "moved" or "updated" in the MongoDB log to get a sense of how often your documents are getting moved around.

And if fragmentation only affected disk space, it wouldn't be that big of a deal, but it also fragments your RAM, so you can fit less actual data into memory. Again, you don't want to have a lot of empty space in memory.

So, three ways to fix it. You can do an initial sync from scratch, which resets your padding factors and is really hard on your primary unless you sync from a secondary. You can take a secondary offline and repair it, which does basically the same thing: resets your padding factors, could take a week, and the node may not be able to catch back up. Or you can do what we do, which is run continuous compaction of all collections on our snapshot node, so that whenever we restore a new node, our snapshots are always freshly compacted, and we just don't really have to worry about it. It's a little harder, but we wrote a script that does it. I'll have a link to it at the end.
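Our actual script does a lot more bookkeeping, but the core of it is roughly this, run on the hidden snapshot node:

    // Sketch: compact every user collection on the hidden snapshot node.
    // compact blocks the node while it runs, which is exactly why this only
    // ever runs on the hidden snapshot node, never on a serving member.
    db.getCollectionNames().forEach(function(name) {
      if (name.indexOf("system.") !== 0) {
        printjson(db.runCommand({ compact: name }))
      }
    })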
So, let's see. Last section: if you run Mongo long enough, you're going to encounter some fun failure scenarios. There are things that are your fault, things that are Mongo's fault, and things that are AWS's or your provider's fault. And this is where I should warn you: 10gen requested that I make it clear to people that we run into more edge cases than lots of other people do. So, disclaimer.

Let's see. Bad queries. Everybody's going to have bad queries; this applies to everybody. There are really three excellent resources for tracking down bad queries: db.currentOp(), which shows whatever ops are running right now; the MongoDB log; and the profiling collection. If things are really bad right this minute and you're trying to figure out what's going on, the best place to look is at the output of db.currentOp().

Random things to check: check the queue size. You should really monitor your queue size and page yourself if the queue size is above 1,000, because it will enter death-spiral territory quicker than you think. You can sort by numYields to see which queries are running a long time and yielding a lot, or by the lock type: obviously, queries that grab the global write lock have a much greater ability to affect your performance across the board than queries that grab a per-database lock. You can sort by seconds running, whatever. Sometimes the bad query will be one that is running for a long time and yielding a lot. Sometimes some of the worst queries will be ones that have not been running for that long but have never yielded. If something is not yielding, all the other query traffic on that host is just going to grind to a halt.

Another interesting thing. One kind of wacky thing about db.currentOp() is that it doesn't always print out all of the information about an op. Sometimes it doesn't print the namespace; sometimes it doesn't print the query itself. They've explained it to me two or three times why this is; it has to do with the point in query execution that something-something is aware of. I don't really understand. But something you can do in most of the drivers is add comments to your queries containing anything you like, say, the entire query, and then you can search for that shit.

Anyway, once you have the query that you think is a problem, you can run explain() on it and get a feel for what's wrong: maybe you're missing an index, scanning too many documents, returning too many results. All those things will be slow.

Then there's the MongoDB log, and I love the MongoDB log. All you need is sed and awk and a little bit of patience, and you can find pretty much any performance problem just staring you in the face. Unfortunately, the Mongo logs aren't documented very well, so it can be hard to tell what you're looking at sometimes. By default, it will print out any query that takes more than 100 milliseconds to complete. It'll tell you things like how many documents it had to scan to return a result and how many results were returned; so if nscanned is high, maybe you need a new indexing scheme. It also prints a lot of information about the connections that are established and torn down, and you can use the connection identifier to see which host a query is coming from. I find that when an incident is underway, I use db.currentOp(), and retrospectively, when you're going back and trying to retrace what happened and what went wrong and everything, that's when the MongoDB log is really invaluable.
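To make the currentOp stuff concrete, here's a sketch of fishing long-running ops out of db.currentOp(), plus the old $query/$comment modifier style of tagging a query; the collection name and comment string are made up:

    // Print client queries that have been running for more than 5 seconds.
    db.currentOp().inprog.forEach(function(op) {
      if (op.op === "query" && op.secs_running > 5) {
        printjson({ opid: op.opid, ns: op.ns,
                    secs: op.secs_running, yields: op.numYields })
      }
    })

    // Tag a query so you can grep for it in currentOp and in the log.
    db.users.find({ $query: { status: "active" }, $comment: "checkout-page" })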
And you don't have to wait for an emergency, either. Sometimes I like to just go sort every line in the MongoDB log that ends in "ms", just to see what the slowest queries were, so that I can look at rewriting them or re-indexing or something. There is also the system.profile collection. I am way more comfortable with the shell than with JavaScript, so I don't use it as much, but it technically is better: you can just query for all of the slow queries that have executed. It's a capped collection, so it rolls over, and profiling does not persist between restarts. You may want to add it to your Chef cookbook if you want profiling enabled everywhere every time Mongo runs; db.setProfilingLevel(1, 100) turns it on.

So that's the basics of tracking down performance issues. Now let's talk about a couple of failure scenarios. How many of you have a Mongo kill script? Anyone? Oh, God, you've got to have a kill script. Mongo will death-spiral. You do not want it running with a big queue. If it's healthy, you should know what your tipping point looks like, and you should know what your normal running point looks like. For us, anywhere from zero to, briefly, a couple hundred queries in the queue is normal, and we page ourselves if it's over a thousand for more than five minutes, because it tends to go like this. I mean, it's a traffic-jam problem, right? The more queries you have in the queue, the slower everything gets, and Mongo just can't keep up, and it will never recover.

So what we used to do was restart the node. Terrible, terrible, terrible idea, because it's going to fail over to a cold secondary, so performance is going to be worse, and it may take a while to come back up. Oh, and when Mongo gets under heavy load, the slaving threads will lag, so it may actually enter a rollback state, and that can take a totally indeterminate amount of time while it rolls back the oplog, trying to figure out which data to keep and which not. So the worst thing to do is to restart Mongo. You should have a kill script.

And the reason you should prep it ahead of time is that there are some things you don't want to kill, and it's really stressful to try to figure this out when something's broken. For example, you really shouldn't kill writes of any sort: just don't kill anything that is an insert or holds a write lock. Only kill things that hold a read lock. Never kill your slaving threads; you can permanently fuck up your secondaries by killing your oplog tailers. Any internal MongoDB operations: just don't. Only kill your client queries. Oh, and index builds: don't kill those, not unless you absolutely have to. Yeah, killing write ops in general can occasionally kill your secondaries in unrecoverable ways, and that's just not good. And they might not always kill your secondaries right away; I'm not even going to go into it, but due to certain Mongo bugs, you can find your secondaries dying weeks down the line. Most of those I think have been fixed in 2.4, but I haven't run 2.4 long enough to absolutely guarantee that. So: kill your read ops. If you already know what your slow queries are and you know what to kill, write that ahead of time so you can just go, boom.
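Here's the shape of a kill script, following those rules; the filters are assumptions you'd want to tune to your own workload, and you should eyeball the output before you wire it up to killOp:

    // Sketch: kill only long-running client READS. Never writes, never
    // replication or oplog threads, never index builds, never commands.
    db.currentOp().inprog.forEach(function(op) {
      var isRead = (op.op === "query")
      var isCommand = (op.ns && /\$cmd/.test(op.ns))
      var isInternal = (op.desc && /repl|sync/i.test(op.desc)) ||
                       (op.ns && /oplog|system\./.test(op.ns))
      var isIndexBuild = (op.msg && /index/i.test(op.msg))
      if (isRead && !isCommand && !isInternal && !isIndexBuild &&
          op.secs_running > 30) {
        print("killing op " + op.opid + " on " + op.ns)
        db.killOp(op.opid)
      }
    })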
If you can't elect a primary, that is a scary situation to be in. It's happened to me a few times. Obviously, never run with an even number of votes, and remember you need more than 50% of the votes to elect a primary. If your secondaries are in a crash loop, something I have had to do is destroy their volumes and restore from an older backup, just so the node would come up and start trying to catch up. That takes a bit of time, but in the meantime, it can provide a vote and I can actually do things on the primary. I like to set my priority levels explicitly, because I like to know exactly which node is going to get elected if something happens to the current primary, so I can keep it warmed up. (There's a sketch of that reconfig at the end of this section.)

I can't recommend adding arbiters enough. We had a number of bugs that caused our secondaries to all crash at once. This happened a few times, and eventually we were just like, fuck it, we're not even going to rely on our secondaries to vote. They still vote, but we have enough votes in each cluster with just two arbiters and a primary, because each cluster has two arbiters, a primary, and two secondaries. So even if all the secondaries die, we can still elect a master, so we keep serving traffic while we figure out what to do about the fact that we have no secondaries.

And if all of that fails and you're still not electing a primary, tail your Mongo log, look at the replica set sync messages, and see if something is vetoing the election. A kind of confusing and annoying thing is that nodes can veto even if you've taken away their votes, so you may actually have to stop the mongod process on a node that has an inconsistent view of the cluster state. And remember, even under normal circumstances, it can take a minute to elect a new primary. So keep calm.

Yeah, like I said, there are some Mongo bugs that will cause secondaries to crash. Be very careful that you're not killing oplog tailers or any internal database operations. There are some bugs where an invalid op can make its way into the oplog even though it didn't get run on the primary, and there is currently no way to skip an op in the oplog. This is, like, my number one most-requested feature. I mean, you can do it in MySQL. And yeah, data consistency, whatever, but it would be really nice to be able to say: I know that op is bad, please skip it. Not yet. So, since there is no way to skip an op in the oplog, the only way to fix an oplog bug is to re-snapshot off the primary, which will take your site down, but that's all you can really do.

There are also some situations where replication will just stop. Although, I will say, they've mostly replaced those situations with ones where, instead of stopping replication, it just kills mongod. I guess that's better. We have had so many post-mortems, though, where we were like, maybe we can fix this by just commenting out this one assert and rebuilding mongod ourselves. The correct way to fix any of these replication-style bugs is to re-snapshot off the primary and rebuild all your secondaries. You can sometimes dangerously fix the problem, if you're willing to slightly mess with your consistency. The way to do that is: stop the secondary; take it out of the replica set, which just means commenting out the replSet line in mongod.conf; bring it back up standalone; do whatever you need to do to repair the collection; and then bring it back up inside the replica set. Buyer beware: you're really not supposed to do this kind of thing, but if it's that or an eight-hour downtime, sometimes I will do this thing.
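Here's the explicit-priorities reconfig I mentioned; the member indexes are assumptions for the example:

    // Make the failover order explicit, so you know which secondary to
    // keep warm. Higher priority wins elections, all else being equal.
    var cfg = rs.conf()
    cfg.members[0].priority = 3  // current primary
    cfg.members[1].priority = 2  // preferred failover target; keep it warm
    cfg.members[2].priority = 1
    rs.reconfig(cfg)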
And I think that's about it. On that cheery note, here's the glossary, with the Chef cookbooks for AWS and Mongo, the Chef presentation, MongoLab's mongoctl, the CloudFormation templates, our warm-up scripts, and our continuous compaction scripts.

(Audience Q&A.) So we have about a dozen Mongo clusters now. Big in terms of disk, each, yeah: a dozen clusters, three to five nodes per cluster, about two terabytes provisioned for each cluster, though not all of them are full. We have six that are for our main user data, and we have some others that we use for logging and analytics, like sort of real-time traffic analysis, yeah.

We opted not to do Mongo sharding; we do in-app sharding. We opted not to do Mongo sharding because, A, it really wasn't robust enough, and it kept trashing all of our config server data whenever we tried to perform an election of any sort, so that was terrifying. We figured out later it's because we have too many collections. Long story. So we do it all in our app layer, basically. I'm not actually going to say stay away from sharding. I'm going to say: if you have any edge cases in your data, like you use a lot of collections, or a lot of databases, or anything that's off the beaten path that they don't test for, don't use sharding.

Three people on our ops team. I'm the only one who really does Mongo, and one of the other back-end engineers also does a lot of Mongo, I guess. Yeah, yeah, sure. I'm not really a DBA. Seriously, I've only been using Mongo for a little over a year, so we just run into a lot of crazy stuff. I think that's it.