We are a production performance management company, but enough about us. My presentation today is about MongoDB performance tuning and load testing. MongoDB is a database most people are well aware of by now. It rose to prominence a couple of years ago, and that rise is very closely tied to the arrival of the cloud and of cheap commodity hardware. MongoDB, like other databases such as Cassandra, CouchDB, or Hadoop, allows very fast scaling of a system. These databases also give us the ability not to use SQL; they use open notation instead, which is the "NoSQL" part. In some cases that means JSON or BSON, as in MongoDB; in some cases it means MapReduce, which is a way to distribute jobs, do big processing of data, and combine the results back together. Over the last couple of years, MongoDB has become very actively used in large enterprise environments. Companies like Shutterfly use it to store terabytes, even petabytes, of data. eBay has become one of the largest MongoDB users in the world, as has Adobe, and I was personally involved in helping Adobe scale MongoDB. It was a very challenging project, especially a couple of years ago, when the AWS cloud was very basic and far from optimal: setting up EBS volumes, finding out that the I/O really wasn't working, and finding proper ways to scale was hard, especially because I/O is a very sensitive part of MongoDB, and unfortunately it still is. So that's the background. As for the agenda, what I want to accomplish today is an overall view of what can be done with performance tuning and with load testing of MongoDB. And ladies and gentlemen, if I may ask: what kind of background do we have in MongoDB, intermediate, advanced? Just so I know the level of the audience. So again, the idea is to show the general things. One important point: today, installing MongoDB is basically a joke. On your own system, you download a VMware image, or you install MongoDB by setting up the 10gen repository, and in a couple of minutes you can have it running; in another couple of minutes you can set up a replica set and have all of that done. It's a very simple process. The real thought should be about how to scale MongoDB when it goes into a production environment. Prior to the recent version, 2.2, MongoDB had big issues related to the single global write lock. They are working on this, of course: the lock is now at the database level, and, joking aside, in maybe another five years they will bring it down to the level of a normal database, meaning finer-grained, document-level locking. But right now the best we can get from them is database-level locking, so we still see write issues arise in production. Another issue, a little beyond the scope of this presentation, is sharding: because of the memory-mapped files, you get fragmentation if collections are not managed properly, and once you have shards, you can hit really big problems when the balancer suddenly starts doing crazy activity and brings the system down. So you have to be very careful about this.
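As a rough sketch of how simple that install was: the repository URL and package names below are from the 2.x era and are assumptions to verify against current documentation before use.

```bash
# Add the 10gen yum repository (2.x-era URL; verify against current docs).
cat > /etc/yum.repos.d/10gen.repo <<'EOF'
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1
EOF

# Install the server and tools, then start mongod.
yum install -y mongo-10gen mongo-10gen-server
service mongod start
```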
And this is why it's really important to perform a decomposition of your production system. To begin with, of course, you need to understand what the business users are trying to achieve: where the main write activity will be, and what the frequency of operations will be. That is all given, and important. But when you start deploying into production, you really need to look into: how do I configure my servers? How do I configure my I/O? How do I configure my scaling? Do I need replication across colocations? Will I have WAN-level issues going across different zones, for example? All of these things become very important in a production implementation. And load testing is another area that is very important, because at the end of the day we can theorize about whether something will work or not, but when we try to upgrade the system, we need to know the system will be stable under production load; and when we try to change some parameter to see if it makes sense, if it improves performance, it all comes back to load testing. It is very important to create a sandbox environment in your shop where you can test under a load that is very close to real production activity. This is an absolute must-have. Again, going through the areas: we can deal with the operating system tuning layer (not too much there, but some things can be tuned), then storage tuning, then database-level tuning, and load testing. There is a lot that can be done in storage tuning, and a lot in database tuning and load testing; obviously there are lots of approaches to explore. So, OS tuning. One area is simply given: there is a ulimits document in the MongoDB manual; go there, and it will tell you specifically which ulimits to set for a production implementation. In other areas, there is a production notes document which also states recommendations: for example, they want you to turn off access-time tracking on the file system, they don't want you to use transparent huge pages, and, importantly, if you have multiple nodes in the system, replicated or sharded, they want you to use NTP to synchronize time between them. So that is the given stuff. Another thing that is very helpful (I'm not sure we can see it here) is the tuned utility. It is part of Red Hat; you just install tuned. It is an adaptive system for automatic system tuning. I wouldn't advocate using it in the production environment itself, because it may end in an unpleasant surprise; maybe if you're desperate, you can. But what I would definitely tell you to do is go to your load testing environment and, while you are simulating production load, execute tuned. Tuned has multiple profiles (again, unfortunately it's not showing here), so you can say: I want to tune for high throughput, or I want to tune for minimum latency, all kinds of things. It has default profiles you can use. And the nice thing is that you run it, it dynamically evaluates your workload, and it configures your system for an optimal configuration under the load you are putting on it. So I'll definitely recommend using tuned. It's a good approach.
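To make that concrete, here is a minimal sketch of those given OS-level settings on a Red Hat-style system. The ulimit values follow the MongoDB production notes of this era, and the tuned profile names are the stock ones; treat the exact numbers and names as assumptions to verify.

```bash
# ulimits for the mongod user (values per the MongoDB ulimit document of the era).
cat >> /etc/security/limits.conf <<'EOF'
mongod  soft  nofile  64000
mongod  hard  nofile  64000
mongod  soft  nproc   32000
mongod  hard  nproc   32000
EOF

# Turn off access-time tracking on the data volume via an fstab entry,
# assuming /dev/md0 holds the MongoDB data files:
#   /dev/md0  /data/mongodb  ext4  defaults,noatime  0 0

# Disable transparent huge pages for the current boot.
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Keep clocks synchronized across replica set or shard members.
service ntpd start && chkconfig ntpd on

# In the load testing environment only: let tuned profile the workload.
yum install -y tuned
tuned-adm list                             # show the available profiles
tuned-adm profile throughput-performance   # or latency-performance
service tuned start
```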
Again, in the load testing environment, it's a great tool to use. And again, I can provide information: gentlemen, if you need any help during your tests, give me a call or write me an email; I would be glad to help and to share what I know. Then the area of storage tuning comes in, and storage tuning is the most critical one for MongoDB. Unfortunately, MongoDB is still very sensitive to write performance: it has journaling, it has its own writes, it does fsyncs, and paging becomes a big issue. One thing that has become common knowledge: you need to go with RAID 10, basically stripe and mirror. From the point of file systems, you can use ext4, which I personally recommend, or you can go with XFS. There is also another reason to use ext4, which is on the next slide: people found a really good performance optimization related to ext4, namely an external journal at the file system level. Another thing I can say: put good effort into optimizing your I/O, because you can delay sharding. Sharding can be very painful; it increases complexity and in general decreases the reliability of your system. Yes, if you have a huge system and you know from the business requirements that you will require sharding because of the huge amount of I/O you have to process (say you can process 400 I/O operations per second but need to do a couple of thousand), you may be driven to sharding from the start. But if you don't have to, just try to optimize the I/O and don't jump to sharding immediately. Another approach was used successfully at Shutterfly, and I know the gentleman there; we worked together at eBay. It's Kenny Gorman, one of the top NoSQL guys. They tested flashcache and got very good results, so that is one option you can consider: combining SSDs with flashcache. And if you have to be on AWS, I absolutely recommend going with provisioned IOPS and making sure you have enough capacity; the regular standard volumes won't really work well under production load. Another very good study, which got very good results and which I'll share a bit of here, used ext4 plus an external journal on SSD, which gives really, really good I/O results. Of course, you need to test it in your environment. They recommend mounting with journal_async_commit. The full study is linked here, and you can see their conclusion: after they changed those options and put the journal on an external SSD with ext4, it was basically a 4x difference, four times better I/O performance. So it's a really good approach to tune I/O without going pure SSD, which may be expensive: if you can put just the journal on an SSD, it can provide really good leverage on I/O performance. So I definitely, strongly recommend that you test and validate it for your environment, but it gives very good results.
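To sketch the external-journal idea: ext4 lets you put the file system journal on a separate block device. In the sketch below, /dev/sdb1 stands in for a small SSD partition and /dev/md0 for the RAID 10 data array; the device names are illustrative, and journal_async_commit is the option the study recommends, so validate all of it in your own testing.

```bash
# Create a dedicated journal device on the SSD partition.
# Block sizes must match between the journal device and the file system.
mke2fs -O journal_dev -b 4096 /dev/sdb1

# Build ext4 on the data array, pointing its journal at the SSD device.
mkfs.ext4 -b 4096 -J device=/dev/sdb1 /dev/md0

# Mount with noatime plus the async journal commit option from the study.
mount -o defaults,noatime,journal_async_commit /dev/md0 /data/mongodb
```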
Another study I found worth sharing comes from SoftLayer, the company acquired by IBM. They did an extensive study (the URL is here) and found some interesting things. The first two points are fairly self-evident: when the working data set is smaller than available memory, query performance will obviously be better; and the number of clients performing queries has an impact on query performance, because more data is being cached. That's a given. But the other two findings are very important. They found that if you create a separate mount point for the journal, it immediately gives you a good performance boost, which makes it a good thing to do. And the best deployment in terms of operations per second was a separate RAID 10 SSD data mount with RAID 1 for the journal. That gives you good leverage on I/O, and I definitely encourage you to go and read the whole study. It has great data and great benchmarking information, and it can be very helpful when you need to evaluate performance. And again, I personally believe in benchmarking and in load testing, because this is the only way to understand how the system will perform in your own environment. The next area is general monitoring. One thing that is very important: you need a very strong handle on the performance of your underlying operating system. It can give you a really good clue that something is going wrong even before you dive into database performance. So, general things: make sure you run the standard Linux utilities, iostat, vmstat, mpstat, sar; that's the standard set. To see cache information, free -tm is a good command. From the point of availability of the system, for general monitoring, use standard tools like Nagios and Cacti; they are freely available, and I put up a couple of links with the default plugins and profiles. I personally like Cacti because it collects data and enables historical analysis. And one recommendation that is maybe not very standard: I am a big proponent of Oracle OSWatcher. OSWatcher is a free utility; if you are an Oracle shop, or have Oracle, or have an Oracle DBA friend, you can download it. I don't think there is any license associated with it or anything you have to pay for. OSWatcher is an automatic collector: it collects data and archives it, across the operating system, I/O, kernel-level information like kernel parameter changes, top processes; a wealth of amazingly good information. It's already automated; you just start it and stop it. I've seen large enterprises implement OSWatcher as a standard across all their systems. I strongly recommend you use it; it will save you lots of time. It also comes with a graphical utility that enables you to chart this data. I also put up the Oracle note number, so if you have access to the Oracle support site, you can use it to download and install OSWatcher. So again, strongly recommended. Yes? [Question from the audience.] The performance impact is usually in the area of one to three percent of CPU usage. Of course, if your system is under tremendous load, you may want to change the frequency of the collections; the default frequency, I think, is ten seconds or something like that. But even if there were a big performance overhead, I would rather collect the data, find out what went wrong, fix the problem, and make sure it doesn't happen again, than let it go and then live in fear of it happening again without knowing why. So yes: absolutely, collect 24/7, as a company process.
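Here is a minimal sketch of that kind of always-on OS collection. The standard utilities are exactly as named; the OSWatcher script names and arguments follow the versions I've seen (startOSWbb.sh taking a snapshot interval in seconds and an archive retention in hours), so treat those as assumptions and check the README that ships with the tool.

```bash
# Ad-hoc spot checks: extended iostat, vmstat, per-CPU mpstat, historical sar.
iostat -x 5
vmstat 5
mpstat -P ALL 5
sar -u -f /var/log/sa/sa$(date +%d)

# Memory and cache summary, with totals, in megabytes.
free -tm

# OSWatcher Black Box: sample every 10 seconds, keep 48 hours of archives.
# Script names are from the OSWbb versions I've used; an assumption to verify.
cd /opt/oswbb
nohup ./startOSWbb.sh 10 48 &
# ...and to stop it:
./stopOSWbb.sh
```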
Make sure that it's constantly collected; then, as soon as you have a problem, it's all flat files, human-readable files, and you can analyze them. There is a GUI where you can set up the time frame and what you want to generate. And again, I don't want to toot my own horn: we have software that provides very good analysis on top of this, but it's out of the scope of this presentation. OSWatcher: definitely, strongly recommend using it. It will remove lots of headaches for you. And here is the problem with systems like Nagios and Cacti: they are very good and everything is great, they provide you with nice charts and notifications, but as soon as you have a performance spike and the system is CPU-bound or I/O-bound, those tools just stop working. And you really need to rely on something created by people who fight those problems day after day. So, next, I just want to give a general overview of database tuning. There are lots of parameters in MongoDB, and some of them are close to obvious knowledge. For instance, there is an option to disable preallocation; you want preallocated space, because if you start in production mode without preallocation, you degrade the performance of your system. Another thing to look into is the journal commit interval. The default is 100 milliseconds, and again, it's all a game between reliability and performance. If you monitor the system and see that your journal I/O is constantly overloaded, you may even consider, depending of course on the nature of your system, extending it to something like 300 milliseconds. That would decrease the journal I/O load, and if your system is in a critical situation, it can let you survive. If you really care about reliability, then decrease it, of course; the minimum is two milliseconds. There are also a couple of other options which I wouldn't recommend, but they are there. One option, for example: you can disable BSON checking. Not a great thing, because yes, it would improve your performance, no question about it, but then if you have a problem and the driver suddenly starts telling you about corrupted BSON, you have an issue, and to troubleshoot it you need another utility called mongosniff; you don't want to be there. But again, if performance is that critical, you always have the option; it's just high risk. Another option you may consider, if you have a really well-tuned environment, is to disable table scans. This is another parameter I cannot quite recommend, because if you have a statement that needs a table scan, it simply won't fly. But it's a good approach if you have very good architectural and development standards, so that you know every collection being deployed has appropriate indexes and everything that has to be indexed is. Then you can put it there and say: you know what, guys, we have an SLA, an agreement; there shall be no full table scans, no full collection scans, in our environment.
If you deploy that, it will probably save you from very unpleasant situations going forward, but it can also create unpleasant situations: suddenly someone deploys something new, and now you need to find out what's wrong, now you need to put a profiler on it. (By the way, we have a separate slide for the profiler, which is an absolute must-have in any environment.) So again: disabling table scans can give you a good performance guarantee and eliminate unpleasant surprises, and at the same time it can create unpleasant surprises of its own. Other things to consider: there are some services you can potentially disable, like the HTTP interface if you don't use it, or the REST interface if you don't use it. There is also server-side JavaScript; if you execute it, fine, but if you don't need it, you can disable it. It's all the same idea: if you are running a production system and you know you don't use something, then don't. Don't just waste performance. So, what database tools does MongoDB provide? The tool that probably everyone is using is mongostat. A funny bit of history: again, I'm familiar with Kenny Gorman, who created this tool. Back at eBay, we had a tool we called FreeCon, which did exactly this for the Oracle database, sampling every couple of seconds and printing these lines. So Kenny, at Shutterfly, got challenged with performance issues, wrote it, and then submitted it to MongoDB. The nice thing about mongostat is that it gives you real-time performance across the important areas. And if you use the --discover option, it goes across your shard or replica set nodes, so you can see it across multiple systems. It provides very good information, which, again, my recommendation: capture it. Don't just run it when you have a problem. Run it 24/7 and log this information. Whenever something bad happens, you want to be in a position to go and look at it with your own eyes and find out what went wrong. The important information there: for example, flushes; if you are fsyncing all the time, it's a problem. Faults, meaning page faults. Locked db: this changed, because the locking has now moved to the database level, so it's very nice, you see what percentage of time is spent locked per database. The index miss percentage, when an index access is not satisfied from memory. And the queues of reads and writes. This information is very important. I strongly recommend using mongostat not just as a firefighting thing, "we have a problem, let's run mongostat"; no, it's running, it's always there, and when I have a problem, I look at the log file. Another utility which is also helpful, not as much as mongostat, but with good detail at the collection and database level, is mongotop. It looks at the time spent, the response time, and you can run it per collection over time: how much on reads, how much on writes, total time spent, and you see it at the collection level. And if you use the --locks option on mongotop, you can look at the level of the database and the time spent in locking. So again: don't just look at this when you have a problem. Log it, keep it forever, and it can be very helpful. Again, this is what I do. (If you run it and see it suddenly consuming lots of resources, maybe it's not a good idea for your site.)
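Pulling these knobs together, here is a hedged sketch using the 2.2-era configuration option names, plus simple always-on logging of mongostat and mongotop. The file paths are illustrative, and every option name should be verified against your server version.

```bash
# 2.2-era style mongod.conf fragment (verify option names for your version).
cat >> /etc/mongod.conf <<'EOF'
journal = true
# Journal commit interval in ms: raise toward 300 to relieve journal I/O,
# lower toward the minimum of 2 for durability.
journalCommitInterval = 100
# Keep data-file preallocation on.
noprealloc = false
# Set true only with airtight index standards: unindexed queries will fail.
notablescan = false
# Disable unused services.
nohttpinterface = true
rest = false
noscripting = true
EOF

# Always-on capture: one-second mongostat across discovered nodes, and
# five-second mongotop with lock detail, appended to dated logs.
mongostat --discover 1 >> /var/log/mongostat.$(date +%F).log 2>&1 &
mongotop --locks 5     >> /var/log/mongotop.$(date +%F).log  2>&1 &
```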
Again, I don't know your production environment, but as a general recommendation, definitely make sure to log this information. And I know personally, because I have been a professional DBA all my life, that the worst situation is when my manager asks me what happened and I have to say: I don't know, let's wait for it again and log it this time. That's not a good way to talk. It's always better to say: you know what, we are collecting this data; if you want us to stop, we'll stop it, fine. If you as a manager want me to stop the performance collectors, I'll do it, but it has to be your order, not me simply not collecting data. Another area which I find very important is the ability to analyze data at the network protocol level. You have two tools here. There is mongosniff, which is not provided by default; you actually have to get the MongoDB source and build it, but after you build it, you can use it. Mongosniff is very important because it sits on the protocol and shows you the requests, the BSON requests. ["May I have your attention, please? May I have your attention, please? A fire has been reported in the building."] Okay, should we run? Okay. [The life safety system announces a possible alarm: do not evacuate.] So, going back to the presentation: network traffic analysis. We have two ways to do this. The first is mongosniff, which you have to build yourself against the source. There are instructions; I can share them if you need, but I think they are publicly available. You get the Mongo source code, there is an scons target for mongosniff, and it builds it for you. The good thing is that you can run it on a production system. It has some slight performance impact, because it actually analyzes the traffic, but nothing that prevents you from running it even on a really highly loaded production system. The nice thing about it is that you can actually capture invalid BSON requests, which is very important. If you have some issue, and for some reason a driver is having a problem, or you have some corrupted BSONs, and now you really need to understand where they are coming from, you can just run mongosniff with the --objcheck option, and it will capture those requests. Another tool which I really like in general: say we are doing heavy traffic, we replicate, our distributed systems are doing all this cross-zone activity, and we want to really understand what's happening. I strongly recommend Wireshark. With Wireshark you can literally capture the traffic; if you are not in a cloud environment, you can ask your network administrator to capture the traffic from the database subnet, so you have the activity of all servers and all clients together. And then, first of all, Wireshark fully supports the MongoDB protocol, so you can filter down to just the Mongo requests. It doesn't go down to the BSON level, so you have to do that yourself, but the main thing is that Wireshark has lots of analytics inside. You can see the breakdown of response times, you can see what the requests are doing, you can see the chain of requests. There is a lot of performance analysis at the Wireshark level, and especially if you suddenly have a lot of latency and need to investigate it in a distributed environment, it is extremely helpful. So again: it's open source, it doesn't cost anything, and I strongly recommend using Wireshark.
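A sketch of both paths follows. The scons target and the mongosniff flags are from the 2.x source tree as I remember it, and the Wireshark display filter for the MongoDB dissector is, to my knowledge, named "mongo"; verify both before relying on them.

```bash
# Build mongosniff from the MongoDB source tree (2.x-era scons target).
git clone https://github.com/mongodb/mongo.git && cd mongo
scons mongosniff

# Watch live traffic on eth0 for the default port.
./mongosniff --source NET eth0 27017

# Log only invalid BSON objects, useful for hunting a misbehaving driver.
./mongosniff --source NET eth0 --objcheck 27017

# Wireshark's CLI twin: capture the database port, then apply the
# MongoDB dissector (display filter name assumed to be "mongo").
tshark -i eth0 -f 'tcp port 27017' -w /tmp/mongo.pcap
tshark -r /tmp/mongo.pcap -Y mongo
```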
The next thing in database tuning is one thing that is extremely important: fragmentation. What really happens in MongoDB: it uses memory-mapped files, it builds a working set that has to be in memory, and when the files become fragmented, the working set takes much more memory and becomes much less efficient. It can also cause (again, out of the scope of what I'm covering here) lots of sharding issues; there have been big outages that happened because of fragmentation and then improper balancing of the shards. There is, of course, a way to defragment at the database level, but you can't do that in production. This is why you have a command to compact a collection, and you can allocate some padding for the compaction. This is very important, and it is one of my strong recommendations: you need to automate the compaction of your collections. There is a link I provided to a guy who actually wrote a program you can just use, or you can look at his program and write your own, but I strongly recommend making sure, around the clock, that you are compacting. Otherwise you may have performance issues. It works at the collection level, you have different options, and you can provide the set of collections you want to go after. It will take a lock, but I don't think it performs at full scale, like a database-level lock or something, so it's much lighter and can definitely operate in production. So check this blog. It's really good, recent, from March of this year. The guy did nice work; I looked at the code, and the code looks solid, so I'd definitely recommend using it. Sorry? [Question from the audience.] No, no; the whole nice idea is about the script and the approach, that you can do it in production. And again, fragmentation is a big thing, so if we can decrease it, it definitely benefits performance immediately. I'm sorry? [Question.] You can watch for it through the memory mapping: look at your free-memory activity, and if it grows over time and you suddenly see your working set growing, then you are getting fragmentation.
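As a minimal, hedged sketch of automated compaction: the compact command and its paddingFactor option are 2.x-era, and the collection names and cron cadence are purely illustrative. This is not the script from the blog I mentioned; it just shows the shape of the approach.

```bash
# compact_collections.sh -- illustrative only. Measure the lock impact in
# your own sandbox before trusting it on a busy production node.
for coll in events sessions orders; do      # hypothetical collection names
  mongo mydb --quiet --eval "
    var r = db.runCommand({ compact: '$coll', paddingFactor: 1.1 });
    print('$coll: ' + tojson(r));
  "
done

# Example cron entry: nightly at 03:15.
# 15 3 * * * /opt/scripts/compact_collections.sh >> /var/log/compact.log 2>&1
```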
Another area, obviously, is the profiler, and here are the things I strongly recommend, again probably common knowledge. Run it at level 1, which captures the slow requests, and keep it on 24/7; don't turn it off. You can set a second parameter there, the slow threshold, so make it whatever you consider slow: 100 milliseconds, 200 milliseconds, one second. It returns very good output; there is a document that explains exactly the profiler information. The good thing, again, is that when you are running the profiler, you basically know how many documents were scanned and how many were returned. If you are returning one document but scanning 100,000, you have a problem and you need to add a proper index, especially if you see it happening all the time on the same collection. There are fields showing whether documents were moved; there are tons of very good statistics. The presentation Kenny prepared (the link to it is here) is a profiler deep dive, and it's a good presentation. What he did: he took the new aggregation framework and plugged it into the profiler, so it can break down the response time and do all kinds of massaging of the profiler data. So it's definitely something worth looking into, and again, you should always run the profiler. Just don't do it at level 2, because that logs everything; maybe use that if you just want to capture some QA workload. Level 1 is definitely what I strongly recommend. The next area is indexes. The thing with indexes is that developers don't like indexes and don't appreciate them, because they don't care about them: they write code, the code is supposed to return the data, and the indexes fall on the DBAs. There is, of course, explain: whatever statement you are running on the system, you can always add explain and see exactly, specifically, what the scans will be, and if you are scanning too much, you can work out whether you need a better index, maybe a compound index, and so on. There is also a really good tool which I recommend, called Dex. What this tool does is plug into the profiler, scan all the bad queries, and automatically suggest the proper indexes. So I strongly recommend looking into Dex, at least when you have a load testing or pre-production environment. Even in production: if you have an environment that gets ad-hoc use, or users who can deploy code (I hope that's not happening, but sometimes it does), you can use Dex to capture that and get good indexes automatically. And I again recommend not just running it because you like to run it: automate. My main idea of how you survive as a DBA: always automate things. If you automate this, and you get a daily email or alert saying, hey, we just found that this request executed so many times, it was really bad, and this is the index we should create, it makes life easier.
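Here is a hedged sketch of those pieces together: level-1 profiling, reading scanned-versus-returned numbers out of the profile collection, explain on a suspect query, and a Dex run. The explain fields (cursor, nscanned, n) are the 2.x-era names; the Dex flags follow the project's README as I recall it, and the namespace and connection string are made up for illustration.

```bash
# Turn on level-1 profiling with a 100 ms slow threshold, then list recent
# operations that scanned far more documents than they returned.
mongo mydb --quiet --eval "
  db.setProfilingLevel(1, 100);
  db.system.profile.find({ nscanned: { \$gt: 1000 } })
                   .sort({ ts: -1 }).limit(5)
                   .forEach(printjson);
"

# Explain a suspicious query: a BasicCursor means no index was used.
mongo mydb --quiet --eval "
  var plan = db.orders.find({ status: 'A' }).explain();
  print('cursor=' + plan.cursor +
        ' nscanned=' + plan.nscanned + ' returned=' + plan.n);
"

# Dex against the profile collection (flags per the Dex README; verify).
pip install dex
dex -p -n "mydb.orders" mongodb://localhost:27017/mydb
```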
Then there is standard UI monitoring. Again, common knowledge: there is the MongoDB Monitoring Service, MMS. It's charts; you plug in your database and you set up your alerts. It's a good software-as-a-service approach. My major complaint: it's very simple, it gives you one database at a time. What if I have a cluster? What if I have a shard? What if I have big systems? I want to analyze across them, especially in an enterprise setup with a shared storage area network, where I really want to be in a position to do cross-system analysis. So it's a very, very simple thing, and again, I think it costs nothing or close to it, but it can be used. There is a whole page on the MongoDB site with a bunch of open source monitoring software you can use; I mentioned some of it. For some reason they don't mention Cacti, which I personally like, and a Cacti plugin exists, so I definitely recommend looking into it. And the last one, tooting my own horn: we have a product that deals with MongoDB and does this cross-system, cross-everything analysis. The next thing is replication. The major problem with replication is lag. There are a lot of issues that can cause lagging: a very weak secondary system, put on bad I/O or on an underpowered node; write bursting, when a big burst of writes suddenly comes in and you obviously get some replication lag; index builds; somebody blocking a secondary for backup; a secondary being offline; those kinds of things. How can you measure the lag? There are two commands: you look at the slave replication info and the replication info, and you see pretty much where the lag is; you can calculate the time difference. On MMS, the software-as-a-service monitoring by 10gen, you have a replication lag chart and you can set up alerts. One thing that is definitely important: you don't want to be late, because especially if your primary goes down while a secondary lags, you have a problem. Now, the slides on load testing. First of all, why do we want load testing? I put up a couple of reasons, and this is for production's sake, not only the application's. We want to validate upgrades: if we want to move from 2.0 to 2.2, we obviously want to understand whether it will be stable under our load. If we are deploying fixes, same idea. If we are planning hardware changes (we want to change the storage area network, move to different server nodes, maybe consider moving to the cloud or changing clouds), we need to load test, because unless we test under real production load, we are just risking the issues appearing when we deploy to production. Another area is testing at multiples of production load. Okay, great, everything runs stable. What if tomorrow my company runs a marketing promotion and I have three times the load? Can I sustain that or not? It's a big question. And as much as we can apply capacity theories, queuing theory, linear regression, whatever the approach, it gives you theoretical numbers. You will never know whether you would suddenly hit a huge locking issue because some kernel parameters were not set properly. You never know in reality unless you test it. So what options do we have for load testing? Not too many at this point. First of all, there is a really good product called mongo-perf, and mind the difference: it's not the mongoperf utility that ships with Mongo, it's mongo-perf, a benchmarking tool. I strongly recommend using it. It's open source, you can build it, and it creates a kind of semi-TPC-C load. You can regulate it, you can change it. As open source goes, this is probably your best bet for creating a heavy load. Then, for I/O-level testing, there are two utilities. One is mongoperf, the internal utility. It creates a very heavy I/O-level load on the system so you can validate whether you can process a specific number of IOPS. It usually requires a configuration file, but you don't have to do all that: you can just echo the parameters and pipe them to mongoperf, and it will take them. The second utility, which I strongly recommend, is IOzone. This is a really good I/O benchmarking system. You can test all kinds of levels of I/O load, and you can simulate your own patterns of load, so check the website. In fact, the study I showed earlier, which demonstrated why ext4 plus an SSD journal is better than the regular approach, actually used IOzone for its testing.
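Concretely, and with the caveat that thread counts and sizes below are illustrative: the replication-info shell helpers, the mongoperf stdin trick, and a basic IOzone run look roughly like this.

```bash
# Replication lag: both helpers print oplog timing; the slave variant shows
# how far each secondary is behind the primary.
mongo --eval 'db.printReplicationInfo()'
mongo --eval 'db.printSlaveReplicationInfo()'

# mongoperf: pipe the configuration instead of writing a file.
# 16 threads against a 1 GB test file, doing both reads and writes.
echo "{ nThreads: 16, fileSizeMB: 1000, r: true, w: true }" | mongoperf

# IOzone: automatic mode over record/file size combinations up to 4 GB,
# run against a file on the actual MongoDB data volume.
iozone -a -g 4G -f /data/mongodb/iozone.tmp
```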
Another option, and here I can give you a couple of ideas: you have pcap files, the files you can analyze with Wireshark or with mongosniff. You can capture this traffic and then create your own tool. If you know how to program, it should not be very heavy work; or find a friendly developer who is willing to take the protocol offsets and just parse them out. You can create a tool that does what I call authentic, production-like load testing. There is a URL that talks specifically about the MongoDB wire protocol, and the protocol is very simple. Literally, there are a few types of operations, and the payload is BSON; you can parse the BSON, convert it to JSON, whatever you like. So you can create a custom tool, and I again strongly recommend investigating this approach. Whether you get a friendly developer or do it yourself, the tool goes through the pcap file, understands which sockets were used and what load was being simulated, and then you just replay: you generate the normal requests, in Java or in Python, whatever you like, that replay it and simulate the production load. And you don't have to capture a whole day of load. It's enough to know, for example from your activity charts, that you have one hour of peak load. Take that one hour of peak load and use it for your load testing. Or you can take that one hour, split it into chunks of, say, 30 minutes, and replay them in parallel, because your workload is probably homogeneous and the requests don't depend on each other. That simulates a very authentic double load or triple load, whatever you like. Again, we are currently in pre-release with a load testing tool for MongoDB that does exactly that. We have it for other databases, for Oracle, SQL Server, and a bunch of others, and we are adding MongoDB. But the approach is very simple, literally: our developer created the parser in a week or so, and he is not a super genius.
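Here is a hedged sketch of the capture side of that pipeline: record an hour of peak traffic with tcpdump, then decode it offline with mongosniff's FILE source (a 2.x-era option). The header a replay tool has to parse is four little-endian int32 fields, messageLength, requestID, responseTo, and opCode, followed by the op-specific body; the replay logic itself is what your developer writes.

```bash
# Capture one hour of peak production traffic on the database port.
tcpdump -i eth0 -s 0 -w /capture/mongo-peak.pcap 'tcp port 27017' &
sleep 3600 && kill %1

# Decode the capture offline with mongosniff (FILE source, 2.x-era tool).
mongosniff --source FILE /capture/mongo-peak.pcap

# Wire-protocol header your replay tool parses (all little-endian int32):
#   messageLength | requestID | responseTo | opCode
# e.g. opCode 2004 = OP_QUERY, 2002 = OP_INSERT, 2001 = OP_UPDATE
```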
So, just to summarize what I discussed. It is important to understand the business and environment requirements. We start with that: understand what the business really expects, and ask them the crazy questions. What is your business goal? What will you be doing in a year? I understand what you want right now, but do you have projections? Do you have any promotions coming up? It's important not to base everything only on what the business is thinking today, but to have this vision. Because when the system starts getting loaded, and suddenly you have 20 times more visitors doing that many more transactions and you need to do so many other things, they come to you and ask: why didn't you scale the system? Why didn't you provide the proper capacity? And it's: yes, guys, but when we talked, this is what you told me. And still they point fingers at us, the DBAs. So it's important to ask: what is your business vision? What is your goal? Where are you going? It's very important; this is how you will architect your production system. Again, when you are designing a system, always look ahead: I want to be scalable; I want to make sure that if this thing goes down, the system is still available; I want a DR site; all this basic stuff. But put it in front of the business, because they are in charge of the budget and they will decide how much money to spend on it. Because if you do not propose it, they will come to you later and ask: why didn't you talk about this? It's so important; why didn't you mention it? Unless you put it in front of them and can say: guys, with all due respect, it was your decision not to invest in DR; it was your decision not to buy SSD storage; it was your decision not to go with provisioned IOPS, and you wanted me to run on default EBS volumes, fine. You need to put it in front of the business people, make sure they buy into the architecture, and explain it to them. Another thing that is very important: implement your monitoring and performance analysis across the layers. As I mentioned, when you have a performance issue, don't just go and say, hey, show me serverStatus; it's cryptic, it has tons of data. Capture this data: run mongostat 24/7, run OSWatcher 24/7, run mongotop 24/7, and capture all of it. Whenever something bad happens, you should have a very strong hold on what is happening in the environment. And the last point: again, put the effort into implementing production-like load testing. The protocol structure is open; it should be a baby task for a normal developer. I know there is no commercial product at the moment, but you can create it; you can get people to do it for you. And once you have it, you have huge leverage, because now you can take production traffic, simulate the load, and say: no, guys, you are planning this promotion with double the load, and the system will not hold it. I ran it, this is the performance metric, it's overloaded, it's not working. Sorry, we need to bump up the system before you go to 2x. So this is the summary of my presentation, and I would be glad to address any questions. Thank you.