Many of you here don't use a database. You just store everything on the file system then — UFS for the win. OK. So I'm going to guess most of you probably use some form of database or data store to store your data. These slides are released under a Creative Commons licence, so feel free to do whatever you want with them. They will be online, so you don't have to take notes, and I believe this is also being recorded on video.

So, database, data store, et cetera. I say database a lot because I've been doing databases for nearly 20 years. I don't say data store as much, but data store is definitely becoming the more common term. If you look at database trends lately, there are time series databases out there, and quite a lot of them are cropping up: InfluxDB; Prometheus, which is not a time series database but has a time series data store; and TimescaleDB, which is just tacked on to Postgres. Anybody heard of any of these before? Anybody use time series databases? OK. These generally never call themselves a database. Elastic is another good example — they're a data store, not a database. So it's a terminology difference, but I've been saying database for decades, and I'm going to continue saying database today.

If you come from a commercial database background, you may have heard the term sizing guide before. Oracle and SQL Server always have these sizing guides, and in fact, if you speak to people trying to migrate from a commercial database to an open source one like MySQL, the thing they'll eventually ask you is: do you have a sizing guide? It's not a very common thing in the open source database world, but sizing guides are part of capacity planning.

This talk is, of course, going to be heavily MySQL focused, because I've spent about 15 years of my life working on the software itself. That is my expertise, and I should probably not talk about things that are not my expertise. I don't know how many of you have heard of Presto. Presto is not exactly a database; it's a distributed query engine that will let you query across, say, Cassandra, Hive, and MySQL, but also proprietary data stores like Vertica. Facebook created Presto, and they talk about how they query tons of internal data stores on a daily basis, running queries over petabytes of data. And it's not only Facebook using Presto as a querying tool across multiple data stores — today we live in a world where you have multiple data stores, and polyglot data usage is quite common — people like Airbnb use it as well. Dropbox, Grab, they all use Presto. So for me, at this very moment, I'm fine with the fact that I'm going to be talking about data stores; as I said, I'm going to say database a lot. You also find companies like Qubole who basically help you capacity plan and scale your Presto instances, because you may need to scale your query engine depending on how much data you're querying as well.

So, I don't know how many of you like to eat crab, or can eat crab. Can you eat crab? Yes, OK. These are snow crabs that you find in Japan, and this is typical of a buffet that you'll have during snow crab season. And why do you need a capacity plan?
Well, it turns out even hotel buffets need capacity planning, because if they offer you a buffet and they run out of crab legs, you'd be pretty unhappy, and you'd probably ask for a refund. The reality is that it costs money to not be available in the world we live in. You can sort of guess, via statement filings when companies get acquired, or via quarterly investor updates, how these large companies make money, and how important it is for them to be available all the time. So Ticketmaster — Live Nation had acquired Ticketmaster, and they made a 10-K filing in, I think, 2011, which suggested that Ticketmaster was at that point in time, 2011 might I add, selling 269 tickets per second. Now, ticket prices vary, obviously, so it could be for an M2M concert or... I don't know, what's popular these days? Savage Garden, no? Am I showing my age? Nickelback, no? Anyway. And you'd pay varying prices, right? It would be a valid presumption that you would pay more for a U2 concert ticket than you would for an M2M concert ticket, or a Coldplay concert ticket — I think they're famous now. Anyhow, music's not my strong point, really. Booking.com, which is part of the Priceline Group, files quarterly investor updates, and from there you can gather how much revenue they make, and based on the average price of a hotel room, you can guess how many transactions per second they're doing. So every second you're down is money that is lost, because if you go to Booking.com and try to book a hotel room, and you can't book it for whatever reason — the application tier being down, the database tier being down, or the payments tier being down — you will just go to an alternative, like Expedia.com. You'll go to a competitor. And that's lost revenue for the company; for that moment, you'll see blips of loss. This is again similar for ticketing, because typically a ticketing company is not the only company offering tickets for a given event.

So, all good stories start with a dream. I read Harry Potter in 1997, and at one stage I thought I might have a Harry Potter scarf. Did Voldemort himself try to kill Harry Potter as an infant because he was trying to solve a capacity problem — because there could only be so many powerful lords around? Capacity planning, it turns out, is also a human science. Organizations love to read guides about maximizing the value of your limited people resources. And just like people resources, machine resources are limited. In my abstract, I talked about Harry Potter a lot. It turns out that's because I myself wanted to go see this theatre show, Harry Potter and the Cursed Child in London. They completely sold out via ATG Tickets — I think they sold some 250,000 tickets, and they sold out more than a year in advance. That's how popular this theatre show is. Now, a ticket sale happens probably once every year, and it usually sells out in the first 12 hours. People wait for 12 hours in virtual queues, and then you may purchase a ticket, or you may get rejected, because someone in the queue before you didn't buy the average of two tickets — they bought an average of four tickets, so they could maybe resell them. So the queue definitely helped capacity planning. It was probably also there to annoy bots, because a very common problem for ticket sales websites is that bots tend to buy tickets and scalp them later.
But if you were in this queue for 12 hours — was waiting 12 hours a very good experience for you? You may have sat down in the office at 9 a.m. to join this queue, and you didn't leave the office until 9 p.m., not because you were working hard, but because you were trying to buy tickets in another window, of course. Should people have been rejected earlier? Also, how do you know how many people are going to buy tickets? How do you balance between two ticket sellers? Because a ticket is a limited resource, right? It is basically a lock. In this auditorium, there are only so many seats. If there's more than one seller of said tickets, you've got to make sure that neither seller sold the same seat.

Apple kind of figured this one out. How many of you buy stuff from an Apple online store? Is there no Apple online store here? There must be, no? Anyhow, if you go to the Apple online store when they announce, say, a new iPhone, they tell you you're limited to buying two iPhones per credit card. Now this, of course, doesn't really limit you, because you could have more than one credit card, presumably, and you could make more than one order. However, that sort of ceiling means you have to make multiple orders. And this doesn't mean Apple has solved capacity planning as a problem, because if you try to buy AirPods, there's still basically a six-week wait.

So I promised M2M, but this is Melanie C, a Spice Girl, and this is an example of why content should be Creative Commons licensed — when you're looking for pictures, you can go on Wikipedia and find your own. A lot of capacity planning is about revenue management. You cannot sell more than you actually have. I take the term revenue management straight out of what the hospitality industry uses. Hotel rooms and airline seats are perishable commodities. Why do I talk about revenue management, hotel rooms, and airline seats a lot? Because I spend a lot of my time in airline seats and in hotel rooms, and they are perishable. You want to sell as much as possible, naturally, but you also want to drive prices up in a supply-and-demand fashion. A hotel in a good city at good capacity is possibly used around 96% of the time. There are examples where hotels in Seoul, South Korea have been at 96% capacity throughout most of the year, which then fuels other investors to want to build more hotel rooms — obviously, if there's demand at 96%, you build more. But if you come from a place like Kuala Lumpur, where there are multiple hotels, the economy has been sort of going up but is sort of stagnant — it's run by maybe kleptocrats — and room occupancy, especially for five-star hotels, has dropped to about 60% unless there are events. That is not a very good use of a perishable commodity, because you can't use that other 40%. If you don't sell a room for that night, it's gone. That's lost money. That's terrible. Nobody likes losing money.

So when it comes to something like a seat map — in this case, a theatre — you have seats that maybe cost more if you sit in the front. If you sit at the sides, they cost less. If you sit all the way at the back there, they cost less. And if you sit up there, they cost more as well. And planes: if you sit in first class, it costs probably eight times as much as your economy class ticket, and if you sit in business class, maybe four times as much. But generally speaking, airlines like to sell out. Theatres love to sell out. It's again similar with rooms.
You can pick types — suites, king bed, queen bed, view, no view. You start dipping down further when you're looking for revenue, but you also think about quantity. A lot of the work around this kind of capacity planning comes from operations research, applied to capacity planning, which is why you see events like Rootconf take off, and why you see events like even the O'Reilly Velocity conference take off. And of course, you have to think about the fact that something like a seat map — not being able to sell more than what you have — is actually a lock. Because a lock is not the kind of resource you get more of. And you also have to think about lock synchronization. There's a little example down there of how an airline might try to encourage you to book now: they'll tell you there are only two seats left on this flight at that price, and if you miss out on those two seats, you will pay more, or you may not get to fly on that flight at all, because those seats may not exist any longer.

So, in the interest of creating everything by myself, here are some scratchy drawings. High availability and capacity planning work fairly well in tandem. I'm only going to focus on the top-left quadrant, basically, that bit there. And that's because we think about having good performance, we think about having good scalability, and we always think about throughput. We measure throughput either in transactions per second or in queries per second. In our world, the database capacity planning world, QPS is a huge number to measure. But we also think about things from a latency standpoint, because response time is extremely important. It has been shown, from a front-end standpoint, that every extra second you wait for a website to load makes it more likely that people drop off from said website. So basically, throughput and latency are very important measurements that you have to think about when you come to capacity planning.

Uptime. How many of you think you want five nines uptime? How many of you think you have five nines uptime? OK, that's good, I'm speaking to a realistic crowd. How many of you think you have four nines uptime? Three nines uptime? The rest of you think you have no downtime? You know, there are some services that actually sell you the idea that there is no such thing as downtime on their platform, but the reality is that's not true. Uptime is basically a level of availability, which you can calculate as maximum downtime per year. Everybody thinks they'd like to have five nines uptime, which is only about five minutes of downtime per year, but generally speaking, sites don't have such high uptime. Typical uptimes very rarely get past a fraction of three nines, actually; it's typically around 99.6%. So you do have quite a bit of downtime, and when you measure at scale, it doesn't really matter if one machine goes out of the rack, because you have other machines that do failover, generally speaking. All of this plays a role in capacity planning as well.

You can start capacity planning now, because all you have to do to start capacity planning is to start collecting metrics at this very moment. How many of you here in this room collect metrics now? OK, that's good. So metrics are more or less a standard of measurement, and for the rest of you who don't collect metrics, it's OK, just start now, because you can figure out what to do with them later. The important thing is to start now — space is cheap.
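To put those availability numbers in perspective, here's a quick back-of-the-envelope sketch — not from the slides, just the arithmetic behind the nines — of how much downtime per year each level actually allows:

```python
# Allowed downtime per year for a given availability level.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for availability in (0.996, 0.999, 0.9999, 0.99999):
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    print(f"{availability:.3%} uptime -> {downtime_minutes:7.1f} minutes of downtime per year")

# Roughly: 99.6% -> ~2100 min (about a day and a half), 99.9% -> ~526 min,
# 99.99% -> ~53 min, 99.999% -> ~5.3 min — the famous "five nines".
```

That is why "we're around 99.6%" is a perfectly realistic answer, and why five nines is so hard to buy.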
You also definitely need a baseline — we're going to talk a bit more about baselines in the next slide — and the baseline is very relevant to you. It is your baseline. My baseline, one service's baseline, will definitely differ from another one, and it also depends on your traffic patterns. No two traffic patterns are the same. You can start now by taking metrics and loading them into things like Cacti, Munin, Graphite, StatsD, even Icinga — there was a good talk yesterday by Bernd about monitoring overall.

So, baseline: how well is your current infrastructure actually working? You definitely want to look for specific metrics such as response times, consumable capacity, and peak-driven processing. What is peak-driven processing, generally speaking? It's the workload experienced by an application's resources during its peak usage. Of course, I'm going to focus on the database, but when you think about capacity planning, you need to think about not just your database, but also your web servers, your load balancers, and so on. You obviously need to know what your QPS is — your queries per second, very important. You need to know what your QPS is going to be before performance degrades, and of course, even at a degraded stage, when does it really start affecting the user experience? When does the user start seeing "Error establishing a database connection"? That's what you never want, because being slow sometimes is not so bad, but being completely unavailable is terrible. Then you need to think about what more you may need in the near future to maintain acceptable performance. What is the load that will cause failure? What is the load that will actually send you an alert? When you add or remove capacity, what do you expect to gain? Like, if I add one machine, what do I expect to get out of it?

Now, this is a very interesting concept when you start using Amazon's cloud services. How many people here use Amazon's RDS, or EC2, and so forth? OK, so you're very familiar with the fact that there are instance types, and every maybe six to nine months, maybe even a year, a new instance type shows up. Sometimes they improve, and if you upgrade an instance, you can sometimes save money by removing two old instances. So it depends again on your load, but you need to know what you expect from that. And when do you actually spin up new resources, or size for new orders — because probably some of you still run data centers, and actually that's not an odd thing to do, right? Data centers are still very important for people. Your ultimate goal lies between not buying enough hardware and wasting your money on too much hardware. It's a fine balance, and that's what capacity planning is all about. Then you need to think about managing the resources and deploying the resources — but this is a crowd at Rootconf, and all of you probably know very well how to automate deployments. Then you need to iterate, because capacity planning is not a one-off thing. This is something you do constantly as your service scales, as you plan for more users, as the requirements grow, and so forth.

In the MySQL world, you tend to look at the operating system: things like vmstat, netstat, df for disk usage, ps, iostat, uptime.
Some of this, obviously, would vary if you use BSD, and you can also grab stuff from /proc on Linux — you can cat /proc/loadavg and then awk to print $1, or you can just use uptime to get that kind of data. But there's also MySQL itself, where you can do SHOW STATUS, SHOW TABLE STATUS, SHOW PROCESSLIST. You can query the information schema — if you have MySQL 5.5 or greater, you can query the information schema — and you can also query the performance schema in 5.5 and greater. If you're using MariaDB Server, you need to turn on the performance schema because it's off by default, but the performance schema is an extremely useful way for you to gain metrics. How many people here use the performance schema? Not enough, clearly. The slow query log is another great resource for figuring out what is slow and what could be running faster. mytop and innotop are not easily scriptable for pulling data out, but they're like top: they allow you to see information about your server at this very moment. Generally, people script their metrics gathering. You can use stored procedures inside of MySQL, but I don't recommend that. The other thing is, don't forget that metrics in the server reset when you restart the server, so keep that in mind as well. Also remember that gathering metrics isn't free. If you gather things like audit log data, or metrics in general, they actually cost you system performance. So you have a system impact, and you need to know and measure that impact.

There are specific watch points for a database. As I said, QPS is extremely important, so these are your CRUD operations: SELECTs, INSERTs, UPDATEs, DELETEs. Then, open connections. In MySQL, you may set an open connection limit of, say, 300. You always want to know what your open connections are and that you're not reaching the limit, because if you are, you need to increase said connections. And you can't just increase connections, because every connection actually consumes memory, right? Each connection creates a thread, and the memory consumption is anywhere from tens of kilobytes up to three megabytes. So if you're running on an instance with just one gigabyte of RAM, for example, you can't just increase your connections and expect not to hit swap. And you do need to run with swap in the Linux kernel nowadays, even if you don't want to use swap, because the reality is that if you set vm.swappiness to zero, your database server can get killed.

You also want to care about the lag time between masters and slaves. This is extremely important. This is actually what killed multiple services that you may have been familiar with in the past. Friendster — anybody here use Friendster before, before Facebook was famous? Nobody used Friendster? Orkut came after Friendster as well. Friendster was at least 12 years ago. Hi5, yes, they're dead too, but I don't know why they're dead. Friendster, at least, we know why they're dead: the lag time between masters and slaves was something like 16 hours. So you could friend someone now, then check back to see if the person accepted you, or you accepted the person and you go, hmm, I thought I just accepted that person — let me try that again. Let me try that again. Same if I leave a comment, because we're humans, right? We like instant gratification. I thought I left a comment, why is it gone?
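Those watch points — QPS, open connections versus the limit, replication lag — are exactly the kind of thing people script. Here is a minimal sketch of what that scripted gathering might look like, assuming the PyMySQL driver and hypothetical connection details; it is an illustration, not a finished collector:

```python
import time
import pymysql  # assumed driver; any MySQL connector works the same way

conn = pymysql.connect(host="db1.example.com", user="monitor", password="secret")

def global_status(cur):
    cur.execute("SHOW GLOBAL STATUS")
    return {name: value for name, value in cur.fetchall()}

with conn.cursor() as cur:
    before = global_status(cur)
    time.sleep(10)
    after = global_status(cur)

    # QPS over the sampling window, from the delta of the Questions counter.
    qps = (int(after["Questions"]) - int(before["Questions"])) / 10.0

    # Current connections versus the configured ceiling.
    cur.execute("SHOW GLOBAL VARIABLES LIKE 'max_connections'")
    max_conn = int(cur.fetchone()[1])
    open_conn = int(after["Threads_connected"])

    # Replication lag, if this server happens to be a slave (empty otherwise).
    cur.execute("SHOW SLAVE STATUS")
    row = cur.fetchone()
    lag = row and dict(zip([d[0] for d in cur.description], row)).get("Seconds_Behind_Master")

print(f"QPS: {qps:.1f}, connections: {open_conn}/{max_conn}, lag: {lag}")
```

Graph those three numbers over time and you already have a baseline. For Friendster, that last value was apparently somewhere around 57,600 seconds.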
With a 16-hour lag, it was easy for Facebook to come in and kill it. Also, things like cache hit rates. Database performance often depends on your schemas and queries as much as on the speed of your hardware, so you can actually change the outcome of your database performance by getting a good DBA in — a good DBA who works with a developer, or who can tell the developer, look, I want to optimize these computationally expensive queries away — and you can actually gain improvements. Also, if you use a load balancer and you want to find the ceiling of a particular machine, you want to weight that machine much, much heavier than anything else; that will give you the ceiling of said instance. And if you're caching, you may not only be caching in MySQL — nowadays you're most likely also caching in things like memcached or Redis, or even using a proxy layer in front.

The MySQL world is generally comprised of a master with multiple slaves behind it. So think of it as a single master with two or more slaves. When it comes to sharding, you can think of it as partitioning your data across multiple nodes. A simple example: if you are a social network that lets people upload photos, you want to keep this set of users in one shard, and that set of users in another shard. Then, occasionally, you may need to rebalance the shards, because she has decided to upload 20 gigabytes of photos, whereas the average person uploads 200 megabytes of photos. So then you rebalance the shards such that she now joins this other shard. Not picking on you, just an example. So sharding is not just partitioning the data, but repartitioning the data without the user ever noticing that they're being resharded. Because while you're moving a user from one shard to another, that user can't do writes — you'd basically have to either run FLUSH TABLES WITH READ LOCK for that user, or do something like START TRANSACTION WITH CONSISTENT SNAPSHOT.

But sharding alone isn't enough. In terms of capacity planning, you also need to think about splitting your reads and writes, and thankfully there are multiple tools available for this scenario. ProxySQL is an excellent tool that handles not only sharding, but also the ability to split reads and writes — we'll talk more about ProxySQL later. Vitess. Vitess doesn't sell itself as just a proxy; it is more than a proxy, and it is more than a sharding tool. Vitess is what powers YouTube today — when you go to youtube.com, you are being served via Vitess, actually. Vitess speaks the MySQL protocol. It opens up one thread per connection, but its connections are lightweight BSON-based ones, so every connection is only about 32 kilobytes in size. Vitess will help you with sharding. Vitess will help you with resharding. Vitess can also plug into monitoring tools for metrics, for capacity planning, so that you know when it's time to grow your Vitess shards as well. Setting it up requires a cluster manager like etcd or ZooKeeper, or you can hand your credit card over to Google Cloud Platform — go to vitess.io and try it out; it should be fairly cheap and easy to get started.
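To make the "N users per shard, occasionally move the heavy ones" idea concrete, here is a minimal sketch of a lookup-based shard directory. The names are hypothetical, and in practice the frameworks mentioned here do this bookkeeping (plus the actual data copy) for you:

```python
# A toy shard directory: users are mapped to shards via a lookup table rather
# than a fixed hash, precisely so a single heavy user can later be moved
# (resharded) without touching anyone else.

SHARDS = {
    "shard1": "mysql-shard1.example.com",  # hypothetical hosts
    "shard2": "mysql-shard2.example.com",
}

user_to_shard = {}  # user_id -> shard name

def shard_for(user_id):
    """Return the shard a user lives on, assigning one on first sight."""
    if user_id not in user_to_shard:
        # Real systems assign based on shard load / free space, not modulo.
        user_to_shard[user_id] = sorted(SHARDS)[user_id % len(SHARDS)]
    return user_to_shard[user_id]

def move_user(user_id, new_shard):
    """Reshard one heavy user. The copy itself has to block writes for that
    user (FLUSH TABLES WITH READ LOCK, or a consistent-snapshot copy), which
    is exactly the dance the frameworks automate."""
    old_shard = shard_for(user_id)
    # ... copy this user's rows from old_shard to new_shard here ...
    user_to_shard[user_id] = new_shard
    return old_shard, new_shard

print(shard_for(42))            # e.g. "shard1"
print(move_user(42, "shard2"))  # ("shard1", "shard2")
```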
Jetpants, by Tumblr, is another sharding framework that will let you split shards, handle resharding, handle shards that go missing because of downtime, and rebalance those shards. Another open source tool is Tungsten Replicator. Actually, all of these tools are open source and very friendly licensed — everything from GPLv3 down to Apache licensing. So there are many, many options for you when it comes to sharding. And sharding is a baseline when it comes to capacity planning: you will have to shard the moment you think you're going to grow beyond that one server. This is not for your simple WordPress weblog, but it is for pretty much everything else that you're going to run in production.

What are your bottlenecks? Are they reads or are they writes? Because the treatment for a bottleneck that is a read could be adding a slave, but the treatment for a bottleneck that is a write may be thinking about a load balancer that distributes writes to various places, and then having the application not read until the entire cluster or database tier has a similar source of truth. Are you having high CPU problems? That's not too common in the database world — I/O is a much bigger problem in the database world. Are you seeing lag on the replicas, but the queries don't show up in the slow query log? This could be a network issue. Are you seeing locking? You can do things like SHOW ENGINE INNODB STATUS and look for the locks. And if you are seeing a lot of locking, is that something you can optimize, because maybe you have hot rows and it locks a lot when you do concurrent inserts?

There are also context-based metrics: query performance and app-side performance, basically perceived performance. Perception is more or less reality, right? And perceived performance is what your application sees as the response time to queries. So there are more tools that do context-heavy tracking. You can, of course, automate a lot of this by shipping your logs to a central location — a lot of people nowadays ship their MySQL logs to something like Elastic and then look at them there, but you don't necessarily have to do that. You can use something called pt-query-digest, which will analyze queries from the logs, from the process list (SHOW PROCESSLIST), as well as from tcpdump traffic. It's fairly useful, but it's completely command-line based. If you want to analyze slow query logs to identify problematic queries and then figure out ways to solve them — or get your DBA to solve them — Box has this thing called Anemometer. Also fully open source, built on top of pt-query-digest, and relatively useful. But there are also commercial tools that provide context-based metrics, and I'm not averse to using commercial tools — I think many of you here use cloud services anyway — though I don't like recommending them; they're only recommended when there's no good open source alternative. I'm sure you've heard of many of these commercial tools: things like VividCortex, SolarWinds, Datadog, New Relic, ClusterControl, MySQL Enterprise Monitor, and even a tool built right here in Bangalore, Monyog by Webyog. A lot of these tools can give you context-based metrics, not just the regular metrics you were collecting previously.
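Going back to the "are your bottlenecks reads or writes" question: the status counters you're already collecting can answer it. A minimal sketch, again assuming PyMySQL and a hypothetical host, using the Com_* counters:

```python
import pymysql  # assumed driver; hypothetical connection details below

conn = pymysql.connect(host="db1.example.com", user="monitor", password="secret")

with conn.cursor() as cur:
    cur.execute("SHOW GLOBAL STATUS")
    status = {name: value for name, value in cur.fetchall()}

# These counters are cumulative since the last restart (or FLUSH STATUS), so
# in practice you would diff two samples, as in the earlier sketch.
reads = int(status.get("Com_select", 0))
writes = sum(int(status.get(k, 0))
             for k in ("Com_insert", "Com_update", "Com_delete", "Com_replace"))

print(f"reads: {reads}, writes: {writes}, read:write ratio ~ {reads / max(writes, 1):.1f}:1")

# Rule of thumb from the talk: a read-heavy bottleneck is treated by adding
# slaves; a write-heavy one pushes you towards sharding and smarter write
# distribution instead.
```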
So we noticed at Percona that there weren't many good open source monitoring and management tools specifically focused on MySQL and MongoDB. So we decided to take the best of open source: Prometheus, Consul, Grafana, and Orchestrator. Orchestrator is GitHub's orchestrator, which lets you look at your replication topologies, and that also helps with capacity planning visually, as opposed to you thinking about it mathematically. Here you can visualize things, look at query analytics, and get metrics. It's based on a client-server model: if you want to install the PMM server, you just get a Docker container and load it up fairly easily, then you install an agent on a client, and the agent ships data to said server. Capacity planning from this standpoint is there to prevent resource exhaustion, and the goal is obviously to provide actionable advice that is as automated as possible, with a focus on ease of use for non-experts. Now, some people will balk at the fact that there is an agent for the client — how could you install a lightweight agent on a MySQL server that then ships data to a PMM server somewhere else? We've gone through this agent versus agentless debate for a long time, but the only way to get the best low-level statistics from a system is via some kind of agent, and this is more or less true for pretty much any metrics monitoring system nowadays: they will all tell you to install some kind of agent, especially because many of these tools are cloud-based, so they need an agent for you to ship data to them. Whereas PMM you can run on-prem.

This is what PMM sort of looks like. It's easy for you to look at problematic queries, and easy for you to capacity plan. If you go to the demo and just type /orchestrator, you will actually be able to see how you can capacity plan visually, and then decide to add more nodes into a replication topology based on the load on your slaves and your masters.

There is also something else that will help you understand your workload better, because you can't do capacity planning in real time on your production machines. Wouldn't it be nice if you could take all your production queries and play them back somewhere else, in a test environment? That is what's ideal. Many people will tell you you can do capacity planning by running sysbench and other synthetic benchmarks, but a synthetic benchmark only shows you what a benchmark looks like on that system; it doesn't show you what your workload looks like in production. So there is the Percona-Lab query playback tool, which has been around for quite some time now — it's got contributors from people like Dropbox and so forth — and the idea is that it's a tool that will replay the load of one database server onto another. This is great for debugging, and this is great for capacity planning. Though there is one caveat I should mention: this is just an open source tool, provided with no support or real documentation. It is used a lot in production from a capacity planning standpoint, but it is open as a courtesy to users. And a lot of people, by the way, do use this tool for capacity planning the database. You do need to, of course, log the slow queries in a suitably verbose way, and once you've logged said slow queries, you can basically replay the load somewhere else.
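For completeness, this is roughly what "logging the slow queries in a verbose way" means — a minimal sketch that makes the slow log capture effectively every query for the window you want to replay later. Connection details are hypothetical, and on Percona Server there are additional log_slow_* verbosity options beyond these standard variables:

```python
import pymysql  # assumed driver

conn = pymysql.connect(host="db1.example.com", user="admin", password="secret")

with conn.cursor() as cur:
    # Send the slow log to a file and drop the time threshold to zero so that
    # (almost) every query gets logged. This adds I/O overhead, so you only
    # leave it on for the capture window you intend to replay.
    cur.execute("SET GLOBAL slow_query_log = 1")
    cur.execute("SET GLOBAL long_query_time = 0")
    cur.execute("SET GLOBAL log_output = 'FILE'")

# The resulting slow log file is what you feed to the playback tool
# (or to pt-query-digest) against the test machine.
```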
Replaying load like this is great if, say, you installed MySQL on an EC2 instance and you are planning to upgrade to a new EC2 instance type, and you want to see how much more load that instance can take.

Load balancing is a fairly important topic. Do you pick a random database server when it comes to load balancing? Do you configure your load balancer, or do you just use what the load balancer offers as a default? Because if there are five slaves and 100 queries, do you think all five slaves get 20 queries each? Or do some slaves get 30 queries and some slaves get 12? You never want underutilized machinery running on your behalf, because underutilized machinery costs money. Now, in the data center world you may not see that, because you've already taken the cost, you put something in the rack, and then you've forgotten about it for the next three years while it amortizes and depreciates. But when you use something like Amazon and you're paying by the hour, it starts becoming real, because you can actually save money and you can see it on your credit card. So load balancing strategies definitely matter, because you can end up in situations where one machine gets too much work to do — sometimes more load than it can handle, actually — while another machine sits mostly idle. You never, ever want to pay for machines that aren't helping you solve your problem. There is a strategy that something like ProxySQL makes use of, which is to pick two random servers at any given point in time, see which one has less load — ProxySQL understands the SQL protocol — and then send the request to the one with the least load. That's one way to solve said problem.

I promised to talk to you about ProxySQL. It does connection pooling and multiplexing. It can do the read/write splits and the shards. It does load balancing. If it notices that a MySQL server has failed, it can reroute all the queries it was planning to send there to another MySQL server. It can rewrite queries and do other things as well. And it has monitoring built in, and the monitoring agents feed directly into PMM, for example, or into Grafana. ProxySQL is probably the best choice in the MySQL world today; it's been stable since December 2015. I would say you can improve your database operations, understand and solve your performance issues, and, based on the statistics it exports to something like PMM, you can also do capacity planning for your proxy visually using PMM. ProxySQL also has this neat comparison table — I just want to focus on the load balancing part — where it tells you that it does application-layer balancing as well as weighted balancing, which is what we talked about earlier.

Capacity planning is also about storage. A common problem in the MySQL world is running out of disk space, which is why tools like MHA (Master High Availability) for MySQL will actually check for this nowadays. It has an option in its config where the ping type can be set to insert: instead of just connecting to the master, it will also check whether the master can actually do an INSERT, because if the master can't accept a write — if you've run out of disk space — that's a problem. Now, on a small-budget deployment, having something like four times the working capacity is not a bad option. Why? Because you'll have to think about the size of your database, including the data files you keep in /var/lib/mysql.
You may also have to think about the binlogs, which may not be stored in /var/lib/mysql. You'll have to think about the size of your largest table times two — the times-two part is for temporary tables or even sort files. You may want to make logical backups if you only have a single server: you may want a mysqldump as well as an XtraBackup, and you may want to keep two of them. You need to keep some space free for the operating system. And if you use LVM snapshots, like the mylvmbackup tool, or even ZFS snapshots, you definitely need to think about leaving an extra maybe 5 to 20% of space for them. So basically, you do need maybe 4x the capacity for an OLTP workload.

Now, for large-scale deployments — more than just one machine — this doesn't make sense, so you use something like Prophet. Has anybody heard of Prophet, from Facebook? No? OK. Prophet is more than just a database capacity planning tool, because you can predict how much HTTP traffic you'll get and scale accordingly when you need to. You can see whether a particular feature of an application will be a success, or whether its usage will decline. You can approximate the date when database resources are going to be exhausted. You can forecast new customer signups and resize or augment your staff accordingly. In the US, Black Friday or Cyber Monday is a very common thing; in China they have Singles' Day; there might be an equivalent way to spend money here. And you may want to know what kind of resources you need to handle those days — those are the days you need to spin up instances to handle the extra load, and then you spin them down after the days are over. Prophet makes use of a generalized additive model — the trend is either linear or logistic regression, with additive seasonal components applied on top of the regression — and it's a fairly interesting paper for you to read. The tool is very easily installable: like most things that come out of Facebook, it's a Python-based tool, so you can easily install it via pip. You definitely want at least one year of metrics data to fit the model, which is why I said start collecting data now — even if you don't know what to do with it, you'll figure it out later. Why a year? Because you want to capture seasonal effects — some seasons are just more popular; people shop more before, say, Christmas than they do at New Year's — and it also has support for holiday effects. So basically, if you want to predict the future for a particular metric, you can definitely make use of Prophet to make the forecast. Then you can plan based on the information you get from the model, and it's extremely useful for different types of problems, not just your database problems. If you really want to know how loaded your database will be in the future, you just have to ask Prophet — if you have metrics.

There are auto-scaling frameworks as well. Scalr is open source, with an enterprise model tacked on, to help you auto-scale your database loads as well as your front end. Then you've also got Amazon, where you can vertically grow an instance or horizontally add more replicas when it's RDS. When it comes to EC2, you can auto-scale and have groups. Typically, advanced MySQL or Postgres users don't rely on RDS; they rely on what is available inside of EC2 — they install on EC2 and treat EC2 like bare metal, so to speak. But there are also other auto-scaling frameworks that are not open at all, like RDS, Aurora, Cloud Spanner, and, very recently, Azure Cosmos DB.
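Going back to Prophet for a second, this is roughly what "just ask Prophet" looks like in practice — a minimal sketch assuming you've exported, say, a year of daily peak QPS into a CSV; note the package has been published both as fbprophet (older) and prophet (newer):

```python
import pandas as pd
from prophet import Prophet  # older installs: from fbprophet import Prophet

# Prophet expects two columns: ds (the date) and y (the metric you care about),
# e.g. daily peak QPS exported from your metrics system.
df = pd.read_csv("daily_peak_qps.csv")  # hypothetical file with ds,y columns

m = Prophet()  # weekly and yearly seasonality are handled automatically
m.fit(df)

# Forecast the next 90 days, with uncertainty intervals.
future = m.make_future_dataframe(periods=90)
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())

# If yhat_upper crosses the QPS ceiling you measured for an instance type,
# that is roughly the date by which you need to add capacity.
```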
So, if done well, capacity planning is like a very well-stocked bar, because you will never drink the bar dry. You will never, ever see "Error establishing a database connection" if you have capacity planned properly. More or less in conclusion: you really want to capture signal, because when it comes to alerting, noisy alerts fail you due to human psychology — it's like the boy who cried wolf. You want to think about revenue management, operations research, as well as management science, and there are lots of good papers on these that are worth reading. Always be capturing metrics — salespeople say "always be closing"; you should be thinking "always be capturing metrics". You always want to know your own baseline. Obviously: shard, load balance, monitor, be proactive and definitely not reactive — and there's something you can learn from the army, which is "prior proper planning prevents piss poor performance", the seven Ps of the army. There is a fairly dated resource by John Allspaw, The Art of Capacity Planning: Scaling Web Resources, which was released in 2008. The tools may be a bit dated, but the concepts and the mathematics behind it still live on today. It leans quite heavily on the work he did while he was at Flickr; he's done great work after that, at Etsy and so forth. A quick ad: Percona does a whole bunch of things — we make all these nice open source tools that you can use. And to end: I am the Chief Evangelist in the CTO office at Percona. I was on the founding team of MariaDB Server, and I've been hacking on MySQL since about 2004; before that I was a user of MySQL since about 2000. I've also hacked on Fedora as well as OpenOffice, and I won an award for doing stuff in the MySQL world. I lead an itinerant lifestyle, talking about computing, databases primarily. You can get some of my other tutorials about MariaDB Server and the high availability landscape of MySQL, as well as some other popular talks on MySQL proxies, securing MySQL, tuning Linux for your database, and the server ecosystem. And this is a nice little graphic that Cathay Pacific made for me, showing my flights from the whole of last year. So I guess now I'm open to questions and answers — thank you for listening. The slides will be available, so you don't have to take photos; I think I said that earlier as well.

Hi, here. Yes. Yeah. So we run our load on RDS. Can you help us find ways to monitor it — in the sense, can Percona hook into RDS? Can PMM plug into RDS? The answer is actually yes, there are ways to make PMM work with RDS without having to load an agent onto it. You can use it alongside CloudWatch as well as PMM, but you'll have to install PMM on an EC2 instance. So yes. Thank you.

Any other questions? Yes. So, Percona as such, are you going ahead with developing tooling around Presto or Hive or something, dealing with big data and stuff? Yeah, so Presto, which I talked about earlier — when you say is Percona going ahead with it, I don't know exactly what you mean by that, but do we recommend it for some of our customers? Absolutely. Do some of our customers — and we have nearly 4,000 customers — use it? Sure. But do we want to do anything additional to Presto? No, because the software is actually pretty good by itself. OK, but that deals with flat data structures, like databases; we don't deal with hierarchical databases, which is the current demand on many sites, in many current commercial uses.
So what exactly is the question about Presto? So right now, whenever we deal with data in Presto or Hive, what we deal with are flat data structures, right? But going ahead, as the demand is, we want to deal with hierarchical structures. So there is quite a lot of demand for that in Presto as well — are you looking ahead to it? You know, if people file requests with Presto, and if Facebook or the other developers feel hierarchical querying inside of Presto is a good thing, then they'll probably add it. It's an open source tool. But is Percona personally looking at it and planning to add it? No — we've not had a customer requirement for that either. So far, our use of Presto is recommending it to people and deploying it so that they can query across multiple data stores, not hacking on Presto and adding to it. We try to limit not-invented-here engineering at Percona. Everything we do, we like to upstream, and when upstream takes it, we get rid of our own patches and then make new features. We don't like to make forks; we like to make branches. Thank you.

Someone here, yes. Hello, OK. Hi, I'm Arring, founder of a native ad network, where we've started generating 150 million. So we are at the stage where we're breaking monolithic services into microservices, right? Here you have suggested putting a load balancer in front of the DB layer itself, right? But the notion that microservice architects evangelize is to dedicate a DB to each of the microservices, right? So I would like to know your views: is it enough to put the microservices behind the load balancer and have the microservices take care of their individual DBs, or should it be on the DB layer also?

Yeah, so the question was about microservices and databases. Beyond people like, I think, maybe Pivotal, who push microservices a lot, I still think they run most of their data tier — their MySQL data tier anyway — in a non-microservices fashion. If you go to pivotal.io, yes, you have lots of microservices, but at the end of it all, you still have a highly available MySQL database that is sold to you, which is actually a Galera cluster underneath, if you look at the documentation carefully. And that's similar to what you get in Cloud Foundry. Now, the question is, should you have one database instance for each microservice? That is a possibility, but you also need to think about that microservice having its own set of replicas. Do you then want to have your own database load balancer? And actually, the answer to that is that each microservice may have a load balancer for the database, because if you look very carefully at Cloud Foundry — which Pivotal contributes a lot to — they have contributed a load balancer that works on top of their Galera cluster, which will tie in very well with the microservice architecture you'll write your application in. So yes, you can have a load balancer for a database cluster, but you can also think about having that for a leader-follower setup.
The current Cloud Foundry method is not leader-follower, not master-slave; it's more "let's have a three-node Galera cluster that is just highly available". But you do realize, of course, that synchronous replication is not the same as asynchronous or semi-synchronous, so if you have hot rows, that could be a problem. So it's not just an easy fit, and I think even Pivotal realizes that, because their customers eventually say, hey, why are we having these problems with rollbacks on commit and so on. But to answer your question in a short way: I guess you can have a load balancer per microservice with the database at the end, or you can choose to have a database tier with the microservices on top of it, and I think it's still too early to see which one pans out as the best choice. If we're going to follow someone, I would say we should definitely look at what Pivotal does — they seem to be quite the leaders in this space. Do you disagree that they are? No, no — so which approach would you go with first? Yeah, so like I said, at the moment I'll watch what they do in their cloud services, because they have to support that for paying customers, right? And what they're doing so far is they've got a load balancer in front of Galera, and then they've got the microservice applications in front of that — but it's not one database per application at the moment. That doesn't mean that can't be the future, depending on how things pan out, because the database layer itself is not very well containerized, so to speak. People talk about databases in containers, and you do realize that the one sore point of containers is that you really don't run databases well inside said containers. The alternative is to actually build containers into the database, like some commercial databases have done for resource separation, but that would involve a lot of change to the database code itself, and I don't think many people have the will and the willpower to push that forward. So at this very moment I think it's still mixed, it's up in the air as to what may be the good choice, but I'll go with what Pivotal is charging customers for, and at the moment that is: you build your microservices, you have the load balancer, and you have your database tier, which is Galera — with the caveats of what Galera can give you. Thank you. OK. Here. Here? Yeah.

So I have a question: how do joins work with sharding? I have an opinion that joins won't work, because during sharding the data can be distributed across nodes. Yeah, the question is how do joins work across shards, and guess what — you don't do cross-shard joins. The frameworks separate users, typically, so that you don't need to do a cross-shard join. When you think about sharding, you can think about a table and say, oh, maybe I'd like to shard the table, and that would involve a cross-shard join, which is a no-no. But the most popular way to shard is to put N number of users on a shard, and that's what the frameworks do.
So it's more horizontal sharding as opposed to vertical sharding, and that seems to be the way to shard: you never have to do a cross-shard join. The only MySQL variant that does cross-shard joins is MySQL NDB Cluster, and that has adaptive query localization functionality that tries to push the join down to the engine layer, but it is still comparatively slow next to just doing the join on one database. So when I talk about sharding — which is why I tried to say we assign this amount of users to one shard and that amount of users to another shard, and if we have one very expensive user, we reshard that user out — we don't split the table data, we split the users themselves. Does that clear things up? OK, so it's the framework that is going to do the sharding part? Absolutely — you can either write your own sharding framework, or you can use one of the frameworks I recommended earlier; there are plenty of sharding frameworks that are fully open. OK. But generally, you want to split the people themselves across shards; you don't want to split parts of one person's data across shards, because then you do need to do cross-shard joins, and that's just notoriously complex and slow.

Hi, this is Yusuf from Wingify. You've talked about the performance schema — how much impact does it have on server performance? Yeah, so how much impact does the performance schema have on server performance? In MySQL 5.5 and 5.6, there was easily a 3 to 7% performance impact just from having it turned on, even if you didn't use it. But in MySQL 5.7 they've really worked on fixing the performance schema, so thankfully there is now almost negligible overhead — it's less than 1% to have it running. And a lot of people use the performance schema now from a metrics monitoring standpoint: MySQL Enterprise Monitor more or less depends on the performance schema if you use newer releases, and Yahoo made an open source tool that ships performance schema data to a Java-based monitoring tool so that you can go back in time and look at it as well. So more and more tools are starting to depend on the performance schema, and I wouldn't worry about the capacity. Even if the performance schema gave you 5% less capacity, so to speak, you should not be running your servers at 95% capacity anyway. A well-run server is probably running at around 80 to 85%, because you need to allow for spikes in usage. So the average should be around 80%, but never 95%. You should always have enough headroom for the performance schema nowadays.

OK. Last question. Someone whom I don't see? No? OK, we're done. Thank you.