Is there a disco ball or something? It's in the bottom. There's a switch in the bottom, so it's pulled. Good morning everybody, sorry. Welcome to SCaLE 16x. If you're like me and wear many hats, or you've inherited the daunting role of being a DBA, you've come to the right place. Silvia is going to help us navigate that daunting task. Without further ado, Silvia, welcome. Thank you everybody. Can you guys hear me okay? Yep. So I'm here to talk to you about how you can become a MySQL guru, or what is also known as growing pains with databases and the lessons you learn along the way. Let me tell you a bit about myself and why I'm talking to you about this. My name is Silvia Botros. You may know me on Twitter as dbsmasher. I am currently a principal DBA at SendGrid, and I have been with them for a little over six years now. Before I joined SendGrid, I accidentally became a DBA. I started off as a software engineer at a now completely forgotten CDN. I was in charge of some UI code, Django and Python. One day, about nine months in, there were performance problems. Within a couple of days, I was knee deep in MySQL documentation because, as you might guess, it was the database. The rest, as they say, is history. A little bit about SendGrid. If you haven't heard of us, we are an email infrastructure provider. What that means is that we are a middleman for companies who are trying to get their customer communication into the inbox without having to build all the infrastructure entailed in dealing with that much email. We have more than 63,000 paying customers in over 100 countries. We process more than 40 billion emails per month. These are just a few of our customers, who are by now household names. Another stat that I found out recently is that we have apparently sent email to more than half of the unique email addresses in the world within the last year alone. In the process of getting to that very large scale, we learned a lesson or two. Who am I here to talk to? You're an engineer at a rapidly growing company. You are suddenly in charge of databases because nobody else is. Your ops team is a single-digit number of people. They are busy building out infrastructure, whether in Amazon or self-hosted, and it's growing very quickly. You don't have a DBA on staff yet. However, every time there's a conversation about an outage or about why a certain feature hasn't shipped yet, the database comes up. Yeah, we've all been there. What are we going to talk about today? First, I'm going to give you some basics: some tips about schema design, some metrics that you should probably always track, and some things that you need to monitor. Then we're going to branch out a little bit more into architectural best practices that will help you scale your database. Then we're going to talk about the point where you do need to hire a DBRE, and what you should look for in that person so they help your company and your team grow. Let's start off with some schema design. In MySQL, primary keys are your friend. The most used storage engine at this point in MySQL is InnoDB, and in InnoDB primary keys are also known as clustered indexes. That means that every leaf of that index tree also carries a full copy of the row that the leaf is pointing to. In non-DBA speak, that means that if your query filters directly on the primary key value, you can get the entire row with a single lookup, which is the fastest query performance you can achieve, unless your database is /dev/null, but then that's not a relational database.
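As a quick illustration of that single-lookup point, here is a minimal sketch with a hypothetical users table, just to show the access types MySQL reports; on a primary key equality match, EXPLAIN will typically report type const, which is about as cheap as a query gets.

```sql
-- Hypothetical table, purely to illustrate the access types EXPLAIN reports.
EXPLAIN SELECT * FROM users WHERE id = 42;
-- type: const  (single primary-key lookup; the whole row comes back with it)

EXPLAIN SELECT * FROM users WHERE email = 'someone@example.com';
-- type: ref (or ALL with no index) -- secondary index lookup first, then the row
```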
So it's not always easy to guarantee that you're going to query on the primary key. One way you can do that is to just avoid auto-incrementing keys entirely. It would be great if you could do it. You could pick a natural key that exists in your table, a combination of columns that are unique together. MySQL allows you to do that. Most modern ORMs will also allow you to do that, but that's not always an option. Sometimes that value is too large and will by itself cause performance problems. If you do end up having to use an auto-incrementing key for your primary key, you should always remember to make it unsigned. One of the less well-known tricks and traps in MySQL is that integer types are presumed signed unless you specify otherwise. What that means is that if you define an auto-incrementing key and forget to mark it unsigned explicitly in your definition, you're coming out of the gate with a primary key that can only use half of its key space. Do you know what happens when your primary key runs out of key space? You stop taking in rows. You try to insert into the table, and it gives you this very vague error that you've reached the maximum value for column X, and you will have a real outage on your hands, because you will now have to stop all writes to do an ALTER TABLE to change the type of the primary key, and things will be down for quite a while. I'm assuming that once you've reached that maximum value, you have a large table on your hands. Ask me how I know about that one. Some more schema design tips. Always set a default. MySQL tries to help you out by not being too restrictive about how you define things, but on the other hand, that flexibility can be a problem down the line. Null is the default for most things, but a lot of the time null doesn't make any sense. If you're defining a boolean column, what does null even mean? You should always have a default that makes sense within the context of your data. Null should be the exception, not the norm. Timestamps are a good friend, especially now that current versions of MySQL no longer have the restrictions that didn't make much sense — it used to be that only one timestamp column per table could have CURRENT_TIMESTAMP as its default. That's no longer the case, so it will help you a lot down the line if you actually put timestamps on all of your tables that tell you when a row was added and when it was last modified. I'll show you later how those timestamps help with pruning your critical databases down to a reasonable size. Nulls also have implications for how the query optimizer decides on a query plan; they can get in the way of using certain indexes. Sometimes just an empty string will be a more useful default. Context, of course, is everything here. What is the actual context of the data in there? What is the application doing? That will help define it, and I'm not saying never use null, as a rule. The biggest rule in database design is "it depends," but it's important to know why you're using a null and not just accept null as a default. Be deliberate about how you're using it, is what I'm saying.
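Pulling those schema tips together, here is a minimal sketch of what that looks like in a table definition. The table and column names are hypothetical, but the pattern is the unsigned auto-increment key, sensible non-null defaults, and the created/modified timestamps:

```sql
CREATE TABLE users (
  -- unsigned: use the whole key space, not half of it
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  email      VARCHAR(255)    NOT NULL,
  -- a boolean with a default that means something, not NULL
  is_active  TINYINT(1)      NOT NULL DEFAULT 1,
  -- row lifecycle markers; these pay off later for pruning and partitioning
  created_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
                                      ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  UNIQUE KEY uniq_email (email)
) ENGINE=InnoDB;
```

If a natural key like the email is small enough, you could make it the primary key instead — that's the judgment call from a minute ago.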
Another thing to keep an eye on: don't sweat the indexes. It is impossible to come out of the gate on day one with all the indexes you will ever need or not need. It's important to understand that there will be changes down the line as your user base expands and as your use cases change. There will be maybe one or two obvious cases — say it's a user table and the user name is an obvious one you want to index on — but done is better than perfect. Accept that you will have to go back later, look at your logs or your slow-performing queries, and decide then how to change things. This is a fun one. There are a lot of synonyms in MySQL, and by that I mean MySQL doesn't explicitly define certain types. A good example of that is boolean. There is no explicit boolean type in MySQL. When you define a boolean in a table, under the hood it is actually a tinyint, and a tinyint, if it's unsigned, can store up to 255. That's a lot of possible values for a boolean. Now, it may seem like a really cool, flexible thing to do, to just start using it with more values than the two, but if your column name is "active" and it sounds like a boolean, you are setting everybody up later for some serious confusion about how you use that column. So a piece of advice: even if the synonym allows you to use the column for more than what it sounds like, that's not a good thing to do. Because down the line, your DBA is going to try to look at this table and figure out how to optimize some query, so they do a SELECT DISTINCT on the active column, and 12 different values come out. Ask me how I know about that one. Let's move on to metrics. There are a lot of metrics that you can track in MySQL, a lot of things that it actually exposes to you. I'm not going to be comprehensive about it, because we could spend all day on that, but I'm going to cover a few that are super important for everybody to track no matter what your context is. Two related ones that I'm going to talk about are threads running and threads connected. These are very close, but you will track them for completely different reasons. The first one is actually a subset of the second. Threads running shows you the work in progress in your database. It's showing you how many threads, how many connections, are currently doing work. You always want to keep that value low, and if it starts going up very often, it's a sign that things are overall slowing down, because it means that queries are not starting and finishing nice and fast. That cat picture is really threads running. That's a mutex they're fighting over. Reasons that you could have an elevated number of threads running: in a write context, it could be a mutex, it could be a lock that there's contention on. One of the good examples of that is foreign keys. They are notorious for causing write contention, because there will always be this check between the tables to make sure the relation exists, and that gets in the way of the writes and can cause a lot of writes to stall waiting for each other. Auto-increments have the same problem sometimes. If you have enough inserts trying to hit the same table, there's a single mutex for the auto-increment value, and they can all start waiting on that one. In a read context, you could have elevated threads running because of disk I/O. If you have a bunch of reads that are trying to scan a number of secondary indexes, the number of row scans can multiply into the millions sometimes. And in that case, if enough of those queries are happening at the same time, they're all doing that scanning independent of each other, and that way you can completely saturate your disk array with read volume. And the next thing you know, all of those reads are just stalling.
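If you want to see what this looks like on a live server, a minimal sketch is just the status counters plus a peek at what the running threads are actually doing; in practice your monitoring agent would sample these every few seconds and graph them.

```sql
-- Point-in-time counters; graph these from your monitoring agent.
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- When Threads_running spikes, look at what those threads are stuck on.
SELECT id, user, db, time, state, info
FROM information_schema.processlist
WHERE command != 'Sleep'
ORDER BY time DESC;
```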
Moving on to threads connected. Threads connected is similar, but they're not exactly the same. It is actually a count of all of the open connections, whether they're working on a query or sleeping. And that one is actually more sinister. The reason it's going to hit you at the worst possible time is that MySQL has another config called max_connections, which I'm going to talk about later, that specifies how many connections can be open to the server at any given time in total. If your threads connected is climbing continuously and starts getting too close to that, you are ramping up for an outage when your traffic spikes. So it's important to always keep an eye on threads connected and make sure it doesn't get too close to your max_connections value. Another good one is disk IOPS. Now, a few years ago we used to say, oh well, SSDs are getting cheaper, we don't have to worry about IOPS anymore, we can just stick flash storage under anything that needs a lot of disk reading. So we all have flash disks, and databases will use them if they find them. But no — even if you get the fancy Fusion-io stuff, you can saturate the IOPS of a disk array with enough read contention, or with a schema that is not super well designed, or by using large text blobs. So always keep an eye on disk IOPS, because you'd be surprised how fast you can saturate that under high load. I'll show you a really cool graph that shows how you do that. This is one of the databases that I used to manage at SendGrid. I have redacted the hostname to protect the innocent. This box is not a VM or a small commodity server. This was a box that had 48 SSD drives, three RAID cards managing the entire stripe, and I don't even remember how many cores, but it had a schema that used JSON blobs heavily and that would scan on the order of 10 to 20 million rows per query, and so that happens. And this was a common occurrence, by the way. The moral of that story is that nothing is unlimited. Your resources are always limited, no matter how cool the technology behind the hardware is. More metrics: disk space growth, and the thresholds you alert on for disk space. MySQL is notorious for one thing: when it takes up disk space, it doesn't give it back. But at the same time, if you want to alter a table, you are required to have free disk space in the OS equal to that table's size. So say you're trying to alter a table that is 500 gigabytes — you'd better have at least 500 gigabytes free on the host to be able to make that change. And it doesn't matter what tool you're using to make the change, whether it's a blocking ALTER TABLE or an online schema change tool like the ones I'm going to mention later. So tracking your disk space growth, how fast you are eating up disk space on that database, is super important. You don't want to find yourself in a spot where there's an emergency, or there's a feature that needs to ship tomorrow, and you're trying to do an alter and — oops — we don't have enough disk space to actually make this change.
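A quick way to know how much headroom an alter will need is to ask information_schema for table sizes; a rough sketch (the numbers are estimates, but close enough for capacity planning):

```sql
-- Approximate on-disk size per table: data plus indexes, in gigabytes.
-- Altering a table needs at least roughly this much free space on the host.
SELECT table_schema,
       table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY (data_length + index_length) DESC
LIMIT 20;
```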
The rate at which disk grows should quickly point you towards a decision: whether you need to vertically expand the host that you're running your databases on, or — more preferably — find ways to prune down your data. This is also one of those things that will show you a sign that your business might be ready for a data warehouse. Most companies, when they first start, don't think about data retention. They don't think about when they need to age out some of their tables. And so the next thing they know, within a year or so, they find themselves treating the database as both a business-critical store and a data warehouse, which is a really bad place to be. This is a very important signal. Things like disk space growth and how fast it goes, and data retention and how to rotate data out, which I'll talk about next, are super important in making that distinction and helping your business separate those two concerns so that they don't clash in their needs. And with that, we get to data rotation. Remember when I mentioned timestamps? Those become super important down the line. If your tables already have those markers, where you can easily tell how fast you're adding rows and how often you're changing rows, you can quickly make decisions about which parts of that large table on your hands need to be aged out. You can start making decisions like: data that's older than a year needs to go, and I can start adding database partitions on these tables so that I can easily drop the older parts without affecting the entire table. MySQL has some of this built in, and some of it you have to manage on your own outside of it. It gives you the ability to do partitioning: you tell it you're going to have 52 partitions, one per week, and they go from 52 weeks ago until now, and the next thing you know your table is broken into 52 files. However, that is as far as MySQL will help you. The commands to manage those partitions are built in, but you have to figure out a way to automate how to use them — how to add partitions in the future and how to age out the old ones. This is where a tool that most DBAs are familiar with comes into play. It's called pdb-parted. There will be links to these tools in the slides that I'm going to share online later, so you'll be able to find them. pdb-parted is a Perl script that wraps those partition commands in MySQL. It puts human-readable arguments on them, and it can help you both add partitions into the future and age out and drop the older partitions, based on the arguments that you give it. But still, at the end of the day, it's a script, so how are you going to run it? What you do is wrap that script in one of your monitoring commands. In my case — and I'm going to link to a blog post that I wrote about this — we use Sensu at SendGrid, and what I do is make the partition rotation explicitly a Sensu check. What that does is that the command runs as part of Sensu, and so the monitoring framework that we have takes care of watching the exit codes and making sure it ran successfully. If it doesn't, Sensu picks up the exit code and notifies us with whatever handler we decided is appropriate for that particular case. That way, one, you've automated your data rotation, and two, it is monitored — because cron is terrible, I think. Like I said, these are links to those tools and to that blog post, which explains this particular approach a little bit better.
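To make the partition mechanics concrete, here's a rough sketch of the kind of DDL that pdb-parted automates for you — a hypothetical events table partitioned by week on its created_at timestamp, plus the add and drop operations that rotate it:

```sql
CREATE TABLE events (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  payload    VARCHAR(1024)   NOT NULL,
  created_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id, created_at)   -- the partitioning column has to be part of the key
)
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
  PARTITION p2018w01 VALUES LESS THAN (UNIX_TIMESTAMP('2018-01-08')),
  PARTITION p2018w02 VALUES LESS THAN (UNIX_TIMESTAMP('2018-01-15')),
  PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- Aging out old data is a cheap metadata operation, not a huge DELETE:
ALTER TABLE events DROP PARTITION p2018w01;

-- Adding the next week ahead of time (this is the part you want automated):
ALTER TABLE events REORGANIZE PARTITION p_future INTO (
  PARTITION p2018w03 VALUES LESS THAN (UNIX_TIMESTAMP('2018-01-22')),
  PARTITION p_future VALUES LESS THAN (MAXVALUE)
);
```

pdb-parted's job is essentially to keep generating those add and drop statements on a schedule so nobody has to hand-write partition names every week.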
Moving on from metrics to architecture. These are some bigger-picture tips on how to get your database layer to scale further. Be careful with ORMs. Now, I know that ORMs are a very easy way to get off the ground when a company is starting, and that's totally fine, but you cannot go in assuming they will work forever. You need to be aware that there is going to be a point where your scale is going to mandate using something else, because ORMs make assumptions about how your databases look and what kind of traffic you have, and those assumptions will at some point be a problem for you. Once you're hitting on the order of thousands of queries per second, it's going to become difficult to scale with ORMs, and that's when you're going to need something like ProxySQL. ProxySQL has been around for a couple of years now. It is written by a former DBA, and it is not an ORM. You don't have to define in code what your table structure looks like; you just give it the backends. It has knobs that are geared toward scale, and its feature set is geared towards large throughput far more than the simplicity of "I just want to talk to the database like it's an object." Things ProxySQL gives you, like caching. Now, I know that MySQL technically has a query cache, but it doesn't work. In fact, if you have it on, it will actually get in your way. There are mutexes around the query cache that cause reads and writes to slow down, because all of those queries are trying to populate the cache, but the way it's engineered, it just never gets the cache hit. In fact, MySQL 8.0 is going GA sometime this year, and Oracle has already announced that they're completely removing the query cache in 8.0, and their recommendation is to use ProxySQL for caching result sets. You can control how the caching works in ProxySQL using comments on those queries as part of your connection, and so you can determine how they're cached — how granular you want the caching, or how loose your cache hit can be. It offers configuration changes with zero downtime, which means that if you decide one of your read replicas needs to go into maintenance for a bit, you can easily have ProxySQL drop all the connections to that replica without affecting the rest of them. Raise your hand if you've had to do a full business maintenance because you had to go restart HAProxy, because HAProxy doesn't drop existing connections. Ask me how I know about that one. ProxySQL can also do inline load shedding, and what that means is that it has its own monitoring logic towards all of the backends it's talking to, so it can tell, based on statistics that are internal to MySQL — the performance schema and the MySQL status variables — which of its backend databases is under load and which is less loaded. It automatically adjusts, and it can sometimes even shun a database backend that is super loaded until it cools down and then start sending it queries again, so it has a much faster feedback loop. A really cool feature in ProxySQL is mirroring, and that means that if you set the right query rules on it, it can mirror the same connection, with the exact same queries, to a completely different backend at the same time. A really cool use case for this is when you are trying to upgrade MySQL to a new major or minor version: you have a couple of replicas that are upgraded, but you are not ready just yet to send your customers to those upgraded ones because you still haven't tuned the configs. ProxySQL can, with very little work on your part, mirror that traffic over to the upgraded replicas, and you basically watch the difference in performance in real time.
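For a flavor of what those query rules look like, here is a minimal sketch against ProxySQL's admin interface — the rule IDs, hostgroup numbers, and digest patterns are hypothetical, but this is roughly how you'd cache a hot read for a second and mirror traffic to an upgraded hostgroup:

```sql
-- Run against the ProxySQL admin interface (port 6032 by default).
-- Cache results of a specific hot SELECT for 1000 ms:
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (10, 1, '^SELECT .* FROM user_settings WHERE user_id = \?', 1000, 1);

-- Mirror all SELECT traffic to hostgroup 20 (the upgraded replicas):
INSERT INTO mysql_query_rules (rule_id, active, match_digest, mirror_hostgroup, apply)
VALUES (20, 1, '^SELECT', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```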
Moving on to more things in architecture. You have to always keep in mind that writes will fail. You cannot build a system where you presume all writes are always going to succeed 100% of the time, especially if the data in question is business critical. This used to be difficult to work around in the past. People tried all sorts of things: they would have their application just keep retrying, which is a really cool way to DDoS your own internal infrastructure, or they would keep a local state file on the application side that it tries to replay. Now you've just made your application also sort of a database, and you have to make sure that the application node doesn't die, because it has state local to it that is super important to your business. Another one, which was my favorite, was a fallback table on the same database that is ostensibly down. That's not going to work. So you have to be prepared for degraded modes, and you have to be prepared with what is now famously called event-based architecture. You can have things like Kafka, with explicit resilience and persistence promises, that you can use to basically stack up your events and replay your writes to the database if needed. Because here's a dirty little secret: MySQL's goal is not to be available all the time. MySQL is a relational data store, and it is designed for consistency far more than availability. So you have to work around that and design the availability into your system through other means. And that's where the combination of something like Kafka coupled with your database really comes into play and gets super important for you. Now, not everything can be asynchronous like that. You can't make all of the writes in your system just drop into a Kafka partition to deal with later. Sometimes things are super visible to the customer and you have to deal with them right away. That's when read-only modes and degraded modes are super important. Say it's your sign-up page and the database is down. You can't write any sign-ups, but maybe have your sign-up page send an email to support, as opposed to just being a blank error page. This is where you can get really creative with how you deal with failure and how you make things a little less impactful to the business. This is the kind of thing that engineers cannot decide or work on alone; this is where product gets involved as well. But that's this afternoon's talk. Shard your abstraction layer the same way you shard your databases. If you did all this work to split your data stores by function, where sign-ups don't impact statistics and user settings don't impact sign-ups, but then your abstraction layer — whether it's ORMs or ProxySQL or whatever it may be — is one large cluster that everybody talks to in order to talk to any database, you haven't really done anything. Because at some point that abstraction layer is going to start having load problems. Its capacity might go down, and you won't know if it's because we have more sign-ups, or because someone is trying to DDoS the page, or because there's real traffic coming through and we just made Hacker News or something.
So make sure that your abstraction layer is also sharded the same way your databases are, so that you can tell when each part of your product is gaining more traction and getting more traffic than the others. Another important thing to keep in mind when you're designing your data store layer is that lag will happen. That's just a fact of life. Now, this is where ORMs can be super limiting for you, because a lot of ORMs give you a binary choice: you either send your reads to a read replica or you don't, across the board. Which is a problem, because not all endpoints are created equal. Some of your endpoints are going to be super sensitive to lag. Say you have a user sign-up flow, and the customer has input some information and is now moving on to the payment information — you can't have that customer just waiting until your database cluster is done with its lag. But other endpoints are not going to be that sensitive to lag. "I want to look at all of my stats for last year" — I'm okay sending that query over to a replica and waiting a minute or two for it. You can't have both of those classes of endpoints following the same binary choice. And that's where having a proxy like ProxySQL, where you can define rules and decide per endpoint whether it needs to be sensitive to lag or not, comes in. But you can't just pretend that lag is not going to happen. Once again, this is one of those things that product will help you with. This is where conversations across the org — cross-functional conversations — can be super important, because everybody needs to be on the same page about how this will operate. And no, that doesn't mean just send all your reads to the primary. Let's not try that. You also need to understand that you will never have just one database or one database cluster. In fact, needing more than one database, and more than one kind of database, is probably a sign of success. So be open to that. It should be fine that you have more database clusters, because it's a sign of success.
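As a sketch of what "decide per endpoint" can look like in ProxySQL — hostgroups and patterns hypothetical again — lag-tolerant reporting reads go to the replica hostgroup, everything else stays on the writer, and the replica servers carry a lag ceiling so they stop receiving traffic when they fall too far behind:

```sql
-- Replicas in hostgroup 2 stop receiving queries if they lag more than 30 seconds:
UPDATE mysql_servers SET max_replication_lag = 30 WHERE hostgroup_id = 2;

-- Only the lag-tolerant stats queries get routed to the replicas:
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (30, 1, '^SELECT .* FROM email_stats', 2, 1);

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL QUERY RULES TO DISK;
```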
Now, that was a lot of stuff that I just went over. You can't do everything at the same time, and prioritization can basically sink or swim a business. So let's go over the shortcuts you can probably take — things you don't have to worry about from day one — and then the things that you really should do from day one. You don't need to agonize over every column type. Memory is super cheap now; I've seen hosts go up to a terabyte of RAM and who knows how many terabytes of disk. So it's become less important to be super finicky about whether your integer needs to be an int or a bigint. Things like that are less important now. The one caveat here is to be careful with JSON blobs and large text types, because MySQL as a piece of software is still super inefficient in how it deals with them. But besides that, for the most part, you don't have to spend too much time on that part. There will always be changes down the line. You should accept the fact that you will always come back and revisit as performance trends appear from your real customer traffic and as your product grows. You don't have to tune every possible config. This is a big rabbit hole in MySQL. Now, in the next slide I'm going to turn around and contradict that by telling you there are certain configs you can't not tune, but there's a specific subset of configs that you need to worry about. Going down the rabbit hole of tuning every possible config can quickly become a road of diminishing returns, and your time is an asset too, so you need to keep that in mind. Don't index all the things. Indexes are great for read queries, but they're not free, and sometimes it's going to be hard to tell up front whether a certain table is going to skew towards reads or skew towards writes. So, like I said earlier, accept the fact that you shouldn't index everything, and maybe revisit later if there are performance issues. Audit logging is really cool, but it's not as important unless your product is, out of the gate, tightly coupled with some compliance requirement. For example, you're doing something that has to do with healthcare or payment processing. If PCI or HIPAA or something like that is part of your business model, sure, this probably is something that you should do from day one, but otherwise it is a large project that you probably don't need to worry about right away. Now, the things that you should not take shortcuts on — things that you should do from day one. You should always set your database time zone and your servers to UTC. You'd be surprised how often I've seen that not happen, and it's incredibly difficult to fix later; that's primarily why I mention it as something you cannot take a shortcut on. You can't afford not to track your auto-increment key space on all of your tables, because this will hit you at the worst possible time. It's going to be when there's a big flow of new sign-ups, your user table runs out of keys, and now you can't sign up customers. That's a big problem. And it's relatively easy to write a script that will just compare your current primary key type with the maximum ID that exists in the table. Do your future self a favor and put that one in your monitoring. Backup testing. Most of us now have our databases in the cloud, which makes backups and snapshots super easy, but unless you're testing those backups, they don't exist. You have to test them — better yet, automate the testing of those backups. Grab the last snapshot, pull it up on an instance, and then destroy the instance once it comes up. It'll save you a lot of time. And your future self, when your company really succeeds and you have to go through some compliance audit, they're going to ask you about backup tests. They're not just going to ask about making the backups. That's when you're going to thank your past self, because you've already got that down. I said I was going to contradict myself, so let me go over some configs that you cannot afford not to tune from day one. Turn off the query cache. And it's not one config, it's actually two things in MySQL: there's query_cache_size and there's query_cache_type. Even if you set the query cache size to zero, it will still get all tripped up on the mutex if the query cache type is not set to off. Hopefully soon this will no longer be a thing that we have to talk about, and we will all forget the big disappointment that was the MySQL query cache, but for now you still need to make sure you turn it off. Set your InnoDB buffer pool size. By default in MySQL, up until now, the default size for this is 128 megabytes, which is really, really small — I'm assuming it was set like that because of VMs and dev environments on laptops. However, there is no production server that should be using such a small buffer pool, so you have to always make sure to set it.
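That auto-increment key space check is simple enough to sketch here. Assuming the auto-increment column is your primary key, something like this query, wired into your monitoring, gives you the fraction of the key space each table has burned through (add smallint and mediumint cases if you use them):

```sql
-- Rough sketch: percentage of the auto-increment key space used per table.
SELECT t.table_schema,
       t.table_name,
       c.column_type,
       t.auto_increment AS next_value,
       ROUND(100 * t.auto_increment /
         (CASE
            WHEN c.column_type LIKE 'tinyint%unsigned' THEN 255
            WHEN c.column_type LIKE 'tinyint%'         THEN 127
            WHEN c.column_type LIKE 'int%unsigned'     THEN 4294967295
            WHEN c.column_type LIKE 'int%'             THEN 2147483647
            WHEN c.column_type LIKE 'bigint%unsigned'  THEN 18446744073709551615
            ELSE 9223372036854775807
          END), 2) AS pct_used
FROM information_schema.tables t
JOIN information_schema.columns c
  ON c.table_schema = t.table_schema
 AND c.table_name   = t.table_name
 AND c.extra LIKE '%auto_increment%'
WHERE t.auto_increment IS NOT NULL
ORDER BY pct_used DESC;
```

And for the configs, a minimal day-one sketch — the sizes here are hypothetical, and in practice these live in my.cnf and get deployed by config management rather than typed in by hand:

```sql
-- In my.cnf under [mysqld] (query_cache_type generally has to be set at startup):
--   query_cache_type        = 0
--   query_cache_size        = 0
--   innodb_buffer_pool_size = 90G   -- e.g. ~70% of RAM on a 128G dedicated DB host
-- Verify what the running server actually has:
SHOW GLOBAL VARIABLES LIKE 'query_cache_%';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
-- On MySQL 5.7.5+ the buffer pool can also be resized online:
SET GLOBAL innodb_buffer_pool_size = 90 * 1024 * 1024 * 1024;
```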
Bonus points if you're doing this in config management and computing it from the RAM on the host, but you've got to be careful not to be too greedy with it either. You also have to keep an eye on how many connections you keep open from your ORM or your ProxySQL cluster, because there is an overhead to keeping connections open: MySQL maintains buffers on a per-session basis as well. So if you get too greedy with your buffer pool, and then you have a spike in traffic and a lot more connections start to open, you can actually cause your database to run out of memory, and the next thing you know the kernel has OOM-killed MySQL and you're down, which is really the opposite of what you wanted to happen. I used to do 80%. I have tamped it down to 70%, mostly for reasons that have to do with our architecture, but that seems to be a good spot to start. Less than 50% is definitely not okay. A big part of this is going to be what you're using for connections. Things like evented Python and Twisted are notorious for opening connections and not closing them, but if you're, say, using the Go driver, it's generally better at closing connections that haven't been used after a certain age, and so the connection hygiene of your application layer is a big part of this. Set your MySQL process file limits. It is the year 2018 and MySQL will still start with only 1,024 file handles allowed, sadly, but because this is a fact of life you've got to keep an eye on it. Every connection that is open to the database server needs a file handle, which means that even if you tune your max connections and you're tracking your threads connected, that limit is waiting for you in the grass out there, and when your connections get close to a thousand, the next thing you know all of your apps are seeing this very strange error that says it cannot create a thread — and that's why. So a systemd service file, or even a direct command to the shell from your Chef run, that just sets this every time will probably save you a lot of heartburn down the line. And max connections. The default value for max_connections in MySQL is still 151. That is probably plenty for an old-style LAMP stack, but we are now getting into microservices, and when everybody has on the order of 20 or more services talking to the one database, 151 is not going to be enough. So between process limits and max connections, you need to be careful about what your actual application behavior is and how much you really need. This is usually a game of resources, so you're also going to have to decide at some point whether you want things to fail quickly with hard limits like this, or whether you want to expand those limits but then watch your resources deplete. There's always a happy medium, of course.
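A quick sketch of checking and raising those limits — the numbers are placeholders, and the file-handle limit itself is an OS setting (systemd's LimitNOFILE, or the ulimit in your init script), so from SQL you can only see what the server ended up with:

```sql
-- What the running server was actually granted by the OS:
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';

-- Current connection ceiling and how close you've gotten to it:
SHOW GLOBAL VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS    LIKE 'Max_used_connections';

-- max_connections is dynamic, so it can be raised without a restart,
-- but persist it in my.cnf so it survives the next one:
SET GLOBAL max_connections = 2000;
```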
That was a lot of information. Now, this is definitely a lot of stuff, and not all of it will come up at the same time, but at some point it's going to start feeling like managing the database is literally a second job, and that's the point where you need a DBA. However, there are good reasons and bad reasons to want to hire one of those. Do not be that team. You don't want to hire a DBA because you don't want to worry about the database anymore. That is not a good reason. You're still going to need your engineers to be aware of how the database behaves and how it interacts with the code they wrote. You need the DBA for other reasons. Now, let me first go over some of the bad reasons to hire a DBA onto your team. To run schema changes. Now, everybody, me included, has at some point had to run schema changes in production by hand, because automating schema changes to the database is scary, and at the beginning of the business's life cycle that's not what's going to help you the most to get more customers and keep your product going. At the beginning it's fine, but you have to accept that at some point you're going to need to automate this, and that automating it does not mean hiring a human to do it. Your team should already be familiar with tools that help facilitate schema changes, so they're not just going in by hand, running alters on production, and stopping all traffic. Things like online schema change tools — there's one called Shift, from Square, which adds a nice UI and the ability to review schema changes, and so gives everybody on the team more transparency into how a schema change actually gets through its cycle from laptop to production. You do not want to hire a DBA to be on call for the databases. Now, when I say this I'm not suggesting that DBAs are too cool for PagerDuty, but I am saying that they should not be on call 24/7 for everything database, where as soon as one of the databases pages it goes to the DBA — because that's a quality-of-life thing, and DBAs deserve balance in life just the same as your engineers. However, they are a perfect escalation point. What you want to see happen is: if there's a problem, your engineers first try to debug it, and if they're stumped they can escalate up to the DBA. I don't think I've ever seen a shop that has a number of DBAs equal to the number of engineers, so it doesn't make sense for the DBAs to carry the whole database pager while all of the engineers take a rotation that only comes around every few months. You do not want to hire a DBA because no one else has time to take care of this. If the database and the data are the most important asset of the business, then they should get that level of importance, and hiring someone is not an outsourcing solution. So now some good reasons. You need someone to set some best practices. What you need the DBA for is to help train your engineers to be better engineers around the database — how to use MySQL the right way, and when to realize that maybe MySQL is not the best tool for this particular part of the product. Like I said, they need to be an escalation point. You will need someone with expertise to help your engineers when they hit something they're not familiar with. Maybe your engineers are new to scaling databases; if you're the kind of company that's seeing 40% growth every year, that's a lot of growth, and your engineers may not have had experience dealing with that, so having someone who has done this before would probably be useful to them as a subject matter expert. Or you're solving really interesting and new problems, which is hopefully most of us. So who are you looking for when you're hiring a DBA? There's a stereotype about DBAs: that they always say no and they protect their databases very jealously. That is not what you want. Once again, that is this afternoon's talk, so you can find me there for that one. You need someone who is enabling your engineering team, who is actually explaining to them how to do things and why things work a certain way.
More than anything, they need to be a mentor, because if your DBA is not leveling up your engineers on how to talk to data stores, then they're not really doing what they should be there to do. What you do not want is someone who just maintains your databases super carefully by hand and nobody else is allowed to touch them. Here is the reading material that I'm going to close with. These are super important, and I think they would be useful even if you're not looking to become a DBA down the line. The Database Reliability Engineering book came out late last year, I think, and the two women who wrote it have really good perspectives on both NoSQL and relational databases, so I think the advice in it applies either way. More specific to MySQL, the second book is a classic. And then you'll find that the third one is a blog post by Charity; it hit close to home, and I think it will for a lot of you in the room as well. It really talks about the things you've got to watch out for if you suddenly find yourself in charge of databases when you weren't planning to be. And that's it. Thank you. Are there any questions? Sure. There's a mic here too, if you'd like to come up. I'm sorry, I didn't hear the last part. How do I get to the links that you showed? To the links? I'm going to share these slides online later. I'm going to tweet out a link, so you can follow me on Twitter at dbsmasher and you'll find a link to the slides, and within them you're going to find links to those resources. Yup, that is my handle right there. My question is more about MySQLTuner, the infamous script that gives you mathematical equations and dumps it all out. Should anyone actually use it? Should it even be a blip on someone's radar? Or should it honestly just be totally disregarded? If you're completely lost, it's a good starting point. So if you've never done this before and you know there's a problem, it's a starting point for comparing reality with what this thing is saying, but I don't advise using any of the things that are out there as hard and fast rules. There were things that I used to do with databases before I joined SendGrid that I completely flipped on after about a year at SendGrid. So what happens on the ground will, a lot of the time, change things. Thank you for the talk. Just a quick question regarding the database time zone. You mentioned it's good to set it to UTC, but do you have any tips on how you can move an organization to UTC when they're already accustomed to using something like Pacific, for instance? Man, if you do, I'll pay you some money for that. It's really difficult, because it's going to become a problem — if you're using a lot of datetimes, datetimes don't even have a time zone attached to them, so you're going to have to make sure you know what the time zone was from the beginning of this database's existence, and I guess do conversions. It's going to be a large project; you'll have to do migrations and things like that. I don't believe there's anything out there that will just do this for you.