 Okay, I've been told that I have a whopping 20 minutes to give you an introduction to troubleshooting with MySQL. So this is what I call the TLDR, the took too long, didn't read. So this is kind of going to be mostly meat, very potatoes going on here. My name is Legaia Tamel. My official title is Principal Technical Support Engineer for MySQL within Oracle. That's my email address. That's my Twitter handle. If you have questions afterwards, you can either meet me up in the MySQL booth upstairs, or you can email me the question and I'll always answer questions. My Twitter handle is to let me know how I did. Did you like this talk? Did you not like this talk? Did I cover the things you hoped for? Did I totally bomb it and absolutely suck? Either way, let me know what you think, because I can't make it better unless you tell me otherwise. So I have to show you this. I have to let you read it. Sorry, I work for the other guys. So read that. Okay, let's actually get into this stuff. Step one of troubleshooting is that you absolutely have to define the problem. And when I say define the problem, you have to be as specific as possible. You also have to know what is normal on your system to absolutely be able to understand what is not normal. If you don't monitor your system, you will never know what is normal. So you will never know about a problem until it shows up. Monitor your system. Find out what's normal. Be exact in your problem definition. Don't say my server is slow. What is slow? Is it the database that's slow? Is it the queries that are coming back slow? Or is it your network that's slow? Is your application taking too long? What's slow? If your query is slow, how slow is it? Does this query normally return in 10 seconds? Or does it normally return in 0.1 seconds? If you don't know, find out. If you don't have a specific definition of your problem, you can't fix it. First thing you do whenever you have a problem is hit the manual. We have a fantastic manual. Make sure you read it and find out at least the general area of what your problem is involved and take a skim through it. Next thing you do of troubleshooting, you're going to need to collect information. You're going to need to collect diagnostic information. This information can either be within the MySQL database or it could be on your operating system. If you have no information on your problem, good luck trying to solve your problem. That's part of the monitoring. Another thing that you may have to do, which totally sucks sometimes, is you're going to have to collect your diagnostic information during the problem. Information before the problem and information after the problem doesn't help you tell you about the problem that's happening inside. Collecting that information during the problem means that, again, you're going to have to be monitoring your system to be able to have that previous information about your system. And if you don't have that, a tool that I recommend that people use is actually from one of our competitors. It's called PT-Stock. PT-Stock is a tool that allows you to set a trigger so that way if a specific event for this problem occurs, it starts automatically collecting diagnostic information from the operating system as well as from MySQL. Check those out. Now, I don't care what anybody says. If you have a problem in the very first place you look for an answer is your error log. Anybody that was in my admin class session earlier will tell you if you remember nothing else, you need to know about the MySQL error log. That's the first place you look whenever you have problems. It's also going to be the area that's going to potentially give you a history about the problem. If you don't have history, if you don't have context for the problem, you might start solving the wrong problem, which doesn't give you any of your information. Look at the error messages in there, and you'd be surprised how many times the error message includes where to go for additional information for that problem. Solve your own problem if you can. Now, if you have a general server crash, just to make sure everybody's clear on this, I'm going to give you general scenarios of problems which are MySQL server, and then I'm going to tell you the information you need to collect to try to solve that problem. This is the information that will give you, at least point you to, where to find the solution to your problem. So in a general server crash, the first thing you're going to look at is your error log, because you want to see if there's going to be any error messages within it about the crash, as well as a stack trace. If there's a stack trace in your error log, that means you can then go to our public bug repository and see if anybody of any of those bugs has a matching stack trace. Now you just found your problem. If you have, again, a general server crash, you want to look at your operating system error logs. If you have run into the OOM killer, it's going to be in those logs. That's going to tell you what's going to happen. You can deal with it then. You want to collect any and all diagnostic information you have available around the crash. So that means all of your operating system diagnostic data, your MySQL diagnostic data, any graphs you have, any raw data you have. If you generated a core file during that, this is all information that you could then read, analyze, and try to figure out what's going on. Last thing you want to do for a general server crash is you want to look around for information for things that were happening before and during the crash. That means you want to look at your external scripts, what applications were running, cron jobs, backups, any of that kind of information, because MySQL does not work in isolation. These things can come in and cause problems for us as well. Cool? Man, everybody's in the dark. I can't look at everybody and see what's going on. Gee, am I going faster? What? General server performance issues. Things you want to look at. You want to look at the MySQL error log. You want to look at your... Wow, where did that C come from? OS, data logs, not COS, data logs, data information. You want to look at your IO. You want to look at your CPU, your memory. Are you using a new architecture? You want to have relevance onto your performance. For MySQL specific, you want to gather the information of show global variables, show global status, show full process list, show engine NODB status while the problem is occurring. So that way you can see what's happening inside the MySQL server at the time. If you feel daring and adventurous, there's lots of information also in the performance schema that you can gather. Most people are not comfortable with that, but they're totally down if you want to go out and go get it. If you're having query performance issues, again, error log for the MySQL error log, if you do not have on your slow query log, you can't look to see about the query performance issues. So if you're having query performance issues, you turn on your slow query log to be able to look at them. You want to look at the explain plan for any slow queries that you have going on to be able to tune them. You're going to look at the show create table indexes that are already on the table. Maybe you could work with some of them or tweak them a little bit to work better with your query. You want to look at your show table status so that way you can look at how big the actual table data is. How many rows do you have in it? What are you working with? What indexes are available for show indexes? That includes cardinality that's on the indexes. If you have an index on a table with one million rows, but you only have a cardinality of three, you're never going to use that index. You need to look at that information to be able to find that way you can then work with your query to tune it. If you have a query that no matter what you're doing, you're tuning it, you're creating indexes for it and it's still problematic, you can then start profiling your query, either doing the old-fashioned way of set profiling or using the performance schema and that'll give you a very distinct breakdown of where things are happening within the query itself so that you can then troubleshoot further to try and speed things up. Man, I'm going fast here. Have I lost anybody yet? Am I going too fast? Okay. Excellent. You have table corruption in MySQL in the NODB. So, and you may or may not be able to start up your MySQL server due to NODB corruption. Generally speaking, if you have NODB corruption and your system crashes, if it can start back up, NODB has already fixed itself during the recovery phase. If you haven't been able to bring it back up and that's when you start looking at things like forced recovery and stuff. However, to be able to find out what's going on with the corruption for NODB, you want to look at your error log again so that we can see if there's any error messages provided in the MySQL error log. You want to look at your stack trace to see if it's maybe associated with a bug already for that. So, looking for the known issues. Is it a data dictionary error? In which case, you'll have an error message in your MySQL error log and will probably have a link telling you where to go to start troubleshooting that data dictionary error. If you have to go and rebuild because of this corruption, again, you'll be doing the forced recovery. If you're doing forced recovery, you want to start on one each time to bring the server back up because each step along the way is much stricter and will return you less information for you to build a backup off of it. This is why, as a database administrator, you should always have backups because crap happens, things blow up, and it's your job if you can't get it back on. Make a backup. Go to a backup if you have to have it. And when you make a backup, when you get it out there, a backup is not a backup until you've proven it can be recovered from. So every backup you have, you should at least be testing it to make sure you can actually recover from it. Otherwise, it's the same thing as not having a backup. Just saying. If you have table corruption in MyISAM, the only place that will tell you you have table corruption with MyISAM is in the MySQL error log and it'll be marked as crashed. That's when you're going to have to do the whole if your server is down, you're going to use MyISAM check and only if your server is down do you use MyISAM check because MyISAM check will automatically corrupt your tables if your server is up. Well, unless you lock them and that's just a that's difficult. If the server is up, you might as well just use the MySQL check utility which utility you use depending upon whether or not your server is up or down. Regardless of whichever you use if you have a corrupted MyISAM table you want to do a backup a physical copy of the backup of the MyISAM files because it's I won't say it's common, but it's not uncommon for you to lose data when you check and repair MyISAM table. So you want to have those copies just in case you go back and try to figure out what's going on with that. Again, this is the part where backups are required they're not an option you need to make sure you have them there just in case of these kind of things. Okay, this is my last one this is for replication just out of curiosity who here actually uses replication I can't see a little bit Okay, a fairly decent amount of if you have replication issues whatever they are you want to take a look at your error log if your replication has stopped your error log will tell you where to restart it you want to look at show master status that will tell you where what the master is located and what it's doing already you're going to look at the MyCNF files or MyINFI files because that is the only location you will find any kind of replication filtering noted and if you have filtering going on that may be part of your problem for whatever reason you want to look at your global variables because that's going to tell you how you have your replication set up and the big one to look at though is show slave status show slave status will give you the error that's going on as well as things like where to start your replication again after you do get things fixed and things of that nature simple tip if you are using replication and your configuration file has you skipping any kind of errors on it for whatever reason more than likely your slave is out of date it's out of sync I'm telling you right now you should never ever skip a replication error because then you might as well consider your slave out of sync with your master which just feats the whole purpose of having replication so that's all I have for my TLDR does anyone have any questions see I did that fast man I caught them off balance and unready questions so so most of these strategies are great except that you are operating on RDS MySQL which who I work for does not work on RDS we can recommend we can just give you best efforts but how much this translates to RDS I don't know because I don't know some of the absolutely the MySQL commands absolutely works as long as they are supported in RDS no no no not over the commands so certain things with respect to let's say show master status and the performance example that you mentioned with Postgres similar kinds of problems exist as far as MySQL is concerned I would assume that everything that I gave you would work in RDS it would be a my assumption that everything that I gave you would work in RDS except maybe the problems of finding the OS logs and the OS monitoring and stuff like that that would be outside of my purview but everything else I have given you should work with RDS hypothetically as far as I know any other questions oh come on this many people and I don't have a single additional there we go Hi so my question is 5.6 version master and slave both the query which goes through very well in the master box does not go through on the slave box it just hangs what could be the reason how could we figure it out I have seen the logs I have seen everything nothing just kind of hangs my first assumption would be the most common cause of that is either locking another query that's running on the system is blocking it or if you are using row based replication we have a known bug out there that if you are using NODB with no primary keys if you have a query on the NODB table it has the potential to run up to 10 times longer on the slave because of a known issue hit me up later on the table and I will go into it in detail with you if you want and explain exactly why that's happening hi can table optimize corrupt a table one more time can optimizing a table actually corrupt it technically no generally speaking with my isam if I remember correctly with my isam and optimize all it does is it removes the fragmentation and it makes it smaller if you are talking about an NODB table if I remember correctly NODB table optimize automatically becomes a forced rebuild of the table depending upon what version you are on it will be a either an inline ddl or it's going to be a full copy so it may feel like it's corrupting it it's just taking a while because it's rebuilding the table I've never yet off of my experience heard of anybody having a corrupt table after running optimize if it wasn't corrupted before it may find a corruption that pre-existed in the table that you didn't know about it but it doesn't corrupt it afterwards okay any other questions? am I right on time? dang I'm good thank you very much everybody have a good day now