Hi there. In the previous talk you could find out how to use profiler-based optimizations to build better MySQL binaries. I'm going to talk about how you can use profilers in production to find out what's going on with your MySQL server.

I have been working with MySQL for almost 11 years already, in all kinds of companies that provide public support, and the experience I've got from there is interesting. In MySQL AB, Sun and Oracle I was a member of the bug verification team, so I had to deal with MySQL mostly as a problem, not as a solution, and ever since I tend to think and speak about MySQL in terms of bugs. You will see a lot of bug numbers: whenever a bug number or anything else is highlighted on these slides, it's a reference, and you can click on it in the PDF slides and end up looking at the details. Percona, where I worked for three and a half years, is a solution-oriented company, so there I tried to think about MySQL as something that can be used, with all its limitations and bugs, to solve a specific problem. That helped a lot. Finally, now at MariaDB, I have to deal with a wider range of customers, kinds of MySQL and use cases than ever before. Also, starting from my Percona days, I write about MySQL in public and I speak about MySQL. Of all the conferences, I like this one the most because of the quality of its content; let's hope I help keep it that way.

I would like to speak about profilers, specifically profilers on Linux, and only three of them. Historically the oldest one, widely applied to MySQL in the past, is oprofile, and I cover it partly to thank the developers and the people who used it back then. No talk about profiling MySQL can avoid mentioning Domas Mituzas and his poor man's profiler, so I will mention it too, even though you may consider it unsafe to use in production; in some cases you may hit a problem that I will quickly mention, along with how to notice and resolve it. Last but not least, these days if you would like to profile MySQL, you should use perf. You've already seen an example of how it's used to get better binaries; I will show a couple of examples of how you can use it to find out what's really going on live. We will also get to a slide about the performance impact of using profilers in production, because you need to know it's safe: your measurement and performance monitoring should not become a performance problem by itself. It's not easy, but with perf it's quite possible.

What I will not speak about is all the good profilers for memory, for calls, and for everything else that requires rebuilding MySQL, because not everybody can, like Dropbox, afford to spend days building proper binaries for their specific load; it's a hard task. In most cases we use vanilla binaries from some vendor, or something we built once for ourselves and then use for years. This rules out gprof, this rules out running under Valgrind, and all the other tools of that kind. There are also great operating system tools to get insight into what's going on in any process, like SystemTap or DTrace, which is now also available on Linux; I'm just not good at them, so they are out of consideration too.

What's also out of consideration is Performance Schema, and it might be an interesting question why. I will tell you. First of all, in production you may hit cases where Performance Schema was disabled at startup, and in many cases you can hardly afford restarting your MySQL just to get more monitoring insights. Okay, next time you restart it you will enable it, but it is not there at the moment when you need it.
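As a quick illustration of that first point, here is a minimal sketch, assuming a mysql command-line client that can reach the server; the variable and table names are the standard Performance Schema ones, but treat the session itself as illustrative:

    # Was Performance Schema enabled at startup?
    # If this says OFF, no runtime tweaking will help without a restart.
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'performance_schema'"

    # Which instruments and consumers are actually enabled right now?
    mysql -e "SELECT NAME, ENABLED FROM performance_schema.setup_instruments
              WHERE NAME LIKE 'stage/%' LIMIT 10"
    mysql -e "SELECT * FROM performance_schema.setup_consumers"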
Another thing: with each release Performance Schema is getting more dynamic, more detailed, and its instrumentation is lightweight compared to anything else. But unfortunately there are still instruments that must be enabled at startup and can only be disabled later, rather than instruments you can enable selectively when you need them. You may find out that the instruments you need for your specific load were not enabled initially, and then you still need a restart. Another problem with Performance Schema is that it stores data in memory, in circular buffers, and the size of these buffers is predetermined at startup, at least in older versions of MySQL. So if a buffer was too small, you will find out, but too late: you again have to restart. In short, Performance Schema data may simply not be there.

Yet another problem is that some plugins or third-party storage engines may just have no instrumentation. Mark Leith will speak about how to do it properly, but not everybody does it. MyRocks, for example, does not have this instrumentation, and I am not sure whether any of the vendors working on it, like Percona or MariaDB, will put Performance Schema support there at all. You should be ready for that. And surely a lot of other people are speaking about Performance Schema all the time anyway.

What's interesting for me in this regard is that, based on some cases I studied for this talk and had to deal with over the last year, Performance Schema may just not provide enough detail. I will use a very simple example from one of the old bugs, from the days when I spent time making MySQL run faster on AIX and PowerPC many years ago, but it's a great example. When this simple SELECT with BENCHMARK over 500 million iterations runs for dozens of seconds, everything in this MySQL binary will be slow; replication will be slow too, because single-thread performance matters. So it's an interesting bug by itself. But what would you see if you tried to use Performance Schema to check how a better-optimized binary works? All you will see is what's on the right: you spent 13 seconds and something in the 'executing' stage, and that's all you will get. Even a very simple profiler call (this output is from perf) lets you find out the specific functions, because a profiler instruments things per function in any case, and shows where the time is spent; no wonder it was spent in the multiplication and in the BENCHMARK function call. The share of time spent may differ after rebuilding, and in more complex cases you get insight that you just cannot get from Performance Schema in any way. So it doesn't provide enough information.
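To make this concrete, here is a minimal sketch of that kind of test; the exact BENCHMARK expression is my reconstruction (the talk only mentions 500 million multiplications), and the perf commands assume root access and a single mysqld process on the same box:

    # A classic single-thread CPU burner, entirely inside one mysqld thread:
    mysql -e "SELECT BENCHMARK(500000000, 1*2)"

    # In another terminal, sample that mysqld for ten seconds while it runs:
    perf record -p $(pidof mysqld) -- sleep 10

    # The flat report shows the specific hot functions,
    # not just an 'executing' stage:
    perf report --stdio | head -30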
Now let's get back to a real production issue. Some of you were probably present at the pre-FOSDEM day yesterday, where Sveta Smirnova told us that one of the strangest stages of query execution is called 'statistics': a query may spend a lot of time in the statistics stage and you may not know why. I've hit it in real life; for about three months I have been working on a complicated case where this is one of the problems we see. It's a simple query, a single-row SELECT by primary key from one table that is not accessed by anything but these selects at the moment, so there is no read view involved, but it runs in a highly concurrent environment, and it sometimes spends seconds, many seconds, even minutes in the statistics stage. And what can you get from Performance Schema, even with a very advanced query? Well, that there is a statistics stage. I do not see the waits behind it, even though I expected them. That may be for many reasons. One of them is that when you query Performance Schema data, you query a live system: it may be a few seconds later, and all the waits, even if some of them were there, may already be gone from the buffers. They were not lost, but they may be gone. Or maybe my query is wrong. You would never find out. But you would find out a lot more with profilers.

This slide with the links is just a history of applying oprofile to MySQL. Oprofile has been available since 2001 or so. Initially it was based on sampling by timer, so it adds overhead, but still a lot of problems in MySQL were found with it, like all those scalability issues of the table cache that were mostly resolved in 5.5 with the introduction of metadata locks and related work. Just check all the details: there are many more, probably 20 bugs, that I picked for my blog post, which is already published, so you can find it and get more insights. Good people, well-known engineers, like those who now work for Facebook, were already spending time profiling MySQL ten years ago and proving where the problem was with oprofile.

Oprofile is actually easy to use, but you should know there are two kinds of it. The old one, up to version 1.0, is based on a separate daemon that is started, polls events by timer and collects them so that you can process them later. Then the creators of oprofile, after perf appeared and showed that CPU performance counters can be used for lightweight profiling, essentially rewrote oprofile and provided different binaries. Versions starting from 1.0 just work differently: the opcontrol binary is gone, the daemon is gone, and the operf binary is introduced. Which one you will have to use depends on the kernel, because access to the processor counters is available only starting from kernel 2.6-something. So if you are using, for whatever reason, Red Hat Enterprise Linux 5, you have no option but the old oprofile; that's why you probably still should know it.

With the old oprofile: you start it; if you do not need to profile kernel calls, and for MySQL you probably do not, you pass --no-vmlinux; the data is collected; when data collection is no longer needed, you stop it, then you dump the data, and then you run the report. This set of actions was so widely used that Percona embedded it into the pt-stalk tool as a separate option, --collect-oprofile. It kind of works; I was able to use it, though not on the very first attempt. It's not always clear why the data was not collected: the daemon may not be there, the version may be different, a command may fail, and the tool is not very verbose about it; also, not every set of options produces oprofile results. But you can get them.

So what changed? If you now take, for example, Fedora 25, you will see the new oprofile there. You install the same package, but the old commands do not work, so pt-stalk will not produce any profiles, because opcontrol is gone. In these versions oprofile works more or less through operf.
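To summarize, here is a sketch of both workflows side by side; the commands are the documented oprofile ones, but take the exact invocations as illustrative:

    # Legacy oprofile (before 1.0), daemon-based:
    opcontrol --no-vmlinux    # skip kernel profiling; not needed for MySQL
    opcontrol --start         # start the daemon and data collection
    # ... run the workload ...
    opcontrol --stop          # stop collection
    opcontrol --dump          # flush the collected samples
    opreport                  # produce the report

    # New oprofile (1.0 and later), operf-based:
    operf --system-wide       # stop with Ctrl-C (or a signal) when done
    opreport                  # the reporting command stays the same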
So: you start data collection with operf, then you stop it by sending a signal or, if it runs in the foreground, by pressing Ctrl-C, and the opreport command is still the same. One thing I should say after trying operf: in theory operf allows you to run a single binary under it, and you can do that without being a superuser, but if you just pass the process ID of mysqld to operf and try to get results on a multi-core system, you will get the results from only one core. I'm not sure how that core is picked, maybe the one where operf itself runs, but it's just one core, so it misses things. That's why I suggest using system-wide profiling with operf as well, and this is now proposed as a pull request to pt-stalk, so that pt-stalk keeps using oprofile where it's available, like on Fedora 25. Let's see whether it gets implemented this way or another, but it's a warning for you: if you have multiple CPUs or multiple cores, please use operf with a system-wide profile, otherwise you may come to wrong conclusions.

This is how it works and this is what you get, with the same simple BENCHMARK-and-multiplication example as before. What's interesting here is that you get a number of samples, that is, how many times this function call was observed, and the overhead, the percentage of time spent in all these calls. Sometimes these numbers do not correlate, so interpreting them may be a hard call.

Now, another thing: the poor man's profiler. Domas invented it around 2007 or 2008, I don't remember exactly. What's great about it is that in a lot of cases you do not need the power of a real profiler; all you need to know is where your threads are hanging right now, and you can get that really easily with pt-pmp or the original poor man's profiler. Behind the scenes it historically uses gdb, so it ptrace-attaches to your binary, and after the run the process may be left in the traced state. So please take care: after running pt-pmp, check whether your mysqld is still running, and if it's hanging, send it a SIGCONT signal to release it; if you are going to do this in production, watch out, otherwise your service stops (a minimal usage sketch follows below). A couple of other things: there are ways to make pt-pmp impact performance less; there is quickstack, which can be used instead of gdb, and there are some gdb command-line options that can be changed in the code. That bug is not fixed by Percona, it just hangs around.

With pt-pmp you may find out, as in one of the cases I reported with that statistics stage, that we are actually sleeping while trying to enter the InnoDB queue. Why? Because we were running with limited concurrency, so we had many long-running threads that were all busy, and our statistics stage, diving into the index to estimate how many rows the query will get (it will get one row; it looks silly, but this is how MySQL works, it just dives into the index to estimate), hangs there because it cannot get into InnoDB. If you do the same SELECT from a MyISAM table, it works instantly, but on InnoDB it hangs, and the poor man's profiler is already enough to see that.

Is it always useful? No. If we apply it to the BENCHMARK case, we will see that, okay, we had one occurrence of this long thread stack, okay, we see how things are called, but what does it give us? We do not know how much time was spent in each specific call, and we do not see anything special, like hanging on some mutex lock. So it's only somewhat useful.
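Here is that minimal pt-pmp usage sketch, including the safety check I just described; I assume pt-pmp from Percona Toolkit with its --pid option and a single mysqld process, so treat this as an outline rather than a recipe:

    # Aggregate the current stack traces of all mysqld threads (gdb-based):
    pt-pmp --pid $(pidof mysqld)

    # gdb attaches with ptrace, so verify mysqld was not left stopped/traced:
    ps -o stat= -p $(pidof mysqld)

    # If the state column shows 'T' (or 't'), resume the server:
    kill -CONT $(pidof mysqld)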
What's really useful in this regard is perf, and you will see some success stories here. I have collected bugs, just some of them; about half of them are still active, mostly reported by one single guy, Mark Callaghan from Facebook, and you can find many more good usage examples of perf there. Basically, perf works with two commands: perf record, which you need to record the perf data and then stop, and perf report, with which you then analyze the data. The different command lines are explained in my blog. You probably need to send a SIGINT signal to stop recording, and you probably also want to always use -a, system-wide profiling, so that you see where time is spent at the operating system level as well; otherwise, just as with operf, you may again get wrong profiling data from only one core. And with perf you can get not only flat output but also a hierarchical tree of calls, with detailed information on where the time is spent.

Speaking about which profiler to use these days, I'll just tell you the result: use perf if you can; if you are on an older Red Hat you have to use oprofile. Why so? Because of the performance impact, which I will show on my last slide.

Applying perf to that statistics problem, we see that we really do try to get statistics by diving into the index, and we cannot, because MySQL is busy at the InnoDB level. So it's rather silly: to create a plan that is simply 'use the primary key', we spend time trying to get information; we select the data in order to find out how to select the data, sorry about that. Well, the problem in that case was limited concurrency, but that's not the only possible reason for hanging in the statistics stage, so this is still work in progress for me.

Last but not least, one example of when you have to use profilers because there is simply no Performance Schema. I tried to profile MyRocks under a specific, questionable load. There is a query, historically from an old InnoDB bug, that can be executed with different join orders: one of the plans avoids filesort, which is normally a desirable thing, and the other one does the filesort. It turned out, curiously, in my specific case, which is explained in detail in the blog post, that the query with filesort is actually a bit faster and scales better with the number of threads; I tried on four cores with up to 64 threads, and the gap is visible. It's also interesting what happens from four to eight threads, because that is where the gap grows, specifically on MyRocks. So I applied a profiler to that and found out that in the case where filesort is avoided, we have mutex locks where notable time is spent; somehow, at this concurrency, we have a problem with concurrent access to some internal structures of MyRocks. This is the kind of information you can get from profilers. You also see that different amounts of time were spent in specific MyRocks and RocksDB functions, which may give you further hints.

About the performance impact, one of the final slides: I ran that same simple BENCHMARK test under different kinds of profilers. What you see here is that pt-pmp gives maybe a 60 to 70 percent drop in performance, so a huge impact. Oprofile of all kinds, even the recent operf from Fedora 25, still gives a notable performance impact. And if you look here, you will see that perf without recording call trees gives a very minimal overhead, several percent, comparable to what you can usually get from Performance Schema.
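For reference, a sketch of the two perf modes compared on that slide; the ten-second sampling window is just my choice for illustration:

    # System-wide flat profile: minimal overhead, a few percent at most:
    perf record -a -- sleep 10
    perf report --stdio

    # System-wide profile with call graphs: a hierarchical tree of calls,
    # more detail, but noticeably more overhead and a bigger perf.data file:
    perf record -a -g -- sleep 10
    perf report --stdio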
But it is with the system-wide profile that you get insight not only into the MySQL code but into all kinds of operating system code. In my case, for example, I found out that most of the time was spent in encryption functions, because my HDD is encrypted; you cannot get this insight other than with profilers.

Thank you very much, my time is gone. Finally, because I was on the bugs team, I still tell you: please report bugs whenever you see them. Here's a photo of a praying mantis my wife found, and she reported it; you should do the same for MySQL and MariaDB. As for questions, I am around, go for it; and click on the links, there is a lot of stuff behind them. Thank you very much.