 Hello everyone, you already introduced me, my name is Mayank working in Oracle MySQL team and today I will be talking about MySQL performance schema. Am I audible to the last pinch? Yes. Okay, fine. So this is a safe harbor statement which states whatever I am presenting over here is for informational purpose, you guys should not make any purchase decision based on that. Right, so this is the agenda for today's talk. I will talk about need and design of performance schema. I will start from why do we need performance schema at all. Then I will talk about the instruments and instrumentation points which we have in performance schema. I will talk about the statistics tables where we store the statistical information and then I will take few use cases to explain how we can use performance schema to diagnose real world problems. And then we have released MySQL 5.7 which was G8 in 2015. I will talk about what new we have introduced here and then currently we are working on MySQL 8.0. So we have one BMR released. I will talk about what we have there as well as performance schema is concerned. Okay, so let's talk about need. So the question comes why do we need performance schema at all. So in MySQL, there are several MySQL users. So there could be an end user who might think why my session is stuck. There could be an application developer who might think why my application is running so slow. And a DBA might think why this particular table is so hot and so on. So to answer all these questions, performance schema comes into picture. So if you ask me to define performance schema, I would say performance schema is a mechanism to give user an insight of what is happening behind the scene when MySQL server is running. So we introduced performance schema in MySQL 5.5. We introduced a new storage engine which is called performance schema. And a new database was introduced that is performance underscore schema. And we introduced a lot of tables in this database. And these are tables where we store all the statistical information of running MySQL server. The data which we store in these tables is not persistent. So it's in-memory data. So once the server is stopped, the data is gone. And the best part is we provided an SQL interface to the statistical data. So you can fire your SQL queries to see the data which is collected by the performance schema. Before I go further, how many of you heard about performance schema? Apart from MySQL users. I mean Oracle. So let me explain you how this performance schema works in real world. So there are two portion of performance schema. There is one part where we collect the statistics. There's another part where we report the statistics. So the first part where we collect the statistics looks like this. So we have a MySQL server running. In the MySQL server source code, we have different code points which we call books or instrumentation points. When the server is running and the control comes to these points, the control is transferred to performance schema. And we collect the runtime statistics data and then store them into performance schema internal buffers. So that's the birds eye view of how the statistics collection is happening. Now if suppose a user wants to see some statistical information, he can simply fire queries on performance schema tables. Then this query will be passed and executed. And the data will be fetched from this internal buffers and presented to the user in performance schema tables. So that's the overall design. Let's talk about the instruments and instrumentation. So what are the instruments? So instruments are the name which we give to activities which we monitor in MySQL server execution. So in MySQL server executes, there are multiple activities which happens behind the scene. So to uniquely identify those activities, we have given them some name. That is called instrument. And how these instruments looks like? They are the tree-like structure. Different parts are separated by forward slash. And as we move from left to right, we go from generic to more specific things. Let me take an example to explain. So these are the two instruments. And the first one is weight slash IO slash file slash MySQL slash log. So first one says this is a weight instrument. Now there could be multiple kind of weight instruments. So what kind of weight it is? It's an IO weight instrument. Now there could be multiple IO. It could be IO on table, IO on file. So this is an IO on file. And which file? This is a MySQL log file. So this is an instrument. Now another instrument is statement slash SQL slash select. So more general thing is statement. So statement could be an SQL statement, could be command statement. So it's an SQL statement. Now SQL statement could be insert, delete, update, select. So this is a select statement. So this is uniquely identifying an activity which is a select activity. In MySQL 5.7 we have 1000 plus instruments. That is we are instrumenting 1000 plus activities when MySQL server runs. And all those activities or all the instruments are listed into a table which is called setup instruments. Now let's have a look at this table setup instrument. This is how this table looks like. It has three columns. The first one states the name of the instruments. The second one says whether this instrument is enabled or not. And the third one says time. So if this instrument is enabled, then only performance level collects statistics for this instrument. And if it is timed, then only it will collect the timing information of this instrument. So for example, the very first instrument is statement slash SQL slash select. So I say enabled is equal to years and time is equal to years. So when MySQL server runs, performance level collects the statistics for this statement and collect the timing information as well. But for the second instruments which is create table, I said enabled is equal to years but time is equal to no. So I want to see the statistics of create table, but I don't want to see the timing information. And similarly for create DB, I don't want anything to be collected for this instrument. And the best part is these things are configured at runtime. So you don't have to restart your server just to modify your statistics collection information. Can we update with this table? Yes, you can. Can you stop it? Or enable it or disable it? Yes, yes, you can do that. So as soon as you say this is disabled, performance level will stop collecting information for that instrument. Right. So now let's talk about the statistics table which we have in performance schema. So there are many tables. So these are the tables using which you can tune performance schema, what to collect, how to collect. And then you can also, I mean these are the tables where we store the data of our statistics. The first one is set of tables. Like we have set of instruments table, I showed in the previous slide. You can see which instruments should be enabled or disabled. We have set of actors table. In this table you can specifically say for which user you want to have statistics to be collected. Now we have set of objects table. So you can specifically say for what all objects you want to have the statistics information to be collected. Like you can specify the table itself. For this table only I want to have the statistics. Or for this stored procedure I want to have the statistics to be collected. Similarly we have consumers table, we have timers tables. I mean you can specify you want information in picoseconds or microseconds or nanoseconds. Then we have events table. So when MySQL server runs and a query is executed. There are multiple events which happens. So first of all the transaction starts. So this query belongs to a transaction. Then this query is a statement which executes. When this statement executes it goes to different stages. Like opening table, closing table, sending data, etc. So these are the stages. And each and every stage goes to different rates even. It is waiting for mutex or waiting for the generate operation. And then if at all it is not doing anything it is an idle event. So we monitor all these events in performance scheme. Then we have instances tables. So we have multiple mutexes, read write logs, files, sockets and condition instances. Condition variables in MySQL server source code. We monitor that for example. Then we have lock table information. We have information about metadata logs. We have information about the table handles which are locked. Or unlocked and the queries perform. Then we have tables for connection attributes as well. The type of connection is the socket connection of the TCP application. And this thing we introduced which is replication summary table. So earlier you used to do show slave status to get the replication information. Now all this information has been ported here in replication summary table. Categorized logically. Then we have, you used to do the show variables, show status command. All this information has been ported in performance schema. In system variables table, status variables table. And then there are miscellaneous tables, other tables as well. So these are the tables which we have. And the biggest thing which we have in performance schema that is the summary tables. So these are the individual events. Now we have summary tables which summarizes this information and give us consolidated output. So we have summary tables for events. Like summary for statements, stages, weights. We have summary for memory allocation or deallocation. We have summary for file IO, table IO, etc. Sockets and connection. And the good thing is this summary is also categorized based on different factors. Like we have summary by global overall on the system. Summary by thread. For a specific thread how many statements have been executed. Summary by a user or by a host. Or the combination of user and host that is account. And summary by digest. That is for statements only. Let's see what do we get from performance schema. So let's take an example. So as I said we have the setup instruments table where we store all the instruments or activities which we monitor. Let's say I said update performance schema setup instruments. Set enable is equal to yes. And time is equal to yes. So I want to monitor all the activities. And I want timing is equal to all the activities. Let's say I did that. Then I connected 2 sessions to running Marsco server. Session 1 let's say is 3.24 and 3.25 session 2. And from session 1 I did this. I started transaction 3 times. Inserted this is very simple use case. Inserted 3 values and then I did a select start. That's all I did from session 1. From session 2 I started transaction did 2 insertion. That's all I did from session 2. Before I go to the next slide just keep in mind there are 3 insert done here. Session 1 and 2 insert done here. And the last statement executed from here is select start and from there it's a commit. So this is what I know we have done. Let's see performance schema is able to tell me these things. So in performance schema we have a table called event statement current. So this table give me the information about the last statement executed by any thread. So in my example there are 2 threads 24 and 25. The last statement executed by 24 is a select statement. And the last statement executed by 25 is a commit statement. This is the time it was taken to execute this statement. And this is the time taken to execute this statement. Okay so number of rows send and if at all there is a table scan. Or if at all there is an index use or not. Okay so I got the information about the current statement. Now how what if I want to know how many statements have been executed by thread. Then comes the summary table comes into the picture. So we have the summary table which summarizes execution of statement based on thread. So here I can see there is thread ID 24 for the insert statement the count is 3. So that means thread ID 24 has executed 3 insert statement. Total time for this insert statement is this much. Similarly for thread ID 25 there are 2 insert statement. And time taken is this much. So I got the recent statement. I got the statement by thread. Now how about if I want to know overall on the system. Doesn't matter from which thread they are coming from. How many statements are executed and what are their characteristics. So I have this summary global by event name. So here I see over on this system there are 5 insert statement executed. And 5 commit statement executed. And the time taken for insert statement is this much. And the commit statement is this much. So I can see overall on the system how many statements are being executed. Right. Now let's take the use case. So suppose I have a use case in which there are multiple queries running or running or MySQL server. And there are few queries who are taking lot of time to complete. And I want to know what are those queries. So can I do this with performance schema? Let's see. So in performance schema we have a table event statement history. This table consists the historical status list of statement executed. So I say give me all those statements who are taking time written X. Let's say X is my threshold. And I get this information. So it tells me so there is a thread ID 25 who is doing a select statement. And that is this is the statement X which is taking time more than X. Okay so I found out the query which is performing slow. Anything else can I get from performance schema? I get this information. So select scan is equal to 1. So for this query a full table scan has been executed. No index use is equal to 1. Index has not been used for this particular query. So if at all I can introduce an index for this query the performance will be better. Right. Let's take the use case 2. So several statements are running on MySQL server and few of them are giving errors. So I have this table event statement summary by digest. Which give me information of the statements categorized based on digest. Digest is a normalised form of a statement. So if I query this table I get this information. So it says for this digest text that is a normalised form of a statement. It has been executed 1725 times. And out of those 1725 times 1725 times it has given error. So it has never executed successfully. Right. And I also get the information when was the first time this statement was executed. And when was the last time it was executed. So I need to look into this statement what is happening. Right. Let's take a use case 3. So I have, I am a MySQL developer or source code developer. And there is multi threaded environment I am working upon. And my session is stuck. So I want to know what is happening behind the scene. I go to performance schema. So as I had event statements current table which was telling me information about the current statement executed or being executed. We have events weight currents table as well which gives me information about the current weight being executed or is executed on the server by any thread. So I say, let's say my thread is T1. I say give me information about everything from the weights current table where thread is equal to T1. And I get this information. Thread T1 is waiting for a mutex which is mutex A. Right. Now thread T1 is waiting for mutex A and it is not getting it. The only thing is it has been acquired by somebody else. Now we go to the mutex instances table where object instance begin is equal to mutex A. And I get from this column locked by thread ID this mutex A is locked by thread ID T2. So T2 is holding mutex A and not releasing it. That means T2 is also waiting for something else. I again go to the event weights current table and do a select star with thread ID is equal to T2 and so on. So I can keep on going and digging and finally I will find out where exactly the problem relies. What is the culprit, who is the culprit then? Okay. This is very good thing which we have in performance scheme that we have maintained with event hierarchy. So a session could have multiple transactions running. A transaction could have multiple statements inside it. And I said a statement goes through different stages and each and every stage goes through different weights. So this hierarchy is maintained. You can pick a particular weight event and you can correlate it to the transaction it is coming from. How do we maintain it? Let's see that. So this is what I said. Right. So there are multiple transactions on the server, multiple statements, multiple stage events, weights event. So uniquely identify these events. We have given them an event ID. So using an event ID we can uniquely identify a weight event or a stage event or statement of transaction. Now how do we correlate this weight event to the stage event? So as I said, a stage consists of multiple weights event. So all the weights events who are coming from a specific stage event would have nesting event ID of that stage event. Similarly all the stage events who are coming from a particular statement would have nesting event ID as an event ID of the statement. And similarly all the statements which are coming from a specific transaction would have nesting event ID of the event ID of the transaction. So with this hierarchy I can start from the weight event and go back to the transaction and see from which transaction this weight event is coming from. Right. So the instruments which we have in 5.7. We have a question. How do you assign an event ID to a statement? This is by internally. The user don't have to do anything. It is done internally by performance. See unique ID. These are the instruments which are available in 5.7. We have instruments for IO operation like file IO, table IO or the socket IO. Then we have instrumenting for locking. We have mutexes, nutrient locks, table locks, meter locks. We monitor them. Then different events as I said. Projections, statements, stages, weights. And I delivered as well. And stored program, function triggers, events and procedures. And then we also have statistical information categorized by user, post, account, rate and global. And we also monitor the memory allocation and deal of the performance tumor which is happening in MySQL server. And many more. In 5.7 which we released in 2015, we added a lot of new instruments. We added information about the user variables, connections, memory usage, stored program, and prepared statements, MDLs, types of connection, the different stages of inodity execution, replication summary tables and global and session variables and status. Apart from that, we introduced a lot of enhancements as well. The digest size which was fixed in 5.6 is configurable in 5.7. We can specifically say to collect a historical statement for a particular thread. So there is no need to collect a historical statement for all the threads. And then the memory location and performance schema which was fixed before in 5.6 is scalable now. So based on the load, it allocates memory for performance tumor buffers. And the memory footprint, we have also reduced a lot of improvements we have internally in performance tumor. There is memory like performance schema is less and many more. So in 5.7 GA, we have 87 tables in performance tumor. So these are the tables which are storing the statistics of MySQL server. And we have 1000 plus instruments. So as I said, 1000 plus activities which you are monitoring in performance tumor. And as in the previous session you heard, CIS schema is also reduced in 5.7. So CIS schema is a collection of procedures, function and views and what are those. So they are the common day-to-day use scenarios which DVAs are seeing or debugging. What does CIS schema do? It takes information from performance schema. It takes information from information schema tables and do some formatting, create a very good format and present it to the user. We have received very good feedback about this component which we introduced. Earlier it was known as VS Helper in 5.6. So you guys should go back and try this schema. And in MySQL 8.0, we have released one DMR last year. In performance schema, we introduced indexes on PFS tables. So earlier there was no indexes, it was going to table scan. Now we have introduced indexes, so specific data was, specific rules it can fetch. And then we also introduced instrumentation for several errors. So earlier there was no instrumentation for errors. Now we have reduced that as well. And there are many more things which we are working upon now. So if you say you will see those things. And that's all. Any questions? One. If there are any chance, for example, in 5.7 there is no indexes on the performance schema. If I have to run a query to check the performance of the production database, sometimes it creates a road on the beginning. Is there any chance I can propagate those performance schema in a replicated environment, on a slave actually, and then run and check by running a query on the slave of the production. So is this possible? On slave you have to turn on performance schema. That will tell the performance of the slave. On the slave, yeah, on the slave only. But I want to check actually the performance of the master from the slave. I don't think so, it cannot. I don't look. My problem is that I don't want to create a road on the production environment. I mean the performance schema is specific to an instance. It will give you information about the instance. There is a different engine. How much load does it put on the engine? Just to read the performance schema tables. You don't really put any load again. Engine like this. So we talked about this pluggable engine in MySQL. So performance schema is actually a separate engine. So it is not really putting that much load on the engine. If you don't read the data, the performance should not be affected that much. It's only the other part of MySQL. So that is the thing which we talked about yesterday. My observation is that in MySQL, any bad queries running from any session, other sessions will get infected. This is the problem. But those queries would not be performance schema, I guess. In my case, that was normally not from the performance schema. But sometimes it was from the performance schema. So much data, other users will get the impact. First of all, performance schema queries does not fetch data from the hard disk. That's all in-memory data. So there's no higher operation involved. So unless it's sending a lot of data over the network, and that's going to be short, then I'll never see you. If you're in the performance schema, that is the problem. It's your network pipeline. And we don't recommend using SAMs and NASs over the network. We are not using actually SAMs and NASs. We are only 140 RDS-based customers. RDS has its own limitation. This is AWS service on Power off MySQL. But you need the data pipeline. There's somewhere, but yeah, of course, we have to use different restriction, of course. Naturally, that could be a problem. And I guess there was a question in the morning about can we do profiling of queries in MySQL? So we can do it using performance schema. I mean, we can see for a specific query how much time this query is spending in a specific stage in which we can get the performance schema from the stage instrumentation. So the thing which you had, one query is very long and it is impacting other queries. So you can run this stage instrumentation on other queries and see where they're actually waiting for. Which particular stage they are waiting a lot in to get waiting most of the time. And so I have to add, in the SUS schema there's actually a procedure that can do that profiling and output into the dot format that they produce by the graph. So create a graph of the profiling by using the hierarchy so you can see running this statement, going through these stages The function in the SUS schema. The procedure in the SUS schema. Of course, if there's a lot of instruments that are getting triggered, that graph can be so huge that it's not possible to really see anything but in smaller cases it can be really useful. Okay, one more question. Is there any chance I can, mark my query with some flag? So I can track that particular query in performance schema or system schema. For example, this query is coming from specifically part of my source code for some translation. So I can track what is the performance of that particular query while curing the performance schema. Is this anything which is possible? So you can specifically say for this particular user, for this particular host, I want to get the statistics for specific cases and for those things you can specifically see the statistics of each and every query. So let's say I say from Maya, which is coming from xyz.com, instrument everything from this and talk about all other users. So the performance schema will not consider all other users and collect the statistics for this particular query. It's nothing which you already mentioned there for one particular session or one particular thing you can but my exact question is to track particular query, actually. For example, I flag a query with some name. For example, just like we are introducing a hence in the query. I just made some comment in it. It's coming from this translation, part of my source code, my Java code and then I track that particular query. Java code? Performance schema cannot tell me the query actually. SQL's statements coming from the Java source code. But you can see the select on this table whenever a select is done on this table that query. That much you can do. Or not date on this table from this user from this machine, that much. The whole query text is also available. It is available. Can you use the line? Yeah. Answer the pattern of the main query. Just before the query, you enable the estimate just after the query is enabled. On the chat, you can see how it works. Let's go ahead. Any more questions? Okay, Rahul of course.