 From Oracle again, to speak about MySQL performance schema. Yes, OK. OK, my name is Mayank. I work in MySQL server development team, and I work in performance schema development specifically. And today we'll talk about MySQL performance schema, a great insight of running MySQL server. That's what we call it. And this is not working now. How? Do you want to visit? Then let's check this again. Surely. Do you have another USB slot? Yes, here. Oh, yeah. It was like this? Yes. No. Can it be the last page again? Wait, wait, wait. I know why it is not working. No, it should work. Right? Yes. Right. OK, the safe harbor statement. So whatever I'm presenting over here is for informational purpose. Right. So this is that agenda of today's talk. I'll talk about why do we need performance schema and what is the design looks like. And then I'll talk about the instruments and instrumentations, statistics tables which we have in performance schema. And then I'll take few use cases to explain how we can use performance schema to get runtime statistics of MySQL server. And then I'll give you a glimpse of what we have in MySQL 5.7, which is the current GA version, and 8.0 DMRs, which is development milestone releases. Right. How many of you have used performance schema? OK, good. Right, so how many of you are using MySQL? Everybody. So there are a lot of users of MySQL. So there might be a end user who might see my session is stuck at this particular time. There might be a MySQL source code developer like me who's working on the source code. And he might have a question, why my particular session is stuck? Why my thread is not going forward? There might be an application developer who might be thinking why my application is running so slow and so on. So to answer all these questions, performance schema comes into picture. So if you ask me to define performance schema, I'll say performance schema is an insight of running MySQL server or what is happening behind the scene when the server is running. We introduced performance schema in MySQL 5.5. We introduced a new pluggable storage engine for that. That is performance schema. We introduced new database. That is performance underscore schema. So all the tables in which we collect the statistics are part of this database. The statistics table which we have in performance schema have hard coded DDL. That means user is not allowed to modify the structure of these tables. The data which we store in performance schema, we should not say store. The data which is there in these tables is not persistent data. So it is the runtime statistics of server. When the server is gone, the data is gone. And we have a SQL user interface for that. So if at all at any time you want to get the statistics of running server, you can fire some queries to MySQL server and then you'll get the data out of it. So let's understand how does performance schema looks like from a bird-eye view. So there are two parts of performance schema. One is statistics collection. And then other is statistics reporting. So statistics collection is the part in which a running MySQL server collects the statistics. And we keep them statistics in performance schema buffers. Statistics reporting is the part in which a user wants to see the statistics of running MySQL server. So in MySQL server source code, we introduced a lot of instrumentation points. When the control comes to these points, the control is transferred to performance schema module and performance schema collects the runtime statistics and store the statistics into performance schema and tunnel buffers, right? Now suppose there is a user who wants to see the some statistical information of running MySQL server. He can fire simple queries on performance schema tables and this query will be passed and executed like other queries. The data from these buffers will be fetched and displayed to the user in performance schema tables. So this is the bird-eye view of performance schema design. Let's talk about instruments and instrumentation. So when MySQL server is running, so there are a lot of activities which are happening behind the scene. So to identify those activities uniquely, we have given them name. So these names are the instruments, right? So this is a, so when in a name if you move left to right, you move from more generic to more specific thing and these parts are separated by forward slash. I'll give an example to explain. Let's say, so these are the two instruments or the two activities which we monitor. The first one is weight slash IO slash file slash MySQL slash log. So what does it say? It says this activity is a weight activity. This is a weight, this is an IO weight activity and that IO is happening on a file and which file it is? It is MyASM log file. So this is, so it says in performance schema, we have a mechanism to monitor any IO happening on MyASM log file. Similarly, we have this instrument, statement slash SQL slash select. So it says in performance schema, we have capability to monitor or to instrument the select statements. So like this we have in, when we did 5.7 GA, we had 1000 plus instruments. That means we are monitoring 1000 plus activities when the server is running and all those activities or the name of all these activities are stored in performance schema setup instruments table. So if you look into the stable, you will see all the activities with performance schema monitors. How does this table look like? Let's have a look. So this is the setup instruments table. It has three columns name, enabled and time. The first one is the name of the activity which is instrument. The second column which is enabled which says whether performance schema is going to collect the statistics of this activity or not. And the third one says, if at all user wants to collect the timing information of this activity. What does it mean? It means, so let's say for statement slash SQL slash select. The enabled is years and timed is years. It says, it means that performance schema is supposed to collect, is supposed to collect the statistics of select statement as well as collect the timing information. But if for the second instrument, create table, enabled is years and timed is equal to no. So user is saying, collect the statistics of create table but I'm not interested in timing information. And the third one is create database. So no and no for enabled in time. So that means don't collect the statistics of this activity at all. So using this table, user can tell performance schema what to do and the best part is these are configurable at runtime. So when the server is running, at that time you can decide for what all activity you want to monitor and for what all activity you want to have statistical information. Now the statistics table, let's have a look what are the tables which we have in performance schema. So the first category of tables is setup tables. So setup tables is a table using which user can tell performance schema what to do. Like setup instruments table, the one which we saw in the previous slide. Setup actors tables. Using this table, user can tell performance schema for which user we want to collect the statistics for. The object, setup objects table, user can specifically say for what all objects you want to collect the statistics for. For what all tables you want to collect the statistics for. If at all any IO is happening on the table, give me the statistics. Similarly, consumers, so all those statistics are collected are consumed by the consumers. So user can specifically say for what all consumers you want to turn on when the statistics are being collected. And timer, so timer is a simple table in which you can specify the unit of time. It's picoseconds or microseconds you want to collect the statistics for. Then we have event tables. So when there are a lot of sessions connected to MySQL server, they execute different transactions and the transactions execute different statements. Statements, execution goes through different stages like parsing, optimization. And every stage goes through different weights events. So suppose there is a stage which is opening a table. It might go through a weight event if at all somebody else has locked the table. So that is a weight event. An idle event, so if at all the connection is established to the server, but nothing is being done for that connection. It's an idle connection. That also we monitor. Then we have instances table. So mutex instances, read write lock instances. So whatever mutexes we have in MySQL server source code, all are listed here. Like similarly file sockets and conditional. Then we have locks tables. We have a metadata locks tables. So if at all there is a transaction who is going to modify the metadata of a table, it is supposed to take the metadata lock that we instrument here. Then we have locks on the tables. We have connection tables, attribute of connections when client is connecting to the user, what are the attributes it has provided, the type of connections. Then we have replication summary tables which we deduced in 5.7. So earlier when you want to get the statistics of replication, you used to do this show slave status. So all those information, that is very clumsy output, a very big output. So all that output has been moved to performance scheme into replication summary tables. I guess in group replication also has exposed few tables, right, Hemant? Right. And then we have system variables and status variables. All the things which you do when you say show variables and show status, all those information is there in performance scheme of tables now. And there are other miscellaneous tables, like thread tables, what are the threads on the server running as of now. And apart from these tables, the very useful tables which we have are the summary tables. So summary tables give you the summary of events. So like all the transactions coming from a user or all the statements happening on the server as of now, all the insert statements happening on the server now. So summary of events which are transactions, statements, stages, weights, and summary of memory, memory utilization by different users, different modules of server itself, file IO, table IO, table locks, sockets, connections. And the best part is the summary is categorized based on different factors which are summary by global. So overall, what is happening on the server? Doesn't matter from which thread it is coming from. By thread for a specific session, by user, if at all a user has connected three sessions, give me the summary of those three sessions, by host, by account, which is a combination of user and host, and then by digest. So digest is the statements which are similar kind of statement of similar structure. Now let's take few use cases to understand how we can use performance schema. Before we get into the use cases, let's have a look what information we can get from performance schema. So let's say I have a running MySQL server, and I executed this command, update performance schema setup instruments, enable is equal to yes, and time is equal to yes. What does it mean, this command? This command is saying, this command is telling to performance schema to collect statistics for all the activities what are happening on the server, and collect the timing information as well for all those activities. So I have enabled everything. Let's say I have two sessions connected to the server, session one is thread add 24, and the session two is thread add 25. From session one, I did three insert statements into table T1, and then I did a select. Session two, I did two insert statements, and then did a commit, that's all. So these are all the activities which have happened on the server as of now. Keep in mind, I did three inserts from session one, and I did two inserts from session two. And the last statement I executed from session one is select star, and the last statement I executed from session two is commit. Now let's see what we can get from performance schema for these activities which have happened on the server now. So in performance schema, we have a table, event statements current, which give me the statistics of last statement executed on any thread, or the current statement being executed on any thread. So if I execute this, if I fetch the information from this table, I see that thread add 24 has executed a select statement, and the text for the select statement was select star from test.t1, and this how many rows it has sent, it did a full select scan, full table scan, and this is the time taken for the select statement. Similarly, for thread add 25, the last statement executed was a commit. This is a time taken for this commit, right? So with this table, I got the information of the last statement executed. Now let's see I want to see the summary of the statement executed by any thread. So we have this event statement summary by thread by event name. So this information is giving me the consolidator summary of events by a specific thread. So if I see that thread add 24, there are three insert statement executed, and total time taken for these three insert statement is this much, thread add 25, two insert statement executed, and time taken for these two insert statement is this much, right? So I got the statistics of last statement, I got the statistics of all the statement by thread. Now can I get the statistics, what is happening overall on the server? Doesn't matter from which thread it is coming from. So we have this event statement summary global by event name. So if I query this table, I get on the server till now five insert statement have been executed, and time taken for this five insert statement is this much, and there are five commit statement executed, and time taken for this commit statement is this much, right? So this is a glimpse what we can get from performance schema tables. Now let's take few use cases. So let's say I have a MySQL server running in which lot of queries are being executed, and there are few queries which are taking lot of time. Can I find out information about those queries which are taking lot of time? Let's see. So I execute this query on event statement history. So event statement currents give me the current statement, event statement history gives me the historical statement execution for any thread. So I say, okay, give me all the information from event statement history, wherever the timer weight is greater than x. So suppose x is my threshold, if anything goes beyond x is a long-running query for me. So with this query I found out, okay, so there is a thread ID 25 which is executing some select statement, and the text for the select statement is select blah blah blah, some statement it is executing, and this is the guy who is taking time more than x. So this is the long-running query for me. So I identified it, but there are some more information we get from performance schema. It says select scan is equal to one. So it is doing a pool table scan without using an index. So if at all I could introduce an index in this table and use that index in my query, I might get better performance. Let's talk about another use case. So let's say I have a server running on which lot of queries are executing, there are lot of queries which are giving errors. So can I get something about it? So this is a table event statement summary by digest. As I said digest is the, is the categorization of statement which are similar type. We consolidated their statistics and put them into one row. So this table gives me information. So there is a table, there are statements which are of type create temporary table some statement. They have been executed one, seven, two, five time, and out of those one, seven, two, five time, they were given error for one, seven, two, five time. It has never been executed successfully ever on that server, right? And we can get this information as well. When was the first time this statement executed, and when was the last time this statement is executed on the server? Let's talk about use case three. So let's say there is a MySQL source code developer who's developing some source code. I'm writing a new module. And suddenly I realize my thread T1 is not going further. It is stuck somewhere. So what can I get from performance schema about the scenario? So as I said, we have event statements current which give me the current statement being executed on the server. The event weights currents give me the current weight event being executed on the server for a thread. So I execute this query, select star from event weights currents where thread ID is T1. So T1 is my thread ID, which is not going further. So if T1 is not going further, that means T1 is waiting for something. So from this query I found out, okay, T1 is waiting for mutex A. Or T1 is waiting for a mutex A and not getting it. So that means somebody else is holding mutex A. Now I go to the mutex instances table. I told you, writes all the server mutexes are listed in this table, mutex instances table. So from this table I found out there's a thread T2 which is holding mutex A. And that is the reason T1 is not getting mutex A. Now thread T2 is holding mutex A and not releasing it. That means thread T2 is also waiting for something. So I again go back to the same query, event weights current and this time I search for theory T2 and so on. So I can keep on digging like this and find out who is the corporate guy who is holding the mutex and not releasing it. Okay, what is an event hierarchy? So events are transactions, statements, stages and weights. So when a session connects to MySQL server, it execute different transactions. A transaction, when it executed, it execute different statements. When a statement executes, it goes to different stages and every stage will go through different weights events. So this is the hierarchy of event. In performance schema, we maintain this hierarchy. I'll tell you how and I'll tell you how it could be used as well. Right, so four type of events, transactions, statements, stages. Each transaction has different statements. Each statement have different stages. Each stage have different weights events. So all these events are different activities which are happening on the server. So to uniquely identify these activities, we have given them an ID which is called event ID. So all these are distinct event ID. Now this one more thing which is called nesting event ID. So what is a nesting event ID? So the nesting event ID for all the weights events which are coming from a specific stage events will point to the event ID of that stage. Similarly, all the nesting event ID of all the stages which are coming from a particular statement will have event ID of that statement. And similarly, all the statement for a transaction. So if at all on server, we found out, okay, this weight is not going, I mean, there is having this weight event is happening and I don't know who is causing this weight event. So I can take that weight event ID and find out from which transaction this weight event is coming from. And I can find out the culprit, right? In performance schema, we have instrumentation for IO operations which is happening on file or table or socket. We have locking information, mutexes, read-write locks, table locks, metadata locks. Events we have, as I said, transactions, statements, stages, weights, and idle event. Then we have instrumentation for store program. So you can get statistics of store procedures or prepared statement, yeah, prepared statement is also there. Functions, triggers, events, and the categorization based on user host and account and instrumentation of memory usage and many more. Now let's have a look. What do we have in 5.7 GA and what do we have in 8.0 DMRs? In 5.7 GA, we introduced lots of new instrumentation points. As I said, it started from 5.5. So 5.6, we did some improvement. 5.7, we did a lot of improvements. So in 5.7, we introduced this user variables, transactions, instrumentation, memory usage, store programs, prepared statements, MDLs, types of connection and all those. The user variables, system variables, status as well. Apart from these new instrumentations, we have also improved the existing one. The digest size, which was not configurable earlier, it is configurable now based on the request from community and the customers as well. And the history, as I said, the historical statement, statistics. Earlier if you say history is on, then it is on for all the sessions. Now you can specifically say, I want history to be on for this particular session only. And the memory location is scalable. So earlier what used to happen performance, when we are looking at all the memories up front and then collect the statistics there. Now for few of the buffers, the memory location is scalable. So as and when load increases, it locates some memory. The footprint of memory footprint of performance can be also reduced and we have introduced, we have improved the performance as well in 5.7 a lot. And apart from this, we introduce one new thing, which is called Syschema. Anyone of you heard Syschema before? Right. So Syschema is a collection of procedures, functions and views. What are these views? These are the day-to-day use cases, which DBAs face. So we have found out the use cases, took the feedback from community and wrote some procedure functions. What do they do? They collect the data from performance schema. They collect the data from information schema. And based on the queries, which we have in these procedures or functions, it gives the very good formatted output to the user. So performance schema is huge. There are a lot of tables, there are a lot of statistics. With the usage of Syschema, you can specifically find out what you're looking for. So this has been very, I mean we have got very good feedback of the Syschema from the community, as well as from the customers as well. In 5.0, 8.0 DMRs, which are development milestone releases, we have introduced indexes on PFS tables. So suppose there are a lot of data in performance schema tables, we were not using indexes before. Now we are using indexes to get the query execution faster. And we have instrumentation for errors as well. So if at all error is coming, how many times this error has come on the server, or how many times this error is coming from the specific user or specific session, we have that. And table plugin. So till now, till 5.7, whatever tables are there in performance schema, they are there. So if you want to add one new table in performance schema, you have to study the performance schema code, understand how it is maintained and how it is done, then only you can add your new table. But there was a request from community they wanted to add their own table in performance schema without understanding the code of performance schema. So we introduced new table plugin. Using this plugin, you can write your own table, you can push in your own data, and that will be visible in performance schema tables. There is an example also, we have shipped with this table plugins feature. So you can go through this example and then write your own plugins. Histogram. So in histogram feature, we have categorized statements based on the latency, how many statements are falling into this latency bucket and how many statements are falling into this latency bucket. So we have categorization based on the latency as well. So you can say 95% of the statements are running under this particular time. And then apart from that, 5% are going into this time bucket. And then we have setup instrument metadata as well. So I should write there are only three columns, activity, enabled, and time. But those are the very simple ones I showed you. Slash, statement, slash, SQL slash select, but there are some complex statements as well. So to give the user what they are all about, we have introduced the metadata information as well in 8.0. Right, so in 8.0 we have, as of now, before the GA, we have 102 tables in performance schema and there are more than 1200 activities which we are monitoring as of now. Yeah, that's all. And so these are the differences. So MySQL documentation is the biggest one where you can get all the information about performance schema. Then we have blocks by Markl, the guy who started performance schema project in MySQL 5.5. Marklith, the guy who works with performance schema. The Marklith is the guy who implemented this Sys schema, the wrapper. And this is my blog. I also keep on writing about performance schema. And then we used to write into these blocks. We have very good blocks over here, but after that we are consolidating all our blocks into this MySQL server team.com. So as our new features comes, we write about those blocks and push them into this MySQL server team.com. That's all. Thank you. Any questions? Any questions? How does this scale with the data in MySQL? How does it scale with the data of MySQL? So as I said, the performance schema allocates buffers initially. So it doesn't matter how many queries are coming. It will collect the status into these buffers only. So if you have a lot of data coming into MySQL server, you collect the statistics into these buffers only. There's no memory being, the more data is coming. So the bottleneck becomes the configuration of the system rather than the systems? Yes. So if you say, OK, I am allocating this much buffer to the performance schema, you can allocate the buffers. So I want to collect the statistics only for 100 users. But then 100 and first user is coming, then the statistic for that user will be lost. So we have status for lost counters as well. So there you can see, OK, now I'm losing the statistics for six or seven users. So I need to increase the buffer size. Have you done any benchmark versus impact? OK, so yes. So there was the main concern in 5.5 when we introduced this thing. So there are a lot of guys asking, it will have impact on the performance and all those things. So we did benchmark about that. So I settled 1,200 plus activities. So we don't enable all those activities by default. We enable some selected activities by default. If you don't configure performance, there are few activities which are enabled by default. And with those default activity, we did the benchmark. The performance impact was less than 5%. But obviously if you enable everything, then there will be significant impact. But if you are very careful and you know what to do, what not to do, then you can disable everything, then cherry pick what you want to enable. There will be negligible performance impact. Questions that we have? You guys go and use this statement histogram. This is very useful. Thank you so much, sir. Thank you. Thank you.