 Hello, my name is Sveta Smirnova and today we will be talking about my school performance for DevOps. At first I want to introduce myself. I am my school support engineer with more than 10 years of experience. I am after a book, my school troubleshooting, I have a few hobby projects on GitHub and I am frequent speaking. The idea of this topic is raised by various communication between people who use MySQL but who are not DBA and who use it either from different time and different people. It could be application developers, it could be system administrators and today now people who actually wear many hats like who called DevOps. Another thing which raised me for this topic once I was talking about interviewing someone for support team, for working with MySQL and actually we in support we know a few things, a few very clear things about which hardware should MySQL use, which hardware options, which MySQL options by itself are safe and are good for performance but as expected people who are experts in other areas do not know about this secret knowledge and now I want to show you these options, these configurations which will improve your MySQL performance. But before talking about this secret knowledge I need to tell a little bit about how MySQL works. So what is MySQL? MySQL is database server with 25 years of history. It has a popular forks. This is a 100 compatible fork with enterprise and troubleshooting and performance features. And MariaDB server which started as a fork but now it's going to be an independent product with similar features like optimizer or replication but which are designed internally designed differently. MySQL has replication support from the beginning and this is important and I believe this is one thing which made it so popular. And replication is such a setup so that one service could be a source of changes, it can accept changes and other servers replicate these changes so they could be used for backup, for read scale and for many various purposes. And also it allows MySQL to be high available server from the very beginning. To understand how MySQL performance needs to understand that MySQL architecture is layered. It contains its part. And like in case when your application connects to MySQL server it uses some connectors. It could be MySQL command line client, it could be PHP, Python, Ruby, Java, C, program. It could be any program which uses OEDBC interface. Once that program connects to MySQL it goes to connection pool where authentication happens and some caches like if connection thread could be taken from the cache or could be created. Then your query goes to SQL interface which reads it, parser parses it into the tree which understandable by optimizer. And optimizer sends like decides how query should be executed. And sends this information to storage engine. Here it's also like could be used global cache and buffers. They could be global or engine specific. And if data could be taken from there it would be taken from there. And storage engines they talk with hardware. So practically with file system and they can also use memory and CPU to resolve the request. Where MySQL is installed in 2020. It could be started by hardware. It could be installed on cloud. And also it could be managed by container orchestration systems such as Kubernetes. For MySQL there are two Kubernetes operators. This is Percona Kubernetes operator for Percona X3db cluster. Percona X3db cluster this is a synchronous replication solution. So that you can have multiple nodes and you can write to them. And all writes will have the same data upon commit. And also exist MySQL operator. Maintained by Oracle MySQL team. So watch Alphabet's query performed. So it's when I usually explain how database administers database professionals. People who use database all the time during their working days. I'm telling them the first thing which you need to tune is query. Because query that's actually decides it's actually what you ask database for. So like your request you may request million of rows. You may run this query write it's effective manner or maybe not effective manner. So this all affects your application. But today since we are speaking about people who manage everything and probably sometimes do not have access to the application itself. We will start from hardware. Because hardware also makes a very good sense for MySQL. MySQL runs fast and good for hardware. And it's actually can run on even on Rapsberry API. But it could not handle on a small hardware. It could not handle a large amount of requests. Because if you want to MySQL handle more concurrent connections, process more data it's better to run it's even better hardware properly configured. And one more thing which needs to be discussed is overall MySQL configuration about watch because even if it's run on good hardware if it's configured not properly it could just like use small part of hardware possibilities and it would be not effective. So you need to understand how to configure MySQL properly. Okay, so watch is hardware. First how are we going to tune? For bare hardware you have operating system options. It's usually Linux kernel options. For clouds there is no much things you can do. It's actually it depends from again which contract do you have. Like for EC2 you can do Amazon EC2 you can do a lot. And for Amazon Aurora you can just use administrator console and maybe just buy additional options. For Percona Kubernetes operator for PXC it's a loss to limit resources. So like it decides and tell how much resources to use this. They could be defined as cr.yaml file and you can limit memory how much use and maximum usage and storage and CPU. A more short tune which hardware makes sense for MySQL to run fast. This is memory which has caches and directly access data. This is disk which practically stores the data. CPU which allows MySQL to support high concurrent environment. And this is network which is used for sending data between client and server. And first we will talk about memory because memory is a very important thing for MySQL performance. Memory configuration that's MySQL doesn't work when it's swapping. So you need to disable swapping. It's actually for modern kernels you need to set it to one in case swap is used only then it's critically needed by the kernel itself and not for other processes. MySQL supports NUMA but NUMA should be NUMA interleaved should be enabled in BIOS. MySQL for NDB also there is option. NDB NUMA interleaved this should be also enabled and it works differently on MySQL server from Oracle and Percona server. On MySQL server it just enables NUMA interleaved for MySQL and for Percona server it fills memory with zeros as a start up. That implies a longer start up but it makes also for a work of MySQL Percona server for MySQL faster. More memory for MySQL is better. That's because MySQL has optimizations which allows to use memory without with minimal disk access and then flash changes to the disk. And memory access of course is faster than disk access, therefore it's better to have more memory. Practically frequently accessed data should be in memory. It's not always possible because that depends completely from your application but if it's possible that's a good thing to have. How MySQL uses memory? MySQL supports three kinds of buffers and global buffers they are allocated up to the limit and freed at showdown. Important buffers is NDB buffer pool. This is NDB buffer pool. This is pool where NDB stores updates. Actually it could store all active data set and could operate all operations from buffer pool if you have enough memory for this. And then data which needs to be saved on disk it would be saved in background which allows very fast operations. Performance schema tables. These are diagnostic tables. These are just... Here for example they do not affect performance of MySQL by itself unless they grow too high and just use memory which could be used for some different things. But again it's data memory for such buffers it's just allocated and never freed until you showdown the server. Session buffers they are associated with MySQL connections. So then connection starts they are allocated and they are freed and then connection ends. This include for example like I put here example it's connection and result buffers. So it's practically two times by net buffer length it's a small value by default but they could grow up to a max alloyed packet. So that's again it's maybe not very thin. And also MySQL uses memory for operation specific buffer which help operations to perform fast. Good example could be joint buffer size it's for each table pair tables pair and join like if you are joining two tables this allocated only once per query if you are joining ten tables it allocated nine times per query so again like you may use it for improving query performance but you need to be aware if you set it globally that it can be grow very high. Temporary table size for memory temporary tables it's again it's for each temporary table. All this data which you can access in memory then finally will be saved to disk and second important hardware is disk. Disk MySQL faster would be better so use modern disk like I don't think spinning disk is an option an option today unless this is something when you can allow slow write since maybe only for some special purpose. Disk should support parallel write so that's you can gain from multi-threaded execution. It also should use battery-backed cache in order to don't lose data in case of power failure. How MySQL uses disk? It stores tables data that's obvious it also stores log files these include binary log file these are log files which log all updates and use it for the replication on the source server it includes relay log files which are using on replica server to receive updates this includes storage engine logs like in ADB mantains red log file which use it for crash recovery it's also stored on the disk and it also contains some diagnostic like error log, like audit, general slope, query log and other log files which you can use for troubleshooting it's also some, if in memory temporary tables exceeds the size of the maximum temporary table it's flushed to disk and again that's MySQL uses disk for this purpose you may put all these things on different disk to explore more parallelization and give more capacity of your hardware CPU uses it to handle parallel execution it's configuration it should be set to noob or deadline it's not for default which is for user system so for better performance CPU governor also needs to be set to performance and not like power safe or things like this more course is better it's more course is important because actually number of core files limits how much parallel connections could be executed how MySQL uses CPU it's for user connection it creates one thread per connections but they are mostly sleeping and CPU uses it only for active frets that means frets which are executing some useful job at the moment but ground work by storage engine uses frets independently and to demonstrate you how it works you may see the difference on this picture in this case on the top there are connection frets which are perceived by the server then they serve to the storage engine and the storage engine executes its smaller number of frets if active frets running frets is more than storage engine frets they have to wait in its queue and there are algorithms which allows to tune how the queueing will work so what happens with frets in MySQL with active frets in case if there are less than CPU cores they are executed simultaneously if not they wait in a queue even if they executed it's if disk supports parallel write if it supports it's write happens in parallel if not it's again frets has to wait in a queue and finally last part of hardware which I want to discuss is network network is network should be is for MySQL is important part and should be as fast as possible and you put need to pay attention for these free items this is speed of the line and pay also attention for RTT which is pin command prints for you about bandwidth that means that's like how much data you can send and receive at the same time and also stability and MySQL uses TCPIP protocol but in case if network is not stable it has to resubmit packages and to resubmit packages it's take time and it increases time for your clients to receive answers on the internet connection it's possible to run MySQL so what will happen? clients can't work they will just connect they will just receive data not as fast as on local network as soon as replica may delay it's actually I brought here will delay but actually may delay because well in case if it's source server has active right set so Synchronous replica will sort on the delay and Synchronous cluster like Percon X3DB cluster and also solution available since MySQL 5.7 is in the DB cluster they just would not be functional because with default options which is Synchronous cluster Synchronous cluster is one node receives and updates and it's a commit time before returning result to client it distributes updates to all other nodes receives answers and then only return result to client if it's done on internet connection on remote data centers that's very slow defaults are not supposed for date so you will all this cluster will be all the time partitioned but it's possible to change defaults but in this case your application will have very slow response time with this adjusted configuration so this is big node to use this I unfortunately see this in production environment but only watch I can say people who do it just stop doing it How MySQL uses network? It's uses for communication between server and client and actually that's everything serving client it's a regular client which like application and it's MySQL server it's a replication connection IO thread and traffic between the Synchronous node it also communication as server and client just in case of replication it plays different roles and while we spoke about how which hardware MySQL should use and how it should use so how to configure MySQL to get use of this nice hardware so watch can be configured it could be configured server options and component options like if you remember MySQL has loaded architecture so components include storage engines and main is in ADB that's default storage engines, multi-purpose it also has could be plugins like audit plugin for example it also could be server parts which are not plugable like optimizer or binary login on the source server scope of configuration it's could be global this is parameters necessary for all server processes such as location of data directory like shared buffers like in ADB buffer pool size and like this session is control connection specific parameters and most of operation specific options I also have session scope here it's working core code you may use it now and once my talk is done I believe my slides will be available on the conference website and certainly will be available on my slideshare and you can click this link and this is working lift and I recommend to you to bookmark this all MySQL options in one page so you can search for one which is interesting for you how to change MySQL variables you can do it online by running like set command if you omit any thing in square brackets it would set session variable if you use global it will use global variable and all new connections if it's also variable association variables all new connections will use new value and existing session will use old value but if it's like variable which affect all the session it will be changed immediately and science version 8.0 you may also use set persist command which not only change global variable but also store it in automatically generated configuration file so on the restart it will be used same new value you may also use command line option when start MySQL server and in the configuration file which is either in the default location that's different from different systems again link to the user manual or specified by option defaults file and in format first it's group name and then variable with value for px.cona.extraDB cluster operator configuration you can change it either in CR.yaml file just search for specifics for px.extraDB cluster and change configuration or you can use config map in this case create custom configuration file then create config map and restart px.extraDB cluster if you are using px.extraDB cluster note it can have to configure for you in adb buffer pool size and max connections but all other options you need to configure yourself and of course it will configure like data derelocation and things like this but other performance options they are just default of MySQL at least for now who can change variables dynamically global variables and few sessions variables could be used changing the user with special privilege called super in version 8.0 super was split into multiple privileges and now it needs to be system variables admin but also in 8.0 you can use super session options can change anybody 8.0 has restricted privileges there are not many like binary log format binary logging and something like this this could require privileges system variables admin and session variables admin not restricted options options could be changed by anybody it not that there is no limit and it's easy to change like if user doesn't use it wisely they could change it to very high value that you should be very careful when grant people access to MySQL server when buffers then are speaking about memory buffers it's also important to understand why when they are allocated it's force which change control behavior of whole server they allocated a server start up and can start with low values and then grow connection options for every connection when connection options and operation specific for every operation when needed but they could be allocated more than once like if you have joined buffers size it could be allocated multiple times and kill actually maybe even your server so which are important options first is like most popular storage engine it's in ADB in ADB buffer pool size that we talk a little bit about it ideally it should hold active data set so in ADB will perform all work in memory that's in ADB log file size should hold changes like it's approximate it's actually approximation it could be larger but at least it should hold changes for an hour because how it works then MySQL when in ADB like receives updates it changes data if it's possible if in ADB buffer pool size and writes updates sequentially to in ADB log file size that's very fast operation and once it has it's less busy it ground free it takes data from buffer pool and flashes it to tables in case if in ADB log file size file is small sometimes they are written in circle of fashion and then that's limit is reached in ADB starts aggressive flashing because there is no space to write to log file and in this case all your transactions are stopped and if in ADB log file is too small it's this situation could happen faster if it's set proper size in ADB has enough time to flash all changes to the disk and here which is a good way to find out if your log file size is proper size it use Percona product called Percona monitoring head management it's only open source monitoring system for MySQL and I am afraid for all databases it also supports PostgreSQL and MongoDB and for log usage you may see on this picture red line is total size of the log and orange bars is how much data was written during the hour and you see orange bars are much higher than red line so this is not good option and in ADB log file is set to low on this picture that's in ADB log file set to proper size to good size because all orange bars are below red line so this is very good visual way to find out if your log file size is set properly in ADB thread concurrency is either zero it's default and if you notice situation then your CPU usage is high but performance is low it may be because in ADB creates more threads than number of CPU cores and Linux kernel has to schedule job between this CPU cores in this case it makes sense to limit up to number of CPU cores this is actually how much parallel threads in ADB can run concurrently in ADB our capacity is with which speed in ADB will write to disk its default is too small for nowadays because it's set for spinning disk so you can set it up to number of IOPS your disk can handle it's not exact number because it also depends on how much data you write and some other factors should be tested but that's rule of thumb it's like this very good explanation actually is official manual and another option is in ADB flash method recommended option is most cases is o-direct that means in ADB writes in flash time it writes to disk and not store it in cache it's safest option most times it's fastest option also but test on your file system because some file systems do not work good like SNSUN they do not work good with o-direct so test it and this is why it's not default flash method another option which can affect performance it's synchronization options but this option can compromise durability so use it like better to change it on read-only replica or like if you can lose data and I will show you how and what's the change first it's default value about in ADB flash log at TRX commit is 1 that means what it says it's flushes read the log files it's where flushes at each transaction commit so that's for full ACID compliance and you would not lose any transactions in case of power failure or server operation system restart or also not graceful restart of MySQL in case if it's set to 2 logs read at each commit and flush it per seconds but you need to remember so it's one second it's UMSC that's very short period of time but MySQL can handle up to 1 million inserts per seconds so that's in source it could be a transaction in case of after commit is used so this is not like you when you set it to 2 you need to understand what you are doing but it's safe with synchronous clusters because in this case when you are doing commit commit also another nodes and so that's like it's unlikely all three machines will die when before flush happens 0 it's also less strict with 2 it's logs are reading and flush at once per second and what is important is once per second is not guaranteed for 0 and 2 so it's sometimes it could be faster flashing but sometimes it could be delayed and so you will lose not just 1 million inserts but more inserts another option is syncing log that's synchronization of binary log changes default is 1 that's again it's a transaction commit binary log get synced to file system in this case it's no transaction lost but it may imply performance penalty like up to 10% of speed for high concurrent environments so that's to relax it you may use option 0 when synchronization handled by system but this option you have no control how many transactions you can lose or you may see set it to N that's after N binary log group commits it's groups of transactions which could be flushed together which were executed together in this case of power of operating system crash in this case we are not flushed this not flushed transaction could be lost from the binary log you may only rely on engine but it would be really hard to restart so you need to understand which number you can set here but you may set it to some if you can lose some number of transaction you may set this to positive value these options they are free almost free that's table open cache that's a number of open tables of all frids it's actually file handle for a table so that's increase them when you are syncing the connection in process these are waiting for opening table value of opening tables larger than open tables since it's a file descriptor it's just 4 kilobytes sorry it's not 4 kilobytes it's even less it doesn't use as much resources and resources and you may see it's very high table definition cache is size of the cache for table definitions you may increase then value of global status variable opening table definitions is larger than open table definitions and it should be less than table open cache but it's safe to set it high and you may also increase operating system open files limit if needed so MySQL can handle so many open files and finally query tuning on the server side so like I told you that query is hard of your application because you communicate with database using queries even if you are using NoSQL and MySQL supports NoSQL interface but they are still queries they are not SQL queries but still queries data which you request matters so if you request one million rows do not expect the query would work as fast as one if you request one row even if this one million rows goes into temporary table so query execution workflow is a query send it goes to connection pool optimizer creates a tree which is sustainable by optimizer and optimizer created a plan how it would be executed and storage engine to execute the query you may not be able to change the flow query it's because it selects for example just selects a lot of data and this is data you need or maybe it badly written like left join and stuff inner join like retrieving large results and discarding it's not effective SQL at least for particular MySQL version but if you do not have access to the application it cannot rewrite it you have to rewrite too many parts of the application you still have to deal with it but we still can't improve performance and let's see what we can do and first thing which to do is indexes to show you the difference which indexes are let's see a query which selects name from users which selects all the users and select name from the users where ID equal to we actually need only this record but this query selects same rows in the table after index added it selects just one query so what is the index? index exactly like in address book sorted order of unique values in the table which points to if table has not unique index it points to tables data MySQL supports different types of indexes but what is important you need to understand that well kind of in the DB supports B3 we are able to speak about B3 and they left part ordered it so how to create an index so use just create index command for single column you can also create on multiple column and you can also use alter table comma alter table add index you can specify index name and you can write so when MySQL can use indexes it can use indexes to results very conditions so like if column is equal to value and column is could be in set of values and command is similar like like is value and on the right side could be anything but since MySQL used B3 or B3 kind of compatible indexes this query would not use index because it shows on the left part and indexes are sorted from the left to right same in case if multiple multiple column indexes MySQL will use indexes where left part equal value or right part like this it would use indexes with left part equal value or right part part equal something else but would not use indexes for the last row because it has to resolve this value first for joints it also will use indexes for this expression then this is same as this query it's just written differently from previous example you may understand that index could be used for group buy it would again you can use indexes for group buy can use multiple columns indexes if group buy left part first but would not use if group it by right part first and would not group buy in different indexes same applies for order buy it could be sorted by an index by left part first and right part second but would not use indexes in other cases it's in 8.0 in 8.5.7 or before order buy left part dischanging it could not be used by 8.0 supports indexes which could be created with dischanging k-word so in this case for this query it also will use indexes for expressions it will use index if built in deterministic expression are used it's same as which is returned same value for same argument like floor use for non deterministic expression like such as run it would not be used for storage functions in UDFs but my school supports generated columns and if you need to use them and still want your query perform fast you can use generated columns so how and when add indexes first identify which queries do not indexes use indexes for example use performance schema status variable it also available information and for this variable name of this status variable it shows some problem with the query I would not explain in details what all this means but it shows problem of the query and these queries are candidates for optimization handlers is actually shows which actual work done for storage engine and handler read R&D next if it's big that means query doesn't use indexes so that's probably also candidate for optimization performance schema when statements will show you statements which which are not effective it's again like for status variable I selected here all values which has shows problem and one important fund if row examine is much greater this row send and you're like requesting small amount of rows this is symptom of query optimization issue in slow query log you may also use this metrics you may also use in standard slow query log in Percona server NPT query digest which could show problematic queries PMM also you can use like no index used and you may use this column and search for for badly optimized queries To add indexes analyze if adding an index will help it's better to test it on the test server and if query improves just add it then add the index but note that add index is expensive operations so test first and on production use PT online schema change which allows to add index without interrupting your other other operations another option is to use histogram statistics which available science version 8.0 it's collected and used by the optimizer it could be examined in information schema it indexes for example they are collected and used by storage engine it affects execution plan but not access it can change like this query select from A join B can change to select from B join A but cannot change like some like to use different indexes for example it's more details it's a detailed webinar on histogram statistics and how to use them and if you are curious you may check and you may want to watch to show you quick difference between indexes so indexes like we have table with different values number of data and index this when optimizer discusses communicates with storage engine which index needs to be used and used value cardinality and it's cardinality is the same for all rows for all values no matter how how much data each value has so this could be for example has multiple indexes it could be used like query can decide to use wrong index for example for histograms it shows like for the same data it shows difference between each bucket so this is exactly like what is the difference between data in this table and could be make more wise more detailed decision so create a histogram when data distribution is far from uniform it's query accesses to more tables and indexes cannot be used another option is on the server size it's an optimizer configuration I would not explain what it means but you may experiment, you may search box database and you may search which you may find out which optimization options affect your query so how to work with it it's turn on and off particular optimization it's tested it could be not helpful because because some optimization could be not helpful for queries specially tuned for previous versions then turn off try set optimizer switch turn off select, explain select and compare which with default value and and again if it's help, implement and queries it's better to use this syntax because default value works for most of the queries and this tuning needed only for queries which are not good for optimization and final part is also increase size for optimizer temporary objects such as temporary tables you may also change default temporary storage engine to use one which better works for your queries but usually it's not needed buffers for query execution that's joint buffer size it's for joint conditions not using indexes read buffer size is for question indexes for order buy for question results of nested queries and pull concert into partitions this one is for multi-range read optimization this is for select into out file or down file search session and this is for sort operation order buy and group buy but be careful change this only at the session level because if changing it globally it may lead to increase memory usage which could be used more wisely for different operations to conclude that's hardware more memory is better disk use modern disk faster is better for more course is better for concurrency for network use high speed possible these are hot options for NADP and for server it's again slides will be available after the conference and you can download and you can use them and these are query performance options which you can explore for for tuning your queries while you cannot actually change the SQL more information there are first two links are links to the webinar they are available in Percono sites about how to troubleshoot it's not we are talking about how to get best of of your hardware and configuration but webinars about how to troubleshoot if you are seeing some issues or some problems with that MySQL query tuning for the ops it's a similar talk but about only query tuning it has more details on how to tune queries it's again for not MySQL professionals Percono monitoring head management is linked to the graphical software which I use it for illustration and final link is Percono Kubernetes operators these are my contact information I will also upload slides to slide shareer and we'll announce everything on Twitter. Thank you.