My name is Alex Kuzminski, and today I will be talking about MySQL indexes. I was supposed to give this talk together with my colleague, but he had emergency data recovery work — well, that is what we do in our normal life — so sorry about that. This talk is about MySQL indexes, and you may wonder: why another talk about MySQL indexes? I'm glad you came; you probably have your reasons. What I want to do here is give you some insight into how indexes in MySQL are built and how MySQL uses them to produce query results. The point is, if you can picture what is going on underneath, you can write efficient queries straight away and make your MySQL database more performant. A while ago I worked as a consultant at Percona, as did my colleague; now I work at Dropbox, and together with my colleague I also run TwinDB, which does data recovery, backups, disaster recovery, all that kind of work for MySQL. So here we go. I highly encourage you to ask questions; if you need me to go into more detail somewhere, feel free to ask, although there is a Q&A at the end, so it's up to you. There are two big parts to this talk. In the first part I will explain B+ tree indexes: what they are and why this design was chosen for MySQL, for InnoDB, back in 1994. In the second part I will illustrate, on a Sakila table, exactly how MySQL executes different kinds of queries, so you have a better understanding of what is going on. Okay. Many times, back in our consulting days, we heard stories where an application hit the limits of MySQL and people suggested migrating to something else. But after a careful look we would find out that, okay, you are not using MySQL the right way; you just have to tweak something, and here you go, the database performs much better now.
So let's see how data is organized in MySQL. I will be talking about InnoDB only; at some level this applies to MyISAM too, but in my opinion MyISAM is not relevant now, so let's just drop it. The foundation of InnoDB's data organization is the B+ tree. This picture is from Wikipedia, maybe it's familiar. Note that a B+ tree is a balanced search tree where each node holds many keys, not a binary tree. The B+ tree has multiple properties and features that were useful for data organization, and it was a good choice when InnoDB was designed. The key point is that, like any other tree, it starts from the root, and the data is clustered together with the key values in the leaf nodes of the tree. This tree is quite optimal for searches: its search speed is O(log n), and inserts, deletes, and updates have the same complexity. In a B+ tree all the data is clustered together with the keys in the leaf nodes, and this is one of the reasons InnoDB requires you to define a primary key: the primary index is exactly the data structure where InnoDB keeps the table. The InnoDB implementation of the B+ tree is a little different from the classic one, and one of the differences is the doubly linked list of leaf nodes. In the classic B+ tree there are pointers only to the next leaf page; in InnoDB this list is doubly linked, so InnoDB can perform queries like ORDER BY id and ORDER BY id DESC equally optimally — any order will work fine because of that. It's also important that the keys, the primary key values, are sorted inside the tree and inside each node. This matters for queries like range scans and so on. And we have the first question, please, sir.
Yeah, right. Let's say you have a query like SELECT * FROM table ORDER BY id DESC. In ascending order you can access a node, get values for the result, and output them as you go; but if you want descending order, you need a link to the previous node. Yes, this is to highlight the difference between the classic B+ tree and the InnoDB implementation — it is different indeed. The B+ tree also has the property that the height is the same for all leaf nodes, which means that no matter where our key is, at the beginning of the table or at the end, we can access the value with the same number of lookups. Yes, question? Yes, that is what happens when you insert new records or delete records from the table: you have to rebalance the tree, because the B+ tree is a balanced tree. But what is good about the B+ tree is that you don't have to do that on every insert, because one leaf node actually contains many rows, and you don't have to rebalance until the page is full. After it is full, you have to split the page and rebalance, but that's still optimal. I'd also like to note that the B+ tree was chosen for InnoDB in 1994, when InnoDB started; it was the era of spinning disks, and the B+ tree is good for accessing data on those disks, because the B+ tree structure is very shallow — the height is very small even for large data sets — and if you need to read through a table, it's all sequential access.
This is what rotating disks are best at. Oops, sorry. So, we talked about reduced rebalancing. The B+ tree is efficient for range scans because all data lives in leaf nodes, the nodes are connected to each other, and when you do a full table scan or a full index scan it's all sequential reads from disk. The data is also sorted by the key, which MySQL benefits from for all kinds of queries where you need sorting, or GROUP BY, and so on. I noticed there was a question? Yes, please. I'm not sure I understood the question — what do you mean? These are just a few of the properties. To compare the B+ tree with other data structures, one example not covered here: for some trees the insert cost varies, and the worst case and the average case are different, while for the B+ tree they are the same. That means you have predictable performance, which is good because you can expect, for example, to meet an SLA for query response time. Okay. As I said, B+ trees are very shallow, and that means MySQL has to do fewer comparisons to reach a leaf node. To get to the leaf node InnoDB has to do random lookups inside the tree, and because the height is small, we have to do fewer of them. Here is a formula to calculate the exact height of the tree. I would highlight, first of all, that the height grows slowly with a growing number of records. Another thing: it's important to keep keys as small as possible, because then we fit more keys in each intermediate node; that means our branching factor is higher, and that means the height of the tree is smaller, which is good — the smaller the tree, the fewer random lookups. To illustrate the formula from the previous slide, I calculated the height for some numbers of records, and here is what happens: for 1 billion records, if your primary key is a four-byte
integer, the height is only three. That means you have a huge one-billion-record table, and to access any value you have to do just three random lookups; this is very good. Compare that with longer primary keys: for example, some people use UUIDs. That's okay, but to understand the implications, consider that on a one-million-row table InnoDB would have to do 50 percent more random lookups. That can be critical if you are I/O bound and have to do a lot of these reads. Makes sense? So now I'm slowly moving to the second part, where I will explain all the access methods InnoDB uses, based on this table. This is a table from the Sakila database, the reference database; it has a primary key and one secondary key on last_name. What does that mean? When you create this table, you get an actor.ibd file, if innodb_file_per_table is on, and inside this .ibd file you have two B+ tree structures. One B+ tree is the primary index; this is where you have all your data: the key is actor_id, and the values are all the other fields — first_name, last_name, and last_update. There is a secondary key as well, and that means a separate B+ tree structure; the key in that index is last_name, and the value is actor_id. So when you create the table and populate it with data, the .ibd file contains these two structures. Okay. Based on this example I will review all the access patterns, and here I suggest a rule: if you can produce the result while you are accessing the data, it will be fast. In other words, if you don't have to do anything additional — no temporary tables, no additional sorting — if the client requires a query result and you can produce it just by accessing the indexes, it will be fast, because in that case you spend the least resources to produce the result. I hope it's not too confusing, and in the next slides
I will illustrate this rule, so feel free to ask questions, and don't sleep through my talk, please. Okay. The simplest case: point lookups. We need to get a record by its primary key — our actor_id is 3 — and that happens very quickly, because B+ tree search is fast: we quickly descend from the root node to the leaf node and get our record. If we need to do a range query, that's also optimal. See what happens: InnoDB finds the record with id 3, and it doesn't have to do any additional lookups; it just reads all further records, and that is the result of the query. We don't do any extra data processing; we read records from the index and send them to the client, that's it. Now let's see what happens when we query the table by the secondary index — say we need to find all the Allens. Remember, we have these two indexes: the primary index and the secondary index. We can quickly find the ALLEN keys in the secondary index, and the value stored there is the primary key. So if we need to output first_name, last_name, and last_update, we take the actor_id, go to the primary index, find the record with id 118, and output it; the same for the next record: we know actor_id 145, we find it and produce it. So see what happens: we quickly accessed the leaf node and did a range read, but we also did random lookups into the primary index, and sometimes that makes a huge difference, because if the table is huge, it leads to a lot of random reads from disk. Now let's see for which kinds of queries we can use the secondary index only, in the most optimal way. One example is SELECT COUNT(*): if we need to count the Allens, we don't really need the primary index, right?
Because we have the data in the secondary index: we read from it and we can count, that's it. This is what is called the "Using index" optimization, and if you run EXPLAIN against this query, it will show "Using index", which means exactly that: to produce the result we are using only the secondary index, not the primary index. This is a very good optimization because we avoid all the random lookups into the primary index. Covering indexes help to use the same optimization. Say we have a query like SELECT first_name WHERE last_name = 'ALLEN'. With our old secondary index we would have to do those additional primary key lookups, right, because we need first_name, and in the secondary index we have only the actor_id. So we do a little trick: we create another secondary index and include first_name in it, and although we do not include first_name in the WHERE clause, we still benefit, because we can use that first_name to produce the result of the query. Again, if you run EXPLAIN you will see "Using index". This is good: you sacrifice disk space, because you need either a bigger index or an additional index, but you don't do random lookups, and that's good. You cannot use both indexes at the same time, so the MySQL optimizer will decide: okay, I will use the secondary index on last_name to access the data, and then it has to do random lookups into the primary index to get first_name for each record. The secondary index on first_name will not be used at all here.
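The covering-index behaviour just described is easy to try for yourself. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (the index name and the rows are invented for the example); like MySQL's "Using index", SQLite's query plan reports when a query is satisfied from the index alone.

```python
import sqlite3

# In-memory database with a table shaped like Sakila's actor table
# (rows invented for the example).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE actor (
    actor_id   INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT)""")
conn.executemany("INSERT INTO actor VALUES (?, ?, ?)",
                 [(118, "CUBA", "ALLEN"),
                  (145, "KIM", "ALLEN"),
                  (7, "GRACE", "MOSTEL")])

# Covering index: last_name drives the WHERE clause; first_name is
# included so the query never has to touch the primary index.
conn.execute("CREATE INDEX idx_last_first ON actor (last_name, first_name)")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT first_name FROM actor WHERE last_name = 'ALLEN'""").fetchall()
print(plan[0][-1])  # plan line mentions a COVERING INDEX, no table access
```

The exact plan text varies by SQLite version, but the words "COVERING INDEX" play the same role as "Using index" in MySQL's EXPLAIN: the result is produced from the secondary index without primary-key lookups.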
I see. Yes, actually you can benefit from that index if you search for KIM ALLEN: you can use this index efficiently. If instead you have two separate indexes and the same query — WHERE last_name = 'ALLEN' AND first_name = 'KIM' — then MySQL has to decide which index to use, because again, it cannot use two indexes at the same time for the same query. It will choose by index cardinality which index is more selective, and it will use the more selective one to access the data. But then it has to filter out all the extra records where first_name is not KIM, and you will see "Using where" in that case; that means MySQL has to do extra filtering to remove the unnecessary records. Makes sense? Let's go on. Another kind of query that can benefit from secondary indexes is SELECT DISTINCT. If we need to find the distinct, unique last names in the table, we can use the secondary index for that, and it's very easy: we read many Allens but throw them away until we find another last name, and because the keys are sorted, we know we are not going to meet ALLEN again further on. Makes sense? Again, "Using index". For GROUP BY, see what happens if we have a grouping function like COUNT: it's similar to the previous query, but we don't just output the unique list of last names, we also count them. We can use the secondary index for that too, because we do an index scan and count while we read. It can be MAX here, or MIN, or whatever, and it will still benefit. What I like most about this way of picturing indexes and how they are accessed is that, once you picture indexes in this form, you can understand even complex things like the loose index scan. Yes, please, a question? For MAX()? So what?
No, you will still benefit from it, because see what happens — oh, MAX of what? Ah, now I understand what you mean. If it's MAX(actor_id), then yes; if it's something else, then no. But if you have an index like this one and you do last_name, MAX(first_name), it can use it, right? Because for any given last_name, first_name is sorted as well, so the last record will be the maximum and the first record will be the minimum. Yeah, good catch, thanks. So, the loose index scan — what is it all about? Let's add another column, rank, and fill it with some values, so we have actors with their ranks. We build this index on (last_name, rank), and this is our primary key. The question now is: let's find all the losers, right? We find records by last name, and for any given last name we find the person with the minimum rank. We can use the secondary index for that, and this kind of optimization is called the loose index scan. See what happens: for any given last name we have to find the entry with the minimum rank. Take AKROYD: because the entries are sorted by last name and, within each last name, by rank, we know that the first AKROYD record has the minimum rank. So we know this record belongs to our result, and then MySQL can choose either to do a range scan or to do a loose index scan, which means it does another lookup for the next key. That helps avoid large index scans: you just read this row, this row, and maybe some other row from the secondary index, and that's it. You don't scan the whole index; you just pull some selected records.
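A minimal sketch of that loose-scan idea in Python (plain tuples stand in for the B+ tree leaf entries; the names and ranks are invented): because the index is sorted by (last_name, rank), only the first entry of each last_name group contributes to MIN(rank). Note that `groupby` here still iterates every entry, whereas a real loose index scan seeks past each group; the point of the sketch is only which rows end up in the result.

```python
from itertools import groupby

# Leaf entries of the secondary index, laid out as InnoDB keeps them:
# sorted by (last_name, rank); the third field is the primary key.
index_entries = [
    ("AKROYD", 10, 92), ("AKROYD", 40, 58), ("AKROYD", 75, 182),
    ("ALLEN",  15, 145), ("ALLEN", 60, 118),
    ("BERRY",   5, 91),  ("BERRY", 99, 66),
]

def loose_index_scan(entries):
    """SELECT last_name, MIN(rank) ... GROUP BY last_name:
    within each group the entries are already sorted by rank,
    so the first row of the group holds the minimum."""
    return [(last_name, next(group)[1])       # first row = minimum rank
            for last_name, group in groupby(entries, key=lambda e: e[0])]

print(loose_index_scan(index_entries))
# [('AKROYD', 10), ('ALLEN', 15), ('BERRY', 5)]
```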
So this is the loose index scan. MySQL benefits from indexes being sorted by their keys for queries like this: say you have this secondary index again, and you need to output all the Akroyds ordered by rank. You can do that cheaply, because you already have the order — the index is sorted by rank — so you don't do any additional sorting. Yes, sir? MySQL — well, InnoDB — tries to reduce the impact of what you said. You're right: when you update the rank, this index has to be updated, and yes, it can shuffle pages, change the tree, to reduce the impact. For that InnoDB uses the — wait, the insert buffer. Huh? No, no, the InnoDB buffer pool is the whole cache where all your data lives; I mean the insert buffer, sorry. Yes, the insert buffer, but in 5.6 they renamed it to the change buffer, because the same thing happens when you update or delete records: secondary indexes have to be updated as well. So InnoDB uses the change buffer to reduce random writes to disk: secondary index changes accumulate in the change buffer and are flushed later in batches. It doesn't work for unique indexes, though — the change buffer doesn't work for unique secondary indexes, because you have to ensure the uniqueness of the index. The B+ tree itself doesn't require uniqueness; keys can repeat. So uniqueness is a feature on top of the B+ tree, right? This slide can also illustrate something.
Maybe you remember the common advice that in the WHERE clause you should put columns with a greater-than or less-than condition on the rightmost side: a clause like WHERE x = a AND y > b will use the index, while WHERE x > a AND y = b will not use it fully. Look at the index: if we had WHERE rank = 40 ORDER BY last_name, could we use this secondary index? No, we couldn't. It just doesn't make sense, because the index is sorted by last name first, and then, for each last name, by rank. Question, please? Exactly, exactly — this is why you cannot do LIKE '%value%': if you have an index on that field, the keys are sorted, say they are strings, and you are missing the beginning of the string, so you cannot find the records; you have to scan all of them. Yeah. So, joining tables: in MySQL, tables are joined in a nested loop. If you need to join three tables, MySQL takes one table, and for each record in the first table it finds the records in the second table that satisfy the join clause; then, for each record of that result, it searches for records in the third table. To do these searches MySQL uses secondary indexes if it can. But in this case we want to output all movies with the actors starring in them, ordered by title. I'm not sure you can read it — this is the best I could do in terms of formatting — but the thing is, MySQL decided to take table actor, join it with film_actor, then join film, and then put the result into a temporary table and do additional sorting, because we need to order by title, and title is a column of film, not a column of actor. So it has to do additional sorting, and remember that rule: this query breaks the rule, because it has to do additional sorting.
It cannot access the table and produce the result at the same time. But the MySQL optimizer decided to do it this way, most probably because it thinks table actor is small, so it's no big deal: "I will do additional sorting, but I will read fewer records." Let's force MySQL to do what we want. We know that title is a column of film, and we want to access records in sorted order, so we force MySQL to use the index on title to produce the result, and then for each title we join the other tables. That gives us the result, and at least we do no additional sorting. Yeah, it looks worse, right, because it has to read more records — but it doesn't use a temporary table and doesn't do a filesort. As for the response time: by the way, you can easily reproduce this, because the Sakila data set is freely available, and I managed to reproduce it on 5.6.25 or thereabouts. The first query is five times slower than the second one, just because the MySQL optimizer chose a bad index and a bad way to join the tables.
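The nested-loop join described a moment ago can be sketched like this (a dict stands in for the secondary index used for the inner lookups; the table contents are invented for the example):

```python
# Outer table: actor rows (actor_id, last_name).
actor = [(1, "AKROYD"), (2, "ALLEN")]

# Inner table: film_actor rows (actor_id, film_id), plus a dict that
# plays the role of the secondary index on actor_id, so each inner
# lookup is an index search instead of a full scan.
film_actor = [(1, 10), (1, 11), (2, 10)]
fa_index: dict[int, list[int]] = {}
for actor_id, film_id in film_actor:
    fa_index.setdefault(actor_id, []).append(film_id)

# Nested-loop join: for every row of the outer table, find the
# matching inner rows through the index.
result = [(last_name, film_id)
          for actor_id, last_name in actor
          for film_id in fa_index.get(actor_id, [])]

print(result)  # [('AKROYD', 10), ('AKROYD', 11), ('ALLEN', 10)]
```

With a third table the same pattern repeats: each row of this result would drive an indexed lookup into the next table, which is why join order and the available secondary indexes matter so much.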
So, yeah, you can check that by response time, but you also want to check the handler counters. The handler interface is the API between the storage engine and the MySQL server part, and if you run SHOW STATUS after the query, it will show you counters like Handler_read_key: how many times MySQL asked the storage engine to read a key. So let's compare them. First of all, you see that in the first case MySQL accessed indexes about 20,000 times, and on top of that it did some writes — writes, for a SELECT — because of our temporary table. In the second case it did only about 13,000 lookups, so 13,000 beats 20,000, and the response time confirms that the optimized query took five times less time. Any questions? Well, it's a long story with the MySQL optimizer. Both the Oracle team and the MariaDB team work hard to improve the optimizer, but at the end of the day the optimizer is a stupid computer, and we humans know how to access our data in the most optimal way. We cannot exclude the optimizer, because MySQL has to decide how to access the tables, right? That is the job of the MySQL optimizer. Sometimes it makes mistakes, as you see — it makes mistakes even on small reference tables that have been around for many years. This is life; nothing is perfect. More questions? Yes, please? Exactly. You have to understand that the index is not sorted by your second column: it is sorted by the first column, and then, for each value of the first column, by the second column. So yeah, you're right.
You cannot search by the second column using that index, because the index is not sorted by that column. The way to work around that is to have two indexes, one on each column. If you query by the first column, it will use the first index; if you query by the second column, it will use the second index; and if you do an OR — first column is this OR second column is that — MySQL can even do a union: it gets all records for the first part of the clause, gets all records from the second index, merges the results, and gives them to you. So it can do that. What is important in this talk, I believe, is to understand what is possible, because if you know for sure that something is possible, then most probably it is implemented in MySQL, and if not, you can give a hint to the MySQL optimizer, or you can file a bug: hey, why is this query not optimal even though I have these indexes? Right, right — you have to make sure MySQL can still use the column; you have to define proper secondary indexes for that. Yeah, sometimes, yes. So you have to have the indexes, so that even if you force the join order, MySQL can still use them, because, as you said, if you change the order, some indexes may no longer be usable. You have to either create indexes or maybe change the join clause. To join tables you need secondary indexes anyway. Question, please.
Yeah, the thing is, it really doesn't make much sense to increase the join buffer, for several reasons. First of all, if you access the tables in a non-optimal way, you will end up with more I/O anyway. Another reason is that if the buffer pool is large enough that your data set fits in it, then you don't need the join buffer, because the data is already in memory, in the buffer pool. I cannot remember cases where changing the join buffer improved performance significantly. Maybe in rare cases, when you join large parts of the second table, it will help, but I'm not ready to give you an example. And this is the danger with per-session buffers — the sort buffer, the join buffer, and some others: they are allocated per session, and if you have many sessions, multiply that amount by the number of connections. MySQL, and InnoDB in particular, doesn't check whether a memory allocation succeeded, so if you run out of memory, most probably MySQL will just crash, because at some point it will dereference a null pointer, and there you go — crash. Thank you, and one more question, please. Mm-hmm. Yes, you can: if the join clause is right, it can use a secondary index in the same table, yes, why not? I'm not sure what kind of — yeah, I wasn't a hundred percent accurate when I said a query cannot use more than one index; that's not strictly true. I gave the example of index merge, when for the same query MySQL uses two indexes: it gets part of the result from one index, gets part of the result from another index, and then merges the results.
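That index-merge union can be sketched like this (each index maps a key to primary-key values in sorted order; the table contents are invented for the example): merge the two sorted primary-key streams and drop duplicates, since a row matching both conditions must appear only once.

```python
import heapq

# Two single-column secondary indexes; for a given key each yields
# primary-key values in sorted order.
last_name_index = {"ALLEN": [118, 145, 194]}
first_name_index = {"KIM": [66, 145]}

def index_merge_union(ids_a, ids_b):
    """WHERE last_name = ... OR first_name = ...: merge two sorted
    primary-key streams from two indexes, dropping duplicate rows."""
    merged, previous = [], None
    for pk in heapq.merge(ids_a, ids_b):   # single sorted pass over both
        if pk != previous:
            merged.append(pk)
            previous = pk
    return merged

print(index_merge_union(last_name_index["ALLEN"], first_name_index["KIM"]))
# [66, 118, 145, 194]
```

The resulting primary keys would then be looked up in the primary index to produce the full rows.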
That's another example of MySQL using multiple indexes to produce a result. And when joining tables, again, you use at least three indexes to join three tables. If you join a table to itself — exactly, that's what happens: to join the table the second time, you can use any index the table defines. There are no limitations, including foreign key indexes, because a foreign key forces the creation of a secondary index, so effectively it is a secondary index anyway; yes, you can use it. Thanks. A question about low-cardinality columns? Well, it depends on the situation, because even if an index is not very selective, you can still benefit from the LIMIT optimization. For example, if you need just ten men: you have an index on sex, the classical example of a low-cardinality index, but you just need ten, so MySQL will access the index, read ten records, and stop reading. That query will be efficient even though the index is not very selective. There are many cases; it depends on the situation, and I don't have a general answer — you have to be as specific as possible. It's good to double-check anyway, but this kind of situation is handled, and the LIMIT optimization is well known to the MySQL optimizer, so it will probably use that low-cardinality index to access the data. Any more questions? No? Okay, then thank you very much, guys, for the questions.