 Good afternoon, everyone. I am Kapil Agrawal. I will be with you for next one, our discussing Database Scalability and MySQL. So the outline for the discussion will be scalability, partitioning, sharding, OLAB and OLTP, and various SQL commands. So with any large applications, we are having demands which are continuously changing. And with time, we have to develop new features. And the changes, demands that are required, like the computation demands for the new features are unpredictable. And if our application is slow, so it can cause loss to the business. Like if our application is slow, so the user will not revisit our site again. So it will cause loss to business. If database is not available to the applications, so application will not run. So there is a requirement that database is available 24x7 with zero downtime. What we are going to discuss is scalability of databases. What does scalability mean? Scalability refers to a scenario if we are increasing the resources. So our application should scale. It should provide improved response time and improved throughput. Like if we are increasing the number of users, then if we increase the hardware, our user should get the consistent performance. Like currently, if we are supporting 100 users and if we double the hardware, so our application should support 200 users. And if the number of users are constant and if we are improving the resources, like increasing hardware, so the user should get better response time. Like currently, if he is getting response in half a second, so if we are increasing hardware, it should get response in 1 by 4 seconds. So scalability can be organized in two ways, like vertical scaling and horizontal scaling. In vertical scaling, what we try to do? We will try to improve the efficiency of the existing hardware. Like currently, if we are having the i5 processor and if we want to scale our system, so we will provide the hardware which is of higher quality. Like we will increase the processor to i7 and increase the RAM from 8 GB to 16 GB. That comes under vertical scaling. So vertical scaling is of two types. One is scaling up in which we are going to increase hardware and other one is scaling down in which we are taking out resources from the system. Like we are degrading the hardware. Like currently, we are having i5. We are moving to i3. They are various merits and demerits of vertical scaling. First of all, if you are doing the vertical scaling, like if you are moving from i5 to i7, the code you have written earlier, you need not to make any modification in that code. Your existing code will work on the i5 and i7. In the case of vertical scaling, you will have a single machine to manage. Like earlier, you are having one machine and you are improving the resources of that machine. So you will have a single machine to manage. And it is easy to set up a single machine. With merits, there are various demerits. Like if you are going to improve the configuration of the machine, you are adding new hardware, so you have to reconfigure that the machine. You have to start the machine, so there will be a downtime. And there is a limitation. Like with vertical scaling, you can scale up to a certain limit. Like currently, you are having i5. And if you want improved performance, you can move to i7. And in future, there will be some higher processor that you can use. But it will be up to a certain limit. If you want to improve the performance by hundreds of scales, then this approach will not work. Because you don't have the CPU that is having 100 times performance compared to i5 processors. The amount of money that is required in the initial setup will be high. Like if you have by a processor that is cost around 1 lakh. And if you want to upgrade, you will try to reference that thing. You don't want to upgrade it continuously. Because it involves high cost. You have to buy high processor, which costs more. And if you are going to improve processors, in that case, you have to improve many other things. Like you have to improve the motherboard and other various components that are used for supporting that processor. So the other type of scaling is horizontal scaling, in which we are going to add more hardware. Earlier what we were doing, we were trying to improve the resources of the existing system. Like we are moving from i5 to i7, or moving from 4GB to 16GB. But what we do in a scale out, instead of improving the performance of the existing system, we add more number of nodes. It is like distributed computing. Earlier we were having one servers. Now we will add more servers. Horizontal scaling is of two types. One is scaling out, in which we are adding more hardware. The other one is scaling in, in which we are removing hardware from the existing system. Like earlier, if we are currently having 10 nodes, if we are adding 10 more nodes, we are moving toward scaling out. And if we are removing hardware from that, we are moving from 10 to 5 nodes. In that case, we are scaling in. There are several merits and demerits. With the cost of hardware decreasing with time, we can use horizontal scaling, because we can use a number of simple community machines. With low cost machines, we can also provide replication that will improve the availability of the system. And it will be fault tolerance. Like if you are having hundreds of machines, and if two machines goes down, then you will have 90 machines at your disposal that are working. And there are several demerits. Like if you go by horizontal scaling, it may happen that your adjusting code does not work with that, with horizontal scaling. You have to make certain modifications in your code. Because earlier systems were written with a single system in the mind. So the traditional database servers may or may not work with horizontal scaling. If we compare in horizontal scaling, if we add more computers, it will be more complex. It will increase the complexity of managing them. Like there will be a networking, so there may be a fault anywhere in the network. There will be latency, because data have to move from one node to another. And there will be a change in the programming paradigm. With vertical scaling, you have to initially decide what is the size of the hardware you have to choose initially. You can consider horizontal scaling as a linked list and vertically scaling as a array. In vertical scaling, you have to initially decide what will be your requirement. And with horizontal scaling, you can think it as linked list whenever you need more nodes So you can dynamically allocate nodes and add it. So this figure shows the difference between vertical scaling and horizontal scaling. In vertical scaling, we are increasing the resources to a single machine. But with horizontal scaling, we are creating smaller machines which are large in number, database scalability. It refers ability of database to handle changing demands by adding and removing more resources. So when it comes to database scalability, it can be at the software level. It can be at the hardware level. At the software level, when we are executing any queries, so like database, take logs on the tables or rows. In the earlier system, look for a cursor. Like if you are trying to require any resource, they take look at the cursor level. Like they take logs at the table level or the block level. But with time, they have made some improvement. In that, they have shifted the look, looking to the rows level. In that case, what happens is like, if you are taking look at the block level, so other rows which are not involved in the transaction will get blocked. So with the row level looking, other rows can be looked by other transactions which are not used by the other transaction. Like if there are 10 rows in a block and one transaction is using row number one, so other nine rows are free, so they can be accessed by other transactions. So this removed a critical scalability bottleneck at the software level. After the software level, software level scalability bottleneck has been added. So the attention turned to a hardware. Like in current systems, we have multiple cores. After software level, bottleneck has been handled. So attention is moving toward how to execute different queries on the different cores of a processor. And now the modern trend is we are moving toward distributed databases. So scalability in the three ways. Like we are having more large amount of data. So we have to spread it across different systems. We are having large number of requests on different systems and the size of requests may be large. Okay, there are two types of queries. One is transactional queries. One is analysis queries. We can categorize database into two parts. One is OLAP and other one is OLTP. OLAP is generally used for analysis purpose. If you look at some bank database, so what they does, they are having the customer records of over years. But what they does, they consider last six months records as the transactional database and the previous records as the analytical database. Analytical databases are generally used for finding patterns or providing recommendations. And the transactional database are currently used for fulfilling business requirements. So database are generally implemented by clustering. In clustering, we have multiple servers that are used to serve the user requests. There are two predominant architectures that are used in database clustering. One is shared disk, other one is shared nothing. Okay, so in shared disk architecture, there is an underlying storage and all the nodes are connected to this storage. So all the nodes share a common disk. Each processor is having a private memory whenever the transaction comes. It comes to one of these nodes or it is distributed to one of these nodes and these nodes connect to the underlying storage system. So it takes data from the disk to the RAM of the node and if different nodes are processing the same queries or using same data, so there may be a problem of consistency. For managing the problem of consistency, so we have to use locking. So shared lock systems use distributed lock management. They are generally used for systems where there is very less access to the shared disk. If different queries, they are very rare chance that different query requires the same resource or same record. And Oracle RSC use shared disk architectures. Next one is shared nothing architecture. In these systems, every system is having their separate disks, separate memory and separate CPU. So they communicate with each other through network. There is an internet connect network. Every system can connect to other systems through an interconnect network. And these systems are generally used where we require a large amount of storage because each node is having separate disk that can be, like if we group all these systems, we can get a very large amount of space. So in shared nothing systems, we require partitioning because we are having large amount of data. So we have to partition it into different, across different systems. For partitioning, we will require a different, we will require a partitioning function. And this system is generally when access and modification is performed in a single partition. MySQL and SQL database use shared nothing architecture. There comes other term replication. The application generally refers to create a copy of a single database across different nodes. It is used to improve availability and scalability of queries. Like if you are having the same copy of database across multiple systems. If one of the system goes down, the same copy of database is available on the different machine. So what happens like if you are having 10 copies and if one copy goes down, one system goes down, so you have nine systems, query will provide the output, but with little bit slow speed. Okay, so with replication, we are having multiple copy of a single system. So there are trade-offs. One is like when you are going to update the replica systems. It may be instantaneously, it may be at some different time. If you update it immediately, in that case there will be a lag. Like you have to update all the copies of the database. And in case you try to defer it to some later point of time, in that case you will have inconsistent database. Like your different database nodes will be having different data. The user yield will depend on the machine from which he is querying the data. In the replication system, there is one single node that is master and other nodes are slaves. Like in the figure, there is one master that is shown in the regular and the nodes shown in the below color are slaves. So all the right query will be passed through the master node. And all the read queries can be passed to any of the slave nodes. Okay, so generally application will read from the slave nodes and write to the master nodes. But there will be some problem. Like if you are having some site, like shopping sites, if you are making some transactions, when you check out the request will be made to the master node. But like if there are limited number of stocks, you are making right, like check out at the master node and user are reading the availability of the item at the slave node. And when you, if there is a lag, so user will, there will be an inconsistent data that there is no stock available, but the slave are showing that there is stock available. So it will result in the mismatch in the orders. Like there is no stock, but you are allowing users to order the item. One other method of scaling database is sharding. In sharding, what we does? We divide a lot database into multiple independent databases. There are two concepts. One is partitioning, one is sharding. In partitioning, we are having a lot database. We are dividing database across multiple machines. All the partitions will be managed by single machines. In case of sharding, the database is totally independent. They like, if you query from one machine, you will get the database of a particular shard. You will get the entire database from different partitions. Databases are classified into two categories. One is operational data and one is historical data. Operational data is OLTP and the other one is historical data is OLAP. That is used for query processing. Okay, so OLAP consists of historical data, that as I have mentioned, and it is generally used for data mining and making decision making. And it can be in petabytes and terabytes and it is very, it may be slow to carry the OLAP database. It consists of large number of rows. Generally, OLAP database are generally used for querying. It is very rare to redo write operation on the OLAP databases. OLTP database consists of current databases. They are used for generally business tasks, day-to-day operations. Their size is limited, maybe in GBs or MBs. They are very fast because they are limited number of rows and they are read and write operations with equal probability. Okay, so now we will start with the MySQL part. Okay, so first thing that we will discuss is view. So view is just like a stored query. Okay, so view does not have his own data. It fetches data from the underlying table. The benefit of views are like, you can reduce the complexity of the query. Like if you are providing the database access to a non-technical user. So what you can do, you can create a view and give it to him. So he has to fire a simple select query. Does not need to know the database details like joints, et cetera. And with views, you can apply various kind of security, like row-level security and column-level security. And with views, you can show limited data, like aggregated data and hide details of the data. Like what is the salary of particular employees. So you can hide these details with the help of views. So syntax for the views is like, you have to create or replace views. After that, you have to give the query. And views are of two types. One is simple one, another one is complex. With simple views, we are having generally one base table from which we are fetching the data. And with complex views, we are having two or more base tables from which we are fetching the data. Now we will see how to write queries in MySQL for views. Okay, so first of all, we will see how we can provide row-level security using views. And after that, we will see how we can provide column-level security using views. So in the underlying table, we were having five records. But here, we are trying to provide row-level security. So what we have done, we have created the views and we have provided the restricted access to the number of rows. Okay, so when the user will fire the select query on the row-level view, so he will get access, only those user names which are starting from A. And like, if we try to apply column-level security, so what we can do? Using this kind of queries, like we can provide column-level security. So this kind of views are generally used when you are having a large database table that contains employee details. It may contain an employee salary, or you want to share that database with someone and you don't want to show the salary of the employee to that person. So in that case, you can provide column-level security. And you can provide both, like combining the column-level security with the bare clothes. And in that case, you will provide the limited access to the restricted column and the limited number of rows. Okay, generally views does not have their own data. They fetch data from the underlying tables, underlying base tables. In case the tables are deleted, so views will not have any data. So there is a new concept called view metallization in which views can store their own data. So view metallization is not provided by MySQL. So what view metallization does in the background, it creates a table and it stores the yield of the query in that table. And whenever there is an operation on the base table, so these views are updated using triggers and stored procedures. This will help me like if your query is taking large amount, a lot of time. So in that case, and if you want to execute that query multiple times, in that case, your result will be stored, catch in the table, and when you fire the query, it will be, the result will appear instantly. So there is no need to fire that query multiple times. So now I will show you how you can create, view metallization is not provided by MySQL. So I will show you how you can implement it by yourself in MySQL. So I will take the example in which view will be storing the count of the records in a table. And the data you are catching in a table can be updated on demand or can be updated immediately. So for implementing view metallization, you have to first create a table that will be used to store the yield of the query. And after that, there are two options. If we are going with immediate application, in that case, we have to write triggers and if we want to implement only one updation of the view, in that case, we will go with the stored procedures. So I will show you both the ways you can update view. First, we will create a table that will be used to store the yield of the query. So I have created a table, user count that will store the yield. Now I have to insert the initial count of the table. So now the user count table will contains the number of records in the table, login details. So currently it contains five records. I will insert one record and show you how we can update this table on demand using a stored procedure. Now the login details contains six records, but the user count table contains only five. So we have to update it. So we have to write a stored procedure if we are going to update it using on demand. Okay, so we have created a stored procedure called refresh. It will insert the count of the records in the user count table. So the content of a stored procedure is as follow. First of all, it will flush all the data of the table user count and after that it will update. It will insert the count of the users in the login detail table. So we have created a stored procedure but we have to run it so that it can update the details in the user count table. Till now it contains only five and now we have to call the stored procedure so that it can be updated. So now after calling the stored procedure it contains the current count. What stored procedure does is it delete the earlier record and update the count with the new count of the number of records in the table. This is the on demand. Like whenever we want to update we have to call the stored procedure. And if we want to implement the functionality that whenever there is an instruction update, instruction update or delete in the table it should get updated automatically. In that case we have to write triggers. Okay, so the code of the stored procedure is as follow. First of all it will truncate all the data within that table user count. So initially it was containing count five. So first of all it will delete the count five. After that it will count of the users in the table login details that is six and it will insert six in the table user count. We will see how it immediately updates using triggers. So we have to write two triggers. One is for insertion, one is for deletion. So first of all we have created one trigger that is for insertion. Whenever any row is inserted in the table so it will increment the count in the table user count. So other trigger is for deletion. Whenever any delete operation occur on the table login details it will decrease the count of the user count by one in the table user count. So now we try to insert one record and see how it works. Currently there are six records in the login detail tables. Now we insert one more record. So now we have done one insertion in the login detail tables. So let's see what happens with this one user count tables. User count is updated automatically by using trigger. Now we perform the delete operation on the login details table. So now we have deleted one record from the login details table. So the number of records in the login details tables will be six. The delete trigger has automatically updated the count of users in the user count table. Just now we have implemented the stored pictures. So we will see what is the stored procedure. Stored procedure is a code segment that is stored in the database catalog. You can call it by name and whenever you are having large numbers of statements or like two, three statements that you want to give it simultaneously so you can create stored procedures. And other thing that we implemented is a trigger. Trigger is a statement that is fired whenever any event occurs. It can be insert, update, delete. And we can have various kinds of triggers like before, after and instead. Like do this operation before performing other operation on the table or after performing other operation on the table or instead of performing the operation on the table perform this operation on the some other table. So we are having three types of triggers before, after and instead. So next thing is indexes. Indexes are used to improve the overall performance of the database. Okay, if we don't have any index on the table in that case we have to go row by row. And like process of moving from one row to another from starting to end is called table scan. We can create up to 32 indexes on a single table and each index can have up to 16 columns. That can be index. Indexes perform the performance of the query. They are certain drawbacks. Like if you are creating too many indexes it will slow down the performance of your instruction query because with every instruction you have to update the indexes also. So if you are having a large number of indexes it will slow down your instruction performance. So there are various criteria that you should keep in mind when you are deciding whether we should create a column as a index column or not. Like if you are performing a large number of where operations on that column you should create index on that column. And if you are using order by particular column and in that case you can declare it as index column and if you are having lot of different values in a column then in that case you can declare it as index column. So there are generally three types of indexes. One is clustered index, other one is unique and third one is full text indexes. Clustered index are generally created on the primary keys. Clustered index defines how your data will be stored in the table. Like they define the ordering of the data in the table. If you are having the clustered index on the name field so your data will be stored in the table sorted by the names. So you can define only one clustered index. You can save data sorted only in one way. Clustered index are like telephone directories. Like if you go to a particular record or particular name you will find the phone number clustered correct at the same place. So clustered index does not require any external space. Whenever we create a primary key there is a clustered index created. With primary key there is a unique clustered index that is created. And they are faster because they don't need any indirection. Like they directly have record next to them so that they can access it faster. Other kind of index is non-clustered index. Indexed data are not clustered or non-clustered indexes. So like if index is clustered then it is a clustered index. All other indexes are non-clustered. We can divide indexes in other categories like unique or non-unique. Like if you want to enforce unique constraint then in that case we will have unique indexes. If the index is not unique it will be unique index are used to enforce unique constraint. If the index is not unique non-unique index are used for improving the performance of the query. The other kind of index is full text indexes. Okay, data generally used for fast. Educating fast performance. Data generally used for search with text. So we will see the example of each of these indexes. So now we have created a table that contains my unique key and one primary key. So there are two indexes created and both are unique indexes. You can see this in the column non-unique. So it contains zero which refers that the index that are created are unique indexes. And if we create any other index that is not unique it will create a non-unique index. So now we have created one index on the field enrollment ID. But we have not declared it as unique so it will be a non-unique index. So you can see that there is an index on the field enrollment ID but it is mentioned as not unique in the list of indexes. So it is mentioned not unique is equal to one in the column. Now we will see the full text index. How it is created and how it works. So for that we have to add a column that is and create a full text index on that column. So we are adding a column descriptions that is where care that will restore text. So we have created a full text index on a column descriptions. And after that we have inserted two records in the table applicants. And when we are firing a select query we want the issuance that are BTEC but does not have done course in compilers. So we have mentioned it as we are match description against what we have to match is BTEC and we want to remove compilers. So we have mentioned it minus compilers. And we are firing it in Boolean mode. So we are getting the year that are BTECs but does not have specialization in compilers. So what happens when the full text index. So for every full text index there is one inverted index created. So what does inverted index do? It will store data corresponding to words. Like you can see that if you see the word cat in the inverted index. So it will store in which document this word has occurred and how many times that word has occurred. So like if we consider these two sentences as two documents. So the cat in the first sentence the cat is occurred one time once. And in the second the cat is occurred once. Cat is occurred is showing one colon one. Means in the first document it has occurred once and in the second document it has occurred once. Like if you look at the word the. So it has occurred twice in the document one and thrice in the document second. With time we are having large databases. The size of database may go beyond the size of the disk. So we have to partition the database. Database can be partitioned on various basis like range. We can partition the database equals on the basis of range of a particular column on the basis of list or we can do random partition based on the hash. So it enables us to store database that are larger than the size of the disk. And MySQL supports selection from the partitions. Like you can explicitly define from which partition you want to select the record. And your partitioning function should be non-constant and non-random. If your function is constant in that case all the record will be mapped to a single partition. And if your function is random in that case you will not be able to fetch the record because you don't know in which partition that record exists. So let's see how we can implement partitioning in MySQL. So I am going to implement range partitioning. Like how range partitioning how we can implement range partitioning in MySQL. What that I am creating a table student. So this is a syntax for creating partitions. Like you have to define the table and after that you have to define the partitions like how you want to partition the table. Like I am partition the table by age, age range. So I am creating for partition one is less than five other is less than 10, third one is less than 15 and last partition will contain all the values that are beyond this range. So after creating partitions you can see like how many partitions are created. Using the following command. So information underscore schema dot partitions store all the information related to partitioning. Like on which field you are creating partitions and how many records each partitions have. So now we have inserted three records in the student table and we can see that which partition is having how many records. Like I have inserted age as six for one record 28 for other and 29 for third one. So one record is inserted in the partition that is below 10 and two records are inserted in the partition that is the last one that is above 15. It is displayed here like in the partition P1 we are having one record and in the partition P3 we are having two records. Now you like MySQL provide the facility that you can search record in a particular partition. Like you need not to search the entire table you can search in a particular partition. Like partition zero does not have any records. So we are getting empty results. Like and if you go to the partition one we can get the one result that is age between less than 10. And if we go to a partition number three that will contain store records that will be age greater than 15. And if you look at the whole table so you will get all the records. These kind of partitions are useful like if you are having a large database in which like if you are having site that you are having a website and you are doing the global trade and if you want to like store the results continent wise like there is a separate partition for different continents and you want to query a particular partition. So in that case you can use partitioning. And while working with databases like we are several times we are having the requirement various kind of requirements like if we are having the data in the rows and we want to get the data in the column format like we are having the input tables like this one as shown in the figure we are having customer name and the limited number of items like we are having three item types customer monitor and software. And we want to see like what is the amount spent by a particular person on a particular category like how much amount addition has spent on the computers and monitors and software. Some databases like MySQL server provides Pivot as an internal function. MySQL does not provide Pivot as an internal function. So you have to implement it by yourself. So you can implement it using two steps. First is extend the columns and after it apply aggregate functions. So let's see how we can implement this. So we are having the initial level that contains 60 codes and we want to convert it into other format like we want column monitor and software as the columns and we want to see how much addition and addition has spent on the particular item. So first of all we have to extend the columns. So we have graded the view that will contains the result of this query and what this query does. Okay, so this query will generate the columns, computer, monitor and software whichever column is having computer. So it will mark the price of the computer and the rest of the columns will be empty and like for monitor whichever row is having monitor will contains the value and rest of the columns will be empty and same with other columns. Now we are having computer monitor and software as the column and customer names. So now we have to apply one aggregate operation so that we can get the desired output. So now we have the view user item extended that will contains this result. Now we have to apply one aggregate operation on this one and we will get the desired result. So what we have done we have applied the group by on the customer name and compute the sum of the column. For creating private we were having two steps one is extending the column that we have done using views and after that we have applied the group by operation so that we can get the desired result. So we have discussed the scalability, sharding and the applications, the concept of partitions, need of views and indexes and the concept of stop precedent triggers. You can use these resources for further reference.