 Hello everyone. Today we are going to see query processing. So, learning outcome for this session is students will be able to find query violation plan for a given query and analyze the cost of selection operation. So, without index. So, rather other operations that we will see in the next lectures. So, this is the query engine how it executes. So, first query will be given to the parser and translator. So, first step is parsing and translation. So, query that we have written for example, SQL statement will be converted into the relational algebra expression expression using parser. Now, there can be one or more relational algebra expressions or plans for a given query. So, that will be that plans will be passed to the optimizer. So, second step is that optimization in which the cost of every plan will be found or find and based on that it will be decided that which plan is optimized one or is having low cost that will be passed to the execution plan. Then execution plan will pass that plan or that plan will be passed to the evaluation engine. Evaluation engine will execute that plan and the return the result of the or output will be written. So, third step is that evaluation. So, let us look at in detail. So, first the query will be translated into its internal form. So, understandable by machine. So, parser checks for the syntax find out whether the relation name which is a relation name that has been used are these relations are existing or not. So, once it is a valid syntax, then the system will prepare parser tree present representation or we can call it as a relational algebra expression will be found. Now, as we have seen that it can have the one or more relational algebra expression for the same query. It may be the possibility that in the instead of relation it will be view. So, that view is going to be replaced by the actual expression that is used for creating view. So, all these will be done in the first step. Now, let us look at the example. So, this is a query select salary from instructor where salary is less than 75000. So, here the relation name is instructor and again there is a no error in that particular query. So, which will be converted into its relation algebra expression. So, look at this condition is that salary less than 75000 and projecting salary from the instructor because we have to check that salary is whether less than or 75000 or not. So, same will be is converted into this parser tree. So, first at the leaf load it will be the relation name which is instructor, we are projecting salary. So, as we are required only salary and then again select salary which is less than 75000. So, this is one evaluation form or this is one plan for this given query. So, at this moment pause this video and try to find out the other plan. So, same query for this particular query there is one more plan possible. So, try to find out that plan. So, I hope you have got that. So, second option is that first filter out all the records less than having the salary less than 75000 and then project salary. So, in the first we have just projected salary and then we have applied the condition in the second plan we have already first selected all the records which is less than 75000 and then we have just projected salary. But here when we are applying this condition it will select all the attributes whereas, in this scenario it is only the salary attribute is selected and then we are filtering out the condition and that is why these two will give the same output but the cost is going to be different. So, the first that passing and translation find out all the possible plans. Now, plans that partiality representation will be more enhanced in the query evaluation plan. So, what does it mean that? So, we have to give a query or for while evaluating a query we have to provide the relational algebra expression or that has been calculated with and then we have to specify how to evaluate that expression also. So, again I am coming back to this example this is a plan or this is a relational algebra expression. Now, how to project this salary or how to select this salary less than 75000 that specific instructions need to be given. So, here that the operation which is going to be noted with that particular instruction is called as an evaluation primitive. So, that we have to specify in the query evaluation that particular plan which is called as an query execution plan or query evaluation plan. So, in this case now for this instructor now we are interested in salary and the condition is also on salary. So, what we are doing is that if you look at this it is written here that use index one. So, for instructor there is a primary key instructor ID. So, index is present for the idea also. Secondary index is present for the salary and while this selecting that salary less than 75000 use index one which will be more efficient than using the first primary index and that is why it has been mentioned over here. So, which index should be used how to retrieve the records that has been specified in query evaluation plan. In query optimization now as we have seen there is a different evaluation plans and obviously different cost will be there. So, in query optimization among all the equivalent plans the plan which is having the low cost or the lowest cost is going to be estimated. How it has to be estimated? So, it has to use a statistical information for example, how many number of tuples are present in that relation? What is the size of every tuple? What is again the other measures are there? So, that we are going to see while calculating the cost, but these are some catalog information which is going to be used for finding the cost of the every plan and then with the plan which is having the lowest cost that will be selected for the execution. So, in detail now we have to see how to calculate the cost. So, sorry the parameters used for this query evaluation will be in terms of number of resources like disk access, CPU time to execute a query, cost of communication and so on. Or the disk cost can be estimated as number of 6, number of block transfers. So, in that block transfer it can be block read or block write. So, that we have to consider, but here we are going to concentrate only on two measures. One is number of block transfer and other one is number of C corporations. Also the other assumption is that data will be always read from initially from disk. So, it may be the possibility that data is available in buffer, but we are assuming that every time initially data will be read from the disk. So, once the operation will be required that time. And the write time and read time cost though ideally it is different, but here it is assumption that it is going to be same. So, with these assumptions we are going to calculate the cost of the query. So, there are two times one is a time average time to transfer one block of data and the other one is access time. So, it is going to be the disk 6 time plus rotational latency. So, based on this previous again measures that is a number of block transfer and number of 6 the cost of this particular query will be b into tt plus s into ts means that how many blocks that we need to transfer into the time required for block transfer one block transfer plus number of sick operations into time required for one sick operation. Now, this ts and tt will depend on where the data has been stored. What does it mean that? For example, assume that block size is a 4 kb and if the data is stored on high magnetic high end magnetic disk then the time required for sick operation is 4 milliseconds whereas, time required for block transfer is 0.1 milliseconds that has been found by the experiment. If the data is stored on SSD then the time required is less as compared to magnetic disk. So, ts will be 20 to 90 microseconds and tt will be 2 to 10 microseconds. So, based on the hardware your time will be different. Now, first we will start with select operations. Obviously, we have to see the select operation, join operation, sorting and so on. So, that we will be looking in the next lectures. So, what is the selection operation? So, it will be like a file scan which is the lowest level operator to access the data. So, for example, select star from employee. So, I have to go through this complete employer relation. So, from the first to the last record. So, it is like search algorithms that locate and retrieve records which fulfill the condition. So, select star from employee where salary is less than 75,000. So, all the records which is fulfilling the conditions that will be retrieved. It allows the entire relation to be read in those cases where the relation is stored in the single dedicated file. That is nothing but the file scan and we are going to use this at the basic level first. Also attribute or set of attributes which are used to look up the record is generally called as a key that we already know. So, we will start with first linear search. So, here in selection operation. So, consider the query that select star from employee where salary is equal to 30,000. So, which records will be qualified. So, if you look at this particular relation here. So, what it will do is that it will bring the first block of the relation to the memory. It will check for all the values of the salary and which is satisfied that will be added to the result or I will be output as a result. So, here each block will be checked. So, first block will be checked here it is one record which is present with the salary equal to 30,000. Second block will be then see assume here it is an assumption that it is containing one block contains one record and memory fits only three blocks. That is why we are taking here in the first as three blocks then next three blocks and next three blocks and so on till the file will not exhaust. So, first three records or the blocks will be brought to the memory condition will be checked and the records will be taken into output if it is satisfying the condition. So, here in this example first block qualifies only one selection or the one record in the second block is also having the one record and in the third block it is we are having the one record. So, all these three records will be retrieved as a result. Now, really linear search will be applied regardless of the condition that is the selection condition also ordering of the record. So, whatever may be the ordering it does not matter when we are going for linear search because we are going starting from first record till the last record and then availability of the indices here we are not using any index for that. If the file is too long or if it is not stored continuously then we require many SIC operations otherwise if the file is stored continuously then we will require only one SIC operation to start with and then all the blocks will be brought to the main memory that is why the cost of this operation will be TS plus that is the time required for the only one SIC operation plus BR into TT. So, it is going to be one plus now three blocks has been brought to the memory that is why it will be one plus sorry not one TS plus three BT. So, depending on the TS and TT the cost will be calculated. The second thing is that consider now this query select star from employee where employee ID is 1004. Now, here employee ID is a key attribute in this case whenever the record is found it will not go further because as it is a key attribute it will be only having a single qualifying record. So, how it will be executed first block again will be brought to the main memory. So, employee ID 1004 is not present in the first block that is why it will go for the second block. In the second block the first stopper itself is having the employee ID of 1004 then this will be retrieved and then thus this particular selection will stop it will not go further for the next block because there is only a unique record present in this particular relation. So, average time will be VR by 2. So, if there are three blocks again. So, 3 by 2. So, 1.5 that is two blocks needs to be transferred we are going to use seal operation of course and the time required for TT and TS. But the worst case what can be the worst case? So, select star from employee where employee ID is equal to 1008 if this is a query all blocks need to be read to retrieve the record. So, the time will be same that we are into TT. So, this is how the linear search will be used and the cost required for the selection operation using linear search. Thank you.