 Hello, welcome to SSUnitex, Lucille this side and this is continuation of PySpark interview questions and answers. So in the today's video we are going to see how we can get the department wise nth highest salary for the employees. So we are having input data frame which is containing total 4 columns and in the department we are having IT department, we have the HR department, we have the sales department. In the output we are picking the employees who are getting the highest salary in their department. So for the IT department we can see the employee ID 3 is getting the highest salary. For the HR department we can see ID 4 is getting the highest salary and for the sales department we are having tie breaker for the 4000 salary for the ID 7 and ID 8. On those scenarios we will be getting both the rows. So we can see ID 3, ID 4, ID 7 and 8. So these two for the sales department. So how we can write our queries for getting this output? So simply first we are going to create the rank and this rank will be going to generate on the descending order of the salary and it is also reset on the department change. So like we are doing the partition on the department first and then we are generating this rank. So as we could see for the HR department so here 3000, 2000 and 1000. So for the 3000 we have 1, for the 2000 we have 2, for the 1000 we have 3 and once the department name has been changed this value reset this rank and here again it is starting from 2500 for 1, 1500 for 2 and 1000 for 3. Similarly if we are having the same values for the salary we are getting same rank and once we have created this rank we can simply filter out based on the highest salary we are getting or the second highest salary or the third highest salary we can simply filter out and in the output we will be seeing this output. So let me quickly go inside the browser and we will try to do this in practical. So here let me try to create this data frame that is DF. It would be having the same data that we have seen in the slide. Let me go and try to create the rank. So before going to create the rank we are required to import the function and import the window these two options. So let me use the from pyspark.sql.funcense then we can go import as tick. Second we are going to use the windows function. So we have to import window as well and we can import as tick. Now here we are required to write our logic. So our logic is very straightforward. First we are creating a new column that will be rank and it is going to have the dense rank because we are having the same salary for sales department and we are going to create the same rank for those. So here let me try to create another data frame and this data frame could be DF underscore rank and here let me try to use the select and here let me select all the column first and then additionally we are going to add one column and that column will be rank. So for using the rank first we have to use the dense rank function so we can simply use the dense rank and we can specify this bracket and after that we can specify over clause and after this over clause inside this over will be going to use the window function. So we can go with the window dot here we are required to do the partition. So this partition by inside the partition by we are required to do the partition on department name column. So DF dot department underscore name. So this is the not underscore this is the department name column only. Once we have done with the partition by then we have to go with the order by and inside the order by this order by should be on salary column and this should be on descending order. So let me try to use the display of this DF underscore rank. Let me try to execute this and we will see the output of this. So it will be adding a new column and that column value it is having like one two three as per the descending order of the salary and it is going to reset once the department name is changed but this column name is not proper. So let me try to add the alias name for this column and this could be rank. Let me execute and we will see the output here. So now we can see this rank once we have done this next simply we will be going to filter this DF underscore rank data frame. So we can filter out and here we can specify the condition. So condition is DF dot rank column value should be equals to one. Let me use the display of this. So actually it is not DF it is DF underscore rank. So the data frame would be DF underscore rank. Let me execute. It is going to filter the rows which is having the rank one. So we can see this rank one. So this is the highest salary we are getting. We can pass two for getting the second highest salary similarly we can pass three and four like that it will be going to filter out. So I hope guys you have understood how we can write the PySpark query for getting the department wise highest salary. Thank you so much for watching this video see you in the next video.