Hello, and welcome to SSUnitex. This video continues our series of PySpark interview questions and answers. A friend of mine recently interviewed with Deloitte, and this question was asked there. Suppose we have an input DataFrame with two columns: the department name and the gender. In the output, we want each department name along with its total number of employees, its number of male employees, and its number of female employees. How can we produce this output? You can pause the video here and think about it before we continue. Let's start. First, we create an intermediate DataFrame with three columns: the department name, a male employee column, and a female employee column. For a row in the IT department where the gender is male, the male column gets 1 and the female column gets null. The second row has the gender female, so this time the male column is null and the female column is 1. Once this DataFrame is generated, we create one more DataFrame, the final one. It has four columns: the department name, the total employees, the male employees, and the female employees. For IT, we get the total by counting the rows after grouping by department name, which gives four. For the male employees, we sum the male column we created, and similarly, for the female employees we sum the female column. That is the whole approach. Now let's jump to the browser and see it in practice. First, I'll execute the cell that creates the input DataFrame. It holds the same data we saw in the slide and in Excel. Next, as I said, we select the department name column.
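Before touching Spark, the two-step idea (flag each row, then group and aggregate) can be checked with a plain-Python model. This is only a sketch under assumptions: the sample rows are illustrative and merely reproduce the per-department counts reported later in the video (IT: 3 male, 1 female; HR: 5 female; Sales: 4 male, 2 female), and the gender values "Male"/"Female" are assumed spellings. The helper `null_sum` is hypothetical and mimics SQL `SUM`, which ignores nulls and returns null when every value is null.

```python
from collections import defaultdict

# Illustrative rows (assumed): they match the counts from the video, not the exact slide data.
rows = (
    [("IT", "Male")] * 3 + [("IT", "Female")]
    + [("HR", "Female")] * 5
    + [("Sales", "Male")] * 4 + [("Sales", "Female")] * 2
)

# Step 1: one flag per gender, 1 when it matches and None (null) otherwise.
flagged = [
    (dept, 1 if gender == "Male" else None, 1 if gender == "Female" else None)
    for dept, gender in rows
]


def null_sum(values):
    """Hypothetical helper mimicking SQL SUM: skip None, return None if all are None."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


# Step 2: group by department, count the rows, and sum each flag column.
groups = defaultdict(list)
for dept, male, female in flagged:
    groups[dept].append((male, female))

result = {
    dept: (
        len(flags),                      # total employees
        null_sum(m for m, _ in flags),   # male employees
        null_sum(f for _, f in flags),   # female employees
    )
    for dept, flags in groups.items()
}

for dept, (total, male, female) in result.items():
    print(dept, total, male, female)
```

Note that HR comes out with `None` for male employees, the same null the Spark version shows, because every male flag in that group is null.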
So we take the department name column from df1 (note that it is df1, not df). Here we use the when function. It is not available directly, so we have to import it from pyspark.sql.functions. Inside when, we check df1.gender: if the gender value is male, we return 1, and we alias this column as male. We do the same for female, this time checking for female, and alias that column as female. We store the result in another DataFrame, df2, and display it. It should have three columns: the department name, the male column, and the female column. So far, so good. Next, we call df2.groupBy and pass the department name. Then we call agg, and inside it we use the count function to get the total number of employees: we count the department name column within each group and alias it as total employee. Next, we sum df2.male and alias it as total male employee. Likewise, we sum df2.female and alias it as total female employee. Finally, we display the result. If we scroll down, the output is as expected. The IT department has four employees in total: three male and one female. The HR department has five employees, all female, so its male column shows null rather than 0, because sum ignores nulls and returns null when every value in the group is null. Next is the Sales department.
In Sales, we have six employees in total: four male and two female. That is how this approach produces the required output. I hope you now understand how to write the query for this output. Thank you so much for watching, and see you in the next video.