Hello, welcome to SSRetake. This is a continuation of the PySpark interview questions and answers series. Recently, one of my friends attended an interview with Deloitte, and this question was asked in the second round. Let's assume we have an input DataFrame with employee ID, name, and skills, and in the output we just want to see the skills as a comma-separated value. Let's focus on ID 1, which is John. John has three skills: ADF, Azure Databricks, and Power BI. In the output, we want to see ADF, comma, Azure Databricks, comma, Power BI, so all the skills as one comma-separated value. In SQL, we have a function for this called STRING_AGG; using that function, we can simply specify the comma and the skills column and achieve it. But how can we achieve it in PySpark? First, we have to use the collect_list function. What does collect_list do? It is used inside an aggregation, and it gathers all the skills into an array. So the result will have a single row per employee, but under skills it will hold these three values as an array. After that, we can use the concat_ws function and specify a comma as the separator, and it will convert that array into string format. So let me quickly go to the browser and we will try it in practice. Let me execute this cell. It will create the DataFrame DF1, which should have three columns: employee ID, employee name, and skill. Now DataFrame one has been created, and we can also verify the data.
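Before running the Spark version, the idea behind collect_list can be sketched in plain Python. This is only an analogue of what groupBy plus collect_list does, not Spark itself; the sample rows mirror John's three skills from the slide, and the dictionary plays the role of the grouped DataFrame.

```python
# Pure-Python analogue of groupBy(emp_name) + collect_list(skill).
# Sample rows mirror the example from the slide (employee ID 1, John).
rows = [
    (1, "John", "ADF"),
    (1, "John", "Azure Databricks"),
    (1, "John", "Power BI"),
]

grouped = {}
for emp_id, emp_name, skill in rows:
    # Like collect_list: append each skill into one array per employee.
    grouped.setdefault(emp_name, []).append(skill)

print(grouped["John"])  # ['ADF', 'Azure Databricks', 'Power BI']
```

The key point is that after this step each employee has a single row whose skills value is an array, not yet a string.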
So, it has the same data we saw in the slide. What do we have to do? We take DF1 and use groupBy, because in the output we only want the employee name, so we can specify the DF1.emp_name column. After that, we use agg, and inside it the collect_list function. collect_list is not available directly; we have to import it with from pyspark.sql.functions import collect_list. Let me also import concat_ws; ws stands for "with separator", and we will use this function later. Here I use collect_list and pass DF1.skill, where skill is the column. Let me put this into another DataFrame, DF2, and check its output by executing this cell. We can see it has the employee name and the skills, and after using collect_list the values are combined into a single row. But this is not yet what we expected: it has been converted into an array. Inside the array we can see index 0, index 1, and index 2, first ADF, then Azure Databricks, and then Power BI. Now, how can we convert this array into a string with a separator? For that, we can simply use concat_ws. We can use DF2.select: the first column we want is the employee name, and for the second column we use concat_ws. Its first parameter specifies the separator, in our case a comma, and its second parameter is the column, which here appears as collect_list(skill). Let me also provide an alias for it, so we can simply alias it as skill. Let me execute.
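To see what concat_ws(",", skills) does to that array, here is a tiny pure-Python analogue. It only illustrates the joining logic; the null-skipping behavior that concat_ws has in Spark is mimicked here with a filter, and the skills list is the array produced in the previous step.

```python
# Pure-Python analogue of concat_ws(",", skills): join the collected
# array into one comma-separated string. concat_ws skips nulls in
# Spark, which the filter below mimics.
skills = ["ADF", "Azure Databricks", None, "Power BI"]
joined = ",".join(s for s in skills if s is not None)
print(joined)  # ADF,Azure Databricks,Power BI
```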
So, the column name changes to skill. Now let me use this column as DF2.skill, and let me also give it the alias skills. Let me put this into another DataFrame, DF3, and display DF3 to check the output by executing this cell. We can see the employee name, and the skills are coming as comma-separated values. If we expand, we can also verify that the skills now come as a string; and if we go up to the previous query and look at DF2, the skill column there is an array type whose elements are strings. So I hope you have understood how we can convert the input DataFrame to the output DataFrame. Thank you so much for watching this video. I will provide all this code in the description of this video, so you can copy it and use it for your practice. Thank you so much. See you in the next video.
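The two steps walked through above can be recapped end to end in plain Python. This is a sketch of the same pipeline, groupBy(emp_name).agg(collect_list(skill)) followed by concat_ws(",", skills), with illustrative sample rows rather than the notebook's exact DF1.

```python
# End-to-end pure-Python recap of the PySpark pipeline from the video:
# step 1 mirrors groupBy + collect_list, step 2 mirrors concat_ws.
rows = [
    (1, "John", "ADF"),
    (1, "John", "Azure Databricks"),
    (1, "John", "Power BI"),
]

collected = {}  # step 1: one skills array per employee name
for emp_id, emp_name, skill in rows:
    collected.setdefault(emp_name, []).append(skill)

final = {  # step 2: join each array with a comma separator
    emp_name: ",".join(skills) for emp_name, skills in collected.items()
}
print(final["John"])  # ADF,Azure Databricks,Power BI
```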