Hello, welcome to SSUnitex, and this is a continuation of the PySpark tutorial. In this video we are going to look at two functions: concat and concat with separator, that is, concat_ws. The agenda is: first we will see the concat function and when we can use it, then we will see concat_ws, and then we will see the difference between concat and concat_ws.

Basically, concat is used for concatenating multiple columns into a single column. So let me quickly go to the browser and we will see it in practice.

Here I am going to create a data frame that contains four columns in total: the first name, the middle name, the last name, and the gender. We are creating the data frame with the createDataFrame function. It asks for two parameters: the first is the actual data and the second is the schema. The data is an array of rows (a Python list), and the schema is also an array, here of column names. Let me execute this and look at the output. In the output we can see four columns: first name, middle name, last name, and gender.

Next I am going to import a few important functions: concat, col, lit, concat_ws, and regexp_replace. I am going to use all of these one by one so you will understand them.

Now we want to do the concat on this data frame, df. I am going to use select first, and inside it we can call the concat function. concat asks us to supply the list of columns. The first column is the first name, so we supply the first name here, then a comma, and then the second column.
We want to concat the first name, then the middle name, and last the last name, so it is going to concatenate these three columns together. Here we can also supply an alias for the result, and this name could be full name. Let me put this in another data frame, df1, and display df1. Let me execute and we will see the output. Here we can see that it concatenates the first name, middle name, and last name into full name. So we can directly use concat and supply all these columns inside it.

But what is our requirement? We want to put a comma between the first name, middle name, and last name. How can we do that? We can use the lit function to pass any literal value, and here we supply the comma. So first name, then the comma. Let me copy this. Then the middle name, then again the comma, and that's it. Let me execute and we will look again. This time we can see Suseel, then, because the middle name is not there, the comma two times, and then Singh. Similarly we can see Indra, comma, Bahadur, comma, Singh. So if we don't have the middle name, the comma comes twice. I will show you later how we can replace these two commas with a single comma.

Next, let us look at concat_ws. Let me use df1 = df.select as before, and here instead of concat I am going to use concat_ws. The ws means "with separator". The first parameter should be the separator, that is, whatever should go between the first name, middle name, and last name. I am going to supply a comma here. The next parameter is the first name, so I am going to supply the first name. Next I want to combine it with the middle name.
So we supply the middle name, and last the last name, and here we can give the alias full name. Now let me execute and we will see the output. We can see the same output as before, even though this time we did not explicitly supply the comma between the first name, middle name, and last name. We simply supply the separator once at the start. It could also be a pipe, so you can supply the pipe here and we will see the pipe as the separator: first name, pipe, middle name, pipe, last name.

Now the next thing. As I told you, we can see multiple pipes or multiple commas coming where the middle name is missing, so how can we replace them? We can use regexp_replace; this function will help us. Before using it, let me add the remaining column, the gender column. So we execute, and okay, the spelling of gender was wrong. Now we can see two columns in total, the full name and the gender, and they are available in the data frame df1.

Now I want to replace the double pipes with a single pipe. How can we do that? We use df1, again with select, and inside it the regexp_replace function. As we can see, it asks for the column or name, so we supply the full name column of df1. Then it asks for the pattern that we want to replace, and then the replacement. What we want is to replace multiple pipes with a single pipe. Again I am going to put an alias on this, and the alias could be full name new. The second column in the select is the gender column of df1, and let me put this in another data frame, df2.
Instead of the pipe I am going to use the comma again, and replace the double comma with a single comma. Now let me execute and we will see the output. Earlier we had multiple commas between Suseel and Singh; now we can see only a single comma. Indra Bahadur Singh was already okay, and in the other row, where we had multiple commas in between, the extra comma has been removed as well.

So I hope you have understood when we can use concat, when we can go with concat_ws, and how to use regexp_replace for the replacement. Let me conclude. Whenever we use concat, it just combines the multiple columns together. If we have a requirement to keep a separator in between, we should go with concat_ws. And if any column values are missing and we don't want the separator to appear multiple times, we can use regexp_replace to replace those repeated separators, like the multiple commas we saw here.

Thank you so much for watching this video. If you liked this video, please subscribe to our channel to get many more videos. See you in the next video.