Hello, and welcome to SSUnitex. This is a continuation of the PySpark tutorial. Today we are going to look at union, unionAll and unionByName, and we will cover all three functions in this video. Today's agenda: first union, next unionAll, and finally unionByName. Let me quickly go to the browser and we will see them in practice.

Here we are loading the data from the sales.csv file into a data frame called df. Let me execute this. We can see it has six columns in total: sales order id, sales order date, item code, item name, quantity and value.

First, let me create one more data frame from this one, simply by assigning df1 from df. Now we have two data frames, df1 and df, and both have the same number of rows.

Next, let me create another data frame by doing a union of these two. We use df.union and pass the second data frame, df1, and then we can use the display command on df3. Let me execute it and look at the output. It returns 1598 rows in total. A single data frame has 799 rows, and we can verify that as well. So union simply appends the rows of one data frame to the other; it does not remove duplicates. We can also use unionAll, and both work the same: whether we go with union or unionAll, it returns the same number of rows with the same records.

Next I am going to comment out this line, and let me rename a column which is available in df1. Here we can see it has a value column, so let me rename this value column to "well". We can simply overwrite the existing data frame, df1.
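The union step described above can be sketched in plain Python. This is only a small model of the behaviour, not the PySpark API: a "frame" here is a hypothetical (columns, rows) pair, and the column names and rows are made up for illustration. It shows that the union keeps every row from both sides, which is why 799 rows plus 799 rows gave 1598 in the video, and that unionAll behaves as another name for union.

```python
# Plain-Python model of a positional union. This is NOT PySpark itself;
# the (columns, rows) "frame" and the sample data are assumptions for illustration.

def union(left, right):
    """Append right's rows to left's rows, keeping duplicates."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("union needs the same number of columns on both sides")
    # Column names are taken from the left (first) frame.
    return (list(left_cols), list(left_rows) + list(right_rows))

# In this model, union_all is simply another name for union.
union_all = union

columns = ["SalesOrderID", "ItemCode", "Quantity", "Value"]
rows = [(1, "A10", 2, 50.0), (2, "B20", 1, 30.0), (3, "C30", 5, 12.5)]

df = (columns, rows)
df1 = (list(columns), list(rows))  # same data as df, like df1 in the video

df3 = union(df, df1)
print(len(df3[1]))                # 6: every row appears twice
print(union_all(df, df1) == df3)  # True
```

With three sample rows per frame, the result has six rows, mirroring the doubling seen in the video.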
Here we can use withColumnRenamed, where we specify the old column name, which is value, and then the new column name, which is "well". Next, I want to create df3 again, this time using df1.union with df as the argument. Let me execute it. We see the same rows in the output, but since we used withColumnRenamed, the renamed column should show up. And as we can see, the column name is "well". Because df1 comes first in the union, we see "well" as the column name. If we put df first and union it with df1, the column name will be value instead of "well". Whichever data frame comes first, the union takes the column names from there.

Now let me rename columns one more time, again overwriting the existing data frame df1 and again using withColumnRenamed. This time I am going to rename the quantity column to value and the value column to quantity. Let me execute and look at the output. This time, notice that we are seeing the quantities here. If we scroll down to check further, we can see the quantities there too. So even when we rename the columns, the union still goes by column order. If your data frame has quantity first and value second, the column names do not matter: union always follows the sequence of the columns and combines the data accordingly.

Next, I want to add one more column to df1, overwriting the existing one. For adding a new column I am going to use withColumn. The new column can be called quantity new, and I am going to pass the existing quantity multiplied by one hundred, so that will be the new quantity. Now df1 has one extra column and df has one column fewer.
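The positional behaviour described above can be modelled with a short plain-Python sketch. Again, this is not PySpark: the (columns, rows) frame and the sample values are assumptions made for illustration. It shows that the result takes its column names from the first frame, and that rows are stacked by position even when the quantity and value labels have been swapped.

```python
# Plain-Python model of a positional union. NOT the PySpark API;
# the frame layout and sample data are hypothetical.

def union(left, right):
    """Stack rows by column position; names come from the left frame."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("union needs the same number of columns on both sides")
    return (list(left_cols), list(left_rows) + list(right_rows))

# Original frame: quantity is the second column, value is the third.
df = (["ItemCode", "Quantity", "Value"], [("A10", 2, 50.0)])

# Same data, but with the Quantity and Value labels swapped, as in the video.
df1 = (["ItemCode", "Value", "Quantity"], [("A10", 2, 50.0)])

df3 = union(df1, df)
print(df3[0])  # ['ItemCode', 'Value', 'Quantity'] (names from df1, the first frame)
print(df3[1])  # [('A10', 2, 50.0), ('A10', 2, 50.0)] (rows stacked by position)
```

In the result, the column labelled Value holds the quantity 2 for both rows, because only the position of a column decides where its data lands.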
Let me execute and we will see. It throws an error about quantity, because we have not specified the data frame for that column. Let me fix that and execute again. Now it returns an error saying that union can only be performed on tables with the same number of columns. Because we have one extra column here, we cannot use union. Let me comment this out.

Finally, we have unionByName. It picks up the columns by their names and does the union accordingly. unionByName can be used when the column names are the same in both data frames. Whatever the column order in the first data frame, it does not matter; it always matches by column name. If the order ID comes first in one data frame and last in the second, it will still combine ID with ID, because it works by name. Let me execute it, and we see the same output here, because the columns are picked by name.

So this is the only difference between union, unionAll and unionByName. Thank you so much for watching this video. If you liked it, please subscribe to our channel for many more videos. See you in the next video.
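As a recap of the last two points from the video, here is a plain-Python sketch of the column-count error and of matching by name. It is a model under stated assumptions, not PySpark: the functions union and union_by_name, the (columns, rows) frame and the sample data are all hypothetical names introduced for illustration.

```python
# Plain-Python model of union vs. union by name. NOT the PySpark API;
# function names, frame layout and data are assumptions for illustration.

def union(left, right):
    """Positional union; fails when the column counts differ."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("union needs the same number of columns on both sides")
    return (list(left_cols), list(left_rows) + list(right_rows))

def union_by_name(left, right):
    """Match right's columns to left's by name, then stack the rows."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError("union_by_name needs the same column names on both sides")
    # For each column of left, find where that column sits in right.
    positions = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in positions) for row in right_rows]
    return (list(left_cols), list(left_rows) + reordered)

# The ID comes first in df and last in df1.
df = (["SalesOrderID", "Quantity", "Value"], [(1, 2, 50.0)])
df1 = (["Quantity", "Value", "SalesOrderID"], [(5, 12.5, 3)])

print(union_by_name(df, df1)[1])  # [(1, 2, 50.0), (3, 5, 12.5)]

# A frame with one extra column cannot be combined with a positional union.
df_extra = (["SalesOrderID", "Quantity", "Value", "QuantityNew"], [(1, 2, 50.0, 200)])
try:
    union(df_extra, df)
except ValueError as err:
    print(err)
```

The second row is reordered so that ID lines up with ID, which is the by-name matching described in the video, while the frame with the extra column is rejected by the positional union.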