Hello, welcome to SS Unitech, and this is a continuation of the PySpark tutorial. In this video we are going to see all about columns: how to add a new column to a DataFrame, how to rename an existing column, how to modify an existing column, and how to drop a column. Before we start with the columns, let's understand two functions. The first is the lit function. lit creates a column from a literal value, so we use it when we want to add a new column to a PySpark DataFrame in which every row holds the same value. For example, suppose our employee DataFrame does not have the company name, and we want to add a new column called company name with the value SSUnitech. We cannot pass the string SSUnitech directly as the column; we have to wrap it with lit. lit takes a static value like SSUnitech and turns it into a column, so we need lit whenever we want to pass a static value. The second one is cast, which converts the data type of a column. Strictly speaking, cast is not a standalone function but a method you call on a column. Next we will start with adding a column. Let me quickly go to the browser and see it in practice. Here we have the employee single-line JSON file. We want to read this file into a DataFrame and then work with the columns of that DataFrame. We already saw how to read a single-line JSON file in the last video, so let me quickly go to Databricks.
Here I have added a new notebook, and we can see the cluster is up and running. To read the data from the single-line JSON file we use spark.read.option. Here I also set the header option to true, but note that header is a CSV option and has no effect when reading JSON. Next I add a second option to indicate that the file is single-line JSON. In Spark's JSON reader this is controlled by the multiLine option, which defaults to false, so single-line JSON is read correctly even without it. Then we read it with .json and specify the complete path. We have already created a mount point for the input location, so we can use that path here and execute it. As we can see, it executed. Let me assign the result to a DataFrame called df_emp, meaning the employee DataFrame, and display the data with the display command. It shows all the data in the DataFrame: there are four columns in total, first the id, second the name, third the department ID, and fourth the age of the employee.

Now the first requirement is adding a new column. For that we use withColumn, which takes two parameters. The first parameter is the column name, here the company name, and the second is the value, here the static value SSUnitech. If we pass the string directly like this, it raises an error. Let me execute it and show you: it says the second argument should be a column, because SSUnitech is a plain string, not a column. This is where we need the lit function. Let me wrap the value in lit and execute it again.
It says lit is not defined, because we have not imported the lit function yet. To import it we use from pyspark.sql.functions import lit. cast does not need to be imported, because it is a method on a column. Let me execute it. As we can see, it executed, and we now have the original columns plus the newly added company name column. Let me add one more column, company ID, with the value 101. Now we are adding two columns; let me execute it, and we can see company ID is here. Let me put these into DataFrames: I'll call the one with only the company name df_add, and the one that also has the company ID df_add1. After executing, we can see that two DataFrames have been created, df_add1 and df_add. df_add1 has the company ID, while df_add does not; it has only the company name. This is expected, because withColumn does not change a DataFrame in place, it returns a new one. Let me display the data from df_add1. The first four columns were there originally, and the last two are the ones we added. But if we look at the data type of company ID, it is string, because the value was passed as a string, even though the values themselves are integers. So we want to convert its data type, which means modifying an existing column. To do that, let me copy df_add1 and again use withColumn, this time to change the data type of company ID from string to integer. Since we have not imported any types yet, we first have to import them from pyspark.sql.types.
From pyspark.sql.types we import two types, StringType and IntegerType; only IntegerType is actually needed for this conversion. In withColumn, the first parameter is the existing column name, CompanyID. Let me copy CompanyID and specify it here, since that is the column we want to change. For the second parameter we refer to the DataFrame that contains the column, df_add1, and inside the square brackets we specify CompanyID again, that is df_add1["CompanyID"]. Remember this pattern. Then we call the cast method on it. cast takes just the target data type, here IntegerType(). Let me assign the result to a new DataFrame, df_modify, and execute it. It executed successfully, and if we look at company ID, its data type has changed: it now shows as integer, where earlier it was string. To recap: the first parameter of withColumn is the existing column we want to update; because a column with that name already exists, withColumn replaces it instead of adding a new one. The second parameter is that column taken from the DataFrame with cast applied to it.

Next, suppose we don't want to keep the name CompanyID and want to rename it to Company_ID. Renaming an existing column is simple: we use df_modify.withColumnRenamed. It takes two parameters: the first is the existing column name, CompanyID, and the second is the new column name, Company_ID with the underscore. Let me assign the result to a DataFrame called df_rename and execute it. As we can see, it executed.
If we expand the DataFrame, we can see it now contains Company_ID instead of CompanyID. The last thing is how to drop a column, which is very straightforward. Let me drop Company_ID from the df_rename DataFrame. Here we have the drop method, which takes the column name, so I specify Company_ID. Now the DataFrame contains only the original columns and the company name; it no longer contains the company ID, because we have dropped it. I hope you now understand how to add a new column, how to modify an existing column, how to rename a column, and how to drop a column. If you still have any doubts, you can post your questions in the comment section and I will try to respond there. Thank you so much for watching this video. If you like it, please subscribe to our channel for more videos. See you in the next video.