Welcome to SSUNITEX. This is a continuation of the PySpark tutorial. In this video we are going to look at the substring function in PySpark. The agenda: first we will see what substring is, then how we can use substring with withColumn, then how we can use substring with a select statement, and at last we will look at substr using withColumn.

So what is substring? The substring function is used to extract a substring from a string column of a DataFrame by providing the position and the length of the part you want to extract. So you need to remember: first we have to provide the position, and second we have to specify how many characters we want to extract from the string. Let's assume we have the string "substring" and we just want to extract the "sub" part from it. What do we have to do? We have to specify the position. It starts at the first position, so we specify 1. Then how many characters do we want? We want only three characters, since "sub" has three characters, so in the second parameter we specify 3. After that it will extract "sub" from "substring".

So let me quickly go to the browser and we'll try it in practice. Here I am going to create a DataFrame, and this DataFrame has an id and a date column, as you can see. Let me execute it, and in the output we can see it is created. Now the requirement is that we want to extract the year part, month part and day part from this date column. So how can we do that? We simply use df, and here I am going to use withColumn, which will add a new column to the existing DataFrame. First it asks for a string, which is the name of the new column. We want to extract the year first, so this will be year. Second it asks for the column, and for the column we have to use substring.
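The position and length rule described in the video can be tried out in plain Python before touching Spark. This is only a sketch: substring_1_based is a hypothetical helper written for illustration, not a PySpark function, and the date value is an assumed sample in year-month-day form, since the actual rows are not shown.

```python
def substring_1_based(text: str, pos: int, length: int) -> str:
    """Hypothetical helper: mimic a substring call that takes a
    1-based start position and a number of characters."""
    start = pos - 1  # convert the 1-based position to Python's 0-based index
    return text[start:start + length]


# The example from the video: start at position 1, take 3 characters.
sub_part = substring_1_based("substring", 1, 3)

# Assumed sample date value, used only to show where each part starts.
sample_date = "2021-01-15"
year_part = substring_1_based(sample_date, 1, 4)
month_part = substring_1_based(sample_date, 6, 2)
day_part = substring_1_based(sample_date, 9, 2)

print(sub_part, year_part, month_part, day_part)
```

The same start positions and lengths (1 and 4, 6 and 2, 9 and 2) are the ones passed to substring in the rest of the video.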
The substring function is not directly available, so we have to import it. We can use from pyspark.sql.functions, and I am going to import all the functions. Now here we can simply use substring, and inside the brackets it again asks for parameters. The first is the column from which we want to extract. We want to extract from this date column, so we can add df.date. In the second parameter it asks for the position. We want to start from position 1. And how many characters do we want? We want 4 characters, so we specify 4 here. I will store the result in another DataFrame, df1, and here let me use display on df1. It will have one additional column in the DataFrame, as we can see, and it holds the year.

Similarly, we can use withColumn one more time, and here instead of year I am going to use month, so it will extract the month. In the case of the month, the starting position should be 6. And how many characters do we want? We want only 2 characters. Similarly, we can add one more withColumn, and this one is for the day, so we can specify day. In the substring call we can count to find where we need to start: 4, 5, 6, 7, 8, 9. So here we specify 9. And how many characters do we want? We want only 2 characters. Let me execute it and we will see. It is reporting a syntax error. This is because I am missing a closing bracket. Let me execute it again. And here we can see it has successfully extracted the year, month and day by using substring.

So we have seen how we can use it with the withColumn function. Now let me implement the same thing by using the select statement. Let me copy this, go here, and use select. The first column we want is id and the second column we want is date. Then here we have to use substring.
So we can use substring, and again it will be the same as for the year: we start from 1 and we want 4 characters. Then dot alias, and the alias of this will be year. Now let me execute it. The first parameter that we passed is not correct. It should be df.date. Now we can execute it and see the output. Here we can see it has extracted the year part. Similarly, we can add one more column for the month. We simply use month, it starts from 6, and we want only 2 characters. And similarly, let me add one more column for the day, which starts from 9, and we want only 2 characters, the same as for the month. Let me execute it and we will see. It has a total of 3 additional columns alongside the existing ones: year, month and day.

Now let me show how we can use the substr function. The substr function does the same thing that we have seen with substring. Here let me use df.withColumn, and this will be for the year only. The first parameter is the column name, year. Then we have to specify the column from which we want to extract. I am going to use the date column. We can write it like this, or we can also go with df.date. After that we can use the substr function. substr asks which part we want. We want to extract from the date column, we want to start from 1, and we want only 4 characters. So we can simply do it like this. Let me put this in another DataFrame, df1, and here let me use display on df1. Let me execute it. It adds one additional column with the year, and it has the same value. So this is how substr is used inside withColumn. First we have to specify the column from which we want to extract.
Then after the dot we can use substr and specify the starting position, that is, the character from which we want to start. Then we have to specify how many characters we want, so we specify the length. Similarly, we can do the same for the month and day as well. But mostly I don't use substr. I use the substring function, because it is very similar to what we have in SQL, which we have already seen.

So I hope you have understood how we can use the substring function. Thank you so much for watching this video. If you liked this video, please subscribe to our channel to get many more videos. See you in the next video.