Hello, welcome to SSUniTeX, so see this side, and this is a continuation of the PySpark tutorial. In this video we are going to see how we can filter the data in a data frame. We can do that by using filter and the like operator. Let me quickly go to the browser and we will see it in practice.

Here we have this file, sales.csv. First we are going to read the data from sales.csv into a data frame, and then we will see how we can filter the data in that data frame. Let me quickly go into Databricks, where I have already created this notebook.

To read the data from the file, as we have already seen, we can use spark.read.option. I am using the option because the file contains a header, so I am setting header to true. We want to read from a CSV file, and here we can specify the path. We already created the mount point in the earlier videos, so I am going to use that. Under sales we have the file, sales.csv. Let me put this into a data frame, df, and execute. This df now contains all the data. Let me display df; it returns all the data available in the data frame, and as we can see it has 799 rows in total.

Now the first thing we need to do is rename the Item Name column, because, as we can see, it has a space between Item and Name, and in the actual requirement we want to remove this space. How can we do that? We saw in the last video how to rename a column, so let me use df.withColumnRenamed. The first parameter is the existing column name, which is Item Name. The second parameter is the new column name, ItemName. And I am going to assign the result back to the existing data frame.
I don't want to create a new data frame. As we can see, the space is now gone.

Now let's start with the actual requirement. First, we want to filter for rows where ItemName is Total Income. For that we can use df.filter, and there are two ways to apply the filter. If we want to refer to the column directly by name, we should use the col function. If we don't want to use the col function, we can use df.ItemName, which should be equal to our value, Total Income. Let me paste it here, put the result into a new data frame, df1, and display df1. Let me execute it. It should return only those records that have Total Income. As we can see, it got filtered, and we see 25 rows.

That is the first way to do the filter, but we can also use another way. Let me import the col function first: we can use from pyspark.sql.functions, and I am going to import all the functions. Now, instead of specifying the data frame again, we can simply use the col function with the column name in double quotes. Let me execute it. It should return the same output as before, and as we can see, it does. So we can use either of these two methods to filter this data frame.

Next, let me comment this out and display the data frame. The next requirement is to filter for rows where ItemName contains total, whether that is Total Expenditure or Total Income. We want all the records where ItemName contains total, so let me use the like operator. We again need df.filter, and here we have to specify the column we want to filter on. We want to filter on the item name, so we can use ItemName.
On that column we can call the like operator; here we can see the like operator. Inside like I am going to specify a pattern that matches values containing total. I want to put the result into a new data frame, df1, and display df1. It should return all the records that contain total. As we can see, the item names all contain total, which is why we see 122 rows in total; earlier it was 25 rows. That is how we can do this filter.

Now let's assume we have multiple filters. How do we apply them? Let me comment this one out and use df.filter again. If we want to filter on multiple columns, let me start with df.ItemName equal to the value Total Income, put this into a data frame, df1, and execute. It filters, and we see only Total Income. But here we want Total Income and a quantity of 5; we don't want the rows whose quantity is not 5. For that kind of requirement we simply add the and clause. We specify the and clause with the ampersand (&), and again inside brackets we can use df.Quantity equal to 5 and close the bracket. Let me execute and we will see: it has only 9 records in total, where earlier it was 25, because the quantity has been filtered.

The next requirement: we want all the records where ItemName is Total Income and Quantity is 5, or where the SOID value is less than or equal to 63. How can we do that? Here we need to wrap the existing condition in brackets, like this. Now we can add the or condition with the pipe (|), and here let me use df.SOID less than or equal to 63. Now it filters, and we will see all the matching data.
That is 70 records: we get 63 records directly from the SOID condition, and on top of that we get the rows matching the combination of Total Income and quantity 5. So it returns both. What it is doing is checking whether either this condition is true or that condition is true. The whole game is in these brackets, so remember when specifying this kind of condition to make sure you close the brackets in the correct place.

I hope you now understand how we can filter by using filter and the like operator. Thank you so much for watching this video. See you in the next video.