Hello, welcome to SS Unitex. This video is a continuation of the PySpark interview questions and answers series, and we are going to see two more interview questions. Recently one of my subscribers attended an interview at an MNC organization, and these questions were asked there: first, how can we read files from folders and subfolders, and second, how can we create and read gzip files? For the first question we have the recursiveFileLookup option, and for the second question we have the codec option.

So, how can we read files from folders and subfolders? Let's assume we have a parent folder and, under that, child folders. The requirement is to process all the files available under the parent folder along with the child folders: whether a file sits in a child folder or in the parent folder, we want to process all of them. For the second question, we will create a gzip file, that is, a compressed file, in blob storage, and then read that compressed file back into a DataFrame.

So, what is recursiveFileLookup? This option tells Spark to recursively scan directories when reading files, so it reads data from the subfolders too; as I said, the parent folder along with all the child folders.

Next is the codec. This option can be used to compress CSV or other delimited files using the passed compression method; it only works on CSV or other plain delimited files. Spark can read a gzip file without specifying a codec, but for writing gzip the codec must be specified. "compression" is a synonym for "codec" here.

Let me quickly go into the browser and see this in practice. For the first question, under the load_files folder we have three subfolders, 29, 30 and 31, and directly under load_files, the parent folder, we have a CSV file. If we open 29, inside it we have one more CSV file, sales. Similarly, we have the same file in 30 and 31. So while reading the data from the load_files folder, we also want to read the files in those subfolders.

How can we do that? While reading the CSV data we use the spark.read method and then specify the options. As the first option I specify header with the value true, and as the second option recursiveFileLookup, which recursively scans the subfolders available under the parent folder. Since we are reading CSV data, we then specify the path: the mount point we created, under that the input folder, and under that the load_files folder. Let me create a DataFrame, df, and display it. Let me execute, and we will see the output. It should scan all those files: a total of four files were scanned, and each file has 799 rows, so in total we have 3,196 rows. So by using the recursiveFileLookup option we read the data from the subfolders too. A minimal sketch of this read is shown below.
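This is a hedged sketch of the read just described, not the exact notebook cell from the video. The mount point name and folder layout (/mnt/mount_point/input/load_files) are assumptions; substitute your own mount path.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical mount path; the real mount point name is not shown in the video.
path = "/mnt/mount_point/input/load_files"

# recursiveFileLookup tells Spark to scan the parent folder and every
# subfolder (29, 30, 31 in the example) for files to read.
df = (
    spark.read
    .option("header", True)
    .option("recursiveFileLookup", True)
    .csv(path)
)

display(df)        # Databricks display; use df.show() outside Databricks
print(df.count())  # 4 files x 799 rows = 3196 rows in the example
```

Note that recursiveFileLookup is available from Spark 3.0 onwards; when it is enabled, partition-directory inference is disabled.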
Now that we have created this DataFrame, let me try to create another file from it. For creating a new file we can go with the write method, and under the write method we specify the options. Here I specify header as true, and then the mode: if the file already exists we want to replace it, so we use overwrite mode. Next we want to create a CSV file, and for that we first specify the path where we want to keep the file: the mount point, then the input location, and here I am going to create the file sales_new.csv.gz, because we are going to create a gzip file. The second thing is the separator; I am going to use the pipe (|) as the separator. Next we specify the compression, which should be gzip, and at last the encoding. This encoding is very important: you have to remember the encoding value used while creating the file. The value here is cp1252. Let me execute this cell, and we will see the file created inside the input location.

As we can see, the file has been created. Let me go inside the input folder; here we should be able to see the sales_new.csv.gz file. If we scroll down, we can see that file. Now let me try to read the data from this file. How can we do that? For reading the data from this gzip file we can simply use the spark.read method, specify the option header with the value true, and read it as CSV. First we specify the path: the mount point, then input, then sales_new.csv.gz. Then we specify the separator, the pipe separator we used when creating the file, and after that the encoding option with the same value we passed, cp1252. Let me put this into a DataFrame, df11, and display df11. Let me execute and see the output; it should have a total of 3,196 rows. It got executed, and we can see it has the same number of rows. A sketch of this write-and-read round trip is included at the end.

So, let me recall what we have seen in this video. First we used the recursiveFileLookup option, by which Spark scans the subfolders along with the parent folder. After that we used the compression type, gzip, to compress the newly created CSV file, and we passed the encoding option. That encoding is then used while reading the data back from the gzip file. Thank you so much for watching this video. See you in the next video.
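For reference, a hedged sketch of the gzip write and read-back described above, assuming the same hypothetical mount path and the df DataFrame from the first sketch:

```python
out_path = "/mnt/mount_point/input/sales_new.csv.gz"  # hypothetical mount path

# Write a pipe-delimited, gzip-compressed CSV. "compression" is the
# synonym for the codec option mentioned in the video. Note that Spark
# writes a directory named sales_new.csv.gz containing gzip part files.
(
    df.write
    .option("header", True)
    .mode("overwrite")
    .csv(out_path, sep="|", compression="gzip", encoding="cp1252")
)

# Read the compressed file back. No codec is needed when reading gzip,
# but the separator and encoding must match what was used on write.
df11 = (
    spark.read
    .option("header", True)
    .csv(out_path, sep="|", encoding="cp1252")
)

display(df11)  # should again show 3196 rows
```

One thing to keep in mind, not covered in the video: gzip is not a splittable format, so Spark reads each .gz file with a single task, which can limit parallelism on large files.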