Hello, welcome to SSUnitex. This is a continuation of the PySpark interview questions and answers series. Today we are going to see a very common interview question: how can we load only the correct records while reading data from a file? Here we have three modes available while reading the data. First is the PERMISSIVE mode, second is DROPMALFORMED, and third is FAILFAST. These three modes are available while we are reading data from any file. So, how can we read only the correct records? As per today's agenda, first we will see what these three modes are, and then we will see how we can use all three while reading data from the source. Here, if you look at the Apache Spark documentation, under the mode option we have PERMISSIVE, which is the default. If we read the description of PERMISSIVE, what does it say? It says: when it meets a corrupted record, it puts the malformed string into the field configured by columnNameOfCorruptRecord. What columnNameOfCorruptRecord is, we will see in the next video; here we are only going to look at these three modes. So, when we set the mode to PERMISSIVE, any value that does not match the declared data type is replaced with null. Next we have DROPMALFORMED: if any value in a row is not correct as per the predefined data types, that entire row is eliminated while reading the data from the source. And third is FAILFAST. What will FAILFAST do? It will throw an error as soon as it encounters data that is not correct. So, here we have a CSV file, and this CSV file has id, name, age and department name columns. In the id column we can see 1, then www, then 3, then yyyy, and then 5.
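For reference, the sample file just described might be recreated like this. This is only a sketch: the id values 1, www, 3, yyyy, 5 come from the video, but the names, ages and department values are invented for illustration, and `employee.csv` is written to the working directory rather than the mount-point path used in the video.

```python
# Recreate a sample employee.csv like the one shown in the video.
# Rows 2 and 4 carry non-integer ids ("www", "yyyy"), so they will
# be treated as malformed once an integer schema is applied.
# Names/ages/departments are made-up placeholder values.
csv_content = """id,name,age,dept_name
1,Amit,30,HR
www,Rahul,25,IT
3,Priya,28,Finance
yyyy,Neha,35,IT
5,Vikas,40,HR
"""

with open("employee.csv", "w") as f:
    f.write(csv_content)
```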
So, in total we have 5 records, but the second and the fourth records are not correct: id is meant to be an integer column, and the second and fourth rows do not have integer values in the id column. The other three are correct records. Our requirement is to load only the first, third and fifth records, because the second and fourth have a problem. So, how can we do that? Here I have declared a schema with four columns: id, name, age and department name, where id is IntegerType, name is StringType, age is IntegerType and department name is StringType. Let me try to read the data first. We can use the spark.read method, and after that we can specify options. Under the options, first we specify the header: this file has a header, so header should be true. We want to load data from a CSV file, so here we specify the path: under the mount point we have an input folder, and inside the input folder we have the employee.csv file. Let me put this into a DataFrame and call display on the DF. Let me execute, and we will see the output of this first. Now it is reading the data perfectly and all the rows are coming through, but if you look at the data types, the id column has been inferred as string, which is not correct. So let us use the schema that we created here: let me add the schema option, pass in this schema, and execute again. Now id is integer and age is integer, but the id value for the second and the fourth rows comes back as null, because by default Spark is using PERMISSIVE mode. Now, let me use the PERMISSIVE mode explicitly. It is the default, but I am going to set it explicitly: under the options I will specify mode, and then the value of this mode.
First, I am going to use PERMISSIVE. Let me execute: it returns the same output as we saw before, with those two rows coming back as null. Now let me try DROPMALFORMED. This time the second and the fourth rows should be gone. As we saw, the second and fourth rows have a problem, so those rows have been dropped, and now we have only the correct records. So the answer to the question is: we can go with DROPMALFORMED, and it will load only the correct data. Now we have this DataFrame; we can process it and create a new file from it. We also have the third option, FAILFAST. What will FAILFAST do? FAILFAST will throw an error, saying that we have the string value www for the id column, which is not valid, and that is why it raises this error. So I hope, guys, you have understood when we can use DROPMALFORMED, when we can use FAILFAST, and when we can use PERMISSIVE, which is the default mode. In the next video of this series we will see how, if we have corrupt records, we can load the corrupt records into a log file while the correct records get processed. That we will see in the next video. So, thank you so much for watching this video. See you in the next video.