Hello, welcome to SSUnitec. This is a continuation of the PySpark interview and real-time scenarios series, and today we are going to see one more real-time scenario. We have three options to look at while reading data from a CSV file: first is inferSchema, second is the delimiter, and third is the line separator. The delimiter and the line separator can also be used while writing data to a file, but inferSchema applies only when reading from a file. As per today's agenda, first we will see the delimiter. So what is the delimiter? This attribute specifies a single or multi-character separator between columns when reading or writing, passed through the option or options function. In other words, if we load data into a CSV file, the delimiter is what separates one column from the next. Next is inferSchema. It automatically guesses the data type of each field while we are reading the data. If we set this option to true, the API reads some sample records from the file to infer the schema. If we set it to false, we must specify the schema explicitly. Setting it to true can sometimes be risky, because Spark only samples a few rows and then guesses the data types from those. Last is the line separator. Let's assume we have three columns, ID, name, and gender, and after the gender column the next row begins. The character that marks that break is the line separator, and we can specify it using the lineSep attribute. So let me quickly go inside the browser and we will try all three attributes in practice.
So here I am reading data from a CSV file and loading it into a data frame. Let me execute this cell. It has sales data: you can see sales order ID, sales order date, item code, item name, quantity, and value. Now we want to write that data into a CSV file. How do we write it? We write from this data frame, DF, using the write method, and after write we can specify options. The first option is header, because the output should have a header, so we set header to true. Then we can write the data out as a CSV file, where the first argument is the path. We want to save it under the mount point we created, /mnt, under the input folder, as sales_separator.csv. Next we can specify the delimiter, which is the column separator; here we want the pipe character (|). We can also specify the line separator, which can be backslash r ("\r") or backslash n ("\n"); those are the two values available. And here I am also going to use the mode, set to overwrite, so any existing file at that path will be overwritten. Let me execute this and check the output. The command executed, and we have the sales_separator.csv file. Let me refresh the container, and we can see the folder with sales_separator.csv. Let me open it to check the data. It should have all the columns separated by the pipe, and the line separator should be backslash r.
So here we can see that the row separator is backslash r. We can also specify backslash n; those are the two lineSep options available. Let me execute again: since the mode is overwrite, the existing file is replaced. Opening it in the editor, we can see the line separator is now backslash n. Now, if we want to read the data back from this file, how do we read it? For reading we use spark.read, then option, and under option we set header to true. Second, we read from the CSV file by specifying the path, the same path used in the query above, which we can copy from there. Then we specify the separator, which is the pipe, and then the line separator, which is backslash n. Let me put the result into a data frame, say DF1, and call display on DF1. Let me execute and see the output. It loaded successfully. But if we inspect the schema, every column, SOID, SODATE, and the rest, has the string data type. The actual types are not string: SOID should be integer, the date should be a date type, and quantity and value should be integer, yet all of them come back as string. So how can we make them integers? We can use the inferSchema option and set it to true. Let me execute again, and this time look at the data types.
So SOID got converted to integer, and quantity and value were converted to integer as well. SODATE has not been converted, because its format is not one PySpark recognizes by default: it expects yyyy-MM-dd for a column to be inferred and treated as a date. I hope, guys, you have understood when to use the separator, the line separator, and inferSchema. Thank you so much for watching this video. See you in the next video.