Hello, welcome to SSUnitec Social Decide. This is a continuation of the PySpark tutorial series. In this video we are going to see how we can read a CSV file with the schema option. Before going forward, if you haven't watched the previous video of this series, I would strongly recommend watching it, because there we saw how to create a schema, which is how we define the structure of a table. Let me quickly jump to the browser and we will implement it practically. Here we have this file, sales.csv. First we will read the data from this file directly, and then we will read it using the schema option, and we will see the difference between the two.

Let me jump inside Azure Databricks, and here in this notebook let me try to read the data from the sales.csv file. As we have already created the mount point for it, let me read it directly. For reading the data we use spark.read, then we specify an option: I am going to set the header option to true because this file has a header row. Next we specify the format we want to read from, which is CSV, followed by the path: the mount point /mnt/input, then the sales folder, and then sales.csv. Make sure you type this path correctly; it is case sensitive. I want to put the result inside a DataFrame, so let me execute it.

The command completed, so this DataFrame should be holding all the values from sales.csv. Now let me check whether we can see the data from this DataFrame. We can simply use the display command. As we can see, we have the sales order ID, the date, the item code, the item name, the quantity, and the value. But let's check the data type of each of these columns. We can use df.schema, which displays the schema for all columns. Here we can see the sales order ID is of string type, the date is again string type, and in fact every column is string type. It means that when we fetch the data directly without specifying a schema, all the columns default to string; they will not get any other data type. We don't want everything to be string by default; we want to specify the exact data type that each column actually holds.

So let me create the schema first. As we saw in the last video, we need to import a few types from pyspark.sql.types. If we specify an asterisk, it will import all the types, but we can also import only the required ones: StructType, StructField, IntegerType, StringType, and DateType. Note that these names are case sensitive, with each word capitalized, including the T in Type. All of these are required for this schema. Let me add a variable called schema, and for creating the schema, as we saw in the last video, we use StructType, and inside the StructType we open the brackets. Next we have to declare all the columns.
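As a rough sketch of the steps so far (the mount path /mnt/input/sales/sales.csv and the df variable name follow the narration, so adjust them for your own workspace):

```python
# Read the CSV directly -- without a schema, every column comes back as string.
df = (spark.read
      .option("header", "true")
      .csv("/mnt/input/sales/sales.csv"))

display(df)       # Databricks helper that renders the DataFrame as a table
print(df.schema)  # every field shows up as StringType

# Import only the types the schema needs (a * would import all of them).
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType
```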
Those columns are declared by specifying StructField. A StructField takes a total of three parameters: the first is the column name, the second is the data type, and the third indicates whether the column can hold null values. Here the ID column is of integer type, and since we don't have null values we can pass False. As we can verify from the file, SOID is the first column, so let me use SOID instead of id. Next is SO Date, so let me use SODate as the name, and this type should be DateType. Let me add another one by copying this line and pasting it here. Next is the Item Code column. Here we can see it has a space between Item and Code, but we don't want any space in the column name, so we can simply remove it. Item Code holds strings, so we specify StringType. Next is Item Name: we can copy it from the header, replace ItemCode with ItemName, and again delete the space. Next we have Quantity, which is an integer value, so its data type should be IntegerType. And the last one is Value, so we can use Value and remove the trailing comma. Let me execute it. Here I have mistyped 'integer', so let me correct it and execute again. Now we can see the command completed successfully, so the schema is created.

Next we need to bind this schema while fetching the data from the CSV file. We can use the same read command as earlier, but this time we specify the schema option: inside .schema() we pass the schema variable we just created. Let me call this DataFrame dfNew and execute it. It executed successfully, and if we expand the result we can see the data types: SOID is integer, SODate is date, then ItemCode and ItemName are strings, Quantity is integer, and Value is integer as well. Let me remove this backslash. Everything is working as expected.

Let me recap what we have done. When we fetch data directly without specifying a schema, every column defaults to string type. When we instead declare a schema and bind it while reading the CSV file, the DataFrame picks up the actual data types we specified in the schema, along with the column names we defined there, and returns those in the output. A consolidated sketch of the code follows below. I hope you have understood how we can read a CSV file with the schema option. Thank you so much for watching this video. If you like this video, please subscribe to our channel to get many more videos. See you in the next video.
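For reference, here is a minimal end-to-end sketch of what this video built. The column names (SOID, SODate, ItemCode, ItemName, Quantity, Value), the nullable=False flags, the dfNew variable name, and the mount path are taken from the narration above, so treat them as assumptions about this particular sales.csv:

```python
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType

# Each StructField takes: column name, data type, and whether nulls are allowed.
schema = StructType([
    StructField("SOID", IntegerType(), False),
    StructField("SODate", DateType(), False),
    StructField("ItemCode", StringType(), False),
    StructField("ItemName", StringType(), False),
    StructField("Quantity", IntegerType(), False),
    StructField("Value", IntegerType(), False),
])

# Bind the schema while reading -- the columns now carry the declared types.
dfNew = (spark.read
         .option("header", "true")
         .schema(schema)
         .csv("/mnt/input/sales/sales.csv"))

display(dfNew)
dfNew.printSchema()  # SOID: integer, SODate: date, ItemCode/ItemName: string, ...
```

One caveat worth knowing: DateType parsing assumes Spark's default yyyy-MM-dd format, so if the dates in sales.csv are stored differently you would also need .option("dateFormat", "...") on the read.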