Hello, and welcome to SSUnitech. Today we are going to see how we can read and write a CSV file using PySpark, and we are going to use PySpark inside Databricks. Before going any further, if you haven't watched the first and second videos of this series, I would strongly recommend watching them; I will provide the links in the description of this video. So, let's get started.

Remember, in the last video we saw this flow. In this video we are going to cover the first part and the last part: in the first part we read the data, and in the last part we write the data. The transformation part we are not going to cover in this video; we will see that in our upcoming videos. First we will read the data from the source and load it into a DataFrame, and from the DataFrame we will write directly into the destination.

Let me quickly go into the browser so we can see this in practice. Here we have a blob storage account, and in this blob storage, under the input container, we have the sales.csv file, which sits inside the sales folder. If we go back, we can see the sales folder, and under the sales folder we have this sales.csv file. So what is our aim? Our aim is to read the data from this sales.csv file and load it into a DataFrame. A DataFrame is nothing but an in-memory, tabular representation of the data, with columns and rows. We will store the data there, and writing from the DataFrame back out to a CSV file is our actual goal.

So, let me quickly go into Databricks. Here I have created this notebook, PySpark read CSV, and this is where we will write the code. As we can see, our cluster is up and running, so that looks good. There are a few things to notice that we have already seen in the Databricks playlist. The first thing is to check how many mount points there are, because I have already created the mount point. We can use the dbutils.fs.mounts() command for that, and from the output we can see we already have a mount point with the name input. If we want to see how many files are under this input mount, we can simply use the ls command and specify the path. Let me execute it: we can see these folders, and under the sales folder we have the actual file. So we can navigate to the sales folder and check how many files are there; we have only this one file, sales.csv. We have already seen all of this in our Databricks playlist.

Now, let me start reading the data from this file. We are going to use the Spark cluster, so first we type spark followed by a dot; then we can press Ctrl+Shift so that all the available methods are listed here. If we scroll down, we should see read. Since we are going to read data from the source, we use the read command. Then we type another dot and press Ctrl+Shift again: from where are we going to read the data? We are reading from a CSV file, so we specify csv, and inside the brackets we simply pass the path.
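As a rough sketch in code of what the video runs up to this point (the mount name input, the /mnt/input/... paths, and the DataFrame name df_sales, which is introduced just below, are assumptions based on what is shown on screen):

```python
# Check which mount points already exist; we expect to see the "input" mount
display(dbutils.fs.mounts())

# List the contents of the input mount, then drill into the sales folder
display(dbutils.fs.ls("/mnt/input"))
display(dbutils.fs.ls("/mnt/input/sales"))

# Read the CSV file from the mounted blob storage into a DataFrame
df_sales = spark.read.csv("/mnt/input/sales/sales.csv")
```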
So, let me copy the path from here and paste it in. This will read the data, but we should store the result in a DataFrame, so we can name one, like df_sales. This DataFrame will hold all the data from the CSV file. It is a very straightforward command; let me execute it by pressing Shift+Enter, and we can see the DataFrame got created. Now, let me see what records are inside this DataFrame. We can simply use the display command and pass in the DataFrame, that is df_sales. Let me execute it: we do see values in columns and rows, but notice that the first row should really be the header.

So, how can we make that change? We simply go back to the code where we are reading the file. Let me continue the statement on the next line, type a dot, and press Ctrl+Shift again; among the many options that come up, we can see option. With option we specify a key-value pair: the key is header and the value is true, which means the file already has a header, so the first row should be treated as the header. We can execute it, and, okay, it reports an error about some unnecessary spaces; let me clean that up and execute again, and this time it runs. Now, let me read the data again: this time the first row has been promoted to the header, as we can verify. There are many more options available, and we will check them one by one, but in this video we are focused on getting started. I hope, guys, you have understood how we can read the file: it is very simple, we read it and store it in a DataFrame.

Now, the next task is writing this out to the output location of the blob storage. Let me use the write command. We start from df_sales, since we are writing from this DataFrame, then type a dot; alongside read we should have something for writing, and indeed we can see write. After write, the next thing to specify is the format. We are going to write the file as CSV, so we use csv, and then we specify the path where we want to keep the file. I don't want to keep the file in the input location; I want it in the output location, so I use the output mount, and for the file name I use sales20230409.csv. So the data will be loaded into this sales20230409.csv file, and the file will be available in the output location. If we go back to the output location right now, we can see that file does not exist yet. Let me go back to the notebook and execute the write: the job executes successfully, and if we go back and refresh, we see one more file there, sales20230409.csv. We have successfully loaded the file here. Both of these steps, the corrected read and the write, are sketched as code below.
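Putting the header fix together, the corrected read looks something like this (same assumed mount path as above):

```python
# Re-read the file with the header option so the first row becomes column names
df_sales = (
    spark.read
    .option("header", True)
    .csv("/mnt/input/sales/sales.csv")
)

# Preview the DataFrame; the first row is now promoted to the header
display(df_sales)
```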
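And the write step as a sketch (the /mnt/output mount path and the file name are again assumptions based on what is shown in the video):

```python
# Write the DataFrame out as CSV to the output mount.
# Note: Spark treats this path as a directory and writes part files inside it,
# so "sales20230409.csv" shows up as a folder-like entry in blob storage.
df_sales.write.format("csv").save("/mnt/output/sales20230409.csv")
```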
So, I hope, guys, you have understood how we can read and how we can write a CSV file. In our upcoming videos we will look at the transformation part in detail, and see how we can do the transformations. Thank you so much for watching this video. If you liked it, please subscribe to our channel for many more videos. See you in the next video.