Hello, welcome to SSUnitech, Sushil this side, and this is a continuation of the PySpark tutorial. In this video we are going to look at write modes in PySpark: what write modes are, why they are important, and when we can use them. Before going forward, if you haven't watched the previous video of this series, I would strongly recommend watching it, because this video continues from there.

So what is a write mode? Let's assume we have a source and a destination. Our requirement is that we have a file, F1, at the source, and we just want to copy F1 from the source to the sink. The first time, we can simply copy it. But the next time we execute the same code to copy the same F1 file from source to sink, the write mode comes into the picture: if we try to copy the same file directly, it will throw an error, because the file is already available at the destination. So how can we handle that error? For the case where the file is already at the sink, we have these four options.

The first is overwrite. Overwrite replaces the destination file with the source file on the next run. So if we loaded the data at 1 PM and load it again at 2 PM, the 1 PM file is replaced by the 2 PM file.

Next is append. Append does not replace any file; it just appends the new data coming from the source, so the destination ends up holding the old plus the new data. If the 1 PM load wrote 10 records in total and the 2 PM file has 20 records, the destination will have 30 records.

Next is ignore. With ignore, if the destination already has the file, nothing happens at the destination, but your code still executes successfully and you will not see any error.

Last is error. If your file is already at the destination, it throws an error, and error is PySpark's default write mode if we do not specify anything.

Let me quickly go into the browser and try to implement it in practice. Here, in the blob storage under the input container, we have this sales.csv file. The requirement is to copy the data from sales.csv into a DataFrame, and then from that DataFrame to the destination file. Let me quickly go into Databricks. Remember, in the last video we created this notebook; it holds the code for reading the data from a CSV file and storing it inside a DataFrame, which we covered in detail last time. Let me execute it. It ran successfully, and if we want to check the count, we can type the DataFrame name and a dot, then press Ctrl+Shift to open all the available functions. I am going to use count here and execute it, so you can see how many records it has: 799 rows. Next we want to copy the data from this DataFrame to the destination, so we have to use the write method: the DataFrame, then dot, then write, then dot, then csv, because we want to write the data out as CSV, and inside csv we simply specify the path.
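To make that concrete, here is a minimal sketch of the read-and-count-and-write code described above. The mount paths are placeholders I am assuming for illustration, not the exact paths from the video:

```python
# spark is the SparkSession that Databricks provides in every notebook.

# Read sales.csv from the mounted input container into a DataFrame.
# /mnt/input is an assumed mount path, not the one shown in the video.
df = spark.read.option("header", True).csv("/mnt/input/sales.csv")

# Check how many records the DataFrame holds (799 in the demo).
print(df.count())

# Write the DataFrame out as CSV to a destination folder; the part-file
# names inside that folder are generated by Spark at runtime.
df.write.csv("/mnt/output/sales")
```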
We are able to access these locations because we have already created the mount point. If you are not comfortable with mount points, you can watch the Databricks tutorial playlist, where mount points are covered. Here in the output location we have this sales folder, and under that folder we will see the file; the file names are generated at runtime. Let me execute it: it simply goes and creates the folder and loads the data, and we can see it executed. Let me go back to the output container, where we should see the sales folder, and there it is. Let me open it; here we can see the latest file, which contains the data. If we open it and go into Edit and then Preview, we can see it does not have the header. So the first problem is that we don't have the header. How can we add it? Just as we did while reading, we can mark the header option as true under the options, using the same command here.

Now if we execute this, it will throw an error saying the file is already available, and indeed we can see the file is already at the destination. How can we avoid that? We can use the mode: .mode, and under it we can simply use overwrite. Let me execute it. The existing file is replaced by the new one, and that file should have the header. Let me refresh and go here; this file name was generated by PySpark. Let me check the preview: first, we can see we have the header, and it is there just because of the change we made. So overwrite is working as expected.

Now, as we can see, the data has 799 rows in total. First, let me read the data back from this location, the CSV file we just wrote, to check how many rows it has. For that we can use another DataFrame: spark.read.csv, with this path specified inside csv, and since this CSV file also has the header, we use the option with the header value as true. Now let me check the count: this time it should have 799 rows, and as we can see, it does.

Next, let me try append. Append will add the records a second time, so 799 multiplied by 2 means it should have 1598 rows. We execute it and can also verify: 1598 rows. In the case of ignore, as I told you, it does nothing at the destination, but your command still executes successfully. As you can see, the command executed but nothing happened, because no Spark job ran on the cluster; you can see here that no Spark job appears. So nothing happened at the destination, and we can verify it still has the same 1598 rows. The last one is error. In the case of error, as I told you, it throws an error that your file already exists.

Thank you so much for watching this video. I hope you have understood how we can use the write modes in real time. Thank you so much, and see you in the next video.
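For reference, here is a consolidated sketch of the write-mode commands from the demo above. It assumes the same placeholder mount path, /mnt/output/sales, and the row counts in the comments are the ones observed in the video:

```python
# overwrite: replace whatever is at the destination,
# this time writing the header row as well.
df.write.option("header", True).mode("overwrite").csv("/mnt/output/sales")

# Read the output back to verify the row count (799 rows in the demo).
df2 = spark.read.option("header", True).csv("/mnt/output/sales")
print(df2.count())  # 799

# append: keep the existing files and add the data again,
# so the destination ends up with 799 + 799 = 1598 rows.
df.write.option("header", True).mode("append").csv("/mnt/output/sales")

# ignore: the command succeeds but no Spark job touches the destination,
# so the count stays at 1598.
df.write.option("header", True).mode("ignore").csv("/mnt/output/sales")

# error (the default, also spelled "errorifexists"): raises an
# AnalysisException because the destination path already exists.
df.write.option("header", True).mode("error").csv("/mnt/output/sales")
```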