Hello, welcome back to the channel. This is a continuation of the Azure Databricks tutorial series, and today we are going to discuss the file system utilities. Before going forward, if you haven't watched the last video of this series, I would strongly recommend watching it, because there we discussed DBFS, the Databricks File System. So let's get started with today's video. What is the file system utility? Inside Databricks we have several utilities, and one of them is the file system utility; others include the data utility, the notebook utility, and more. We'll cover the other utilities in later videos, but in this one we are focused on the file system utility. So what is it? The file system utility lets you access the Databricks File System (DBFS), making it easier to use Azure Databricks as a file system. To list all the available commands, we can run dbutils.fs.help() — we'll see this in practice, so don't worry about the command yet. Why are we so concerned with learning the file system utility? Because if you want to work with files and the data inside them, you need to understand it: it provides multiple commands that are helpful for working with that data. And what is dbutils? dbutils is a module available directly in Databricks notebooks. Under dbutils we use fs for the file system, and help() lists all the commands under fs. In total there are seven different commands. The first is cp, which means copy: it copies a file or directory from one location to another inside DBFS. Let's go into the Azure Databricks browser and try it in practice.
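Before diving in, here is a quick reference sketch of the seven commands the video walks through. Note that dbutils.fs itself only runs inside a Databricks notebook; the one-line summaries below paraphrase the dbutils.fs.help() output described above.

```python
# The seven dbutils.fs commands covered in this video.
# Signatures and defaults paraphrased from the dbutils.fs.help() output.
FS_COMMANDS = {
    "cp":     "cp(from, to, recurse=False): copy a file or directory within DBFS",
    "head":   "head(file, maxBytes=65536): return up to the first maxBytes of a file as a string",
    "ls":     "ls(dir): list the contents of a directory",
    "mkdirs": "mkdirs(dir): create a directory, including any missing parents",
    "mv":     "mv(from, to, recurse=False): move a file or directory",
    "put":    "put(file, contents, overwrite=False): write a string to a file",
    "rm":     "rm(dir, recurse=False): remove a file or directory",
}

for name, summary in FS_COMMANDS.items():
    print(f"dbutils.fs.{name}: {summary}")
```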
So here I have already created a notebook, and I'm going to start directly with it; to save time I have also started the cluster we created earlier. If we browse DBFS, the Databricks File System — remember, in the last video we uploaded this employee.csv file. Our first task is to copy that file from its location under /FileStore into another folder we will create, called output, and keep the file there. So how do we do that? First we use dbutils.fs. Let me show you the available commands by executing dbutils.fs.help(). It lists all the commands; if we scroll down under fsutils, the first one is cp, which copies from one location to another, as we can see from its input parameters: first is from as a string, then to as a string, then recurse, a Boolean with a default value of false. We'll see when to use recurse later in this video, so don't worry for now. To use the command, we write dbutils.fs.cp and then specify the source path and the destination path. We can go into the file in the UI and copy its path from there — so this is our source. From this location we want to copy the file into the output folder, so we point the destination at output. What it will do is create the output folder and then copy the file from FileStore into it. Let me execute it and we'll see.
As we can see it returns true, which means the command executed successfully. Let me go back and browse DBFS again. Here we can see the output folder, and inside it the employee.csv file. So we successfully created the folder and copied the file from FileStore to output. Next we want to copy the whole output folder into an input folder on DBFS: under the DBFS root we will have another folder called input, and everything inside output should be copied into it. How do we do that? Our source is now the output folder, so instead of supplying a file path we supply the folder path, and the destination is simply input. Now let me execute it. It returns an error. Why? Because we have not set recurse to true. Whenever we are working with directories, we have to set recurse to true. Why is that required? Let's assume the output folder currently contains only one file, employee.csv — but in general it may contain multiple files and even subfolders. With recurse set to true, the command walks into each file and folder recursively and copies everything from output to input. That's why recurse must be true here. Let me execute it by pressing Shift+Enter. It returns true, so the command executed successfully, and if we browse DBFS again we see the input folder with the file inside it. Everything that was inside output has been copied into input.
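Since dbutils.fs only runs inside a Databricks notebook, here is a minimal local Python sketch of the same semantics: copying a single file works directly, while copying a whole directory needs the recursive form, mirroring recurse=True. All paths below are made up for the demo.

```python
import shutil
import tempfile
from pathlib import Path

# Throwaway workspace standing in for the DBFS root.
root = Path(tempfile.mkdtemp())
(root / "FileStore").mkdir()
(root / "FileStore" / "employee.csv").write_text("id,name\n1,Alice\n")

# dbutils.fs.cp(src_file, dst) analogue: a single-file copy needs no recursion.
(root / "output").mkdir()
shutil.copy(root / "FileStore" / "employee.csv", root / "output" / "employee.csv")

# dbutils.fs.cp(src_dir, dst_dir, recurse=True) analogue: copying a whole
# directory tree requires the recursive variant.
shutil.copytree(root / "output", root / "input")

print(sorted(p.name for p in (root / "input").iterdir()))  # ['employee.csv']
```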
Now let me upload two files under FileStore and then we'll try copying them from FileStore to the output folder. Let me upload a new file here — a sales file — and click Done. We now have two files here, and we want to copy everything under FileStore to output. So this time the source is the FileStore folder itself and the destination is the output folder. Press Shift+Enter — it executes successfully. Going back to browse DBFS, in the output folder we now see two files, employee and sales. Notice that we already had an employee file there; it got replaced during the copy. That's all about copy, the cp command; now on to the next one. Let's look at ls first. In the last video we already saw ls, but let me recap here. We can list the files in a particular directory using the ls command. So let me write dbutils.fs.ls, and this time supply a slash: the slash makes it return all the entries under the root directory. Let me wrap it in the display command so the result is rendered as a table. Here we see a total of six folders, but two of them are not visible in the UI, as we saw in the last video; the other four folders we can see. That's the use of ls for listing files. Next is mkdirs, make directories. From the help we can see it returns a Boolean value, true or false, and it creates the given directory.
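The listing that display(dbutils.fs.ls("/")) renders as a table can be sketched locally like this, with a throwaway directory standing in for the DBFS root and made-up folder names:

```python
import tempfile
from pathlib import Path

# Throwaway directory standing in for the DBFS root.
root = Path(tempfile.mkdtemp())
for name in ("FileStore", "input", "output"):
    (root / name).mkdir()

# dbutils.fs.ls("/") analogue: one row per entry, with a trailing slash on
# directory names and the entry size, roughly what the table view shows.
listing = [(p.name + ("/" if p.is_dir() else ""), p.stat().st_size)
           for p in sorted(root.iterdir())]
for name, size in listing:
    print(name, size)
```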
If we read the description: it creates the given directory if it does not exist, also creating any necessary parent directories. What does that mean? Let's say the requirement is to have a folder called temp, and under temp another folder called input. The temp folder does not exist yet, and we want to create the input folder under it. With mkdirs we can create both folders in one call. So let me use dbutils.fs and specify the path: temp as the parent folder, and under it the input folder we want to create. Executing it, I get an error saying the command is not valid — the exact name is mkdirs. Using the correct command, dbutils.fs.mkdirs, and executing again, it returns true. If we go and check, we now have a temp folder with the input folder under it. So mkdirs creates directories, and if a parent directory is missing it creates that too. Now the next one: mv, move. It moves a file or directory from one location to another, and the same recurse behavior applies here as we saw with copy. Let me explain with a requirement. Inside temp we don't have any files yet, while inside FileStore we have these two files, and we want to move them from there to temp. How do we do that? Let me copy the path first, go back, and use dbutils.fs.mv, then the source.
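The mkdirs behavior described above — creating the target directory and any missing parents in one call — has a direct local analogue in pathlib (the temp/input path is made up for the demo):

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# dbutils.fs.mkdirs("/temp/input") analogue: parents=True creates the missing
# temp parent as well, and exist_ok=True makes a repeat call succeed quietly.
target = root / "temp" / "input"
target.mkdir(parents=True, exist_ok=True)

print(target.exists())  # True: both temp and temp/input now exist
```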
So this is our source, and for the destination, instead of FileStore we say temp. This moves only a single file. Execute it — it returns true. If we go back, the employee file is no longer under FileStore, and under temp we now see that file. Let me upload the file again so we have two files once more. This time we want to move all the files under FileStore to the temp folder, so instead of supplying a file name we supply the folder path for both the source and the destination, and we need to set recurse to true. Execute it — and we get an error: simply put, we cannot move the FileStore folder itself to another location. So let me do one thing: let me use the input folder instead and try to move it from its location to temp. Execute it — true. Going back, we should no longer see input at the root; we should see it under temp. Browsing DBFS confirms it: input is gone from the root, and inside temp we find input. Now the next command, put. Before looking at put, we need to read the data from the CSV file. Inside FileStore we have this employee.csv file; let me copy its path and go back. Here we have to write some Spark code — learn and remember it, because we'll use it very often. To read the data from a file we use spark.read.option; since our file has a header row, we specify the header option as true.
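The two mv cases above — moving a single file, then moving a whole folder — can be sketched locally with shutil.move; the FileStore/temp/input names are just stand-ins for the DBFS paths in the video:

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "FileStore").mkdir()
(root / "FileStore" / "employee.csv").write_text("id,name\n1,Alice\n")
(root / "temp").mkdir()

# dbutils.fs.mv(src_file, dst) analogue: move a single file into temp.
shutil.move(str(root / "FileStore" / "employee.csv"), str(root / "temp"))

# dbutils.fs.mv(src_dir, dst, recurse=True) analogue: move a whole folder.
(root / "input").mkdir()
(root / "input" / "sales.csv").write_text("id,amount\n1,100\n")
shutil.move(str(root / "input"), str(root / "temp" / "input"))

print(sorted(p.name for p in (root / "temp").iterdir()))  # ['employee.csv', 'input']
```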
Then, since this is a CSV file, we call csv with the path. Let me put the result in a DataFrame and display the data. We can see the contents: this CSV file has columns such as id, name, age, and department. When we use the put command it writes data into a particular file — but it also replaces the existing data. How can we check that? Let me write dbutils.fs.put and look at the syntax: put takes the file path, then the contents you want to write, and then the overwrite parameter, which is false by default but can be set to true. Going back, we supply the file path — let me paste it — and then the contents. Our contents will be something like a new row: an id of 6, a name, an age of 25, and a department of manager. The third argument is overwrite; remember, the file already exists, so you have to set overwrite to true, otherwise the command fails. If put is creating a brand-new file, this value can stay false; if the file already exists, you must set it to true. Now let me read the file again to see what data it holds. The command executed successfully, but we can see only the new row — the data that was present earlier is gone. Now let me go to the last one, rm, which means remove.
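The put behavior described above — refuse to write over an existing file unless overwrite is true, and replace the whole contents when it is — can be sketched locally like this; the put helper and file path are hypothetical stand-ins for dbutils.fs.put:

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
path = root / "employee.csv"
path.write_text("id,name,age,department\n1,Alice,30,HR\n")

def put(file: Path, contents: str, overwrite: bool = False) -> bool:
    """Local sketch of dbutils.fs.put: write a string to a file, refusing
    to clobber an existing file unless overwrite=True."""
    if file.exists() and not overwrite:
        raise FileExistsError(f"{file} already exists; pass overwrite=True")
    file.write_text(contents)
    return True

# Without overwrite=True the call fails, just like in the video.
try:
    put(path, "6,Bob,25,Manager")
except FileExistsError as err:
    print("refused:", err)

# With overwrite=True the file is replaced: the old rows are gone.
put(path, "6,Bob,25,Manager", overwrite=True)
print(path.read_text())  # only the new row remains
```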
If you want to remove a particular file or directory, you can do it with the rm command. Let me try to remove the temp folder we created. I use dbutils.fs.rm, supply the temp path, and execute. It should remove everything — but as we can see it fails, because recurse is left as false. We have to set it to true. Why? Again, this folder contains multiple files, and rm has to remove all of them recursively; that's why recurse must be true. Now the output is true, the command executed successfully, and we can see the temp folder is gone. That's almost everything for the file system utility — we have left out one command, head. What does head do? It returns up to the first maxBytes of the given file as a UTF-8 encoded string. What does that mean? Let's use it and see in practice. So dbutils.fs.head, then we supply the path — we can copy that path and paste it here. What is the next parameter? As we can see, the first parameter is the file, then maxBytes, which is optional, so we can skip it. Execute it — it returns the whole contents. By default head returns up to 65536 bytes, and this file has far fewer characters than that, so we see the entire file. But if we want only part of the output, we can supply a smaller maxBytes: for example, with 3 we see only the first three characters, which includes the comma as well.
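The head and rm behavior just described can be sketched locally as well: reading up to the first maxBytes of a file, and removing a non-empty folder with the recursive form. The head helper and paths below are hypothetical stand-ins for the dbutils.fs calls.

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
path = root / "employee.csv"
path.write_text("1,Alice,30\n2,Bob,25\n")

def head(file: Path, max_bytes: int = 65536) -> str:
    """Local sketch of dbutils.fs.head: return up to the first max_bytes
    of the file, decoded as UTF-8."""
    with open(file, "rb") as fh:
        return fh.read(max_bytes).decode("utf-8")

full = head(path)        # the whole file: it is far smaller than 65536 bytes
first_three = head(path, 3)
print(full)
print(first_three)       # first three characters only, comma included: '1,A'

# dbutils.fs.rm(dir, recurse=True) analogue: removing a non-empty folder
# needs the recursive form.
shutil.rmtree(root)
print(root.exists())     # False: the folder and its contents are gone
```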
So that is the use of the head command, and that's all about the available commands inside the file system utility. If you still have any doubt about any of these commands, you can post your question in the comment box and I will try to clarify it there. Thank you so much for watching this video. If you liked it, please subscribe to the channel for many more videos. See you in the next one.