Hello, welcome to SSUniTeX. This is a continuation of the Azure Databricks tutorial. Today we are going to start with the Databricks File System: what it is, how we can access it inside Databricks, how we can upload a file into DBFS, and how we can read data from that file. So let's get started.

First we have to understand what DBFS is. The Databricks File System is a distributed file system that is mounted into the Azure Databricks workspace. What does "mounted" mean? Mounted means it is attached inside Databricks: a connection is created, and through that connection the file system is attached inside the Databricks workspace. It is also available on the Databricks clusters. DBFS is an abstraction on top of scalable object storage. The default storage location in DBFS is also known as the DBFS root, and inside the root we will see three folders: first is the FileStore, second is databricks-datasets, and third is databricks-results. These three folders are created by default when the workspace is created. We will be able to see only the first one, FileStore, inside the UI; the last two folders are not directly visible in the UI, so we have to write some code to access those two locations. Don't worry, we will access them by writing code later in this video.

So what kind of information do these three folders hold? The FileStore holds imported data files, generated plots, and uploaded libraries; we will also be uploading files directly into the FileStore. Next is databricks-datasets, which contains samples of public datasets, so you can access them and play with the data. We will see that.
Last is databricks-results: the files generated when you download the full results of a query are kept in this folder. Now let's see where the data actually lives. We are working with data, and that data sits in a storage location, which could be your Azure Blob Storage. Through DBFS we access this data, and the metastore lives in Databricks. So we can divide the picture into three parts: here is Databricks, here is your actual data stored inside Azure Blob Storage, and DBFS makes the connection and attaches that storage inside Databricks.

Now let's go to the browser and see this in practice. I have already logged in to Databricks, and we created a cluster in the previous videos. If you haven't watched those videos, I would strongly recommend watching them; I will provide the links in the description of this video. Now, if you go directly into the Data pane, you cannot see any option for DBFS. So how do we access DBFS here? For that you have to click on your login email ID and go to the Admin Console. Inside the Admin Console you will see tabs like Users and Groups; jump to the Workspace Settings tab. Under Workspace Settings, scroll down to the Advanced section, where you will see the DBFS File Browser option, which is disabled by default. We have to enable it. Once it is enabled, refresh and reload the page, and after that we will be able to see DBFS inside the Data pane. Let me quickly go to the Data pane. At the top we can now see Browse DBFS; earlier we were unable to see DBFS here. Let me click on that. Here, as you can see, we have the FileStore.
Under that we can see this file, ns.xlsx, and this folder. When you log in here for the first time you will not see this ns file; I uploaded it earlier, which is why it appears. Under Tables we don't have anything as of now. At the top we can see the path. The dbfs: prefix is optional: we can either provide it or leave it off, and if we leave it off, it is assumed automatically. Now, as I told you in the slide, we have three folders: FileStore, databricks-datasets, and databricks-results. But here we can see only FileStore; the other two folders are not visible. To access those, we have to go into a notebook, so let's create one and read them from there.

In another window I have logged in to Databricks. Let me go into the workspace, to my user folder, and create a new notebook. Let me call this notebook something like "1. demo". The default language is Python, or we can select SQL, Scala, or R; I am going with Python. After that comes the cluster: this is the cluster we created earlier, so let me click on Create. Once we click on Create, it creates the new notebook. Let me close this dialog. This is the notebook, and here we can see the cluster is terminated, so let's start the cluster; it will take around two to three minutes. Now, for reading a folder path we have a utility called dbutils: you have to remember this. After dbutils comes fs for the file system, then ls for listing files and folders, and then we simply provide a backslash as the path. If we execute this, it will display all the folders and files available under the DBFS root directory. Let me try to execute it.
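The listing step described above can be sketched as a single notebook cell. This is a minimal sketch, assuming it runs inside a Databricks notebook, where `dbutils` is a predefined global (it does not exist outside the Databricks runtime):

```python
# Inside a Databricks notebook: `dbutils` is provided by the runtime.
# List everything directly under the DBFS root.
entries = dbutils.fs.ls("/")

# Each entry is a FileInfo with path, name, size, and modificationTime.
for e in entries:
    print(e.path, e.name, e.size)
```

On a default workspace this should print the three root folders from the slide, including the two (databricks-datasets and databricks-results) that the UI does not show.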
It will take a little time because the cluster is not up and running yet, so let's wait until it starts and the command finishes. We got a syntax error because we have to use a forward slash; let me execute it again. Now it returns all the files, but the output is not very readable, so we can use the display function. The display function arranges the output in a table format, which we can see here. The first column is the path, followed by the name, the size, and the modification time. The first entry, FileStore, is the one we could see in the UI, but the other two were not visible, as I explained in the slide; those two folders are also available here. If you want to see how many files are under one of these folders, we can copy its path and specify it in place of the plain slash. The dbfs: prefix is optional: you can use it or skip it, and either way it returns the same output. Under databricks-datasets we can see some sample data for COVID, so let me quickly go inside this folder and see how many files and folders are there. Once we execute it, it returns the files and folders under that path. Let me go into the second path as well: we simply copy the path, specify it inside ls, and it returns the files and folders under that particular location. Going one level deeper, we can see a single file, which is a CSV file. We want to read the data from that CSV file and see what kind of data it has. For reading the data from any particular file location we have to use spark.read.option, and in the option we can specify the header.
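The point above, that the dbfs: prefix is optional and both spellings refer to the same location, can be illustrated with a small pure-Python sketch. Note that `normalize_dbfs_path` is a hypothetical helper written just for this illustration; it is not part of the Databricks API:

```python
def normalize_dbfs_path(path: str) -> str:
    """Hypothetical helper: strip the optional 'dbfs:' scheme prefix,
    so 'dbfs:/FileStore' and '/FileStore' refer to the same location."""
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    # Ensure the result is rooted, matching how DBFS paths are written.
    return path if path.startswith("/") else "/" + path

print(normalize_dbfs_path("dbfs:/databricks-datasets/"))  # -> /databricks-datasets/
print(normalize_dbfs_path("/databricks-datasets/"))       # -> /databricks-datasets/
```

Both calls print the same path, which is why `dbutils.fs.ls("dbfs:/databricks-datasets/")` and `dbutils.fs.ls("/databricks-datasets/")` return identical output in the notebook.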
The header is true, so we specify the header option as true. After the option we simply call csv and specify the path; this syntax you have to remember. Let me copy the path and specify it here, and then assign the result to a DataFrame. If you are not very clear about all this, don't worry: in upcoming videos we will look at DataFrames in detail and how you can read all of this. For now, just understand that we can read the data from files. Now let me show the data. We can use either the show command or the display command; let me show you both. With show, it returns the data as you can see here, but some of the values are truncated, which you can tell from the three dots. So instead of show, I always prefer the display command, which returns the data as a table; the table format is much easier to read. It is executing, and here we can see all the table data. That is all about databricks-datasets.

Now let me quickly try to upload a file. Before uploading it, let me open the file first so I can show you what kind of data it has. This is an Employee.csv file, and it has columns for id, name, age, and department. Let's upload that file into a DBFS location and then read the data. We can go here, and at the top we can see the option to upload; let me select it. I want to keep it under the FileStore only, so click on Upload. Here we can browse: let me quickly select the Employee.csv file and click on Done. The Employee.csv file should now be available.
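The read-and-display steps above can be sketched as notebook cells. This is a minimal sketch, assuming it runs inside a Databricks notebook, where `spark` and `display` are predefined globals; the CSV path is a placeholder, since the exact databricks-datasets file used in the video is only shown on screen:

```python
# Inside a Databricks notebook: `spark` and `display` are runtime globals.
# Path below is a placeholder -- substitute the CSV path copied from dbutils.fs.ls.
csv_path = "/databricks-datasets/<folder>/<file>.csv"

# header=True tells Spark the first row holds column names.
df = spark.read.option("header", True).csv(csv_path)

df.show()     # console-style output; long values get truncated with "..."
display(df)   # Databricks-only: renders the DataFrame as a scrollable table
```

The `show`/`display` contrast is exactly the one described above: `show` truncates wide values, while `display` renders a full table.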
Now let me quickly go back into the notebook and try to read the data from that Employee.csv file. Let me run the listing again: it returns all the folders available under the DBFS root directory, and under FileStore we should have the Employee.csv file. Let me copy that path, specify it here, and execute. This DataFrame now holds the Employee.csv data. Let me show that data using the display command: you can see the id, name, age, and department columns. That is all about DBFS and how you can work with it. If anything is not clear, go back and practice, and remember the dbutils library. In upcoming videos we will look at Databricks Utilities in detail: how many types of utilities are available and how we can use them all. Thank you so much for watching this video. If you liked it, please subscribe to our channel to get many more videos. See you in the next video.
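The full upload-then-read sequence can be summarized in one sketch. Again this assumes a Databricks notebook (`dbutils`, `spark`, and `display` are runtime globals), and it assumes the file was uploaded directly under FileStore as in the video, so the exact path may differ in your workspace:

```python
# Inside a Databricks notebook.
# 1. Confirm the uploaded file landed under FileStore.
display(dbutils.fs.ls("/FileStore/"))

# 2. Read it into a DataFrame, using the first row as column names.
emp_df = (spark.read
          .option("header", True)
          .csv("/FileStore/Employee.csv"))  # assumed upload location

# 3. Render it as a table: id, name, age, department.
display(emp_df)
```

Note that `spark.read.csv` with only the header option infers every column as string; casting age to an integer is a DataFrame topic for the upcoming videos.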