Hello, and welcome to SSUnitech. This is a continuation of the PySpark tutorial. In the previous video of this series, we covered the introduction to PySpark and why PySpark is important to use in Azure Databricks. If you haven't watched that video, I will provide the link in the description of this video. Today we are going to look at DataFrames.

So what is a DataFrame? Before we jump into the practical sessions, it is very important to understand the DataFrame, because in PySpark everything we do will be handled through DataFrames. A DataFrame is a distributed collection of data organized into named columns. For example, if you have some data, that data is organized in a distributed way as a table, so we have columns and rows. It is conceptually equivalent to a table in a relational database. As I said, it is very similar to the tables we have already seen in SQL Server, where we have a table name, the column names, and the rows as data. In PySpark, Python, or R, that structure is called a DataFrame. It is also an API, so in simple words we can say the DataFrame is an API.

How does a DataFrame work in practice? Let's first understand the flow of how data is processed. First we have some source data, and that source data could be in a CSV file, a JSON file, or any other format. Once we have this data, the first thing that comes into the picture is reading it. How can we read the data? For reading we need an API, and that API is called the Data Source API. Here we use the reader method. So in simple words, first we have the source data.
We need to read that data, and for reading it we need something. That something is the Data Source API, which helps us read the data. Once we have read the data, we need to keep it somewhere, and that place is the DataFrame. So the DataFrame comes into the picture after reading: we read the data, and then we store it in a DataFrame.

Once we have a DataFrame, we can also create another DataFrame from it. Let's call the first one DataFrame 1. You can simply think of a DataFrame as a table. So we have this table, and after applying some transformations to it we create another table. For those transformations we need the DataFrame API, which helps us transform the data. After transforming the data, we load it into another DataFrame. So DataFrame 1 is table one, and after making some changes to that table we create table two, which is again a DataFrame.

Once we have made all the changes and have a final DataFrame, that DataFrame has to be loaded into the sink. This is where the write method comes into the picture. Whether we are loading the data somewhere or creating a file and dumping the data into it, that is the write part: we are writing something into the sink. Here we are writing table two into the sink. The sink could be ORC format, Parquet format, or any other file format, or we could load the data into a database such as SQL Server. This is the flow we will normally follow in the upcoming videos, and this flow is very important. For writing the data into the sink, we again need an API.
That API is again the Data Source API, but this time we are writing into the sink, so we use the DataFrame write method. For reading we used the reader method; here we use the write method.

So let's summarize. We have some source. First we need to read the data, and for reading it we need an API, which is the Data Source API. After reading it, we store it in a temporary location. If you are familiar with SQL Server, you can think of something like a temporary table; the DataFrame is very similar to a temporary table. We create a temporary table, or DataFrame, which I am calling table one here, and then we apply some transformations to the data in table one. After making those changes, we create another temporary table that holds the transformed data, which is table two. That is again a DataFrame. After making all these changes, we write that data into the sink. We write the table two data, because this is the actual data we need to process and load into the sink. In table one we have the raw data, and after processing we have table two, the formatted data, and this formatted and transformed data is what gets loaded into the sink location.

So I hope you now have a little bit of an idea about the DataFrame. In the next video we will see how we can use this in practice: how to create a DataFrame, how to read the data, and how to write the data. We will cover all of that in the next videos. Thank you so much for watching this video. If you liked this video, please subscribe to our channel to get many more videos. See you in the next video.