Hello, welcome to SSUniTek. This video is a continuation of the PySpark tutorial. In this video we are going to see how to read data from single-line JSON, multi-line JSON, and complex (nested) JSON files. Before going to the browser, let's see how many JSON formats we are dealing with. In total there are four: the single-line JSON file, the multi-line JSON file, the single-line complex JSON file (where nested values are present), and the multi-line complex nested JSON file. Let me quickly open the files and look at each of these formats.

First is the single-line JSON file. Inside a single-line JSON file, all the values of a record sit in a single row; we can see the id, name, age, and department all on one line, with the second record on the next line, and so on, each record on its own single line. Next is the multi-line JSON file. The data is the same, but here each record is spread across multiple rows: the id comes on one row, then the name, the age, and the department, each on its own row. Next is the nested single-line JSON file. Here the data is still in a single row, but it contains nested values: for example, we have name, age, and address, three values in the outer JSON, and under address there are again two values, one for the city and one for the state. That is a single-line JSON file with complex data. The last one is the nested multi-line JSON file.
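The four layouts described above can be sketched with small illustrative samples (the field names and values here are made up for illustration, not the exact files shown in the video). Parsing them with Python's json module confirms the structure:

```python
import json

# 1) Single-line JSON (JSON Lines): one complete record per line.
single_line = ('{"id": 1, "name": "Asha", "age": 30, "department": "HR"}\n'
               '{"id": 2, "name": "Ravi", "age": 28, "department": "IT"}')

# 2) Multi-line JSON: one record spread over several lines.
multi_line = '''[
  {
    "id": 1,
    "name": "Asha",
    "age": 30,
    "department": "HR"
  }
]'''

# 3) Nested single-line JSON: one line, but with an inner object.
nested_single = ('{"name": "Asha", "age": 30, '
                 '"address": {"city": "Pune", "state": "MH"}}')

# 4) Nested multi-line JSON: the inner object is also spread over rows.
nested_multi = '''[
  {
    "name": "Asha",
    "age": 30,
    "address": {
      "city": "Pune",
      "state": "MH"
    }
  }
]'''

rows = [json.loads(line) for line in single_line.splitlines()]  # parse line by line
records = json.loads(multi_line)                                # parse the whole document
nested = json.loads(nested_single)
nested_rows = json.loads(nested_multi)
print(rows[0]["department"], records[0]["id"], nested["address"]["city"])
```
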
In this one the values are spread across rows, and the address is itself split into further rows: under the address we have the city and the state, each represented on its own row. So those are the four formats; let's try to read data from all of them. Let me quickly go to the browser, where I have already placed files in all four formats. On Azure Databricks, under this notebook, we can see the cluster is up and running.

First we need to read the data from the single-line JSON file. That is very straightforward, because single-line (one JSON record per line) is Spark's default: we simply use the spark.read method, call json, and specify the path. We have already created the mount point that points to the input location, and under it we can see the employee single-line JSON file. We copy that name, specify it in the path, load the result into a data frame, and display the data frame. Executing it, we can see the data directly; there is no big challenge, we can simply fetch all the data and read it. Next, to read the data from the multi-line JSON file, we make one small change to this code: we add an option setting multiline to true. Once multiline is specified, we are able to read the data from the multi-line JSON file.
Now it says this location is not available, so let me quickly check the file name: it is employees ML, so we have to use employees ML. Executing again, we can now see the data from the multi-line JSON file. So all the data can be fetched simply by setting the option multiline to true. And what happens if we do not specify multiline? Let me remove that option and execute: the read breaks, because by default Spark treats each line as a separate JSON record, so a multi-line file cannot be parsed that way. For multi-line files we have to set the multiline option; single-line is the default.

Now on to the next one. Here we need to be more careful, because for the complex single-line JSON file we should specify the schema before fetching the data. It is a nested JSON file, as we can see: the address holds two more columns, one for the city and one for the state. So it is a little bit complex. What do we have to do? We declare the schema in two steps. First we declare a schema for the address, which has only two columns, city and state. Then we declare one more schema for the whole JSON, and there, for the address column, we specify the data type as the address schema we declared. Let me do that, so it will be easy to understand. Remember that for declaring a schema we import from pyspark.sql.types: we need StructType, StructField, IntegerType, and StringType. These four types will be enough to declare the schema. The first schema to declare is for the address, so I am going to create the address schema.
To specify a schema, we first use StructType, and inside its brackets we add the columns using StructField. StructField asks for three parameters: the first is the column name, the second is the data type, and the third says whether the column is nullable. The first field is city; its data type is a string, so we add StringType, and for the last parameter I will specify False. The next field is state, again a string, so we add it the same way. That completes the address schema.

Now we have to declare the schema for the customer. I will add a customer schema in the same way: a StructType, and inside it the StructFields for all the required columns. The first column is name, then the age, and then the address. So: name, with StringType as its data type, and the last parameter marked as False. The second column is age, and instead of string this will be an IntegerType column. The last one is address, and here we need to make sure its data type is the schema we created, the address schema. This is the one thing you have to remember: if you have any complex JSON, first declare the schema of the inner JSON, and in the outer schema specify that inner schema as the data type of the complex column. That's it. Now let me read the data into a nested single-line JSON data frame, using the spark.read method.
Here we specify the options: we are fetching from a single-line JSON file, which is Spark's default, and we also need to specify the schema with the schema method. This is what you have to remember: for the complex file we have to pass the schema in addition to the options. Then we simply call json and read the data from that JSON file at the input location; the file name, the customer nested single-line JSON file, we can copy from here and specify in the path. Using the display command on the data frame and executing it, we get nulls for the age and the address, because we did not spell those columns in the schema exactly as they appear in the file: age should be in lowercase, and address was missing its d. The schema's field names must match the JSON keys exactly. Fixing the names and re-executing, all the data is reflected: we see the name, the age of 30, and under the address the city and the state. So we are successfully able to read the complex JSON.

Let me quickly cover the last option, the multi-line complex JSON file. Here we use the same query, but we add the option multiline set to true, change the single-line references to multi-line, and rename the data frame as well. Executing it, everything else remains the same and we can see the data, now fetched from the multi-line complex JSON file.

Let me recap what we have understood here. First, fetching data from a single-line JSON file needs nothing special: you can specify a schema or skip it.
If you specify the schema, the column data types will follow your schema; otherwise Spark will infer them from the data. Next, for the multi-line JSON file, we simply set the option to multiline instead of relying on the single-line default, as we saw. But when fetching data from a complex JSON file, we first declare the schema of the inner JSON value, as we did for the address; then we declare the outer JSON schema, and for the address column we specify the schema we declared. The rest of the read remains the same: we pass that schema and use it. If we then go and inspect the column data types, we see exactly the types we declared: string for the name, integer for the age, and string for the city and the state. So whatever we specify, it works accordingly. For the last option it again remains the same, except that under the options we mark the JSON as multiline. And if you have multiple columns holding complex JSON values, then, just as we added a schema for the address, you can declare a schema for each of those columns and plug them into the final schema. So this is the way we can simply read JSON files. I hope, guys, you have understood how we can read the data from all these JSON formats. Thank you so much for watching this video. If you liked it, please subscribe to our channel to get many more videos. See you in the next video.