Hello, and welcome to SSU. This video continues our PySpark tutorial. In it we are going to look at joins: which types of join are available in PySpark and how to use each of them.

Today's agenda: first the inner join, then the outer join. In PySpark, outer means the full outer join. After that come the left outer join and the right outer join, and finally anti and semi, two additional join types that PySpark provides.

The inner join works just as we have seen in SQL Server: it returns the records that match between both data frames. The outer join returns the matching records plus the unmatched records from the left side and from the right side. The left join returns all the matching records plus the remaining records from the left data frame. The right join returns all the matching records plus the remaining records from the right data frame. Anti returns the records from the left data frame that are not available in the right data frame. Finally, semi works like an inner join, but it returns only the matching rows of the left data frame, with only the left data frame's columns.

Let me quickly go to the browser and try this out in practice. Here I am creating two data frames: the first for employees and the second for departments. The employee data frame has the employee ID, name, department ID, gender and salary. The department data frame has only two columns: the department name and the department ID. To create these data frames we use spark.createDataFrame. This function first asks for the data, that is, the actual rows, and next for the schema of that data.
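To check the definitions above, here is a small plain-Python model of the six join types, written so it runs without Spark. It is only a sketch of the semantics, not how Spark executes a join. The rows and the column names (`emp_id`, `dept_id`, `dept_name`) are hypothetical, chosen to match the shape of the video's data: six employees, one of them in department 50, which has no department row, and a Sales department with no employees. Where Spark would show nulls for the missing side, the model uses `None`.

```python
# Plain-Python model of PySpark join semantics (a sketch, not Spark's
# implementation). All sample rows and column names are hypothetical.

employees = [  # employee 6 is in department 50, which does not exist
    {"emp_id": 1, "dept_id": 10}, {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 30}, {"emp_id": 4, "dept_id": 10},
    {"emp_id": 5, "dept_id": 20}, {"emp_id": 6, "dept_id": 50},
]
departments = [  # Sales (40) has no employees
    {"dept_name": "IT", "dept_id": 10}, {"dept_name": "HR", "dept_id": 20},
    {"dept_name": "Finance", "dept_id": 30}, {"dept_name": "Sales", "dept_id": 40},
]

VALID_TYPES = ("inner", "full", "left", "right", "semi", "anti")


def matches(l, r, key):
    # Null keys never match, as with SQL equality.
    return l[key] is not None and l[key] == r[key]


def join(left, right, key, how):
    """inner/full/left/right return (left_row, right_row) pairs, with None
    for the missing side; semi/anti return left rows only."""
    if how not in VALID_TYPES:
        raise ValueError(f"unknown join type: {how!r}")
    if how == "semi":
        return [l for l in left if any(matches(l, r, key) for r in right)]
    if how == "anti":
        return [l for l in left if not any(matches(l, r, key) for r in right)]
    # Matched pairs are common to inner, left, right and full.
    pairs = [(l, r) for l in left for r in right if matches(l, r, key)]
    if how in ("left", "full"):
        pairs += [(l, None) for l in left
                  if not any(matches(l, r, key) for r in right)]
    if how in ("right", "full"):
        pairs += [(None, r) for r in right
                  if not any(matches(l, r, key) for l in left)]
    return pairs


for how in ("inner", "full", "left", "right", "anti", "semi"):
    print(how, len(join(employees, departments, "dept_id", how)))
# prints: inner 5, full 7, left 6, right 6, anti 1, semi 5 (one per line)
```

These counts line up with the ones described in the video. Semi and anti return left rows only, which is why no department columns appear in their output.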
Those are the two parameters needed to create any data frame. For the employee data frame, likewise, we can see the data holding the actual rows and then the schema describing the table structure. Now let me execute this cell. It should create the two data frames and show their values, and now we can see both: the first for the employees and the second for the departments.

Now let's start with the joins. First I just want the matching records between these two data frames. So I take the employee data frame and call join on it, and the right data frame will be the department data frame. The next parameter of join is the join condition, that is, the basis on which we want to join. We want to join on the department ID, so we write the employee data frame, dot, department ID. (If we scroll up a little, we can see the department ID column in the employee data frame.) On the other side we use the department ID column of the department data frame. The last parameter is the type of join to apply. First I am going to use inner, so let's run the inner join and look at the matching records between these two data frames.

It says this data frame is not defined. If we scroll up a little, we can see the variable name was written with DF in capital letters. Python variable names are case sensitive, and that is why it failed. Let me correct the name, execute it, and we will see the output. Now we can see it is returning all the matching records. If we scroll up, we can see the employee data frame has 6 records in total, but the output shows only 5. One record is missing because its department ID is 50.
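The inner join call takes three arguments: the right data frame, the join condition and the join type. The sketch below mimics that shape in plain Python, with the condition passed as a function of one left row and one right row. The `create_frame` helper, the sample rows and the column names are hypothetical illustrations; the commented PySpark line assumes data frames named `emp_df` and `dept_df`. The model keeps both `dept_id` columns, because a condition-based join in Spark returns the key column from each side.

```python
# Plain-Python model of a condition-based inner join, mirroring (names assumed):
#   emp_df.join(dept_df, emp_df.dept_id == dept_df.dept_id, "inner")


def create_frame(data, schema):
    """Hypothetical helper: pair each row tuple with the column names,
    like spark.createDataFrame(data, schema) with a list of names."""
    return [dict(zip(schema, row)) for row in data]


emp_df = create_frame(
    [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meena", 30),
     (4, "John", 10), (5, "Sara", 20), (6, "Amit", 50)],  # hypothetical rows
    ["emp_id", "name", "dept_id"],
)
dept_df = create_frame(
    [("IT", 10), ("HR", 20), ("Finance", 30), ("Sales", 40)],
    ["dept_name", "dept_id"],
)


def inner_join(left, right, condition):
    """Keep every (left, right) pair for which the condition holds.
    Spark keeps both dept_id columns (both named dept_id); the 'left.' and
    'right.' prefixes here only keep the dict keys distinct."""
    result = []
    for e in left:
        for d in right:
            if condition(e, d):
                row = {f"left.{k}": v for k, v in e.items()}
                row.update({f"right.{k}": v for k, v in d.items()})
                result.append(row)
    return result


joined = inner_join(emp_df, dept_df, lambda e, d: e["dept_id"] == d["dept_id"])
print(len(emp_df), len(joined))                       # 6 5
print(sorted(row["left.emp_id"] for row in joined))   # [1, 2, 3, 4, 5]
```

If you only need one copy of the key, PySpark also accepts the column name itself, as in `emp_df.join(dept_df, "dept_id", "inner")`, which returns a single `dept_id` column.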
Department ID 50 is not available in the department data frame, so employee ID 6 does not appear in the output, as we can see here.

Next we can use outer. As I told you, outer is treated as a full outer join, so we simply specify outer here and execute it. It returns the matching records between both data frames, which we can see in rows 1 to 4 and row 6. Here we also see nulls: the Sales department has no employees in the employee data frame, so the employee columns are null for Sales. Similarly, department 50 has no entry in the department data frame, so its department columns are null. So the outer join returns the matching records between both data frames, plus the unmatched records from the right data frame and from the left data frame, which we can see in rows 5 and 7. Instead of outer we can also use fullouter; both work the same, as we can see here, returning the same output. You can also use full_outer, which returns the same thing, or just full. Any of these can be used to apply the join.

The next type of join is the left join. I am going to use left here and execute it. This time it should return 6 records: 5 records are matched between both data frames, and one record has no match in the right data frame, the one with department ID 50. That record appears here too, so this works as a left join. You can also use leftouter here, which returns the same thing. The r is missing, so let me fix the typo and execute it again; it returns the same thing. Or you can use left_outer; any one of these works.

Let me go to the next type of join, the right join. This time it returns all the matching records, the 5 matched rows, which we can see in the first 4 rows and the last row. The one unmatched row is the same Sales department we saw in the outer join.
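PySpark accepts several spellings for each join type. The sketch below models how they collapse to one join, based on Spark's rule of lower-casing the string and removing underscores before matching it (my reading of Spark's `JoinType` parser; the canonical names on the right side of the table are labels chosen for this sketch, not Spark identifiers). It also shows why a spaced spelling such as `"full outer"` or a misspelling such as the hypothetical `"leftoute"` is rejected.

```python
# Sketch of join-type normalization, modelled on Spark's parsing rule:
# lower-case the string, drop underscores, then look it up.
# The canonical labels (values) are chosen for this sketch.

JOIN_TYPES = {
    "inner": "inner",
    "cross": "cross",
    "outer": "full_outer", "full": "full_outer", "fullouter": "full_outer",
    "left": "left_outer", "leftouter": "left_outer",
    "right": "right_outer", "rightouter": "right_outer",
    "semi": "left_semi", "leftsemi": "left_semi",
    "anti": "left_anti", "leftanti": "left_anti",
}


def normalize_join_type(how: str) -> str:
    """Map any accepted spelling to one canonical label, or raise ValueError."""
    key = how.lower().replace("_", "")
    if key not in JOIN_TYPES:
        raise ValueError(f"Unsupported join type {how!r}")
    return JOIN_TYPES[key]


for spelling in ("outer", "fullouter", "full_outer", "full",
                 "left", "leftouter", "left_outer", "right_outer",
                 "anti", "left_anti", "semi", "leftsemi"):
    print(spelling, "->", normalize_join_type(spelling))

# A hypothetical misspelling and a spaced spelling are both rejected.
for bad in ("leftoute", "full outer"):
    try:
        normalize_join_type(bad)
    except ValueError as err:
        print(err)
```

Because underscores are removed before the lookup, `left_outer` and `leftouter` are the same join, and so are `left_anti`/`leftanti` and `left_semi`/`leftsemi`.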
That unmatched row comes from the right data frame and is not available in the left data frame. You can also use rightouter or right_outer, whichever you prefer.

The next type of join is anti. Let's use anti and execute it. It returns all the records from the left data frame that are not in the right data frame. As we can see in the output, it returns the record with department ID 50, because department ID 50 is not available in the department data frame. Notice one more thing: it returns only the columns of the left data frame, and from the department data frame on the right we see nothing. With the other join types, such as inner, left outer and right outer, we see data from both the left and the right data frames. But anti and semi return only the columns of the left data frame, as we can see here. Instead of anti we can also use leftanti, which does the same thing, or left_anti. So we have the flexibility to use anti, leftanti or left_anti; any of them works.

The last type of join is semi. Semi keeps the same left rows that an inner join would match, but it returns nothing from the right data frame: only the columns of the left data frame, and each left row at most once, even if it matches several right rows. Let me use semi and execute it. We can see the 5 matching records, but no column from the right data frame. Instead of semi we can use leftsemi or left_semi; all of these work as a semi join.

I hope you have understood how to use all these types of joins. I will provide this data frame creation script in the description of this video, so you can use it whenever required. Thank you so much for watching this video.
See you in the next video.