Hello, welcome to SSUnitex. This is a continuation of the PySpark tutorial series. In this video we are going to look at the GroupBy function, the aggregate functions, and the agg function. As per today's agenda, first we will see GroupBy, then the aggregate functions, and last the agg function, along with real-time uses of all of these.

Let me quickly go into the browser and try this in practice. Here I am going to read the data from a CSV file. First I declare the schema for the file: here SOID is an integer, and Quantity and Value are integers as well. If you read the data directly from a CSV file without specifying a schema, every column will be returned as a string. So make sure the data types are proper while you are reading the file. Now we can see all this data.

Let's start one by one. First is GroupBy. GroupBy is very similar to what we have already seen in SQL Server, if you have any idea of that. GroupBy is used to group rows on specified columns, and immediately after it we can use an aggregate function to compute an aggregate value for each group. Our requirement is to check which items are here and what the count of each of those items is. As we can see, we have 799 rows in total, and we want to know how many times each item appears in those rows. So simply we can use df, then GroupBy, because we want to group on the item name: first we do the GroupBy, then we can apply the aggregate function. Inside the GroupBy we specify the item name, and next we use the aggregate function; I am going to use count, which is the first aggregate function we will try. Let me put this into a data frame.
That will be df1, and let me display this data frame. We should see two columns, as we can see: the item names and then the count. We can see that null has a count of 299 rows; the item names are missing there, which is why null is coming. Then we can see counts like 17, 17, 17, then 16, so per item name we are getting the count. Simply put, we use GroupBy and then whatever aggregate function we want.

Next, let me check the maximum quantity per item name. We can use the same query up to the GroupBy, but where we used count as the aggregate function, we now use max and also specify the column name, which is Quantity. Let me execute. Here we can see the maximum quantity of this item is 8, and similarly we can see all the other rows.

Now let me quickly check the minimum quantity. For that, instead of max we use min. Execute, and the minimum quantities are coming.

Next we want the average quantity per item, so we can use the average function, which is avg. Let me execute; it returns the average quantity per item.

Next we want the total quantity sold per item. We can use the same query, and this time instead of max we use the sum function. Let me execute, and this is the total quantity.

Next we want both the total quantity and the total value, that is, the sum of Quantity and of Value grouped by item name. So we want to use one more column here. If we are applying the same aggregate function to both, we can just specify the second column after a comma, then Value, and when we execute it will work.
As we can see, it returns two columns, first for the quantity and second for the value, along with the item names.

The next requirement is to include the item code in the group by. As of now we have only one grouping column, so for the item code we can write item code, comma, item name; both go in the group by. Let me execute, and it returns the output: as we can see, item code, item name, quantity, and value are all here. If you have more than two columns, you can add as many columns inside the group by as you need. Similarly, if we have the same type of aggregate function for several columns, we can also apply it to them directly.

Let's assume we want the total quantity and the average value. How can we do that? If we try to chain the .sum and .avg functions directly like this, sum of the quantity and then average of the value, it will not work, because we cannot specify multiple types of aggregation directly. What we have to do is use the agg function. The agg function indicates that we have multiple types of aggregates inside it: first the sum of Quantity and second the average of Value, and we wrap both inside agg. Let me execute, and this should work. As we can see, it returns the sum of the quantity and the average of the value, along with item name and item code; all four columns are here.

This is all about GroupBy, the aggregate functions, and the agg function. I hope you have understood how to use all of these in PySpark. Thank you so much for watching this video. See you in the next video.