In the previous subsection of tutorial 4 we were visualizing data sets, either created synthetically or sample data sets, using different plots in 1D, 2D and 3D. Now let us move on to part 2, which deals with univariate and multivariate statistics. First, let us look at the measures of central tendency. Central tendency is nothing but the middle point of a distribution, and such measures are also called measures of location. For example, we have the arithmetic mean, which is a single number that represents the whole data set. Now the arithmetic mean does have its own disadvantages: even though it is easy to compute, it can be affected by extreme values, outliers that are not representative of the rest of the data. This means that in a set of data, say the marks obtained by students sitting in my class, if there are any extremely high or extremely low values, then the mean is not a good representative measure to rely on. We do have other measures of central tendency like the weighted mean or the geometric mean, and we also have the median. We have seen the median while dealing with the median filter in the tutorial on speckle, isn't it? The median is nothing but a single value from the data set that represents the central value. Let us take the same example, that is, the marks obtained by students. To find the median we first arrange the data set, that is, the marks, in either increasing or decreasing order, and then we pick the middle value. The advantage over the mean is that the median is not affected by extreme values. But then there are certain statistical procedures utilizing the median that can get slightly more complex than when we use the mean. Shown here are also the measures of dispersion, or spread of data, in a distribution. In the previous slides we were trying to define location and scale: location is a measure of central tendency, and scale is a measure of dispersion.
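The point about the mean being pulled by outliers while the median stays put can be sketched quickly in Python. The marks below are made-up values for illustration, not the actual class data:

```python
# Sketch: mean vs. median on hypothetical student marks,
# before and after adding one extreme value (an outlier).
import statistics

marks = [45, 52, 58, 61, 63, 65, 68, 70]
print(statistics.mean(marks))    # arithmetic mean of the 8 marks
print(statistics.median(marks))  # middle value after sorting

# append one extreme value and compare again
marks_with_outlier = marks + [100]
print(statistics.mean(marks_with_outlier))    # shifts noticeably
print(statistics.median(marks_with_outlier))  # barely moves
```

The mean jumps by several marks once the outlier is added, while the median only moves to the next value in the sorted list.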
Now let us look at multivariate statistics, and let us think of n as the total number of samples in an image. First we will try to understand what covariance is. Here I am going to consider two random variables X and Y. Now X and Y can be anything; they can be, say, the digital numbers of the same image captured in different regions of the electromagnetic spectrum: X can be the DN values captured in band 1, Y can be the DN values captured in band 2, and so on. The covariance between two random variables is a measurement of the nature of the association between the two. First let me show you the expression for covariance, and then we will discuss it. Here k and j refer to the bands, as I specified, band 1 and band 2; mu k and mu j refer to the sample means captured in band k and band j respectively; and small n is nothing but the total number of samples in the image. Now look at this expression closely. Say you have two variables X and Y; I will use those because they are familiar to all of us. Here, instead of DN i, j I am using X, and instead of DN i, k I am using Y, for the sake of clarity. If large values of X go with large values of Y, and small values of X go with small values of Y, then a positive X minus mu will be paired with a positive Y minus mu, and a negative X minus mu with a negative Y minus mu, which means the product (X minus mu)(Y minus mu), that is, the numerator of the expression, will tend to be positive. Now let us consider the reverse case.
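The expression described above can be sketched directly in Python. The band values below are invented for illustration, and I assume the population convention of dividing by n, as in the slide's wording; the sample convention divides by n - 1 instead:

```python
# Sketch: covariance between two hypothetical "bands" X and Y,
# computed from (1/n) * sum((x_i - mu_x) * (y_i - mu_y)).
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])  # e.g. DN values in band j
y = np.array([12.0, 24.0, 33.0, 47.0])  # e.g. DN values in band k
n = len(x)

# deviations from the respective sample means
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / n
print(cov_xy)  # positive here: large X values go with large Y values
```

Because large x values are paired with large y values in this toy data, every term in the sum is positive and so is the covariance, exactly as the argument above predicts.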
Say larger values of X go with smaller values of Y; then the product (X minus mu)(Y minus mu), that is, the numerator, will tend to be negative. So what I am trying to point at is that the sign of the covariance, whether it is positive or negative, indicates whether the relationship between the two random variables is positive or negative. Of course, when X and Y are statistically independent it can be shown that the covariance is 0, but the converse is not generally true. Nevertheless, moving on, here is a quick question for you: can we create the variance-covariance matrix for two or more bands of data, and if we have two bands of data, what is going to be the size of the variance-covariance matrix? We will see how to achieve this using Python, and at that time we will discuss the size of the variance-covariance matrix. Moving on, we have another measure known as correlation. Correlation is used to estimate the degree of relationship between variables in a manner that is not influenced by the measurement units. For this we use the correlation coefficient; you can think of it as a scale-free version of the covariance. It is widely used in statistics, and we use correlation analysis as a statistical tool to describe the degree to which one variable is linearly related to another. Notice the terminology: correlation is a scale-free version of covariance, and we use it to describe the degree, the strength, to which one variable is related to the other. So, as before, let us see how to achieve this using Python. Previously we stopped at how to get the PDFs for random variables corresponding to the normal distribution, and then we saw the same for the chi, exponential and uniform distributions. Now let us try to compute the measures of central tendency for the temperature variable. Let us see the data as before.
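The "scale-free" claim can be checked with a small sketch: dividing the covariance by the product of the standard deviations cancels any change of measurement units. The data below are invented for illustration:

```python
# Sketch: the correlation coefficient is the covariance scaled by the
# standard deviations, so changing measurement units does not affect it.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])
y = np.array([12.0, 24.0, 33.0, 47.0])

def corr(a, b):
    # covariance numerator divided by the product of the spreads
    a_dev, b_dev = a - a.mean(), b - b.mean()
    return (a_dev * b_dev).sum() / np.sqrt((a_dev**2).sum() * (b_dev**2).sum())

print(corr(x, y))
# rescale x (e.g. Celsius to Fahrenheit): the coefficient is unchanged
print(corr(x * 1.8 + 32, y))
```

The linear rescaling multiplies both the numerator and the denominator by the same factor, which is why the coefficient is unit-free while the covariance is not.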
I have removed the empty blank columns in the excel sheet; that is why you do not see the NaN values towards the right-hand side, and just the data is being displayed now. Now I want to estimate the mean of the variable, the variable being temperature, so I am going to use the data.mean() command, a simple, straightforward command. Say I want to estimate the standard deviation, one of the measures of dispersion, or spread of the data; again I am going to use the command data.std(). Similarly, I can get the value of the variance with .var(), and let me quickly type the commands to get the minimum value of the temperature variable, .min(), and the maximum value, .max(). I want to print all of these as well: the mean, standard deviation, variance, minimum value and maximum value. I am not going to display the range because that is self-explanatory. Let me quickly type the names of what I want to display while I take care of the syntax: minimum of temperature, and then maximum. So there I have the measures of central tendency and the measures of dispersion being displayed. You can even round off these values, because you can see that there are many digits after the decimal point. Now, to compute correlation values I am going to use the precipitation variable here; it is read into precip, or you could even use X. I am going to read temperature into temp; previously we worked with Y. Say I want to print Pearson's correlation coefficient; I am going to type .corr(). And let us look at Spearman's rho, Spearman's correlation coefficient: here the variable is precip, again with .corr(), and I am going to specify the method as Spearman. Similarly, we can have Kendall's tau as well; let me specify the method. A small syntax error, yes, fixed. So now you see Pearson's correlation coefficient, Spearman's rho and Kendall's tau displayed.
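The steps narrated above can be sketched as follows. Since the actual excel sheet is not reproduced here, the column names "Temp" and "Precip" and the values themselves are assumptions made up for illustration:

```python
# Sketch: descriptive statistics and correlation coefficients with pandas,
# on a small made-up stand-in for the temperature/precipitation sheet.
import pandas as pd

data = pd.DataFrame({
    "Temp":   [21.5, 23.1, 19.8, 25.4, 22.0, 24.3],
    "Precip": [80.0, 65.0, 95.0, 40.0, 70.0, 55.0],
})

temp, precip = data["Temp"], data["Precip"]
print(round(temp.mean(), 2))  # mean, rounded to trim the decimals
print(round(temp.std(), 2))   # standard deviation
print(round(temp.var(), 2))   # variance
print(temp.min(), temp.max()) # minimum and maximum

# correlation between precipitation and temperature, three methods
print(precip.corr(temp))                     # Pearson (the default)
print(precip.corr(temp, method="spearman"))  # Spearman's rho
print(precip.corr(temp, method="kendall"))   # Kendall's tau
```

With this toy data the two variables move in opposite directions, so all three coefficients come out strongly negative.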
As discussed earlier, I can similarly compute and display the variance-covariance matrix. Here there are two variables, so the size of the variance-covariance matrix is going to be 2 x 2. Similarly, I can compute the correlation matrix, which is also going to be 2 x 2. Let us move to the third subsection of this tutorial, namely regression analysis, and to be more specific, linear regression. Shown here is a scatter plot; I have used the same example that was shown in the previous slide, that is, on the x-axis you have the marks obtained by students in a class during the middle of the semester, and on the y-axis I have the marks obtained by the same students towards the end of the semester. Along with the scatter plot I have drawn a regression line that is put in place by fitting the line visually among the data points. In regression analysis we always have something known as an independent variable, or known variable, and the variable which we are trying to predict, namely the dependent variable. In this case the relationship is linear, but relationships can also be inverse; the independent variable causes the dependent variable to change. Mathematically, the equation of a straight line is nothing but y equals mx plus c, where y is the dependent variable, x is the independent variable, m is the slope of the line and c is the intercept. Without going into too much detail, let us see how to fit a regression line over a scatter plot using Python. Here, to avoid repetition, I have typed in the commands. You can see I have used ax.plot for both the variables, that is, precipitation and temperature, and I want to display the r squared value as well. As before, I have used plt.gca(), plt.legend() and plt.show().
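The computations behind that figure can be sketched as below, leaving out the plotting calls so the numbers stand on their own. The column names and data are the same made-up stand-in as before, not the actual sheet, and fitting y = mx + c with np.polyfit is one common choice rather than necessarily the exact commands used in the recording:

```python
# Sketch: 2 x 2 variance-covariance and correlation matrices, plus a
# least-squares regression line y = m*x + c and its r squared value.
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Temp":   [21.5, 23.1, 19.8, 25.4, 22.0, 24.3],
    "Precip": [80.0, 65.0, 95.0, 40.0, 70.0, 55.0],
})

print(data.cov())   # variance-covariance matrix: 2 x 2 for two variables
print(data.corr())  # correlation matrix: also 2 x 2

# fit a straight line with x = temperature, y = precipitation
m, c = np.polyfit(data["Temp"], data["Precip"], deg=1)
predicted = m * data["Temp"] + c

# r squared: 1 minus (residual sum of squares / total sum of squares)
residual_ss = ((data["Precip"] - predicted) ** 2).sum()
total_ss = ((data["Precip"] - data["Precip"].mean()) ** 2).sum()
r_squared = 1 - residual_ss / total_ss
print(f"y = {m:.2f}x + {c:.2f}, r^2 = {r_squared:.3f}")
```

To reproduce the figure, the same `m`, `c` and `r_squared` would be passed into the ax.plot and legend calls mentioned above.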
So this is how you get a linear regression line displayed on top of a scatter plot, along with the equation and the r squared value. Till now we have been seeing how to visualize a set of data points. We started with sample data of precipitation and temperature; we saw how to create box plots and plots in 1D, 2D and 3D respectively; we also learnt how to work with empirical CDFs and how to display the PDFs of random variables pertaining to certain specific distributions; and we touched upon linear regression. In the next section of this tutorial we shall learn more about what autocorrelation is, what lag is, and we shall touch upon hypothesis testing. Thank you.