The goal of this program is to calculate the correlation coefficient, r, which tells how strong the linear relationship is between two sets of numbers, one an array called x and the other an array called y. And this is the formula we're going to use. Let's solve this problem by hand, and as we go along, observe what we're doing and write it down as part of our Java pseudocode, half English and half Java.

Here's the start of the pseudocode. We'll want a method called getCorrelation that takes two arrays of double as its parameters and returns the correlation coefficient. That will be public static double getCorrelation, which takes a double array x and a double array y. We'll fill this in as we keep going through the process.

Let's continue with the calculations. We'll start with the numerator. Let me explain some of this terminology. This x with the bar above it stands for the average of the x array. We read this symbol as x bar. Similarly, y bar is the average of the y array. What we're going to do is take each individual item from the x array and subtract the x average. That's called the deviation from the average, also called the deviation from the mean. Then we'll take the corresponding entry from the y array minus its mean, multiply those together, and add them all up. That's what this symbol stands for. It's the Greek letter sigma and it stands for sum. That's S-U-M.

We're going to need the x deviations in both the numerator and the denominator, and the same for the y deviations. So let's create two new arrays called xDeviations and yDeviations. To fill in the xDeviations array, we need to know the average of x. 5 plus 3 is 8, plus 2 is 10, plus 6 is 16, divided by 4 is 4. 5 minus 4 is 1, 3 minus 4 is negative 1, 2 minus 4 is negative 2, and 6 minus 4 is 2.

Now we'll do something similar for y. 15 plus 8 is 23, plus 19 is 42, plus 2 is 44. That's an average of 44 divided by 4, or 11.
15 minus 11 is 4, 8 minus 11 is negative 3, 19 minus 11 is 8, and 2 minus 11 is negative 9.

Since we've gone through the same series of operations twice, it's worth writing a method that takes an array and returns the array of deviations. Let's add that to our pseudocode. We'll have getDeviations, which takes an array of double and returns a new array of double with the original data's deviations from the mean. Again, mean is another way of saying average. That will be public static, returning a double array, named getDeviations, and we'll give it an array and just call it data, which is a good name for a placeholder.

What are the steps we're going to need to do? Get the average of the data array, using a loop to total it. Then we're going to run a loop through each item in data. Nope, I can't do that yet. First I have to create a new array the same length as the data array for my result. Now that I've done that, I can loop through the data array, subtracting the mean from each item and storing the result in my result array. And then I'll return the result array.

I'll call this method twice in getCorrelation. I'll have a double array called xDeviations, and I'll fill that in by calling getDeviations with x, and I'll do the same for yDeviations, using my method with y as the argument.

Back to the calculations. I now need to multiply the corresponding elements of the deviation arrays and add them up. That will give me my numerator, and I may as well use the spreadsheet to help me with the calculations. That's going to be 1 times 4, which is 4, plus negative 1 times negative 3, which is 3, plus negative 2 times 8, which is negative 16, plus 2 times negative 9, which is negative 18. That adds up to negative 27, and there's my numerator.

Let's put that in our pseudocode. We'll set the numerator to 0, and then run a loop that multiplies each element of xDeviations by the corresponding element of yDeviations and adds that product to the numerator. This is a good place to take a break.
We'll continue with the calculation in the next video.
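
The pseudocode worked out so far could be sketched in Java roughly as follows. This is only a sketch under a few assumptions, not the finished program: the class name CorrelationSketch and the helper getNumerator are made up for illustration (the video puts the numerator loop inside getCorrelation, which can't be completed until the denominator is worked out), and the code assumes both arrays are non-empty and the same length.

```java
import java.util.Arrays;

// Hypothetical class name, chosen only for this sketch.
class CorrelationSketch {

    // Returns a new array holding each item's deviation from the mean of data.
    // Assumes data has at least one element.
    public static double[] getDeviations(double[] data) {
        // Use a loop to total the data, then divide to get the average.
        double total = 0.0;
        for (double value : data) {
            total += value;
        }
        double mean = total / data.length;

        // Create the result array first, the same length as data.
        double[] result = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = data[i] - mean;
        }
        return result;
    }

    // Hypothetical helper (not named in the video): the numerator of r,
    // the sum of the products of corresponding deviations.
    // Assumes x and y have the same length.
    public static double getNumerator(double[] x, double[] y) {
        double[] xDeviations = getDeviations(x);
        double[] yDeviations = getDeviations(y);

        double numerator = 0.0;
        for (int i = 0; i < xDeviations.length; i++) {
            numerator += xDeviations[i] * yDeviations[i];
        }
        return numerator;
    }

    public static void main(String[] args) {
        // The sample data from the hand calculation.
        double[] x = {5, 3, 2, 6};
        double[] y = {15, 8, 19, 2};

        System.out.println(Arrays.toString(getDeviations(x))); // [1.0, -1.0, -2.0, 2.0]
        System.out.println(Arrays.toString(getDeviations(y))); // [4.0, -3.0, 8.0, -9.0]
        System.out.println(getNumerator(x, y));                // -27.0
    }
}
```

Running main on the sample data reproduces the numbers from the hand calculation: deviations of 1, negative 1, negative 2, and 2 for x; 4, negative 3, 8, and negative 9 for y; and a numerator of negative 27.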