The goal of this program is to calculate r, the correlation coefficient, which tells us how strong the linear relationship is between two sets of numbers: one an array called x and the other an array called y. Let's solve this problem by hand, and as we go along let's stop, figure out what we're actually doing, and write notes to ourselves that will help when we translate it to Java.

The first thing we need is the average of x and the average of y; that's x-bar and y-bar. We're going to need those twice, so it's worth the trouble to create two variables called x-bar and y-bar. For x-bar: 5 plus 3 is 8, plus 2 is 10, plus 6 is 16, and 16 divided by 4 is 4. Similarly for y: 15 plus 8 is 23, plus 19 is 42, plus 2 is 44, and 44 divided by 4 is 11. Since we did the same thing twice, this is a clue about something we'll need when we write our program: we'll want a method that takes an array and returns its average. That's public static double average. We'll give it a double array called data, and that's all it needs. Some code goes in there.

To get the numerator, we take each item in the x array, subtract the x average, and multiply the result by the corresponding entry in the y array minus the y average. That would be 5 minus 4, which is 1, times 15 minus 11, which is 4. Then 3 minus 4 times 8 minus 11, then 2 minus 4 times 19 minus 11, and so on. But this subtraction is going to have to happen twice. We need all of those x minus x-bar items once in the numerator and again in the denominator, so it's worth making an array for them. Let's create one called x minus x-bar and another called y minus y-bar. Let's do x minus x-bar first: 5 minus 4 is 1, 3 minus 4 is negative 1, 2 minus 4 is negative 2, and 6 minus 4 is 2.
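Before going on to the y deviations, here is a minimal sketch of the average method noted above. The signature follows what was said (a double array called data); the class name AverageSketch and the decision to leave an empty array unhandled are assumptions for illustration only.

```java
public class AverageSketch {
    // Returns the arithmetic mean of the values in data.
    // Assumption: data has at least one item; an empty array would give NaN.
    public static double average(double[] data) {
        double sum = 0.0;
        for (double value : data) {
            sum += value;
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        // The same numbers worked by hand above.
        double[] x = {5, 3, 2, 6};
        double[] y = {15, 8, 19, 2};
        System.out.println("x-bar = " + average(x)); // x-bar = 4.0
        System.out.println("y-bar = " + average(y)); // y-bar = 11.0
    }
}
```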
Similarly, for y minus y-bar we have 15 minus 11 is 4, 8 minus 11 is negative 3, 19 minus 11 is 8, and 2 minus 11 is negative 9. Now I have these available to use again without having to redo the calculation.

Time to write that down. We're going to write a method that takes an array and an average and returns a new array with each item minus the average. These, by the way, are called the deviations from the mean. It's going to be a public static method that returns a double array. It calculates the deviations for a data array and its average, and the code will go in there.

Now that we have these arrays, let's look at our numerator. The numerator is the sum of these products: 1 times 4, which is 4, plus negative 1 times negative 3, which is 3, plus negative 2 times 8, which is negative 16, plus 2 times negative 9, which is negative 18. That works out to 4 plus 3 is 7, 7 minus 16 is negative 9, and negative 9 minus 18 is negative 27. There's our numerator.

Let's write down some notes about what we just did. Here's what our main method looks like so far. We read in the x array and the y array by prompting the user for them. Then x-bar is the average of x, and y-bar is the average of y. Once we have those, we can create x minus x-bar, which is the deviations of x and x-bar, and y minus y-bar, which is the deviations of y and y-bar. I haven't declared these arrays and I haven't put in the data types. This is pseudocode: I'm doing this half in English, half in Java. I'm just trying to get an outline of what has to happen to make this work. Next, my numerator is calculated by running a loop that adds up the products of the corresponding entries in x minus x-bar and y minus y-bar. That's where I am so far.

Now let's think about the denominator. Let's concentrate on the first part of the denominator.
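As a checkpoint on the numerator before the denominator work, here is a rough sketch of the deviations method and the product loop. The name deviations and its two parameters come from the notes above; the class name NumeratorSketch and the helper sumOfProducts are hypothetical (the notes describe a loop in main, not a separate method), and the averages 4 and 11 are typed in directly from the hand calculation.

```java
import java.util.Arrays;

public class NumeratorSketch {
    // Returns a new array holding each item of data minus the given average
    // (the deviations from the mean).
    public static double[] deviations(double[] data, double average) {
        double[] result = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = data[i] - average;
        }
        return result;
    }

    // Hypothetical helper: adds up the products of corresponding entries.
    // Assumption: a and b have the same length.
    public static double sumOfProducts(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {5, 3, 2, 6};
        double[] y = {15, 8, 19, 2};
        double[] xMinusXBar = deviations(x, 4.0);   // x-bar from the hand calculation
        double[] yMinusYBar = deviations(y, 11.0);  // y-bar from the hand calculation
        System.out.println(Arrays.toString(xMinusXBar)); // [1.0, -1.0, -2.0, 2.0]
        System.out.println(Arrays.toString(yMinusYBar)); // [4.0, -3.0, 8.0, -9.0]
        System.out.println("numerator = " + sumOfProducts(xMinusXBar, yMinusYBar)); // numerator = -27.0
    }
}
```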
I have to take everything in my x deviation array, square each item, and add them up. That's 1 times 1 plus negative 1 times negative 1 plus negative 2 times negative 2 plus 2 times 2, which is 1 plus 1 is 2, plus 4 is 6, plus 4 is 10. That's the first part of my denominator. I have to do the same thing for the second part: 4 times 4 plus negative 3 times negative 3 plus 8 times 8 plus negative 9 times negative 9. That works out to 16 plus 9 plus 64 plus 81: 16 and 9 is 25, 25 plus 64 is 89, and 89 plus 81 is 170. That means the product under the square root in my denominator is 10 times 170, which is 1700. As long as I'm here in a spreadsheet, I may as well have it do the square root for me, and that comes out to about 41.231.

Let's write down in our pseudocode, again half English and half Java, what we just did. The first item is the sum of all the items in x minus x-bar, each individually squared. The second item in the denominator is the sum of all the items in y minus y-bar, each individually squared. Again, I'm doing the same thing twice, so this is a good candidate for a method. I'm going to write a method that takes an array and returns the sum of all the squared items in that array. It returns a single number, I'm going to call it sum of squares, and I'm going to give it an array of doubles to work with. There'll be some code there. So, another way of saying this: my first item is the sum of squares of x minus x-bar, and my second item calls that method again with y minus y-bar. Now I have my denominator: it's the square root of my first item times my second item. Whether first item and second item are the best names for these, I'm not sure, but for right now, for my pseudocode, they'll do.

Now that I have my numerator and my denominator, I can finally calculate my correlation coefficient.
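The formula is never spelled out in words here, so as a reference, this is the standard correlation coefficient written with the pieces computed so far. The symbols x_i and y_i for the individual array entries are notation added for this summary.

```latex
\[
r \;=\; \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i} (x_i - \bar{x})^2 \,\cdot\, \sum_{i} (y_i - \bar{y})^2}}
  \;=\; \frac{-27}{\sqrt{10 \cdot 170}}
  \;=\; \frac{-27}{\sqrt{1700}}
  \;\approx\; \frac{-27}{41.2311}
\]
```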
That's going to be the numerator divided by the denominator. Let's go back to the spreadsheet and do that division. That's negative 27 divided by 41.23105626, which comes out to about negative 0.6548. If I wanted to check, by the way, I could also have the spreadsheet do the correlation for me. I ask for the correlation and give it the first array, B4 through E4, and the second one, B5 through E5. It turns out that what I did by hand is indeed the correct answer.

There's one thing I forgot to do in my pseudocode. I've calculated the correlation coefficient, but now I have to print it, properly labeled. And there's my pseudocode. Let me make this a little bit larger. Here's my main method, put together out of the parts I found out about while doing the problem by hand. To save myself a lot of repeated code, I have a method for the average, a method to calculate an array of deviations, and a method to do the sum of squares. That should give you enough information to start doing the assignment.
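For comparison with your own attempt, here is one possible translation of the pseudocode into Java. Treat it as a sketch under assumptions, not the expected solution: the class name CorrelationSketch, the readArray helper, the input format (one line of numbers separated by spaces), and the length check are all choices made for this example. The names firstItem and secondItem are carried over from the pseudocode.

```java
import java.util.Scanner;

public class CorrelationSketch {
    // Returns the average of the items in data (assumes at least one item).
    public static double average(double[] data) {
        double sum = 0.0;
        for (double value : data) {
            sum += value;
        }
        return sum / data.length;
    }

    // Returns a new array with each item of data minus the average.
    public static double[] deviations(double[] data, double average) {
        double[] result = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = data[i] - average;
        }
        return result;
    }

    // Returns the sum of the squares of the items in data.
    public static double sumOfSquares(double[] data) {
        double sum = 0.0;
        for (double value : data) {
            sum += value * value;
        }
        return sum;
    }

    // Hypothetical input helper: prints the prompt, then reads one line of
    // numbers separated by spaces. Assumes the line is not blank.
    public static double[] readArray(Scanner input, String prompt) {
        System.out.print(prompt);
        String[] tokens = input.nextLine().trim().split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        double[] x = readArray(input, "Enter the x values separated by spaces: ");
        double[] y = readArray(input, "Enter the y values separated by spaces: ");
        if (x.length != y.length) {
            System.out.println("x and y must have the same number of values.");
            return;
        }

        double xBar = average(x);
        double yBar = average(y);
        double[] xMinusXBar = deviations(x, xBar);
        double[] yMinusYBar = deviations(y, yBar);

        // Numerator: sum of the products of corresponding deviations.
        double numerator = 0.0;
        for (int i = 0; i < x.length; i++) {
            numerator += xMinusXBar[i] * yMinusYBar[i];
        }

        // Denominator: square root of the product of the two sums of squares.
        // If either array is constant, this is zero and r comes out as NaN.
        double firstItem = sumOfSquares(xMinusXBar);
        double secondItem = sumOfSquares(yMinusYBar);
        double denominator = Math.sqrt(firstItem * secondItem);

        double r = numerator / denominator;
        System.out.println("Correlation coefficient r = " + r);
    }
}
```

Entering 5 3 2 6 for x and 15 8 19 2 for y should print a value of roughly -0.6548, matching the hand calculation.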