 This is cluster analysis using basic Excel part two. This example is using some of the materials from John Foreman's great book, Data Smart. Recall, we are going to use the technique known as K-means clustering in which we evaluate our data points about our individuals, identify relationships in the database, and then assign individuals based on those characteristics into K groups, the number of groups, and then we try to minimize the distance from each individual to the group centroid. In this problem, you run JoyBags, wholesale wine and pourium up in New Jersey. It's an import-export business focused on bringing bulk wine to the States and then reselling it to Slake wine stores across the country. The way the business works is that JoyBags travels the globe finding incredible deals in large quantities and then resells it back in the States. Last year was a pretty good year, but you want to do better. You look over the data you collected. You keep in contact with your customers in a variety of ways, but you think the best thing are the email newsletters you send out about once a month. Last year you sent out, I think, 32 newsletters and in those newsletters you highlighted different varieties of wine and you gave information about the minimum quantity that people had to order, the discount, the origin, and whether or not that wine was past its peak. You have data on transactions, the orders from your customers, and you've got, I think, a total of 325 orders and you tracked the offer that the customer was responding to, so how can you use this data to better segment your customers? You think if you could find out which customer reacted to which newsletter and what part to the newsletter and trig them, you could really focus your advertising newsletters this year and do a lot better. But you have a limited advertising budget, so you wanna try to group the customers into these focused newsletters to optimize the results. In the middle school dance example, you saw that we had a space in which we had our individuals and the clusters. In this problem, we're gonna build a space, a two-dimensional space consisting of the 32 deals, 32 offers on one axis and the customers on the other axes. And in that matrix, we will show which newsletters they responded to and made an order in which they did not respond. On the transactions tab, we want to insert a pivot table. So I'm gonna just click inside my data, go to the insert tab, pivot table, and I wanna create a new worksheet, click okay. I'm gonna double click that and call that worksheet pivot, and then I wanna my offers to be the row variable. I want my customers to be the column variable, and I want to get a count of the offers. I drag that into the summation values area and then click the down area, go to the value field settings and select count. So I've got the basic matrix there with my offers in the y-axis and my customer names in the x-axis. And in the space, we have the locations of when a customer made an order in response to an offer. So you can see here, here's Alan made an order when he got or she got the offer number nine. Next, I wanna make a copy of the offers tab and name the copy matrix. So I'm gonna click on offer, right click and bring up mover copy, click on create a copy, click okay. Now I've got that tab and I'm gonna call it matrix. On the pivot tab, I wanna copy all of my data. So I'm gonna click there and start with Adams is the first customer's name and then drag this all the way down to the bottom of my orders, which is row 36. I don't need the grand total and then I'm gonna drag it to the right all the way through all my customers and I'm gonna stop there at CW 36 because I don't need the totals on the other axes. That's all selected and then I'm just going to right click and copy. Now I'm gonna go to the matrix tab and select cell H1 and then right click and then paste. And now I have created the basics of my matrix space because of a limited budget, you're only gonna do four special newsletters next year. So we need to modify our space in order to put in the four newsletter clusters. But we're not gonna do that on matrix. We're going to copy again, create a copy. And now I'm gonna rename this four matrix clusters. To make room for those clusters, I'm going to select the column H, highlight four, right click and hit insert. And that inserts four columns. And I'm just going to call these cluster one. What we're doing here in our matrix space, you can see we've got now our clusters here and we've got each of our customers and we're going to calculate the distance from a customer to a cluster and put it into the cells. That sounds a little bit tricky, but it's not if you pay attention to the steps I'm gonna show you. To help you understand how we're gonna find those distances, let's go back to the school dance. And here we have a student who's at location eight and two, eight on the vertical axis and two on the horizontal axis. And the centroid of the cluster is at four and four. So if we want to find the distance as the crow flies, that just happens to be the hypotenuse. And if you remember from geometry, Pythagorean theory says that the square of the hypotenuse is equal to the sum of the squares of the other two legs. So we've got those two legs. The vertical distance is just eight minus four or four. Horizontal distance is two minus four or two. So the distance as the crow flies is the square root of four square plus two square is equal to 4.47. We're back in our matrix space. And I want to show you how we convert that as the crow's fly distance into this space. Unlike the school dance, we really have more than two dimensions. We have to find the relationship of atoms to each one of the 32 offers. So that's 32 dimensions. And how do we run Pythagorean theorem on 32 dimensions? Well, I'll show you in a simplistic way how we do it. I've inserted two columns here. One I've labeled difference and one different square. And difference is just equal to the first cluster H2 minus the value in atoms in L2. And that is a zero. And of course, zero squared is zero. So I copied that formula down. We get to the first cell, the first offer where atoms has a real value of one. And here the difference is zero minus one, will be minus one, and we square that. So we continue that process all the way down to the bottom. And I've added in some labels here for distance to cluster one. And you can see that this value there is equal to the sum of those differences, sum of N2 to N33. And then we take the square root, getting back to the Pythagorean theorem. And that gives us say value from atoms in this space to cluster one of 1.732. Now we could add more columns and find the distance to cluster two, three, and four using the same process, but there's a shorter, quicker way using array functions.