Hello everyone, thank you so much for coming. My name is Megan. This is Amanda, Matt, Erin, and Logan. We looked at an exploration of cannibalization and market basket analysis for a furniture company. Cannibalization is when a company introduces a new product and is concerned that the new product is taking sales away from other products in its product line. Steelcase introduced a new chair called Series 1 in fall 2017, and they're concerned that this chair is taking sales away from their other chairs.

We received two main data sets from Steelcase: the pricing data set, which was pretty large at about 8 gigabytes and which the cannibalization team didn't look at as much, and the shipping data, which is what we focused on, looking at variables like product line codes, shipment dates, and quantity.

For the six chairs of interest, we summed the quantity ordered across all the years of the data set and made graphs, where each chart represents one chair. The vertical black line marks the chair of interest, Series 1, which was released in 2017. As you can see from this graph, it's very cyclical, and we think that's because the quantities were summed per month. So we decided to sum the quantities per year to get a smoother graph.

This graph shows the ordered quantity for all five other chairs of interest, summed per year. As you can see, there is an increase and then it starts to plateau. Again, the vertical black line is the introduction of the newest chair, and the steep drop-off right here is the 2019 data, where we only have two months. What's interesting is that if we zoom in, there is a second vertical black line, which was the introduction of the previous chair. Around the year that chair was introduced, there was a large spike, so we were expecting to see a similar spike with the introduction of this new chair, but we did not.

Yes. So to try to calculate any potential cannibalization from Series 1, we decided to perform linear regression on the other five chairs of interest. The goal was to predict what the sales for those five chairs would have been in 2018 had Series 1 not been introduced, and then compare those projected sales to the actual sales with Series 1. As Amanda said, the sales for these chairs are pretty cyclical, so we decided to do a prediction for each quarter. That gave us 20 regression equations, since there are four quarters and five chairs of interest. We used the 2013 to 2016 data, because we found that it gave us much more accurate predictions than using the entire data set.

This is just an example of one of our regression lines, for the third quarter for one of our chairs. As you can see, there are only four data points on this graph, so that's something to keep in mind going forward. The R-squared value for this particular regression line is 0.733.

Here we have a table of the residuals for each chair by quarter. That's just taking what the actual sales were for that quarter for that chair and subtracting what we projected the sales would have been had Series 1 not been introduced. As you can see, most of these residuals are negative, which means we predicted higher sales than what was actually sold for each chair. These bar graphs reiterate that.
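A rough sketch of the quarterly regression step described above, in Python. The data frame, column names, and chair labels here are made up for illustration and this is not the team's actual code; it just fits one line per chair per quarter on the 2013-2016 totals and extrapolates to 2018:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    def predict_2018_by_quarter(shipments: pd.DataFrame) -> pd.DataFrame:
        # Expects hypothetical columns: chair, year, quarter, quantity
        # (quantity already summed per chair, per quarter, per year).
        rows = []
        train = shipments[shipments["year"].between(2013, 2016)]
        for (chair, quarter), grp in train.groupby(["chair", "quarter"]):
            # One regression per chair per quarter: quantity ~ year,
            # fit on the four points from 2013-2016 (20 models in all).
            X = grp[["year"]].to_numpy(dtype=float)
            y = grp["quantity"].to_numpy(dtype=float)
            model = LinearRegression().fit(X, y)
            rows.append({
                "chair": chair,
                "quarter": quarter,
                "r_squared": model.score(X, y),
                "predicted_2018": float(model.predict(np.array([[2018.0]]))[0]),
            })
        return pd.DataFrame(rows)

    # Residual = actual 2018 sales minus predicted 2018 sales; a negative
    # residual means the chair sold less than the no-Series-1 projection.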
The bar on the left is what was actually sold for each chair, with the bar on the right being what we predicted for 2018. As you can see, for four of the five chairs of interest the prediction was slightly higher than what was actually sold; the exception was this one chair. That kind of indicates potential cannibalization. And then this bar graph compares the total number of sales we projected were lost across those chairs, with the bar on the right being how many Series 1 chairs were sold in 2018. We had to block out the quantities for legal reasons, but we projected that Series 1 sold a little bit more than the total lost sales for the other five chairs.

So, in conclusion, we are led to believe that Series 1 did in fact cannibalize some of the chairs of interest, because we saw that plateau of the other chairs in 2017 and the lack of a spike after Series 1 was launched, unlike the spike we saw when the previous chair was introduced in 2013. And the actual sales were consistently lower than what we projected they would have been. Although, these conclusions may be overly pessimistic, because chair sales move in trends, with peaks and lows, so it's hard to really tell.

For our future work, we're looking deeper into time series forecasting, which we did try here, but the confidence interval was a bit wider than what we were expecting. We would also just use more data; we only have about a year and a half of data after Series 1 was introduced, so it's really hard to say for now.

All right, so the other problem that we investigated was market basket analysis, which is a way of predicting what a future customer may buy based on what previous customers bought. For example, if customer A bought products A and B and then also bought product C, and customer B has bought products A and B, will customer B buy product C as well? There are three measures that are used: support, confidence, and lift. As an example, a rule is like an if-then statement: the antecedents are the "if" part, so milk and eggs, and the consequents are the "then" part, so bread, yogurt, and cheese. Since we're under confidentiality, we have to use generic products instead of the actual products we looked at, so we decided to go with food.

Yep. So we have support, confidence, and lift, and I'll briefly explain what they mean. Support is the frequency of X and Y together over the total number of orders, so it's more or less the probability that those two items appear together in the data set. Confidence is more of a conditional probability: if you ordered X, what's the probability that you also ordered Y? Lift is a little bit harder to explain: it's the actual support of both items together divided by the support they would have if they were completely independent, so it's that ratio.

So what did the data look like? We used the pricing data, which is where the costs are. That was the very large data set, the eight gigabytes one, with millions of rows and 40 columns, where each row represents an item in an order. We have an example of what this looked like: for instance, order one included apples, and they might have ordered five of them. We needed to get that into a binary format in order to put it through a package called mlxtend, and this is roughly what that should look like: did they order apples and eggs in order one, or milk and sausage in order two? We put that through and got a list of rules.
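A rough sketch of the market basket step just described, again in Python and again with made-up item names, order IDs, and thresholds; it assumes the open-source mlxtend package and shows the binary order-by-item table plus the support, confidence, and lift columns that come out of the rules:

    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    # Hypothetical line items: one row per (order, item) pair, as in the
    # example above (order one bought five apples, and so on).
    line_items = pd.DataFrame({
        "order_id": [1, 1, 2, 2, 3, 3, 3],
        "item":     ["apples", "eggs", "milk", "sausage", "apples", "eggs", "bread"],
        "quantity": [5, 1, 2, 1, 2, 6, 1],
    })

    # Binary basket matrix: did order N contain item X at all?
    baskets = (
        line_items.pivot_table(index="order_id", columns="item",
                               values="quantity", aggfunc="sum")
        .notna()
    )

    # support(X -> Y)    = P(X and Y in the same order)
    # confidence(X -> Y) = P(Y in the order | X in the order)
    # lift(X -> Y)       = support(X -> Y) / (support(X) * support(Y))
    itemsets = apriori(baskets, min_support=0.3, use_colnames=True)
    rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])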
From the CSV file of rules, we threw that into Tableau and made a bar graph. In the bar graph, we have the antecedents and consequents and then the weighted average. Instead of looking at the probabilities all individually, we decided to put them together and create an equation: we did 0.1 times support, using a smaller weight on support because we didn't think it was quite as important but still wanted it considered in the equation, and then 0.4 times confidence and 0.4 times lift.

This is an example of what the bar graph looks like. At the top we have the antecedents, at the bottom we have our consequents, and the columns show the weighted averages. From the graph, we took all the rules and all of the weighted averages and put them into a Word document, set up in the format of antecedent, consequent, and then the weighted average at the end. What we were looking at was subgroups. As you can see, we have three of the same products in three different subgroups here, and ideally they're going to be around the same weighted average and basically telling us the same thing. So what we did was average all of those weighted averages and collapse them into a single group, looking like this.

What we saw was two different types of products. One type is a unique product: the chocolate and the peppers, for instance, only appear in their own separate subgroups; they're not mixed in. The other type is a product that shows up across the board: the eggs, for example, are in all the subgroups, not just a single one. In the future, we might want to look at the market basket analysis before and after the introduction of that chair, Series 1, and see if it had any influence.

While we were finishing up, we decided to look at how products were related in another way, in a network sense. We created a network of products where the weight on each edge is how often those two products were ordered in the same order. Using this, we wanted to cluster the items together; so, for example, we might want to cluster this group and then cluster those.

For the clustering I tried two approaches, spectral clustering and the Louvain method. Spectral clustering split the products into two very large clusters and some small clusters, but the small ones were often just one item, which was not ideal, and clustering again on those large clusters didn't really yield anything meaningful. So I tried the Louvain method hoping for something better, and that created three clusters of decent size and around 13 tiny ones of only one item each, but it did that with more consistency, which is what I really liked about it. So I preferred the Louvain method's clustering in the end. With this, Steelcase has an idea of which products tend to be related to each other. In the future, we might want to look at how product categories are related, on a broader scale, and perhaps try some more clustering with the Louvain method.

This is a picture of us; we ended up being able to go to Steelcase and present in front of some of their people there, so that was a really cool opportunity. In conclusion, we just want to acknowledge our industrial partners, Brian and Connor, and our professor, David Austin, who have been a huge help throughout the semester. And I'd also like to thank PIC Math for this opportunity.
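Continuing the earlier mlxtend sketch, the weighted average described above could be computed on that same hypothetical rules DataFrame; the 0.1 / 0.4 / 0.4 weights are taken from the talk as stated and may not be the exact values the team used:

    # Single weighted score combining the three measures, as described above.
    rules["weighted_score"] = (
        0.1 * rules["support"]
        + 0.4 * rules["confidence"]
        + 0.4 * rules["lift"]
    )
    top_rules = rules.sort_values("weighted_score", ascending=False)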
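And a rough sketch of the product network and Louvain clustering described near the end, again with made-up orders; it assumes the networkx library (version 2.8 or newer, which ships a Louvain implementation) rather than whatever tooling the team actually used:

    from itertools import combinations
    import networkx as nx

    # Hypothetical orders: each inner list is the set of products in one order.
    orders = [
        ["apples", "eggs", "bread"],
        ["milk", "eggs", "sausage"],
        ["apples", "eggs"],
        ["milk", "bread", "eggs"],
    ]

    # Nodes are products; an edge's weight counts how many orders contained
    # both endpoints, matching the network described above.
    G = nx.Graph()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            weight = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
            G.add_edge(a, b, weight=weight)

    # Louvain community detection on the weighted co-occurrence graph.
    clusters = nx.community.louvain_communities(G, weight="weight", seed=0)
    for i, cluster in enumerate(clusters):
        print(f"cluster {i}: {sorted(cluster)}")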