Statistics and data science offer a lot of different choices. One of the most important is feature selection, or the choice of variables to include in your model. It's sort of like confronting an enormous range of information and trying to choose what matters most, trying to get the needle out of the haystack. The goal of feature selection is to keep the best features or variables, get rid of uninformative and noisy ones, and simplify the statistical model you're creating, because that helps avoid overfitting, or getting a model that fits the current data very well but works less well with other data.

The major problem here is multicollinearity, a very long word that refers to the relationships among the predictors in the model. I'm going to show it to you graphically. Imagine, for instance, a big circle that represents the variability in our outcome variable, the thing we're trying to predict, and a few predictors around it. Predictor number one has a lot of overlap with the outcome; that's nice. Predictor number two also has some overlap with the outcome, but it also overlaps with predictor one. And finally, predictor three overlaps with both of them. The problem arises from that overlap among the predictors themselves: when the predictors are correlated, it's hard to say which of them deserves credit for the variance in the outcome that they share.

Now, there are a few ways of dealing with this. Some are pretty common: looking at probability values in regression equations, using standardized coefficients, and variations on sequential regression. There are also newer procedures for disentangling the associations among the predictors: commonality analysis, dominance analysis, and relative importance weights. Of course, there are many other choices, both common and newer, but these are a few that are worth a special look.

First is p values, or probability values. This is the simplest method, because most statistical packages will calculate a probability value for each predictor and put little asterisks next to it. So you look at the p values for each predictor, or more often the asterisks next to them, which is why this approach sometimes gets called "star search": you're just cruising through a large output and looking for the stars, or asterisks. This is a fundamentally problematic approach for a lot of reasons. Because each predictor is entered and tested individually, it inflates false positives: say you have 20 variables, each tested with an alpha, or false positive rate, of 5%; you end up with nearly a 65% chance of at least one false positive in there. I'll show a quick calculation of that in a moment. It's also distorted by sample size, because with a large enough sample, anything becomes statistically significant. So relying on p values can be a seriously problematic approach.

A slightly better approach is to use betas, or standardized regression coefficients. This is where you put all the variables on the same scale, usually centered at zero and scaled either to run from minus one to plus one or to have a standard deviation of one. The trick is, though, that the coefficients are still defined in the context of one another; you can't really separate them, because those coefficients are only valid when you take that group of predictors as a whole.
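If you want to see that false-positive inflation concretely, here's a minimal Python sketch. The 20 predictors and the 5% alpha are just the illustrative numbers from above, and the formula assumes the tests are independent.

```python
# Chance of at least one false positive when testing k truly-null,
# independent predictors, each at alpha = .05: 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 5, 10, 20, 50):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:>2} predictors -> {p_any:.0%} chance of at least one false positive")
```

With 20 predictors that comes out to about 64%, which is the "nearly 65%" figure mentioned above.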
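And here's a minimal sketch of the standardized-beta idea, assuming you have statsmodels available; the column names and the numbers are made up for illustration. Notice that the betas are estimated jointly, which is exactly why you can't interpret any one of them apart from the rest of the set.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: three predictors (two of them correlated) and an outcome
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
df["x2"] = df["x2"] + 0.7 * df["x1"]             # build in some collinearity
df["y"] = df["x1"] + 0.5 * df["x2"] + rng.normal(size=200)

z = (df - df.mean()) / df.std()                  # put everything on the same scale
fit = sm.OLS(z["y"], sm.add_constant(z[["x1", "x2", "x3"]])).fit()
print(fit.params)                                # standardized betas (intercept ~ 0)
```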
So one way to try to get around that is what they call stepwise procedures, where you look at the variables in sequence. There are several versions of sequential regression that let you do that: you can put the variables into groups, or blocks, enter them block by block, and look at how the equation changes overall, examining the change in fit at each step. The problem with a stepwise procedure like this is that it dramatically increases the risk of overfitting, which again is a bad thing if you want to generalize beyond your data.

To deal with this, there's a whole collection of newer methods. A few of them include commonality analysis, which provides separate estimates for the unique and shared contributions of each variable. That's a neat statistical trick, but the problem is that it just moves the job of disentanglement to the analyst, so as far as I can tell you're really not better off than you were. There's dominance analysis, which compares every possible subset of predictors. Again, that sounds really good, but you run into the problem known as combinatorial explosion: if you have 50 candidate variables (and some datasets have millions), that's more than one quadrillion possible subsets, and you're not going to finish that in your lifetime. It's also really hard to get things like standard errors and perform inferential statistics with this kind of model.

Then there's something even more recent than these others, called relative importance weights. What this does is create a set of predictors that are orthogonal, or uncorrelated with each other, based on the originals. It then predicts the outcome from those new predictors, which sidesteps the multicollinearity because they're uncorrelated, rescales the coefficients back to the original variables (that's the back-transformation), and from that assigns a relative importance, or percentage of explanatory power, to each predictor variable. Despite this very different approach, it tends to give results that resemble dominance analysis. And it's actually really easy to do: there are websites where you just plug in your information and it does the calculation for you. So that's yet another way of dealing with multicollinearity and trying to disentangle the contributions of different variables. I'll sketch a few of these ideas in code in a moment.

In sum, let's say this: what you're trying to do here is choose the most useful variables to include in your model, make it simpler, be parsimonious, and reduce the noise and distractions in your data. And in doing so, you're always going to have to confront the ever-present problem of multicollinearity, the association among the predictors in your model, and you have several different ways of dealing with it.
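Circling back to the block-entry version of sequential regression for a moment, here's a minimal sketch of examining the change in fit when a second block of predictors is added. The variables and data are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x1, x2 = rng.normal(size=(2, n))                 # block 1 (e.g., background variables)
x3 = 0.5 * x1 + rng.normal(size=n)               # block 2, correlated with block 1
y = x1 + 0.5 * x3 + rng.normal(size=n)

block1 = sm.add_constant(np.column_stack([x1, x2]))
both = sm.add_constant(np.column_stack([x1, x2, x3]))

r2_block1 = sm.OLS(y, block1).fit().rsquared
r2_both = sm.OLS(y, both).fit().rsquared
print(f"Block 1 R^2 = {r2_block1:.3f}; with block 2 added, R^2 = {r2_both:.3f} "
      f"(change = {r2_both - r2_block1:.3f})")
```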
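Commonality analysis is easiest to sketch in the two-predictor case: each predictor's unique contribution is how much R-squared drops when you leave it out, and the common part is whatever is left over. This is just an illustration of the partitioning, not a full commonality analysis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)               # deliberately correlated with x1
y = x1 + x2 + rng.normal(size=n)

def r_squared(*cols):
    X = sm.add_constant(np.column_stack(cols))
    return sm.OLS(y, X).fit().rsquared

full = r_squared(x1, x2)
unique1 = full - r_squared(x2)                   # what only x1 adds
unique2 = full - r_squared(x1)                   # what only x2 adds
common = full - unique1 - unique2                # the shared, tangled-up part
print(f"R^2 = {full:.3f}  unique x1 = {unique1:.3f}  "
      f"unique x2 = {unique2:.3f}  common = {common:.3f}")
```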
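The combinatorial explosion behind dominance analysis is easy to verify: p predictors have 2^p - 1 non-empty subsets. Here's that count for 50 predictors, plus a brute-force all-subsets R-squared loop of the kind dominance analysis builds on, which is only feasible for a handful of predictors.

```python
import itertools
import numpy as np
import statsmodels.api as sm

print(f"{2**50 - 1:,} non-empty subsets of 50 predictors")   # over one quadrillion

# Brute-force R^2 for every subset -- fine for 3 predictors, hopeless for 50
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)

for k in range(1, 4):
    for subset in itertools.combinations(range(3), k):
        r2 = sm.OLS(y, sm.add_constant(X[:, list(subset)])).fit().rsquared
        print(subset, round(r2, 3))
```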
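Finally, here's a rough sketch of the relative importance weights idea, following my understanding of the Johnson-style approach: build an orthogonal stand-in for the predictors, regress the outcome on those, and then map the explained variance back onto the original variables. Treat it as an illustrative sketch under those assumptions rather than a reference implementation; the data are made up.

```python
import numpy as np

def relative_weights(X, y):
    """Sketch of relative importance weights (Johnson-style orthogonalization)."""
    Xz = (X - X.mean(0)) / X.std(0)              # standardize predictors
    yz = (y - y.mean()) / y.std()                # standardize outcome
    n = len(yz)
    Rxx = Xz.T @ Xz / n                          # predictor correlation matrix
    rxy = Xz.T @ yz / n                          # predictor-outcome correlations
    evals, evecs = np.linalg.eigh(Rxx)
    lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T   # links X to its orthogonal stand-ins
    beta = np.linalg.solve(lam, rxy)             # regression of y on the orthogonal set
    eps = (lam ** 2) @ (beta ** 2)               # credit mapped back to each original X
    return eps, eps / eps.sum()                  # raw weights and proportional shares

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 3))
X[:, 1] += 0.7 * X[:, 0]                         # make two predictors collinear
y = X @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=500)
raw, share = relative_weights(X, y)
print("raw weights:", np.round(raw, 3), " shares:", np.round(share, 3))
```

The raw weights add up to the model's R-squared, and the shares are the kind of percentage-of-explanatory-power figures described above.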