Centering is commonly discussed in the multilevel modeling literature. Briefly, centering means that you calculate the mean of a variable and then subtract that mean from the original values. This produces a new variable that has the same variance as the original variable but a mean of zero, so the variable is centered at zero. This procedure is called grand mean centering. There is also another way of centering the data, called cluster mean centering or centering within clusters. Here you calculate the mean of each cluster separately and then subtract each cluster's mean from that cluster's observations. This also produces a variable with a mean of zero, but the associations between the new variable and other variables can be quite different from the associations with the original variable.

Centering is useful, but it is also sometimes misunderstood. The first common misunderstanding is that when you apply multilevel modeling, you must always either group mean center or grand mean center your data. This is particularly common in empirical articles: you quite often see researchers choosing between these two ways of centering while ignoring the option of not centering the data at all, which is also a useful strategy for analyzing data. The second misunderstanding comes from reading the multilevel modeling literature and applications of multilevel modeling in empirical journals: you may be left with the impression that the decision to center depends on the scale of the variable, so that a variable without a meaningful scale must be centered. In fact, centering is not related to the scale of the variable; what it changes is the interpretation of the regression coefficients. Let's take a look at what centering does, using the paper by Enders and Tofighi.
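As a minimal sketch of the two operations just described, the Python snippet below grand mean centers and cluster mean centers a small work-hours variable. The data are invented for illustration (three hypothetical persons, five measurements each) and are not taken from Enders and Tofighi, and the function names are illustrative helpers, not a library API. Grand mean centering leaves the variance unchanged, while cluster mean centering sets every cluster's mean to zero and removes the between-cluster part of the variance.

```python
from statistics import mean, pvariance

# Hypothetical data, invented for illustration: weekly work hours for
# three persons (clusters), each measured five times.
hours = {
    "A": [30, 33, 35, 37, 40],
    "B": [45, 48, 50, 52, 55],
    "C": [60, 63, 65, 67, 70],
}


def flatten(data):
    """Stack all clusters into one list of observations."""
    return [v for values in data.values() for v in values]


def grand_mean_center(data):
    """Subtract the mean of all observations from every observation."""
    grand = mean(flatten(data))
    return {k: [v - grand for v in values] for k, values in data.items()}


def cluster_mean_center(data):
    """Subtract each cluster's own mean from that cluster's observations."""
    centered = {}
    for k, values in data.items():
        cluster_mean = mean(values)
        centered[k] = [v - cluster_mean for v in values]
    return centered


gmc = grand_mean_center(hours)
cmc = cluster_mean_center(hours)

print("original variance:             ", pvariance(flatten(hours)))
print("grand mean centered variance:  ", pvariance(flatten(gmc)))
print("cluster mean centered variance:", pvariance(flatten(cmc)))
print("cluster means after cluster mean centering:",
      [mean(values) for values in cmc.values()])
```

The drop in variance for the cluster mean centered version is the between-cluster variation that this transformation discards.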
They present an example with three clusters, here three individuals, each measured five times. The dependent variable, well-being, is on the y-axis, and the independent variable, work hours, is on the x-axis. We can see that there is a negative relationship between the clusters: a person who works more on average has lower well-being. There is also a negative relationship within clusters: when a person works more than usual, their well-being decreases. So how does centering these variables affect regression results, and what does centering actually do? Let's take a look at the four panels. The first panel shows the original data, the second the grand mean centered data, the third the cluster means, and the fourth the cluster mean centered data. Typically, centering is applied only to the independent variables and not to the dependent variable, and this is the case here as well. Centering a dependent variable is problematic because it reduces the variation of the error term, which makes your standard errors biased and inconsistent; whether the bias is large depends on the number of clusters and the number of observations per cluster. As a rule of thumb, you should never center the dependent variable. Centering the independent variables can be useful, and Enders and Tofighi's article shows what centering actually does.

Let's look at grand mean centering first. The pattern of observations in the grand mean centered data is the same as in the original data: we still have one cluster here, another cluster here, and the third cluster up here. The only difference between these two figures is the scale. In the centered data, work hours vary from minus 20 to plus 20, whereas in the original data they vary from 30 to 70.
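To make concrete the point that grand mean centering only shifts the axis, here is a minimal Python sketch that fits an ordinary least squares line to invented work-hours and well-being data (hypothetical numbers, not the data from the paper), once with the original work hours and once with grand mean centered work hours. It uses plain pooled OLS rather than a multilevel model to stay dependency-free, and `simple_ols` is a helper written only for this example. The slope is identical in both fits; only the intercept moves, from the predicted well-being at zero work hours to the predicted well-being at the sample mean of work hours.

```python
from statistics import mean

# Hypothetical data, invented for illustration: three persons measured
# five times each, stacked into one list (persons A, B, C in order).
hours = [30, 33, 35, 37, 40, 45, 48, 50, 52, 55, 60, 63, 65, 67, 70]
well_being = [8.2, 7.5, 7.3, 6.8, 6.2, 6.4, 6.0, 5.4, 5.2, 4.5,
              4.6, 3.8, 3.5, 3.0, 2.6]


def simple_ols(x, y):
    """Return (intercept, slope) of the ordinary least squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


grand_mean = mean(hours)
hours_gmc = [h - grand_mean for h in hours]  # grand mean centered predictor

raw_intercept, raw_slope = simple_ols(hours, well_being)
gmc_intercept, gmc_slope = simple_ols(hours_gmc, well_being)

print(f"original hours:      intercept={raw_intercept:.3f}, slope={raw_slope:.4f}")
print(f"grand mean centered: intercept={gmc_intercept:.3f}, slope={gmc_slope:.4f}")
```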
If we showed this graph to somebody without explaining the context, they would wonder what it means that a person works minus 20 hours, because nobody can work a negative number of hours per week. The interpretation of this axis, of course, is that a person works 20 hours less or 20 hours more than the sample mean, which is about 50 hours. So grand mean centering simply shifts the axis sideways; nothing else happens.

What about the cluster means? The x value of a cluster mean is the mean of a person's five work hour observations, and its y value is the mean of the same person's five well-being values. If we run a regression on these cluster means, we get the between effect: how does the average number of work hours in a cluster explain the average well-being in that cluster? In other words, how do the clusters differ? Questions of this kind are answered by a between regression on the cluster means, so taking the cluster mean of the dependent variable is useful for estimating between-group differences.

What does within-cluster centering, or cluster mean centering, do then? In the cluster mean centered panel, the pattern of the data is different: we have lost all the sideways variation between the clusters. The cluster means of the centered variable are now all the same, zero. So cluster mean centering eliminates all between-cluster differences, and with them the between effect, from the data. How does that relate to regression estimation? We can compare regressions on the cluster mean centered data and on the grand mean centered data. Running a regression on the cluster mean centered data gives you the within effect: this dashed line tells you how work hours and well-being are related within clusters, on average.
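The between and within regressions described above can be sketched in a few lines of Python. The data are invented for illustration (three hypothetical persons, five measurements each), and `ols_slope` is a helper written for this example. The between effect is the OLS slope across the three pairs of cluster means; the within effect is the OLS slope of well-being on cluster mean centered work hours, with well-being itself left uncentered. Plain OLS is used as a dependency-free stand-in: it ignores the random-effects part of a multilevel model and is only meant to show which slope each transformation targets.

```python
from statistics import mean

# Hypothetical data, invented for illustration: (work hours, well-being)
# for three persons, each measured five times.
data = {
    "A": ([30, 33, 35, 37, 40], [8.2, 7.5, 7.3, 6.8, 6.2]),
    "B": ([45, 48, 50, 52, 55], [6.4, 6.0, 5.4, 5.2, 4.5]),
    "C": ([60, 63, 65, 67, 70], [4.6, 3.8, 3.5, 3.0, 2.6]),
}


def ols_slope(x, y):
    """Slope of the ordinary least squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Between regression: one point per person (mean hours, mean well-being).
x_means = [mean(x) for x, _ in data.values()]
y_means = [mean(y) for _, y in data.values()]
between = ols_slope(x_means, y_means)

# Within regression: center work hours within each person; well-being
# (the dependent variable) is not centered.
x_within, y_all = [], []
for x, y in data.values():
    person_mean = mean(x)
    x_within += [a - person_mean for a in x]
    y_all += y
within = ols_slope(x_within, y_all)

print(f"between effect: {between:.4f}")
print(f"within effect:  {within:.4f}")
```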
These three regression lines are cluster-specific regressions: the regression line goes in roughly the same direction in each cluster, and the overall regression line gives the average within-cluster effect. It tells us how much one additional work hour per week reduces a person's well-being, which is a different question from how people differ from each other. If we instead apply regression analysis to the grand mean centered data, we get the population average effect. In the figure, the dotted line is the within effect, which is the same regression as the one from the cluster mean centered data, and the solid line is the between regression, that is, the regression between the cluster means. The population average effect is a weighted average of the within effect and the between effect, and it doesn't really have a clear causal interpretation; because it mixes these two effects, it is typically quite difficult to justify.

There is also a third thing that you can do. Besides centering your data, you can combine the original or centered data with the cluster means. The bar symbol, as in x-bar hours, denotes a person's mean work hours, that is, the cluster mean; then we have the individual work hour observations for that person, and then the error terms. So you can run a regression with two regression coefficients for the same variable: one for the cluster means and one for the original or grand mean centered values of that variable. The first regression here gives us the contextual effect: the coefficient of the cluster mean, γ01, is the contextual effect. Its interpretation is how much my average work hours influence my well-being, controlling for my work hours at a specific time. I have a separate video where I explain the contextual effect in more detail.
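Written out, the two ways of adding cluster means can be expressed as random-intercept models. The notation below is our own sketch, loosely following the γ labels used in this discussion rather than copied from the paper: $x_{ij}$ is the work hours of person $j$ at occasion $i$, $\bar{x}_j$ that person's mean work hours, $\bar{x}$ the grand mean, $n_j$ the number of observations for person $j$, $u_{0j}$ a person-level random intercept, and $r_{ij}$ the occasion-level error.

```latex
\text{Model 1: } y_{ij} = \gamma_{00} + \gamma_{10}\,x_{ij} + \gamma_{01}\,\bar{x}_j + u_{0j} + r_{ij}
  \quad (\gamma_{10} = \text{within},\ \gamma_{01} = \text{contextual})

\text{Model 2: } y_{ij} = \gamma_{00} + \gamma_{10}\,(x_{ij} - \bar{x}_j) + \gamma_{01}^{*}\,\bar{x}_j + u_{0j} + r_{ij}
  \quad (\gamma_{10} = \text{within},\ \gamma_{01}^{*} = \text{between})

\gamma_{10}\,x_{ij} + \gamma_{01}\,\bar{x}_j
  = \gamma_{10}\,(x_{ij} - \bar{x}_j) + (\gamma_{10} + \gamma_{01})\,\bar{x}_j
  \;\Rightarrow\; \gamma_{01}^{*} = \gamma_{10} + \gamma_{01}

b_{\text{pooled}} = \frac{SS_W\, b_W + SS_B\, b_B}{SS_W + SS_B},\qquad
SS_W = \sum_j \sum_i (x_{ij} - \bar{x}_j)^2,\qquad
SS_B = \sum_j n_j\,(\bar{x}_j - \bar{x})^2
```

The third line shows why the contextual effect equals the between effect minus the within effect. The last line is the weighted average for a pooled OLS regression of y on x alone, where $b_W$ and $b_B$ are the within and between slopes (the between slope weighting each cluster by its size); a random-intercept model without cluster means also combines the two slopes, but with different weights.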
If we instead cluster mean center the individual-level variable and include the cluster means, the coefficient of the cluster means is the between effect. The coefficient of the individual-level variable, γ10, is the within effect in both cases. So, depending on which effects we are interested in, including the cluster means gives us either the within effect and the between effect, or the within effect and the contextual effect. The contextual effect is a bit easier to understand than the between effect, so adding the cluster means to grand mean centered or uncentered data is probably more useful for most people than cluster mean centering the data and then adding the cluster means.

So, in summary, should you center your data? There are typically two ways to center, and people usually choose between them. The first is grand mean centering. It simply shifts the scale of the explanatory variable and doesn't change the slope coefficients: the only thing that changes in your results is the intercept of the regression model. But in our review in the Organizational Research Methods paper that I cited earlier, not a single article that applied grand mean centering interpreted the intercept. If grand mean centering changes only the intercept and the intercept is generally not interpreted, then grand mean centering was completely useless in these articles. It also has a downside: plots will be drawn on the wrong scale, showing, for example, negative and positive work hours instead of the actual work hours that vary between 30 and 70 per week. Our take is that grand mean centering should probably be avoided, because it doesn't really accomplish anything of value and it can complicate the interpretation of the results. Cluster mean centering, by contrast, is useful: it can be used to estimate the within effect, that is, how a change at the individual level affects an individual-level outcome.
But the same within effect can also be estimated by including the cluster means: if you include both the cluster means and the original variable in the model, you get the same within estimate as you do by cluster mean centering the data. Cluster mean centering combined with including the cluster means is necessary if you want the between effect, but typically the contextual effect is more interesting, in which case you don't need cluster mean centering. Of course, there is also the option of not centering your data at all and using the variables in their original form. This has the advantage that all plots and predictions from your model are on the original scale, so you don't predict negative work hours, and if you want the within effect, you can get it without centering just by including the cluster means. So one of the main points of our article is that in most cases where researchers center their data, they probably wouldn't need to.
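The claim that including the cluster means recovers the same within estimate as cluster mean centering can be checked numerically. The minimal Python sketch below uses invented data (three hypothetical persons, five measurements each) and a small hand-written solver, `ols`, written only for this example. It fits plain OLS without a random intercept, so it illustrates the algebra of the coefficients rather than a full multilevel model. Model 1 regresses well-being on the original work hours and the cluster means, Model 2 uses cluster mean centered work hours and the cluster means, and a third fit uses work hours alone.

```python
from statistics import mean

# Hypothetical data, invented for illustration: (work hours, well-being)
# for three persons, each measured five times.
data = {
    "A": ([30, 33, 35, 37, 40], [8.2, 7.5, 7.3, 6.8, 6.2]),
    "B": ([45, 48, 50, 52, 55], [6.4, 6.0, 5.4, 5.2, 4.5]),
    "C": ([60, 63, 65, 67, 70], [4.6, 3.8, 3.5, 3.0, 2.6]),
}


def ols(columns, y):
    """OLS with an intercept; returns [intercept, b1, b2, ...].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting, which is adequate for a tiny teaching example.
    """
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(k):
            if r != p:
                factor = A[r][p] / A[p][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Stack the data and build the cluster means and cluster mean centered hours.
x, x_bar, y = [], [], []
for hours, well_being in data.values():
    person_mean = mean(hours)
    x += hours
    x_bar += [person_mean] * len(hours)
    y += well_being
x_cmc = [a - m for a, m in zip(x, x_bar)]

# Model 1: original hours + cluster means -> within and contextual effects.
_, within_1, contextual = ols([x, x_bar], y)
# Model 2: cluster mean centered hours + cluster means -> within and between.
_, within_2, between = ols([x_cmc, x_bar], y)
# Hours alone -> pooled slope mixing the within and between effects.
_, pooled = ols([x], y)

print(f"Model 1: within={within_1:.4f}, contextual={contextual:.4f}")
print(f"Model 2: within={within_2:.4f}, between={between:.4f}")
print(f"pooled slope: {pooled:.4f}")
```

In this sketch, both models return the same within coefficient, the cluster mean coefficient in Model 1 equals the between coefficient minus the within coefficient, and the pooled slope is the weighted average of the within and between slopes. The hand-written solver is only meant for a tiny example like this; for real analyses a regression or mixed-model library is the sensible choice.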