First differencing is an econometrics technique that is sometimes used with panel data sets to deal with unobserved heterogeneity or an autocorrelated error term. While this technique is not as common as, for example, fixed effects estimation, it is useful to understand because first differencing is used as part of some of the more commonly used panel data analysis techniques. Let's take a look at what first differencing is and how it can be used to deal with unobserved heterogeneity and an autocorrelated error term.

First differencing means that you take every variable, all the x variables and the y variable, and subtract the previous value. So instead of modeling y as the dependent variable, we model delta y, which is y at t minus y at t minus 1. You always take the previous value of y and subtract it from the current value. Likewise, you take the current value of x and subtract the previous value of x, which gives you delta x. So instead of modeling levels of y with levels of x, we model differences in y as a function of differences in x. That is the idea of first differencing.

First differencing can be used to deal with unobserved heterogeneity. Consider the population model y_it = beta_0 + beta_1 x_it + a_i + u_it. Here a_i is the unobserved effect, and we do not want to make the random effects assumptions, so we allow a_i to be correlated with x. How do we eliminate a_i from the equation? We can of course use the fixed effects transformation, subtracting cluster means, which eliminates a_i, but we can also use first differencing. First differencing works like this: we write the same equation for the previous period, y_it-1 = beta_0 + beta_1 x_it-1 + a_i + u_it-1, and subtract it from the equation for the current period.
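As a concrete illustration of the transformation described above, here is a minimal numpy sketch (the data-generating values, sample sizes, and variable names are my own assumptions, chosen only to make the unobserved effect a_i correlated with x):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated panel: n clusters observed for T periods, with a cluster
# effect a_i that is correlated with x (so random effects assumptions fail).
n, T = 500, 4
beta0, beta1 = 1.0, 2.0
a = rng.normal(size=n)                    # unobserved effect a_i
x = a[:, None] + rng.normal(size=(n, T))  # x correlated with a_i
u = rng.normal(size=(n, T))               # idiosyncratic error u_it
y = beta0 + beta1 * x + a[:, None] + u

# First differencing: subtract the previous period within each cluster.
dy = np.diff(y, axis=1).ravel()  # delta y_it = y_it - y_it-1
dx = np.diff(x, axis=1).ravel()  # delta x_it = x_it - x_it-1

# OLS of delta y on delta x; beta0 and a_i have differenced out,
# so no intercept or cluster dummies are needed.
beta1_fd = (dx @ dy) / (dx @ dx)
print(beta1_fd)  # should be near the true beta1 = 2
```

Pooled OLS on the levels of y and x would be biased here, because a_i is correlated with x; the differenced regression recovers beta_1 because a_i has dropped out.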
When we subtract, the a_i terms cancel and we can rearrange the equation to see that the difference in y is a function of the difference in x plus an error term that is a composite of the two idiosyncratic errors: delta y_it = beta_1 delta x_it + (u_it - u_it-1). Because the errors are assumed to be uncorrelated with everything in the model, this will consistently estimate beta_1. So we can estimate beta_1 using first differencing, and we do not need to care about the unobserved effect. Like the fixed effects transformation, first differencing eliminates the unobserved effect and allows us to consistently estimate beta_1 from the data.

There is also another reason why we might use first differencing, and it relates to autocorrelation. Particularly if we have a highly persistent data set, where the error term or the dependent variable is strongly correlated over time, we can break that correlation by first differencing. In the extreme case the error term follows a random walk: the value of the error term equals its previous value plus a random component, so the current value is the sum of all past shocks plus a new random component. How does first differencing deal with this problem? It deals with it by taking the difference of the error terms. If the error follows a random walk, then u_it-1 contains all the past shocks, and what is left in the difference u_it - u_it-1 is simply the current shock. So first differencing breaks the autocorrelation of the error term, and the standard errors from OLS estimation would then be okay. Is this an ideal way of dealing with an autocorrelated error? Maybe not, but it is one way. Another, perhaps better, way would be to use cluster-robust standard errors, in which case you do not have to transform the model: you can simply use OLS regression without changing the interpretation of beta_1.
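The random walk argument above can be checked numerically. The following sketch (series length and seed are arbitrary choices of mine) simulates a random walk error and shows that its first difference is just the underlying shock, with essentially no autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random walk error: u_t = u_t-1 + e_t, so u_t is the cumulative sum
# of all past shocks and is strongly autocorrelated over time.
T = 10_000
e = rng.normal(size=T)  # white-noise shocks
u = np.cumsum(e)        # random walk

def autocorr(z):
    """Lag-1 sample autocorrelation."""
    z = z - z.mean()
    return (z[:-1] @ z[1:]) / (z @ z)

# First differencing recovers the shock: delta u_t = u_t - u_t-1 = e_t.
du = np.diff(u)

print(autocorr(u))   # close to 1: the levels are highly persistent
print(autocorr(du))  # close to 0: differencing broke the autocorrelation
```

This is exactly why OLS on the differenced model has well-behaved standard errors in the random walk case, while OLS on the levels does not.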
But first differencing is another alternative. First differencing and fixed effects accomplish the same thing: they are most commonly used for eliminating the unobserved effect a_i from the data while allowing a_i to be correlated with the x variables. When there are just two observations for each cluster, the two techniques produce exactly the same estimate. Both techniques are consistent, so as the sample size goes to infinity they converge to the same value, but with more than two time periods they do not generally produce the exact same estimate in a given sample. So which one should you apply? In practice the fixed effects approach is much more common than first differencing, so if you simply want to deal with unobserved heterogeneity, you do not need to know much about first differencing: fixed effects will do the job. However, first differencing is used as part of some other estimation techniques for panel data that I will talk about in another video, and this is why understanding first differencing is important.
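The equivalence claimed above for two periods per cluster is easy to verify: with T = 2, demeaning within a cluster and differencing within a cluster carry exactly the same information. A small sketch (simulation values are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Panel with exactly T = 2 periods per cluster.
n = 200
beta1 = 1.5
a = rng.normal(size=n)
x = a[:, None] + rng.normal(size=(n, 2))
y = beta1 * x + a[:, None] + rng.normal(size=(n, 2))

# First differencing: one difference per cluster.
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
beta1_fd = (dx @ dy) / (dx @ dx)

# Fixed effects (within) transformation: subtract cluster means.
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
beta1_fe = (xw @ yw) / (xw @ xw)

# With two periods the demeaned values are just +/- half the difference,
# so the two estimators are numerically identical.
print(np.isclose(beta1_fd, beta1_fe))  # True
```

With more than two periods the two estimators differ in finite samples, which is why the choice between them comes down to which error structure (serially uncorrelated u_it versus random walk u_it) is more plausible.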