 In this short video, we'll be looking at a really important aspect of working with longitudinal data: using weights. Survey datasets contain information about a sample taken from a population, but what we really want to do is use that sample to say something about that population. Data producers calculate weights to make the data more representative of the population it's designed to reflect. A weighting variable assigns a value to each observation in the data to indicate how much it should be represented in the analysis. These weights are an important part of ensuring our analyses represent the population that we're interested in.
So why should we weight data? There are a number of factors which can make survey data unrepresentative of the population. Factors such as the sample design or non-response, which we discussed earlier in this section, can lead to certain groups being under- or over-represented in the data. This can be a problem because, if we don't adjust for these factors, our results can be biased or inaccurate. This applies to descriptive statistics as well as to estimates from more complex statistical models. Weights allow us to compensate for these factors, and most social survey datasets will include weighting variables.
So how do these weights work? We can see a very simple example with just two cases here. In unweighted analysis, each of these units or cases is counted as one observation. In weighted analysis, each case can represent any number of observations, and the number of observations the case counts as is given by a weighting variable, or weight. So let's have a look at this example. The unweighted mean income for these two cases is calculated like this. However, when we apply the weight, the first case now counts as two observations rather than one. And so this means that the weighted mean income is calculated as such.
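The figures on the slide aren't captured in this transcript, so here is a minimal sketch of the same arithmetic in Python using made-up numbers. The two incomes are hypothetical; the only detail taken from the example is that the first case has a weight of two and the second a weight of one.

```python
# Hypothetical incomes for the two cases (not the values shown in the video).
incomes = [10000, 40000]
# The first case counts as two observations, the second as one.
weights = [2, 1]

# Unweighted mean: every case counts once.
unweighted_mean = sum(incomes) / len(incomes)

# Weighted mean: sum of (weight * income) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(weights, incomes)) / sum(weights)

print(unweighted_mean)  # 25000.0
print(weighted_mean)    # 20000.0
```

Because the lower-income case counts twice, the weighted mean is pulled towards it.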
The survey weights are typically a combination of several weighting adjustments which aim to compensate for multiple factors, and they allow us to ensure that the sample is representative of the population we're interested in. There are different kinds of weights, including design weights, which adjust for the unequal selection probabilities we introduced earlier in this section. These can also be called probability weights. The selection probabilities are known by the survey team, and therefore the weights can be calculated by dividing one by the selection probability.
Non-response weights are designed to give greater weight to those cases who are under-represented in the data. Similarly to the design weights, the formula is one divided by the response probability. However, this probability isn't known, so it's estimated based on certain assumptions and on the limited information known from the sampling frame about those who don't respond. Non-response weights can help to reduce the bias in population estimates. However, it's important to remember that they are only estimates, and they're based partly on the assumption that, within a particular group, those who respond are the same as those who do not. So if this assumption doesn't hold true, then our results may still be biased.
Through non-response or through chance, the characteristics of a sample can differ from those of the population. For example, the age distribution may differ between the two. Calibration or post-stratification weights adjust the sample characteristics to match those of the population, and this is often done using information from the census. Calibration weights are derived by defining set groups using socio-demographic variables, such as area or sex, where the population totals are known. The response rate is then calculated at the population level for these groups, and the calibration weight is thus the inverse of the population-level response rate.
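The three formulas just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the respondents, their probabilities, the age groups and the population totals are all hypothetical, and real surveys estimate response probabilities with models and usually calibrate on several variables at once.

```python
from collections import Counter

# Hypothetical respondents: a known selection probability, an estimated
# response probability, and an age group used for calibration.
respondents = [
    {"id": 1, "p_select": 0.01, "p_respond": 0.8, "age_group": "16-34"},
    {"id": 2, "p_select": 0.02, "p_respond": 0.5, "age_group": "16-34"},
    {"id": 3, "p_select": 0.01, "p_respond": 0.5, "age_group": "35+"},
]

# Hypothetical known population totals for each group (e.g. from a census).
population_totals = {"16-34": 300, "35+": 700}

# Number of respondents in each group.
sample_counts = Counter(r["age_group"] for r in respondents)

for r in respondents:
    # Design weight: one divided by the selection probability.
    r["design_wt"] = 1 / r["p_select"]
    # Non-response weight: one divided by the (estimated) response probability.
    r["nonresponse_wt"] = 1 / r["p_respond"]
    # Calibration weight: inverse of the population-level response rate,
    # i.e. respondents in the group divided by the group's population total.
    group = r["age_group"]
    response_rate = sample_counts[group] / population_totals[group]
    r["calibration_wt"] = 1 / response_rate

for r in respondents:
    print(r["id"], round(r["design_wt"], 2),
          round(r["nonresponse_wt"], 2), round(r["calibration_wt"], 2))
```

With these calibration weights, the weighted count in each age group adds up to that group's population total.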
Typically, calibration weights are derived using multiple schemes, and so they can be very complex. They're very commonly used in surveys that are organized by the Office for National Statistics, for example. Now, the final type of weights here is the scaling or grossing weights. These weights scale the sample up so that it represents a much larger sample or, in the case of grossing weights, so that it matches the population total.
Weights are included in many datasets as variables and applied using the functions in statistical packages. Weighting variable names often include the word weight or an abbreviation such as WT, so they're easier to spot in a large dataset. For example, W1 FIN WT is the design and non-response weight that's included in the first wave of the Longitudinal Study of Young People in England. The best place to find information about the weighting variables in a dataset is in the documentation that accompanies the survey. Most often this information is included in the survey's user guide or technical report.
Some datasets will contain more than one weighting variable, and this is especially common in complex studies such as longitudinal studies. This is because these studies can be used in a number of different ways. They can be used cross-sectionally or longitudinally and, in the case of Understanding Society for example, the data can also be used at the individual level or at the household level. So this is where you should really consult the documentation, because it will give you lots of information about which weighting variables are available and what they each reflect, and that will really guide you to choose the most appropriate one for your particular analysis. As you can see from the table on the right-hand side, which comes from the Understanding Society data, there are sometimes a great number of weighting variables to choose from. Statistical packages such as SPSS have functions built in that allow you to apply weights to your data.
So once you've chosen the most appropriate weight and you know the name of that variable, in SPSS, for example, you simply use the drop-down menu Data and then Weight Cases, and in the next section you'll have the opportunity to practice this. In terms of when you need to use weights: if you are doing your initial exploratory analysis to check whether the sample size is sufficient for what you need, then you shouldn't apply the weights. This is because the weights may scale the sample size up and give a false impression of how many cases you actually have. Once you're happy that your dataset has a sufficient sample size and you're ready to get started with your analysis, then in most cases you will need to weight the data.
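To see why weighted counts can mislead you about sample size, here is a small Python sketch. The five grossing weights are hypothetical and chosen only to make the contrast obvious.

```python
# Hypothetical grossing weights for a sample of five respondents.
grossing_weights = [1200.0, 800.0, 1500.0, 900.0, 600.0]

# Unweighted count: the number of people who actually took part.
unweighted_n = len(grossing_weights)

# Weighted count: the sum of the weights, i.e. the population size
# that the sample has been grossed up to represent.
weighted_n = sum(grossing_weights)

print(unweighted_n)  # 5
print(weighted_n)    # 5000.0
```

The weighted count looks like thousands of cases, but any judgement about whether there are enough respondents should be based on the unweighted count of five.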