In this video, we discuss the basics of survey data: where it comes from, what it looks like and how to understand variables. Survey data are the product of a systematic data collection process which produces a form of data called microdata. Microdata allow you to do analysis in the same way as if you had collected the data yourself. It is quite typical for these data to be collected using a standard set of questions asked of a representative sample of respondents. Most survey data are collected for a specific research purpose. Standardised information is gathered about a range of characteristics that vary. This might be done by sending out a questionnaire for respondents to complete on paper or online, or it might be completed by interviewers, either face to face or over the phone.

This picture shows an example of two questions from a survey. They ask a respondent what their sex is and to rate their mental wellbeing on a scale of 0 to 10. Respondents are often individuals, but this too can vary, as some surveys relate to businesses, schools, households or other units. The same information is gathered from all respondents. Individual respondents fill in the answers to the questions. Here, using the questions from the previous slide, the first female respondent has scored their mental wellbeing as 8 out of 10. Each response creates individual-level data in the survey dataset, shown in this table, which collects the wellbeing score and sex of the first seven respondents. Respondents' information is stored as cases in the dataset, which can be used to produce statistical summaries.

We're now going to look at some examples of surveys. The first example we are going to take a closer look at is the British Social Attitudes Survey. In all but two years since 1983, a British Social Attitudes Survey has been carried out to take an annual snapshot of public opinion on a range of key political and social issues.
The survey is conducted by an organisation called the National Centre for Social Research, NatCen for short, a large independent social research agency which has a team of methodologists and a large field force of interviewers. Each year some questions are repeated and some are new, to reflect issues of concern at any time point. The survey has a lot of impact in the UK and is often reported in the news as a reliable source of information on shifts in public opinion. For example, over the years it has tracked the acceptance of same-sex marriage, attitudes to austerity and views about Brexit.

The 2018 survey, for example, was funded by a number of sponsors, including the Economic and Social Research Council, government departments and research foundations. It was designed to produce information about the opinions of adults aged 18 or over in Great Britain. To do this, a sample of potential interviewing addresses was identified, and 3,879 interviews were achieved with randomly selected adults. The survey was conducted using a computer-assisted face-to-face method at the respondent's home. Additionally, a self-completion questionnaire was left for respondents to fill in and return. Some questions are answered by all, while others are answered only by those who are randomly selected to receive that particular version of the questionnaire.

The resulting data have been made available for reuse. Data and information about the survey can be found in the UK Data Service catalogue. This image shows how the catalogue page for the British Social Attitudes Survey 2018 appears on the UK Data Service website.

Another example of survey data is the Labour Force Survey. This is a household study that collects information on employment and unemployment for all those aged 16 plus and living in the UK.
Data are available at both an individual and a household level, and the survey covers a wide range of related topics such as occupation, training and hours of work, as well as demographic information and characteristics such as health and education level.

So what does survey data look like? If you collect data from a survey yourself, a fairly natural way to store the information is in a matrix, with records in one direction and the things that vary, the variables, in the other. And that is just what survey microdata look like. This, for example, is what the 2015 Quarterly Labour Force Survey Open Teaching dataset looks like in the SPSS package. Data in this format are typically referred to as microdata, as the records relate to the original data collection units. So, if a survey gathers responses from individuals, the records will be for individuals too. The generic term for each respondent unit is a case. Statistical packages that hold data in a rectangular matrix like this typically have cases in rows and variables in columns. The value in a cell is therefore the value of a particular variable for a specific case.

It is quite common for statistical packages such as SPSS or Stata to store the values as numbers and to associate a label with each value. Other packages, like R, manage this differently. Some values are called missing values. These are ones where a value may be recorded, but it represents something we normally omit from analysis, such as "did not answer" or "not applicable". Missing values are handled differently in different datasets and by different software. In this example from SPSS, minus 9 is used to represent values that are not applicable, as the question was not asked.

Now we'll look at understanding variables. Good survey design uses questions that elicit useful responses and enable meaningful comparisons between groups to be made.
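The case-by-variable layout, numeric codes with value labels, and missing-value codes described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any survey: the variable names, codes and labels are invented for the example, and only the use of minus 9 for "not applicable" follows the SPSS convention mentioned earlier.

```python
# Sketch of survey microdata: each case (respondent) is a row,
# each variable a column. Values are stored as numeric codes.
# All names and codes here are illustrative.

# Value labels map stored codes to human-readable categories.
VALUE_LABELS = {
    "sex": {1: "Male", 2: "Female"},
}
MISSING_CODES = {-9}  # -9 = not applicable / question not asked

# The data matrix: one dict per case, one key per variable.
cases = [
    {"caseno": 1, "sex": 2, "wellbeing": 8},
    {"caseno": 2, "sex": 1, "wellbeing": 6},
    {"caseno": 3, "sex": 2, "wellbeing": -9},  # not asked this question
]

def valid_values(cases, variable):
    """Return one variable's values, omitting missing-value codes."""
    return [c[variable] for c in cases if c[variable] not in MISSING_CODES]

def label(variable, value):
    """Translate a stored numeric code into its value label."""
    return VALUE_LABELS[variable].get(value, str(value))

scores = valid_values(cases, "wellbeing")
print(sum(scores) / len(scores))       # mean wellbeing over valid cases: 7.0
print(label("sex", cases[0]["sex"]))   # prints "Female"
```

The key point the sketch mirrors is that missing codes such as minus 9 must be excluded before producing statistical summaries, otherwise they distort results; dedicated packages such as SPSS, Stata or R handle this bookkeeping for you.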
A well-documented survey will allow you to trace a variable back to the question that was asked, and to see who was asked that question and how. Some responses are stored in their raw form in the dataset and are easy to trace back to the question asked. For example, these images show the full-time/part-time work variable as it appears in the data file in SPSS, together with the information that appears in the questionnaire. This shows the response options, what the variable measures and who the question applies to and was asked of.

Other variables may instead be the result of some sort of calculation or processing carried out after data collection was completed. These are known as derived variables, and the documentation held in archives alongside the data should help you to interpret their meaning, typically by allowing you to understand how the derived variable in the data relates to the original questions asked. This may be a simple description or a flowchart. In some cases the code is shared, as shown in the example here, where the SPSS syntax for the HBAN 12-category variable has been included.

To summarise, in this video we have described survey microdata, given examples of popular surveys in the UK, noted how the data are stored for use and identified some of their key features.
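As a footnote, the general idea of a derived variable, a new category computed from raw responses after data collection, can be sketched in Python. The actual derivation of the HBAN variable is given in the survey's shared SPSS syntax; the bands and codes below are purely illustrative and are not taken from that syntax.

```python
# Illustrative derived variable: banding a raw numeric response
# (weekly working hours) into a small set of categories.
# Bands and codes are invented for this sketch.

def derive_hours_band(hours):
    """Collapse raw weekly hours into illustrative bands."""
    if hours < 0:       # missing codes such as -9 stay missing
        return -9
    if hours < 16:
        return 1        # 1 = "under 16 hours"
    if hours < 31:
        return 2        # 2 = "16 to 30 hours"
    return 3            # 3 = "31 or more hours"

raw_hours = [37, 12, -9, 20]
print([derive_hours_band(h) for h in raw_hours])  # prints [3, 1, -9, 2]
```

This is the pattern archive documentation describes: the derived variable exists only in the dataset, so without the description, flowchart or syntax you could not trace its categories back to the original question.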