Hello, good afternoon, and welcome to this webinar on the availability of data about the journey to work. My name is Oliver Duke-Williams, and I'm going to be talking about data from census microdata and longitudinal sources, and some other sources available from the UK Data Service. I'm joined by Vassilis Routsis, who's going to be talking about census origin-destination data. Today's webinar is one of a series of webinars run by a group of ESRC data and methods services. In this webinar, we've got contributions from CeLSIUS, the Centre for Longitudinal Study Information and User Support, and from the UK Data Service. The other webinars in this series again draw together two different sources of data, with speakers talking about the availability of data on different subjects from the sources in which they're specialists, and with some mention of other sources as well. So on March 19th, we've got a webinar coming up on ethnicity and migration; on March 27th, on data about obesity; and on April 2nd, on data about education. Further information about all of those can be found via the UK Data Service events listing. So what are we going to be talking about today? Well, I'm going to talk about census microdata and longitudinal data, and then talk briefly about two other sources of data that contain information about the journey to work. After that, Vassilis is going to talk about the census origin-destination data, which are, again, a different form of output that's released as part of the census. So what data are available about journeys to work in the UK? Well, before we even get on to considering that, I want to pose a question to you while you're listening to the webinar. And that is, how do you usually travel to work or to your place of study? You should see that on screen now, and we've got five possible answers. OK, that seems to be settling down, so we'll close that poll now and show you the result.
So what we can see is that the largest group of people said they travelled by train or tube, and after that, bicycle or on foot was the next largest mode of transport to work. In fact, this question is not as simple as it might initially seem. These are some of the questions that might have occurred to you when you were trying to answer it. Do you use more than one form of transport to get to work? Do you combine your journey to work with other activities? Do you have different places of work on different days? Do you leave for work from different addresses on different days? I also thought about answering this question by seeing what Google recommended to me. So I typed the search term "commuting" into Google to see what sort of pictures it showed me. According to these pictures, people mostly travel to work by car or by train, some people by bicycle, some people walking. And we also see that, unlike today, it's rarely raining. So how can we explore these ideas more carefully using data? Well, I want to talk about census data and about some other sorts of data. The first of those that I want to talk about today are the census longitudinal studies and census microdata. There are various types of questions that are asked in the census. These cover the location and characteristics of the workplace, the relationship between a workplace and a residence, the distance between the workplace and the usual residence, and the method of transport, as well as all the other socio-demographic characteristics that we ask people about in the census. Census microdata are often referred to as individual data: we have one record per person. There are two types of microdata, regular cross-sectional microdata and longitudinal microdata, and they're broadly similar.
They contain all of the original responses to the census from an individual, with identifying characteristics such as name and address removed and with some responses rounded. If we think first about the census microdata, I'm going to show you some information about what's available. There are various samples, and over time they've varied both in terms of sample size and access arrangements. Full details about all of them and how you can use them are available via the website that's shown on the screen. I should say that all of our slides will be available after the webinar, as well as a recording of the webinar. Samples of microdata were first introduced after the 1991 census, and they were known as the Samples of Anonymised Records. Although that term was only strictly used in 1991 and 2001, you'll find that many researchers still refer to these data collectively as samples of anonymised records. You can see from this table that we've got quite significant variation in sample size. The original sample of individuals in 1991 was a 2% sample. In 2001, there was a 5% sample available and also, with more secure access, a 1% sample of households, which has more detail. Since 1991 and 2001, we've had both a new sample of microdata from the 2011 census and also, retrospectively, some samples produced from the censuses of 1961, 1971 and 1981. Those three older ones all have the same structure. There's a 1% open sample, which is useful for teaching, and a 5% safeguarded sample of individuals or 0.95% sample of households. They're ideal for research and are very easy to use. There's also a 9% sample of individuals that's available via secure data services. They're clearly richer than the 5% samples, but they make more demands on the user in terms of how to register to use them: you have to be an accredited researcher. The sample sizes for 2011 are broadly similar.
If we think about longitudinal data, I want to look first at the ONS Longitudinal Study. That was the first of the longitudinal census data sets that we had in the UK, and it focuses on England and Wales. It is a 1.1% sample of individuals, and that sample comes from selecting four birth dates throughout the year: four over 365 and a quarter gives us a sample rate of 1.1%. In the ONS Longitudinal Study, we've got census data from 1971 through 2011, and also some administrative data, most notably, perhaps, mortality data. So when people in the sample have died, we get a linked record of their cause of death. There are also similar studies in Scotland and in Northern Ireland. Both of them have bigger samples. The Scottish LS has a 5% sample, and it has census data from 1991 onwards. The Northern Ireland LS has a 28% sample, with census data from 1981 onwards. Both of them have more administrative data linked to them than the ONS LS does. You can get information about all three of those from calls.ac.uk. If you want to use the longitudinal studies, then we've got two access routes. You can either use them in person in a secure setting, or you can submit Stata or SPSS scripts, etc., to be run remotely by the support officers of the various studies. In both cases, no data can be transferred out of the secure setting until it's had disclosure clearance. I've got some examples here of data about the journey to work from the microdata and the longitudinal study. Firstly, this is a bar graph showing two modes of transport to work from the 2011 census microdata. So this image has advanced, I think. On the top, we've got users of bicycles, and on the bottom, we have people who walk to work. It's worth noting that they've got different scales. But we can see that there's a considerable difference between men and women among users of both modes of transport.
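As a quick check on the sampling fraction quoted above for the ONS Longitudinal Study, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes births are spread evenly across the year, which is a simplification.

```python
# Average number of days in a year, allowing for leap years
# (the "365 and a quarter" mentioned in the talk).
DAYS_PER_YEAR = 365.25


def sampling_fraction(n_birth_dates, days_per_year=DAYS_PER_YEAR):
    """Expected share of people born on one of n_birth_dates days of the year.

    Assumes births are uniformly distributed over the year (a simplification).
    """
    if n_birth_dates < 0 or n_birth_dates > days_per_year:
        raise ValueError("n_birth_dates must be between 0 and the days in a year")
    return n_birth_dates / days_per_year


# Four birth dates, as described for the ONS Longitudinal Study.
ons_ls = sampling_fraction(4)
print(f"ONS LS sampling fraction: {ons_ls:.1%}")  # ONS LS sampling fraction: 1.1%
```

The same function can be reused for any other number of birth dates to see how the expected sample size changes.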
Moving on to look at some work from the longitudinal study. Using longitudinal data, we can look at changes in people's behavior over time. So one thing we've looked at is whether or not people keep the same mode of transport for their journey to work over a 10-year period. The data I'm going to show you are a very rudimentary analysis of this, because they don't take into account change of address or change of workplace. But I'm going to show you some information about people who used a bicycle to travel to work in 2001, by their mode of travel to work 10 years later, in 2011. You can start to think about whether or not you expect people who cycled to work in 2001 to still be cycling in 2011. Probably some of them have changed to a different mode of transport. But how many? Are the majority still cycling? Or is it only a small minority? Well, we can see the result of that here. So this is a Sankey diagram. On the left-hand side, we've got all the cyclists in 2001, and they're then divided up into their outcomes on the right-hand side 10 years later. The largest single group of them have moved from cycling to work to using the car to get to work. The next largest group of people are those who are still cycling, but they're a small minority of the initial cyclists in 2001. We can look at this for more than just one mode of transport. So the table that I'm going to show you on the next slide is arranged as shown on this slide. As rows we've got mode of transport in 2001, and as columns we've got mode of transport in 2011. The central diagonal are people who maintained the same mode of transport over the 10-year period. And here we have the result of that. So the central diagonal are people keeping the same mode of transport. And you can see towards the bottom right that people who are still cycling 10 years later are 30% of the original cyclists.
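The layout of that table (2001 mode as rows, 2011 mode as columns, with the diagonal showing people who kept the same mode) can be sketched as follows. This is a minimal illustration only: the records are invented toy values, not Longitudinal Study results, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy linked records: (mode in 2001, mode in 2011) for the same person.
# These values are invented for illustration and are NOT LS results.
records = [
    ("bicycle", "car"), ("bicycle", "bicycle"), ("bicycle", "car"), ("bicycle", "train"),
    ("car", "car"), ("car", "car"), ("car", "car"), ("car", "bicycle"),
    ("on foot", "car"), ("on foot", "on foot"),
]


def transition_table(pairs):
    """Row percentages: for each 2001 mode, the share ending up in each 2011 mode."""
    counts = defaultdict(Counter)
    for mode_2001, mode_2011 in pairs:
        counts[mode_2001][mode_2011] += 1
    table = {}
    for mode_2001, row in counts.items():
        total = sum(row.values())
        table[mode_2001] = {mode: 100 * n / total for mode, n in row.items()}
    return table


def retention(table):
    """Diagonal of the table: share keeping the same mode at both censuses."""
    return {mode: row.get(mode, 0.0) for mode, row in table.items()}


table = transition_table(records)
for mode, pct in retention(table).items():
    print(f"{mode}: {pct:.0f}% kept the same mode")
```

A fuller analysis would, as noted in the talk, also need to account for changes of address and workplace.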
So that compares with a 34% rate for people travelling on foot maintaining the same mode of transport. At the other end of the spectrum, 82% of people who drove to work in 2001 also drove to work in 2011. The table is shaded by the most common 2011 outcome and the second most common 2011 outcome for each 2001 mode of transport. You can see that in almost all cases, regardless of what mode of transport people used to get to work in 2001, their most common mode of transport in 2011, 10 years later, was the car. The two exceptions to that are people who got the train or the tube to work. We could take this idea forward, as I said, by considering more characteristics of these people. Had they changed their address? Had they changed their place of work? How had their degree of seniority in their occupation changed? We can get that from all of the other census characteristics that we have available to us. Using the LS, we can also consider characteristics such as workplace location. This illustrates some of the things that people have to think about when using longitudinal data or any form of data collected over time. Workplace location is given every 10 years, to varying levels of detail. In 1971, it was available at local authority level, and in 2011 it was also available at local authority level. But in the 1991 and 2001 results, we're only given a location indicator: is it in the same district, a neighbouring district, a different district, and so on? However, despite that, it's in fact possible to use much more detailed workplace location data. The longitudinal studies, all three of them, have restricted fields as well as the generally available fields. Those aren't normally available for researchers to use, but on request they can be used, and they are typically used to generate derived variables. Similarly, with distance to work, there are variations.
In 1971, there wasn't a field in the main body of variables for distance between workplace and home. However, in the restricted variables, we have both the ward of usual residence of the person and the ward of their place of work, and using that, it would be possible to generate a derived distance measure and to include that in your output. In 1991, we had banded distances in various categories: 0 to 2 kilometers, 2 to 5 kilometers, 5 to 10, and so on. In 2001 and 2011, we have much more detailed observations of distance between the place of work and the residence. However, even though those distance measures are very detailed, we still can't use them directly as researchers: we can't use them in an output if they're going to be disclosive. The distance between any one district and another may well be unique, and we don't want to be able to reveal that level of information. So the user will typically have to create their own distance bands. But unlike the situation in 1991, where they were pre-prepared, it's possible for users to specify the distance bands that suit their own study. You'll note from this slide and the previous slide that details of the journey to work in 1981 haven't been included. The reason for that is that, despite the fact that a question was asked in 1981 about the journey to work, that information wasn't captured in the LS. The most commonly used question regarding the journey to work is the mode of transport that people use. This question has had similar but not identical response categories in each census and in each of the three countries in which censuses are taken: England and Wales, Scotland, and Northern Ireland. The next three slides I'm going to show you summarise the response categories used in England and Wales, in Scotland and in Northern Ireland. So first of all, these are the response categories for England and Wales, and we work from left to right over time.
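Returning briefly to the point about user-defined distance bands: a minimal sketch of how detailed distances might be grouped before output is given below. The first three band edges follow the 1991-style bands mentioned above (0 to 2, 2 to 5, 5 to 10 km); the 20 km edge, the labels, the function name and the distances are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Band edges in km. Each band includes its lower edge and excludes its upper edge.
# The 20 km edge is hypothetical; choose edges that suit your own study.
BAND_EDGES_KM = [2, 5, 10, 20]
BAND_LABELS = ["0-2 km", "2-5 km", "5-10 km", "10-20 km", "20+ km"]


def distance_band(distance_km, edges=BAND_EDGES_KM, labels=BAND_LABELS):
    """Return the label of the band that a single distance falls into."""
    if distance_km < 0:
        raise ValueError("distance cannot be negative")
    return labels[bisect_right(edges, distance_km)]


# Invented distances, not census values.
toy_distances_km = [0.8, 2.0, 4.9, 7.5, 12.0, 35.2]
band_counts = Counter(distance_band(d) for d in toy_distances_km)
for label in BAND_LABELS:
    print(f"{label}: {band_counts[label]}")
```

Only the banded counts, not the individual distances, would then go forward for disclosure clearance.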
So the left-hand-most column is 1971. I've left 1981 blank. You can see that most of the categories are the same, albeit with slightly different wording, although we do have some variations. In 1971, there's only a single category for train, whereas from 1991 onwards that's been separated into a main line train and an underground or light rail service. In 2001 and in 2011, we have taxi as a mode of transport, which wasn't included explicitly before. This table shows the response categories for Northern Ireland. You can see here that the data for Northern Ireland were captured in the 1981 LS sample. One of the notable things about the response categories in Northern Ireland is that in 1981 and in 1991 there's a distinction between public transport buses and buses provided by employers. So this demonstrates that the response categories are slightly different between different parts of the UK. The other element that's different in Northern Ireland is the "car or van pool" category. That category isn't used elsewhere, in England and Wales or in Scotland. If you are part of a car pool, then in England and Wales you'd be recorded as being a passenger or perhaps a driver in a car or van. Finally, these are the response categories used in the SLS in Scotland. You can see again that those are very similar to the other sets of categories, and again we see taxi coming in in 2001. There have been variations over time in all of them in the way that people who work at home have been recorded. Now I want to talk about two other data sets that may be of interest to people who want to do research on the journey to work. The first of these is the National Travel Survey. This is a very long-running study. The first survey was conducted in 1965, and follow-up surveys were done at various intervals, but in recent years it's been conducted on an annual basis. The difference between the National Travel Survey and the census is quite significant.
The National Travel Survey is based on detailed travel diaries completed by people, covering many aspects of their journey to work and the other trips that they make. I've copied a quote from some of the documentation here: the National Travel Survey is primarily designed to measure long-term trends in travel, and it's not suitable for monitoring short-term trends or year-on-year changes. So it's a very different sort of beast to the census. The National Travel Survey provides microdata, and these are available via the UK Data Service. It focuses on five different elements of journeys: on households, on vehicles that are available, on individuals, on specific trips, and then on stages in those trips. This is especially relevant for multi-purpose trips. I mentioned at the beginning, when I asked you how you got to work, that it's not necessarily a simple question, and people might make journeys that involve more than one activity: going to work, doing some shopping, attending to a relative who needs care on the way home, collecting a child from school, etc. The way the data are constructed doesn't generate aggregates of data across these different categories of microdata, and data are typically produced for a multi-year period in order to get a large enough sample size. I've got one output that I generated from the National Travel Survey here to show you. This is for data from 2002 to 2016, which is the most recent phase of the National Travel Survey, and this gets updated when new data are available. It's showing as rows the mode of transport used and as columns the stage within a given trip. So some trips have one stage; other trips have more than one stage. We can see that for the first stage within a trip, which might be the only stage (I should have highlighted the cell above this, I'm sorry), the most common transport mode used was being a driver in a car.
The cell I've highlighted is the second most common mode, which is being a passenger in a car. When we start to look a bit further on, at the second stage of a trip (so again, we're not including here all the trips that only had one stage), then the most common mode of transport for the second stage was using a train. As we get further on in a journey, with later stages, then walking becomes more significant, and walking for many people will be the last stage of their trip to work or to wherever else they're going. But you can see from the totals at the bottom that as the number of stages increases, we're dealing with a smaller number of trips. Understanding Society is one of the major data sets that the UK Data Service supports. Again, it's a longitudinal study, collected from individuals in a sample of UK households, and it asks a large number of questions of panel members, many of which include information that's relevant to the journey to work. So this page here highlights questions that relate to transport and the environment, and those include questions about commuting behaviour, about mode of travel, and about other elements that might be relevant to the journey to work, such as people's attitudes towards the environment. It's also possible within Understanding Society, before we get around to using the data, to search for various different sorts of field. So here, these are the results of using the search term "transport", and we can see that a large number of different questions are relevant, and this list continues to scroll down the page. I've picked just one of those, and this is a page generated from the documentation, so it's just showing us overall figures. This is the frequency of using a bicycle as reported by people in wave 8 of Understanding Society, and we see, unfortunately for those of us who'd like to promote active modes of travel, that the largest category is that people will
use a bike never, or less than once or twice a year. So I've talked about four groups of data: longitudinal census data, census microdata, Understanding Society and the National Travel Survey. Most of those are available via the UK Data Service with different sorts of licence arrangements. So, for example, census microdata come in open, registered and secure variants, and similarly with the National Travel Survey there are registered, special licence and secure variants. As we move through those categories, the data become more detailed but require a greater level of training and authentication of the person using the data. The longitudinal data are included in the UK Data Service catalogue if you explore the holdings, but with a pointer to where you can actually get hold of the data. The calls.ac.uk website gives information about all three of the studies, and they all have their own individual websites which give further detail and information about how to use them. To use the longitudinal data, you need to be an accredited researcher and to have your project approved. And now I'm going to hand over to Vassilis, who's going to talk about another one of the census data sets, the census origin-destination data. Thank you, Oli. I'm going to give an overview of the journey to work data that are derived from the UK census, explain some of their main characteristics, and show you how to access some of these data via the UK Data Service website. Usually, in our language, we use the term flow data instead of journey data. So what are flow data? Flow data are also known as origin-destination data, and they consist of counts of flows between two locations. They are produced at different spatial scales, sometimes associated with various aggregate and special areas like overseas, work at home, works offshore and others. The data sets are available from the 1981, 1991, 2001 and the latest 2011 census. All types of census data stem from questions in the census questionnaire. So, regarding the workplace tables,
these are based on question 40, which is: "In your main job, what is the address of your workplace?" All flow data from the 1981, 1991 and 2001 censuses are publicly available to anyone interested in them. Data from 2011 are a bit more complicated, because they involve multiple levels of access based on trade-offs between spatial and attribute detail. These factors determine the table granularity and the required level of access. The three levels of access are: the public level, which is available via ONS, Nomis and the UK Data Service; the safeguarded level, which is available through the UKDS to all members of academia, local and central government and the NHS; and the secure level, which requires researchers to register through the Approved Researcher Scheme via the ONS Secure Research Service, which was formerly known as the VML, the Virtual Microdata Laboratory. This slide shows some of the most common geography levels that workplace tables are available for, and will help you understand the level of spatial detail of the available information. The geography levels can range from a few hundred areas, as in the case of the local authority districts, to more than 100,000 output areas. Therefore, the lower the geography level, the more accurate and granular the picture we get for specific areas within the UK. Along with the spatial detail, the census tables are defined by the variable detail as well. In this slide you can see some indicative variables most commonly found in workplace census flow data: for example age, country of birth, National Statistics Socio-economic Classification, ethnic group, industry and so on. The types of table counts are based upon the variable detail. The simplest flow data table is a flow headcount table. These consist of simple totals of people, as seen in the case of WF01BUK. The slide provides information on how to distinguish and understand flow data tables by their code names. A slightly more complicated table is a univariate table, which relates to just one single variable; in
this slide it refers to an age variable for people aged 16 and over. The most complex and most granular tables are the multivariate tables, where one variable is cross-classified by at least one other. In this case, the method of travel to work variable is cross-classified by two variables, sex and age. This is probably the most granular table currently available, at least in terms of variable detail. It was originally a commissioned table, but ONS released it as safeguarded as well. Based upon the spatial, variable and count type levels, the workplace tables are classified as shown in this slide's security classification schema. You can see that the less detailed local authority headcount tables are publicly available, whereas the much more detailed output area multivariate tables are secure and accessible only to approved researchers. So, time for some basic stats. The total number of public and safeguarded flow data tables that are currently in WICID is 290. Of those, 108 are workplace tables, of which 94 are from the 2011 census, 85 of them being safeguarded and just nine being public tables. The vast majority of the workplace tables are univariate; there are a few headcount tables and just three multivariate safeguarded tables, which were all initially commissioned tables that were later released as safeguarded. So, briefly, how to access the data. To access the census flow data, you need to visit the UK Data Service website and click on the census data, and as soon as the next page loads, click on the flow data from the quick access panel on the right-hand side of the screen. On the next page, click on WICID, which will lead you to the main website. There are two main routes to the available data. The first one allows users to download complete bulk tables in either CSV or SPSS formats. The other option allows users to run flexible queries and retrieve subsets of tables based, for example, on particular areas and/or
specific attributes. To get to the bulk downloads page, you need to click on the bulk downloads icon in the main flow data page. Bulk downloads are only available for the 2011 census data, and this is the only route available to download SPSS files. To build a query and retrieve subsets, click on the WICID icon as shown on this slide, and in the data selection select the commuting and journey to education data. This option is available for all UK censuses from 1981 to 2011. A reminder that if you wish to download safeguarded data through WICID, then you need to log in using your UK institutional account, and you need to have already registered with the UK Data Service and accepted the end user licence. In any other case, the safeguarded tables will be greyed out and you will be unable to download any safeguarded data. Finally, these are some links that you might find useful. As always, you will receive those once the webinar is over. Yes, we'll circulate details of where you can get the slides of the webinar, but it will be via the UK Data Service website. OK, so thank you, Vassilis. We've got some questions from users. First of all, whilst we were running the poll, there was a question about whether or not multiple selections were possible for mode of transport to work. In the poll that we did, we tried to replicate the question asked in the UK census forms as far as possible, and those census forms only allow you to tick one mode of transport. Questions about mode of transport to work are asked in many censuses, and censuses in other countries allow you to tick more than one mode of transport in some cases; for example, the Australian census allows you to tick all that apply. However, in the case of the UK, we can only pick one. But that's just the census. Other data sets, such as the National Travel Survey, allow respondents to give a very detailed answer about what mode of transport they use.
So I mentioned, in discussing the National Travel Survey, that we can split trips into multiple stages, and each of those stages in a trip can have a different mode of transport. One further thing that might be worth mentioning in that regard, in terms of the census, is that we can also look at mode of transport by whether or not a household has cars available, which gives some idea about alternative modes of transport that might be available to people. Another question we've got is: "I'd like to be able to relate travel mode data to some measure of health. Do the data sets have any measure of physical and/or mental wellbeing?" I'll start, I think, with Understanding Society. Understanding Society, in some of its waves, has got very detailed biomarker information, with clinical observations made through blood tests and similar measures, and that gives very detailed health information about individuals. So that can be related to information elsewhere in Understanding Society about the journey to work. In terms of the census, the health information is somewhat limited. We have self-reported general health, and we have, again self-reported, whether you have a long-term illness that limits everyday life, and we can relate both of those to mode of transport used. As I mentioned, in the longitudinal study we also have mortality, so we can see both the fact that someone has died and the age at which they died, and we also know the cause of death. So we can relate that directly in the LS to the mode of transport used by people. So if you're interested, perhaps, in deaths due to various diseases related to pollution (I have no idea whether the numbers are large enough to study), we might be able to disaggregate the different modes of transport used by deaths through respiratory disease and so on. Another question we've got: can you clarify what 1% meant in the data:
is it 1% of the population in England, or is it 1% of the population in England who took the survey? There are a couple of places where this was mentioned, so I'll just go back up the slides to where I was talking about sample sizes. In terms of the census that Vassilis was talking about, the census is a mandatory instrument, so everyone should complete it. In practice not everyone does, but the response rate is in the high 90s. When I was talking about the Samples of Anonymised Records and the longitudinal study, I mentioned 1% samples here for 1991 and a 1% sample for the longitudinal study, and those are both samples of people who have completed their census form. In effect, the construction of the ONS LS sample is slightly different, because it can also draw in individuals from information in the NHS, but broadly it's people who have completed their census form. If we assume that to be the whole population, then it's 1% of the whole population of England and Wales; in practice, it's 1% of almost the whole population of England and Wales. Are other data available showing journey times for those looking for work, who may be classified as unemployed? The National Travel Survey has information about journey time. I'm pretty sure that Understanding Society has information about journey time; I think in the case of Understanding Society that's explicitly stated as the journey to work, so the length of time taken, and that might not be entirely relevant for people who are unemployed. In the National Travel Survey, because different purposes of trips can be identified, not just work but also other purposes, you could look at travel time for people who aren't travelling for work, people who are travelling for other purposes. Another question: what software would you recommend to manipulate or analyse the flow data, and in practice what packages work best for representation? Some people still use the good old Excel, but it largely depends upon your technical background and also the amount of
data that you want to analyse. So if it's something rather small, then Excel should work, but if you want to download larger data sets, then you probably need to go to SPSS or load the data into R or Python. Many people currently use R for both analysis and visualisation purposes. So I would recommend loading the data in SPSS and then trying to work with R or Python, or whatever you are more comfortable with, but this is the latest trend, to load the data in R or Python. The data for all of these, I think, are available in a variety of formats, so you can load them into whatever your preferred system is. So, a follow-up to the question about the sample, the 1% sample of the population. In the case of the census microdata, so the Samples of Anonymised Records and the other samples that many people refer to as samples of anonymised records, they are a random sample: a random sample of individuals or a random sample of households. In the case of the longitudinal studies, in all three of these studies the sample is selected on the basis of date of birth. In the case of the LS in England and Wales, that's four dates of birth, and those dates of birth are not disclosed, so only people who actually have to do the linkage know what those dates are. All we're told is that they're distributed throughout the year, as a 1% sample, because it's four out of 365 days. In the case of Scotland, for example, I think it's 20 days that are used, so 20 over 365 gives you a 5% sample. The way those dates are arranged, the four ONS Longitudinal Study dates form part of the 20 Scottish dates, and the 20 Scottish dates form part of the larger number of birth dates used in Northern Ireland. OK, a question following up about analysis: has anyone, to your knowledge, used KNIME (I don't know if I've pronounced that the right way) or RapidMiner to analyse the data which you presented today? Not to my knowledge. I mean,
most data science analysis is going to be done using Python, so no, I haven't heard of people using those. Most people I talk to use SPSS, Stata, Python or R for the sort of heavy-duty processing of the data. Prior to visualisation, they might load it into an SQL database and do manipulation within that.
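For anyone following the suggestion to load flow data into Python, a minimal sketch might look like the one below. It treats flow data as described in the talk, as counts of flows between an origin and a destination. The column names (origin, destination, count), the area codes and the counts are all invented for illustration; they are not the WICID download layout or real census figures.

```python
import csv
import io
from collections import defaultdict

# Toy origin-destination table. Column names and values are hypothetical.
TOY_FLOWS_CSV = """origin,destination,count
A,A,120
A,B,30
B,A,45
B,B,200
C,A,10
"""


def load_flows(text):
    """Parse CSV text into a list of (origin, destination, count) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["origin"], row["destination"], int(row["count"])) for row in reader]


def flow_totals(flows):
    """Total outflow and inflow per area, ignoring people who stay within one area."""
    outflow = defaultdict(int)
    inflow = defaultdict(int)
    for origin, destination, count in flows:
        if origin != destination:
            outflow[origin] += count
            inflow[destination] += count
    return dict(outflow), dict(inflow)


flows = load_flows(TOY_FLOWS_CSV)
outflow, inflow = flow_totals(flows)
print("Outflows:", outflow)
print("Inflows:", inflow)
```

For a real download the file would be opened from disk rather than from a string, and for larger tables a database or a data-frame library may be more practical, as mentioned above.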